On Sunday 15 January 2006 06:27, Jon Masters wrote:
> On 1/14/06, Kent Ryhorchuk <kryhorchuk@yahoo.com> wrote:
> > I've got it working on a 2GB Samsung NAND flash. This
> > flash supports interleaved operation (two chip
> > selects) and cached programming (odd and even banks on
> > each chip). The host CPU is a 180MHz ARM9 that has a
> > dedicated smart media interface with HW ECC
> > calculation and DMA.
>
> Are you using the DMA engine? It won't use itself :-)

Put a scope on the NAND chip's #CE, #AL and #CL lines and check that 
you are driving the chip as best you can. I have found, for 
example, that the embedded DMA controller in a Sharp SoC ARM can 
generate tighter bus i/o read cycles than the CPU -- often DMA 
controllers are designed to perform short bursts of cycles 
efficiently.  So even if the CPU just spins, waiting for the
DMA to complete, it's a win.
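
To make the "CPU just spins" case concrete, here is a rough sketch in C.
The register block (struct dma_chan), its bit definitions and the
function name nand_dma_read are all hypothetical; every SoC's DMA
controller looks different, so treat this as the shape of the idea
rather than code for any particular part. It assumes a channel that
can read repeatedly from a fixed source address (the NAND data
register) and it leaves out cache maintenance on the destination
buffer, which a real driver would need.

```c
#include <stdint.h>

/* Hypothetical DMA channel register block; real layouts differ per SoC. */
struct dma_chan {
    volatile uint32_t src;     /* source bus address (NAND data register) */
    volatile uint32_t dst;     /* destination bus address (RAM buffer)    */
    volatile uint32_t count;   /* number of bytes to transfer             */
    volatile uint32_t ctrl;    /* bit 0: start transfer                   */
    volatile uint32_t status;  /* bit 0: transfer done                    */
};

#define DMA_CTRL_START (1u << 0)
#define DMA_STAT_DONE  (1u << 0)

/*
 * Program one transfer from the NAND data register into RAM, then
 * busy-wait for completion. Returns 0 on success, -1 if the done bit
 * did not appear within max_spins polls.
 */
static int nand_dma_read(struct dma_chan *ch, uint32_t nand_data_addr,
                         uint32_t buf_addr, uint32_t len,
                         uint32_t max_spins)
{
    ch->src = nand_data_addr;
    ch->dst = buf_addr;
    ch->count = len;
    ch->ctrl = DMA_CTRL_START;

    /* The CPU does nothing useful here; the win is the tighter bus cycles. */
    while (!(ch->status & DMA_STAT_DONE)) {
        if (max_spins-- == 0)
            return -1;
    }
    return 0;
}
```

The bounded spin is only there so that a wedged controller cannot hang
the caller; the scope will tell you whether the cycles got any tighter.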

One might be able to do this from the CPU with the cache enabled 
for the appropriate I/O space, but caching will break normal 
NAND I/O (like page addressing and status polling).  I have done 
this for NOR read, but then NOR is much like RAM. Of course with 
ARM you could double-map the NAND I/O space, once non-cached and 
once cached, and see if a cache-line load generates tighter read 
cycles.

Another technique that worked for me on the MPC5200 (PPC) was to 
issue 32-bit reads to the (8-bit) NAND chip. The memory 
controller converted each of these to a burst of 4 back-to-back 
byte reads, and I got a 70% boost.  Whatever you do, this kind of 
thing gets very hardware specific -- but I have attained much 
improved NAND read performance on various platforms/CPUs using 
these techniques.
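
For illustration, a read loop along the lines of the 32-bit trick
might look like the sketch below. The function name nand_read_buf32
and its parameters are made up for this example. It assumes the
memory controller splits each 32-bit access to the data register into
four byte cycles and packs the bytes into the word in the order they
came off the chip, so that copying the word to memory preserves that
order. Check both assumptions on your own hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read len bytes from an 8-bit NAND data register using 32-bit bus
 * accesses where possible. data_reg is the (uncached) address of the
 * NAND data register.
 */
static void nand_read_buf32(volatile const uint32_t *data_reg,
                            uint8_t *buf, size_t len)
{
    volatile const uint8_t *byte_reg = (volatile const uint8_t *)data_reg;
    size_t i;

    /* One 32-bit read is expected to become four back-to-back byte cycles. */
    for (i = 0; i + 4 <= len; i += 4) {
        uint32_t w = *data_reg;
        memcpy(buf + i, &w, sizeof w);  /* buf need not be word aligned */
    }

    /* Any remaining 1 to 3 bytes are fetched with ordinary byte reads. */
    for (; i < len; i++)
        buf[i] = *byte_reg;
}
```

Compare the scope trace of this against a plain byte loop before
trusting it; on some controllers the wide access buys nothing.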

The trick is figuring out how to get the CPU/memory-controller in
your design to drive the NAND chip as close to spec as possible.
With regular CPU byte reads, it may be far from optimal.

-imcd