[Yaffs] Garbage collection issue ==> fixed

Charles Manning manningc2@actrix.gen.nz
Wed, 30 Mar 2005 10:57:54 +1200


After some off-list discussion and bug hunting, this problem was tracked to
the page counter bitfields in yaffs_BlockInfo being too narrow for Mike's
usage. The bitfields have been widened in the latest CVS.

http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs/yaffs_guts.h.diff?r1=1.18&r2=1.19

This should be benign to almost everyone. The only people being hurt were
those that used 128 or more chunks per block. This normally means people
using YAFFS on NOR flash. Mike is using 496 chunks per block. He is now a
happy camper.
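
To illustrate how a too-narrow counter bitfield loses information, here is a
minimal sketch in C. The structs below are hypothetical stand-ins, not the
real yaffs_BlockInfo, and the 8-bit width is an assumption: this post does not
state the original width. The sketch uses unsigned fields because assigning an
out-of-range value to a signed bit-field is implementation-defined, whereas an
unsigned field is simply reduced modulo 2^width.

```c
#include <assert.h>

/* Hypothetical per-block record with an assumed 8-bit counter (0..255). */
struct narrow_block_info {
    unsigned int pages_in_use : 8;
};

/* Same record with a wider counter (0..1023), enough for 496 chunks. */
struct wide_block_info {
    unsigned int pages_in_use : 10;
};

/* Store a chunk count in the narrow field and return what survives. */
static unsigned int stored_in_narrow(unsigned int count)
{
    struct narrow_block_info bi;
    bi.pages_in_use = count;      /* reduced modulo 256 on assignment */
    return bi.pages_in_use;
}

/* Store a chunk count in the wide field; the caller must stay in range. */
static unsigned int stored_in_wide(unsigned int count)
{
    struct wide_block_info bi;
    assert(count < 1024u);
    bi.pages_in_use = count;
    return bi.pages_in_use;
}
```

Under that assumed width, a count of 496 would be stored as 240. Mike's log
below shows pages 496 through 735 being copied, which is also 240 pages. That
is consistent with this kind of truncation, though it is not proof of the
exact field layout.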

-- Charles




On Tuesday 15 March 2005 11:50, Michael Erickson wrote:
> Charles,
>
> Thanks so much for getting back to me. I should have sent this out
> sooner and I do apologize. I was out of the office all of last week at
> Embedded Systems Conference in San Francisco and spent the entire week
> before that getting ready for the show.
>
> Now I'm back to working on this issue.
>
> I checked and my YAFFS device is being set as follows:
>
> - block size is   = 0x40000
> - nbytesPerChunk  = 512
> - nChunksPerBlock = 496 ( 0x40000 / (512 + 16) )
> - nReservedBlocks = 3
> - startBlock      = 1
> - endBlock        = 31
> - useNANDECC      = 0
> - nShortOpCaches  = 0
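
The geometry arithmetic in the list above can be checked with a short sketch.
The helper names are hypothetical and not part of YAFFS; the chunk numbering
(block number times chunks per block) is an assumption inferred from the log
output quoted further down, where block one starts at page 496.

```c
#include <assert.h>

/* Whole chunks (data + emulated spare) that fit in one erase block. */
static int chunks_per_block(int block_bytes, int data_bytes, int spare_bytes)
{
    assert(data_bytes > 0 && spare_bytes >= 0);
    return block_bytes / (data_bytes + spare_bytes);
}

/* Bytes left over at the end of a block after the last whole chunk. */
static int unused_bytes_per_block(int block_bytes, int data_bytes, int spare_bytes)
{
    int n = chunks_per_block(block_bytes, data_bytes, spare_bytes);
    return block_bytes - n * (data_bytes + spare_bytes);
}

/* First and last chunk numbers belonging to a block (assumed numbering). */
static int first_chunk_of_block(int block, int n_chunks)
{
    return block * n_chunks;
}

static int last_chunk_of_block(int block, int n_chunks)
{
    return (block + 1) * n_chunks - 1;
}
```

With the values above, 0x40000 / 528 gives 496 whole chunks with 256 bytes
left unused per block, and block 1 spans chunks 496 through 991.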
>
> I enabled all sorts of debug output throughout the driver. It appears
> that garbage collection *should* be working fine. When a garbage
> collection cycle happens, it appears that YAFFS grabs the first block of
> flash. It copies the data from the block to another section of flash,
> and then erases the entire block. Subsequent garbage collections start
> with the next sequential block of flash.
>
> Two things appear a little bit fishy to me:
>
> 1) The entire block isn't copied before erasing.
>
> I put messages in the code that say things like, "copying page %d" in
> the function yaffs_GarbageCollectBlock(). The first block that gets
> collected is block one and it starts by copying page 496. Then, pages
> are copied in succession up to 735. Notice, that it didn't copy the
> entire block's worth of pages. Maybe this is normal. Perhaps YAFFS split
> the file data up between different blocks of flash. You will have to
> tell me. I would have expected it to copy all the way up to page 992
> (the last page in the block). That is, of course, assuming that YAFFS
> would put as much data as possible into a block.
>
> 2) dev->nErasedBlocks doesn't appear to get updated.
>
> The first time the garbage collector runs, I see this message:
>
> 	"yaffs: GC erasedBlocks 30 aggressive 0"
>
> After that, *every* time it runs, I see the same message:
>
> 	"yaffs: GC erasedBlocks 29 aggressive 0"
>
> I would expect to see the "erasedBlocks" output increase or decrease.
>
>
> I guess right now I'm just looking for some advice on how to proceed. As
> I mentioned before, I *only* seem to have problems when the garbage
> collector runs. Turning it off lets me burn huge files into the
> filesystem without trouble.
>
> The problem caused by the garbage collection seems to be loss of data
> within the file. To try and help me figure out what was going on, I
> created a large text file that just goes something like this:
>
> line #0
> line #1
> line #2
>   ....
>   ....
> line #102397
> line #102398
> line #102399
>
> I then copy this file from a FAT formatted Compact Flash card into the
> YAFFS partition.
>
> When I look at the size of the file after the copy, YAFFS reports to me
> what would be the correct size. However, the md5sum of the file in the
> YAFFS partition does not match that of the file in the FAT partition.
> Furthermore, when I textually dump the file to the serial port and
> capture it, I can see that huge chunks of data are missing in the YAFFS
> version of the file. The dumped file looks something like:
>
> line #0
> line #1
>   ....
>   ....
> line #33307
> line #33308
> line #3330ne #43891
> line #43892
> line #43893
>   ....
>
> There are a couple of more gaps in the data like that. Might this have
> something to do with those "missing" blocks of data that I observed not
> being copied before the block erase?
>
> Again, if I disable all garbage collection, I can burn the exact same
> file with no trouble. The md5sum will match, dumping it to the serial
> port and comparing it on my desktop will show me that the file was truly
> copied without error.
>
> Any help or advice on where to look next would be greatly appreciated.
>
> Best regards,
> 	--mike
>
> Charles Manning wrote:
> > Mike
> >
> > You're definitely on the right track.
> >
> > When GC goes wrong you can bet on it being a geometry issue.
> >
> >>Eventually, I determined that this was happening when the garbage
> >>collector ran. To verify, I modified the yaffs_CheckGarbageCollection()
> >>function to just return YAFFS_OK without doing any garbage collection of
> >>any kind. As soon as I did this, all of my problems went away.
> >>
> >>I do not believe that this is a bug in the YAFFS code. Rather, I think
> >>the problem has to do with my definition of a block size and what is
> >>physically hooked up on my board.
> >>
> >>I have two NOR (StrataFlash) devices hooked up together to look like one
> >>device. Is there anything special that I need to do with this situation?
> >>
> >>Currently, each individual chip has a block size of 0x20000 bytes. So, I
> >>have been reporting to YAFFS that the block size of the device is
> >>0x40000 bytes. That way, when YAFFS wants to erase a block, I should be
> >>erasing one block from each chip and there won't be any overlap.
> >
> > First off, please be warned that YAFFS was not really designed for NOR,
> > and in particular does not handle erase suspend, but still is being used
> > for quite a few shipping products.
> >
> > You're doing the right thing with the two blocks being treated as one
> > large block (ie. your 2x128kB blocks look like one 256 kB block).
> >
> > Now NOR (typically) does not have spare/oob bytes and pages so you will
> > have to emulate this. Each page (a page == a chunk) is 512 bytes data +
> > 16 bytes spare = 528 bytes.
> > So each block should have 256kB/528 = 496 pages.
> >
> > Your flash access code should then be doing a mapping to make sure the
> > data and spare areas are all addressed correctly.
> >
> > Check that your device, once initialised, has
> >
> > nChunksPerBlock = 496
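
One possible shape for the data/spare mapping described above is sketched
here. It is only an illustration under stated assumptions: the function names
are hypothetical, each chunk is assumed to be laid out as 512 data bytes
immediately followed by its 16 emulated spare bytes, and block 0 is assumed
to start at flash offset 0. A real driver may use a different layout.

```c
#include <assert.h>
#include <stdint.h>

enum {
    BLOCK_BYTES      = 0x40000,                 /* two 0x20000 NOR blocks */
    DATA_BYTES       = 512,
    SPARE_BYTES      = 16,
    CHUNK_BYTES      = DATA_BYTES + SPARE_BYTES, /* 528 */
    CHUNKS_PER_BLOCK = BLOCK_BYTES / CHUNK_BYTES /* 496 */
};

/* Byte offset of the data area of a chunk (assumed layout, see lead-in). */
static uint32_t chunk_data_offset(uint32_t chunk)
{
    uint32_t block = chunk / CHUNKS_PER_BLOCK;
    uint32_t index = chunk % CHUNKS_PER_BLOCK;

    /* Sanity check: the whole chunk must lie inside its erase block. */
    assert((index + 1u) * CHUNK_BYTES <= (uint32_t)BLOCK_BYTES);

    return block * (uint32_t)BLOCK_BYTES + index * (uint32_t)CHUNK_BYTES;
}

/* Byte offset of the emulated spare area, directly after the data. */
static uint32_t chunk_spare_offset(uint32_t chunk)
{
    return chunk_data_offset(chunk) + DATA_BYTES;
}
```

With this layout no chunk ever straddles an erase-block boundary, so erasing
one 0x40000 block removes exactly 496 chunks and nothing else.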
> >
> > I've seen this approach work fine on systems with 64kB blocks and 128kB
> > blocks. I have not yet seen it tried on 256kB blocks. It should all work
> > OK since I don't think there are any limitations that will hurt.
> >
> >
> > -- Charles