Re: [Yaffs] Garbage collection issue ==> fixed

Author: Charles Manning
To: Michael Erickson
CC: yaffs
Old-Topics: Re: [Yaffs] Garbage collection issue.
Subject: Re: [Yaffs] Garbage collection issue ==> fixed
After some off-list discussion and bug hunting, this problem was tracked to
the page counter bitfields in yaffs_BlockInfo being too narrow for Mike's
usage. The bitfields have been widened in the latest CVS.

http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs/yaffs_guts.h.diff?r1=1.18&r2=1.19

This change should be benign for almost everyone. The only people affected
were those using 128 or more chunks per block, which normally means people
using YAFFS on NOR flash. Mike is using 496 chunks per block. He is now a
happy camper.

-- Charles




On Tuesday 15 March 2005 11:50, Michael Erickson wrote:
> Charles,
>
> Thanks so much for getting back to me. I should have sent this out
> sooner, and I apologize. I was out of the office all of last week at the
> Embedded Systems Conference in San Francisco and spent the entire week
> before that getting ready for the show.
>
> Now I'm back to working on this issue.
>
> I checked and my YAFFS device is being set as follows:
>
> - block size      = 0x40000
> - nbytesPerChunk  = 512
> - nChunksPerBlock = 496 ( 0x40000 / (512 + 16) )
> - nReservedBlocks = 3
> - startBlock      = 1
> - endBlock        = 31
> - useNANDECC      = 0
> - nShortOpCaches  = 0

>
> I enabled all sorts of debug output throughout the driver. It appears
> that garbage collection *should* be working fine. When a garbage
> collection cycle happens, it appears that YAFFS grabs the first block of
> flash. It copies the data from the block to another section of flash,
> and then erases the entire block. Subsequent garbage collections start
> with the next sequential block of flash.
>
> Two things appear a little bit fishy to me:
>
> 1) The entire block isn't copied before erasing.
>
> I added debug messages such as "copying page %d" to the function
> yaffs_GarbageCollectBlock(). The first block that gets collected is
> block one, and it starts by copying page 496. Then pages are copied in
> succession up to 735. Notice that it didn't copy the entire block's
> worth of pages. Maybe this is normal. Perhaps YAFFS split the file data
> up between different blocks of flash. You will have to tell me. I would
> have expected it to copy all the way up to page 991 (the last page in
> the block). That is, of course, assuming that YAFFS would put as much
> data as possible into a block.
>
> 2) dev->nErasedBlocks doesn't appear to get updated.
>
> The first time the garbage collector runs, I see this message:
>
>     "yaffs: GC erasedBlocks 30 aggressive 0"

>
> After that, *every* time it runs, I see the same message:
>
>     "yaffs: GC erasedBlocks 29 aggressive 0"

>
> I would expect to see the "erasedBlocks" output increase or decrease.
>
>
> I guess right now I'm just looking for some advice on how to proceed. As
> I mentioned before, I *only* seem to have problems when the garbage
> collector runs. Turning it off lets me burn huge files into the
> filesystem without trouble.
>
> The problem caused by garbage collection seems to be data loss within
> the file. To help me figure out what was going on, I created a large
> text file that looks something like this:
>
> line #0
> line #1
> line #2
> ....
> ....
> line #102397
> line #102398
> line #102399
>
> I then copy this file from a FAT formatted Compact Flash card into the
> YAFFS partition.
>
> When I look at the size of the file after the copy, YAFFS reports to me
> what would be the correct size. However, the md5sum of the file in the
> YAFFS partition does not match that of the file in the FAT partition.
> Furthermore, when I textually dump the file to the serial port and
> capture it, I can see that huge chunks of data are missing from the YAFFS
> version of the file. The dumped file looks something like:
>
> line #0
> line #1
> ....
> ....
> line #33307
> line #33308
> line #3330ne #43891
> line #43892
> line #43893
> ....
>
> There are a couple more gaps in the data like that. Might this have
> something to do with those "missing" pages that I observed not being
> copied before the block erase?
>
> Again, if I disable all garbage collection, I can burn the exact same
> file with no trouble. The md5sum matches, and dumping the file to the
> serial port and comparing it on my desktop shows that it was truly
> copied without error.
>
> Any help or advice on where to look next would be greatly appreciated.
>
> Best regards,
>     --mike

>
> Charles Manning wrote:
> > Mike
> >
> > You're definitely on the right track.
> >
> > When GC goes wrong you can bet on it being a geometry issue.
> >
> >>Eventually, I determined that this was happening when the garbage
> >>collector ran. To verify, I modified the yaffs_CheckGarbageCollection()
> >>function to just return YAFFS_OK without doing any garbage collection of
> >>any kind. As soon as I did this, all of my problems went away.
> >>
> >>I do not believe that this is a bug in the YAFFS code. Rather, I think
> >>the problem has to do with my definition of a block size and what is
> >>physically hooked up on my board.
> >>
> >>I have two NOR (StrataFlash) devices hooked up together to look like one
> >>device. Is there anything special that I need to do with this situation?
> >>
> >>Currently, each individual chip has a block size of 0x20000 bytes. So, I
> >>have been reporting to YAFFS that the block size of the device is
> >>0x40000 bytes. That way, when YAFFS wants to erase a block, I should be
> >>erasing one block from each chip and there won't be any overlap.
> >
> > First off, please be warned that YAFFS was not really designed for NOR,
> > and in particular does not handle erase suspend, but it is still being
> > used in quite a few shipping products.
> >
> > You're doing the right thing with the two blocks being treated as one
> > large block (i.e. your 2 x 128 kB blocks look like one 256 kB block).
> >
> > Now, NOR (typically) does not have spare/OOB bytes and pages, so you
> > will have to emulate them. Each page (a page == a chunk) is 512 bytes of
> > data + 16 bytes of spare = 528 bytes.
> > So each block should have 256 kB / 528 = 496 pages (rounded down).
> >
> > Your flash access code should then be doing a mapping to make sure the
> > data and spare areas are all addressed correctly.
> >
> > Check that your device, once initialised, has
> >
> > nChunksPerBlock = 496
> >
> > I've seen this approach work fine on systems with 64 kB and 128 kB
> > blocks. I have not yet seen it tried on 256 kB blocks. It should all work
> > OK, since I don't think there are any limitations that will hurt.
> >
> >
> > -- Charles