On Thursday 17 May 2012 22:29:42 peterlingoal wrote:
> After spending some time looking around in my corrupted NAND, I think I am
> clear what's going on there:
>
> There's some *outdated* checkpoint block in the bad blocks portion, and the
> real good one is located at a later block. During mounting, yaffs firstly
> found the *outdated* checkpoint block and loaded from there. That's why
> loading from checkpoint will always result in a corrupted FS, even after
> re-scanning all the blocks with no-checkpoint-read.
>
> Now the question part:
>
> 1. Why in the first place is there some checkpoint block 'left over' in
> the bad blocks? Shall they be erased?

It is generally a bad idea to erase bad blocks.

> 2. While looking for a checkpoint block, shall the block status be
> checked? Or is there any better way to handle this situation? I simply
> used mtd->block_isbad and continued searching, and it seemed to work.

That should be happening. I'll fix it if that is broken.

Now my question :-):
Are you using an old version of yaffs or the latest? There are various
checksums on the checkpoint data which should fail if old data is found.

>
> regards,
> Peter
>
> On Mon, May 7, 2012 at 3:08 AM, Charles Manning wrote:
> > On Friday 04 May 2012 00:30:55 peterlingoal wrote:
> > > Hi Charles,
> > >
> > > Thanks for the reply.
> > >
> > > I am quite confused about the bad block management methodology, seems
> > > both MTD and yaffs2 have some kind of bad block control. The problem of
> > > my case is, after some period of usage, the yaffs2 file system on some
> > > NAND begins to fail. Remounting with ignoring checkpoint could recover
> > > the file system, but for once only. The file system is still boomed
> > > after reboot and mount (with checkpoint).
> > >
> > > I tried to read the codes of yaffs2 about scanning if checkpoint is
> > > ignored, and got confused.
> > > Seems yaffs2 driver is querying status of
> > > each block (in function yaffs2_scan_backwards). My question is:
> >
> > I suggest you read the HowYaffsWorks doc. You can find that on
> > yaffs.net or find the openoffice doc on the yaffs git.
> >
> > > 1. What does function yaffs2_scan_backwards do?
> >
> > This function scans the nand partition if there is no checkpoint. It
> > reads the tags and builds up the file system state.
> >
> > > 2. MTD keeps a BBT (in NAND in my case), how does yaffs2 module obtain
> > > the BBT information? Why is rescan from backward needed in my case
> > > in order to recover a file system?
> >
> > Yaffs calls the MTD function to determine if a block is good or bad.
> > Yaffs does not know or care if mtd used a bad block table or not.
> >
> > > 3. After recovering the system, seems the bad block information is
> > > not saved. So re-scan is still needed after a reboot. This is my guess,
> > > please correct me if I am wrong.
> > >
> > > Also I am using a quite old version of yaffs2 (back in 2010). What's
> > > the most recommended stable version of yaffs2,
> >
> > I suggest using a more recent version. I would recommend using the
> > current HEAD.
> >
> > > and the kernel MTD driver
> > > version?
> >
> > Sorry, I don't keep current with all mtd changes and can't advise that
> > off the top of my head.
> >
> > > To cut some boot up time I am saving BBT on NAND and reuse it
> > > after reboot, will this make any negative impact?
> >
> > I don't see that this will cause any problems. yaffs does not care how or
> > if you store bbt info.
> >
> > > I am interested in block
> > > summaries, but I would like to stick to checkpoint at the moment.
> >
> > If you use the new code you will get summaries as part of the
> > improvement.
> >
> > > I am new to kernel level debugging, so I am quite lost here. Any help
> > > is appreciated. Thanks!
> >
> > We've all been there.
> >
> > > regards,
> > > Peter
> > >
> > > On Mon, Apr 30, 2012 at 7:41 AM, Charles Manning wrote:
> > > > On Saturday 28 April 2012 05:26:23 Peter Lin wrote:
> > > > > I have several NANDs that yaffs2 module would consider itself
> > > > > successfully recovered from check pointing and skip scanning, but
> > > > > the filesystem is not usable. Mounting with option
> > > > > no-checkpoint-read could recover the filesystem.
> > > > >
> > > > > I understand that bad block management shall be provided from MTD
> > > > > layer, and rescanning fixing the problem proved MTD is doing its
> > > > > job. But I do have some questions:
> > > > >
> > > > > 1. Why in the first place did the check point restoring succeed but
> > > > > leave a corrupted filesystem?
> > > >
> > > > It is impossible to say with so little info.
> > > >
> > > > > 2. What would happen if a used block becomes a bad
> > > > > block?
> > > >
> > > > That block will not be scanned. But blocks don't just "go bad". We
> > > > have to mark them as bad. That normally means we have time to extract
> > > > the useful data first.
> > > >
> > > > > Will the whole filesystem go crazy?
> > > >
> > > > No. Yaffs uses a log structure with tags. That means there is no
> > > > "master table" or such which holds all the information.
> > > >
> > > > > Any way to recover from it?
> > > > >
> > > > > 3. Any way to check or indicate an inconsistency in the filesystem,
> > > > > so the mounting script could try with the option no-checkpoint-read?
> > > >
> > > > There is no such provision at present. Since there is no scanning if
> > > > the checkpoint works, it is really hard to see how you would decide
> > > > that the checkpoint was bad.
> > > >
> > > > If you are having problems with checkpoint, then consider just
> > > > turning it off.
> > > > Since block summaries were introduced, the boot speed up benefits of
> > > > checkpointing are not as dramatic as they were.
> > > >
> > > > > Thanks for your work and help. Please let me know if there's any
> > > > > mistake in my understanding.
> > > > >
> > > > > regards,
> > > > > Peter
> > > > >
> > > > > Does the official kernel have this function enabled or is there any
> > > > > option that controls it?
> > > > >
> > > > > On 2010-03-04 20:55, Charles Manning wrote:
> > > > > > On Friday 05 March 2010 07:14:59 Shivdas Gujare wrote:
> > > > > > > Hi Charles,
> > > > > > >
> > > > > > > Thanks a lot for your help.
> > > > > > >
> > > > > > > On Wed, Mar 3, 2010 at 12:34 PM, Charles Manning wrote:
> > > > > > > > On Wednesday 03 March 2010 23:33:31 Sven Van Asbroeck wrote:
> > > > > > > >> Hello Shivdas,
> > > > > > > >>
> > > > > > > >> > So, what does actually "check pointing" saves while
> > > > > > > >> > unmount?
> > > > > > > >>
> > > > > > > >> It's my understanding that the check point consists of the
> > > > > > > >> RAM data structure which is assembled when a yaffs partition
> > > > > > > >> is scanned. It consists of meta-information associated with
> > > > > > > >> each chunk and block. If you'd like to know more, I recommend
> > > > > > > >> reading the 'How Yaffs works' document, which is available
> > > > > > > >> in CVS.
> > > > > > > >
> > > > > > > > A full scan builds up a set of data structures that define
> > > > > > > > the file system state. A checkpoint captures a reduced
> > > > > > > > version of that, enough to reconstitute the main part of the
> > > > > > > > state, and the rest can be built up on a lazy basis.
> > > > > > > >
> > > > > > > >> > and Is it
> > > > > > > >> > safe to use check-pointing always in final product?
> > > > > > > >>
> > > > > > > >> According to Charles, checkpointing is designed to be used
> > > > > > > >> in the way you describe. To my knowledge, no open
> > > > > > > >> checkpointing issues exist, but you should search the
> > > > > > > >> archives. If you are concerned about the checkpoint
> > > > > > > >> diverging from the meta-information on flash, you could
> > > > > > > >> a) disable checkpointing altogether, or b) submit a patch
> > > > > > > >> implementing a checkpoint counter ;-)
> > > > > > > >
> > > > > > > > You can also choose to mount ignoring checkpointing with
> > > > > > > >
> > > > > > > > mount -t yaffs2 -o"no-checkpoint-read" ..
> > > > > > >
> > > > > > > This is not an option for me, since in the final product the
> > > > > > > end user should not be able to change system data (i.e. mount
> > > > > > > flags). Or I can't change it unless rootfs is flashed on device,
> > > > > > > since yaffs2/nand partitions are mounted from the rcS script.
> > > > > >
> > > > > > You don't need to do this. Just leave checkpointing on.
> > > > > >
> > > > > > -- Charles
> > > > >
> > > > > -Peter
> > > > > _______________________________________________
> > > > > yaffs mailing list
> > > > > yaffs@lists.aleph1.co.uk
> > > > > http://lists.aleph1.co.uk/cgi-bin/mailman/listinfo/yaffs
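
Peter's workaround from the 17 May message (check the block status while looking for a checkpoint block, and keep searching past bad blocks) can be sketched roughly as below. This is only an illustration under stated assumptions, not the yaffs code: struct nand_block, its fields and find_checkpoint_block are made-up names, and the is_bad flag stands in for whatever the MTD layer reports (the message mentions mtd->block_isbad).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for one NAND erase block.
 * In a real driver the bad-block status would be queried from MTD,
 * not stored in a flag like this. */
struct nand_block {
    int is_bad;          /* nonzero if the flash layer reports the block bad */
    int has_checkpoint;  /* nonzero if the block appears to hold checkpoint data */
};

/* Return the index of the first good block that holds checkpoint data,
 * or -1 if no such block exists. Bad blocks are skipped, so stale
 * checkpoint data left behind in a bad block is never picked up. */
int find_checkpoint_block(const struct nand_block *blocks, size_t n)
{
    assert(blocks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (blocks[i].is_bad)
            continue;   /* ignore whatever is stored in a bad block */
        if (blocks[i].has_checkpoint)
            return (int)i;
    }
    return -1;
}
```

With three blocks where block 0 is bad but still carries old checkpoint data, block 1 is good and empty, and block 2 is good and holds the current checkpoint, the search returns 2 rather than 0. The checksums on the checkpoint data that Charles mentions would be a second, independent line of defence and are not modelled here.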