It sounds like a NAND driver problem. I think you should first replay your
test case at the MTD level, i.e. do the verification on a NAND page as a check.

On Fri, Dec 16, 2011 at 2:38 AM, William Juul wrote:
> We have now done some further investigations in this matter, and it's getting
> even more peculiar.
>
> We can now reproduce this error condition and here is what we do:
> 1) Erase nand in U-boot
> 2) Write lots of files to yaffs FS from U-boot
> 3) Boot to linux (using the files just written to nand)
> 4) Wait 10 seconds
> 5) reboot
> 6) in linux check SHA1 sum of file with object_id 264
>
> If we do this exactly, everything is fine; but if we skip 4) or reduce that
> delay, we get a missing chunk (with chunkid in the range 30-60) in the file
> mentioned in 6)
> If we check the SHA1 sum before rebooting in step 5) it is correct (not
> depending on any delay)
>
> The reboot is done properly, and trace shows us that yaffs_do_sync_fs is
> being called and that yaffs background thread is being shut down.
>
> The yaffs version we are using in U-boot is from 2010-04-26.
>
> Any ideas?
>
> Best regards
> William
>
> On Wed, Nov 23, 2011 at 00:14, Charles Manning wrote:
>>
>> On Wednesday 23 November 2011 11:36:00 William Juul wrote:
>> > Hello, we have been using yaffs on a PPC running linux for several years.
>> > We have multiple boards and a complete install base of tens of thousands.
>> > And in our QA lab we have several hundred devices that are upgraded several
>> > times an hour 24/7.
>> >
>> > We are currently on kernel 3.0.4/3.0.7 and yaffs as of August 15th.
>> >
>> > So far so good, and thanks by the way :-)
>> >
>> > Now for the problem.
>> >
>> > We do from time to time (read seldom), experience file corruption. To try
>> > and find out about this I have written a utility that I can use to analyze
>> > a yaffs image we have "dd'ed" from the NAND.
>> >
>> > At least one occurrence had a missing chunk (chunkid going from 51 to 53,
>> > skipping 52) in the middle of a large file (several MB). The missing chunk
>> > was in the middle of a sequence of chunks in the middle of a block (as seen
>> > on the physical NAND). The missing chunk could not be found anywhere on the
>> > flash. Not even when looking for OOB data with several bit errors.
>> > From the linux file system point of view, the file has correct size, but
>> > when read the missing chunk has its data replaced with all zeroes. There is
>> > no error or warning during read of this file.
>> >
>> > After some code inspection I can explain what happens during read:
>> > in "yaffs_rd_data_obj(...)" there is a comment saying "get sane (zero)
>> > data if you read a hole" followed by a
>> > memset(buffer, 0, in->my_dev->data_bytes_per_chunk);
>> > "yaffs_rd_data_obj(...)" returns 0 when this error occurs, but the return
>> > value from this function is never used or checked.
>> >
>> > I would have thought that yaffs should have notified the user of this error
>> > in such a way that the user read() resulted in EIO.
>> > Why is it not so?
>>
>> yaffs does not write the holes in files with zeros.
>>
>> eg consider the following sequence:
>> write 1MB of data
>> seek to 2MB
>> write 1MB of data
>>
>> You will have a 3MB file of which there is only 2MB of actual data and there
>> is a 1MB hole in the middle.
>>
>> Since yaffs does not record holes, it cannot tell the difference between a
>> missing page due to some error or a valid hole. It therefore does not report
>> EIO.
>>
>> >
>> > I still do not understand what happens during write, and as I stated this
>> > happens very seldom. It can be due to a hard to trigger bug in HW, the
>> > driver, MTD or Yaffs; or any combination of these alternatives.
>> > Has anyone else experienced anything similar?
>> > Do you have any suggestions on how to debug this further or how to work
>> > around this?
>>
>> This sounds pretty strange and if I was to guess I would say it is most likely
>> due to some problem in the driver.
>>
>> > As it happens, it seems most/all of these (few) occurrences happen during
>> > upgrade of our application (which is fair due to our use case), so we can
>> > to some degree work around it by verifying the checksum of each file after
>> > installation and rewrite if necessary. But I would really like a cleaner
>> > solution.
>>
>> Verification of files during an upgrade is always a good idea.
>> I would also recommend that you drop the cache before you verify so that you
>> verify against the flash and not what is in the VFS cache.
>>
>> ie.
>> write upgrade files to yaffs
>> sync
>> echo 3 > /proc/sys/vm/drop_caches
>> verify files
>>
>> -- Charles
>>
>
> --
> ----------------------------------------------
> William Juul
> Gullhaugveien 53
> N-1354 Bærums Verk, Norway
>
> Tel: +47 67 56 16 67    Mob: +47 95 79 32 53
>
> william@juul.no
> www.juul.no
> ----------------------------------------------

--
Regards,
Shizheng
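
For reference, the verify-after-upgrade sequence suggested in the quoted thread (write, sync, drop caches, verify) could be scripted roughly as below. This is only a sketch under assumptions: the function names (sha1_of_file, drop_caches, verify_files) and the idea of a manifest mapping paths to expected SHA1 sums are made up for illustration and are not part of yaffs or of anyone's actual tooling. Writing to /proc/sys/vm/drop_caches needs root, so the control file path is a parameter and dropping can be skipped.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=64 * 1024):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def drop_caches(control_file="/proc/sys/vm/drop_caches"):
    """Sync, then write 3 to the control file (page cache, dentries, inodes).

    Needs root when pointed at the real /proc file.
    """
    os.sync()
    with open(control_file, "w") as f:
        f.write("3\n")


def verify_files(expected, drop=True):
    """Return the paths whose SHA1 does not match the expected digest.

    `expected` is a hypothetical manifest: {path: expected_sha1_hex}.
    Unreadable or missing files are reported as mismatches.
    """
    if drop:
        drop_caches()
    bad = []
    for path, want in expected.items():
        try:
            got = sha1_of_file(path)
        except OSError:
            got = None
        if got != want:
            bad.append(path)
    return bad
```

Since a missing chunk is read back as zeroes with the file size unchanged, a size check alone would not catch it; comparing a digest of the full contents would. Any path returned by verify_files could then be rewritten and checked again.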