Hello, we have been using yaffs on PPC boards running Linux for several years.
We have several board variants and a total installed base of tens of thousands of devices.
In our QA lab we have several hundred devices that are upgraded several times an hour, 24/7.

We are currently on kernel 3.0.4/3.0.7 and yaffs as of August 15th.

So far so good, and thanks by the way :-)

Now for the problem.

From time to time (read: seldom) we experience file corruption. To get to the bottom of this I have written a utility that analyzes a yaffs image we have "dd'ed" from the NAND.

At least one occurrence had a missing chunk (chunk IDs going from 51 to 53, skipping 52) in the middle of a large file (several MB). On the physical NAND, the gap sat within a sequence of consecutive chunks in the middle of a block. The missing chunk could not be found anywhere on the flash, not even when searching for OOB data with several bit errors.
From the Linux file system's point of view the file has the correct size, but when it is read, the data of the missing chunk is replaced with all zeroes. No error or warning is reported while reading this file.
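
A minimal sketch of this kind of gap check, assuming the tags have already been extracted from the OOB area into an array and that data chunks are numbered from 1. The struct and function names are made up for illustration; they are not taken from yaffs or from the actual utility.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified view of one chunk's tags as read from OOB. */
struct chunk_tag {
    unsigned obj_id;    /* object (file) the chunk belongs to */
    unsigned chunk_id;  /* index of the chunk within the file, assumed 1-based */
};

/*
 * Return the first chunk_id in 1..n_chunks for which no tag with the
 * given obj_id exists. Returns 0 if every chunk is present, or
 * (unsigned)-1 if memory could not be allocated.
 */
unsigned first_missing_chunk(const struct chunk_tag *tags, size_t n_tags,
                             unsigned obj_id, unsigned n_chunks)
{
    unsigned char *seen = calloc((size_t)n_chunks + 1, 1);
    unsigned missing = 0;
    unsigned id;
    size_t i;

    if (!seen)
        return (unsigned)-1;

    /* Mark every chunk of this object; stale duplicates are harmless. */
    for (i = 0; i < n_tags; i++) {
        if (tags[i].obj_id == obj_id &&
            tags[i].chunk_id >= 1 && tags[i].chunk_id <= n_chunks)
            seen[tags[i].chunk_id] = 1;
    }

    /* The first unmarked id is the gap. */
    for (id = 1; id <= n_chunks; id++) {
        if (!seen[id]) {
            missing = id;
            break;
        }
    }

    free(seen);
    return missing;
}
```

Here n_chunks would be derived from the file size in the object header, so for the file described above such a check would report chunk 52.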

After some code inspection I can explain what happens during read:
In "yaffs_rd_data_obj(...)" there is a comment saying "get sane (zero) data if you read a hole", followed by a memset(buffer, 0, in->my_dev->data_bytes_per_chunk);
"yaffs_rd_data_obj(...)" returns 0 when this happens, but the return value of this function is never used or checked.

I would have expected yaffs to report this error in such a way that the user's read() fails with EIO.
Why is it not so?
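
To make the expectation concrete, here is a toy sketch of a chunk read that still zero-fills the buffer but reports the missing chunk to its caller. It is not yaffs code: the chunk map, the lookup helper, the chunk size and all names are stand-ins invented for the example. It also assumes that a missing chunk inside the file size is always an error, which may not hold if legitimate sparse regions look the same.

```c
#include <errno.h>
#include <string.h>

#define CHUNK_SIZE 2048  /* assumed chunk size, for the example only */

/* Hypothetical stand-in for the chunk lookup: -1 means "no chunk found". */
static int lookup_chunk(const int *chunk_map, int n_chunks, int inode_chunk)
{
    if (inode_chunk < 1 || inode_chunk > n_chunks)
        return -1;
    return chunk_map[inode_chunk - 1];
}

/*
 * Toy read of one chunk. On a missing chunk the buffer is still
 * zero-filled, but -EIO is returned instead of 0 so that the caller
 * can fail the read() rather than hand back zeroes silently.
 */
int read_chunk_checked(const int *chunk_map, int n_chunks,
                       int inode_chunk, unsigned char *buffer)
{
    int nand_chunk = lookup_chunk(chunk_map, n_chunks, inode_chunk);

    if (nand_chunk < 0) {
        memset(buffer, 0, CHUNK_SIZE);
        return -EIO;
    }

    /* Placeholder for the real NAND read: fill with a dummy pattern. */
    memset(buffer, 0xA5, CHUNK_SIZE);
    return CHUNK_SIZE;
}
```

The point is only that the caller has something to check; whether and where the real read path could propagate such a value is exactly what I am asking.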

I still do not understand what goes wrong during write, and as I said, this happens very seldom. It could be due to a hard-to-trigger bug in the HW, the driver, MTD or yaffs, or any combination of these.
Has anyone else experienced anything similar?
Do you have any suggestions on how to debug this further, or how to work around it?

As it happens, most or all of these (few) occurrences seem to happen during upgrade of our application (which is to be expected given our use case), so we can to some degree work around it by verifying the checksum of each file after installation and rewriting the file if necessary. But I would really like a cleaner solution.
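
As a rough sketch of that workaround, assuming the expected checksum of each file is shipped with the upgrade package and that CRC-32 is the checksum in use (any other checksum would do). The function names are made up for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), usable incrementally. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *p, size_t n)
{
    size_t i;
    int k;

    crc = ~crc;
    for (i = 0; i < n; i++) {
        crc ^= p[i];
        for (k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/*
 * Returns 1 if the CRC-32 of the file equals `expected`, 0 on a
 * mismatch, and -1 if the file cannot be opened or read.
 */
int file_crc_matches(const char *path, uint32_t expected)
{
    unsigned char buf[4096];
    uint32_t crc = 0;
    size_t n;
    int err;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;

    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        crc = crc32_update(crc, buf, n);

    err = ferror(f);
    fclose(f);
    if (err)
        return -1;

    return crc == expected;
}
```

The installer would call file_crc_matches() for each installed file and rewrite the file when it returns 0. One caveat: a read immediately after the write may be served from the page cache rather than from NAND, so the check is probably only meaningful after the cache has been dropped or the device rebooted.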

Best regards
William Juul
--
----------------------------------------------
William Juul
Gullhaugveien 53
N-1354 Bærums Verk, Norway

Tel: +47 67 56 16 67    Mob: +47 95 79 32 53

william@juul.no
www.juul.no
----------------------------------------------