[Yaffs] File corruption - read and write problems

Author: William Juul
Date:  
To: yaffs
Subject: [Yaffs] File corruption - read and write problems
Hello, we have been using yaffs on a PPC running Linux for several years.
We have multiple boards and an install base of tens of thousands of units.
In our QA lab we also have several hundred devices that are upgraded
several times an hour, 24/7.

We are currently on kernel 3.0.4/3.0.7 and yaffs as of August 15th.

So far so good, and thanks by the way :-)

Now for the problem.

From time to time (read: seldom) we experience file corruption. To
investigate this I have written a utility that analyzes a yaffs image we
have "dd'ed" from the NAND.

At least one occurrence had a missing chunk (chunkid going from 51 to 53,
skipping 52) in the middle of a large file (several MB). The missing chunk
was in the middle of a sequence of chunks in the middle of a block (as seen
on the physical NAND). The missing chunk could not be found anywhere on the
flash, not even when looking for OOB data with several bit errors.
From the Linux file system point of view, the file has the correct size,
but when it is read the missing chunk has its data replaced with all
zeroes. There is no error or warning during the read of this file.
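
The gap check described above can be sketched roughly as follows. This is
not the actual utility, only a minimal illustration: find_missing_chunks()
and its parameters are hypothetical names, and it assumes the chunk ids for
one object have already been collected from the OOB tags and that data chunk
ids start at 1 (with 0 used for the object header).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of yaffs.
 * seen_ids:    data chunk ids found on flash for ONE object, in any order
 * n_expected:  number of data chunks implied by the file size
 * missing:     output array receiving up to max_missing missing ids
 * Returns the total number of missing ids, or (size_t)-1 on allocation failure.
 */
size_t find_missing_chunks(const uint32_t *seen_ids, size_t n_seen,
                           uint32_t n_expected,
                           uint32_t *missing, size_t max_missing)
{
    unsigned char *present;
    size_t count = 0;
    size_t i;
    uint32_t id;

    assert(seen_ids != NULL || n_seen == 0);
    assert(missing != NULL || max_missing == 0);

    present = calloc((size_t)n_expected + 1, 1);
    if (present == NULL)
        return (size_t)-1;

    /* Mark every id we saw; ignore id 0 (assumed header) and out-of-range ids. */
    for (i = 0; i < n_seen; i++) {
        if (seen_ids[i] >= 1 && seen_ids[i] <= n_expected)
            present[seen_ids[i]] = 1;
    }

    /* Every id in 1..n_expected that was never marked is a gap. */
    for (id = 1; id <= n_expected; id++) {
        if (!present[id]) {
            if (count < max_missing)
                missing[count] = id;
            count++;
        }
    }

    free(present);
    return count;
}
```

For the case described here, feeding it the ids 1..51 and 53..N with
n_expected = N would report a single missing id, 52.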

After some code inspection I can explain what happens during read. In
"yaffs_rd_data_obj(...)" there is a comment saying "get sane (zero) data
if you read a hole", followed by:

    memset(buffer, 0, in->my_dev->data_bytes_per_chunk);

"yaffs_rd_data_obj(...)" returns 0 when this error occurs, but the return
value from this function is never used or checked.

I would have thought that yaffs should have notified the user of this error
in such a way that the user read() resulted in EIO.
Why is it not so?
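
To make the question concrete, here is a toy model of the behaviour I would
have expected. It is not yaffs code: struct toy_file, toy_read_chunk() and
CHUNK_SIZE are made up for illustration, and a NULL entry simply stands in
for "chunk not found on flash". It also does not try to distinguish a lost
chunk from a legitimately sparse region of a file, which a real change
would have to handle.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 8 /* tiny on purpose; illustration only */

/* Toy stand-in for a file: one pointer per chunk, NULL if the chunk is missing. */
struct toy_file {
    const unsigned char **chunk_data;
    size_t n_chunks;
};

/*
 * Read one chunk into buf (CHUNK_SIZE bytes).
 * Returns 0 on success, -EINVAL for an index past the end of the file,
 * and -EIO when the chunk lies inside the file but cannot be found.
 * The buffer is still zero-filled in the -EIO case so it never holds
 * stale data, but the caller is told that something went wrong.
 */
int toy_read_chunk(const struct toy_file *f, size_t idx, unsigned char *buf)
{
    assert(f != NULL && buf != NULL);

    if (idx >= f->n_chunks)
        return -EINVAL;

    if (f->chunk_data[idx] == NULL) {
        memset(buf, 0, CHUNK_SIZE);
        return -EIO;
    }

    memcpy(buf, f->chunk_data[idx], CHUNK_SIZE);
    return 0;
}
```

A caller that checks the return value could then pass the -EIO up, so that
the user's read() fails instead of silently returning zeroes.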

I still do not understand what goes wrong during write, and as I said, this
happens very seldom. It could be due to a hard-to-trigger bug in the HW,
the driver, MTD or Yaffs, or any combination of these.
Has anyone else experienced anything similar?
Do you have any suggestions on how to debug this further or how to work
around this?

As it happens, most or all of these (few) occurrences seem to happen during
an upgrade of our application (which is to be expected given our use case),
so we can to some degree work around it by verifying the checksum of each
file after installation and rewriting the file if necessary. But I would
really like a cleaner solution.
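
The checksum workaround could look something like the sketch below. It
assumes a plain CRC-32 (the common reflected polynomial 0xEDB88320) and an
expected value shipped with the upgrade package; crc32_update() and
file_matches_crc32() are hypothetical names, not functions from yaffs or
from our installer. Note that a read-back right after the write may be
served from the page cache rather than from the NAND, so the check is only
meaningful once the cached pages have been dropped or after a remount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32; pass 0 as crc for the first call, then chain the result. */
uint32_t crc32_update(uint32_t crc, const unsigned char *data, size_t len)
{
    size_t i;
    int bit;

    assert(data != NULL || len == 0);

    crc = ~crc;
    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/*
 * Returns 1 if the file's CRC-32 equals expected, 0 if it differs,
 * and -1 if the file could not be opened or read.
 */
int file_matches_crc32(const char *path, uint32_t expected)
{
    unsigned char buf[4096];
    uint32_t crc = 0;
    size_t n;
    int failed;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        crc = crc32_update(crc, buf, n);

    failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;

    return crc == expected;
}
```

The installer would call file_matches_crc32() for each installed file and
rewrite the file when the result is 0.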

Best regards
William Juul
--
----------------------------------------------
William Juul
Gullhaugveien 53
N-1354 Bærums Verk, Norway

Tel: +47 67 56 16 67    Mob: +47 95 79 32 53



www.juul.no
----------------------------------------------