Re: [Yaffs] File corruption - read and write problems

Author: William Juul
Date:
To: Charles Manning
CC: yaffs
Subject: Re: [Yaffs] File corruption - read and write problems

We have now investigated this matter further, and it is getting even
more peculiar.

We can now reproduce this error condition. Here is what we do:
1) Erase the NAND in U-Boot
2) Write lots of files to the yaffs FS from U-Boot
3) Boot Linux (using the files just written to NAND)
4) Wait 10 seconds
5) Reboot
6) In Linux, check the SHA1 sum of the file with object_id 264

If we do exactly this, everything is fine; but if we skip 4) or reduce
that delay, we get a missing chunk (with chunkid in the range 30-60) in the
file mentioned in 6).
If we check the SHA1 sum before rebooting in step 5), it is correct
(regardless of any delay).
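
The check in step 6) can be narrowed down to a chunk number with something
like the following. This is only a minimal sketch, not the utility we
actually use: the function name, the 2048-byte chunk size, and the 1-based
chunk numbering are assumptions, so adjust them to the real NAND geometry.
Note that a chunk which legitimately contains only zeroes would also be
reported.

```python
import hashlib

# Assumed data bytes per chunk; adjust to the actual NAND geometry.
CHUNK_SIZE = 2048


def sha1_and_zero_chunks(path, chunk_size=CHUNK_SIZE):
    """Return (SHA-1 hex digest, list of chunk numbers that read as all zeroes).

    Chunks are numbered from 1 here, which is an assumption about how
    chunk ids map onto file offsets.
    """
    digest = hashlib.sha1()
    zero_chunks = []
    chunk_no = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            chunk_no += 1
            digest.update(data)
            # A missing chunk is read back as zeroes without any error.
            if not any(data):
                zero_chunks.append(chunk_no)
    return digest.hexdigest(), zero_chunks
```

Comparing the digest against the known-good value says whether the file is
damaged; the list of zeroed chunk numbers hints at where.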

The reboot is done properly, and trace shows us that yaffs_do_sync_fs is
being called and that yaffs background thread is being shut down.

The yaffs version we are using in U-boot is from 2010-04-26.

Any ideas?

Best regards
William

On Wed, Nov 23, 2011 at 00:14, Charles Manning <manningc2@actrix.gen.nz> wrote:

> On Wednesday 23 November 2011 11:36:00 William Juul wrote:
> > Hello, we have been using yaffs on a PPC running Linux for several years.
> > We have multiple boards and a complete install base of tens of thousands.
> > And in our QA lab we have several hundred devices that are upgraded
> > several times an hour 24/7.
> >
> > We are currently on kernel 3.0.4/3.0.7 and yaffs as of August 15th.
> >
> > So far so good, and thanks by the way :-)
> >
> > Now for the problem.
> >
> > We do from time to time (read: seldom) experience file corruption. To
> > try to get to the bottom of this I have written a utility that I can use
> > to analyze a yaffs image we have "dd'ed" from the NAND.
> >
> > At least one occurrence had a missing chunk (chunkid going from 51 to 53,
> > skipping 52) in the middle of a large file (several MB). The missing
> > chunk was in the middle of a sequence of chunks in the middle of a block
> > (as seen on the physical NAND). The missing chunk could not be found
> > anywhere on the flash, not even when looking for OOB data with several
> > bit errors.
> > From the Linux file system point of view, the file has the correct size,
> > but when read, the missing chunk has its data replaced with all zeroes.
> > There is no error or warning during read of this file.
> >
> > After some code inspection I can explain what happens during read:
> > in "yaffs_rd_data_obj(...)" there is a comment saying "get sane (zero)
> > data if you read a hole" followed by a memset(buffer,
> > 0, in->my_dev->data_bytes_per_chunk);
> > "yaffs_rd_data_obj(...)" returns 0 when this error occurs, but the return
> > value from this function is never used or checked.
> >
> > I would have thought that yaffs should have notified the user of this
> > error in such a way that the user read() resulted in EIO.
> > Why is it not so?
>
> yaffs does not write the holes in files with zeros.
>
> eg consider the following sequence:
> write 1MB of data
> seek to 2MB
> write 1MB of data
>
> You will have a 3MB file of which there is only 2MB of actual data, and
> there is a 1MB hole in the middle.
>
> Since yaffs does not record holes, it cannot tell the difference between a
> missing page due to some error and a valid hole. It therefore does not
> report EIO.
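
The write/seek/write sequence quoted above can be tried on any file system
with a short script. This is a minimal sketch only: the file name, the fill
bytes and the helper names are made up for illustration, and it shows the
general hole semantics rather than anything yaffs-specific.

```python
import os
import tempfile

MB = 1024 * 1024


def write_file_with_hole(path):
    """Write 1 MB, seek to the 2 MB offset, then write another 1 MB."""
    with open(path, "wb") as f:
        f.write(b"\xaa" * MB)   # first 1 MB of real data
        f.seek(2 * MB)          # skip 1 MB without writing anything
        f.write(b"\xbb" * MB)   # second 1 MB of real data


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "holey.bin")  # hypothetical file name
        write_file_with_hole(path)
        print("size is 3 MB:", os.path.getsize(path) == 3 * MB)
        print("hole reads as zeroes:", read_range(path, MB, MB) == bytes(MB))
```

The reader gets zeroes for the hole with no error, which is exactly what a
lost chunk looks like from user space.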
>
> >
> > I still do not understand what happens during write, and as I stated this
> > happens very seldom. It can be due to a hard-to-trigger bug in HW, the
> > driver, MTD or yaffs, or any combination of these.
> > Has anyone else experienced anything similar?
> > Do you have any suggestions on how to debug this further or how to work
> > around this?
>
> This sounds pretty strange, and if I were to guess I would say it is most
> likely due to some problem in the driver.
>
> > As it happens, it seems most/all of these (few) occurrences happen during
> > upgrade of our application (which is fair given our use case), so we can
> > to some degree work around it by verifying the checksum of each file
> > after installation and rewriting if necessary. But I would really like a
> > cleaner solution.
>
> Verification of files during an upgrade is always a good idea.
> I would also recommend that you drop the cache before you verify so that
> you verify against the flash and not what is in the VFS cache.
>
> ie.
> write upgrade files to yaffs
> sync
> echo 3 > /proc/sys/vm/drop_caches
> verify files
>
> -- Charles
>
>
>
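
The sync / drop caches / verify sequence suggested above could be scripted
roughly as follows. This is a sketch under assumptions: the function names
and the manifest format (a dict of path to expected SHA-1) are invented
here, and writing to /proc/sys/vm/drop_caches needs root on Linux.

```python
import hashlib
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def sha1_of(path):
    """Return the SHA-1 hex digest of a file, read in 64 KiB blocks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def flush_and_drop_caches(drop_path=DROP_CACHES):
    """Flush dirty data, then ask the kernel to drop its caches (root only)."""
    os.sync()
    with open(drop_path, "w") as f:
        f.write("3\n")


def verify(expected, drop_caches=True):
    """Return the paths whose SHA-1 differs from the expected value.

    `expected` maps file path to expected SHA-1 hex digest
    (a hypothetical manifest format).
    """
    if drop_caches:
        flush_and_drop_caches()
    return [path for path, want in expected.items() if sha1_of(path) != want]
```

Any path returned by verify() would then be rewritten and checked again.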

--
----------------------------------------------
William Juul
Gullhaugveien 53
N-1354 Bærums Verk, Norway

Tel: +47 67 56 16 67    Mob: +47 95 79 32 53

william@juul.no
www.juul.no
----------------------------------------------