Re: [Yaffs] Weirdness testing YAFFS2 with large files - md5s…

Author: Charles Manning
Date:  
To: yaffs
Subject: Re: [Yaffs] Weirdness testing YAFFS2 with large files - md5sums don't match when copied.
On Tuesday 26 January 2010 19:24:19 Peter Barada wrote:
> On Tue, 2010-01-26 at 12:46 +1300, Charles Manning wrote:
> > Hello Peter
> >
> > On Tuesday 26 January 2010 09:39:46 Peter Barada wrote:
> > > I've run into a problem using the latest YAFFS code on linux-2.6.28-rc8
> > > using today's YAFFS CVS code.
> >
> > Does the test work with older yaffs?
>
> No, it did not - however it exhibited some random behavior. The previous
> version was pulled from 20090909, and my thought was that the current
> changes to large-file yaffs_Tnode handling would help - version strings
> are:


What changes were those? There have been no changes wrt file size in the tnode
trees for ages. As far as yaffs is concerned, a 30MB file is tiny. If you were
dealing with file sizes around the 2^31 integer roll-over or something then I
could understand bugs creeping in.

>
> peter@blitz:~/work/logic/eps_svn/software/products/linux/LTIB/trunk/ltib-20091102-som/rpm/BUILD/linux-2.6.28-rc8$ grep '\$Id:' fs/yaffs2/*.[hc]
> fs/yaffs2/yaffs_checkptrw.c:    "$Id: yaffs_checkptrw.c,v 1.20 2009-09-09 03:03:01 charles Exp $";
> fs/yaffs2/yaffs_ecc.c:    "$Id: yaffs_ecc.c,v 1.11 2009-03-06 17:20:50 wookey Exp $";
> fs/yaffs2/yaffs_fs.c:    "$Id: yaffs_fs.c,v 1.82 2009-09-18 00:39:21 charles Exp $";
> fs/yaffs2/yaffs_guts.c:    "$Id: yaffs_guts.c,v 1.89 2009-09-09 00:56:53 charles Exp $";
> fs/yaffs2/yaffs_mtdif1.c:const char *yaffs_mtdif1_c_version = "$Id: yaffs_mtdif1.c,v 1.11 2009-09-09 03:03:01 charles Exp $";
> fs/yaffs2/yaffs_mtdif2.c:    "$Id: yaffs_mtdif2.c,v 1.23 2009-03-06 17:20:53 wookey Exp $";
> fs/yaffs2/yaffs_mtdif.c:    "$Id: yaffs_mtdif.c,v 1.22 2009-03-06 17:20:51 wookey Exp $";
> fs/yaffs2/yaffs_nand.c:    "$Id: yaffs_nand.c,v 1.11 2009-09-09 03:03:01 charles Exp $";

>
> I'll go back and re-test with that version to generate the output. The
> original test did the dd command with "dd if=/dev/urandom of=somefile.$i
> count=0 bs=0 skip=30M" to seek out 30MB after the open and then close
> the file - initially I thought the test was off, as that dd command
> wouldn't generate any actual data on an EXT3 device.
>
> The MTD layer has performed flawlessly with the previous version(s),

Previous versions of what? yaffs? linux?
> so
> I don't think the MTD ECC handling itself is in error - I can add
> code to dump data if it finds an ECC error on read; I noticed that the
> current code doesn't verify the data written if
> "CONFIG_YAFFS_ALWAYS_CHECK_CHUNK_ERASED" is not set - do you have a
> development patch that enables the readback to verify the chunk is
> written correctly so I can test that my MTD layer is still operating
> correctly?

No, I don't have a patch like that, but it would be helpful if yaffs did
write verification to check the mtd layer more effectively.

I'll look at adding that.


<snip>

> > > With this code, I'm seeing 30MB files that are created have mismatching
> > > checksums while running the attached test script. The output from the
> > > test looks like:
> > >
> > > OMAP-35x# . /media/mmcblk0p1/x
> > > Create 30M file and get md5sum
> > > 30720+0 records in
> > > 30720+0 records out
> > > 5b04790304a4221f1016a8c310da4746 somefile.1
> > > **>> Block 710 needs retiring
> > > **>> yaffs write required 2 attempts
> > > **>> Block 710 retired
> > > Block 710 is in state 9 after gc, should be erased
> > > Calculate md5sums for copied files
> > > 5b04790304a4221f1016a8c310da4746 somefile.1
> > > 5b04790304a4221f1016a8c310da4746 somefile.2
> > > 5b04790304a4221f1016a8c310da4746 somefile.3
> > > 8c8d5a7974d0b9da747bc59edd1991f6 somefile.4
> > > execute sync and recalculate md5sums
> > > save exit: isCheckpointed 1
> > > 5b04790304a4221f1016a8c310da4746 somefile.1
> > > 5b04790304a4221f1016a8c310da4746 somefile.2
> > > 5b04790304a4221f1016a8c310da4746 somefile.3
> > > 5b04790304a4221f1016a8c310da4746 somefile.4
> > > Delete one of the files
> > > 5b04790304a4221f1016a8c310da4746 somefile.1
> > > 5b04790304a4221f1016a8c310da4746 somefile.3
> > > 3cb7668eb7760202d96970a6a9a3361f somefile.4
> > > recopy the deleted file
> > > f6dba6d5af7a7a89481da1849035a417 somefile.1
> > > 5b04790304a4221f1016a8c310da4746 somefile.3
> > > 5b04790304a4221f1016a8c310da4746 somefile.4
> > > f6dba6d5af7a7a89481da1849035a417 somefile.7
> > > Creating test folder and some junk files in that folder
> > > 1+0 records in
> > > 1+0 records out
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.1
> > > md5sums of all files in test folder
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.1
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.2
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.3
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.4
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.5
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.6
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.7
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.8
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.9
> > > execute sync and recalculate md5sums
> > > save exit: isCheckpointed 1
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.1
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.2
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.3
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.4
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.5
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.6
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.7
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.8
> > > ae1028b8d6aef86d020c9edfae29ca3d junk.9
> > > Remove some files and recreate them
> > > Calculate md5sums for 30M files again
> > > f6dba6d5af7a7a89481da1849035a417 somefile.1
> > > 5b04790304a4221f1016a8c310da4746 somefile.3
> > > 5b04790304a4221f1016a8c310da4746 somefile.4
> > > 1fee3f481bfa5cf3403efe9e481a0374 somefile.7
> > > execute sync and recalculate md5sums
> > > save exit: isCheckpointed 1
> > > a7b6ccfa31115aa75a0fdca07073293d somefile.1
> > > 5b04790304a4221f1016a8c310da4746 somefile.3
> > > 1abb3d578e2d129341df26916090b869 somefile.4
> > > f6dba6d5af7a7a89481da1849035a417 somefile.7
> > > OMAP-35x#
> > >
> > > In the output, note that the md5sum of "somefile.*" should all match.
> > >
> > > Anyone seen anything like this before? Test attached.
> >
> > I just ran the test on both 2.6.24-xxx and 2.6.31-xxx using nandsim on a
> > PC and had no problems. Here's one run:
> >
> > root@linux-dual-head:/mnt# ~charles/Dropbox/yaffs-30M-test
> > Create 30M file and get md5sum
> > 30720+0 records in
> > 30720+0 records out
> > 31457280 bytes (31 MB) copied, 7.72198 s, 4.1 MB/s
> > dc7fd1b9553217a9a1becbb101271eab somefile.1
> > Calculate md5sums for copied files
> > dc7fd1b9553217a9a1becbb101271eab somefile.1
> > dc7fd1b9553217a9a1becbb101271eab somefile.2
> > dc7fd1b9553217a9a1becbb101271eab somefile.3
> > dc7fd1b9553217a9a1becbb101271eab somefile.4
> > execute sync and recalculate md5sums
> > dc7fd1b9553217a9a1becbb101271eab somefile.1
> > dc7fd1b9553217a9a1becbb101271eab somefile.2
> > dc7fd1b9553217a9a1becbb101271eab somefile.3
> > dc7fd1b9553217a9a1becbb101271eab somefile.4
> > Delete one of the files
> > dc7fd1b9553217a9a1becbb101271eab somefile.1
> > dc7fd1b9553217a9a1becbb101271eab somefile.3
> > dc7fd1b9553217a9a1becbb101271eab somefile.4
> > recopy the deleted file
> > dc7fd1b9553217a9a1becbb101271eab somefile.1
> > dc7fd1b9553217a9a1becbb101271eab somefile.3
> > dc7fd1b9553217a9a1becbb101271eab somefile.4
> > dc7fd1b9553217a9a1becbb101271eab somefile.7
> > Creating test folder and some junk files in that folder
> > 1+0 records in
> > 1+0 records out
> > 1024 bytes (1.0 kB) copied, 0.000442124 s, 2.3 MB/s
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.1
> > md5sums of all files in test folder
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.1
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.2
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.3
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.4
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.5
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.6
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.7
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.8
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.9
> > execute sync and recalculate md5sums
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.1
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.2
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.3
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.4
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.5
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.6
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.7
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.8
> > 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.9
> > Remove some files and recreate them
> > Calculate md5sums for 30M files again
> > dc7fd1b9553217a9a1becbb101271eab somefile.1
> > dc7fd1b9553217a9a1becbb101271eab somefile.3
> > dc7fd1b9553217a9a1becbb101271eab somefile.4
> > dc7fd1b9553217a9a1becbb101271eab somefile.7
> > execute sync and recalculate md5sums
> > dc7fd1b9553217a9a1becbb101271eab somefile.1
> > dc7fd1b9553217a9a1becbb101271eab somefile.3
> > dc7fd1b9553217a9a1becbb101271eab somefile.4
> > dc7fd1b9553217a9a1becbb101271eab somefile.7
> >
> >
> > Perhaps the retirement of the blocks indicates that some data was being
> > corrupted.
>
> Could be - I'll re-nuke the flash (since those blocks on this particular
> board should not be bad) and try again. I'm wondering whether the
> particular kernel version I have exhibits, on the OMAP35x, some caching
> behavior that isn't handled by the changes
> you've made. Unfortunately this is a production release and if you have
> a suggestion on how to go backwards (i.e. undo some of the caching
> changes that I'm caught in the middle of), I'd be appreciative - I'm
> looking for stability, not necessarily efficiency compared to previous
> kernel versions.


Caches are an easy way to get data inconsistency.

Which cache are you talking about here? yaffs should not need to change to
support changes in mtd-level or OMAP-specific caching.

There are two caches that yaffs **should** be aware of and should play nicely
with:
* Its own cache. Try disabling that to see if it makes any difference. You
can do that by mounting with -o "no-cache".
* The page cache. There have been some changes in this area recently. fsx
(which really pounds on the page cache interface) runs cleanly, but you might
have uncovered a hole that fsx does not exercise.

The page cache can be thrown out by
# echo 3 > /proc/sys/vm/drop_caches
which will force yaffs to read all the data back again.

Thus if you do
sync
md5sum foo
echo 3 > /proc/sys/vm/drop_caches
md5sum foo
and the two md5sums differ, that indicates that the data in the page cache
was inconsistent with the data on flash.
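That sequence can be wrapped in a small script for repeated runs. A minimal sketch, assuming the function name and example path are illustrative (dropping caches requires root; on a non-root shell the drop is silently skipped):

```shell
#!/bin/sh
# Sketch: compare a file's md5sum from the page cache against its md5sum
# after the cache is dropped, i.e. re-read from flash.
check_cache_consistency() {
    f="$1"
    sync
    before=$(md5sum "$f" | awk '{print $1}')
    # Throw out the page cache so the next read comes back from flash
    # (needs root; errors are suppressed so the sketch degrades gracefully).
    echo 3 > /proc/sys/vm/drop_caches 2>/dev/null
    after=$(md5sum "$f" | awk '{print $1}')
    if [ "$before" = "$after" ]; then
        echo "consistent"
    else
        echo "MISMATCH: cache=$before flash=$after"
    fi
}

# Usage (on the target, as root):
#   check_cache_consistency /mnt/yaffs/somefile.1
```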

>
> At some point it would be nice if there were tags on the YAFFS CVS tree
> so I can snap to a known version and apply it to a kernel and walk
> forward or backwards in time to capture logical changes to the YAFFS
> source and test with each.

Tagging each checkin would pollute the tag space pretty quickly.
cvs does not provide a checkin id like svn or git, but you can use -D to
fetch the tree as of a specific date:

cvs update -D "2009-10-31"
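Walking forwards or backwards in time then just becomes a loop over dates. A sketch (the dates are examples, not known change points, and DRYRUN=1 only prints the commands so it can be rehearsed without a checkout):

```shell
#!/bin/sh
# Sketch: step a CVS working tree through dated snapshots for bisection.
walk_dates() {
    for d in "$@"; do
        if [ -n "$DRYRUN" ]; then
            # Dry run: show what would be executed.
            echo "cvs update -D \"$d\""
        else
            # Fetch the tree as it was at date $d; rebuild and test here.
            cvs update -D "$d" || return 1
        fi
    done
}

# Example: rehearse the commands without touching a checkout.
DRYRUN=1 walk_dates "2009-09-09" "2009-10-31" "2010-01-26"
```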


-- Charles