Re: [Yaffs] Weirndess testing YAFFS2 with large files - md5s…

Author: Peter Barada
Date:  
To: Charles Manning
CC: yaffs
Subject: Re: [Yaffs] Weirndess testing YAFFS2 with large files - md5sums don't match when copied.
On Tue, 2010-01-26 at 12:46 +1300, Charles Manning wrote:

> Hello Peter
>
> On Tuesday 26 January 2010 09:39:46 Peter Barada wrote:
> > I've run into a problem using today's YAFFS CVS code on
> > linux-2.6.28-rc8.
>
> Does the test work with older yaffs?

No, it did not, although the failures were somewhat random. The previous
version was pulled on 20090909, and I expected the current changes to
yaffs_Tnode handling for large files to help. The version strings are:

peter@blitz:~/work/logic/eps_svn/software/products/linux/LTIB/trunk/ltib-20091102-som/rpm/BUILD/linux-2.6.28-rc8$ grep '\$Id:' fs/yaffs2/*.[hc]
fs/yaffs2/yaffs_checkptrw.c:    "$Id: yaffs_checkptrw.c,v 1.20 2009-09-09 03:03:01 charles Exp $";
fs/yaffs2/yaffs_ecc.c:    "$Id: yaffs_ecc.c,v 1.11 2009-03-06 17:20:50 wookey Exp $";
fs/yaffs2/yaffs_fs.c:    "$Id: yaffs_fs.c,v 1.82 2009-09-18 00:39:21 charles Exp $";
fs/yaffs2/yaffs_guts.c:    "$Id: yaffs_guts.c,v 1.89 2009-09-09 00:56:53 charles Exp $";
fs/yaffs2/yaffs_mtdif1.c:const char *yaffs_mtdif1_c_version = "$Id: yaffs_mtdif1.c,v 1.11 2009-09-09 03:03:01 charles Exp $";
fs/yaffs2/yaffs_mtdif2.c:    "$Id: yaffs_mtdif2.c,v 1.23 2009-03-06 17:20:53 wookey Exp $";
fs/yaffs2/yaffs_mtdif.c:    "$Id: yaffs_mtdif.c,v 1.22 2009-03-06 17:20:51 wookey Exp $";
fs/yaffs2/yaffs_nand.c:    "$Id: yaffs_nand.c,v 1.11 2009-09-09 03:03:01 charles Exp $";

I'll go back and re-test with that version to generate the output. The
original test ran "dd if=/dev/urandom of=somefile.$i count=0 bs=0 skip=30M"
to seek 30MB out after the open and then close the file. Initially I
thought the test itself was wrong, since that dd command wouldn't
generate any actual data on an EXT3 device.
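Why such a command produces no data can be shown with plain POSIX calls: moving the file offset without writing does not change a file's size, and even extending a file with ftruncate() writes no data blocks (the result is typically a sparse file). The following is a minimal sketch; the helper names and the /tmp paths used to exercise it are illustrative only and have nothing to do with the attached test script.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create/truncate a file, move the offset forward without writing,
 * close it, and return the resulting size (or -1 on error). */
static off_t seek_then_close(const char *path, off_t offset)
{
    struct stat st;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    close(fd);                 /* nothing was ever written */
    if (stat(path, &st) != 0)
        return -1;
    return st.st_size;         /* lseek alone does not extend the file */
}

/* Same, but extend the file with ftruncate(): the size grows, yet no
 * data has been written, so the new range reads back as zeros. */
static off_t truncate_then_close(const char *path, off_t length)
{
    struct stat st;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, length) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    if (stat(path, &st) != 0)
        return -1;
    return st.st_size;
}
```

In both cases no random data from /dev/urandom ever reaches the file, so such a test would not exercise YAFFS data writes at all.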

The MTD layer has performed flawlessly with the previous version(s), so
I don't think the MTD ECC handling itself is at fault, although I can add
code to dump the data if it finds an ECC error on read. I noticed that the
current code doesn't verify the written data unless
"CONFIG_YAFFS_ALWAYS_CHECK_CHUNK_ERASED" is set. Do you have a
development patch that enables read-back verification of each written
chunk, so I can confirm that my MTD layer is still operating
correctly?
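Independently of any YAFFS patch, the idea of read-back verification is simple: write a chunk, read it straight back, and compare. The sketch below models NAND as an in-memory array so it runs anywhere; sim_write_chunk(), sim_read_chunk(), write_and_verify(), the sim_flip_chunk test hook and the 2048-byte chunk size are all hypothetical and are not YAFFS or MTD functions. In a real driver the read-back would go through the MTD read path, and a bit flip that ECC corrects would not show up as a mismatch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory NAND model; NOT the YAFFS or MTD API. */
#define CHUNK_SIZE 2048   /* assumed page (chunk) size in bytes */
#define NUM_CHUNKS 64

static unsigned char sim_nand[NUM_CHUNKS][CHUNK_SIZE];
static int sim_flip_chunk = -1;  /* test hook: corrupt the next write to this chunk */

/* Hypothetical low-level write; can flip one bit to emulate a bad write. */
static int sim_write_chunk(int chunk, const unsigned char *data)
{
    if (chunk < 0 || chunk >= NUM_CHUNKS)
        return -1;
    memcpy(sim_nand[chunk], data, CHUNK_SIZE);
    if (chunk == sim_flip_chunk) {
        sim_nand[chunk][0] ^= 0x01;
        sim_flip_chunk = -1;
    }
    return 0;
}

/* Hypothetical low-level read (no ECC in this model). */
static int sim_read_chunk(int chunk, unsigned char *data)
{
    if (chunk < 0 || chunk >= NUM_CHUNKS)
        return -1;
    memcpy(data, sim_nand[chunk], CHUNK_SIZE);
    return 0;
}

/* Write a chunk, read it back and compare.
 * Returns 0 if verified, 1 on a data mismatch, -1 on an I/O error. */
static int write_and_verify(int chunk, const unsigned char *data)
{
    unsigned char readback[CHUNK_SIZE];

    if (sim_write_chunk(chunk, data) != 0)
        return -1;
    if (sim_read_chunk(chunk, readback) != 0)
        return -1;
    return memcmp(readback, data, CHUNK_SIZE) == 0 ? 0 : 1;
}
```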


> >
> > First off, I had to modify the code in yaffs_write_begin to call
> > grab_cache_page_write_begin only if the kernel version is newer than 2.6.28,
> > since 2.6.28-rc8 does *not* have grab_cache_page_write_begin whereas
> > 2.6.28 proper does:
> >
> > #if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 28)
> >     pg = grab_cache_page_write_begin(mapping, index, flags);
> > #else
> >     pg = __grab_cache_page(mapping, index);
> > #endif
> 
> That is an unfortunate problem due to there being no way to conditionally 
> compile against rc and other sub-version markers. I originally had this the 
> way you have it now, but changed it.

I understand. Too bad the kernel doesn't have a "FULL_KERNEL_VERSION(2, 6, 28, 8)"
available to identify exactly when things changed...
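The limitation comes from how the version code is built. In kernels of this era, KERNEL_VERSION() in <linux/version.h> packs only major, minor and sublevel into one integer, so 2.6.28-rc8 and 2.6.28 both report LINUX_VERSION_CODE 0x02061C and no #if can tell them apart. The four-argument FULL_KERNEL_VERSION below is purely hypothetical (the kernel provides no such macro); it only sketches what such a macro might look like, with a final release sorting after all of its -rc builds.

```c
/* Same encoding as KERNEL_VERSION() in <linux/version.h> of this era:
 * major, minor and sublevel, one byte each. The -rc number is not encoded. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* HYPOTHETICAL: also encode the release candidate number in a low byte.
 * rc == 0 means a final release and maps to 0xff so it sorts after every rc. */
#define FULL_KERNEL_VERSION(a, b, c, rc) \
    ((KERNEL_VERSION(a, b, c) << 8) + ((rc) == 0 ? 0xff : (rc)))
```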


> >
> > Second, I added "dev->nPageReads++;" to
> > nandmtd2_ReadChunkWithTagsFromNAND and "dev->nPageWrites++;" to
> > nandmtd2_WriteChunkWithTagsToNAND to track the number of page
> > reads/writes.
>
> You should not have to. Those were moved out into yaffs_nand.c.

It may have been a holdover from previous versions I had where I
observed nPageWrites and nPageReads stay at zero.
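Charles's point is that the page counters now live in the common layer (yaffs_nand.c) above the MTD back ends. The sketch below models that idea in simplified form: each counter is bumped once in a shared wrapper, so a back end such as the nandmtd2_* functions that also incremented it would likely count each access twice. struct sim_dev, dev_read_chunk(), dev_write_chunk() and dummy_io() are hypothetical; only the nPageReads/nPageWrites names come from the thread.

```c
/* HYPOTHETICAL, simplified device model; not the real YAFFS structures. */
struct sim_dev {
    unsigned nPageReads;
    unsigned nPageWrites;
    int (*read_chunk)(struct sim_dev *dev, int chunk);   /* back-end hook */
    int (*write_chunk)(struct sim_dev *dev, int chunk);  /* back-end hook */
};

/* Common read path: every back end is called through here, so the
 * counter is incremented exactly once per attempted page read. */
static int dev_read_chunk(struct sim_dev *dev, int chunk)
{
    dev->nPageReads++;
    return dev->read_chunk(dev, chunk);
}

/* Common write path, counted the same way. */
static int dev_write_chunk(struct sim_dev *dev, int chunk)
{
    dev->nPageWrites++;
    return dev->write_chunk(dev, chunk);
}

/* Trivial back end for demonstration: succeeds for any valid chunk.
 * Note that it does NOT touch the counters itself. */
static int dummy_io(struct sim_dev *dev, int chunk)
{
    (void)dev;
    return chunk >= 0 ? 0 : -1;
}
```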


> >
> > With this code, I'm seeing that the 30MB files created have mismatching
> > checksums while running the attached test script. The output from the
> > test looks like:
> >
> > OMAP-35x# . /media/mmcblk0p1/x
> > Create 30M file and get md5sum
> > 30720+0 records in
> > 30720+0 records out
> > 5b04790304a4221f1016a8c310da4746 somefile.1
> > **>> Block 710 needs retiring
> > **>> yaffs write required 2 attempts
> > **>> Block 710 retired
> > Block 710 is in state 9 after gc, should be erased
> > Calculate md5sums for copied files
> > 5b04790304a4221f1016a8c310da4746 somefile.1
> > 5b04790304a4221f1016a8c310da4746 somefile.2
> > 5b04790304a4221f1016a8c310da4746 somefile.3
> > 8c8d5a7974d0b9da747bc59edd1991f6 somefile.4
> > execute sync and recalculate md5sums
> > save exit: isCheckpointed 1
> > 5b04790304a4221f1016a8c310da4746 somefile.1
> > 5b04790304a4221f1016a8c310da4746 somefile.2
> > 5b04790304a4221f1016a8c310da4746 somefile.3
> > 5b04790304a4221f1016a8c310da4746 somefile.4
> > Delete one of the files
> > 5b04790304a4221f1016a8c310da4746 somefile.1
> > 5b04790304a4221f1016a8c310da4746 somefile.3
> > 3cb7668eb7760202d96970a6a9a3361f somefile.4
> > recopy the deleted file
> > f6dba6d5af7a7a89481da1849035a417 somefile.1
> > 5b04790304a4221f1016a8c310da4746 somefile.3
> > 5b04790304a4221f1016a8c310da4746 somefile.4
> > f6dba6d5af7a7a89481da1849035a417 somefile.7
> > Creating test folder and some junk files in that folder
> > 1+0 records in
> > 1+0 records out
> > ae1028b8d6aef86d020c9edfae29ca3d junk.1
> > md5sums of all files in test folder
> > ae1028b8d6aef86d020c9edfae29ca3d junk.1
> > ae1028b8d6aef86d020c9edfae29ca3d junk.2
> > ae1028b8d6aef86d020c9edfae29ca3d junk.3
> > ae1028b8d6aef86d020c9edfae29ca3d junk.4
> > ae1028b8d6aef86d020c9edfae29ca3d junk.5
> > ae1028b8d6aef86d020c9edfae29ca3d junk.6
> > ae1028b8d6aef86d020c9edfae29ca3d junk.7
> > ae1028b8d6aef86d020c9edfae29ca3d junk.8
> > ae1028b8d6aef86d020c9edfae29ca3d junk.9
> > execute sync and recalculate md5sums
> > save exit: isCheckpointed 1
> > ae1028b8d6aef86d020c9edfae29ca3d junk.1
> > ae1028b8d6aef86d020c9edfae29ca3d junk.2
> > ae1028b8d6aef86d020c9edfae29ca3d junk.3
> > ae1028b8d6aef86d020c9edfae29ca3d junk.4
> > ae1028b8d6aef86d020c9edfae29ca3d junk.5
> > ae1028b8d6aef86d020c9edfae29ca3d junk.6
> > ae1028b8d6aef86d020c9edfae29ca3d junk.7
> > ae1028b8d6aef86d020c9edfae29ca3d junk.8
> > ae1028b8d6aef86d020c9edfae29ca3d junk.9
> > Remove some files and recreate them
> > Calculate md5sums for 30M files again
> > f6dba6d5af7a7a89481da1849035a417 somefile.1
> > 5b04790304a4221f1016a8c310da4746 somefile.3
> > 5b04790304a4221f1016a8c310da4746 somefile.4
> > 1fee3f481bfa5cf3403efe9e481a0374 somefile.7
> > execute sync and recalculate md5sums
> > save exit: isCheckpointed 1
> > a7b6ccfa31115aa75a0fdca07073293d somefile.1
> > 5b04790304a4221f1016a8c310da4746 somefile.3
> > 1abb3d578e2d129341df26916090b869 somefile.4
> > f6dba6d5af7a7a89481da1849035a417 somefile.7
> > OMAP-35x#
> >
> > In the output, note that the md5sums of "somefile.*" should all match.
> >
> > Anyone seen anything like this before? Test attached.
>
> I just ran the test on both 2.6.24-xxx and 2.6.31-xxx using nandsim on a PC
> and had no problems. Here's one run:
>
> root@linux-dual-head:/mnt# ~charles/Dropbox/yaffs-30M-test
> Create 30M file and get md5sum
> 30720+0 records in
> 30720+0 records out
> 31457280 bytes (31 MB) copied, 7.72198 s, 4.1 MB/s
> dc7fd1b9553217a9a1becbb101271eab somefile.1
> Calculate md5sums for copied files
> dc7fd1b9553217a9a1becbb101271eab somefile.1
> dc7fd1b9553217a9a1becbb101271eab somefile.2
> dc7fd1b9553217a9a1becbb101271eab somefile.3
> dc7fd1b9553217a9a1becbb101271eab somefile.4
> execute sync and recalculate md5sums
> dc7fd1b9553217a9a1becbb101271eab somefile.1
> dc7fd1b9553217a9a1becbb101271eab somefile.2
> dc7fd1b9553217a9a1becbb101271eab somefile.3
> dc7fd1b9553217a9a1becbb101271eab somefile.4
> Delete one of the files
> dc7fd1b9553217a9a1becbb101271eab somefile.1
> dc7fd1b9553217a9a1becbb101271eab somefile.3
> dc7fd1b9553217a9a1becbb101271eab somefile.4
> recopy the deleted file
> dc7fd1b9553217a9a1becbb101271eab somefile.1
> dc7fd1b9553217a9a1becbb101271eab somefile.3
> dc7fd1b9553217a9a1becbb101271eab somefile.4
> dc7fd1b9553217a9a1becbb101271eab somefile.7
> Creating test folder and some junk files in that folder
> 1+0 records in
> 1+0 records out
> 1024 bytes (1.0 kB) copied, 0.000442124 s, 2.3 MB/s
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.1
> md5sums of all files in test folder
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.1
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.2
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.3
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.4
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.5
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.6
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.7
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.8
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.9
> execute sync and recalculate md5sums
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.1
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.2
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.3
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.4
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.5
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.6
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.7
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.8
> 0bc0c6e9588ee2bf6c89463208c5a0e9 junk.9
> Remove some files and recreate them
> Calculate md5sums for 30M files again
> dc7fd1b9553217a9a1becbb101271eab somefile.1
> dc7fd1b9553217a9a1becbb101271eab somefile.3
> dc7fd1b9553217a9a1becbb101271eab somefile.4
> dc7fd1b9553217a9a1becbb101271eab somefile.7
> execute sync and recalculate md5sums
> dc7fd1b9553217a9a1becbb101271eab somefile.1
> dc7fd1b9553217a9a1becbb101271eab somefile.3
> dc7fd1b9553217a9a1becbb101271eab somefile.4
> dc7fd1b9553217a9a1becbb101271eab somefile.7
>
>
> Perhaps the retirement of the blocks indicates that some data was being
> corrupted.
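When the checksums disagree, knowing where the copies first diverge can help tell whether the damage lines up with the retired block. The sketch below uses only standard C I/O; first_difference() is a hypothetical helper, and dividing its result by an assumed 2048-byte chunk size gives a chunk index within the file, not a physical NAND block number.

```c
#include <stdio.h>

/* Compare two files byte by byte.
 * Returns -1 if identical, -2 if either file cannot be opened, otherwise
 * the offset of the first differing byte (a length mismatch reports the
 * length of the shorter file). */
static long first_difference(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long offset = 0;
    int ca, cb;

    if (a == NULL || b == NULL) {
        if (a != NULL)
            fclose(a);
        if (b != NULL)
            fclose(b);
        return -2;
    }
    for (;;) {
        ca = getc(a);
        cb = getc(b);
        if (ca != cb || ca == EOF)   /* mismatch, or both files ended */
            break;
        offset++;
    }
    fclose(a);
    fclose(b);
    return (ca == cb) ? -1 : offset;
}
```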



Could be. I'll re-nuke the flash (those blocks on this particular
board should not be bad) and try again. I'm wondering if I'm caught in
limbo with this particular kernel version, which on the OMAP35x
exhibits some caching behavior that your changes don't account for.
Unfortunately this is a production release, so if you have a suggestion
on how to go backwards (i.e. undo some of the caching changes that I'm
caught in the middle of), I'd appreciate it. I'm looking for stability,
not necessarily efficiency compared to previous kernel versions.

At some point it would be nice to have tags on the YAFFS CVS tree, so
I could snap to a known version, apply it to a kernel, and walk
forward or backward in time through logical changes to the YAFFS
source, testing each one.
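With tagged snapshots, the walk through versions could be done as a binary search (the approach `git bisect` automates) instead of one version at a time. This is a swapped-in technique, sketched under the assumptions that snapshot 0 is known good, the last snapshot is known bad, and a failure persists once introduced; first_bad_snapshot(), demo_is_bad() and demo_first_bad are hypothetical. An intermittent failure like the random behavior described earlier can mislead a bisection, so each step may need several runs.

```c
/* Find the first bad snapshot among 'count' ordered snapshots (count >= 2).
 * Assumes index 0 passes, index count-1 fails, and is_bad() is monotonic:
 * once a snapshot fails, every later one fails too. is_bad() stands in for
 * "build the kernel with this snapshot and run the test". */
static int first_bad_snapshot(int count, int (*is_bad)(int index))
{
    int good = 0;          /* highest index known to pass */
    int bad = count - 1;   /* lowest index known to fail  */

    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad(mid))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Hypothetical predicate for demonstration: snapshots from
 * demo_first_bad onward fail. */
static int demo_first_bad = 7;
static int demo_is_bad(int index)
{
    return index >= demo_first_bad;
}
```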

Thanks in advance for any suggestions!

> -- Charles
>
>
>
>
> _______________________________________________
> yaffs mailing list
>
> http://lists.aleph1.co.uk/cgi-bin/mailman/listinfo/yaffs