On Tue, 2010-01-26 at 12:46 +1300, Charles Manning wrote:
Hello Peter

On Tuesday 26 January 2010 09:39:46 Peter Barada wrote:
> I've run into a problem using the latest YAFFS code on linux-2.6.28-rc8
> using today's YAFFS CVS code.

Does the test work with older yaffs?

No, it did not, although the failures there looked more random.  The previous version was pulled from 20090909, and my thought was that the current changes to yaffs_Tnode handling for large files would help.  The version strings of that previous version are:

peter@blitz:~/work/logic/eps_svn/software/products/linux/LTIB/trunk/ltib-20091102-som/rpm/BUILD/linux-2.6.28-rc8$ grep '\$Id:' fs/yaffs2/*.[hc]
fs/yaffs2/yaffs_checkptrw.c: "$Id: yaffs_checkptrw.c,v 1.20 2009-09-09 03:03:01 charles Exp $";
fs/yaffs2/yaffs_ecc.c: "$Id: yaffs_ecc.c,v 1.11 2009-03-06 17:20:50 wookey Exp $";
fs/yaffs2/yaffs_fs.c:    "$Id: yaffs_fs.c,v 1.82 2009-09-18 00:39:21 charles Exp $";
fs/yaffs2/yaffs_guts.c:    "$Id: yaffs_guts.c,v 1.89 2009-09-09 00:56:53 charles Exp $";
fs/yaffs2/yaffs_mtdif1.c:const char *yaffs_mtdif1_c_version = "$Id: yaffs_mtdif1.c,v 1.11 2009-09-09 03:03:01 charles Exp $";
fs/yaffs2/yaffs_mtdif2.c: "$Id: yaffs_mtdif2.c,v 1.23 2009-03-06 17:20:53 wookey Exp $";
fs/yaffs2/yaffs_mtdif.c: "$Id: yaffs_mtdif.c,v 1.22 2009-03-06 17:20:51 wookey Exp $";
fs/yaffs2/yaffs_nand.c: "$Id: yaffs_nand.c,v 1.11 2009-09-09 03:03:01 charles Exp $";

I'll go back and re-test with that version to generate the output.  The original test did the dd command with "dd if=/dev/urandom of=somefile.$i count=0 bs=0 skip=30M" to seek out 30MB after the open and then close the file - initially I thought the test was off, as that dd command wouldn't generate any actual data on an EXT3 device.
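
To illustrate why a seek-only file carries no real data, here is a minimal user-space sketch, assuming a POSIX system.  It uses ftruncate() rather than dd to set the length, and make_sparse_file is a hypothetical helper name, not part of the test script:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with a length of `size` bytes without
 * writing any data.  Returns 0 on success and fills *st, -1 on failure. */
static int make_sparse_file(const char *path, off_t size, struct stat *st)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	/* Extend the file to `size`; no data blocks are written here. */
	if (ftruncate(fd, size) != 0 || fstat(fd, st) != 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/* Print the apparent length next to the space actually allocated. */
static void report_sparse_file(const char *path, const struct stat *st)
{
	printf("%s: st_size=%lld bytes, st_blocks=%lld\n", path,
	       (long long)st->st_size, (long long)st->st_blocks);
}
```

Calling make_sparse_file("somefile.1", 30 * 1024 * 1024, &st) and then report_sparse_file() shows the apparent size next to the allocated block count; how many blocks are really allocated depends on the filesystem.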

The MTD layer has performed flawlessly with the previous version(s), so I don't think the MTD ECC handling itself is in error - I can add code to dump it if it finds an ECC error on read.  I noticed that the current code doesn't verify the data written if "CONFIG_YAFFS_ALWAYS_CHECK_CHUNK_ERASED" is not set.  Do you have a development patch that enables the readback to verify the chunk is written correctly, so I can test that my MTD layer is still operating correctly?
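
The kind of readback check I have in mind looks roughly like the sketch below.  This is not YAFFS or MTD code: the in-memory demo_dev, the 2048-byte chunk size and every demo_* name are assumptions made up for illustration.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_CHUNK_SIZE 2048	/* assumed chunk size, illustration only */
#define DEMO_NCHUNKS    4

/* Hypothetical in-memory stand-in for a flash device. */
struct demo_dev {
	unsigned char store[DEMO_NCHUNKS][DEMO_CHUNK_SIZE];
	int corrupt_chunk;	/* chunk index to corrupt on write, or -1 */
	unsigned n_writes;	/* counted in exactly one place */
	unsigned n_reads;
};

static int demo_write_chunk(struct demo_dev *dev, int chunk,
			    const unsigned char *data)
{
	if (chunk < 0 || chunk >= DEMO_NCHUNKS)
		return -1;
	memcpy(dev->store[chunk], data, DEMO_CHUNK_SIZE);
	if (chunk == dev->corrupt_chunk)
		dev->store[chunk][0] ^= 0x01;	/* simulate one flipped bit */
	dev->n_writes++;
	return 0;
}

static int demo_read_chunk(struct demo_dev *dev, int chunk,
			   unsigned char *out)
{
	if (chunk < 0 || chunk >= DEMO_NCHUNKS)
		return -1;
	memcpy(out, dev->store[chunk], DEMO_CHUNK_SIZE);
	dev->n_reads++;
	return 0;
}

/* Write a chunk, read it back and compare.
 * Returns 0 if verified, 1 on data mismatch, -1 on I/O error. */
static int demo_write_verified(struct demo_dev *dev, int chunk,
			       const unsigned char *data)
{
	unsigned char readback[DEMO_CHUNK_SIZE];

	if (demo_write_chunk(dev, chunk, data) != 0 ||
	    demo_read_chunk(dev, chunk, readback) != 0)
		return -1;
	return memcmp(data, readback, DEMO_CHUNK_SIZE) == 0 ? 0 : 1;
}
```

In a real driver the readback would have to bypass any caching to be meaningful; the sketch only shows the control flow and the three possible outcomes.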

>
> First off I had to modify the code in yaffs_write_begin to only call
> grab_cache_page_write_begin if the kernel version is newer than 2.6.28
> since 2.6.28-rc8 does *not* have grab_cache_page_write_begin whereas
> 2.6.28 proper does:
>
> #if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 28)
> 	pg = grab_cache_page_write_begin(mapping, index, flags);
> #else
> 	pg = __grab_cache_page(mapping, index);
> #endif

That is an unfortunate problem due to there being no way to conditionally 
compile against rc and other sub-version markers. I originally had this the 
way you have it now, but changed it.

I understand - too bad the kernel doesn't have "FULL_KERNEL_VERSION(2, 6, 28, 8)" available to
identify exactly when things changed.
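
To spell out why the preprocessor cannot tell the two trees apart, here is a small user-space sketch.  DEMO_KERNEL_VERSION repeats the packing used by the kernel's KERNEL_VERSION() macro; the demo_* helpers are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Same packing as the kernel's KERNEL_VERSION() macro; renamed so this
 * user-space sketch does not depend on <linux/version.h>. */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical helper: the version code a tree reports.  The rc number is
 * accepted only to show that it has no influence on the result. */
static int demo_version_code(int major, int minor, int patch, int rc)
{
	(void)rc;	/* there is no field for the release candidate */
	return DEMO_KERNEL_VERSION(major, minor, patch);
}

/* Returns 1 when the "> 2.6.28" test from the snippet above would pick
 * grab_cache_page_write_begin(), 0 when it would pick __grab_cache_page(). */
static int demo_picks_write_begin(int code)
{
	return code > DEMO_KERNEL_VERSION(2, 6, 28);
}

static void demo_print_codes(void)
{
	printf("2.6.28-rc8: %d\n", demo_version_code(2, 6, 28, 8));
	printf("2.6.28:     %d\n", demo_version_code(2, 6, 28, 0));
}
```

Both calls in demo_print_codes() print the same number, so whichever comparison operator is used, 2.6.28-rc8 and 2.6.28 proper always end up in the same branch.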

>
> Second I added "dev->nPageWrites++;" to
> nandmtd2_ReadChunkWithTagsFromNAND and "dev->nPageReads++;" to
> nandmtd2_WriteChunkWithTagsToNAND to track the number of page
> read/writes.

You should not have to. Those were moved out and placed in yaffs_nand.c

It may have been a holdover from previous versions I had where I observed nPageWrites and nPageReads stay at zero.

>
> With this code, I'm seeing 30MB files that are created have mismatching
> checksums while running the attached test script.  The output from the
> test looks like:
>
> OMAP-35x# . /media/mmcblk0p1/x
> Create 30M file and get md5sum
> 30720+0 records in
> 30720+0 records out
> 5b04790304a4221f1016a8c310da4746  somefile.1
> **>> Block 710 needs retiring
> **>> yaffs write required 2 attempts
> **>> Block 710 retired
> Block 710 is in state 9 after gc, should be erased
> Calculate md5sums for copied files
> 5b04790304a4221f1016a8c310da4746  somefile.1
> 5b04790304a4221f1016a8c310da4746  somefile.2
> 5b04790304a4221f1016a8c310da4746  somefile.3
> 8c8d5a7974d0b9da747bc59edd1991f6  somefile.4
> execute sync and recalculate md5sums
> save exit: isCheckpointed 1
> 5b04790304a4221f1016a8c310da4746  somefile.1
> 5b04790304a4221f1016a8c310da4746  somefile.2
> 5b04790304a4221f1016a8c310da4746  somefile.3
> 5b04790304a4221f1016a8c310da4746  somefile.4
> Delete one of the files
> 5b04790304a4221f1016a8c310da4746  somefile.1
> 5b04790304a4221f1016a8c310da4746  somefile.3
> 3cb7668eb7760202d96970a6a9a3361f  somefile.4
> recopy the deleted file
> f6dba6d5af7a7a89481da1849035a417  somefile.1
> 5b04790304a4221f1016a8c310da4746  somefile.3
> 5b04790304a4221f1016a8c310da4746  somefile.4
> f6dba6d5af7a7a89481da1849035a417  somefile.7
> Creating test folder and some junk files in that folder
> 1+0 records in
> 1+0 records out
> ae1028b8d6aef86d020c9edfae29ca3d  junk.1
> md5sums of all files in test folder
> ae1028b8d6aef86d020c9edfae29ca3d  junk.1
> ae1028b8d6aef86d020c9edfae29ca3d  junk.2
> ae1028b8d6aef86d020c9edfae29ca3d  junk.3
> ae1028b8d6aef86d020c9edfae29ca3d  junk.4
> ae1028b8d6aef86d020c9edfae29ca3d  junk.5
> ae1028b8d6aef86d020c9edfae29ca3d  junk.6
> ae1028b8d6aef86d020c9edfae29ca3d  junk.7
> ae1028b8d6aef86d020c9edfae29ca3d  junk.8
> ae1028b8d6aef86d020c9edfae29ca3d  junk.9
> execute sync and recalculate md5sums
> save exit: isCheckpointed 1
> ae1028b8d6aef86d020c9edfae29ca3d  junk.1
> ae1028b8d6aef86d020c9edfae29ca3d  junk.2
> ae1028b8d6aef86d020c9edfae29ca3d  junk.3
> ae1028b8d6aef86d020c9edfae29ca3d  junk.4
> ae1028b8d6aef86d020c9edfae29ca3d  junk.5
> ae1028b8d6aef86d020c9edfae29ca3d  junk.6
> ae1028b8d6aef86d020c9edfae29ca3d  junk.7
> ae1028b8d6aef86d020c9edfae29ca3d  junk.8
> ae1028b8d6aef86d020c9edfae29ca3d  junk.9
> Remove some files and recreate them
> Calculate md5sums for 30M files again
> f6dba6d5af7a7a89481da1849035a417  somefile.1
> 5b04790304a4221f1016a8c310da4746  somefile.3
> 5b04790304a4221f1016a8c310da4746  somefile.4
> 1fee3f481bfa5cf3403efe9e481a0374  somefile.7
> execute sync and recalculate md5sums
> save exit: isCheckpointed 1
> a7b6ccfa31115aa75a0fdca07073293d  somefile.1
> 5b04790304a4221f1016a8c310da4746  somefile.3
> 1abb3d578e2d129341df26916090b869  somefile.4
> f6dba6d5af7a7a89481da1849035a417  somefile.7
> OMAP-35x#
>
> In the output, note that the md5sum of "somefile.*" should all match.
>
> Anyone seen anything like this before?  Test attached.

I just ran the test on both 2.6.24-xxx and 2.6.31-xxx using nandsim on a PC 
and had no problems. Here's one run:

root@linux-dual-head:/mnt# ~charles/Dropbox/yaffs-30M-test
Create 30M file and get md5sum
30720+0 records in
30720+0 records out
31457280 bytes (31 MB) copied, 7.72198 s, 4.1 MB/s
dc7fd1b9553217a9a1becbb101271eab  somefile.1
Calculate md5sums for copied files
dc7fd1b9553217a9a1becbb101271eab  somefile.1
dc7fd1b9553217a9a1becbb101271eab  somefile.2
dc7fd1b9553217a9a1becbb101271eab  somefile.3
dc7fd1b9553217a9a1becbb101271eab  somefile.4
execute sync and recalculate md5sums
dc7fd1b9553217a9a1becbb101271eab  somefile.1
dc7fd1b9553217a9a1becbb101271eab  somefile.2
dc7fd1b9553217a9a1becbb101271eab  somefile.3
dc7fd1b9553217a9a1becbb101271eab  somefile.4
Delete one of the files
dc7fd1b9553217a9a1becbb101271eab  somefile.1
dc7fd1b9553217a9a1becbb101271eab  somefile.3
dc7fd1b9553217a9a1becbb101271eab  somefile.4
recopy the deleted file
dc7fd1b9553217a9a1becbb101271eab  somefile.1
dc7fd1b9553217a9a1becbb101271eab  somefile.3
dc7fd1b9553217a9a1becbb101271eab  somefile.4
dc7fd1b9553217a9a1becbb101271eab  somefile.7
Creating test folder and some junk files in that folder
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000442124 s, 2.3 MB/s
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.1
md5sums of all files in test folder
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.1
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.2
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.3
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.4
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.5
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.6
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.7
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.8
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.9
execute sync and recalculate md5sums
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.1
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.2
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.3
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.4
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.5
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.6
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.7
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.8
0bc0c6e9588ee2bf6c89463208c5a0e9  junk.9
Remove some files and recreate them
Calculate md5sums for 30M files again
dc7fd1b9553217a9a1becbb101271eab  somefile.1
dc7fd1b9553217a9a1becbb101271eab  somefile.3
dc7fd1b9553217a9a1becbb101271eab  somefile.4
dc7fd1b9553217a9a1becbb101271eab  somefile.7
execute sync and recalculate md5sums
dc7fd1b9553217a9a1becbb101271eab  somefile.1
dc7fd1b9553217a9a1becbb101271eab  somefile.3
dc7fd1b9553217a9a1becbb101271eab  somefile.4
dc7fd1b9553217a9a1becbb101271eab  somefile.7


Perhaps the retirement of the blocks indicates that some data was being 
corrupted.

Could be - I'll re-nuke the flash (since those blocks on this particular board should not be bad) and try again.  I'm wondering if I'm caught in limbo: the particular kernel version I have may exhibit some caching behavior on the OMAP35x that the changes you've made don't account for.  Unfortunately this is a production release, so if you have a suggestion on how to go backwards (i.e. undo some of the caching changes that I'm caught in the middle of), I'd be appreciative - I'm looking for stability, not necessarily efficiency compared to previous kernel versions.
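
To narrow down where a bad copy diverges, rather than only seeing that the md5sums differ, something like the sketch below could be run on a pair of files.  It is an illustration only: first_mismatch and demo_page_of are hypothetical names, and the 2048-byte page size is an assumption that should be replaced with the real NAND page size.

```c
#include <stdio.h>

#define DEMO_PAGE_BYTES 2048L	/* assumed NAND page size, adjust as needed */

/* Returns the offset of the first differing byte, -1 if the files are
 * identical, -2 if either file cannot be opened.  A length difference
 * counts as a mismatch at the end of the shorter file. */
static long first_mismatch(const char *path_a, const char *path_b)
{
	FILE *a = fopen(path_a, "rb");
	FILE *b = fopen(path_b, "rb");
	long offset = 0;
	long result = -1;

	if (!a || !b) {
		result = -2;
		goto out;
	}
	for (;;) {
		int ca = fgetc(a);
		int cb = fgetc(b);

		if (ca != cb) {
			result = offset;
			break;
		}
		if (ca == EOF)
			break;	/* both files ended together */
		offset++;
	}
out:
	if (a)
		fclose(a);
	if (b)
		fclose(b);
	return result;
}

/* Map a byte offset to the page index it falls in. */
static long demo_page_of(long offset)
{
	return offset / DEMO_PAGE_BYTES;
}
```

Running first_mismatch("somefile.1", "somefile.4") and feeding the result to demo_page_of() would show whether the damage starts on a page boundary, which might hint at whether whole pages or individual bytes are going wrong.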

At some point it would be nice if there were tags on the YAFFS CVS tree so I can snap to a known version and apply it to a kernel and walk forward or backwards in time to capture logical changes to the YAFFS source and test with each.

Thanks in advance for any suggestions!
-- Charles




_______________________________________________
yaffs mailing list
yaffs@lists.aleph1.co.uk
http://lists.aleph1.co.uk/cgi-bin/mailman/listinfo/yaffs