On Friday 07 August 2009 08:02:41 Andrew McKay wrote:
> Hi Charles,
>
> > First off, please pick up the latest from cvs. The most recent patches
> > should not be material to the issues you are dealing with but they might
> > be confusing things a bit.
>
> I have compiled in the latest version of YAFFS as of what was in the public
> repository Aug 6th and loaded it on my board. I've been testing the
> filesystem by untarring an archive and repeatedly copying the directory to
> new directories, removing some of them, and then continuing copying.
> Things seems to be working fine. When I went to go remove everything from
> NAND and give the board to one of the other developers I got some more bad
> messages from YAFFS.
>

Did you do any reboots during the above? Rebooting forces a rescan, which
will test the ECC.

> /mnt/nand # \rm -rf *
> **>> Erasure failed 3515
> **>> Block 3515 retired
> Block 3515 is in state 9 after gc, should be erased
> **>> Erasure failed 4201
> **>> Block 4201 retired
> Block 4201 is in state 9 after gc, should be erased
> **>> Erasure failed 8045
> **>> Block 8045 retired
> Block 8045 is in state 9 after gc, should be erased

It looks like the erase command failed. Try instrumenting the erase
function in the mtd.

Was it just those few blocks kicking up a problem, or was there a whole
slew of them? If it was just a few, then those might be real bad blocks and
the above is OK.

From the /proc/yaffs output it looks like many erasures worked and only a
few failed (nBlockErasures is 4733, nRetireBlocks is 3). That indicates
that the mtd did not tell yaffs these were bad blocks. I would inspect the
bad block marking and identification strategy and make sure it is working
OK.

>
> /proc/yaffs is reporting:
> /mnt/nand # cat /proc/yaffs
> YAFFS built:Aug 6 2009 12:52:21
> $Id: yaffs_fs.c,v 1.81 2009-05-26 01:22:44 charles Exp $
> $Id: yaffs_guts.c,v 1.87 2009-07-29 04:30:24 charles Exp $
>
> Device 0 "NAND 1GiB 3,3V 8-bit"
> startBlock......... 0
> endBlock........... 8191
> totalBytesPerChunk. 4096
> nDataBytesPerChunk. 4096
> chunkGroupBits..... 0
> chunkGroupSize..... 1
> nErasedBlocks...... 3834
> nReservedBlocks.... 5
> blocksInCheckpoint. 0
> nTnodesCreated..... 41600
> nFreeTnodes........ 26625
> nObjectsCreated.... 71100
> nFreeObjects....... 45012
> nFreeChunks........ 456051
> nPageWrites........ 0
> nPageReads......... 0
> nBlockErasures..... 4733
> nGCCopies.......... 776
> garbageCollections. 1513
> passiveGCs......... 1513
> nRetriedWrites..... 0
> nShortOpCaches..... 10
> nRetireBlocks...... 3
> eccFixed........... 0
> eccUnfixed......... 0
> tagsEccFixed....... 0
> tagsEccUnfixed..... 0
> cacheHits.......... 0
> nDeletedFiles...... 0
> nUnlinkedFiles..... 89898
> nBackgroudDeletions 0
> useNANDECC......... 1
> isYaffs2........... 1
> inbandTags......... 0
>
> I ran nanddump from mtd-utils, and it is reporting that there were a bunch
> of ECC errors while reading NAND:
>
> /mnt/zen/mtd-utils # ./nanddump /dev/mtd12 > /dev/null
> ECC failed: 6144
> ECC corrected: 0
> Number of bad blocks: 13
> Number of bbt blocks: 0
> Block size 262144, page size 4096, OOB size 128
> Dumping data starting at 0x00000000 and ending at 0x80000000...

Many ECC errors suggest that your mtd is trying to use the same oob bytes
for both data and ECC and/or bad block markers.

When yaffs reads/writes spare bytes it just passes a contiguous buffer
(say yyyyyyyy).

Now let's say the mtd is using 6 bytes for ECC (shown as e) and 2 bytes for
the bad block marker (shown as b).

The actual oob placement might end up being
  bbyyyyeeeyyyyeee
or maybe
  bbeeeeeeyyyyyyyy
or whatever, and it is the job of the mtd to sort this out.

>
> Do you have anymore hints? I'm running out of ideas. I'm starting to
> wonder if there is something fishy about these NAND parts (MT2916G08DAA).
> I am testing across 3 boards, so it isn't limited to one particular part or
> board. This part has been EOL'd by Micron and there is a recommended
> replacement. I wonder if the replacement will show the same issues.
>
> Our NAND driver is pretty simple and mainly uses the functions defined by
> nand_base, and has worked fine with ST NAND parts. When I was testing the
> block device directly with dd, I wasn't able to turn up any obvious issues,
> though I'm going to look into that a bit further.
>
> Thanks,
> Andrew McKay
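
To make the oob placement point from the reply concrete, here is a minimal sketch in Python (not driver code, and not how any particular mtd implements it). It assumes a hypothetical 16-byte oob area with 2 bad block marker bytes, 6 ECC bytes and 8 free bytes. The layout string and the function names (positions, scatter, gather, overlaps) are made up for illustration. It shows a contiguous tag buffer being spread over the free bytes and collected again, plus a check for the kind of overlap that would corrupt ECC or bad block markers.

```python
# Hypothetical 16-byte OOB layout (not taken from any real NAND part):
# b = bad block marker, e = ECC, y = free for the filesystem's tags.
LAYOUT = "bbyyyyeeeyyyyeee"


def positions(layout, kind):
    """Return the OOB byte offsets that the layout assigns to `kind`."""
    return [i for i, c in enumerate(layout) if c == kind]


def scatter(tags, layout=LAYOUT):
    """Place a contiguous tag buffer into the free ('y') OOB bytes."""
    free = positions(layout, "y")
    if len(tags) > len(free):
        raise ValueError("tags need %d bytes, only %d free" % (len(tags), len(free)))
    oob = bytearray(b"\xff" * len(layout))  # erased NAND reads back as 0xFF
    for pos, value in zip(free, tags):
        oob[pos] = value
    return oob


def gather(oob, length, layout=LAYOUT):
    """Collect the first `length` free OOB bytes back into one buffer."""
    free = positions(layout, "y")
    return bytes(oob[pos] for pos in free[:length])


def overlaps(ecc_pos, bad_block_pos, free_ranges):
    """Return sorted offsets claimed by more than one user of the OOB area.

    `free_ranges` is a list of (offset, length) pairs for the free bytes.
    """
    free = {off + i for off, length in free_ranges for i in range(length)}
    ecc, bad = set(ecc_pos), set(bad_block_pos)
    return sorted((free & ecc) | (free & bad) | (ecc & bad))


# The contiguous "yyyyyyyy" buffer that the filesystem hands down.
tags = bytes(range(1, 9))
oob = scatter(tags)
print(oob.hex())
print(gather(oob, len(tags)) == tags)

# A consistent description of the layout, then one whose free range
# runs into the ECC bytes.
print(overlaps(positions(LAYOUT, "e"), positions(LAYOUT, "b"), [(2, 4), (9, 4)]))
print(overlaps(positions(LAYOUT, "e"), positions(LAYOUT, "b"), [(2, 8)]))
```

If a check like the last one reports any offsets for the layout your driver actually advertises, the tags and the ECC are sharing bytes, which would be one possible explanation for a large "ECC failed" count. Treat it as a way to reason about the layout, not as a diagnosis.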