Re: [Yaffs] Yaffs2 erasure issue on MT29 NAND part

Author: Charles Manning
Date:  
To: Andrew McKay
CC: yaffs
Subject: Re: [Yaffs] Yaffs2 erasure issue on MT29 NAND part
On Friday 07 August 2009 08:02:41 Andrew McKay wrote:
> Hi Charles,
>
> > First off, please pick up the latest from cvs. The most recent patches
> > should not be material to the issues you are dealing with but they might
> > be confusing things a bit.
>
> I have compiled in the latest version of YAFFS as of what was in the public
> repository Aug 6th and loaded it on my board. I've been testing the
> filesystem by untarring an archive and repeatedly copying the directory to
> new directories, removing some of them, and then continuing copying.
> Things seem to be working fine. When I went to remove everything from
> NAND and give the board to one of the other developers I got some more bad
> messages from YAFFS.
>

Did you do any reboots during the above? Rebooting forces rescanning which
will test the ECC.

> /mnt/nand # \rm -rf *
> **>> Erasure failed 3515
> **>> Block 3515 retired
> Block 3515 is in state 9 after gc, should be erased
> **>> Erasure failed 4201
> **>> Block 4201 retired
> Block 4201 is in state 9 after gc, should be erased
> **>> Erasure failed 8045
> **>> Block 8045 retired
> Block 8045 is in state 9 after gc, should be erased



It looks like the erasure command failed. Try instrumenting the erase function
in the mtd.
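
As a rough illustration of what I mean by instrumenting, here is a minimal
sketch in plain user-space C. It wraps an erase routine so that every failure
is logged with its block number and return code. The names erase_fn,
erase_stats, instrumented_erase and fake_erase are hypothetical and are not
part of mtd or yaffs; in a real driver the same logging would go inside the
mtd erase path and use printk instead of fprintf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical erase signature: returns 0 on success, non-zero on failure. */
typedef int (*erase_fn)(unsigned block);

/* Running totals so a few bad blocks can be told apart from a flood. */
struct erase_stats {
    unsigned attempts;
    unsigned failures;
    unsigned last_failed_block;
};

/* Call the real erase routine, record the outcome, and log any failure. */
static int instrumented_erase(erase_fn real_erase, unsigned block,
                              struct erase_stats *stats)
{
    int rc;

    assert(real_erase != NULL && stats != NULL);

    rc = real_erase(block);
    stats->attempts++;
    if (rc != 0) {
        stats->failures++;
        stats->last_failed_block = block;
        fprintf(stderr, "erase of block %u failed, rc=%d\n", block, rc);
    }
    return rc;
}

/* Stand-in for a driver erase routine (assumption, for illustration only):
 * pretends that block 3515 always fails to erase. */
static int fake_erase(unsigned block)
{
    return block == 3515 ? -5 : 0;
}
```

Comparing stats.failures with stats.attempts after a big delete shows quickly
whether only a handful of blocks fail or whether erases fail across the part.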

Was it just those few blocks kicking up a problem or was there a whole slew of
them? If it was just a few then those might be real bad blocks and then the
above is OK.

From the /proc/yaffs output (nBlockErasures 4733, nRetireBlocks 3) it looks
like many erasures worked and only a few failed. That indicates that the mtd
did not tell yaffs these were bad blocks.

I would inspect the bad block marking and identification strategy and make
sure it is working OK.




>
> /proc/yaffs is reporting:
> /mnt/nand # cat /proc/yaffs
> YAFFS built:Aug 6 2009 12:52:21
> $Id: yaffs_fs.c,v 1.81 2009-05-26 01:22:44 charles Exp $
> $Id: yaffs_guts.c,v 1.87 2009-07-29 04:30:24 charles Exp $
>
> Device 0 "NAND 1GiB 3,3V 8-bit"
> startBlock......... 0
> endBlock........... 8191
> totalBytesPerChunk. 4096
> nDataBytesPerChunk. 4096
> chunkGroupBits..... 0
> chunkGroupSize..... 1
> nErasedBlocks...... 3834
> nReservedBlocks.... 5
> blocksInCheckpoint. 0
> nTnodesCreated..... 41600
> nFreeTnodes........ 26625
> nObjectsCreated.... 71100
> nFreeObjects....... 45012
> nFreeChunks........ 456051
> nPageWrites........ 0
> nPageReads......... 0
> nBlockErasures..... 4733
> nGCCopies.......... 776
> garbageCollections. 1513
> passiveGCs......... 1513
> nRetriedWrites..... 0
> nShortOpCaches..... 10
> nRetireBlocks...... 3
> eccFixed........... 0
> eccUnfixed......... 0
> tagsEccFixed....... 0
> tagsEccUnfixed..... 0
> cacheHits.......... 0
> nDeletedFiles...... 0
> nUnlinkedFiles..... 89898
> nBackgroudDeletions 0
> useNANDECC......... 1
> isYaffs2........... 1
> inbandTags......... 0
>
> I ran nanddump from mtd-utils, and it is reporting that there were a bunch
> of ECC errors while reading NAND:
>
> /mnt/zen/mtd-utils # ./nanddump /dev/mtd12 > /dev/null
> ECC failed: 6144
> ECC corrected: 0
> Number of bad blocks: 13
> Number of bbt blocks: 0
> Block size 262144, page size 4096, OOB size 128
> Dumping data starting at 0x00000000 and ending at 0x80000000...


Many ECC errors suggest that your mtd is trying to use the same oob bytes for
both data and ECC and/or bad block markers.

When yaffs reads/writes spare bytes it just passes a contiguous buffer (say
yyyyyyyy).

Now let's say the mtd is using 6 bytes for ECC (shown as e) and 2 bytes for
the bad block table (shown as b).

The actual oob placement might end up being
bbyyyyyeeeyyyyyee
or maybe
bbeeeeeeyyyyyyyyy
or whatever,
and it is the job of the mtd to sort this out.
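
To make the "sort this out" step concrete, here is a small stand-alone C
sketch of the idea: the contiguous yaffs buffer is scattered into whatever oob
bytes the layout leaves free, and gathered back on read. The type
oob_free_region and the functions oob_scatter and oob_gather are hypothetical
names for illustration, not the actual mtd code; the real mtd layer works from
its own description of the free oob bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One run of oob bytes left free for the filesystem (hypothetical type). */
struct oob_free_region {
    size_t offset;
    size_t length;
};

/* Scatter a contiguous tag buffer into the free regions of a raw oob area.
 * Bytes outside the free regions (ECC, bad block bytes) are not touched.
 * Returns the number of tag bytes placed. */
static size_t oob_scatter(unsigned char *oob,
                          const struct oob_free_region *regions,
                          size_t nregions,
                          const unsigned char *tags, size_t ntags)
{
    size_t placed = 0;

    assert(oob != NULL && regions != NULL && tags != NULL);

    for (size_t i = 0; i < nregions && placed < ntags; i++) {
        size_t n = regions[i].length;
        if (n > ntags - placed)
            n = ntags - placed;
        memcpy(oob + regions[i].offset, tags + placed, n);
        placed += n;
    }
    return placed;
}

/* Inverse of oob_scatter: collect the free-region bytes of a raw oob area
 * back into one contiguous buffer. Returns the number of bytes collected. */
static size_t oob_gather(const unsigned char *oob,
                         const struct oob_free_region *regions,
                         size_t nregions,
                         unsigned char *tags, size_t ntags)
{
    size_t done = 0;

    assert(oob != NULL && regions != NULL && tags != NULL);

    for (size_t i = 0; i < nregions && done < ntags; i++) {
        size_t n = regions[i].length;
        if (n > ntags - done)
            n = ntags - done;
        memcpy(tags + done, oob + regions[i].offset, n);
        done += n;
    }
    return done;
}
```

For the first placement above (bbyyyyyeeeyyyyyee) the free regions would be
{2, 5} and {10, 5}. If those regions overlap the bytes the mtd also uses for
ECC or bad block marking, each write clobbers the other, which would show up
as the kind of ECC failures seen here.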

>
> Do you have any more hints? I'm running out of ideas. I'm starting to
> wonder if there is something fishy about these NAND parts (MT2916G08DAA).
> I am testing across 3 boards, so it isn't limited to one particular part or
> board. This part has been EOL'd by Micron and there is a recommended
> replacement. I wonder if the replacement will show the same issues.
>
> Our NAND driver is pretty simple and mainly uses the functions defined by
> nand_base, and has worked fine with ST NAND parts. When I was testing the
> block device directly with dd, I wasn't able to turn up any obvious issues,
> though I'm going to look into that a bit further.
>
> Thanks,
> Andrew McKay