Hi Andrew
Thanks for the detailed investigation.
Please don't trash this board! This sounds like a good candidate for testing
something I'm working on.
Bad block management is both a headache and a performance issue, so I'm
looking at the mods needed to make yaffs (maybe only yaffs2 mode) work well
without bad block marking. Instead of having mtd point out bad blocks, yaffs
would figure it out as things go by watching what fails.
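
To make the "watch what fails" idea concrete, here is a minimal sketch. It is not how yaffs does or will do this. The class name, the `retire_after` threshold, and the method names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BlockTracker:
    """Hypothetical tracker: retire a block after repeated failures."""
    retire_after: int = 2                       # assumed threshold, not a yaffs value
    failures: dict = field(default_factory=dict)
    retired: set = field(default_factory=set)

    def record(self, block: int, ok: bool) -> None:
        """Note the result of an erase or write on `block`."""
        if ok or block in self.retired:
            return
        self.failures[block] = self.failures.get(block, 0) + 1
        if self.failures[block] >= self.retire_after:
            self.retired.add(block)

    def usable(self, block: int) -> bool:
        """True while the block has not been retired."""
        return block not in self.retired


tracker = BlockTracker(retire_after=2)
tracker.record(2664, ok=False)   # first failed erase: still usable
tracker.record(2664, ok=False)   # second failure: retired
print(tracker.usable(2664))      # False
```

The retired set lives only in RAM in this sketch, so a real implementation would still need some way to remember it across power cycles.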
-- Charles
On Tuesday 11 August 2009 06:39:23 Andrew McKay wrote:
> Hey Charles,
>
> I did end up finding a strange issue with the NAND part itself from a bunch
> of testing I did this weekend.
>
> >> I'm going to go back to testing the NAND directly with the MTD layer to
> >> see if I can get the NAND to do strange things from there. I'm also
> >> going to look into back porting newer MTD code into our 2.6.20.4 kernel
> >> to see if that fixes the problem. I've mentioned some of my issues on
> >> the MTD mailing list but haven't really gotten a response on that end.
> >
> > Let us know how you get on. This is interesting for everyone.
>
> With my logic analyzer I verified that the deletion process is working.
> When YAFFS (or MTD) claims that an erase failed, the part is saying that
> the erase failed. However after the failure this block is no longer
> readable or writable. I tested doing both with mtd-utils applications.
> This of course means that when MTD tries to mark the block bad, it can't.
> We might be able to fix the issue if we move to using a flash written bad
> block table. Currently we just scan the part for bad blocks on boot and
> use a RAM based bad block table.
>
> I have verified timing of all the NAND control lines and used a scope to
> verify the signals are clean and don't have excessive overshoot or
> undershoot. I can't see anything wrong at our end. We're trying to source
> different 16Gbit parts, and I'm trying to get into contact with a Micron
> FAE to see if they have seen this issue.
>
> Here's the email I sent off to the MTD mailing list detailing the test and
> the problems I saw.
>
> ---------------------------------------------------------------------------
> Hey guys,
>
> I'm having issues with MT29F16G08DAA parts with MTD on Linux. I have found
> a strange issue with block erase failures. The part seems to get into a
> state where, if a block fails to erase, all pages within that block
> (including the OOB area) read back as all 0x00. I realize that internally
> the part may write all bits to 0 to prevent over-erasure of already erased
> bits; however, if the device is power cycled, the data that was in that
> block before the erase is still there.
>
> Here is a dump of what I'm seeing.
>
> /mnt/zen/mtd-utils # ./nanddump -s 0x29a00000 -l 0x1000 -p /dev/mtd12
> ECC failed: 0
> ECC corrected: 0
> Number of bad blocks: 204
> Number of bbt blocks: 0
> Block size 262144, page size 4096, OOB size 128
> Dumping data starting at 0x29a00000 and ending at 0x29a01000...
> 0x29a00000: 03 00 00 00 09 75 00 00 ff ff 2e 73 76 6e 00 00
> 0x29a00010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00100: 00 00 00 00 00 00 00 00 00 00 ff ff ed 41 00 00
> 0x29a00110: 00 00 00 00 00 00 00 00 0a d6 7c 4a 0b d6 7c 4a
> 0x29a00120: 0b d6 7c 4a ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00130: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00140: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00150: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [SNIP]
> 0x29a00f40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00f50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00f60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00f70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00f90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00fa0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00fb0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00fd0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00fe0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x29a00ff0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> OOB Data: ff ff cb 19 00 00 1d 75 00 30 09 75 00 80 00 00
> OOB Data: 00 00 25 7c 05 c4 06 00 00 00 f9 ff ff ff ff ff
> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> OOB Data: ff 00 c3 a9 6a 67 ff ff ff ff ff ff ff ff ff ff
> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
>
> /mnt/zen/mtd-utils # ./flash_erase /dev/mtd12 0x29a00000
> Erase Total 1 Units
> nand_erase: start = 0x29a00000, len = 262144
> Performing Flash Erase of length 262144 at offset 0x29a00000
> single_erase_cmd nand_erase: Failed erase, page 0x00029a00
>
> MTD Erase failure: Input/output error
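
As a quick cross-check of the numbers in this log, the byte offset passed to flash_erase can be converted to block and page indices using the geometry nanddump reports (block size 262144, page size 4096). The helper name `locate` is hypothetical; this is only arithmetic on the values shown above.

```python
BLOCK_SIZE = 262144  # bytes per erase block, from the nanddump header
PAGE_SIZE = 4096     # bytes per page, from the nanddump header


def locate(offset):
    """Return (block index, page index) for a byte offset into the device."""
    return offset // BLOCK_SIZE, offset // PAGE_SIZE


block, page = locate(0x29A00000)
print(block, hex(page))  # 2664 0x29a00
```

The page index 0x29a00 agrees with the "Failed erase, page 0x00029a00" message, so the failure is being reported for the first page of the block that was asked to be erased.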
>
>
> /mnt/zen/mtd-utils # ./nanddump -s 0x29a00000 -l 0x1000 -p /dev/mtd12
> ECC failed: 0
> ECC corrected:ECC: BAD
> 0
> Number of baECC: BAD
> d blocks: 204
> NECC: BAD
> umber of bbt bloECC: BAD
> cks: 0
> Block siECC: BAD
> ze 262144, page ECC: BAD
> size 4096, OOB sECC: BAD
> ize 128
> DumpingECC: BAD
> data starting aECC: BAD
> t 0x29a00000 andECC: BAD
> ending at 0x29aECC: BAD
> 01000...
> ECC: BAD
> ECC: BAD
> ECC: BAD
> ECC: BAD
> ECC: BAD
> ECC: 16 uncorrectable bitflip(s) at offset 0x29a00000
> 0x29a00000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [SNIP]
> 0x29a00f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00f60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x29a00ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> /mnt/zen/mtd-utils #
>
> The big issue here is that I've also tested writing to this block after a
> failed erase, and I can't seem to write to it either. I tried forcing
> the block to all zeros after the failed erase, and on reboot the previous
> data is still in the block. This means that when MTD is told to mark this
> block as bad, the write of the first two bytes in the first two pages of
> the block also fails. Therefore the block never gets marked bad. Eventually
> an erase will work and the block goes back to a "working" state. However,
> it'll end up in this strange state again at some point.
>
> Has anyone seen this happen before with NAND parts? Is there a way to
> avoid this? I have tried issuing a RESET command to the part after a
> failed erase, but all the pages in the block stay in this strange state.
> The only thing that seems to recover the device is to power cycle it.
>
> I suppose I could change to using a BBT that is written to the NAND device;
> hopefully then the bad blocks would be kept track of. Though I've never
> had an issue with marking blocks bad by writing the first two bytes of the
> first two pages to 0x00 and have Linux build a BBT on the fly during
> boot-up.
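
The boot-time scan described above can be sketched with a toy in-memory model. This is not the Linux MTD code. The `read_marker` callback, the two-byte marker per page, and the dictionary standing in for flash are all assumptions made for illustration. It shows why a marker write that does not stick leaves the block invisible to the next scan.

```python
GOOD = bytes([0xFF, 0xFF])  # marker bytes of a block never marked bad
BAD = bytes([0x00, 0x00])   # marker bytes after a successful "mark bad"


def scan_bad_blocks(read_marker, n_blocks, pages_to_check=2):
    """Build an in-RAM bad block set by reading marker bytes at boot."""
    bad = set()
    for block in range(n_blocks):
        for page in range(pages_to_check):
            if any(b != 0xFF for b in read_marker(block, page)):
                bad.add(block)
                break
    return bad


# Toy flash: block 1 was marked bad successfully. Block 2 failed an erase,
# but the marker write did not stick, so its marker bytes are unchanged.
flash = {(b, p): GOOD for b in range(4) for p in range(2)}
flash[(1, 0)] = BAD
flash[(1, 1)] = BAD

found = scan_bad_blocks(lambda b, p: flash[(b, p)], n_blocks=4)
print(sorted(found))  # [1]
```

Block 2 is missing from the result even though it misbehaved, which is the case a flash-resident BBT (stored in blocks other than the failing one) would be meant to cover.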
> ---------------------------------------------------------------------------
>
>
>
> Andrew McKay
> Iders Inc.