Hey Charles,

I did end up finding a strange issue with the NAND part itself from a bunch of testing I did this weekend.

>
>> I'm going to go back to testing the NAND directly with the MTD layer and see
>> if I can get the NAND to do strange things from there. I'm also going to
>> look into back porting newer MTD code into our 2.6.20.4 kernel to see if
>> that fixes the problem. I've mentioned some of my issues on the MTD
>> mailing list but haven't really gotten a response on that end.
>
> Let us know how you get on. This is interesting for everyone.
>

With my logic analyzer I verified that the deletion process is working. When YAFFS (or MTD) claims that an erase failed, the part really is reporting that the erase failed. However, after the failure the block is no longer readable or writable. I tested both reading and writing with the mtd-utils applications. This of course means that when MTD tries to mark the block bad, it can't.

We might be able to fix the issue if we move to a bad block table that is written to flash. Currently we just scan the part for bad blocks on boot and use a RAM-based bad block table.

I have verified the timing of all the NAND control lines and used a scope to verify that the signals are clean and don't have excessive overshoot or undershoot. I can't see anything wrong at our end. We're trying to source different 16Gbit parts, and I'm trying to get in contact with a Micron FAE to see if they have seen this issue.

Here's the email I sent off to the MTD mailing list detailing the test and the problems I saw.

-------------------------------------------------------------------------------

Hey guys,

I'm having issues with MT29F16G08DAA parts with MTD on Linux. I have found a strange issue with block erasure failures. The part seems to get into a state where, if a block fails erasure, all pages within that block (including the OOB area) will read all 0x00.
I realize that internally the part may write all bits to 0 to prevent over-erasure of already erased bits; however, if the device is power cycled, the data that was in that block before the erase is still there.

Here is a dump of what I'm seeing.

/mnt/zen/mtd-utils # ./nanddump -s 0x29a00000 -l 0x1000 -p /dev/mtd12
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 204
Number of bbt blocks: 0
Block size 262144, page size 4096, OOB size 128
Dumping data starting at 0x29a00000 and ending at 0x29a01000...
0x29a00000: 03 00 00 00 09 75 00 00 ff ff 2e 73 76 6e 00 00
0x29a00010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00100: 00 00 00 00 00 00 00 00 00 00 ff ff ed 41 00 00
0x29a00110: 00 00 00 00 00 00 00 00 0a d6 7c 4a 0b d6 7c 4a
0x29a00120: 0b d6 7c 4a ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00130: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00140: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00150: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[SNIP]
0x29a00f40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00f50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00f60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00f70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00f90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00fa0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00fb0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00fd0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00fe0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x29a00ff0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
OOB Data: ff ff cb 19 00 00 1d 75 00 30 09 75 00 80 00 00
OOB Data: 00 00 25 7c 05 c4 06 00 00 00 f9 ff ff ff ff ff
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
OOB Data: ff 00 c3 a9 6a 67 ff ff ff ff ff ff ff ff ff ff
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

/mnt/zen/mtd-utils # ./flash_erase /dev/mtd12 0x29a00000
Erase Total 1 Unnand_erase: start = 0x29a00000, len = 262144
its
Performing Flash Erase of length 262144 at offset 0x29a00000single_erase_cmd
nand_erase: Failed erase, page 0x00029a00
MTD Erase failure: Input/output error

/mnt/zen/mtd-utils # ./nanddump -s 0x29a00000 -l 0x1000 -p /dev/mtd12
ECC failed: 0
ECC corrected:ECC: BAD
0
Number of baECC: BAD
d blocks: 204
NECC: BAD
umber of bbt bloECC: BAD
cks: 0
Block siECC: BAD
ze 262144, page ECC: BAD
size 4096, OOB sECC: BAD
ize 128
DumpingECC: BAD
data starting aECC: BAD
t 0x29a00000 andECC: BAD
ending at 0x29aECC: BAD
01000...
ECC: BAD
ECC: BAD
ECC: BAD
ECC: BAD
ECC: BAD
ECC: 16 uncorrectable bitflip(s) at offset 0x29a00000
0x29a00000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[SNIP]
0x29a00f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00f60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x29a00ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
/mnt/zen/mtd-utils #

The big issue here is that I've also tested writing to this block after a failed erasure, and I can't seem to write to it either. I tried forcing the block to all zeros after the failed erase, and on reboot the previous data is still in the block. This means that when MTD is told to mark this block as bad, the write of the first two bytes in the first two pages of the block also fails. Therefore the block never gets marked bad.

Eventually an erase will work and the block goes back to a "working" state. However, it'll end up in this strange state again at some point.

Has anyone seen this happen before with NAND parts? Is there a way to avoid this? I have tried issuing a RESET command to the part after a failed erase, but all the pages in the block stay in this strange state. The only thing that seems to recover the device is to power cycle it.

I suppose I could change to using a BBT that is written to the NAND device; hopefully then the bad blocks would be kept track of. That said, I've never had an issue before with marking blocks bad by writing the first two bytes of the first two pages to 0x00 and having Linux build a BBT on the fly during boot-up.

-------------------------------------------------------------------------------

Andrew McKay
Iders Inc.
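
The offsets and dumps above can be cross-checked by script. What follows is only a minimal sketch in Python, assuming the geometry that nanddump reports in the post (262144-byte blocks, 4096-byte pages) and the row layout shown in the dumps. The helper names locate, dump_bytes and classify are made up for illustration and are not part of mtd-utils.

```python
import re

# Geometry as reported by nanddump in the post (assumed fixed here).
BLOCK_SIZE = 262144
PAGE_SIZE = 4096

# Matches "0x29a00000: 03 00 ..." data rows and "OOB Data: ff ff ..." rows.
ROW_RE = re.compile(r"^(?:0x[0-9a-fA-F]+:|OOB Data:)((?:\s+[0-9a-fA-F]{2})+)$")


def locate(offset, block_size=BLOCK_SIZE, page_size=PAGE_SIZE):
    """Return (block index, page index) for a byte offset into the partition."""
    return offset // block_size, offset // page_size


def dump_bytes(text):
    """Collect data and OOB bytes from nanddump -p style text (hypothetical helper).

    Lines that are not hex rows (headers, interleaved kernel messages such
    as "ECC: BAD") are skipped.
    """
    out = bytearray()
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            out.extend(int(tok, 16) for tok in match.group(1).split())
    return bytes(out)


def classify(data):
    """Label a dump: all 0x00 (the stuck state), all 0xff (erased), or data."""
    if not data:
        return "empty"
    if all(b == 0x00 for b in data):
        return "all-0x00"
    if all(b == 0xFF for b in data):
        return "erased"
    return "data"


if __name__ == "__main__":
    block, page = locate(0x29A00000)
    print("block %d, page 0x%x" % (block, page))
```

For the offset used in the post, locate(0x29a00000) gives page 0x29a00, which agrees with the page number in the kernel's "Failed erase" message. Feeding the second dump through dump_bytes and classify would be one way to detect the all-0x00 state automatically after a failed erase.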