I'll see what I can do. These boards are supposed to ship off to a customer of ours, but I'll see if I can snag one and keep it stashed away for testing purposes.

Andrew

On Tue, 11 Aug 2009 10:54:46 +1200, Charles Manning wrote:
> Hi Andrew
>
> Thanks for the detailed investigation.
>
> Please don't trash this board! This sounds like a good candidate for testing
> something I'm working on.
>
> Bad block management is both a headache and a performance issue, so I'm
> looking at the mods needed to make yaffs (maybe only yaffs2 mode) work well
> without bad block marking. Instead of having mtd point out bad blocks, yaffs
> would figure it out as things go by watching what fails.
>
> -- Charles
>
> On Tuesday 11 August 2009 06:39:23 Andrew McKay wrote:
>> Hey Charles,
>>
>> I did end up finding a strange issue with the NAND part itself from a bunch
>> of testing I did this weekend.
>>
>> >> I'm going to go back to testing the NAND directly with the MTD layer and
>> >> see if I can get the NAND to do strange things from there. I'm also
>> >> going to look into backporting newer MTD code into our 2.6.20.4 kernel
>> >> to see if that fixes the problem. I've mentioned some of my issues on
>> >> the MTD mailing list but haven't really gotten a response on that end.
>> >
>> > Let us know how you get on. This is interesting for everyone.
>>
>> With my logic analyzer I verified that the erase process is working.
>> When YAFFS (or MTD) claims that an erase failed, the part is saying that
>> the erase failed. However, after the failure this block is no longer
>> readable or writable. I tested both with mtd-utils applications.
>> This of course means that when MTD tries to mark the block bad, it can't.
>> We might be able to fix the issue if we move to using a flash-based bad
>> block table. Currently we just scan the part for bad blocks on boot and
>> use a RAM-based bad block table.
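Charles's idea of having yaffs work out which blocks are bad by watching what fails, instead of relying on MTD's bad block markers, can be sketched as a per-block failure counter. This is a minimal Python sketch under stated assumptions, not yaffs code: `BlockHealth`, `record`, `next_usable` and `RETIRE_THRESHOLD` are hypothetical names, and the policy (retire a block after two failures) is an illustrative assumption. State lives in RAM only; how it would be persisted across reboots is left open.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical policy knob, not a yaffs or MTD parameter: retire a block
# once it has accumulated this many failed erase/program operations.
RETIRE_THRESHOLD = 2


@dataclass
class BlockHealth:
    """Per-block failure tracking kept in RAM; nothing is written to flash."""

    num_blocks: int
    failures: Dict[int, int] = field(default_factory=dict)
    retired: Set[int] = field(default_factory=set)

    def record(self, block: int, ok: bool) -> bool:
        """Record one erase/program outcome.

        Returns True only on the call that retires the block. A later success
        does not reset the count, so intermittently failing blocks still
        accumulate failures.
        """
        if ok or block in self.retired:
            return False
        self.failures[block] = self.failures.get(block, 0) + 1
        if self.failures[block] >= RETIRE_THRESHOLD:
            self.retired.add(block)
            return True
        return False

    def next_usable(self, start: int) -> Optional[int]:
        """First non-retired block at or after start (wrapping), or None."""
        for i in range(self.num_blocks):
            candidate = (start + i) % self.num_blocks
            if candidate not in self.retired:
                return candidate
        return None


if __name__ == "__main__":
    health = BlockHealth(num_blocks=16)
    print(health.record(5, ok=False))  # False: first failure, block 5 still in use
    print(health.record(5, ok=False))  # True: second failure retires block 5
    print(health.next_usable(5))       # 6: allocation skips the retired block
```

Failures are deliberately never cleared by a later success, because this thread reports blocks that recover after an erase eventually works and then fail again; a cumulative count still catches them. Retiring a block this way also needs no write to the misbehaving block itself.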
>>
>> I have verified timing of all the NAND control lines and used a scope to
>> verify the signals are clean and don't have excessive overshoot or
>> undershoot. I can't see anything wrong at our end. We're trying to source
>> different 16Gbit parts, and I'm trying to get into contact with a Micron
>> FAE to see if they have seen this issue.
>>
>> Here's the email I sent off to the MTD mailing list detailing the test and
>> the problems I saw.
>>
>> -------------------------------------------------------------------------------
>> Hey guys,
>>
>> I'm having issues with MT29F16G08DAA parts with MTD on Linux. I have found
>> a strange issue with block erasure failures. The part seems to get into a
>> state where, if a block fails erasure, all pages within that block
>> (including the OOB area) read as all 0x00. I realize that internally the
>> part may write all bits to 0 to prevent over-erasure of already erased
>> bits; however, if the device is power-cycled, the data that was in that
>> block before the erase is still there.
>>
>> Here is a dump of what I'm seeing.
>>
>> /mnt/zen/mtd-utils # ./nanddump -s 0x29a00000 -l 0x1000 -p /dev/mtd12
>> ECC failed: 0
>> ECC corrected: 0
>> Number of bad blocks: 204
>> Number of bbt blocks: 0
>> Block size 262144, page size 4096, OOB size 128
>> Dumping data starting at 0x29a00000 and ending at 0x29a01000...
>> 0x29a00000: 03 00 00 00 09 75 00 00 ff ff 2e 73 76 6e 00 00
>> 0x29a00010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00100: 00 00 00 00 00 00 00 00 00 00 ff ff ed 41 00 00
>> 0x29a00110: 00 00 00 00 00 00 00 00 0a d6 7c 4a 0b d6 7c 4a
>> 0x29a00120: 0b d6 7c 4a ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00130: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00140: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00150: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> [SNIP]
>> 0x29a00f40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00f50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00f60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00f70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00f90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00fa0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00fb0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00fd0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00fe0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 0x29a00ff0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> OOB Data: ff ff cb 19 00 00 1d 75 00 30 09 75 00 80 00 00
>> OOB Data: 00 00 25 7c 05 c4 06 00 00 00 f9 ff ff ff ff ff
>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> OOB Data: ff 00 c3 a9 6a 67 ff ff ff ff ff ff ff ff ff ff
>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>
>> /mnt/zen/mtd-utils # ./flash_erase /dev/mtd12 0x29a00000
>> Erase Total 1 Unnand_erase: start = 0x29a00000, len = 262144
>> its
>> Performing Flash Erase of length 262144 at offset
>> 0x29a00000single_erase_cmd nand_erase: Failed erase, page 0x00029a00
>>
>> MTD Erase failure: Input/output error
>>
>> /mnt/zen/mtd-utils # ./nanddump -s 0x29a00000 -l 0x1000 -p /dev/mtd12
>> ECC failed: 0
>> ECC corrected:ECC: BAD
>> 0
>> Number of baECC: BAD
>> d blocks: 204
>> NECC: BAD
>> umber of bbt bloECC: BAD
>> cks: 0
>> Block siECC: BAD
>> ze 262144, page ECC: BAD
>> size 4096, OOB sECC: BAD
>> ize 128
>> DumpingECC: BAD
>> data starting aECC: BAD
>> t 0x29a00000 andECC: BAD
>> ending at 0x29aECC: BAD
>> 01000...
>> ECC: BAD
>> ECC: BAD
>> ECC: BAD
>> ECC: BAD
>> ECC: BAD
>> ECC: 16 uncorrectable bitflip(s) at offset 0x29a00000
>> 0x29a00000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [SNIP]
>> 0x29a00f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00f60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x29a00ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> /mnt/zen/mtd-utils #
>>
>> The big issue here is that I've also tested writing to this block after a
>> failed erasure, and I can't seem to write to it either. I tried forcing
>> the block to all zeros after the failed erase, and on reboot the previous
>> data is still in the block. This means that when MTD is told to mark this
>> block as bad, the write of the first two bytes in the first two pages of
>> the block also fails, so the block never gets marked bad. Eventually an
>> erase will work and the block goes back to a "working" state. However,
>> it'll end up in this strange state again at some point.
>>
>> Has anyone seen this happen before with NAND parts? Is there a way to
>> avoid this? I have tried issuing a RESET command to the part after a
>> failed erase, but all the pages in the block stay in this strange state.
>> The only thing that seems to recover the device is to power-cycle it.
>>
>> I suppose I could switch to a BBT that is written to the NAND device;
>> hopefully then the bad blocks would be tracked. That said, I've never
>> had an issue with marking blocks bad by writing the first two bytes of the
>> first two pages to 0x00 and having Linux build a BBT on the fly during
>> boot-up.
>>
>> -------------------------------------------------------------------------------
>>
>> Andrew McKay
>> Iders Inc.
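The failure mode described in this thread can be modelled with a toy NAND to show why a RAM-only bad block table rebuilt at boot loses the mark: the 0x00 marker has to be programmed into the very block that is refusing writes, and the stuck state clears on power-up. This is a hypothetical Python simulation, not the MTD API: `SimNand`, `mark_bad` and `scan_bbt` are illustrative names, the geometry is shrunk, the markers are assumed to live in the OOB area, and the scan uses a simplified rule (a block is bad if the first two OOB bytes of page 0 or page 1 are not both 0xFF).

```python
from dataclasses import dataclass, field
from typing import List, Set

# Shrunk, hypothetical geometry. The part in this thread has 4096-byte pages,
# a 128-byte OOB area and 64 pages per 262144-byte block.
PAGES_PER_BLOCK = 4
OOB_SIZE = 16


@dataclass
class SimBlock:
    # OOB contents that persist across power cycles (erased state is 0xFF).
    oob: List[bytearray] = field(
        default_factory=lambda: [bytearray(b"\xff" * OOB_SIZE) for _ in range(PAGES_PER_BLOCK)]
    )
    # Models the reported behaviour: after a failed erase the block reads as
    # all 0x00 and ignores programs until the device is power-cycled.
    stuck: bool = False


class SimNand:
    """Toy NAND model for illustration only; this is not the MTD API."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [SimBlock() for _ in range(num_blocks)]

    def read_oob(self, block: int, page: int) -> bytes:
        b = self.blocks[block]
        return bytes(OOB_SIZE) if b.stuck else bytes(b.oob[page])

    def write_oob(self, block: int, page: int, offset: int, data: bytes) -> bool:
        b = self.blocks[block]
        if b.stuck:
            return False  # the program operation does not take effect
        for i, value in enumerate(data):
            b.oob[page][offset + i] &= value  # programming can only clear bits
        return True

    def fail_erase(self, block: int) -> None:
        self.blocks[block].stuck = True

    def power_cycle(self) -> None:
        for b in self.blocks:
            b.stuck = False


def mark_bad(nand: SimNand, block: int) -> bool:
    """Write 0x00 to the first two OOB bytes of the first two pages."""
    results = [nand.write_oob(block, page, 0, b"\x00\x00") for page in (0, 1)]
    return all(results)


def scan_bbt(nand: SimNand) -> Set[int]:
    """Boot-time scan that rebuilds a RAM-only bad block table (simplified rule)."""
    bad: Set[int] = set()
    for blk in range(len(nand.blocks)):
        if any(nand.read_oob(blk, page)[:2] != b"\xff\xff" for page in (0, 1)):
            bad.add(blk)
    return bad


if __name__ == "__main__":
    nand = SimNand(num_blocks=8)
    nand.fail_erase(3)        # erase of block 3 fails and the block gets stuck
    print(mark_bad(nand, 3))  # False: the bad block marker never reaches flash
    print(scan_bbt(nand))     # {3}: while stuck, the OOB reads back as 0x00
    nand.power_cycle()        # a reboot clears the stuck state
    print(scan_bbt(nand))     # set(): the RAM BBT rebuilt at boot misses block 3
```

A bad block table kept in its own blocks, as suggested in the thread, would not need to program the misbehaving block to record it, although this sketch does not model such a table.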