Hi Ian,
Thanks for taking the time to discuss this - I appreciate it!
>> [yaffs] retires the block at erase time
>I don't know about the real marking of bad blocks. We have
>actually disabled this in some versions of products where we
>were bitten by transient write errors causing a large number of
>blocks to be persistently marked bad (OOB) and taken out of
>service.
>
... Meaning that, in your experience, it's OK to just defer to the "write fail" mechanism: if a write to page <n> fails this time, then after erasure either the write to page <n> will fail again, or, if it happens to succeed, the data in page <n> is reliably OK. Right? (asked with some trepidation)
>> Is this right? If so, it seems OK as long as bad pages within
>> an eraseblock do not imply unreliability of other pages
>> within the same eraseblock.
>
>The logic around declaring a block truly bad and broken is
>lacking (both Yaffs and MTD). IIRC, NAND vendors recommend that
>blocks should be erased when there are write/read errors, and
>only marked bad if the erase fails, and then perhaps only after
>several attempts. Neither Yaffs nor MTD do this.
>
[and from a later message]
>We'd need the NAND vendors to reveal that, but I think it
>reasonable to suspect that if a block is improperly erased that
>any data subsequently written to that block is liable to
>failure. But if an individual page is bad because of, say,
>power loss at the time of the write, that the other pages within
>that block would be solid. But this is JUST A GUESS.
OK, the whole concept is a bit scary. But I guess that, in a questionable eraseblock, an erase failure is more probable than the alternative scenario: a write failure on a member page before erasure, followed by a write that appears to succeed after erasure but leaves unreliable data.
If this is the case, then we're left with a discussion of how aggressive we should be about permanently retiring stuff, which is really just a discussion about how quickly the flash "wears out". That's not a big issue - but the possibility of writing data which later proves to be unreliable is.
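To make the retirement question concrete, here is a rough C sketch of the "erase on error, retire only after repeated erase failures" rule that Ian recalls the NAND vendors recommending. It is only an illustration under stated assumptions: handle_io_error, erase_fn, fake_erase and the MAX_ERASE_RETRIES value of 3 are made-up names and numbers, not Yaffs or MTD code, and it assumes any live data has already been copied off the block.

```c
#include <stdbool.h>

/* Assumed retry threshold; not a vendor-specified figure. */
#define MAX_ERASE_RETRIES 3

enum block_state { BLOCK_GOOD, BLOCK_RETIRED };

/* Hypothetical driver hook: returns true if the erase succeeded. */
typedef bool (*erase_fn)(unsigned block, void *ctx);

struct block_info {
    enum block_state state;
    unsigned erase_failures;   /* failed erase attempts recorded so far */
};

/*
 * Called after a read or write error on `block`.
 * The block is erased rather than marked bad straight away; it is
 * retired only if every one of MAX_ERASE_RETRIES erase attempts fails.
 */
enum block_state handle_io_error(struct block_info *bi, unsigned block,
                                 erase_fn erase, void *ctx)
{
    for (unsigned attempt = 0; attempt < MAX_ERASE_RETRIES; attempt++) {
        if (erase(block, ctx)) {
            bi->state = BLOCK_GOOD;    /* transient error: keep the block */
            return bi->state;
        }
        bi->erase_failures++;
    }
    bi->state = BLOCK_RETIRED;         /* persistent erase failure */
    return bi->state;
}

/* Simulated flash for experimenting: fails the first N erases, then succeeds. */
struct fake_flash {
    unsigned fails_remaining;
    unsigned erase_calls;
};

bool fake_erase(unsigned block, void *ctx)
{
    struct fake_flash *f = ctx;
    (void)block;                       /* block number unused in the simulation */
    f->erase_calls++;
    if (f->fails_remaining > 0) {
        f->fails_remaining--;
        return false;
    }
    return true;
}
```

With fake_erase set to fail once, the block stays in service after the second erase attempt; set to fail every time, it is retired after three attempts. Note that this only addresses how aggressively to retire blocks. It does nothing about the scarier case of a write that reports success but leaves unreliable data.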
-Scott