On Thursday 20 January 2005 23:02, Jacob Dall wrote:
> Hello yaffers,
>
> I've a few questions regarding why yaffs' bad block management is designed
> the way it is.
>
> According to Toshiba, NAND failures can be distinguished as "permanent
> failures" or "soft errors"
>
> 1) Permanent failures: this error occurs when programming or erasing, and
> can be detected by reading the status register after operation.
>
> 2) Soft errors: this error occurs during a program, but can only be
> detected by reads. The error is cleared by a block erase.
>
> Now, upon read, if yaffs detects an unfixable ECC error in a page, the
> block holding that page is marked as bad. According to 2) it would be ok to
> just mark the page as discarded and let the garbage collector do its job -
> or have I missed something?
This mechanism was designed before Toshiba shared their wonderful document
with the world. I have considered changing this, but it has never been a very
high priority, and relaxing the policy would put data at risk.
The "soft errors" are typically write disturb failures that can (hopefully)
be fixed by ECC. My concern is that if a block displays write disturb
problems then perhaps it is "going bad". ECC can only fix single-bit errors.
I don't want to wait until it has "gone bad" and lost data before I retire
it. I'd prefer to retire dodgy looking blocks earlier.
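To make the trade-off concrete, here is a minimal sketch of the two policies being compared: retiring the block on an unfixable ECC error (what is described above) versus only discarding the page and leaving the erase to the garbage collector (Jacob's suggestion). All names here (ecc_result, block_action, choose_action) are made up for illustration and are not the real YAFFS code. Keeping the block after a corrected single-bit error is also an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical types for illustration only; not taken from YAFFS. */
enum ecc_result {
    ECC_OK,         /* page read back clean */
    ECC_FIXED,      /* single-bit error, corrected by ECC */
    ECC_UNFIXABLE   /* more than ECC can correct; page data is lost */
};

enum block_action {
    ACTION_KEEP,          /* carry on using the block */
    ACTION_DISCARD_PAGE,  /* drop the page, let garbage collection erase the block */
    ACTION_RETIRE_BLOCK   /* mark the block bad and never use it again */
};

/*
 * conservative == true  : retire a block as soon as a page is unfixable.
 * conservative == false : treat it as a soft error that an erase will clear.
 */
enum block_action choose_action(enum ecc_result result, bool conservative)
{
    switch (result) {
    case ECC_OK:
    case ECC_FIXED:
        /* Assumption of this sketch: recovered data does not retire a block. */
        return ACTION_KEEP;
    case ECC_UNFIXABLE:
        return conservative ? ACTION_RETIRE_BLOCK : ACTION_DISCARD_PAGE;
    }
    return ACTION_KEEP;
}
```

The conservative branch gives up a block that might have recovered after an erase. The lenient branch keeps using a block that has already lost data once.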
>
> In yaffs, a block is marked bad by writing 0 to byte 517 in page 0 / 1 in
> the block. Why wasn't it decided to use another value (for instance, like
> SmartMedia's 0xF0)? Then it would have been possible to distinguish initial
> bad blocks from operational bad blocks.
This was considered. However, I decided to use 0x00 because it has the best
chance of programming successfully in a block where the bits don't "stick"
well. A sparse bit pattern is less likely to program than all 0s.
This could be changed quite easily.
Generally, factory-marked bad blocks are not marked with this byte alone.
Mostly the whole OOB area, or even the whole block, is set to zero. This
generally makes it easy enough to distinguish factory-marked from YAFFS-marked
bad blocks.
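As a rough illustration of that heuristic, the sketch below classifies a block from the spare (OOB) bytes of its first page. It assumes a 512-byte page with a 16-byte spare area, so "byte 517" is offset 5 in the spare. The names (classify_spare, the KIND_* values) are invented for this example, and the rule "all spare bytes zero means factory-marked" is only the rule of thumb described above, not something every part guarantees.

```c
#include <stddef.h>
#include <stdint.h>

#define SPARE_SIZE    16  /* assumed: 16 spare bytes per 512-byte page */
#define STATUS_OFFSET 5   /* page byte 517 = 512 + 5 */

/* Hypothetical classification for illustration only. */
enum bad_block_kind {
    KIND_GOOD,         /* status byte still erased (0xFF) */
    KIND_FACTORY_BAD,  /* whole spare area zeroed */
    KIND_YAFFS_BAD,    /* only the status byte zeroed */
    KIND_UNKNOWN       /* some other value; needs a closer look */
};

enum bad_block_kind classify_spare(const uint8_t spare[SPARE_SIZE])
{
    int all_zero = 1;
    size_t i;

    /* Factory marking usually wipes far more than one byte. */
    for (i = 0; i < SPARE_SIZE; i++) {
        if (spare[i] != 0x00) {
            all_zero = 0;
            break;
        }
    }
    if (all_zero)
        return KIND_FACTORY_BAD;

    if (spare[STATUS_OFFSET] == 0xFF)
        return KIND_GOOD;
    if (spare[STATUS_OFFSET] == 0x00)
        return KIND_YAFFS_BAD;
    return KIND_UNKNOWN;
}
```

A block that went bad in the field and also happens to have an all-zero spare area would be misclassified as factory-marked, so treat the result as a hint rather than proof.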
>
> I've an issue with some of my devices - the bad block count is increasing
> very rapidly. Beyond the fact that it's due to ECC read errors, I've yet to
> discover the root of the problem.
I've done extensive lifetime testing on some devices. In one test I wrote
approx 130GB of data, then read and verified it without a single ECC failure
or munged bit.
Some other people doing lifetime testing have expressed concern because they
lose 1-2% of flash during the lifetime of a device.
What do you mean by rapidly? I assume it is far worse than either of these!
If you're using Linux, then the most likely cause of the problem is a
mismatch between the ECC strategy you're using in YAFFS and what you have
configured in mtd.
>
> I'm not blaming yaffs - I'm sure the problem is to be found elsewhere, but
> I'm thinking really hard of making those changes to yaffs, making me able
> to get back to the state when the NAND was first taken into use.
>
> Please let me know your reasons / thoughts...
Being able to change the bad block marker would help you with bench testing
until you have fixed the real problem.
There are two things you could try:
1) In yaffs_RetireBlock, change the blockstatus to some easy-to-detect value
that has at least two zero bits (e.g. 0xFC).
2) Or even turn off the writing of bad block markers completely. This would
cause problems in the file system state, but that probably does not matter
for you at the moment.
Of course I'm assuming you just want to do these changes while you find and
fix the real problem. I would not suggest shipping product with either of
these changes.
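For option 1, a small helper like the following can sanity-check a candidate bench-test marker before you patch it in. The function names are hypothetical and this is not part of YAFFS. It only encodes the rule of thumb above (at least two zero bits) and also rejects 0x00, so the test marker stays distinguishable from the normal one.

```c
#include <stdint.h>

/* Count how many of the 8 bits in value are zero. */
static int count_zero_bits(uint8_t value)
{
    int zeros = 0;
    int bit;

    for (bit = 0; bit < 8; bit++) {
        if (((value >> bit) & 1u) == 0)
            zeros++;
    }
    return zeros;
}

/*
 * Hypothetical check for a bench-test marker value:
 *  - at least two zero bits, as suggested above
 *  - not 0x00, so it cannot be confused with the normal marker
 * Returns 1 if the value is usable, 0 otherwise.
 */
int is_usable_bench_marker(uint8_t value)
{
    return value != 0x00 && count_zero_bits(value) >= 2;
}
```

With this check 0xFC and 0xF0 pass, while 0xFE (one zero bit), 0xFF (erased) and 0x00 (normal marker) are rejected.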
>
>
> Thanks and regards,
> Jacob Dall
>
> FYI: the 'According to Toshiba' stuff was taken from a document named 'NAND
> Flash Application Design Guide'
Great doc. Should be required reading for anyone working with NAND.