[Yaffs] bit error rates

Jesse Off joff at embeddedARM.com
Fri Feb 10 16:56:16 GMT 2006


>>
>>I think that's a bad idea. The block should be marked as bad. It's not
>>worth losing data just to avoid theoretically marking a good
>>block bad - it doesn't seem to happen in practice. I'd rather lose all
>>of the good blocks than lose any data, and so would many other people.
>>
> But if you lose ALL good blocks, you also lose your data!  ;-)

I tend to agree with you.  Early retirement of blocks with correctable ECC 
errors doesn't seem like such a clearly good idea to me either.

Even if new flash chips don't exhibit this failure mode often in 
"accelerated lifetime" testing, that's not to say another generation of 
future NAND flash chips won't (or old ones resurrected on future surplus 
markets).  Since this behavior is already documented by Toshiba, you can be 
certain that if a future NAND flash manufacturing process has a side effect 
of this happening more often, the manufacturers will use it in a heartbeat 
to keep their yields high.  Also, one version of "accelerated lifetime" 
testing may not quite match what actually happens to real devices over 
several years of deployment in the field.  The environments people place 
their embedded designs in can be surprisingly hostile.

>
>>
>>>That's because the Toshiba document says about soft errors: "This condition
>>>is cleared by a block erase".
>>>
>>
>>Sure. But it might be indicative of a problem nonetheless.
>>
> Is this statement based on any documentation or on your personal
> experience?  The Toshiba document also says "Although random bit errors
> may occur during use, this does not necessarily mean that a block is bad."
> However, if this is not true, can you point me to other documentation
> about the relationship between random bit errors and permanent block
> failures?
>
> Sorry, I don't want to start a flame war; I just want to understand the
> YAFFS bad block marking policy, and whether there is a better solution.
> Excuse my English, it's not my native language.

I would at least hope for a knob so that one could turn off this 
"early retirement" of good blocks.

Even if it's rare, the probability of it happening tracks with the amount 
of reading (not writing) you do on the flash, and I could have an 
application that accumulates a large number of bad blocks this way by 
reading flash at full speed for many years straight (a not-so-rare embedded 
design).  I would hope not to have to worry about the flash wearing out 
under a 100% read load.  It's also harder to do accelerated lifetime 
analysis on that case, since such an application is already reading at 
close to maximum speed.
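
As a back-of-envelope sketch of that point (every number below is invented 
for illustration, not measured from any real part): if each read-disturb 
error retires a block, the expected number of blocks lost grows linearly 
with total reads, independent of writes.

#include <stdio.h>

int main(void)
{
    /* Hypothetical figures, chosen only to show the shape of the math. */
    double reads_per_second = 1000.0;  /* sustained read rate                 */
    double years_in_service = 5.0;     /* deployment lifetime                 */
    double p_soft_error     = 1e-9;    /* per-read soft-error probability     */

    double total_reads     = reads_per_second * 60 * 60 * 24 * 365
                             * years_in_service;
    double expected_errors = total_reads * p_soft_error;

    printf("total page reads over lifetime:           %.3e\n", total_reads);
    printf("expected soft errors (blocks retired if\n"
           "each one retires a block):                %.1f\n", expected_errors);
    return 0;
}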

This issue particularly resonates with me since we've seen a small number of 
our boards running YAFFS and NAND flash returned to us with large numbers 
of blocks marked bad that weren't actually bad.  I'm not sure whether the 
YAFFS policy of retiring blocks early was to blame, but it's the most likely 
explanation so far.  Needless to say, our customers weren't impressed, since 
they believed their application did very little flash writing (but a lot of 
flash reading).

//Jesse Off





