On Thursday 16 February 2006 02:25, Jon Masters wrote:
> On 2/10/06, Charles Manning wrote:
> > I think an interrupted erase is probably more likely to cause
> > problems, but again this is just a hunch.
>
> I wonder how we could implement logic to detect this.
>
> > Dealing with an interrupted write is relatively straightforward. It
> > will always be the last page written before the system went
> > down. Most of the time (except for the last page written to a
> > block), we can detect the last page because it is the last page
> > in the currently allocated block.
>
> I don't think this is currently tested on mount though.

That is correct, it is not being done at present. I was thinking about
how it might be done.

> > It would be nice to improve this, but as Jon says, I think data
> > integrity should always come first!
>
> Other people seem to disagree with my previous suggestions and I'm not
> saying I can't be wrong in the matter :-) But I've not seen excessive
> numbers of blocks being marked bad (except when fixing the OOB
> code...) with read ECC failures. I accept though that this might just
> be good old fashioned paranoia, so if one of the vendor folks on this
> list can comment, it would really help.

Some people have reported seeing a large number of blocks (~30-50%)
being retired on some devices. That's obviously not a GoodThing, but
I'd like to see what % of units failed.

Then, how does one measure and evaluate this? To my mind, if you ship
1000 units and half of them lose 30-50% of their blocks in a year of
normal use, that's probably a BadThing. If this only happens on 1% of
shipped units, it might be an OKThing (depending on your perspective).

However, losing data is also a BadThing. It's one of those
rock-and-a-hard-place choices.

Any mods will be configurable to allow the current semantics.

-- Charles