On Thursday 16 February 2006 02:25, Jon Masters wrote:
> On 2/10/06, Charles Manning wrote:
> > I think an interrupted erase is probably more likely to cause
> > problems, but again this is just a hunch.
>
> I wonder how we could implement logic to detect this.
>
> > Dealing with an interrupted write is relatively straightforward. It
> > will always be the last page written before the system went
> > down. Most of the time (except for the last page written to a
> > block), we can detect the last page because it is the last page
> > in the currently allocated block.
>
> I don't think this is currently tested on mount though.

That is correct, it is not being done at present. I was thinking about
how it might be done.

> > It would be nice to improve this, but as Jon says, I think data
> > integrity should always come first!
>
> Other people seem to disagree with my previous suggestions and I'm not
> saying I can't be wrong in the matter :-) But I've not seen excessive
> numbers of blocks being marked bad (except when fixing the OOB
> code...) with read ECC failures. I accept though that this might just
> be good old fashioned paranoia, so if one of the vendor folks on this
> list can comment, it would really help.

Some people have reported seeing a large number of blocks (~30-50%)
being retired on some devices. That's obviously not a GoodThing, but
I'd like to see what % of units failed.

Then, how does one measure and evaluate this? To my mind, if you ship
1000 units and half of them lose 30-50% of their blocks in a year of
normal use, that's probably a BadThing. If this only happens on 1% of
shipped units, it might be an OKThing (depending on your perspective).

However, losing data is also a BadThing. It's one of those
rock-and-a-hard-place choices.

Any mods will be configurable to allow the current semantics.

-- Charles