(This is a resend of a message I sent last week. I wasn't subscribed to the list at the time, so it ended up in list moderator approval land. Apologies in advance if the original message does eventually show up as a duplicate, but I suspect it is buried in a bucket of endless spam and will never be heard from again...)

As part of testing yaffs2 with mtd nandsim with various error simulation options turned on, we discovered some issues with the error handling of yaffs. I've posted some simple patches to correct those issues, and yaffs now appears to be correctly doing ecc and scrubbing blocks when corrected errors are detected.

However, I noticed that the actual scrubbing (i.e. prioritized garbage collection) is usually deferred until the next write operation. Given that our flash usage patterns vary considerably and may be free of writes for very long periods of time, I thought it would be wise in our case to trigger garbage collection after reads as well. That appears to work fine; I'm sure it degrades performance to some degree, but it seems acceptable. Perhaps triggering the gc conditionally, based on the prioritized flag, would be better, but anyway.

When I ran this test on nandsim with bitflips=1 (which ensures a constant stream of single bit errors, basically insane pathological conditions), the expected behavior resulted: blocks were being rewritten and moved around like crazy just by reading a file. However, during an extended run of this process, memory usage grew steadily until the oom killer eventually started going ballistic on everything in sight, and the system ground to a total halt. I'm not sure if the problem is actually a memory leak of some kind in yaffs_CheckGarbageCollection or if it's an artifact of the different context in which I'm having it called (from yaffs_ReadChunkDataFromObject), but I thought I'd mention it anyway for the record.

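A minimal sketch of the conditional trigger mentioned above: run the collector after a read only when a corrected error has flagged a block for prioritized collection. This is not yaffs code. struct sim_device, gc_pending, sim_check_gc and sim_read_chunk are all hypothetical stand-ins for illustration, and the real read path and collector will look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ECC outcomes for one chunk read (not the yaffs enum). */
enum sim_ecc_result { SIM_ECC_OK, SIM_ECC_FIXED, SIM_ECC_UNFIXED };

/* Hypothetical per-device state; not the real yaffs device structure. */
struct sim_device {
    bool gc_pending; /* a block has been flagged for prioritized collection */
    int gc_runs;     /* how many times the collector has been invoked */
};

/* Stand-in for the collector entry point: count the run, clear the flag. */
void sim_check_gc(struct sim_device *dev)
{
    assert(dev != NULL);
    dev->gc_runs++;
    dev->gc_pending = false;
}

/*
 * Stand-in for the read path. A corrected error flags the device, and the
 * collector is invoked only while the flag is set, so clean reads do not
 * pay for a collection pass. Returns true if a collection pass was run.
 */
bool sim_read_chunk(struct sim_device *dev, enum sim_ecc_result ecc)
{
    assert(dev != NULL);
    if (ecc == SIM_ECC_FIXED)
        dev->gc_pending = true;
    if (!dev->gc_pending)
        return false;
    sim_check_gc(dev);
    return true;
}
```

With this shape, a read that needed no correction costs one flag test, while a read that reports a corrected error gets the block scrubbed without waiting for the next write.
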
Also, another observation (I think this was noted recently on the list already) is that an MTD -EBADMSG result (or YAFFS_ECC_RESULT_UNFIXED) doesn't appear to translate into an error condition at the userspace level. From what I can tell, bad data is returned to userspace with no indication of its badness. Obviously we would all prefer that bad data never happen at all, but pretending that bad data is good seems perhaps a little too zealous. :-) In practice, most of the time if we have bad data on our flash it's disastrous anyway, and it doesn't really matter much in the end if it's returned as an EIO error or as bad data to userspace. But for some bits, like configuration data, we could take reasonable steps (e.g. restoring defaults) if we can detect bad data, whereas the results of processing bad data are undefined.

One final point, related to the last one: as far as I can tell, yaffs will in most places process tag data from blocks where the tag ecc has failed, and this appears to sometimes lead to system hangs. I think it would be desirable to avoid handling the tag data entirely in this case, since we know it to be corrupt in some way. I'm not sure exactly what you can do with a chunk in such a case; presumably some kind of recovery would be required, given that some chunk/object ID is basically going away but there is no reliable way to know which ID it is by inspecting the (invalid in some way) metadata itself. That sounds like it might be a significant project, though as my understanding of yaffs internals is quite limited I don't really know for sure.

Thanks,
-Yeasah Pell