>> > Can you reproduce the problem? Does the corruption hit the same
>> > file? Is it similar in other files? Do you know it's not a NAND
>> > or MTD problem -- i.e. a corrupted write or a bad device? Have
>> > you seen this problem on other instances of the h/w, etc.?
>>
>>That's the only device I've seen it with - out of 20-30 pieces having
>>had the same "treatment" :-)
>>And no, I haven't tried that device any more - I didn't want to ruin the
>>possibility of analysing what has happened...
>>
>>And I don't know if it's a NAND or MTD problem - I was hoping that someone
>>could guide me...
>>
>>Can this occur, say, with a bad NAND? Wouldn't YAFFS/MTD puke up a lot
>>of checksum errors?
>
>
> A few things that I can think of:
>
> 1) A gross NAND failure. YAFFS/mtd are not magic and need reasonably reliable
> media to do anything. ECC can fix single-bit errors, but nothing more. It
> can't fix gross NAND errors any more than ReiserFS can work with a disk with
> a 6 inch nail through it.
>
> 2) Iffy timing. Check your NAND access timing. Marginal timing has a habit of
> making some parts work OK and others not.
>
> 3) Check that the ECC code is actually working OK. A poor ECC implementation
> could cause more damage than it fixes.
>
> 4) Bad block handling. If a bad block is not being flagged correctly then you
> could end up retrying it on every mount. That would be a problem.

I haven't had time to dig further into this; we've been struggling with other critical issues, namely bad power-up and, most noticeable of all, memory failures! Some of our boards crash, and in "lightweight" cases the memory contents are just slightly modified. So for now I'm putting all my faith in this being the reason for the systematic bit-changing...

But I guess that for this to be The Plausible Real Explanation (TM), the bits would have had to be modified while the file was being written. However, the error only showed up after several reboots and additional writes to the NAND.
But perhaps the additional writing could have triggered new instructions/code from the altered file (libc.so)?! Does this sound likely?

BR,
Martin Egholm
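The quoted points that "ECC can fix single-bit errors, but nothing more" and "check that the ECC code is actually working OK" can be illustrated with a small self-test. What follows is a hypothetical sketch in C, assuming a simple SEC-DED scheme (XOR of the positions of all set bits plus one overall parity bit) over 256-byte chunks. It is not the algorithm used by MTD's software ECC, and `ecc_compute`, `ecc_correct` and `ecc_selftest` are made-up names. It shows a single flipped bit being repaired, two flipped bits being reported as uncorrectable, and three flipped bits being silently "corrected" into a worse state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration only: not the MTD software ECC algorithm. */
#define BLOCK_BYTES 256                 /* assumed ECC chunk size */
#define BLOCK_BITS  (BLOCK_BYTES * 8)   /* 2048 bits -> 11-bit positions */

typedef struct {
    uint16_t index_xor; /* XOR of the positions of all 1 bits */
    uint8_t  parity;    /* overall parity of the chunk */
} ecc_t;

enum ecc_result { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE };

/* Compute the ECC for one chunk, e.g. when the page is written. */
ecc_t ecc_compute(const uint8_t *buf)
{
    ecc_t e = {0, 0};
    for (unsigned i = 0; i < BLOCK_BITS; i++) {
        if (buf[i / 8] & (1u << (i % 8))) {
            e.index_xor ^= (uint16_t)i;
            e.parity ^= 1u;
        }
    }
    return e;
}

/* Check a chunk on read against the stored ECC and repair one flipped bit.
 * Assumes the stored ECC bytes themselves are intact. */
enum ecc_result ecc_correct(uint8_t *buf, ecc_t stored)
{
    ecc_t now = ecc_compute(buf);
    uint16_t syndrome = stored.index_xor ^ now.index_xor;
    unsigned parity_diff = stored.parity ^ now.parity;

    if (!parity_diff) /* even number of flips (or none) */
        return syndrome == 0 ? ECC_OK : ECC_UNCORRECTABLE;

    /* Odd number of flips: assume exactly one, at bit position `syndrome`.
     * With three or more flips this "corrects" the wrong bit. */
    assert(syndrome < BLOCK_BITS);
    buf[syndrome / 8] ^= (uint8_t)(1u << (syndrome % 8));
    return ECC_CORRECTED;
}

/* Returns 0 if single flips are repaired and double flips are reported. */
int ecc_selftest(void)
{
    uint8_t data[BLOCK_BYTES], orig[BLOCK_BYTES];
    for (unsigned i = 0; i < BLOCK_BYTES; i++)
        data[i] = (uint8_t)(i * 37u + 11u);
    memcpy(orig, data, sizeof data);
    ecc_t e = ecc_compute(data);

    data[100] ^= 0x10;                  /* single-bit flip: must be repaired */
    if (ecc_correct(data, e) != ECC_CORRECTED ||
        memcmp(data, orig, sizeof data) != 0)
        return 1;

    data[5] ^= 0x01;                    /* double-bit flip: must be reported */
    data[200] ^= 0x80;
    if (ecc_correct(data, e) != ECC_UNCORRECTABLE)
        return 2;
    return 0;
}
```

Calling `ecc_selftest()` from a test harness should return 0. The three-flip case is the point behind "a gross NAND failure": once more than one bit in a chunk is wrong, this kind of code either gives up or makes the data worse, so a real driver's ECC deserves a similar flip-and-verify test.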