On Wednesday 08 February 2006 05:58, Richard A. Smith wrote:
> Charles Manning wrote:
> > The best guide I have read is the Toshiba NAND flash applications design
> > guide, available at various locations including
> > http://www.edn.com/contents/images/ToshibaNANDFlash1.pdf
>
> Thank you! That was _exactly_ what I needed. It answered all my
> questions and even some I had not thought of yet.
>
> > I don't believe that there is any "read disturb". Once written, AFAIK
> > only other writes are likely to mess things up.
>
> Nope. See page 22 of the doc you pointed me to.
>
> Read Disturb - In this failure mode, a read operation can disturb the
> memory contents causing a “1” to change to a “0.” The bit error occurs
> on another page in the block, not the page being read.
>
> It's non-permanent, though: an erase will fix it, and it's _really_
> unlikely.
>
> The ROM section of the document discussed that in their testing it was
> 3ppm over 10 years. So 3 blocks out of every million blocks will have a
> 1 bit error in 10 years.
>
> As you said the program-disturb is more common. Although still pretty
> rare. 1E-10 or 1 bit per 10 billion

Thanx for the correction.

All of these failures get handled by ECC, but the ECC can only correct
1 bit per 256 bytes.

NAND is getting more and more reliable, IMHO. Most devices will work fine
(except for the factory-marked bad blocks). Some will lose some blocks in
the first pass or two (marginal blocks missed by the factory check,
perhaps), then settle down to a long and useful service. Some might take
longer to settle down or might lose blocks slowly for a long time.

In the 100Gbyte test I mentioned, I saw no blocks go bad. This unit had
been used for a couple of weeks doing thrash tests before this, and I
expect all bad blocks had already been marked.

YAFFS takes a fairly cautious approach to handling bad blocks. If a block
fails an ECC test then it's retired at the next garbage collection of
that block (i.e. we suck out the data first).
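To make the "retire at the next garbage collection" idea concrete, here is a minimal sketch. It is not the actual YAFFS code: the structure and the function names (`block_info`, `note_ecc_error`, `gc_block`) are hypothetical, the block is shrunk to four chunks, and the actual copying of live data is only indicated by a comment.

```c
#include <stdbool.h>

#define CHUNKS_PER_BLOCK 4  /* tiny value, for illustration only */

/* Hypothetical per-block bookkeeping; not the real YAFFS structures. */
struct block_info {
    bool needs_retiring;               /* a read from this block failed ECC */
    bool retired;                      /* block must never be reused */
    bool chunk_live[CHUNKS_PER_BLOCK]; /* chunks still holding valid data */
};

/* Record an ECC failure. The block is NOT retired yet: it still holds
 * live data that has to be copied out first. */
static void note_ecc_error(struct block_info *b)
{
    b->needs_retiring = true;
}

/* Garbage-collect one block: move the live chunks elsewhere, then either
 * retire the block (if it was flagged) or leave it available for reuse.
 * Returns the number of live chunks that were moved. */
static int gc_block(struct block_info *b)
{
    int moved = 0;

    for (int i = 0; i < CHUNKS_PER_BLOCK; i++) {
        if (b->chunk_live[i]) {
            /* A real file system would rewrite the chunk to another
             * block here before dropping this copy. */
            b->chunk_live[i] = false;
            moved++;
        }
    }

    if (b->needs_retiring)
        b->retired = true;  /* mark bad instead of erasing for reuse */

    return moved;
}
```

The point of the two-step design is that flagging and retiring are separate events: the data is rescued while the block is still readable, and only then is the block taken out of service.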
According to folks I have discussed this with at Toshiba, a block is
likely to display 1-bit (recoverable) ECC errors long before going truly
bad. Therefore I think the above strategy should work pretty well.

> > YAFFS2 does no rewrites (ie only one write per page and no deletion
> > markers).
>
> Is YAFFS2 ready for production? I've been looking through the code and
> I see a lot of FIXMEs and TODOs.

YAFFS2 has been used in non-Linux systems for at least a year and has
been evolving. Some of those evolutions broke a few things, including the
deleted hardlink handling (fixed in December 2005). There was a problem
where corrupted tags (due to a bad mtd hook-up) could cause a crash. That
was addressed a few days ago and would only have been seen by people with
mtd problems.

At the moment the main trauma is hooking up to the mtd. Once this is
done, YAFFS performs well, IMHO. The yaffs_mtdif2.c code does not work at
present with a stock kernel and various efforts are underway to fix this.
Sergey posted a patch which almost works (it did not fix the problem on
the board I'm playing with).

I have a unit running YAFFS2 on Linux on my desk right now. It is running
fine (read/write/delete/garbage collection/...). It currently has the
busybox interaction problem (see
http://stoneboat.aleph1.co.uk/pipermail/yaffs/2005q4/001645.html). I have
not tried this, and will try to implement a different fix today. I don't
consider this a "serious flaw" like data loss or crashing, but it is one
that should be fixed.

Basically, my take on this is that since the beginning of the year there
have been no serious YAFFS2 problems apart from hooking up to the mtd.
YAFFS Direct users don't have this problem.

> The no deletion markers is a bit confusing. I've not yet grokked how
> YAFFS2 does this. Care to enlighten me?

Most of the method is sketched (briefly) in
http://www.aleph1.co.uk/yaffs/yaffs2.html

YAFFS2 uses the sequence number to determine the flow of time and what
has happened to the fs.
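As a rough illustration of how a sequence number can stand in for a deletion marker, here is a minimal sketch. It is not the actual YAFFS2 code: the `chunk_tags` layout and `find_current` are hypothetical, and it assumes that every copy of a chunk carries a distinct sequence number, that a higher number means a later write, and that the counter never wraps.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tags stored alongside each chunk. */
struct chunk_tags {
    uint32_t object_id;   /* which file the chunk belongs to */
    uint32_t chunk_id;    /* position of the chunk within that file */
    uint32_t seq_number;  /* assumed: higher means written later */
};

/* Among all scanned chunks, return the current copy of
 * (object_id, chunk_id), or NULL if no copy exists. Older copies are
 * simply ignored; nothing on flash has to be rewritten to mark them
 * as stale. */
static const struct chunk_tags *
find_current(const struct chunk_tags *tags, size_t n,
             uint32_t object_id, uint32_t chunk_id)
{
    const struct chunk_tags *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tags[i].object_id != object_id || tags[i].chunk_id != chunk_id)
            continue;
        if (best == NULL || tags[i].seq_number > best->seq_number)
            best = &tags[i];
    }
    return best;
}
```

For example, if chunk 0 of object 1 has been written twice, with sequence numbers 5 and 9, the scan picks the copy tagged 9 and treats the other as discarded.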
In very brief:
* When we write a new chunk the previous one is discarded. We can look at
  the sequence numbers to determine which one is valid and which is not.
* Operations such as resizing a file are handled by writing a new file
  header stating the limits of the file.
* We handle file deletion by moving the file to a fake "deleted
  directory".

> > NAND flash seems to be getting more reliable all the time. I did some
> > accelerated lifetime testing where I wrote and verified over 100Gbytes
> > of data without a single bit being damaged.
>
> Good news. I'll bet the 1e-10 error rate is at the max rated operating
> temp of the part. So in the normal temp range the error rate is
> probably far lower.

The other factor is that the error rates are also determined using a
maximum number of writes per page (too many writes trigger write
disturbs). Since YAFFS2 only ever does one write per page, it is less
likely to cause this.

> > YAFFS direct is vanilla C and should compile fine for just about
> > anything.
>
> Excellent. Has anyone actually done it though?
> Hopefully, with no weird niosII compiler bugs or linker problems.

Most definitely. YAFFS Direct is being used by a few people in a wide
range of applications. I also use YAFFS Direct as the primary test bed
for any yaffs_guts work.

-- Charles