Hello Charles,

> Hello Lukasz
>
> On Mon, May 8, 2017 at 8:11 PM, Lukasz Majewski wrote:
>
> > Hi Charles,
> >
> > Thanks for your reply.
> >
> > > On Fri, May 5, 2017 at 10:34 AM, Lukasz Majewski
> > > wrote:
> > >
> > > > Dear All,
> > > >
> > > > I'm working on an embedded system equipped with NAND Flash memory.
> > > >
> > > > The code is pretty old and corresponds to SHA1:
> > > > 60f5ecebdeee37d56f33374c407376f596baa468
> > > >
> > > > from: git://www.aleph1.co.uk/yaffs2
> > > >
> > > >
> > > > From my debugging I do see two bit flips (should be 1s, but I
> > > > read 0s) happening in the same chunk of data (0x100), from
> > > > which ECC is calculated.
> > > > As far as I know the ECC will be "correct" for two bit flips.
> > > >
> > > >
> > > > I've looked into the yaffs_ecc.* files and found the following
> > > > comment:
> > > >
> > > >
> > > > /*
> > > >  * This code implements the ECC algorithm used in SmartMedia.
> > > >  *
> > > >  * The ECC comprises 22 bits of parity information and is
> > > > stuffed into 3 bytes.
> > > >  * The two unused bit are set to 1.
> > > >  * The ECC can correct single bit errors in a 256-byte page of
> > > > data.
> > > >  * Thus, two such ECC blocks are used on a 512-byte NAND page.
> > > >  *
> > > >  */
> > > >
> > > > So it seems like two bit flips are not detected - only a single
> > > > bit flip is detected and corrected.
> > > >
> > > > Is there any way to mitigate this issue?
> > > >
> > >
> > > There are many ways to do ECC in the system. Yaffs provides an ECC
> > > function that is really intended for older flash devices
> > > (SmartMedia etc - really, really old) running in Yaffs1 mode
> > > where 1-bit correcting ECC was the norm. That Yaffs ECC code is
> > > also used for protecting very small blocks of unprotected data
> > > (eg. tags).
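As a rough check of the OOB sizes quoted in this thread, the arithmetic can be sketched in C. The helper names below are hypothetical and come from neither Yaffs nor the Linux MTD code. The Hamming figure follows the yaffs_ecc comment quoted above (22 parity bits stored in 3 bytes per 256-byte block). The BCH figure assumes a common sizing rule for binary BCH codes, namely t * m parity bits per block, with m the smallest field order for which 2^m exceeds the number of data bits in the block.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold `bits` bits, rounded up. */
static size_t bits_to_bytes(size_t bits)
{
    return (bits + 7) / 8;
}

/* Assumed sizing rule: smallest m such that 2^m > data_bits. */
static unsigned bch_field_order(size_t data_bits)
{
    unsigned m = 0;

    while (((size_t)1 << m) <= data_bits)
        m++;
    return m;
}

/* OOB bytes for the 1-bit Hamming scheme: 22 parity bits per 256-byte block. */
static size_t hamming_oob_bytes(size_t page_size)
{
    assert(page_size % 256 == 0);
    return (page_size / 256) * bits_to_bytes(22);
}

/* OOB bytes for a t-bit BCH scheme over blocks of block_size bytes. */
static size_t bch_oob_bytes(size_t page_size, size_t block_size, unsigned t)
{
    assert(block_size > 0 && page_size % block_size == 0);
    return (page_size / block_size) *
           bits_to_bytes((size_t)bch_field_order(8 * block_size) * t);
}
```

Under these assumptions, hamming_oob_bytes(2048) gives 24 and bch_oob_bytes(2048, 512, 4) gives 28, which matches the 24B and 28B figures discussed further down for a 2KiB page.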
> >
> > I've looked into the code deeper and:
> >
> > - I'm using yaffs2 with useNANDECC = 1, so I rely on the Linux MTD
> > subsystem (nand flash driver -> nand_ecc.c) for calculating ECC
> >
> > Unfortunately - the Linux version which I do use (2.6.27) only
> > supports NAND_ECC_SOFT, which corresponds to Hamming 1bit/256B
> > correction.
> >
> > I think that I will switch to BCH codes 4bit/512B.
> >
>
> That sounds like a good idea if you are getting bit flips.
>
>
> >
> > One question - will yaffs2 work with OOB's ECC extended from 24B to
> > 28B?
> >
>
> That should not be a problem, so long as you sort out the new oob byte
> placement in the driver.
>
> If you run out of space in oob to store tags then you can also
> switch to inband tags mode and the tags get stored in the data area.

I see. Thanks for pointing this out.

>
> >
> >
> > (I'm also wondering how I could set up a test environment to validate
> > the switch to BCH ECC scheme - use nandsim driver from kernel?)
> >
> > Or would you recommend something better?
> >
>
> NAND testing is very challenging because it is often hard to get
> devices to fail when you want them to. Running stress tests helps.

I've backported mtd_test Linux kernel modules for this. Let's see what
will happen.

>
> >
> > >
> > > In most circumstances (ie. running Yaffs2 mode), the actual ECC
> > > that is used is done in the driver and it is not part of Yaffs.
> > >
> > >
> >
> > Yes, you are right -> ECC is calculated in NAND driver. However,
> > with my kernel - both algorithms are the same.... (1bit/256B ECC)
> > => 24B stored in OOB.
> >
> > >
> > >
> > > >
> > > > IMHO the NAND flash page for this data chunk shall be marked as
> > > > BAD -> but we cannot detect such errors.
> > > >
> > > > Is there any plan to implement new algorithm?
> > > >
> > >
> > > When Yaffs2 was introduced (many years ago now), the biggest
> > > motivation was that there are different NAND types with different
> > > ECC requirements, different write order requirements etc.
> > >
> > > ie. the decision was made that it is impossible to try to handle all
> > > the ECC variants (hw/sw/BCH/...) within Yaffs and it was better
> > > to move that into the driver.
> > >
> > > If you can post a bit more info about the flash parts you are
> > > using etc, then I can give a better appraisal.
> >
> > I'm using Samsung's NAND Flash memory - 128M x 8B. It uses 2KiB
> > pages with 128KiB erase blocks. It doesn't support ECC implemented
> > in its internal controller (only EDC).
> >
> > I'm also wondering what would happen if:
> >
> > - I do store file in the YAFFS2 FS
> >
> > - This file is RO mostly (some library)
> >
> > - By torture testing it happens that:
> > -- I do have 1 bit flip -> no problem ECC will correct it
> >
> > -- I do have 2 bit flips in the 256 ECC "covered" data -
> > is this data regarded as valid or is eccResult =
> > YAFFS_ECC_RESULT_UNFIXED set?
> >
> > From my systems it seems like the Yaffs2 chunk is
> > treated as a correct one (data is read from this file, but checksums
> > differ from the factory file).
> >
> > What can be done in such situation? How to fix it?
> >
>
> Currently Yaffs will return data if an unfixed bitflip happens. The
> question is this: Is it better to return data with a bit flip or is it
> better to return no data if there is an uncorrectable bit flip?

In any case (when ECC is not capable of fixing things) we have
corrupted, unrecoverable data.

>
> I am currently working through some changes to allow the user to
> decide if they want data or would rather get -EIO.

Ok. I see.
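
The trade-off described above (hand back possibly corrupted data, or fail the read with -EIO) can be illustrated with a minimal C sketch. Every name in it is hypothetical: it does not use the Yaffs API and it is not the change Charles mentions, only an assumption about what a caller-selectable policy could look like.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of the driver's ECC check for one chunk. */
enum ecc_result { ECC_NO_ERROR, ECC_FIXED, ECC_UNFIXED };

/* Hypothetical user choice for chunks with an uncorrectable error. */
enum unfixed_policy { RETURN_DATA_ANYWAY, FAIL_WITH_EIO };

/*
 * Copy a chunk to the caller unless it has an uncorrectable error and
 * the policy asks for a hard failure. Returns the number of bytes
 * copied, or -EIO when the read is refused.
 */
static long deliver_chunk(void *dst, const void *chunk, size_t len,
                          enum ecc_result res, enum unfixed_policy policy)
{
    assert(dst != NULL && chunk != NULL);

    if (res == ECC_UNFIXED && policy == FAIL_WITH_EIO)
        return -EIO;

    memcpy(dst, chunk, len);
    return (long)len;
}
```

With FAIL_WITH_EIO the destination buffer is left untouched on an unfixed error, so a caller such as a library loader sees a failed read instead of silently different bytes. With RETURN_DATA_ANYWAY the behaviour matches what the thread describes as the current one.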
>
> -- Charles

Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de