Hello Lukasz
On Mon, May 8, 2017 at 8:11 PM, Lukasz Majewski <lukma@denx.de> wrote:
> Hi Charles,
>
> Thanks for your reply.
>
> > On Fri, May 5, 2017 at 10:34 AM, Lukasz Majewski <lukma@denx.de>
> > wrote:
> >
> > > Dear All,
> > >
> > > I'm working on an embedded system equipped with NAND Flash memory.
> > >
> > > The code is pretty old and corresponds to SHA1:
> > > 60f5ecebdeee37d56f33374c407376f596baa468
> > >
> > > from: git://www.aleph1.co.uk/yaffs2
> > >
> > >
> > > From my debugging I do see two bit flips (should be 1s, but I read
> > > 0s) happening in the same chunk of data (0x100), from which ECC is
> > > calculated.
> > > As far as I know the ECC will be "correct" for two bit flips.
> > >
> > >
> > > I've looked at the yaffs_ecc.* files and found the following comment:
> > >
> > >
> > > /*
> > > * This code implements the ECC algorithm used in SmartMedia.
> > > *
> > > * The ECC comprises 22 bits of parity information and is stuffed into 3 bytes.
> > > * The two unused bit are set to 1.
> > > * The ECC can correct single bit errors in a 256-byte page of data.
> > > * Thus, two such ECC blocks are used on a 512-byte NAND page.
> > > *
> > > */
> > >
> > > So it seems like two bit flips are not detected - only a single bit
> > > flip is detected and corrected.
> > >
> > > Is there any way to mitigate this issue?
> > >
> >
> > There are many ways to do ECC in the system. Yaffs provides an ECC
> > function that is really intended for older flash devices (SmartMedia
> > etc - really, really old) running in Yaffs1 mode where 1-bit
> > correcting ECC was the norm. That Yaffs ECC code is also used for
> > protecting very small blocks of unprotected data (eg. tags).
>
> I've looked into the code deeper and:
>
> - I'm using yaffs2 with useNANDECC = 1, so I rely on the Linux MTD
> subsystem (NAND flash driver -> nand_ecc.c) to calculate the ECC
>
> Unfortunately - the Linux version which I do use (2.6.27) only supports
> NAND_ECC_SOFT, which corresponds to Hamming 1bit/256B correction.
>
> I think that I will switch to BCH codes 4bit/512B.
>
That sounds like a good idea if you are getting bit flips.
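
To illustrate why a 1-bit/256B Hamming code cannot repair two flips, here is
a minimal sketch of a SmartMedia-style 22-bit parity scheme. It is not the
code from yaffs_ecc.c or the kernel's nand_ecc.c: the bit layout, the function
names ecc_calc/ecc_correct and the return codes are my own assumptions, chosen
for readability rather than speed.

```c
#include <stdint.h>

/* Pair mask: one bit per address bit (11 pairs for 2048 data bits). */
#define PAIR_MASK 0x155555u

/*
 * Compute 22 parity bits over 256 bytes. For each of the 11 bit-address
 * bits i, bit 2*i+1 holds the parity of data bits whose address has bit i
 * set, and bit 2*i the parity of those whose address has bit i clear.
 */
static uint32_t ecc_calc(const uint8_t data[256])
{
    uint32_t ecc = 0;
    for (unsigned addr = 0; addr < 2048; addr++) {
        if (data[addr >> 3] & (1u << (addr & 7))) {
            for (unsigned i = 0; i < 11; i++)
                ecc ^= 1u << (2 * i + ((addr >> i) & 1u));
        }
    }
    return ecc;
}

/*
 * Returns 0: no error, 1: one data bit corrected in place,
 *         2: one bit of the stored ECC was wrong (data untouched),
 *        -1: uncorrectable (for example two data bit flips).
 */
static int ecc_correct(uint8_t data[256], uint32_t stored_ecc)
{
    uint32_t syn = stored_ecc ^ ecc_calc(data);

    if (syn == 0)
        return 0;

    /* A single data flip sets exactly one bit in every parity pair. */
    if ((((syn >> 1) ^ syn) & PAIR_MASK) == PAIR_MASK) {
        unsigned addr = 0;
        for (unsigned i = 0; i < 11; i++)
            if (syn & (1u << (2 * i + 1)))
                addr |= 1u << i;
        data[addr >> 3] ^= (uint8_t)(1u << (addr & 7));
        return 1;
    }

    /* Exactly one syndrome bit set: the stored ECC itself took the hit. */
    if ((syn & (syn - 1)) == 0)
        return 2;

    return -1;
}
```

With two data flips in the same 256 bytes, every parity pair in the syndrome
ends up as 00 or 11, so this sketch returns -1: the damage is reported but the
two positions cannot be recovered. Whether your driver reports that case the
same way is worth checking on your kernel.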
>
> One question - will yaffs2 work with OOB's ECC extended from 24B to 28B?
>
That should not be a problem, so long as you sort out the new OOB byte
placement in the driver.
If you run out of space in the OOB to store tags, you can also switch to
inband tags mode, in which the tags are stored in the data area.
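
As a rough check of the 24B and 28B figures, the arithmetic can be sketched
as below. The helper names are hypothetical, and the BCH size uses the
standard bound of at most m*t parity bits, where m is the smallest field
degree whose codeword length 2^m - 1 fits the data plus parity. How much OOB
is left for tags depends on your part's spare size and bad block marker
placement, which I have not assumed here.

```c
/* Smallest m such that 2^m - 1 >= data bits + m*t parity bits. */
static unsigned bch_m(unsigned data_bytes, unsigned t)
{
    for (unsigned m = 2; m < 32; m++) {
        unsigned long codeword_max = (1ul << m) - 1;
        if (codeword_max >= data_bytes * 8ul + (unsigned long)m * t)
            return m;
    }
    return 0; /* no suitable field degree found */
}

/* ECC bytes per step for a t-bit correcting BCH code: ceil(m * t / 8). */
static unsigned bch_ecc_bytes(unsigned data_bytes, unsigned t)
{
    unsigned m = bch_m(data_bytes, t);
    return (m * t + 7) / 8;
}

/* Total ECC bytes in the OOB for one page. */
static unsigned oob_ecc_bytes(unsigned page_bytes, unsigned step_bytes,
                              unsigned ecc_bytes_per_step)
{
    return (page_bytes / step_bytes) * ecc_bytes_per_step;
}
```

For a 2KiB page, oob_ecc_bytes(2048, 256, 3) gives the 24 bytes used by the
1bit/256B Hamming layout, and oob_ecc_bytes(2048, 512, bch_ecc_bytes(512, 4))
gives 28 bytes for BCH 4bit/512B.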
>
> (I'm also wondering how I could setup a test environment to validate
> switch to BCH ECC scheme - use nandsim driver from kernel?)
>
> Or would you recommend something better?
>
NAND testing is very challenging because it is often hard to get devices to
fail when you want them to. Running stress tests helps.
>
> >
> > In most circumstances (ie. running Yaffs2 mode), the actual ECC that
> > is used is done in the driver and it is not part of Yaffs.
> >
> >
>
> Yes, you are right -> ECC is calculated in NAND driver. However, with
> my kernel - both algorithms are the same.... (1bit/256B ECC) => 24B
> stored in OOB.
>
> >
> >
> > >
> > > IMHO the NAND flash page for this data chunk shall be marked as BAD
> > > -> but we cannot detect such errors.
> > >
> > > Is there any plan to implement new algorithm?
> > >
> >
> > When Yaffs2 was introduced (many years ago now), the biggest
> > motivation was that there are different NAND types with different ECC
> > requirements, different write order requirements etc.
> >
> > ie. the decision was made that it is impossible to try to handle all the
> > ECC variants (hw/sw/BCH/...) within Yaffs and it was better to move
> > that into the driver.
> >
> > If you can post a bit more info about the flash parts you are using
> > etc, then I can give a better appraisal.
>
> I'm using Samsung's NAND Flash memory - 128M x 8B. It uses 2KiB pages
> with 128KiB erase blocks. It doesn't support ECC implemented in its
> internal controller (only EDC).
>
> I'm also wondering what would happen if:
>
> - I do store file in the YAFFS2 FS
>
> - This file is RO mostly (some library)
>
> - By torture testing it happens that:
> -- I do have 1 bit flip -> no problem ECC will correct it
>
> -- I do have 2 bit flips in the 256 ECC "covered" data -
> is this data regarded as valid or is eccResult =
> YAFFS_ECC_RESULT_UNFIXED set?
>
> From my systems it seems like the Yaffs2 chunk is treated as
> a correct one (data is read from this file, but checksums
> differ from the factory file).
>
> What can be done in such situation? How to fix it?
>
Currently Yaffs will return data if an unfixed bitflip happens. The
question is this: Is it better to return data with a bit flip or is it
better to return no data if there is an uncorrectable bit flip?
I am currently working through some changes to allow the user to decide if
they want data or would rather get -EIO.
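
The choice could look roughly like the sketch below. This is only an
illustration of the policy decision, not the actual Yaffs change: the enum
names and finish_read() are hypothetical, with ECC_UNFIXED standing in for
YAFFS_ECC_RESULT_UNFIXED.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical names, for illustration only; not the Yaffs API. */
enum ecc_result { ECC_NO_ERROR, ECC_FIXED, ECC_UNFIXED };
enum unfixed_policy { RETURN_DATA_ANYWAY, FAIL_WITH_EIO };

/*
 * Decide what a read should hand back once the driver has reported the
 * ECC status: the byte count, or -EIO if the user asked for that on
 * uncorrectable errors.
 */
static long finish_read(enum ecc_result res, size_t n_bytes,
                        enum unfixed_policy policy)
{
    if (res == ECC_UNFIXED && policy == FAIL_WITH_EIO)
        return -EIO;
    return (long)n_bytes;
}
```

With RETURN_DATA_ANYWAY the caller gets the bytes as today and has to rely on
its own checksums; with FAIL_WITH_EIO a mostly read-only file such as a
library would fail to load rather than run with corrupted contents.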
-- Charles