Hello Charles,
> Hello Lukasz
>
> On Mon, May 8, 2017 at 8:11 PM, Lukasz Majewski <lukma@denx.de> wrote:
>
> > Hi Charles,
> >
> > Thanks for your reply.
> >
> > > On Fri, May 5, 2017 at 10:34 AM, Lukasz Majewski <lukma@denx.de>
> > > wrote:
> > >
> > > > Dear All,
> > > >
> > > > I'm working on an embedded system equipped with NAND Flash memory.
> > > >
> > > > The code is pretty old and corresponds to SHA1:
> > > > 60f5ecebdeee37d56f33374c407376f596baa468
> > > >
> > > > from: git://www.aleph1.co.uk/yaffs2
> > > >
> > > >
> > > > From my debugging I do see two bit flips (should be 1s, but I
> > > > read 0s) happening in the same chunk of data (0x100), from
> > > > which ECC is calculated.
> > > > As far as I know the ECC will be "correct" for two bit flips.
> > > >
> > > >
> > > > I've looked into the yaffs_ecc.* files and found the following
> > > > comment:
> > > >
> > > >
> > > > /*
> > > >  * This code implements the ECC algorithm used in SmartMedia.
> > > >  *
> > > >  * The ECC comprises 22 bits of parity information and is
> > > >  * stuffed into 3 bytes.
> > > >  * The two unused bit are set to 1.
> > > >  * The ECC can correct single bit errors in a 256-byte page of
> > > >  * data.
> > > >  * Thus, two such ECC blocks are used on a 512-byte NAND page.
> > > >  *
> > > >  */
> > > >
> > > > So it seems like two bit flips are not detected - only a single
> > > > bit flip is detected and corrected.
> > > >
> > > > Is there any way to mitigate this issue?
> > > >
> > >
> > > There are many ways to do ECC in the system. Yaffs provides an ECC
> > > function that is really intended for older flash devices
> > > (SmartMedia etc - really, really old) running in Yaffs1 mode
> > > where 1-bit correcting ECC was the norm. That Yaffs ECC code is
> > > also used for protecting very small blocks of unprotected data
> > > (eg. tags).
> >
> > I've looked into the code deeper and:
> >
> > - I'm using yaffs2 with useNANDECC = 1, so I rely on the Linux MTD
> > subsystem (nand flash driver -> nand_ecc.c) to calculate the ECC
> >
> > Unfortunately - the Linux version which I do use (2.6.27) only
> > supports NAND_ECC_SOFT, which corresponds to Hamming 1bit/256B
> > correction.
> >
> > I think that I will switch to BCH codes 4bit/512B.
> >
>
> That sounds like a good idea if you are getting bit flips.
>
>
> >
> > One question - will yaffs2 work with OOB's ECC extended from 24B to
> > 28B?
> >
>
> That should not be a problem, so long as you sort out the new oob byte
> placement in the driver.
>
> If you run out of space in oob to store tags then you can also
> switch to inband tags mode and the tags get stored in the data area.
I see. Thanks for pointing this out.
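
For the record, the 24B and 28B figures follow from the 2KiB page size.
Below is a minimal sketch of that arithmetic. The function names are
made up for illustration, and the way the BCH field order m is derived
(the number of bits needed to represent 8 * chunk_bytes + 1) is my
assumption about how a software BCH implementation sizes its field, so
please treat it as a sketch rather than a statement about any driver.

```c
#include <assert.h>

#define HAMMING_CHUNK_BYTES 256u /* 1-bit Hamming protects 256 bytes */
#define HAMMING_ECC_BYTES   3u   /* 22 parity bits stuffed into 3 bytes */

/* OOB bytes needed for 1bit/256B Hamming ECC over one page. */
static unsigned hamming_oob_bytes(unsigned page_bytes)
{
    assert(page_bytes % HAMMING_CHUNK_BYTES == 0);
    return (page_bytes / HAMMING_CHUNK_BYTES) * HAMMING_ECC_BYTES;
}

/*
 * OOB bytes needed for a t-bit BCH code over chunk_bytes of data.
 * Assumption: the field order m is the bit length of
 * (8 * chunk_bytes + 1), and each chunk needs ceil(m * t / 8) bytes.
 */
static unsigned bch_oob_bytes(unsigned page_bytes, unsigned chunk_bytes,
                              unsigned t)
{
    assert(chunk_bytes != 0 && page_bytes % chunk_bytes == 0);

    unsigned m = 0;
    for (unsigned n = 8u * chunk_bytes + 1u; n != 0; n >>= 1)
        m++;

    unsigned per_chunk = (m * t + 7u) / 8u; /* round up to whole bytes */
    return (page_bytes / chunk_bytes) * per_chunk;
}
```

With a 2048-byte page, hamming_oob_bytes(2048) covers eight 256-byte
chunks at 3 bytes each, and bch_oob_bytes(2048, 512, 4) covers four
512-byte chunks at 7 bytes each.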
>
>
> >
> > (I'm also wondering how I could set up a test environment to validate
> > the switch to the BCH ECC scheme - use the nandsim driver from the kernel?)
> >
> > Or would you recommend something better?
> >
>
> NAND testing is very challenging because it is often hard to get
> devices to fail when you want them to. Running stress tests helps.
I've backported mtd_test Linux kernel modules for this. Let's see what
will happen.
>
> >
> > >
> > > In most circumstances (ie. running Yaffs2 mode), the actual ECC
> > > that is used is done in the driver and it is not part of Yaffs.
> > >
> > >
> >
> > Yes, you are right -> ECC is calculated in NAND driver. However,
> > with my kernel - both algorithms are the same.... (1bit/256B ECC)
> > => 24B stored in OOB.
> >
> > >
> > >
> > > >
> > > > IMHO the NAND flash page for this data chunk shall be marked as
> > > > BAD -> but we cannot detect such errors.
> > > >
> > > > Is there any plan to implement new algorithm?
> > > >
> > >
> > > When Yaffs2 was introduced (many years ago now), the biggest
> > > motivation was that there are different NAND types with different
> > > ECC requirements, different write order requirements etc.
> > >
> > > ie. the decision was made that it is impossible to try handle all
> > > the ECC variants (hw/sw/BCH/...) within Yaffs and it was better
> > > to move that into the driver.
> > >
> > > If you can post a bit more info about the flash parts you are
> > > using etc, then I can give a better appraisal.
> >
> > I'm using Samsung's NAND Flash memory - 128M x 8B. It uses 2KiB
> > pages with 128KiB erase blocks. It doesn't support ECC implemented
> > in its internal controller (only EDC).
> >
> > I'm also wondering what would happen if:
> >
> > - I do store file in the YAFFS2 FS
> >
> > - This file is RO mostly (some library)
> >
> > - By torture testing it happens that:
> > -- I do have 1 bit flip -> no problem ECC will correct it
> >
> > -- I do have 2 bit flips in the 256 ECC "covered" data -
> > is this data regarded as valid or is eccResult =
> > YAFFS_ECC_RESULT_UNFIXED set?
> >
> > From my systems it seems like the Yaffs2 chunk is
> > treated as a correct one (data is read from this file, but its
> > checksum differs from that of the factory file).
> >
> > What can be done in such situation? How to fix it?
> >
>
> Currently Yaffs will return data if an unfixed bitflip happens. The
> question is this: Is it better to return data with a bit flip or is it
> better to return no data if there is an uncorrectable bit flip?
In any case (when ECC is not capable of fixing things) we have corrupted,
unrecoverable data.
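
To make the single-bit limit concrete, here is a minimal sketch of a
22-bit Hamming-style ECC over 256 bytes. It is a simplified
illustration only: the bit layout is NOT the one used by yaffs_ecc.c or
the kernel's nand_ecc.c, and the names (ecc22_calc, ecc22_correct,
flip_bit) are made up for this mail.

```c
#include <assert.h>
#include <stdint.h>

#define ECC_DATA_BYTES 256u
#define ECC_DATA_BITS  (ECC_DATA_BYTES * 8u) /* 2048 bits, 11 address bits */
#define ECC_ADDR_MASK  0x7FFu

/* Simplified layout: 11 parity bits plus their 11 complements (22 total). */
struct ecc22 {
    uint16_t p;     /* XOR of the addresses of all set data bits */
    uint16_t p_inv; /* XOR of the inverted addresses of all set data bits */
};

enum ecc_status { ECC_OK = 0, ECC_FIXED = 1, ECC_UNFIXED = -1 };

static struct ecc22 ecc22_calc(const uint8_t *data)
{
    struct ecc22 e = {0, 0};

    for (unsigned bit = 0; bit < ECC_DATA_BITS; bit++) {
        if (data[bit / 8] & (1u << (bit % 8))) {
            e.p ^= (uint16_t)bit;
            e.p_inv ^= (uint16_t)(~bit & ECC_ADDR_MASK);
        }
    }
    return e;
}

static void flip_bit(uint8_t *data, unsigned bit)
{
    assert(bit < ECC_DATA_BITS);
    data[bit / 8] ^= (uint8_t)(1u << (bit % 8));
}

/* Compare stored ECC with freshly computed ECC and repair if possible. */
static enum ecc_status ecc22_correct(uint8_t *data, struct ecc22 stored)
{
    struct ecc22 now = ecc22_calc(data);
    unsigned s = (unsigned)(stored.p ^ now.p);
    unsigned s_inv = (unsigned)(stored.p_inv ^ now.p_inv);

    if (s == 0 && s_inv == 0)
        return ECC_OK;

    /* Every parity pair disagrees: exactly one data bit, at address s. */
    if ((s ^ s_inv) == ECC_ADDR_MASK) {
        flip_bit(data, s);
        return ECC_FIXED;
    }

    /* A single syndrome bit: the stored ECC itself was hit, data is fine. */
    unsigned both = s | (s_inv << 11);
    if ((both & (both - 1u)) == 0)
        return ECC_FIXED;

    return ECC_UNFIXED; /* e.g. two flipped data bits */
}
```

In this sketch one flipped data bit is located and repaired, while two
flipped data bits give the same value in both syndromes and end up as
ECC_UNFIXED, so the caller still has to decide what to do with the
data. Three flips can alias to a valid single-bit syndrome and be
miscorrected, which is one more argument for a stronger code.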
>
> I am currently working through some changes to allow the user to
> decide if they want data or would rather get -EIO.
Ok. I see.
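
Just to check that I understand the idea, here is how I picture such a
choice from the caller's side. This is purely a hypothetical sketch:
the enum names and chunk_read_status() are invented for this mail and
are not the Yaffs API or your planned change.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of an ECC outcome reported by the NAND driver. */
enum ecc_result { ECC_RESULT_NO_ERROR, ECC_RESULT_FIXED, ECC_RESULT_UNFIXED };

/* Hypothetical user-selectable policy for uncorrectable chunks. */
enum unfixed_policy { RETURN_DATA_ANYWAY, RETURN_EIO };

/*
 * Returns the number of bytes handed to the caller, or -EIO when the
 * chunk is uncorrectable and the policy asks for an error instead.
 */
static long chunk_read_status(enum ecc_result res, size_t n_bytes,
                              enum unfixed_policy policy)
{
    assert(n_bytes <= 2048); /* 2KiB page in my setup */

    if (res == ECC_RESULT_UNFIXED && policy == RETURN_EIO)
        return -EIO;

    return (long)n_bytes;
}
```

For a read-mostly library file I would pick the -EIO variant, since a
failed read is easier to notice than a silently corrupted binary.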
>
> -- Charles
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de