Re: [Yaffs] yaffs2 mount failure after sometime

Author: ian@brightstareng.com
Date:
To: Raj Kumar Yadav
CC: yaffs
Subject: Re: [Yaffs] yaffs2 mount failure after sometime

Raj,

On Thursday 02 August 2007 14:20, Raj Kumar Yadav wrote:
> > > Do you think, it is necessary to enable the option "page
> > > verify after write" in nand driver under MTD layer. Will
> > > it solve the problem ?
> >
> > Turning on verify while you figure out what's going on is not
> > a bad idea. Verification costs, however: the data is read back
> > and compared after the write, so there's additional i/o and
> > CPU involved. The NAND chip itself is supposed to check that
> > the write/erase is successful as the operation completes --
> > it indicates this in a status byte that's read by MTD -- so
> > it should not be necessary to read back the data to verify
> > the write. Now there could be other issues outside the NAND
> > chip that cause data errors, in which case having MTD
> > read-back and verify data just written may be useful.
>
> I have a doubt: if a block has gone bad, writing 0xA5 may end
> up as, for example, 0xF5 or similar at some byte somewhere in
> the page. In this case the flash controller will not return an
> error, or will it?

Not sure what you mean by 'flash controller'. Do you have a
platform equipped with hardware-based NAND ECC (under MTD)?
The software ECC that MTD uses can detect and correct a
single-bit error in each 256-byte segment of the page, and
detect 2-bit errors; anything more may slip through. MTD does
not check the Yaffs tag data that is placed in the NAND 'spare'
bytes. Tag data errors are only detected/corrected by Yaffs (as
seen in the trace output in your original posting).
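
To make those limits concrete, here is a minimal sketch in C. It
is not MTD's ECC code: it only counts how many bits differ between
what was written and what was read back in one 256-byte segment,
and maps that count onto the three cases described above. The
names (classify_segment, ecc_outcome) are made up for this
illustration, and errors in the stored ECC bytes themselves are
not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ECC_SEGMENT_SIZE 256  /* bytes covered by one software ECC code */

/* Hypothetical outcome labels for this illustration only. */
enum ecc_outcome {
    ECC_CLEAN,        /* no bits differ */
    ECC_CORRECTABLE,  /* one flipped bit: detected and corrected */
    ECC_DETECTED,     /* two flipped bits: detected, not correctable */
    ECC_MAY_SLIP      /* three or more: may go unnoticed */
};

/* Count the 1 bits in a byte. */
static unsigned count_set_bits(uint8_t v)
{
    unsigned n = 0;

    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Classify one 256-byte segment by the number of flipped bits. */
static enum ecc_outcome classify_segment(const uint8_t *written,
                                         const uint8_t *read_back)
{
    unsigned flipped = 0;
    size_t i;

    assert(written != NULL && read_back != NULL);

    for (i = 0; i < ECC_SEGMENT_SIZE; i++)
        flipped += count_set_bits((uint8_t)(written[i] ^ read_back[i]));

    if (flipped == 0)
        return ECC_CLEAN;
    if (flipped == 1)
        return ECC_CORRECTABLE;
    if (flipped == 2)
        return ECC_DETECTED;
    return ECC_MAY_SLIP;
}
```

With the example from the question, 0xA5 read back as 0xF5 differs
in two bits (0xA5 ^ 0xF5 = 0x50), so if that were the only damage
in the segment it would fall in the detected-but-not-correctable
case.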

> I agree on the cost part; however, if write verify makes the
> solution more robust, at least I am not averse to the idea.

If one has problems with data integrity on the bus between the
CPU and the NAND chip, ECC and write verification are not really
going to help -- I'm not saying you actually have any such
problem.

If you think you may have erased a bad-block marker (during
development) that's very probably the explanation for all this.
Mark the block bad again by zeroing the marker byte -- MTD has a
function to do this, though I'm not sure if there's a tool/app
for it.

It looks to me like Yaffs has a problem: clearly it should not
have abandoned the scan just because of a chunk/page read error.

As an aside: if a block fails with a bit stuck on (1), then I
wonder if the NAND chip's test for a successful erase will pass.
I don't see why it wouldn't -- I assume it's just testing that
all the cells read as one (1). In that case the failure of the
cell would only be seen when writing a page that has a 0 at the
location of the broken cell, and MTD should be informed of such
a failure by the NAND chip's write status. If the data bit is
1, then the broken bit goes unnoticed and the failure is benign.
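
The stuck-at-1 reasoning above can be sketched with a toy model in
C. This is only an illustration under the same assumption made in
the text (the chip's erase check just tests for all ones, and its
program status reflects whether the cells ended up holding the
data). The structure and function names are hypothetical and the
16-byte "page" is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 16  /* tiny page, for illustration only */

/* Hypothetical model of one page with some cells stuck at 1. */
struct nand_page_model {
    uint8_t cells[MODEL_PAGE_SIZE];
    uint8_t stuck_at_one[MODEL_PAGE_SIZE];  /* bits that always read 1 */
};

/* Erase sets every cell to 1; returns true if all cells read 1. */
static bool model_erase(struct nand_page_model *p)
{
    bool ok = true;
    size_t i;

    assert(p != NULL);

    for (i = 0; i < MODEL_PAGE_SIZE; i++) {
        /* A stuck-at-1 bit already agrees with the erased state. */
        p->cells[i] = (uint8_t)(0xFF | p->stuck_at_one[i]);
        if (p->cells[i] != 0xFF)
            ok = false;
    }
    return ok;
}

/* Program can only clear bits; returns the modelled write status. */
static bool model_program(struct nand_page_model *p, const uint8_t *data)
{
    bool ok = true;
    size_t i;

    assert(p != NULL && data != NULL);

    for (i = 0; i < MODEL_PAGE_SIZE; i++) {
        /* Clear the bits that are 0 in data; stuck bits stay at 1. */
        p->cells[i] = (uint8_t)((p->cells[i] & data[i]) |
                                p->stuck_at_one[i]);
        if (p->cells[i] != data[i])
            ok = false;
    }
    return ok;
}
```

In this model, a page with bit 0x10 of one byte stuck at 1 always
passes erase, accepts 0xB5 at that byte without complaint (the
benign case), and fails the write status only when asked to store
a value such as 0xA5 that needs a 0 in the stuck position.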

Retirement: Perhaps on a hard read failure it would be good to
attempt to write all-zeros to the page. If this fails, that is
more evidence that there is a real problem with the page and it
can be marked for retirement. If a cell is stuck at 0, that
should manifest itself later as an erase failure, and it can be
marked for retirement then. If a write fails but it *is*
possible to zero all the bits, then ignore the write failure and
move on. While doing all this testing, don't actually zero any
block status bytes.
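
A rough sketch of that retirement idea in C, to show the shape of
the decision rather than any existing Yaffs or MTD code. Every
name here is hypothetical, and the position of the block status
byte within the buffer is passed in as a parameter because it
depends on the chip and spare layout. The probe pattern keeps that
byte at 0xFF, since programming a 1 leaves a cell untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts and observations for this illustration. */
enum page_verdict { PAGE_KEEP, PAGE_RETIRE };

struct probe_result {
    bool zero_write_ok;  /* every probed bit could be programmed to 0 */
    bool erase_ok;       /* a later erase reported success */
};

/*
 * Fill a data-plus-spare buffer with the all-zeros probe pattern,
 * but leave the block status byte at 0xFF so the probe never marks
 * the block bad by accident. status_offset is an assumption supplied
 * by the caller.
 */
static void build_zero_probe(uint8_t *buf, size_t total_len,
                             size_t status_offset)
{
    size_t i;

    assert(buf != NULL && status_offset < total_len);

    for (i = 0; i < total_len; i++)
        buf[i] = 0x00;
    buf[status_offset] = 0xFF;
}

/* Decide what to do after a hard read failure has been probed. */
static enum page_verdict judge_probe(const struct probe_result *r)
{
    assert(r != NULL);

    if (!r->zero_write_ok)
        return PAGE_RETIRE;  /* some cell will not go to 0 */
    if (!r->erase_ok)
        return PAGE_RETIRE;  /* some cell is stuck at 0 */
    return PAGE_KEEP;        /* ignore the earlier failure, move on */
}
```

The caller would write the probe buffer through whatever driver
path it normally uses, record the two status results, and pass
them to judge_probe().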

-imcd
