Hi Ian,

On 8/16/07, ian@brightstareng.com <ian@brightstareng.com> wrote:
> Raj,
>
> On Thursday 16 August 2007 08:16, Raj Kumar Yadav wrote:
> > On Tuesday 14 August 2007 12:32, ian@brightstareng.com wrote:
> > > Tuesday 14 August 2007 11:45, Raj Kumar Yadav wrote:
> > > > 2) It is found that
> > > > the block on which NAND verify failed, is
> > > > actually bad, as I am unable to erase/write on that block
> > > > using nand-utils or the custom bootloader commands.
> > > >
> > > > NAND Controller shows status as success after erase/write.
> > > >
> > > > But after the erase, all bytes on the 1st, 3rd, 5th, ...
> > > > pages are 0xFF, and all bytes on the 2nd, 4th, 6th, ...
> > > > pages are 0x00.
> > > >
> > > > Similarly, writing a data pattern to any of the pages in
> > > > that block has no effect on the page data.
>
> So this looks like a problem with the NAND, the nand driver, or
> MTD. Have you found anyone using your NAND controller
> successfully?  I've not seen this kind of failure with raw
> NAND.

I disagree: this is happening on only one board, and always in the
same block. Also, though it is not a very strong point, I have been
using this SoC with its NAND controller for more than a year now.
The NAND chip on which I am seeing this problem is a Hynix 512MB
part (2K page, 256K block).
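
For reference, the geometry implied by those figures can be worked
out as below. This is only a sketch: it assumes "512MB" means 512 MiB
of main-area data and ignores the spare (OOB) bytes, and the helper
names are made up for illustration.

```python
# Geometry implied by the figures quoted in this mail.
# Assumption: sizes are main-area bytes only (no spare/OOB area).
PAGE_SIZE = 2 * 1024             # 2K page
BLOCK_SIZE = 256 * 1024          # 256K block
CHIP_SIZE = 512 * 1024 * 1024    # 512MB chip (assumed to be 512 MiB)

pages_per_block = BLOCK_SIZE // PAGE_SIZE
blocks_per_chip = CHIP_SIZE // BLOCK_SIZE


def block_of_offset(offset):
    """Index of the erase block containing a byte offset (hypothetical helper)."""
    return offset // BLOCK_SIZE


def page_in_block(offset):
    """Index of the page within its erase block for a byte offset."""
    return (offset % BLOCK_SIZE) // PAGE_SIZE


if __name__ == "__main__":
    print(pages_per_block, "pages per block")
    print(blocks_per_chip, "blocks per chip")
```

With these numbers a single failing block covers 128 pages out of
2048 blocks on the chip.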

>
> > > >
> > > > This also means that, I cannot mark the block bad, as the
> > > > first page is all 0xff and nand write on the page has no
> > > > effect on page data.
>
> Can you use a different device for testing, or replace the NAND
> chip?

Yes, the same s/w and h/w is working fine on all the other boards
as of now.


> > > > So, it is ending up in a situation, where the block will
> > > > never be marked bad, and the write will always fail (due
> > > > to MTD NAND verify) on the block pages.
> > >
> > > Perhaps you could try posting this question to the
> > > linux-mtd. The ARM Linux list is also a good place to ask
> > > for platform specific (NAND controller) help.
> >
> > I have asked about it on the linux-mtd mailing list. David
> > Woodhouse suggested using the 'bad block table' policy
> > provided in the MTD layer to keep track of such
> > blocks. An entry can be added to the 'bbt' on every nand verify
> > failure (even after the applied ECC correction).
>
> This sounds like a suggestion to use an alternative (table based)
> bad-block implementation -- if this is just to cope with a single
> broken NAND chip I think you're better off getting a new chip.

Yes, you are right: the problem is with only one board, and it is not
causing any harm in terms of data loss, except maybe for the checkpoint
thing.

> If the problem is caused by a s/w or h/w controller issue,
> changing the bad-block marking scheme won't fix the problem,
> because the problem isn't really a bad-block.

As the same s/w is working on the rest of the boards (same h/w), I do
believe that the block is actually bad, but I am not sure about this.

The SoC also has a direct SMI interface to access the NAND chip.
Let me see if I can use that to verify that the block in the NAND chip
on that board is actually bad.
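
Whichever path is used to read the block back, the check itself could
look roughly like the sketch below. It assumes the raw main-area
contents of the erased block are already available as a byte string
(how the dump is obtained is not covered here), and the function names
are hypothetical.

```python
PAGE_SIZE = 2048  # 2K page, as stated earlier in this mail


def classify_pages(dump, page_size=PAGE_SIZE):
    """Label each page of a raw block dump as 'erased', 'zero' or 'mixed'."""
    if len(dump) == 0 or len(dump) % page_size != 0:
        raise ValueError("dump must be a non-empty whole number of pages")
    all_ff = b"\xff" * page_size
    all_00 = b"\x00" * page_size
    labels = []
    for off in range(0, len(dump), page_size):
        page = dump[off:off + page_size]
        if page == all_ff:
            labels.append("erased")
        elif page == all_00:
            labels.append("zero")
        else:
            labels.append("mixed")
    return labels


def matches_reported_symptom(labels):
    """True if the 1st, 3rd, ... pages are all 0xFF and the 2nd, 4th, ... all 0x00."""
    return len(labels) >= 2 and all(
        label == ("erased" if i % 2 == 0 else "zero")
        for i, label in enumerate(labels)
    )


def block_is_cleanly_erased(labels):
    """True if every page reads back as all 0xFF, as expected after an erase."""
    return all(label == "erased" for label in labels)
```

If the dump taken over SMI after an erase also shows the alternating
0xFF/0x00 pattern, that would point at the chip rather than at the
controller or the driver.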

Thanks for your quick and witty responses.

Raj Kumar Yadav