Re: [Yaffs] When is a block which is marked for retirement a…

Author: Ian McDonnell
Date:  
To: Wagner Scott (ST-IN/ENG1.1)
CC: yaffs
Subject: Re: [Yaffs] When is a block which is marked for retirement actually marked bad?
Scott,

On Sunday 22 March 2009, Wagner Scott (ST-IN/ENG1.1) wrote:
> Finally, my real concern is the case (which I have observed in
> real life) where I do lots of writes to a yaffs file system
> (e.g. untar a bunch of application software onto a new system)
> and encounter a bad write along the way. There is a lot of
> spewage of the form:
> **>> yaffs write required 2 attempts
> **>>mtd ecc error fix performed on chunk 84128:0
> **>>Block 2629 marked for retirement
> **>>mtd ecc error fix performed on chunk 84130:0
> **>>Block 2629 marked for retirement
> **>>mtd ecc error fix performed on chunk 84136:0
> **>>Block 2629 marked for retirement
> **>>mtd ecc error fix performed on chunk 84138:0
> **>>Block 2629 marked for retirement
> **>>mtd ecc error fix performed on chunk 84140:0
> **>>Block 2629 marked for retirement
> **>>mtd ecc error fix performed on chunk 84144:0
> **>>Block 2629 marked for retirement
> However, there is no message saying that Block 2629 has been
> retired. If I reboot the system at this point, there is also
> no indication from either the MTD or yaffs scan that Block
> 2629 is bad (hence my suspicion that the actual marking bad is
> done lazily at some future time.) Is something going wrong?


This is usually a sign that a block/page is not completely erased
when Yaffs believes it to have been so. It is 'sufficiently
erased' to make Yaffs think the block/page/chunk is
unallocated -- the tags are all ones. The problem comes when MTD
writes to the non-erased page and the data is subsequently read
back.
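
To illustrate the distinction, here is a minimal sketch of a stricter
erased check. The helper names are hypothetical (they are not Yaffs or
MTD functions). It only shows why all-ones tags in the spare area do
not prove that the data area is clean. Note that it can only catch
bits that read back as zero; a marginally erased cell may still read
as one, so passing this check is not proof of a complete erase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Yaffs or MTD.
 * Returns 1 if every byte in buf is 0xFF (the erased state of NAND),
 * otherwise 0. */
static int region_is_erased(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xFF)
            return 0;
    }
    return 1;
}

/* A page counts as erased here only if BOTH the data area and the
 * spare (OOB) area read back as all ones. Checking the tags alone
 * corresponds to checking only the OOB part. */
static int page_is_fully_erased(const uint8_t *data, size_t data_len,
                                const uint8_t *oob, size_t oob_len)
{
    return region_is_erased(data, data_len) &&
           region_is_erased(oob, oob_len);
}
```

A page with clean tags but a single stray zero bit in the data area
passes the tags-only check and fails the full check.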

Why wasn't the page fully erased? Are you power cycling during
test? If so, the NAND chip may be failing to fully erase a
block. Neither Yaffs nor MTD covers this situation sufficiently.
It really requires hardware support -- asserting the NAND write
protect signal before the power rail goes too low for the NAND
chip to fully complete an erase.

We have seen this problem in the field - a full erase of the NAND
block(s) in question normally resolves the issue. There is no
need to truly 'retire' the block (i.e. mark it bad in the OOB).
Yaffs will not use the block for further allocation until the
filesystem is remounted. Then it may find the same problem
again, that is, until the block is fully erased -- and this may
not happen without intervention.
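
For a manual erase you need to know which erase block a reported
chunk belongs to. A rough sketch of the arithmetic follows. It assumes
32 chunks per block (which is consistent with the log above, since
84128 / 32 = 2629) and that Yaffs block numbers map directly onto MTD
erase blocks of the partition; neither is guaranteed on your setup,
and the erase block size is a parameter you must take from your own
device. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Block that contains a given chunk, assuming chunks are numbered
 * consecutively with chunks_per_block chunks in each block. */
static uint32_t block_of_chunk(uint32_t chunk, uint32_t chunks_per_block)
{
    assert(chunks_per_block != 0);
    return chunk / chunks_per_block;
}

/* Byte offset of a block from the start of the partition, assuming
 * block numbers start at the beginning of the partition (no start
 * block offset configured). */
static uint64_t block_byte_offset(uint32_t block, uint32_t erase_size)
{
    return (uint64_t)block * erase_size;
}
```

With these assumptions, chunks 84128 through 84144 from the log all
fall in block 2629, and the resulting offset plus one erase block
length is the range an MTD erase request (for example the MEMERASE
ioctl) would need.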

Ideally, Yaffs would collect up all chunks it can read/correct in
the block and copy them to another block, then erase the flaky
block.
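
The idea can be sketched against a small in-memory model. This is not
Yaffs code and does not reflect how Yaffs is structured internally;
the types, sizes and function names are all invented for the example,
and ECC, tags and bad-block handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNKS_PER_BLOCK 4   /* illustrative size, not a real geometry */
#define CHUNK_SIZE       8   /* illustrative size, not a real geometry */

enum chunk_state { CHUNK_ERASED, CHUNK_GOOD, CHUNK_UNREADABLE };

/* Simulated NAND block: per-chunk state plus payload. */
struct sim_block {
    enum chunk_state state[CHUNKS_PER_BLOCK];
    uint8_t data[CHUNKS_PER_BLOCK][CHUNK_SIZE];
};

/* Simulated full erase: every chunk returns to all ones. */
static void sim_erase(struct sim_block *b)
{
    for (size_t i = 0; i < CHUNKS_PER_BLOCK; i++) {
        b->state[i] = CHUNK_ERASED;
        memset(b->data[i], 0xFF, CHUNK_SIZE);
    }
}

/* Copy every readable chunk of the flaky block into free chunks of
 * the spare block, then erase the flaky block.
 * Returns the number of chunks copied, or -1 if the spare block ran
 * out of room. On -1 the flaky block is NOT erased, so nothing that
 * was readable is lost (some chunks may exist in both blocks). */
static int sim_salvage_block(struct sim_block *flaky, struct sim_block *spare)
{
    assert(flaky != NULL && spare != NULL && flaky != spare);

    size_t dst = 0;
    int copied = 0;

    for (size_t src = 0; src < CHUNKS_PER_BLOCK; src++) {
        if (flaky->state[src] != CHUNK_GOOD)
            continue;   /* skip erased and unreadable chunks */

        /* Find the next free chunk in the spare block. */
        while (dst < CHUNKS_PER_BLOCK && spare->state[dst] != CHUNK_ERASED)
            dst++;
        if (dst == CHUNKS_PER_BLOCK)
            return -1;

        memcpy(spare->data[dst], flaky->data[src], CHUNK_SIZE);
        spare->state[dst] = CHUNK_GOOD;
        copied++;
    }

    sim_erase(flaky);
    return copied;
}
```

The erase is deliberately the last step, so a failure while copying
leaves the original data where it was.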

-imcd