Hi Ian,
Thanks for the reply.
There is indeed something artificial about my test: on this system, I
know that Block 2629 is physically bad. At least part of the time,
writes to some of its pages will not succeed. I have force-erased it
(all OOB bad block markers are reset) to simulate the condition where I
have a new NAND chip with as-yet-undiscovered bad pages. This explains
the observation that the "block/page is not completely erased when Yaffs
believes it to have been so."
My real concern is what happens when I have page write failures. I
think what you are saying is that YAFFS:
- retries the write on a different page
- leaves the incorrectly written page alone and orphaned (i.e. it won't
be part of the YAFFS fs structure)
- allows the eraseblock (and its remaining good pages) to live on, at
least until erasure
- marks the eraseblock bad in OOB when it is erased (i.e. retires the
block at erase time)
Is this right? If so, it seems OK as long as bad pages within an
eraseblock do not imply unreliability of other pages within the same
eraseblock.
Regards,
Scott
-----Original Message-----
From: Ian McDonnell
Cc: yaffs@lists.aleph1.co.uk
Subject: Re: [Yaffs] When is a block which is marked for retirement actually marked bad?
Scott,
On Sunday 22 March 2009, Wagner Scott (ST-IN/ENG1.1) wrote:
> Finally, my real concern is the case (which I have observed in
> real life) where I do lots of writes to a yaffs file system
> (e.g. untar a bunch of application software onto a new system)
> and encounter a bad write along the way. There is a lot of
> spewage of the form:
> **>> yaffs write required 2 attempts
> **>>mtd ecc error fix performed on chunk 84128:0
> **>>Block 2629 marked for retirement
> **>>mtd ecc error fix performed on chunk 84130:0
> **>>Block 2629 marked for retirement
> **>>mtd ecc error fix performed on chunk 84136:0
> **>>Block 2629 marked for retirement
> **>>mtd ecc error fix performed on chunk 84138:0
> **>>Block 2629 marked for retirement
> **>>mtd ecc error fix performed on chunk 84140:0
> **>>Block 2629 marked for retirement
> **>>mtd ecc error fix performed on chunk 84144:0
> **>>Block 2629 marked for retirement
> However, there is no message saying that Block 2629 has been
> retired. If I reboot the system at this point, there is also
> no indication from either the MTD or yaffs scan that Block
> 2629 is bad (hence my suspicion that the actual marking bad is
> done lazily at some future time.) Is something going wrong?
This is usually a sign that a block/page is not completely erased
when Yaffs believes it to have been so. It is 'sufficiently
erased' to make Yaffs think the block/page/chunk is
unallocated: the tags read back as all ones. The problem comes when MTD
writes to the non-erased page and the data is subsequently read
back: the bits that were never fully erased show up as errors for ECC
to correct.
Why wasn't the page fully erased? Are you power cycling during
test? If so, the NAND chip may be failing to fully erase a
block. Neither Yaffs nor MTD covers this situation sufficiently.
It really requires hardware support: asserting the NAND write
protect signal before the power rail goes too low for the NAND
chip to fully complete an erase.
We have seen this problem in the field; a full erase of the NAND
block(s) in question normally resolves the issue. There is no
need to truly 'retire' the block (i.e. mark it bad in the OOB).
Yaffs will not use the block for further allocation until the
filesystem is remounted. Then it may find the same problem
again, at least until the block is fully erased, and this may
not happen without intervention.
Ideally, Yaffs would collect up all chunks it can read/correct in
the block and copy them to another block, then erase the flaky
block.
-imcd