On 3/22/07, Chris Paulson-Ellis <chris@edesix.com> wrote:
>
> Also, my flash device supports an in-device block-to-block page
> copy command to speed up this recovery operation. Neither Linux MTD nor
> YAFFS support this concept as far as I can tell.
The problem with the block-to-block copy operations I've seen is that you
have no way to check the ECC of the data as you move it. Thus, you could
read a page with a single-bit ECC error from the old block and write it,
error and all, to the new block. When you later read the data from the new
block, you detect the error (if you're lucky) and declare the new block
ready for retirement. If you're NOT lucky, the page will sit with its
single-bit ECC error unread long enough for a second bit to flip, and when
you finally read the block you get an uncorrectable memory error (UCME).
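The failure sequence above can be illustrated with a toy single-error-correcting, double-error-detecting (SECDED) code. The extended Hamming (8,4) code and the `copy_back` / `controller_copy` helpers below are hypothetical stand-ins chosen for illustration, not how any particular NAND device or MTD driver works; real NAND ECC protects much larger chunks of data. The sketch only shows why moving raw bits without an ECC check lets a correctable error become uncorrectable.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8  # index 0 = overall parity, 1..7 = Hamming(7,4) positions
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) % 2            # overall parity for double-error detection
    return bits


def decode(bits):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos
    overall = sum(bits) % 2
    fixed = list(bits)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one, at position `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not fixable.
        return "uncorrectable", None
    nibble = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return status, nibble


def copy_back(page):
    """Hypothetical in-device page copy: raw bits moved, ECC never checked."""
    return list(page)


def controller_copy(page):
    """Hypothetical host-side copy: decode (correcting), then re-encode."""
    _status, nibble = decode(page)
    if nibble is None:
        raise ValueError("uncorrectable error during copy")
    return encode(nibble)


old_page = encode(0b1011)
old_page[5] ^= 1                     # single-bit (correctable) error in the old block

moved_raw = copy_back(old_page)      # error carried along silently
moved_ecc = controller_copy(old_page)  # error corrected during the move

# Later, a second bit flips in the new block before anyone reads the page.
moved_raw[6] ^= 1
moved_ecc[6] ^= 1

print(decode(moved_raw))  # ('uncorrectable', None)
print(decode(moved_ecc))  # ('corrected', 11)
```

The same second flip is harmless after an ECC-checked copy but fatal after a raw copy-back, which is the trade-off being described.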
To keep single-bit errors from sitting latent long enough to turn into
UCMEs, you could have a background process read each page of NAND at some
very slow rate, looking for and correcting single-bit errors.
The additional reads could, of course, increase the rate at which errors
crop up, due to NAND read-disturb effects. It'd be pretty straightforward
to create a Markov model of the system failures to determine the best rate
at which to scrub for correctable memory errors (CMEs). Given the wide range
of NAND configurations, this would probably need to be a tuning parameter,
unless the unavailability curves are very flat in the region around the
optimum value.
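A minimal sketch of such a Markov model, assuming a hypothetical three-state continuous-time chain per ECC codeword: state 0 (no errors), state 1 (one correctable bit error), state 2 (UCME, with data restored at some repair rate). Scrub passes move state 1 back to state 0, and read disturb is modelled crudely as each scrub pass adding a small per-bit flip probability. The structure, the parameter names, and every number in `PARAMS` are assumptions for illustration, not measured NAND figures.

```python
def steady_state(scrub_rate, n_bits, flip_rate, disturb_per_scrub, repair_rate):
    """Steady-state probabilities (p0, p1, p2) of the hypothetical 3-state chain.

    All rates are per hour. State 2 (UCME) is the unavailable state.
    """
    # Crude read-disturb model (assumption): scrubbing raises the per-bit flip rate.
    per_bit = flip_rate + disturb_per_scrub * scrub_rate
    lam1 = n_bits * per_bit          # 0 -> 1: first bit flips
    lam2 = (n_bits - 1) * per_bit    # 1 -> 2: second bit flips before a scrub
    # Balance equations with p0 fixed at 1, then normalize:
    #   p1 * (lam2 + scrub_rate) = p0 * lam1
    #   p2 * repair_rate         = p1 * lam2
    p1 = lam1 / (lam2 + scrub_rate)
    p2 = p1 * lam2 / repair_rate
    total = 1.0 + p1 + p2
    return 1.0 / total, p1 / total, p2 / total


def unavailability(scrub_rate, **params):
    """Fraction of time the codeword sits in the UCME state."""
    return steady_state(scrub_rate, **params)[2]


PARAMS = dict(
    n_bits=4096,             # hypothetical: a 512-byte ECC chunk
    flip_rate=1e-9,          # hypothetical spontaneous flips per bit-hour
    disturb_per_scrub=1e-7,  # hypothetical extra flip probability per bit per scrub pass
    repair_rate=0.1,         # hypothetical: restoring data after a UCME takes ~10 h
)

# Log-spaced sweep of scrub rates from 1e-5 to 10 passes per hour.
rates = [10 ** (k / 10) for k in range(-50, 11)]
best = min(rates, key=lambda s: unavailability(s, **PARAMS))

for s in (rates[0], best, rates[-1]):
    print(f"scrub {s:.2e}/h -> unavailability {unavailability(s, **PARAMS):.2e}")
```

With these made-up numbers the minimum lands near one scrub pass per 100 hours, and unavailability is only about three times higher a decade either side of it; how flat that curve is for a real part is what would decide whether a fixed rate is good enough or a tuning parameter is needed.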
William
--
wjw1961@gmail.com
William J. Watson