Author: Charles Manning
Date:
To: yaffs
Subject: Re: [Yaffs] Question about garbage collection and delete headers
Steve
Comments inline.
On Monday 15 February 2010 15:57:07 Steve Zook wrote:
> On Thursday 11 February 2010 11:29:59 Steve Zook wrote:
> >> I'm a little murky on an algorithmic aspect of the yaffs2 and I was hoping
> >> to get educated. I've read all the theory materials but could not find
> >> an answer to the question.
> >>
> >> When an object gets deleted, a new header gets written to media to mark the
> >> object as deleted. The deleting header should always be, sequence number
> >> wise, last for that object. When the block containing the deleting
> >> header gets garbage collected, how does it get decided whether to copy
> >> the deleting header or to drop it?
> >
> >This is different for the yaffs1 vs yaffs2 modes of operation. From what
> >you're saying it looks like you're talking about the yaffs2 mode of
> >operation.
>
> Yes, indeed I'm talking about the yaffs2 mode. The yaffs1 mode easily
> handles this kind of thing by marking all obsolete chunks (object headers
> and data). Obsolete things never get copied and are skipped during
> scanning, case closed.
>
> >yaffs also tracks the number of data chunks per file. When that number drops
> >to zero the header chunk becomes deleted and the gc will no longer copy
> > it. <snip>
> > <snip>
>
> An example of a case I'm concerned with (a bit overly simplified for
> illustration) is:
>
> Create a file (with object header as last chunk in block A).
> Write data (all data chunks in block B).
> Close file (new object header written as last chunk in block B).
> Delete file (new object header is first chunk in Block C).
>
> At this point, only the object header in C is active, all older object
> headers and data chunks are inactive.
>
> Garbage collect and/or Erase block B. Block B can simply be erased
> because there are no active chunks in it (I don't know whether yaffs2 would
> erase this block immediately after it writes the header in block C, or
> whether it waits till a garbage collection pass).

The erasure happens as soon as the number of pages in use drops to zero and
the block state is FULL, but as you say, that does not really matter.

> Either way, the data
> chunks and object header are not copied because they're inactive.
> nDataChunks for the file goes to 0.
> Garbage collect block C.
> Power fail.
>
> Does the deleting object header in block C get copied during the block C
> garbage collection? If not, a zero length file will get resurrected after
> the reboot because of the left over object header in block A. But if I
> understand the algorithm outlined in your reply, the object header in block
> C will not be copied because there are no unerased data chunks.
>
> In other words, I don't understand how the copying decision can be properly
> made when only the data chunks and the most recent object header are
> tracked while older, inactive, object headers (potentially many) are not.
>
OK, to summarise the problem:

Block A still contains an old object header, and you're concerned that if we
erase the block C object header then we forget that we deleted the file and
the block A header comes back to life as a zero length file. That's the
simple case; there are many more "interesting" ones.
The missing ingredient is the "shrink header" mechanism. This is actually used
for two purposes:
1) Handling the problem you mention here (or more complex variants).
2) Making sure a file truncate-down is not forgotten.
An object header with a truncate-down or a deletion is marked with the "shrink
header" marker. Any block containing a shrink header will not be garbage
collected until all lower sequence number blocks containing inactive chunks
have been collected.
Now:
* Block A has a lower sequence number than block C.
* Block A has at least one inactive chunk (the object header).
* Block C contains at least one shrink header (the object header we're now so
friendly with).
Therefore block C is barred from gc until block A has been gc'd. The gc of
block A will throw away the object header. Therefore by the time block C gets
gc'd the above problem won't exist.
Therefore problem handled.
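
As a rough sketch of the rule (this is not the actual yaffs code; the struct,
its fields and both function names are made up for illustration), the check
could look something like this, assuming each block records its sequence
number, a count of inactive chunks and a shrink header flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block bookkeeping; names are illustrative only. */
struct block_info {
    unsigned seq_number;      /* allocation order: lower means older */
    unsigned inactive_chunks; /* superseded chunks still physically present */
    bool     has_shrink_hdr;  /* holds a truncate-down or deletion header */
    bool     erased;          /* already collected and erased */
};

/*
 * A block containing a shrink header may be collected only when no older
 * (lower sequence number), not yet erased block still holds inactive chunks.
 * Blocks without a shrink header are not restricted by this rule.
 */
static bool gc_allowed(const struct block_info *blocks, size_t n,
                       size_t candidate)
{
    assert(candidate < n);
    const struct block_info *c = &blocks[candidate];

    if (c->erased)
        return false; /* nothing left to collect */
    if (!c->has_shrink_hdr)
        return true;

    for (size_t i = 0; i < n; i++) {
        const struct block_info *b = &blocks[i];

        if (!b->erased && b->seq_number < c->seq_number &&
            b->inactive_chunks > 0)
            return false; /* an older stale chunk could be resurrected */
    }
    return true;
}

/*
 * Toy model of collecting a block: its inactive chunks are thrown away and
 * the block is erased. Copying of live chunks is deliberately left out.
 */
static void gc_block(struct block_info *b)
{
    b->inactive_chunks = 0;
    b->has_shrink_hdr = false;
    b->erased = true;
}
```

With block A at sequence number 1 holding one inactive chunk and block C at
sequence number 3 holding the shrink header, gc_allowed() refuses block C
until gc_block() has been applied to block A, which is the ordering described
above.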
The downside of this mechanism is that it can force some garbage collection
sequences which are less than ideal (i.e. those with a lower reclaim rate than
we'd like to see). I have some experiments in progress to try to handle shrink
headers better, but there are some tough corner cases to handle...