Author: Charles Manning
Date:
To: yaffs
Subject: Re: [Yaffs] when is the safe time to remove shrink hdr?
On Thursday 19 April 2012 23:50:21 Ezio Zhang wrote:
> shrink hdr is used to identify file hole and deleted files.
> for the second situation,if a file deleted without using shrink hdr the
> file will appear in the next start.(howyaffswork.pdf says "Shrink headers
> are also used to indicate that a file has been deleted and that the record
> of the file deletion is not lost.")
> but what i am puzzling is that when it is safe to remove shrink hdr?
The best way to think about this is to consider what a shrink header does and
the information it provides.
Typically we only care about the most up to date version of some information.
For example, if a file name is changed, then I only care about the new name
and not the old name. The old header with the old name can be erased any
time.
There are times when the history matters and this is when the shrink header
flag is used.
The first of those is file holes.
Consider the sequence:
Open new file.
Write 2MB of data.
Truncate back to 1MB.
Seek to 2MB.
Write 1MB of data.
The file should now contain 1MB of real data, a 1MB "hole", then 1MB of real
data.
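
For anyone who wants to reproduce the sequence above from user space, here is a minimal sketch using plain POSIX calls. It is not yaffs code: the function names make_holey_file() and byte_at(), the fill values 0xAA and 0xBB, and the file name are all made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MB (1024 * 1024)

/* Write len bytes of the value fill at the current file offset. */
static int write_fill(int fd, unsigned char fill, size_t len)
{
	unsigned char buf[4096];

	memset(buf, fill, sizeof(buf));
	while (len > 0) {
		size_t n = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t w = write(fd, buf, n);

		if (w <= 0)
			return -1;
		len -= (size_t)w;
	}
	return 0;
}

/*
 * Hypothetical helper: runs the open / write 2MB / truncate to 1MB /
 * seek to 2MB / write 1MB sequence. Returns the final file size,
 * or -1 on error.
 */
long make_holey_file(const char *path)
{
	struct stat st;
	int fd;

	assert(path != NULL);
	fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
	if (fd < 0)
		return -1;

	if (write_fill(fd, 0xAA, 2 * MB) < 0 ||   /* write 2MB of data   */
	    ftruncate(fd, 1 * MB) < 0 ||          /* truncate back to 1MB */
	    lseek(fd, 2 * MB, SEEK_SET) < 0 ||    /* seek to 2MB          */
	    write_fill(fd, 0xBB, 1 * MB) < 0 ||   /* write 1MB of data    */
	    fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return (long)st.st_size;
}

/* Hypothetical helper: read one byte at offset, or return -1 on error. */
int byte_at(const char *path, long offset)
{
	unsigned char c = 0;
	int value = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (lseek(fd, (off_t)offset, SEEK_SET) >= 0 && read(fd, &c, 1) == 1)
		value = c;
	close(fd);
	return value;
}
```

Reading anywhere between 1MB and 2MB should give zeros, not the 0xAA bytes that were originally written there. The file system has to get that right after a remount too.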
So how do we go about remembering that the hole is there? The most recent
file header just tells us that the file is 3MB in size. If we only relied on
that information then it would not be possible to tell which data in the file
should be treated as part of the hole.
To this end, yaffs2 considers all the old headers too. After all, when a
header was written, that was the file size and any data beyond that limit
should be treated as deleted. Thus when the file header indicating the
truncation back to 1MB is seen, we now know that any data beyond 1MB should
be deleted.
The file header that tells us this is now old but still contains useful
information. We can't erase that information until the data chunks in the
hole have all been erased.
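
To make the point concrete, here is a toy model of replaying the records for the file in the example. This is only a sketch and is not the yaffs2 scanning code: the record layout, the 1MB "unit" granularity, the forward replay order and the names replay(), struct record and MAX_UNITS are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_UNITS 8	/* toy limit: file size counted in 1MB units */

enum rec_type { REC_DATA, REC_HEADER };

/* Hypothetical on-flash record, listed in the order it was written. */
struct record {
	enum rec_type type;
	unsigned value;	/* REC_DATA: unit index; REC_HEADER: file size in units */
	int erased;	/* non-zero once garbage collection has reclaimed it */
};

/*
 * Replay the surviving records in write order. Every header, old or new,
 * discards any data at or beyond the size it records. Fills live[] with
 * 1 for units that hold real data and returns the final file size.
 */
unsigned replay(const struct record *log, size_t n, int live[MAX_UNITS])
{
	unsigned size = 0;
	size_t i;
	unsigned u;

	memset(live, 0, MAX_UNITS * sizeof(live[0]));
	for (i = 0; i < n; i++) {
		if (log[i].erased)
			continue;
		if (log[i].type == REC_DATA) {
			assert(log[i].value < MAX_UNITS);
			live[log[i].value] = 1;
		} else {
			size = log[i].value;
			for (u = size; u < MAX_UNITS; u++)
				live[u] = 0;	/* beyond this header's size */
		}
	}
	return size;
}
```

With the records data 0, data 1, header (size 1), data 2, header (size 3), the replay gives live units 0 and 2 with a hole at unit 1. Mark the size 1 header as erased while the data 1 record is still on flash and unit 1 comes back as live data, which is exactly the failure the old header protects against.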
This is where shrink header markers come in. The shrink header marker is a
flag that tells us that the header cannot be erased (i.e. cannot be garbage
collected) until the data in the hole has been erased.
Now clearly it would be very time consuming and complex to track the location
of every chunk in every hole (there might, in theory, be thousands).
Instead we use a "trick". Since the data in the hole was written before the
shrink header, it must have an older sequence number. Since it was deleted,
the blocks holding that data must have some deleted chunks.
Therefore we can prevent the garbage collector from deleting the shrink header
too early by refusing to collect its block while there are any older blocks
with deleted data. That is the job of the function yaffs_block_ok_for_gc().
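
The idea behind that check can be sketched roughly as follows. This is not the yaffs source: struct block_info, its fields and block_ok_for_gc_sketch() are hypothetical names for this example, and the linear search over all blocks is just the simplest way to show the rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-block bookkeeping for this sketch. */
struct block_info {
	unsigned seq_number;	/* higher means written more recently */
	int has_shrink_hdr;	/* block holds at least one shrink header */
	int deleted_chunks;	/* deleted chunks not yet erased */
};

/*
 * Returns 1 if the candidate block may be garbage collected, 0 if not.
 * A block without a shrink header is always allowed. A block with one
 * is allowed only when no older block still holds deleted chunks, since
 * those chunks might be the data the shrink header is protecting.
 */
int block_ok_for_gc_sketch(const struct block_info *blocks, size_t n,
			   size_t candidate)
{
	size_t i;

	assert(candidate < n);
	if (!blocks[candidate].has_shrink_hdr)
		return 1;

	for (i = 0; i < n; i++) {
		if (blocks[i].deleted_chunks > 0 &&
		    blocks[i].seq_number < blocks[candidate].seq_number)
			return 0;
	}
	return 1;
}
```

Note that the check is conservative. It does not know which deleted chunks belong to which hole, so any older block with deleted chunks holds back the shrink header block.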
Now a similar issue applies to deleted files. If we erase the file header for
a deleted file before the data chunks have themselves been erased, the data
chunks would be reconstructed into a file and placed in lost+found. Thus the
shrink header mechanism is used again.
The downside of shrink headers is that they can force the garbage collector to
ignore blocks with a lot of garbage (i.e. blocks that are fast to gc). That is
why I added the feature to only use shrink headers for larger holes.
I think there are some improvements that can be made - particularly around
handling deleted files.