On Thursday 19 April 2012 23:50:21 Ezio Zhang wrote:
> shrink hdr is used to identify file hole and deleted files.
> for the second situation,if a file deleted without using shrink hdr the
> file will appear in the next start.(howyaffswork.pdf says "Shrink headers
> are also used to indicate that a file has been deleted and that the record
> of the file deletion is not lost.")
> but what i am puzzling is that when it is safe to remove shrink hdr?

The best way to think about this is to consider what a shrink header does and 
the information it provides.

Typically we only care about the most up to date version of some information. 
For example, if a file name is changed, then I only care about the new name 
and not the old name. The old header with the old name can be erased any 
time.

There are times when the history matters and this is when the shrink header 
flag is used.

The first of those is file holes.

Consider the sequence:
Open new file.
Write 2MB of data.
Truncate back to 1MB.
Seek to 2MB.
Write 1MB of data.

The file should now contain 1MB of real data, a 1MB "hole", then 1MB of real 
data.

So how do we go about remembering that the hole is there?  The most recent 
file header just tells us that the file is 3MB in size. If we only relied on 
that information then it would not be possible to tell which data in the file 
should be treated as part of the hole.

To this end, yaffs2 considers all the old headers too. After all, when a 
header was written, that was the file size and any data beyond that limit 
should be treated as deleted. Thus when the file header indicating the 
truncation back to 1MB is seen, we now know that any data beyond 1MB should 
be deleted.
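
As a rough illustration of that reasoning, here is a toy model of a scan 
that walks the log newest first. It is only a sketch, not yaffs2 code: the 
record layout, the 1MB "chunks" and the name live_chunks() are assumptions 
made up for this example.

```python
# Toy model, not yaffs2 code: a log record is ("data", offset) or
# ("header", file_size). Records are listed oldest first.
MB = 1024 * 1024

def live_chunks(log):
    """Return the offsets of the data chunks that survive a newest-first scan."""
    limit = float("inf")  # smallest file size seen in any newer header
    live = set()
    for kind, value in reversed(log):
        if kind == "header":
            limit = min(limit, value)
        elif kind == "data":
            # Keep the chunk only if every newer header recorded a size
            # beyond its offset and no newer chunk already covers it.
            if value < limit and value not in live:
                live.add(value)
    return live

log = [
    ("data", 0 * MB),      # write 2MB (two 1MB chunks, for brevity)
    ("data", 1 * MB),
    ("header", 1 * MB),    # truncate back to 1MB
    ("data", 2 * MB),      # seek to 2MB, write 1MB
    ("header", 3 * MB),    # most recent header: file is 3MB
]

# Same log with the truncation header lost.
log_without_truncate = [r for r in log if r != ("header", 1 * MB)]
```

With the truncation header present, the chunk at offset 1MB is dropped and 
the hole is preserved. With that header missing, the same chunk would be 
treated as live data.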

That file header that tells us this is now old but still contains useful 
information. We can't erase that information until the data chunks in the 
hole have all been erased.

This is where shrink header markers come in. The shrink header marker is a 
flag that tells us that the header cannot be erased (i.e. cannot be garbage 
collected) until the data in the hole has been erased.

Now clearly it would be very time consuming and complex to track the location 
of every chunk in every hole (there might, in theory, be thousands).

Instead we use a "trick". Since the data in the hole was written before the 
shrink header, it must have an older sequence number. Since it was deleted, 
the blocks holding that data must have some deleted chunks.

Therefore we can prevent the garbage collector from deleting the shrink header 
too early by ensuring that there are no older blocks with deleted data. That 
check is the job of yaffs_block_ok_for_gc().
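
The rule can be sketched as follows. This is a simplified model written for 
this explanation and not the actual source of yaffs_block_ok_for_gc(); the 
Block class, its fields and block_ok_for_gc() are hypothetical names.

```python
# Simplified model, not the real yaffs source. All names are hypothetical.
class Block:
    def __init__(self, seq, has_shrink_hdr, deleted_chunks):
        self.seq = seq                        # sequence number; lower is older
        self.has_shrink_hdr = has_shrink_hdr  # block holds a shrink header
        self.deleted_chunks = deleted_chunks  # deleted chunks not yet erased

def block_ok_for_gc(block, all_blocks):
    """Return True if collecting this block cannot lose shrink information."""
    if not block.has_shrink_hdr:
        return True
    # A shrink header may only go once no older block holds deleted data.
    return not any(
        b.seq < block.seq and b.deleted_chunks > 0 for b in all_blocks
    )

blocks = [
    Block(seq=1, has_shrink_hdr=False, deleted_chunks=5),
    Block(seq=2, has_shrink_hdr=True, deleted_chunks=0),
    Block(seq=3, has_shrink_hdr=False, deleted_chunks=10),
]
```

Here the block with sequence number 2 is held back for as long as the older 
block 1 still contains deleted chunks. The newer block 3 has no effect on it.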

Now a similar issue applies for deleted files. If we erase the file header 
that records a deletion before the file's data chunks have themselves been 
erased, those data chunks would be reconstructed into a file and placed in 
lost+found. Thus the shrink header mechanism is used again.
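
A toy model of the deleted-file case, again only a sketch with made-up 
record types and names, not yaffs2 code:

```python
# Toy model, not yaffs2 code: a record is ("data", obj_id) or
# ("deleted", obj_id). Records are listed oldest first.
def resurrected_objects(records):
    """Return the object ids whose data would be rebuilt into lost+found."""
    deleted = set()
    orphans = set()
    for kind, obj_id in reversed(records):  # scan newest first
        if kind == "deleted":
            deleted.add(obj_id)
        elif kind == "data" and obj_id not in deleted:
            # Data with no surviving record of the deletion looks like an
            # orphaned file to the scan.
            orphans.add(obj_id)
    return orphans

with_deletion_hdr = [("data", 7), ("data", 7), ("deleted", 7)]
hdr_erased_too_early = [("data", 7), ("data", 7)]
```

While the deletion header survives, the old data chunks of object 7 are 
ignored. Once it is gone and the chunks are still on flash, object 7 comes 
back as an orphan.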

The down side of shrink headers is that they can force the garbage collector 
to ignore blocks with a lot of garbage (i.e. blocks that are fast to gc). That 
is why I added the feature to only use shrink headers for larger holes.

I think there are some improvements that can be made - particularly around 
handling deleted files.

-- Charles