[Yaffs] More compact extended tags for yaffs2

Wed Feb 15 23:24:24 GMT 2006

NB The following discussion only applies to using yaffs2 format. It does not 
apply to yaffs1 format.

I know that Vitaly has asked for more compact tags for yaffs2 and I have 
figured out a scheme that will work down to approx 15 or so bytes per OOB 
area.

I propose to have a scheme whereby the current tags structure is used if there 
is space in the OOB and switch over to a more compact form if the space does 
not allow.

The compact form will provide all the features of the normal form, except for 
very large files (> 16 MB), for which extended tags will not be written and the 
normal tags will be used instead.

Normal tags use "chunkId == 0" to signify an object header. The object header 
is then read to extract various information (object type, file size, parent 
directory of the file etc). Having to read this information is costly (it 
costs a whole NAND page read), so we introduce the idea of extended tags. 
Instead of using "chunkId == 0" to signify an object header, we use just a 
single bit. We can then reuse some of the tags space to store some extended 
information, thus reducing the amount of data we read and so reducing the 
mount/scan time.

At present, the tags use
	unsigned sequenceNumber;
	unsigned objectId;
	unsigned chunkId;
	unsigned byteCount;
	eccOther ecc_on_tags;	/* another 12 bytes or so, depending on architecture */

That adds up to approx 28 bytes of tag info. When used as extended tags, the 
fields get reused.

This can be greatly reduced by reducing the storage for some of these items. 
For example:

	4 bytes sequenceNumber
	3 bytes objectId
	3 bytes chunkId
	2 bytes byteCount
	2 bytes extra for extended tags
	2 bytes ecc_on_tags

would give us 16 bytes of tags. 
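
To make the byte counts concrete, here is a rough sketch of how such a 
16-byte layout could be packed and unpacked. Nothing here is existing yaffs 
code: the struct name, the function names, the field order and the 
little-endian byte order are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical 16-byte layout following the byte counts listed above.
 * Field order and little-endian byte order are assumptions. */
#define COMPACT_TAGS_SIZE 16

struct compact_tags {
	uint32_t sequence_number;	/* stored in 4 bytes */
	uint32_t object_id;		/* stored in 3 bytes */
	uint32_t chunk_id;		/* stored in 3 bytes */
	uint16_t byte_count;		/* stored in 2 bytes */
	uint16_t extra;			/* 2 bytes extra for extended tags */
	uint16_t ecc;			/* 2 bytes ECC on the tags */
};

/* Store the low nbytes of value, least significant byte first. */
static void put_le(uint8_t *dst, uint32_t value, int nbytes)
{
	for (int i = 0; i < nbytes; i++)
		dst[i] = (uint8_t)(value >> (8 * i));
}

/* Read nbytes back, least significant byte first. */
static uint32_t get_le(const uint8_t *src, int nbytes)
{
	uint32_t value = 0;

	for (int i = 0; i < nbytes; i++)
		value |= (uint32_t)src[i] << (8 * i);
	return value;
}

/* Returns 0 on success, -1 if objectId or chunkId needs more than 3 bytes. */
int compact_tags_pack(const struct compact_tags *t,
		      uint8_t out[COMPACT_TAGS_SIZE])
{
	if (t->object_id > 0xFFFFFFu || t->chunk_id > 0xFFFFFFu)
		return -1;

	put_le(out + 0, t->sequence_number, 4);
	put_le(out + 4, t->object_id, 3);
	put_le(out + 7, t->chunk_id, 3);
	put_le(out + 10, t->byte_count, 2);
	put_le(out + 12, t->extra, 2);
	put_le(out + 14, t->ecc, 2);
	return 0;
}

void compact_tags_unpack(const uint8_t in[COMPACT_TAGS_SIZE],
			 struct compact_tags *t)
{
	t->sequence_number = get_le(in + 0, 4);
	t->object_id = get_le(in + 4, 3);
	t->chunk_id = get_le(in + 7, 3);
	t->byte_count = (uint16_t)get_le(in + 10, 2);
	t->extra = (uint16_t)get_le(in + 12, 2);
	t->ecc = (uint16_t)get_le(in + 14, 2);
}
```

A -1 from compact_tags_pack() would be the cue to fall back to the normal 
tags for that chunk.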

Some sizing info:
* sequenceNumber is incremented every time a block is written so that we can 
determine the sequence of writing the blocks. Therefore, the range needs to 
be large enough to cover the number of blocks written during the product's 
lifetime.  Could possibly shave this down to 3 bytes for most applications.

* objectId needs to be big enough to hold the number of objects in the system 
at any time.  2 bytes is probably enough for most systems, but 3 bytes is 
definitely safe.

* byteCount. This is normally used to save the number of bytes used in a data 
chunk. For 2k pages, a range that can hold 2k would be enough, but there are 
folks out there using 4k and greater chunks. 2 bytes should be enough to 
cover anyone.

* ECC storage requires 6 bits + 2 * (the number of bits needed to index the 
bytes being covered). So an ECC for an area of up to 16 bytes needs 
6 + 2 * 4 = 14 bits, which fits in 2 bytes.
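
The ECC sizing rule above can be written down as a small helper. This is 
only a sketch of the arithmetic in the bullet, not an ECC implementation, 
and the function names are made up for the example.

```c
/* Number of bits needed to index every byte of an nbytes-long area,
 * i.e. ceil(log2(nbytes)). */
static unsigned bits_to_index(unsigned nbytes)
{
	unsigned bits = 0;

	while (bits < 31 && (1u << bits) < nbytes)
		bits++;
	return bits;
}

/* ECC size in bits for an area of nbytes, using the rule quoted above:
 * 6 bits + 2 * (bits needed to index the bytes). */
unsigned tags_ecc_bits(unsigned nbytes)
{
	return 6 + 2 * bits_to_index(nbytes);
}

/* The same size rounded up to whole bytes of OOB space. */
unsigned tags_ecc_bytes(unsigned nbytes)
{
	return (tags_ecc_bits(nbytes) + 7) / 8;
}
```

For a 16-byte area this gives 14 bits, i.e. 2 bytes, matching the figure 
above.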

When we're using extended tags we need to hold:
* Sequence number.
* Object Id.
* A bit to say that it is an extended tags structure.
* Some way to tell what kind of object it is (3 bits is enough)
* A bit to indicate shadowing.
* Parent object Id (3 bytes) (stored in chunkId)
* File length (if it is a file) (stored in byteCount + another byte)
* Equivalent object (if it is a hardlink) (stored in byteCount)
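
As a rough illustration of how the flag bits and the extra file-length byte 
might share the 2 "extra" bytes, here is one possible bit assignment. The 
bit positions, macro names and function names are all assumptions for the 
sake of the example; the scheme above does not fix them.

```c
#include <stdint.h>

/* Hypothetical bit assignment within the 2 "extra" bytes. */
#define EXTRA_IS_EXTENDED	0x8000u	/* bit 15: this is an extended tags structure */
#define EXTRA_IS_SHADOW		0x4000u	/* bit 14: shadowing */
#define EXTRA_TYPE_SHIFT	11	/* bits 13..11: object type (3 bits) */
#define EXTRA_TYPE_MASK		0x7u
#define EXTRA_LEN_HI_MASK	0xFFu	/* bits 7..0: top byte of a 3-byte file length */

/* Build the "extra" field for an object header chunk. */
uint16_t extra_pack(unsigned obj_type, int is_shadow, uint32_t file_length)
{
	unsigned extra = EXTRA_IS_EXTENDED;

	if (is_shadow)
		extra |= EXTRA_IS_SHADOW;
	extra |= (obj_type & EXTRA_TYPE_MASK) << EXTRA_TYPE_SHIFT;
	extra |= (file_length >> 16) & EXTRA_LEN_HI_MASK;
	return (uint16_t)extra;
}

/* Low 2 bytes of the file length, which go into the byteCount field. */
uint16_t extra_len_low(uint32_t file_length)
{
	return (uint16_t)(file_length & 0xFFFFu);
}

/* Rebuild the 3-byte file length from "extra" and byteCount. */
uint32_t extra_file_length(uint16_t extra, uint16_t byte_count)
{
	return ((uint32_t)(extra & EXTRA_LEN_HI_MASK) << 16) | byte_count;
}

unsigned extra_obj_type(uint16_t extra)
{
	return (extra >> EXTRA_TYPE_SHIFT) & EXTRA_TYPE_MASK;
}
```

This uses 13 of the 16 bits, so there would still be 3 spare bits in the 
"extra" field under this particular assignment.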

3 bytes is only enough to save an extended header for a file up to 16MB. For 
files > 16MB we don't store extended headers. This would make the scanning 
slower for these files, but hopefully there'd be very few of them.
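
The fallback decision itself is just a range check on the file length. A 
minimal sketch, with hypothetical names, assuming the length is kept in 
exactly 3 bytes:

```c
#include <stdint.h>

#define COMPACT_LEN_BYTES	3
/* Largest length that fits in 3 bytes: 2^24 - 1, one byte short of 16MB. */
#define COMPACT_MAX_FILE_LEN	((UINT32_C(1) << (8 * COMPACT_LEN_BYTES)) - 1)

/* Nonzero if an extended header can be stored for a file of this length;
 * zero means skip the extended header and use the normal tags instead. */
int file_fits_compact_extended(uint64_t file_length)
{
	return file_length <= COMPACT_MAX_FILE_LEN;
}
```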

By shaving objectIds down to 20 bits and a bit of fiddling we could probably 
get the extended tags down to 14 bytes.

So... what do folk think about:
1) What is the target number of bytes for compact tags?
2) What number of files & file size should the compact tags be designed for?

Any other comments welcome.

-- Charles