[Yaffs] More compact extended tags for yaffs2

Author: Charles Manning
Date:
To: yaffs mail list
Subject: [Yaffs] More compact extended tags for yaffs2

NB The following discussion only applies to using yaffs2 format. It does not
apply to yaffs1 format.

I know that Vitaly has asked for more compact tags for yaffs2 and I have
figured out a scheme that will work down to approx 15 bytes per OOB area.

I propose to have a scheme whereby the current tags structure is used if there
is space in the OOB and switch over to a more compact form if the space does
not allow.

The compact form will provide all the features of the normal form, except that
extended tags will not be written for very large files (>16Mbytes); normal
tags will be used for those instead.
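
The space-based switch described above could be sketched roughly as follows.
This is only an illustration: the enum, the function name and the idea of
passing the two tag sizes as parameters are assumptions, not existing yaffs
code.

```c
#include <assert.h>

/* Hypothetical names; not part of the yaffs sources. */
enum tags_format {
    TAGS_NORMAL,     /* current tags structure fits in the OOB */
    TAGS_COMPACT,    /* only the compact form fits */
    TAGS_DO_NOT_FIT  /* not even the compact form fits */
};

/*
 * Pick the tags format from the free OOB space.
 * normal_size and compact_size are the on-flash sizes (in bytes) of the
 * two forms; they are parameters because the exact sizes are still open.
 */
enum tags_format choose_tags_format(unsigned oob_free_bytes,
                                    unsigned normal_size,
                                    unsigned compact_size)
{
    assert(compact_size <= normal_size);

    if (oob_free_bytes >= normal_size)
        return TAGS_NORMAL;
    if (oob_free_bytes >= compact_size)
        return TAGS_COMPACT;
    return TAGS_DO_NOT_FIT;
}
```

The choice would presumably be made once per device, since the OOB layout does
not change between pages.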

Normal tags use "chunkId == 0" to signify an object header. The object header
is then read to extract various information (object type, file size, directory
for file etc). Having to read this information is costly (it costs a whole
NAND page read), so we introduce the idea of extended tags. Instead of using
"chunkId == 0" to signify an object header, we use just a single bit. We can
then reuse some of the tags space to store some extended information, thus
reducing the number of page reads and hence the mount/scan time.

At present, the tags use
    unsigned sequenceNumber;
    unsigned objectId;
    unsigned chunkId;
    unsigned byteCount;
    eccOther ecc_on_tags; /* another 12 bytes or so depending on architecture */

That adds up to approx 28 bytes of tag info. When used as extended tags, the
fields get reused.

This can be greatly reduced by reducing the storage for some of these items.
For example:

    4 bytes sequenceNumber
    3 bytes objectId
    3 bytes chunkId
    2 bytes byteCount
    2 bytes extra for extended tags
    2 bytes ecc_on_tags

would give us 16 bytes of tags.
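
As a rough sketch of that 16-byte layout, the fields could be packed by hand
into a byte buffer. The field widths and order are the ones listed above;
everything else (the struct and function names, little-endian byte order, and
leaving the ECC value to the caller) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define COMPACT_TAGS_SIZE 16

/* Hypothetical in-memory form; field names follow the list above. */
struct compact_tags {
    uint32_t sequenceNumber; /* 4 bytes on flash */
    uint32_t objectId;       /* 3 bytes on flash */
    uint32_t chunkId;        /* 3 bytes on flash */
    uint16_t byteCount;      /* 2 bytes on flash */
    uint16_t extra;          /* 2 bytes on flash, used by extended tags */
    uint16_t ecc;            /* 2 bytes on flash, computed elsewhere */
};

/* Store the low nbytes of value, least significant byte first (assumed order). */
void put_le(uint8_t *dst, uint32_t value, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        dst[i] = (uint8_t)(value >> (8 * i));
}

/* Read nbytes stored least significant byte first. */
uint32_t get_le(const uint8_t *src, int nbytes)
{
    uint32_t value = 0;
    for (int i = 0; i < nbytes; i++)
        value |= (uint32_t)src[i] << (8 * i);
    return value;
}

void compact_tags_pack(const struct compact_tags *t,
                       uint8_t out[COMPACT_TAGS_SIZE])
{
    /* The 3-byte fields can only hold 24-bit values. */
    assert(t->objectId < (1u << 24));
    assert(t->chunkId < (1u << 24));

    put_le(out + 0, t->sequenceNumber, 4);
    put_le(out + 4, t->objectId, 3);
    put_le(out + 7, t->chunkId, 3);
    put_le(out + 10, t->byteCount, 2);
    put_le(out + 12, t->extra, 2);
    put_le(out + 14, t->ecc, 2);
}

void compact_tags_unpack(const uint8_t in[COMPACT_TAGS_SIZE],
                         struct compact_tags *t)
{
    t->sequenceNumber = get_le(in + 0, 4);
    t->objectId = get_le(in + 4, 3);
    t->chunkId = get_le(in + 7, 3);
    t->byteCount = (uint16_t)get_le(in + 10, 2);
    t->extra = (uint16_t)get_le(in + 12, 2);
    t->ecc = (uint16_t)get_le(in + 14, 2);
}
```

Packing byte by byte avoids relying on compiler-specific struct packing, which
matters here because the 3-byte fields have no natural C type.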

Some sizing info:
* sequenceNumber is incremented every time a block is written so that we can
determine the sequence of writing the blocks. Therefore, the range needs to
be large enough to cover the number of blocks written during the product's
lifetime. Could possibly shave this down to 3 bytes for most applications.

* objectId needs to be big enough to hold the number of objects in the system
at any time. 2 bytes is probably enough for most systems, but 3 bytes is
definitely safe.

* byteCount. This is normally used to save the number of bytes used in a data
chunk. For 2k pages, a range that can hold 2k would be enough, but there are
folks out there using 4k and greater chunks. 2 bytes should be enough to
cover anyone.

* ECC storage requires 6 bits + 2 * (the number of bits needed to address a
byte within the protected area). So an ECC for up to a 16-byte area needs
6 + 2 * 4 = 14 bits, which fits in 2 bytes.
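
The ECC sizing rule can be checked with a few lines of C. The function names
are made up for this sketch, and it assumes that "bits required" means the
smallest number of bits that can index every byte of the area, which matches
the 16-byte example above.

```c
#include <assert.h>

/* Smallest number of bits that can index n_bytes distinct byte positions. */
unsigned index_bits(unsigned n_bytes)
{
    unsigned bits = 0;

    assert(n_bytes > 0 && n_bytes <= (1u << 16));
    while ((1u << bits) < n_bytes)
        bits++;
    return bits;
}

/* ECC size in bits for an area of n_bytes: 6 + 2 * index bits. */
unsigned ecc_bits(unsigned n_bytes)
{
    return 6 + 2 * index_bits(n_bytes);
}

/* The same size rounded up to whole bytes. */
unsigned ecc_bytes(unsigned n_bytes)
{
    return (ecc_bits(n_bytes) + 7) / 8;
}
```

For a 16-byte area this gives 14 bits, i.e. 2 bytes, so the ECC field does not
grow unless the protected area goes beyond 32 bytes.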

When we're using extended tags we need to hold:
* Sequence number.
* Object Id.
* A bit to say that it is an extended tags structure.
* Some way to tell what kind of object it is (3 bits is enough)
* A bit to indicate shadowing.
* Parent object Id (3 bytes) (stored in chunkId)
* File length (if it is a file) (stored in byteCount + another byte)
* Equivalent object (if it is a hardlink) (stored in byteCount)

3 bytes is only enough to save an extended header for a file up to 16MB. For
files > 16MB we don't store extended headers. This would make the scanning
slower for these files, but hopefully there'd be very few of them.
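
One possible way to fit the flags and the third file-length byte into the 2
extra bytes is sketched below. The bit positions, macro names and function
names are all invented for illustration; only the field sizes (1 bit for
extended, 3 bits for object type, 1 bit for shadowing, 3 bytes of file length)
come from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the 2 "extra" bytes. */
#define EXTRA_IS_EXTENDED   0x8000u /* bit 15: this is an extended tags structure */
#define EXTRA_TYPE_SHIFT    12      /* bits 14..12: object type */
#define EXTRA_TYPE_MASK     0x7u
#define EXTRA_SHADOWS       0x0800u /* bit 11: shadowing */
#define EXTRA_LEN_HIGH_MASK 0x00FFu /* bits 7..0: file length bits 23..16 */

#define COMPACT_MAX_FILE_LENGTH 0xFFFFFFu /* largest length that fits in 3 bytes */

/* Files longer than this get no extended header in the compact form. */
bool file_length_fits(uint32_t file_length)
{
    return file_length <= COMPACT_MAX_FILE_LENGTH;
}

/*
 * Build the extra field for a file header and return it.
 * The low 16 bits of the length go out through *byteCount.
 */
uint16_t extra_encode(unsigned object_type, bool shadows,
                      uint32_t file_length, uint16_t *byteCount)
{
    assert(object_type <= EXTRA_TYPE_MASK);
    assert(file_length_fits(file_length));

    *byteCount = (uint16_t)(file_length & 0xFFFFu);
    return (uint16_t)(EXTRA_IS_EXTENDED |
                      (object_type << EXTRA_TYPE_SHIFT) |
                      (shadows ? EXTRA_SHADOWS : 0u) |
                      (file_length >> 16));
}

bool extra_is_extended(uint16_t extra)
{
    return (extra & EXTRA_IS_EXTENDED) != 0;
}

unsigned extra_object_type(uint16_t extra)
{
    return (extra >> EXTRA_TYPE_SHIFT) & EXTRA_TYPE_MASK;
}

bool extra_shadows(uint16_t extra)
{
    return (extra & EXTRA_SHADOWS) != 0;
}

/* Rebuild the 24-bit file length from the extra field and byteCount. */
uint32_t extra_file_length(uint16_t extra, uint16_t byteCount)
{
    return ((uint32_t)(extra & EXTRA_LEN_HIGH_MASK) << 16) | byteCount;
}
```

A writer would call file_length_fits() first and fall back to a plain object
header, without extended info, when it returns false.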

By shaving objectIds down to 20 bits and a bit of fiddling we could probably
get the extended tags down to 14 bytes.

So... what do folk think about:
1) What is the target number of bytes for compact tags?
2) What number of files & file size should the compact tags be designed for?

Any other comments welcome.

-- Charles
