I thought posting this discussion to the YAFFS list would perhaps be helpful
to some folks.

Regards

-- Charles

On Tue, 21 Jan 2003 14:12, Zastrow, Steve (MED) wrote:
> Charles,
>
> Thanks for the information, it was very helpful. In fact, it was so
> helpful that it has spawned another question (sorry) regarding
> journalling.
>
> First, here is my basic idea of a journalling filesystem: A journalling
> filesystem "journals" any write requests that it receives so that if
> there is an unexpected reboot or power failure while a write is in
> progress, the file system on startup can complete any incomplete
> journalled write requests, and the files can be returned to a valid
> state.

The basic idea of a journalling fs is that it does not try to replace data
in the fs directly; instead it writes "update records". Thus, in the event
of a power failure etc, the fs state can be recovered by rolling forward
from a known point by applying these patches.

> The problem with this, of course, is that the file system doesn't really
> know what a "valid" state is. For example, if a particular file has a
> checksum value that must be updated any time there are changes to the
> rest of the file, the file system wouldn't know that the file has to be
> left with a valid checksum when all of the journalled write requests
> have been processed. Somehow, the file system would need to know that
> it isn't appropriate to apply changes to a file unless there is a
> corresponding update to the checksum value. Stated more generally, the
> filesystem would need to know that some sets of writes need to be
> performed as a group and that if not all of the writes can be performed
> then none of the writes should be performed.

This is really "transaction awareness" and is a common Achilles heel for
databases on top of file systems.

OK, I don't really know enough about JFS, ext3 or XFS, so I will
fast-forward past these and get on to YAFFS.

YAFFS takes journalling to the extreme. Every bit of data in a YAFFS fs is
in the form of a fixed-size "chunk". A chunk == a NAND page for now. Each
chunk has tags, the most important of which are the object id and the
chunk id (ie the offset of the chunk within the file). The YAFFS media is
therefore just a big bag of these records. When the fs is booted, YAFFS
scans the media and looks at each chunk's tags. From this, the file system
state is rebuilt. As you write to the file, new chunks are added (and
overwritten chunks are deleted). There is obviously also some gc in there
too.

The tags also include a serial number (2 bits). When data is overwritten,
the new data chunk is written first, before the old data chunk is deleted.
This means we don't get left with holes. If the power were to go down
halfway and both chunks were present, then we use the serial number to
decide which one to use.

> My question for you is: How does the file system know to treat several
> writes together as a single unit so that it will either apply all of
> them or none of them? Does it know to group writes on the basis of when
> flushes occur?

YAFFS does not support roll-back/forward committing (ie it is not a
transaction-oriented fs) and this would be really difficult to add to the
current YAFFS (for reasons that need some serious in-depth understanding).
This could potentially be added to YAFFS2 (in the pipe) quite simply,
because YAFFS2 provides more info on the order of events and we have the
flexibility to, eg, roll forward to the last flush and ignore stuff after
the last flush. YAFFS2 should be around in approx a month or so.
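To make the chunk/tags idea above a bit more concrete, here is a rough
sketch. The names and field widths are made up for illustration and are
not the real YAFFS definitions; it just shows what the per-chunk tags
carry and how a 2-bit serial number can be used to pick between two
copies of the same chunk found during a scan:

    /* Illustrative sketch only -- names and field sizes are made up,
     * not the real YAFFS structures. */
    #include <stdio.h>

    struct chunk_tags {
        unsigned object_id;   /* which file/object this chunk belongs to */
        unsigned chunk_id;    /* position of the chunk within the file   */
        unsigned byte_count;  /* how many bytes of the chunk are valid   */
        unsigned serial : 2;  /* 2-bit serial number, wraps 0..3         */
    };

    /* If scanning finds two chunks with the same object_id and chunk_id
     * (ie a replacement was written but the old copy was not yet
     * deleted), the one whose serial number is "one ahead" (mod 4) is
     * the newer copy. */
    static int is_newer(const struct chunk_tags *a, const struct chunk_tags *b)
    {
        return ((a->serial - b->serial) & 3) == 1;
    }

    int main(void)
    {
        struct chunk_tags old_copy = { 42, 7, 512, 1 };
        struct chunk_tags new_copy = { 42, 7, 512, 2 }; /* written before old is deleted */

        printf("use the %s copy\n", is_newer(&new_copy, &old_copy) ? "new" : "old");
        return 0;
    }

The byte count field above is only there for illustration; the key point
is that object id + chunk id + serial number are enough to rebuild the
file state from a scan.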
At present, interrupting the sequence below will result in only some of
the writes happening. However, since YAFFS has no FAT or similar structure
stored on disk, you will not get a corrupt file system (like you could get
under FAT or ext2).

> To restate it a different way, if the system rebooted between steps 6
> and 7 below, would it complete "data update 3" and "data update 4" even
> though the requests haven't been flushed yet, or would it skip them
> because a flush had not occurred yet to indicate a complete set of
> writes?
>
> 1: fopen(...);  // open file
> 2: fwrite(...); // data update 1
> 3: fwrite(...); // data update 2
> 4: fwrite(...); // checksum update 1
> 5: fwrite(...); // data update 3
> 6: fwrite(...); // data update 4
> 7: fwrite(...); // checksum update 2
> 8: fclose(...); // close file and flush buffer
>
> My reasons for asking this are that I'm wondering if we need to use
> flushing as the basis for "grouping" sets of writes that need to be
> completed together. If so, then I need to notify my team to use
> flushing in this manner. If not, and there is no way of grouping writes
> together, then we will probably need to add additional protection on top
> of journalling.

Generally, flushing is a good idea since that commits the file to media.
I assume you're using Linux. If you are, and your writes are not huge,
then the writes will likely be going into the cache and be flushed out
when you call flush. This would indeed reduce the likelihood of a
corruption. You could do some tracing to confirm this.

I assume you're doing some sort of database. Generally it is a good idea
to structure that database so as to be able to recover from the situation
where you have incomplete transactions.
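Just as an illustration of the application-side grouping I mean (this is
only a sketch: the record format and checksum scheme are made up, and a
flush still does not make the group atomic, it only narrows the window
for a partial write), each set of related writes could be written
together with its checksum and flushed as one unit:

    /* Sketch only: a hypothetical record with a trailing checksum,
     * written and flushed as one group. */
    #include <stdio.h>
    #include <unistd.h>   /* fsync, fileno */

    static unsigned long checksum(const unsigned char *p, size_t n)
    {
        unsigned long sum = 0;
        while (n--)
            sum += *p++;
        return sum;
    }

    int write_record(FILE *f, const unsigned char *data, size_t len)
    {
        unsigned long sum = checksum(data, len);

        if (fwrite(data, 1, len, f) != len)       /* data updates          */
            return -1;
        if (fwrite(&sum, sizeof sum, 1, f) != 1)  /* matching checksum     */
            return -1;

        if (fflush(f) != 0)                       /* push C library buffers */
            return -1;
        return fsync(fileno(f));                  /* ask the OS to commit   */
    }

On startup the application can then discard any trailing record whose
checksum does not match, which is the "recover from incomplete
transactions" structuring mentioned above.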
-- Charles

> Thanks very much for the help,
>
> Steve
>
>
> -----Original Message-----
> From: Charles Manning [mailto:manningc2@actrix.gen.nz]
> Sent: Monday, January 20, 2003 3:03 PM
> To: Laurie van Someren; Zastrow, Steve (MED)
> Subject: Re: Inquiry About YAFFS
>
>
> Steve
>
> YAFFS is intended for embedded systems and is not really a hard disk
> replacement. Hard disk is cheaper per byte and faster for many systems.
> However, there are many situations where you don't want the size, noise
> and power consumption of a hard disk (eg. set top box, hand held devices
> etc).
>
> I will try to answer your questions...
>
> > - Does it do load leveling?
>
> I assume you mean "wear levelling" (ie. moving stuff around to ensure
> that the flash does not only get erased in one place).
> Currently, YAFFS does not do any explicit wear levelling. However, the
> way that blocks are allocated and freed does cause some wear levelling
> to happen. I will be adding some explicit wear levelling in the future
> (maybe in the next couple of weeks).
>
> Even without explicit wear levelling, this does not seem to be an issue.
> If a block wears out, then it is retired. The journalling strategy used
> in YAFFS means that there is no position awareness (ie the data does not
> have to be in a particular physical area of the NAND). NAND, anyway, is
> shipped with bad blocks, so YAFFS must cater for these anyway (as well
> as blocks going bad during run-time). I have done accelerated lifetime
> testing on YAFFS. One device has had over 200GB of data written to it
> with no failures.
>
> > - Is it smart enough not to erase and rewrite a block that contains
> > the exact same data that is about to be written?
>
> YAFFS does not really work like this. When you write to a file it writes
> what you ask to be written. Unlike FAT file systems, this does not incur
> any performance issues. YAFFS outperforms any other file system on NAND.
>
> > - Will it allow multiple reads at the same time?
>
> Yes, from the application level. Internally though, YAFFS is locked so
> that only one thread is in YAFFS at a time. This is done to simplify the
> code (and hence keep it robust). This does not impact on performance
> since almost all YAFFS operations are very fast. Most of the time in a
> YAFFS operation is actually spent transferring the data off/on to NAND.
> Only one area of NAND can be accessed at a time.
>
> > - Will it allow reading and writing at the same time?
>
> Yes, as per above.
>
> > - Will it allow multiple writes at the same time?
>
> Yes, as per above.
>
> > - Does it cause the application to block while you are reading?
>
> Yes. All file systems will do that. They have to block until the data is
> available to be able to return from the read (unless the data is already
> cached).
>
> > - Does it cause the application to block while you are writing?
>
> Yes. Write caching does speed things up though.
>
> > - What are the important differences in performance to be expected
> > between the flash driver and a real physical disk drive?
>
> The most important differences are really:
> * NAND bandwidth is lower than a hard drive's. Thus, reads and writes
>   are slower.
> * However, there is no mechanical seek time for NAND, therefore random
>   reads can be faster.
>
> > Would it be possible to have someone respond to these?
>
> Please let me know if there are any further issues.
>
> -- Charles

---------------------------------------------------------------------------
This mailing list is hosted by Toby Churchill open software
(www.toby-churchill.org). If mailing list membership is no longer wanted
you can remove yourself from the list by sending an email to
yaffs-request@toby-churchill.org with the text "unsubscribe" (without the
quotes) as the subject.