[Yaffs] Some clarifications of AUTOPLACE with mtd & YAFFS2

Author: Charles Manning
Date:  
To: yaffs, Vitaly Wool, Thomas Gleixner
Subject: [Yaffs] Some clarifications of AUTOPLACE with mtd & YAFFS2
Hi all

I would like to let the vitriol of the last week or so slide, but a lot of
it contained technical content that was wrong, so I'd like to
straighten out a few points on YAFFS2 and AUTOPLACE, hopefully in a
reasonably coherent fashion.

Three main assertions that have been made are wrong:
1) mtd interfaces are "golden" and cannot be changed. Thus YAFFS must go
through any required tricks to make a working solution.
2) That it is both possible and wise to attempt this.
3) I have not a clue what AUTOPLACE is all about and how it should work.

Below I make some comments on what Thomas has done. I am not levelling blame
at him. He is a very busy man and works to a set of priorities that are not
always what everyone else wants (besides, it is all open source so someone
else could also jump in and fix ;-)). I have the highest regard for Thomas'
knowledge and work.


A POTTED HISTORY OF THE AUTOPLACE NAND INTERFACE

Those that have been around YAFFS a long time will remember the original
integration problems because YAFFS wanted to see things that mtd was not
making available, in particular the oob info. To get this working, the first
releases of YAFFS needed mtd patches to work. At that time, mtd nand support
was very early days and soon the patch was no longer required as the mtd
layer came up to speed (damn good effort from Thomas).

YAFFS1 is designed for the SmartMedia data layout, hence the hard-wired
yaffs_Spare structures etc. When more, and different, flash layouts were
introduced, things like oob_sel were added to allow the generic nand code
to work with various byte layouts and ECC mechanisms. oob_sel has also
changed a few times. YAFFS1 suffered some minor traumas due to this, but
nothing too bad.

When I started designing YAFFS2, about 2.5 years ago now, one of the main
goals was to be able to support a much wider range of NAND devices and work
with different hardware ecc etc. So, one of the first areas that was
designed was the NAND interface. Clearly fixed binary structures would not
work properly and a level of abstraction was required. This drove the move
from yaffs_Spare to yaffs_ExtendedTags and the packed tags mechanisms.

As things slowly unfolded, I realised that it would be a better thing to get
buy-in from Thomas sooner, rather than later, as to how to progress 2k and
other support in mtd. It is far better to get a sensible interface in place
earlier than try to retro-fit things later. As YAFFS is a customer to those
interfaces, it made sense to get involved and discuss things. So in late
2003, Thomas and I started some discussions on this subject. [Much of the
discussion was on IRC, and some was email. I deleted some of my old email,
but I still have some stuff I sent in October 2003]. My starting point was a
more abstract NAND interface that did not require any knowledge of actual
byte placement. [See below for a technical outline of the rationale]. Thomas,
being a very knowledgeable fellow, also brought along a bunch of ideas - many of
them on the same wavelength. The results of all this were a definition of
the functionality, but not an absolute function call definition, of AUTOPLACE
and abstract bad block handling.

I then continued with YAFFS2 development, outside of Linux, and a YAFFS2
prototype was being stress tested by Christmas 2003. In January it was being
stress tested on large arrays of 2k page NAND. Note that this was a stripped
YAFFS1 and did not support simultaneous yaffs1/2 functionality or yaffs1
compatibility.

In approx April 2004, Thomas started working on the AUTOPLACE and bad block
handling. IIRC, this was all in place by the end of May.

I then set about retooling yaffs2 to support both yaffs2 and yaffs1 formats
through a backward compatibility layer (tags compatibility) - in essence a
fusion of the YAFFS1 code and the YAFFS2 prototype. This effort got somewhat
delayed by me taking some time off from YAFFS for personal reasons.

When I started checking out the YAFFS2<->mtd interfacing I used a small ram
emulation driver that I hacked up quickly. This conformed to the interface
that Thomas and I agreed on.

Then YAFFS2 was released on the world. A few people picked it up and started
playing. There were a few things that were patched that fixed some of the
fusion problems (e.g. Nick Bane's patches for compatibility mode).
Unfortunately it takes time before people switch over and start testing with
the 2k page devices. Some of the bugs discovered were due to some problems in
the mtd not doing the AUTOPLACE properly (!!shock! horror! Thomas is
human!!). JFFS2 had not shown this because it does not rely on oob to the
same extent. This was discussed on the list sometime in May 2005. Some
aspects of this were corrected, but it is not yet fully sound.

Since then, the AUTOPLACE thing has been somewhat of an open wound, awaiting
resolution. During that time, pragmatic people have worked around the
problem, however it has not been cured yet.

The good news though is that a fine fellow by the name of Vitaly Wool has
started looking at some issues and has clearly identified the problem (see
http://lists.infradead.org/pipermail/linux-mtd/2005-September/013949.html). I
therefore hope for some resolution to this pretty soon. If Vitaly's
suggestions come into being then the current code in YAFFS2 will work as it
is with no modifications.

So that is pretty much a history of the AUTOPLACE business. Things don't
always progress the way one hopes, and looking back I'd have done some things
a bit differently.

As to these mtd interfaces being golden kernel interfaces: Not many people
think so. Thomas doesn't. grep a whole Linux+yaffs source tree for read_oob
and you'll only see references in mtd and yaffs. It is not like we're saying
"change kmalloc".

There is no immediate solution, but it looks like the proper solution should
be there soon. I see Vitaly has proposed some patches. Thomas has asked me to
look at them and comment.

-- Charles



[Technical side bar: Why the correct solution is to fix it in mtd]

There are many NAND types and implementations of hardware ECC etc, which means
there are a lot of different ways to do bad block management and many
different byte layouts on NAND. For instance:
1) The default nand_base uses bytes 0 and 1 for bad block marking.
2) The HW_ECC for the S3C2410 uses these bytes for ECC.
3) Some, perhaps not designed yet, hardware might do something entirely
different.

Further, one NAND driver might be serving up data to many file systems, so it
is unreasonable to expect the driver to use any one file system's preferred
binary layout. This might even be impossible with some hardware anyway.

One of the fundamental tenets of Computer Science is to use abstract
interfaces to hide detail. It is this thinking which led to AUTOPLACE and
the abstract bad block handling interface. This is nothing new and examples
of abstract interfaces abound.

So the AUTOPLACE handling is there to provide a mechanism for abstracting away
the physical location of oob bytes. It says "here are some bytes, save them
wherever. When I ask for them back, then give them back. Whatever you do
behind the scenes I don't want to know".
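
To make that concrete, here is a rough C sketch of the idea. It is not the
mtd code and not what YAFFS2 calls: struct oob_layout, oob_autoplace_write()
and oob_autoplace_read() are names made up for this example, and both
layouts are invented. The caller hands over a run of logical bytes, and the
layout, which belongs to the driver, decides where they physically land.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OOB_SIZE   64   /* spare bytes per page on an imagined 2k-page part */
#define MAX_RANGES 4

/* Hypothetical: one run of oob bytes the driver leaves free for the fs. */
struct oob_free_range {
    size_t offset;
    size_t length;
};

/* Hypothetical per-driver layout; the file system never looks inside it. */
struct oob_layout {
    struct oob_free_range free[MAX_RANGES];
    size_t nranges;
};

/* Layout A (made up): bytes 0-1 hold the bad block marker, ECC at the end. */
const struct oob_layout layout_a = { .free = { { 2, 38 } }, .nranges = 1 };

/* Layout B (made up): ECC sits at the front, free bytes are split in two. */
const struct oob_layout layout_b = {
    .free = { { 8, 8 }, { 32, 24 } }, .nranges = 2
};

/* How many bytes the file system may hand over under this layout. */
size_t oob_avail(const struct oob_layout *l)
{
    size_t total = 0;
    for (size_t i = 0; i < l->nranges; i++)
        total += l->free[i].length;
    return total;
}

/* "Here are some bytes, save them wherever": scatter src into free ranges. */
int oob_autoplace_write(const struct oob_layout *l, unsigned char *oob,
                        const unsigned char *src, size_t len)
{
    if (len > oob_avail(l))
        return -1;                       /* does not fit, nothing written */
    for (size_t i = 0; i < l->nranges && len > 0; i++) {
        size_t n = l->free[i].length < len ? l->free[i].length : len;
        assert(l->free[i].offset + n <= OOB_SIZE);
        memcpy(oob + l->free[i].offset, src, n);
        src += n;
        len -= n;
    }
    return 0;
}

/* "When I ask for them back, give them back": gather in the same order. */
int oob_autoplace_read(const struct oob_layout *l, const unsigned char *oob,
                       unsigned char *dst, size_t len)
{
    if (len > oob_avail(l))
        return -1;
    for (size_t i = 0; i < l->nranges && len > 0; i++) {
        size_t n = l->free[i].length < len ? l->free[i].length : len;
        assert(l->free[i].offset + n <= OOB_SIZE);
        memcpy(dst, oob + l->free[i].offset, n);
        dst += n;
        len -= n;
    }
    return 0;
}
```

The same 16 bytes of tags go in and come back out under either layout. Only
the physical placement differs, and the caller never sees it.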

With an abstract interface, we can get around a lot of problems quite neatly:
1) Changes in mtd don't mean changes to YAFFS, or other file systems.
2) Implementation of some funky hardware can be done without implementing new
fields in oob_sel structures and having to fiddle with multiple bodies of code.
Data can also be handled optimally (for example, special hardware tricks can
be exploited).
3) Change in one place, test in one place, then it should "just work".
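
The same reasoning applies to the abstract bad block handling. As a rough
sketch only (struct nand_bb_ops, the two toy "drivers" and
fs_count_good_blocks() are all invented for illustration and are not mtd
interfaces), the file system side below is written once and works unchanged
with two drivers that record bad blocks in completely different ways:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NBLOCKS  8
#define OOB_SIZE 16

/* Hypothetical abstract interface: all the file system ever sees. */
struct nand_bb_ops {
    bool (*block_is_bad)(void *chip, size_t block);
    void (*mark_block_bad)(void *chip, size_t block);
};

/* Driver A (made up): marker lives in oob byte 0; 0xFF means good. */
struct chip_a {
    unsigned char oob[NBLOCKS][OOB_SIZE];
};

void chip_a_init(struct chip_a *c)
{
    memset(c->oob, 0xFF, sizeof c->oob);   /* erased flash reads as 0xFF */
}

static bool chip_a_is_bad(void *chip, size_t block)
{
    struct chip_a *c = chip;
    assert(block < NBLOCKS);
    return c->oob[block][0] != 0xFF;
}

static void chip_a_mark_bad(void *chip, size_t block)
{
    struct chip_a *c = chip;
    assert(block < NBLOCKS);
    c->oob[block][0] = 0x00;
}

const struct nand_bb_ops chip_a_ops = { chip_a_is_bad, chip_a_mark_bad };

/* Driver B (made up): oob byte 0 is used for something else, so this
 * driver keeps a bitmap instead. Bit n set means block n is bad. */
struct chip_b {
    unsigned char bad_bitmap;
};

static bool chip_b_is_bad(void *chip, size_t block)
{
    struct chip_b *c = chip;
    assert(block < NBLOCKS);
    return (c->bad_bitmap >> block) & 1u;
}

static void chip_b_mark_bad(void *chip, size_t block)
{
    struct chip_b *c = chip;
    assert(block < NBLOCKS);
    c->bad_bitmap |= (unsigned char)(1u << block);
}

const struct nand_bb_ops chip_b_ops = { chip_b_is_bad, chip_b_mark_bad };

/* File system side: written once, knows nothing about either layout. */
size_t fs_count_good_blocks(const struct nand_bb_ops *ops, void *chip)
{
    size_t good = 0;
    for (size_t b = 0; b < NBLOCKS; b++)
        if (!ops->block_is_bad(chip, b))
            good++;
    return good;
}
```

If a third driver turns up with yet another scheme, only that driver's two
functions need writing and testing; fs_count_good_blocks() stays as it is.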

If we don't use an abstract interface then we get into a world of pain and we
end up with code looking like
http://www.aleph1.co.uk/pipermail/yaffs/2005q4/001581.html, which still does
not handle all cases. This raises a bunch of problems including, but not
limited to:
1) Replicated code. The algorithms are already in mtd so we should rather use
them there.
2) mtd interaction. mtd changes. oobsel will most likely change again (it is
now in about its third or fourth iteration). We don't want to have to change
YAFFS and put in conditional code for even more changes.
3) Maintenance woes: We want to test YAFFS *once* and not have to test it
against a zillion different hardware types etc and handle all the associated
patches etc. When a problem is found, strict adherence to interfaces helps
isolate problems.
4) If someone makes some new hardware/drivers that do not use oobsel, then we
don't want to have to expose more detail to the outside world.
5) It is insane to use an abstract AUTOPLACE interface for some accesses, and
not for others. It is important to use abstract interfaces consistently,
otherwise why have them at all? [Analogy: When you write data to a serial
port driver, you use some relatively abstract interface like write_byte(),
you don't do something like outp(dev->uart->tx,b).]


-- Charles