[Yaffs] Object (file) creation fix seems to have problems

bbosch@iphase.com
Thu, 6 Jan 2005 01:35:31 -0600


Charles,

I recently updated my kernel to the latest CVS YAFFS code and
discovered rather serious filesystem corruption apparently triggered
by heavy file unlink and creation activity.  The symptoms are easily
reproduced by repeatedly extracting a tar archive containing several
files and symbolic links in an initially empty YAFFS file system.
Soon, tar reports "tar: Couldnt remove old file: Directory not empty"
for a random file which was not supposed to be a directory!  Other
symptoms are YAFFS errors which read "**>> yaffs chunk 792 was not
erased **>> yaffs write required 2 attempts".
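
A minimal sketch of such a reproduction loop, assuming a POSIX shell and
a tar that supports -C on the target.  The function name repro_extract,
its arguments, and the default pass count are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical helper: extract the same archive over itself repeatedly.
#   $1 = tar archive containing regular files and symbolic links
#   $2 = directory on the file system under test (initially empty)
#   $3 = number of passes (assumed default: 50)
repro_extract() {
    archive=$1
    target=$2
    count=${3:-50}
    i=1
    while [ "$i" -le "$count" ]; do
        # Every pass after the first replaces the files and symlinks
        # left by the previous pass, which drives unlink/create traffic.
        if ! tar -xf "$archive" -C "$target"; then
            echo "extraction failed on pass $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count passes without a tar error"
}
```

It could be invoked as, for example, "repro_extract /tmp/rootfs.tar /mnt 100",
where both paths are placeholders.  A non-zero return only means tar itself
reported an error; the kernel log still has to be checked separately for
YAFFS messages.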

After the errors, the filesystem shows corrupted directories with ls
output like:

~ # ls -l /mnt/bin
ls: /mnt/bin/[long run of 0xFF bytes, abbreviated]: No such file or directory
ls: /mnt/bin/[long run of 0xFF bytes, abbreviated]: No such file or directory
ls: /mnt/bin/[long run of 0xFF bytes, abbreviated]: No such file or directory
ls: /mnt/bin/[long run of 0xFF bytes, abbreviated]: No such file or directory
lrwxrwxrwx    1 root     root           14 Dec  8 18:24 [ -> ../bin/busybox
lrwxrwxrwx    1 root     root           14 Dec  8 18:24 ash -> ../bin/busybox
lrwxrwxrwx    1 root     root           14 Dec  8 18:24 awk -> ../bin/busybox
lrwxrwxrwx    1 root     root           14 Dec  8 18:24 basename -> ../bin/busybox

Unmounting and remounting the file system seems to make the directory
corruption go away (at least most of the time).

My kernel is based on 2.4.26.  The architecture is ppc.  YAFFS is
running over a pretty stock MTD/NAND layer.

A condensed summary of configuration from my Makefile:

#USE_RAM_FOR_TEST = -DCONFIG_YAFFS_RAM_ENABLED
USE_MTD = -DCONFIG_YAFFS_MTD_ENABLED
#USE_OLD_MTD = -DCONFIG_YAFFS_USE_OLD_MTD
#USE_NANDECC = -DCONFIG_YAFFS_USE_NANDECC
#USE_WRONGECC = -DCONFIG_YAFFS_ECC_WRONG_ORDER
USE_GENERIC_RW = -DCONFIG_YAFFS_USE_GENERIC_RW
#USE_HEADER_FILE_SIZE = -DCONFIG_YAFFS_USE_HEADER_FILE_SIZE
#IGNORE_CHUNK_ERASED = -DCONFIG_YAFFS_DISABLE_CHUNK_ERASED_CHECK
#IGNORE_WRITE_VERIFY = -DCONFIG_YAFFS_DISBLE_WRITE_VERIFY
ENABLE_SHORT_NAMES_IN_RAM = -DCONFIG_SHORT_NAMES_IN_RAM

I have isolated the change which introduced this behavior to the CVS
changes made on 10/21/2004.  That is, "cvs diff -c -D 2004/10/20 -D
2004/10/21" shows the changes that seem to be causing the problem.
The 2004/10/20 CVS snapshot seems to work fine, and I would just drop
back to that revision, but, of course, that would leave the bug which
Michael found to bite me later.

I'm not familiar enough with the VFS layer to guess at the cause, but
this is quite reproducible.  Any ideas where to look?  Any suggestions
on narrowing this down to a specific VFS interaction?

BTW, my trek back thru CVS history might have been less confusing with
fewer "empty log messages".  :-)

Thanks,

--Brad Bosch


Quite some time ago, Charles Manning wrote:
 >
 > I have just checked in changes to yaffs_fs.c, yaffs_guts.c, yaffs_guts.h to
 > fix this problem.
 >
 > Now yaffs Objects in the object look up hash table are not freed until the
 > corresponding inode is cleared.
 >
 > I did some tests with a smaller bucket size (8) and observed that the
 > recycling problem does not happen.  Object numbers are now only recycled when
 > the Linux cache tells us it is OK.
 >
 > This mechanism does not use any new kernel calls and should thus be good with
 > older kernels.
 >
 > Thanx to Michael for his efforts in hunting down the problem.
 >
 > -- Charles