Re: [Yaffs] YAFFS readdir rewind causing endless listings

Top Page
Attachments:
Message as email
+ (text/plain)
Delete this message
Reply to this message
Author: Rong Shen
Date:  
To: yaffs
Subject: Re: [Yaffs] YAFFS readdir rewind causing endless listings
> On Friday 07 November 2008 04:23:39 Ian McDonnell wrote:
>> Andrew, Yang,
>>
>> On Wednesday 05 November 2008, Andrew McKay wrote:
>> > Yang Chen wrote:
>> > > Andrew,
>> > >
>> > > Thanks for your info. I will try the fix. Btw , have you
>> > > tested below fix with more cases?
>> >
>> > I've been running the Kernel on a few of our systems with this
>> > fix and haven't found any weird issues yet. It was more
>> > important that ls wasn't getting stuck in a loop. Our system
>> > would lock up,.
>> >
>> > > I searched archived mails and found an old mailloop
>> > > regarding the rewind change . Looks like that was for rm
>> > > -rf issues of busybox ...
>> >
>> > If rm -rf did leave something behind, that's less of a problem
>> > for us. At least we could modify our scripts to handle that
>> > case. Check for the existence of the directory after it's
>> > removal and try again.
>> >
>> > My coworker had done some digging for info for me and came up
>> > with this:
>> >
>> > --------------------------------------------------------------
>> >----------------- Some stuff from the posix standard:
>> >
>> > "If a file is removed from or added to the directory after the
>> > most recent call to opendir() or rewinddir(), whether a
>> > subsequent call to readdir() returns an entry for that file is
>> > unspecified."
>> >
>> > In other words, "readdir" is not expected to care if new files
>> > are added to a directory while it is running.
>> >
>> > <http://www.opengroup.org/onlinepubs/009695399/functions/readd
>> >ir.html>
>> >
>> > Rm is not guaranteed to succeed in the posix std.
>> >
>> > "If this fails for any reason, rm shall write a diagnostic
>> > message to standard error, do nothing more with the current
>> > file, and go on to any remaining files."
>> >
>> > <http://www.opengroup.org/onlinepubs/009695399/utilities/rm.ht
>> >ml>
>> >
>> > Also,
>> >
>> > "Some implementations do not permit the removal of the last
>> > link to an executable binary file that is being executed; see
>> > the [EBUSY] error in the unlink() function defined in the
>> > System Interfaces volume of IEEE Std 1003.1-2001. Thus, the rm
>> > utility can fail to remove such files."
>> > --------------------------------------------------------------
>> >-----------------
>> >
>> > So rm -rf leaving stuff behind doesn't break posix standards
>> > and is definitely less of an issue than readdir never
>> > returning NULL and putting applications into endless loops, or
>> > causing them to use up all the memory in the system (ls
>> > buffers everything before trying to write it out to the
>> > terminal).
>> >
>> > Andrew
>>
>> The problem IIRC is that Yaffs does not have a real directory
>> structure per se. Nearly all other filesystems have some form
>> of on-media directory structure. When reading a directory
>> (opendir/readdir) one normally moves over the entries in a
>> directory just like reading a file. When an entry is removed
>> from a directory, the slot occupied by the entry is left in
>> place, marked as free. The readdir operation simply skips over
>> the free slots.
>>
>> This is not how Yaffs works, Yaffs directories are more like
>> in-core linked lists. When a directory entry is removed, it is
>> removed from the list and all subsequent entries appear to move
>> up the list.
>>
>> When reading a directory, there is a notion of a current
>> position, a cursor -- this paradim does not map easily onto a
>> linked list. It works okay when the directory has a real
>> structure that still contains the slot that was occupied by a
>> removed entry.
>>
>> The earlier change to Yaffs, rewinding the 'cursor', was to help
>> fix a problem with 'rm -rf' and the like. If the 'rm' command
>> only performs a single pass over a directory, think linked-list,
>> then it may miss an entry if its 'cursor' is to the right of an
>> removed entry. As the 'rm' command 'unlink's entries, the list
>> shrinks and entries move to the left, but the 'cursor' keeps
>> advancing. The solution to 'rm -rf' problem is to repeatedly
>> pass over the directory until it stops succeeding at 'unlink'ing
>> entries. Apparently, busybox 'rm' does (did?) not do this.
>> Busybox is very often used with Yaffs on Linux, and but Yaffs is
>> perceived to be the odd one (filesystem) out, thus the push for
>> the earlier 'fix'. But Yaffs is not alone, there are several
>> filesystems that have similar issues -- usually caused by the
>> same underlying issue -- that directories may dynamic objects,
>> not simply flat files.
>>
>> Careful thought should be put into this issue and any changes.
>> Tweaking for one case may easily break another. Seams like both
>> Yaffs AND busybox could be improved here.
>>
>> -imcd
>
> Ian is correct there.
>
> The change was done to correct for the way busybox struggled to do an 'rm *'
> and this was a pragmatic change for people who were doing an rm * but then
> still having files left behind.
>
> The best solution is probably to handle the position in the same way that
> yaffs direct and WinCE do (done to fix the busy-box-like sparse rm *
> problem): attach a "cursor structure" to the next object in the list. See
> yaffsfs_DirectorySearchContext in yaffsfs.c.
>
> This means that if we get a dir rewind the directory starts from scratch, but
> if we get a second read, we can check that the offset matches the current
> cursor offset. The happy path is that it will and we just proceed from that
> point
>
> I think that will catch both corner cases (busybox rm and the big dir).
>
> Does anyone else want to give this a stab?
>
> -- CHarles
>


While I was debugging my issue posted in "a bug regarding when space
running out - shown in example",
I managed to narrow it down to the same issue described here. I'm
using 2.4.30 kernel with the latest Yaffs2
cvs check out, hopefully my discussion will still apply here.

My tracing showed the endless loop is caused by,

1. busybox calls to readdir()

2. libc (uclibc in my case) calls to __getdents() to fill the dirent
buffer, which is followed by a lseek() as it
wants to make sure the next call to __getdents() continues from the
previous read position.

3. lseek() hits the default kernel function default_llseek() as Yaffs
doesn't provide one. (it seems
generic_file_llseek() should be used, but it's slightly off-topic)

4. default_llseek() sets file->f_version to ++event, which causes it
no longer same as inode->i_version, which
then hits Todd's patch in yaffs_readdir() and causes yaffs_readdir()
to list the directory files from the beginning,
over and over again.

This issue only happens if the listed directory has too many files to
be returned in one yaffs_readdir() call.

f_version and i_version IIRC are used by some file systems to make
sure file->f_pos hasn't been changed and neither
the directory, so f_pos can be trusted and used in the following
readdir() call without being revalidated. It seems to me
you only need to do f_pos revalidating for stream based directory
implementation. In the case of Yaffs, f_pos is
just used to point to a position in the double link list, it's no
longer important (or critical) to track the f_pos change in
*_llseek() calls. We can still have i_version to track directory
changes, so we can reset it for the case of 'rm -rf'.

So here comes my proposal to solve the endless listing issue, It's
actually pretty simple - define a llssek() on directory
operation. What I did for my 2.4 kernel was just copying
generic_file_llseek(), deleting the line changing f_version, and
pasting
the rest to yaffs_fs.c, shown below.


@@ -157,6 +157,7 @@
 static int yaffs_sync_object(struct file *file, struct dentry *dentry,
                  int datasync);


+static loff_t yaffs_dir_llseek(struct file *file, loff_t offset, int origin);
static int yaffs_readdir(struct file *f, void *dirent, filldir_t filldir);

 #if (LINUX_VERSION_CODE > KERNEL_VERSION(2,5,0))
@@ -302,6 +303,7 @@
     .read = generic_read_dir,
     .readdir = yaffs_readdir,
     .fsync = yaffs_sync_object,
+    .llseek = yaffs_dir_llseek,
 };


 static struct super_operations yaffs_super_ops = {
@@ -951,6 +953,29 @@
     return nWritten == 0 ? -ENOSPC : nWritten;
 }


+static loff_t yaffs_dir_llseek(struct file *file, loff_t offset, int origin)
+{
+    long long retval;
+
+    switch (origin) {
+        case 2:
+            offset += file->f_dentry->d_inode->i_size;
+            break;
+        case 1:
+            offset += file->f_pos;
+    }
+    retval = -EINVAL;
+    if (offset>=0 && offset<=file->f_dentry->d_inode->i_sb->s_maxbytes) {
+        if (offset != file->f_pos) {
+            file->f_pos = offset;
+            file->f_reada = 0;
+            //file->f_version = ++event;
+        }
+        retval = offset;
+    }
+    return retval;
+}
+
 static int yaffs_readdir(struct file *f, void *dirent, filldir_t filldir)
 {
     yaffs_Object *obj;



Comments?

Rong