> -----Original Message-----
> From: Charles Manning [mailto:manningc2@actrix.gen.nz]
> Sent: Sunday, December 04, 2005 9:55 AM
> To: yaffs@stoneboat.aleph1.co.uk
> Cc: Lawson.Reed
> Subject: Re: [Yaffs] YAFFS2 and kswapd dead lock problem
>
>
> On Saturday 03 December 2005 07:56, Lawson.Reed wrote:
> > Found it.... took me 5 work days.
> > And this deadlock issue IS in the YAFFS2 tip code on CVS here:
> >
> http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_fs.c?
> rev=1.34&view
> >=auto
> >
>
> Thanks for this huge effort Reed.
>
> I am quite surprised that someone else has not stumbled on
> this either.
>
> Does it only impact on 2.4.x kernels or also 2.6?
>From looking at the inode.c in 2.6 I see they are still doing the
wait_queue thing on the I_FREEING flag just the same as 2.4.
So, this deadlock issue should also be in 2.6 kernels. I do not
have a system running 2.6 so I can not verify this.
It only occurs under heavy multi thread load on the yaffs2 fs.
It is fairly infrequent and to troubleshoot it, I had to write
a torture test program.
> > So, no one has seen this???
> >
> > Here is what is happening:
> >
> > Process 'A' grabs the YAFFS2 grossLock.
> > Process 'B' preempts and it's job is to free unused inodes
> everywhere.
> > (hint: 'B' is kswapd). So, 'B' sets I_FREEING. Then it calls
> > yaffs_clear_inode() which needs the grossLock. So, it goes
> > on the wait queue because 'A' has the grossLock.
> >
> > Now process 'A' runs. It's holding the grossLock. It calls
> > yaffs_get_inode() which calls BACK UP to iget()... With
> > the grossLock held! That calls find_inode(). It finds
> > I_FREEING set and then gets put on a wait queue in
> > __wait_on_freeing_inode().
> >
> > Presto chango deadlock.
> >
> > So, my solution is to make sure the grossLock is not held when
> > calling yaffs_get_inode(). Plus, I added grossLocking to
> > yaffs_read_inode() since NB's comment in there is no longer
> > true.
> >
> > I ran my 20 thread torture test which usually deadlocks in under 30
> > seconds. It ran overnight with this fix. The test found no compare
> > errors in the 20 files that it reads and writes at random times with
> > random data and random lengths.
> >
> > So, I strongly suggest that someone close to the YAFFSs
> effort review
> > this change and incorporate it. I am kinda new to all this and I'm
> > not even sure what the correct way to submit the changes are.
> > So, let me know how I can help.
>
> Did you mean to include some code?
No.
> The easiest is to send something in patch form using "diff
> -Naur old new", but others are fine too.
OK, how is this? Warning... I did not even compile this in the
tip YAFFS2 code because I do not have a 2.6 tree. So, I copied my
changes from my 2.4 tree into the tip yaffs2/yaffs_fs.c code.
So, please review it carefully...
I started with this:
http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_fs.c?rev=1.34&view=auto
File: [Development] / yaffs2 / yaffs_fs.c (download) (as text)
Revision: 1.34, Mon Nov 14 21:00:54 2005 UTC (2 weeks, 3 days ago) by charles
Thanks,
- Reed.
--------------------------------------------8<--------------------------------
diff -Naur yaffs2_old/yaffs_fs.c yaffs2/yaffs_fs.c
--- yaffs2_old/yaffs_fs.c 2005-12-05 09:53:10.000000000 -0800
+++ yaffs2/yaffs_fs.c 2005-12-05 10:15:32.000000000 -0800
@@ -310,6 +310,9 @@
obj = yaffs_GetEquivalentObject(obj); /* in case it was a hardlink */
+ // can not hold grossLock when calling yaffs_get_inode()
+ yaffs_GrossUnlock(dev);
+
if (obj) {
T(YAFFS_TRACE_OS,
(KERN_DEBUG "yaffs_lookup found %d\n", obj->objectId));
@@ -325,8 +328,6 @@
/*dget(dentry); // try to solve directory bug */
d_add(dentry, inode);
- yaffs_GrossUnlock(dev);
-
/* return dentry; */
return NULL;
#endif
@@ -336,7 +337,6 @@
T(YAFFS_TRACE_OS, (KERN_DEBUG "yaffs_lookup not found\n"));
}
- yaffs_GrossUnlock(dev);
/* added NCB for 2.5/6 compatability - forces add even if inode is
* NULL which creates dentry hash */
@@ -725,6 +725,8 @@
/* NB Side effect: iget calls back to yaffs_read_inode(). */
/* iget also increments the inode's i_count */
+ // RL AND you better not be holding the grossLock, or dead lock with wait_on_freeing_inode()
+ // will occur because of kswapd (and other things)!!
return inode;
}
@@ -941,6 +943,9 @@
break;
}
+ // can not call yaffs_get_inode() with grossLock held.
+ yaffs_GrossUnlock(dev);
+
if (obj) {
inode = yaffs_get_inode(dir->i_sb, mode, rdev, obj);
d_instantiate(dentry, inode);
@@ -954,8 +959,6 @@
error = -ENOMEM;
}
- yaffs_GrossUnlock(dev);
-
return error;
}
@@ -1238,6 +1241,7 @@
{
/* NB This is called as a side effect of other functions and
* thus gross locking should always be in place already.
+ * RL Put the grossLock back in because of dead lock issue.
*/
yaffs_Object *obj;
@@ -1246,10 +1250,14 @@
T(YAFFS_TRACE_OS,
(KERN_DEBUG "yaffs_read_inode for %d\n", (int)inode->i_ino));
+ yaffs_GrossLock(dev);
+
obj = yaffs_FindObjectByNumber(dev, inode->i_ino);
yaffs_FillInodeFromObject(inode, obj);
+ yaffs_GrossUnlock(dev);
+
}
static LIST_HEAD(yaffs_dev_list);
@@ -1478,13 +1486,13 @@
("yaffs_read_super: guts initialised %s\n",
(err == YAFFS_OK) ? "OK" : "FAILED"));
+ // Can not call yaffs_get_inode() with the grossLock held.
+ yaffs_GrossUnlock(dev);
+
/* Create root inode */
if (err == YAFFS_OK)
inode = yaffs_get_inode(sb, S_IFDIR | 0755, 0,
yaffs_Root(dev));
-
- yaffs_GrossUnlock(dev);
-
if (!inode)
return NULL;
--------------------------------------------8<--------------------------------
>
> Thanx
>
> Charles
>