> -----Original Message----- > From: Charles Manning [mailto:manningc2@actrix.gen.nz] > Sent: Sunday, December 04, 2005 9:55 AM > To: yaffs@stoneboat.aleph1.co.uk > Cc: Lawson.Reed > Subject: Re: [Yaffs] YAFFS2 and kswapd dead lock problem > > > On Saturday 03 December 2005 07:56, Lawson.Reed wrote: > > Found it.... took me 5 work days. > > And this deadlock issue IS in the YAFFS2 tip code on CVS here: > > > http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_fs.c? > rev=1.34&view > >=auto > > > > Thanks for this huge effort Reed. > > I am quite surprised that someone else has not stumbled on > this either. > > Does it only impact on 2.4.x kernels or also 2.6? >From looking at the inode.c in 2.6 I see they are still doing the wait_queue thing on the I_FREEING flag just the same as 2.4. So, this deadlock issue should also be in 2.6 kernels. I do not have a system running 2.6 so I can not verify this. It only occurs under heavy multi thread load on the yaffs2 fs. It is fairly infrequent and to troubleshoot it, I had to write a torture test program. > > So, no one has seen this??? > > > > Here is what is happening: > > > > Process 'A' grabs the YAFFS2 grossLock. > > Process 'B' preempts and it's job is to free unused inodes > everywhere. > > (hint: 'B' is kswapd). So, 'B' sets I_FREEING. Then it calls > > yaffs_clear_inode() which needs the grossLock. So, it goes > > on the wait queue because 'A' has the grossLock. > > > > Now process 'A' runs. It's holding the grossLock. It calls > > yaffs_get_inode() which calls BACK UP to iget()... With > > the grossLock held! That calls find_inode(). It finds > > I_FREEING set and then gets put on a wait queue in > > __wait_on_freeing_inode(). > > > > Presto chango deadlock. > > > > So, my solution is to make sure the grossLock is not held when > > calling yaffs_get_inode(). Plus, I added grossLocking to > > yaffs_read_inode() since NB's comment in there is no longer > > true. > > > > I ran my 20 thread torture test which usually deadlocks in under 30 > > seconds. It ran overnight with this fix. The test found no compare > > errors in the 20 files that it reads and writes at random times with > > random data and random lengths. > > > > So, I strongly suggest that someone close to the YAFFSs > effort review > > this change and incorporate it. I am kinda new to all this and I'm > > not even sure what the correct way to submit the changes are. > > So, let me know how I can help. > > Did you mean to include some code? No. > The easiest is to send something in patch form using "diff > -Naur old new", but others are fine too. OK, how is this? Warning... I did not even compile this in the tip YAFFS2 code because I do not have a 2.6 tree. So, I copied my changes from my 2.4 tree into the tip yaffs2/yaffs_fs.c code. So, please review it carefully... I started with this: http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_fs.c?rev=1.34&view=auto File: [Development] / yaffs2 / yaffs_fs.c (download) (as text) Revision: 1.34, Mon Nov 14 21:00:54 2005 UTC (2 weeks, 3 days ago) by charles Thanks, - Reed. --------------------------------------------8<-------------------------------- diff -Naur yaffs2_old/yaffs_fs.c yaffs2/yaffs_fs.c --- yaffs2_old/yaffs_fs.c 2005-12-05 09:53:10.000000000 -0800 +++ yaffs2/yaffs_fs.c 2005-12-05 10:15:32.000000000 -0800 @@ -310,6 +310,9 @@ obj = yaffs_GetEquivalentObject(obj); /* in case it was a hardlink */ + // can not hold grossLock when calling yaffs_get_inode() + yaffs_GrossUnlock(dev); + if (obj) { T(YAFFS_TRACE_OS, (KERN_DEBUG "yaffs_lookup found %d\n", obj->objectId)); @@ -325,8 +328,6 @@ /*dget(dentry); // try to solve directory bug */ d_add(dentry, inode); - yaffs_GrossUnlock(dev); - /* return dentry; */ return NULL; #endif @@ -336,7 +337,6 @@ T(YAFFS_TRACE_OS, (KERN_DEBUG "yaffs_lookup not found\n")); } - yaffs_GrossUnlock(dev); /* added NCB for 2.5/6 compatability - forces add even if inode is * NULL which creates dentry hash */ @@ -725,6 +725,8 @@ /* NB Side effect: iget calls back to yaffs_read_inode(). */ /* iget also increments the inode's i_count */ + // RL AND you better not be holding the grossLock, or dead lock with wait_on_freeing_inode() + // will occur because of kswapd (and other things)!! return inode; } @@ -941,6 +943,9 @@ break; } + // can not call yaffs_get_inode() with grossLock held. + yaffs_GrossUnlock(dev); + if (obj) { inode = yaffs_get_inode(dir->i_sb, mode, rdev, obj); d_instantiate(dentry, inode); @@ -954,8 +959,6 @@ error = -ENOMEM; } - yaffs_GrossUnlock(dev); - return error; } @@ -1238,6 +1241,7 @@ { /* NB This is called as a side effect of other functions and * thus gross locking should always be in place already. + * RL Put the grossLock back in because of dead lock issue. */ yaffs_Object *obj; @@ -1246,10 +1250,14 @@ T(YAFFS_TRACE_OS, (KERN_DEBUG "yaffs_read_inode for %d\n", (int)inode->i_ino)); + yaffs_GrossLock(dev); + obj = yaffs_FindObjectByNumber(dev, inode->i_ino); yaffs_FillInodeFromObject(inode, obj); + yaffs_GrossUnlock(dev); + } static LIST_HEAD(yaffs_dev_list); @@ -1478,13 +1486,13 @@ ("yaffs_read_super: guts initialised %s\n", (err == YAFFS_OK) ? "OK" : "FAILED")); + // Can not call yaffs_get_inode() with the grossLock held. + yaffs_GrossUnlock(dev); + /* Create root inode */ if (err == YAFFS_OK) inode = yaffs_get_inode(sb, S_IFDIR | 0755, 0, yaffs_Root(dev)); - - yaffs_GrossUnlock(dev); - if (!inode) return NULL; --------------------------------------------8<-------------------------------- > > Thanx > > Charles >