> A quick follow up...
> 
> To summarise what seems to be going on, the problem is occurring because there
> is an inode in the cache for an object that no longer exists in YAFFS. When
> asked to create a new object, YAFFS chooses an objectId (same as an inode
> number) that is the same as the one in cache. This means that when iget() is
> called, no callback happens to yaffs_read_inode and the new object's info is
> not associated with the inode in the cache and we get an inconsistency.
> 
> This happens very infrequently because the bucket size is relatively large
> (256), though Michael has made a useful test case below.  It occurs to me
> that by changing the bucket size to a smaller power of 2 (say 4), it will
> become easier to force the issue.
> 
> Michael has also provided a patch that would seem to get around the problem
> by aborting an object creation when the inode reference count is too high
> (which means that this problem would have occurred). While this hack would
> seem to work, I think it is, as Michael says, a "dirty hack" and would cause
> extra writes to flash etc.
> 
> I would prefer to do one of the following:
> 1) At the time of generating the objectId (inode number) for a new object,
> first check that the object does not relate to an existing inode in the cache
> and don't allocate that number if there is a conflict.
> 2) Don't just rely on the callback to yaffs_read_inode to fill out the inode
> details. Also fill them out for other cases. The problem with this is that
> we then end up with the thing in the cache being revalidated and cross-linked
> to a different object which seems rather unhealthy to me! So I don't think
> this will work.
> 3) Stop trying to reuse object ids. Instead of just always trying to reuse
> the lowest value objectId in any bucket, we can rather keep allocating
> upwards and wrap around when the objectId space is depleted (18 bits ==
> 0x40000). While this would not absolutely guarantee we don't get reuse, it
> would reduce the odds by a significant amount.
> 4) When we delete an object keep the object id for that object "in use" until
> the last iput releases it from the cache.
> 
> While (3) will likely work very well it does leave a bitter taste in the
> mouth. I'd prefer (4) and (1) in that order. I don't trust (2), so dismiss
> that immediately.
> 
> Comments/thoughts more than welcome.
> 
> -- Charles

I entirely agree with Charles. 

One (possibly dim) query. Why is the reference count not zero anyway once the object is deleted? Does the VFS hold an additional reference? As you say in (4), unless the count is zero the object id should still be considered in use.
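
To make (4) concrete, here is a rough sketch of the bookkeeping I have in mind. This is not YAFFS code: `struct id_slot` and the `slot_*` functions are names I have made up purely for illustration, and the real thing would have to hang off the existing object and inode structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-objectId bookkeeping; not a real YAFFS structure. */
struct id_slot {
    bool id_in_use;       /* objectId must not be handed out while true */
    bool object_deleted;  /* the object has been removed from the file system */
    unsigned inode_refs;  /* references still held on the cached inode */
};

/* A new object takes ownership of the id. */
void slot_create(struct id_slot *s)
{
    s->id_in_use = true;
    s->object_deleted = false;
    s->inode_refs = 0;
}

/* Models an iget(): someone takes a reference on the inode. */
void slot_iget(struct id_slot *s)
{
    s->inode_refs++;
}

/* Deleting the object frees the id only if nobody still holds the inode. */
void slot_delete(struct id_slot *s)
{
    s->object_deleted = true;
    if (s->inode_refs == 0)
        s->id_in_use = false;
}

/* Models an iput(): the last release of a deleted object frees the id. */
void slot_iput(struct id_slot *s)
{
    assert(s->inode_refs > 0);
    s->inode_refs--;
    if (s->object_deleted && s->inode_refs == 0)
        s->id_in_use = false;
}
```

With this, a delete while the inode is still referenced leaves `id_in_use` set, and the id only becomes available for reuse after the final `slot_iput()`.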

(3) is just not good enough, especially for TCL, who are using yaffs in the medical/assistive market. It would also mean that any error or failure of a system leaves a "what if it's that yaffs bug again?" uncertainty lurking.
(1) is ok. Reading/writing NAND is the bottleneck so a cache search won't add anything too noticeable I guess.
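
For what it is worth, the check in (1) could look something like the sketch below. Everything here is an assumption for the sake of illustration: `struct id_set`, `id_set_contains()` and `alloc_object_id()` are invented names, the linear search stands in for whatever lookup the inode cache really offers, and I am assuming that the ids in a bucket are those equal to the bucket number modulo the bucket count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS     256u      /* bucket count quoted in the thread */
#define OBJECT_ID_END 0x40000u  /* 18-bit objectId space, per the thread */

/* Hypothetical stand-in for "the set of ids known to X". */
struct id_set {
    const uint32_t *ids;
    size_t count;
};

static bool id_set_contains(const struct id_set *set, uint32_t id)
{
    for (size_t i = 0; i < set->count; i++) {
        if (set->ids[i] == id)
            return true;
    }
    return false;
}

/*
 * Return the lowest id in the given bucket that is neither a live object
 * nor still present in the inode cache. Returns 0 if the bucket is full.
 * Id 0 is treated as "no id" in this sketch.
 */
uint32_t alloc_object_id(const struct id_set *live,
                         const struct id_set *cached,
                         uint32_t bucket)
{
    assert(live != NULL && cached != NULL && bucket < N_BUCKETS);

    for (uint32_t id = bucket; id < OBJECT_ID_END; id += N_BUCKETS) {
        if (id == 0)
            continue;
        if (id_set_contains(live, id))
            continue;   /* already used by an existing object */
        if (id_set_contains(cached, id))
            continue;   /* stale inode still cached: the conflict in (1) */
        return id;
    }
    return 0;
}
```

So with object 257 live and a stale inode 1 still in the cache, asking for an id in bucket 1 skips both and hands back 513. The extra cost is one cache lookup per candidate id, with no NAND access involved.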

Nick