Hi Charles, > Date: Fri, 4 Dec 2009 07:56:54 +1300 > > Hi Chris > Do you have a test case to get the alias into this state? eg. > ln -s "" xxx or some such? > > It would be nice to be able to make the problem appear and disappear on > demand. > No. Honestly, we have never been able to replicate it here. But I am going to try again now that I have a better understanding of the problem. Here is how I think it happens: 1. user deletes symlink 2. YFREE on .alias 3. hardware interrupt occurs 4. yaffs code is interrupted 5. ISR executes and ZMALLOCs or something similar, and returns 6. yaffs code is returned to 7. yaffs calls strncpy on 'scribbled' buffer that was YFREE'ed, Now, the next ScanBackwards() call will fail. But there is something I do not understand, and that is, ScanBackwards() is only called when yaffs checkpoint is invalid. I can see where the checkpoint is invalidated, and I can see that a call to put_super() causes it to be validated again. But, when does put_super() typically get called? (Sorry, I know this is a bit off topic, but I lack experience here, and the informatian may help me replicate this issue.) Note, if a hardware interrupt does not occur between 2 and 7, or if the ISR doesn't call MALLOC, The YFREE'd buffer will still have valid data, which is then written to disk, and the problem will not exhibit any symptoms. I think this is what usually happens. I do not think it will be simple to reproduce the problem. But, like I said I am going to try, and will report my results. Also note, we have never tried to set a symlink to an empty string value. I am reasonably certian all our symlinks were set to valid filenames. > As you point out, the primary problem is that CloneString(*str) returns NULL > if str is "". I think that's the bogus value you are talking about. > By my reading that's handled properly by the patch > http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_guts.c?r1=1.95&r2=1.96 > at around 2258. > @@ -2253,11 +2258,12 @@ > { > YCHAR *newStr = NULL; > > - if (str && *str) { > - newStr = YMALLOC((yaffs_strlen(str) + 1) * sizeof(YCHAR)); > - if (newStr) > - yaffs_strcpy(newStr, str); > - } > + if (!str) > + str = _Y(""); > + > + newStr = YMALLOC((yaffs_strlen(str) + 1) * sizeof(YCHAR)); > + if (newStr) > + yaffs_strcpy(newStr, str); > > return newStr; > > > Or am I missing something??? > Actually, I do not believe that is the primary problem. The primary problem is that a FREE'ed buffer's contents is written to disk. And values in that buffer could have been ANYTHING. It just happens to be the case I've seen it set to a zero length string once. So, for instance, the value in .alias yaffs_ObjectHeader structure on disk could be a NULL terminated string, or, even worse, it could be a bunch of non-zero values, and its even concievable that there is no NULL termination. If it is the latter case, very bad things could happen here when the yaffs_ObjectHeader (oh) is read from disk: yaffs_guts.c, towards end of yaffs_ScanBackwards() case YAFFS_OBJECT_TYPE_SYMLINK: if (oh) { in->variant.symLinkVariant.alias = yaffs_CloneString(oh->alias); if (!in->variant.symLinkVariant.alias) alloc_failed = 1; } break; and in yaffs_ObjectHeader alias is defined as: YCHAR alias[YAFFS_MAX_ALIAS_LENGTH + 1]; So let us assume oh->alias contains all 0xFF values with no NULL termination. A pointer to this is passed into CloneString(), then to strcpy. Now strcpy will copy past the end of .alias, and this is bad. Therefore, I think the oh->alias, when read from disk, for a deleted symlink, is best ignored. Both my patches do exactly this. To put this in perspective, this is only going to happen when the fs is generated by pre-'December 2009' version of yaffs, and the system is deleting symlinks, and there is an ISR or tasklet that likes to initialize buffers to non zero values. Rare, I admit. But it could happen. I would prefer to say it couldn't. Now, back to this statement: > As you point out, the primary problem is that CloneString(*str) returns NULL I am not convinced that CloneString(*str) returning NULL for the empty string is a problem. If having an empty string in filesystem related strings, including the deleted symlink .alias field, is considered to be an invalid situation, then there is nothing particularly wrong with the previous CloneString() implementation. If this assertion is correct, I would suggest adding an informative TRACE message to CloneString(*str) for debugging purposes to differentiate the zero length / null pointer / really running out of memory cases during development/debugging. Best Regards, -Chris