Hi Charles,
> Date: Fri, 4 Dec 2009 07:56:54 +1300
>
> Hi Chris
> Do you have a test case to get the alias into this state? eg.
> ln -s "" xxx or some such?
>
> It would be nice to be able to make the problem appear and disappear on
> demand.
>
No. Honestly, we have never been able to replicate it here. But I am
going to try again now that I have a better understanding of the
problem.
Here is how I think it happens:
1. user deletes symlink
2. YFREE on .alias
3. hardware interrupt occurs
4. yaffs code is interrupted
5. ISR executes and ZMALLOCs or something similar, and returns
6. yaffs code is returned to
7. yaffs calls strncpy on 'scribbled' buffer that was YFREE'ed,
Now, the next ScanBackwards() call will fail. But there is something
I do not understand, and that is, ScanBackwards() is only called when
yaffs checkpoint is invalid. I can see where the checkpoint is
invalidated, and I can see that a call to put_super() causes it to be
validated again. But, when does put_super() typically get called?
(Sorry, I know this is a bit off topic, but I lack experience here,
and the informatian may help me replicate this issue.)
Note, if a hardware interrupt does not occur between 2 and 7, or if
the ISR doesn't call MALLOC, The YFREE'd buffer will still have valid
data, which is then written to disk, and the problem will not exhibit
any symptoms. I think this is what usually happens.
I do not think it will be simple to reproduce the problem. But, like
I said I am going to try, and will report my results.
Also note, we have never tried to set a symlink to an empty string
value. I am reasonably certian all our symlinks were set to valid
filenames.
> As you point out, the primary problem is that CloneString(*str) returns NULL
> if str is "". I think that's the bogus value you are talking about.
> By my reading that's handled properly by the patch
> http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_guts.c?r1=1.95&r2=1.96
> at around 2258.
> @@ -2253,11 +2258,12 @@
> {
> YCHAR *newStr = NULL;
>
> - if (str && *str) {
> - newStr = YMALLOC((yaffs_strlen(str) + 1) * sizeof(YCHAR));
> - if (newStr)
> - yaffs_strcpy(newStr, str);
> - }
> + if (!str)
> + str = _Y("");
> +
> + newStr = YMALLOC((yaffs_strlen(str) + 1) * sizeof(YCHAR));
> + if (newStr)
> + yaffs_strcpy(newStr, str);
>
> return newStr;
>
>
> Or am I missing something???
>
Actually, I do not believe that is the primary problem. The primary
problem is that a FREE'ed buffer's contents is written to disk. And
values in that buffer could have been ANYTHING. It just happens to be
the case I've seen it set to a zero length string once.
So, for instance, the value in .alias yaffs_ObjectHeader structure on
disk could be a NULL terminated string, or, even worse, it could be a
bunch of non-zero values, and its even concievable that there is no
NULL termination. If it is the latter case, very bad things could
happen here when the yaffs_ObjectHeader (oh) is read from disk:
yaffs_guts.c, towards end of yaffs_ScanBackwards()
case YAFFS_OBJECT_TYPE_SYMLINK:
if (oh) {
in->variant.symLinkVariant.alias =
yaffs_CloneString(oh->alias);
if (!in->variant.symLinkVariant.alias)
alloc_failed = 1;
}
break;
and in yaffs_ObjectHeader alias is defined as:
YCHAR alias[YAFFS_MAX_ALIAS_LENGTH + 1];
So let us assume oh->alias contains all 0xFF values with no NULL
termination. A pointer to this is passed into CloneString(), then to
strcpy. Now strcpy will copy past the end of .alias, and this is bad.
Therefore, I think the oh->alias, when read from disk, for a deleted
symlink, is best ignored. Both my patches do exactly this.
To put this in perspective, this is only going to happen when the fs
is generated by pre-'December 2009' version of yaffs, and the system
is deleting symlinks, and there is an ISR or tasklet that likes to
initialize buffers to non zero values. Rare, I admit. But it could
happen. I would prefer to say it couldn't.
Now, back to this statement:
> As you point out, the primary problem is that CloneString(*str) returns NULL
I am not convinced that CloneString(*str) returning NULL for the empty
string is a problem. If having an empty string in filesystem related
strings, including the deleted symlink .alias field, is considered to
be an invalid situation, then there is nothing particularly wrong with
the previous CloneString() implementation. If this assertion is
correct, I would suggest adding an informative TRACE message to
CloneString(*str) for debugging purposes to differentiate the zero
length / null pointer / really running out of memory cases during
development/debugging.
Best Regards,
-Chris