Re: [Yaffs] Oops crash in yaffs_AddOrFindLevel0Tnode during …

Author: Charles Manning
Date:  
To: yaffs
CC: Blair Barnett, Paul Lima, Gennady Dagman
Subject: Re: [Yaffs] Oops crash in yaffs_AddOrFindLevel0Tnode during mount
On Tuesday 26 September 2006 06:11, Gennady Dagman wrote:
> Hello,
>
> We ran into this Linux kernel crash while mounting a yaffs2 partition
> (please find the full Oops file attached below)
> and from the trace-back and register analysis I conclude that:
>
> The trace-back function call chain:
> get_sb_bdev ->
> yaffs_internal_read_super ->
> yaffs_GutsInitialise ->
> yaffs_CheckpointRestore ->
> yaffs_ReadCheckpointData ->
> yaffs_ReadCheckpointObjects ->
> yaffs_ReadCheckpointTnodes ->
> yaffs_AddOrFindLevel0Tnode -> memcpy
>
> From looking at the yaffs_AddOrFindLevel0Tnode code, it's pretty clear
> that the only reason for memcpy
> (at the end of yaffs_AddOrFindLevel0Tnode) to crash is having both
> fStruct->topLevel = 0 and fStruct->top = 0.
>
> It looks like this problem is not easily reproducible - we have seen it
> only once so far and I suspect
> the root cause of it (as well as a few other odd problems we run into
> from time to time - see, for example,
> http://aleph1.co.uk/lurker/message/20060914.181435.951c1454.en.html) is
> flash file system corruption.
>
> Questions:
> ---------------
>
> 1) Can you imagine what could be the reason (other than flash fs
> corruption) for this Oops crash ?


I'll have a look at that oops to see what happened.
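
For illustration only, here is a minimal sketch of the kind of guard that
would turn the state described above (a NULL top pointer reaching memcpy)
into an error return rather than an oops. The names sketch_tnode,
sketch_file_structure and sketch_store_level0 are hypothetical simplified
stand-ins and are not the real yaffs structures or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a level-0 tnode. */
typedef struct {
    unsigned char data[16];
} sketch_tnode;

/* Hypothetical, simplified stand-in for the per-file tnode tree. */
typedef struct {
    int topLevel;       /* height of the tnode tree */
    sketch_tnode *top;  /* root tnode, NULL if the tree was never built */
} sketch_file_structure;

/*
 * Copy a restored level-0 tnode into the tree root.
 * Returns 0 on success, or -1 if there is no root to copy into,
 * which is the case where an unguarded memcpy would dereference NULL.
 */
static int sketch_store_level0(sketch_file_structure *fs,
                               const sketch_tnode *src)
{
    assert(fs != NULL && src != NULL);

    if (fs->top == NULL)
        return -1;

    memcpy(fs->top, src, sizeof(*src));
    return 0;
}
```

A caller restoring from a checkpoint could treat a -1 return as a sign that
the checkpoint is unusable and fall back to a full scan; that fallback is an
assumption of this sketch, not a description of what the yaffs code does.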

>
> 2) I see that currently in our yaffs code we have defined:
>
> #define CONFIG_YAFFS_DISABLE_CHUNK_ERASED_CHECK
>
> which means that the erased check of NAND chunks is NOT performed before
> a write, but I know for sure that from time to time we do encounter
> chunks that are not erased as a result of power-off during block erasure.
> What could be the consequences of using chunks that are not erased for
> yaffs_WriteChunkWithTagsToNAND ? Could it cause fs corruption ?
> Problems like
> http://aleph1.co.uk/lurker/message/20060914.181435.951c1454.en.html ?
> Or this current crash ?


Try out the new code I recently checked in for handling block retirement
better. Amongst other things this handles the erased checking far better, and
I have a hunch it will fix the problem you describe here without burdening
you with lots of erase checks.

See http://aleph1.co.uk/lurker/message/20060921.084052.7a405cd0.en.html
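
For reference, an erased NAND chunk reads back as all 0xFF bytes, so an
erased check amounts to scanning the chunk buffer for anything else. The
sketch below shows that idea only; sketch_chunk_is_erased is a hypothetical
helper and is not the check that the yaffs code performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if every byte in the buffer is 0xFF (the erased state of NAND),
 * otherwise 0. A zero-length buffer is reported as erased.
 */
static int sketch_chunk_is_erased(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xFF)
            return 0;
    }
    return 1;
}
```

Running such a scan before every write costs a full chunk read each time,
which is the burden the reply above hopes to avoid.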