Re: [Yaffs] Crash in yaffs_CheckpointClose after corruption.

Author: Chris Paulson-Ellis
Date:  
To: yaffs
CC: Charles Manning
Subject: Re: [Yaffs] Crash in yaffs_CheckpointClose after corruption.
I've run my tests against the new code and it seems to be working.
Many thanks.
Chris.

Charles Manning wrote:
> I have checked in some code that calculates the blocks required in the
> checkpoint. This will hopefully make this problem go away.
>
> There is no need to play "guess the magic number" any more.
>
> -- CHarles
>
>
> On Wednesday 28 November 2007 06:46:44 you wrote:
>> Hi,
>>
>> I've run some tests with CONFIG_YAFFS_CHECKPOINT_RESERVED_BLOCKS set to 10
>> and 20. It does indeed fail when it runs out of checkpoint blocks. It seems
>> my 1GB NAND needs about 12 checkpoint blocks for my usage.
>>
>> Running this test script:
>>
>> http://www.edesix.com/yaffs/yaffs_test
>>
>> when CONFIG_YAFFS_CHECKPOINT_RESERVED_BLOCKS was set to 10, crashed the
>> kernel:
>>
>> http://www.edesix.com/yaffs/yaffs_test.log.1
>>
>> and when CONFIG_YAFFS_CHECKPOINT_RESERVED_BLOCKS was set to 20, didn't:
>>
>> http://www.edesix.com/yaffs/yaffs_test.log.2
>>
>> Note that the lines in the log from [sys|k]logd lag behind the lines
>> written by the script.
>>
>> You can see that the 2 runs start behaving differently when the number of
>> checkpoint blocks reaches 11. The crash occurs, as before, on trying to
>> unmount after a failed write to a freshly mounted, but full filesystem.
>>
>> Regards,
>> Chris.
>>
>> Charles Manning wrote:
>>> I've been thinking about this a bit.
>>>
>>> This is probably being triggered by running out of checkpoint blocks.
>>>
>>> Try setting CONFIG_YAFFS_CHECKPOINT_RESERVED_BLOCKS to a larger number
>>> (say 20).
>>>
>>>
>>> -- CHarles
>>>
>>> On Thursday 15 November 2007 00:49:06 Chris Paulson-Ellis wrote:
>>>> Hi,
>>>>
>>>> I have a yaffs2 filesystem on a 1Gbyte NAND flash that has somehow
>>>> become corrupt. The filesystem mounts, but yaffs crashes on unmount if
>>>> any changes are made to the directory with the corrupt entries.
>>>>
>>>> My pattern of access is continuous writing (at about 200kBytes/s) with a
>>>> background task removing the oldest files when the filesystem reports
>>>> less than 1Mbyte free. The files so created and deleted are about
>>>> 135MBytes, so there are only a few of them stored at once and space is
>>>> freed in large blocks.
>>>>
>>>> The corrupt filesystem had become full, perhaps because the mechanism
>>>> described above does not allow for the space yaffs needs to checkpoint,
>>>> or for how quickly yaffs can garbage collect blocks.
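
A minimal sketch of the pruning pattern described above, assuming the free space is read with statvfs and the oldest file is chosen by modification time. The names free_bytes and prune_oldest are hypothetical and are not taken from the original application. The free-space reader is passed in as a parameter so the loop can be exercised without a real NAND device.

```python
import os
from pathlib import Path
from typing import Callable, List


def free_bytes(path: str) -> int:
    """Free space available to unprivileged callers, as reported by statvfs."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def prune_oldest(directory: str, min_free: int,
                 get_free: Callable[[str], int] = free_bytes) -> List[str]:
    """Delete the oldest regular files in `directory` until at least
    `min_free` bytes are reported free. Returns the removed file names."""
    # Oldest first, ordered by modification time.
    files = sorted((p for p in Path(directory).iterdir() if p.is_file()),
                   key=lambda p: p.stat().st_mtime)
    removed = []
    for path in files:
        # Re-check after every deletion: one large file may be enough.
        if get_free(directory) >= min_free:
            break
        path.unlink()
        removed.append(path.name)
    return removed


if __name__ == "__main__":
    # Hypothetical invocation using the 1 MByte threshold from the message.
    print(prune_oldest(".", 1 * 1024 * 1024))
```

The threshold is only compared against what the filesystem reports as free. Whether that figure already accounts for space held back for checkpointing or garbage collection is exactly the open question in this message, so a larger margin than 1 MByte may be worth trying.
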
>>>>
>>>> My current yaffs code was taken from CVS on 13-Nov-2007. The version
>>>> running when the filesystem apparently first became corrupt was taken
>>>> from CVS on 30-Sep-2007, so was almost exactly the same.
>>>>
>>>> It crashes on unmount like this:
>>>>
>>>> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>>>> pgd = c3dc0000
>>>> [00000000] *pgd=23db6031, *pte=00000000, *ppte=00000000
>>>> Internal error: Oops: 17 [#1] PREEMPT
>>>> Modules linked in: pss_nand qhal
>>>> CPU: 0    Not tainted  (2.6.22.2 #1)
>>>> PC is at yaffs_CheckpointClose+0x9c/0x128
>>>> LR is at yaffs_CheckpointSave+0x478/0x4d8
>>>> pc : [<c00eb0a8>]    lr : [<c00e9590>]    psr: 80000013
>>>> sp : c0017e00  ip : c0017e20  fp : c0017e1c
>>>> r10: befaeedc  r9 : c3cef000  r8 : 00000000
>>>> r7 : 00000000  r6 : 00000000  r5 : c3dc9e00  r4 : c3cef000
>>>> r3 : 00000000  r2 : c0016000  r1 : c0016000  r0 : c01fc018
>>>> Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  Segment user
>>>> Control: c000317f  Table: 23dc0000  DAC: 00000015
>>>> Process umount (pid: 155, stack limit = 0xc0016260)
>>>> Stack: (0xc0017e00 to 0xc0018000)
>>>> 7e00: c3cef000 00000000 c3dc9e00 c02e0f80 c0017e8c c0017e20 c00e9590 c00eb01c
>>>> 7e20: c0086610 c3cef000 c3dc9e00 00000000 00000000 00000000 c0016000 befaeedc
>>>> 7e40: c0017e64 c0017e50 c002df64 c002db88 c0016000 c0017e6c c0017e8c c0017e78
>>>> 7e60: c00df754 c3cef000 c3dc9e00 c02e0f80 00000000 00000000 c0016000 befaeedc
>>>> 7e80: c0017ea4 c0017e90 c00dfd88 c00e9128 c3dc9e00 c022b3a8 c0017ebc c0017ea8
>>>> 7ea0: c0073cec c00dfd58 c02e6060 c022b448 c0017ed4 c0017ec0 c0073d9c c0073c7c
>>>> 7ec0: c004638c c3dc9e00 c0017eec c0017ed8 c0073e9c c0073d94 c3dc9e00 c02e0f20
>>>> 7ee0: c0017f0c c0017ef0 c00892b0 c0073e48 c0017f28 c02e0f20 c3dc9e00 00000000
>>>> 7f00: c0017f24 c0017f10 c00794c8 c0089248 00000000 c023cf78 c0017fa4 c0017f28
>>>> 7f20: c008ac7c c00794b4 c3db5878 c02e0f20 c0017fa4 c0017f40 c00757ec 00000001
>>>> 7f40: 00000001 00000000 01f0000b 000041b6 00000001 00000000 00000000 00000000
>>>> 7f60: 00000800 00000000 34aadccf 00000000 34aadccf 00000000 c0017f78 c0017f78
>>>> 7f80: 00001000 00053008 000532c0 00053298 00000034 c001f044 00000000 c0017fa8
>>>> 7fa0: c001eec0 c008aa98 00053008 000532c0 000532c0 00000000 00000008 00000000
>>>> 7fc0: 00053008 000532c0 00053298 00000034 00000000 00000000 befaeedc 00000000
>>>> 7fe0: befacb7c befacb50 400f9b08 400f9b2c 60000010 000532c0 ebffe5b7 e59f216c
>>>> Backtrace:
>>>> [<c00eb00c>] (yaffs_CheckpointClose+0x0/0x128) from [<c00e9590>] (yaffs_CheckpointSave+0x478/0x4d8)
>>>>  r6:c02e0f80 r5:c3dc9e00 r4:00000000
>>>> [<c00e9118>] (yaffs_CheckpointSave+0x0/0x4d8) from [<c00dfd88>] (yaffs_put_super+0x40/0xb8)
>>>> [<c00dfd48>] (yaffs_put_super+0x0/0xb8) from [<c0073cec>] (generic_shutdown_super+0x80/0x118)
>>>>  r5:c022b3a8 r4:c3dc9e00
>>>> [<c0073c6c>] (generic_shutdown_super+0x0/0x118) from [<c0073d9c>] (kill_block_super+0x18/0x2c)
>>>>  r5:c022b448 r4:c02e6060
>>>> [<c0073d84>] (kill_block_super+0x0/0x2c) from [<c0073e9c>] (deactivate_super+0x64/0x7c)
>>>>  r4:c3dc9e00
>>>> [<c0073e38>] (deactivate_super+0x0/0x7c) from [<c00892b0>] (mntput_no_expire+0x78/0xc0)
>>>>  r5:c02e0f20 r4:c3dc9e00
>>>> [<c0089238>] (mntput_no_expire+0x0/0xc0) from [<c00794c8>] (path_release_on_umount+0x24/0x28)
>>>>  r7:00000000 r6:c3dc9e00 r5:c02e0f20 r4:c0017f28
>>>> [<c00794a4>] (path_release_on_umount+0x0/0x28) from [<c008ac7c>] (sys_umount+0x1f4/0x208)
>>>>  r4:c023cf78
>>>> [<c008aa88>] (sys_umount+0x0/0x208) from [<c001eec0>] (ret_fast_syscall+0x0/0x2c)
>>>>  r8:c001f044 r7:00000034 r6:00053298 r5:000532c0 r4:00053008
>>>> Code: e59f0088 e1560003 aa000004 e59430f4 (e7935106)
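
The last two words on the "Code:" line can be decoded by hand to see why the fault address is 00000000. The sketch below is a deliberately tiny decoder for just these two ARM LDR encodings, not a general disassembler. The function name decode_ldr is hypothetical, and the register values are copied from the dump above.

```python
def decode_ldr(word: int) -> str:
    """Decode an unconditional ARM word LDR with an added offset, pre-indexed,
    no write-back. Handles an immediate offset or an LSL-shifted register."""
    if (word >> 28) != 0xE:
        raise ValueError("only unconditional instructions are handled")
    # Bits 27..20 must be 01I1 1001 (P=1, U=1, B=0, W=0, L=1); I is masked out.
    if (word >> 20) & 0xDF != 0x59:
        raise ValueError("not a plain word LDR")
    rn = (word >> 16) & 0xF          # base register
    rd = (word >> 12) & 0xF          # destination register
    if not (word >> 25) & 1:         # I=0: 12-bit immediate offset
        return f"ldr r{rd}, [r{rn}, #{word & 0xFFF:#x}]"
    if (word >> 4) & 0x7:            # bit 4 set, or a shift type other than LSL
        raise ValueError("only LSL-shifted register offsets are handled")
    amount = (word >> 7) & 0x1F      # shift amount
    rm = word & 0xF                  # index register
    return f"ldr r{rd}, [r{rn}, r{rm}, lsl #{amount}]"


# The faulting instruction is the word in parentheses on the "Code:" line,
# and the word just before it loads the base pointer.
load_base = decode_ldr(0xE59430F4)
faulting = decode_ldr(0xE7935106)

# Register values at the time of the oops, copied from the dump.
r3, r6 = 0x00000000, 0x00000000
fault_address = (r3 + (r6 << 2)) & 0xFFFFFFFF

print(load_base)                 # ldr r3, [r4, #0xf4]
print(faulting)                  # ldr r5, [r3, r6, lsl #2]
print(f"{fault_address:08x}")    # 00000000
```

Read this way, r3 is fetched from offset 0xf4 of whatever r4 points at and is then used as the base of an array of 4-byte entries indexed by r6. Since r3 is zero, the first access faults at address 0. Which structure field lives at offset 0xf4 is not something the dump alone shows; the disassembly linked below should settle that.
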

>>>>
>>>>
>>>> Here is an (ARM9) disassembly of the file containing
>>>> yaffs_CheckpointClose:
>>>>
>>>> http://www.edesix.com/yaffs/yaffs_checkptrw.o.dis
>>>>
>>>>
>>>> Here is a trace of the mount after doing:
>>>>
>>>> # echo +all > /proc/yaffs
>>>> # echo -mtd > /proc/yaffs
>>>> # echo 9 > /proc/sys/kernel/printk
>>>> # mount -t yaffs2 /dev/mtdblock11 /nand/7
>>>>
>>>> http://www.edesix.com/yaffs/mount.txt
>>>>
>>>>
>>>> Here is the trace of the (failing) write operation after doing:
>>>>
>>>> # echo +all > /proc/yaffs
>>>> # touch /nand/7/video/test
>>>>
>>>> http://www.edesix.com/yaffs/touch.txt
>>>>
>>>>
>>>> Here is a trace of the umount (including above crash) after doing:
>>>>
>>>> # echo +all > /proc/yaffs
>>>> # echo -mtd > /proc/yaffs
>>>> # umount /nand/7
>>>>
>>>> http://www.edesix.com/yaffs/umount.txt
>>>>
>>>>
>>>> Here is a dump of the NAND (warning - 1Gbyte), created with:
>>>>
>>>> # nanddump -f nand7.dump /dev/mtd11
>>>> ECC failed: 0
>>>> ECC corrected: 0
>>>> Number of bad blocks: 5
>>>> Number of bbt blocks: 0
>>>> Block size 131072, page size 2048, OOB size 64
>>>> Dumping data starting at 0x00000000 and ending at 0x40000000...
>>>>
>>>> http://www.edesix.com/yaffs/nand7.dump
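
For scale, the nanddump figures above give the device geometry directly. The short calculation below works it out and relates it to the "about 12 checkpoint blocks" reported in the follow-up at the top of this thread. Treating a checkpoint block as one erase block of this size is an assumption here.

```python
DEVICE_BYTES = 0x40000000   # end offset printed by nanddump (1 GiB)
BLOCK_BYTES = 131072        # erase block size printed by nanddump
PAGE_BYTES = 2048           # page size printed by nanddump
BAD_BLOCKS = 5              # bad block count printed by nanddump
CHECKPOINT_BLOCKS = 12      # assumption: the "about 12" figure from the follow-up

total_blocks = DEVICE_BYTES // BLOCK_BYTES
pages_per_block = BLOCK_BYTES // PAGE_BYTES
checkpoint_bytes = CHECKPOINT_BLOCKS * BLOCK_BYTES
# Share of the usable (non-bad) blocks taken up by the checkpoint.
checkpoint_share = CHECKPOINT_BLOCKS / (total_blocks - BAD_BLOCKS)

print(total_blocks)        # 8192
print(pages_per_block)     # 64
print(checkpoint_bytes)    # 1572864, i.e. 1.5 MiB
print(f"{checkpoint_share:.3%}")
```

So the checkpoint would need roughly 1.5 MiB, which is more than the 1 MByte free-space threshold used by the pruning task described earlier in this message.
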
>>>>
>>>>
>>>> Regards,
>>>> Chris.
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> yaffs mailing list
>>>>
>>>> http://lists.aleph1.co.uk/cgi-bin/mailman/listinfo/yaffs