Re: [Yaffs] Crash in yaffs_CheckpointClose after corruption.

Author: Chris Paulson-Ellis
Date:  
To: yaffs
CC: Charles Manning
Subject: Re: [Yaffs] Crash in yaffs_CheckpointClose after corruption.
Hi,

While you're looking at this, you may want to consider the failure modes when the filesystem is full. The dd that wrote the last available block failed as expected with ENOSPC...

dd: writing '/nand/7/f11': No space left on device
20+0 records in
19+0 records out

However, the next dd that tries to write to the full filesystem fails not with ENOSPC, but ENOMEM...

dd: can't open '/nand/7/f12': Cannot allocate memory

I don't know if this is expected behaviour, but maybe it indicates some problem in the filesystem. It is probably unrelated to the checkpoint problem as it happens with CONFIG_YAFFS_CHECKPOINT_RESERVED_BLOCKS set to 20.
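To see which error a create on the full filesystem actually returns, without relying on dd's message text, a small probe along these lines could be used. This is a sketch: `probe_create` and `is_expected_full_error` are hypothetical helper names, and the open flags are only assumed to approximate how dd opens its output file.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) path with flags assumed to be
 * close to those dd uses for its output file.
 * Returns 0 if open(2) succeeded, otherwise the errno it reported. */
int probe_create(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* On a full filesystem ENOSPC is the expected failure; any other value,
 * such as ENOMEM, suggests a problem worth reporting. */
int is_expected_full_error(int err)
{
    return err == ENOSPC;
}
```

For example, `probe_create("/nand/7/f12")` on the full filesystem would be expected to return `ENOSPC`, whereas dd above reported `ENOMEM` from its open.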

Despite all this, yaffs has been stable for me as long as I prevent the filesystem becoming full. Thanks for all your hard work.

Chris.

Charles Manning wrote:
> On Wednesday 28 November 2007 06:46:44 Chris Paulson-Ellis wrote:
>> Hi,
>>
>> I've run some tests with CONFIG_YAFFS_CHECKPOINT_RESERVED_BLOCKS set to 10
>> and 20. It does indeed fail when it runs out of checkpoint blocks. It seems
>> my 1GB NAND needs about 12 checkpoint blocks for my usage.
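For scale, a count of checkpoint blocks can be converted into bytes using the geometry printed by nanddump later in this thread (131072-byte erase blocks, 0x40000000 bytes in total). This is only arithmetic under those assumptions: it ignores the 5 bad blocks and says nothing about how yaffs itself sizes a checkpoint. The function and macro names are illustrative.

```c
/* Geometry taken from the nanddump output in this thread. */
#define NAND_BYTES  0x40000000LL  /* 1 GiB device */
#define BLOCK_BYTES 131072LL      /* 128 KiB erase block */

/* Erase blocks on the device (bad blocks not subtracted). */
long long total_blocks(void)
{
    return NAND_BYTES / BLOCK_BYTES;
}

/* Bytes set aside by reserving nblocks checkpoint blocks. */
long long reserved_bytes(int nblocks)
{
    return (long long)nblocks * BLOCK_BYTES;
}

/* Reserved blocks as a percentage of all blocks on the device. */
double reserved_percent(int nblocks)
{
    return 100.0 * nblocks / (double)total_blocks();
}
```

With these numbers the device has 8192 blocks; 12 blocks is 1572864 bytes (1.5 MiB) and 20 blocks is 2621440 bytes (2.5 MiB), roughly 0.24% of the device.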
>
> Thanks for confirming that.
>
> It's going to be far better for yaffs to calculate its checkpoint needs on the
> fly. I'll be doing that shortly.
>
> I guess too that there should be better checking so that it does not panic if
> the space runs out.
>
> -- Charles
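The "better checking" point can be illustrated with a defensive close routine. The oops in the original report is a NULL pointer dereference inside `yaffs_CheckpointClose`, reached from unmount. The sketch below is not yaffs code: `struct ckpt_state`, its fields, and `ckpt_close` are hypothetical, and it assumes for illustration that the missing piece is a list of checkpoint blocks. It only shows the general pattern of failing the checkpoint cleanly instead of indexing through a NULL pointer.

```c
#include <stdlib.h>

/* Hypothetical checkpoint bookkeeping; names do not match yaffs. */
struct ckpt_state {
    int *block_list;  /* blocks used for the checkpoint, or NULL */
    int  n_blocks;    /* capacity of block_list; a negative entry ends it */
};

/* Close the checkpoint. Returns the number of checkpoint blocks found,
 * or -1 if there is nothing to close (no state or no block list). */
int ckpt_close(struct ckpt_state *st)
{
    int i;

    if (st == NULL || st->block_list == NULL)
        return -1;  /* fail the checkpoint instead of dereferencing NULL */

    for (i = 0; i < st->n_blocks && st->block_list[i] >= 0; i++)
        ;  /* a real implementation would update each block's state here */

    free(st->block_list);
    st->block_list = NULL;
    st->n_blocks = 0;
    return i;
}
```

Returning an error lets the caller (unmount, in this case) skip writing the checkpoint, so the next mount falls back to a full scan rather than the kernel oopsing.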
>
>> Running this test script:
>>
>> http://www.edesix.com/yaffs/yaffs_test
>>
>> when CONFIG_YAFFS_CHECKPOINT_RESERVED_BLOCKS was set to 10, crashed the
>> kernel:
>>
>> http://www.edesix.com/yaffs/yaffs_test.log.1
>>
>> and when CONFIG_YAFFS_CHECKPOINT_RESERVED_BLOCKS was set to 20, didn't:
>>
>> http://www.edesix.com/yaffs/yaffs_test.log.2
>>
>> Note that the lines in the log from [sys|k]logd lag behind the lines
>> written by the script.
>>
>> You can see that the two runs start behaving differently when the number of
>> checkpoint blocks reaches 11. The crash occurs, as before, on trying to
>> unmount after a failed write to a freshly mounted, but full filesystem.
>>
>> Regards,
>> Chris.
>>
>> Charles Manning wrote:
>>> I've been thinking about this a bit.
>>>
>>> This is probably being triggered by running out of checkpoint blocks.
>>>
>>> Try setting CONFIG_YAFFS_CHECKPOINT_RESERVED_BLOCKS to a larger number
>>> (say 20).
>>>
>>>
>>> -- Charles
>>>
>>> On Thursday 15 November 2007 00:49:06 Chris Paulson-Ellis wrote:
>>>> Hi,
>>>>
>>>> I have a yaffs2 filesystem on a 1Gbyte NAND flash that has somehow
>>>> become corrupt. The filesystem mounts, but yaffs crashes on unmount if
>>>> any changes are made to the directory with the corrupt entries.
>>>>
>>>> My pattern of access is continuous writing (at about 200kBytes/s) with a
>>>> background task removing the oldest files when the filesystem reports
>>>> less than 1Mbyte free. The files so created and deleted are about
>>>> 135MBytes, so there are only a few of them stored at once and space is
>>>> freed in large blocks.
>>>>
>>>> The corrupt filesystem had become full, perhaps because the mechanism
>>>> described above does not take into account whether yaffs is able to
>>>> garbage collect blocks fast enough, or to checkpoint.
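A free-space check like the one described can be written with POSIX `statvfs(3)`. The sketch below is hypothetical: the helper names are illustrative and the threshold is assumed to be exactly 1 MiB. It also makes the headroom explicit: at 200 kB/s, 1 Mbyte is consumed in roughly five seconds, which is the window the background task has to free space once the check fires.

```c
#include <sys/statvfs.h>

/* "less than 1Mbyte free"; 1 MiB is assumed here. */
#define RECLAIM_THRESHOLD (1024LL * 1024LL)

/* Bytes available to unprivileged writers on the filesystem containing
 * path, or -1 if statvfs(3) fails. */
long long free_bytes(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
    return (long long)sv.f_bavail * (long long)sv.f_frsize;
}

/* Nonzero when the cleanup task should delete the oldest file.
 * An error value (-1) is not treated as low space. */
int should_reclaim(long long avail)
{
    return avail >= 0 && avail < RECLAIM_THRESHOLD;
}

/* Seconds until avail bytes are used up when writing rate bytes/s. */
double headroom_seconds(long long avail, long long rate)
{
    return (double)avail / (double)rate;
}
```

For example, `headroom_seconds(RECLAIM_THRESHOLD, 200000)` is about 5.2 seconds.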
>>>>
>>>> My current yaffs code was taken from CVS on 13-Nov-2007. The version
>>>> running when the filesystem apparently first became corrupt was taken
>>>> from CVS on 30-Sep-2007, so was almost exactly the same.
>>>>
>>>> It crashes on unmount like this:
>>>>
>>>> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>>>> pgd = c3dc0000
>>>> [00000000] *pgd=23db6031, *pte=00000000, *ppte=00000000
>>>> Internal error: Oops: 17 [#1] PREEMPT
>>>> Modules linked in: pss_nand qhal
>>>> CPU: 0    Not tainted  (2.6.22.2 #1)
>>>> PC is at yaffs_CheckpointClose+0x9c/0x128
>>>> LR is at yaffs_CheckpointSave+0x478/0x4d8
>>>> pc : [<c00eb0a8>]    lr : [<c00e9590>]    psr: 80000013
>>>> sp : c0017e00  ip : c0017e20  fp : c0017e1c
>>>> r10: befaeedc  r9 : c3cef000  r8 : 00000000
>>>> r7 : 00000000  r6 : 00000000  r5 : c3dc9e00  r4 : c3cef000
>>>> r3 : 00000000  r2 : c0016000  r1 : c0016000  r0 : c01fc018
>>>> Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  Segment user
>>>> Control: c000317f  Table: 23dc0000  DAC: 00000015
>>>> Process umount (pid: 155, stack limit = 0xc0016260)
>>>> Stack: (0xc0017e00 to 0xc0018000)
>>>> 7e00: c3cef000 00000000 c3dc9e00 c02e0f80 c0017e8c c0017e20 c00e9590 c00eb01c
>>>> 7e20: c0086610 c3cef000 c3dc9e00 00000000 00000000 00000000 c0016000 befaeedc
>>>> 7e40: c0017e64 c0017e50 c002df64 c002db88 c0016000 c0017e6c c0017e8c c0017e78
>>>> 7e60: c00df754 c3cef000 c3dc9e00 c02e0f80 00000000 00000000 c0016000 befaeedc
>>>> 7e80: c0017ea4 c0017e90 c00dfd88 c00e9128 c3dc9e00 c022b3a8 c0017ebc c0017ea8
>>>> 7ea0: c0073cec c00dfd58 c02e6060 c022b448 c0017ed4 c0017ec0 c0073d9c c0073c7c
>>>> 7ec0: c004638c c3dc9e00 c0017eec c0017ed8 c0073e9c c0073d94 c3dc9e00 c02e0f20
>>>> 7ee0: c0017f0c c0017ef0 c00892b0 c0073e48 c0017f28 c02e0f20 c3dc9e00 00000000
>>>> 7f00: c0017f24 c0017f10 c00794c8 c0089248 00000000 c023cf78 c0017fa4 c0017f28
>>>> 7f20: c008ac7c c00794b4 c3db5878 c02e0f20 c0017fa4 c0017f40 c00757ec 00000001
>>>> 7f40: 00000001 00000000 01f0000b 000041b6 00000001 00000000 00000000 00000000
>>>> 7f60: 00000800 00000000 34aadccf 00000000 34aadccf 00000000 c0017f78 c0017f78
>>>> 7f80: 00001000 00053008 000532c0 00053298 00000034 c001f044 00000000 c0017fa8
>>>> 7fa0: c001eec0 c008aa98 00053008 000532c0 000532c0 00000000 00000008 00000000
>>>> 7fc0: 00053008 000532c0 00053298 00000034 00000000 00000000 befaeedc 00000000
>>>> 7fe0: befacb7c befacb50 400f9b08 400f9b2c 60000010 000532c0 ebffe5b7 e59f216c
>>>> Backtrace:
>>>> [<c00eb00c>] (yaffs_CheckpointClose+0x0/0x128) from [<c00e9590>] (yaffs_CheckpointSave+0x478/0x4d8)
>>>>  r6:c02e0f80 r5:c3dc9e00 r4:00000000
>>>> [<c00e9118>] (yaffs_CheckpointSave+0x0/0x4d8) from [<c00dfd88>] (yaffs_put_super+0x40/0xb8)
>>>> [<c00dfd48>] (yaffs_put_super+0x0/0xb8) from [<c0073cec>] (generic_shutdown_super+0x80/0x118)
>>>>  r5:c022b3a8 r4:c3dc9e00
>>>> [<c0073c6c>] (generic_shutdown_super+0x0/0x118) from [<c0073d9c>] (kill_block_super+0x18/0x2c)
>>>>  r5:c022b448 r4:c02e6060
>>>> [<c0073d84>] (kill_block_super+0x0/0x2c) from [<c0073e9c>] (deactivate_super+0x64/0x7c)
>>>>  r4:c3dc9e00
>>>> [<c0073e38>] (deactivate_super+0x0/0x7c) from [<c00892b0>] (mntput_no_expire+0x78/0xc0)
>>>>  r5:c02e0f20 r4:c3dc9e00
>>>> [<c0089238>] (mntput_no_expire+0x0/0xc0) from [<c00794c8>] (path_release_on_umount+0x24/0x28)
>>>>  r7:00000000 r6:c3dc9e00 r5:c02e0f20 r4:c0017f28
>>>> [<c00794a4>] (path_release_on_umount+0x0/0x28) from [<c008ac7c>] (sys_umount+0x1f4/0x208)
>>>>  r4:c023cf78
>>>> [<c008aa88>] (sys_umount+0x0/0x208) from [<c001eec0>] (ret_fast_syscall+0x0/0x2c)
>>>>  r8:c001f044 r7:00000034 r6:00053298 r5:000532c0 r4:00053008
>>>> Code: e59f0088 e1560003 aa000004 e59430f4 (e7935106)
>>>>
>>>>
>>>> Here is an ARM9 disassembly of the object file containing
>>>> yaffs_CheckpointClose:
>>>>
>>>> http://www.edesix.com/yaffs/yaffs_checkptrw.o.dis
>>>>
>>>>
>>>> Here is a trace of the mount after doing:
>>>>
>>>> # echo +all > /proc/yaffs
>>>> # echo -mtd > /proc/yaffs
>>>> # echo 9 > /proc/sys/kernel/printk
>>>> # mount -t yaffs2 /dev/mtdblock11 /nand/7
>>>>
>>>> http://www.edesix.com/yaffs/mount.txt
>>>>
>>>>
>>>> Here is the trace of the (failing) write operation after doing:
>>>>
>>>> # echo +all > /proc/yaffs
>>>> # touch /nand/7/video/test
>>>>
>>>> http://www.edesix.com/yaffs/touch.txt
>>>>
>>>>
>>>> Here is a trace of the umount (including above crash) after doing:
>>>>
>>>> # echo +all > /proc/yaffs
>>>> # echo -mtd > /proc/yaffs
>>>> # umount /nand/7
>>>>
>>>> http://www.edesix.com/yaffs/umount.txt
>>>>
>>>>
>>>> Here is a dump of the NAND (warning - 1Gbyte), created with:
>>>>
>>>> # nanddump -f nand7.dump /dev/mtd11
>>>> ECC failed: 0
>>>> ECC corrected: 0
>>>> Number of bad blocks: 5
>>>> Number of bbt blocks: 0
>>>> Block size 131072, page size 2048, OOB size 64
>>>> Dumping data starting at 0x00000000 and ending at 0x40000000...
>>>>
>>>> http://www.edesix.com/yaffs/nand7.dump
>>>>
>>>>
>>>> Regards,
>>>> Chris.
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> yaffs mailing list
>>>>
>>>> http://lists.aleph1.co.uk/cgi-bin/mailman/listinfo/yaffs