>
> I have a Balloon 3 (P2) that has spent most of the afternoon spitting
> out error messages like this whenever I've tried to boot it:
>
> BUG: soft lockup detected on CPU#0!
>
> Pid: 711, comm: depmod CPU: 0 PC is at nand_read_byte+0x18/0x20 LR is at
> nand_command+0x114/0x1f4 pc : [<c0145850>] lr : [<c0145ea4>] Not tainted
> sp : c3cbd9cc ip : c3cbd9dc fp : c3cbd9d8 r10: 00000000 r9 : 00000000 r8
> : ffffffff r7 : 000000ff r6 : c3c46000 r5 : c3c46180 r4 : ffffffff r3 :
> c3c46180 r2 : c4812000 r1 : c4814014 r0 : 00000000 Flags: nZcv IRQs on
> FIQs on Mode SVC_32 Segment user Control: 397F Table: A3AB4000 DAC:
> 00000015 [<c002393c>] (show_regs+0x0/0x4c) from [<c00595f0>]
> (softlockup_tick+0x68/0x80)
> r4 = C3CBD984 [<c0059588>] (softlockup_tick+0x0/0x80) from [<c0043804>]
> (do_timer+0x70/0x10c)
> r4 = 00000001 [<c0043794>] (do_timer+0x0/0x10c) from [<c0026ecc>]
> (timer_tick+0xe0/0x134) [<c0026dec>] (timer_tick+0x0/0x134) from
> [<c002d488>] (pxa_timer_interrupt+0x34/0x84)
> r6 = 00000001 r5 = C3CBD984 r4 = F2A00000 [<c002d454>]
> (pxa_timer_interrupt+0x0/0x84) from [<c0022980>] (__do_irq+0x4c/0x90)
> r8 = 00000000 r7 = 0000001A r6 = C3CBD984 r5 = 00000000
> r4 = C027B2F8
> [<c0022934>] (__do_irq+0x0/0x90) from [<c0022bd0>] (do_level_IRQ+0x68/0xc0)
> r8 = FFFFFFFF r7 = 000000FF r6 = C3CBD984 r5 = 0000001A
> r4 = C02D22C8 [<c0022b68>] (do_level_IRQ+0x0/0xc0) from [<c0022d70>]
> (asm_do_IRQ+0x4c/0x74)
> r6 = 04000000 r5 = C3CBD9B8 r4 = C3CBD984
> [<c0022d24>] (asm_do_IRQ+0x0/0x74) from [<c0021924>] (__irq_svc+0x24/0x60)
> r4 = FFFFFFFF [<c0145838>] (nand_read_byte+0x0/0x20) from [<c0145ea4>]
> (nand_command+0x114/0x1f4) [<c0145d90>] (nand_command+0x0/0x1f4) from
> [<c0147a20>] (nand_write_oob+0xc8/0x22c)
> r8 = C3C46000 r7 = C3C46180 r6 = FFFFFFFB r5 = FFFFFFFF
> r4 = 00000000 [<c0147958>] (nand_write_oob+0x0/0x22c) from [<c013aed4>]
> (part_write_oob+0x78/0xb0) [<c013ae5c>] (part_write_oob+0x0/0xb0) from
> [<c00c5780>] (nandmtd_WriteChunkToNAND+0x108/0x114)
> r6 = C3DAF600 r5 = 00000000 r4 = 00F83E00 [<c00c5678>]
> (nandmtd_WriteChunkToNAND+0x0/0x114) from [<c00c4eb0>]
> (yaffs_WriteChunkToNAND+0x4c/0x54)
> r8 = C3CBDAB8 r7 = 00000000 r6 = C3DEF000 r5 = 00000000
> r4 = 00007C1F [<c00c4e64>] (yaffs_WriteChunkToNAND+0x0/0x54) from
> [<c00c5354>] (yaffs_TagsCompatabilityWriteChunkWithTagsToNAND+0xf0/0xfc)
> r6 = 00007C1F r5 = C3DEF000 r4 = C3CBDB1C [<c00c5264>]
> (yaffs_TagsCompatabilityWriteChunkWithTagsToNAND+0x0/0xfc) from
> [<c00c603c>] (yaffs_WriteChunkWithTagsToNAND+0xac/0x114) [<c00c5f90>]
> (yaffs_WriteChunkWithTagsToNAND+0x0/0x114) from [<c00bfed0>]
> (yaffs_DeleteChunk+0x20c/0x264)
> r8 = 00000000 r7 = 00007C3F r6 = 000003E1 r5 = C3DEF000
> r4 = C3CBDB1C [<c00bfcc4>] (yaffs_DeleteChunk+0x0/0x264) from
> [<c00bccfc>] (yaffs_HandleWriteChunkError+0x94/0xb4) [<c00bcc68>]
> (yaffs_HandleWriteChunkError+0x0/0xb4) from [<c00bcb80>]
> (yaffs_WriteNewChunkWithTagsToNAND+0xc0/0xe0)
> r6 = 00000000 r5 = C3DEF000 r4 = 00007C3F [<c00bcac0>]
> (yaffs_WriteNewChunkWithTagsToNAND+0x0/0xe0) from [<c00bffac>]
> (yaffs_WriteChunkDataToObject+0x84/0xc8) [<c00bff28>]
> (yaffs_WriteChunkDataToObject+0x0/0xc8) from [<c00c1854>]
> (yaffs_WriteDataToFile+0x2a0/0x2bc) [<c00c15b4>]
> (yaffs_WriteDataToFile+0x0/0x2bc) from [<c00ba98c>]
> (yaffs_file_write+0x98/0x190) [<c00ba8f4>] (yaffs_file_write+0x0/0x190)
> from [<c00ba560>] (yaffs_commit_write+0x90/0x160) [<c00ba4d0>]
> (yaffs_commit_write+0x0/0x160) from [<c005c074>]
> (generic_file_buffered_write+0x26c/0x614) [<c005be0c>]
> (generic_file_buffered_write+0x4/0x614) from [<c005c67c>]
> (__generic_file_aio_write_nolock+0x260/0x500) [<c005c41c>]
> (__generic_file_aio_write_nolock+0x0/0x500) from [<c005ca28>]
> (__generic_file_write_nolock+0x84/0xb0) [<c005c9a4>]
> (__generic_file_write_nolock+0x0/0xb0) from [<c005cc48>]
> (generic_file_write+0x48/0xbc) [<c005cc00>]
> (generic_file_write+0x0/0xbc) from [<c007a6bc>] (vfs_write+0xc0/0x184)
> [<c007a5fc>] (vfs_write+0x0/0x184) from [<c007a84c>]
> (sys_write+0x50/0x7c) [<c007a7fc>] (sys_write+0x0/0x7c) from
> [<c0021ca0>] (ret_fast_syscall+0x0/0x2c)
> r9 = C3CBC000 r8 = C0021E44 r7 = 00000004 r6 = 000185F8
> r5 = 00001000 r4 = 40330000
> **>> Block 993 retired
> **>> yaffs chunk 31808 was not erased
> **>> yaffs chunk 31809 was not erased
> **>> yaffs chunk 31810 was not erased
> **>> yaffs chunk 31811 was not erased
> **>> yaffs chunk 31812 was not erased
> **>> yaffs chunk 31813 was not erased
> **>> yaffs chunk 31814 was not erased
> **>> yaffs chunk 31815 was not erased
> **>> yaffs chunk 31816 was not erased
> **>> yaffs chunk 31817 was not erased
> **>> yaffs chunk 31818 was not erased
> **>> yaffs chunk 31819 was not erased
> **>> yaffs chunk 31820 was not erased
> **>> yaffs chunk 31821 was not erased
> **>> yaffs chunk 31822 was not erased
> **>> yaffs chunk 31823 was not erased
> **>> yaffs chunk 31824 was not erased
> **>> yaffs chunk 31825 was not erased
>
> It keeps doing this (but with progressively higher chunk numbers) until
> grinding to a halt after several minutes. It also occasionally throws
> out a message about retiring a block.
>
> This evening however, it has started behaving normally again and it
> boots all the way through to the login prompt, but I hadn't done anything
> to it to try and fix it.
>
>
> The same board was producing similar 'yaffs chunk n was not erased'
> messages on Thursday, but while trying to download a big file via
> zmodem, not while booting. On that occasion the only way to cure it and
> get a stable system again was a force-erase of the nand. (A non-forced
> erase didn't seem to work). It's been back to TCL and there doesn't seem
> to be anything wrong with the hardware. TCL were also able to cure it by
> doing a force-erase and managed to squirt several MBs over zmodem
> without reproducing any symptoms.
>
> Any suggestions as to what's going on? Is it just encountering bad nand
> that's been marked as good (thanks to the forced erasing), or is there
> something else going on?
>
I have also seen this. I was investigating it while it was failing
reliably, and then... it went away. Gah! I have not (yet) been able to
make it happen again, except that on my E1 board a reboot of the kernel
sometimes reports loads of bad blocks on NAND that had been working fine.
After I sort out bootldr (again) I need to look at this. Note that the
Balloon MTD NAND interface uses fixed delays to wait for the NAND to
become ready (no ready/busy line is available to Linux yet; this is being
fixed). For 2k NAND I had to increase the delay. If your problem is
repeatable, testing with a higher delay value on 512b NAND might change
its behaviour.
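
To illustrate why a fixed delay can misbehave where a ready/busy line
would not, here is a rough user-space sketch. It is not the Balloon
driver code: struct sim_nand, wait_fixed(), wait_poll() and all the
microsecond figures are made up for illustration. It only models a chip
that needs some busy time after a command and compares the two ways of
waiting.

```c
#include <stdbool.h>

/* Hypothetical simulated NAND chip (not a real driver structure). */
struct sim_nand {
    unsigned busy_us;    /* time the chip really needs after a command */
    unsigned elapsed_us; /* simulated time since the command was issued */
};

/* Stand-in for sampling a ready/busy line. */
static bool sim_ready(const struct sim_nand *n)
{
    return n->elapsed_us >= n->busy_us;
}

/*
 * Fixed-delay wait: sleep for delay_us and then carry on regardless.
 * Returns whether the chip had really finished; a real driver without
 * a ready/busy line cannot know this and would just read the bus.
 */
static bool wait_fixed(struct sim_nand *n, unsigned delay_us)
{
    n->elapsed_us += delay_us;
    return sim_ready(n);
}

/*
 * Polling wait: check the ready line every simulated microsecond and
 * give up only after timeout_us. Adapts to slow chips automatically.
 */
static bool wait_poll(struct sim_nand *n, unsigned timeout_us)
{
    while (!sim_ready(n)) {
        if (n->elapsed_us >= timeout_us)
            return false;
        n->elapsed_us += 1;
    }
    return true;
}
```

With a chip that needs 25 us, wait_fixed(&chip, 20) returns before the
chip is ready, so whatever is read next is unreliable, while
wait_fixed(&chip, 30) or wait_poll(&chip, 1000) both succeed. That is
the reasoning behind trying a larger delay value.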
Nick Bane
> Paul Fidler