[Yaffs] Behaviour of yaffs_CheckpointWrite() when a NAND wri…

Author: David Peverley
Date:  
To: yaffs
Subject: [Yaffs] Behaviour of yaffs_CheckpointWrite() when a NAND write fails
Hi all,

I'm debugging a system where a NAND write verify occasionally fails
when CONFIG_MTD_NAND_VERIFY_WRITE is enabled. Dumping the stack at the
point of the verify failure gives:

[ 275.078800] mt29f2g08aadwp_nand_verifybuf: Stack that brought us to
the verify failure :
[ 275.087956] [<c002fd5c>] (unwind_backtrace+0x0/0xec) from
[<c0330b68>] (dump_stack+0x18/0x1c)
[ 275.097638] [<c0330b68>] (dump_stack+0x18/0x1c) from [<c020bdcc>]
(mt29f2g08aadwp_nand_verifybuf+0x100/0x124)
[ 275.109212] [<c020bdcc>]
(mt29f2g08aadwp_nand_verifybuf+0x100/0x124) from [<c02067fc>]
(nand_write_page+0xd8/0x194)
[ 275.120848] [<c02067fc>] (nand_write_page+0xd8/0x194) from
[<c020782c>] (nand_do_write_ops+0x230/0x3a4)
[ 275.131358] [<c020782c>] (nand_do_write_ops+0x230/0x3a4) from
[<c0207a18>] (nand_write_oob+0x78/0xe4)
[ 275.141731] [<c0207a18>] (nand_write_oob+0x78/0xe4) from
[<c01e89a4>] (part_write_oob+0x8c/0xb8)
[ 275.152098] [<c01e89a4>] (part_write_oob+0x8c/0xb8) from
[<c0148f80>] (nandmtd2_WriteChunkWithTagsToNAND+0xd0/0x14c)
[ 275.163844] [<c0148f80>]
(nandmtd2_WriteChunkWithTagsToNAND+0xd0/0x14c) from [<c01461ec>]
(yaffs_CheckpointFlushBuffer+0x100/0x2e0)
[ 275.177803] [<c01461ec>] (yaffs_CheckpointFlushBuffer+0x100/0x2e0)
from [<c01465c0>] (yaffs_CheckpointWrite+0xbc/0xd4)
[ 275.189976] [<c01465c0>] (yaffs_CheckpointWrite+0xbc/0xd4) from
[<c014565c>] (yaffs_CheckpointTnodeWorker+0xdc/0xec)
[ 275.201717] [<c014565c>] (yaffs_CheckpointTnodeWorker+0xdc/0xec)
from [<c0145bc4>] (yaffs_CheckpointSave+0x4e4/0x578)
[ 275.213486] [<c0145bc4>] (yaffs_CheckpointSave+0x4e4/0x578) from
[<c0138cc8>] (yaffs_do_sync_fs+0xf0/0x120)
[ 275.224388] [<c0138cc8>] (yaffs_do_sync_fs+0xf0/0x120) from
[<c0138d2c>] (yaffs_sync_fs+0x34/0x50)
[ 275.234719] [<c0138d2c>] (yaffs_sync_fs+0x34/0x50) from
[<c00d9f50>] (__sync_filesystem+0x34/0x44)
[ 275.245827] [<c00d9f50>] (__sync_filesystem+0x34/0x44) from
[<c00da138>] (sync_filesystem+0x34/0x60)
[ 275.256108] [<c00da138>] (sync_filesystem+0x34/0x60) from
[<c00ba130>] (generic_shutdown_super+0x38/0x11c)
[ 275.267182] [<c00ba130>] (generic_shutdown_super+0x38/0x11c) from
[<c00ba234>] (kill_block_super+0x20/0x38)
[ 275.278050] [<c00ba234>] (kill_block_super+0x20/0x38) from
[<c00ba738>] (deactivate_super+0x50/0x68)
[ 275.288314] [<c00ba738>] (deactivate_super+0x50/0x68) from
[<c00d0158>] (mntput_no_expire+0x74/0xac)
[ 275.298612] [<c00d0158>] (mntput_no_expire+0x74/0xac) from
[<c00d04d0>] (sys_umount+0x58/0x364)
[ 275.308883] [<c00d04d0>] (sys_umount+0x58/0x364) from [<c0029a20>]
(ret_fast_syscall+0x0/0x28)

We know that the failure is -EFAULT from
mt29f2g08aadwp_nand_verifybuf(), so tracing how that return value is
passed up the call chain I see:
mt29f2g08aadwp_nand_verifybuf ........ Has failed. Returns -EFAULT.
nand_write_page ...................... Has failed. Returns -EIO.
nand_do_write_ops .................... Returns the return value of nand_write_page(): -EIO
nand_write_oob ....................... Returns the return value of nand_do_write_ops(): -EIO
part_write_oob ....................... Returns the return value of nand_write_oob(): -EIO
nandmtd2_WriteChunkWithTagsToNAND .... Returns the return value of mtd->write(): -EIO
yaffs_CheckpointFlushBuffer .......... Returns 1 (success), as the return value of
                                       nandmtd2_WriteChunkWithTagsToNAND() is not checked

This concerns me because the operation in progress from the sync was
yaffs_CheckpointSave(). Since the write error is dropped in
yaffs_CheckpointFlushBuffer(), yaffs_CheckpointSave() will believe the
checkpoint was written successfully, which I think is a Bad Thing. My
initial instinct is to return a failure from
yaffs_CheckpointFlushBuffer() when the write fails. Would this make
sense to you guys?
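
To make the proposal concrete, here is a minimal standalone sketch of
the idea. It is not the YAFFS code: every sketch_* name, the struct
layout, the 16-byte buffer and the "-1 on failure" convention are
assumptions made up for illustration. The only behaviour taken from
the description above is that the low-level write returns a negative
errno such as -EIO, and that the flush step should report that
failure instead of returning success unconditionally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_OK   1   /* "1 means success", as in the call chain above */
#define SKETCH_FAIL 0

/* Hypothetical stand-in for the chunk write: 0 on success, -errno on error. */
typedef int (*sketch_write_fn)(int chunk, const unsigned char *data, size_t len);

struct sketch_checkpoint {
    sketch_write_fn write_chunk;  /* low-level write hook (assumption) */
    unsigned char buffer[16];     /* tiny buffer, just for the sketch */
    size_t used;                  /* bytes currently buffered */
    int next_chunk;               /* next chunk number to write */
    int failed;                   /* set once a write has failed */
};

/* Flush the buffer and propagate a write failure to the caller. */
static int sketch_flush_buffer(struct sketch_checkpoint *cp)
{
    int ret;

    assert(cp != NULL && cp->write_chunk != NULL);

    ret = cp->write_chunk(cp->next_chunk, cp->buffer, cp->used);
    if (ret < 0) {
        cp->failed = 1;       /* remember that the checkpoint is not valid */
        return SKETCH_FAIL;   /* the step that is currently missing */
    }

    cp->next_chunk++;
    cp->used = 0;
    return SKETCH_OK;
}

/* Buffer bytes, flushing when full. Returns len, or -1 if a flush failed. */
static int sketch_checkpoint_write(struct sketch_checkpoint *cp,
                                   const unsigned char *data, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        cp->buffer[cp->used++] = data[i];
        if (cp->used == sizeof(cp->buffer) &&
            sketch_flush_buffer(cp) != SKETCH_OK)
            return -1;        /* stop early so the save can be abandoned */
    }
    return (int)len;
}

/* Two fake writers for exercising the sketch. */
static int sketch_write_ok(int chunk, const unsigned char *data, size_t len)
{
    (void)chunk; (void)data; (void)len;
    return 0;
}

static int sketch_write_eio(int chunk, const unsigned char *data, size_t len)
{
    (void)chunk; (void)data; (void)len;
    return -EIO;              /* simulates the verify failure seen above */
}
```

With sketch_write_eio plugged in, a write that fills the buffer
returns -1 and sets the failed flag, so a caller in the position of
yaffs_CheckpointSave() could treat the checkpoint as invalid. With
sketch_write_ok the same call returns the full length.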

I know that we're using an old version of YAFFS, but we have deployed
systems running this version and don't want to cause version
conflicts! Checking the current git HEAD, the return value is still
ignored there, so the same silent failure is still possible.

Cheers!

~Pev