Re: [Yaffs] Re: [YAFFS1] Some bits are changed - systematica…

Author: Charles Manning
Date:  
To: yaffs
CC: Martin Egholm Nielsen
Subject: Re: [Yaffs] Re: [YAFFS1] Some bits are changed - systematically
On Wednesday 30 November 2005 03:28, Martin Egholm Nielsen wrote:
> Hi Ian,
>
> I take to the list - who knows if somebody might read this one day...
>
> Ian McDonnell wrote:
> >>>>I'm sure Charles will ask for this when he wakes up in New
> >>>>Zealand...
> >>
> >>I guess not :-)
> >
> > Yes, he is very quiet isn't he.


Yup, I do take a break now and then... and there are around another 200
people on the list who would hopefully have some opinions too.

> :
> :-)
> :
> >>Can I enable some flags in the YAFFS core enabling some debug
> >>information that'll help me investigating this problem?
> >>I guess what I want is to have a list of nodes stating what
> >>nodes belong to what file and when the node was written...


Fiddle with the yaffs_traceFlags trace mask. Set it according to the flags
defined in yportenv.h.

This really should be something you can set on the fly through procfs.
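
For anyone who has not used a trace mask before, here is a minimal sketch of
the general idea in plain C. It is not the YAFFS code: the DEMO_* flag names,
their values, and the demo_trace() helper are all made up for illustration.
The real flag definitions are the ones in yportenv.h.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical flag names and values, for illustration only.
 * The real YAFFS flags are the ones defined in yportenv.h. */
#define DEMO_TRACE_ERROR      0x00000001u
#define DEMO_TRACE_SCAN       0x00000008u
#define DEMO_TRACE_BAD_BLOCKS 0x00000010u
#define DEMO_TRACE_WRITE      0x00000080u

/* One bit per category; OR together the categories you want to see. */
static unsigned demo_trace_mask = DEMO_TRACE_ERROR;

/* Print the message only if its category is enabled in the mask.
 * Returns 1 if the message was printed, 0 if it was filtered out. */
static int demo_trace(unsigned flag, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    if (!(demo_trace_mask & flag))
        return 0;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 1;
}
```

With a scheme like this, setting the mask to
DEMO_TRACE_ERROR | DEMO_TRACE_WRITE | DEMO_TRACE_BAD_BLOCKS would show write
and bad-block messages while keeping scan chatter quiet.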

> >
> > Can you reproduce the problem? Does the corruption hit the same
> > file? Is it similar in other files? Do you know it's not a NAND
> > or MTD problem -- i.e a corrupted write or a bad device. Have
> > you seen this problem on other instances of the h/w. etc.
>
> That's the only device I've seen it with - out of 20-30 pieces having
> had the same "treatment" :-)
> And no I haven't tried that device any more - I didn't want to ruin the
> possibility to analyse what has happened...
>
> And I don't know if it's a NAND or MTD problem - I was hoping that some
> could guide me...
>
> Can this occur, say, with a bad NAND? Would YAFFS/MTD puke up with a lot
> of checksum errors?


A few things that I can think of:

1) A gross NAND failure. YAFFS/mtd are not magic and need reasonably reliable
media to do anything. ECC can fix single-bit errors, but nothing more. It
can't fix gross NAND errors any more than ReiserFS can work with a disk with
a 6 inch nail through it.

2) Iffy timing. Check your NAND access timing. Marginal timing has a habit of
making some parts work OK and others not.

3) Check that the ECC code is actually working OK. A poor ECC implementation
could cause more damage than it fixes.

4) Bad block handling. If a bad block is not being flagged correctly then you
could end up retrying it on every mount. That would be a problem.
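
To make points 1 and 3 concrete, here is a toy single-bit-correcting code over
a 256-byte buffer. It is only a sketch of the general parity technique, not
the ECC that YAFFS or MTD actually use; the toy_ecc type and both functions
are hypothetical names. It keeps the XOR of the positions of all set bits,
plus the XOR of the complemented positions, and compares them on read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATA_BYTES 256u
#define INDEX_MASK 0x7FFu /* 2048 bit positions fit in an 11-bit index */

/* Hypothetical toy ECC: XOR of set-bit positions and of their complements. */
typedef struct {
    uint16_t p;
    uint16_t p_inv;
} toy_ecc;

static toy_ecc toy_ecc_calc(const uint8_t *data)
{
    toy_ecc e = {0, 0};

    assert(data != NULL);
    for (unsigned i = 0; i < DATA_BYTES * 8u; i++) {
        if (data[i / 8u] & (1u << (i % 8u))) {
            e.p ^= (uint16_t)i;
            e.p_inv ^= (uint16_t)(~i & INDEX_MASK);
        }
    }
    return e;
}

/* Returns 0 if the data matches the stored ECC, 1 if a single-bit error
 * was corrected in place, -1 if the mismatch cannot be corrected. */
static int toy_ecc_correct(uint8_t *data, toy_ecc stored)
{
    toy_ecc now = toy_ecc_calc(data);
    unsigned d = (unsigned)(now.p ^ stored.p);
    unsigned d_inv = (unsigned)(now.p_inv ^ stored.p_inv);

    if (d == 0 && d_inv == 0)
        return 0;

    /* A single flipped bit at position k gives d == k and d_inv == ~k. */
    if ((d ^ d_inv) == INDEX_MASK) {
        data[d / 8u] ^= (uint8_t)(1u << (d % 8u));
        return 1;
    }
    return -1;
}
```

One flipped bit is repaired and two flipped bits are reported as
uncorrectable. In this toy code, three or more flipped bits can look like a
single-bit error and get "corrected" into something else, which is the kind of
extra damage a weak or buggy ECC can cause.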

-- Charles