On Tue, Feb 21, 2017 at 9:08 AM, Hunter Somerville wrote:

> Charles,
>
> I don't think the mapping thing is the case. Before this problem was
> found, my testing involved filling up the entire 190 GB partition and
> draining it out one file at a time while validating each file. This
> would happen several times successfully.
>
> Our latest theory was that the device was continuing to write in a
> low-power state, possibly ending up writing random data to random
> locations. Our last revision of the driver should fix a bug we found
> that might have allowed this to happen, but even if it was writing
> garbage, nothing has changed. That said, I have done a nanddump of the
> blocks earlier in the address range than the file being written and
> validated their contents. None of the previously existing data is being
> corrupted during the write, which is why I can mount the partition
> read-only and read back all of the files.
>
> There is a smaller partition (~400 blocks instead of ~47500) which I
> tried using for testing. I cannot cause the failure in this smaller
> partition no matter what I try. An even smaller one seems still less
> likely to see a failure. I'd also have to deliberately slow down our
> driver to cut power in time for a write under 80 MB in size (doable,
> but it might obscure the problem).

47500 blocks of 4 MB (= 128 pages of 32 KB) is a total of around 6
million pages. That's pretty large. I'll do some calcs to see if this
could be a number space issue.

Sometimes it also helps to turn off some features to see if that makes a
difference. It's not that I recommend running with those features off;
it's just a way of trying to isolate the issue. The two major features
are checkpoint and block summaries.

> I've used nandwrite/nanddump to write/read this partition extensively,
> yes. I never lose data this way, aside from the partial write happening
> during the power loss. The data is only erased after yaffs marks all
> the blocks as unused and performs garbage collection.
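The back-of-envelope figure above can be checked directly. A minimal
sketch using only the geometry quoted in this thread (47500 blocks,
4 MB erase blocks, 32 KB pages); the bit-width line is just illustrative
arithmetic about the size of the chunk-id "number space", not yaffs
source:

```python
# Geometry as quoted in the thread.
PAGE_SIZE = 32 * 1024            # 32 KB pages
BLOCK_SIZE = 4 * 1024 * 1024     # 4 MB erase blocks
N_BLOCKS = 47500

pages_per_block = BLOCK_SIZE // PAGE_SIZE    # 128 pages per block
total_pages = N_BLOCKS * pages_per_block     # total chunks in the partition

# Bits needed to address every chunk -- the kind of "number space"
# a chunk-id field or tree index would have to cover.
chunk_id_bits = total_pages.bit_length()

print(pages_per_block, total_pages, chunk_id_bits)
# prints: 128 6080000 23
```

That comes out at 6.08 million pages, consistent with the "around 6
million" figure, and needs 23 bits of chunk-id space.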
> My colleague is in the process of modifying UBIFS to work with DMA so
> that we can test whether the problem still exists with a different
> filesystem...

If you can't make it fail with the mtdtools then a filesystem should not
change things.

Charles

> Thanks,
> Hunter
>
> On Mon, Feb 20, 2017 at 2:27 PM, Charles Manning wrote:
>
>> On Tue, Feb 21, 2017 at 8:17 AM, Hunter Somerville
>> <hsomervi5790@gmail.com> wrote:
>>
>>> On Thu, Feb 9, 2017 at 4:23 PM, Charles Manning wrote:
>>>
>>>> Hi Hunter
>>>>
>>>> On Fri, Feb 10, 2017 at 8:57 AM, Hunter Somerville
>>>> <hsomervi5790@gmail.com> wrote:
>>>>
>>>>> On Tue, Feb 7, 2017 at 3:44 PM, Charles Manning wrote:
>>>>>
>>>>>> On Tue, Feb 7, 2017 at 5:19 AM, Hunter Somerville
>>>>>> <hsomervi5790@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> We are encountering an issue where we will usually lose an entire
>>>>>>> partition of data if the flash device loses power during a write
>>>>>>> operation. When we bring the system back up and remount, all files
>>>>>>> and directories appear as long strings of question marks with
>>>>>>> incorrect filenames, and we end up having to flash-erase the
>>>>>>> partition to recover. This only happens on the device with fairly
>>>>>>> large pages (4 MB erase blocks, 32 KB pages, 1 KB OOB); it does
>>>>>>> not occur on the more typical device in the same system, which
>>>>>>> uses 4 KB pages.
>>>>>>
>>>>>> What kind of flash are you using? What part number?
>>>>>
>>>>> The hardware is proprietary, and not designed by us. What I can tell
>>>>> you is that we interface with an FPGA - not the flash chips
>>>>> directly. The FPGA performs the writes.
>>>>
>>>> Surely the flash parts are off the shelf.
>>>
>>> I'm getting permission on this. They're Samsung parts.
>>>
>>> We've discovered that mounting the partition as read-only after the
>>> power loss shows that the data is all present and correct, aside from
>>> the file that was actively being written. I can read back any of the
>>> files and verify their contents. If at any point I mount this
>>> partition as read-write after the power loss, yaffs appears to mark
>>> all blocks as unused and then proceeds to garbage collect every
>>> block. My files all slowly disappear.
>>>
>>> yaffs: Collecting block 3, in use 1, shrink 0, whole_block 0
>>> yaffs: Collecting block 3 that has no chunks in use
>>> yaffs: yaffs_block_became_dirty block 3 state 8
>>> yaffs: yaffs_tags_marshall_read chunk 256 data ef1f0000 tags ef6f5cd8
>>> yaffs: packed tags obj -1 chunk -1 byte -1 seq -1
>>> yaffs: ext.tags eccres 1 blkbad 0 chused 0 obj 0 chunk0 byte 0 del 0 ser 0 seq 0
>>> yaffs: yaffs_tags_marshall_read chunk 257 data ef1f0000 tags ef6f5cd8
>>> yaffs: packed tags obj -1 chunk -1 byte -1 seq -1
>>> yaffs: ext.tags eccres 1 blkbad 0 chused 0 obj 0 chunk0 byte 0 del 0 ser 0 seq 0
>>> .......
>>> yaffs: yaffs_tags_marshall_read chunk 382 data ef1f0000 tags ef6f5cd8
>>> yaffs: packed tags obj -1 chunk -1 byte -1 seq -1
>>> yaffs: ext.tags eccres 1 blkbad 0 chused 0 obj 0 chunk0 byte 0 del 0 ser 0 seq 0
>>> yaffs: yaffs_tags_marshall_read chunk 383 data ef1f0000 tags ef6f5cd8
>>> yaffs: packed tags obj -1 chunk -1 byte -1 seq -1
>>> yaffs: ext.tags eccres 1 blkbad 0 chused 0 obj 0 chunk0 byte 0 del 0 ser 0 seq 0
>>> yaffs: Erased block 3
>>>
>>> I can't yet figure out why it's marking these blocks as unused when
>>> there are clearly files present. Any help on this matter would be
>>> greatly appreciated.
>>
>> Hello Hunter
>>
>> That sounds pretty weird.
>>
>> The only time I've ever seen something like that happen was when there
>> was a bug in the driver so that the flash got mapped twice
>> (i.e. the driver said the part was, say, 32 MB but was actually just
>> accessing the first 16 MB twice).
>>
>> When you get issues like this it is often also a good idea to first
>> try a small partition (say 20 blocks). That way there's a lot less
>> detail and you can maybe spot the patterns quicker.
>>
>> If you're using Linux, have you tried testing the drivers by just
>> using the mtdtools to run tests?
>>
>> -- Charles
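One detail worth noting in the trace quoted earlier: `packed tags obj -1
chunk -1 byte -1 seq -1` is what the fields look like when every byte of
the out-of-band area reads back as 0xFF, i.e. the chunk appears erased,
which would be consistent with yaffs deciding the blocks have no chunks
in use. A minimal sketch of that interpretation (hypothetical helpers,
not the actual yaffs unpacking code):

```python
def tags_look_erased(oob: bytes) -> bool:
    """A chunk whose out-of-band area is all 0xFF looks never-written."""
    return all(b == 0xFF for b in oob)

def unpack_u32(oob: bytes, offset: int) -> int:
    """Read a little-endian u32 field from the OOB; an all-0xFF field
    unpacks to 0xFFFFFFFF, which prints as -1 when shown signed."""
    return int.from_bytes(oob[offset:offset + 4], "little")

erased = b"\xff" * 16
print(tags_look_erased(erased))                      # True
print(unpack_u32(erased, 0) == 0xFFFFFFFF)           # True: prints as -1
print(tags_look_erased(b"\x12\x34" + b"\xff" * 14))  # False: real tags
```

If the OOB genuinely reads back as 0xFF after the power cut, the
question shifts from yaffs to whether the driver/FPGA path is returning
the real spare-area contents.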