Greetings all,

I have been using yaffs for a few years now (with no issues) and have recently been experiencing a bit of a mystery. I am using openwrt and I should probably be posing this question/issue over there, but I have a feeling you guys will have some better insight into this particular problem. (note: the files are written directly to flash, there is no overlay filesystem like some OpenWRT images)

The basic symptom is that the disk is reporting full, but there are not enough files to legitimately take up that much space. These systems were previously working, and are still running (in the last few months they have been constrained by "no free space").

When I do "df -h", it reports the system is full, but when I do a sort of crude totaling of file sizes (du -hcs /), it is reporting significantly less usage.

root@errored_system:/root# du -hcs /
16.3M    /
16.3M    total
root@errored_system:/root# df -h
Filesystem                Size      Used Available Use% Mounted on
rootfs                   58.0M     58.0M      8.0K 100% /
/dev/root                58.0M     58.0M      8.0K 100% /
tmpfs                    30.3M    444.0K     29.8M   1% /tmp
tmpfs                   512.0K         0    512.0K   0% /dev

(~41.7MB unaccounted for.... thats more than can be justified by blocks that are not full)

I have looked for files that might be sucking up this much space, but have not found any (big files, or lots of small files). This leads me to believe there is something amiss with the file system and why I am seeking your insight.

I don't know much about debugging yaffs specifically, but this seems like a good clue:
root@errored_system:/root# echo "-all+gc">/proc/yaffs
root@errored_system:/root# dmesg | tail -f
[ 3419.220000] yaffs_block_became_dirty block 3048 state 8
[ 3419.220000] GC Selected block 3437 with 1 free, prioritised:0
[ 3419.220000] yaffs: GC erasedBlocks 5 aggressive 1
[ 3419.260000] yaffs_block_became_dirty block 3437 state 8
[ 3419.270000] GC Selected block 2201 with 1 free, prioritised:0
[ 3419.270000] yaffs: GC erasedBlocks 5 aggressive 1
[ 3419.310000] yaffs_block_became_dirty block 2201 state 8
[ 3419.310000] GC Selected block 2231 with 1 free, prioritised:0
[ 3419.310000] yaffs: GC erasedBlocks 5 aggressive 1
[ 3419.370000] yaffs_block_became_dirty block 2231 state 8

These "yaffs_block_became_dirty block" records rapidly fill up the kernel log... (on the good systems, I do not get these messages, at least not in rapid succession)

Heres the interesting thing: they are all repeating on the same few blocks. (2201, 2357, 2819, 3048 & 3437... same story on the other error'd systems, only different block numbers) So my hunch is that something is preventing these blocks from getting GC'd and are somehow backlogging the other blocks that need freeing. Is that even possible? I am able to delete legitimate files and regain their space, but that doesn't really address the issue of the mystery disk usage. It does tell me that the GC is working at least in part.

Incidentally, I cross-referenced the yaffs_block_became_dirty block against the "bad eraseblocks" and "blocks marked bad" as reported on startup and they were different (i.e. the blocks in the above GC messages were not "known bad blocks"

Is there any way to forcibly verify the integrity of the data blocks in yaffs and free up this mystery space? In other words, I have remote access to these systems, but re-flashing isn't exactly an option for me as they are geographically distributed (i.e. data collectors). Can I somehow force those block numbers to be marked as bad? (I'd imagine that wouldn't be wise, even if I could)

Some more (hopefully) useful information:

I am not writing much to flash. About once an hour I update a couple of values in a sqlite database. I dont know if the update operation is the culprit (since its behaving on other systems), but its the only source of disk IO I can think of.

There are not many reported bad blocks (5-20 depending on which system I'm looking at)

The board manufacturer apparently has changed the NAND chip somewhere along the line. I am experiencing the issue on both types (so I'm fairly certain its not exclusively hardware related)
One system:
[ 2.520000] NAND device: Manufacturer ID: 0xad, Chip ID: 0x76 (Hynix NAND 64MiB 3,3V 8-bit)
Another system:
[ 2.510000] NAND device: Manufacturer ID: 0xec, Chip ID: 0x76 (Samsung NAND 64MiB 3,3V 8-bit)

Here is the /proc/yaffs from a good system (one that isn't full)

root@good_system:/root# cat /proc/yaffs
Multi-version YAFFS built:May 25 2012 01:48:02

Device 0 "rootfs"
start_block.......... 0
end_block............ 3711
total_bytes_per_chunk 512
use_nand_ecc......... 1
no_tags_ecc.......... 0
is_yaffs2............ 0
inband_tags.......... 0
empty_lost_n_found... 0
disable_lazy_load.... 0
refresh_period....... 500
n_caches............. 10
n_reserved_blocks.... 5
always_check_erased.. 0

data_bytes_per_chunk. 512
chunk_grp_bits....... 0
chunk_grp_size....... 1
n_erased_blocks...... 294
blocks_in_checkpt.... 0

n_tnodes............. 7901
n_obj................ 5653
n_free_chunks........ 55483

n_page_writes........ 597395
n_page_reads......... 906712
n_erasures........... 11524
n_gc_copies.......... 22402
all_gcs.............. 32575
passive_gc_count..... 32575
oldest_dirty_gc_count 0
n_gc_blocks.......... 11524
bg_gcs............... 0
n_retired_writes..... 0
nRetireBlocks........ 0
n_ecc_fixed.......... 0
n_ecc_unfixed........ 0
n_tags_ecc_fixed..... 0
n_tags_ecc_unfixed... 0
cache_hits........... 100140
n_deleted_files...... 37
n_unlinked_files..... 22541
refresh_count........ 0
n_bg_deletions....... 0

And heres one from a "disk full" system:

root@errored_system:/root# cat /proc/yaffs
Multi-version YAFFS built:May 25 2012 01:48:02

Device 0 "rootfs"
start_block.......... 0
end_block............ 3711
total_bytes_per_chunk 512
use_nand_ecc......... 1
no_tags_ecc.......... 0
is_yaffs2............ 0
inband_tags.......... 0
empty_lost_n_found... 0
disable_lazy_load.... 0
refresh_period....... 500
n_caches............. 10
n_reserved_blocks.... 5
always_check_erased.. 0

data_bytes_per_chunk. 512
chunk_grp_bits....... 0
chunk_grp_size....... 1
n_erased_blocks...... 5
blocks_in_checkpt.... 0

n_tnodes............. 23388
n_obj................ 19836
n_free_chunks........ 206

n_page_writes........ 2912051
n_page_reads......... 2907834
n_erasures........... 88768
n_gc_copies.......... 2751776
all_gcs.............. 88768
passive_gc_count..... 0
oldest_dirty_gc_count 0
n_gc_blocks.......... 88768
bg_gcs............... 0
n_retired_writes..... 0
nRetireBlocks........ 0
n_ecc_fixed.......... 1
n_ecc_unfixed........ 0
n_tags_ecc_fixed..... 0
n_tags_ecc_unfixed... 0
cache_hits........... 1851
n_deleted_files...... 2
n_unlinked_files..... 35064
refresh_count........ 0
n_bg_deletions....... 0

Thanks,

Walter