[Yaffs] YAFFS Disk filling up - GC stuck?

Author: walter
Date:  
To: yaffs
Subject: [Yaffs] YAFFS Disk filling up - GC stuck?
Greetings all,

I have been using yaffs for a few years now (with no issues) and have
recently been experiencing a bit of a mystery. I am using OpenWrt, and I
should probably be posing this question/issue over there, but I have a
feeling you guys will have better insight into this particular problem.
(Note: the files are written directly to flash; there is no overlay
filesystem like in some OpenWrt images.)

The basic symptom is that the disk reports as full, but there are not
enough files to legitimately take up that much space. These systems were
previously working fine, and are still running (though in the last few
months they have been constrained by "no free space").

When I do "df -h", it reports that the filesystem is full, but a crude
totaling of file sizes (du -hcs /) reports significantly less usage.
root@errored_system:/root# du -hcs /
16.3M    /
*16.3M*    total
root@errored_system:/root# df -h
Filesystem                Size      Used Available Use% Mounted on
rootfs                   58.0M     *58.0M*      8.0K 100% /
/dev/root                58.0M     58.0M      8.0K 100% /
tmpfs                    30.3M    444.0K     29.8M   1% /tmp
tmpfs                   512.0K         0    512.0K   0% /dev


(That leaves ~41.7MB unaccounted for... that's more than can be justified
by blocks that are only partially full.)

I have looked for files that might be sucking up this much space, but have
not found any (no big files, and no piles of small files). This leads me to
believe something is amiss with the filesystem, which is why I am seeking
your insight.
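
(For reference, this is roughly the kind of search I mean; just a sketch,
and it assumes the busybox find/du applets support these options. The
-xdev/-x flags keep the search on the root filesystem.)

# any individual files larger than ~1MB (-size counts 512-byte blocks)
find / -xdev -type f -size +2048
# total file count, in case it is death by a thousand small files
find / -xdev -type f | wc -l
# biggest files and directories
du -a -x / 2>/dev/null | sort -n | tail -n 20
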
I don't know much about debugging yaffs specifically, but this seems like a
good clue:
root@errored_system:/root# echo "-all+gc">/proc/yaffs
root@errored_system:/root# dmesg | tail -f
[ 3419.220000] yaffs_block_became_dirty block 3048 state 8
[ 3419.220000] GC Selected block 3437 with 1 free, prioritised:0
[ 3419.220000] yaffs: GC erasedBlocks 5 aggressive 1
[ 3419.260000] yaffs_block_became_dirty block 3437 state 8
[ 3419.270000] GC Selected block 2201 with 1 free, prioritised:0
[ 3419.270000] yaffs: GC erasedBlocks 5 aggressive 1
[ 3419.310000] yaffs_block_became_dirty block 2201 state 8
[ 3419.310000] GC Selected block 2231 with 1 free, prioritised:0
[ 3419.310000] yaffs: GC erasedBlocks 5 aggressive 1
[ 3419.370000] yaffs_block_became_dirty block 2231 state 8

These "yaffs_block_became_dirty block" records rapidly fill up the kernel
log... (on the good systems, I do not get these messages, at least not in
rapid succession)


Here's the interesting thing: they all keep repeating on the same few
blocks (2201, 2357, 2819, 3048 & 3437... same story on the other errored
systems, only with different block numbers). So my hunch is that something
is preventing these blocks from getting GC'd, and that this is somehow
backlogging the other blocks that need freeing. Is that even possible? I am
able to delete legitimate files and regain their space, but that doesn't
really address the mystery disk usage. It does tell me that the GC is
working, at least in part.
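
(A quick way to confirm the repetition, pulling the block numbers out of
the trace messages; a sketch that assumes busybox grep supports -o:)

dmesg | grep 'GC Selected block' | grep -o 'block [0-9]*' | sort | uniq -c | sort -rn | head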


Incidentally, I cross-referenced the yaffs_block_became_dirty block numbers
against the "bad eraseblocks" and "blocks marked bad" reported at startup,
and they were different (i.e. the blocks in the above GC messages are not
known bad blocks).
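
(For anyone who wants to repeat that check, something along these lines
should do it; the exact wording of the bad-block messages depends on the
MTD driver, so treat this as a sketch:)

# bad blocks reported at boot
dmesg | grep -i 'bad'
# block numbers named in the GC trace above
dmesg | grep 'became_dirty' | grep -o 'block [0-9]*' | sort -u
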
Is there any way to forcibly verify the integrity of the data blocks in
yaffs and free up this mystery space? I have remote access to these
systems, but re-flashing isn't really an option, as they are geographically
distributed (they are data collectors). Can I somehow force those block
numbers to be marked as bad? (I'd imagine that wouldn't be wise, even if I
could.)

Some more (hopefully) useful information:


I am not writing much to flash. About once an hour I update a couple of
values in an SQLite database. I don't know if the update operation is the
culprit (since it behaves fine on the other systems), but it's the only
source of disk IO I can think of.
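
(One rough way I can think of to quantify those writes, using the counters
that show up in the /proc/yaffs output further down; just a sketch:)

grep -E 'n_page_writes|n_erasures|n_free_chunks|n_erased_blocks' /proc/yaffs
sleep 3600    # wait for one hourly update to go through
grep -E 'n_page_writes|n_erasures|n_free_chunks|n_erased_blocks' /proc/yaffs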



There are not many reported bad blocks (5-20, depending on which system I'm
looking at).


The board manufacturer has apparently changed the NAND chip somewhere along
the line. I am experiencing the issue on both types (so I'm fairly certain
it's not exclusively hardware related).
One system:
[    2.520000] NAND device: Manufacturer ID: 0xad, Chip ID: 0x76 (Hynix
NAND 64MiB 3,3V 8-bit)
Another system:
[    2.510000] NAND device: Manufacturer ID: 0xec, Chip ID: 0x76 (Samsung
NAND 64MiB 3,3V 8-bit)

Here is the /proc/yaffs output from a good system (one that isn't full):

root@good_system:/root# cat /proc/yaffs
Multi-version YAFFS built:May 25 2012 01:48:02


Device 0 "rootfs"
start_block.......... 0
end_block............ 3711
total_bytes_per_chunk 512
use_nand_ecc......... 1
no_tags_ecc.......... 0
is_yaffs2............ 0
inband_tags.......... 0
empty_lost_n_found... 0
disable_lazy_load.... 0
refresh_period....... 500
n_caches............. 10
n_reserved_blocks.... 5
always_check_erased.. 0

data_bytes_per_chunk. 512
chunk_grp_bits....... 0
chunk_grp_size....... 1
n_erased_blocks...... 294
blocks_in_checkpt.... 0

n_tnodes............. 7901
n_obj................ 5653
n_free_chunks........ 55483

n_page_writes........ 597395
n_page_reads......... 906712
n_erasures........... 11524
n_gc_copies.......... 22402
all_gcs.............. 32575
passive_gc_count..... 32575
oldest_dirty_gc_count 0
n_gc_blocks.......... 11524
bg_gcs............... 0
n_retired_writes..... 0
nRetireBlocks........ 0
n_ecc_fixed.......... 0
n_ecc_unfixed........ 0
n_tags_ecc_fixed..... 0
n_tags_ecc_unfixed... 0
cache_hits........... 100140
n_deleted_files...... 37
n_unlinked_files..... 22541
refresh_count........ 0
n_bg_deletions....... 0

And here's one from a "disk full" system:

root@errored_system:/root# cat /proc/yaffs
Multi-version YAFFS built:May 25 2012 01:48:02


Device 0 "rootfs"
start_block.......... 0
end_block............ 3711
total_bytes_per_chunk 512
use_nand_ecc......... 1
no_tags_ecc.......... 0
is_yaffs2............ 0
inband_tags.......... 0
empty_lost_n_found... 0
disable_lazy_load.... 0
refresh_period....... 500
n_caches............. 10
n_reserved_blocks.... 5
always_check_erased.. 0

data_bytes_per_chunk. 512
chunk_grp_bits....... 0
chunk_grp_size....... 1
n_erased_blocks...... 5
blocks_in_checkpt.... 0

n_tnodes............. 23388
n_obj................ 19836
n_free_chunks........ 206

n_page_writes........ 2912051
n_page_reads......... 2907834
n_erasures........... 88768
n_gc_copies.......... 2751776
all_gcs.............. 88768
passive_gc_count..... 0
oldest_dirty_gc_count 0
n_gc_blocks.......... 88768
bg_gcs............... 0
n_retired_writes..... 0
nRetireBlocks........ 0
n_ecc_fixed.......... 1
n_ecc_unfixed........ 0
n_tags_ecc_fixed..... 0
n_tags_ecc_unfixed... 0
cache_hits........... 1851
n_deleted_files...... 2
n_unlinked_files..... 35064
refresh_count........ 0
n_bg_deletions....... 0

Thanks,
Walter