Hello everyone,
I'm facing a critical problem with the yaffs2 partition when using
OpenWRT on RouterBoards, especially the RB450 and RB450G, which are the
ones I have available to test and use.
After normal operations of writing and deleting files, space seems
to get completely lost on the partition, up to the point where I get 'No
space left on device' and have to reflash the board to make it operational
again. On the RB450 the space seems to get lost during reboots, while
on the RB450G it gets lost during reboots as well as during normal
write/delete operations.
I have found other users reporting similar problems with other
RouterBoard models as well, although I can only test on the RB450 and RB450G.
As I'm facing the problem on OpenWRT, I have already opened a
ticket in its bug tracker, which has not received any attention yet.
With the latest OpenWRT svn versions, I think the yaffs side will be hard
to debug, as it seems that the /proc/yaffs files were disabled in order to
compile the 3.10 kernel (done in OpenWRT revision 37285):
https://dev.openwrt.org/changeset/37285
I have also downloaded revision 37000, which still builds kernel 3.9
and has the /proc/yaffs files, and the problem occurs exactly as
described in my ticket to the OpenWRT team. If needed, I can provide
/proc/yaffs contents between the tests I have made; just let me know at
which points you would like that information.
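To collect the /proc/yaffs contents at the same points in every test (fresh install, after the dd runs, after each reboot), a small helper along these lines could be used. It is a minimal sketch, not something from the ticket: the function name yaffs_snapshot and the SNAPDIR default are hypothetical, and it only relies on df, date, mkdir and cat. Keeping the snapshots on the flash under test adds a few small writes of its own, but /tmp is RAM and would not survive the reboot.

```sh
#!/bin/sh
# Hypothetical helper: save df and /proc/yaffs output under a label
# (e.g. "fresh", "after-dd", "after-reboot") so each test step can be compared.
# SNAPDIR is an assumption; /tmp is RAM and is lost on reboot, so a flash
# path is used by default even though it adds a few small writes.
SNAPDIR=${SNAPDIR:-/root/yaffs-snapshots}

yaffs_snapshot() {
    label=${1:-snapshot}
    mkdir -p "$SNAPDIR" || return 1
    out="$SNAPDIR/$(date +%Y%m%d-%H%M%S)-$label.txt"
    {
        echo "### label: $label"
        echo "### df"
        df -k /
        echo "### /proc/yaffs"
        if [ -r /proc/yaffs ]; then
            cat /proc/yaffs
        else
            # builds from r37285 on (kernel 3.10) no longer expose it
            echo "(not available on this build)"
        fi
    } > "$out"
    echo "$out"   # print the file name so it can be attached to a reply
}

# Usage: yaffs_snapshot fresh; ...; yaffs_snapshot after-reboot
```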
The ticket to the OpenWRT team is located at
https://dev.openwrt.org/ticket/16651
and is reproduced here:
I'm having real problems with space simply disappearing from my flash
partition on a RouterBoard 450 running a pretty recent trunk version (r40867).
Steps to build and install:
1) svn'ed openwrt trunk, revision 40867
2) ran 'make menuconfig' with no .config present and, starting from the
default config, made just two changes: changed the subtarget to 'Mikrotik
devices with NAND flash' and also checked the root .tar.gz target image, so
I could netboot AND install this build
3) built the images
4) netbooted the image
5) installed it on the flash using wget2nand
6) rebooted from NAND
Everything works fine; no problems up to this point.
To test the disappearing-space problem, right after this first boot from
NAND (that is, on a completely fresh OpenWRT install), I ran dd to create a
5 MB file of random data, deleted the file, and repeated the whole cycle.
Between runs, a counter was displayed just to let me know how many
times it had already run:
i=1; while true; do echo "run # $i"; rm -f /teste
  dd if=/dev/urandom of=/teste bs=1M count=5; rm -f /teste
  df | head -2; i=$(($i + 1)); done
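A bounded variant of that loop makes the per-run numbers easier to compare and removes /teste automatically on Ctrl+C. This is a sketch, not the exact test used above: the function names are hypothetical, and it assumes df accepts -P and -k (GNU df does, and BusyBox df normally does too).

```sh
#!/bin/sh
# Sketch of a bounded write/delete test. Defaults mirror the loop above:
# /teste, 5 blocks of 1 MB. Prints the "Used" KB of the file's filesystem
# after every cycle.

used_kb() {
    # -P forces one line per filesystem; field 3 is "Used" in 1K blocks.
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

write_delete_cycles() {
    runs=${1:-10}
    file=${2:-/teste}
    mb=${3:-5}
    dir=$(dirname "$file")
    # Remove the test file if interrupted, so no manual cleanup is needed.
    trap 'rm -f "$file"; exit 130' INT TERM
    i=1
    while [ "$i" -le "$runs" ]; do
        dd if=/dev/urandom of="$file" bs=1M count="$mb" 2>/dev/null
        rm -f "$file"
        echo "run $i used_kb $(used_kb "$dir")"
        i=$((i + 1))
    done
    trap - INT TERM
}

# Usage (same parameters as the original test): write_delete_cycles 210 /teste 5
```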
After running it many times, everything SEEMS to be OK: used and
available space are exactly what I had before starting the test:
run # 210
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 61440 3060 58380 5% /
However, after stopping the test (Ctrl+C), manually deleting the last
remaining /teste file, and rebooting the RB450, I had:
root@OpenWrt:/# df
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 61440 46292 15148 75% /
My 'used' space went from 3060 KB to 46292 KB simply by rebooting!
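That is 46292 - 3060 = 43232 KB (about 42 MB, roughly 70% of the 61440 KB partition) lost in a single reboot. To capture the same delta on every reboot without reading df by hand, one could store the Used value on flash right before rebooting and compare it after boot. A minimal sketch; the STATE path and the function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical before/after-reboot check. STATE must be on persistent
# storage (/tmp is RAM); writing it costs one small file on the flash.
STATE=${STATE:-/root/used-before-reboot}

used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

record_used() {
    used_kb "${1:-/}" > "$STATE" && sync   # flush before rebooting
}

compare_used() {
    before=$(cat "$STATE") || return 1
    after=$(used_kb "${1:-/}")
    echo "before=$before after=$after delta_kb=$((after - before))"
}

# Usage: record_used /; reboot     ...then, after boot: compare_used /
# With the figures above this would report before=3060 after=46292 delta_kb=43232.
```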
I ran the dd test some more times, and space APPEARS unaffected
between dd runs:
run # 150
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 61440 46292 15148 75% /
Another manual removal of any leftover /teste, a reboot, and ...
root@OpenWrt:/# df
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 61440 61424 16 100% /
And from now on, it's game over: the system really cannot write anything
to the partition anymore, as I'm really out of space. Listing the files in /
shows that I'm not actually using that much space, which makes me believe
that, somehow, my space really is getting lost, as I stated in the summary
of this ticket.
root@OpenWrt:/# dd if=/dev/urandom of=/teste bs=1M count=5
dd: writing '/teste': No space left on device
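One way to put a number on "not actually using that much space" is to compare df's Used column with what du can attribute to files on the same filesystem. A sketch, assuming du supports -x (stay on one filesystem) so that running it on / skips /proc, /sys and /tmp; the function name space_gap is hypothetical. Some gap is normal on any filesystem (metadata, block rounding), but a gap close to the growth seen after reboots would support the idea that space is leaking.

```sh
#!/bin/sh
# Compare space used according to df with space owned by files (du).
# mnt should be a mount point: df reports the whole filesystem,
# while du only counts the tree below the given path.
space_gap() {
    mnt=${1:-/}
    df_used=$(df -Pk "$mnt" | awk 'NR == 2 { print $3 }')
    du_used=$(du -skx "$mnt" 2>/dev/null | awk '{ print $1 }')
    echo "df_used_kb=$df_used du_used_kb=$du_used gap_kb=$((df_used - du_used))"
}

# Usage: space_gap /
```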
One curious note: watching the boot from the serial console, I see a
pause between these lines:
yaffs: dev is 32505862 name is "mtdblock6" ro
yaffs: passed flags ""
--- pause here ---
VFS: Mounted root (yaffs filesystem) readonly on device 31:6.
Freeing unused kernel memory: 228K (802e7000 - 80320000)
On a freshly installed OpenWRT, this pause is pretty short, less than 1
second. As space gets lost, the pause grows, reaching almost 7-8 seconds
when I hit the disk-full problem on the RB450 (64 MB NAND flash). I don't
know if this matters, but I noticed it.
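If the kernel prints timestamps (CONFIG_PRINTK_TIME; this is an assumption, since the log lines above show none), the pause could be measured from dmesg after each boot instead of by eye, which would show whether it grows in step with the lost space. A sketch; the function name is hypothetical, and it prints nothing when timestamps are missing.

```sh
#!/bin/sh
# Measure the gap between the 'yaffs: passed flags' and
# 'VFS: Mounted root (yaffs' kernel messages.
# ASSUMPTION: CONFIG_PRINTK_TIME is enabled, so lines look like
# "[    1.234567] yaffs: passed flags ...". Untimestamped lines are ignored.
yaffs_mount_pause() {
    awk '
        function ts(line,   t) {
            t = line
            sub(/^\[ */, "", t)   # drop "[" and padding
            sub(/\].*/, "", t)    # drop "] message..."
            return t + 0
        }
        /^\[ *[0-9]+\.[0-9]+\] yaffs: passed flags/ { start = ts($0); have = 1 }
        /^\[ *[0-9]+\.[0-9]+\] VFS: Mounted root \(yaffs/ {
            if (have) printf "mount pause: %.3f s\n", ts($0) - start
        }
    '
}

# Usage: dmesg | yaffs_mount_pause
```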
On the RB450G things are a little different, as space seems to get lost
DURING dd runs as well:
run # 7
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 520192 7224 512968 1% /
run # 8
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 520192 7264 512928 1% /
...
run # 27
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 520192 8032 512160 2% /
run # 28
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 520192 8072 512120 2% /
...
run # 37 (the last run I waited for)
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 520192 8444 511748 2% /
And after manually deleting any leftover /teste and rebooting:
root@OpenWrt:/# df
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 520192 8520 511672 2% /
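From the RB450G numbers above, Used grows by exactly 40 KB from run 7 to run 8 and again from run 27 to run 28, and by 1220 KB from run 7 to run 37, roughly 40.7 KB per 5 MB write/delete cycle. If the loop's output is saved to a file (for example by piping it through tee), that rate can be computed rather than read off by hand. A sketch; the function name and the log path in the usage line are hypothetical.

```sh
#!/bin/sh
# Parse output saved from the dd loop ("run # N" followed by df lines) and
# print how much "Used" grew between consecutive recorded runs, plus the
# average growth per run. The filesystem name defaults to "rootfs".
growth_per_run() {
    awk -v fsname="${1:-rootfs}" '
        /^run # [0-9]+/ { run = $3 + 0 }
        $1 == fsname {
            used = $3 + 0
            if (n > 0)
                printf "run %d used_kb %d delta_kb %d\n", run, used, used - prev_used
            else {
                first_run = run; first_used = used
            }
            prev_used = used; last_run = run; n++
        }
        END {
            if (n > 1 && last_run > first_run)
                printf "average: %.1f KB/run over %d runs\n",
                    (prev_used - first_used) / (last_run - first_run),
                    last_run - first_run
        }
    '
}

# Usage: cat /tmp/dd-log.txt | growth_per_run
```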
So I can reproduce this weird and apparently buggy behavior on the RB450 as
well as on the RB450G. The two boards behave a little differently: the RB450
loses space during reboots, while the RB450G loses space both during normal
write/delete operations and during reboots.
I had already opened a ticket three years ago regarding weird space
usage on the RB450, #9818 [https://dev.openwrt.org/ticket/9818]. It was
closed as 'not a bug, won't fix'. I must confess that, at that time, I
hadn't noticed whether space was being lost during reboots, as I have now.
I have also found users on the forums complaining about space
disappearing from their yaffs2 partitions. Curiously enough, both
complaints are from users running RouterBoards:
[https://forum.openwrt.org/viewtopic.php?id=47352]
[https://forum.openwrt.org/viewtopic.php?id=44940]
I know lots of OpenWRT users won't write anything to the flash memory
besides config files, so this will probably not be a real problem for
them, or it may simply take a very long time to become one. On the other
hand, I run squid on these boxes and need to have logging enabled. Of
course, I have a closely watched log rotation script, given the small
amount of flash space available. Anyway, that's completely fine for me, as
these boxes are deployed on small networks.
For those who do write to the partition, this is certainly a critical
bug, because once the 'disk full' condition is reached, only a reflash will
make the box usable again. I could not find any way to recover that space.
--
Sincerely,
Leonardo Rodrigues
Solutti Tecnologia
http://www.solutti.com.br
My SPAMTRAP, do not email it:
gertrudes@solutti.com.br