Block devices in pagecache, fsck, and what's going on...

Petr Vandrovec (VANDROVE@vc.cvut.cz)
Mon, 22 Jul 2002 22:34:44 +0200


Hello,
I've noticed very bad problem few weeks ago, but I thought that it
is just some bad accident. But today it happened again.

Say that for some reason kernel will crash (such as missing read_unlock...)
and you have to hard reboot it.

Now you reboot, and system runs e2fsck, which removes couple of
/var/run/*.pid entries.

It was not severe, so initscripts (Debian unstable) remount root read-write,
and continue booting. And now - oops - first cleanup script complains
that it could not remove these *.pid files because of -EIO, and shortly
after that filesystem is remounted read-only because of "Freeing blocks not
in datazone - block = 271450112, count=56; block = 32768, count = 2564"
and it goes downhill very quickly...

After reboot fsck it says that entries xxx.pid in /var/run has deleted/unused
inode yyy, and if I do not reboot, there is very high chance that it
will die again.

If I reboot after running e2fsck, everything is fine. Always.

My question is very simple: is this intended behavior of non-coherent
cache between /dev/hda1 and ext2 layer (which also makes dump to crash),
or is it something missing in e2fsck (1.27), or is it Debian bug and
it is now required to reboot machine after any run of e2fsck on /
partition?
Thanks,
Petr Vandrovec
vandrove@vc.cvut.cz

P.S.: BTW, I find it very strange that in /var/log/* I can have
written: "Can't open file /var/cache/samba/browse.dat.. Error was Read-only
file system" when /var/log/* lives on same filesystem as /var/cache...
and also all remounting-readonly messages are here, written to same
partition which was remounted-read only due to these fatal errors.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/