Re: Version 2.4.1 has ext2 problems.

Russell King (rmk@arm.linux.org.uk)
Sat, 3 Feb 2001 10:23:47 +0000 (GMT)


Richard B. Johnson writes:
> Files generated by e2fsck in lost+found cannot be removed.
> # rm *
> rm: cannot remove `#1006': Value too large for defined data type

Well, I can say that this isn't an isolated incident. I was hitting 2.4.1
hard last night on ARM, and ended up losing my /usr and /var mountpoints
and a few other files to this exact corruption.
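
For reference, the text rm printed is, as far as I can tell, just the C library's message for errno EOVERFLOW, which the stat family of calls returns when some field of the file's metadata cannot be represented in the caller's structure. A minimal sketch to confirm the mapping on a given system (the helper name describe is made up for illustration, and the exact wording of the message depends on the C library):

```python
import errno
import os


def describe(err):
    """Return 'SYMBOLIC_NAME: message' for an errno value (hypothetical helper)."""
    return f"{errno.errorcode[err]}: {os.strerror(err)}"


# On glibc-based systems this is expected to show the same text rm reported.
print(describe(errno.EOVERFLOW))
```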

I resorted to using debugfs to remove these entries, and re-running e2fsck.

Oh, the other interesting thing about it was that they had random modes
(eg, 1066440) - e2fsck also complained about a large number of errors on
the affected inodes (eg, various fields of the inode structure which should
be zero, d_time stuff, etc). Sorry, don't have the e2fsck logs, and I'm
reluctant to try to reproduce it.
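
To show why a mode like that looks like garbage, here is a rough sketch that splits it into its parts. It assumes the value is octal (the usual way modes are printed) and that the on-disk ext2 mode field is 16 bits wide; decode_mode is a made-up helper, not anything from e2fsprogs.

```python
import stat


def decode_mode(mode):
    """Split a reported mode into the part that fits in 16 bits and the rest."""
    low = mode & 0xFFFF  # assumption: the on-disk mode field is 16 bits
    return {
        "overflow_bits": mode >> 16,           # anything here cannot fit in 16 bits
        "file_type": stat.S_IFMT(low),         # file type bits of the low part
        "is_block_device": stat.S_ISBLK(low),
        "permissions": stat.S_IMODE(low),      # permission, setuid/setgid, sticky bits
    }


info = decode_mode(0o1066440)
print(oct(info["overflow_bits"]), oct(info["file_type"]), oct(info["permissions"]))
```

Under those assumptions the example value has bits set above the 16-bit range, and the low part decodes as a setuid/setgid block device, which is not what an ordinary entry recovered into lost+found would normally look like.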

I've been wondering if the ARMv3 implementation of insw/outsw is broken
(yes, it's running in PIO only), hence I haven't reported it until now,
but it seemed to check out last night.

Maybe this problem and my random process SEGV problem are connected in
some way. Basically, I was trying to track down a problem with processes
getting SEGV'd when swap partitions were enabled. I ended up with init
in a loop panicking about SEGVs. It turns out that the wrong page had
been paged back into the binary, and therefore glibc's __environ
pointer was corrupted. Specifically, the page that was placed there was
the immediately preceding page.
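
The kind of check that identifies a misplaced page can be sketched roughly as follows: take the contents seen at a given page slot and look for the page of the original image it actually matches. This is only an illustration, not the procedure used here; the 4096-byte page size, the function names and the synthetic three-page image are all assumptions.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def matching_pages(image, page, page_size=PAGE_SIZE):
    """Return the indices of every page in image whose bytes equal page."""
    return [
        off // page_size
        for off in range(0, len(image), page_size)
        if image[off:off + page_size] == page
    ]


def displacement(image, page, expected_index, page_size=PAGE_SIZE):
    """Return how far the first matching page is from where it was expected.

    Returns None if the observed page matches nothing in the image.
    """
    found = matching_pages(image, page, page_size)
    return found[0] - expected_index if found else None


# Synthetic image with three distinct pages, for illustration only.
image = b"".join(bytes([n]) * PAGE_SIZE for n in (1, 2, 3))

# Pretend the slot for page 1 was found to hold the contents of page 0.
observed = image[0:PAGE_SIZE]
print(matching_pages(image, observed), displacement(image, observed, 1))
```

A displacement of -1 corresponds to the case described above, where the slot holds the immediately preceding page.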

I know that other people have been seeing weird effects on 2.4.1 with
corrupted zero pages, but I don't think this is my problem.

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html
