fs corruption with 2.4.20 IDE+md+LVM

Carl Wilhelm Soderstrom (chrome@real-time.com)
Sat, 4 Jan 2003 22:45:00 -0600


I observed filesystem corruption on my home workstation recently. I was
running kernel 2.4.20 (built myself with gcc 2.95.4), and ext3 with the
default journaling mode (ordered?).

I was downloading files, and noticed that they weren't being saved. I
immediately did a 'df -h', and it reported my home partition as having 7.3T
used, -64Z free.

I (foolishly) immediately did a 'du -sch ~/*' to see what might be taking up
all the space. after realizing what was going on (du reported filesystem
permission errors on files it shouldn't have), I shut down all programs, and
dropped to runlevel 1.

I unmounted my LVM'ed partitions (/var /usr /home), and tried to fsck
/dev/sys/home (the /home partition). it couldn't find a good superblock; and
fell back to using another backup superblock. fsck reported that the journal
was corrupt, and discarded it. many of the low-numbered inodes had wrong
refcounts, or wrong modes.

eventually it fixed the filesystem; but everything ended up in many files &
directories under lost+found. (had to pull the home dirs from one or more
dirs each, under lost+found).

after fixing the filesystem, I gratuitously fsck -f'ed all my other
partitions; they came up clean.

fortunately, looks like the only stuff I really lost were some chunks of my
XFree86 source tree, and some linux kernel sources. easily replaceable
stuff.

here's my system architecture:
2x Western Digital 80GB Special Edition IDE drives (hde, hdf)
- / is an ext3 RAID1 /dev/md0 made of hde1 and hdf1
- /dev/md1 is LVM-formatted RAID1, made of hde2 and hdf2. this partition
contains /var, /usr, and /home.

/home is the only place that I saw this corruption.

I have since reverted back to kernel 2.4.18.

I'm thinking that my reaction *should* have been to power-cycle the box
immediately upon notice of the problem, to prevent further fs corruption,
and bring it back up in single-user read-only mode. shutting down programs
nicely would have written more stuff to disk, worsening the corruption.

I will also point out that kernel 2.4.20-ac1 and 2.4.21-pre6 will not boot
on my machine; they kernel panic when detecting my IDE devices. I have not
tried 2.4.20-ac2 nor 2.4.21-pre2 yet. 2.4.20 and 2.4.18 boot quite happily
tho. I suppose I ought to try the latest versions and set up a serial
console to capture the oops, before reporting a bug on this.

Carl Soderstrom.

-- 
Systems Administrator
Real-Time Enterprises
www.real-time.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/