PROBLEM: More filesystem corruption with 2.4.1-pre3 and SW raid5

Holger Kiehl (Holger.Kiehl@dwd.de)
Mon, 15 Jan 2001 13:28:05 +0100 (CET)


Hello

Doing further tests I have experienced more filesystem corruption.
This time on another node, but also with SMP and SW raid5. The machine
has run the same test several times under 2.2.18, 2.2.17, 2.2.14 and
2.2.12 with no problems. This was the first time the test was run under
2.4.1 and gave me filesystem corruption. I observed the same thing on
my machine at home.

The test I am doing is copying/linking thousands of files around and delete
them again. The test starts of with 58 process copying 600 files (SMALL),
then 135 process copy around 9000 files (MEDIUM) and the in the last
test 325 process copy 80000 files (BIG). Each of the three tests (SMALL,
MEDIUM, BIG) is further divided into one test where the files get transmitted
via FTP (localhost) and another where the files are just being linked
from one directory to another one. And it always starts when I come
to the linking test. The link rate is about 2000 files/s. Here follows
some data what syslog reported:

Jan 13 17:09:03 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1881249), 0
Jan 13 17:09:03 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1881250), 0
Jan 13 17:09:03 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1881251), 0
.
.
.
Jan 13 17:19:56 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 6688150
Jan 13 17:19:57 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3338561), 0
Jan 13 17:19:57 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3338562), 0
Jan 13 17:19:57 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3338563), 0
.
.
.
Jan 13 17:20:00 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3338647), 0
Jan 13 17:20:00 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 6688139
Jan 13 17:20:00 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 6688136
Jan 13 17:20:00 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 6688182
Jan 13 17:26:34 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3361022), 0
Jan 13 17:26:34 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3361023), 0
Jan 13 17:26:34 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3361024), 0
.
.
.
Jan 13 17:26:35 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3361023), 0
Jan 13 17:26:35 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (3361024), 0
Jan 13 17:29:20 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 918960
Jan 13 17:29:20 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 918961
Jan 13 17:29:20 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 918962
.
.
.
Jan 13 17:30:57 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 3808052
Jan 13 17:30:57 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 3808053
Jan 13 17:30:57 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: bit already cleared for block 3808054
Jan 13 17:32:56 florix kernel: EXT2-fs error (device md(9,2)): ext2_readdir: bad entry in directory #2894349: rec_len % 4 != 0 - offset=0, inode=270105152, rec_len=1397, name_len=39
Jan 13 17:32:56 florix kernel: EXT2-fs warning (device md(9,2)): empty_dir: bad directory (dir #2894349) - no `.' or `..'
Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1940635), 0
Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1940636), 0
Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1940637), 0
Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1940638), 0
.
.
.
Jan 13 19:34:27 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1933469), 0
Jan 13 19:34:27 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1933471), 0
Jan 13 19:34:27 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: Deleting nonexistent file (1933472), 0

At this point I was not able to log in on the machine but it was still
running and doing something as I discovered this morning when I came to
work. On console it was constantly writting:

__alloc_pages: 0-order allocation failed

I was not able to log in so I had to reset the machine. After it came
back I had to repair the filesystem by hand but only the filesystem
and directories and files where I was doing my test where effected.
Here are some messages when I was fsck the disk:

Duplicate or bad block in use ...
Has 1 duplicate block(s), shared with 1 file(s): ...
Entry XXX has deleted/unused inode ...
Unattached inode XXX connect to /lost+found
Inode XXX ref count is 2, should be 1.
Inode XXX ref count is 6, should be 5.
Free blocks count wrong for group XXX

Here are the details of the machine:

Asus P2B-DS with two P3-450 and 256 MB ECC SDRAM
Oboard Adaptec AIC-7890/1 Ultra2 SCSI
6 x 9GB U2W SCSI disk put together as SW Raid 5
2 x Intel EEPro 100
RedHat 6.1 with the following installed:

Linux florix 2.4.1-pre2 #3 SMP Sat Jan 13 15:39:55 GMT 2001 i686 unknown
Kernel modules 2.4.1
Gnu C egcs-2.91.66
Gnu Make 3.77
Binutils 2.9.1.0.25
Linux C Library 2.1.2
Dynamic linker ldd (GNU libc) 2.1.2
Procps 2.0.4
Mount 2.10r
Net-tools 1.53
Console-tools 1999.03.02
Sh-utils 2.0
Modules Loaded w83781d sensors i2c-piix4 i2c-isa i2c-core

Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/