Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+

Holger Kiehl (Holger.Kiehl@dwd.de)
Mon, 22 Jan 2001 12:19:10 +0100 (CET)


On Sun, 21 Jan 2001, Manfred Spraul wrote:

> I've attached Holger's testcase (ext2, SMP, raid5)
> boot with "mem=64M" and run the attached script.
> The script creates and deletes 9 directories with 10.000 in each dir.
> Neil, could you run it? I don't have an raid 5 array - SMP+ext2 without
> raid5 is ok.
>
> Holger, what's your ext2 block size, and do you run with a degraded
> array?
>
No, I do not have a degraded array and the blocksize of ext2 is 4096. Here is
what /proc/mdstat looks like:

afdbench@florix:~/testdir$ cat /proc/mdstat
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md3 : active raid1 sdc1[1] sdb1[0]
136448 blocks [2/2] [UU]

md4 : active raid1 sde1[1] sdd1[0]
136448 blocks [2/2] [UU]

md0 : active raid1 sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
24000 blocks [5/5] [UUUUU]

md1 : active raid5 sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1] sda3[0]
3148288 blocks level 5, 64k chunk, algorithm 0 [5/5] [UUUUU]

md2 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
32033280 blocks level 5, 32k chunk, algorithm 0 [5/5] [UUUUU]

unused devices: <none>

What I do have is a spare disk and I am running swap on raid1. However,
my machine at home, which experienes the same problems, does not have swap
on raid and is also not degraded.

I applied Neils patch to 2.4.1-pre9 and rerun the test, again with
filesystem corruption. I now pressed the reset button and had all parity
recalculated under 2.2.18 and rebooted again to 2.4.1-pre9 to rerun
the test. Now, I do not see anymore filesystem corruption in syslog,
however forcing a check with e2fsck produces the following:

root@florix:~# !e2fsck
e2fsck -f /dev/md2
e2fsck 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Special (device/socket/fifo) inode 3630145 has non-zero size. Fix<y>? yes

Special (device/socket/fifo) inode 3630156 has non-zero size. Fix<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/md2: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md2: 20002/4006240 files (4.8% non-contiguous), 219556/8008320 blocks

Doing this three times, two of them reported the same inodes with non-zero
size. One test went without any problem (first time ever under 2.4.x).
Now, I am not sure if this still is a filessytem corruption and why
the corruptions where so bad, before the parity recalculation under
2.2.18. I do remember the first time I run 2.4.x with a much larger
testset, it corrupted my system so badly that I had to push the reset
button and parity was recalculated under 2.4.1-pre3.

I will now run my other testset, but this always takes 8 hours. When
this is done I report back.

Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/