Re: kernel freeze on 2.4.0.prerelease (smp,raid5)

Neil Brown (neilb@cse.unsw.edu.au)
Wed, 3 Jan 2001 20:18:22 +1100 (EST)


On Tuesday January 2, gf435@gmx.net wrote:
>
> Perhaps a deadlock with a normal (not irq) spinlock.
>
> Could you enable SysRQ and press <Alt>+<SysRq>+<P> ("showPc")
> Then write down the EIP values (including the [< >] brackets) and
> translate them with ksymoops.
>
> Ksymoops repeats only the EIP values.
>
> But searching through the System.map file shows only labels from
> the raid5 stuff around them.
>
> As stated in my first mail, I am actually running my raid5 devices in
> degraded mode, and as I remember some raid5 stuff was changed between
> test13p3 and newer kernels.

So tell us, why do you run your raid5 devices in degraded mode?? It
cannot be good for performance, and certainly isn't good for
redundancy!!! But I'm not complaining as you found a bug...

>
> Hope this gives someone an idea?

Yep. This, combined with a related bug report from n0ymv@callsign.net,
strongly suggests the following patch.
Writes to the failed drive are never completing, so you eventually
run out of stripes in the stripe cache and you block waiting for a
stripe to become free.
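
To make the failure mode concrete, here is a toy userspace model of it.
This is only a sketch and not the raid5 code: struct toy_stripe,
NR_STRIPES, get_free_stripe() and simulate_degraded_writes() are names
made up for illustration, and the "locked" flag merely stands in for
the BH_Lock bit on the buffer. It assumes a stripe cannot be reused
while its buffer is still locked.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_STRIPES 4	/* hypothetical cache size, chosen for illustration */

struct toy_stripe {
	bool locked;	/* stands in for BH_Lock on the stripe's buffer */
};

/* Return the index of a stripe whose buffer is unlocked, or -1 if none. */
static int get_free_stripe(struct toy_stripe *cache)
{
	for (int i = 0; i < NR_STRIPES; i++)
		if (!cache[i].locked)
			return i;
	return -1;
}

/*
 * Issue nr_writes writes that all target the failed disk.
 * Returns how many were handled before the cache ran dry.
 */
static int simulate_degraded_writes(int nr_writes, bool clear_lock_on_skip)
{
	struct toy_stripe cache[NR_STRIPES] = { { false } };
	int handled = 0;

	assert(nr_writes >= 0);
	for (int w = 0; w < nr_writes; w++) {
		int idx = get_free_stripe(cache);

		if (idx < 0)
			break;	/* the real kernel would block here */

		cache[idx].locked = true;	/* buffer locked for I/O */

		/*
		 * The disk has failed, so the op is skipped and no
		 * completion handler will ever unlock the buffer.
		 * With the fix, we unlock it ourselves.
		 */
		if (clear_lock_on_skip)
			cache[idx].locked = false;
		handled++;
	}
	return handled;
}
```

In this model, simulate_degraded_writes(10, false) stops after
NR_STRIPES writes because every stripe is left locked, while
simulate_degraded_writes(10, true) handles all ten.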

Please test this and confirm that it works.

NeilBrown

--- ./drivers/md/raid5.c 2001/01/03 09:04:05 1.1
+++ ./drivers/md/raid5.c 2001/01/03 09:04:13
@@ -1096,8 +1096,10 @@
 			bh->b_rdev = bh->b_dev;
 			bh->b_rsector = bh->b_blocknr * (bh->b_size>>9);
 			generic_make_request(action[i]-1, bh);
-		} else
+		} else {
 			PRINTK("skip op %d on disc %d for sector %ld\n", action[i]-1, i, sh->sector);
+			clear_bit(BH_Lock, &bh->b_state);
+		}
 	}
 }
