raid5 multi-drive-failure and recovery?

Dan Merillat (harik@chaos.ao.net)
Thu, 19 Dec 2002 19:49:05 -0500


I've got a problem. Two drives in my array developed bad sectors at
the same time. The bad sectors are in completely different places, so
the data should still be recoverable, but I can't read the array
because md simply fails out both drives and marks the array unusable.

Now, I can either buy new drives and dd the raw partition over, or I
can hack the kernel to make it a bit smarter about unrecoverable
reads.
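
For the dd route, something like this (device names made up, one pass
per failing drive) should copy past the bad sectors instead of dying
on the first read error:

    dd if=/dev/hda3 of=/dev/hdc3 bs=512 conv=noerror,sync

conv=noerror keeps dd going after unreadable sectors, and sync pads
the short reads with zeros so the copy stays block-aligned; only the
truly unreadable sectors come across as garbage.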

Obviously, if I just have raid5_error not mark the drive bad, md will
hammer away on it, failing over and over. My thought was to let the
read fail, catch it in raid5_end_read_request, and tag the stripe_head
with the device that failed. If a device has already failed for that
stripe, return EIO. That way further reads on the stripe_head get
reconstructed via the parity disk (until the stripe_head is eventually
freed). One I/O error per stripe isn't too harsh a price to pay for
disaster recovery.
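
To make the idea concrete, here's a little user-space toy model of
that policy -- this is NOT the 2.4.20 code, and the struct and field
names (toy_stripe, bad_disk, stripe_read_error) are all made up for
illustration. One failed read per stripe is rebuilt from parity; a
second failure in the same stripe returns EIO for that stripe only,
and the device is never kicked array-wide:

    /*
     * Toy model of the per-stripe failure tag.  Not kernel code.
     */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    #define NDISKS 4          /* 3 data + 1 parity */
    #define CHUNK  8          /* bytes per chunk, tiny for the demo */

    struct toy_stripe {
        unsigned char chunk[NDISKS][CHUNK];
        int bad_disk;         /* -1: nothing failed yet in this stripe */
    };

    /* Rebuild chunk 'bad' as the XOR of all the others (RAID5 parity). */
    static void reconstruct(struct toy_stripe *sh, int bad)
    {
        memset(sh->chunk[bad], 0, CHUNK);
        for (int d = 0; d < NDISKS; d++) {
            if (d == bad)
                continue;
            for (int i = 0; i < CHUNK; i++)
                sh->chunk[bad][i] ^= sh->chunk[d][i];
        }
    }

    /*
     * Called when a read of 'disk' in this stripe fails.  Returns 0 if
     * parity could recover the chunk, -EIO if this is the second failed
     * device in the stripe.  The device is never marked faulty
     * array-wide -- that's the whole point.
     */
    static int stripe_read_error(struct toy_stripe *sh, int disk)
    {
        if (sh->bad_disk >= 0 && sh->bad_disk != disk)
            return -EIO;              /* two dead chunks: unrecoverable */
        sh->bad_disk = disk;
        reconstruct(sh, disk);
        return 0;
    }

    int main(void)
    {
        struct toy_stripe sh = { .bad_disk = -1 };

        /* Fill three data chunks and compute parity into chunk 3. */
        memcpy(sh.chunk[0], "AAAAAAA", CHUNK);
        memcpy(sh.chunk[1], "BBBBBBB", CHUNK);
        memcpy(sh.chunk[2], "CCCCCCC", CHUNK);
        reconstruct(&sh, 3);          /* parity = D0 ^ D1 ^ D2 */

        /* First bad sector in this stripe: recoverable. */
        memset(sh.chunk[1], 0xff, CHUNK); /* garbage from the bad read */
        printf("first failure:  %d\n", stripe_read_error(&sh, 1));
        printf("recovered D1:   %.7s\n", (char *)sh.chunk[1]);

        /* Second failure in the same stripe: EIO for this stripe only. */
        printf("second failure: %d\n", stripe_read_error(&sh, 2));
        return 0;
    }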

In 2.4.20, I'm at raid5.c:421, where we're about to call md_error.
What happens to the bh from that point? Obviously it's not up to date,
so when one drive fails, how does it get re-issued so the block can be
reconstructed from the parity drive?
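
Whatever the mechanics of re-issuing the bh, the rebuild arithmetic
itself is just XOR, since XOR is its own inverse -- a one-byte
illustration in plain user-space C, nothing kernel-specific:

    #include <assert.h>

    int main(void)
    {
        /* One-byte RAID5 rebuild: XOR is its own inverse. */
        unsigned char d0 = 0x5a, d1 = 0x3c, d2 = 0xc3;
        unsigned char p = d0 ^ d1 ^ d2;  /* parity, written with the stripe */
        unsigned char r = p ^ d0 ^ d2;   /* lost d1 from the survivors */
        assert(r == d1);
        return 0;
    }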

Please CC me, I read via the archives.

Thanks,

--Dan

Obviously, this would ONLY be for recovery in the face of bad sectors.
The bad drives still need to be replaced as quickly as possible.