Re: RAID problems

Neil Brown (neilb@cse.unsw.edu.au)
Wed, 31 Jul 2002 13:32:26 +1000 (EST)


On Tuesday July 30, wa1hco@adelphia.net wrote:
>
> Raid needs an automatic way to maintain device synchronization. Why should
> I have to...
> manually examine the device data (lsraid)
> find two devices that match
> mark the others failed in /etc/raidtab
> reinitialize the raid devices...putting all data at risk
> hot add the "failed" device
> wait for it to recover (hours)
> change /etc/raidtab again
> retest everything
>
> This is 10 times worse that e2fsck and much more error prone. The file
> system guru's worked hard on journalling to minimize this kind of risk.
>

Part of the answer is to use mdadm
http://www.cse.unsw.edu.au/~neilb/source/mdadm/

mdadm --assemble --force ....

will do a lot of that for you.

Another part of the answer is that raid5 should never mark two drives
as failed. There really isn't any point.
If they are both really failed, you've lost your data anyway.
If it is really a cable failure, then it should be easier to get back
to where you started from.
I hope to have raid5 working better in this respect in 2.6.

A finally part of the answer is that even perfect raid software cannot
make up for buggy drivers, shoddy hard drives, or flacky cabling.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/