Re: Are linux-fs's drive-fault-tolerant by concept?

Dr. David Alan Gilbert (gilbertd@treblig.org)
Sat, 19 Apr 2003 19:41:20 +0100


Hi,
Besides the problem that most drive manufacturers now seem to use
cheese as the data storage surface, I think there are some other
problems:

1) I don't trust drive firmware.
2) I don't think all drives are set to remap sectors by default.
3) I don't believe that all drivers recover neatly from a drive error.
4) It is all very well to say "return the drive and get a new one",
but many of us can't do that in a commercial environment where the
contents of the drive are confidential. That leads to stacks of dead
drives (often many of them still inside their now short warranty
periods).

To be fair I'm not sure if it is only the drive firmware I don't trust -
it could be the controllers and the IDE drivers as well - I don't know.

While RAID works well for drives that just go pop and die, for drives
with dodgy firmware we just sit there and watch the filesystems decay.
I don't think the kernel can do much about that - but it is a sad state.

I'd find two things useful in this respect:
1) A tool to check the consistency of a RAID - presuming I shut my
RAID down safely I should actually be able to use the redundant
information to test it; this should reveal corruption early.
(Perhaps the kernel could check a few sectors a second in the
background)

2) A disc exerciser - something that I can use to see if this drive,
connected to this controller, on this motherboard, with this kernel,
actually works and keeps its data safe before I put it into live
service.
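
The RAID consistency check in item 1 could look roughly like the
sketch below. It is only an illustration under stated assumptions: a
two-way mirror (RAID-1) that has been stopped cleanly, with both halves
readable at the same offsets. The function name compare_mirrors, the
chunk size, and the paths passed on the command line are hypothetical
and not part of any existing tool.

```python
import sys


def compare_mirrors(path_a, path_b, chunk_size=64 * 1024):
    """Return the byte offsets of chunks that differ between two mirror halves.

    Assumes both paths hold the same data area starting at offset 0
    (an assumption; real RAID members may carry metadata that must be skipped).
    """
    mismatches = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if not block_a and not block_b:
                break  # both halves exhausted
            if block_a != block_b:
                # Also triggers when one half is shorter than the other.
                mismatches.append(offset)
            offset += chunk_size
    return mismatches


if __name__ == "__main__":
    # Usage (hypothetical paths): python check.py /path/to/half_a /path/to/half_b
    bad = compare_mirrors(sys.argv[1], sys.argv[2])
    for off in bad:
        print("mismatch in chunk at byte offset %d" % off)
    sys.exit(1 if bad else 0)
```

A mismatch only says that the two copies disagree, not which one is
right; deciding that needs extra information such as checksums.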

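A disc exerciser of the kind described in item 2 might start from
something like the following sketch: write a reproducible pattern,
read it back, and report the blocks that do not match. The names
pattern_block, write_pattern and verify_pattern, the seed, and the
block size are all hypothetical. Note that this overwrites the target,
and that a read straight after the write may be answered from the
page cache rather than the platters; a serious tool would have to
bypass or flush the cache, which this sketch does not attempt.

```python
import hashlib
import os


def pattern_block(seed, index, block_size):
    """Build a deterministic pseudo-random block from a seed and block index."""
    out = bytearray()
    counter = 0
    while len(out) < block_size:
        out += hashlib.sha256(
            seed + index.to_bytes(8, "little") + counter.to_bytes(4, "little")
        ).digest()
        counter += 1
    return bytes(out[:block_size])


def write_pattern(path, n_blocks, block_size=4096, seed=b"exerciser"):
    """Overwrite the first n_blocks of path with the pattern (destructive)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    with os.fdopen(fd, "wb") as f:
        for i in range(n_blocks):
            f.write(pattern_block(seed, i, block_size))
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to push the data to the device


def verify_pattern(path, n_blocks, block_size=4096, seed=b"exerciser"):
    """Return the indices of blocks whose contents no longer match the pattern."""
    bad = []
    with open(path, "rb") as f:
        for i in range(n_blocks):
            if f.read(block_size) != pattern_block(seed, i, block_size):
                bad.append(i)
    return bad
```

Because each block's contents depend on its index, a block written to
the wrong place shows up as a mismatch as well as a block that has
decayed. Running verify_pattern again after a power cycle or a few
days gives a crude check that the data stays put.
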
Dave (After a few weeks of fighting pissy IDE hard drives)

---------------- Have a happy GNU millennium! ----------------------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/