Re: Disk errors and Reiserfs

Stephen C. Tweedie (sct@redhat.com)
Tue, 18 Sep 2001 11:17:02 +0100


Hi,

On Mon, Sep 17, 2001 at 01:40:36AM +0100, Alan Cox wrote:
> > My issue, though, is Linux did not handle it well. Userspace actually has
> > an 'EIO' error code for this situation but, instead, any program touching
> > the mounted partition hung in a D state.
>
> Thats a reiserfs property and one you'll find in pretty much any other
> fs.

No --- ext2 and ext3 will propagate EIO up to the application. We've
also spent a lot of effort making sure that ext2 won't ever panic even
if the IO succeeds but returns bogus data (disk, cable or controller
faults). Disk failures should never cause process kernel hangs, any
more than bogus network packets should.

> Killing the process isnt neccessary, its been halted in its tracks. As to
> a clean shutdown - no chance. You've just hit a disk failure, the on disk
> state is not precisely known, writes have been lost. Nothing is going to
> make a clean shutdown possible under such circumstances.

Why not? ext2 lets you select between three behaviours on detecting
such an error: continue (the fs is marked as having errors and will be
fscked on the next boot, as long as we can write the error flag to the
superblock); remount-readonly (we fail the IO and force the fs
readonly, but otherwise continue as above); or panic immediately. As
long as you've selected continue or continue-ro, you should be able to
unmount the disk as soon as you've killed any processes still
accessing it. I've also spent a lot of effort making sure that the
backoff-and-remount-readonly code in ext3 is solid, too. I don't
regard a kernel lockup as a necessary response to disk failure.

Cheers,
Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/