Re: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Andrew Morton (akpm@zip.com.au)
Sat, 22 Dec 2001 23:53:26 -0800


Andre Hedrick wrote:
>
> Daniel,
>
> Wouldn't it be great if Linux did not have such a lame design to handle
> these problems? Just imagine if we had a properly formed
> FS/BLOCK/MAIN-LOOP/LOW_LEVEL model; whereas, an error of this nature in a
> write to media failure could be passed back up the pathway to the FS and
> request the FS to re-issue the storing of data to a new location.
>
> Too bad the global design lacks this ability and I suspect that nobody
> gives a damn, or even worse ever thought of this idea.
>
> Now the importance of the model is not to promote the use of dying media, but
> to be able to secure the content in buffer-cache, from app.-space, or
> user-space to the media and flag the UID(0) that we are in deep DOO-DOO
> and you better start backing up the content now.

For the filesystems with which I am familiar, this feature would
be quite difficult to implement. Quite difficult indeed. And given
that it only provides recovery for write errors, its value seems
questionable.

Much easier would be for those applications which really care about
data integrity to fsync() the file before closing it. If either of those
calls returns an error, the *application* should take some form of
action to preserve the data. MTAs do this, for example.
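
A minimal sketch of what that looks like from the application side, assuming
a POSIX system. The helper name save_durably() and its 0/-1 return convention
are made up for illustration; only open(), write(), fsync() and close() are
real interfaces. It is not a complete recovery scheme, it just shows where the
errors surface.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to path and make sure
 * every I/O error is reported to the caller.
 * Returns 0 on success, -1 on failure with errno set.
 */
int save_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int saved;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    /* write() may be short or interrupted, so loop until done. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Force the data to the device; a write error may only show up here. */
    if (fsync(fd) < 0)
        goto fail;

    /* close() can also report an error, so check it too. */
    if (close(fd) < 0)
        return -1;

    return 0;

fail:
    saved = errno;      /* keep the original error across close() */
    close(fd);
    errno = saved;
    return -1;
}
```

On a -1 return the caller still holds the data in buf, so it can retry to a
different file or device, or refuse to acknowledge the message, which is
roughly what an MTA does.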

That being said, our propagation of I/O errors is a bit wobbly in places
(and the loop device hides write I/O errors completely). But that's a
separate issue.

On this one, I would be more interested in the opinion of sophisticated
*users* of Linux rather than kernel developers, btw. Whether this is
a valuable feature.

> I am just waiting to rip somebody's lunch who disagrees with me on the
> importance of data-recovery and relocation upon media failures. Any
> points claiming it is not important because the predictive nature of
> device failure is unknown, should maybe ask an expert in the industry to
> get you the answer. So let's have some fun and light off a really good
> flame war!
>

umm... Daniel's error was on a read. The data is not, by definition,
in memory. It's gone.

What percentage of disk errors occur on writes, as opposed to reads?
If errors-on-writes preponderate then maybe you're on to something.
But I don't think they do?