Actually, I really think we should move the failure recovery up to the
filesystem: we can fairly easily do it already today, as basically very
few of the filesystems actually do the requests directly, but instead
rely on helper functions like "bread()" and "generic_file_read()".

Moving error handling up has a lot of advantages:

 - it simplifies the (often fragile) lower layers, and moves common
   problems to common code instead of putting it at a low level and
   duplicating it across different drivers.

 - it allows "context-sensitive" handling of errors, i.e. if there is a
   read error on a read-ahead request the upper layers can comfortably
   just say "f*ck it, I don't need it yet", which can _seriously_ help
   interactive feel on bad media (right now we often try to re-read
   a failing sector tens of times, because we re-read it during
   read-ahead several times, and the lower layers re-read it _too_).

In fact, it would even be mostly _simple_ to do it at a higher level, at
least for reads.
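
A rough sketch of what that could look like for reads, assuming the low level exposes a single-attempt read and the buffer carries an "uptodate" flag plus a retry counter. Every name below (sketch_buf, sketch_read, SKETCH_MAX_RETRIES and so on) is made up for illustration and is not existing kernel code; the retry limit of 3 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; none of this is existing kernel API. */

enum sketch_ctx { SKETCH_DEMAND, SKETCH_READAHEAD };

struct sketch_buf {
	bool uptodate;		/* stands in for the existing "uptodate" bit */
	unsigned int retries;	/* the proposed per-buffer retry counter */
};

/* A "try just once" low-level read: 0 on success, -1 on error. */
typedef int (*sketch_read_once_t)(struct sketch_buf *buf, void *dev);

#define SKETCH_MAX_RETRIES 3	/* arbitrary limit, chosen for illustration */

/* Upper-layer read: the retry policy lives here, not in the driver. */
int sketch_read(struct sketch_buf *buf, enum sketch_ctx ctx,
		sketch_read_once_t read_once, void *dev)
{
	assert(buf != NULL && read_once != NULL);

	if (buf->uptodate)
		return 0;

	for (;;) {
		if (read_once(buf, dev) == 0) {
			buf->uptodate = true;
			return 0;
		}
		/* Read-ahead is speculative: give up on the first error. */
		if (ctx == SKETCH_READAHEAD)
			return -1;
		/* Demand read: retry, but only a bounded number of times. */
		if (buf->retries >= SKETCH_MAX_RETRIES)
			return -1;
		buf->retries++;
	}
}

/* Fake device for trying the sketch out: fails the first fail_first reads. */
struct sketch_flaky_dev {
	unsigned int fail_first;
	unsigned int attempts;
};

int sketch_flaky_read_once(struct sketch_buf *buf, void *dev)
{
	struct sketch_flaky_dev *d = dev;

	(void)buf;
	d->attempts++;
	return d->attempts <= d->fail_first ? -1 : 0;
}
```

With a fake device that fails its first two reads, a read-ahead costs exactly one attempt and returns the error, while a later demand read on the same buffer retries and succeeds.
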

Writes are somewhat harder, mainly because the upper layers have never
had to handle re-issuing of requests, and don't really have the state
to do so.

For reads, sufficient state information is already there ("uptodate" bit
- just add a counter for retries), but for writes we only have the dirty
bit that gets cleared when the request gets sent off. So for writes
we'd need to add a new bit ("write in progress", and then clear it on
successful completion, and set the "dirty" bit again on error).
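
The write-side bit handling described above might be sketched like this. The flag names and helpers (SK_DIRTY, SK_WRITE_IN_PROGRESS, sketch_submit_write, sketch_end_write) are invented for the example and do not correspond to existing buffer-head flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag names, not the kernel's actual buffer state bits. */
#define SK_DIRTY		(1u << 0)
#define SK_WRITE_IN_PROGRESS	(1u << 1)

struct sketch_wbuf {
	unsigned int flags;
};

/*
 * Send a dirty buffer off: clear "dirty" and set "write in progress".
 * Returns false if there is nothing to write or a write is already out.
 */
bool sketch_submit_write(struct sketch_wbuf *b)
{
	assert(b != NULL);

	if (!(b->flags & SK_DIRTY) || (b->flags & SK_WRITE_IN_PROGRESS))
		return false;

	b->flags &= ~SK_DIRTY;
	b->flags |= SK_WRITE_IN_PROGRESS;
	return true;
}

/*
 * Completion: always clear "write in progress"; on error set "dirty"
 * again so the upper layer knows the data still has to be written.
 */
void sketch_end_write(struct sketch_wbuf *b, bool ok)
{
	assert(b != NULL);

	b->flags &= ~SK_WRITE_IN_PROGRESS;
	if (!ok)
		b->flags |= SK_DIRTY;
}
```

Note that a successful completion does not touch the dirty bit, so a buffer that was re-dirtied while its write was in flight stays dirty and gets written again.
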
So I'd actually _like_ for all IO requests to be clearly "try just
once", and it being up to th eupper layers to retry on error.
(The notion of timeouts is much harder - the upper layers can retry on
errors, but I really don't think that the upper layers want to handle
timeouts and the associated "cancel this request" issues. So low layers
would still have to do _that_ part of error recovery, but at least they
wouldn't have to worry about keeping the request around until it is
successful).

Does anybody see any really fundamental problems with moving retrying to
_above_ ll_rw_block.c instead of below it?
(And once it is above, you can much more easily support filesystems that
can automatically remap blocks on IO failure etc, and even have
interruptible block filesystem mounts for those pesky bad media problems
- allowing user level to tell the kernel to not screw around too much
with errors and just return them early to user space).