Justin Gibbs writes:
> This currently only effects the initial bus reset delay.  If the
> driver holds off commands after subsequent bus resets, it can cause
> undeserved timeouts on the commands it has intentionally deferred.
> The mid-layer has a 5 second delay after bus resets, but I haven't
> verified that this is honored correctly during error recovery.
I'm planning a major re-write of this area in the error handler.  The way I 
think it should go is:
1) Quiesce host (set in_recovery flag)
2) Suspend active timers on this host
3) Proceed down the error correction track (eliminate abort and go down 
device, bus and host resets and finally set the device offline).
5) On each error recovery wait out a recovery timer for the device to become 
active before talking to it again.  Send all affected commands back to the 
block layer to await reissue (note: it would now be illegal for commands to 
lie to the mid layer and say they've done the reset when they haven't).
6) issue a TUR using a command allocated to the eh for that purpose.  Process 
the return code (in particular, if the device says NOT READY, wait some more). 
 Only if the TUR definitively fails proceed up the recovery chain all the way 
to taking the device offline.
I also plan to expose the suspend and resume timers API in some form for FC 
drivers to use.
James
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/