Ok, problem found.
There's a nice loop in scsi_send_eh_cmnd() which retries a SCpnt
command endlessly on a medium error. It neither restores the command
to its pristine state before giving it back to the host, nor limits
the number of retries.
The end result is that the SCSI command says N sectors, the request
list may say N * sector size bytes, but SCpnt->request_bufflen may
be complete rubbish, since the HBA driver is allowed to modify this
field during the processing of a command.
We do have a specific function to restore the SCSI command data to
a pristine state (scsi_eh_retry_command).
Ok, so there are two bugs here:
1. It's possible for SCSI to bring a box to a complete standstill
when it encounters a SCSI error.
2. The command is not retried in its correct (pristine) state.
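To make the shape of a fix for both bugs concrete, here is a rough userspace sketch. It is not the kernel code and not the eventual patch: struct fake_cmnd, the fake_* helpers and the retry cap of 3 are all hypothetical, and the "HBA" is a stub that always fails and scribbles on request_bufflen. The only point is the pattern: save the pristine state once, restore it before every attempt, and give up after a bounded number of tries.

```c
#include <string.h>

#define FAKE_MAX_EH_RETRIES 3   /* hypothetical cap, not a kernel constant */

/* Hypothetical stand-in for the few SCpnt fields that matter here. */
struct fake_cmnd {
    unsigned char cmnd[12];
    unsigned int  request_bufflen;
    unsigned char saved_cmnd[12];   /* pristine copy of cmnd */
    unsigned int  saved_bufflen;    /* pristine copy of request_bufflen */
};

/* Remember the state the command had before anyone touched it. */
static void fake_save_state(struct fake_cmnd *c)
{
    memcpy(c->saved_cmnd, c->cmnd, sizeof(c->cmnd));
    c->saved_bufflen = c->request_bufflen;
}

/* Put the command back into its pristine state. */
static void fake_restore_state(struct fake_cmnd *c)
{
    memcpy(c->cmnd, c->saved_cmnd, sizeof(c->cmnd));
    c->request_bufflen = c->saved_bufflen;
}

/*
 * Simulated HBA driver: records the buffer length it was handed,
 * then clobbers the field (as a real driver is allowed to) and
 * reports a failure every time.
 */
static int fake_hba_issue(struct fake_cmnd *c, unsigned int *seen)
{
    *seen = c->request_bufflen;
    c->request_bufflen = 0;
    return -1;
}

/*
 * Bounded retry loop. Returns the number of attempts made and sets
 * *ok to 1 on success, 0 if the retries were exhausted. seen[] must
 * have room for FAKE_MAX_EH_RETRIES entries.
 */
static int fake_send_eh_cmnd(struct fake_cmnd *c, unsigned int seen[], int *ok)
{
    int tries;

    *ok = 0;
    fake_save_state(c);

    for (tries = 0; tries < FAKE_MAX_EH_RETRIES; tries++) {
        fake_restore_state(c);      /* bug 2: always retry from pristine state */
        if (fake_hba_issue(c, &seen[tries]) == 0) {
            *ok = 1;
            return tries + 1;
        }
    }                               /* bug 1: the loop is bounded */

    fake_restore_state(c);          /* hand back a sane command on failure */
    return tries;
}
```

With request_bufflen set to 4096 on entry, the stub HBA sees 4096 on every attempt even though it zeroes the field each time, and the function returns after FAKE_MAX_EH_RETRIES attempts instead of spinning forever.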
I'll look at cooking up a patch for both of these in the next few days
or so.
--
Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html