These come from the firmware
> Mar 9 07:13:00 system kernel: scsi : aborting command due to timeout :
> 162469964, scsi0, channel 0, id 1, lun 0 Read (10) 00 00 06 a3 ff 00 00 08
We start to time out because the firmware isn't responding
> Mar 9 07:13:07 system kernel: aacraid:ID(0:02:0); Error Event
> Mar 9 07:13:07 system kernel: aacraid:ID(0:02:0); Medium Error, Block
> Range 435234 : 435234
> Mar 9 07:13:07 system kernel: aacraid:ID(0:02:0); Error Too Long To
Firmware finally gives up
> 3. disk 2 on channel 0 fails. No problem, it's a mirror, right ?
> Mar 9 07:13:36 system kernel: aacraid: BBR timed out at Block 0x6a42d
> Mar 9 07:13:36 system kernel: aacraid:Drive 0:2:0 returning error
> Mar 9 07:13:36 system kernel: aacraid:ID(0:02:0) - IO failed, Cmd[0x28]
Drive firmware fails the I/O
> So, why does the system run fine on the broken mirror, but panics and
> crashes when the mirror actually breaks ?
> This is very frustrating - one of the reasons we spent money to mirror
> things was to reduce possible downtimes (since a disk failure will not
> crash the machine) but ... a disk failure does crash the machine.
> Explanations welcome.
Looking at the trace, the driver was thrown by something. I think I know
what may have occurred in your case but not in the test/qualification
sets. Somehow the firmware took so long that we aborted/gave up and killed
off a command - then it completed and we tried to tell the scsi layer.
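
The suspected sequence (timeout, abort, then a late completion from the
firmware) can be sketched as a small single-threaded C model. Every name
here (fake_cmd, abort_cmd, complete_cmd, notify_upper_layer) is
hypothetical and not taken from aacraid or the scsi layer. It only
illustrates the kind of state check that would stop a late completion
from being reported for a command that has already been aborted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command states; not the real driver or midlayer ones. */
enum cmd_state { CMD_IN_FLIGHT, CMD_ABORTED, CMD_COMPLETED };

struct fake_cmd {
    enum cmd_state state;
    int done_calls;     /* times the upper layer was told "finished" */
};

/* Stand-in for reporting a finished command to the upper layer. */
void notify_upper_layer(struct fake_cmd *cmd)
{
    cmd->done_calls++;
}

/* Timeout path: the upper layer gives up and reclaims the command. */
void abort_cmd(struct fake_cmd *cmd)
{
    if (cmd->state == CMD_IN_FLIGHT)
        cmd->state = CMD_ABORTED;
}

/*
 * Completion path: the firmware finally answers.
 * Returns true if the completion was delivered, false if it arrived
 * after the command had already been aborted and was dropped.
 */
bool complete_cmd(struct fake_cmd *cmd)
{
    if (cmd->state != CMD_IN_FLIGHT)
        return false;           /* late completion: do not report it */
    cmd->state = CMD_COMPLETED;
    notify_upper_layer(cmd);
    return true;
}
```

In this model, calling abort_cmd() and then complete_cmd() on the same
command leaves done_calls at 0, so the upper layer never hears about a
command it has already killed. A real driver would presumably need to do
the state check under a lock, since the abort and the completion can run
concurrently.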
It'll be a while before I can validate that. You might also want to
report it to firstname.lastname@example.org (I think - see MAINTAINERS file for