Re: [patch for playing] Patch to support 4000 disks and maintain

Patrick Mansfield (patmans@us.ibm.com)
Fri, 11 Apr 2003 13:04:07 -0700


On Fri, Apr 11, 2003 at 11:35:43AM -0700, Joel Becker wrote:
> On Fri, Apr 11, 2003 at 11:12:32AM -0700, Patrick Mansfield wrote:
> > I'm trying to pull the current multi-path patch up to 2.5.66 (ouch).
>
> I wasn't aware of this work. This is very interesting. Two
> questions:
>
> 1) When does it failover? Meaning, if I I/O to a disk, but someone
> yanks the fibrechannel plug. Does your multipath wait for a SCSI
> timeout to redirect the I/O?

> 2) If so, have you considered trapping loop up/down events to handle
> such a case? Real users of multipath tech do not want to wait 90s for
> failover.
>
> Joel

Generally it fails a path when we get a path specific error; it fails the
IO if we get a device (i.e. logical unit) error. If there are no paths
available, the IO is failed (though this could be changed to be
user-settable).

Behaviour on a cable removal is fibre, adapter, and adapter drive specific
- the qla driver has some sort of timeout on a port down that can be
lowered. It (fibre channel) can immediately complete (with failure) an
outstanding IO on a port down (or SCN notification). loop attached does
not always give you notification.

So with loop attached (AFAIK) you still might have to wait for a timeout
if you yank a disk.

Timeouts are the hardest to deal with - since we don't know where the
error occurred, so generally should not fail the IO (or path).

If we had user scanning, and some sort of hotplug for targets coming and
going, those be used to add and remove (or just fail) paths (at least for
switch attached).

-- Patrick Mansfield
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/