Re: [PATCH] Bio Traversal Changes

Suparna Bhattacharya (suparna@in.ibm.com)
Tue, 6 Aug 2002 17:47:25 +0530


On Mon, Aug 05, 2002 at 10:47:39AM -0500, James Bottomley wrote:
> suparna@in.ibm.com said:
> > There is only one call to ->request_fn for the entire request, and the
> > driver manages things underneath. The chunks are expected to complete
> > sequentially. In the situation where the request is restarted in the
> > event of an error (say), the submission pointers are rolled back to
> > the last (successfully) completed point before issuing the request
> > again.
>
> Yes, that's the way I thought it would operate.
>
> suparna@in.ibm.com said:
> > I must say that I initially did think that this could be extended to
> > the more generic case which you probably are referring to and that
> > such an approach could take away the need to split bios in certain
> > cases (i.e. when the i/o is destined for a single queue). Later it
> > appeared that trying to cover the case where each of these pieces
> > gets queued up and might complete out of order (requiring a tag to
> > correlate things on completion), would most likely boil down to
> > trying to maintain all the state that struct request does today.
>
> For this more generic case, most of our problems seem to be because the
> barrier has width: It actually belongs to an I/O request. If the barrier had
> zero width (i.e. it was simply a barrier in the stream with no I/O attached)
> then it would be much easier to preserve it correctly across this (or any
> other) type of bio splitting. It would also make it much more obvious to the
> implementing driver where the barrier was supposed to be in the I/O stream,
> and would allow more efficient "wait for completion" barrier implementations
> for drivers that couldn't enforce it any other way.
>
> > Would be nice (for me) to understand this in more detail. There might
> > be some possibilities. Any pointers that I can look up to get a
> > clearer idea ?
>
> The SCSI standards (www.t10.org) are the only real authoritative source (with
> even some explanation). However, I'll do my best to summarise.
>
> In SCSI, commands are allowed to disconnect, that is suspend temporarily while
> the device does other things. When the device implements tagged command
> queueing, it is allowed to disconnect one command and subsequently reconnect
> (restart) a different one. In theory, this means that we can have multiple
> active I/Os at once. The way you signal to the SCSI device that you want a
> barrier is to label one or more of the tags as "ordered", which means that
> the device must complete all I/O for tags queued before the ordered one
> before starting it, and may not begin I/O for subsequent tags until the
> ordered tag has completed.
>
> Looping a single request over a big bio means that the SCSI device sees the
> I/O as a discrete stream of tags. However, we lose throughput if we stall the
> queue waiting for this single bio to complete and we can't work out what the
> next tag is until the prior tag completes. In the non barrier case,
> everything will still be OK as long as the queue isn't stalled because we'll
> be getting throughput from other bios coming down.
>
> I think basically, I'd like to translate as much of the bio as I can into SCSI
> tags to improve throughput and each tag currently requires a struct request.

I didn't think of the possibility of serializing the chunks
of a single request while letting other requests on the queue through
in the non-barrier case. That's a thought, though it might result
in non-optimal scans ... and in that sense affect the throughput.
But, now I see why the barrier case was the one you were mainly worried
about.
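
To make the barrier case concrete, here is a rough sketch in C of the
ordered-tag rule as summarised in the quoted text above: nothing queued
after an ordered tag may start until it completes, and the ordered tag
itself may not start until everything queued before it has completed.
The names (struct tag, tag_may_start) are made up for illustration and
are not taken from the SCSI mid-layer or the standard; this only models
the ordering constraint, not a real device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel or SCSI mid-layer code. */
enum tag_kind { TAG_SIMPLE, TAG_ORDERED };

struct tag {
	enum tag_kind kind;
	bool done;		/* I/O for this tag has completed */
};

/*
 * May the device begin I/O for q[i], given the tags queued before it?
 *  - Nothing may start while an earlier ordered tag is incomplete.
 *  - An ordered tag may start only once every earlier tag is complete.
 */
static bool tag_may_start(const struct tag *q, size_t n, size_t i)
{
	assert(i < n);
	for (size_t j = 0; j < i; j++) {
		if (q[j].done)
			continue;
		if (q[j].kind == TAG_ORDERED || q[i].kind == TAG_ORDERED)
			return false;
	}
	return true;
}
```

With a queue of simple, simple, ordered, simple, the two leading simple
tags can run concurrently, the ordered tag waits for both, and the
trailing simple tag waits for the ordered one. That last wait is where
the stall shows up if the ordered tag covers a large I/O.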

>
> > Does completion notification happen only when all the commands
> > covered by a single tag complete ? Otherwise, what is the ordering
> > amongst the multiple commands in question (do they complete in serial
> > order as well) ?
>
> Yes and no. You get a special completion code (INTERMEDIATE_TASK_COMPLETE)
> which says "I've finished this bit, give me the next part". You don't get a
> real SCSI completion until the last part of the linked task set completes.
> The task is linked sequentially, so it does complete in serial order.

Thanks for the explanation. I think I get the gist.
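
As I understand it, the completion behaviour could be modelled roughly
as below: each part of the linked task reports an intermediate status
when it finishes, and only the last part produces the real completion.
This is just a sketch under that reading; the enum values and the
struct are hypothetical names for illustration, not actual SCSI status
codes or kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; not real SCSI status codes or kernel symbols. */
enum part_status { PART_INTERMEDIATE, PART_FINAL };

struct linked_task {
	size_t nr_parts;	/* total parts linked under one tag */
	size_t next_part;	/* index of the part currently in flight */
};

/*
 * Record completion of the part currently in flight. Parts are linked
 * sequentially, so they can only finish in order: every part but the
 * last reports "intermediate", and the last reports the real completion.
 */
static enum part_status linked_task_complete_part(struct linked_task *t)
{
	assert(t->next_part < t->nr_parts);
	t->next_part++;
	return t->next_part == t->nr_parts ? PART_FINAL : PART_INTERMEDIATE;
}
```

For a task with three parts, the first two calls return
PART_INTERMEDIATE and the third returns PART_FINAL, so the issuer only
ever needs to track one position per tag.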

>
> However, don't worry about the linked task stuff, it's a rather esoteric area
> of the SCSI standard (that allows a single tag to be used across multiple I/Os
> in very much the same way the bio splitting works) which, on mature
> reflection, probably isn't such a good idea to use since I'd be doubtful about
> how well it's implemented in the devices we have to deal with.

OK.

Regards
Suparna

>
> James
>
>