Re: [PATCH] io stalls

Nick Piggin (piggin@cyberone.com.au)
Wed, 11 Jun 2003 10:40:03 +1000


Robert White wrote:

>From: Nick Piggin [mailto:piggin@cyberone.com.au]
>Sent: Monday, June 09, 2003 8:23 PM
>
>>Actually, there is no priority other than time (ie. FIFO), and
>>seek distance in the IO subsystem. I guess this is why your
>>arguments fall down ;)
>>
>
>I'll buy that for the most part, though one of the differences I read
>elsewhere in the thread was the choice between add_wait_queue() and
>add_wait_queue_exclusive(). You will, however, note that one of the factors
>that is playing in this patch is process priority.
>
>(If I understand correctly) The wait queue in question becomes your FIFOing
>agent, it is kind of a pre-queue on the actual IO queue, once you reach a
>"full" condition.
>

Right.

>
>In the later case [add_wait_queue_exclusive()] you are strictly FIFO over
>the set of processes, where the moment-of-order is determined by insertion
>into the wait queue.
>
>In the former case [add_wait_queue()] when the queue is woken up all the
>waiters will be marked executable on the scheduler, and the scheduler will
>then (at least tend to) sort the submissions into task priority order. So
>the higher priority tasks will get to butt into line. Worse, the FIFO is
>essentially lost to the vagaries of the scheduler so without the _exclusive
>you have no FIFO at all.
>
>I think that is the reason that Chris was saying the
>add_wait_queue_exclusive() mode "does seem to scale much better."
>

Yep

>
>So your "original new" batching agent is really order-of-arrival that
>becomes anti-sorted by process priority. Which can lead to scheduler
>induced starvation (and the observed "improvements" by using the strict FIFO
>created by a_w_q_exclusive). The problem is that you get a little communist
>about the FIFO-ness when you use a_w_q_exclusive() and that can *SERIOUSLY*
>harm a task that must approach real-time behavior.
>

I think it had better be FIFO for now. If its not, you're
making the worst case latency worse. It requires a lot of
careful testing to get something like that working right.

You have some good ideas, and quite possibly they would be
worth implementing, but the behaviour of the code is quite
complex, especially when you take into account its affect
on the io scheduler.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/