> 2) Provide a somewhat obvious patch that makes the current
> __get_request_wait call significantly more fair, in hopes of either
> blaming it for the stalls or removing it from the list of candidates
Without the a_w_q_exclusive() (that is, add_wait_queue_exclusive()) in place
of add_wait_queue, the FIFO effect is lost when all the members of the wait
queue compete for their timeslice in the scheduler. For all intents and
purposes the fairness goes up some (you stop having the one guy sorted to the
unhappy end of the disk) but low-priority tasks will still always end up
stalled on the dirty end of the stick.
Basically each new round at the queue-empty moment is a mob rush for the
With the a_w_q_exclusive(), you get past fair and well into anti-optimal.
Your FIFO becomes essentially mandatory with no regard for anything but the
order things hit the wait queue. However (particularly on an SMP machine),
"new requestors" may/will jump to the head of the line because they were
never *in* the wait queue. So you have only achieved "fairness" with
respect to requests that came in to an IO queue that was full at the time of
their initial entry into the driver. This is exactly like the experience
of waiting patiently in line to get off the highway and watching all the
rude people drive by you, only to cut over and nose into the queue just at
the exit sign.
So you need the _exclusive if you want any kind of predictable fairness
(without getting into anything obscure) but it is still only "fair" for
those that were unfortunate enough to end up on the wait queue originally.
There is a small window for tasks to butt in freely.
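
To put a rough number on the butting-in effect, here is a userspace toy model
(not kernel code, and not taken from any patch). It assumes one request is
freed per round and that every newcomer_period-th round a new requestor grabs
the freed request without ever touching the wait queue. The function name and
both parameters are made up for illustration.

```c
#include <assert.h>

/*
 * Toy model of a strict FIFO wait queue with barging newcomers.
 *
 * position:        0-based place of our waiter in the FIFO.
 * newcomer_period: every Nth round a newcomer takes the freed request
 *                  before the woken waiter runs; 0 means no newcomers.
 *
 * Returns the round in which our waiter gets a request, or -1 if a
 * newcomer shows up every single round and the queue never advances.
 */
int rounds_until_served(int position, int newcomer_period)
{
    int served = 0; /* waiters served so far, in FIFO order */
    int round = 0;

    assert(position >= 0 && newcomer_period >= 0);

    if (newcomer_period == 1)
        return -1; /* every freed request is stolen: starvation */

    while (served <= position) {
        round++;
        if (newcomer_period > 0 && round % newcomer_period == 0)
            continue; /* newcomer cuts in; nobody in the queue moves */
        served++;     /* head of the FIFO gets the freed request */
    }
    return round;
}
```

In this model the fourth waiter (position 3) is served in round 4 with no
newcomers, but has to wait until round 7 if a newcomer cuts in every second
round. The order inside the queue stays perfectly FIFO in both cases, which
is the point: the queue is fair only among those already standing in it.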
> 3) fix the stalls
Without the _exclusive() you can't have fixed the stalls, you can only have
moved the locus-of-blame to the scheduler which may (or may not) have some
way to compensate and "fake fairness" built in by coincidence.
The thing I suggest in my other email, where you use the non-exclusive
version of the routine but temporarily bump the process priority each time a
request gets foisted off on the wait_queue instead of the IO queue, actually
has semantic fairness built in. This basically builds a fairness elevator
that functions both over time-in-queue and original process priority (when
built into your basic patch).
It's also quite space/time efficient and fairly clear to reader and
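
A rough userspace sketch of the priority-bump idea follows. It is only a toy
model under my own assumptions, not the code from the other email: every
contender that fails to get the freed request has a temporary boost raised by
one, the winner's boost is reset, and the highest effective priority wins
each round (ties go to the lowest index). The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task competing for a free request. */
struct toy_task {
    int base_prio; /* original priority, higher means more important */
    int boost;     /* temporary bump, grows each time the task loses */
};

static int effective_prio(const struct toy_task *t)
{
    return t->base_prio + t->boost;
}

/*
 * Hand one freed request to the contender with the highest effective
 * priority. Every loser is bumped by one; the winner's bump is reset.
 * Returns the index of the winner.
 */
size_t grant_request(struct toy_task *tasks, size_t n)
{
    size_t best = 0;
    size_t i;

    assert(n > 0);

    for (i = 1; i < n; i++)
        if (effective_prio(&tasks[i]) > effective_prio(&tasks[best]))
            best = i;

    for (i = 0; i < n; i++) {
        if (i == best)
            tasks[i].boost = 0; /* got its request, back to base */
        else
            tasks[i].boost++;   /* pushed to the wait queue again */
    }
    return best;
}

/* Count rounds until the task at index `who` is granted a request. */
int rounds_until_granted(struct toy_task *tasks, size_t n, size_t who)
{
    int round = 1;

    while (grant_request(tasks, n) != who)
        round++;
    return round;
}
```

With one task at priority 0 and one at priority 5, the low-priority task gets
its request in round 6 of this model instead of never. The wait is bounded by
the priority gap and grows with it, so both time-in-queue and original
priority still count.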