Re: Latency: allowing rescheduling while holding spin_locks

george anzinger (george@mvista.com)
Mon, 15 Jan 2001 19:09:14 -0800


Roger Larsson wrote:
>
> On Sunday 14 January 2001 01:06, george anzinger wrote:
> > Nigel Gamble wrote:
> > > On Sat, 13 Jan 2001, Roger Larsson wrote:
> > > > A rethinking of the rescheduling strategy...
> > >
> > > Actually, I think you have more-or-less described how successful
> > > preemptible kernels have already been developed, given that your
> > > "sleeping spin locks" are really just sleeping mutexes (or binary
> > > semaphores).
> > >
> > > 1. Short critical regions are protected by spin_lock_irq(). The maximum
> > > value of "short" is therefore bounded by the maximum time we are happy
> > > to disable (local) interrupts - ideally ~100us.
> > >
> > > 2. Longer regions are protected by sleeping mutexes.
> > >
> > > 3. Algorithms are rearchitected until all of the highly contended locks
> > > are of type 1, and only low contention locks are of type 2.
> > >
> > > This approach has the advantage that we don't need to use a no-preempt
> > > count, and test it on exit from every spinlock to see if a preempting
> > > interrupt that has caused a need_resched has occurred, since we won't
> > > see the interrupt until it's safe to do the preemptive resched.
> >
> > I agree that this was true in days of yore. But these days the irq
> > instructions introduce serialization points and, methinks, may be much
> > more time consuming than the "++, --, if (false)" that a preemption
> > count implementation introduces. Could someone with knowledge of the
> > hardware comment on this?
> >
> > I am not suggesting that the "++, --, if (false)" is faster than an
> > interrupt, but that it is faster than cli, sti. Of course we are
> > assuming that there is <stuff> between the cli and the sti, as there is
> > between the ++ and the -- if (false).
> >
> >
>
> The problem with the counting scheme is that you cannot schedule inside any
> spinlock - you have to split them up. Maybe you will have to do that anyway.
> But if your RT process never needs more memory - it should be quite safe.
>
> The difference with a sleeping mutex is that it can be made lazier - keep it
> in the runlist; there should be very few...
>
Nigel and I agree on the approach he has laid out, with the possible
exception of just how to handle the short spinlocks. It is agreed that
we cannot preempt a task that holds a spinlock. He suggests that the
overhead of testing for preemption on exit from a spinlock region
protected by a preempt count is higher than the cost of turning the
interrupt system off and on. He may well be right, and surely was right
5 or 10 years ago. Today the cost of a cli or sti is much higher
relative to the memory references, especially if we don't need to make
the result visible to other processors (and we don't). We only have to
serialize WRT our own interrupt system, but the interrupt itself will do
this, and only when we need it.

snip

WRT your patch, a big problem with simple sleeping mutexes is priority
inversion. An example:

Given tasks L of low priority, M of medium, and H of high, and a mutex X:
If L is holding X when it is preempted by M, and
M wants to run for a long time...
Then when H preempts M and tries to get X, it will have to wait while M
does its thing, just because L cannot get the cycles needed to get out
of X.

A priority-inheritance mutex (pi_mutex) handles this by boosting the
priority of L (the holder of X) to H's priority when H tries to get X,
until L releases X. At that point L reverts to its prior priority and H
continues, now having succeeded in getting X. This is all complicated,
of course, by remembering that a task can hold several mutexes at a time
and each can have several waiters.
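
Roughly, the boost and restore for a pi_mutex might look like the
following. This is only a sketch: the names are invented, bigger prio
numbers mean higher priority here, the mutex's own internal locking is
left out, and so are the several-mutexes/several-waiters cases just
mentioned:

#include <stddef.h>

struct task {
        int prio;               /* current (possibly boosted) priority     */
        int normal_prio;        /* priority before any boost               */
};

struct pi_mutex {
        struct task *holder;    /* NULL when the mutex is free             */
};

extern struct task *current;
extern void set_task_prio(struct task *t, int prio);
extern void sleep_until_released(struct pi_mutex *m);
extern void wake_highest_waiter(struct pi_mutex *m);

void pi_mutex_lock(struct pi_mutex *m)
{
        while (m->holder != NULL) {
                /* H finds L holding X: boost L to H's priority ...        */
                if (m->holder->prio < current->prio)
                        set_task_prio(m->holder, current->prio);
                sleep_until_released(m);        /* ... and wait            */
        }
        m->holder = current;
}

void pi_mutex_unlock(struct pi_mutex *m)
{
        m->holder = NULL;
        /* L reverts to its prior priority ...                             */
        if (current->prio != current->normal_prio)
                set_task_prio(current, current->normal_prio);
        wake_highest_waiter(m);         /* ... and H succeeds in getting X */
}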

Also, the wake-up code should not have to scan a list looking for
someone to wake up. We should know who to wake up from the getgo.
Likewise, clutter in the run_list adds wasted cycles and cache lines to
the scheduling process.
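
For what it's worth, one way to know who to wake from the getgo is to
keep the waiters on the mutex's own list, sorted by priority as they go
to sleep, so the unlock path wakes the head of the list without
searching and sleeping waiters never touch the run_list. Again, a
sketch with made-up names, and bigger prio numbers mean higher priority:

#include <stddef.h>

struct task {
        int prio;
};

struct waiter {
        struct task *task;
        struct waiter *next;
};

struct pi_mutex_waitq {
        struct waiter *head;            /* highest priority first          */
};

/* The ordering cost is paid by a task that is about to sleep anyway,
 * not by the unlock path or the scheduler.                                */
void enqueue_waiter(struct pi_mutex_waitq *q, struct waiter *w)
{
        struct waiter **p = &q->head;

        while (*p != NULL && (*p)->task->prio >= w->task->prio)
                p = &(*p)->next;
        w->next = *p;
        *p = w;
}

/* Unlock wakes exactly one, already-known task.                           */
struct task *pick_waiter_to_wake(struct pi_mutex_waitq *q)
{
        struct waiter *w = q->head;

        if (w == NULL)
                return NULL;
        q->head = w->next;
        return w->task;
}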

George