Re: userspace irq balancer

Andrew Theurer (habanero@us.ibm.com)
Tue, 20 May 2003 09:35:22 -0500


On Tuesday 20 May 2003 09:21, Jeff Garzik wrote:
> On Tue, May 20, 2003 at 09:07:41AM -0500, Andrew Theurer wrote:
> > On Tuesday 20 May 2003 01:40, David S. Miller wrote:
> > > From: Dave Hansen <haveblue@us.ibm.com>
> > > Date: 19 May 2003 23:36:23 -0700
> > >
> > > I don't even think we can do that. That code was being integrated
> > > around the same time that our Specweb setup decided to go south on
> > > us and start physically frying itself.
> > >
> > > This gets more amusing by the second. Let's kill this code
> > > already. People who like the current algorithms can push
> > > them into the userspace solution.
> >
> > Remember, this all started with some idea of "fairness" among cpus and
> > had very little to do with performance, particularly on P4 with HT,
> > where the first logical cpu got all the ints and tasks running on that
> > cpu were slower than on other cpus. This was in most cases the highest
> > performing situation, -but- it was unfair to the tasks running on cpu0.
> > irq_balance fixed this with a random target cpu that was, in theory,
> > supposed to change rarely enough to preserve cache warmth. In practice
> > the target cpus changed too often, which thrashed the cache, and the HW
> > overhead of changing the destination that often was way too high.
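
For reference, the knob a userspace balancer drives is the per-irq cpu
mask in /proc/irq/<n>/smp_affinity. A minimal sketch of that side; the
irq number and mask below are made up for illustration:

#include <stdio.h>

/* Steer an interrupt by writing a hex cpu mask into
 * /proc/irq/<n>/smp_affinity (needs root). */
int set_irq_affinity(int irq, unsigned long cpumask)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;		/* no such irq, or no permission */
	fprintf(f, "%lx\n", cpumask);	/* the file parses a hex mask */
	fclose(f);
	return 0;
}

int main(void)
{
	return set_irq_affinity(24, 1UL << 2) ? 1 : 0;	/* IRQ 24 -> cpu 2 */
}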
>
> You call that a fix? ;-) I call that working around a bug.
>
> If tasks run slower on cpuX than on cpuY because of a heavier int load,
> that's the fault of the scheduler, not the irq balancer, be it in-kernel
> or userspace. If there's a lesser-utilized cpu, the task should be
> migrated there from the irq-loaded one once CPU accounting notices that
> kernel interrupt handling is having an impact.

On paper that sounds good, but it may be more difficult (and not perfect)
in practice. For example, if you have one runnable task per cpu, moving a
task to another cpu would not help. To preserve fairness, all you could do
is swap the tasks around, which would thrash the cache pretty well. Also,
the int load may be very dynamic, so you need to be really careful to make
sure you are measuring sustained int load.
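
To be concrete about "sustained": something like a decayed per-cpu
average of interrupt counts, sampled from /proc/interrupts. A minimal
sketch, assuming 8 cpus and an arbitrary 3/4 decay factor; none of these
names come from existing code:

#include <stdio.h>

#define NR_CPUS   8
#define DECAY_NUM 3	/* keep 3/4 of the old average each sample */
#define DECAY_DEN 4

static unsigned long prev_count[NR_CPUS];	/* raw counts at last sample */
static unsigned long irq_load_avg[NR_CPUS];	/* decayed per-sample deltas */

/* Fold one sample of raw per-cpu interrupt counts (e.g. parsed from
 * /proc/interrupts) into a decayed average, so a momentary burst does
 * not register as sustained load. */
void sample_irq_load(const unsigned long count[NR_CPUS])
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		unsigned long delta = count[cpu] - prev_count[cpu];

		prev_count[cpu] = count[cpu];
		irq_load_avg[cpu] = irq_load_avg[cpu] * DECAY_NUM / DECAY_DEN
				  + delta / DECAY_DEN;
	}
}

int main(void)
{
	/* two fake samples: cpu0 takes a burst, then quiets down */
	unsigned long s1[NR_CPUS] = { 4000, 100, 100, 100, 0, 0, 0, 0 };
	unsigned long s2[NR_CPUS] = { 4100, 200, 200, 200, 0, 0, 0, 0 };

	sample_irq_load(s1);
	sample_irq_load(s2);
	printf("cpu0 avg %lu, cpu1 avg %lu\n", irq_load_avg[0], irq_load_avg[1]);
	return 0;
}

Only a cpu that stays hot across several such samples is worth reacting
to, whether by re-pointing the irq or by biasing the scheduler.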

However, I do agree that we need something in the scheduler: in
particular, something to modify max_load and load in find_busiest_queue
based on an int load average for those cpus.
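
Something along these lines, say; irq_load_avg[] and IRQ_LOAD_SHIFT are
illustrative names (not actual scheduler symbols), and the 1/4 weighting
is arbitrary:

#define NR_CPUS        8
#define IRQ_LOAD_SHIFT 2	/* arbitrary: count 1/4 of the irq load */

extern unsigned long irq_load_avg[NR_CPUS];	/* decayed per-cpu int load */

/* Inflate a runqueue's raw load by sustained interrupt load, so a
 * find_busiest_queue()-style comparison sees an irq-heavy cpu as
 * busier than its runnable-task count alone suggests. */
static inline unsigned long effective_load(int cpu, unsigned long raw_load)
{
	return raw_load + (irq_load_avg[cpu] >> IRQ_LOAD_SHIFT);
}

The max_load and load comparisons would then use effective_load() values
rather than the raw runqueue lengths, steering new work away from cpus
that are busy in interrupt context.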

-Andrew