Re: [patch] severe softirq handling performance bug, fix, 2.4.5

Andrea Arcangeli (andrea@suse.de)
Sun, 27 May 2001 19:56:19 +0200


On Sun, May 27, 2001 at 10:17:22AM -0700, David S. Miller wrote:
> I still fail to understand, won't the C code in do_IRQ() handle
> this case as well? What is so special about returning from an
> interrupt to the idle task on x86? And what about that special'ness
> makes the code at the end of do_IRQ() magically not run?

Nothing special of course, I don't like making it special infact,
everything is fine and nothing changes unless the underlined check
(part of the softirq patch) doesn't trigger:

if (active) {
struct softirq_action *h;

restart:
/* Reset active bitmask before enabling irqs */
softirq_active(cpu) &= ~active;

local_irq_enable();

h = softirq_vec;
mask &= ~active;

do {
if (active & 1)
h->action(h);
h++;
active >>= 1;
} while (active);

local_irq_disable();

active = softirq_active(cpu);
if ((active &= mask) != 0)
goto retry;
}
if (softirq_active(cpu) & softirq_mask(cpu)) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I mean everything is fine until the same softirq is marked active again
under do_softirq, in such case neither the do_softirq in do_IRQ will
run it (because we are in the critical section and we hold the per-cpu
locks), nor we will run it again ourself from the underlying do_softirq
to avoid live locking into do_softirq.

When the check triggers the ksoftirq patch just wakeup the kernel daemon
so the softirq flood will be balanced by the scheduler, possibly it
could keep running all the time if the machine is idle, like if we would
not mask &= ~active in the do_softirq, but without risks of live locks
because the scheduler will be fair.

if (softirq_active(cpu) & softirq_mask(cpu)) {
/*
* we cannot loop indefinitely here to avoid userspace starvation,
* but we also don't want to introduce a worst case 1/HZ latency
* to the pending events, so lets the scheduler to balance
* the irq load for us.
*/
struct task_struct * tsk = ksoftirqd_task(cpu);
if (tsk && tsk->state != TASK_RUNNING)
wake_up_process(tsk);
}

> Andrea, I think you are talking about a deeper and different problem.

That is the exactly same problem pointed about by Ingo with the idle
task as far I can tell.

If the machine is idle waiting the next interrupt before running the
softirq is even worse because we definitely waste cpu, it's not only a
latency issue in that case, but the problematic is the same if the
machine is loaded and we don't run the softirq again because it was
marked active under us.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/