Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)

Christian Ehrhardt (ehrhardt@mathematik.uni-ulm.de)
Fri, 16 Aug 2002 16:17:18 +0200


On Thu, Aug 15, 2002 at 02:33:02AM +0100, Alan Cox wrote:
> Thanks - your analysis is informative to say the least. It looks like
> the PIV load balancing code is the problem.

A few more observations and maybe a solution for the problem
(kernel is 2.4.20-pre1-ac3):

I ran Richard Gooch's scheduler benchmark[1] as a normal user with
num_running set to 50 (./a.out 50). The box locks up hard within a few
seconds. There is no Ooops, Magic-SysRq doesn't work anymore, neither does
console switching or Ctrl-Alt-Delete. NMI-Watchdog is enabled and failed
to reboot the box.

This suggests that sched_yield, nice or sched_setscheduler is involved
with sched_yield beeing #1 candidate.

Further investigation showed that adding BUG_ON(p->array != array);
in dequeue_task would have given some interesting results.
At least the following is quite possible and doesn't even require SMP:

Task Interrupt
Calls sys_sched_yield
======> Timer Interrupt
Timer Interrupt decreases times lice,
the time slice expires and the task is
moved to the expired array.
Continues with yield.
Assume current->prio == MAX_PRIO-1,
current->time_slice <= 1 is satisifed
anyway, i.e. wie do:
dequeue_task(current, rq->active);
enqueue_task(current, rq->expired);
However, the task has already been moved
from the active to the expired array
by the timer interrupt, i.e.
dequeue_task and enqueue_task will get
the nr_active counts and the bitmaps
wrong because they remove the task from
the wrong array. --> BOOOM

The (untested) patch below should correct this problem along with
a locking oddity (last hunk) that IMHO either needs fixing or a BIG
comment. Be prepared for a few (up to 4) lines of fuzz due to additional
BUG_ONs in both versions of the file.

regards Christian Ehrhardt

[1] http://www.atnf.csiro.au/people/rgooch/benchmarks/linux-scheduler.html

--- /usr/src/linux-2.4.20-pre1-ac3/kernel/sched.c Thu Aug 15 20:03:01 2002
+++ sched.c Fri Aug 16 16:15:57 2002
@@ -769,7 +772,7 @@
set_tsk_need_resched(p);

/* put it at the end of the queue: */
- dequeue_task(p, rq->active);
+ dequeue_task(p, p->array);
enqueue_task(p, rq->active);
}
goto out;
@@ -785,7 +788,7 @@
if (p->sleep_avg)
p->sleep_avg--;
if (!--p->time_slice) {
- dequeue_task(p, rq->active);
+ dequeue_task(p, p->array);
set_tsk_need_resched(p);
p->prio = effective_prio(p);
p->time_slice = TASK_TIMESLICE(p);
@@ -1396,7 +1399,7 @@
*/
if (likely(current->prio == MAX_PRIO-1)) {
if (current->time_slice <= 1) {
- dequeue_task(current, rq->active);
+ dequeue_task(current, array);
enqueue_task(current, rq->expired);
} else
current->time_slice--;
@@ -1411,7 +1414,7 @@
list_add_tail(&current->run_list, array->queue + current->prio);
__set_bit(current->prio, array->bitmap);
}
- spin_unlock(&rq->lock);
+ rq_unlock (rq);

schedule();

-- 
THAT'S ALL FOLKS!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/