O(1) Scheduler (tuning problem/live-lock)

Jim Houston (jim.houston@attbi.com)
Fri, 6 Sep 2002 14:44:15 -0400


The current O(1) scheduler heuristics for calculating sleep_avg
and assigning process priorities allow a parent and a small group of
compute-bound child processes to live-lock the system.
We found this problem while running a stress test that includes the LTP
test suite; the waitpid06 test in particular triggers it. We are working
with a 2.4.18 kernel carrying a backport of the O(1) scheduler, but the
problem is also present in Linux-2.5.32.

How it happens.

The waitpid06 test forks off 8 child processes. Each child enters
an infinite loop waiting for a signal from the parent. Yes, it's
a stupid test program. The parent (if it gets to run) immediately sends
a signal to each child process and then does a wait() call for each child.
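
In outline, the test does something like this (a minimal sketch, not
the actual LTP source; the child count matches the test, but the use of
SIGKILL is my assumption):

	#include <signal.h>
	#include <sys/types.h>
	#include <sys/wait.h>
	#include <unistd.h>

	#define NCHILDREN 8

	int main(void)
	{
		pid_t pid[NCHILDREN];
		int i;

		for (i = 0; i < NCHILDREN; i++) {
			pid[i] = fork();
			if (pid[i] == 0)
				/* Child: spin until a signal kills us.
				 * This is the compute-bound loop that
				 * drags the child's sleep_avg down. */
				for (;;)
					;
		}

		/* Parent: signal each child, then reap them all.  It
		 * spends essentially all of its time asleep in wait(). */
		for (i = 0; i < NCHILDREN; i++)
			kill(pid[i], SIGKILL);
		for (i = 0; i < NCHILDREN; i++)
			wait(NULL);
		return 0;
	}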

The parent process spends all of its time in wait(). When a child
exits, the parent's sleep_avg is adjusted twice.

In sched_exit():

	parent->sleep_avg = ((3*parent->sleep_avg) + child->sleep_avg)/4;

In activate_task():

	p->sleep_avg += sleep_time;
	if (p->sleep_avg > MAX_SLEEP_AVG)
		p->sleep_avg = MAX_SLEEP_AVG;

Each child's sleep_avg is initially set to 95% of the parent's.
While a child runs, its sleep_avg is decremented in scheduler_tick().
With fewer processors than child processes, each child only runs part
of the time, so its sleep_avg decreases by less than 1 per tick on
average (e.g. with 8 children sharing one CPU, each child sees roughly
1/8 of the ticks).

The effect is that the parent's sleep_avg approaches MAX_SLEEP_AVG,
giving it and its children a favorable interactive priority.
Since these processes are judged interactive, they go back into the
active array when they use up their time slice, still with a favorable
priority and a fresh time quantum.
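
To see the feedback loop in isolation, here is a toy user-space
simulation of the two updates quoted above (a sketch only: the cap, the
child's per-round decay, and the parent's per-round sleep credit are
made-up numbers, not the kernel's). The parent's sleep_avg climbs until
it pins at the cap:

	#include <stdio.h>

	#define MAX_SLEEP_AVG	1000	/* assumed cap, for illustration */
	#define EXIT_WEIGHT	3

	int main(void)
	{
		int parent = MAX_SLEEP_AVG / 2;	/* arbitrary start */
		int round;

		for (round = 1; round <= 50; round++) {
			/* fork: child starts at 95% of the parent */
			int child = parent * 95 / 100;

			/* scheduler_tick(): with 8 runnable children
			 * per CPU the child loses only a little
			 * sleep_avg before it exits; the loss of 2 per
			 * round is an assumed figure. */
			child -= 2;

			/* sched_exit(): 3:1 weighted average back
			 * into the parent */
			parent = (EXIT_WEIGHT * parent + child) /
				 (EXIT_WEIGHT + 1);

			/* activate_task(): the parent slept in wait()
			 * meanwhile, so it is credited sleep time;
			 * 25 ticks per round is assumed. */
			parent += 25;
			if (parent > MAX_SLEEP_AVG)
				parent = MAX_SLEEP_AVG;

			if (round % 10 == 0)
				printf("round %2d: parent sleep_avg = %d\n",
				       round, parent);
		}
		return 0;
	}

The weighted average pulls the parent down only slightly, while the
sleep credit tops it back up every round, so the net drift is upward.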

The problem is easy to reproduce with the waitpid06 test. It provides
options to repeat the test in a loop and to run multiple copies at
once. I have been using:

waitpid06 -c 8 -i 10000

This runs 8 copies of the test (64 unruly child processes) and loops
10,000 times. Alongside it I run top(1) and a shell loop:

	while true ; do date; sleep 1; done

so I can tell when the system has locked up. The lock-up sometimes
takes a few minutes to appear.

How do we fix this?

I'm just getting started playing with the code. When I tried changing
EXIT_WEIGHT to 1, the problem still happened. I then tried changing
sched_exit() to give the parent the minimum of the two sleep_avg
values, which seems to fix the problem. I suspect this is really a
symptom of a larger problem: the system can be over-committed with
processes which are all judged interactive, so nothing ever lands in
the expired array and the EXPIRED_STARVING case never triggers.
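
Concretely, the change I tried looks roughly like this (a sketch of
the idea only; locking and the rest of sched_exit() are omitted):

	/* sketch: take the minimum instead of the weighted average */
	void sched_exit(task_t *p)
	{
		if (p->sleep_avg < p->parent->sleep_avg)
			p->parent->sleep_avg = p->sleep_avg;
	}

With this, a CPU-hog child can only drag the parent's sleep_avg down,
never prop it up, so the parent stops inheriting interactive credit it
never earned.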

Please CC: me on answers/comments since I only read the archives.

Jim Houston - Concurrent Computer Corp.