CPU scheduler question: processes created faster than

Nicolae P. Costescu (nick@strongholdtech.com)
Thu, 18 Apr 2002 16:06:36 -0400


First, thanks to all for doing a great job with the Linux kernel. We
appreciate your work!

I've been seeing this behavior on our servers (2.4.2 and 2.4.18 kernel) and
would like to ask your opinion. I have read the linux internals doc.
section on the scheduler, and have read through the sched.c source code.

I have a certain # of client processes on various machines that connect to
a linux server box. A forking server (DAServer.t) waits for client
connections, then forks a copy to handle the client request. Each forked
DAServer.t connects to three database server back ends (postgres), all of
which fork from a postmaster process.

So each client connection causes 4 forks on the server.

The forked server does some work for the client, communicating with the 3
database backends, then replies to the client, and the client disconnects.
Now the server does some cleanup and exist.

At the point where the server replied to the client, the client disconnects
and is ready to send another message to the master server, which will cause
another 4 forks, etc.

Given a fixed # of clients (say 14) I'd expect the # of processes to have
some upper bound, but what I see is that sometimes the # of sleeping
processes will grow unbounded (limited only by the 512 limit on the # of
database backends).

I've logged load average, # of processes, # of DAServer.t processes, # of
database server and plotted these at

http://www.strongholdtech.com/plots/

The # of sleeping processes will grow to say 900, and then suddenly 140 or
them will become ready to run, run for a few seconds (driving load avg to
around 100) and then disappear. Then shortly the system returns to normal.

Swap is not an issue, we have 1 gig RAM and 2 gig swap, and the swap isn't
used, we usually only have about 400 meg in use when we hit the 900 process
mark.

Is this just bad design on our part, or is there something in the CPU
scheduler that leads to this behavior - where processes are started quicker
than they die?

This problem is only exacerbated on a dual CPU box (goes unstable quicker).

Any suggestions?

We're going to try throttling the forked DAServer.t when the # of sleeping
processes is large, making it sleep so that hopefully the other processes
(some of which hang around in the sleeping state for 30 minutes or longer)
will be able to run and die.

Thanks for your help,
Nick
****************************************************
Nicolae P. Costescu, Ph.D. / Senior Developer
Stronghold Technologies
46040 Center Oak Plaza, Suite 160 / Sterling, Va 20166
Tel: 571-434-1472 / Fax: 571-434-1478

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/