Re: [PATCH] set_cpus_allowed() for 2.4

Andrew Morton (akpm@digeo.com)
Mon, 02 Dec 2002 11:30:05 -0800


"Martin J. Bligh" wrote:
>
> > now that all commercial vendors ship a backport of Ingo's O(1)
> > scheduler external projects like XFS have to track those projects
> > in addition to the mainline kernel.
>
> There was talk of merging the O(1) scheduler into 2.4 at OLS.
> If every distro has it, and 2.5 has it, and it's been around for
> this long, I think that proves it stable.
>

I have observed two problems with the new scheduler, both serious IMO:

1) Changed sched_yield() semantics. The behaviour of sched_yield() has
changed dramatically, and this can seriously impact existing applications.
A testcase (this is on 2.5.46, UP, no preempt):

make -j3 bzImage
wait 30 seconds
^C
make clean (OK, it's all in cache)
start StarOffice 5.2
make -j3 bzImage
wait 5 seconds
now click on the SO5.2 `File' menu.

It takes ~15 seconds for the menu to appear, and >30 seconds for
it to go away. The application is wholly unusable for the duration
of the compilation.

This is because StarOffice is spinning on sched_yield(). Rumour has
it that this is happening inside the pthread library.

This will affect other things, both in-kernel and out. This includes
ext3, which uses yield() in its transaction batching. ext3's fsync()
operation performs dreadfully with the new yield() if there are
compute-intensive things happening at the same time. If people are
shipping that sched_yield() implementation without having changed ext3,
then they will receive bug reports against this.

Arguably, the new sched_yield() is correct and the old one wasn't,
but the effects of this change make it unsuitable for a 2.4 merge.

2) The interactivity estimator makes inappropriate decisions.

Test case:

start a kernel compile as above
grab an xterm and waggle it about a lot.

The amount of waggling needed depends on the video hardware (I think). One
of my machines (nVidia NV15) needs a huge amount of vigorous waggling.
Another machine (voodoo III) just needs a little waggle.

When you've waggled enough, the scheduler decides that the X server
is a `batch' process and schedules it alongside the background
compilation. Everything goes silly. The mouse cursor freezes
for 0.5-1.0 seconds, then takes great leaps across the screen. Unusable.
You have to stop using the machine for five seconds or so and wait for the
X server to flip back into `interactive' mode.

This also affects Netscape 4.x mailnews. Start the kernel compile,
then select a new (large) folder. Netscape will consume maybe one
second of CPU time doing the initial processing on that folder, which is
enough for the system to decide it's a "batch" process. The user
interface seizes up and you have to wait five seconds for it to be
treated as an "interactive" process again before you can do anything.

It also affects gdb. Start a kernel compile, then run `gdb vmlinux'.
The initial processing which gdb does on the executable is enough for it
to be treated as a batch process, and the subsequent interactive session
is comatose for several seconds. Same deal.

This one needs fixing in 2.5. Please. It's very irritating.