It helps by keeping the task on the same node when it can no longer keep
it on the same cpu.
Assume task A is sleeping and it last ran on cpu 8, node 2. It gets a wakeup
and becomes runnable, but for some reason cpu 8 is busy while other cpus in
the system are idle. With the current scheduler it can be moved to any cpu
in the system; with the numa sched applied we will first try to reschedule
it on the idle cpus of node 2, for example. The per-node runqueues are
mainly necessary to implement this heuristic.
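
To make the wakeup heuristic above concrete, here is a rough standalone
sketch in C. It is not the code from the patch: the names NR_CPUS,
CPUS_PER_NODE, cpu_to_node() and pick_wakeup_cpu() are made up for the
illustration, the topology (4 cpus per node, so cpu 8 sits on node 2) is
assumed, and the fallback to any idle cpu in the system is my reading of
"first try" above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS       16   /* hypothetical machine size */
#define CPUS_PER_NODE 4    /* hypothetical: cpu 8 then lives on node 2 */

/* Hypothetical helper: map a cpu number to its node. */
static int cpu_to_node(int cpu)
{
    return cpu / CPUS_PER_NODE;
}

/*
 * Pick a cpu for a task that just woke up.
 * idle[cpu] is true when that cpu has nothing to run.
 * Returns -1 if no cpu in the system is idle.
 */
static int pick_wakeup_cpu(const bool idle[NR_CPUS], int last_cpu)
{
    assert(last_cpu >= 0 && last_cpu < NR_CPUS);

    /* 1. Best case: the cpu the task last ran on is idle. */
    if (idle[last_cpu])
        return last_cpu;

    /* 2. Otherwise prefer an idle cpu on the same node. */
    int node = cpu_to_node(last_cpu);
    for (int cpu = node * CPUS_PER_NODE; cpu < (node + 1) * CPUS_PER_NODE; cpu++)
        if (idle[cpu])
            return cpu;

    /* 3. Assumed fallback: any idle cpu in the system. */
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (idle[cpu])
            return cpu;

    return -1;
}
```

With cpu 8 busy and cpus 3 and 10 idle, this picks cpu 10 (same node as
cpu 8) rather than cpu 3.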
> cpus on nodes where they have allocated most of their memory on. I am
> not sure what the situation will be under huge loads though.
After all cpus are busy we try to reschedule only on the cpus of the local
node. That's why it can generate some imbalance, yes, but it will tend to
rebalance over time: a less loaded node will be the first to end up with
all its tasks at zero counter, and it will then start pulling tasks with
has_cpu 0 out of the runqueues of the other nodes.
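
The rebalancing rule can be sketched roughly like this, again as a
standalone illustration and not the patch itself. struct task here is a
made-up three-field stand-in (only the field names counter and has_cpu
come from the scheduler), node_exhausted() and pick_remote_task() are
hypothetical names, and requiring the pulled task to still have a nonzero
counter is my assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a runnable task, for illustration only. */
struct task {
    int  counter;  /* remaining timeslice */
    bool has_cpu;  /* true while the task is running on some cpu */
    int  node;     /* node whose runqueue holds the task */
};

/* True when every task queued on `node` has a zero counter
 * (also true for an empty node). */
static bool node_exhausted(const struct task *tasks, size_t n, int node)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].node == node && tasks[i].counter > 0)
            return false;
    return true;
}

/*
 * If `node` has run out of timeslices, return the index of a task on
 * another node that is not currently running and can be pulled over.
 * Returns -1 if the node still has local work or nothing can be pulled.
 */
static int pick_remote_task(const struct task *tasks, size_t n, int node)
{
    assert(tasks != NULL || n == 0);

    if (!node_exhausted(tasks, n, node))
        return -1;

    for (size_t i = 0; i < n; i++)
        if (tasks[i].node != node && !tasks[i].has_cpu &&
            tasks[i].counter > 0)  /* assumption: skip expired tasks */
            return (int)i;

    return -1;
}
```

A lightly loaded node reaches the exhausted state sooner, so it is the one
that starts pulling, which is what evens the load out over time.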
You may want to give it a try on your machines and see what difference it
makes, I'd be curious to know of course.