Sharing a runqueue for all processors on a node of a NUMA system has the
drawback of not accounting for cache warmth for processes. Ideally, for
a NUMA system there should continue to be individual runqueues per cpu
(or per set of HT processors), and then a grouping of runqueues at the
node level. When load balancing, priority should be given to
redispatching on the same processor, then to another processor on the
same node. The pain threshold for crossing a node boundary will vary
with the NUMA-ness of the hardware, so it would be good for the
scheduler to account for this.
Erich Focht has a large patch to the O(1) scheduler that implements
this type of scheduling hierarchy. Have you had an opportunity to look
it over? What do you think about getting portions of this into the
O(1) scheduler?
>
> Testreports, comments, suggestions welcome,
>
> Ingo
>
Michael Hohnbaum
hohnbaum@us.ibm.com