Re: [Lse-tech] Re: NUMA scheduler 2nd approach

Michael Hohnbaum (michael@hbaum.com)
13 Jan 2003 21:50:46 -0800

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Stephen Rothwell: "Re: [2.5.55, APM] running "apm -s" doesn't sleep"
Previous message: Richard Stallman: "Re: Why is Nvidia given GPL'd code to use in closed source drivers?"

On Mon, 2003-01-13 at 20:45, Andrew Theurer wrote:
> > Erich,
> >
> > I played with this today on my 4 node (16 CPU) NUMAQ. Spent most
> > of the time working with the first three patches. What I found was
> > that rebalancing was happening too much between nodes. I tried a
> > few things to change this, but have not yet settled on the best
> > approach. A key item to work with is the check in find_busiest_node
> > to determine if the found node is busier enough to warrant stealing
> > from it. Currently the check is that the node has 125% of the load
> > of the current node. I think that, for my system at least, we need
> > to add in a constant to this equation. I tried using 4 and that
> > helped a little.
>
> Michael,
>
> in:
>
> +static int find_busiest_node(int this_node)
> +{
> + int i, node = this_node, load, this_load, maxload;
> +
> + this_load = maxload = atomic_read(&node_nr_running[this_node]);
> + for (i = 0; i < numnodes; i++) {
> + if (i == this_node)
> + continue;
> + load = atomic_read(&node_nr_running[i]);
> + if (load > maxload && (4*load > ((5*4*this_load)/4))) {
> + maxload = load;
> + node = i;
> + }
> + }
> + return node;
> +}
>
> You changed ((5*4*this_load)/4) to:
> (5*4*(this_load+4)/4)
> or
> (4+(5*4*(this_load)/4)) ?

I suppose I should not have been so dang lazy and cut-n-pasted
the line I changed. The change was (((5*4*this_load)/4) + 4)
which should be the same as your second choice.
>
> We def need some constant to avoid low load ping pong, right?

Yep. Without the constant, one could have 6 processes on node
A and 4 on node B, and node B would end up stealing. While making
a perfect balance, the expense of the off-node traffic does not
justify it. At least on the NUMAQ box. It might be justified
for a different NUMA architecture, which is why I propose putting
this check in a macro that can be defined in topology.h for each
architecture.
>
> Finally I added in the 04 patch, and that helped
> > a lot. Still, there is too much process movement between nodes.
>
> perhaps increase INTERNODE_LB?

That is on the list to try. Martin was mumbling something about
use the system wide load average to help make the inter-node
balance decision. I'd like to give that a try before tweaking
ITERNODE_LB.
>
> -Andrew Theurer
>
Michael Hohnbaum
hohnbaum@us.ibm.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Stephen Rothwell: "Re: [2.5.55, APM] running "apm -s" doesn't sleep"
Previous message: Richard Stallman: "Re: Why is Nvidia given GPL'd code to use in closed source drivers?"