Re: [Lse-tech] [PATCH 1/2] node affine NUMA scheduler

Martin J. Bligh (mbligh@aracnet.com)
Tue, 24 Sep 2002 14:17:38 -0700


>> > 2: I have no idea how tasks sharing the mm structure will behave. I'd
>> > like them to run on different nodes (that's why node_mem is not in mm),
>> > but they could (legally) free pages which they did not allocate, and
>> > so end up with wrong values in node_mem[].
>>
>> Yes, that really ought to be per-process, not per task. Which means
>> locking or atomics ... and overhead. Ick.
>
> Hmm, I think it is sometimes ok to have it per task. For example, OpenMP
> parallel jobs working on huge arrays: the "first-touch" of these arrays
> generates pagefaults in the different tasks, and thus a different
> node_mem[] array for each task. As long as they just allocate memory,
> all is well; if they only release it at the end of the job, all is
> still well. It probably goes wrong if we have a long-running task that
> spawns short-lived clones: they inherit the node_mem from the parent,
> but pages they add to the common mm are not reflected in the parent's
> node_mem after they exit.

But you're still left with the choice of whether to base decisions on
the per-task or the per-process information.

1. per-process requires collating counters from across the nodes on
every read. Bad.

2. per-task leads to obviously bad decisions when there are significant
amounts of shared data between the tasks of a process (which was often
the point of making them threads in the first place).

Yes, I can imagine a situation for which it would work, as you describe
above ... but that's not the point: this is a general policy, and I
don't think it works in general ... as you say yourself, "it is
*sometimes* ok" ;-)
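
To make that trade-off concrete, here's a rough sketch of the two
placements being argued about (illustrative only, not from your patch;
MAX_NUMNODES stands in for the per-arch node count, and the atomic_t in
(1) is my assumption):

    /* (1) per-process: one shared view of the mm's pages, but every
     * fault/free pays an atomic update on a counter that bounces
     * between nodes, and a read still has to sum hot cache lines. */
    struct mm_struct {
            /* ... existing fields ... */
            atomic_t node_mem[MAX_NUMNODES];
    };

    /* (2) per-task: lock-free increments from the fault path, but each
     * thread counts only its own faults, so shared pages are credited
     * to whichever thread touched them first, and the numbers go stale
     * when that thread exits. */
    struct task_struct {
            /* ... existing fields ... */
            unsigned long node_mem[MAX_NUMNODES];
    };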

> The first patch needs a correction: in load_balance(), add
>
>     if (!busiest)
>             goto out;
>
> after the call to find_busiest_queue(). With that fix, the first patch
> works on its own. On top of this pooling NUMA scheduler we can put
> whichever node affinity approach fits best, with or without memory
> allocation. I'll update the patches and their setup code (thanks for
> the comments!) and resend them.
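
So the check goes roughly here, if I read you right (sketching from the
2.5 O(1) scheduler's load_balance(); the body is elided and the exact
signature may differ in your tree):

    static void load_balance(runqueue_t *this_rq, int idle)
    {
            int this_cpu = smp_processor_id();
            int imbalance;
            runqueue_t *busiest;

            busiest = find_busiest_queue(this_rq, this_cpu, idle, &imbalance);
            if (!busiest)           /* nothing busy enough to steal from, */
                    goto out;       /* so don't dereference a NULL runqueue */

            /* ... lock busiest and pull tasks over to this_rq ... */
    out:
            ;
    }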

Excellent! Will try out the correction above and get back to you.
Might be a day or so; I'm embroiled in some other code at the moment.

M.
