Re: [patch] Simple Topology API

Eric W. Biederman (ebiederm@xmission.com)
14 Jul 2002 20:34:51 -0600


Andi Kleen <ak@suse.de> writes:
>
> At least on Hammer the latency difference is small enough that
> caring about the overall bandwidth makes more sense.

I agree. I will have to look closer, but unless there is more
juice than I have seen in HyperTransport, it is going to become
one of the architectural bottlenecks of the Hammer.

Currently you get 1600MB/s in a single direction. Not too bad.
But when the memory controllers get out to dual channel DDR-II 400,
the local bandwidth to that memory is 6400MB/s, and the bandwidth to
remote memory is 1600MB/s, or 3200MB/s counting both directions (if
reads are as common as writes).

So I suspect bandwidth intensive applications will really benefit
from local memory optimization on the Hammer. I can buy that the
latency is negligible, but the fact that the links don't appear to
scale in bandwidth as well as the connection to memory may be a
bigger issue.
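
To make the arithmetic above concrete, here is a rough sketch. It
assumes 64-bit (8 byte) wide memory channels and 400 million
transfers per second for DDR-II 400, and it takes the 1600MB/s per
direction link figure from above. The function names are made up for
illustration; nothing here is kernel code.

```c
#include <stdio.h>

/* Peak bandwidth of a memory interface in MB/s:
 * channels * bus width in bytes * millions of transfers per second.
 * (Hypothetical helper, for illustration only.) */
static unsigned peak_mem_bw(unsigned channels, unsigned bus_bytes,
                            unsigned mtransfers)
{
    return channels * bus_bytes * mtransfers;
}

/* Link bandwidth in MB/s: one direction, or both directions when
 * reads and writes are equally common. */
static unsigned link_bw(unsigned per_direction, int both_directions)
{
    return both_directions ? 2 * per_direction : per_direction;
}

/* Print how many times faster local memory is than the link. */
static void print_ratio(unsigned local, unsigned remote)
{
    printf("local %uMB/s, remote %uMB/s, ratio %u:1\n",
           local, remote, local / remote);
}
```

Calling print_ratio(peak_mem_bw(2, 8, 400), link_bw(1600, 0)) gives a
4:1 gap, and with link_bw(1600, 1) it is still 2:1 in favor of local
memory.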

> > And then you associate that zone-list with the process, and use that
> > zone-list for all process allocations.
>
> That's the basic idea sure for normal allocations from applications
> that do not care much about NUMA.
>
> But "numa aware" applications want to do other things like:
> - put some memory area into every node (e.g. for the numa equivalent of
> per CPU data in the kernel)
> - "stripe" a shared memory segment over all available memory subsystems
> (e.g. to use memory bandwidth fully if you know your interconnect can
> take it; that's e.g. the case on the Hammer)

The latter I really quite believe. Even dual channel PC2100 can
exceed your interprocessor bandwidth.

And yes, I have measured 2000MB/s memory copy with an Athlon MP and
PC2100 memory.
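
For what it is worth, the "stripe" idea quoted above could be
sketched roughly like this: pages of a shared segment are handed out
to nodes round-robin, so every memory controller carries a share of
the traffic. This is only an illustration; stripe_node() and
stripe_count() are hypothetical names, not an existing kernel or
library interface.

```c
#include <stddef.h>

/* Hypothetical helper: pick the node that backs a given page of a
 * striped segment. Pages go to nodes round-robin.
 * nr_nodes must be greater than zero. */
static unsigned stripe_node(size_t page_index, unsigned nr_nodes)
{
    return (unsigned)(page_index % nr_nodes);
}

/* Count how many pages of an nr_pages segment land on each node.
 * per_node must have room for nr_nodes entries. */
static void stripe_count(size_t nr_pages, unsigned nr_nodes,
                         size_t *per_node)
{
    size_t i;

    for (i = 0; i < nr_nodes; i++)
        per_node[i] = 0;
    for (i = 0; i < nr_pages; i++)
        per_node[stripe_node(i, nr_nodes)]++;
}
```

With 4 nodes, a 10 page segment ends up as 3, 3, 2 and 2 pages per
node, so no single memory controller has to supply the whole
segment.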

Eric