Figure 1000 runnable processes, in sets of two or even sets of ten.
The subsets should be scheduled together if possible, but still
smeared across all processors. No one is suggesting that a UP machine
will always outperform an SMP machine. ;-)
As was mentioned in another aspect of this thread, this is different
from explicit user-specified CPU affinity in that there are more processes
than a user wants to allocate explicitly to CPUs. Instead, the scheduler
can do load balancing by moving/migrating sets of processes from an
overloaded CPU to a less loaded CPU. However, the cache effects can
make a difference of something like 20-50% in overall throughput for
a fairly intensive data-sharing workload like this.
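For contrast, here is roughly what the explicit, user-specified affinity
being distinguished from looks like -- a minimal sketch using the Linux
sched_setaffinity() call, with placeholder pid/cpu arguments.  The point
is that this per-pid pinning is exactly what doesn't scale to 1000
runnable processes, which is why the scheduler needs to balance the sets
itself:

	/* Toy example of explicit CPU affinity: pin one known pid to one
	 * CPU.  The scheduler then never migrates it for load balancing.
	 * pid and cpu below are placeholders, not values from the thread. */
	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/types.h>

	int main(int argc, char **argv)
	{
		pid_t pid = (argc > 1) ? atoi(argv[1]) : 0;  /* 0 == self */
		int cpu   = (argc > 2) ? atoi(argv[2]) : 0;  /* target CPU */
		cpu_set_t mask;

		CPU_ZERO(&mask);
		CPU_SET(cpu, &mask);

		if (sched_setaffinity(pid, sizeof(mask), &mask) < 0) {
			perror("sched_setaffinity");
			return 1;
		}
		return 0;
	}
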
> > As I recall it made a significant difference in Oracle performance, and
> > would probably also translate to similar performance in many situations
> > where you had a client and server process doing lots of interaction in
> > an SMP environment.
>
> I've certainly seen a "significant difference" between uni and SMP, but it
> was always in the other direction. Is this particular to some hardware, or
> running multiple servers somehow? I'm only familiar with Linux, AIX and
> Solaris, maybe this is Sequent magic? Or were you talking about having
> only one client total on the machine and just making that run fast?
This is an SMP thing, which also benefits NUMA pretty dramatically.
And this is about how processes are scheduled, and how hints can
be provided to the scheduler. It also relates to the overhead of
cache invalidation, the size of CPU caches, etc. Sequent's hardware
might have seen a bigger improvement from this type of change than
other types of hardware might. Or vice versa.
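To make the cache-invalidation cost a bit more concrete, here is a toy
user-space sketch (my own illustration, not anything from Sequent or the
kernel, and it assumes 64-byte cache lines): two threads hammering
counters that live in the same cache line force that line to bounce
between CPUs, while padding the counters apart avoids the bouncing.
That bouncing is the kind of overhead that shrinks when processes
sharing data end up scheduled on the same CPU.

	/* Toy demo of cache-line bouncing ("false sharing"), assuming
	 * 64-byte cache lines.  Build with: gcc -O2 -pthread demo.c */
	#define _GNU_SOURCE
	#include <pthread.h>
	#include <stdio.h>
	#include <time.h>

	#define ITERS 100000000UL

	static struct { volatile unsigned long a, b; } same_line;
	static struct { volatile unsigned long a; char pad[64];
	                volatile unsigned long b; } padded;

	static void *bump(void *p)
	{
		volatile unsigned long *c = p;
		for (unsigned long i = 0; i < ITERS; i++)
			(*c)++;
		return NULL;
	}

	static double run(volatile unsigned long *x, volatile unsigned long *y)
	{
		pthread_t t1, t2;
		struct timespec t0, t;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		pthread_create(&t1, NULL, bump, (void *)x);
		pthread_create(&t2, NULL, bump, (void *)y);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		clock_gettime(CLOCK_MONOTONIC, &t);
		return (t.tv_sec - t0.tv_sec) + (t.tv_nsec - t0.tv_nsec) / 1e9;
	}

	int main(void)
	{
		printf("same cache line: %.2fs\n", run(&same_line.a, &same_line.b));
		printf("padded apart:    %.2fs\n", run(&padded.a, &padded.b));
		return 0;
	}

On a typical SMP box the "same cache line" case runs noticeably slower,
which is a small-scale analogue of the 20-50% throughput effect
mentioned above.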
gerrit