RE: [Lse-tech] [RFC] per cpu slab fix to reduce freemiss

Luck, Tony (tony.luck@intel.com)
Thu, 1 Aug 2002 14:31:01 -0700


> Furthermore, I think this design does not take into consideration
> multiprocessor issues such as cache bouncing, cache warmth, etc.
> Also, the original implementation of the slab cache in the Linux
> kernel did not have per-cpu support (I am not sure whether the paper
> considers SMP either), so this assumption needs to be re-examined in
> the light of SMP, NUMA, etc. I would like to explore the possibility
> of changing this assumption, if possible, in view of SMP/NUMA cache
> effects.
>
> In the present design there is a limit on how many free objects are
> held in the per-cpu array, so when an object is freed it may more
> often end up on another cpu. The main cost lies in memory latency
> rather than in executing the field initialization. I doubt we would
> see the same gain as described in the paper from preserving the
> fields between uses on SMP/NUMA machines.

Bonwick has a newer paper
(http://www.usenix.org/events/usenix01/bonwick.html) that describes how
per-cpu support can be added. I've forgotten my Usenix password, so I
can't get the full text of the paper online at the moment, but if I
recall correctly his magazine layer included support to dynamically
adjust the size of the per-cpu lists.
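
For reference, here is a minimal sketch of how such a magazine layer
fits together, reconstructed from memory of the paper rather than from
any real code. The struct names, MAG_SIZE, and the fixed compile-time
capacity are all illustrative assumptions; in Bonwick's design the
magazine size is per-cache and grows at run time when contention on the
depot lock is detected.

#include <stddef.h>

#define MAG_SIZE 16	/* illustrative; per-cache and resizable in the paper */

struct magazine {
	struct magazine *next;		/* depot free-list linkage */
	int rounds;			/* objects currently held */
	void *objs[MAG_SIZE];
};

struct cpu_cache {
	struct magazine *loaded;	/* magazine we allocate from / free to */
	struct magazine *previous;	/* spare, so a swap is always cheap */
};

struct depot {				/* shared, lock-protected */
	struct magazine *full;		/* magazines with rounds == MAG_SIZE */
	struct magazine *empty;		/* magazines with rounds == 0 */
};

/* Allocation fast path: pop from this cpu's loaded magazine. */
static void *mag_alloc(struct cpu_cache *cc)
{
	struct magazine *m = cc->loaded;

	if (m && m->rounds > 0)
		return m->objs[--m->rounds];	/* lock-free, cpu-local */

	if (cc->previous && cc->previous->rounds > 0) {
		/* swap loaded <-> previous and retry the fast path */
		cc->loaded = cc->previous;
		cc->previous = m;
		return cc->loaded->objs[--cc->loaded->rounds];
	}

	/*
	 * Slow path: trade an empty magazine for a full one from the
	 * depot (under the depot lock), else fall back to the slab
	 * layer proper.  Omitted here.
	 */
	return NULL;
}

The point of the two-magazine arrangement is that both the alloc and
free fast paths stay entirely cpu-local; the depot lock is only taken
when a whole magazine is exchanged.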

The question becomes: are the performance benefits high enough to
justify the extra code complexity, especially as tuning via
/proc/slabinfo is already available to mitigate any problems bad
enough for people to notice?
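
To make that comparison concrete, the existing scheme is roughly the
following. This is a sketch loosely modelled on the 2.4 mm/slab.c
per-cpu arrays, not a copy of them; the struct layout and the
flush_batch callback are my assumptions. The limit and batchcount
fields are the two values a write to /proc/slabinfo tunes.

#include <string.h>

struct cpucache {
	unsigned int avail;		/* objects currently cached locally */
	unsigned int limit;		/* tunable: max objects held per cpu */
	unsigned int batchcount;	/* tunable: flush/refill granularity */
	void *entry[];			/* the cached object pointers */
};

/* Free fast path: keep the object locally unless the array is full. */
static void cc_free(struct cpucache *cc, void *obj,
		    void (*flush_batch)(void **objs, unsigned int n))
{
	if (cc->avail == cc->limit) {
		/* push the oldest batch back to the shared slab lists
		 * (this is the path that takes the list lock) */
		flush_batch(cc->entry, cc->batchcount);
		memmove(cc->entry, cc->entry + cc->batchcount,
			(cc->avail - cc->batchcount) * sizeof(void *));
		cc->avail -= cc->batchcount;
	}
	cc->entry[cc->avail++] = obj;
}

Raising limit for a cache that shows a bad free-miss rate is the kind
of mitigation I have in mind.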

Can you quantify the SMP/NUMA benefits? Measurements I took a while
ago showed that a huge percentage of slab allocations are freed by the
same cpu after a very short lifetime. I didn't look at how often the
problems you cite actually occur.

> I agree that preserving read-only fields that can be reused between
> uses will help performance. We can still do that by revising the
> assumption so that the first 4 (or however many) bytes needed to
> store the links are left free. What do you think?

You'd need enough bytes to store a pointer (so "whatever" == 8 on
64-bit architectures). Users who care enough to arrange the fields of
their structures in "used together" order for better cache locality
tend to put their effort into the first elements of a structure, so
you might get less resistance if you used the tail end of the object
instead; a sketch of what I mean follows. But this is a potentially
big change: drivers can create their own slab caches, and if you
change the semantics you may well break something.
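
Something along these lines, purely as a hypothetical sketch: the
obj_cache struct, its objsize field, and the function names are all
made up for illustration, not the kernel's actual slab interface, and
it assumes objsize is pointer-aligned.

#include <stddef.h>

struct obj_cache {
	size_t objsize;		/* full object size, >= sizeof(void *) */
	void *freelist;		/* head of the free list */
};

/* The link lives in the last sizeof(void *) bytes of the object. */
static void **tail_link(struct obj_cache *c, void *obj)
{
	return (void **)((char *)obj + c->objsize - sizeof(void *));
}

static void obj_free(struct obj_cache *c, void *obj)
{
	*tail_link(c, obj) = c->freelist; /* only tail bytes are clobbered */
	c->freelist = obj;
}

static void *obj_alloc(struct obj_cache *c)
{
	void *obj = c->freelist;

	if (obj)
		c->freelist = *tail_link(c, obj);
	return obj;			/* head fields still intact */
}

Any constructor-initialized state at the front of the object then
survives a free/alloc cycle untouched, which is the property the
Bonwick paper relies on.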

-Tony