Re: [PATCH 4/3] Replace dynamic percpu implementation

Ravikiran G Thirumalai (kiran@in.ibm.com)
Thu, 22 May 2003 16:19:44 +0530


On Thu, May 22, 2003 at 06:36:31PM +1000, Rusty Russell wrote:
> In message <20030522081423.GC27614@in.ibm.com> you write:
> > 4. Extra dereferences in alloc_percpu were not significant, but alloc_percpu
> > was interlaced and kmalloc_percpu_new wasn't. Insn profile seemed to
> > indicate extra cost in memory dereferencing of alloc_percpu was
> > offset by the interlacing/objects sharing the same cacheline part.
> > but then insn profiles are only indicative...not accurate.
>
> Interesting: personally I consider the cacheline sharing a feature,
> and unless you've done something special, the static declaration
> should be interlaced too, no?

Yes, the static declaration was interlaced too. What I meant to say is
that the cacheline sharing helped alloc_percpu and static percpu data
compensate for the small extra memory reference needed to fetch
__percpu_offset[], when compared with kmalloc_percpu_new.
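
To make the tradeoff concrete, here is a rough userspace sketch of an
offset-based per-cpu lookup with interlaced (packed) objects. It is not
taken from either patch: percpu_init, offset_alloc, offset_ptr, AREA_SIZE
and the bump allocator are hypothetical names invented for illustration,
and only the percpu_offset[] table corresponds to something named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS   4    /* assumption: small fixed CPU count for the sketch */
#define AREA_SIZE 256  /* assumption: bytes in each CPU's private area */

static char  *percpu_base;             /* start of CPU 0's area */
static size_t percpu_offset[NR_CPUS];  /* distance from CPU 0's area to each CPU's area */
static size_t percpu_used;             /* bump-allocator cursor inside one area */

/* Allocate one zeroed area per CPU and fill in the offset table. */
int percpu_init(void)
{
    percpu_base = calloc(NR_CPUS, AREA_SIZE);
    if (!percpu_base)
        return -1;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        percpu_offset[cpu] = (size_t)cpu * AREA_SIZE;
    percpu_used = 0;
    return 0;
}

/*
 * Hand out space for one object and return the address of CPU 0's copy.
 * Objects are packed back to back, so small objects belonging to the
 * same CPU end up sharing a cacheline (the "interlacing").
 */
void *offset_alloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    size_t start = (percpu_used + align - 1) & ~(align - 1);
    if (start + size > AREA_SIZE)
        return NULL;
    percpu_used = start + size;
    return percpu_base + start;
}

/* Find a given CPU's copy: one extra load from percpu_offset[], then an add. */
void *offset_ptr(void *obj, int cpu)
{
    return (char *)obj + percpu_offset[cpu];
}
```

The load of percpu_offset[cpu] in offset_ptr is the extra memory
reference; the packing in offset_alloc is what lets two small counters
for the same CPU sit in one cacheline.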

>...
> Aside: if kmalloc_percpu uses the per-cpu offset too, it probably
> makes sense to make the per-cpu offset to a first class citizen, and
> smp_processor_id to be derived, rather than the other way around as at
> the moment. This would offer further speedup by removing a level of
> indirection.
>
> If you're interested I can probably produce such a patch for x86...

Sure, it might help per-cpu data, but will it cause a performance
regression elsewhere (for other users of smp_processor_id)? I can run it
through the same tests and find out. Maybe it'll make good paper material
for later? ;)
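
For reference, this is how I read the proposal, as a userspace sketch
only. None of it comes from an actual patch: struct cpu_area, the two
ctx structs and all function names are made up for illustration. In the
"id first" variant the context holds the CPU number and the offset is
looked up in a table; in the "offset first" variant the context holds
the offset and the CPU number is read back out of the per-cpu area.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4  /* assumption: small fixed CPU count for the sketch */

/* Hypothetical per-cpu area: each CPU's copy records its own number. */
struct cpu_area {
    int  cpu_number;
    long counter;
};

static struct cpu_area areas[NR_CPUS];
static size_t per_cpu_offset[NR_CPUS];

/* What the current context carries in each scheme (hypothetical). */
struct ctx_id_first     { int cpu; };        /* CPU number is primary */
struct ctx_offset_first { size_t offset; };  /* per-cpu offset is primary */

void setup_areas(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        areas[cpu].cpu_number = cpu;
        per_cpu_offset[cpu] = (size_t)cpu * sizeof(struct cpu_area);
    }
}

/* id first: per-cpu access needs a table load before the add. */
long *counter_id_first(const struct ctx_id_first *c)
{
    assert(c->cpu >= 0 && c->cpu < NR_CPUS);
    size_t off = per_cpu_offset[c->cpu];
    return &((struct cpu_area *)((char *)areas + off))->counter;
}

/* offset first: per-cpu access is just the add, no table load. */
long *counter_offset_first(const struct ctx_offset_first *c)
{
    return &((struct cpu_area *)((char *)areas + c->offset))->counter;
}

/* offset first: the CPU number is now derived with a memory load. */
int cpu_id_offset_first(const struct ctx_offset_first *c)
{
    return ((struct cpu_area *)((char *)areas + c->offset))->cpu_number;
}
```

The per-cpu access loses a level of indirection, while
cpu_id_offset_first gains a load, which is the cost to other
smp_processor_id users that I would want to measure.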

Thanks,
Kiran