Re: [BENCHMARK] Lmbench 2.5.54-mm2 (impressive improvements)

Andrew Morton (akpm@digeo.com)
Fri, 03 Jan 2003 13:32:27 -0800

Next message: Scott Robert Ladd: "RE: Why is Nvidia given GPL'd code to use in closed source drivers?"
Previous message: Larry McVoy: "Re: Nvidia and its choice to read the GPL "differently""
Maybe in reply to: Aniruddha M Marathe: "[BENCHMARK] Lmbench 2.5.54-mm2 (impressive improvements)"
Next in thread: Andrew Morton: "Re: [BENCHMARK] Lmbench 2.5.54-mm2 (impressive improvements)"

Andi Kleen wrote:
>
> Andrew Morton <akpm@digeo.com> writes:
> >
> > The teeny little microbenchmarks are telling us that the rmap overhead
> > hurts, that the uninlining of copy_*_user may have been a bad idea, that
> > the addition of AIO has cost a little and that the complexity which
> > yielded large improvements in readv(), writev() and SMP throughput was
> > not free. All of this is already known.
>
> If you mean the signal speed regressions they caused - I fixed
> that on x86-64 by inlining 1,2,4,8,10(used by signal fpu frame),16.
> But it should not use the stupid rep ; ... of the old version, but direct
> unrolled moves.

Yes, that would help a bit. We should do that for ia32. It's a little
worrisome that the return value from such a copy_*_user() implementation
will be incorrect - it is supposed to return the number of uncopied bytes.
Probably doesn't matter.
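
To make the return-value concern concrete, here is a minimal user-space
sketch, not kernel code. It models a destination of which only the first
`valid` bytes are writable and compares a generic copy, which reports the
exact number of uncopied bytes, with a size-specialised copy built from
fixed-width moves, which can only report "all of it" when any move faults.
The names user_range, model_store, model_copy_to_user and
model_copy_to_user_fixed are hypothetical, and the 8-byte case is written
as two 32-bit moves on the assumption of an ia32 port.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a user mapping: only the first `valid` bytes at
 * `base` are writable; touching anything past that counts as a fault. */
struct user_range {
    unsigned char *base;
    size_t valid;
};

/* Generic path: copy as much as fits and return the exact number of
 * bytes that could not be copied. */
static size_t model_copy_to_user(struct user_range *u, size_t off,
                                 const void *src, size_t n)
{
    size_t room = off < u->valid ? u->valid - off : 0;
    size_t done = n < room ? n : room;

    memcpy(u->base + off, src, done);
    return n - done;
}

/* One fixed-width move. Returns 0 on success, 1 on a modelled fault. */
static int model_store(struct user_range *u, size_t off,
                       const unsigned char *s, size_t width)
{
    if (off > u->valid || width > u->valid - off)
        return 1;
    memcpy(u->base + off, s, width);
    return 0;
}

/* Size-specialised path: unrolled fixed-width moves for the common
 * sizes. If any move faults it reports the whole length as uncopied,
 * even when earlier moves already landed, so the count is inexact. */
static inline size_t model_copy_to_user_fixed(struct user_range *u,
                                              size_t off, const void *src,
                                              size_t n)
{
    const unsigned char *s = src;

    switch (n) {
    case 1:
    case 2:
    case 4:
        return model_store(u, off, s, n) ? n : 0;
    case 8: /* two 32-bit moves stand in for one 64-bit move */
        return (model_store(u, off, s, 4) ||
                model_store(u, off + 4, s + 4, 4)) ? 8 : 0;
    case 10: /* the size quoted for the signal fpu frame */
        return (model_store(u, off, s, 4) ||
                model_store(u, off + 4, s + 4, 4) ||
                model_store(u, off + 8, s + 8, 2)) ? 10 : 0;
    case 16:
        return (model_store(u, off, s, 4) ||
                model_store(u, off + 4, s + 4, 4) ||
                model_store(u, off + 8, s + 8, 4) ||
                model_store(u, off + 12, s + 12, 4)) ? 16 : 0;
    default: /* anything else takes the generic path */
        return model_copy_to_user(u, off, src, n);
    }
}
```

With a 12-byte writable range and a 16-byte copy, the generic path in this
model reports 4 uncopied bytes while the fixed-size path reports 16. A
caller that only tests the result against zero sees no difference between
the two.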

Most of the optimisation opportunities wrt signal delivery were soaked up
by replacing the copy_*_user() calls with put_user() and friends.

We could speed up signals heaps by re-lazying the fpu state storage in
some manner.

> x86-64 version in include/asm-x86_64/uaccess.h, could be ported
> to i386 given that movqs need to be replaced by two movls.
>
> -Andi
>
> P.S.: regarding recent lmbench slow downs: I'm a bit
> worried about the two wrmsrs which are in the i386 context switch
> in load_esp0 for sysenter now. Last time I benchmarked WRMSRs on
> Athlon they were really slow and knowing the P4 it is probably
> even slower there. Imho it would be better to undo that patch
> and use Linus' original trampoline stack.

hm. How slow? Any numbers on that?
