Re: Intel P6 vs P7 system call performance

Jamie Lokier (lk@tantalophile.demon.co.uk)
Sat, 21 Dec 2002 17:18:08 +0000


Linus Torvalds wrote:
> Yes, you can make the "clobbers %eax/%edx/%ecx" argument, but the fact is,
> we quite fundamentally need to save %edx/%ecx _anyway_.

On the kernel side, yes. In the userspace trampoline, it's not required.

> but once we start using sysexit for the signal handler return path too, we
> will need to restore %edx and %ecx too, otherwise our restarted system
> call will have crap in the registers. I already wrote the code, but
> decided that as long as we don't do that kind of restarting, we shouldn't
> have the overhead in the trampoline. But basically the trampoline then
> will become
>
> system_call_trampoline:
> pushfl
> pushl %ecx
> pushl %edx
> pushl %ebp
> movl %esp,%ebp
> syscall
> 0:
> movl %esp,%ebp
> movl 4(%ebp),%edx
> movl 8(%ebp),%ecx
> syscall
>
> restart:
> jmp 0b
> sysenter_return_point:
> popl %ebp
> popl %edx
> popl %ecx
> popfl
> ret

> see? So you _have_ to really save the arguments anyway, because you cannot
> do a sysexit-based system call restart otherwise. And once you save them,
> you might as well restore them too.
>
> And since you have to restore them for system call restart anyway, you
> might as well just make it part of the calling convention.
>
> Yes, I'm thinking ahead. Sue me.

You're optimising the _rare_ case.

The correct [;)] trampoline looks like this:

system_call_trampoline:
pushl %ebp
movl %esp,%ebp
sysenter
sysenter_return_point:
popl %ebp
ret
sysenter_restart:
popl %edx
popl %ecx
movl %esp,%ebp
sysenter

This is accompanied by changing this line in arch/i386/kernel/signal.c:

regs->eip -= 2;

To this (best moved to an inline function):

if (likely (regs->eip == sysenter_return_point)) {
unsigned long * esp = (unsigned long *) regs->esp - 8;
if (!access_ok(VERIFY_WRITE, esp, 8)
|| __put_user(regs->edx, esp)
|| __put_user(regs->ecx, esp+4)) {
if (sig == SIGSEGV)
ka->sa.sa_handler = SIG_DFL;
force_sig(SIGSEGV, current);
}
regs->esp = (long) esp;
regs->eip = sysenter_restart;
} else {
regs->eip -= 2;
}

Thus the common case, system calls, are optimised. The uncommon case,
signal interrupts system call, is penalised, though it's not a large
penalty. (Much less than the gain from using sysexit!) Which is more
common?

By the way, this works with the AMD syscall instruction too.
Then the trampoline is:

system_call_trampoline:
pushl %ebp
movl %ecx,%ebp
syscall
syscall_return_point:
popl %ebp
ret
syscall_restart:
popl %edx
popl %ebp
syscall

Finally, there is no need to restore %ebp in the kernel sysexit/sysret
paths, because it will always be restored in the trampoline. So you
save 1 memory access there too.

(ps. I have a question: why does your trampoline save & restore
the flags?)

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/