Re: [PATCH] linux-2.5.43_vsyscall_A0

Elladan (elladan@eskimo.com)
Tue, 22 Oct 2002 22:12:08 -0700


On Tue, Oct 22, 2002 at 09:40:06AM +0200, Andrea Arcangeli wrote:
> On Tue, Oct 22, 2002 at 12:24:38AM -0700, Elladan wrote:
> >
> > This seems somewhat painful all-around, since if I'm reading this right,
> > you take a switch_to hit to find out whether the user redirected the
> > vsyscall, and a vsyscall branch hit as well.
>
> there's no vsyscall branch hit, and no switch_to hit, just a single
> unlikely branch in switch_to, that's minor overhead, for instance the
> segmentation checks (also unlikely) are more expensive.

It's still more expensive than nothing, so it's still penalizing the
context switch for an obscure vsyscall UML feature.

> > Just do the global flag test in the vgettimeofday code, and when it does
>
> I prefer not to have branches in vgettimeofday, it is better to have a
> single branch in switch_to where it is certainly hidden in the scheduler
> and generic context switch noise, in fact if we put it in the right place it
> could have zero l1 cache impact, gettimeofday call frequency may be very
> high, much higher than the context switch frequency and the size of the
> gettimeofday is much smaller than the one of the scheduler, so there's
> less stuff to hide the branch in the noise.

Both vgettimeofday and switch_to are fast paths. So, the right
thing to do, if one of them has to take an unlikely branch penalty, is
to ask which one is entered more often.
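To make the two placements concrete, here is a rough user-space sketch of the same test sitting in either path. Everything in it is hypothetical: vsyscall_redirected, slow_path_hits and the sketch_* functions are illustration names only, not code from the patch. The unlikely() macro is written out with the GCC builtin so the snippet stands alone.

```c
#include <stdbool.h>

/* Branch hint built on the GCC/Clang builtin, spelled out so the sketch
 * compiles outside the kernel tree. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag: true when vsyscalls have been redirected. */
static bool vsyscall_redirected;

/* Hypothetical counter so the slow path has a visible effect. */
static unsigned long slow_path_hits;

/* Option A: the test is paid on every context switch. */
void sketch_switch_to(void)
{
    if (unlikely(vsyscall_redirected))
        slow_path_hits++;   /* stand-in for the redirect handling */
    /* ... the rest of the context switch would go here ... */
}

/* Option B: the test is paid on every vgettimeofday call.
 * Returns true if the call has to degrade to a real system call. */
bool sketch_vgettimeofday(void)
{
    if (unlikely(vsyscall_redirected)) {
        slow_path_hits++;
        return true;
    }
    return false;           /* fast path: answer from the vsyscall page */
}
```

Either way the cost is one predicted-not-taken branch; the only question is which function runs more often.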

gettimeofday() call frequency *can* be very high, but let's test it...

On my system, under a basic desktop user load (e.g., browsing the web,
running folding@home), I see numbers like this (sampled every 10
seconds):

initial:
ctx 349728 gett 307946

ctx 21937 gett 11173
ctx 24791 gett 15761
ctx 2715 gett 3714
ctx 6748 gett 3789
ctx 4334 gett 2423
ctx 1575 gett 1002
ctx 4295 gett 2508
ctx 14913 gett 8220
ctx 600 gett 350
ctx 3821 gett 4860
ctx 4948 gett 7601
ctx 4800 gett 10720
ctx 3064 gett 6547
ctx 7716 gett 9592
ctx 3518 gett 4600
ctx 2760 gett 3505
ctx 4892 gett 4349
ctx 6258 gett 6547
ctx 7331 gett 9577
ctx 21701 gett 11674
ctx 3113 gett 1854
ctx 1502 gett 1063
ctx 27841 gett 14406

final:
ctx 536121 gett 454398

So, there were more context switches than calls to gettimeofday.
However, the numbers were close to each other. Any idea what the
numbers are like for other workloads?
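For what it's worth, the totals can be checked by subtracting the initial counters from the final ones. A small sketch of that arithmetic follows; the struct and function names are made up for the example, the four numbers are the initial and final readings above.

```c
/* Counter snapshot: context switches and gettimeofday calls. */
struct counts {
    unsigned long ctx;
    unsigned long gett;
};

/* Readings copied from the "initial" and "final" lines above. */
static const struct counts initial_counts = { 349728UL, 307946UL };
static const struct counts final_counts   = { 536121UL, 454398UL };

/* Events that happened between two snapshots. */
struct counts counts_delta(struct counts first, struct counts last)
{
    struct counts d;
    d.ctx  = last.ctx  - first.ctx;
    d.gett = last.gett - first.gett;
    return d;
}

/* Context switches per gettimeofday call over an interval. */
double ctx_per_gett(struct counts d)
{
    return (double)d.ctx / (double)d.gett;
}
```

Over the whole run that works out to 186393 context switches against 146452 gettimeofday calls, or roughly 1.27 context switches per call.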

> > This way, the main system just sees vsyscalls degrade to normal system
> > calls, which is ok, and programs that want to virtualize like UML get to
> > bounce execution into some special user-specified vsyscall code of their
> > own, with the cost being just one system call transition for UML as
> > well - a big speedup.
>
> you're optimizing the system for strace? What's the point of optimizing
> strace and penalizing the normal syscall fast path?

No, I'm penalizing strace to provide UML with a fast(er) syscall
mechanism. This is totally optional, but may be interesting for
virtualization in general. strace is not in the normal syscall fast
path, so this is a reasonable place to put optimizations for
virtualizing programs.

I'm also penalizing the vsyscall fast path, but that was just to avoid
the switch_to penalty. Since both are rather critical, here's another,
even uglier scheme, which should have no overhead on switch_to or
vgettimeofday, but adds a bit of overhead to the page fault handler
(though, with the -ENOSYS fixup mentioned in the comment there, maybe
nothing relevant).

Of course, it hurts systems which run UML with virtualized time.

Try 2:

Create a second mapping of the vsyscall page in some special location
above the normal page. Make a new sysctl, which globally invalidates
the page that the standard mapping is on. Basically, this disables
vsyscalls for everyone when turned on.

Now, obviously this won't work without some trick. What we do now is,
we make the page fault handler path for vsyscalls (to be added anyway)
work like so:

If the pc is within the allocated vsyscall page(s), then:

If the pc is on the entrypoint to a vsyscall function, check whether the
process is being traced. If so, turn this into a somewhat normal
looking syscall so it can be virtualized (or do something else, if you
want - have userspace jump somewhere, etc).

If not traced, or if the pc is not at the entrypoint, reset the pc to be
on the second vsyscall copy, with the same offset, and return to
userspace.
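Roughly, the decision in the fault handler could look like the sketch below. This is a user-space model only, not kernel code: the addresses, the page size, the entrypoint offsets and all of the names (VSYSCALL_BASE, VSYSCALL_ALIAS, vsyscall_fault_fixup, and so on) are assumptions picked for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: the normal vsyscall page and its second copy
 * mapped one page above it. */
#define VSYSCALL_BASE   0xffffe000UL
#define VSYSCALL_ALIAS  0xfffff000UL
#define VSYSCALL_SIZE   0x1000UL

/* Hypothetical offsets of the vsyscall function entrypoints. */
static const unsigned long vsyscall_entry_offsets[] = { 0x000UL, 0x400UL };

enum vsyscall_fixup {
    FIXUP_NOT_VSYSCALL,  /* pc is outside the page: ordinary fault */
    FIXUP_AS_SYSCALL,    /* traced, at an entrypoint: make it look like a syscall */
    FIXUP_REDIRECT       /* resume at the same offset in the second copy */
};

static bool at_vsyscall_entry(unsigned long offset)
{
    size_t n = sizeof(vsyscall_entry_offsets) / sizeof(vsyscall_entry_offsets[0]);
    for (size_t i = 0; i < n; i++)
        if (vsyscall_entry_offsets[i] == offset)
            return true;
    return false;
}

/* Decide what to do with a fault at pc. On FIXUP_REDIRECT, *new_pc is
 * set to the address to return to; otherwise it is left untouched. */
enum vsyscall_fixup vsyscall_fault_fixup(unsigned long pc, bool traced,
                                         unsigned long *new_pc)
{
    if (pc < VSYSCALL_BASE || pc >= VSYSCALL_BASE + VSYSCALL_SIZE)
        return FIXUP_NOT_VSYSCALL;

    unsigned long offset = pc - VSYSCALL_BASE;

    if (traced && at_vsyscall_entry(offset))
        return FIXUP_AS_SYSCALL;

    /* Not traced, or already partway through the function. */
    *new_pc = VSYSCALL_ALIAS + offset;
    return FIXUP_REDIRECT;
}
```

Note that a traced process faulting in the middle of a vsyscall function still gets redirected rather than virtualized, since only the entrypoint case can be turned into a clean syscall.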

This lets us do a global vsyscall disable, but (I hope) fixes up the
problem of userspace going to sleep inside a vsyscall. The process
wakes up, faults, and gets shunted off to identical code in another
location, which should have the same behavior.

Downside: vgettimeofday takes a performance penalty for everyone in the
special case where UML is running with full time virtualization, because
of the page fault. This is the very unusual case, so who cares?

Downside 2: Would this actually work? It's a bit scary sounding...

-J