Re: [PATCH] linux-2.5.43_vsyscall_A0

Elladan (elladan@eskimo.com)
Sat, 19 Oct 2002 23:44:33 -0700


On Sun, Oct 20, 2002 at 04:59:14AM +0200, Andi Kleen wrote:
> On Sat, Oct 19, 2002 at 06:16:59AM +0200, Andrea Arcangeli wrote:
> > see my last email. And I think he needed it as an additional syscall
> > after execve that he could trap and revirtualize with ptrace as usual
> > and that would return variable addresses of pointer to functions (that
> > would be revirtualized inside the uml kernel of course), not an ELF
> > information that should be valid for both UML and host kernel.
>
> Implementing it per process is tricky. How do you access the per process
> state in the vsyscall area ? To do it properly you would need one dedicated
> page per mm_struct that is mapped in there. But it could not be in the
> normal vsyscall area (otherwise you couldn't share the kernel pagetables
> anymore), but somewhere else in the address space.
>
> I think a global sysctl that just modifies the global vsyscall pages is more
> suitable here. It avoids the overhead of needing a per process page.
> I see no real need anyways to do it per process. When you have one process
> that cannot deal with vsyscalls the whole system will get a bit slower,
> but the slowdown shouldn't be noticeable anyways. If you run your highend
> database which does thousands of gettimeofday each second just don't set
> the sysctl.

The problem with modifying the executable code/pages in the vsyscall
area is that it's going to be very tricky to implement, if I understand
this discussion properly.

There may be any number of user processes idling in these pages on the
runqueue (or off it if say one received a SIGSTOP), and if you just go
change the instruction code on them, unless you're incredibly careful
and come up with some subtly safe machine code sequence, they're going
to crash when you call this sysctl().

It seems like this indicates that you have to start getting crazy at
that point. That is, what you need to do is scan through all processes
on the runqueue (and also any that might be e.g. frozen) and examine
their PC. If it's in the vsyscall area, either complete the system call
for them, or somehow roll back their register state and reset their PC
to the start of the vsyscall function.
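
To make the scan concrete, here is a rough user-space model of the
"reset the PC" half of it. Everything in it is an assumption made up for
illustration: the area address and size, the idea that each vsyscall
entry point sits on a fixed-size slot boundary, and the struct and
function names. None of it is taken from the patch.

```c
#include <stddef.h>

/* Hypothetical layout: these names and values are illustrative only. */
#define VSYSCALL_START 0xffffe000UL
#define VSYSCALL_SIZE  0x1000UL
#define VSYSCALL_SLOT  0x400UL  /* assumed: one entry point per slot */

struct task_model {
    unsigned long pc;   /* saved user-space program counter */
};

/* True if pc lies inside the (assumed) vsyscall area. */
static int pc_in_vsyscall(unsigned long pc)
{
    return pc >= VSYSCALL_START && pc - VSYSCALL_START < VSYSCALL_SIZE;
}

/* Rewind pc to the start of its slot. Returns 1 if the task was touched. */
static int rewind_if_in_vsyscall(struct task_model *t)
{
    if (!pc_in_vsyscall(t->pc))
        return 0;
    t->pc -= (t->pc - VSYSCALL_START) % VSYSCALL_SLOT;
    return 1;
}

/* Walk every task, runnable or stopped, before the pages are rewritten. */
static size_t rewind_all(struct task_model *tasks, size_t n)
{
    size_t i, fixed = 0;

    for (i = 0; i < n; i++)
        fixed += (size_t)rewind_if_in_vsyscall(&tasks[i]);
    return fixed;
}
```

Note that this only moves the PC. If the vsyscall function has already
clobbered a register or touched the stack by the time it is interrupted,
the rest of the register state would have to be rolled back as well,
which is the part that makes this approach look crazy.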

Just using a test in the vsyscall to check a variable seems like a much
cleaner global approach. It has its own problem though, since processes
that are idling in the vsyscall pages may wake up after vsyscalls have
been disabled. It seems like they could then be prone to return the
wrong result, if say the offset data was no longer being updated
properly by the kernel because of the mode change.
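
A minimal sketch of what I mean by a test in the vsyscall, again as a
user-space model. The page layout, the "enabled" flag, and all of the
names are hypothetical, and model_slow_path() is just a stand-in for
entering the kernel through the real system call.

```c
/* Hypothetical model: field and function names are illustrative only. */
struct timeval_model {
    long sec;
    long usec;
};

struct vsys_page {
    volatile int enabled;         /* cleared by the hypothetical sysctl */
    struct timeval_model cached;  /* kept current only while enabled */
};

typedef void (*slow_path_fn)(struct timeval_model *out);

/* Stand-in for the real system call; returns a fixed, recognizable value. */
static void model_slow_path(struct timeval_model *out)
{
    out->sec = 99;
    out->usec = 0;
}

static void vsys_gettimeofday(const struct vsys_page *page,
                              slow_path_fn slow,
                              struct timeval_model *out)
{
    if (!page->enabled) {
        slow(out);      /* vsyscalls disabled: take the real syscall */
        return;
    }
    /*
     * Race window: a task preempted here and resumed after the flag
     * was cleared still returns the cached value, which may be stale.
     */
    *out = page->cached;
}
```

The comment in the fast path marks the window I'm worried about: the
flag was already tested, so nothing stops the task from reading data the
kernel has stopped updating.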

Making it per-process should avoid these problems nicely, at least, so
long as the process disabling vsyscalls knows what it's doing and
doesn't try to call the sysctl from a signal handler or something.

-J