Re: Better CLONE_SETTLS support for Hammer

Ulrich Drepper (drepper@redhat.com)
Thu, 06 Mar 2003 10:58:29 -0800


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andi Kleen wrote:

> We're using %fs enabled glibc (currently with arch_prctl, but I would
> like to change that because it's slow)

It's already part of the documented ABI. Yes, if right now somebody
said r11 or whatever is declared the thread register, we could change
it. But give it a few more weeks and it'll probably be impossible.

> It's hardcoding the magic fs register in the kernel for once.

The kernel just corresponds to what userland expects. It's the same for
otherarchitectures where clone() loads the appropriate register.

> Now with it getting reloaded it would make sense to set it in the GDT
> too if possible, I agree. I'll implement that.

I'm not sure what your plans wrt to set_thread_area in 64-bit mode are
but I would strongly suggest to remove it completely.

> It will also transparent speed up the glibcs already using arch_prctl.
> I like that.

Yes.

> I can do a similar thing in clone. It unfortuately also hardcodes fs there,
> but I guess that ugly hack will be needed to get the broken NPTL design for this
> to work.

You don't understand threads. There is nothing broken or NPTL-specific
about the design. Every thread creation mechanism must be signal save.
LinuxThreads's is not, which is a bit less of a problem since the
signal handling is broken, too, and signals are unlikely to arrive at
the new thread. Nevertheless, there are certainly spurious crashes which
are hard to reproduce and explain which can be attributed to this problem.

With fixed signal handling the newly created thread needs to have a
valid and correct thread register from the first instruction on and this
can only be implemented in the kernel. Again, the other archs implement
similar dependencies. ia-64 loads the tp register with the given value.
The kernel doesn't live in a vacuum so this kind of dependency is OK.

> You just have to guarantee from user space that you don't do nasty
> things with the selector.

Sure. Write this in the ABI docs. Or leave the whole segment register
handling undefined so that nobody gets the idea it's useful to use.

>>- - automatically load the %fs or %gs register with the correct value
>> before returning from prctl(). This introduces no binary
>> incompatibilities and it's really the expected behavior.
>
>
> It's already done (set to zero) to not confuse the lazy switch logic.

Switching to zero is only correct when using the MSR. I'm talking about
the case when the value is used as the base address of a segment. In
that case %fs has to be loaded with the value corresponding to the
apppropriate GDT index. The loading can happen in the kernel (or when
returning from it) or left to the user. I think the former is better
since it corresponds more cleanly to what happens if the MSR is used.
If you don't want it return the GDT index and the user code will have to
issue the mov instruction if the return index is != 0 (indicating the
MSR is used).

> 2.5.64 currently doesn't boot (known issue); 2.5.63 works however.
> I'll look into the .64 problems later today and put a fix onto the
> usual place when done (ftp://ftp.x86-64.org/pub/linux/v2.5/)

Thanks, looking forward to that.

- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+Z5pV2ijCOnn/RHQRAsmhAJ9A9JXlw6YJ/5RAG+Y5NF708UChVwCgv7Cc
1JoFN0hrXYjZ7eynW3Nyj8I=
=+qtg
-----END PGP SIGNATURE-----

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/