intermittent hangs with threads (clone() bug?/linuxthreads bug?)

Ed Connell (Ed.Connell@sas.com)
Tue, 19 Jun 2001 17:23:48 -0400


Hi,

I am experiencing intermittent hangs running LinuxThreads programs from a shell script. This happens with any combination of stock RH 7.1 SMP kernel (2.4.2) or my own 2.4.5 kernel and stock RH 7.1 libc (2.2.2), redhat rawhide libc (2.2.3) and my own 2.2.3 libc.

If I run, for example, linuxthreads/Examples/ex1 (one thread prints 'a', one prints 'b') it will run fine. If I run it from a shell script (bash or ksh) with
exec ex1
it almost always hangs. When I do a "ps" I see the original "ex1" process plus another defunct "ex1" process with a higher pid. This defunct process was supposed to be the LinuxThreads manager thread but it seems that clone() is silently failing to create a valid thread. Attaching a debugger to the original "ex1" and putting print statements in libc/linuxthreads and the kernel (do_fork() and friends) all indicate things are proceeding normally. Yet the manager thread/process is never scheduled to run...it is immediately defunct.

My hardware is an 8-way i686. I tried removing CPU's but the problem remains until I get down to a single CPU (or boot a uniprocessor kernel). When I got down to 2 CPU's I noticed that running my script from console almost always produced a hang while running from an xterm always worked. Obviously a timing issue. Also things run fine on other SMP hardware I have access to.

If anyone has any idea what is going on, I would love to hear it. If you need more information please let me know, as well.

Thanks
Ed Connell

Healthy traceback from original "ex1" process where it is waiting for the manager thread to take over.
(gdb) where
#0 0x40067ff5 in __sigsuspend (set=0xbffff2e8)
at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1 0x4002d25f in __pthread_wait_for_restart_signal (self=0x40036300)
at pthread.c:958
#2 0x4002cbc4 in __pthread_create_2_1 (thread=0xbffff454, attr=0x0,
start_routine=0x8048580 <process>, arg=0x804871d) at restart.h:34
#3 0x080485e7 in main () at ex1.c:29
#4 0x400565e7 in __libc_start_main (main=0x80485cc <main>, argc=1,
ubp_av=0xbffff4c4, init=0x80483d0 <_init>, fini=0x80486e0 <_fini>,
rtld_fini=0x4000e154 <_dl_fini>, stack_end=0xbffff4bc)
at ../sysdeps/generic/libc-start.c:129

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/