Re: /proc/<pidnumber>/stat hangs reading process

Eric W. Biederman (ebiederm@xmission.com)
18 Nov 2001 13:38:41 -0700


"Marcelo Roberto Jimenez" <mroberto@cetuc.puc-rio.br> writes:

> Mika,
>
> > Hello,
> > basically this posting is about the same problem as one I posted in
> > September:
> >
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.0/0764.html
> >
> > It's essentially the same situation: I was running mozilla and it stopped
> > responding to any input. I tried to kill it with control-c, kill and
> > finally with kill -9, but none helped. When I tried to look at the output
> > of top and ps, exactly the same symptoms appeared; those processes didn't
> > finish and can't be killed either. When I do strace ps the output ends
> > at:
> >
> > stat64("/proc/16515", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
> > open("/proc/16515/stat", O_RDONLY) = 7
> > read(7,
>
> I'm having this problem too, for a long time. It's usually associated with big
> loads ( for my machine, of course, a PII-233 ). It has happened while opening
> lots of pages with opera, but has also happened while compiling 3 kernels at
> the same time and playing a video with xine or aviplay.
>
>
> The behavior is the same: ps blocks, gtop blocks, killall blocks, anything that
> tries to get the process information blocks too.
>
>
> The machine can be used as long as a program does not try to call the
> problematic function, which I wasn't able to trace down.
>
>
> I haven't had this problem for a while, basically because I try not to stress
> these ``hanging'' applications anymore, so that I can work, but I'll try to see
> if I can reproduce the bug with the new VM.
>
>
> The problem is: what can we do, to investigate the problem, once ps starts to
> block?

Try using Alt-Sysrq and find the address in the kernel where the processes
are blocking. Then you should be able to trace back and figure out which
lock things are blocking on.
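
Once you have a raw kernel address from the SysRq output, you need to map it
back to a function name. A minimal sketch of that lookup, assuming you have
the System.map that matches the running kernel (lines of the form
"address type symbol"). The helper names parse_system_map and resolve, and
the addresses and symbol names in the sample, are made up for illustration:

```python
import bisect


def parse_system_map(lines):
    """Turn System.map-style lines into a sorted list of (address, symbol)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # not an "address type symbol" line
        try:
            address = int(parts[0], 16)
        except ValueError:
            continue  # first field is not a hex address
        symbols.append((address, parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (symbol, offset) for the nearest symbol at or below address.

    Returns None if the address lies below every known symbol.
    """
    addresses = [a for a, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    base, name = symbols[index]
    return name, address - base


# Hypothetical sample data, not taken from a real kernel.
sample_map = [
    "c0105000 T example_func_a",
    "c0107000 T example_func_b",
    "not a symbol line",
]
table = parse_system_map(sample_map)
```

With a real kernel you would feed it the file instead, e.g.
parse_system_map(open("/boot/System.map")), where the path depends on your
installation. The symbol plus offset tells you which function the blocked
process is sleeping in, which is the starting point for tracing back to the
lock.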

I have only seen this once, on buggy hardware (at least on a recent kernel).
Earlier kernels had a case where ps contended with a process that, in
certain circumstances, held the needed locks most of the time, so ps never
managed to grab the lock.

Additionally there are a few other pieces like spin lock debugging in 2.4.14
that you might want to compile in as well.

Eric