Re: Killing process with SIGKILL and ncpfs

Petr Vandrovec (VANDROVE@vc.cvut.cz)
Wed, 17 Jan 2001 17:07:08 MET-1


On 17 Jan 01 at 13:41, Urban Widmark wrote:
> SIGKILL or SIGSTOP can be already pending, or perhaps received while
> waiting in socket->ops->recvmsg(). recvmsg will then return -ERESTARTSYS
> because signal_pending() is true and the smbfs code treats that as a
> network problem (causing unnecessary reconnects and sometimes complete
> failures requiring umount/mount).

Yes. I was going to rewrite this code sometime around 2.3.40, when
I wrote an independent NCP socket layer for another project. Unfortunately,
I found that returning -ERESTARTSYS from some procedures (read_inode)
turns the inode into a bad_inode, instead of just dropping/reverting all
the changes that were made :-( So I left it as is.

> Running strace on a multithreaded program causes problems for smbfs.
> Someone was nice enough to post a small testprogram for this (you may want
> to try it on ncpfs, if you want it I'll find it for you).
>
> These problems go away if all signals are blocked. Of course the smbfs
> code would need to be changed to not block on recv, else you may end up
> with a program waiting for network input that can't be killed ... (?)

I'm going to use:

    if (current->flags & PF_EXITING)
        mask = 0;
    else
        mask = sigmask(SIGKILL);

This has the following effects:
(1) you can still kill a bad task stuck in ncpfs with SIGKILL
(2) except when a connectivity problem happens during exit(). In that
case the timeout should take care of it. There is no way around this
except more complicated code:

    if (current->flags & PF_EXITING) {
        lock(...)
        rm_sig_from_queue(SIGKILL, &current->pending);
        unlock(...)
    }
    mask = sigmask(SIGKILL);

which allows you to 'kill -9' an exiting task again...
But I cannot convince myself that SIGKILL is correctly delivered
to PF_EXITING tasks... And rm_sig_from_queue is an internal function
of signal.c.
(3) attaching/detaching a debugger no longer causes problems, as SIGSTOP
is ignored by ncpfs - I have no idea why the original code included
SIGSTOP... Now it just stops the task after the NCP request is done, so
the only problem is that you cannot immediately stop a program which
is waiting for a server reply. But I think that waiting a few milliseconds
is better than killing the connection for the whole server
(4) so you can happily debug programs from ncpfs volumes
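
To make the effect of the mask choice easier to see, here is a small
userspace model of it. This is only a sketch, not the ncpfs code:
MODEL_PF_EXITING, model_sigmask(), ncp_wait_mask() and
wait_is_interrupted() are names made up for illustration, and the flag
value is a placeholder rather than something taken from kernel headers.

```c
#include <signal.h>

/* Assumption: stand-in for the kernel's PF_EXITING task flag (value is illustrative). */
#define MODEL_PF_EXITING 0x4u

/* Userspace equivalent of a one-bit-per-signal mask for classic signals. */
static unsigned long model_sigmask(int sig)
{
    return 1UL << (sig - 1);
}

/* Signals that are allowed to interrupt the wait for a server reply. */
static unsigned long ncp_wait_mask(unsigned int task_flags)
{
    if (task_flags & MODEL_PF_EXITING)
        return 0;                   /* exiting task: only the timeout ends the wait */
    return model_sigmask(SIGKILL);  /* otherwise SIGKILL alone may interrupt */
}

/* Hypothetical helper: does any pending signal fall inside the mask? */
static int wait_is_interrupted(unsigned long pending, unsigned long mask)
{
    return (pending & mask) != 0;
}
```

In this model a pending SIGSTOP never interrupts the wait, a pending
SIGKILL interrupts it for a normal task, and nothing interrupts it for
a task that is already exiting.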

If it is not too much trouble for you, can you find the multithreaded
test program for me? The current solution, which uses only SIGKILL, and only
when the task is not exiting, looks good to me, and works for my testcases.
There are some corner cases, such as when SIGKILL is already pending when
ncpfs is entered, but I'm not sure whether it is worth adding a check
of signal_pending() before the call to do_ncp_*rpc_call(), as it is no
longer a four-line patch then.
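
For the already-pending corner case, the extra check could look roughly
like the userspace model below. It is a sketch under assumptions, not a
patch: struct task_model, ncp_call_checked() and fake_rpc_call() are
hypothetical names, the pending set is modelled as a plain bitmap, and
the real call site would have to use the kernel's own signal helpers.

```c
#include <signal.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512  /* kernel-internal errno value, not in userspace headers */
#endif

/* Hypothetical model of a task: just a bitmap of pending signals. */
struct task_model {
    unsigned long pending;
};

typedef int (*rpc_fn)(void *arg);

/* Refuse to start the request if a signal inside the mask is already pending. */
static int ncp_call_checked(const struct task_model *t, unsigned long mask,
                            rpc_fn do_call, void *arg)
{
    if (t->pending & mask)
        return -ERESTARTSYS;   /* nothing was sent, so nothing to undo */
    return do_call(arg);
}

/* Stand-in for the real request function: counts how often it is invoked. */
static int fake_rpc_call(void *arg)
{
    int *calls = arg;
    ++*calls;
    return 0;
}
```

The point of checking before the call is that no request has gone to
the server yet, so returning early does not leave the connection in an
inconsistent state.
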
Best regards,
Petr Vandrovec
vandrove@vc.cvut.cz
