Re: NFS deadlock explained (on S/390)

Manfred Spraul (manfred@colorfullife.com)
Fri, 31 Aug 2001 20:02:17 +0200


> Specifically, what happens is that the first CPU runs the QDIO
> bottom half (from ksoftirqd, but this is probably not important).
> That bottom half grabs the QDIO private spinlock, and then
> executes a bunch of code including a dev_kfree_skb_any().
> As we are only in a softirq, not a hard irq, dev_kfree_skb_any()
> call kfree_skb() directly, which happens to invoke the
> udp_write_space() callback in xprt.c. This routine then spins
> trying to acquire the xprt_sock_lock.

I think in_irq() and in_interrupt() should check the cpu interrupt flag
and return TRUE if the per-cpu interrupts are disabled.

The current behavious is just weird:
spin_lock_bh();
in_interrupt(); --> true
spin_unlock_bh();
spin_lock_irq();
in_interrupt(); --> false
spin_unlock_irq();

> Whether this same situation can explain the deadlocks seen on
> other platforms depends on whether the drivers used there exhibit
> similar locking behaviours as the QDIO driver, of course.

It should be possible to detect that automatically: if
dev_kfree_skb_any() is called outside irq context with disabled per-cpu
interrupts then it's probably due to a spin_lock_irq() and could
deadlock.

--
    Manfred

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/