Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd

Jean Tourrilhes (jt@bougret.hpl.hp.com)
Thu, 15 May 2003 13:12:55 -0700


Greg,

This is a HotPlug problem, so would you mind forwarding this
to the relevant person and help Martin ?
Thanks in advance...

Jean

On Thu, May 15, 2003 at 03:14:36PM +0200, Martin Diehl wrote:
>
> Hi,
>
> seems we may run into mutual deadlock in the unregister_netdev() path with
> CONFIG_HOTPLUG=y. I managed to reproduce an irda-user report leading to
> the following description:
>
> * killing irattach (userland daemon comparable to pppd) starts closing the
> irda tty-ldisc
>
> * there we call unregister_netdev() on behalf of the (already closed)
> irda0 network device.
>
> * unregister_netdev() takes rtnl_lock
>
> * further down in unregister_netdevice() with CONFIG_HOTPLUG the network
> layers wants to call userland hotplug stuff
>
> * the request to fork the usermodehelper gets queued for the event/0
> workqueue (aka keventd) and we are blocking with rtnl still acquired for
> completion.
>
> * at this moment for some reason keventd has a linkwatch_event()
> apparently already scheduled before the usermode helper. So we run into
> linkwatch_event() with tries to get rtnl_lock.
>
> -> mutual deadlock: keventd waiting for rtnl_lock which is still hold by
> unregister_netdev blocking for completion of work scheduled for keventd.
>
> I can reproduce this with 2.5.69 with CONFIG_HOTPLUG enabled, no matter
> what /proc/sys/kernel/hotplug is, even /bin/true is sufficient. I've no
> idea why I get this with irda0 but not with eth0 for example.
> FWIW kernel is SMP running on UP without preempt.
>
> As I don't see how the irda stuff could cause unregister_netdev() to
> schedule the hotplug stuff with some linkwatch_event already scheduled
> I've no idea what the real problem and fix might be.
>
> Below a commented calltrace catched right when it hangs as described.
>
> Thanks
> Martin
>
> -----------------------------
>
> > May 14 13:14:17 laptop kernel: events/0 D C12FDF04 412092 3 1 4 2 (L-TLB)
> > May 14 13:14:17 laptop kernel: Call Trace:
> > May 14 13:14:17 laptop kernel: [__down+150/256] __down+0x96/0x100
> > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> > May 14 13:14:17 laptop kernel: [__down_failed+8/12] __down_failed+0x8/0xc
> > May 14 13:14:17 laptop kernel: [.text.lock.rtnetlink+5/54] .text.lock.rtnetlink+0x5/0x36
> > May 14 13:14:17 laptop kernel: [linkwatch_event+29/48] linkwatch_event+0x1d/0x30
> > May 14 13:14:17 laptop kernel: [worker_thread+511/736] worker_thread+0x1ff/0x2e0
> > May 14 13:14:17 laptop kernel: [linkwatch_event+0/48] linkwatch_event+0x0/0x30
> > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> > May 14 13:14:17 laptop kernel: [ret_from_fork+6/20] ret_from_fork+0x6/0x14
> > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> > May 14 13:14:17 laptop kernel: [worker_thread+0/736] worker_thread+0x0/0x2e0
> > May 14 13:14:17 laptop kernel: [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18
>
> This is the keventd-thread. It has some work scheduled for the network
> layer, namely linkwatch_event(). This is currently blocking to get the
> rtnl_lock semaphore.
>
>
> > May 14 13:14:17 laptop kernel: irattach D 00000000 4283667124 400 1 537 396 (NOTLB)
> > May 14 13:14:17 laptop kernel: Call Trace:
> > May 14 13:14:17 laptop kernel: [try_to_wake_up+296/464] try_to_wake_up+0x128/0x1d0
> > May 14 13:14:17 laptop kernel: [wait_for_completion+153/224] wait_for_completion+0x99/0xe0
>
> (5)
>
> > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> > May 14 13:14:17 laptop kernel: [queue_work+132/160] queue_work+0x84/0xa0
>
> (4)
>
> > May 14 13:14:17 laptop kernel: [call_usermodehelper+257/272] call_usermodehelper+0x101/0x110
> > May 14 13:14:17 laptop kernel: [__call_usermodehelper+0/112] __call_usermodehelper+0x0/0x70
> > May 14 13:14:17 laptop kernel: [vsprintf+39/48] vsprintf+0x27/0x30
> > May 14 13:14:17 laptop kernel: [sprintf+31/48] sprintf+0x1f/0x30
> > May 14 13:14:17 laptop kernel: [net_run_sbin_hotplug+174/195] net_run_sbin_hotplug+0xae/0xc3
>
> (3)
>
> > May 14 13:14:17 laptop kernel: [try_to_wake_up+296/464] try_to_wake_up+0x128/0x1d0
> > May 14 13:14:17 laptop kernel: [pfifo_fast_reset+158/160] pfifo_fast_reset+0x9e/0xa0
> > May 14 13:14:17 laptop kernel: [qdisc_destroy+158/160] qdisc_destroy+0x9e/0xa0
> > May 14 13:14:17 laptop kernel: [unregister_netdevice+211/608] unregister_netdevice+0xd3/0x260
> > May 14 13:14:17 laptop kernel: [_end+282800068/1070304612] sirdev_dtor+0x0/0x20 [sir_dev]
>
> (2)
>
> > May 14 13:14:17 laptop kernel: [unregister_netdev+24/48] unregister_netdev+0x18/0x30
>
> (1)
>
> > May 14 13:14:17 laptop kernel: [_end+282800429/1070304612] sirdev_put_instance+0x149/0x1ad [sir_dev]
> > May 14 13:14:17 laptop kernel: [_end+282804705/1070304612] __func__.9+0x0/0x14 [sir_dev]
> > May 14 13:14:17 laptop kernel: [_end+282131315/1070304612] irtty_close+0x4f/0x120 [irtty_sir]
> > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> > May 14 13:14:17 laptop kernel: [tty_set_ldisc+1091/1200] tty_set_ldisc+0x443/0x4b0
> > May 14 13:14:17 laptop kernel: [uart_wait_until_sent+144/224] uart_wait_until_sent+0x90/0xe0
> > May 14 13:14:17 laptop kernel: [tty_wait_until_sent+243/272] tty_wait_until_sent+0xf3/0x110
> > May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> > May 14 13:14:17 laptop kernel: [sock_destroy_inode+27/32] sock_destroy_inode+0x1b/0x20
> > May 14 13:14:17 laptop kernel: [_end+282132178/1070304612] +0x15a/0x16c [irtty_sir]
> > May 14 13:14:17 laptop kernel: [_end+282130740/1070304612] irtty_open+0x0/0x1f0 [irtty_sir]
> > May 14 13:14:17 laptop kernel: [_end+282131236/1070304612] irtty_close+0x0/0x120 [irtty_sir]
> > May 14 13:14:17 laptop kernel: [_end+282130132/1070304612] irtty_ioctl+0x0/0x260 [irtty_sir]
> > May 14 13:14:17 laptop kernel: [_end+282129076/1070304612] irtty_receive_buf+0x0/0xc0 [irtty_sir]
> > May 14 13:14:17 laptop kernel: [_end+282129268/1070304612] irtty_receive_room+0x0/0x30 [irtty_sir]
> > May 14 13:14:17 laptop kernel: [_end+282129316/1070304612] irtty_write_wakeup+0x0/0x40 [irtty_sir]
> > May 14 13:14:17 laptop kernel: [_end+282134820/1070304612] +0x0/0xe0 [irtty_sir]
> > May 14 13:14:17 laptop kernel: [sys_ioctl+256/656] sys_ioctl+0x100/0x290
> > May 14 13:14:17 laptop kernel: [syscall_call+7/11] syscall_call+0x7/0xb
>
> Ok, nice trace btw: The last printk from sir_dev was at (1) before we
> called unregister_netdev() - which in turn acquired rtnl_lock (2). Due to
> the disappearing irda0 device (and CONFIG_HOTPLUG=y) the network layer
> decided to call the hotplug stuff (3). For this to fork the usermode
> helper, it scheduled some work for keventd (4). Finally we are blocking
> for completion until keventd finishes wait4 usermodehelper (5).
>
> Unfortunately we are blocking for completion with rtnl still locked and
> keventd apparently having the linkwatch_event() scheduled before the
> usermodehelper -> mutual deadlock between irattach and keventd!
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/