> Good report. To tell the truth, I know that this lockup exists,
> there's an RH issue-tracker item against me on this.
Several of us at HP have been chasing this problem as well. Here is why
there is a deadlock: deliver_to_old_ones()attempts to stop all timers
from running and then blocks until all the timers are no longer running.
This code is called from netif_receive_skb which is called from tg3_poll
while it is holding a lock in the tg3 driver. On another CPU, the
tg3_timer routine is run but is blocked by the lock held in the tg3_poll
routine. The tg3_timer routine never finishes because it can't acquire
the lock being held by tg3_poll on another CPU. That prevents
deliver_to_old_ones from executing because there is still a timer
routine executing.
Here is the call stack of the deadlocked CPUs on a RH8.0 system with a
2.4.18-24.8.0 smp kernel:
CPU 2:
deliver_to_old_ones+45
netif_receive_skb
tg3_rx+27b
tg3_poll+81
net_rx_action
do_softirq
do_IRQ
call_do_IRQ
CPU 6:
tg3_timer (tg3+9fc4)
run_timer_list+0x112
bh_action+55
tasklet_hi_action+67
do_softirq+d9
do_IRQ
call_do_IRQ+5
Thanks
Tom Rhodes
tom.rhodes@hp.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/