Re: [BUG][2.5.66bk9+] - changes to timers still broken - we don't oops anymore

Roland Dreier (roland@topspin.com)
06 Apr 2003 13:20:28 -0700


>>>>> "Shawn" == Shawn Starr <spstarr@sh0n.net> writes:

Shawn> It just caused sshd to hang. I don't know why Here's what
Shawn> strace reports: Sshd is stuck in 'D' and a child in zombie
Shawn> state. The machine has been up for 2 days 18 hours 50 mins.

I may have missed some emails on this topic, so I apologize if this
objection was already answered. In any case, if you're running with
Andrew's patch that added a del_timer_sync to release_dev(), I think I
see how that could cause problems. Here's what I said in my previous
email:

From looking at workqueue.c (especially the comment in
queue_delayed_work() that says "Increase nr_queued so that the flush
function knows that there's something pending."), it seems like
flush_scheduled_work() should wait until even delayed work is done.
Given that, I don't think the del_timer_sync() should be there --
wouldn't flush_scheduled_work() block forever, since nr_queued can
never reach 0 now?

Also, I didn't see an answer to the worry I expressed below:

I don't see how it's _ever_ safe to call flush_scheduled_work().
The comment in workqueue.c before flush_workqueue() says "NOTE: if
work is being added to the queue constantly by some other context
then this function might block indefinitely." But
flush_scheduled_work() is flushing the keventd_wq, which other code
will definitely add work to. If we're unlucky,
flush_scheduled_work() could block forever. Am I just being
paranoid?

I admit that flush_scheduled_work() is unlikely to cause problems in
most real world situations, but I'd be curious to know if anyone
thinks this could ever lead to a livelock under some strange condition.

- Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/