If the wakeup on tty->write_wait occurs in an interrupt, as it does
with our Digi Acceleport USB serial driver, the wakeup is sometimes
lost and write_chan() sleeps forever. To fix this, we had to schedule
the wakeups to occur after the interrupt. This has also been a
problem in some other USB serial drivers.
Our fix corrects the problem on UP, but theoretically (I think) there
is still a race on SMP. We have not actually seen the problem in our
testing on SMP, however.
Here is a sketch of the code in write_chan():

static ssize_t write_chan(struct tty_struct * tty ... )
{
    ...
    add_wait_queue(&tty->write_wait, &wait);
    while (1) {
        set_current_state(TASK_INTERRUPTIBLE);
        ...
        current->state = TASK_RUNNING;
        c = tty->driver.write(tty, 1, b, nr);
        current->state = TASK_INTERRUPTIBLE;
        ...
        schedule();
    }
break_out:
    current->state = TASK_RUNNING;
    remove_wait_queue(&tty->write_wait, &wait);
    ...
}

The problem occurs when the wakeup on tty->write_wait comes before
the line "current->state = TASK_INTERRUPTIBLE". If that happens,
the wakeup is lost when the state is set back to TASK_INTERRUPTIBLE,
and write_chan goes to sleep in schedule() waiting for a wakeup that
has already occurred.
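
To make the ordering concrete, here is a small user-space model of the
race. It is only a sketch, not kernel code: sim_task, sim_wake_up(),
sim_schedule_blocks(), the "room" flag and the wake_point values are all
hypothetical names invented for illustration. The second function shows
one conventional way to avoid a lost wakeup (re-test the condition after
setting the state), which is an assumption on our part and not
necessarily how write_chan() should be changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's task states. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

/* Where the simulated interrupt delivers its wakeup. */
enum wake_point { WAKE_DURING_WRITE, WAKE_BEFORE_SCHEDULE };

struct sim_task {
    enum task_state state;
    bool room;              /* true once the driver can take more data */
};

/* Models wake_up(): make the condition true and mark the task runnable. */
static void sim_wake_up(struct sim_task *t)
{
    assert(t != NULL);
    t->room = true;
    t->state = TASK_RUNNING;
}

/* Models schedule(): the task blocks unless it is marked runnable.
 * No later wakeup arrives in this model, so blocking means hanging. */
static bool sim_schedule_blocks(const struct sim_task *t)
{
    return t->state != TASK_RUNNING;
}

/* Ordering from the write_chan() sketch. Returns true if it hangs. */
bool sketch_order_hangs(enum wake_point wp)
{
    struct sim_task t = { TASK_INTERRUPTIBLE, false };

    t.state = TASK_RUNNING;             /* before driver.write() */
    if (wp == WAKE_DURING_WRITE)
        sim_wake_up(&t);                /* interrupt during the write */
    t.state = TASK_INTERRUPTIBLE;       /* overwrites the wakeup */
    if (wp == WAKE_BEFORE_SCHEDULE)
        sim_wake_up(&t);                /* interrupt after the state reset */
    return sim_schedule_blocks(&t);
}

/* Same ordering, but the condition is re-tested after the state is set. */
bool recheck_order_hangs(enum wake_point wp)
{
    struct sim_task t = { TASK_INTERRUPTIBLE, false };

    t.state = TASK_RUNNING;
    if (wp == WAKE_DURING_WRITE)
        sim_wake_up(&t);
    t.state = TASK_INTERRUPTIBLE;
    if (t.room) {                       /* wakeup already happened */
        t.state = TASK_RUNNING;
        return false;                   /* skip schedule() entirely */
    }
    if (wp == WAKE_BEFORE_SCHEDULE)
        sim_wake_up(&t);                /* sets TASK_RUNNING, so no sleep */
    return sim_schedule_blocks(&t);
}
```

In this model, sketch_order_hangs(WAKE_DURING_WRITE) reports a hang,
while the re-check variant does not hang for either wake point.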
As I mentioned, this happened with our driver when the wakeup occurred
in an interrupt, and we saw it repeatedly in our testing. When we
scheduled the wakeup to run later, we could be sure that write_chan()
was really asleep, having called schedule(), before the wakeup
happened, and so the wakeup would not be lost, on UP anyway.
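
The effect of deferring the wakeup can be modeled the same way. Again
this is only a user-space sketch with hypothetical names (sim_writer,
sim_deferred, sim_irq(), sim_run_deferred()); it does not show the
kernel mechanism our driver uses, and it bakes in the UP assumption
that the deferred work cannot run until the writer has gone to sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's task states. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct sim_writer   { enum task_state state; };
struct sim_deferred { bool pending; };

/* Interrupt handler: only records that a wakeup is owed. */
static void sim_irq(struct sim_deferred *d)
{
    d->pending = true;
}

/* Deferred work, run outside interrupt context: delivers the wakeup. */
static void sim_run_deferred(struct sim_deferred *d, struct sim_writer *w)
{
    if (d->pending) {
        d->pending = false;
        w->state = TASK_RUNNING;
    }
}

/* Returns true if the writer would sleep forever. */
bool deferred_wakeup_hangs(void)
{
    struct sim_writer w = { TASK_INTERRUPTIBLE };
    struct sim_deferred d = { false };

    w.state = TASK_RUNNING;         /* before driver.write() */
    sim_irq(&d);                    /* interrupt during the write */
    w.state = TASK_INTERRUPTIBLE;   /* state reset, as in the sketch */

    /* The writer calls schedule() here and is asleep. */
    assert(w.state == TASK_INTERRUPTIBLE);

    /* UP assumption: the deferred work runs only after that point. */
    sim_run_deferred(&d, &w);
    return w.state != TASK_RUNNING;
}
```

On SMP the assumption in the last step need not hold, since the
deferred work could run on another CPU before the state is reset.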
It seems possible that on SMP, even if the wakeup happens on scheduler
time, the wakeup could still be lost.
What is the correct solution to this problem? Should all serial drivers
be sure NOT to wake up tty->write_wait in interrupts? Is that sufficient
(for reasons not clear to me) on SMP? Is write_chan() really correct,
or does it need to be rewritten?
The lines around the call to driver.write that set the state to RUNNING
and back to INTERRUPTIBLE were added after 2.2.14. We have seen the
problem in 2.2.16, 2.3.99, and 2.4.0, but not in 2.2.14.
This problem has been commented on briefly in linux-usb:
Aki Laukkanen <amlaukka@cc.helsinki.fi> wrote:
> 2.2.15-pre5 and later too if the cause of the hangs is that
> TASK_RUNNING/TASK_INTERRUPTIBLE ping-pong.
>
> 2.2.15pre5
> o Fix cases where things write to user space
> in TASK_INTERRUPTIBLE as well as some other
> odd quirks (Ben LaHaise et al)
>
> I believe this was the point:
>
> void ___wait_on_page(struct page *page)
> {
>     struct task_struct *tsk = current;
>     DECLARE_WAITQUEUE(wait, tsk);
>
>     add_wait_queue(&page->wait, &wait);
>     do {
>         sync_page(page);
>         set_task_state(tsk, TASK_UNINTERRUPTIBLE);
>         if (!PageLocked(page))
>             break;
>         schedule();
>     } while (PageLocked(page));
>     tsk->state = TASK_RUNNING;
>     remove_wait_queue(&page->wait, &wait);
> }
>
> Hmm. task state is set UNINTERRUPTIBLE explicitly before going to schedule().
> Makes me wonder why it is then set RUNNING before accessing user space
> in n_tty.c.
>
> Note to Alan, changes to es1370 look at a glance problematic in this
> respect too. I believe there are recent bug reports which fit the bill.
Thanks for any help on these questions,
-- Al Borchers and Peter Berger