Re: [patch] sys_epoll ...

Andrew Morton (akpm@digeo.com)
Sun, 20 Oct 2002 19:01:20 -0700


Davide Libenzi wrote:
>
> On Sun, 20 Oct 2002, Andrew Morton wrote:
>
> > + if (ep->eventcnt || !timeout)
> > + 	break;
> > + if (signal_pending(current)) {
> > + 	res = -EINTR;
> > + 	break;
> > + }
> > +
> > + set_current_state(TASK_INTERRUPTIBLE);
> > +
> > + write_unlock_irqrestore(&ep->lock, flags);
> > + timeout = schedule_timeout(timeout);
> >
> > You should set current->state before performing the checks.
>
> Why this Andrew ?
>

Well I'm assuming that you don't want to sleep if, say,
ep->eventcnt is non-zero. The code is currently (simplified):

	add_wait_queue(...);
	if (ep->eventcnt)
		break;
	/* window here */
	set_current_state(TASK_INTERRUPTIBLE);
	schedule();

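If another CPU increments eventcnt and sends this task a wakeup in that
window, the wakeup is lost: the task is still TASK_RUNNING, so the wakeup
changes nothing, and we then set TASK_INTERRUPTIBLE and sleep anyway. The
conventional fix for that is to set the task state before the check: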

	add_wait_queue(...);
	set_current_state(TASK_INTERRUPTIBLE);
	if (ep->eventcnt)
		break;
	/* harmless window here */
	schedule();

So if someone delivers a wakeup in the "harmless window" then this task
still calls schedule(), but the wakeup has turned the state from
TASK_INTERRUPTIBLE into TASK_RUNNING, so the schedule() doesn't actually
take this task off the runqueue. This task will zoom straight through the
schedule() and will then loop back and notice the incremented ep->eventcnt.
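
To make the two orderings concrete, here is a small userspace sketch. It
is not kernel code: struct sim, waker(), schedule_would_sleep(),
racy_order() and fixed_order() are hypothetical names invented for this
illustration, and the "other CPU" is modelled by calling waker() by hand
at the point where the window sits. It only models the state transitions
described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; these are not the kernel's definitions. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct sim {
	enum task_state state;	/* stands in for current->state */
	int eventcnt;		/* stands in for ep->eventcnt */
};

/* The waker: publish the event, then deliver the wakeup. */
static void waker(struct sim *s)
{
	s->eventcnt++;
	s->state = TASK_RUNNING;	/* no effect if already running */
}

/* schedule() only takes the task off the runqueue if it is not runnable. */
static bool schedule_would_sleep(const struct sim *s)
{
	return s->state != TASK_RUNNING;
}

/* Check first, set state second: the wakeup lands in the window. */
static bool racy_order(void)
{
	struct sim s = { TASK_RUNNING, 0 };

	if (s.eventcnt)
		return false;
	waker(&s);			/* window here */
	assert(s.eventcnt == 1);	/* event exists, but the check is behind us */
	s.state = TASK_INTERRUPTIBLE;
	return schedule_would_sleep(&s);
}

/* Set state first, check second: the same wakeup is no longer lost. */
static bool fixed_order(void)
{
	struct sim s = { TASK_RUNNING, 0 };

	s.state = TASK_INTERRUPTIBLE;
	if (s.eventcnt)
		return false;
	waker(&s);			/* harmless window here */
	return schedule_would_sleep(&s);
}
```

In this model racy_order() returns true (the task goes to sleep with an
event pending) and fixed_order() returns false (schedule() falls straight
through and the loop can recheck eventcnt).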

So it is important that the waker increment eventcnt _before_ delivering
the wake_up, too.
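
The waker-side ordering can be sketched the same way. Again this is a
userspace model under assumptions, not the epoll code: sleeper_prepare(),
publish_event(), wake_up_sim() and stuck_after() are hypothetical helpers,
the interleaving is chosen by hand, and the timeout is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; these are not the kernel's definitions. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct sim {
	enum task_state state;	/* stands in for current->state */
	int eventcnt;		/* stands in for ep->eventcnt */
};

/*
 * One pass of the sleeper's loop up to schedule(): set the state, then
 * check. Returns true if an event was seen and the loop is left.
 */
static bool sleeper_prepare(struct sim *s)
{
	s->state = TASK_INTERRUPTIBLE;
	if (s->eventcnt) {
		s->state = TASK_RUNNING;
		return true;
	}
	return false;
}

static void publish_event(struct sim *s) { s->eventcnt++; }
static void wake_up_sim(struct sim *s)   { s->state = TASK_RUNNING; }

/* Returns true if the sleeper ends up asleep while an event is pending. */
static bool stuck_after(bool increment_first)
{
	struct sim s = { TASK_RUNNING, 0 };

	if (sleeper_prepare(&s))
		return false;
	assert(s.state == TASK_INTERRUPTIBLE);	/* now asleep in schedule() */

	if (increment_first) {
		publish_event(&s);
		wake_up_sim(&s);
		/* sleeper returns from schedule() and loops back */
		if (sleeper_prepare(&s))
			return false;
	} else {
		wake_up_sim(&s);
		/* sleeper rechecks before the increment has landed */
		if (sleeper_prepare(&s))
			return false;
		publish_event(&s);	/* no second wakeup follows */
	}
	return s.state == TASK_INTERRUPTIBLE && s.eventcnt > 0;
}
```

Here stuck_after(true) returns false and stuck_after(false) returns true:
if the wakeup goes out before the increment, the sleeper can recheck, see
nothing, and go back to sleep on top of an event nobody will wake it for.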