No, the problem goes deeper than this.
I have code which does, effectively:
sleeper()
{
	spin_lock(&some_lock);
	set_current_state(TASK_UNINTERRUPTIBLE);
	some_flag = 0;
	spin_unlock(&lock);
	schedule();
	if (some_flag == 0)
		i_am_horribly_confused();
}
waker()
{
	spin_lock(&some_lock);
	some_flag = 1;
	wake_up_process(sleeper);
	spin_unlock(&some_lock);
}
or something like that.  See __pdflush() and 
pdflush_operation() in http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.8-pre1/delalloc/dallocbase-60-pdflush.patch
The above code work fine, is nice and I want to keep
it that way.  But it fails on preempt.
The spin_unlock() in sleeper() can sometimes set
task->state to TASK_RUNNING(), so my schedule() call
just falls straight through.
Probably nobody has noticed this in other places because
most sleep/wakeup stuff tends to be done inside a loop;
the bogus "wakeup" is ignored.
Although it can be worked around at the call site, I
think this needs fixing.  Otherwise we have the rule
"spin_unlock will flip you into TASK_RUNNING 0.0001%
of the time if CONFIG_PREEMPT=y".  ug.
I have thought deeply about this, and I then promptly
forgot everything I thought about, but I ended up
concluding that the sanest way of resolving this is
inside __set_current_state().  If the new state is
TASK_RUNNING and the old state is not TASK_RUNNING
then enable preemption, call schedule() if necessary, etc.
It is not acceptable to just say "don't preempt a task
which is not in state TASK_RUNNING", because if an
interrupt happens against a CPU which is running a task
which is in state TASK_INTERRUPTIBLE (say), then that
wakeup won't be serviced until the task exits the kernel.
-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/