Re: Patch: linux-2.5.8-pre1/kernel/exit.c change caused BUG() atboot

Andrew Morton (akpm@zip.com.au)
Thu, 04 Apr 2002 12:54:12 -0800


Robert Love wrote:
>
> ...
> Do you think it is better to deny preemption if state==TASK_ZOMBIE (note
> this requires code in preempt_schedule and the interrupt return path,
> since Ingo decoupled the two) or just disable preemption around critical
> regions caused by setting state to TASK_ZOMBIE ?
>
> I suspect this is the first occurrence of a problem of this kind ... and
> the attached patch handles it.
>

No, the problem goes deeper than this.

I have code which does, effectively:

sleeper()
{
spin_lock(&some_lock);
set_current_state(TASK_UNINTERRUPTIBLE);
some_flag = 0;
spin_unlock(&lock);
schedule();
if (some_flag == 0)
i_am_horribly_confused();
}

waker()
{
spin_lock(&some_lock);
some_flag = 1;
wake_up_process(sleeper);
spin_unlock(&some_lock);
}

or something like that. See __pdflush() and
pdflush_operation() in http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.8-pre1/delalloc/dallocbase-60-pdflush.patch

The above code work fine, is nice and I want to keep
it that way. But it fails on preempt.

The spin_unlock() in sleeper() can sometimes set
task->state to TASK_RUNNING(), so my schedule() call
just falls straight through.

Probably nobody has noticed this in other places because
most sleep/wakeup stuff tends to be done inside a loop;
the bogus "wakeup" is ignored.

Although it can be worked around at the call site, I
think this needs fixing. Otherwise we have the rule
"spin_unlock will flip you into TASK_RUNNING 0.0001%
of the time if CONFIG_PREEMPT=y". ug.

I have thought deeply about this, and I then promptly
forgot everything I thought about, but I ended up
concluding that the sanest way of resolving this is
inside __set_current_state(). If the new state is
TASK_RUNNING and the old state is not TASK_RUNNING
then enable preemption, call schedule() if necessary, etc.

It is not acceptable to just say "don't preempt a task
which is not in state TASK_RUNNING", because if an
interrupt happens against a CPU which is running a task
which is in state TASK_INTERRUPTIBLE (say), then that
wakeup won't be serviced until the task exits the kernel.

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/