[RFC] 2.5 TASK_INTERRUPTIBLE preemption race

Joe Korty (joe.korty@ccur.com)
Mon, 14 Apr 2003 17:54:10 -0400

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Robert Love: "Re: 2.5 'what to expect' document."
Previous message: Måns Rullgård: "3c59x module from 2.5.67 broken on Alpha"

Hi Andrew, Robert, Alan, Everyone,
The following 2.5 code fragment seems unsafe in a preempt
environment. A preemption could occur between the time when
TASK_UNINTERRUPTIBLE is set and the `spin_lock_irqsave'. This would
cause the task to switch out and never come back, as it would have
been switched away while in TASK_UNINTERRUPTIBLE without yet having
put itself onto the semaphore wait queue, where it could be found
later by a wakeup service.

void __down(struct semaphore * sem)
{
struct task_struct *tsk = current;
DECLARE_WAITQUEUE(wait, tsk);
unsigned long flags;

tsk->state = TASK_UNINTERRUPTIBLE;
spin_lock_irqsave(&sem->wait.lock, flags);
add_wait_queue_exclusive_locked(&sem->wait, &wait);
....
}

Is this analysis correct? If it is, perhaps there is an alternative
to fixing these cases individually: make the TASK_INTERRUPTIBLE/
TASK_UNINTERRUPTIBLE states block preemption. In which case the
'set_current_state(TASK_RUNNING)' macro would need to include the
same preemption check as 'preemption_enable'.

I suspect there is already some mechanism in place to prevent this
problem, as I have never seen this lockup happen on any of my
2.4-preempt systems.

Joe

PS: here is an example where the preemption race appears harmless.
If a preemption happens between the 'set_current_state' and
'schedule', it only causes the 'schedule' to NOP: the preemption, on
return, would have changed the state from TASK_UNINTERRUPTIBLE back
to TASK_RUNNING.

void __wait_on_inode(struct inode *inode)
{
DECLARE_WAITQUEUE(wait, current);
wait_queue_head_t *wq = i_waitq_head(inode);

add_wait_queue(wq, &wait);
repeat:
set_current_state(TASK_UNINTERRUPTIBLE);
if (inode->i_state & I_LOCK) {
schedule();
goto repeat;
}
remove_wait_queue(wq, &wait);
__set_current_state(TASK_RUNNING);
}

PPS: the above may need a 'mb()' between the 'add_wait_queue' and
'set_current_state' regardless.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Robert Love: "Re: 2.5 'what to expect' document."
Previous message: Måns Rullgård: "3c59x module from 2.5.67 broken on Alpha"