Re: Scheduler Bug (set_cpus_allowed)

Mike Kravetz (kravetz@us.ibm.com)
Mon, 10 Jun 2002 11:50:20 -0700


Robert,

It looks like you got the set_cpus_allowed() optimization removed
in the 2.5.21 release. Thanks!

However, even with this code removed there are still races in the
scheduler. Specifically, load_balance() can race with schedule() in
the code around the call to context_switch(). Below is a patch that
addresses these races. It may not be the most elegant solution, but
it seemed the most straightforward. It also reintroduces the
set_cpus_allowed() optimization (skip the migration thread and simply
update the task's cpu field when the task is neither running nor on a
runqueue), but my main concern is the other races; whether or not the
optimization goes in is no big deal.
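
To make the window concrete, here is roughly the interleaving the
patch guards against. This is a simplified sketch (the real
load_balance() takes both runqueue locks, and the lock ordering is
abbreviated); "prev" is the task that CPU0's schedule() is switching
away from:

        CPU0: schedule()              CPU1: load_balance()
        ----------------              --------------------
        rq->curr = next;
        spin_lock(&rq->frozen);
        spin_unlock(&rq->lock);
                                      spin_lock(&rq->lock);
                                      /* prev != rq->curr, so
                                         CAN_MIGRATE_TASK(prev)
                                         succeeds */
                                      pull prev over to CPU1
        context_switch(prev, next);
        /* prev's register state is
           only now being saved */

The same window also defeats the rq->curr spin loop patched below
(the hunk at line 284): it can decide prev is off the CPU while the
context switch is still in flight. Hence the new rq->prev field: it
is set before rq->lock is dropped and cleared only after the mb()
that follows the switch, so running_on_rq() stays true for the whole
window in both paths.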

-- 
Mike

diff -Naur linux-2.5.21/kernel/sched.c linux-2.5.21-fix/kernel/sched.c
--- linux-2.5.21/kernel/sched.c	Sun Jun 9 05:28:13 2002
+++ linux-2.5.21-fix/kernel/sched.c	Mon Jun 10 18:26:55 2002
@@ -138,7 +138,7 @@
 	spinlock_t frozen;
 	unsigned long nr_running, nr_switches, expired_timestamp;
 	signed long nr_uninterruptible;
-	task_t *curr, *idle;
+	task_t *prev, *curr, *idle;
 	prio_array_t *active, *expired, arrays[2];
 	int prev_nr_running[NR_CPUS];
 	task_t *migration_thread;
@@ -152,6 +152,7 @@
 #define task_rq(p)		cpu_rq((p)->thread_info->cpu)
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define rt_task(p)		((p)->prio < MAX_RT_PRIO)
+#define running_on_rq(p, rq)	(((p) == (rq)->curr) || ((p) == (rq)->prev))
 
 static inline runqueue_t *task_rq_lock(task_t *p, unsigned long *flags)
 {
@@ -284,12 +285,12 @@
 repeat:
 	preempt_disable();
 	rq = task_rq(p);
-	while (unlikely(rq->curr == p)) {
+	while (unlikely(running_on_rq(p, rq))) {
 		cpu_relax();
 		barrier();
 	}
 	rq = task_rq_lock(p, &flags);
-	if (unlikely(rq->curr == p)) {
+	if (unlikely(running_on_rq(p, rq))) {
 		task_rq_unlock(rq, &flags);
 		preempt_enable();
 		goto repeat;
@@ -604,7 +605,7 @@
 #define CAN_MIGRATE_TASK(p,rq,this_cpu)	\
 	((jiffies - (p)->sleep_timestamp > cache_decay_ticks) && \
-		((p) != (rq)->curr) && \
+		(!running_on_rq(p, rq)) && \
 			((p)->cpus_allowed & (1UL << (this_cpu))))
 
 		if (!CAN_MIGRATE_TASK(tmp, busiest, this_cpu)) {
@@ -827,6 +828,7 @@
 	if (likely(prev != next)) {
 		rq->nr_switches++;
+		rq->prev = prev;
 		rq->curr = next;
 		spin_lock(&rq->frozen);
 		spin_unlock(&rq->lock);
@@ -840,6 +842,7 @@
 		 */
 		mb();
 		rq = this_rq();
+		rq->prev = NULL;
 		spin_unlock_irq(&rq->frozen);
 	} else {
 		spin_unlock_irq(&rq->lock);
@@ -1687,6 +1690,16 @@
 		task_rq_unlock(rq, &flags);
 		goto out;
 	}
+
+	/*
+	 * If the task is not on a runqueue (and not running), then
+	 * it is sufficient to simply update the task's cpu field.
+	 */
+	if (!p->array && !running_on_rq(p, rq)) {
+		p->thread_info->cpu = __ffs(p->cpus_allowed);
+		task_rq_unlock(rq, &flags);
+		goto out;
+	}
 
 	init_MUTEX_LOCKED(&req.sem);
 	req.task = p;