This means that if the events below happen, then the kernel could
"forget" to send tlb flush IPI's.
I'd propose a new hook in the main scheduler, do you have a better idea
how to fix the problem?
linux/kernel/sched.c:
> struct mm_struct *mm = next->mm;
> struct mm_struct *oldmm = prev->active_mm;
> if (!mm) {
> if (next->active_mm) BUG();
> next->active_mm = oldmm;
> atomic_inc(&oldmm->mm_count);
> + enter_lazy_mm(oldmm, next, this_cpu);
> } else {
> if (next->active_mm != mm) BUG();
> switch_mm(oldmm, mm, next, this_cpu);
> }
>
> if (!prev->mm) {
> prev->active_mm = NULL;
> mmdrop(oldmm);
> }
And here's the problem:
CPU0: CPU1
[user program]
[page stealer]
switch_mm()
* changes %cr3
* sets bit0 in "next_mm->cpu_vm_mask"
flush_tlb_page()
* resets "tsk->cpu_vm_mask" to 0.
* sends an IPI to all cpu's.
* the IPI arrives, but "current" still points to the old thread.
Therefore IPI doesn't update "next_mm->cpu_vm_mask", instead
it updates "oldmm->cpu_vm_mask" [wrong, but harmless]
* switch_to() changes current.
* the cpu returns to user space.
flush_tlb_page()
* "tsk->cpu_vm_mask" is zero
(at least bit0 is zero)
no IPI.
* cpu0 is not flushed
* the cpu could use outdated tlb entries.
-- Manfred- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/