Because the second cpu can use a stale tlb entry. We must free the page
_after_ the flush, not before the flush.
Example:
cpu1, cpu2: execute 2 threads from one process.
cpu3: unrelated thread that allocates a new page.
cpu1:
1) writes to one page in a tight loop. The tlb entry won't be discared
by the cpu without an explicit flush.
cpu2:
2) sys_munmap()
* zap_page_range(): calls free_page() for each page in the area.
do_munmap() for a 500 MB block will take a few milliseconds.
cpu3:
3) somewhere: get_free_page(). Now it gets a pointer to a page that cpu1
still writes to.
cpu2:
4) zap_page_range() returns, now the tlb is flushed.
cpu1:
5) received the ipi, the local tlb is flushed, page fault.
* but: cpu1 stomped on the page that was allocated by cpu3.
>
> But a race does exist in establish_pte(), when the flush_tlb happens
> _before_ the set_pte(), another thread might drag in the old translation
> on a different cpu.
>
Yes, establish_pte() is broken. We should reverse the calls:
set_pte(); /* update the kernel page tables */
update_mmu(); /* update architecture specific page tables. */
flush_tlb(); /* and flush the hardware tlb */
> >
> > filemap_sync() calls flush_tlb_page() for each page, but IMHO this is a
> > really bad idea, the performance will suck with multi-threaded apps on
> > SMP.
>
> The best you can do probably is a flush_tlb_range?
filemap_sync() calls both flush_tlb_range() and flush_tlb_page() on each
page, I just don't know which call I should remove :-/
Btw, I couldn't find an update_mmu_range() function, what's the exact
purpose of update_mmu_cache()? mips64 uses this function.
Can't we merge update_mmu_cache() with flush_tlb_page()?
-- Manfred
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/