Re: shm bugs revisited

Manfred Spraul (manfreds@colorfullife.com)
Thu, 27 Jan 2000 21:51:50 +0100


Christoph Rohland wrote:
>
> Hi folks,
>
> I am still trying to hunt down the shm bugs I am experiencing under
> high smp load. I reported kernel oopses during swapping.
>
> But I now realized that errors occur without even swapping. My test
> program shows inconsistent data on rereading the segment.
>

I think I found a possible source: there's a tiny window where the tlb's
are not flushed properly. The window is about as large as the
"current->active_mm == NULL" window, ie. you could find it on your
8-way box:

CPU0: CPU1
[shm test program]
[page stealer]
switch_mm()
* changes %cr3
* sets "next_mm->cpu_vm_mask"

flush_tlb_page()
* resets "current->cpu_vm_mask"
* sends an IPI to all cpu's.
* the IPI arrives, but "current" still points to the old thread
* "next_mm->cpu_vm_mask" is not updates.
* switch_to() changes current.
* the cpu returns to user space.

flush_tlb_page()
* "current->cpu_vm_mask" is zero
(at least bit0 is zero)
no IPI.
* cpu0 is not flushed
* the cpu could use outdated tlb entries.

I'm working on a solution. I think the tlb flush interrupt must not
access "current->{active_,}mm", because %cr3 and %esp are not exchanged
in one atomic operation.

--
	Manfred

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/