Re: odd memory corruption in 2.5.27?

george anzinger (george@mvista.com)
Tue, 23 Jul 2002 13:32:11 -0700


Zwane Mwaikambo wrote:
>
> On Tue, 23 Jul 2002, Trond Myklebust wrote:
>
> > Just means that some RPC message reply from the server was crap. We should
> > deal fine with that sort of thing...
> >
> > AFAICS The Oops itself happened deep down in the socket layer in the part
> > which has to do with reassembling fragments into packets. The garbage
> > collector tried to release a fragment that had timed out and Oopsed.
> >
> > Suggests either memory corruption or else that the networking driver is doing
> > something odd ('cos at that point in the socket layer *only* the driver + the
> > fragment handler should have touched the skb).
>
> Thanks, that helps quite a bit, i'll see if i can pinpoint it and send it
> to the relevant people.
>
I just spent a month tracking down this issue. It comes
down to the slab allocater using per cpu data structures and
protecting them with a combination of interrupt disables and
spin_locks. Preemption is allowed (incorrectly) if
interrupts are off and preempt_count goes to zero on the
spin_unlock. I will wager that this is an SMP machine.
After the preemption interrupts will be on (schedule() does
that) AND you could be on a different cpu. Either of these
is a BAD thing.

The proposed fix is to catch the attempted preemption in
preempt_schedule() and just return if the interrupt system
is off. (Of course there is more that this to it, but I do
believe that the problem is known. You could blow this
assertion out of the water by asserting that the machine is
NOT smp.)

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/