Re: Oops in sched.c on PPro SMP

Peter Waechtler (pwaechtler@mac.com)
Mon, 16 Sep 2002 23:16:20 +0200


Am Montag den, 16. September 2002, um 17:44, schrieb Andrea Arcangeli:

> On Mon, Sep 16, 2002 at 03:49:27PM +0100, Alan Cox wrote:
>> Also does turning off the nmi watchdog junk make the box stable ?
>
> good idea, I didn't though about this one since I only heard the nmi to
> lockup hard boxes after hours of load, never to generate any
> malfunction, but certainly the nmi handling isn't probably one of the
> most exercised hardware paths in the cpus, so it's a good idea to
> reproduce with it turned off (OTOH I guess you probably turned it on
> explicitly only after you got these troubles, in order to debug them).
>

I only turned the nmi watchdog on, on the one "unknown" version Oops.

This box was running fine with 2.4.18-SuSE with uptimes 40+days. _Now_
I am almost sure, that it's _not_ a hardware problem (FENCE counting
here as software - since there is a software workaround).

I had 3 lockups in 2 days, when I switched to 2.4.19 - and even lower
room temperature. No, there _must_ be a bug :)

With the relocation you are right - I thought it would test against
NULL :-(

I think that the tasklist is broken inbetween - either due to broken
readlocks (no working EFENCE on PPRO)

Can someone explain me the difference for label 1 and 2?
Why is the "js 2f" there? This I don't understand fully -
it looks broken to me.

include/asm-i386/rwlock.h

#define __build_read_lock_ptr(rw, helper) \
asm volatile(LOCK "subl $1,(%0)\n\t" \
"js 2f\n" \
"1:\n" \
LOCK_SECTION_START("") \
"2:\tcall " helper "\n\t" \
"jmp 1b\n" \
LOCK_SECTION_END \
::"a" (rw) : "memory")

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/