> Gary Sandine of Los Alamos Computers and I are attempting to qualify
> Linux on a Tyan 2462 K7 Thunder motherboard -- dual Athlon 1200 MP
> chips supported by an AMD 760 chipset.  We have been seeing mysterious
> lockups during commands to build things from source, like kernels and X.
> 
> We've been trying to track down the problem for about sixteen hours
> and have gathered quite a bit of data, but don't have a theory to  explain
> it.
What kind of case are you running in?  I have heard of one other report
that sounds similar, and there the system was in a 1U chassis.
 
> First, we have established that this is a real kernel hang, not just a 
> bad device state:
> 
> A. Lockups can be induced in either console or X mode.  A reliable way to 
>    induce them is to run `make clean' on an X tree (any sufficiently 
>    long-running command seems to do it).
> 
> B. We logged in over the network, started a top(1) in the network
>    session, induced the hang on the console, and watch top(1) freeze.
>    So 
> 
> C. The magic AltSysRq command is ineffective when the lockups happen.
> 
> Here's what we know about it:
> 
> 1. Lockups never occur under a uniprocessor kernel.
> 
> 2. Configuring APM and ACPI out of the kernel does not prevent the lockups.
>    Disabling ACPI and power management doesn't stop them either.
> 
> 3. Changing kernels from 2.4.3 to 2.4.7 doesn't prevent the lockups.
> 
> 4. The SMP kernel built for either PII or AMD (no APM, no ACPI) locks up.
> 
> 5. There is an undocumented BIOS setting "Use PCI Interrupt Entries in 
>    MP table."  By default it is on.  Turning it off doesn't prevent the
>    lockups.
This switches how the 4 interrupts that the board uses for PCI are listed:
either within the ISA range of interrupts, or routed to the IOAPIC
above the normal 16 ISA interrupts.
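
One rough way to see which of the two modes a given boot ended up in is to
look at /proc/interrupts and check whether any device sits on an IRQ above
the 16 ISA ones.  The sketch below is only an illustration: the function
name, the isa_limit parameter and the SAMPLE text are made up for this
example (the sample is not output from the Tyan board), and it assumes the
usual layout of one header line of CPU columns followed by "N: counts...
controller device" lines.

```python
# Hypothetical sample in the usual /proc/interrupts layout; NOT real
# output from the machine discussed in this thread.
SAMPLE = """\
           CPU0       CPU1
  0:     123456     123450    IO-APIC-edge  timer
  1:        500        480    IO-APIC-edge  keyboard
 17:       1000        990   IO-APIC-level  eth0
NMI:          0          0
"""


def irqs_above_isa(proc_interrupts_text, isa_limit=16):
    """Return {irq: "controller device"} for numeric IRQs >= isa_limit."""
    lines = proc_interrupts_text.splitlines()
    if not lines:
        return {}
    # The header line has one column per CPU (CPU0, CPU1, ...).
    ncpus = len(lines[0].split())
    result = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        label = fields[0][:-1]
        if not label.isdigit():
            continue  # skip NMI:, LOC:, ERR: and similar rows
        irq = int(label)
        if irq >= isa_limit:
            # Everything after the per-CPU counters: controller + device(s).
            result[irq] = " ".join(fields[1 + ncpus:])
    return result
```

On a live system you would feed it open("/proc/interrupts").read().  An
empty result suggests the PCI interrupts are being listed in the ISA range;
entries at 16 and above suggest they are routed through the IOAPIC.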
 
> 6. Here's a weird one.  When the kernel is running, the power switch
>    has to be pressed down for 4 seconds to power down the machine.  But
>    during a lockup it powers down the machine instantly.
> 
> What we're seeing suggests some bad interaction between the SMP
> support and the hardware.  But item 7 hints that power management
> could be involved, even though we have it configured out.
The board only uses ACPI, so power management isn't a strong candidate.
I think I have to go with Alan that the most likely explanation is that the
board is marginal in some respect.
Eric