Re: Dazed and Confused

Gilad Ben-Yossef (gilad@benyossef.com)
08 Dec 2002 13:59:41 +0200


On Fri, 2002-12-06 at 16:55, Greg Boyce wrote:

> I have an issue that I've been trying to track down for some time, and I
> was hoping that someone might be able to provide me with a definitive
> awnser.
>
> I work in a company with a large number of Linux machine deployed all
> around the country, and in some of the machines we've been seeing the
> following error:
>
> Uhhuh. NMI received. Dazed and confused, but trying to continue
> You probably have a hardware problem with your RAM chips

I have had the exact same error happen a while back on a 2.2.x kernel.
It did not seem to hurt anything but it made the QA dept. go bonkers so
I've spent some time chasing it down and found out what caused it back
then - perhaps the same, or similar, applies to your setup as well:

The machines in question were Intel ISP1100 1U servers and for various
non important reasons I have built the kernel which they were running
without APM support. Now these machines have 3 small non marked buttons
on their front - one is the power button, one is the reset button and
one was a suspend button.

What I found out was that whenever anyone pressed the "suspend" button
(usually because they meant to press the power or reset buttons and
missed) the error in questions was logged. It seems that APM suspend is
implemented (at least on those machines) as an NMI, and if you compiled
the kernel sans APM support the NMI handling code simply did not grok
that specific NMI and thus reported said error, which was otherwise
harmless.

Hope this helps,
Gilad.

-- 
 Gilad Ben-Yossef <gilad@benyossef.com> 
 http://benyossef.com 
 "Denial really is a river in Egypt."

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/