Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))]

Rob Landley (landley@webofficenow.com)
Tue, 10 Jul 2001 11:28:25 -0400


On Tuesday 10 July 2001 05:17, Ville Herva wrote:
> On Mon, Jul 09, 2001 at 12:48:59PM -0400, you [Rob Landley] claimed:
> > (P.S. What kind of CPU load is most likely to send a processor into
> > overheat? (Other than "a tight loop", thanks. I mean what kind of
> > instructions?) This is going to be CPU specific, isn't it? Our would a
> > general instruction mix that doesn't call halt be enough? It would need
> > to keep the FPU busy too, wouldn't it? And maybe handle interrupts.
> > Hmmm...)
>
> See Robert Redelmeier's cpuburn:
>
> http://users.ev1.net/~redelm/

Cool. If nothing else, this is a much better starting point for further work
than starting from scratch...

> It is coded is assembly specificly to heat the CPU as much as possible. See
> the README for details, but it seems that floating point operations are
> tougher than integers and MMX can be even harder (depending on CPU model,
> of course). Not sure what kind of role SSE, SSE2, 3dNow! play these days.
> Perhaps Alan knows?

There's at least three seperate things that need testing here. memtest86
tests whether your memory is OK. CPUburn seems to do a good job testing
processor heat (not that I'm running it on my laptop, which doesn't seem to
have a thermal readout thingy anyway...)

The third thing (which started this thread) was memory bus. The new 3DNow
optimizations drove a memory bus into failure, and that IS processor
specific...

> The gcc compile is a good test for many other tests - it uses a lot of
> memory with complex pointers references (tests memory, and bit errors in
> pointers are likely to sig11 rather than produce subtle errors in output),
> stresses chipset somewhat (memory throughput), and cpu somewhat. But to
> test CPU overheating and nothing else, cpuburn should be a lot better.
> (Even seti@home is better as it uses FPU). Just run them an observe the
> sensors readings. Cpuburn gets several degrees higher.

The downside of a test like gcc is that it does test many things, meaning
when it fails you still don't know why.

memtest86 is great becuase it ONLY tests memory. CPUburn is similarly
specific. A memory bus buster would be a good tool to add to the mix. (DMA
is another common problem, but the more I look into it, the more it seems to
be dependent on whatever peripheral you're talking to, which is more
complication than I'm looking to bite off...)

The downside of memtest86 is that your system can pass it and still have an
obvious problem (for example, overclocking stresses both memory bus AND
heat...)

It might be possible to put all three testers into a menu where you could
switch on and off what you wanted to test, and run them overnight. That way,
if you are testing for three things (perhaps alternating tests every few
minutes?), and you get it to fail, you can switch some off to get more
specific tests to narrow down the problem...

> > the compile in a loop, add in a processor temperature detector daemon to
> > kill the test and HLT the system if the temperature went too high...
>
> Cpuburn exists when CPU miscalculates something (sign of overheat).
>
> I'm not sure if cpuburn is included in cerberus these days (istr it is),
> but a nice test set for memory, cpu, disk etc to run over night or over
> weekend to catch most of the hw faults would definetely be nice.

I've heard of ceberus but thought it was just a disk test suite... One more
thing to download and look into... (If the tests in it can be switched
on/off, maybe this is what I'm looking for...)

Rob
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/