Re: random reboots of diskless nodes - 2.4.7 (fwd)

n0ano@indstorage.com
Fri, 19 Oct 2001 09:52:07 -0600


My first guess would be power. You said you tested the power source.
Can you get ahold of a power line monitor with a strip chart recorder?
You might have a situation where the power is normally fine but for some
reason it could fluctuate at times and kick a machine into reset.
I assume you've eliminated the possibility of the janitor who unplugs
a machine to find an outlet for his floor polisher (don't laugh, it's
happened).

How's the temperature on the machines? Even if it's OK it would be
good to get another strip chart recorder on it to make sure the temp
stays within bounds 24hrs/day.

Also, do you have a serial console attached to each machine? This is
the only reliable way to make sure you have every console message that
came out right before the reboot.

On Tue, Oct 16, 2001 at 02:28:46AM +0200, Ryan Sweet wrote:
>
> I've posted about this problem before, but in the meantime I've managed to
> test under several different configurations to help rule out some possible
> causes.
>
> Short version: 2.4.7 on nfsroot diskless nodes randomly re-boots and I
> don't think it is a hardware problem or a problem with the server (which
> is stable). Rather than "try this, try that..." I (and more importantly
> my boss) would really like to find (and then hopefully fix) the root cause
> of the problem.
>
>
>...
>
> - upgraded (replaced) the power supply in all nodes
> - tested power source to computer room, moved to another computer
> room with better available power, etc...

-- 
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
n0ano@indstorage.com
Ph: 303/652-0870x117
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/