Re: random reboots of diskless nodes - 2.4.7 (fwd)

Keith Owens (kaos@ocs.com.au)
Tue, 16 Oct 2001 14:58:37 +1000


On Tue, 16 Oct 2001 02:28:46 +0200 (CEST),
Ryan Sweet <rsweet@atos-group.nl> wrote:
>Questions:
>- what the heck can I do to isolate the problem?

Debugger over a serial console.

>- why would the system re-boot instead of hanging on whatever caused it to
>crash (ie, why don't I see an oops message?)

Probably a triple fault on ix86, which forces a reboot. That is, a
fault was detected, trying to report that fault caused a second fault,
and handling the second fault caused a third. At that point the
processor shuts down and the board resets it, so nothing gets the
chance to print an oops. Say goodnight, Dick. The other main
possibility is a hardware or software watchdog that thinks the system
has hung and is forcing a reboot. Do you have one of those?

>- how can I tell the system not to re-boot when it crashes (or is it
>crashing at all???)

If it is a triple fault, you have to catch the error before the third
fault. Tricky.

>- is it worth trying all the newer kernel versions (this does not sound
>very appealing, especially given the troubles reported with 2.4.10 and
>also the split over which vm to use, etc..., also the changelogs don't
>really point to anything that appears to precisely describe my problem)?

Maybe. OTOH if you wait until you have captured some diagnostics, they
will give you a better indication of whether the later kernels actually
fix the problem.

>- if I patch with kgdb and use a null modem connection from the gateway to
>run gdb can I expect to gain any useful info from a backtrace?

It is definitely worth trying kgdb or kdb[1] over a serial console. I
am biased towards kdb (I maintain it) but either is worth a go.

Unfortunately the most common cause of a triple fault is a kernel stack
overflow, and the ix86 kernel design has no way to recover from that
error. The error handler needs stack space to report anything, and
both kgdb and kdb need stack space as well. If you suspect stack
overflow, look at the IKD patch[2]; it has code to warn about potential
stack overflows before they get completely out of hand.
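
To make the stack overflow case concrete, here is a rough sketch of the
arithmetic an early-warning check could rest on. It is not the IKD
code. It assumes the 2.4 ix86 layout, where each task gets an 8 KiB
aligned block with the task_struct at the bottom and the kernel stack
growing down towards it from the top. TASK_STRUCT_SIZE and STACK_WARN
are made-up illustrative numbers, and the function names are
hypothetical. It is written as plain user-space C so the arithmetic
can be tried outside the kernel.

```c
#include <stdint.h>

/* Assumed 2.4 ix86 layout: 8 KiB per task, task_struct at the bottom. */
#define THREAD_SIZE      8192UL
/* Hypothetical value; the real size depends on the kernel config. */
#define TASK_STRUCT_SIZE 1700UL
/* Hypothetical threshold: warn when less than this much stack is left. */
#define STACK_WARN       1024UL

/*
 * Bytes left between the stack pointer and the top of task_struct.
 * A negative result means the stack has already run into task_struct.
 */
static long stack_bytes_free(uintptr_t sp)
{
    /* Offset of sp inside its 8 KiB aligned block. */
    uintptr_t offset = sp & (THREAD_SIZE - 1);

    return (long)offset - (long)TASK_STRUCT_SIZE;
}

/* Non-zero when the remaining stack is below the warning threshold. */
static int stack_is_low(uintptr_t sp)
{
    return stack_bytes_free(sp) < (long)STACK_WARN;
}
```

The point of warning at a threshold rather than at zero is the one made
above: once the stack is actually gone, nothing is left to report with.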

[1] ftp://oss.sgi.com/projects/kdb/download/ix86, old for 2.4.7.
[2] ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/ikd/
