Re: Faster reboots (and a better way of taking crashdumps?)

Jeremy Jackson (jerj@coplanar.net)
Fri, 5 Apr 2002 18:55:33 -0800


----- Original Message -----
From: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>
Sent: Friday, April 05, 2002 5:48 PM

> I need to avoid going through the BIOS ... this is a
> multiquad NUMA machine, and it doesn't take kindly
> to the reboot through the BIOS for various reasons.
> It also takes about 4 minutes, which is a pain ;-)
>
> I have source code access to our BIOS if I really wanted,
> I just want to avoid modifying it if possible.

well keep in mind that the fastest LinuxBIOS boot is 3 seconds...
a large part of the boot time on most PCs is the BIOS setting up
DOS support and painting silly logos on the screen, all of which
can go away. I'm guessing your NUMA system has a bit more
to do at this stage due to the hardware, but still...

>
> > there are patches where a kernel can load another
> > kernel, also.
>
> Hmmm ... sounds interesting ... any pointers?

LOBOS,

http://www.acl.lanl.gov/linuxbios/papers/index.html
http://www.usenix.org/publications/library/proceedings/als2000/minnichLOBOS.
html

two kernel monte,

http://www.scyld.com/products/beowulf/software/monte.html

There's also suspend to disk support, which is closely related.
Kind of a restartable crash dump, without the crash:

http://falcon.sch.bme.hu/~seasons/linux/swsusp.html

>
> > As for taking crashdumps on the way up, I believe
> > (SGI's ?) linux kernel crash dumps does *exactly*
> > this.
>
> I was under the impression that most BIOSes reset
> memory on reboot, so this was impossible during a
> BIOS reboot?

oss.sgi.com seems to be down today... but iirc,
it doesn't boot through bios, but stashes some critical state
in a buffer previously reserverd, uses one of the above methods
to boot a new kernel, lets this kernel do the dump, then boots
through the bios to make sure hardware is completely restored
after the crash. I'm sure it could be tailored to suit though.

I'm currently researching combining the two, to create a LinuxBIOS
firmware debug console, which will allow complete crash dump to
be taken after a hardware reset, with the smallest possible Heisenburg
effect, aside from a hardware debugger.

Basically when the kernel panics, it dumps the CPU registers and
resets the CPU. The firmware console makes no alterations to
the state of the hardware, instead running a modified kgdb stub
like routine, possibly without even touching RAM. Also, if an SMI
button is available, this can be used as a hardware break switch,
allowing panics to use an even less invasive HLT instruction.

Jeremy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/