Re: Kexec, DMA, and SMP

Eric W. Biederman (ebiederm@xmission.com)
09 Feb 2003 11:39:27 -0700


Corey Minyard <cminyard@mvista.com> writes:

> The panic case is actually the most interesting for us. We are using bootimg
> with the MCL coredump to take a kernel core to memory and pick it up on the next
> boot.

[snip]

With respect to DMA and SMP handling for kexec on panic that case is
much trickier. A lot of the normal methods simply don't apply because
by definition in a panic something is broken, and that something may
be the code we need to cleanly shutdown the hardware. But I an not
ready to sacrifice a method that works well in a properly working
kernel just because the panic case can't use it.

In getting it working I suggest we start with the easy cases, where
DMA and SMP are not big issues. And then we can have a working
framework.

I am still digesting the crash dump code I have seen, but as far as I
can tell what it does is to compress the contents of memory, for
writing out later.

To handle the hard cases for kexec on panic I would recommend the
following.

- Place the recovery code in a reserved area of memory that the normal
kernel will not touch, and actually run the code there. This
trivially solves the DMA problem because the hardware is not DMA'ing

- Setup the kernel that does the recovery so that the pool of memory
it uses for dynamic allocations is also in the reserved area of
memory so that it is equally free of DMA dangers.

- Modify the kernel that does the recovery so it can be run at
different physical address from the standard kernel, so it will not
need to be moved out of the reserved area of memory.

- Modify the kernel that does the recovery to not care about
which cpu in a SMP system it comes up on first.

- Modify the kernel that does the recovery so that it is very robust
in reinitializing devices. So it can cope with devices in a random
state. Though most devices can be handled by simply ignoring them.

- Possibly preserve in the reserved area a separate copy of the tables
ACPI/MP/etc that the kernel needs for coming up. I actually don't
think this needs to happen as the kernel preserves those in place
already.

At that point I believe a full memory core dump can be achieved
without needing to do anything except to jump to the other kernel
on panic. All of the memory can be preserved because the kexec case
would not have touched it.

I find this very attractive because it can be done with a very low
impact on the primary kernel whose panic we want to capture, plus it
is an extremely robust solution.

The one piece I don't know about is how to prioritize which pieces of
memory are written out first. It is certainly a desirable feature
but do we need that, if we can preserve everything? Or is it easy
enough to get the prioritizing information that we don't care.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/