Re: [lkcd-devel] Re: What's left over.

Suparna Bhattacharya (suparna@in.ibm.com)
Wed, 6 Nov 2002 12:08:54 +0530


On Tue, Nov 05, 2002 at 10:25:35PM -0800, Linus Torvalds wrote:
>
> On 5 Nov 2002, Eric W. Biederman wrote:
> >
> > In replying to another post by Al Viro I managed to think this through.
> > kexec needs:
>
> Note that kexec doesn't bother me at all, and I might find myself using it
> myself.
>
> >From a sanity standpoint, I think the thing already _has_ a system call,
> though: clearly "sys_reboot()" is the place to add a case for "reboot into
> this image". No? That's where we shut down devices anyway, and it's the
> sane place to say "reboot into the kexec image"
>
> Which still leaves you with a real sys_kexec() to actually _load_ the
> image, or course. I think loading of the image should be a totally
> separate event from the actual booting of the image, since we may want to
> load the image early, then do various user-level shutdown (unmounting
> etc), and then reboot.
>
> Right now the kexec() stuff seems to mix up the loading and rebooting, but
> I didn't take a very deep look, maybe I'm wrong.
>
> Anyway, I don't really get why the kexec() system call would not just be
>
> void *kexec_image = NULL;
> unsigned long kexec_size;
>
> int sys_kexec(void *uaddr, size_t len)
> {
> void *new;
>
> if (!capable(CAP_ADMIN))
> return -EPERM;
>
> /* Get rid of old image if any.. */
> if (kexec_image) {
> vfree(kexec_image);
> kexec_image = NULL;
> }
>
> /* Zero length just meant "get rid of it" */
> if (!len)
> return 0;
>
> if (!access_ok(VERIFY_READ, uaddr, len))
> return -EFAULT;
>
> new = vmalloc(len);
> if (!new)
> return -ENOMEM;
>
> if (memcpy_from_user(new, uaddr, len)) {
> vfree(new);
> return -EFAULT;
> }
>
> kexec_image = new;
> kexec_size = len;
> return 0;
> }
>
> and be done with it that way? Then the actual "reboot" (and that would be
> in the existing "sys_reboot()") basically just does something like
>
> memcpy(kernelbase, kexec_image, kexec_size);
>
> at the very end (while obviously having to be careful about itself being
> out of the way. It can avoid the page table issue by using the "page *"
> array that vmalloc uses internally anyway: see "area->pages[]" in
> vmalloc).
>
> Note that the two-phase boot means that you can load the new kernel early,
> which allows you to later on use it for oops handling (it's a bit late to
> try to set up the kernel to be loaded at that time ;)

Yes, that's exactly what we need to support a soft-boot based dump
mechanism, much like the Mission Critical folks split up the bootimg
syscall to do the early load on a sane system, and the actual soft-boot
at crash time. And it fits in naturally as you point out ..

Regards
Suparna

>
> Linus
>

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/