Re: [lkcd-devel] Re: What's left over.

Werner Almesberger (wa@almesberger.net)
Tue, 5 Nov 2002 15:00:48 -0300


Suparna Bhattacharya wrote:
> Yes, we are putting [MCORE] in as one of the alternative dump targets
> available.

Great !

> Its not quite ready yet and we need something like kexec to be
> available which we can use on Intel systems to achieve the softboot
> (the acceptance status of that still doesn't seem to be clear),

Yes, I've just checked with Eric, and he hasn't received any
indication from Linus so far. I posted a reminder to linux-kernel.
I'd really hate to see kexec miss 2.6.

> Why do we even consider the other options when we are doing
> this already ? Well, as we discussed earlier there's non-disruptive
> dumps for one, where this wouldn't work.

But they're very different anyway, aren't they ? I mean, you could
even implement them (well, almost) from user space, with today's
kernels.

> The other is that before overwriting
> memory we need to be able to stop all activity in the system for certain
> (system may appear hung/locked up) and I'm not fully certain about
> how to do this for all environments. Maybe an answer lies in
> rethinking some parts of the algorithm a bit.

This is certainly the hairiest part, yes. I think we have about
four types of devices/elements to worry about:

- those that just sit there, and never talk unless spoken to
- those that may generate interrupts
- those that DMA if you ask them nicely
- those that DMA when they feel like it (e.g. copy an incoming
network packet to the next buffer in the free list)

The latter are the real problem. I see the following possibilities
for dealing with them:

- faith-based computing: pray that nothing bad will befall your
system :-)
- de-activate them individually. There should be a lot of work
that can be shared with power management. And that's one of
the reasons why I think the memory target should be available
early, or convergence will take forever.
- try to reset them, without necessarily knowing what they are
or what they do. I don't know is there is a useful way for
resetting the PCI bus by software, but if there is one, this
looks like the most promising strategy to me, even if it may
be somethat lacking in elegance.
- if all else fails, maybe introduce an "unsafe" memory type
for potential DMA targets of unpredictable devices, that will
not be re-used. I hope we won't need this, though. (But in case
such a memory type gets introduced by the memory-scrubbers, at
least you could blame _them_ :-)

> The general feeling here is that a one solution fits all thing
> may not work best right now ... and hence the focus on an interface
> based approach that gives us the needed flexibility.

Yes, this is certainly important. I just think that the "memory
target" concept is closer to a general solution than disk dumps.
But there are always those 5% with special needs, and it's good
if they can use the same framework.

Thanks,
- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/