Re: OOM killer???

Sean Hunter (sean-lk@dev.sportingbet.com)
Thu, 29 Mar 2001 13:02:38 +0100


On Thu, Mar 29, 2001 at 01:01:54PM +0200, Guest section DW wrote:
> [Never use planes where the company's engineers spend their
> time designing algorithms for selecting which passenger
> must be thrown out when the plane is overloaded.]

This is (as far as I can see) a fantastically specious argument. A plane is
designed to function in an entirely constrained mode of operation and in an
entirely well-understood and circumscribed problem space, whereas a linux host
is a general-purpose device which can be used for many different applications.

The reason the aero engineers don't need to select a passanger to throw out
when the plane is overloaded is simply that the plane operators do not allow
the plane to become overloaded. If I put a 100 people in a trilander it may
take off, but won't fly, and will probably crash. The plane's designers don't
have to do anything about that- we do.

Furthermore, why do you suppose an aeroplane has more than one altimeter,
artifical horizon and compass? Do you think it's because they are unable to
make one of each that is reliable? Or do you think its because they are
concerned about what happens if one fails _however unlikely that is_.

In fact, aeroplane engineers do design in ways of mitigating the effects of all
kinds of failures, including lessening the impact of a crash (directly
analogous to our OOM killer). For example, they provide means of jettisonning
fuel prior to crash landing to attempt to minimise explosions.

Risk management is about lessening impact as well as lessening probability. If
something is important, you don't only make it work as well as you can, you
mitigate the effect of failure. A reliable system is not just a strong belt,
it is belt, braces, suspenders and bicycle clips.

I have seen the OOM killer in operation three times on our production servers.
In each case it kept the machine alive in the face of hostile runaway
processes. I don't want to see things killed, but if that is the only way to
keep the host alive, I vote to keep it alive.

When I'm on a plane, I want more than one engine _and_ lifejackets.

Sean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/