Re: OOM killer???

Andreas Dilger (adilger@turbolinux.com)
Tue, 27 Mar 2001 12:55:50 -0700 (MST)


Martin Dalecki writes:
> Ingo Oeser wrote:
> > So as Rik stated: The OOM killer cannot suit all people, so it
> > has to be configurable, to be OOM kill, not overkill ;-)
>
> Irony: Why then not store this information permanently - inside
> the UID of the application?

Because in some cases (large companies and such) the UID is centrally
controlled across all machines in the company, so there are > 100 (or
500 or 1000) "system" UIDs. At one company I did work for, there were
dozens (maybe > 100) oracle instances alone (each with different UID
and passwords for security), and lots more "system" application UIDs,
each unique.

Encoding more information into the UID is getting back to the bad old
days of "uid 0" is can do anything, rather than the capability model we
are working towards. Even so, encoding process killability info in the
UID is _still_ not putting policy in user space, because if you don't
like how the OOM killer works you still need to recompile and reboot.

Having a configurable OOM killer is not overkill, IMHO, because it is
only called in very rare cases (i.e. OOM is hopefully a rare event),
so it is definitely not on the fast path. I'm sure people will agree
that spending a few extra cycles to kill the correct process is far
better than killing a lot of incorrect processes quickly.

Every time this subject comes up, I point to AIX and SIGDANGER - a signal
sent to processes when the system gets OOM. If the process has registered
a SIGDANGER handler, it has a chance to free cache and such (or do a clean
shutdown), otherwise the default signal handler will kill the process.

SIGDANGER would fix the original problem (killing numerical methods
application running for weeks) perfectly - the application can freely
allocate cache memory to speed up the calculations. When system gets
OOM (for whatever reason), it sends SIGDANGER to applications first and
they can free buffers or do safe shutdown, and this may get system out
of OOM case without having to kill anything.

Granted, I'm not against fixing the VM to reducing OOM conditions in the
first place. Having SIGDANGER still gives the application a chance to
save itself before it is killed, which none of the OOM changes have
addressed at all. It is _still_ possible to get a system into OOM from
network buffers and such, regardless of whether an application is
calling malloc() returns NULL or not.

Also, having a SIGDANGER handler _could_ reduce a process "badness"
value when looking for processes to SIGKILL, when calling all of the
SIGDANGER handlers has not freed enough memory to get out of OOM. This
assumes that programs which register SIGDANGER handlers are important,
rather than malicious (in which case your system has other problems).

Cheers, Andreas

-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/