Re: VM nuisance

Szabolcs Szakacsits (szaka@f-secure.com)
Fri, 17 Aug 2001 16:34:02 +0300 (MET DST)


On Tue, 14 Aug 2001, Helge Hafting wrote:
> dean gaudet wrote:
> > i would happily give up 10 to 20% system resources for checkpoint
> > overhead if it meant that i'd be that much closer to a crashproof
> > system.

You don't necessarily need that much, and you don't even need
checkpointing if you use (optional) non-overcommitting. See e.g. the
Solaris numbers for yourself,
http://www.google.com/search?q=cache:http://lists.openresources.com/NetBSD/tech-userlevel/msg00722.html
All the big guys support this optionally, sometimes at a finer
granularity: settable per process by the developer and overridable by
the admin (AIX), or using virtual swap space where the degree of memory
overcommitment is controllable (IRIX), etc. Linux had "quasi"
non-overcommitting [it didn't reserve the requested memory, it just
checked at user space allocation time whether there was enough
free(able) memory], controllable by /proc/sys/vm/overcommit_memory in
2.2, but in 2.4 vm_enough_memory() overestimates freeable memory (in
contrast to 2.2 kernels), so basically it's useless. This is one of the
reasons why OOM has been such a trendy topic recently.
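
To make the "checked but didn't reserve" distinction concrete, here is a
minimal user-space sketch. It is not kernel code: struct vm_state,
quasi_check() and strict_reserve() are hypothetical names invented for
this illustration, and all sizes are in pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified accounting state (all values in pages). */
struct vm_state {
    size_t total;      /* RAM + swap usable by user space */
    size_t used;       /* pages actually in use right now */
    size_t committed;  /* pages already promised (strict mode only) */
};

/* "Quasi" non-overcommit: only check that enough pages look free at
 * allocation time. Nothing is reserved, so two callers can both pass
 * the same check and later fight over the same pages. */
static bool quasi_check(const struct vm_state *vm, size_t request)
{
    assert(vm->used <= vm->total);
    return request <= vm->total - vm->used;
}

/* Strict non-overcommit: charge the request against a commit counter,
 * so later requests see that memory as already promised. */
static bool strict_reserve(struct vm_state *vm, size_t request)
{
    assert(vm->committed <= vm->total);
    if (request > vm->total - vm->committed)
        return false;
    vm->committed += request;
    return true;
}
```

With 100 pages in total, two back-to-back 80-page requests both pass
quasi_check(), while the second one is refused by strict_reserve().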

Non-overcommit prevents overcommitting VM, but once VM is full the
system can still either livelock or start killing arbitrary processes,
so non-overcommit alone becomes useless. How did others solve this? They
reserve some VM for root so he can do whatever he wants. Well-written
apps (I could mention e.g. apache) don't really care that the system is
in an OOM situation; they happily do their jobs just as before (proved
in practice ;)

When the root-reserved VM is also used up, welcome OOM killer. However,
with the above two protection bars the chance of this is pretty much 0,
unless the admin prefers running stuff as root.
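
The two protection bars can be sketched as one admission check. This is
an illustration only, not the patch referenced later in this mail:
struct vm_budget, vm_commit() and the root_reserve field are hypothetical
names, and sizes are in pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical commit budget with a slice set aside for root. */
struct vm_budget {
    size_t total;         /* pages that may be committed in total */
    size_t committed;     /* pages already promised */
    size_t root_reserve;  /* pages only uid 0 may commit */
};

/* Bar 1: no request may push commitments past the total.
 * Bar 2: non-root callers may not touch the last root_reserve pages. */
static bool vm_commit(struct vm_budget *vm, unsigned int uid, size_t request)
{
    assert(vm->committed <= vm->total);
    size_t avail = vm->total - vm->committed;

    if (uid != 0) {
        if (avail <= vm->root_reserve)
            return false;           /* only the root reserve is left */
        avail -= vm->root_reserve;
    }
    if (request > avail)
        return false;               /* allocation fails, nobody is killed */
    vm->committed += request;
    return true;
}
```

Only when vm_commit() fails for uid 0 as well is there nothing left to
fall back on, which is the point where an OOM killer would step in.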

Note, these are optional, for those who are willing to sacrifice a
couple of percent of system resources.

> > so why not just use the most simple OOM around: shoot the first app which
> > can't get its page. app writers won't like it, and users won't like it
> > until the app writers fix their bugs, but then nobody likes the current
> > situation, so what's the difference?
> It used to be like that. Unfortunately, the first app unable to
> get its page might very well be init, and then the entire system goes
> down in flames.

Well, I ported the 2.4 OOM killer to 2.2.19 and added reserved root VM:
http://mlf.linux.rulez.org/mlf/ezaz/reserved_root_memory.html

It works like this: when the first app can't get its page in a page
fault, it kills the process chosen by the OOM killer. OOM killing in 2.4
works differently. Apps loop forever in __alloc_pages() until they can
get a page or out_of_memory() decides it's time to kill somebody. So
whenever there is some VM tuning, out_of_memory() should be tuned too.
Of course that's usually missed. Furthermore it's a heuristic, not an
"exact" decision like the one made by e.g. 2.2 kernels. So in short,
IMHO it will never work. I asked for an explanation of this 2.4 behavior
a couple of times but nobody bothered to answer. I think it's partly
because of the aggressive caching. One easy solution could be to drop
out_of_memory() completely, move oom_kill() back from kswapd() to the
page fault path, and make the number of loops in __alloc_pages()
tunable.
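
The proposal above (a bounded retry loop instead of looping forever, with
the kill decision made by the caller at fault time) can be sketched in
user space as follows. This is a toy model, not kernel code: struct sim,
sim_alloc(), sim_reclaim() and alloc_page_bounded() are hypothetical
stand-ins for the allocator and page reclaim, and max_retries plays the
role of the suggested tunable.

```c
#include <stdbool.h>
#include <stddef.h>

enum alloc_result { ALLOC_OK, ALLOC_OOM };

/* Toy memory model: free pages plus pages that reclaim could free. */
struct sim {
    size_t free_pages;
    size_t reclaimable;
};

static bool sim_alloc(struct sim *s)
{
    if (s->free_pages == 0)
        return false;
    s->free_pages--;
    return true;
}

/* Each reclaim pass frees at most one page in this model. */
static void sim_reclaim(struct sim *s)
{
    if (s->reclaimable > 0) {
        s->reclaimable--;
        s->free_pages++;
    }
}

/* Try to allocate; after each failure run one reclaim pass, but give
 * up after max_retries passes instead of looping forever. On ALLOC_OOM
 * the caller (the page fault path in the proposal) picks a victim. */
static enum alloc_result alloc_page_bounded(struct sim *s,
                                            unsigned int max_retries)
{
    for (unsigned int attempt = 0; ; attempt++) {
        if (sim_alloc(s))
            return ALLOC_OK;
        if (attempt == max_retries)
            return ALLOC_OOM;
        sim_reclaim(s);
    }
}
```

The exact, non-heuristic part is the return value: the faulting task
learns after a fixed number of passes that no page is available.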

> The real solution is to have enough memory for the task at hand.

Define "enough memory" for the user when he wants to open different
kinds of documents, run scientific applications with data sets of
different sizes, or whatever. What would you expect from your computer:
"Hey, your resources are not enough for this task", or just crashing
the application?

> Failing that, get so much swap space that people will be happy when

The "buy more disk, buy more RAM!" kind of answers were one of the
reasons the Linux user base has grown as big as it is today, escaping
from the old advisers ...

2.4 is a killer if the expertise is there [not to install Linux but to
carefully set up the box for its job], but it fails otherwise because of
its OOM handling.

Szaka
