Re: the oom killer

Andrew Morton (akpm@zip.com.au)
Thu, 11 Apr 2002 12:41:24 -0700


Andrea Arcangeli wrote:
>
> > How did you fix this specific problem?
>
> I didn't really fix it, it's just that the problem never existed in my
> tree. I don't "min += z->pages_min" and so the
> check_classzone_need_balance path sees exactly the same state of the VM
> as the main allocator, so if it breaks the loop the main allocator will
> go ahead just fine.

Yup, we need to pull that fix into 2.4.

wrt the oom-killer, I think we can keep everyone happy by
implementing both solutions ;) If the aa approach reaches
the point where it will fail a page allocation we run the
oom-killer, yield and then have another go at the allocation.
Do that a couple of times and *then* fail the page allocation.
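
As a rough illustration of that retry loop, here is a userspace sketch
in C. It is not kernel code: struct fake_vm, try_alloc_page(),
run_oom_killer(), yield_cpu() and OOM_RETRIES are all hypothetical
stand-ins, and "a couple of times" is assumed to mean two passes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: "a couple of times" is taken to mean two passes. */
#define OOM_RETRIES 2

/* Hypothetical toy model of the VM; not a kernel structure. */
struct fake_vm {
    int free_pages;      /* pages currently available */
    int pages_per_kill;  /* pages a kill frees; 0 models a stuck victim */
    int kills;           /* how many times the oom-killer ran */
    int yields;          /* how many times we yielded the CPU */
};

/* Stand-in for the normal allocation path: take one page if any is free. */
static bool try_alloc_page(struct fake_vm *vm)
{
    if (vm->free_pages <= 0)
        return false;
    vm->free_pages--;
    return true;
}

/* Stand-in for the oom-killer: "kills" a victim, which may free memory. */
static void run_oom_killer(struct fake_vm *vm)
{
    vm->kills++;
    vm->free_pages += vm->pages_per_kill;
}

/* Stand-in for yielding so the victim gets a chance to exit. */
static void yield_cpu(struct fake_vm *vm)
{
    vm->yields++;
}

/*
 * Try the allocation; at the point where it would fail, run the
 * oom-killer, yield, and retry. Only after OOM_RETRIES such passes
 * is the allocation actually failed.
 */
static bool alloc_page_with_oom_retry(struct fake_vm *vm)
{
    int pass;

    if (try_alloc_page(vm))
        return true;

    for (pass = 0; pass < OOM_RETRIES; pass++) {
        run_oom_killer(vm);
        yield_cpu(vm);
        if (try_alloc_page(vm))
            return true;
    }
    return false; /* victim never released memory: fail instead of looping */
}
```

The bounded loop is what covers both cases: a deliberately chosen
victim gets killed first, and a victim that never goes away cannot
hold the allocator forever.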

This fixes the problem where the VM will (effectively) kill
a randomly chosen process rather than a deliberately chosen
one, and fixes the lockup problem which Andrea identifies,
where the victim process is stuck somewhere in-kernel
ignoring signals.

It'd be nice if the second and subsequent passes of the oom
killer were able to note that a kill was already outstanding,
so they don't just kill the same process all the time. Or
perhaps the oom killer should just skip over processes which
are in TASK_UNINTERRUPTIBLE. Probably this is getting a
little too elaborate. Generally, the oom killer works OK
as-is (that is, it kills stuff and the machine recovers.
I won't vouch for the accuracy of its targeting).
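
For illustration only, victim selection with those two refinements
might look like the userspace sketch below. It combines both ideas
(skip a task with a kill already outstanding, and skip a task in
uninterruptible sleep), although they were floated as alternatives.
struct fake_task, its score field, the FAKE_* states and
select_victim() are hypothetical names, not kernel interfaces; score
stands in for whatever ranking the oom killer really uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task states; FAKE_UNINTERRUPTIBLE models TASK_UNINTERRUPTIBLE. */
enum fake_state { FAKE_RUNNING, FAKE_INTERRUPTIBLE, FAKE_UNINTERRUPTIBLE };

/* Hypothetical toy task descriptor; not the kernel's task_struct. */
struct fake_task {
    int pid;
    int score;             /* higher means a better candidate to kill */
    enum fake_state state;
    bool kill_pending;     /* a kill was already sent on an earlier pass */
};

/*
 * Pick the highest-scoring task that is worth signalling: skip tasks
 * that already have a kill outstanding and tasks in uninterruptible
 * sleep, since neither would respond to a new kill right now.
 * Marks the chosen task so later passes choose someone else.
 * Returns NULL if no eligible task remains.
 */
static struct fake_task *select_victim(struct fake_task *tasks, size_t n)
{
    struct fake_task *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        struct fake_task *t = &tasks[i];

        if (t->kill_pending || t->state == FAKE_UNINTERRUPTIBLE)
            continue;
        if (best == NULL || t->score > best->score)
            best = t;
    }

    if (best != NULL)
        best->kill_pending = true;
    return best;
}
```

Called once per pass, this picks a different process each time and
returns NULL once nothing eligible is left, at which point failing
the allocation is the only option remaining.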
