I've taken a (non-totally-careful) look at the page fault handler
and return from syscall paths with user-space OOM handling in
mind, at it looks to me like you could, in fact, implement a
userspace handler, using an approach something like this:
-- daemon uses mlockall(MCL_CURRENT|MCL_FUTURE)
-- daemon holds /dev/oom open, runs using RT priority, but sleeps
most of the time.
-- kernel pages for mmap() of /dev/oom are allocated at device
open time or a similarly non-critical time.
-- page fault handler invokes something that will write a summary
of process info to mmapped pages and wake userspace daemon.
it keeps track of the number of times the userspace daemon is
woken within a single OOM condition (mechanism TBD).
-- daemon will kill processes according to a user-defined algorithm,
with increasing aggression each time it is woken.
-- if the daemon doesn't alleviate the situation after N tries,
kernel resorts to current behavior (do_exit(SIGKILL) on process
in page fault handler).
This relies on the ability to kill another process and reclaim its
pages without allocating memory; I don't know for sure if this
works, but at a quick glance, at least kill_proc() appears to be
OK.
The use of a fixed-size set of kernel pages means that the full
breadth of /proc info won't be available, but this approach could
work nicely for common cases like:
if (!strcmp(s, "navigator") || !strcmp(s, "mozilla-bin"))
kill_bloated_hog();
Of course, it's probably much less complex to do the same thing
in a loadable kernel module. But I don't think the userspace
option should be dismissed out of hand without a statement to
the effect of "it is impossible to return control to a userspace
mlockall() daemon because X function uses get_free_page()." There
may be such a problem, but so far I haven't seen it cited.
miket
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/