[PATCH 4/6] "not nearly so minimal" rmap for 2.5.26

Craig Kulesa (ckulesa@as.arizona.edu)
Fri, 19 Jul 2002 02:22:26 -0700 (MST)


In the usual place is a series of rmap patches, collected and ported from
various contributions...

http://loke.as.arizona.edu/~ckulesa/kernel/rmap-vm/2.5.26/

The first three are essentially the same as Andrew & Rik's posted patches
for 2.5.26, minus the arm changes:

2.5.26-rmap-1-core
2.5.26-rmap-2-lrufix
2.5.26-rmap-3-optimize

This, the fourth in the series, brings the rmap VM approximately to the
level of Rik van Riel's rmap-13b patches -- ex. the 2.4-ac kernel tree,
in terms of basic page replacement, page aging, and lru list logic.

The basic components of the patch, which might make a sensible splitting
arrangement someday (?), not in order:

- make dquot, inode and dentry cache shrinking functions return
the number of pages shrunk

- add rss limit enforcement and throttling; edit Dave McCracken's
optimizations to add this to rmap.

- alterations for numa

- page aging of active list, low-level background aging.
Heuristic akin to Linux 2.0/FreeBSD.

- preparatory changes to header files: mm_inline.h macros for LRU
list management, for_each_zone(), and other handy bits.

- major LRU list shakeup. shrink_cache functionally becomes
page_launder, which cleans the inactive (dirty) list and creates
a list of freeable pages in inactive (clean), like the 'cache'
list in FreeBSD. page_launder makes inactive pages "clean",
"clean" pages can be freed immediately or directly reclaimed
without going through the usual page allocation. Pages on
the active list are aged and sent to inactive (dirty) when they
are cold. Etc, etc... Lists are per-zone. Min, low, high,
plenty watermarks dictate when action should be taken to refill
the various lists.

- Drop_behind takes "already passed" pages in the readahead buffer
and deactivates them to the clean list. If we need them again,
they're easily reclaimed. If not, they make easy pickings for
reclaim.

Or something like that. I have *not* done any kind of patch splitting,
as significant changes undoubtedly lie ahead. One seems pretty near:

- Andrew Morton proposed a series of patches to reduce
pagemap_lru_lock contention -- in essence, they move a lot of
the page management VM functions away from processing one page
at a time, to batch processing.

http://mail.nl.linux.org/linux-mm/2002-07/msg00009.html

Implementation of this notion for the full rmap patch also looks
very interesting. In particular:

a) reclaim_page can reclaim in batch mode from the clean
list. Rik made the point that it might be good to
drop direct reclaim and simply free the pages. This
simplifies page_alloc.c logic a bit, and ensures that
page flags need only be updated in rmqueue(), just
like vanilla-2.5-latest. Right now, we need it in
both rmqueue and reclaim_page for direct-reclaim --
took me two days to figure that one out!

b) page_launder_zone is a great candidate for
batching, much in the same sense as akpm is batching
shrink_cache(). This is similar to its current
behavior, but we just won't hold the pagemap_lru_lock
except to load up on pages to scan.

c) Same deal for refill_inactive_zone().

Once Andrew has stabilized his lock contention patches, it'll be
interesting to see what they can do for the full rmap vm.

One significant question is large pages. Batching is great for 4K pages,
and indeed the motivation is to get some of the good behavior of larger
page sizes, without having to actually do that. But if large pages are
necessary to some folks, how do we (or should we) nicely degrade to
unbatched processing? Batch processing 4M pages sounds a bit on the
coarse side! :)

Give the patches a try, try to break them, send me feedback and fixes. ;)

Craig Kulesa
Steward Observatory
Univ. of Arizona

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/