Re: What to expect with the 2.6 VM

William Lee Irwin III (wli@holomorphy.com)
Wed, 2 Jul 2003 14:40:32 -0700


On Wed, Jul 02, 2003 at 07:47:00PM +0200, Andrea Arcangeli wrote:
> actually other more invasive ways could be to move rmap into highmem.
> Also the page clustering could also hide part of the mem overhead by
> assuming the pagetables to be contiguous, but page clustering isn't part
> of mainline yet either.

BSD-style page clustering preserves virtual contiguity of a software
page, but the newer code doesn't; to preserve the ABI, it has to handle
virtually discontiguous, partial, and misaligned mappings of software
pages.

The desired behavior can in principle be partially recovered by
scanning within a software-page-sized "blast radius" around each
chain element and only chaining enough elements to find the relevant
ptes that way.
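
To make that concrete, here's a toy model -- not code from any tree;
pte_t as a bare word, the 16KB software page size, and every helper
name below are invented for illustration. The idea is that each
chained pte lets you find its siblings by scanning the software-page-
aligned window around it, so fewer elements need chaining:

#define HW_PAGE_SHIFT   12                     /* assume 4KB hardware pages  */
#define SW_PAGE_ORDER   2                      /* assume 16KB software pages */
#define PTES_PER_SWPAGE (1UL << SW_PAGE_ORDER)

typedef unsigned long pte_t;                   /* toy stand-in for pte_t     */

/*
 * Does this pte point at one of the hardware frames backing the
 * software page?  @swpage_pfn is the aligned first hardware pfn of
 * the software page.
 */
static int pte_maps_swpage(pte_t pte, unsigned long swpage_pfn)
{
        unsigned long pfn = pte >> HW_PAGE_SHIFT;

        return (pfn & ~(PTES_PER_SWPAGE - 1)) == swpage_pfn;
}

/*
 * Given the index of one chained pte within a pte page, scan the
 * software-page-aligned window around it and clear every sibling pte
 * still pointing into the software page.  Returns how many it cleared.
 */
static int unmap_blast_radius(pte_t *pte_page, unsigned long index,
                              unsigned long swpage_pfn)
{
        unsigned long start = index & ~(PTES_PER_SWPAGE - 1);
        unsigned long i;
        int cleared = 0;

        for (i = 0; i < PTES_PER_SWPAGE; i++) {
                if (pte_maps_swpage(pte_page[start + i], swpage_pfn)) {
                        pte_page[start + i] = 0;   /* "unmap" in this toy */
                        cleared++;
                }
        }
        return cleared;
}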

As for remap_file_pages(), either people are misunderstanding me or
ignoring me. There is a lovely three-step method for handling it:

(a) fix the truncate() bug; it is just a literal bug. There are at
least 3 different ways to fix it:
(i) tag vmas touched by remap_file_pages() for exhaustive search
(ii) do a cleanup pass after the current vmtruncate() doing
try_to_unmap() on any still-mapped pages
(iii) drop the current vmtruncate() entirely and do try_to_unmap()
on each truncated page
(ii) and (iii) take the locks in the wrong order, so some still-
mapped but truncated page could be out there; this could be
handled by Yet Another Cleanup Pass that does (i) or by tolerating
the new state elsewhere in the VM. There are plenty of ways to
code this and a couple of choices of semantics (i.e. make it
failable or reliable); a rough sketch of (ii) follows below.
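
To illustrate (a)(ii) -- this is a toy model, not code from any tree,
and every type and helper name below is invented -- the cleanup pass
is just a sweep over the truncated tail of the file that force-unmaps
whatever the linear-offset vma walk missed:

struct toy_page {
        int mapped;                     /* still referenced by some pte? */
};

struct toy_mapping {
        struct toy_page **pages;        /* page cache slots by file index */
        unsigned long nrpages;
};

/* stand-in for try_to_unmap(): rip out every remaining pte for @page */
static void toy_unmap_page(struct toy_page *page)
{
        page->mapped = 0;
}

/*
 * Cleanup pass after the existing vmtruncate(): the linear vma walk
 * can miss ptes that remap_file_pages() moved around, so catch the
 * stragglers by walking the truncated range of the file itself.
 */
static void truncate_cleanup_pass(struct toy_mapping *mapping,
                                  unsigned long new_nrpages)
{
        unsigned long idx;

        for (idx = new_nrpages; idx < mapping->nrpages; idx++) {
                struct toy_page *page = mapping->pages[idx];

                if (page && page->mapped)
                        toy_unmap_page(page);
        }
}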

(b) implement the bits omitting pte_chains for mlock()'d mappings
This is obvious. Yank them off the LRU, set a bitflag, and
reuse page->lru for a counter; a toy sketch follows below.
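
A toy sketch of (b) -- again nothing real, all names invented, and the
real thing would juggle actual page flags and struct page fields --
just to show the yank-off-the-LRU, set-a-flag, reuse-the-field idea,
with the counter taken to count mlock'd mappings:

#define TOY_PG_MLOCKED  (1UL << 0)

struct toy_list_head {
        struct toy_list_head *next, *prev;
};

struct toy_page {
        unsigned long flags;
        union {
                struct toy_list_head lru;       /* on the LRU normally  */
                unsigned long mlock_count;      /* reused when mlocked  */
        } u;
};

static void toy_list_del(struct toy_list_head *e)
{
        e->prev->next = e->next;
        e->next->prev = e->prev;
}

static void toy_mlock_page(struct toy_page *page)
{
        if (!(page->flags & TOY_PG_MLOCKED)) {
                toy_list_del(&page->u.lru);     /* yank it off the LRU  */
                page->flags |= TOY_PG_MLOCKED;  /* set the bitflag      */
                page->u.mlock_count = 0;        /* reuse the LRU field  */
        }
        page->u.mlock_count++;                  /* one more mlock'd map */
}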

(c) redo the logic around page_convert_anon() and incrementally build
pte_chains for remap_file_pages().
The anobjrmap code did exactly this, but it was chaining
distinct user virtual addresses instead.
(i) you always have the pte_chain in hand anyway; the core
is always prepped to handle allocating them now
(ii) instead of just bailing for file-backed pages in
page_add_rmap(), pass it enough information to know
whether the address matches what it should from the
vma, and start chaining if it doesn't
(iii) but you say ->mapcount sharing space with the chain is a
problem? No, it's not; again, take a cue from anobjrmap:
if a file-backed page needs a pte_chain, shoehorn
->mapcount into the first pte_chain block dangling off it
(sketched below)
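
And a toy sketch of (c), with every name invented, to show the shape
of the hybrid: linear file ptes keep being found via vma offset
arithmetic, only the nonlinear ones get chained, and the mapcount
rides in the first chain block the way anobjrmap carried it:

#include <stdlib.h>

#define CHAIN_SLOTS 7

struct toy_pte_chain {
        unsigned long mapcount;         /* only meaningful in the head block */
        void *ptes[CHAIN_SLOTS];        /* chained nonlinear pte pointers    */
        struct toy_pte_chain *next;
};

struct toy_page {
        union {
                unsigned long mapcount;         /* purely linear mappings */
                struct toy_pte_chain *chain;    /* nonlinear: chain head  */
        } u;
        int chained;                    /* a page flag bit in real life */
};

static void toy_page_add_rmap(struct toy_page *page, void *pte,
                              int addr_matches_vma)
{
        struct toy_pte_chain *head;
        int i;

        if (addr_matches_vma) {
                /* Linear pte: findable via vma offset math, no chaining,
                 * just bump whichever count is live. */
                if (page->chained)
                        page->u.chain->mapcount++;
                else
                        page->u.mapcount++;
                return;
        }

        if (!page->chained) {
                /* First nonlinear pte: shoehorn the mapcount into a
                 * freshly allocated chain head, anobjrmap-style. */
                head = calloc(1, sizeof(*head));
                if (!head)
                        return;                 /* toy: ignore OOM */
                head->mapcount = page->u.mapcount;
                page->u.chain = head;
                page->chained = 1;
        }

        /* Chain the nonlinear pte (first free slot in the head block;
         * overflow blocks omitted for brevity). */
        head = page->u.chain;
        for (i = 0; i < CHAIN_SLOTS; i++) {
                if (!head->ptes[i]) {
                        head->ptes[i] = pte;
                        break;
                }
        }
        head->mapcount++;
}

The point being that struct page never needs both fields at once: a
page is either purely linear (bare mapcount) or has a chain head that
can carry the count for it.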

After all 3 are done, remap_file_pages() integrates smoothly into the VM,
requires no magical privileges, needs nothing brutally invasive that
would scare people just before 2.6.0, and the big apps can get their
magical lowmem savings by just mlock()'ing _anything_ they do massive
sharing with, regardless of remap_file_pages().

Does anyone get it _now_?

-- wli