There is a lot of redundancy in the rmap chains that could be exploited. If
a pte page happens to reference a group of (say) 32 anon pages, then you can
set each anon page's page->index to its position in the group and let a
pte_chain node point at the pte of the first page of the group. You can then
find each page's pte by adding its page->index to the pte_chain node's pte
pointer. This allows a single rmap chain to be shared by all the pages in
the group.
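The lookup described above can be sketched in a few lines of user-space C. All names here are illustrative (the real kernel's struct page and pte_chain look different); the point is just that one shared chain node plus page->index is enough to find any group member's pte:

```c
#include <assert.h>

#define GROUP_SIZE 32

typedef unsigned long pte_t;

struct page {
	unsigned long index;	/* position of this page within its group */
};

struct pte_chain {
	pte_t *base_pte;	/* pte of the group's first page */
};

/* Locate @page's pte through the group's single shared chain node. */
static pte_t *group_pte(const struct pte_chain *node, const struct page *page)
{
	return node->base_pte + page->index;
}
```

One chain node per group instead of one per page is where the (best case) 32x saving comes from.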
This much of the idea is simple; however, there are some tricky details to
take care of. How does a copy-on-write break out one page of the group from
one of the pte pages? I tried putting a (32 bit) bitmap in each pte_chain
node to indicate which pte entries actually belong to the group, and that
wasn't too bad except for doubling the per-link memory usage, turning a
best-case 32x gain into only 16x. It's probably better to break the group up,
creating log2(groupsize) new chains. (This can be avoided in the common case
that you already know every page in the group is going to be copied, as with
a copy_from_user.) Getting rid of the bitmaps makes the single-page case the
same as the current arrangement and makes it easy to let the size of a group
be as large as the capacity of a whole pte page.
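For concreteness, the bitmap variant I tried could be sketched like this (again with made-up names, not the real kernel structures): each shared node carries a 32-bit membership mask, and a COW fault breaks one page out by clearing its bit, after which that page would get a private single-page chain.

```c
#include <assert.h>

typedef unsigned long pte_t;

struct page {
	unsigned long index;	/* position within the group */
};

struct pte_chain {
	pte_t *base_pte;	/* pte of the group's first page */
	unsigned int members;	/* bit i set => page i still in the group;
				 * this field doubles the per-link footprint */
};

static int in_group(const struct pte_chain *node, const struct page *page)
{
	return (node->members >> page->index) & 1u;
}

/* Break @page out of the shared group, e.g. on a COW fault. */
static void break_out(struct pte_chain *node, const struct page *page)
{
	node->members &= ~(1u << page->index);
	/* ...the page would then be linked to a chain of its own */
}
```

The membership word is exactly the doubled per-link cost mentioned above, which is why dropping it in favor of splitting the group looks like the better trade.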
There's also the problem of detecting groupable clusters of pages, e.g., in
do_anon_page. Swap-out and swap-in introduce more messiness, as does mremap.
In the end, I decided it's not needed in the current cycle, but probably
worth investigating later.
My purpose in bringing it up now is to show that there are still some more
incremental gains to be had without needing radical surgery.
> As shown by Dave's patch, a hybrid system really is simple and
> clean, and it removes most of the pte chain overhead while still
> keeping the code nice and efficient.
>
> I think this hybrid system is the way to go, possibly with a few
> more tweaks left and right...
Emphatically, yes.
Regards,
Daniel