There is a lot of redundancy in the rmap chains that could be exploited. If
a pte page happens to reference a group of (say) 32 anon pages, then you can
set each anon page's page->index to its position in the group and let a
pte_chain node point at the pte of the first page of the group. You can then
find each page's pte by adding its page->index to the pte_chain node's pte
pointer. This allows a single rmap chain to be shared by all the pages in
the group.
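The lookup described above can be sketched in a few lines of user-space C. All names here are illustrative (the real kernel's struct page and pte_chain look different); the point is just that one shared chain node plus page->index is enough to find any group member's pte:

```c
#include <assert.h>

#define GROUP_SIZE 32

typedef unsigned long pte_t;

struct page {
	unsigned long index;	/* position of this page within its group */
};

struct pte_chain {
	pte_t *base_pte;	/* pte of the group's first page */
};

/* Locate @page's pte through the group's single shared chain node. */
static pte_t *group_pte(const struct pte_chain *node, const struct page *page)
{
	return node->base_pte + page->index;
}
```

One chain node per group instead of one per page is where the (best case) 32x saving comes from.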
This much of the idea is simple; however, there are some tricky details to
take care of. How does a copy-on-write break out one page of the group from
one of the pte pages? I tried putting a (32 bit) bitmap in each pte_chain
node to indicate which pte entries actually belong to the group, and that
wasn't too bad except for doubling the per-link memory usage, turning a
best-case 32x gain into only 16x. It's probably better to break the group up,
creating log2(groupsize) new chains. (This can be avoided in the common case
that you already know every page in the group is going to be copied, as with
a copy_from_user.) Getting rid of the bitmaps makes the single-page case the
same as the current arrangement and makes it easy to let the size of a group
be as large as the capacity of a whole pte page.
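For concreteness, the bitmap variant I tried could be sketched like this (again with made-up names, not the real kernel structures): each shared node carries a 32-bit membership mask, and a COW fault breaks one page out by clearing its bit, after which that page would get a private single-page chain.

```c
#include <assert.h>

typedef unsigned long pte_t;

struct page {
	unsigned long index;	/* position within the group */
};

struct pte_chain {
	pte_t *base_pte;	/* pte of the group's first page */
	unsigned int members;	/* bit i set => page i still in the group;
				 * this field doubles the per-link footprint */
};

static int in_group(const struct pte_chain *node, const struct page *page)
{
	return (node->members >> page->index) & 1u;
}

/* Break @page out of the shared group, e.g. on a COW fault. */
static void break_out(struct pte_chain *node, const struct page *page)
{
	node->members &= ~(1u << page->index);
	/* ...the page would then be linked to a chain of its own */
}
```

The membership word is exactly the doubled per-link cost mentioned above, which is why dropping it in favor of splitting the group looks like the better trade.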
There's also the problem of detecting groupable clusters of pages, e.g., in
do_anon_page. Swap-out and swap-in introduce more messiness, as does mremap.
In the end, I decided it's not needed in the current cycle, but probably
worth investigating later.
My purpose in bringing it up now is to show that there are still some more
incremental gains to be had without needing radical surgery.
> As shown by Dave's patch, a hybrid system really is simple and
> clean, and it removes most of the pte chain overhead while still
> keeping the code nice and efficient.
>
> I think this hybrid system is the way to go, possibly with a few
> more tweaks left and right...
Emphatically, yes.
Regards,
Daniel