Re: large page patch (fwd) (fwd)

Hubertus Franke (frankeh@watson.ibm.com)
Fri, 9 Aug 2002 14:32:38 -0400


On Friday 09 August 2002 11:20 am, Daniel Phillips wrote:
> On Sunday 04 August 2002 19:19, Hubertus Franke wrote:
> > "General Purpose Operating System Support for Multiple Page Sizes"
> > http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/ganapathy/ganapathy.pdf
>
> This reference describes roughly what I had in mind for active
> defragmentation, which depends on reverse mapping. The main additional
> wrinkle I'd contemplated is introducing a new ZONE_LARGE, and GPF_LARGE,
> which means the caller promises not to pin the allocation unit for long
> periods and does not mind if the underlying physical page changes
> spontaneously. Defragmenting in this zone is straightforward.

I think the objection to that is that in many cases the cost of
defragmentation is too heavy to be recouped through reduced TLB miss
handling alone.
What the above paper does instead is a reservation protocol with timeouts,
which decide that either (a) the reserved memory was used in time and hence
the page is upgraded to a large page, or (b) the reserved memory was not
used and hence the unused parts are released.
It relies on the fact that within the given timeout, most or many of the
pages are typically referenced.
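
To make the two outcomes concrete, here is a minimal user-space sketch of
such a reservation. It is not code from the paper or from our patch: the
names (struct reservation, resv_touch, resv_expire), the 512-page
reservation size, and the rule that "used in time" means every base page
was touched before the deadline are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4K base pages and a 2M large page, so 512 base pages each. */
#define BASE_PAGES_PER_LARGE 512u

enum resv_outcome { RESV_PENDING, RESV_PROMOTED, RESV_RELEASED };

/* Hypothetical bookkeeping for one reserved large-page-sized region. */
struct reservation {
    uint64_t deadline;                /* tick at which the timeout fires */
    unsigned int touched;             /* distinct base pages referenced */
    bool used[BASE_PAGES_PER_LARGE];  /* per-base-page reference flag */
    enum resv_outcome state;
};

static void resv_init(struct reservation *r, uint64_t now, uint64_t timeout)
{
    r->deadline = now + timeout;
    r->touched = 0;
    for (size_t i = 0; i < BASE_PAGES_PER_LARGE; i++)
        r->used[i] = false;
    r->state = RESV_PENDING;
}

/* Record a reference to base page idx; case (a): promote once all are used. */
static enum resv_outcome resv_touch(struct reservation *r, size_t idx)
{
    if (r->state != RESV_PENDING || idx >= BASE_PAGES_PER_LARGE)
        return r->state;
    if (!r->used[idx]) {
        r->used[idx] = true;
        r->touched++;
    }
    if (r->touched == BASE_PAGES_PER_LARGE)
        r->state = RESV_PROMOTED;
    return r->state;
}

/*
 * Case (b): if the deadline has passed and the reservation is still pending,
 * give up on it. Returns the number of unused base pages handed back.
 */
static unsigned int resv_expire(struct reservation *r, uint64_t now)
{
    if (r->state != RESV_PENDING || now < r->deadline)
        return 0;
    r->state = RESV_RELEASED;
    return BASE_PAGES_PER_LARGE - r->touched;
}
```

A real implementation would presumably promote on some occupancy threshold
rather than requiring every page, and would hook the expiry into a timer;
the sketch only shows the shape of the decision.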

In our patch we have the ZONE_LARGE into which we allocate the
large pages. Currently they are effectively pinned down, but in 2.4.18
we had them backed by the page cache.

My gut feeling right now would be to follow the reservation-based scheme,
but as I said, it's a gut feeling.
Defragmenting seems to me a matter of last resort; copying pages is expensive.
If, however, you simply target the superpages for smaller clusters, then it's
an option. But at the same time one might contemplate simply making
the base page 16K or 32K and, at page fault time, simply map / swap / read /
write back the whole cluster.
What studies have been done on the benefits of such an approach?
I talked to Ted Tso, who would really like small superpages for better I/O
performance...
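
For the 16K/32K cluster idea, the fault-time arithmetic would be roughly
the following. This is only a sketch under assumptions: 4K hardware pages,
a power-of-two cluster, and made-up names (CLUSTER_ORDER, cluster_for_fault)
that do not exist in any kernel tree.

```c
#include <stdint.h>

#define BASE_PAGE_SHIFT 12u  /* assumption: 4K hardware pages */
#define CLUSTER_ORDER   2u   /* 2 -> 16K cluster, 3 -> 32K cluster */
#define CLUSTER_SIZE    ((uint64_t)1 << (BASE_PAGE_SHIFT + CLUSTER_ORDER))

/* Hypothetical description of the range a single fault would operate on. */
struct cluster_range {
    uint64_t start;       /* first byte of the cluster (inclusive) */
    uint64_t end;         /* one past the last byte of the cluster */
    unsigned int npages;  /* hardware pages to map / read / write back */
};

/* Round the faulting address down to its cluster and cover the whole cluster. */
static struct cluster_range cluster_for_fault(uint64_t fault_addr)
{
    struct cluster_range c;

    c.start = fault_addr & ~(CLUSTER_SIZE - 1);
    c.end = c.start + CLUSTER_SIZE;
    c.npages = 1u << CLUSTER_ORDER;
    return c;
}
```

With CLUSTER_ORDER set to 2, a fault at 0x12345 would cover 0x10000 up to
0x14000, i.e. four hardware pages handled as one unit.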

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)