Re: large page patch

Martin J. Bligh (Martin.Bligh@us.ibm.com)
Thu, 01 Aug 2002 21:38:49 -0700


Direct email seemed to get separated from the cc somewhere
along the line ... repeated for others on l-k (sorry Linus ;-))

> I doubt that. At least the naive math says that it should get
> exponentially less likely(*) to merge up/down for each level, so by the
> time you've reached order-10, any merging is already in the noise and
> totally unmeasurable.

Yeah, it's probably unmeasurable, just ugly ;-)
I guess it's more that it seems unnecessary ... if ia64 are the
only people that need it to be that ludicrously large, it'd be
better if they just did it in their arch tree. Just because they
could theoretically have 256Mb pages, do they really *need* them? ;-)

>> It also makes the config_nonlinear stuff harder (or we have to
>> #ifdef it, which just causes more unnecessary differentiation).
>
> Hmm.. This sounds like a good point, but I thought we already did all
> the math relative to the start of the zone, so that the alignment thing
> implied by MAX_ORDER shouldn't be an issue.
>
> Or were you thinking of some other effect?

The config_nonlinear stuff relies on a trick ... we shove physically
non-contig areas into the buddy allocator, but the buddy allocator
is guaranteed to return phys contig areas. That all works just fine
as long as the blocks we put in are of size greater than or equal to
2^MAX_ORDER * PAGE_SIZE, which is currently 4Mb. A 4Mb alignment is
not a problem for any known machine, but I think 256Mb may well be.
It's kind of a dirty trick, but it's a really neat, efficient
solution that gets rid of lots of zone balancing and pgdat proliferation.
It also lets me spread around ZONE_NORMAL across nodes for ia32 NUMA.
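
To put numbers on the alignment point, here is a rough sketch of the
arithmetic. It takes the 2^MAX_ORDER * PAGE_SIZE formula above at face
value and assumes 4K pages with MAX_ORDER = 10, which is what gives the
4Mb figure. The function names are made up for illustration and are not
kernel code, and the alignment check is only one conservative reading of
the constraint.

```python
# Illustrative arithmetic only; constants and names are assumptions,
# not taken from the kernel source.
PAGE_SIZE = 4096   # assumed 4K pages
MAX_ORDER = 10     # assumed value that yields the 4Mb figure


def max_block_bytes(max_order, page_size=PAGE_SIZE):
    """Largest buddy block, per the formula 2^MAX_ORDER * PAGE_SIZE."""
    return (1 << max_order) * page_size


def order_for_block(block_bytes, page_size=PAGE_SIZE):
    """Smallest order whose block covers block_bytes."""
    pages = -(-block_bytes // page_size)   # ceiling division
    return (pages - 1).bit_length()


def chunk_is_aligned(start, size, max_order, page_size=PAGE_SIZE):
    """Conservative check: a physically contiguous chunk starts on a
    max-block boundary and is a non-zero whole number of max blocks."""
    block = max_block_bytes(max_order, page_size)
    return size >= block and start % block == 0 and size % block == 0


if __name__ == "__main__":
    mb = 1024 * 1024
    print(max_block_bytes(MAX_ORDER) // mb)    # block size in Mb at order 10
    print(order_for_block(256 * mb))           # order needed for 256Mb blocks
    # A 256Mb chunk starting at 4Mb passes at order 10 but not at order 16.
    print(chunk_is_aligned(4 * mb, 256 * mb, 10))
    print(chunk_is_aligned(4 * mb, 256 * mb, 16))
```

Under those assumptions, a 256Mb maximum block means every chunk handed
to the allocator has to start on a 256Mb boundary, and a chunk that is
merely 4Mb aligned no longer qualifies.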

M.
