Re: large page patch (fwd) (fwd)

David S. Miller (davem@redhat.com)
Sat, 03 Aug 2002 17:28:36 -0700 (PDT)


From: Linus Torvalds <torvalds@transmeta.com>
Date: Sat, 3 Aug 2002 10:35:00 -0700 (PDT)

David, you did page coloring once.

I bet your patches worked reasonably well to color into 4 or 8 colors.

How well do you think something like your old patches would work if

- you _require_ 1024 colors in order to get the TLB speedup on some
hypothetical machine (the same hypothetical machine that might
hypothetically run on 95% of all hardware ;)

- the machine is under heavy load, and heavy load is exactly when you
want this optimization to trigger.

Can you explain this difficulty to people?

Actually, we need some clarification here. I tried coloring several
times, the problem with my diffs is that I tried to do the coloring
all the time no matter what.

I wanted strict coloring on the 2-color level for broken L1 caches
that have aliasing problems. If I could make this work, all of the
dumb cache flushing I have to do on Sparcs could be deleted. Because
of this, I couldn't legitimately change the cache flushing rules
unless I had absolutely strict coloring done on all pages where it
mattered (basically anything that could end up in the user's address
space).

So I kept track of color existence precisely in the page lists. The
implementation was fast, but things got really bad fragmentation wise.

No matter how I tweaked things, just running a kernel build 40 or 50
times would fragment the free page lists to shreds such that 2-order
and up pages simply did not exist.

Another person did an implementation of coloring which basically
worked by allocating a big-order chunk and slicing that up. It's not
strictly done and that is why his version works better. In fact I
like that patch a lot and it worked quite well for L2 coloring on
sparc64. Any time there is page pressure, he tosses away all of the
color carving big-order pages.

I think we can at some point do the small cases completely transparently,
with no need for a new system call, and not even any new hint flags. We'll
just silently do 4/8-page superpages and be done with it. Programs don't
need to know about it to take advantage of better TLB usage.

Ok. I think even 64-page ones are viable to attempt but we'll see.
Most TLB's that do superpages seem to have a range from the base
page size to the largest supported superpage with 2-powers of two
being incrememnted between each supported size.

For example on Sparc64 this is:

8K PAGE_SIZE
64K PAGE_SIZE * 8
512K PAGE_SIZE * 64
4M PAGE_SIZE * 512

One of the transparent large page implementations just defined a
small array that the core code used to try and see "hey how big
a superpage can we try" and if the largest for the area failed
(because page orders that large weren't available) it would simply
fall back to the next smallest superpage size.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/