Re: large page patch (fwd) (fwd)

David Mosberger (davidm@napali.hpl.hp.com)
Sat, 3 Aug 2002 12:30:05 -0700


>>>>> On Sat, 3 Aug 2002 10:35:00 -0700 (PDT), Linus Torvalds <torvalds@transmeta.com> said:

Linus> How well do you think something like your old patches would
Linus> work if

Linus> - you _require_ 1024 colors in order to get the TLB speedup
Linus> on some hypothetical machine (the same hypothetical machine
Linus> that might hypothetically run on 95% of all hardware ;)

Linus> - the machine is under heavy load, and heavy load is exactly
Linus> when you want this optimization to trigger.

Your point about wanting databases have access to giant pages even
under memory pressure is a good one. I had not considered that
before. However, what we really are talking about then is a security
or resource policy as to who gets to allocate from a reserved and
pinned pool of giant physical pages. You don't need separate system
calls for that: with a transparent superpage framework and a
privileged & reserved giant-page pool, it's trivial to set up things
such that your favorite data base will always be able to get the giant
pages (and hence the giant TLB mappings) it wants. The only thing you
lose in the transparent case is control over _which_ pages need to use
the pinned giant pages. I can certainly imagine cases where this
would be an issue, but I kind of doubt it would be an issue for
databases.

As Dave Miller justly pointed out, it's stupid for a task not to ask
for giant pages for anonymous memory. The only reason this is not a
smart thing overall is that globally it's not optimal (it is optimal
only locally, from the task's point of view). So if the only barrier
to getting the giant pinned pages is needing to know about the new
system calls, I'll predict that very soon we'll have EVERY task in the
system allocating such pages (and LD_PRELOAD tricks make that pretty
much trivial). Then we're back to square one, because the favorite
database may not even be able to start up, because all the "reserved"
memory is already used up by the other tasks.

Clearly there needs to be some additional policies in effect, no
matter what the implementation is (the normal VM policies don't work,
because, by definition, the pinned giant pages are not pageable).

In my opinion, the primary benefit of the separate syscalls is still
ease-of-implementation (which isn't unimportant, of course).

--david
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/