Re: 64GB NUMA-Q after pgcl

William Lee Irwin III (wli@holomorphy.com)
Sun, 30 Mar 2003 21:22:14 -0800


On Sun, Mar 30, 2003 at 08:27:29PM -0800, William Lee Irwin III wrote:
> I can answer more questions about what goes on to make this happen if
> need be.

I'm just going to start explaining now.

-----------------------------------------    page clustering turns the
|              struct page              |    relationship between base
-----------------------------------------    pages and ptes into 1:N.
  ^   ^   ^   ^   ^   ^   ^   ^   ^   ^      struct pages remain of the
  |   |   |   |   |   |   |   |   |   |      same size, but track a
-----------------------------------------    larger area and are fewer
|PTE|PTE|PTE|PTE|PTE|PTE|PTE|PTE|PTE|PTE|    in number. ptes still point
-----------------------------------------    to the same size areas.
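
To make the 1:N relationship concrete, here is a minimal sketch in plain userspace C. Only MMUPAGE_SIZE is a name from this post; CLUSTER_SHIFT, CLUSTER_COUNT, CLUSTERED_PAGE_SIZE and both helper functions are hypothetical names chosen for illustration, and the clustering factor of 8 is an assumption, not a statement about any particular configuration.

```c
#include <assert.h>

/* 4KB hardware page, the unit a single pte maps. */
#define MMUPAGE_SHIFT        12
#define MMUPAGE_SIZE         (1UL << MMUPAGE_SHIFT)

/* Assumed clustering factor: 2^3 = 8 ptes per struct page (hypothetical). */
#define CLUSTER_SHIFT        3
#define CLUSTER_COUNT        (1UL << CLUSTER_SHIFT)

/* Area tracked by one struct page; what the kernel would call PAGE_SIZE. */
#define CLUSTERED_PAGE_SIZE  (MMUPAGE_SIZE << CLUSTER_SHIFT)

/* Index of the struct page that tracks a given 4KB frame number. */
static unsigned long frame_to_page_index(unsigned long pfn)
{
    return pfn >> CLUSTER_SHIFT;
}

/* First 4KB frame number tracked by the struct page at a given index. */
static unsigned long page_first_frame(unsigned long page_index)
{
    return page_index << CLUSTER_SHIFT;
}
```

With these assumed values, frames 0 through 7 all belong to struct page 0 and frame 8 is the first frame of struct page 1, while each pte still maps exactly MMUPAGE_SIZE bytes.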

Anonymous pages want smaller-than-PAGE_SIZE pieces at a time, in fact
exactly 4KB (MMUPAGE_SIZE) to satisfy any particular fault, so we scan
around looking for PTEs to point at as many of the 4KB pieces as we can.

-------------------------------
             page
-------------------------------
piece | piece | piece | piece |
-------------------------------
   \       \       \       \
    \       \       \       \
     \       \       \       \
      \       \       \       \
       \       \       \       \
        \       \       \       \
         \       \       \       \
-------------------------------------------------------------
  PTE  |  PTE  |  PTE  |  PTE  |  PTE  |  PTE  |  PTE  |  PTE
-------------------------------------------------------------
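
A toy model of that scan, as a rough sketch only. This is not the pgcl fault path; it just illustrates handing out the pieces of one page to whichever nearby PTEs are still empty. Every name here (PIECES_PER_PAGE, PTE_NONE, PTE_BUSY, map_pieces) is hypothetical, the four pieces per page simply match the diagram, and the outward scan order is an assumption.

```c
#include <assert.h>

#define PIECES_PER_PAGE 4     /* matches the diagram; an assumption */
#define PTE_NONE        (-1)  /* empty pte slot */
#define PTE_BUSY        (-2)  /* slot already mapped to something else */

/*
 * ptes[i] holds the piece index mapped at slot i, or PTE_NONE/PTE_BUSY.
 * The faulting slot gets piece 0; remaining pieces go to the nearest
 * empty slots, scanning outward from the fault. Returns pieces mapped.
 */
static int map_pieces(int *ptes, int nr_ptes, int fault_idx)
{
    int piece = 0, mapped = 0, dist;

    assert(fault_idx >= 0 && fault_idx < nr_ptes);
    assert(ptes[fault_idx] == PTE_NONE);

    /* The fault itself must always be satisfied. */
    ptes[fault_idx] = piece++;
    mapped++;

    for (dist = 1; dist < nr_ptes && piece < PIECES_PER_PAGE; dist++) {
        int hi = fault_idx + dist;
        int lo = fault_idx - dist;

        if (hi < nr_ptes && ptes[hi] == PTE_NONE &&
            piece < PIECES_PER_PAGE) {
            ptes[hi] = piece++;
            mapped++;
        }
        if (lo >= 0 && ptes[lo] == PTE_NONE &&
            piece < PIECES_PER_PAGE) {
            ptes[lo] = piece++;
            mapped++;
        }
    }
    return mapped;
}
```

In this model the pieces of one page can end up at non-adjacent PTEs, because slots that are already in use are skipped.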

Miscellaneous side effects follow: follow_page() and get_user_pages()
need to return pfns instead of struct pages. Various address
calculations start needing unit conversions. Pagecache lookups need to
add in "subpfn offsets" relative to the start of the base page. And
so on and so forth.
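
A sketch of the kind of unit conversion involved, in userspace C. A struct page alone no longer identifies a single 4KB piece, so a lookup has to carry the offset within the base page as well. The struct, the helper names, and the clustering factor of 8 are all hypothetical; only MMUPAGE_SIZE and the "subpfn offset" idea come from this post.

```c
#include <assert.h>

#define MMUPAGE_SHIFT  12                      /* 4KB pte-sized unit */
#define MMUPAGE_SIZE   (1UL << MMUPAGE_SHIFT)
#define CLUSTER_SHIFT  3                       /* assumed: 8 pieces per page */
#define CLUSTER_COUNT  (1UL << CLUSTER_SHIFT)

/* Hypothetical result of a lookup: which base page, and which piece of it. */
struct piece_ref {
    unsigned long page_index;  /* index in base-page units */
    unsigned long subpfn;      /* 4KB piece within that base page */
};

/* Convert a file byte offset into a base-page index plus subpfn offset. */
static struct piece_ref file_offset_to_piece(unsigned long long offset)
{
    unsigned long long mmupage = offset >> MMUPAGE_SHIFT;
    struct piece_ref ref;

    ref.page_index = (unsigned long)(mmupage >> CLUSTER_SHIFT);
    ref.subpfn = (unsigned long)(mmupage & (CLUSTER_COUNT - 1));
    return ref;
}

/* Combine a base page's first pfn with a subpfn offset into a full pfn. */
static unsigned long piece_to_pfn(unsigned long page_first_pfn,
                                  unsigned long subpfn)
{
    assert(subpfn < CLUSTER_COUNT);
    return page_first_pfn + subpfn;
}
```

For example, with these assumed values byte offset 40960 is the eleventh 4KB unit of the file, which lands in base page 1 at subpfn offset 2.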

The net result should be (and was in Hugh's code) that there is zero
impact on binary compatibility. The smaller EXEC_PAGE_SIZE a.k.a.
MMUPAGE_SIZE is 100% faithfully emulated and the entire affair is fully
transparent to userspace. The maximum filesystem blocksize is increased.
And various O(pages) traversals get linear speedups, and various
O(pages)-sized data structures get linear size reductions.
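
A back-of-the-envelope sketch of that linear reduction for a 64GB machine. The clustering factor of 8 and the 40-byte struct page are assumptions picked purely for the arithmetic, and the function names are hypothetical.

```c
#include <assert.h>

#define MMUPAGE_SHIFT 12  /* 4KB pte-sized unit */

/* Number of struct pages needed to track mem_bytes of memory. */
static unsigned long long nr_struct_pages(unsigned long long mem_bytes,
                                          unsigned int cluster_shift)
{
    return mem_bytes >> (MMUPAGE_SHIFT + cluster_shift);
}

/* Bytes of per-page metadata, for an assumed struct page size. */
static unsigned long long mem_map_bytes(unsigned long long mem_bytes,
                                        unsigned int cluster_shift,
                                        unsigned long long struct_page_bytes)
{
    return nr_struct_pages(mem_bytes, cluster_shift) * struct_page_bytes;
}
```

Under these assumptions, 64GB needs 16777216 struct pages (640MB of metadata) without clustering and 2097152 struct pages (80MB) with a factor of 8. Anything that walks or sizes itself by the number of pages shrinks by the same factor.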

-- wli