Patrick Mochel has another patch that adds another zone on x86: the "low
memory" zone for the 0-1MB area, which is special for some things, notably
real mode bootstrapping (ie the SMP stuff could use it instead of the
current special-case allocations, and Pat needs it for allocating low
memory pages for suspend/resume).
I'd like to see what these two look like together.
But even more I'd like to see a more dynamic zone setup: we already have
people talking about adding memory dynamically at run-time on some of the
server machines, which implies that we might want to add zones at a later
time, along with binding those zones to different zonelists.
This is also an issue for different architectures: some of these zones do
not make any _sense_ on other architectures. For example, what's the
difference between ZONE_HIGHMEM and ZONE_NORMAL on a sane 64-bit
architecture? (Right now I _think_ the 64-bit architectures actually make
ZONE_NORMAL be what we call ZONE_DMA32 on x86, because they already need
to be able to distinguish between memory that can be PCI-DMA'd to, and
memory that needs bounce-buffers. Or maybe it's ZONE_DMA that they use for
the DMA32 stuff?)
Anyway, what I'm saying is that "GFP_HIGHMEM" already traverses three
zones, and with ZONE_1M and ZONE_DMA32, you'd have a list of five of them.
Of which only _two_ would actually be meaningful on some architectures.
So should we not try to have some nicer interface like
	create_zone(&zone, offset, end);
	add_zone(&zone, zonelist);
and then we could on x86 have
	create_zone(zone+0, 0, 1M);
	create_zone(zone+1, 1M, 16M);
	create_zone(zone+2, 16M, 896M);
	create_zone(zone+3, 896M, 4G);
	create_zone(zone+4, 4G, 64G);
	.. populate the zones ..
	add_zone(zone+4, GFP_HIGHMEM);
	add_zone(zone+3, GFP_HIGHMEM);
	add_zone(zone+3, GFP_DMA32);
	add_zone(zone+2, GFP_HIGHMEM);
	add_zone(zone+2, GFP_DMA32);
	add_zone(zone+2, GFP_NORMAL);
	/* the 1M-16M zone is usable for just about everything */
	add_zone(zone+1, GFP_HIGHMEM);
	add_zone(zone+1, GFP_DMA32);
	add_zone(zone+1, GFP_NORMAL);
	add_zone(zone+1, GFP_DMA);
	/* The low 1M can be used for everything */
	add_zone(zone+0, GFP_HIGHMEM);
	add_zone(zone+0, GFP_DMA32);
	add_zone(zone+0, GFP_NORMAL);
	add_zone(zone+0, GFP_DMA);
	add_zone(zone+0, GFP_LOWMEM);
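
To make the shape of that interface a bit more concrete, here is a minimal
user-space sketch in C. Everything in it is an assumption made for
illustration: the struct layouts, the fixed-size zonelist array, the ZL_*
class names (standing in for the GFP_* lists above) and the pick_zone()
helper are hypothetical, not existing kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocation classes, one zonelist per class. */
enum zl_class { ZL_LOWMEM, ZL_DMA, ZL_NORMAL, ZL_DMA32, ZL_HIGHMEM, ZL_COUNT };

#define MAX_ZONES_PER_LIST 8

struct zone {
	unsigned long long start;	/* first byte covered by the zone */
	unsigned long long end;		/* one past the last byte covered */
	unsigned long free_pages;	/* filled in when the zone is populated */
};

struct zonelist {
	struct zone *zones[MAX_ZONES_PER_LIST];	/* in fallback order */
	size_t count;
};

static struct zonelist zonelists[ZL_COUNT];

/* Describe the address range of a zone; it starts out with no pages. */
static void create_zone(struct zone *zone, unsigned long long start,
			unsigned long long end)
{
	assert(start < end);
	zone->start = start;
	zone->end = end;
	zone->free_pages = 0;
}

/* Append a zone to the zonelist of one class. Returns 0, or -1 if full. */
static int add_zone(struct zone *zone, enum zl_class cls)
{
	struct zonelist *zl = &zonelists[cls];

	if (zl->count >= MAX_ZONES_PER_LIST)
		return -1;
	zl->zones[zl->count++] = zone;
	return 0;
}

/* Walk a zonelist in order and return the first zone with free pages. */
static struct zone *pick_zone(enum zl_class cls)
{
	const struct zonelist *zl = &zonelists[cls];
	size_t i;

	for (i = 0; i < zl->count; i++)
		if (zl->zones[i]->free_pages > 0)
			return zl->zones[i];
	return NULL;
}
```

Because add_zone() appends, the order of the calls is the fallback order,
which is why the x86 example above adds the highest zone to each list
first. An architecture that only cares about two zones would simply make
fewer calls.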
And eventually, when we get hot-plug memory, the hotplug event would be
just something like
	zone = kmalloc(sizeof(struct zone), GFP_KERNEL);
	create_zone(zone, start, end);
	.. populate it with the newly added memory ..
	/*
	 * Add it to all the appropriate zonelists (I suspect hotplug will
	 * only occur in high memory, but who knows?)
	 */
	add_zone(zone, GFP_HIGHMEM);
	...
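
The hotplug case can be sketched the same way in user space. Again, this
is only an illustration under stated assumptions: malloc() stands in for
kmalloc(), the growable zonelist and the hotplug_add_memory() helper are
hypothetical, and "populating" the zone is reduced to counting pages.

```c
#include <assert.h>
#include <stdlib.h>

struct zone {
	unsigned long long start;	/* first byte covered by the zone */
	unsigned long long end;		/* one past the last byte covered */
	unsigned long free_pages;
};

/* A zonelist that can grow at run-time (hypothetical layout). */
struct zonelist {
	struct zone **zones;
	size_t count;
	size_t capacity;
};

/* Append a zone, growing the array if needed. Returns 0, or -1 on failure. */
static int zonelist_add(struct zonelist *zl, struct zone *zone)
{
	if (zl->count == zl->capacity) {
		size_t cap = zl->capacity ? zl->capacity * 2 : 4;
		struct zone **grown = realloc(zl->zones, cap * sizeof(*grown));

		if (!grown)
			return -1;
		zl->zones = grown;
		zl->capacity = cap;
	}
	zl->zones[zl->count++] = zone;
	return 0;
}

/*
 * Handle a (simulated) hotplug event: allocate a zone for the new range,
 * populate it, and hook it into the high-memory zonelist.
 * Returns the new zone, or NULL if anything failed.
 */
static struct zone *hotplug_add_memory(struct zonelist *highmem,
				       unsigned long long start,
				       unsigned long long end,
				       unsigned long page_size)
{
	struct zone *zone;

	assert(start < end && page_size > 0);

	zone = malloc(sizeof(*zone));
	if (!zone)
		return NULL;

	zone->start = start;
	zone->end = end;
	zone->free_pages = (unsigned long)((end - start) / page_size);

	if (zonelist_add(highmem, zone) != 0) {
		free(zone);
		return NULL;
	}
	return zone;
}
```

Here the new zone goes to the end of the list, so it is tried last.
Whether hot-added memory should be preferred or used as a last resort is
a policy question this sketch does not try to answer.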
(Note how this might also be part of the equation of how you add nodes
dynamically in a NUMA environment).
And see how the above would mean that something like sparc64 wouldn't need
to see five zones when it really only needs two of them.
		Linus