OK, zones have about PAGES_LOW pages free, and these are buddied together as
well as they possibly can be, since direct reclaim puts the page on the free
list first before allocating one.
> I now come to open a device which requires 64K in 8K
> pages from the DMA zone. What happens?
First - is it atomic or not? (device sounds like atomic)
If it is atomic:
1) you get one chance, no retries in __alloc_pages. [not changed]
2) you get one from those already free, no reclaims possible [not changed]
3) you are allowed to allocate below PAGES_MIN [not changed]
The result will depend on how many pages are free and on whether they form
enough order 1 buddies.
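To make the three rules above concrete, here is a rough userspace sketch (not
kernel code). All toy_* names are made up for illustration, and the free lists
are just per-order counters. Each request gets one try, nothing is reclaimed,
and there is no PAGES_MIN check, so the only thing that matters is what is
already on the free lists.

```c
#include <stdbool.h>

#define TOY_MAX_ORDER 10   /* 4kB .. 2048kB, as in a SysRq-M buddy line */

/* Toy free lists: nr_free[k] is the number of free blocks of 2^k pages. */
struct toy_zone {
    unsigned long nr_free[TOY_MAX_ORDER];
};

/* Take one block of `order`, splitting the smallest larger block if needed. */
static bool toy_take_block(struct toy_zone *z, int order)
{
    for (int k = order; k < TOY_MAX_ORDER; k++) {
        if (z->nr_free[k] == 0)
            continue;
        z->nr_free[k]--;
        /* Each split hands one half back to the next lower free list. */
        while (k > order) {
            k--;
            z->nr_free[k]++;
        }
        return true;
    }
    return false;
}

/*
 * Atomic caller asking for `count` blocks (assumed model): each request
 * gets one try, nothing is reclaimed, and no PAGES_MIN check is made.
 * Returns how many of the requests succeeded.
 */
static int toy_atomic_alloc(struct toy_zone *z, int order, int count)
{
    int got = 0;

    while (got < count && toy_take_block(z, order))
        got++;
    return got;
}
```

For the 64K in 8K pieces case one would call toy_atomic_alloc(&zone, 1, 8). A
zone holding only single free pages returns 0 no matter how many pages it has.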
With my algorithm the number of free pages in every zone is very likely to
be close to PAGES_LOW, since it tries to move towards it.
The original algorithm is harder to analyze: free pages will not grow until
one hits PAGES_MIN and kreclaimd gets started.
As a test I hit Magic-SysRq-M (256 MB RAM):
Free pages: 3836kB ( 0kB HighMem) ( Active: 21853, inactive_dirty:
19101, inactive_clean: 392, free: 959 (383 766 1149) )
0*4kB 1*8kB 8*16kB 8*32kB 4*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB
= 1800kB)
1*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB
= 2036kB)
= 0kB)
And a while later I hit it again:
( Active: 22300, inactive_dirty: 18742, inactive_clean: 587, free: 947 (383
766 1149) )
3*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB
= 1028kB)
80*4kB 39*8kB 3*16kB 3*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB
= 2760kB)
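As a rough check of what these lines mean for 8kB requests, the arithmetic can
be sketched in a few lines of C. The helper names and the two arrays are mine;
the arrays just copy the first buddy line of each dump. Which zone that line
belongs to is not visible in the output above, so take it as an example only.

```c
#define TOY_NR_ORDERS 10   /* 4kB, 8kB, ..., 2048kB */

/* How many blocks of `order` the free lists could hand out, counting splits. */
static unsigned long toy_blocks_of_order(const unsigned long nr_free[TOY_NR_ORDERS],
                                         int order)
{
    unsigned long total = 0;

    for (int k = order; k < TOY_NR_ORDERS; k++)
        total += nr_free[k] << (k - order);   /* a 2^k block holds 2^(k-order) */
    return total;
}

/* Total free memory in kB, assuming 4kB pages as in the dump. */
static unsigned long toy_free_kb(const unsigned long nr_free[TOY_NR_ORDERS])
{
    unsigned long kb = 0;

    for (int k = 0; k < TOY_NR_ORDERS; k++)
        kb += nr_free[k] * (4UL << k);
    return kb;
}

/* First buddy line of the first and of the second dump. */
static const unsigned long toy_first_dump[TOY_NR_ORDERS] =
    {0, 1, 8, 8, 4, 3, 1, 1, 0, 0};
static const unsigned long toy_second_dump[TOY_NR_ORDERS] =
    {3, 1, 1, 1, 1, 1, 1, 1, 0, 0};
```

toy_free_kb() gives 1800 and 1028, matching the totals printed. With order 1,
toy_blocks_of_order() gives 225 and 127 possible 8kB blocks, where 64K in 8K
pieces needs 8.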
For non-atomic higher-order allocations there are more changes:
1) Tries to free the same number of pages that it wants to allocate later.
2) Does not allow allocations when there are fewer than PAGES_MIN pages free.
[BUG in current code, see earlier patch - higher-order non-atomic allocations
could drain the free reserve if there are a lot of inactive clean pages...]
3) Retries only while there is a free shortage - this could be changed to
retry until all zones have more than PAGES_HIGH free, or until there are no
inactive clean pages left. But why favor non-atomic over atomic in this
way?
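The three points above, written as one step of a toy model (userspace C, not
the patch itself). Everything named toy_* is hypothetical. I assume here that
"free shortage" means fewer than pages_low pages free, and whether a
contiguous block of the right order exists is simply passed in by the caller.

```c
#include <stdbool.h>

/* Page counters for one zone in the toy model. */
struct toy_counts {
    unsigned long free_pages;
    unsigned long inactive_clean;
    unsigned long pages_min;
    unsigned long pages_low;
};

enum toy_verdict { TOY_ALLOC, TOY_RETRY, TOY_FAIL };

/* One pass of a non-atomic allocation of 2^order pages. */
static enum toy_verdict toy_nonatomic_step(struct toy_counts *z, int order,
                                           bool block_available)
{
    unsigned long want = 1UL << order;
    unsigned long moved = want < z->inactive_clean ? want : z->inactive_clean;

    /* 1) Free as many pages as we are about to allocate. */
    z->inactive_clean -= moved;
    z->free_pages += moved;

    /* 2) No allocation while the zone is below PAGES_MIN. */
    if (z->free_pages >= z->pages_min && z->free_pages >= want &&
        block_available) {
        z->free_pages -= want;
        return TOY_ALLOC;
    }

    /* 3) Retry only while there is a free shortage and reclaim made progress. */
    if (z->free_pages < z->pages_low && moved > 0)
        return TOY_RETRY;
    return TOY_FAIL;
}
```

A caller would loop while the result is TOY_RETRY. The "moved > 0" test is my
addition so that the loop always ends when nothing is left to reclaim.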
>
> I suspect that the chances of it failing will be significantly higher with
> this algorithm - do you have any thoughts for this?
>
Do you still think the risk is higher?
Stephan's problem seems to be that this allocation runs over and over...
> I don't think we should purely select the allocation zone based purely on
> how much free it contains, but also if it's special (like the DMA zone).
>
It does prioritize due to the order the zones are checked in.
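A small sketch of what I mean by the check order, as a userspace toy. The
toy_* names are made up, and the test "free is at least pages_low plus the
request" is an assumption for illustration, not the exact kernel condition. An
ordinary request has a list like Normal then DMA, so DMA is only touched when
Normal cannot serve it. A DMA request has only DMA in its list.

```c
#include <stddef.h>

enum toy_zone_id { TOY_DMA, TOY_NORMAL, TOY_HIGHMEM, TOY_NR_ZONES };

struct toy_zinfo {
    unsigned long free_pages;
    unsigned long pages_low;
};

/*
 * Walk the zones in the order given and return the first one that can
 * serve the request, or -1 if none can. The list order is the priority.
 */
static int toy_pick_zone(const struct toy_zinfo zones[TOY_NR_ZONES],
                         const enum toy_zone_id *order, size_t n,
                         unsigned long nr_pages)
{
    for (size_t i = 0; i < n; i++) {
        const struct toy_zinfo *z = &zones[order[i]];

        /* Assumed rule: stay at or above pages_low after the allocation. */
        if (z->free_pages >= z->pages_low + nr_pages)
            return (int)order[i];
    }
    return -1;
}
```

With Normal listed before DMA, a small request is served from Normal even when
DMA has far more free pages.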
> You can't clean in-use slab pages out on demand like you can for fs
> cache/user pages.
/RogerL
-- Roger Larsson Skellefteå Sweden