Re: What to expect with the 2.6 VM

Andrea Arcangeli (andrea@suse.de)
Wed, 2 Jul 2003 19:11:59 +0200


On Wed, Jul 02, 2003 at 04:57:27PM +0100, Mel Gorman wrote:
> The second reason is avoiding the poor linear search algorithm used by
> get_unmapped_area() when looking for a large free virtual area. With a
> large number of mappings, this search is very expensive. It has been

this is true too. however get_unmapped_area never need to find a new
place for the granular usages since mmap is always called with MAP_FIXED for
those.

> proposed to alter the function to perform a tree based search. This could
> be a tree of free areas ordered by size for example but none has yet been

it can't be trivially a tree of free areas or (if naturally indexed with
the size of the hole) it would return the smallest-fitting hole, not the
leftmost-smallest-fitting-hole ;). A better solution is possible. Then
everybody will benefit w/o need of userspace changes. It's still pretty
orthogonal with remap_file_pages though.

> implemented. In the meantime, non-linear mappings are being used to bypass
> the VM.
>
> The third reason is related to frequent page faults associated with linear
> mappings. A non-linear mapping is able to prefault in all pages that are
> required by the mapping as it is presumed they will be needed very soon.
> To some extent, this can be addressed by specifying the MAP_POPULATE when
> calling mmap() for a normal mapping.

mlock already does it too.

> This feature has a very serious drawback. The system calls truncate() and
> mincore() are broken with respect to non-linear mappings. Both calls
> depend on vm_area_struct>vm_pgoff, which is the offset within
> the mapped file, but the field is meaningless within a non-linear mapping.
> This means that truncated files will still have mapped pages that no
> longer have a physical backing. A number of possible solutions, such as
> allowing the pages to exist but be anonymous and private to the process,
> have been suggested but none implemented.

the major reason you didn't mention for remap_file_pages is the rmap
avoidance. There's no rmap backing the remap_file_pages regions, so the
overhead per task is reduced greatly and the box stops running oom
(actually deadlocking for mainline thanks to the oom killer and NOFAIL
default behaviour). since there's no rmap, this in turn means either
this nonlinear vma will swap badly, or it means rmap is totally useless
to swap well. Which in short means either rmap has to go in its current
form (and the usefulness of remap_file_pages would be greatly reduced),
or nonlinear mappings would better stay pinned in ram since they'd
better not be used for the emaulator with 63G of highmem into swap on a
1G host anyways (the sysctl would fix the security detail in pinning
into ram like we're doing today with the largepages in 2.4).

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/