Re: objrmap and vmtruncate

Martin J. Bligh (mbligh@aracnet.com)
Fri, 04 Apr 2003 19:45:14 -0800


>> I'm not convinced that we can't do something with nonlinear mappings for
>> this ... we just need to keep a list of linear areas within the nonlinear
>> vmas, and use that to do the objrmap stuff with. Dave and I talked about
>> this yesterday ... we both had different terminology, but I think the
>> same underlying fundamental concept ... I was calling them "sub-vmas"
>> for each linear region within the nonlinear space.
>
> that's wasted memory IMHO, if you need nonlinear, you don't want to
> waste further metadata, you only want to pin pages in the pagetables,
> the 'window' over the pagecache (incidentally shm)
>
> the vm shouldn't know about it.

OK, but this is only for the case when the things aren't memlocked anyway,
which in Oracle's case is never. Seems like we're thrashing a lot of time
and effort over sys_remap_file_pages considering it's never actually
desirable to scan the chains for pageout anyway.

And does anyone *really* start off with a linear vma and them convert
it to a linear one after using it? Can't we just fail that call? Would
result in orders of magnitude of reduction in complexity if we could
just narrow the scope of this beastie a bit.

If you have a *non-memlocked* VMA that you've *previously used* as linear,
then the sys_remap_file_pages stuff would fail with an error code. Is
that too painful? Maybe you can't "un-memlock" a non-linear VMA once
it's memlocked either. I'm quite possibly missing something but if someone
could point out what that is ... ?

>> The fundamental problem I came to (and I think Dave had the same problem)
>> is that I couldn't see what problem remap_file_pages was trying to solve,
>
> Oh that's clear, it's only the avoidance of the mmap calls that walks
> the rbtree with many vmas allocated. Which is another reason for not
> having any kind of metadata associated with the pages attached to the
> nonlinear vma. Taking a linearity inside the non-linearity sounds
> not worthwhile.

Well, it's an order of magnitude less expensive than mem_map still.
No, not perfect, but the world's a compromise ;-) And we don't need
this at all for memlocked ones.

>> eh? Not sure what you mean by that. It helped massively ...
>> diffprofile from kernbench showed:
>
> Indeed. objrmap is the only way to avoid the big rmap waste. Infact I'm
> not even convinced about the hybrid approch, rmap should be avoided even
> for the anon pages.

Right, but they seem to be mostly singletons (non-shared), so the
pte_direct stuff takes care of those mostly anyway. Would be nice eventually,
but I thinking taking one step at a time is maybe helpful ;-)

M
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/