Re: [patch 6/12] hold atomic kmaps across generic_file_read

Linus Torvalds (torvalds@transmeta.com)
Sat, 10 Aug 2002 00:25:38 -0700 (PDT)


On Fri, 9 Aug 2002, Andrew Morton wrote:
>
> What would be nice is a way of formalising the prefault, to pin
> the mm's pages across the copy_*_user() in some manner, perhaps?

Too easy to create a DoS-type attack with any trivial implementation.

However, I don't think pinning is worthwhile, since even if the page goes
away, the prefaulting was just a performance optimization. The code should
work fine without it. In fact, it would probably be good to _not_ prefault
for a development kernel, and verify that the code works without it. That
way we can sleep safe in the knowledge that there isn't some race lurking in
code that only works because of the prefaulting.

I agree that if you could guarantee pinning the out-of-line code would be
a bit simpler, but since we have to handle the EFAULT case anyway, I doubt
that it is _that_ much simpler.

Also, there are actually advantages to doing it the "hard" way. If we ever
want to, we can play clever tricks in the slow path that avoid doing the
copy at all.
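
Very roughly, the shape of that path would be something like this (the
helper names are made up, and the real code has to deal with offsets,
partial copies and so on - this is just a sketch):

	/*
	 * Fast path: copy out of the page cache page under an atomic kmap.
	 * This relies on the fault handler refusing to sleep while we hold
	 * the atomic kmap, so a missing user page makes the copy come back
	 * short instead of blocking.
	 */
	static long read_actor(struct page *page, char *buf,
			       unsigned long offset, unsigned long count)
	{
		char *kaddr;
		unsigned long left;

		kaddr = kmap_atomic(page, KM_USER0);
		left = __copy_to_user(buf, kaddr + offset, count);
		kunmap_atomic(kaddr, KM_USER0);

		if (!left)
			return count;	/* everything copied, never slept */

		/*
		 * Slow path: the copy faulted.  Fall back to something that
		 * can sleep - and this is where any clever no-copy tricks
		 * would naturally live.
		 */
		return read_actor_slow(page, buf, offset, count);
	}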

Example trick: we could, if we wanted to, do a zero-copy read() for a
common case by adding a COW bit to the page cache. If you do aligned
reads into a page that will fault on write, you can just map the page
cache page in directly, mark it COW in the page cache (assuming the page
count tells us we're the only user, of course), and mark it COW in the
mapping.

The nice thing is, this actually works correctly even if the user re-uses
the area for reading multiple times (because the read() will trap not
because the page isn't mapped, but because it is mapped COW on something
that will write to user space). The unmapped case is better, though, since
we don't need to do TLB invalidates for that case (which makes this
potentially worthwhile even on SMP).
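
To make that concrete, the slow-path side of the trick might look vaguely
like this (SetPageCow(), install_cow_page() and the refcount rule are all
invented for the example):

	/*
	 * Instead of copying, share the page cache page with the reader.
	 * Only safe for page-aligned reads, and only when nobody else is
	 * using the page.
	 */
	static int read_map_cow(struct vm_area_struct *vma, unsigned long uaddr,
				struct page *page)
	{
		/*
		 * "Sole user" here means: one reference from the page cache
		 * plus the one we hold - anything more and we just copy.
		 */
		if (page_count(page) > 2)
			return -EAGAIN;

		SetPageCow(page);		/* hypothetical page cache COW bit */

		/*
		 * Map the page read-only into the user's page tables.  A later
		 * write to the buffer takes an ordinary COW fault; a later
		 * read() into the same buffer traps too, not because the page
		 * is unmapped but because copy_to_user() hits the write
		 * protection and drops us back into this slow path.
		 */
		return install_cow_page(vma, uaddr, page);	/* hypothetical */
	}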

I don't know if this is common, but it _would_ make read() have definite
advantages over mmap() on files that are seldom written to or mmap'ed in a
process (which is most of them, gut feel). In particular, once you fault
for _one_ page, you can just map in as many pages as the read() tried to
read in one go - so you can avoid any future work as well.

Imagine doing a

struct stat st;

fstat(fd, &st);
buf = aligned_malloc(st.st_size);
read(fd, buf, st.st_size);

and having it magically populate the VM directly with the whole file
mapping, with _one_ failed page fault. And the above is actually a fairly
common thing. See how many people have tried to optimize using mmap vs
read, and what they _all_ really wanted was this "populate the pages in
one go" thing.

Is it a good idea? I don't know. But it would seem to fall very cleanly
out of the atomic kmap path - without affecting the fast path at _all_. It
would be a very specific and localized optimization, with no impact on the
rest of the system, since it's using the same fixup() logic that we have
to have anyway.

(Yeah, the COW bit on the page cache is special, and both the page-mapping
code and (obviously) the file-write path would need to do something like

	..
	if (page->flags & PAGE_COW) {
		page->flags &= ~PAGE_COW;
		if (page->count > 1) {
			remove-and-reinsert-new-page();
		}
	}
	..

by hand before mapping it writable or writing to it. And the read()
optimization would _only_ work if nobody is using mmap() on the file at
the same time for those pages).

This would definitely be 2.7.x material, I'm just explaining why I like
the flexibility of the approach (as opposed to a very static "memcpy-only-
special-case" thing).

Linus
