Normally yes, but not always. e.g. for squid you don't really want to
cache user space.
But I guess it would be a reasonable heuristic. Or at least worth a try :-)
> > But it'll need hints from the higher level code. e.g. read and write
> > could turn on write combining for bigger writes (let's say >8K)
> > I discovered that just unconditionally turning it on for all copies
> > is not good because it forces data out of cache. But I still have hope
> > that it helps for selected copies.
> Well if it's a really big read then bypassing the CPU cache on
> the userspace-side buffer would make sense.
> Can you control the cachability of the memory reads as well?
SSE2 has hints for that (prefetchnti and even prefetcht0,1 etc. for different
cache hierarchies), but it's not completely clear on how much
the CPUs follow these.
For writing it's much more obvious and usually documented even.
> What restrictions are there on these instructions? Would
> they force us to bear the cost of the aligment problem?
They should be aligned, otherwise it makes no sense. When you assume it's
more likely that one target or destination are unaligned then you can easily
align either target or destination. Trick is to chose the right one,
it varies on the call site.
(these are for big copies so a small alignment function is lost in the noise)
x86-64 copy_*_user currently aligns the destination, but hardcoding that
is a bit dumb and I'm not completely happy with it.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to firstname.lastname@example.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/