Re: [PATCH] 2.5.14 IDE 56

Andrew Morton (akpm@zip.com.au)
Thu, 09 May 2002 17:48:15 -0700


Andi Kleen wrote:
>
> Andrew Morton <akpm@zip.com.au> writes:
>
> > For bulk read() and write() I/O the best sized buffer is 8 kbytes. 4k is
> > pretty good, too. Anything larger blows the user-side buffer out of L1.
> > This is for x86.
>
> Modern x86 support prefetch hints for the CPU to tell it to not
> pollute the caches with "streaming data". I bet using them would
> be a big win.

Maybe. For your basic:

	for (many) {
		read(fd1, buf, 8192);
		write(fd2, buf, 8192);
	}

you want `buf' cached, but not the pagecache for fd1 and fd2.
If the prefetch hints can express that then yes, nice.
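
Spelled out as real user-space code, the loop above might look like the sketch below. The names copy_fd and COPY_BUF_SIZE are made up for illustration, and the short-read, short-write and EINTR handling is an addition that the three-line version leaves out. The one buffer is reused on every pass, which is why it is the thing you want to stay in cache.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_BUF_SIZE 8192	/* 8 kbytes, the size discussed above */

/*
 * Copy everything from fd_in to fd_out.
 * Returns the number of bytes copied, or -1 on error.
 * (Hypothetical helper, not taken from any kernel or libc source.)
 */
static long long copy_fd(int fd_in, int fd_out)
{
	static char buf[COPY_BUF_SIZE];	/* same buffer on every pass */
	long long total = 0;

	for (;;) {
		ssize_t got = read(fd_in, buf, sizeof(buf));

		if (got == 0)
			return total;		/* end of file */
		if (got < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the read */
			return -1;
		}

		/* write() may accept less than we asked for, so loop. */
		for (ssize_t done = 0; done < got; ) {
			ssize_t put = write(fd_out, buf + done,
					    (size_t)(got - done));

			if (put < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			done += put;
		}
		total += got;
	}
}
```

Calling copy_fd(fd1, fd2) on two open descriptors does the whole transfer. Nothing here controls what happens to the pagecache side; that part would still need the hints to express it.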

> The rep ; movsl loop used in copy*user isn't
> very good on modern x86 anyways (it is ok on PPro, but loses on Athlon
> and P4)

On PII and PIII, rep;movsl is slower than an open-coded
duff-device copy for all src/dest alignments except for
the case where both are eight-byte-aligned. By up to
20%, iirc. Four-byte-aligned to four-byte-aligned isn't
too bad.

Of course, a lot of copy_*_users are well-aligned. But
a lot are not. I ended up deciding that switching to
the duff-device copy would be a very small overall win, when
you weight it by the alignment patterns of normal kernel
usage.

But making a runtime selection of which copy function to
use (based on src/dest alignment) could speed up the
kernel's most expensive function by maybe 10-15% overall.
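
A rough user-space sketch of that kind of runtime selection follows. It is not the kernel's copy_*_user code: pick_copy_path, copy_select and duff_copy are hypothetical names, memcpy() merely stands in for the rep;movsl path, and the byte-at-a-time duff-device loop is there to show the structure, not to be fast. The eight-byte test comes from the PII/PIII observation above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum copy_path { COPY_STRING_OP, COPY_UNROLLED };

/* Open-coded copy, unrolled eight ways with Duff's device. */
static void duff_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
	size_t rounds;

	if (n == 0)
		return;
	rounds = (n + 7) / 8;
	switch (n % 8) {
	case 0: do {	*dst++ = *src++;
	case 7:		*dst++ = *src++;
	case 6:		*dst++ = *src++;
	case 5:		*dst++ = *src++;
	case 4:		*dst++ = *src++;
	case 3:		*dst++ = *src++;
	case 2:		*dst++ = *src++;
	case 1:		*dst++ = *src++;
		} while (--rounds > 0);
	}
}

/* Both pointers eight-byte-aligned: take the string-op path. */
static enum copy_path pick_copy_path(const void *dst, const void *src)
{
	uintptr_t bits = (uintptr_t)dst | (uintptr_t)src;

	return (bits & 7) == 0 ? COPY_STRING_OP : COPY_UNROLLED;
}

/* Choose the copy routine at runtime from the src/dest alignment. */
static void *copy_select(void *dst, const void *src, size_t n)
{
	if (pick_copy_path(dst, src) == COPY_STRING_OP)
		memcpy(dst, src, n);	/* stand-in for rep;movsl */
	else
		duff_copy(dst, src, n);
	return dst;
}
```

The alignment check costs an OR, an AND and a branch per call, so it only pays off if the misaligned cases are common enough and the copies long enough to cover it.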

The test proggy is in http://www.zip.com.au/~akpm/linux/cptimer.tar.gz
