Re: [PATCH] TCP Zero Copy for mmapped files

Larry McVoy (
Thu, 2 Jan 2003 17:27:26 -0800

On Fri, Jan 03, 2003 at 01:56:27AM +0000, Alan Cox wrote:
> On Fri, 2003-01-03 at 00:45, Thomas Ogrisegg wrote:
> > Unfortunately the linux-sendfile is not as good as the HP-UX
> > one. Under HP-UX you can define a "struct iovec" header to
> > be sent before the file is sent.
> Thats a design decision. With TCP_CORK and sensible syscall performance
> those kind of web specific hacks are not appropriate

Indeed. In case Alan's message wasn't clear: if your syscall overhead
is zero then many "optimizations" become superfluous. In fact, those
optimizations, one cache miss at a time, tend to be a big part of what
makes the syscall layer so heavyweight.

Linux is amazing in that it is basically the only real operating system
I know of that has stayed so focussed on making the syscall layer be
almost invisible. it's worth a "rah rah" because you can use the
operating system like it was libc, there is basically very little
cost in crossing in/out.

Here's the LMbench context switch benchmark running on a 1.6Ghz Athlon:

load free cach swap pgin pgou dk0 dk1 dk2 dk3 ipkt opkt int ctx usr sys idl
0.67 73M 577M 25M 0 0 0 0 0 0 4.0 2.0 107 548K 23 77 0
0.67 73M 577M 25M 0 0 0 0 0 0 2.0 2.0 105 549K 19 81 0
0.67 73M 577M 25M 0 0 0 0 0 0 4.0 2.0 107 549K 27 73 0
0.70 73M 577M 25M 0 0 0 0 0 0 2.0 2.0 105 548K 23 77 0

Yeah, that's more than a half a million context switchs/second and each
of those include 2 system calls. So Linux is doing 2 system calls and
a context switch in 1.8 microseconds.

When you can get in and out of the kernel that fast, your thinking should
change. You get to use the kernel more freely. And you certainly don't
want to do anything to screw that up. My hat is off to Linus and team
for working so hard to make these numbers be so good (and keep on working,
see the recent syscall discussion).

Larry McVoy            	 lm at  
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at