RE: [Patch][RFC] epoll and half closed TCP connections

David Schwartz (davids@webmaster.com)
Sun, 13 Jul 2003 23:14:28 -0700


> David Schwartz wrote:

> > This is really just due to bad coding in 'poll', or more
> > precisely very bad
> > for this case. For example, why is it allocating a wait queue
> > buffer if the
> > odds that it will need to wait are basically zero? Why is it adding file
> > descriptors to the wait queue before it has determined that it needs to
> > wait?

> Pfeh! That's just tweaking.

Nonetheless, it's embarassing for Linux that performance shoots up if you
replace a normal call to 'poll' with a sleep followed by a call to 'poll'
with a zero wait. But that's peripheral to my point.

> If you really want to optimise 'poll', maintain a per-task event
> interest set like epoll does (you could use the epoll infrastructure),
> and on each call to 'poll' just apply the differences between the
> interest set and whatever is passed to poll.
>
> That would actually reduce the number of calls to per-fd f_op->poll()
> methods to O(active), making the internal overhead of 'poll' and
> 'select' comparable with epoll.
>
> I'd not be surprised if someone has done it already. I heard of a
> "scalable poll" patch some years ago.
>
> That leaves the overhead of the API, which is O(interested) but it is
> a much lower overhead factor than the calls to f_op->poll().

Definitely, the API will always guarantee performance is O(n). If you're
interested in ultimate scalability, you can never exceed O(n) with 'poll'.
But my point is that you can never exceed O(n) with any discovery
implementation because the number of sockets to be discovered scales at
O(n), and you have to do something per socket.

This doesn't change the fact that 'epoll' outperforms 'poll' at every
conceivable situation. It also doesn't change the fact that edge-based
triggering allows some phenomenal optimizations in multi-threaded code. It
also doesn't change the fact that Linux's 'poll' implementation is not so
good for the high-load, busy server case.

All I'm trying to say is that the argument that 'epoll' scales at a better
order than 'poll' is bogus. They both scale at O(n) where 'n' is the number
of connections you have. No conceivable implementation could scale any
better.

DS

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/