RE: [Patch][RFC] epoll and half closed TCP connections

David Schwartz
Sun, 13 Jul 2003 16:05:38 -0700

> David Schwartz wrote:

> > For most real-world loads, M is some fraction of N. The fraction
> > asymptotically approaches 1 as load increases because under load it
> > takes you longer to get back to polling, so a higher fraction of the
> > descriptors will be ready when you do.

> Ah, but as the fraction approaches 1, you'll find that you are
> asymptotically approaching the point where you can't handle the load
> _regardless_ of epoll overhead.

This has not been my experience. On pretty much every OS except Linux, my
experience has been that as you are spending more time doing work, each call
to 'poll' discovers more file descriptors ready. Further, the number of
bytes you can send/receive is greater (because it took you longer to get
back to the same connection), so again, the amount of work you do per call
to 'poll' goes way up. I think most of the problem is just that Linux's
'poll' is extremely expensive and not due to any inherent API benefit of

> > By the way, I'm not arguing against epoll. I believe it will use less
> > resources than poll in pretty much every conceivable situation. I simply
> > take issue with the argument that it has better ultimate scalability or
> > scales at a different order.

> It scales according to the amount of work pending, which means that it
> doesn't take any _more_ time than actually doing the pending work.
> (This assumes you use epoll appropriately; there are many ways to use
> epoll which don't have this property).

But so does 'poll'. If you double the number of active and inactive
connections, 'poll' takes twice as long. But you do twice as much per call
to 'poll'. You will both discover more connections ready to do work on and
move more bytes per connection as the load increases.
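
The argument above can be put as a toy cost model. This is only a sketch
under stated assumptions: one 'poll' call costs a fixed amount per
registered descriptor, a fixed fraction of descriptors is ready on each
call, and each ready descriptor costs a fixed amount of real work. The
function name and the constants are hypothetical and are not measurements
of any kernel.

```python
def polling_overhead(n_fds, ready_fraction, scan_cost=1.0, work_cost=50.0):
    """Return time_polling / time_working for one pass of a poll loop.

    Assumptions (hypothetical, for illustration only):
      scan_cost  - cost for 'poll' to examine one descriptor
      work_cost  - cost of the real work done for one ready descriptor
    """
    poll_time = scan_cost * n_fds                    # 'poll' scans every descriptor
    work_time = work_cost * ready_fraction * n_fds   # work only on ready ones
    return poll_time / work_time


if __name__ == "__main__":
    # Doubling the descriptors doubles both terms, so the ratio is unchanged.
    print(polling_overhead(1000, 0.1))
    print(polling_overhead(2000, 0.1))
    # A larger ready fraction (heavier load) makes the ratio smaller.
    print(polling_overhead(2000, 0.5))
```

Under these assumptions n_fds cancels out of the ratio, which is the claim
being made here. The model says nothing about the case where the number of
idle descriptors grows while the number of active ones does not.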

> That was always the complaint about select() and poll(): they dominate
> the run time for large numbers of connections. epoll, on the other
> hand, will always be in the noise relative to other work.

I think this is largely true for Linux because of a bad implementation of
'poll' and therefore 'select'.

> If you want a formula for slides :), time_polling/time_working is O(1)
> with epoll, but O(N) with poll() & select().

It's not O(N) with 'poll' and 'select'. Twice as many file descriptors
means twice as many active file descriptors, which means twice as many
discovered per call to 'poll'. If the calls to 'poll' are further apart
(because of the additional real work done in between calls), it means more
than twice as many discovered per call to 'poll'. Add to this that you will
find more bytes ready to read or more space in the send queue per call to
'poll' as the load goes up.
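
A small way to see the "more discovered per call" effect is to make a
chosen number of descriptors readable and then call 'poll' once. The
sketch below uses Python's select.poll on Unix socket pairs. The helper
name ready_per_poll is hypothetical, and this only counts ready
descriptors; it does not measure the cost of 'poll' itself.

```python
import select
import socket


def ready_per_poll(total, active):
    """Register `total` sockets, make `active` of them readable, and
    return how many descriptors a single poll() call reports."""
    pairs = [socket.socketpair() for _ in range(total)]
    poller = select.poll()
    try:
        for reader, _writer in pairs:
            poller.register(reader.fileno(), select.POLLIN)
        # Simulate traffic that arrived while the program was busy elsewhere.
        for _reader, writer in pairs[:active]:
            writer.send(b"x")
        events = poller.poll(1000)  # timeout in milliseconds
        return len(events)
    finally:
        for a, b in pairs:
            a.close()
            b.close()


if __name__ == "__main__":
    # Same active fraction, twice the descriptors: one call finds twice as many.
    print(ready_per_poll(20, 5))
    print(ready_per_poll(40, 10))
```

The longer the gap between calls, the more descriptors have had time to
become readable, so a single call returns more of them.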

