Re: and nicer too - Re: [PATCH] epoll more scalable than poll

John Gardiner Myers (jgmyers@netscape.com)
Wed, 30 Oct 2002 18:07:07 -0800


Davide Libenzi wrote:

>John, your first post about epoll was "the interface has a bug, please
>do not merge it".
>
My first post about epoll pointed out how it was designed for single
threaded callers and concluded:

I certainly hope /dev/epoll itself doesn't get accepted into the kernel,
the interface is error prone. Registering interest in a condition when
the condition is already true should immediately generate an event, the
epoll interface did not do that last time I saw it discussed. This
deficiency in the interface requires callers to include more complex
workaround code and is likely to result in subtle, hard to diagnose bugs.

I did not say "the interface has a bug", I said that the interface is
error prone. This is a deficiency that should be fixed before the
interface is added to the kernel.

>Sorry, what prevents you in coding that ? If you, instead of ranting
>because epoll does not fit your personal idea of event notification, took
>a look to the example http server used for the test ( coroutine based )
>you'll see that does exactly that.
>
You posted code which you claimed was "even more cleaner and simmetric"
(sic) because it fell through to the do_use_fd() code instead of putting
the do_use_fd() code in an else clause. A callback scheme is akin to
the if/else structure. To adapt the first code to a callback scheme,
the accept callback has to somehow arrange to call the do_use_fd()
callback before returning to the event loop. This requirement is subtle
and asymmetric.

>I really don't believe this. Are you just trolling or what ? It is clear
>that your acceptor routine has to do a little more work than that in a
>real program.
>
Basically, you spawn off another coroutine. That complicates the "fall
through to do_use_fd()" logic in the first code by requiring an external
facility not required by the second code. The second code could simply
have the accept code loop until EAGAIN.

>Doh ! John, did you actually read the code ?
>
Yes, indeed.

>Could you compare AIO level
>of intrusion inside the kernel code with the epoll one ?
>
Aio poll extends the existing set of poll notification hooks with a
callback mechanism. It then plugs into this callback mechanism in order
to deliver events. The end result is that the same notification hooks
are used for classic poll and aio poll. When aio poll is not being
used, there is no additional performance penalty other than a slightly
larger poll_table_entry and poll_table_page.

Epoll creates a new callback mechanism and plugs into this new callback
mechansim. It adds a new set of notification hooks which feed into this
new callback mechansim. The end result is that there is one set of
notification hooks for classic poll and another set for epoll. When
epoll is not being used, the poll and socket code makes an additional
set of checks to see that nobody has registered interest through the new
callback mechanism.

> It fits _exactly_
>the rt-signal hooks. One of the design goals for me was to add almost
>nothing on the main path. You can lookup here for a quick compare between
>aio poll and epoll for a test where events delivery efficency does matter
>( pipetest ) :
>
This is a comparison of the cost of using epoll to the cost of using aio
in one particular situation. It is irrelevant to the point I was making.

>Now, I don't believe that a real world app will exchange 300000 tokens per
>second through a pipe, but this help you to understand the efficency of
>the epoll event notification subsystem.
>
>
My understanding of the efficiency of the epoll event notification
subsystem is:

1) Unlike the current aio poll, it amortizes the cost of interest
registration/deregistration across multiple events for a given connection.

2) It declares multithreaded use out of scope, making optimizations that
are only appropriate for use by single threaded callers.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/