Re: linux select() bug hit

Andrew Morton (akpm@zip.com.au)
Fri, 25 Jan 2002 23:17:24 -0800


"A. Castro" wrote:
>
> Please CC'ed any answers/questions. I'm not on the mailing list.
>
> Greetings,
>
> Reason for posting/sending this email.
>
> 1. the actual message:
> pppoe[1857]: Linux select bug hit! This message is harmless, but please
> ask the Linux kernel developers to fix it.
>

hmm. Source is at http://www.roaringpenguin.com/pppoe/rp-pppoe-3.3.tar.gz

They have this:

/* There is a bug in Linux's select which returns a descriptor
* as readable if N_HDLC line discipline is on, even if
* it isn't really readable. This return happens only when
* select() times out. To avoid blocking forever in read(),
* make descriptor 0 non-blocking */
flags = fcntl(0, F_GETFL);
if (flags < 0) fatalSys("fcntl(F_GETFL)");
if (fcntl(0, F_SETFL, (long) flags | O_NONBLOCK) < 0) {
fatalSys("fcntl(F_SETFL)");
}

and later this:

syncReadFromPPP(PPPoEConnection *conn, PPPoEPacket *packet)
{
int r;
#ifndef HAVE_N_HDLC
struct iovec vec[2];
unsigned char dummy[2];
vec[0].iov_base = (void *) dummy;
vec[0].iov_len = 2;
vec[1].iov_base = (void *) packet->payload;
vec[1].iov_len = ETH_DATA_LEN - PPPOE_OVERHEAD;

/* Use scatter-read to throw away the PPP frame address bytes */
r = readv(0, vec, 2);
#else
/* Bloody hell... readv doesn't work with N_HDLC line discipline... GRR! */
unsigned char buf[ETH_DATA_LEN - PPPOE_OVERHEAD + 2];
r = read(0, buf, ETH_DATA_LEN - PPPOE_OVERHEAD + 2);
if (r >= 2) {
memcpy(packet->payload, buf+2, r-2);
}
#endif
if (r < 0) {
/* Catch the Linux "select" bug */
if (errno == EAGAIN) {
rp_fatal("Linux select bug hit! This message is harmless, but please ask the Linux kernel developers to fix it.");
}
fatalSys("read (syncReadFromPPP)");
}

and

struct timeval *tvp = NULL;
...
for (;;) {
if (optInactivityTimeout > 0) {
tv.tv_sec = optInactivityTimeout;
tv.tv_usec = 0;
tvp = &tv;
}
FD_ZERO(&readable);
FD_SET(0, &readable); /* ppp packets come from stdin */
if (conn->discoverySocket >= 0) {
FD_SET(conn->discoverySocket, &readable);
}
FD_SET(conn->sessionSocket, &readable);
while(1) {
r = select(maxFD, &readable, NULL, NULL, tvp);
if (r >= 0 || errno != EINTR) break;
}
...
/* Handle ready sockets */
if (FD_ISSET(0, &readable)) {
if (conn->synchronous) {
syncReadFromPPP(conn, &packet);
} else {
asyncReadFromPPP(conn, &packet);
}
}

So as the comment says, they are claiming that select() is returning
"yes" for an O_NONBLOCK descriptor which has N_HDLC line disc pushed
onto it, if the select times out. So a subsequent read() on that
descriptor returns -1 (EAGAIN).

And from a quick read, the code looks OK. select() says there's
activity on fd 0, but there isn't.

Can any ABI gurus confirm that this is actually a kernel bug?

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/