Re: status of nfs and tcp with 2.4

Bill Rugolsky Jr. (rugolsky@ead.dsa.com)
Thu, 27 Sep 2001 13:27:21 -0400


On Thu, Sep 27, 2001 at 01:10:30PM -0400, James D Strandboge wrote:
> On Thu, Sep 27, 2001 at 05:32:09PM +0200 or thereabouts, Trond Myklebust wrote:
> > None: AFAIK nobody has yet written any code that works for the server.
>
> In your opinion, how involved would it be to write the tcp code since
> the udp is already written? I haven't actually looked into it much,
> and thought you might have some ideas, or perhaps pointers.

Neil Brown answered a query from Martin Pool about this on the NFS list
back in July. You should probably contact Martin.

Regards,

Bill Rugolsky

> From: Neil Brown <neilb@cse.unsw.edu.au>
> To: Martin Pool <mbp@valinux.com>
> Message-ID: <15198.47569.868029.592501@notabene.cse.unsw.edu.au>
> Cc: nfs@lists.sourceforge.net, tpot@valinux.com
> Subject: Re: [NFS] NFSv3/tcp -- where to begin?
> In-Reply-To: message from Martin Pool on Wednesday July 25
> References: <20010725205307.B1435@wistful.humbug.org.au>
> List-Archive: <http://lists.sourceforge.net/archives//nfs/>
> Date: Wed, 25 Jul 2001 22:21:37 +1000 (EST)

On Wednesday July 25, mbp@valinux.com wrote:
> Can anybody give me some idea of what in particular is broken with NFS
> over TCP in knfsd? I'd like to try and fix it.
>
> I can start by just removing the #if 0 and seeing what breaks, but if
> some kind soul would point me in the right direction that would be
> great...

I think that is it all corner cases now. I have run the SPEC SFS
benchmark against knfsd using tcp and got it to complete, so it sort
of works.

Issues that I can think of include:

- Guard against denial of service - impose some limit on the number of
incoming connections, and start randomly dropping connections when
this limit is exceeded.
- cope with fragmented rpc packets - or prove that they don't exist.
RPC over TCP consists of a number of frames, each with a 4 byte
header. The bottom 31 bits are the frame size. The top bit
indicates whether this is a terminal fragment.
A sequence of non-terminal fragements followed by a terminal
fragment make one RPC packet. The code current rejects any
non-terminal fragment and (I think) closes the connections.
See comment in net/sunrpc/svcsock.c:svc_tcp_recvfrom
Many clients never send non-terminal fragments, but the spec says
they are allowed so....
- Fix svc_tcp_sendto.
If there is insufficient room in the socket buffers, the write will
block (I think) and a dead client could tie up a tcp thread for a
long time. Alternately, the write might not block (I cannot
remember) and some data will simple never be sent, which will
confuse the client.
There have been various suggestions for fixing this, like having a
single thread given the responsibility of blocking, and
disassociating the svc_rqst structure from the threads (currently
there is one request structure per thread).
Ultimately, you need to decide when you are going to say "I cannot
deliver this reply", and then whether you will just drop the packet,
or close the connection.
You need to decide the maximum amount of buffers that you will
allocate, and under what circumstances you will wait for space to be
available in the queue.
Maybe if there is insufficient spare to write the whole replay then:
if there a 10% idle threads, block,
else close the connection.

Also, you might want to throttle incoming requests when memory gets
tight. E.g. if any thread is blocking on writing to a tcp
connection, don't accept any more requests on that connection.

- guard against ridiculously large incoming packets. If a header
arrives saying there are 10 million bytes to come, the code will
currently wait for them. If should reject any packets which claims
to be larged than RPCSVC_MAXPAYLOAD.
There is also a "FIXME" that points out that data is left on the
incoming queue until a full frame has arrived. If this is bigger
than the TCP window size, it will never arrive.
Now I think that RPCSVC_MAXPAYLOAD is smaller than the default
window size, so the above fix should resolve this, but it should be checked.

- address every "FIXME" in net/sunrpc/svcsock.c

That should be enough to get you started :-)
It pretty much all fits the category of avoiding denial of service,
either deliberate or accidental. Ask yourself "How can an obnoxious client
behave in a way that we don't expect and hence confuse or disable the
server."

NeilBrown

_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
http://lists.sourceforge.net/lists/listinfo/nfs

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/