Neil Brown answered a query from Martin Pool about this on the NFS list
back in July. You should probably contact Martin.
> From: Neil Brown <firstname.lastname@example.org>
> To: Martin Pool <email@example.com>
> Message-ID: <firstname.lastname@example.org>
> Cc: email@example.com, firstname.lastname@example.org
> Subject: Re: [NFS] NFSv3/tcp -- where to begin?
> In-Reply-To: message from Martin Pool on Wednesday July 25
> References: <20010725205307.B1435@wistful.humbug.org.au>
> List-Archive: <http://lists.sourceforge.net/archives//nfs/>
> Date: Wed, 25 Jul 2001 22:21:37 +1000 (EST)
On Wednesday July 25, email@example.com wrote:
> Can anybody give me some idea of what in particular is broken with NFS
> over TCP in knfsd? I'd like to try and fix it.
> I can start by just removing the #if 0 and seeing what breaks, but if
> some kind soul would point me in the right direction that would be
I think that it is all corner cases now. I have run the SPEC SFS
benchmark against knfsd using tcp and got it to complete, so it sort
Issues that I can think of include:
- Guard against denial of service - impose some limit on the number of
incoming connections, and start randomly dropping connections when
this limit is exceeded.
- cope with fragmented rpc packets - or prove that they don't exist.
RPC over TCP consists of a number of frames, each with a 4 byte
header. The bottom 31 bits are the frame size. The top bit
indicates whether this is a terminal fragment.
A sequence of non-terminal fragments followed by a terminal
fragment makes one RPC packet. The code currently rejects any
non-terminal fragment and (I think) closes the connection.
See comment in net/sunrpc/svcsock.c:svc_tcp_recvfrom
Many clients never send non-terminal fragments, but the spec says
they are allowed so....
- Fix svc_tcp_sendto.
If there is insufficient room in the socket buffers, the write will
block (I think) and a dead client could tie up a tcp thread for a
long time. Alternately, the write might not block (I cannot
remember) and some data will simply never be sent, which will
confuse the client.
There have been various suggestions for fixing this, like having a
single thread given the responsibility of blocking, and
disassociating the svc_rqst structure from the threads (currently
there is one request structure per thread).
Ultimately, you need to decide when you are going to say "I cannot
deliver this reply", and then whether you will just drop the packet,
or close the connection.
You need to decide the maximum amount of buffers that you will
allocate, and under what circumstances you will wait for space to be
available in the queue.
Maybe if there is insufficient space to write the whole reply then:
  if there are 10% idle threads, block,
  else close the connection.
Also, you might want to throttle incoming requests when memory gets
tight. E.g. if any thread is blocking on writing to a tcp
connection, don't accept any more requests on that connection.
- guard against ridiculously large incoming packets. If a header
arrives saying there are 10 million bytes to come, the code will
currently wait for them. It should reject any packet which claims
to be larger than RPCSVC_MAXPAYLOAD.
There is also a "FIXME" that points out that data is left on the
incoming queue until a full frame has arrived. If this is bigger
than the TCP window size, it will never arrive.
Now I think that RPCSVC_MAXPAYLOAD is smaller than the default
window size, so the above fix should resolve this, but it should be checked.
- address every "FIXME" in net/sunrpc/svcsock.c
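
As an illustration of the framing and size-limit points in the list above,
here is a minimal user-space sketch in C. It is not code from knfsd.
The names rm_decode, struct rpc_fragment and SKETCH_MAXPAYLOAD are
hypothetical, and the 64 KB limit is an assumed stand-in for whatever
RPCSVC_MAXPAYLOAD is in the kernel tree. The header is read in network
byte order, as the RPC record-marking standard specifies.

```c
#include <stdint.h>

/* Assumed limit for illustration only; not the kernel's RPCSVC_MAXPAYLOAD. */
#define SKETCH_MAXPAYLOAD (64u * 1024u)

#define RM_LAST_FRAGMENT 0x80000000u  /* top bit: terminal fragment */
#define RM_LENGTH_MASK   0x7fffffffu  /* bottom 31 bits: fragment size */

/* Hypothetical decoded form of one 4-byte fragment header. */
struct rpc_fragment {
    uint32_t length;  /* bytes of payload that follow the header */
    int      last;    /* non-zero if this fragment ends the RPC packet */
};

enum rm_status { RM_OK = 0, RM_TOO_LARGE = -1 };

/*
 * Decode a fragment header. already_buffered is the number of payload
 * bytes collected so far from earlier non-terminal fragments of the same
 * packet. The header is rejected if the fragment alone, or the packet
 * assembled so far plus this fragment, would exceed the limit.
 */
static enum rm_status rm_decode(const unsigned char hdr[4],
                                uint32_t already_buffered,
                                struct rpc_fragment *out)
{
    uint32_t word = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
                    ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];

    out->length = word & RM_LENGTH_MASK;
    out->last   = (word & RM_LAST_FRAGMENT) != 0;

    /* The first test guarantees the subtraction below cannot wrap. */
    if (out->length > SKETCH_MAXPAYLOAD ||
        already_buffered > SKETCH_MAXPAYLOAD - out->length)
        return RM_TOO_LARGE;

    return RM_OK;
}
```

A receiver built this way would keep accumulating fragments until one
arrives with last set, and would drop the packet or close the connection
as soon as rm_decode returns RM_TOO_LARGE, rather than waiting for
10 million bytes to arrive.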
That should be enough to get you started :-)
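
The reply-side policy suggested for svc_tcp_sendto above ("if there
are 10% idle threads, block, else close the connection") can be
written down as a small decision function. This is only a sketch of
that one suggestion, with "10%" read as "at least 10%". The names
reply_policy, enum reply_action, struct conn_state and conn_may_accept
are hypothetical and do not exist in the kernel.

```c
#include <stddef.h>

/* Hypothetical outcomes when a reply is ready to be written. */
enum reply_action { REPLY_SEND, REPLY_BLOCK, REPLY_CLOSE };

/*
 * Decide what to do with a reply of reply_len bytes when the socket
 * send buffer has sndbuf_free bytes available.
 */
static enum reply_action reply_policy(size_t reply_len, size_t sndbuf_free,
                                      unsigned idle_threads,
                                      unsigned total_threads)
{
    if (reply_len <= sndbuf_free)
        return REPLY_SEND;              /* whole reply fits, just send it */

    /* At least 10% of threads idle: we can afford to tie one up. */
    if (total_threads > 0 &&
        (unsigned long)idle_threads * 10 >= total_threads)
        return REPLY_BLOCK;

    return REPLY_CLOSE;                 /* too busy to wait on this client */
}

/* Hypothetical per-connection state for throttling incoming requests. */
struct conn_state {
    unsigned blocked_writers;  /* threads currently blocked writing here */
};

/* Do not accept more requests on a connection with a blocked writer. */
static int conn_may_accept(const struct conn_state *c)
{
    return c->blocked_writers == 0;
}
```

With 8 threads, one idle thread is enough to block; with 20 threads,
two are needed. Whether closing the connection is better than
dropping the reply is exactly the decision left open above.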
It pretty much all fits the category of avoiding denial of service,
either deliberate or accidental. Ask yourself "How can an obnoxious client
behave in a way that we don't expect and hence confuse or disable the