[RFC] NFS client changes - page kmap() in the RPC layer.

Trond Myklebust (trond.myklebust@fys.uio.no)
Sun, 5 May 2002 15:29:13 +0200


Hi,

One of the reported problems still to be fixed in the client RPC
layer is that, on a HIGHMEM system, we can deadlock in kmap().

The problem is that there is a limited pool of addresses available to
the kmap() operation. The current design is based on the premise that
no single process will 'hold' more than one kmapped page for any
extended period of time.
The NFS layer breaks this premise, since each read/write kmap()s the
pages at the beginning of the RPC call, and only kunmap()s at the
end. If, say, a server is down for an extended period, so that all
RPC calls loop and retry, then we might end up exhausting the entire
resource pool.
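
To make the failure mode concrete, here is a small userspace sketch of
the exhaustion scenario. It is not kernel code: POOL_SLOTS, pool_map(),
pool_unmap() and start_stalled_calls() are hypothetical names invented
for illustration, and the pool size of 4 is arbitrary. Where the sketch
reports failure, the real kmap() would put the caller to sleep.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SLOTS 4u	/* arbitrary size, for illustration only */

static unsigned int slots_in_use;

/* Take one mapping slot; returns false where the real kmap() would sleep. */
static bool pool_map(void)
{
	if (slots_in_use == POOL_SLOTS)
		return false;
	slots_in_use++;
	return true;
}

/* Give one mapping slot back. */
static void pool_unmap(void)
{
	assert(slots_in_use > 0);
	slots_in_use--;
}

/*
 * Start up to 'calls' RPC calls against a server that never replies.
 * Each call maps all of its pages up front and keeps them mapped.
 * Returns how many calls obtained every mapping they asked for.
 */
static unsigned int start_stalled_calls(unsigned int calls,
					unsigned int pages_per_call)
{
	unsigned int started = 0;

	for (unsigned int c = 0; c < calls; c++) {
		unsigned int got = 0;

		while (got < pages_per_call && pool_map())
			got++;
		if (got < pages_per_call)
			break;	/* this caller is now stuck waiting for a slot */
		started++;
	}
	return started;
}
```

With these numbers, two stalled calls of two pages each are enough to
leave nothing for anybody else until one of them completes.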

The solution is to move the kmapping operation into the RPC
socket layer. There are only 2 places where we need kmap:

- When calling sock_sendmsg()
- When copying the reply from the server into pages.

Of course, in order to do so, we need to teach the RPC layer to deal
with pages, rather than page kmap() addresses.
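
As a rough illustration of the second case (copying a reply into
pages), the following userspace sketch maps each page only for the
duration of its own copy. It is not the code from the patches:
struct page here is a stand-in that simply owns its storage, and
sketch_kmap(), sketch_kunmap() and copy_to_pages() are hypothetical
names. The counters exist only to show that at most one mapping is
held at any time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Userspace stand-in for struct page: the "page" owns its storage. */
struct page {
	unsigned char data[PAGE_SIZE];
};

/* Hypothetical stand-ins for kmap()/kunmap() that count live mappings. */
static unsigned int maps_held, maps_peak;

static void *sketch_kmap(struct page *p)
{
	if (++maps_held > maps_peak)
		maps_peak = maps_held;
	return p->data;
}

static void sketch_kunmap(struct page *p)
{
	(void)p;
	assert(maps_held > 0);
	maps_held--;
}

/*
 * Copy 'len' bytes from 'src' into the page array, starting
 * 'page_base' bytes into the array. Each page is mapped just before
 * its chunk is copied and unmapped immediately afterwards.
 * The caller must supply enough pages to hold page_base + len bytes.
 */
static size_t copy_to_pages(struct page **pages, unsigned int page_base,
			    const unsigned char *src, size_t len)
{
	struct page **pp = pages + page_base / PAGE_SIZE;
	unsigned int off = page_base % PAGE_SIZE;
	size_t copied = 0;

	while (copied < len) {
		size_t chunk = PAGE_SIZE - off;
		unsigned char *vaddr;

		if (chunk > len - copied)
			chunk = len - copied;

		vaddr = sketch_kmap(*pp);
		memcpy(vaddr + off, src + copied, chunk);
		sketch_kunmap(*pp);

		copied += chunk;
		off = 0;	/* only the first page starts at an offset */
		pp++;
	}
	return copied;
}
```

The point of the design is that the number of mappings held no longer
depends on how long the server takes to reply.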

The 2 patches in

http://www.fys.uio.no/~trondmy/src/2.4.19-pre8/alpha

do precisely that. The patch linux-2.4.19-fix_kmap.dif introduces a
new 'RPC buffer' type defined as follows:

	struct xdr_buf {
		struct iovec	head[1],	/* RPC header + non-page data */
				tail[1];	/* Appended after page data */

		struct page **	pages;		/* Array of contiguous pages */
		unsigned int	page_base,	/* Start of page data */
				page_len;	/* Length of page data */

		unsigned int	len;		/* Total length of data */
	};

and applies it to the NFS write code.
The patch linux-2.4.19-fix_kmap2.dif changes the various NFS 'read'
operations (read, readdir, readlink) to use this structure.
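
For readers who want to see how the fields fit together, here is a
userspace sketch that walks such a buffer and gathers it into one flat
array. The head, pages, tail ordering is inferred from the comments in
the structure; the struct page stand-in and sketch_xdr_flatten() are
hypothetical and do not appear in the patches. In the kernel, each page
would need to be kmap()ed around its memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>	/* struct iovec */

#define PAGE_SIZE 4096u

/* Userspace stand-in for struct page: the "page" owns its storage. */
struct page {
	unsigned char data[PAGE_SIZE];
};

struct xdr_buf {
	struct iovec	head[1],	/* RPC header + non-page data */
			tail[1];	/* Appended after page data */
	struct page **	pages;		/* Array of contiguous pages */
	unsigned int	page_base,	/* Start of page data */
			page_len;	/* Length of page data */
	unsigned int	len;		/* Total length of data */
};

/*
 * Gather head, page data and tail into 'out', in that order.
 * Returns the number of bytes written, or 0 if 'out' is too small or
 * 'len' does not equal the sum of the three parts.
 */
static size_t sketch_xdr_flatten(const struct xdr_buf *buf,
				 unsigned char *out, size_t outlen)
{
	size_t head_len = buf->head[0].iov_len;
	size_t tail_len = buf->tail[0].iov_len;
	size_t pos = 0;

	if (buf->len > outlen ||
	    head_len + buf->page_len + tail_len != buf->len)
		return 0;

	if (head_len > 0)
		memcpy(out, buf->head[0].iov_base, head_len);
	pos += head_len;

	if (buf->page_len > 0) {
		struct page **pp = buf->pages + buf->page_base / PAGE_SIZE;
		unsigned int off = buf->page_base % PAGE_SIZE;
		unsigned int left = buf->page_len;

		while (left > 0) {
			unsigned int chunk = PAGE_SIZE - off;

			if (chunk > left)
				chunk = left;
			memcpy(out + pos, (*pp)->data + off, chunk);
			pos += chunk;
			left -= chunk;
			off = 0;	/* later pages start at offset 0 */
			pp++;
		}
	}

	if (tail_len > 0)
		memcpy(out + pos, buf->tail[0].iov_base, tail_len);
	pos += tail_len;

	assert(pos == buf->len);
	return pos;
}
```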

The patches will apply on top of linux-2.4.19-NFS_ALL.dif, although
you will have to disable the NFS direct I/O code, as I haven't
yet gotten around to fixing that (Chuck?).

Comments please?

Please note: The two patches do not include code to make use of the
TCP_CORK + sendpage() zero copy socket interface, but adding it would
be trivial once the RPC layer uses the above structure.

Cheers,
Trond