2.4: NFS client kmapping across I/O

Xeno (xeno@overture.com)
Mon, 28 Jan 2002 18:25:08 -0800


Trond, thanks for the excellent fattr race fix. I'm sorry I haven't
been able to give feedback until now, things got busy for a while. I
have not yet had a chance to run your fixes, but after studying them I
believe that they will resolve the race nicely, especially with the use
of nfs_inode_lock in the recent NFS_ALL experimental patches. FWIW.

Now I also have time to mention the other NFS client issue we ran into
recently, I have not found mention of it on the mailing lists. The NFS
client is kmapping pages for the duration of reads from and writes to
the server. This creates a scaling limitation, especially under
CONFIG_HIGHMEM64G and i386 where there are only 512 entries in the
highmem kmap table. Under I/O load, it is easy to fill up the table,
hanging all processes that need to map highmem pages a substantial
fraction of the time.

Before 2.4.15, it is particularly bad, nfs_flushd locks up the kernel
under I/O load. My testcase was to copy 12 100M files from one NFS
server to another, it was very reliable at producing the lockup right
away. nfs_flushd fills up the kmap table as it sends out requests, then
blocks waiting for another kmap entry to free up. But it is running in
rpciod, so rpciod is blocked and cannot process any responses to the
requests. No kmap entries are ever freed once the table fills up. In
this state, the machine pings and responds to SysRq on the serial port,
but just about everything else hangs.

It looks like nfs_flushd was turned off in 2.4.15, that is the
workaround I have applied to our machines. I have also limited the
number of requests across all NFS servers to LAST_PKMAP-64, to leave
some kmap entries available for non-NFS use. It is not an ideal
workaround, though, it artificially limits I/O to multiple servers.
I've thought about bumping up LAST_PKMAP to increase the size of the
highmem kmap table, but the table looks like it was designed to be
small.

I've also thought about pushing the kmaps and kunmaps down into the RPC
layer, so the pages are only mapped while data is copied to or from
them, not while waiting for the network. That would be more work, but
it looks doable, so I wanted to run the problem and the approach by you
knowledgeable folks while I'm waiting for hardware to free up for kernel
hacking.

Thanks,
Xeno
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/