2.4: NFS client race causes data loss when appending

Xeno (xeno@overture.com)
Wed, 05 Dec 2001 18:00:54 -0800


We've been observing intermittent data loss when appending to files over
NFS due to a race in the NFS client (losing < 4K a couple times a day).
Just one process appending to each file. We're using 2.4.9, but it
looks like the race is present in other 2.4 versions. Happens most
frequently when nfs_file_write calls nfs_revalidate_inode while there
are lots of writebacks pending. Here's the sequence of events we're
seeing.

1. getattr request goes out to get file size. Value will be
stale compared to inode->i_size, since writes are happening.
2. All writebacks for the inode complete.
3. getattr response returns with stale file size value.
4. __nfs_refresh_inode checks writebacks, finds none,
overwrites inode->i_size.
5. generic_file_write resets file position (O_APPEND) with
stale file size, overwriting previously written data.

Race is theoretically present in 2.2 as well, but the delay-based
flushing in 2.2 rarely flushes all writes away, so there are usually
requests on the writeback list. Only became a problem in 2.4 with
more aggressive flushing clearing out all writebacks.

Here's a patch (2.4.16) that works for us, it eliminates the race in the
most common case. BKL and NFS_REVALIDATING prevent new writes from
coming in while the getattr is happening, so we just need to check for
writebacks before starting the getattr. Not a perfect solution, there's
still a potential for races with direct calls to nfs_refresh_inode that
bypass nfs_revalidate_inode. In practice, we're not triggering those
operations with any frequency.

--- linux/fs/nfs/inode.c Fri Nov 9 14:28:15 2001
+++ linux-nfsappendrace/fs/nfs/inode.c Wed Dec 5 17:12:28 2001
@@ -868,8 +868,9 @@
__nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
{
int status = -ESTALE;
struct nfs_fattr fattr;
+ int writebacks;

dfprintk(PAGECACHE, "NFS: revalidating (%x/%Ld)\n",
inode->i_dev, (long long)NFS_FILEID(inode));

@@ -889,8 +890,9 @@
}
}
NFS_FLAGS(inode) |= NFS_INO_REVALIDATING;

+ writebacks = nfs_have_writebacks(inode);
status = NFS_PROTO(inode)->getattr(inode, &fattr);
if (status) {
dfprintk(PAGECACHE, "nfs_revalidate_inode: (%x/%Ld) getattr failed, error=%d\n",
inode->i_dev, (long long)NFS_FILEID(inode), status);
@@ -900,8 +902,11 @@
remove_inode_hash(inode);
}
goto out;
}
+
+ if ( writebacks && nfs_size_to_loff_t(fattr.size) < inode->i_size )
+ fattr.size = (__u64) inode->i_size;

status = nfs_refresh_inode(inode, &fattr);
if (status) {
dfprintk(PAGECACHE, "nfs_revalidate_inode: (%x/%Ld) refresh failed, error=%d\n",

Please cc on responses, thanks!

Xeno
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/