nfs client oops, all 2.4 kernels

Trond Myklebust (trond.myklebust@fys.uio.no)
Tue, 11 Sep 2001 10:45:27 +0200


>>>>> " " == Erik DeBill <erik@www.creditminders.com> writes:

> I've been running into a repeatable oops in the NFS client
> code, apparently related to file locking.

This is the same Oops as was sent in by Olivier Molteni. It's due to
an incorrect assumption in the locking code. To repeat:

2 processes are calling close(), which under POSIX, tries to free all
locks held by a program. The problem occurs when you `dup()' a file.

In that case, both processes can call close() at the same time, thus
triggering a call to filp_close().

The race boils down to:

- locks_unlock_delete() assumes that the BKL (kernel_lock()) is
sufficient to protect against *thisfl_p from disappearing
beneath it due to some second process.
BUT
- The call to lock() in locks_unlock_delete() sleeps when the
underlying filesystem is NFS, hence 2 processes can race despite
the BKL assumption.

IMHO the correct thing to do is to detach the lock from the global
linked list *before* one makes the call to f_op->lock() down to the
NFS code.

This would probably entail splitting locks_delete_lock() into 2
procedures: one that does the unlinking from the global list:

*thisfl_p = fl->fl_next;
fl->fl_next = NULL;

list_del(&fl->fl_link);
INIT_LIST_HEAD(&fl->fl_link);

the other that does the rest.

Could you check if the appended patch works?

Cheers,
Trond

--- linux-2.4.9/fs/locks.c.orig Thu Jul 5 00:39:28 2001
+++ linux-2.4.9/fs/locks.c Tue Sep 11 10:41:53 2001
@@ -473,21 +473,26 @@
fl->fl_insert(fl);
}

-/* Delete a lock and then free it.
- * Remove our lock from the lock lists, wake up processes that are blocked
- * waiting for this lock, notify the FS that the lock has been cleared and
- * finally free the lock.
+/*
+ * Remove lock from the lock lists
*/
-static void locks_delete_lock(struct file_lock **thisfl_p, unsigned int wait)
+static inline void _unhash_lock(struct file_lock **thisfl_p)
{
struct file_lock *fl = *thisfl_p;

*thisfl_p = fl->fl_next;
fl->fl_next = NULL;

- list_del(&fl->fl_link);
- INIT_LIST_HEAD(&fl->fl_link);
+ list_del_init(&fl->fl_link);
+}

+/*
+ * Wake up processes that are blocked waiting for this lock,
+ * notify the FS that the lock has been cleared and
+ * finally free the lock.
+ */
+static inline void _delete_lock(struct file_lock *fl, unsigned int wait)
+{
fasync_helper(0, fl->fl_file, 0, &fl->fl_fasync);
if (fl->fl_fasync != NULL){
printk(KERN_ERR "locks_delete_lock: fasync == %p\n", fl->fl_fasync);
@@ -502,6 +507,15 @@
}

/*
+ * Delete a lock and then free it.
+ */
+static void locks_delete_lock(struct file_lock **thisfl_p, unsigned int wait)
+{
+ _unhash_lock(thisfl_p);
+ _delete_lock(*thisfl_p, wait);
+}
+
+/*
* Call back client filesystem in order to get it to unregister a lock,
* then delete lock. Essentially useful only in locks_remove_*().
* Note: this must be called with the semaphore already held!
@@ -511,12 +525,13 @@
struct file_lock *fl = *thisfl_p;
int (*lock)(struct file *, int, struct file_lock *);

+ _unhash_lock(thisfl_p);
if (fl->fl_file->f_op &&
(lock = fl->fl_file->f_op->lock) != NULL) {
fl->fl_type = F_UNLCK;
lock(fl->fl_file, F_SETLK, fl);
}
- locks_delete_lock(thisfl_p, 0);
+ _delete_lock(fl, 0);
}

/* Determine if lock sys_fl blocks lock caller_fl. Common functionality
@@ -1661,6 +1676,7 @@
while ((fl = *before) != NULL) {
if ((fl->fl_flags & FL_POSIX) && fl->fl_owner == owner) {
locks_unlock_delete(before);
+ before = &inode->i_flock;
continue;
}
before = &fl->fl_next;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/