Re: generic_osync_inode/ext2_fsync_inode still not safe

Stephen C. Tweedie (sct@redhat.com)
Fri, 20 Apr 2001 21:27:27 +0100


Hi,

On Wed, Apr 18, 2001 at 06:45:40AM -0300, Marcelo Tosatti wrote:

> As far as I can see, you cannot guarantee that an inode which is unlocked
> _and_ clean (accordingly to the inode->i_state) is safely on disk.
>
> The reason for that are calls to sync_one() which write the inode
> asynchronously:
>
> sync_one(struct inode *inode, int sync) {
> ...
> /* Don't write the inode if only I_DIRTY_PAGES was set */
> if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC))
> write_inode(inode, sync); <-------------
...
> inode->i_state &= ~I_LOCK;

Right --- nasty. But not _too_ nasty, there's a moderately easy way
of dealing with this.

Basically we can't trust the i_state flag for this purpose if we are
allowing async IO to happen on inodes without having proper IO
completion callbacks marking the inodes as unlocked once they are firm
on disk. However, in this case the filesystem itself will know which
underlying buffer_head contains the inode data, and can check to see
if that buffer is locked and perform a wait if necessary.

This is somewhat unpleasant in that it may sometimes cause unnecessary
false sharing, given that we have multiple inodes in an inode block.
However, I can't see any simple way around that.

Linus, do you have any preference for how to deal with this? We can
either do it by adding a s_op->wait_inode() function to complement
write_inode(), and have a wait_inode() implementation in block-device
filesystems which does the buffer lookup and wait; or we can push that
whole logic into the filesystems, so that the I_DIRTY check is removed
from the VFS mid-layer altogether and the filesystem is responsible
for testing both the inode and buffer locked state when we try to wait
for outstanding inode IO to complete.

The second method is a bit cleaner conceptually but it results in more
code duplication.

Cheers,
Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/