Re: synchronous mounts

Stephen C. Tweedie (sct@redhat.com)
Fri, 16 Nov 2001 12:28:55 +0000


Hi,

On Thu, Nov 15, 2001 at 07:19:43PM -0500, Jeff Garzik wrote:

> When working on something likely to crash, I always remount my
> filesystems 'sync' with the intention to have the kernel immediately
> sync to disk anything and everything it is coded to do.

The kernel has, in my memory, never behaved like that on sync mounts.
mount -o sync was always intended just to give people the BSD-style
sync metadata updates that some users expected.

The "mount" man page is wrong on this one.

> Since the
> kernel is responsible for flushing data to disk, it makes perfect sense
> to have an option to sync not only metadata but data to disk
> immediately, if the user desires such.

If you want to sync _everything_, it's at least 5 seeks per write
syscall when you're writing a new file: superblock, group descriptor,
block bitmap, inode, data, and potentially an indirect block too.

There's no point doing all that, especially since some of that data is
redundant and will be rebuilt by e2fsck anyway after a crash.

Is it really such an important feature that we're willing to suffer a
factor-of-100 or more slowdown for it?

> Further, expecting all apps to fsync(2) files under the right
> circumstances is not reasonable. There are "normal" circumstances where
> someone expects non-syncing behavior of "cat foo bar > foobar", and then
> there are extenuating circumstances where another expects the shell to
> sync that command immediately. Should we rewrite cat/bash/apps to all
> fsync, depending on an option? Should we expect people to modify all
> their shell scripts to include "/bin/sync" for those times when they
> want data-sync? Such is not scalable at all.

Not-scalable is doing 5000 seeks to write a 4MB file.
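
To show where a figure of that order comes from, here is a rough
back-of-the-envelope sketch in Python. The 4 KiB write size and the
10 ms average seek time are assumptions for illustration, not
measurements, and sync_write_cost is a hypothetical helper name; the
5 seeks per write is the minimum count given above.

```python
# Rough estimate of the seek cost of fully synchronous writes.
# Assumptions (illustrative, not measured): each write syscall carries
# 4 KiB, and an average disk seek costs about 10 ms.

SEEKS_PER_WRITE = 5          # lower bound quoted above for a new file
WRITE_SIZE = 4 * 1024        # assumed bytes per write syscall
SEEK_TIME_S = 0.010          # assumed average seek time in seconds


def sync_write_cost(file_size, write_size=WRITE_SIZE,
                    seeks_per_write=SEEKS_PER_WRITE,
                    seek_time_s=SEEK_TIME_S):
    """Return (write_calls, total_seeks, seconds_spent_seeking)."""
    write_calls = -(-file_size // write_size)   # ceiling division
    total_seeks = write_calls * seeks_per_write
    return write_calls, total_seeks, total_seeks * seek_time_s


if __name__ == "__main__":
    calls, seeks, seconds = sync_write_cost(4 * 1024 * 1024)
    print(f"{calls} writes, {seeks} seeks, ~{seconds:.0f} s seeking")
```

Under those assumptions a 4MB file costs 1024 write calls and 5120
seeks, which is the round figure of 5000 used above.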

The behaviour you are talking about now, "cat foo bar > foobar" and
expecting it to be intact on return, is *not the same thing*. The
sync mount option is there to order metadata writes for predictable
recovery of the directory structure. In the "cat" case, nobody cares
what the inode is like during the write. All that is desired in that
example is fsync-on-close, and it is insane to implement
fsync-on-close by writing every single block of the file
synchronously.
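
As a minimal sketch of what fsync-on-close means from the
application side, assuming a POSIX system: the blocks are written
with ordinary buffered writes and flushed once, just before close.
write_then_fsync is a hypothetical helper name used only for
illustration, not an existing API or a proposed kernel interface.

```python
import os


def write_then_fsync(path, chunks):
    """Write chunks with ordinary buffered writes, then fsync once before close."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while len(view) > 0:
                # os.write may write fewer bytes than requested.
                written = os.write(fd, view)
                view = view[written:]
        # One flush for the whole file, instead of one per block.
        os.fsync(fd)
    finally:
        os.close(fd)
```

The point of the sketch is the shape of the I/O: many cheap writes
followed by a single flush, rather than a synchronous write for every
block.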

At ALS, an ext3 user asked why ext3 performance was entirely unusable
under mount -o sync (he had a broken config which accidentally set an
ext3 mount synchronous), whereas ext2 was OK. I only realised
afterwards that this was because of ext3's ordered data writes:
whereas ext2 was just syncing the inodes and indirect blocks on write,
ext3 was syncing the data too as part of the ordered data guarantees,
and performance was totally destroyed by the extra seeks.

"sync to keep the fs structures intact" and "sync to keep this file
intact" are two totally different things. In the latter case, we only
care about the file contents as a whole, so fsync-on-close is far more
appropriate. If we want that, let's add it as a new option, but I
don't see the benefit in making -o sync do all file data writes 100%
synchronously.

Cheers,
Stephen