Re: ext3-2.4-0.9.4

Patrick J. LoPresti (patl@curl.com)
30 Jul 2001 10:38:52 -0400


Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> > Chris Mason <mason@suse.com> writes:
> >
> > > Correct, in the current 2.4.x code, it's a quirk. fsync(any object) ==
> > > fsync(all pending metadata, including renames).
> >
> > This does not help. The MTAs are doing fsync() on the temporary file
> > and then using the *subsequent* rename() as the committing operation.
>
> Which is quaint, because as we've pointed out repeatedly to you rename
> is not an atomic operation. Even on a simple BSD or ext2 style fs it can
> be two directory block writes, metadata block writes, a bitmap write
> and a cylinder group write.

But not on a journalling filesystem. I assume that a journal "commit"
is atomic. If it is not, then fsync() on the directory does not solve
the problem either.

Put another way, I am suggesting a mount-time or directory option to
effectively cause rename() and link() to automatically be followed by
an fsync() of the containing directory. (Actually, from this
perspective, maybe you could fix the MTA in user space with LD_PRELOAD
hackery or somesuch. Hm...)
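
To make the suggestion concrete, here is a rough user-space sketch of
"rename() followed by an fsync() of the containing directory". The
names rename_durable() and fsync_parent_dir() are hypothetical, made
up for illustration; they are not part of any MTA or libc. It also
assumes that fsync() on a directory descriptor commits the rename,
which is the Linux behaviour discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fsync() the directory that contains `path`. */
static int fsync_parent_dir(const char *path)
{
    char *copy = strdup(path);      /* dirname() may modify its argument */
    if (copy == NULL)
        return -1;

    int fd = open(dirname(copy), O_RDONLY | O_DIRECTORY);
    int saved = errno;
    free(copy);
    if (fd < 0) {
        errno = saved;
        return -1;
    }

    int rc = fsync(fd);             /* assumed to flush the directory entry */
    saved = errno;
    close(fd);
    errno = saved;
    return rc;
}

/* Hypothetical wrapper: rename(), then sync both affected directories. */
int rename_durable(const char *oldpath, const char *newpath)
{
    assert(oldpath != NULL && newpath != NULL);

    if (rename(oldpath, newpath) != 0)
        return -1;
    if (fsync_parent_dir(newpath) != 0)
        return -1;
    /* Redundant but harmless when both paths share one directory. */
    return fsync_parent_dir(oldpath);
}
```

An LD_PRELOAD shim would presumably wrap the real rename() and link()
in much the same way, but I have not tried that.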

> > It would be nice to have an option (on either the directory or the
> > mountpoint) to cause all metadata updates to commit to the journal
> > without causing all operations to be fully synchronous. This would
>
> You mean fsync() on the directory.

In other words, "Get the MTA authors to change their code." That is a
nice little war, but it is fought at the expense of users who just
want to use the code provided by their vendor and have it work.

The situation is this:

The relevant standards (POSIX, SUS, etc.) provide no way to perform
reliable transactions on a file system.

BSD provides one solution, which is synchronous metadata. (I am
assuming modern BSDs already deal with the multiple-disk-block
problem to make these transactions properly atomic. Is this
assumption false?)

Linux provides a different solution, which is fsync() on the
directory.

All MTAs, and other apps besides, currently use the BSD solution for
reliable transactions.
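
To illustrate the difference, here is a minimal sketch of what an
MTA-style delivery might look like if it followed the Linux approach:
write a temporary file, fsync() it, rename() it into place, then
fsync() the directory. Under the BSD semantics the final fsync() on
the directory would be unnecessary. The function deliver() and its
parameters are hypothetical, invented for this example; no real MTA
is being quoted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical delivery routine: returns 0 only once the data and the
 * rename have both been handed to fsync(). */
int deliver(const char *dir, const char *tmpname, const char *finalname,
            const char *data, size_t len)
{
    assert(dir && tmpname && finalname && data);

    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    int rc = -1;
    int fd = openat(dfd, tmpname, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        /* Step 1: get the file contents onto stable storage. */
        int ok = write_all(fd, data, len) == 0 && fsync(fd) == 0;
        if (close(fd) != 0)
            ok = 0;

        /* Step 2: the rename is the commit; step 3: sync the directory. */
        if (ok && renameat(dfd, tmpname, dfd, finalname) == 0 &&
            fsync(dfd) == 0)
            rc = 0;
        else
            unlinkat(dfd, tmpname, 0);  /* fails harmlessly if already renamed */
    }
    close(dfd);
    return rc;
}
```

A caller would treat the message as accepted only when deliver()
returns 0, e.g. deliver("/var/spool/example", "msg.tmp", "msg", buf, n),
where the spool path is likewise just a placeholder.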

Is it really so absurd to ask Linux to provide efficient support of
the BSD semantics as an option?

- Pat