Re: ext3-2.4-0.9.4

Patrick J. LoPresti (patl@cag.lcs.mit.edu)
03 Aug 2001 23:50:23 -0400


Matthias Andree <matthias.andree@stud.uni-dortmund.de> writes:

> On Fri, 03 Aug 2001, Patrick J. LoPresti wrote:
>
> > To fill in more of the table, Qmail does:
> >
> > fd = open(tmp)
> > write(fd)
> > fsync(fd)
> > link(tmp,final)
> > close(fd)
>
> http://cr.yp.to/qmail/faq/reliability.html

...which is consistent. Qmail is assuming that the link() is
synchronous, as it was back in the "Good Old Days" of stock FFS.

> > ...and Postfix does:
> >
> > fd = open(final)
> > write(fd)
> > (should be an "fsync(fd)" here, but I cannot find it)
> > fchmod(fd,+execute)
> > fsync(fd)
> > close(fd)
>
> > Postfix apparently uses the execute bit to indicate that delivery is
> > complete. I am probably misreading the source (version 20010228
> > Patchlevel 3), but I do not see any fsync() between the write and the
> > fchmod. Surely it is there or this delivery scheme is not reliable on
> > any system, since without an intervening fsync() the writes to the
> > data and the permissions can happen out of order.
>
> Not really. If fsync() or close() fails, the error code is propagated
> back to the caller, which then decides what to do. smtpd.c nukes the
> file.

That is not the problem. The problem is that the system could start
flushing blocks to disk after the call to fchmod and before the call
to fsync. If so, the system could write the mode bits first and then
crash before writing the data, leaving the execute bit set on the file
but without valid data within. This could result in a corrupted mail
message.

To avoid this, Postfix *must* do fsync() or fdatasync() after the
write() and before the fchmod()+fsync(); that will ensure that the
execute bit implies valid ("committed") data in the file. I was
unable to find any such call to fsync() or fdatasync(), but as I
mentioned, I am probably simply misreading the code.
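
The ordering described above can be sketched as follows. This is an
illustration only, not code from Postfix; the function name
deliver_with_exec_bit and the mode values are assumptions:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write a message, then set the execute bit only
 * after the data has been committed. */
int deliver_with_exec_bit(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0 && errno == EINTR)
            continue;                /* interrupted, retry */
        if (n <= 0)
            goto fail;
        p += n;
        left -= (size_t)n;
    }

    if (fsync(fd) < 0)               /* 1. commit the data first */
        goto fail;
    if (fchmod(fd, 0700) < 0)        /* 2. only now mark it complete */
        goto fail;
    if (fsync(fd) < 0)               /* 3. commit the mode change */
        goto fail;
    if (close(fd) < 0) {
        unlink(path);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(path);                    /* never leave a half-delivered file */
    return -1;
}
```

Without step 1, a crash between steps 2 and 3 can leave the execute
bit on disk while the data blocks are still missing.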

> > Anyway, it is certainly true that it is largely useless to have
> > fsync() commit only one path to a file; many applications expect to be
> > able to force a simple link(x,y) to be committed to disk.
>
> BSD FFS + softupdates sync all file names, traversing from the mount
> point down to the actual directory entries that need to be synched.

...and the Linux developers continue to insist that this is "stupid".
Ah, the joys of gaps in standards.

> It looks so to me. After the MTA behaviour has been dug up, the
> dirsync option could be even weaker if fsync() behaved like FFS +
> softupdates: sync the directory entries, including those of link and
> rename, as well.

Ideally, this would be an option you could set per-application (as
opposed to per-directory or per-mountpoint), because we are really
talking about allowing Linux to support applications written for BSD
file system semantics. It is not obvious to me what the best
implementation for that would be, though. Maybe just a compile-time
option to choose the appropriate open/link/rename/etc. operations.

- Pat