Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks

Lawrence Greenfield (leg+@andrew.cmu.edu)
Mon, 15 Jul 2002 21:02:11 -0400


From: "Patrick J. LoPresti" <patl@curl.com>
Date: 15 Jul 2002 17:31:07 -0400
[...]
> I really wish MTA authors would just support Linux's "fsync the
> directory" approach. It is simple, reliable, and fast. Yes, it does
> require Linux-specific support in the application, but that's what
> application authors should expect when there is a gap in the
> standards.

Actually, it's not all that simple (you have to find the enclosing
directories of any files you're modifying, which might require string
manipulation) or necessarily all that fast (you're doubling the number
of system calls and now the application is imposing an ordering on the
filesystem that didn't exist before).

The directory fsync() is only necessary for ext2. Modern Linux
filesystems (such as ext3 or reiserfs) don't require it.

Finally: ext2 isn't safe even if you do call fsync() on the directory!

Let's consider: some filesystem operation modifies two different
blocks, A and B. This operation is safe across a crash only if block A
reaches the disk before block B.

. FFS guarantees this by performing the writes synchronously: block A
is written when it is changed, followed by block B when it is changed.

. Journalling filesystems (ext3, reiserfs) guarantee this by
journalling the operation and forcing that journal entry to disk
before either A or B can be modified.

. What does ext2 do (in the default mode)? It modifies A, it modifies
B, and then leaves it up to the buffer cache to write them back---and
the buffer cache might decide to write B before A.
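
The ordering argument above can be checked with a toy model. This is
only a sketch: the "disk" is a set of block names, a crash can happen
after any number of writes, and the safety rule (B on disk without A is
bad) is the one assumed in the example above. None of it models a real
filesystem.

```python
from itertools import permutations

def crash_states(write_order):
    """Blocks on disk if the machine crashes after 0, 1, 2, ... writes."""
    return [frozenset(write_order[:k]) for k in range(len(write_order) + 1)]

def is_safe(on_disk):
    """Unsafe only when B reached the disk but A did not."""
    return not ("B" in on_disk and "A" not in on_disk)

def order_is_safe(write_order):
    """True if every possible crash point leaves a safe on-disk state."""
    return all(is_safe(state) for state in crash_states(write_order))

# Synchronous writes (the FFS case) pin the order to A, then B.
print(order_is_safe(("A", "B")))                        # True

# A buffer cache that may pick either order (the ext2 case).
print([order_is_safe(o) for o in permutations("AB")])   # [True, False]
```

The second line of output is the whole problem: one of the two orders
the buffer cache may choose leaves a window in which a crash is unsafe.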

We're finally getting to some decent shared semantics on
filesystems. Reiserfs, ext3, FFS w/ softupdates, vxfs, etc., all work
with just fsync()ing the file (though an fsync() is required after a
link() or rename() operation). Let's encourage all filesystems to
provide these semantics and make it slightly easier on us stupid
application programmers.
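
For comparison, here is a sketch of what the application does under
those shared semantics: fsync the file, and fsync it again after a
rename(). It is in Python, assumes POSIX rename() behaviour (the target
is replaced atomically) and that fsync() works on a read-only
descriptor as it does on Linux. The function name and the ".tmp" suffix
are made up for illustration.

```python
import os

def replace_file_durably(path, data):
    """Replace path with data using write, fsync, rename, fsync."""
    tmp = path + ".tmp"  # hypothetical temporary name in the same directory

    # Write the new contents and force them to disk before the rename.
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Atomically swap the new file into place (POSIX rename semantics).
    os.rename(tmp, path)

    # The fsync() after rename() mentioned above: on the file, not the directory.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
```

No directory handling appears anywhere; the application only touches
the file it cares about.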

Larry
