Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks

Chris Mason (mason@suse.com)
15 Jul 2002 13:31:49 -0400


On Mon, 2002-07-15 at 11:22, Patrick J. LoPresti wrote:
> Consider this argument:
>
> Given: On ext3, fsync() of any file on a partition commits all
> outstanding transactions on that partition to the log.
>
> Given: data=ordered forces pending data writes for a file to happen
> before related transactions are committed to the log.
>
> Therefore: With data=ordered, fsync() of any file on a partition
> syncs the outstanding writes of EVERY file on that
> partition.
>
> Is this argument correct? If so, it suggests that data=ordered is
> actually the *worst* possible journalling mode for a mail spool.
>

Yes. In practice this doesn't hurt as much as it could, because ext3
does a good job of letting more writers come in before forcing the
commit. What hurts you is when a forced commit comes in the middle of
creating the file. A data write that could have been contiguous gets
broken into two or more writes instead.

> One other thing. I think this statement is misleading:
>
> IF your server is stable and not prone to crashing, and/or you
> have the write cache on your hard drives battery backed, you
> should strongly consider using the writeback journaling mode of
> Ext3 versus ordered.
>
> This makes it sound like data=writeback is somehow unsafe when
> machines crash. I do not think this is true. If your application
> (e.g., Postfix) is written correctly (which it is), so it calls
> fsync() when it is supposed to, then data=writeback is *exactly* as
> safe as any other journalling mode.

Almost. data=writeback makes it possible for the old contents of a
block to end up in a newly grown file. There are a few ways this can
screw you up:

1) that newly grown file is someone's inbox, and the old contents of the
new block include someone else's private message.

2) That newly grown file is a control file for the application, and the
application expects it to contain valid data within (think sendmail).

> "Battery backed caches" and the
> like have nothing to do with it.

Nope, battery backed caches don't make data=writeback more or less safe
(with respect to the data anyway). They do make data=ordered and
data=journal more safe.

> And if your application is written
> incorrectly, then other journalling modes will reduce but not
> eliminate the chances for things to break catastrophically on a crash.
>
> So if the partition is dedicated to correct applications, like a mail
> spool is, then data=writeback is perfectly safe. If it is faster,
> too, then it really is a no-brainer.

For mail servers, data=journal is your friend. ext3 sometimes needs a
bigger log for it (reiserfs data=journal patches don't), but the
performance increase can be significant.

-chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/