Re: ReiserFS data corruption in very simple configuration

Stephen C. Tweedie (sct@redhat.com)
Wed, 26 Sep 2001 15:43:11 +0100


Hi,

On Tue, Sep 25, 2001 at 01:13:04PM -0700, Mike Fedyk wrote:

> > Stock reiserfs only provides meta-data journalling. It guarantees that
> > structure of you file-system will be correct after journal replay, not
> > content of a files. It will never "trash" file that wasn't accessed at
> > the moment of crash, though. Full data-journaling comes at cost. There
> > is patch by Chris Mason <Mason@Suse.COM> to support data journaling in
> > reiserfs. Ext3 supports it also.

> When files on a ReiserFS mount have data from other files, does that mean
> that it has recovered wrong meta-data, or is it because the meta-data was
> committed before the data?

It can be either, but the former can only be the result of a problem
(either hardware fault or a data-corrupting software bug of some
description). In the normal case, only the latter scenario happens.

ext3 has a mode to flush all data before metadata gets committed.
That is its default mode, and it avoids this problem without having to
actually journal the data.

> So, if I write a file, does ReiserFS write the structures first, and if the
> data isn't written, whatever else was deleted from the block before will now
> be in the file?

Yep. ext3 behaves in the same way in its fastest "writeback" data
mode.

> If that's so, then one way to keep old deleted data from getting into
> partially written files after a crash would be to zero out the blocks on
> unlink. I can imagine that this would prevent undelete, and slow down
> deleting considerably.

Indeed.

> Another way, may be to keep a journal of which blocks have actually been
> committed. Maybe a bitmap in the journal, or some other structure...

ext3 does exactly that. It's necessary to keep things in sync if we
have blocks of data being deleted and reallocated as metadata, or
vice-versa.

> If you have data journaling, does that mean there is a possability of
> recovering a complete file -before- it was written? i.e:

> echo a > test;
> sync;
> cat picture.tif > test
> (writing in progress, only partially in journal)
> power off

> Will "a" be in test upon recovery?

If you are using full data journaling (ext3's "journal" data mode) or
the default "ordered" data mode, then no, you never see such
behaviour.

In the ordered mode, it achieves this precisely because it is keeping
a record of which blocks have been committed (or, more accurately,
which *deleted* blocks have had the delete committed). If you do a
"cat > file", then before the new data is written, the file gets
truncated and all its old data blocks deleted. ext3 will then refuse
to reuse those blocks until the delete has been committed, so if we
crash and end up rolling back the delete transaction, we'll never see
new data blocks in the old file.

Cheers,
Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/