Re: data corrupting bug in 2.4.20 ext3, data=journal

Stephen C. Tweedie (sct@redhat.com)
02 Dec 2002 12:15:05 +0000


On Mon, 2002-12-02 at 07:17, Andrew Morton wrote:

> Are you sure? I can't make it happen on 2.4.19. And disabling the new
> BH_Freed logic (which went into 2.4.20-pre5) makes it go away.
>
> --- linux-akpm/fs/jbd/commit.c~a Sun Dec 1 23:10:12 2002
> +++ linux-akpm-akpm/fs/jbd/commit.c Sun Dec 1 23:10:27 2002
> @@ -695,7 +695,7 @@ skip_commit: /* The journal should be un
> - clear_bit(BH_JBDDirty, &bh->b_state);
> +// clear_bit(BH_JBDDirty, &bh->b_state);

Argh.

That's not the right fix --- it reintroduces the bug that BH_Freed was
introduced to solve in the first place.

The problem is that ext3 is expecting that truncate_inode_pages() (and
hence ext3_flushpage) is only called during a truncate. That's what the
function is named for, after all, and it's the hint we need to indicate
that future writeback on the data we're discarding should be disabled
(so that we don't get old data written on top of new data should the
block get deallocated.)

But kill_supers() eventually calls truncate_inode_pages() too when we're
doing the invalidate_inodes(). And ext3 is reacting just the way it
would for a normal truncate --- the data still gets written to the
journal (correct, if we reboot before the truncate commits then the old
data is preserved in the journal) but is not queued for writeback.

The solution is to set BH_Freed in ext3_flushpage IFF we're being called
from the truncate, but to avoid it if we're in an umount. I'm not sure
of the best way to do that right now, but there are some trivial but
hacky methods possible (eg. see if we're in a nested transaction; if so,
it's a truncate, if not, it's a umount.) MS_ACTIVE might be a possible
flag to test, but I'll need to double-check whether that is 100% safe
--- we can't afford to skip the BH_Freed setting if we're in a truncate
and the filesystem is not yet completely quiesced.

Cheers,
Stephen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/