Re: Linux-2.5.14..

Andrew Morton (akpm@zip.com.au)
Wed, 08 May 2002 21:34:20 -0700


Daniel Pittman wrote:
>
> On Mon, 6 May 2002, Linus Torvalds wrote:
> > On Mon, 6 May 2002, Daniel Pittman wrote:
> >>
> >> From the look of the changelog at least a few of the file corruption
> >> bugs with ext3, 2k block file systems and 2.5 have been fixed. Should
> >> I expect this release to address the problems I was seeing?
> >
> > "Expect" is too strong a word. I'd say "hope" - a number of truncate
> > bugs were fixed, but whether that was what bit you, nobody knows.
> >
> > I suspect the real answer is that we'd love for you to test things
> > out, but that if it ends up being too painful to recover if the
> > problems happen again, you probably shouldn't..
>
> Right. I got brave enough to test it on a real, live system after
> extensive fake testing. It seems to work well, at least insofar as it
> correctly runs the same workload that caused massive file corruption.

hmm.

> So, I believe that 2.5.14 is working correctly with 2k ext3 filesystems,
> at least for minimal use. I didn't do any sort of extreme load testing
> or anything like that, being cautious about it.

I've been testing 2.5.14 pretty hard for a couple of days.

ext2, ext3-ordered, ext3-writeback (all with small blocks) are solid.

reiserfs and vfat are solid.

JFS deadlocks (see the BUGBUG comment in jfs_txnmgr.c - it happens). I've
asked the JFS guys for help on this; possibly the code can just be removed:
the buffer-based writeout which I replaced wouldn't have written the
pages anyway...

ext3-journalled is not happy.

There's a locking bug between try_to_free_buffers and buffer_insert_inode_queue
which never seems to trigger. I've got that fixed.

There's a known race between unmount and writeback which is probably
impossible to trigger. (see the FIXME at __sync_list). Testing the
fix for that at present.

The "sync" functions aren't right. Pages which are both dirty and
under writeback are not correctly waited upon. This is a minor
correctness thing which nobody would notice. Still thinking
about the best way to close this.
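
To illustrate why a page that is both dirty and under writeback needs
special care, here is a toy model in Python. It is not the kernel code:
the Page class and both sync functions are hypothetical, and the "naive"
variant is only an assumption about how such a page could be missed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    memory: str = ""                 # current in-memory contents
    disk: str = ""                   # contents that have reached disk
    dirty: bool = False              # memory is newer than anything submitted
    in_flight: Optional[str] = None  # snapshot under writeback, if any

    def write(self, data: str) -> None:
        self.memory = data
        self.dirty = True

    def start_writeback(self) -> None:
        # Submit the current contents for I/O and clear the dirty flag.
        self.in_flight = self.memory
        self.dirty = False

    def wait_on_writeback(self) -> None:
        # Model I/O completion: the submitted snapshot reaches disk.
        if self.in_flight is not None:
            self.disk = self.in_flight
            self.in_flight = None


def naive_sync(page: Page) -> None:
    # Assumed flaw: "under writeback" is treated as "already handled".
    if page.in_flight is not None:
        page.wait_on_writeback()
    elif page.dirty:
        page.start_writeback()
        page.wait_on_writeback()


def careful_sync(page: Page) -> None:
    # Wait for the old I/O, then write and wait again if redirtied.
    page.wait_on_writeback()
    if page.dirty:
        page.start_writeback()
        page.wait_on_writeback()


def redirtied_page() -> Page:
    # A page that was modified again while its first writeback was in flight.
    page = Page()
    page.write("v1")
    page.start_writeback()
    page.write("v2")
    return page


a = redirtied_page()
naive_sync(a)
b = redirtied_page()
careful_sync(b)
print(a.disk, a.dirty)  # v1 True
print(b.disk, b.dirty)  # v2 False
```

In this model the naive sync returns with stale data on disk and the page
still dirty, which is the kind of gap described above.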

So unless you're a JFS or ext3-journalled user, 2.5.14 is OK.

> On reboot, I got an assertion in ext3, though, and the following BUG
> trace. So, something still isn't well, but it seems to be getting it
> much more right. :)
>
> ...
>
> >>EIP; c015cf45 <journal_dirty_metadata+13d/174> <=====
>
> ...
> Code;  c015cf45 <journal_dirty_metadata+13d/174>   <=====
>    0:   0f 0b       ud2a      <=====
> Code;  c015cf47 <journal_dirty_metadata+13f/174>
>    2:   60          pusha
> Code;  c015cf48 <journal_dirty_metadata+140/174>
>    3:   04 40       add    $0x40,%al

The two bytes after the ud2a are 60 04, i.e. 0x0460 -> line 1120. Yup, I
get that one too. I assume you were testing with data=journal.
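
For anyone following along, the line number can be read straight out of
the code bytes. A minimal sketch in Python, assuming the i386 BUG() macro
of this era emits ud2a (0f 0b) followed by a 16-bit little-endian line
number; the helper name is hypothetical:

```python
def bug_line_from_code_bytes(code: bytes) -> int:
    """Return the 16-bit little-endian value that follows ud2a (0f 0b).

    Assumption: the two bytes after ud2a hold __LINE__, which is why the
    disassembler shows them as unrelated instructions (pusha, add).
    """
    if code[:2] != b"\x0f\x0b":
        raise ValueError("code bytes do not start with ud2a (0f 0b)")
    if len(code) < 4:
        raise ValueError("need at least two bytes after ud2a")
    return int.from_bytes(code[2:4], "little")


# Bytes taken from the quoted trace: 0f 0b, then 60, then 04 40.
code = bytes.fromhex("0f 0b 60 04 40")
print(bug_line_from_code_bytes(code))  # 1120
```

The trailing 40 would belong to whatever follows the line number, so it
is ignored here.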

Thanks again...

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/