Re: ext2 FS corruption with 2.5.59.

Andrew Morton (akpm@digeo.com)
Fri, 24 Jan 2003 02:32:13 -0800


Oleg Drokin <green@namesys.com> wrote:
>
> Hello!
>
> While doing my usual quick tests I pass all my reiserfs patches through,
> I accidentally forgot to mount reiserfs partition and all the tests
> were performed on ext2 instead.
> So I found that first of all fsx (freebsd nfs testing tool)
> breaks if there are parallel writers (no matter if they write to this
> same partition or not). i_blocks value becomes negative on truncate
> (and then it is written to disk). fsx log attached.
> Also subsequent fsck found lots of other corruptions.
> (previously fs was checked and no corruptions were found).
> Excerpts from e2fsck output are attached.
>
> My test consists of running "fsx -c 1234 testfile", "iozone -a",
> "dbench 60", "fsstress -p10 -n1000000 -d ." at the same time on the
> tested FS.
> fsx usually breaks just when dbench is finished.
>
> Host is dual athlon 1700+, 1G RAM, SMP kernel, Highmem support, no preemt.
>
> I reproduced this behaviour on vanilla 2.5.59. (and It is easily
> reproducable)
>

In 2.5.52 I broke sys_sync in subtle ways. It seems that fsstress hits
sync() hard enough to trigger the failure.

sys_sync() will set mapping->dirtied_when non-zero against a clean inode.
Later, in (say) __iget(), that inode gets moved over to inode_unused or
inode_in_use. But because it has non-zero ->dirtied_when,
__mark_inode_dirty() thinks that the inode must still be on sb->s_dirty.

But it isn't. It's on inode_in_use. It (and its pages) never get written
out and the data gets thrown away on unmount. Christoph has hit the same
thing. I bet he was running fsstress too. Sorry about that.

The below patch should fix it up - please test that.

So that's the umount problem. I don't know why you're getting the fsx-linux
failure - I was unable to hit it in an hour's run of the above workload on
4-way. Against both scsi and IDE. So please look further into that, thanks.

diff -puN fs/fs-writeback.c~sync-fix fs/fs-writeback.c
--- 25/fs/fs-writeback.c~sync-fix 2003-01-24 02:14:23.000000000 -0800
+++ 25-akpm/fs/fs-writeback.c 2003-01-24 02:17:12.000000000 -0800
@@ -67,6 +67,7 @@ void __mark_inode_dirty(struct inode *in

spin_lock(&inode_lock);
if ((inode->i_state & flags) != flags) {
+ const int was_dirty = inode->i_state & I_DIRTY;
struct address_space *mapping = inode->i_mapping;

inode->i_state |= flags;
@@ -90,7 +91,7 @@ void __mark_inode_dirty(struct inode *in
* If the inode was already on s_dirty or s_io, don't
* reposition it (that would break s_dirty time-ordering).
*/
- if (!mapping->dirtied_when) {
+ if (!was_dirty) {
mapping->dirtied_when = jiffies|1; /* 0 is special */
list_move(&inode->i_list, &sb->s_dirty);
}
@@ -280,7 +281,7 @@ sync_sb_inodes(struct super_block *sb, s
__iget(inode);
__writeback_single_inode(inode, really_sync, wbc);
if (wbc->sync_mode == WB_SYNC_HOLD) {
- mapping->dirtied_when = jiffies;
+ mapping->dirtied_when = jiffies|1;
list_move(&inode->i_list, &sb->s_dirty);
}
if (current_is_pdflush())

_

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/