Re: ext3-2.4-0.9.0

Andrew Morton (andrewm@uow.edu.au)
Sun, 08 Jul 2001 11:05:32 +1000


Neil Brown wrote:
>
> On Saturday July 7, andrewm@uow.edu.au wrote:
> > An update of the ext3 journalling filesystem for 2.4 kernels
> > is available at
> >
> > http://www.uow.edu.au/~andrewm/linux/ext3/
> >
> > Patches are against 2.4.6-ac1 and 2.4.6.
>
> I thought it was time to try out ext3 between nfsd and raid5, so I
> built 2.4.6 plus this patch, and an ext3 filesystem on a largish
> raid5 volume, exported it (with the "sync" flag), mounted it from
> another machines with NFSv2, and ran "dbench 4".
>
> This produces a live-lock (I think that it the right term).
> Throughput would drop to zero (determined by watching the counts in
> /proc/nfs/rpc/nfsd), but could be coaxed along by generating other
> filesystem activity.
>
> I tried nfs over ext3 on a plain ide disc and it worked fine.
> I tried dbench directly on ext3/raid5 and it worked fine.
> I tried dbench/nfs/ext2/raid5 and it worked fine.
>
> So I think it is some interaction between ext3fs and raid5 triggered
> by the high rate of "fsync" calls made by nfsd. Naturally I blame
> ext3 because I know more about raid5 and nfsd :-)

fsync will cause ext3 to commit the current transaction once all
handles against it close - so that will produce rapid bursts
of small numbers of writes.

> One particular aspect of raid5 that *could* be related is that it is
> very reticent to schedule write requests. It tries to hang on the them
> as long as possible in the hope of getting more write requests in the
> same stripe. My guess as to what is happening is that as write
> request is submitted and then waited-for without an intervening
> run_task_queue(&tq_disk);

Could well be. ext3 will happily feed 2,000 buffers into submit_bh()
prior to running tq_disk. Everything else is happy with this, so I blame
nfsd and raid5 :) Rapid fsyncs will break this up, however.

Does this patch help?

--- fs/jbd/commit.c 2001/07/01 04:24:42 1.40
+++ fs/jbd/commit.c 2001/07/08 00:53:42
@@ -202,6 +202,7 @@
spin_unlock(&journal_datalist_lock);
unlock_journal(journal);
ll_rw_block(WRITE, bufs, wbuf);
+ run_task_queue(&tq_disk);
lock_journal(journal);
journal_brelse_array(wbuf, bufs);
goto write_out_data;
@@ -410,6 +411,7 @@
bh->b_end_io = end_buffer_io_sync;
submit_bh(WRITE, bh);
}
+ run_task_queue(&tq_disk);
lock_journal(journal);

/* Force a new descriptor to be generated next

> When the system is livelocked, all I can tell at the moment (I am at
> home and the console is at work so I cannot use alt-sysrq) is that
> kjournal is waiting in wait_on_buffer and an nfsd thread is waiting on
> the journal.

That sounds like Something Wierd is going on. wait_on_buffer will
unplug and the disks should be going hell-for-leather.

> I will try to explore it more deeply next time I am at work, but if
> there are any suggestions as to what it might be, or how I might more
> easily find out what is going on, I am all ears.
>

I'll see if I can get it to happen here. Thanks.

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/