Re: fsync fixes for 2.4

Andrew Morton (akpm@zip.com.au)
Mon, 15 Jul 2002 11:36:56 -0700


Andrea Arcangeli wrote:
>
> ...
> as for the scaling with async flushes to multiple devices, 2.4 has a
> single flushing thread; 2.5, as Andrew explained to me at OLS, (partly)
> fixes this with multiple pdflush threads. The only issue I see in his
> design is that it works per superblock, so if a filesystem is on top of
> an LVM volume backed by a dozen different hard disks, only one pdflush
> will pump on those dozen physical request queues, because the first
> pdflush entering the superblock forbids other pdflush threads from
> working on the same superblock too. So the first physical queue that
> fills up will stop pdflush from pushing more dirty pages to the other,
> possibly empty, physical queues.

Well. There's no way we can get effective writeback against 200
spindles by relying on pdflush, so that daemon is mainly there to
permit background writeback under light-to-moderate loads.

Once things get heavy, the only sane approach is to use the actual
caller of write(2) as the resource for performing the writeback,
which is what we currently do in balance_dirty[_pages](). But the
problem there is that in both 2.4 and 2.5 a caller to that function
can easily get stuck on the wrong queue, and bandwidth really suffers.
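
To make that concrete, here's a very rough userspace sketch of the
balance_dirty_pages() idea: the process which dirtied the memory is
made to pay for some of the writeback itself. The names, numbers and
helpers are invented for illustration; this is not the 2.4 or 2.5
code.

/*
 * Toy model of caller-based throttling.  writeback_some_pages() stands
 * in for "write back from *some* queue", which is exactly the problem:
 * it may not be the queue the caller just dirtied.
 */
#include <stdio.h>

#define DIRTY_LIMIT   1000      /* pages: the throttle point */
#define WRITE_CHUNK   32        /* pages written back per pass */

static long nr_dirty_pages;     /* global dirty-page count (simplified) */

static void writeback_some_pages(int nr)
{
        nr_dirty_pages -= nr;
        if (nr_dirty_pages < 0)
                nr_dirty_pages = 0;
}

/* Called on the write(2) path after dirtying nr_dirtied pages. */
static void balance_dirty_pages(int nr_dirtied)
{
        nr_dirty_pages += nr_dirtied;

        /* the dirtier pays for its own mess */
        while (nr_dirty_pages > DIRTY_LIMIT)
                writeback_some_pages(WRITE_CHUNK);
}

int main(void)
{
        int i;

        for (i = 0; i < 100; i++)
                balance_dirty_pages(16);        /* a process doing writes */
        printf("dirty pages after throttling: %ld\n", nr_dirty_pages);
        return 0;
}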

I've been working on changing 2.5 so that the write(2) caller no
longer performs a general "writeback of everything" - that caller
instead performs writeback specifically against the queue which it
just dirtied. This is done by using address_space->backing_dev_info
as a key during a search across the superblocks and blockdev inodes.
That works quite well.
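
Roughly, the search looks like the sketch below. The structures are
simplified stand-ins, not the real 2.5 data structures; the only point
is using the backing_dev_info pointer as the key, so the caller writes
back only inodes sitting on the queue it just dirtied.

/*
 * Toy model of per-queue writeback keyed on backing_dev_info.
 */
#include <stddef.h>
#include <stdio.h>

struct backing_dev_info { const char *name; };

struct address_space { struct backing_dev_info *backing_dev_info; };

struct inode {
        struct address_space *mapping;
        struct inode *next;     /* stand-in for the per-sb dirty list */
        int dirty;
};

struct super_block {
        struct inode *dirty_inodes;
        struct super_block *next;
};

static void write_inode_pages(struct inode *inode)
{
        inode->dirty = 0;
        printf("wrote inode backed by %s\n",
               inode->mapping->backing_dev_info->name);
}

/*
 * Walk every superblock, but only write back inodes whose pages sit on
 * the queue (bdi) that the caller just dirtied.  Everything else is
 * left for pdflush, or for whoever dirtied it.
 */
static void writeback_backing_dev(struct super_block *sbs,
                                  struct backing_dev_info *bdi)
{
        struct super_block *sb;
        struct inode *i;

        for (sb = sbs; sb; sb = sb->next)
                for (i = sb->dirty_inodes; i; i = i->next)
                        if (i->dirty && i->mapping->backing_dev_info == bdi)
                                write_inode_pages(i);
}

int main(void)
{
        struct backing_dev_info fast = { "fast-disk" }, slow = { "slow-disk" };
        struct address_space m_fast = { &fast }, m_slow = { &slow };
        struct inode i2 = { &m_slow, NULL, 1 };
        struct inode i1 = { &m_fast, &i2, 1 };
        struct super_block sb = { &i1, NULL };

        /* the caller just dirtied pages on "fast"; only that queue is hit */
        writeback_backing_dev(&sb, &fast);
        return 0;
}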

But there's still a problem where pdflush goes to write back a queue
and fills it up, so the userspace program ends up blocking (due to
pdflush's activity) when it really should not. I'm still undecided
about what to do about that.

And yes, point taken on the LVM thing. If the chunk size is reasonably
small (a few megabytes) then we should normally get decent concurrency,
but there will be corner-cases.
