Re: 2.4.14-pre6

Andrew Morton (akpm@zip.com.au)
Thu, 01 Nov 2001 12:55:41 -0800


Neil Brown wrote:
>
> ...
> What I would like is that as soon as a buffer was marked "dirty", it
> would get passed down to the driver (or at least to the
> block-device-layer) with something like
> submit_bh(WRITEA, bh);
> i.e. a write ahead. (or is it write-behind...)
> The device handler (the elevator algorithm for normal disks, other
> code for other devices) could keep them ordered in whatever way it
> chooses, and feed them into the queues at some appropriate time.
>

Sounds sensible to me.

In many ways, it's similar to the current scheme when it's used
with an enormous request queue - all writeable blocks in the
system are candidates for request merging. But your proposal
is a whole lot smarter.

In particular, the current kupdate scheme of writing the
dirty block list out in six chunks, five seconds apart
does potentially miss out on a very large number of merging
opportunities. Your proposal would fix that.

Another potential microoptimisation would be to write out
clean blocks if that helps merging. So if we see a write
for blocks 1,2,3,5,6,7 and block 4 is known to be in memory,
then write it out too. I suspect this would be a win for
ATA but a loss for SCSI. Not sure.

But I have a gut feel that all this is in the noisefloor,
compared to The Big Problem. It's just a matter of identifying
and fixing TBP. Fixing the fdatasync() thing didn't help,
because ext2_write_inode() for a new file has to read the
inode block from disk. Fixing that, by doing an async preread
of the inode's block in ext2_new_inode() didn't help either,
I suspect because my working set was so large that the VM
tossed out my preread before I got to use it. A few more days
poking is needed.

Oh. I have a gripe concerning prune_icache(). The design
idea behind keventd is that it's a "process context bottom
half handler". It's used for things like cardbus hotplug
interrupt handlers, handling tty hangups, etc. It should
probably run SCHED_FIFO.

Using keventd to synchronously flush large amounts of
data out to disk constitutes gross abuse - it's being blocked
from performing its designed duties for many seconds. Can we
please not do that? We already have kswapd, kupdate, bdflush,
which should be sufficient.

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/