Re: 2.4.19pre1aa1
Andrea Arcangeli (andrea@suse.de)
Wed, 6 Mar 2002 00:37:50 +0100
On Tue, Mar 05, 2002 at 03:24:49PM -0800, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> > 
> > BTW, I noticed one of my last my email was a private reply so I'll
> > answer here too for the buffer_head pagecache I/O part:
> 
> Heh.  Me too.
>  
> > Having persistence on the physical I/O information is a good thing, so
> > you don't need to resolve logical to physical block at every I/O and bio
> > has a cost to setup too. The information we carry on the bh isn't
> > superflous, it's needed for the I/O so even if you don't use the
> > buffer_head you will still need some other memory to hold such
> > information, or alternatively you need to call get_block (and serialize
> > in the fs) at every I/O even if you've plenty of ram free. So I don't
> > think the current setup is that stupid, current bh only sucks for the
> > rawio and that's fixed by bio.
> 
> The small benefit of caching the get_block result in the buffers
> just isn't worth it.
> 
> At present, a one-megabyte write to disk requires the allocation
> and freeing and manipulation and locking of 256 buffer_heads and
> 256 BIOs.  lru_list_lock, hash_table_lock, icache/dcache
> thrashing, etc, etc.   It's an *enormous* amount of work.
> 
> I'm doing the same amount of work with as few as two (yes, 2) BIOs.
> 
> This is not something theoretical.  I have numbers, and code.
> 20% speedup on a 2-way with a workload which is dominated
> by copy_*_user.  It'll be more significant on larger machines,
> on machines with higher core/main memory speed ratios, on
> machines with higher I/O bandwidth. (OK, that bit was theoretical).
then let's cut and paste this part as well :)
depends what you're doing, if you do `cp /dev/zero .` and the fs is
lucky enough to have free contigous space I definitely can see the
improvement of highlevel merging, but that's not always what you're
doing with the fs, for example that's not the case for kernel compiles
and small files where you'll be always fragmented and where the bio will
at max hold 4k and you keep rewriting into cache. The times you enter
get_block you enter in a fs lock, rather than staying at the per-page
lock, it's not additional locking, the bh on the pagecahce doesn't need
any additional locking. So for a kernel compile the current situation is
an obvious advantage in performance and scalability (fs code definitely
doesn't scale at the moment).
But ok, globally it will be probably better to drop the bh since we have
to work on the bio anyways somehow and so at the very least we don't
want to be slowed down from the bio logic in the physically contigous
pagecache flood case.
I just meant the bh isn't totally pointless and it could be shrunk as
Arjan said in a private email.
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/