Re: 2.4.19rc2aa1 i_size atomic access

Andrea Arcangeli (andrea@suse.de)
Tue, 23 Jul 2002 18:56:02 +0200


On Fri, Jul 19, 2002 at 03:56:36PM -0700, Daniel McNeil wrote:
> Here is another approach. I added two version fields to the inode
> structure. The first one is updated before i_size and the 2nd is
> updated after with memory barriers in between. The i_size_read()
> samples the version fields and i_size and loops until it can read
> i_size without an i_size update happening at the same time. It is
> not pretty but it does fix the problem and the cache line is not
> written by i_size_read() and it should work on all architechtures.
> I've tested this on a two proc system.

I also considered this possibility before taking the other approch, I
thought it was inferior because it adds branches and it increases the
dcache pressure, so I thought just marking our cacheline dirty and
reading it in one go, with no additional overhead would been a win (the
less possible number of cycles and no branch prediction issues). Of
course the below will allow parallel i_size readers to scale, but again,
I think the fstat benchmark doesn't matter much and true parallel
readers on the same inode (not only i_size readers) will have to collide
on the pagecache_lock anyways (even in 2.5). So I still think the
chmpxchg8b is a win despite it marks the i_size cacheline dirty, but
somebody should try to benchmark it probably to verify the major
bottleneck remains the pagecache_lock.

I actually applied the below but I enabled it only for the non x86 32bit
archs (like s390, ppc) where I have no idea how to code a get_64bit it
in asm. It should be definitely better than a separate spinlock
protecting the i_size.

comments?

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/