Re: 2.4.19rc2aa1 i_size atomic access

Daniel McNeil (daniel@osdl.org)
29 Jul 2002 17:34:16 -0700


Andrea,

Sorry I haven't responded, but I was on vacation all last week and
was not near a computer.

I like your code change. Incrementing the v2 before the v1 in the
i_size_write() is much better. My code was definitely uglier -- but
it was correct since the version1 and version2 where sampled before
i_size was read and version1 and version2 where checked again after.
It was excessive, but correct.

On your patch, shouldn't non-smp preempt still use the 64-bit stuff?
The comment says it should, but the #ifdef's are not checking for
PREEMPT or did I miss something?

I would still be curious about the performance difference between the
version approach and the cmpxchg8 approach. With SMP I'm a bit worried
about the cacheline bouncing around and the memory bandwith wasted.

Any ideas on what kind of test would be appropriate?
I've got access to 2-proc to 8-proc systems I could run some tests on,
just not sure what test would be useful. The fstat() test isn't
realistic.

Increasing the versions to 32-bit is ok with -- I was just trying to
not waste too much space.

Daniel

On Tue, 2002-07-23 at 10:08, Andrea Arcangeli wrote:
> On Fri, Jul 19, 2002 at 03:56:36PM -0700, Daniel McNeil wrote:
> > + short v1;
> > + short v2;
> > + loff_t i_size;
> > +
> > + /*
> > + * retry if i_size was possibly modified while sampling.
> > + */
> > + do {
> > + v1 = inode->i_attr_version1;
> > + v2 = inode->i_attr_version2;
> > + rmb();
> > + i_size = inode->i_size;
> > + rmb();
> > + } while (v1 != v2 ||
> > + v1 != inode->i_attr_version1 ||
> > + v1 != inode->i_attr_version2);
> > + return i_size;
> [..]
> > #elif BITS_PER_LONG==64
> > return inode->i_size;
> > #endif
> > @@ -548,8 +566,12 @@
> > static inline void i_size_write(struct inode * inode, loff_t i_size)
> > {
> > #if BITS_PER_LONG==32
> > - set_64bit((unsigned long long *) &inode->i_size,
> > - (unsigned long long) i_size);
> > + inode->i_attr_version1++; /* changing i_size */
> > + wmb();
> > + inode->i_size = i_size;
> > + wmb();
> > + inode->i_attr_version2++; /* done with change */
> > + wmb();
> > #elif BITS_PER_LONG==64
> > inode->i_size = i_size;
> > #endif
>
> btw, looking at the implementation the read side was buggy. First it's
> pointless to read both version1 and version2 before reading the isize,
> secondly if you increase version1 first (in the writer), the reader has
> to read version2 before i_size and version1 after i_size. While you're
> doing the opposite, you compare v1 (version1) read before i_size with
> version2 after i_size.
>
> So while merging it I rewrote it this way (I also change the type of the
> sequence number to int, 2^16 can overflow quite fast in a multighz cpu,
> mostly for paranoid of course, and to go safe with the atomic granularty
> provided by archs, int certainly has to be atomic-granular).
>
> loff_t i_size;
> int v1, v2;
>
> /* Retry if i_size was possibly modified while sampling. */
> do {
> v1 = inode->i_size_version1;
> rmb();
> i_size = inode->i_size;
> rmb();
> v2 = inode->i_size_version2;
> } while (v1 != v2);
>
> return i_size;
>
>
> and the new writer side:
>
> inode->i_size_version2++;
> wmb();
> inode->i_size = i_size;
> wmb();
> inode->i_size_version1++;
> wmb(); /* make it visible ASAP */
>
> Andrea
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/