Re: 2.4.19rc2aa1 i_size atomic access

Daniel McNeil (daniel@osdl.org)
19 Jul 2002 15:56:36 -0700


On Fri, 2002-07-19 at 02:23, Andrea Arcangeli wrote:
> On Thu, Jul 18, 2002 at 05:09:21PM -0700, Daniel McNeil wrote:
> > fstat() was an easy way to test for incorrect reading of i_size,
> > so I could test the cmpxchg8 code to read a 64-bit value.
>
> yes that's probably the simpler way to reproduce it.
>
> > This cacheline bouncing is the thing I don't like. It was not clear
>
> I don't like it either of course, but I couldn't figure out a way to read 64bit
> atomically without generating it.
>
> > >
> > > If you have suggestions that's welcome, thanks.
> >
> > I'll let you know if I think of any.
>
> I guess cmpxchg8b is the best way at least for 2.4.
>
> Andrea

Andrea,

Here is another approach. I added two version fields to the inode
structure. The first one is updated before i_size and the 2nd is
updated after with memory barriers in between. The i_size_read()
samples the version fields and i_size and loops until it can read
i_size without an i_size update happening at the same time. It is
not pretty but it does fix the problem and the cache line is not
written by i_size_read() and it should work on all architechtures.
I've tested this on a two proc system.

Let me know what you think,

Daniel

--- linux-2.4.19-rc2aa1/include/linux/fs.h Fri Jul 19 15:22:08 2002
+++ linux-2.4.19-rc2aa1.new/include/linux/fs.h Fri Jul 19 15:22:25 2002
@@ -501,6 +501,8 @@
atomic_t i_writecount;
unsigned int i_attr_flags;
__u32 i_generation;
+ volatile short i_attr_version1;
+ volatile short i_attr_version2;
union {
struct minix_inode_info minix_i;
struct ext2_inode_info ext2_i;
@@ -539,7 +541,23 @@
static inline loff_t i_size_read(struct inode * inode)
{
#if BITS_PER_LONG==32
- return (loff_t) read_64bit((unsigned long long *) &inode->i_size);
+ short v1;
+ short v2;
+ loff_t i_size;
+
+ /*
+ * retry if i_size was possibly modified while sampling.
+ */
+ do {
+ v1 = inode->i_attr_version1;
+ v2 = inode->i_attr_version2;
+ rmb();
+ i_size = inode->i_size;
+ rmb();
+ } while (v1 != v2 ||
+ v1 != inode->i_attr_version1 ||
+ v1 != inode->i_attr_version2);
+ return i_size;
#elif BITS_PER_LONG==64
return inode->i_size;
#endif
@@ -548,8 +566,12 @@
static inline void i_size_write(struct inode * inode, loff_t i_size)
{
#if BITS_PER_LONG==32
- set_64bit((unsigned long long *) &inode->i_size,
- (unsigned long long) i_size);
+ inode->i_attr_version1++; /* changing i_size */
+ wmb();
+ inode->i_size = i_size;
+ wmb();
+ inode->i_attr_version2++; /* done with change */
+ wmb();
#elif BITS_PER_LONG==64
inode->i_size = i_size;
#endif

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/