Re: ext2 FS corruption with 2.5.59.

Andrew Morton (akpm@digeo.com)
Sat, 25 Jan 2003 21:10:03 -0800


William Lee Irwin III <wli@holomorphy.com> wrote:
>
> William Lee Irwin III <wli@holomorphy.com> wrote:
> >> Ticket locks need atomic fetch and increment. These don't look right.
>
> On Sat, Jan 25, 2003 at 07:46:48PM -0800, Andrew Morton wrote:
> > Well look at the reader side:
> > loff_t i_size_read(struct inode *inode)
> > {
> > unsigned seq;
> > loff_t ret;
> >
> > do {
> > seq = fr_write_begin(&inode->i_frlock);
> > ret = inode->i_size;
> > } while (seq != fr_write_end(&inode->i_frlock);
> > return ret;
> > }

argh. That should have been:

> > seq = fr_read_begin(&inode->i_frlock);
> > ret = inode->i_size;
> > } while (seq != fr_read_end(&inode->i_frlock);
> > return ret;
> > }

of course.

> This doesn't look particularly reassuring either. We have:
>
> (1) increment ->pre_sequence
> (2) wmb()
> (3) get inode->i_size
> (4) wmb()
> (5) increment ->post_sequence
> (6) wmb()
>
> Supposing the overall scheme is sound, one of the wmb()'s is unnecessary;

Could be.

> I'd have to go through some kind of state transition fiasco to be sure
> this actually recovers from the races where two readers fetch the same
> value of ->pre_sequence or ->post_sequence and store the same
> incremented value to convince myself this is right.

readers do not modify the lock - they simply observe.

The fr_write_begin/fr_write_end pair assumes that there is only a single
writer possible. In the case of i_size, that exclusion is provided by i_sem.
i_size is always modified under i_sem.

> I'll assume you've
> either done so yourself or are relying on someone else's verification.

More the latter ;)

> Restarting the read like this is highly unusual; if retrying the
> critical section is in fact the basis of this locking algorithm then
> it's not a true ticket lock.

Retrying the read is the basis of the locking algorithm.

The frlock stuff needs more work for non-SMP bloat avoidance, but it's simple
and seems sensible.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/