Re: ext2 FS corruption with 2.5.59.

William Lee Irwin III (wli@holomorphy.com)
Sat, 25 Jan 2003 20:14:26 -0800


William Lee Irwin III <wli@holomorphy.com> wrote:
>> Ticket locks need atomic fetch and increment. These don't look right.

On Sat, Jan 25, 2003 at 07:46:48PM -0800, Andrew Morton wrote:
> Well look at the reader side:
> loff_t i_size_read(struct inode *inode)
> {
> unsigned seq;
> loff_t ret;
>
> do {
> seq = fr_write_begin(&inode->i_frlock);
> ret = inode->i_size;
> } while (seq != fr_write_end(&inode->i_frlock);
> return ret;
> }

This doesn't look particularly reassuring either. We have:

(1) increment ->pre_sequence
(2) wmb()
(3) get inode->i_size
(4) wmb()
(5) increment ->post_sequence
(6) wmb()

Supposing the overall scheme is sound, one of the wmb()'s is unnecessary;
in theory rmb() is all that's needed before (5) to catch writes.

I'd have to go through some kind of state transition fiasco to be sure
this actually recovers from the races where two readers fetch the same
value of ->pre_sequence or ->post_sequence and store the same
incremented value to convince myself this is right. I'll assume you've
either done so yourself or are relying on someone else's verification.

Restarting the read like this is highly unusual; if retrying the
critical section is in fact the basis of this locking algorithm then
it's not a true ticket lock.

On Sat, Jan 25, 2003 at 07:46:48PM -0800, Andrew Morton wrote:
> One change which is needed here is to disable preemption in fr_write_begin();
> otherwise an frlock could be in the pre!=post state for hundreds of
> milliseconds while the writer gets preempted. Other CPUs would just spin for
> the duration.
> The same would happen if the writer takes an interrupt while pre!=post, but
> that's the same for all spinlocks...

Yes, this is a standard requirement for all non-sleeping locks. There
was only enough code in the post to "be sure" that the fetch and
increment bits were missing; I assumed that otherwise they'd be wrapped.

-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/