Re: 2.0, 2.2, 2.4, 2.5: fsync buffer race

Andrew Morton (akpm@digeo.com)
Mon, 10 Feb 2003 13:44:34 -0800


Andrea Arcangeli <andrea@suse.de> wrote:
>
> On Mon, Feb 10, 2003 at 12:40:00PM -0800, Andrew Morton wrote:
> > void sync_dirty_buffer(struct buffer_head *bh)
> > {
> > lock_buffer(bh);
> > if (test_clear_buffer_dirty(bh)) {
> > get_bh(bh);
> > bh->b_end_io = end_buffer_io_sync;
> > submit_bh(WRITE, bh);
> > } else {
> > unlock_buffer(bh);
> > }
> > }
>
> If you we don't take the lock around the mark_dirty_buffer as Linus
> suggested (to avoid serializing in the non-sync case), why don't you
> simply add lock_buffer() to ll_rw_block() as we suggested originally

That is undesirable for READA.

> and
> you #define sync_dirty_buffer as ll_rw_block+wait_on_buffer if you
> really want to make the cleanup?

Linux 2.4 tends to contain costly confusion between writeout for memory
cleansing and writeout for data integrity.

In 2.5 I have been trying to make it very clear and explicit that these are
fundamentally different things.

> ...
> Especially in 2.4 I wouldn't like to make the below change that is
> 100% equivalent to a one liner patch that just adds lock_buffer()
> instead of the test-and-set-bit (for reads I see no problems either).

That'd probably be OK, with a dont-do-that for READA.

> BTW, Linus's way that suggests the lock around the data modifications
> (unconditionally), would also enforce metadata coherency so it would
> provide an additional coherency guarantee (but it's not directly related
> to this problem and it may be overkill). Normally we always allow
> in-core modifications of the buffer during write-IO to disk (also for
> the data in pagecache). Only the journal commits must be very careful in
> avoiding that (like applications must be careful to run fsync and not to
> overwrite the data during the fsync). So normally taking the lock around
> the in-core modification and mark_buffer_dirty, would be overkill IMHO.

Yup. Except for a non-uptodate buffer. If software is bringing a
non-uptodate buffer uptodate by hand it should generally be locked, else a
concurrent read may stomp on the changes. There are few places where this
happens.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/