Re: linux-2.4.10-pre5

Andrea Arcangeli (andrea@suse.de)
Sun, 9 Sep 2001 05:30:15 +0200

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Andrea Arcangeli: "Re: expand_stack fix [was Re: 2.4.9aa3]"
Previous message: Andrea Arcangeli: "Re: linux-2.4.10-pre5"

On Sat, Sep 08, 2001 at 07:31:57PM -0700, Linus Torvalds wrote:
>
> On Sun, 9 Sep 2001, Andrea Arcangeli wrote:
> >
> > last blkdev close()
>
> I repeat: why last? What if there's a read-only user that has the thing
> open for some reason, like gathering statistics every once in a while with
> ioctl's or whatever?
>
> That "last" is a bug.

The only case where we must synchronize is for tune2fs fsck etc...
there's no ioctl gathering issue running under those. The ioctl
gathering will be started after fsck returned, then it doesn't matter if
the ioctl stays there forever.

Infact it would be particularly bad to synchronize at every close for
the real users of the blkdev, that really want to simply work on a file
as much transparently and fast as possible.

However in theory we could do the fsync + synchronize at every ->release but I
doubt we really need that. At least it wasn't a "bug" but it is a
very intentional behaviour.

> > - write pagecache to disk
> > - drop pagecache
> > - now all new data is on disk
> > - all the above and the below is done atomically with the
> > bdev semaphore (no new openers or anything can play
> > with the pagecache under us, only thing that could
> > play under us on the disk is the fs if mounted rw)
> >
> > - here _if_ we have an fs under us, it still has obsolete
> > data in pinned buffer cache we need to fix it
>
> Agreed. But you should NOT make that a special case.
>
> I'm saying that you can, and should, just _unconditionally_ call
>
> invalidate_device()
>
> which in turn will walk the buffer cache for that device, and try to throw
> it away.
>
> With the simple (again unconditional) addition of: If it cannot throw it
> away because it is pinned through bh->b_count, it knows somebody is using
> the buffer cache (obviously), so regardless of _what_ kind of user it is,
> be it a direct-io one or a magic kernel module or whatever, it does
>
> lock_buffer(bh);
> if (!buffer_dirty(bh))
> submit_bh(bh, READ);
> else {
> /* we just have to assume that the aliasing is ok */
> unlock_buffer(bh);
> }
>
> UNCONDITIONALLY.

This will corrupt the fs at the first hdparm -t on a rw mounted disk
_unless_ the fs always lock_buffer(superblock_bh) before modifying any
metadata _and_ marks the buffer dirty before the unlock_super, which is
not the case. We could possibly rely solely on the lock_super around the
update_buffers() but that would be too weak too since the fs is allowed
not to use the lock_super around all its metadata updates, while it is
well defined where we set/reset MSRDONLY, that always happens with the
super lock acquired and so I only rely on the MSRDONLY plus super lock
acquired which is certainly (and I think the only) safe approch
completly backwards compatible.

> Without caring about things like "is there a filesystem mounted". No
> silly rules. The _only_ rule becomes: "try to make the buffer cache as
> up-to-date as possible".
>
> See? I'm saying that your approach tries to make decisions that it just
> should not make. It shouldn't care or know about things like "a filesystem

I wish I didn't need to make those decisions, infact if you pick the first
patches and you'll see they were not making those decisions and the
early patches were less safe than the latest ones (though the window for
the race was small).

I wish the cache coherency logic would be simpler but just doing
something unconditionally it's going to break things in one way or
another as far I can tell.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Andrea Arcangeli: "Re: expand_stack fix [was Re: 2.4.9aa3]"
Previous message: Andrea Arcangeli: "Re: linux-2.4.10-pre5"