lock_super() has been `down()' for a long time. In 2.4, too.
> As Andrew mentioned, there would also need to be a per-journal lock to
> ensure coherency of the journal data. Currently the per-filesystem and
> per-journal lock would be equivalent, but when a single journal device
> can be shared among multiple filesystems they would be different locks.
Well. First I want to know if block-highmem is in there. If not,
then yep, we'll spend ages spinning on the BKL. Because ext3 _is_
BKL-happy, and if a CPU takes a disk interrupt while holding the BKL
and then sits there in interrupt context copying tons of cache-cold
memory around, guess what the other CPUs will be doing?
> I will leave it up to Andrew and Stephen to discuss locking scalability
> within the journal layer.
ext3 is about 700x as complex as ext2. It will need to be done with
some care.
> Within the filesystem there can be a large number of increasingly fine
> locks added - a superblock-only lock with per-group locks, or even
> per-bitmap and per-inode-table(-block) locks if needed. This would
> allow multi-threaded inode and block allocations, but a sane lock
> ranking strategy would have to be developed. The bitmap locks would
> only need to be 2-state locks, because you only look at the bitmaps
> when you want to modify them. The inode table locks would be read/write
> locks.
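To make the "lock ranking" point concrete, here is a minimal userspace
sketch using pthreads, not kernel code. The rule it assumes is: when two
group locks are needed at once, always take the lower-numbered group first.
The group count, the free-block counters and move_free_blocks() are all
hypothetical and exist only to illustrate the ordering.

```c
#include <assert.h>
#include <pthread.h>

#define RANK_NGROUPS 4   /* hypothetical number of block groups */

/* One 2-state lock per group (assumption: plain mutexes). */
static pthread_mutex_t rank_group_lock[RANK_NGROUPS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
static int rank_free_blocks[RANK_NGROUPS] = { 10, 10, 10, 10 };

/*
 * Hypothetical operation that needs two groups at once: move n free-block
 * credits from src to dst. Returns 1 on success, 0 if src has too few.
 */
static int move_free_blocks(int src, int dst, int n)
{
    assert(src != dst);
    /* Ranking rule: lower group index is always locked first. */
    int lo = src < dst ? src : dst;
    int hi = src < dst ? dst : src;
    int ok = 0;

    pthread_mutex_lock(&rank_group_lock[lo]);
    pthread_mutex_lock(&rank_group_lock[hi]);
    if (rank_free_blocks[src] >= n) {
        rank_free_blocks[src] -= n;
        rank_free_blocks[dst] += n;
        ok = 1;
    }
    /* Release in the reverse order of acquisition. */
    pthread_mutex_unlock(&rank_group_lock[hi]);
    pthread_mutex_unlock(&rank_group_lock[lo]);
    return ok;
}
```

Because every caller takes the locks in the same order, two threads moving
blocks in opposite directions between the same pair of groups cannot
deadlock on each other.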
The next steps for ext2 are: stare at Anton's next set of graphs and
then, I expect, removal of the fs-private bitmap LRUs, per-cpu buffer
LRUs to avoid blockdev mapping lock contention, per-blockgroup locks
and removal of lock_super from the block allocator.
But there's no point in doing that while zone->lock and pagemap_lru_lock
are top of the list. Fixes for both of those are in progress.
ext2 is bog-simple. It will scale up the wazoo in 2.6.
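As a rough illustration of per-blockgroup locks in the block allocator,
here is a userspace sketch with pthreads. It is not the ext2 code: struct
bg, bg_init() and bg_alloc_block() are hypothetical names, and each group's
bitmap is reduced to a single 64-bit word. The point is only that an
allocation in one group takes that group's lock and nothing filesystem-wide.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define BG_NGROUPS 4    /* hypothetical number of block groups */
#define BG_BLOCKS  64   /* hypothetical: one 64-bit bitmap word per group */

struct bg {
    pthread_mutex_t lock;    /* protects this group's bitmap only */
    uint64_t        bitmap;  /* bit set = block in use */
};

static struct bg bgs[BG_NGROUPS];

static void bg_init(void)
{
    for (int g = 0; g < BG_NGROUPS; g++) {
        pthread_mutex_init(&bgs[g].lock, NULL);
        bgs[g].bitmap = 0;
    }
}

/* Allocate the first free block in group g; returns its index or -1 if full. */
static int bg_alloc_block(int g)
{
    assert(g >= 0 && g < BG_NGROUPS);
    int found = -1;

    pthread_mutex_lock(&bgs[g].lock);   /* per-group, not per-filesystem */
    for (int bit = 0; bit < BG_BLOCKS; bit++) {
        uint64_t mask = (uint64_t)1 << bit;
        if (!(bgs[g].bitmap & mask)) {
            bgs[g].bitmap |= mask;
            found = bit;
            break;
        }
    }
    pthread_mutex_unlock(&bgs[g].lock);
    return found;
}
```

Allocations in different groups never touch the same lock here, which is
the property you lose when everything goes through one superblock lock.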
> If there is a try-writelock mechanism for the individual inode table
> blocks you can avoid write lock contention for creations by simply
> finding the first un-write-locked block in the target group's inode table
> (usually in the hundreds of blocks per group for default parameters).
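The try-writelock scan described in the quoted text can be sketched in
userspace C11 as below. This is an assumption-laden toy: the lock is a bare
atomic counter (0 = unlocked, -1 = write-locked, >0 = reader count), the
block count is made up, and try_write_lock(), write_unlock() and
lock_first_free_block() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdatomic.h>

#define TRY_NBLOCKS 8   /* hypothetical inode table blocks in one group */

/* Per-block lock state: 0 = unlocked, -1 = write-locked, >0 = readers. */
static atomic_int itable_state[TRY_NBLOCKS];

/* Non-blocking: returns 1 if the write lock was taken, 0 if the block is busy. */
static int try_write_lock(int i)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&itable_state[i], &expected, -1);
}

static void write_unlock(int i)
{
    assert(atomic_load(&itable_state[i]) == -1);
    atomic_store(&itable_state[i], 0);
}

/*
 * Scan the group's inode table blocks and write-lock the first one that
 * nobody else holds. Returns the block index, or -1 if all are busy.
 */
static int lock_first_free_block(void)
{
    for (int i = 0; i < TRY_NBLOCKS; i++)
        if (try_write_lock(i))
            return i;
    return -1;
}
```

A creator that gets -1 back would have to fall back to blocking on some
block or trying another group; the sketch leaves that policy out.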
Depends on what the profiles say, Andreas. And I mean profiles - lockmeter
tends to tell you "what", not "why". Start at the top of the list. Fix
them by design if possible. If not, tweak it!