The difference is, there's only one superblock per mount.  There are
bazillions of inodes.
> > I'd then be able to write a trivial program that would eat inode+blocksize
> > worth of cache for each cached inode, by opening one file on each itable
> > block.
> 
> you already have X overhead per inode cached... yes this would increase
> X but since there is typically more than one inode per block there is
> also sharing as well.  So inode+blocksize is not true.
You skipped over my example too fast.
> > I'd also regret losing the genericity that comes from the read_inode
> > (unpack) and update_inode (repack) abstraction.
> 
> so what is write_inode... re-repack?  :)
It's a trivial shell for update_inode:
http://innominate.org/~graichen/projects/lxr/source/fs/ext2/inode.c?v=v2.4#L1146
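For reference, here's the wrapper itself, quoted from memory of the source
at that URL (the exact parameter name may differ):

	/* fs/ext2/inode.c (2.4): write_inode is just a shell that hands
	 * the repack work to ext2_update_inode. */
	void ext2_write_inode (struct inode * inode, int wait)
	{
		ext2_update_inode (inode, wait);
	}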
> > Right now, I don't see any fields in
> > _info that aren't directly copied, but I expect there soon will be.
> 
> i_data[] is copied, and that would be nice to directly access in
> inode->u.ext2_i.i_bh...
Yes, that's the major one: at 60 bytes it's bigger than all the other _info
fields put together.  However, almost half of each itable block's data is
going to be redundant, and the proposal as I understand it is to lock the
block in cache while the inode is in cache.  This makes things worse, not
better - it reduces the total number of inodes that can be cached.  And
that's the best case, when *all* the inodes on an itable block are in cache.
Take a look at the inode distribution in your directories and see if you
think that's likely.
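To put rough numbers on that (assuming 4K blocks and 128-byte on-disk
inodes, so 32 inodes per itable block - illustrative arithmetic only, not
measured):

	#include <stdio.h>

	int main(void)
	{
		const int blocksize = 4096;
		const int inode_size = 128;	/* on-disk ext2 inode */
		const int per_block = blocksize / inode_size;	/* 32 */

		/* Worst case - one cached inode per itable block, as in the
		 * trivial program above: each inode pins a whole block. */
		printf("worst case: %d pinned bytes per inode\n", blocksize);

		/* Best case - all 32 inodes of the block in core: still 128
		 * pinned bytes per inode, nearly half of which (the 60-byte
		 * i_data[]) duplicates what the in-core inode copied. */
		printf("best case:  %d pinned bytes per inode\n",
		       blocksize / per_block);
		return 0;
	}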
> > An alternative approach: suppose we were to map the itable blocks with
> > smaller-than-blocksize granularity.  We could then fall back to smaller
> > transfers under cache pressure, eliminating much thrashing.
> 
> in ibu fs the entire inode table[1] is accessed via the page cache.
> ext2 could do this too.  If ext2's per-block-group inode table has
> padding at the end, page calculations get a bit more annoying, but it's
> still doable.
That's roughly what I had in mind, for starters.
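To make those page calculations concrete, here's a userspace sketch of
where an inode's bytes land when each group's itable is addressed through
the page cache (the parameter names mirror the ext2 superblock and group
descriptor fields; the numbers in main() are made-up assumptions):

	#include <stdint.h>
	#include <stdio.h>

	#define PAGE_SIZE 4096

	/* Assumes each group's itable mapping starts on a page boundary;
	 * if the table's length isn't a page multiple, that's exactly the
	 * end-padding annoyance mentioned above. */
	static void locate(uint32_t inodes_per_group, uint32_t inode_size,
			   uint32_t ino, uint32_t *group, uint32_t *page,
			   uint32_t *offset)
	{
		uint32_t index = (ino - 1) % inodes_per_group; /* ino is 1-based */
		uint64_t byte = (uint64_t)index * inode_size;

		*group  = (ino - 1) / inodes_per_group;
		*page   = byte / PAGE_SIZE;	/* page within the group's itable */
		*offset = byte % PAGE_SIZE;	/* byte within that page */
	}

	int main(void)
	{
		uint32_t group, page, offset;

		locate(8192, 128, 12345, &group, &page, &offset);
		printf("group %u, page %u, offset %u\n", group, page, offset);
		return 0;
	}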
It's worth keeping in mind that tweaking the icache efficiency in this case 
is really just curing a symptom - the underlying problem is a mismatch 
between readdir order and inode order.
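The usual way to dodge that mismatch from userspace, for what it's worth, is
to sort the dirents by inode number before touching the inodes, so the
itable blocks get visited in order.  A minimal sketch, error handling
omitted on purpose:

	#include <dirent.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/stat.h>
	#include <unistd.h>

	struct ent { ino_t ino; char name[256]; };

	static int by_ino(const void *a, const void *b)
	{
		ino_t x = ((const struct ent *)a)->ino;
		ino_t y = ((const struct ent *)b)->ino;
		return (x > y) - (x < y);
	}

	int main(int argc, char **argv)
	{
		DIR *dir;
		struct dirent *d;
		struct ent *v = NULL;
		size_t n = 0, i;

		if (chdir(argc > 1 ? argv[1] : ".") || !(dir = opendir(".")))
			return 1;
		while ((d = readdir(dir))) {
			v = realloc(v, (n + 1) * sizeof(*v));
			v[n].ino = d->d_ino;
			strncpy(v[n].name, d->d_name, sizeof(v[n].name) - 1);
			v[n].name[sizeof(v[n].name) - 1] = '\0';
			n++;
		}
		closedir(dir);

		qsort(v, n, sizeof(*v), by_ino); /* itable order, not readdir order */
		for (i = 0; i < n; i++) {
			struct stat st;
			if (stat(v[i].name, &st) == 0)
				printf("%8lu  %s\n",
				       (unsigned long)st.st_ino, v[i].name);
		}
		free(v);
		return 0;
	}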
--
Daniel