Re: [rfc] Near-constant time directory index for Ext2

Daniel Phillips (phillips@innominate.de)
Thu, 22 Feb 2001 02:22:32 +0100


"H. Peter Anvin" wrote:
>
> Daniel Phillips wrote:
> >
> > Have you looked at the structure and algorithms I'm using? I would not
> > call this a hash table, nor is it a btree. It's a 'hash-keyed
> > uniform-depth tree'. It never needs to be rehashed (though it might be
> > worthwhile compacting it at some point). It also never needs to be
> > rebalanced - it's only two levels deep for up to 50 million files.
>
> I'm curious how you do that. It seems each level would have to be 64K
> large in order to do that, with a minimum disk space consumption of 128K
> for a directory. That seems extremely painful *except* in the case of
> hysterically large directories, which tend to be the exception even on
> filesystems where they occur.

Easy, with average dirent reclen of 16 bytes each directory leaf block
can holds up to 256 entries. Each index block indexes 512 directory
blocks and the root indexes 511 index blocks. Assuming the leaves are
on average 75% full this gives:

(4096 / 16) * 512 * 511 * .75 = 50,233,344

I practice I'm getting a little more than 90,000 entries indexed by a
*single* index block (the root) so I'm not just making this up.

> I think I'd rather take the extra complexity and rebalancing cost of a
> B-tree.

Do you still think so?

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/