Re: [rfc] Near-constant time directory index for Ext2

Chris Mason (mason@suse.com)
Thu, 22 Feb 2001 11:33:39 -0500


On Wednesday, February 21, 2001 07:30:47 PM -0800 Linus Torvalds
<torvalds@transmeta.com> wrote:
> On Thu, 22 Feb 2001, Daniel Phillips wrote:
>>
>
> I'd love to hear the results from R5, as that seems to be the reiserfs
> favourite, and I'm trying it out in 2.4.2 because it was so easy to plug
> in..

Quick details, since I don't think I've seen them on l-k yet. r5 was
chosen because it is more tuned to the reiserfs disk format. The location
of a directory item on disk is determined by the hash of the name, and r5
is designed to put similar names close to each other on disk.

The benchmark that shows this best is creating X number of files in a
single dir (named 0001, 0002, 0003 etc). r5 greating increases the chances
the directory item for 00006 will be right next to the item for 00007. If
the application accesses these files in the same order they were created,
this has benefits at other times than just creation. The benchmarks Ed
posted give a general idea for other naming patterns, but this one is best
case:

Time to create 100,000 files (10 bytes each) with r5 hash: 48s
Time to create 100,000 files (10 bytes each) with tea: 3m58s

The percentage increase just gets bigger as you create more and more files.
That doesn't mean this is a real world case, but it is what the hash was
designed for.

-chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/