Re: [rfc] Near-constant time directory index for Ext2

Davide Libenzi (davidel@xmailserver.org)
Wed, 21 Feb 2001 15:34:38 -0800 (PST)


On 21-Feb-2001 Linus Torvalds wrote:
> In article <20010221023515.6DF8E18C99@oscar.casa.dyndns.org>,
> Ed Tomlinson <tomlins@cam.org> wrote:
>>
>>The default in reiserfs is now the R5 hash, but you are right that lots of
>>efforts went
>>into finding this hash. This includes testing various hashes on real
>>directory
>>structures to see which one worked best. R5 won.
>
> That's interesting. The R5 hash is easily also the only one of the
> reiser hashes that might be useable for the generic VFS hashing. It's
> not so different in spirit from the current one, and if you've done the
> work to test it, it's bound to be a lot better.
>
> (The current VFS name hash is probably _really_ stupid - I think it's
> still my original one, and nobody probably ever even tried to run it
> through any testing. For example, I bet that using a shift factor of 4
> is really bad, because it evenly divides a byte, which together with the
> xor means that you can really easily generate trivial bad cases).
>
> What did you use for a test-case? Real-life directory contents? Did you
> do any worst-case analysis too?

Yep, 4 is not good as a shifting factor. Prime number are the better choice for
this stuff.
The issue to have a good distribution is not only to have a good hashing
function, but also to give this function not correlated data.
Good hashing function for a Domain A may not be so good for a Domain B.

- Davide

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/