Re: [rfc] Near-constant time directory index for Ext2

Ralph Loader (suckfish@ihug.co.nz)
24 Feb 2001 10:43:16 +1300


Hi,

I while ago I did some experimentation with simple bit-op based string
hash
functions. I.e., no multiplications / divides in the hash loop.

The best I found was:

int hash_fn (char * p)
{
int hash = 0;
while (*p) {
hash = hash + *p;
// Rotate a 31 bit field 7 bits:
hash = ((hash << 7) | (hash >> 24)) & 0x7fffffff;
}
return hash;
}

[I haven't kept my test program / data set - if anyone compares the
above
to the others functions mentioned on the list, let me know.]

The 31 and 7 were determined experimentally. But the 31 has a
theoretical
explanation (which might even be valid):

The rotate is equivalent to a multiplication by x**7 in Z_2[P=0],
where P is the polynomial x**31 - 1 (over Z_2).
Presumably the "best" P would be irreducible - but that would have more
bits set in the polynomial, making reduction harder. A compromise is to
choose P in the form x**N - 1 but with relatively few factors.
X**31 - 1 is such a P.

Also, a 32 bit rotate (modulo X**32 - 1, which is equal
to (X - 1) ** 32 over Z_2), came out pretty badly.

One thing that shouldn't be forgotten about hashing for hash tables
is that you have to reduce the hash value to the required range - doing
that well greatly reduces the difference between various hash functions.

Ralph.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/