Re: [rfc] Near-constant time directory index for Ext2

Ralph Loader (suckfish@ihug.co.nz)
24 Feb 2001 18:34:47 +1300


Andries,

> > int hash_fn (char * p)
> > {
> > int hash = 0;
> > while (*p) {
> > hash = hash + *p;
> > // Rotate a 31 bit field 7 bits:
> > hash = ((hash << 7) | (hash >> 24)) & 0x7fffffff;
> > }
> > return hash;
> > }

>

> Hmm. This one goes in the "catastrophic" category.

>
> For actual names:
>
> N=557398, m=51 sz=2048, min 82, max 4002, av 272.17, stddev 45122.99
>
> For generated names:
>
> N=557398, m=51 sz=2048, min 0, max 44800, av 272.17, stddev 10208445.83
>

Instead of masking the hash value down to 11 bits you could try:

index = (hash ^ (hash >> 11) ^ (hash >> 22)) & 0x7ff;

I ran a quick test which gave fairly good results with that: 12871
identifiers
from a source tree) gave a mean square bucket size of 45.65, expected
value for a random function is 45.78.

That change might improve some of your other hashes as well - there
doesn't
seem to be much point in computing a 32 bit value only to throw 20 bits
away -
stirring in the extra bits makes much more sense to me.

> > The rotate is equivalent to a multiplication by x**7 in Z_2[P=0],

> > where P is the polynomial x**31 - 1 (over Z_2).
> > Presumably the "best" P would be irreducible - but that would have more
> > bits set in the polynomial, making reduction harder. A compromise is to
> > choose P in the form x**N - 1 but with relatively few factors.
> > X**31 - 1 is such a P.
>
> It has seven irreducible factors. Hardly "almost irreducible".

I didn't say it was. "almost irreducible" polynomials with Hamming
weight two are pretty rare... Relative to say, x**32 - 1 or x**24 - 1,
having 7 factors is good.

Ralph.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/