A skewed hash doesn't hurt the fixed-depth tree in the obvious way - it
just tends to leave a lot of half-full buckets around, which wastes
about 1/3 of the leaf space. The reason for this behaviour is, when
you create a lot of sequentially-related names a skewed hash will tend
to dump a lot them into a tiny sliver of the hash space, and after
splitting the little sliver it's quite unlikely that later entries will
hit the same small range. The only good protection against this is to
make the hash function vary wildly if even one bit in the string
changes. This is what crc does, and that's why I'm interested in it.
I've rehabilitated the hack_show_dir code, which is enabled by
#defining DX_HACK. To kprint a dump of hash buckets and statistics do:
This dump is extremely helpful in judging the effectiveness of hash
functions, just take a look at how the range of hash values that gets
mapped into each leaf block. Ideally, there should not be too much
The format of the dump is:
bucketnumber:blocknumber hashstart/range (entrycount)
Yusuf Goolamabbas sent me a pointer to some new work on hash functions:
I coded up the fnv_hash and included it in today's patch - there is a
define to select which to use; dx_hack_hash is still the default.
fnv_hash is only a little wose than dx_hack_hash, which still produces
the most uniform distribution of all the hash functions I've tested.
But I can see from the dumps that it's still not optimal, it's just
that all the others are worse.
I still have quite a few leads to follow up on the hashing question.
Next week I hope I'll get time to try crc32 as a hashing function. I
hope it doesn't win because I'll need a 1K table to evaluate it, and
that would be 20% of the whole code size.
-- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to firstname.lastname@example.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/