Re: [RFC][PATCH] Faster generic_fls

Daniel Phillips (dphillips@sistina.com)
Wed, 30 Apr 2003 21:15:33 +0200


On Wednesday 30 April 2003 16:11, Linus Torvalds wrote:
> On 30 Apr 2003, Falk Hueffner wrote:
> > gcc 3.4 will have a __builtin_ctz function which can be used for this.
> > It will emit special instructions on CPUs that support it (i386, Alpha
> > EV67), and use a lookup table on others, which is very boring, but
> > also faster.
>
> Classic mistake. Lookup tables are only faster in benchmarks, they are
> almost always slower in real life. You only need to miss in the cache
> _once_ on the lookup to lose all the time you won on the previous one
> hundred calls.
>
> "Small and simple" is almost always better than the alternatives. I
> suspect that's one reason why older versions of gcc often generate code
> that actually runs faster than newer versions: the newer versions _look_
> like they do a better job, but..

I agree that this one lies in a gray area, being linearly faster (PIV
notwithwstanding) but bigger too.

In the dawn of time, before God gave us Cache, my version would have been the
fastest, because it executes the fewest instructions. In the misty future,
as cache continues to scale and processors sprout more parallel execution
units, it will be clearly better once again. For now, it's marginal, and
after all, what's a factor of two between friends?

Regards,

Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/