Re: [RFC][PATCH] Faster generic_fls

Linus Torvalds (torvalds@transmeta.com)
Thu, 1 May 2003 04:40:58 +0000 (UTC)


In article <p73ade730d1.fsf@oldwotan.suse.de>, Andi Kleen <ak@suse.de> wrote:
>Linus Torvalds <torvalds@transmeta.com> writes:
>
>> Yeah, except if you want best code generation you should probably use
>>
>> static inline int fls(int x)
>> {
>> int bit;
>> /* Only well-defined for non-zero */
>> asm("bsrl %1,%0":"=r" (bit):"rm" (x));
>
>"g" should work for the second operand too and it'll give gcc
>slightly more choices with possibly better code.

"g" allows immediates, which is _not_ correct.

>But the __builtin is probably preferable if gcc supports it because
>a builtin can be scheduled, inline assembly can't.

You're over-estimating gcc builtins, and underestimating inline
assembly.

gcc builtins usually generate _worse_ code than well-written inline
assembly, since builtins try to be generic (at least the ones that are
cross-architecture).

And inline asms schedule as well as any built-in, unless they are marked
volatile (either explicitly or implicitly by not having any outputs).

And the proof is in the pudding: I'll bet you a dollar my code generates
better code. AND my code works on the intel compiler _and_ with older
gcc's.

The fact is, gcc builtins are almost never worth it.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/