Re: [RFC][PATCH] Faster generic_fls

linux@horizon.com
1 May 2003 13:31:10 -0000


The Fount of Holy Pengun Pee expressed himself thus:
> Yeah, except if you want best code generation you should probably use
>
> static inline int fls(int x)
> {
> int bit;
> /* Only well-defined for non-zero */
> asm("bsrl %1,%0":"=r" (bit):"rm" (x));
> return x ? bit : 32;
> }
>
> which should generate (assuming you give "-march=xxxx" to tell gcc to use
> cmov):
>
> movl $32, %eax
> bsrl %edx,%ecx
> testl %edx, %edx
> cmovne %ecx, %eax
>
> which looks pretty optimal.

Except that the testl is unnecessary. The full semantics of
bsrl are:

if (source != 0) {
destination = <index of most significant set bit in source>;
ZF = 0;
} else {
destination = UNDEFINED;
ZF = 1;
}

Thus, there are two reasonable code sequences:

asm(" bsrl %2, %1"
"\n cmovne %1, %0" : "=r" (bit), "=r" (temp) : "rm" (x), "0" (32));

and (if I've got the earlyclobber syntax right):

asm(" bsrl %1, %0"
"\n cmoveq %2, %0" : "=&r,&r" (bit) : "0,rm" (x), "rm,rm" (32));

Note that in this latter case, I have to list %1 == %0 as an explicit
alternative, because otherwise the & on operand 0 would tell GCC to forbid
that combination and it's only %0 == %2 that's forbidden.

(The first alternative is listed first because it uses fewer registers
and so is preferred, all other things being equal.)

Unfortunately, I'm not sure how to let GCC choose between these two
alternatives...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/