Re: [PATCH] micro-opt DEBUG_ADD_PAGE

Kai Germaschewski (kai@thphy.uni-duesseldorf.de)
Thu, 8 Feb 2001 02:24:59 +0100 (CET)


On Wed, 7 Feb 2001, Linus Torvalds wrote:

> No. The optimization is entirely legal - but the fact that
> "constant_test_bit()" uses a "volatile unsigned int *" is the reason why
> gcc thinks it can't optimize it.

This thing did attract me somewhat and I decided to learn a little about
compilers.

Result: Unfortunately it's not just the volatile, there's a bunch of
conditions you have to fulfill to have the compiler optimize this. (Sounds
like work for the compiler guys).

Test program is attached, inspecting the code (egcs 2.91.66 and
gcc-2.96 (-69) generate the same code gives the following conclusions:

- f1(unsigned long f): manually optimized

if (f & ((1 << 1) | (1 << 2) | (1 << 4))) {

-> optimized code (of course)

- f2(unsigned long f): leave some work to the compiler

if ((f & (1 << 1)) || (f & (1 << 2)) || (f & (1 << 4))) {

-> optimized code (good)

- f3(unsigned int f): use constant_test_bit macro

if (constant_test_bit(1, &f) || constant_test_bit(2, &f) ||
constant_test_bit(4, &f)) {

-> optimized code

where

#define constant_test_bit(nr, addr) \
(((1UL << (nr & 31)) & ((unsigned int*)(addr))[nr >> 5]) != 0)

(doesn't optimize when putting *const* unsigned int there)

- f4: same thing as f3, but use (unsigned long f) instead of
(unsigned int f)

-> no optimization

- f5: same thing as f3, but use inline function for constant_test_bit

-> no optimization

- f6: same thing as f3, but use test_bit instead of constant_test_bit,
where

#define test_bit(nr,addr) \
(__builtin_constant_p(nr) ? \
constant_test_bit((nr),(addr)) : \
variable_test_bit((nr),(addr)))

-> no optimization

Conclusion: With the compilers tested, lots of cases are not optimized
although the could be in theory:
- casting even from unsigned int to unsigned long breaks optimization
- macros are better than inline
- Even though evaluated at compile time, __builtin_constant_p breaks
optimization here, too.

BTW: volatile makes optimization impossible as well, of course, it leads
to repeated reloads of the variable, whereas otherwise it's cached in a
register in the above "no optimization" cases. That's expected behavior.

--Kai

Test code:
----------

#define ADDR (*(volatile long *) addr)

static __inline__ int inl_constant_test_bit(int nr, const void * addr)
{
return ((1UL << (nr & 31)) & (((unsigned int *) addr)[nr >> 5])) != 0;
}

#define constant_test_bit(nr, addr) (((1UL << (nr & 31)) & ((unsigned int*)(addr))[nr >> 5]) != 0)

static __inline__ int variable_test_bit(int nr, volatile void * addr)
{
int oldbit;

__asm__ __volatile__(
"btl %2,%1\n\tsbbl %0,%0"
:"=r" (oldbit)
:"m" (ADDR),"Ir" (nr));
return oldbit;
}

#define test_bit(nr,addr) \
(__builtin_constant_p(nr) ? \
constant_test_bit((nr),(addr)) : \
variable_test_bit((nr),(addr)))

int f1(unsigned long f)
{
if (f & ((1 << 1) | (1 << 2) | (1 << 4))) {
return 1;
}
return 0;
}

int f2(unsigned long f)
{
if ((f & (1 << 1)) || (f & (1 << 2)) || (f & (1 << 4))) {
return 1;
}
return 0;
}

int f3(unsigned int f)
{
if (constant_test_bit(1, &f) || constant_test_bit(2, &f) || constant_test_bit(4, &f)) {
return 1;
}
return 0;
}

int f4(unsigned long f)
{
if (constant_test_bit(1, &f) || constant_test_bit(2, &f) || constant_test_bit(4, &f)) {
return 1;
}
return 0;
}

int f5(unsigned int f)
{
if (inl_constant_test_bit(1, &f) || inl_constant_test_bit(2, &f) || inl_constant_test_bit(4, &f)) {
return 1;
}
return 0;
}

int f6(unsigned int f)
{
if (test_bit(1, &f) || test_bit(2, &f) || test_bit(4, &f)) {
return 1;
}
return 0;
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/