	if (input & INPUT_FLAG_FOO)
		output |= OUTPUT_FLAG_FOO;
	if (input & INPUT_FLAG_BAR)
		output |= OUTPUT_FLAG_BAR;
	if (input & INPUT_FLAG_BAZ)
		output |= OUTPUT_FLAG_BAZ;
etc.
On x86, GCC compiles this into a string of (slow) conditional jumps.
Jumps also inhibit other optimizations.
More efficient code, which the compiler can also optimize around more
freely, is produced by
	output |= ((input / INPUT_FLAG_FOO) & 1) * OUTPUT_FLAG_FOO;
	output |= ((input / INPUT_FLAG_BAR) & 1) * OUTPUT_FLAG_BAR;
	output |= ((input / INPUT_FLAG_BAZ) & 1) * OUTPUT_FLAG_BAZ;
Without my having to define bit-shift amounts, GCC nicely optimizes that
into, e.g.
	movl %edx,%eax
	sall $17,%eax
	andl $16777216,%eax
	orl %eax,%ecx
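Reading that listing back into C: the mask 16777216 is 1 << 24 and the
shift count is 17, which is consistent with an input flag at bit 7 (0x80)
and an output flag at bit 24. Those two values are inferred from the
listing, not taken from any real header, and the function names are made
up for illustration. A small sketch comparing the two forms under those
assumptions:

```c
/* Flag values inferred from the listing above; hypothetical. */
#define INPUT_FLAG_FOO   (1u << 7)
#define OUTPUT_FLAG_FOO  (1u << 24)

/* The divide-and-multiply form, as written in the source. */
static unsigned int foo_by_division(unsigned int input)
{
	return ((input / INPUT_FLAG_FOO) & 1) * OUTPUT_FLAG_FOO;
}

/* What the sall/andl pair computes: shift left by 24 - 7 = 17, then mask. */
static unsigned int foo_by_shift(unsigned int input)
{
	return (input << 17) & OUTPUT_FLAG_FOO;
}
```

Both functions return the same value for every unsigned input. The final
orl in the listing is the |= into output.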
Might such a copy_bit macro be a useful general kernel utility?
I know Linus hates slow code, and a jump is slower than just about
anything.
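
A minimal sketch of what such a copy_bit macro could look like. The
argument order, the flag values, and the translate_flags() helper are all
hypothetical, chosen only for illustration. The trick assumes each flag is
a single bit (a power of two) and that the value is unsigned, since signed
division truncates toward zero and gives the wrong answer for negative
inputs.

```c
/* Hypothetical flag values; each must be a single bit (a power of two). */
#define INPUT_FLAG_FOO   0x00000080u
#define INPUT_FLAG_BAR   0x00000001u
#define OUTPUT_FLAG_FOO  0x01000000u
#define OUTPUT_FLAG_BAR  0x00000004u

/*
 * Evaluates to to_flag if from_flag is set in val, else to 0.
 * val should be unsigned; from_flag and to_flag should be
 * compile-time constants so the compiler can fold the divide
 * and multiply into shifts.
 */
#define copy_bit(val, from_flag, to_flag) \
	((((val) / (from_flag)) & 1) * (to_flag))

/* Example use: translate one set of flag bits into another. */
static unsigned int translate_flags(unsigned int input)
{
	unsigned int output = 0;

	output |= copy_bit(input, INPUT_FLAG_FOO, OUTPUT_FLAG_FOO);
	output |= copy_bit(input, INPUT_FLAG_BAR, OUTPUT_FLAG_BAR);
	return output;
}
```

With these values, translate_flags(INPUT_FLAG_FOO | INPUT_FLAG_BAR)
yields OUTPUT_FLAG_FOO | OUTPUT_FLAG_BAR, and bits not named in a
copy_bit() call are ignored.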
-- -Colin