Not at all.  You can sometimes sanely re-organise and optimise your C
code to produce better assembly without affecting its readability too
much.
> Tweaking your code and sacrificing chickens until you happen to get the
> output you want is no substitute for fixing the compiler.
"Fixing the compiler" for ARM would entail getting a degree in compiler
design, and rewriting GCC from scratch.  GCC *really* isn't suited to
ARM CPUs at all.  It sucks.  The fact that earlier GCC versions sucked
less is the really interesting thing here...
GCC 3.1 is a hard compiler to find stuff wrong with (unlike previous
versions - well done gcc people!)  However, here's a good example of a
bit of madvise_fixup_start, built for Xscale with a 5-stage instruction
pipeline with branch prediction.
It's from mm/filemap.c, setup_read_behavior (inline function inside
madvise_fixup_start).
1. The C code:
        VM_ClearReadHint(vma);
        switch(behavior) {
                case MADV_SEQUENTIAL:
                        vma->vm_flags |= VM_SEQ_READ;
                        break;
                case MADV_RANDOM:
                        vma->vm_flags |= VM_RAND_READ;
                        break;
                default:
                        break;
        }
2. GCC 3.1 output with -O2 on ARM:
        ldr     r3, [r4, #20]
        cmp     r6, #1
        bic     r3, r3, #98304
        str     r7, [r4, #8]
        str     r3, [r4, #20]
        beq     .L803			<=== branch if equal
        cmp     r6, #2
        orreq   r3, r3, #32768
        beq     .L811			<=== branch if equal
.L806:
        ldr     r0, [r4, #56]
...
        b       .L809
.L811:
        str     r3, [r4, #20]
        b       .L806			<=== unconditional branch
.L803:
        orr     r3, r3, #65536
        b       .L811			<=== unconditional branch
   This gives the following instruction path lengths:
	neither:	10 (no branches)
	first:		11 (3 branches)
	second:		12 (2 branches)
   -Os doesn't make much difference.
3. My human-based optimised output for ARM is:
        ldr     r3, [r4, #20]
        cmp     r6, #1
        bic     r3, r3, #98304
        str     r7, [r4, #8]
        orreq   r3, r3, #65536
        beq     .L806			<=== branch if equal
        cmp     r6, #2
        orreq   r3, r3, #32768
.L806:
        str     r3, [r4, #20]
        ldr     r0, [r4, #56]
...
        b       .L809
   This gives the following instruction path lengths:
	neither:	10 (no branches)
	first:		8  (1 branch)
	second:		10 (no branches)
Any way you look at it, the above has got to be better.
We can probably get something very close to (3) out of gcc by doing
something like:
	unsigned long flags;
	flags = vma->vm_flags & ~(VM_SEQ_READ|VM_RAND_READ);
        switch(behavior) {
                case MADV_SEQUENTIAL:
                        flags |= VM_SEQ_READ;
                        break;
                case MADV_RANDOM:
                        flags |= VM_RAND_READ;
                        break;
                default:
                        break;
        }
	vma->vm_flags = flags;
No gotos required.  So, what about VM_ClearReadHint()?  It turns out
that it's only used in one place - setup_read_behavior().  Now, with
the above, we end up with the following output from the same version
of gcc with -O2:
        ldr     r3, [r4, #20]
        cmp     r6, #1
        bic     r3, r3, #98304
        str     r7, [r4, #8]
        orreq   r3, r3, #65536
        beq     .L801
        cmp     r6, #2
        orreq   r3, r3, #32768
.L801:
        ldr     r0, [r4, #56]
        str     r3, [r4, #20]
All I've shown above is how GCC can miss some optimisations that are
easy for a human to see, and that there are some simple ways to rewrite
C code to allow GCC to make those optimisations.  But hey - that's
nothing new.
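
For anyone who wants to try this outside the kernel tree, here is a
minimal standalone sketch of the two forms side by side.  It is not the
kernel source: struct vma_sketch and the function names are made up for
illustration, and the flag and MADV_* values are assumptions inferred
from the constants in the assembly above (bic #98304 is 0x18000, and
the two orr immediates are 0x8000 and 0x10000).  Building it with -O2 -S
should let you compare what your own compiler does with each form.

```c
#include <assert.h>

/* Assumed values, inferred from the immediates in the assembly above. */
#define VM_SEQ_READ     0x00008000UL
#define VM_RAND_READ    0x00010000UL
#define MADV_NORMAL     0
#define MADV_RANDOM     1
#define MADV_SEQUENTIAL 2

/* Hypothetical cut-down stand-in for struct vm_area_struct. */
struct vma_sketch {
	unsigned long vm_flags;
};

/* Original form: clear the hint bits through the pointer, then do a
 * second read-modify-write on vma->vm_flags inside the switch. */
static void set_hint_orig(struct vma_sketch *vma, int behavior)
{
	vma->vm_flags &= ~(VM_SEQ_READ | VM_RAND_READ);
	switch (behavior) {
	case MADV_SEQUENTIAL:
		vma->vm_flags |= VM_SEQ_READ;
		break;
	case MADV_RANDOM:
		vma->vm_flags |= VM_RAND_READ;
		break;
	default:
		break;
	}
}

/* Rewritten form: build the new value in a local, store it once. */
static void set_hint_local(struct vma_sketch *vma, int behavior)
{
	unsigned long flags;

	flags = vma->vm_flags & ~(VM_SEQ_READ | VM_RAND_READ);
	switch (behavior) {
	case MADV_SEQUENTIAL:
		flags |= VM_SEQ_READ;
		break;
	case MADV_RANDOM:
		flags |= VM_RAND_READ;
		break;
	default:
		break;
	}
	vma->vm_flags = flags;
}

/* Returns 1 if both forms leave the same flags for the given input. */
static int hints_agree(unsigned long initial, int behavior)
{
	struct vma_sketch a = { initial };
	struct vma_sketch b = { initial };

	set_hint_orig(&a, behavior);
	set_hint_local(&b, behavior);
	return a.vm_flags == b.vm_flags;
}

/* Runs both forms over each behaviour value, asserting they agree,
 * and returns the number of checks made. */
static int self_check(void)
{
	int behavior, checks = 0;

	for (behavior = MADV_NORMAL; behavior <= MADV_SEQUENTIAL; behavior++) {
		assert(hints_agree(0UL, behavior));
		assert(hints_agree(~0UL, behavior));
		checks += 2;
	}
	return checks;
}
```

Calling self_check() from a small main() confirms the two forms are
functionally identical, so any difference in the generated code comes
purely from how the compiler handles the extra store through the
pointer.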
-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html