Re: gcc 2.95 vs 3.21 performance

Helge Hafting (helgehaf@aitel.hist.no)
Tue, 04 Feb 2003 14:11:56 +0100

Next message: Mikael Pettersson: "Re: two x86_64 fixes for 2.4.21-pre3"
Previous message: Tim Schmielau: "Re: [PATCH *] use 64 bit jiffies"
Maybe in reply to: Martin J. Bligh: "gcc 2.95 vs 3.21 performance"
Next in thread: Linus Torvalds: "Re: gcc 2.95 vs 3.21 performance"

Padraig@Linux.ie wrote:
[...]
> Interesting. I just noticed that I get 50% decrease in
> the speed of my program if I just insert a printf(). I.E.
> my program is like:
>
> printf()
> for(;;) {
> do_sorting_loop_test();
> }
>
> If I remove the initial printf it doubles in speed?
> I assume this is some weird caching thing?

Looks like a cacheline alignment issue to me.
This loop of yours occupies x cachelines on your cpu;
moving it in memory by adding the printf
might cause it to occupy x+1 cachelines.
That might be noticeable if x is a really small number,
such as 1.
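
To make the x versus x+1 arithmetic concrete, here is a small sketch.
The helper name lines_spanned and the 64-byte line size used below are
my own assumptions for illustration, not something measured on your cpu.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines touched by `size` bytes of code that start at
 * address `start`, for a cache line of `line` bytes.
 * Assumes size > 0 and line > 0. */
size_t lines_spanned(uintptr_t start, size_t size, size_t line)
{
    uintptr_t first = start / line;              /* line holding the first byte */
    uintptr_t last  = (start + size - 1) / line; /* line holding the last byte */
    return (size_t)(last - first + 1);
}
```

With 64-byte lines, a 40-byte loop starting at 0x1000 fits in one line.
Shift the same loop forward by 40 bytes (roughly what some extra code
in front of it could do) and it straddles two.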

> gcc is 3.2.1 (same happens for 2.95..)
>
> <boggle>
> Note this is with -O3. If I don't specify -O then
> leaving the printf in speeds things up by about 15%
> </boggle>

Sure - going from -O3 to no optimization changes code generation, so
your loop code hits the cachelines differently.
In this case the printf moved the loop into
better alignment.

My advice is to put your test loop in a function of its own,
and do the printing in the function that calls it.
Function entry points are normally aligned the same (good) way
by the compiler, so that calling them will be fast.
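
A minimal sketch of that layout follows. The names sorting_loop_test,
run_test and cmp_int are made up for this example, the loop is bounded
instead of for(;;) so that it terminates, and qsort() merely stands in
for whatever sorting you actually do. The noinline attribute is a GCC
extension; I added it on the assumption that -O3 would otherwise inline
the loop back into the caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Comparison callback for qsort(): ascending ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* The hot loop lives in its own function, so code added to the caller
 * does not change the layout inside this function.
 * Returns the number of passes completed. */
__attribute__((noinline))
int sorting_loop_test(int *data, size_t n, int passes)
{
    int done = 0;
    for (int p = 0; p < passes; p++) {
        /* Refill in descending order so every pass does real work. */
        for (size_t i = 0; i < n; i++)
            data[i] = (int)(n - i);
        qsort(data, n, sizeof data[0], cmp_int);
        done++;
    }
    return done;
}

/* All printing stays in the caller. */
int run_test(int passes)
{
    int data[256] = {0};
    printf("starting %d passes\n", passes);
    int done = sorting_loop_test(data, 256, passes);
    printf("finished, smallest element = %d\n", data[0]);
    return done;
}
```

Adding or removing printf calls in run_test should then leave the code
inside sorting_loop_test where it was relative to the function start.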

You can tune the speed of your inner loop by experimenting
with the insertion of one or more NOP asms in front
of the loop. Just be aware that all such tuning is wasted once
you change anything at all in that function - you'll have to
re-do the tuning each time.
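
Something like the following is what I mean. It is only a sketch: the
function name sum_with_padding is invented, the __asm__ syntax is the
GCC one, and I am assuming a target whose assembler accepts a plain
"nop" mnemonic (x86 does). The compiler is still free to lay out the
loop as it likes, so measure after every change.

```c
#include <stddef.h>

long sum_with_padding(const int *v, size_t n)
{
    long sum = 0;

    /* Padding in front of the loop. Add or remove lines here and
     * re-measure; each one shifts the following code by one
     * instruction. The count of three is arbitrary. */
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");
    __asm__ volatile ("nop");

    for (size_t i = 0; i < n; i++)
        sum += v[i];

    return sum;
}
```

The NOPs do not change what the function computes, only where the loop
ends up in memory.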

The compiler should ideally align the loops for maximum performance.
That can be hard though, considering all the different processors
that might run your program. And aligning everything optimally
could waste a _lot_ of code space - so do this only for
small loops with lots of iterations.
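
If you want to ask for alignment explicitly instead of hand-tuning, gcc
has the -falign-loops and -falign-functions options for a whole file,
and an aligned attribute for a single function. A sketch of the latter,
assuming a 64-byte cacheline (check your cpu) and with invented names
hot_sum and entry_offset:

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size, not detected */

/* Ask GCC to start this one hot function on a cacheline boundary. */
__attribute__((aligned(CACHE_LINE)))
long hot_sum(const int *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Offset of the function's entry point within its cacheline
 * (0 means aligned). Converting a function pointer to an integer is
 * implementation-defined, but works on common GCC targets such as x86. */
size_t entry_offset(void)
{
    return (size_t)((uintptr_t)&hot_sum % CACHE_LINE);
}
```

This only pins the start of the function. Where the loop itself lands
inside it still depends on the generated code, which is why the
attribute is worth spending on a few hot spots only.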

Helge Hafting
