Re: [CFT] faster athlon/duron memory copy implementation

Ed Sweetman (ed.sweetman@wmich.edu)
Thu, 24 Oct 2002 16:31:31 -0400


Robert Love wrote:
> On Thu, 2002-10-24 at 16:09, Ed Sweetman wrote:
>
>
>>I seem to be seeing compiler optimizations come into play with the
>>numbers and not any mention of them that i've seen has been talked
>>about. That could be causing any discrepencies with predicted values. So
>>not only would we have to look at algorithms, but also the compilers and
>>what optimizations we plan on using them with. Some do better on
>>certain compilers+flags than others. It's a mixmatch that seems to only
>>get complicated the more realistic you make it.
>
>
> The majority of the program is inline assembly so I do not think
> compiler is playing a huge role here.
>
> Regardless, the numbers are all pretty uniform in saying the new no
> prefetch method is superior so its a mute point.
>
> Robert Love

With gcc 3.x i get

495MB/s with -O3 -march=athlon-tbird -mcpu=athlon-tbird -falign-loops=4
-falign-functions=4

488MB/s with -O3 -march=athlon-tbird -mcpu=athlon-tbird -falign-loops=4

467MB/s with -O0 -march=i686 -mcpu=i686

which is almost a 30MB/s difference or 6% simply from compiler options
of the same compiler. It may not mean much in 1 second. But few things
where we care about performance are only run for one second.

I'd expect something below 3% and realistically closer to 1%. Any ideas
as to why it is making a difference? Does the execution path to the
function in C really take up performance to drop 30MB/s of memory
bandwidth because from the looks of it this program is very small and
things should be really quick to the asm functions.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/