With gcc 3.x I get:
495MB/s with -O3 -march=athlon-tbird -mcpu=athlon-tbird -falign-loops=4
        -falign-functions=4
488MB/s with -O3 -march=athlon-tbird -mcpu=athlon-tbird -falign-loops=4
467MB/s with -O0 -march=i686 -mcpu=i686
That is a difference of almost 30MB/s (28MB/s, or about 6%) between the
slowest and fastest builds, purely from options passed to the same
compiler. It may not mean much over 1 second, but few things where we
care about performance run for only one second.
I'd expect something below 3%, and realistically closer to 1%. Any ideas
as to why it is making such a difference? Does the C execution path
leading up to the function really cost enough to drop memory bandwidth
by 30MB/s? From the looks of it this program is very small, and it
should get to the asm functions very quickly.