RE: fast_copy_page sections?!

Karl Vogel (karl.vogel@seagha.com)
Fri, 20 Jun 2003 11:27:48 +0200


> On Fri, 2003-06-20 at 10:38, Karl Vogel wrote:
> > In arch/i386/lib/mmx.c there is a function fast_copy_page()
> using some
> > crafty asm statements. However I don't quite understand the
> part with the
> > fixup sections?!
>
> basically, if your cpu doesn't do prefetch, it will fault. The section
> stuff adds an exception handler for this fault that zeroes out the
> prefetch code so that next time there no longer is a prefetch() there.

Oh I see, but the code is wrapped in an #ifdef CONFIG_MK7. Don't all
Athlon's have the prefetch instruction?!

Why wasn't your no_prefetch code from your fast.c test program used?
Since it doesn't need the fault handler and it is faster anyway.

I did some testing and was able to save some more cycles (mind you,
not a whole lot :)

$ sudo nice -n -20 ./athlon
Athlon test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $

copy_page() tests
copy_page function 'warm up run' took 18241 cycles per page
copy_page function '2.4 non MMX' took 20254 cycles per page
copy_page function '2.4 MMX fallback' took 20194 cycles per page
copy_page function '2.4 MMX version' took 18236 cycles per page
copy_page function 'faster_copy' took 10954 cycles per page
copy_page function 'even_faster' took 10777 cycles per page
copy_page function 'no_prefetch' took 6631 cycles per page
copy_page function 'no_prefetch2' took 6586 cycles per page

no_prefetch2 is using:

for (i=0;i<=4096-256;i+=256)
__asm__ __volatile(
"movl 192(%1,%2),%0\n"
"movl 128(%1,%2),%0\n"
"movl 64(%1,%2),%0\n"
"movl 0(%1,%2),%0\n"
: "=&r" (d1)
: "r" (from), "r" (i));
for(i=0; i<4096/64; i++) {
__asm__ __volatile__ (
" movq (%0), %%mm0\n"
" movq 8(%0), %%mm1\n"
" movq 16(%0), %%mm2\n"
" movq 24(%0), %%mm3\n"
" movq 32(%0), %%mm4\n"
" movq 40(%0), %%mm5\n"
" movq 48(%0), %%mm6\n"
" movq 56(%0), %%mm7\n"
" movntq %%mm0, (%1)\n"
" movntq %%mm1, 8(%1)\n"
" movntq %%mm2, 16(%1)\n"
" movntq %%mm3, 24(%1)\n"
" movntq %%mm4, 32(%1)\n"
" movntq %%mm5, 40(%1)\n"
" movntq %%mm6, 48(%1)\n"
" movntq %%mm7, 56(%1)\n"
: : "r" (from), "r" (to) : "memory");
from+=64;
to+=64;
}

$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) XP 1700+
stepping : 2
cpu MHz : 1465.996
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 2922.90
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/