Re: 7.52 second kernel compile

Cort Dougan (cort@fsmlabs.com)
Mon, 18 Mar 2002 15:36:37 -0700


Here's the modified for PPC version and the results.

The cycle timer in this case is about 16.6MHz.

# ./foo
92.01: 1
7.98: 2
# ./foo
3.71: 0
92.30: 1
3.99: 2
# ./foo
92.01: 1
7.97: 2
# ./foo
92.01: 1
7.97: 2
# ./foo
3.71: 0
92.30: 1
3.99: 2
# ./foo
3.71: 0
92.30: 1
3.99: 2

#include <stdlib.h>

#if defined(__powerpc__)
#define rdtsc(low) \
__asm__ __volatile__ ("mftb %0": "=r" (low))
#else
#define rdtsc(low) \
__asm__ __volatile__("rdtsc" : "=a" (low) : : "edx")
#endif

#define MAXTIMES 1000
#define BUFSIZE (128*1024*1024)
#define access(x) (*(volatile unsigned int *)&(x))

int main()
{
unsigned int i, j;
static int times[MAXTIMES];
char *buffer = malloc(BUFSIZE);

for (i = 0; i < BUFSIZE; i += 4096)
access(buffer[i]);
for (i = 0; i < MAXTIMES; i++)
times[i] = 0;
for (j = 0; j < 100; j++) {
for (i = 0; i < BUFSIZE ; i+= 4096) {
unsigned long start, end;

rdtsc(start);
access(buffer[i]);
rdtsc(end);
end -= start;
if (end >= MAXTIMES)
end = MAXTIMES-1;
times[end]++;
}
}
for (i = 0; i < MAXTIMES; i++) {
int count = times[i];
double percent = (double)count / (BUFSIZE/4096);
if (percent < 1)
continue;
printf("%7.2f: %d\n", percent, i);
}
return v0;
}

} Btw, here's a program that does a simple histogram of TLB miss cost, and
} shows the interesting pattern on intel I was talking about: every 8th miss
} is most costly, apparently because Intel pre-fetches 8 TLB entries at a
} time.
}
} So on a PII core, you'll see something like
}
} 87.50: 36
} 12.39: 40
}
} ie 87.5% (exactly 7/8) of the TLB misses take 36 cycles, while 12.4% (ie
} 1/8) takes 40 cycles (and I assuem that the extra 4 cycles is due to
} actually loading the thing from the data cache).
}
} Yeah, my program might be buggy, so take the numbers with a pinch of salt.
} But it's interesting to see how on an athlon the numbers are
}
} 3.17: 59
} 34.94: 62
} 4.71: 85
} 54.83: 88
}
} ie roughly 60% take 85-90 cycles, and 40% take ~60 cycles. I don't know
} where that pattern would come from..
}
} What are the ppc numbers like (after modifying the rdtsc implementation,
} of course)? I suspect you'll get a less clear distribution depending on
} whether the hash lookup ends up hitting in the primary or secondary hash,
} and where in the list it hits, but..
}
} Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/