Re: [BENCHMARK] Corrected gcc3.2 v gcc2.95.3 contest results

Jan Hudec (bulb@ucw.cz)
Tue, 24 Sep 2002 11:34:02 +0200


On Mon, Sep 23, 2002 at 08:01:20PM -0700, Andrew Morton wrote:
> Con Kolivas wrote:
> >
> > ...
> > n=5 for number of samples
> >
> > Kernel Mean CI(95%)
> > 2.5.38 411 344-477
> > 2.5.39-gcc32 371 224-519
> > 2.5.38-mm2 95 84-105
> >
> > The mean is a simple average of the results, and the CI(95%) are the 95%
> > confidence intervals the mean lies between those numbers. These numbers seem to
> > be the most useful for comparison.
> >
> > Comparing 2.5.38(gcc2.95.3) with 2.5.38(gcc3.2) there is NO significant
> > difference (p 0.56)
> >
> > Comparing 2.5.38 with 2.5.38-mm2 there is a significant diffence (p<0.001)
> > [SNIP...]
> >
> > when I've run dozens of tests previously on the same kernel I've found that even
> > with a mean of 400 rarely a value of 80 will come up. Clearly this lowest score
> > does not give us the information we need.
> >
>
> I think this is really going way too far. I mean, the datum which
> we take away from the above result is that 2.5.38 sucks. No more
> accuracy is required.

5 samples is about the minimum where you can compute the confidence
intervals and trust them to reasonable extent. We see, that it's enough
samples to compare to the -mm2. But we can't say anything about gcc32
compiled version.

> Yes, if the differences are small then a few extra runs may be needed
> to drill down into the finer margins. The tester should be able to
> judge that during the test. You get a feel for these things.
>
> I believe that your time would be better spent developing and incorporating
> more tests (wider coverage) than worrying about super-high accuracy.

Accuracy is increased by just geting more samples. More tests are of
course important, but each must be run enough times so the results are
statisticaly significant.

> (And if there's more than a 1% variation between same kernel, compiled
> with different compilers then the test is bust. Kernel CPU time is
> dominated by cache misses and runtime is dominated by IO wait.
> Quality of code generation is of tiny significance)

So we will need a lot runs to see if there is a difference...

-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <bulb@ucw.cz>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/