You're right... if you define the problem as exact accounting,
statistical approaches won't cut it.
However, for the purposes of top and ps (the subject of the original
message :-), exact accounting seems like overkill. After all, top
only updates the screen every 5 seconds (!) and ps has process startup
overhead, so what's 100ms of error? The requirement for top/ps is that
there not be awful degenerate cases of consistent error. The McCanne
and Torek paper shows a common class of awful degenerate cases and how
to fix them (by avoiding aliasing that comes from doing profiling
stats and other things off the same timer).
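
To make the aliasing point concrete, here is a toy simulation (a sketch
only, not code from the paper or the kernel). The workload, the 10ms
tick, and all the names are assumptions made up for illustration: a
process that always runs from 1ms to 5ms after each tick, so its true
usage is 40%. Sampling exactly on the tick never catches it; sampling
at a random offset within each tick period does.

```python
import random

TICK_MS = 10.0  # assumed tick period, for illustration only


def is_running(t_ms):
    """Hypothetical workload: on CPU from 1 ms to 5 ms after every tick."""
    phase = t_ms % TICK_MS
    return 1.0 <= phase < 5.0


def sampled_usage(sample_times):
    """Fraction of samples that found the process on the CPU."""
    hits = sum(1 for t in sample_times if is_running(t))
    return hits / len(sample_times)


def aligned_samples(n):
    """Sample exactly on each tick (same timer as the scheduler)."""
    return [k * TICK_MS for k in range(n)]


def jittered_samples(n, rng):
    """Sample once per tick period, at a uniformly random offset."""
    return [k * TICK_MS + rng.uniform(0.0, TICK_MS) for k in range(n)]


rng = random.Random(0)
N = 10000
aligned = sampled_usage(aligned_samples(N))
jittered = sampled_usage(jittered_samples(N, rng))
print(f"true usage 40%, aligned estimate {aligned:.1%}, "
      f"jittered estimate {jittered:.1%}")
```

The aligned estimate is a consistent error (the process is charged
nothing, forever), which is the degenerate case that matters for
top/ps. The jittered estimate is noisy but unbiased, and the noise
shrinks as samples accumulate.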
Of course, if we can get exact accounting for free, then it's good for
top/ps. But LTT, and even TSC, introduce overhead proportional to the
number of "events" (context switches, irqs, whatever, depending on how
exact you want to be), rather than a fixed background overhead. It's
not obvious to me why that overhead is warranted for top/ps. Even if
it's <2%, look at how carefully optimized the rest of the Linux
context switch is.
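
As a back-of-the-envelope sketch of the two cost models: the per-tick
and per-event costs below are made-up placeholders, not measurements of
LTT, TSC reads, or any kernel. The point is only the shape: sampling
cost depends on the tick rate, while per-event cost grows with
whatever the machine happens to be doing.

```python
def sampling_overhead(hz, cost_per_tick_us):
    """Fraction of CPU spent on tick-based sampling (fixed background cost)."""
    return hz * cost_per_tick_us / 1_000_000


def tracing_overhead(events_per_sec, cost_per_event_us):
    """Fraction of CPU spent on per-event accounting (scales with load)."""
    return events_per_sec * cost_per_event_us / 1_000_000


# All three constants are hypothetical values chosen for illustration.
HZ = 100
TICK_COST_US = 1.0
EVENT_COST_US = 0.5

print(f"sampling: {sampling_overhead(HZ, TICK_COST_US):.4%} at any load")
for rate in (1_000, 10_000, 100_000):
    frac = tracing_overhead(rate, EVENT_COST_US)
    print(f"per-event at {rate:>7} events/s: {frac:.4%}")
```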
Don't get me wrong... tracing infrastructure is *really useful* when
doing performance debugging. I used kitrace a lot for this (see the
Software Practice and Experience paper: you can put tracepoints
anywhere at runtime, but you do so in assembly :-(). LTT seems like a
nice complementary piece of work and it would be useful infrastructure
to have, but it looks to me like a debugging tool and not what should
be required to make top/ps work.
-John Heidemann