Re: top/ps incorrectly reporting process execution times

Karim Yaghmour (karym@opersys.com)
Mon, 26 Jun 2000 22:33:22 -0400


Hello John,

I've taken a look at the paper. Randomizing is an interesting approach,
but I must confess that, in my opinion, it leaves the problem intact.

Even though it does correct to a certain extent the results displayed in
/proc, it leaves a lot of other statistics out. For instance, as was
pointed out by someone during the Linux BOF, if no process is currently
scheduled, Linux is unable to report the time spent servicing interrupts.
Therefore, the machine could be completely saturated and /proc would
still report that the system is idle!

LTT solves this problem. Since we are notified of the entry/exit of
syscalls/traps/irqs, it is possible to identify the __exact__ amount
of time spent in the system for a process or for servicing interrupts.
Therefore, bottlenecks become immediately apparent. No statistics, only
exact behavior is reported.
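
As a minimal userland sketch of the accounting this allows (not LTT's
actual code or trace format; the event struct, the enum names and the
microsecond timestamps are assumptions made for illustration), the time
spent servicing interrupts can be summed directly from entry/exit events:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event kinds; LTT's real trace format is not shown here. */
enum ev_kind { EV_IRQ_ENTRY, EV_IRQ_EXIT };

struct ev {
    enum ev_kind kind;
    uint64_t     ts_us;   /* timestamp in microseconds (assumed unit) */
};

/*
 * Sum the time spent between matching IRQ entry/exit events.
 * A depth counter handles nesting: only the outermost entry/exit
 * pair contributes, so nested interrupts are not counted twice.
 */
uint64_t irq_time_us(const struct ev *evs, size_t n)
{
    uint64_t total = 0, start = 0;
    unsigned depth = 0;

    for (size_t i = 0; i < n; i++) {
        if (evs[i].kind == EV_IRQ_ENTRY) {
            if (depth++ == 0)
                start = evs[i].ts_us;
        } else if (depth > 0) {          /* ignore an unmatched exit */
            if (--depth == 0)
                total += evs[i].ts_us - start;
        }
    }
    return total;
}
```

Because the result comes from timestamps rather than from samples taken
at timer ticks, it does not depend on which process, if any, happened to
be scheduled when the interrupt arrived.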

Of course, the randomizing technique could be extended to sample the
instruction pointer to check if we are in an ISR (interrupt service routine).
But this would leave other blind spots.

The technique used by LTT is fairly simple; there should be a way to
hook onto main kernel entry/exit points. These hooks look like:

TRACE_IRQ_ENTRY(irq, !(user_mode(regs)));

This macro resolves to (if the kernel was configured to use tracing,
otherwise, the macro amounts to no code):
#define TRACE_IRQ_ENTRY(ID, KERNEL) \
do \
{\
    trace_irq_entry irq_entry;\
    irq_entry.irq_id = ID;\
    irq_entry.kernel = KERNEL;\
    trace_event(TRACE_EV_IRQ_ENTRY, &irq_entry);\
} while(0)
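
For illustration only, here is a userland sketch of how such a hook can
compile away when tracing is not configured. The CONFIG_TRACE symbol,
the stub trace_event() and the handle_irq() caller are assumptions made
for this example, not the names used in the LTT patch. The sketch leaves
out the semicolon after while (0) so that the macro behaves as a single
statement inside an if/else:

```c
#include <assert.h>

#define CONFIG_TRACE 1            /* hypothetical configuration switch */

enum { TRACE_EV_IRQ_ENTRY = 1 };  /* placeholder event identifier */

typedef struct {
    int irq_id;
    int kernel;
} trace_irq_entry;

static int trace_events_seen;     /* number of events the stub received */
static int last_irq_id;           /* irq_id of the most recent event */

/* Stub standing in for the real tracer entry point. */
static void trace_event(int event_id, void *data)
{
    if (event_id == TRACE_EV_IRQ_ENTRY)
        last_irq_id = ((trace_irq_entry *)data)->irq_id;
    trace_events_seen++;
}

#ifdef CONFIG_TRACE
#define TRACE_IRQ_ENTRY(ID, KERNEL) \
    do { \
        trace_irq_entry irq_entry; \
        irq_entry.irq_id = (ID); \
        irq_entry.kernel = (KERNEL); \
        trace_event(TRACE_EV_IRQ_ENTRY, &irq_entry); \
    } while (0)
#else
/* Tracing disabled: the hook expands to an empty statement. */
#define TRACE_IRQ_ENTRY(ID, KERNEL) do { } while (0)
#endif

/* Hypothetical caller; the hook sits directly in an if/else branch. */
static int handle_irq(int irq, int from_user)
{
    if (irq >= 0)
        TRACE_IRQ_ENTRY(irq, !from_user);
    else
        return -1;
    return 0;
}
```

With a semicolon placed after while (0) in the macro definition, the
if/else in handle_irq() would fail to compile, because the extra empty
statement would detach the else from its if.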

I've seen this type of macro used in various places in the kernel,
which is why it is written this way.

Had it been possible for the compiler to generate these hooks
automatically, I would have done it that way, but some of them
could not be generated by a compiler. Take for instance:

TRACE_SCHEDCHANGE(prev->pid, next->pid, prev->state);

I don't see how the compiler could be told to put this in the
right place.
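
To give an idea of what a scheduling-change hook makes possible, here is
a rough sketch that derives per-process CPU time from a list of such
events. The sched_ev struct, the MAX_PID bound and the microsecond
timestamps are hypothetical; they only mirror the prev/next pids passed
to the hook above and are not LTT's trace format:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PID 64   /* hypothetical small bound, for the sketch only */

/* One scheduling change: at ts_us, prev_pid was replaced by next_pid. */
struct sched_ev {
    uint64_t ts_us;
    int      prev_pid;
    int      next_pid;
};

/*
 * Charge each interval between two consecutive scheduling changes to
 * the process that was switched out at the end of that interval.
 * Time before the first event is unknown and is not charged.
 */
void account_cpu(const struct sched_ev *evs, size_t n,
                 uint64_t cpu_us[MAX_PID])
{
    for (size_t i = 0; i < MAX_PID; i++)
        cpu_us[i] = 0;

    for (size_t i = 1; i < n; i++) {
        int pid = evs[i].prev_pid;   /* ran during [evs[i-1], evs[i]] */
        if (pid >= 0 && pid < MAX_PID)
            cpu_us[pid] += evs[i].ts_us - evs[i - 1].ts_us;
    }
}
```

The numbers obtained this way are measured intervals, not estimates
built from whichever process happened to be running at a timer tick.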

To sum up, I would say that the fix is there. It would just need
to be included in the sources. As I've said during my talk, LTT
gives its users a unique view of Linux's behavior, not only
regarding system performance but also regarding overall system
behavior.

Had LTT been around when the Mindcraft stuff came out, the thundering
herd problem would have been fairly easy to observe in the system
traces.

Moreover, have you ever tried to debug a synchronization problem
with gdb ... ;)

It's close to impossible. You have to go through a step-by-step
analysis of your programs' dynamics to find out what is happening.
LTT would provide you with the rundown of this dynamic behavior
without modifying your program's behavior.

Given the opportunities LTT makes available for performance analysis
and behavioral understanding, I think this would be a very good
addition to Linux.

As for the McCanne/Torek solution, its reach is limited compared
to the possibilities given by LTT.

p.s.: LTT can also be used to implement advanced security
auditing mechanisms into Linux whereby a program executing a certain
sequence of forbidden events would trigger an alert or reaction.
Using arcane techniques to find buffer-overflows wouldn't be
necessary with such a type of reporting.
p.p.s.: The techniques used by LTT are used in other Unices to do
similar stuff. This, by itself, doesn't justify LTT's stuff being
included in the kernel, but it does go to show usefulness.

John Heidemann wrote:
>
> The question of why top/ps incorrectly report process execution
> times came up in Karim and Michel's presentation at Usenix, and then
> again during the Linux BOF (with a question from a U. Washington
> person whose name I didn't get).
>
> Although Linus briefly summarized the problem, I thought Linux folks
> would be interested in a complete description and suggested solution.
> The problem, exploit programs that demonstrate it, and a fix were all
> described by McCanne and Torek in '93 (ironically, at *Usenix*, see
> below for complete reference). The source of the problem is (as Linus
> briefly described at the BOF) due to timer-driven programs interacting
> with the profiling timers; McCanne and Torek provide details.
>
> Seems like a good opportunity for some Linux hacker to implement a fix
> based on prior work.
>
> -John Heidemann
>
> [McCanne93b]
> Steve McCanne and Chris Torek.
> A Randomized Sampling Clock for CPU Utilization Estimation and Code Profiling.
> In USENIX Conference Proceedings, San Diego, CA, USENIX.
> January, 1993.
> <ftp://ftp.ee.lbl.gov/papers/statclk-usenix93.ps.Z>.
>

-- 
===================================================
                 Karim Yaghmour
               karym@opersys.com
          Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===================================================
