\reviewtitle{HP Caliper - An Architecture for Performance Analysis Tools}
\reviewlabel{hundt00caliper}
\reviewauthor{Robert Hundt}

Caliper dynamically instruments a running executable to support
building tools for performance analysis, coverage analysis, correctness
checking, and testing.  The two main techniques to monitor running
programs are binary instrumentation and statistical sampling.
Sampling has become more error prone as pipelines have become
deeper.  Instrumentation, however, is considered to be intrusive
because it may change a program's cache and paging behavior.  Caliper
integrates IA-64's performance monitoring unit (PMU) with dynamic
instrumentation.

A tool using Caliper runs as a ``Developer Tool Process'' and talks
with the Application Process through the system's debug interface
(\eg ttrace or /proc).  Most of Caliper lives in a shared library.
It is split into five parts: measurement (\eg basic block coverage),
events (\eg process creation), process (\eg signals), configuration
(sets Caliper's parameters), and context (allows Caliper to zero in
on particular measurements).

Caliper is written in a combination of C and Python.

Dynamic instrumentation can take two forms: (1) trampolines
(out-of-line) and (2) inline.  Trampolines insert long branches to
code that executes the original instruction and the Caliper code
(\eg an invocation counter).  This does not cause changes in the
address space of the
original application.  Inline instrumentation is trickier because one
must adjust the relative offsets of nearby instructions.  The reported
overhead is $35\%$ for inline and $112\%$ for out-of-line
instrumentation.  Caliper appears to usually favor the inline
approach.  Caliper always works at the granularity of functions.
Late instrumentation is called ``lazy.''  Instrumentation
does not work with dynamically generated code.
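
The difference between the two forms can be illustrated with a toy
model.  The sketch below is not Caliper code: the four-instruction ISA
(\texttt{addi}, \texttt{loop}, \texttt{jmp}, \texttt{probe}), the
interpreter, and all function names are assumptions made up for this
review.  It only shows that inline insertion forces relative branch
offsets to be rewritten, while a trampoline leaves the original
offsets alone at the cost of two extra long branches per probe.  It
assumes the displaced instruction is not itself a relative branch.

```python
from collections import Counter

def run(program, n):
    """Interpret the toy ISA; return (acc, probe counts, instructions executed)."""
    acc, pc, steps, counts = 0, 0, 0, Counter()
    while True:
        op, *args = program[pc]
        steps += 1
        if op == "halt":
            return acc, counts, steps
        if op == "addi":
            acc += args[0]
            pc += 1
        elif op == "probe":            # hypothetical instrumentation counter
            counts[args[0]] += 1
            pc += 1
        elif op == "jmp":              # long branch, absolute target
            pc = args[0]
        elif op == "loop":             # short branch, relative offset
            n -= 1
            pc += args[0] if n > 0 else 1
        else:
            raise ValueError(f"unknown opcode {op!r}")

def instrument_inline(program, p, name):
    """Insert a probe before index p and fix up relative branches."""
    out = []
    for i, ins in enumerate(program):
        if i == p:
            out.append(("probe", name))
        if ins[0] == "loop":
            target = i + ins[1]
            new_i = i if i < p else i + 1                 # branch itself may shift
            new_target = target if target <= p else target + 1
            ins = ("loop", new_target - new_i)
        out.append(ins)
    return out

def instrument_trampoline(program, p, name):
    """Overwrite index p with a long branch to out-of-line code."""
    # Assumption: relocating a relative branch is not modeled here.
    assert program[p][0] != "loop"
    tramp = len(program)                                  # trampoline placed after the code
    out = list(program)
    out[p] = ("jmp", tramp)
    out += [("probe", name), program[p], ("jmp", p + 1)]
    return out

original = [("addi", 2), ("addi", 3), ("loop", -2), ("halt",)]
inline = instrument_inline(original, 1, "hits")
tramp = instrument_trampoline(original, 1, "hits")

for label, prog in [("original", original), ("inline", inline), ("trampoline", tramp)]:
    acc, counts, steps = run(prog, 3)
    print(label, acc, counts["hits"], steps)
```

In this toy run both instrumented variants compute the same result as
the original, but the trampoline variant executes more instructions
per probed iteration than the inline one.  That agrees in direction
with the overheads quoted above, although the toy instruction counts
say nothing about the measured magnitudes.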