\reviewtitle{HP Caliper - An Architecture for Performance Analysis Tools}
\reviewlabel{hundt00caliper}
\reviewauthor{Robert Hundt}
Caliper dynamically instruments a running executable to help build tools for performance analysis, coverage analysis, correctness checking, and testing. The two main techniques for monitoring running programs are binary instrumentation and statistical sampling. Sampling has become more error-prone as pipelines have become deeper. Instrumentation, however, is considered intrusive because it may change a program's cache and paging behavior. Caliper integrates IA-64's performance measurement unit (PMU) with dynamic instrumentation.

A tool using Caliper runs as a ``Developer Tool Process'' and talks to the Application Process through the system's debug interface (\eg ttrace or /proc). Most of Caliper lives in a shared library. It is split into five parts: measurement (\eg basic block coverage), events (\eg process creation), process (\eg signals), configuration (sets Caliper's parameters), and context (lets Caliper zero in on particular measurements). Caliper is written in a combination of C and Python.

Dynamic instrumentation can take two forms: (1) trampolines and (2) inline. A trampoline replaces an instruction with a long branch to out-of-line code that executes both the original instruction and the Caliper code (\eg an invocation counter). This leaves the address layout of the original application unchanged. Inline instrumentation is trickier because inserting code shifts nearby instructions, so their relative offsets must be adjusted. The reported overhead is $35\%$ for inline instrumentation and $112\%$ for out-of-line (trampoline) instrumentation. Caliper appears to favor the inline approach in most cases. Caliper always works at the granularity of functions. Late instrumentation is called ``lazy.'' Instrumentation does not work with dynamically generated code.
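
The difference between the two instrumentation forms can be illustrated with a toy model. The sketch below is not Caliper's implementation and does not use IA-64 instructions: the four-operation instruction set, the interpreter \texttt{run}, and the helpers \texttt{instrument\_trampoline} and \texttt{instrument\_inline} are all hypothetical names invented for this illustration. It only shows why a trampoline leaves the original addresses untouched while inline insertion forces relative branch offsets to be recomputed.

```python
# Toy model only: a hypothetical mini instruction set, not IA-64 and not Caliper's API.
#   ("add", n)      acc += n
#   ("brnz", rel)   iters -= 1; branch by relative offset rel while iters > 0
#   ("jmp", addr)   long branch to an absolute address
#   ("probe", name) instrumentation code: increment counter `name`
#   ("halt",)       stop

def run(code, iters):
    """Interpret `code`; return (accumulator, probe counters)."""
    acc, pc, counters = 0, 0, {}
    while True:
        ins = code[pc]
        op = ins[0]
        if op == "halt":
            return acc, counters
        if op == "add":
            acc += ins[1]
            pc += 1
        elif op == "probe":
            counters[ins[1]] = counters.get(ins[1], 0) + 1
            pc += 1
        elif op == "jmp":
            pc = ins[1]
        elif op == "brnz":
            iters -= 1
            pc = pc + ins[1] if iters > 0 else pc + 1
        else:
            raise ValueError(f"unknown op {op!r}")

def instrument_trampoline(code, at, name):
    """Overwrite code[at] with a long branch to an appended trampoline."""
    if code[at][0] == "brnz":
        raise ValueError("toy sketch does not relocate relative branches")
    out = list(code)
    trampoline = len(out)              # trampoline lives past the original code
    out[at] = ("jmp", trampoline)
    out += [("probe", name),           # the added instrumentation
            code[at],                  # the displaced original instruction
            ("jmp", at + 1)]           # return to the instruction after `at`
    return out

def instrument_inline(code, at, name):
    """Insert a probe before code[at]; later instructions shift by one slot."""
    def moved(addr):                   # new address of an existing instruction
        return addr + 1 if addr >= at else addr
    def retarget(target):              # a branch to `at` should land on the probe
        return target + 1 if target > at else target
    out = []
    for addr, ins in enumerate(code):
        if addr == at:
            out.append(("probe", name))
        if ins[0] == "brnz":           # recompute the relative offset
            ins = ("brnz", retarget(addr + ins[1]) - moved(addr))
        elif ins[0] == "jmp":          # absolute targets shift as well
            ins = ("jmp", retarget(ins[1]))
        out.append(ins)
    return out

# A loop body of two adds, repeated `iters` times.
PROGRAM = [("add", 2), ("add", 3), ("brnz", -2), ("halt",)]
```

Under these assumptions, both instrumented variants compute the same result as \texttt{PROGRAM} and count one probe hit per loop iteration. The trampoline variant keeps every untouched instruction at its original address, while the inline variant has to rewrite the backward branch offset from $-2$ to $-3$.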