Many computations exhibit an inherent trade off between accuracy, performance, and power consumption - if the client is willing to accept a slightly less accurate result, it is often possible to adapt the computation to deliver that result while consuming substantially less power or compute time. But most implementations of these computations operate at only one point in this underlying tradeoff space. In many cases the implementer of the computation is not even aware that this tradeoff space exists and that the computation has the ability to operate successfully at a variety of points in this space. The field of approximate computing investigates techniques for discovering and exploiting these tradeoffs.
My research group developed many of the foundational techniques and systems in this area, including rich program transformations for generating efficient and accurate approximate program execution, techniques for reasoning about the performance and accuracy of approximate computations, and approximate programming languages that enable developers to specify and verify approximate computations. My group also pioneered foundational techniques in approximate parallel computing.
Approximate computing is often enabled by rich program transformations (program transformations that may change the semantics of the program). Rich transformations induce a search space of programs surrounding the original program. An automated search of this space can find programs with desirable accuracy, performance, and power consumption characteristics.
My research group pioneered the first rich transformations, including loop perforation, which increases performance by skipping loop iterations, task skipping, which increases performance and robustness by skipping tasks, dynamic knobs , which enable programs to dynamically switch between different performance and accuracy settings, implementation selection and reduction sampling, which enable programs to execute at a range of points in the accuracy versus performance space, and synchronzation elimination, which identifies acceptable race conditions in parallel programs, then removes synchronization designed to prevent those race conditions.
Loop Perforation: Loop perforation increases performance by automatically transforming loops to skip some of the iterations that they would otherwise perform. Our MIT 2009 Technical Report shows how to use loop perforation to automatically augment computations with the ability to operate successfully at a variety of points in their underlying performance, accuracy, and power consumption trade-off space. It also shows how to use loop perforation to enable computations to dynamically adapt to fluctuations in the amount of delivered computational resources. Such fluctuations can be caused, for example, by dynamic voltage/frequency scaling, the loss of processors, or varying load.
Our ESEC/FSE 2011 paper shows how to use loop perforation to automatically discover and exploit performance vs. accuracy tradeoffs that are inherently present (but, in the absence of loop perforation, hidden) in many applications.
Managing Performance vs. Accuracy Trade-offs With Loop Perforation
Stelios Sidiroglou, Sasa Misailovic, Henry Hoffmann, and Martin Rinard
19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC'11: 13th European Software Engineering Conference (ESEC-13)
Szeged Hungary, September 2011
Our ICSE 2010 paper shows how to use loop perforation as a profiling technique that helps developers find computations that are amenable to optimizations that trade accuracy in return for increased performance or reduced power consumption.
Quality of Service Profiling
Sasa Misailovic, Stelios Sidiroglou, Henry Hoffmann, and Martin C. Rinard
Proceedings of the ACM/IEEE 32nd International Conference on Software Engineering (ICSE 2010)
Cape Town, South Africa, May 2010
Task Skipping: My ICS 2006 paper presents task skipping, a precursor to loop perforation. Instead of skipping loop iterations, this paper presents a technique that skips tasks in parallel computations. In practice, many of these tasks are simply blocks of iterations of parallel loops.
The paper shows that task skipping can produce a range of computations with different performance and accuracy characteristics. The paper also shows how to automatically derive statistical performance and accuracy models that characterize the effect of task skipping as applied to a specific program. It is then possible to use these models to purposefully discard tasks to improve performance or reduce power consumption while preserving acceptably accurate computation.
Probabilistic Accuracy Bounds for Fault-Tolerant Computations That Discard Tasks
Martin C. Rinard
20th ACM International Conference on Supercomputing
Cairns, Australia, June 2006
Many parallel computations initiate parallel tasks, then execute a barrier that waits until all of the parallel tasks complete. In practice, straggler tasks that take much longer to execute than other tasks can inhibit performance as the computing platform sits mostly idle waiting for the straggler tasks to finish. Straggler tasks can also increase the latency of the computation, producing sluggish or unresponsive systems in an interactive context.
My OOPSLA 2007 paper demonstrates how to use task skipping to eliminate barrier idling caused by straggler tasks, specifically by identifying straggler tasks that can be skipped while preserving acceptable accuracy. The results show that this technique can improve both parallel performance and responsiveness in the face of straggler tasks.
Using Early Phase Termination to Eliminate Load Imbalances at Barrier Synchronization Points
Martin C. Rinard
2007 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications
Montreal, Canada, October 2007
Dynamic Knobs: Programs sometimes come with static configuration parameters that control the performance, power consumption, and accuracy of various algorithms within the program. Our ASPLOS 2011 Paper and our MIT 2010 Technical Report show how to automatically find such parameters, convert them into dynamic knobs, which can be automatically controlled to dynamically move the computation to different points in the performance versus accuracy tradeoff space, then use the dynamic knobs to dynamically adapt the computation to deliver acceptably responsive execution in the face of changes in the performance characteristics of the underlying computational platform.
Dynamic Knobs for Responsive Power-Aware Computation
Henry Hoffmann, Stelios Sidiroglou, Michael Carbin, Sasa Misailovic, Anant Agarwal, and Martin Rinard
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2011)
Newport Beach, CA March 2011
It is often desirable to understand and characterize the potential effects of rich program transforms (such as loop perforation or synchronization elimination) on the end to end performance or accuracy of the computation. To this end, my group pioneered techniques for reasoning about approximate computations. These techniques provide models or guarantees that characterize the accuracy and performance effects of various approximate computing techniques as applied to specific computations. We have implemented these techniques in programming systems that enable developers to obtain approximate computations with prectable and desirable performance and accuracy properties.
Our initial research in this area was empirical, using dynamic analysis on executions of the program on representative inputs to derive empirical performance versus accuracy models that characterized the performance and accuracy effects of rich transformations. Our seminal task skipping research, for example, produced empirical statistical models that characterized the performance and accuracy impact of skipping specific percentages of specific tasks.
We next moved on to develop static techniques that analyze the program before it executes to obtain performance and accuracy models with (often probabilistic) guarantees that hold over all program executions.
Probabilistically Accurate Program Transformations: This research uses random variables to model the values that the program manipulates. The static analysis reasons about how these proability distributions propagate through the program in the presence of various rich program transformations (such as loop perforation). The end result is a probabilistic characterization of the accuracy of the result that the transformed program produces. This characterization is guaranteed to be accurate whenever the probability distributions accurately model the inputs.
Implementation Selection and Reduction Sampling: Our POPL 2012 paper shows how to reason probabilistically about two kinds of rich program transformations (implementation selection and reduction sampling) applied to programs consisting of DAGs of function nodes connected into a tree by reduction nodes. This model of computation is similar to the widely used Map/Reduce model. Our analysis may therefore serve as foundation for understanding the effects of performing Map/Reduce computations over only a subset of the specified inputs (which is common practice for computations that operate on large data sets stored across distributed platforms with partial failures).
Randomized Accuracy-Aware Program Transformations for Efficient Approximate Computations
Zeyuan Allen Zhu, Sasa Misailovic, Jonathan A. Kelner, and Martin Rinard
Proceedings of the 39th Annual ACM Symposium on Principles of Programming Languages (POPL 2012)
Philadelphia, Pennsylvania January 2012
Loop Perforation: Our SAS 2011 paper shows how to reason probabilistically about the effect of loop perforation on a range of computational patterns that operate on values drawn from specified probability distributions. The presented results characterize the performance and accuracy impact of loop perforation applied to computations that implement these patterns.
Probabilistically Accurate Program Transformations
Sasa Misailovic, Dan Roy, and Martin Rinard
Static Analysis - 18th International Symposium (SAS 2011)
Venice, Italy, September 2011
Relaxed Programs: It is possible to express many approximate computations as standard programs augmented with control variables. The program can then use nondeterministic assignments to control variables to make the full range of acceptable executions available. We call programs with control variables relaxed programs because they relax traditional rigid program execution semantics. Our PLDI 2012 paper presents programming language mechanisms for identifying and specifying control variables and shows how to reason about the resulting relaxed programs to ensure that they preserve important safety properties despite the nondeterminism.
Proving Acceptability Properties of Relaxed Nondeterministic Approximate Programs
Michael Carbin, Deokhwan Kim, Sasa Misailovic, and Martin Rinard
Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2012)
Beijing, China, June 2012
Rely: Rely is designed to support computations that execute on approximate and/or unreliable hardware systems. The goal is to provide programming support that enables developers to reason about the reliability of computations that execute on such hardware platforms. Rely is a programming language and program analysis system that enables developers to specify reliability requirements (i.e., minimum probabilities with which program components must produce a correct result), which the Rely compiler then checks. The result is a relaxed approximate program with reliability guarantees - the guarantees specify the probability with which the program produces a correct result. Applications include contexts which tolerate some incorrect computations and performance analysis for checkable computations which produce a result that can be checked, then reexecuted if the computation produced an incorrect result.
Verifying Quantitative Reliability for Programs That Execute on Unreliable Hardware
(Best Paper Award)
Michael Carbin, Sasa Misailovic, and Martin Rinard
Proceedings of the 2013 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2013)
Indianapolis, Indiana, October 2013
Chisel: Chisel extends Rely to work with combined reliability and accuracy requirements. Starting with a specification of these requirements, the Chisel compiler automatically chooses which operations must execute precisely and which can execute unreliabily/approximately while still satisfying the combined reliability and accuracy requirements. Chisel also determines which variables can be stored in reliable memory and which can be stored in approximate memory. The selection of unreliable/approximate operations and memory is performed to maximize a resource objective such as energy savings.
Chisel: Reliability- and Accuracy-Aware Optimization of Approximate Computational Kernels
(Best Paper Award)
Sasa Misailovic, Michael Carbin, Sara Achour, Zichao Qi, and Martin Rinard
Proceedings of the 2014 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2014)
Portland, Oregon, October 2014
Automatic Parallelization with Statistical Accuracy Tests: The QuickStep compiler (see our 2013 paper and 2010 technical report) implements a collection of parallelization transformations (specifically, parallelism introduction transformations, accuracy enhancing transformations, and performance enhancing transformations such as synchronization elimination). Starting with a sequential program, QuickStep automatically applies these transformations to generate a space of candidate parallel programs, using statistical accuracy tests to evaluate the acceptability of each candidate parallel program. QuickStep can automatically generate parallel computations (such as efficient computations that contain acceptably infrequent data races) that are inherently beyond the reach of previously proposed automatic parallelization strategies.
Parallelizing Sequential Programs with Statistical Accuracy Tests
Sasa Misailovic, Deokhwan Kim, and Martin Rinard
ACM Transactions on Embedded Computing Systems
Volume 12, Number 2s (May 2013)
Copyright 2013 by ACM, Inc.
Parallel Approximate Data Structure Construction: Exact parallel data structure construction algorithms often avoid data races by synchronizing data structure updates. In practice, however, it may be possible to acceptably increase performance by eliminating these synchronization operations. Our HotPar 2013 paper presents a parallel synchronization-free data structure construction algorithm and presents results that characterize the impact of the resulting data races on program accuracy.
Parallel Synchronization-Free Approximate Data Structure Construction
Proceedings of the 5th USENIX Workshop on Hot Topics in Parallelism
San Jose, California, June 2013
The full version of this paper is available.