# Research

Much of my current research focuses on two areas. The first is how to build secure, robust systems; I have several projects in this area. The second is tradeoffs between the resources (time, power, etc.) that a computation consumes and the accuracy of the result that the computation produces.
• ### Reasoning About Approximate Computations

Many computations have the freedom to, within reasonable accuracy bounds, produce a range of results. As described further below, it is often possible to exploit this freedom to improve performance, reduce resource consumption, and give systems the ability to execute at a variety of points in an underlying accuracy versus performance tradeoff space. We have developed several techniques for reasoning about approximate computations and transformations for approximate computations.

• Probabilistically Accurate Program Transformations: Our initial exploration of this topic was empirical, driven by observing executions of transformed programs on sample inputs. Motivated by the empirical success of this approach, we are developing a theory of probabilistically accurate program transformations.

Our basic approach uses random variables to model the values that the program manipulates. The static analysis reasons about how these probability distributions propagate through the program in the presence of various rich program transformations (such as loop perforation). The end result is a probabilistic characterization of the accuracy of the result that the transformed program produces. This characterization is guaranteed to be accurate whenever the probability distributions accurately model the inputs.

Our SAS 2011 paper shows how to apply this approach to a range of computational patterns. Our POPL 2012 paper shows how to reason probabilistically about two kinds of transformations (implementation selection and reduction sampling) applied to programs consisting of DAGs of function nodes connected into a tree by reduction nodes. This model of computation is similar to the widely used Map/Reduce model. Our analysis may therefore serve as a foundation for understanding the effects of performing Map/Reduce computations over only a subset of the specified inputs (which is common practice for computations that operate on large data sets stored across distributed platforms with partial failures).
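To make the flavor of reduction sampling concrete, here is a minimal Python sketch (the function `sampled_sum` and its parameters are hypothetical illustrations, not the formalism from the papers): it executes a sum reduction over only a sampled subset of its inputs and extrapolates, producing the kind of approximate result that the probabilistic analysis characterizes.

```python
import random

def sampled_sum(values, rate, rng):
    """Estimate sum(values) by summing only a sampled fraction `rate`
    of the inputs and extrapolating -- a sketch of reduction sampling."""
    k = max(1, int(len(values) * rate))
    sample = rng.sample(values, k)          # sample without replacement
    return sum(sample) * (len(values) / k)  # rescale to the full input size

rng = random.Random(0)
data = [rng.random() for _ in range(10000)]
exact = sum(data)
approx = sampled_sum(data, 0.1, rng)        # 10x less reduction work
rel_error = abs(approx - exact) / exact
```

With a 10% sampling rate the estimate typically lands within a few percent of the exact sum; the probabilistic analysis described above is what turns such empirical observations into guaranteed accuracy bounds under an input distribution model.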

• Reasoning About Relaxed Programs: It is possible to express many approximate computations as standard programs augmented with control variables. The program can then use nondeterministic assignments to control variables to make the full span of acceptable executions available. Our PLDI 2012 paper shows how to reason about such programs to ensure that they preserve important safety properties despite the nondeterminism.
• ### Performance, Power, and Accuracy Trade-Offs

Many computations have an inherent trade-off between accuracy, performance, and power - if the client of the computation is willing to accept a slightly less accurate result, it is often possible to adapt the computation to deliver that result while consuming substantially less power or compute time. But most implementations of these computations operate at only one point in this underlying trade-off space. In many cases the implementor of the computation is not even aware that this trade-off space exists and that the computation has the ability to operate successfully at a variety of points in this space.

My research group has developed a variety of techniques that automatically discover and exploit tradeoffs between accuracy, performance, and power. These techniques use rich program transformations (program transformations that may change the semantics of the program) to induce a search space of programs surrounding the original program. They then search this space to find programs with desirable accuracy, performance, and power consumption characteristics.

• Rely and Chisel: These two systems are designed to support computations that execute on approximate and/or unreliable hardware systems. The goal is to provide programming support that enables developers to reason about the reliability and accuracy of computations that execute on such hardware platforms.

Rely enables developers to specify reliability requirements (i.e., minimum probabilities with which the program must produce a correct result), which the Rely compiler then checks. Chisel works with combined reliability and accuracy requirements. Starting with a specification of these requirements, the Chisel compiler automatically chooses which operations must execute precisely and which can execute unreliably/approximately while still satisfying the reliability and accuracy requirements. Chisel also determines which variables can be stored in reliable memory and which can be stored in approximate memory. The selection of unreliable/approximate operations and memory is performed to maximize a resource objective such as energy savings.

• Statistical Accuracy and Performance Models: My ICS 2006 paper presents techniques for automatically deriving statistical performance and accuracy models that characterize the effect of discarding tasks in Jade computations. It is then possible to use these models to purposefully discard tasks to improve performance or reduce the power consumption of the computation while preserving acceptably accurate computation. My OOPSLA 2007 paper demonstrates how to use these techniques to eliminate barrier idling (idle time incurred when processors wait at a barrier for other processors to complete tasks) in parallel computations.

• Loop Perforation: Loop perforation automatically transforms loops to skip some of the iterations that they would otherwise perform. Our MIT 2009 Technical Report shows how to use loop perforation to automatically augment computations with the ability to operate successfully at a variety of points in their underlying performance, accuracy, and power consumption trade-off space. It also shows how to use loop perforation to enable computations to dynamically adapt to fluctuations in the amount of delivered computational resources. Such fluctuations can be caused, for example, by dynamic voltage/frequency scaling, the loss of processors, or varying load.

Our ICSE 2010 paper shows how to use loop perforation as a profiling technique that helps developers find computations that are amenable to optimizations that trade accuracy in return for increased performance or reduced power consumption.

Our ESEC/FSE 2011 paper shows how to use loop perforation to automatically discover and exploit performance vs. accuracy tradeoffs that are inherently present (but, in the absence of loop perforation, hidden) in many applications.
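The perforation transformation itself operates on compiled code, but its core idea fits in a few lines. The following Python sketch (hypothetical function names) perforates a mean computation by executing only every `stride`-th iteration:

```python
def mean(values):
    """Original loop: visits every element."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

def perforated_mean(values, stride=2):
    """Perforated variant: skips iterations, trading a small accuracy
    loss for roughly a factor-of-`stride` reduction in work."""
    total = 0.0
    count = 0
    for i in range(0, len(values), stride):   # execute 1 of every `stride`
        total += values[i]
        count += 1
    return total / count

data = list(range(1000))
exact = mean(data)                    # 499.5
approx = perforated_mean(data, 2)     # 499.0, from half the iterations
```

The interesting empirical finding is that for many applications such skipped iterations perturb the result only slightly, exposing the performance versus accuracy trade-off space described above.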

• Dynamic Knobs: Programs sometimes come with static configuration parameters that control the performance, power consumption, and accuracy of various algorithms within the program. Our ASPLOS 2011 paper and our MIT 2010 Technical Report show how to automatically find such parameters, convert them into dynamic knobs, and use the dynamic knobs to dynamically adapt the computation to respond to changes in the characteristics of the underlying computational platform.

• Automatic Parallelization with Statistical Accuracy Tests: Our MIT 2010 Technical Report presents the QuickStep compiler, which contains a collection of parallelization transformations (specifically, parallelism introduction transformations, accuracy enhancing transformations, and performance enhancing transformations). Starting with a sequential program, QuickStep automatically applies these transformations to generate a space of candidate parallel programs, using statistical accuracy tests to evaluate the acceptability of each candidate parallel program. QuickStep can automatically generate parallel computations (such as efficient computations that contain acceptably infrequent data races) that are inherently beyond the reach of previously proposed automatic parallelization strategies.

• ### Robust, Secure Software

Software systems are often perceived to be fragile and brittle, vulnerable to security exploits and errors that cause the system to crash. My research group has developed a set of techniques that make software systems more robust, resilient, reliable, and secure by enabling them to survive a range of otherwise fatal errors. The overall goal is to make the software survive any error, continue to execute, and provide acceptable service to its users. One particularly important application of this technology is to eliminate security vulnerabilities such as errors that enable remote software injection attacks.

My Onward! 2003 paper on Acceptability-Oriented Computing presents the conceptual foundations behind this approach to robust software systems. Our Onward! 2005 paper on Exploring the Acceptability Envelope further explores the implications of acceptability-oriented computing. Specific acceptability-oriented computing techniques include:

• Eliminating Infinite Loops: My research group has developed a system, Jolt, for automatically eliminating infinite loops. When activated, Jolt observes loop executions to determine if the state is the same on successive iterations. If so, it exits the loop by jumping to a location outside the loop. Our ECOOP 2011 paper describes the technique in more detail and presents experimental results that demonstrate its effectiveness in eliminating infinite loops in a set of benchmark applications.
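Jolt instruments compiled loops; the following Python sketch only mimics its core check (all names here are hypothetical). A monitor snapshots the loop state each iteration and, if the state repeats on successive iterations, escapes the loop instead of spinning forever:

```python
def run_with_loop_escape(step, state):
    """Jolt-style monitor (simplified): run a loop whose body is
    `step(state) -> new state, or None to exit normally`. If the state
    is identical on successive iterations, assume the loop is stuck
    and escape by returning control past the loop."""
    prev = None
    while True:
        snapshot = repr(state)
        if snapshot == prev:            # same state as last iteration
            return ('escaped', state)   # jump out of the infinite loop
        prev = snapshot
        result = step(state)
        if result is None:
            return ('done', state)      # loop terminated on its own
        state = result

def buggy_step(state):
    """A loop body that forgets to advance its induction variable."""
    i, limit = state
    if i >= limit:
        return None
    return (i, limit)                   # bug: should be (i + 1, limit)

status, final = run_with_loop_escape(buggy_step, (0, 10))
```

A correct loop body runs to completion unchanged; only a loop whose observable state stops changing triggers the escape.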

• Input Rectification: My Onward! 2007 paper shows how to manually build input rectifiers that automatically transform inputs to eliminate security exploits and features that exercise untested or poorly tested parts of the software system. Our ICSE 2012 paper shows how to automatically construct rectifiers by learning characteristics of typical inputs that the system processes correctly. When placed between the system and its input sources, input rectifiers can increase security and robustness by keeping the system within a tested and reliable region of its execution envelope. A key concept is the comfort zone of a system - the region of the input space that the system can process without failure, typically because the system has been tested extensively on this region.
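The real rectifiers learn much richer input constraints, but the basic structure can be sketched in a few lines of Python (all names hypothetical): learn per-field bounds from inputs the system processes correctly, then nudge anomalous inputs back inside those bounds.

```python
def learn_comfort_zone(training_inputs):
    """Learn per-field [min, max] bounds from inputs the system is known
    to process correctly -- a toy stand-in for the learned comfort zone."""
    fields = zip(*training_inputs)
    return [(min(f), max(f)) for f in fields]

def rectify(bounds, inp):
    """Clamp each field of a (possibly malicious) input into the learned
    bounds, keeping the system inside its tested execution envelope."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(inp, bounds)]

# Typical (header_length, payload_size) pairs from correct executions:
training = [[10, 200], [12, 180], [8, 220]]
bounds = learn_comfort_zone(training)
safe = rectify(bounds, [11, 200])        # in-zone input passes unchanged
fixed = rectify(bounds, [65536, 200])    # oversized field clamped
```

The key property is that rectification is lossy only outside the comfort zone: typical inputs pass through unmodified, while atypical inputs are altered into the well-tested region.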

• Automatic Inference and Enforcement of Observed Invariants: In collaboration with Michael Ernst's research group, we have developed techniques for automatically inferring and enforcing observed invariants such as data structure consistency properties and properties of compiled binaries. The goal is to develop patches that correct the effects of errors, enabling the software to continue to execute and provide service to its users in spite of the errors.

Our ClearView system uses this technique to automatically patch otherwise fatal errors in compiled binaries with no access to source code or debugging information. It observes normal, error-free executions of the program to derive a model (in the form of invariants observed to be true in all observed executions) of the normal behavior of the program. It responds to errors that cause the program to crash by finding observed invariants that are violated when the program crashes, developing patches that change the state of the system (typically by changing the values in memory locations or registers) to enforce violated invariants, then observing executions of the system to find patches that successfully eliminate the effect of the error to enable the program to continue to execute. This technique can be especially effective in the context of a community of machines running similar software, since it can enable ClearView to obtain patches that make all machines invulnerable to an attack even though only a few have actually observed the attack.

See our SOSP 2009 paper on this topic for more information on ClearView. An earlier system automatically inferred and enforced data structure consistency constraints, correcting errors that violate these constraints and enabling the system to continue on to provide successful service to users. See our ISSTA 2006 paper on this topic for more information.
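ClearView operates on compiled binaries and infers many invariant forms; the following Python sketch (hypothetical names, lower-bound invariants only) illustrates just the infer-then-enforce structure: learn invariants from error-free runs, then patch a failing state so the violated invariants hold again.

```python
def infer_invariants(traces):
    """Infer simple observed invariants (here: per-variable lower bounds)
    from error-free executions -- a toy stand-in for ClearView's models."""
    inv = {}
    for trace in traces:
        for var, val in trace.items():
            inv[var] = min(inv.get(var, val), val)
    return inv   # invariant: var >= inv[var] held in every observed run

def enforce(invariants, state):
    """Patch a failing state by changing values so that every violated
    observed invariant is re-established."""
    patched = dict(state)
    for var, lo in invariants.items():
        if patched.get(var, lo) < lo:
            patched[var] = lo          # overwrite the offending value
    return patched

good_runs = [{'len': 4}, {'len': 9}, {'len': 1}]
inv = infer_invariants(good_runs)      # learns: len >= 1
crashing = {'len': -7}                 # state observed at the crash point
repaired = enforce(inv, crashing)      # len forced back to a sane value
```

ClearView additionally observes subsequent executions to keep only those patches that actually eliminate the effect of the error; this sketch omits that feedback loop.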

• Failure-Oblivious Computing: Out of bounds accesses can cause fatal memory corruption or cause the program to throw exceptions that terminate the computation. Failure-oblivious computing is a technique for enabling the program to survive otherwise fatal out of bounds memory accesses. This technique performs bounds checks on all memory accesses. It discards out of bounds writes and manufactures values to return as the result of out of bounds reads. Our results show that this technique enables a set of open-source server programs (Pine, Apache, Sendmail, Midnight Commander, and Mutt) to survive otherwise fatal buffer overflow attacks. See our OSDI 2004 paper on this topic for more information.
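The technique is implemented by compiler-inserted bounds checks on native memory accesses; this Python class is only a simulation of the access discipline (hypothetical names): out-of-bounds writes are discarded, and out-of-bounds reads return a manufactured value instead of corrupting memory or crashing.

```python
class FailureObliviousArray:
    """Simulation of a failure-oblivious memory block: every access is
    bounds-checked; OOB writes are discarded and OOB reads return a
    manufactured value, so the computation continues past the error."""
    def __init__(self, size, manufactured=0):
        self._data = [0] * size
        self._manufactured = manufactured

    def write(self, i, value):
        if 0 <= i < len(self._data):
            self._data[i] = value
        # else: silently discard the out-of-bounds write

    def read(self, i):
        if 0 <= i < len(self._data):
            return self._data[i]
        return self._manufactured   # manufacture a value for OOB reads

buf = FailureObliviousArray(4)
buf.write(2, 42)      # in bounds: stored normally
buf.write(100, 99)    # buffer overflow attempt: discarded, no crash
```

The attack-survival results come from exactly this behavior: an overflowing write can no longer clobber adjacent state, so the server keeps running.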

• Boundless Memory Blocks: Instead of discarding out of bounds writes, boundless memory blocks store out of bounds writes in a hash table, enabling the system to return the stored values on corresponding out of bounds reads. Conceptually, this technique gives each allocated block of memory unbounded size. Our results show that this technique enables a set of server programs to survive otherwise fatal buffer overflow attacks. See our ACSAC 2004 paper on this topic for more information.
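Again the real implementation works on native memory; this Python simulation (hypothetical names) shows only the semantics: out-of-bounds writes land in a per-block hash table, and a later out-of-bounds read of the same index retrieves the stored value, as if the block had unbounded size.

```python
class BoundlessBlock:
    """Simulation of a boundless memory block: in-bounds accesses use the
    real buffer; OOB writes go to a hash table so that later OOB reads of
    the same index observe the stored value."""
    def __init__(self, size):
        self._data = [0] * size
        self._overflow = {}            # hash table holding OOB writes

    def write(self, i, value):
        if 0 <= i < len(self._data):
            self._data[i] = value
        else:
            self._overflow[i] = value  # remember instead of corrupting

    def read(self, i):
        if 0 <= i < len(self._data):
            return self._data[i]
        return self._overflow.get(i, 0)

blk = BoundlessBlock(4)
blk.write(1, 7)        # in bounds
blk.write(1000, 8)     # out of bounds: captured by the hash table
```

Compared to failure-oblivious computing, the difference is precisely that the OOB write is preserved rather than discarded, so a program that legitimately round-trips data through an undersized buffer still computes the right result.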

• Cyclic Memory Allocation: Memory leaks can cause a system to exhaust its address space and fail. Cyclic memory allocation eliminates memory leaks by allocating a single fixed-size buffer for a given allocation site, then cyclically allocating objects from that site in that buffer. Our results show that this technique can eliminate remotely-exploitable errors that otherwise cause the system to leak memory. See our ISMM 2007 paper on this topic for more information.
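A toy Python sketch of the allocation discipline (hypothetical names): each allocation site gets a fixed ring of slots handed out round-robin, so a leaking site can never exhaust memory, at the cost of silently reusing a slot if more than `slots` objects from that site are live at once.

```python
class CyclicAllocationSite:
    """Cyclic allocator for a single allocation site: a fixed buffer of
    `slots` objects reused round-robin. Bounds the site's memory footprint
    even if the program never frees the objects it allocates here."""
    def __init__(self, slots):
        self._buffer = [{} for _ in range(slots)]
        self._next = 0

    def allocate(self):
        obj = self._buffer[self._next % len(self._buffer)]
        obj.clear()            # recycle the slot in place
        self._next += 1
        return obj

site = CyclicAllocationSite(slots=3)
a = site.allocate()
b = site.allocate()
c = site.allocate()
d = site.allocate()            # wraps around: reuses a's slot
```

The empirical justification in the paper is that many leaking sites have a small number of simultaneously live objects, so a well-chosen ring size eliminates the leak without perturbing correct executions.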

• Data Structure Repair: Software errors and hardware failures can cause data structures in running programs to violate key data structure consistency properties. This violation can cause the program to produce unacceptable results or even fail. Our IEEE TSE 2006 paper presents techniques that repair the data structure, eliminate inconsistencies, and enable the system to continue to execute successfully. We have used our data structure repair system to repair inconsistent data structures and eliminate failures in a variety of applications, including an air-traffic control system, an open-source word processing system, and an interactive multiplayer game.

Earlier versions of this research appeared in our ICSE 2005 paper, our ISSRE 2003 paper, our OOPSLA 2003 paper, and our SMS 2003 paper.

• ### Program Verification

My research group has developed two systems for verifying sophisticated properties of linked data structures.

• The Hob System: Hob focuses both on internal consistency properties of linked data structures (such as the consistency of the links in a tree or list) and consistency properties that involve multiple data structures (for example, the set of objects in one data structure must be a subset of the set of objects in another data structure). Hob uses sets of abstract objects in its specifications. It can therefore model sophisticated typestate properties of objects as well as important data structure consistency constraints. One particularly noteworthy aspect of the Hob system is its modular design: it includes a variety of techniques that enable modular specification and verification of data structures, including the use of pluggable, arbitrarily sophisticated program analysis and verification systems.

One of the most significant results of the Hob project was that its specification techniques enable the verification of externally relevant design properties. The specification for our Hob implementation of the Minesweeper game, for example, expresses the rules of the game in a way that is directly relevant to users of the game. These specifications stand in stark contrast to standard software specifications, which are largely incomprehensible to users of the system since they focus on the internal implementation details of the program rather than properties that are directly relevant to users.

Patrick Lam's PhD thesis presents a comprehensive overview of the Hob system. Our AOSD 2005 paper shows how we use aspect-oriented techniques to enable the modular specification and verification of programs that manipulate sophisticated linked data structures. Our Hob IEEE Transactions on Software Engineering paper focuses on how Hob enables multiple modular pluggable program analyses to work together to prove sophisticated correctness properties. See my publications page for more Hob publications.

• The Jahob System: Jahob is a comprehensive program specification and verification system. It uses integrated reasoning to solve verification problems - it combines a broad collection of automated reasoning systems (with varying performance, reasoning power, and scope characteristics) that work together within the Jahob framework to verify the automatically generated verification conditions. Jahob has been used to verify the full functional correctness of a collection of linked data structures. By full functional correctness we mean that the system verifies every property (with the exception of resource consumption properties such as running time and memory consumption) that the client relies on to use the data structure correctly.

Our PLDI 2008 paper describes how we used Jahob to perform the first verification of full functional correctness for a range of linked data structures, including mutable lists, trees, graphs, and hash tables. Our PLDI 2009 paper presents our integrated proof language, which enables developers to guide the correctness proofs while staying within the conceptual framework of the underlying imperative programming language. In this approach, the developer simply controls key proof choice points, with the powerful Jahob integrated reasoning system performing the vast majority of the verification. Our PLDI 2011 paper presents a technique for proving that operations on linked data structures commute - a key property for reasoning about many parallel and distributed computations. See my publications page for more Jahob publications, including a broad range of publications on the logical techniques behind Jahob.

Viktor Kuncak's PhD thesis describes the techniques that form the foundation of the Jahob system. Karen Zee's PhD thesis presents our experience using Jahob to verify the full functional correctness of a range of linked data structures.

• ### Analysis and Optimization of Multithreaded Programs

The key problem with analyzing multithreaded programs is the need to characterize the effect of all potential interleavings of statements from parallel threads. My research group pioneered a compositional approach to this problem in which the algorithm analyzes each thread once to obtain a result that characterizes all potential interactions between that thread and other parallel threads. We have used this approach to develop a variety of analyses for different kinds of multithreaded programs. Here are several specific analyses:

• The first flow-sensitive pointer analysis algorithm for programs with structured multithreading. The results have been used by a variety of research groups to enable optimizations and further analyses. We have a Web page that discusses this topic in more depth.

• The first combined pointer and escape analysis for programs with the unstructured form of multithreading present in Java. See our PPoPP 2001 paper on this topic for details.

• The first incrementalized pointer and escape analysis. Instead of analyzing the whole program, this algorithm builds on our compositional analysis to analyze only those parts of the program that may deliver useful optimization opportunities. An analysis policy monitors the analysis results to direct the incremental investment of analysis resources to those parts of the program that offer the highest expected optimization return. See our PLDI 2001 paper on this topic for details.

• A new symbolic analysis for characterizing regions of allocated memory blocks. In addition to loops and array accesses, the analysis is designed for programs that use pointers, pointer arithmetic, and recursion. The results have been used for static race detection for parallel divide and conquer programs, automatic parallelization of sequential divide and conquer programs, static detection of array bounds violations, and elimination of array bounds checks. In effect, the analysis can verify that the program is completely free of a wide range of memory access anomalies that would otherwise cause serious errors.

All of these algorithms have been implemented, either in the Stanford SUIF compiler infrastructure (for C) or the MIT Flex compiler infrastructure (for Java), and have been used to analyze complete benchmark programs and perform a range of optimizations on them. See my home page for pointers to pages that discuss the research in more detail.

I have also prepared an invited paper and an invited talk for SAS 2001 on the analysis of multithreaded programs.

• ### Analysis and Optimization of Divide and Conquer Programs

Divide and conquer programs present an imposing set of analysis challenges. The natural formulation of these programs is recursive. For efficiency reasons, programs often use dynamic memory allocation to match the sizes of the data structures to the problem size, and often use pointers into arrays and pointer arithmetic to identify subproblems. Traditional analyses from the field of parallelizing compilers are of little or no help for this class of programs --- they are designed to analyze loop nests that access dense matrices using affine array index expressions, not recursive procedures that use pointers and pointer arithmetic.

We have developed the following analyses and optimizations:

• A new analysis that symbolically characterizes accessed regions of memory blocks. Because the analysis computes symbolic bounds for both pointers and array indices, it can be used to solve a wide variety of program analysis problems. To avoid problems associated with the use of traditional fixed-point techniques, the analysis formulates the problem as a system of symbolic constraints. It then reduces the system to a linear program; the solution to the linear program yields a solution to the original symbolic constraint system.

We used this algorithm to solve a range of analysis problems, including static race detection for parallel divide and conquer programs, automatic parallelization of sequential divide and conquer programs, static detection of array bounds violations, and elimination of array bounds checks. See our PLDI 2000 paper on this topic for more details.

• A new algorithm for increasing the granularity of the base case in divide and conquer programs. This algorithm unrolls recursive calls to reduce control flow overhead and increase the size of basic blocks in the computation. The result is a significant increase in performance. See our LCPC 2000 paper on this topic for more details.

• The first published analysis for automatically parallelizing divide and conquer programs. This analysis has since been superseded by the symbolic bounds analysis listed above. Nevertheless, you can see our PPoPP 1999 paper on this topic for more details.
See my home page for pointers to pages that discuss this research in more detail.

• ### Synchronization Optimizations

My research group has investigated a variety of ways of eliminating synchronization overhead in multithreaded and automatically parallelized programs. These optimizations eliminate mutual exclusion synchronization, reduce its frequency, replace it with more efficient optimistic synchronization primitives, and replicate objects to eliminate the need for synchronization. Specific techniques include:

• Synchronization Elimination: We have developed several combined pointer and escape analyses that are designed, in part, to find unnecessary synchronization operations. Our OOPSLA 1999 paper presents an escape analysis that finds objects that do not escape their allocating thread. The compiler uses the results of this analysis to generate customized, synchronization-free versions of the operations that execute on such objects. Our PPoPP 2001 paper presents an algorithm that analyzes interactions between threads to find and eliminate synchronization operations that are temporally separated by thread creation actions.

• Lock Coarsening: Lock coarsening reduces the frequency with which the computation acquires and releases locks. We have developed two approaches: data lock coarsening, which associates one lock with multiple objects that tend to be accessed together, and computation lock coarsening, which coalesces multiple critical sections that acquire and release the same lock multiple times into a single critical section that acquires and releases the lock only once. Our experimental results show that lock coarsening can significantly reduce the synchronization overhead in programs that use mutual exclusion synchronization.

Our JPDC 1998 paper presents an algorithm that performs both data and computation lock coarsening for object-based programs. Our POPL 1997 paper presents a set of general transformations that move lock constructs both within and between procedures to coarsen the lock granularity.
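Computation lock coarsening is easy to picture with an instrumented lock. This Python sketch (hypothetical names; the actual transformations are compiler passes) shows two critical sections on the same lock coalesced into one, halving the number of acquire/release pairs while preserving the result:

```python
import threading

class CountingLock:
    """A mutual exclusion lock that counts acquisitions so the effect
    of coarsening is observable."""
    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0
    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self
    def __exit__(self, *exc):
        self._lock.release()

def original(counter, lock):
    with lock:                 # first critical section
        counter['x'] += 1
    with lock:                 # second critical section, same lock
        counter['y'] += 1

def coarsened(counter, lock):
    with lock:                 # coalesced: one acquire/release pair
        counter['x'] += 1
        counter['y'] += 1

c1, l1 = {'x': 0, 'y': 0}, CountingLock()
original(c1, l1)
c2, l2 = {'x': 0, 'y': 0}, CountingLock()
coarsened(c2, l2)
```

The trade-off discussed under Dynamic Feedback below is visible here: the coarsened version holds the lock across both updates, which cuts overhead but enlarges the critical region.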

• Dynamic Feedback: There is a trade-off associated with lock coarsening: increasing the granularity reduces the synchronization overhead, but also increases the size of the critical regions, which may reduce the amount of available concurrency. We have developed an algorithm that generates several different versions of the program; the versions differ in how aggressively they apply the lock coarsening transformation. When the program runs, the generated code samples the performance of each version, then uses the version with the best performance. The code periodically resamples to adapt to changes in the best version. Our TOCS 1999 paper presents the algorithm and experimental results. We also published a PLDI 1997 paper on this topic.

• Optimistic Synchronization: Optimistic synchronization primitives such as load linked/store conditional provide an attractive alternative to mutual exclusion locks. Our PPoPP 1997 paper presents our experience using optimistic synchronization to implement atomic operations in automatically parallelized object-based programs. Our results show that using optimistic synchronization can improve the performance and significantly reduce the amount of memory required to execute the computation.
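The canonical optimistic pattern is a read/compute/compare-and-swap retry loop. This Python sketch models the atomic primitive with a small lock standing in for hardware load linked/store conditional (all names hypothetical); the update path itself acquires no mutual exclusion lock and simply retries on interference:

```python
import threading

class AtomicCell:
    """Cell with a compare-and-swap primitive, standing in for hardware
    load linked/store conditional (the internal lock only models the
    atomicity of the hardware instruction)."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()
    def load(self):
        return self._value
    def cas(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False        # another thread interfered; caller retries

def optimistic_add(cell, delta):
    """Optimistic update: read, compute, then CAS; retry on failure."""
    while True:
        old = cell.load()
        if cell.cas(old, old + delta):
            return

cell = AtomicCell(0)
def worker():
    for _ in range(1000):
        optimistic_add(cell, 1)
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```

No update is ever lost: a CAS only succeeds if the cell still holds the value the thread read, so every increment eventually lands exactly once.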

• Adaptive Replication: In many programs it is possible to eliminate synchronization bottlenecks by replicating frequently accessed objects that cause these bottlenecks. Each thread that updates such an object creates its own local replica and performs all updates locally on that replica with no synchronization and no interaction with other threads. When the computation needs to access the final value of the object, it combines the values stored in the replicas to generate the final value. A key question is determining which objects to replicate - replicating all objects can cause significant and unnecessary memory and computation overheads. We have developed a technique that dynamically measures the contention at objects to replicate only those objects that would otherwise cause synchronization bottlenecks.

Our ICS 1999 paper presents experimental results from an implementation of this technique. The results show that, for our set of benchmark programs, adaptive replication can improve the overall performance by up to a factor of three over versions that use no replication, and can reduce the memory consumption by up to a factor of four over versions that fully replicate updated objects. Furthermore, adaptive replication never significantly harms the performance and never increases the memory usage without a corresponding increase in the performance.
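The replication structure (though not the dynamic contention measurement that decides when to apply it) can be sketched in Python (hypothetical names): each thread updates a private replica of the accumulator with no synchronization, and the replicas are combined once at the end.

```python
import threading

def replicated_sum(chunks):
    """Adaptive-replication sketch: each thread accumulates into its own
    local replica with no synchronization on the hot path, and the
    replicas are combined once to produce the final value."""
    replicas = [0] * len(chunks)
    def work(tid, chunk):
        local = 0                      # thread-private replica
        for v in chunk:
            local += v                 # no lock, no contention
        replicas[tid] = local
    threads = [threading.Thread(target=work, args=(i, c))
               for i, c in enumerate(chunks)]
    for t in threads: t.start()
    for t in threads: t.join()
    return sum(replicas)               # combine replicas at the end

data = list(range(1000))
chunks = [data[i::4] for i in range(4)]   # partition across 4 threads
total = replicated_sum(chunks)
```

Replicating everything this way is what costs memory; the adaptive technique described above replicates only those objects whose measured contention would otherwise make them bottlenecks.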

• Data Race Freedom: A multithreaded program has a data race if two threads concurrently access the same location without synchronization, and at least one of the accesses is a write. Data races typically reflect the failure of the programmer to correctly synchronize an atomic operation, and are viewed as one of the primary complications associated with developing multithreaded programs.

We have recently developed an extended type system for Java; this system guarantees that any well-typed program is free of data races. It enables the programmer to specify the protection mechanism for each object as part of the object's type. The type can specify either the mutual exclusion lock that protects the object from unsynchronized concurrent accesses, or that threads can safely access the object without synchronization because either 1) the object is immutable, 2) the object is accessible to only the current thread and is not shared between threads, or 3) the current thread holds the unique reference to the object. The type checker uses the type declarations to verify that the program uses each object only in accordance with its declared protection mechanisms.

The result is that programmers can aggressively remove superfluous synchronization without introducing data races. See our OOPSLA 2001 paper on this topic for more details.

• ### Automatic Parallelization

Traditional parallelizing compilers use data dependence analysis to parallelize loop nests that access dense matrices using affine access functions. While this approach works well in its target domain, it is not designed for irregular, object-based programs whose parallel tasks include the computation of many different procedures.

We have developed a new analysis approach, commutativity analysis, that exploits the structure of the object-oriented paradigm to view the computation as consisting of operations on objects. It then analyzes the computation at this granularity to determine if operations commute (i.e., generate the same result regardless of the order in which they execute). If all of the operations in a given computation commute, the compiler can automatically generate parallel code. We have used this technique to successfully parallelize a variety of complete application programs, with the automatically parallelized versions running between 14 and 19 times faster (on a 24 processor machine) than the original sequential versions. See our TOPLAS 1997 paper on this topic for more details.
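The property the compiler needs is illustrated by this Python sketch (hypothetical names; the actual analysis proves commutativity statically rather than testing it): an operation that updates a histogram object commutes, because any execution order produces the same final object state, so the operations can safely run in parallel.

```python
def insert_op(hist, value):
    """One object operation: record a value in a histogram object.
    Commutes with itself: order of insertions does not matter."""
    hist[value] = hist.get(value, 0) + 1

def run(ops, order):
    """Execute the operations against a fresh object in a given order."""
    hist = {}
    for i in order:
        insert_op(hist, ops[i])
    return hist

ops = [3, 1, 3, 7]
forward = run(ops, [0, 1, 2, 3])   # sequential order
reverse = run(ops, [3, 2, 1, 0])   # one possible parallel interleaving
```

A non-commuting operation (say, one that appends to an ordered log) would produce different final states under the two orders, and commutativity analysis would reject it as a parallelization candidate.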

• ### Future Directions

I am currently exploring ways to relate the referencing properties of objects to the changing roles that they play in the computation. Conceptually, each object's role depends, in part, on the data structures in which it participates. As the object moves between data structures, its role often changes to reflect its new relationships. We are developing dynamic analyses to discover these roles, and static analyses that verify that the program uses each object correctly in its current role. We expect to use the results to enhance the programmer's understanding of large programs and to optimize various aspects of distributed programs such as communication and failure propagation. One early result is a new type system for ensuring that multithreaded programs are free of data races. This system allows the programmer to identify how threads obtain and relinquish control of different kinds of objects, ensuring that the program observes the correct acquisition protocol for each object. See our OOPSLA 2001 paper on this topic for more details.

In collaboration with Daniel Jackson, I am investigating ways to check that a program correctly implements high-level design properties expressed using object models. My contribution focuses on new, very precise, program analyses that leverage the design information to focus the analysis on properties of interest to the programmer. The National Science Foundation is currently supporting this research through an ITR grant.

Finally, I have a range of Java analysis and optimization projects underway. These include the exploration of techniques for implementing Real-Time Java, space optimizations for Java programs, techniques that leverage program analysis information to provide very fast atomic transactions, and a range of memory system optimizations such as region-based allocation, elimination of write barriers, and (leveraging program analysis information about object lifetimes) explicit alloc/free optimizations.

• ### Other Areas

I have also done research in hardware synthesis and credible compilation.