My research focuses on software systems and related topics. The broad goal is to obtain better software: making software more robust, resilient, and secure; improving its performance; verifying that software satisfies important correctness, acceptability, reliability, or accuracy properties; and making systems (both software and hardware) easier to specify, build, maintain, or improve. Much of my research involves automating tasks traditionally performed by programmers or software engineers. In addition to theoretical research, I often build (sometimes quite complex) systems to better understand and develop the key concepts. Within this broad range of interests I have performed a number of groundbreaking research projects:
Automatic Patch Generation: Software defects are a serious problem that can impair application functionality, lead to system crashes, open up security vulnerabilities, and consume substantial developer time and effort. My research group pioneered multiple groundbreaking techniques in automatic patch generation. These techniques include the first use of machine learning to recognize correct patches, the first algorithm to infer patch generation transforms from previous successful patches, and a hot patching system that can eliminate security vulnerabilities in stripped, running x86 binaries. All of these techniques apply to real-world defects in large production software systems of a million lines of code or more.
Code Transfer: Over the last several decades developers have delivered a large body of software applications that implement substantial amounts of useful functionality. Developers today often manually reuse this functionality by copying code across applications. One of the major challenges associated with this practice is the need to manually adapt the copied code to execute in the new application context. My research group has pioneered techniques, including automatic name translation, data structure translation, and removal of irrelevant code, that automate many of the challenges associated with reusing code and functionality across applications.
Approximate Computing: Approximate computing exploits opportunities to trade off the accuracy of the computation in return for increased performance, reduced power consumption, and/or increased robustness. My research group pioneered much of the foundational research in approximate computing, including loop perforation and task skipping for generating efficient and accurate approximate program execution, techniques for reasoning about the performance and accuracy of approximate computations, approximate programming languages that enable developers to specify and verify approximate computations, and foundational techniques in approximate parallel computing.
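Loop perforation, mentioned above, can be illustrated with a minimal sketch: the transformed loop visits only a subset of its iterations, trading a small accuracy loss for a proportional speedup. The skip factor and the averaging computation below are illustrative examples, not taken from the original systems.

```python
# Loop perforation sketch: skip a fraction of loop iterations to trade
# accuracy for performance. The perforation rate (skip_factor) and the
# example computation are illustrative.

def mean_exact(values):
    """Full computation: average over every element."""
    return sum(values) / len(values)

def mean_perforated(values, skip_factor=2):
    """Perforated computation: visit every skip_factor-th element.
    The loop body runs roughly skip_factor times faster, at the cost
    of a (usually small) accuracy loss."""
    sampled = values[::skip_factor]
    return sum(sampled) / len(sampled)

data = list(range(1000))
exact = mean_exact(data)            # 499.5
approx = mean_perforated(data, 4)   # averages elements 0, 4, 8, ...
relative_error = abs(approx - exact) / exact
```

On this input the perforated loop does one quarter of the work while incurring a relative error of about 0.3%, the kind of accuracy/performance trade-off that perforation exposes.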
Acceptability-Oriented Computing: Software systems are often perceived to be fragile and brittle, vulnerable to security exploits and errors that cause the system to crash. My research group pioneered a new approach, acceptability-oriented computing, for eliminating these problems. Instead of attempting to deliver perfect software with no defects, we instead developed a set of simple techniques that detect and repair execution integrity errors. The overall goal is to make the software survive any error, continue to execute, and provide acceptable service to its users. Experimental results obtained by applying our implemented systems to real-world bugs in large, widely deployed applications highlight the benefits of this approach. One particularly important application of this technology is the elimination of security vulnerabilities such as memory errors that enable remote software injection attacks.
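The core idea of detecting and repairing an execution integrity error rather than crashing can be sketched as follows. The specific repair policy shown here (discarding out-of-bounds writes and remapping out-of-bounds reads to valid elements) is one illustrative possibility, not a description of the deployed systems.

```python
# Sketch of repairing an execution integrity error so execution can
# continue. The repair policy (discard bad writes, remap bad reads)
# is illustrative.

class ResilientBuffer:
    def __init__(self, size):
        self.data = [0] * size

    def read(self, i):
        if 0 <= i < len(self.data):
            return self.data[i]
        # Repair: remap an out-of-bounds read to a valid element
        # instead of crashing the program.
        return self.data[i % len(self.data)]

    def write(self, i, value):
        if 0 <= i < len(self.data):
            self.data[i] = value
        # Repair: silently discard out-of-bounds writes, which in C
        # could corrupt memory or enable an injection attack.

buf = ResilientBuffer(4)
buf.write(2, 7)
buf.write(100, 9)   # out of bounds: discarded, execution continues
assert buf.read(2) == 7
assert buf.read(6) == 7   # 6 % 4 == 2: remapped read keeps the program running
```

The program may no longer compute exactly what a defect-free version would, but it keeps executing and providing service, which is the acceptability-oriented goal.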
Computer Security: Security is a critical issue in modern computing systems. Over the last several decades my group has completed multiple successful computer security projects, and we continue to perform research in this area.
Parallel Computing: Parallel computing plays a central role in performance engineering and has been a fruitful source of interesting research problems since the inception of the field. I and my research group have delivered landmark results in this field for decades. Prominent examples of these results include commutativity analysis, which discovers and exploits commuting operations on objects to automatically parallelize sequential programs, techniques for optimizing synchronization in parallel computations, techniques for analyzing divide and conquer programs, techniques for analyzing multithreaded programs, and Jade, an implicitly parallel programming language that has been implemented on the full range of parallel computing platforms, from tightly coupled shared memory machines to loosely coupled networked computing platforms.
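The key observation behind commutativity analysis can be shown with a small example: if two operations on an object produce the same final object state in either execution order, the analysis may execute them in parallel while preserving the program's sequential semantics. The histogram-update example below is illustrative, not taken from the original analysis.

```python
# Sketch of the property commutativity analysis looks for: operations
# that leave the same final state in either order can run in parallel.
# Histogram updates are a classic illustrative example.

class Histogram:
    def __init__(self):
        self.counts = {}

    def add(self, key):
        # add(a); add(b) and add(b); add(a) leave identical state,
        # so these updates commute.
        self.counts[key] = self.counts.get(key, 0) + 1

h1, h2 = Histogram(), Histogram()
h1.add("x"); h1.add("y")   # one order
h2.add("y"); h2.add("x")   # the other order
assert h1.counts == h2.counts   # same final state: the updates commute
```

Because every pair of `add` operations commutes, a parallelizing compiler that discovers this property can run the updates concurrently (with appropriate atomicity) without changing the program's result.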
Data Structure Consistency: Data structure consistency is a core program correctness property. Inconsistent data structures can cause programs to produce incorrect results, crash, or loop indefinitely. Motivated by these problems, we have developed two orthogonal approaches to obtaining data structure consistency:
Data Structure Repair: Software errors (such as data structure implementation defects or memory errors in other components) and hardware errors can cause data structures in running programs to violate key data structure consistency properties. My research group developed techniques that identify and repair inconsistent data structures. Results from real-world applications with data structure consistency errors, including air-traffic control software, highlight the robustness and resilience benefits that data structure repair can provide.
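The basic pattern of data structure repair can be sketched as follows: check a consistency property, and if it is violated, mutate the structure back into a consistent state so the program can continue. The invariant used here (a stored size field must match the actual chain length, and the chain must be acyclic) is an illustrative example, not from the original systems.

```python
# Sketch of data structure repair: detect a violated consistency
# property and restore a consistent state instead of failing.
# The linked-list invariant shown is illustrative.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class SinglyLinkedList:
    def __init__(self):
        self.head = None
        self.size = 0

    def repair(self):
        """Walk the chain; break any cycle and fix the size field."""
        seen = set()
        node, prev, count = self.head, None, 0
        while node is not None:
            if id(node) in seen:      # cycle detected: truncate it
                prev.next = None
                break
            seen.add(id(node))
            prev, node, count = node, node.next, count + 1
        self.size = count             # restore the size invariant

# Simulate corruption: a cycle in the chain plus a stale size field.
lst = SinglyLinkedList()
a, b = Node(1), Node(2)
lst.head, a.next, b.next = a, b, a    # b points back to a: a cycle
lst.size = 99                         # stale size field
lst.repair()
assert lst.size == 2 and b.next is None
```

The repaired list may not be identical to the list a correct execution would have produced, but it satisfies the consistency properties the rest of the program relies on, which is what lets the program continue executing.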
Formal Verification of Data Structure Implementations: This research pioneered new program verification techniques for verifying that implementations of linked data structures preserve key data structure consistency properties. One prominent result is the use of integrated reasoning to obtain the first verification of full functional correctness for a range of linked data structures such as mutable lists, trees, graphs, and hash tables.
Pointer and Escape Analysis: My research group developed the first pointer and escape analysis algorithms for multithreaded programs. The implemented analysis algorithms have been used for a variety of purposes, including hardware synthesis, synchronization elimination, stack allocation, region-based allocation, and the analysis of aspect-oriented programs.
Credible Compilation: Instead of simply producing an executable, a credible compiler also produces a proof that the compiler correctly compiled the program. Goals include increasing trust in the compiler, enabling more aggressive optimizations, and supporting the safe incorporation of code from multiple people and organizations in the compiler.
My credible compilation research produced the first relational program logic, a logic for proving relationships between two programs. The credible compilation project used relational logic to prove the correctness of the translation between the source and target program. Relational program logics have since been used for a wide variety of tasks in program analysis, verification, and software engineering. My research group, for example, has used relational logic to prove relational acceptability properties of approximate programs (see our PLDI 2012 paper for more information).
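A relational judgment can be written schematically as follows (notation varies across presentations of relational logics):

```latex
\{\,P\,\}\; C_1 \sim C_2 \;\{\,Q\,\}
```

Read: if the initial states of the two programs $C_1$ and $C_2$ are related by the precondition $P$, then their final states are related by the postcondition $Q$. In the credible compilation setting, $C_1$ is the source program, $C_2$ is the compiled code, and $Q$ asserts that the two executions produce corresponding observable results.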
Jade: My PhD thesis on Jade presented the design and implementation of an implicitly parallel programming language. Developers use the language to identify tasks and specify how each task accesses shared objects. The Jade implementation then extracts the parallelism, maps the tasks onto the parallel computing platform, and automatically generates any communication required to ensure that each task accesses the correct version of each shared object.
The design philosophy behind Jade is that the developer writes a program with sequential semantics over a single shared object store. The implementation is then responsible for generating efficient parallel execution on the target parallel computing platform. This approach simplifies parallel programming by eliminating the need for the developer to manage the parallel execution and communication, both notoriously difficult tasks that developers struggle to perform successfully in practice. Jade also eliminates complexities such as data races, deadlock, and accessing incorrect data. These complexities are the source of many of the difficulties that developers encounter when attempting to develop explicitly parallel programs.
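The programming model above can be sketched in Python (this is not actual Jade syntax, and all names here are illustrative): the developer writes tasks with sequential semantics over a shared object store and declares each task's reads and writes, and the implementation uses those declarations to determine which tasks may safely run concurrently.

```python
# Sketch of the Jade model (illustrative Python, not Jade syntax):
# tasks declare which shared objects they read and write; the
# implementation runs non-conflicting tasks in parallel while
# preserving the program's sequential semantics.

def conflicts(t1, t2):
    """Two tasks conflict if either writes an object the other touches."""
    return bool(t1["writes"] & (t2["reads"] | t2["writes"]) or
                t2["writes"] & (t1["reads"] | t1["writes"]))

store = {"a": 1, "b": 2, "c": 0}

tasks = [
    {"reads": {"a"}, "writes": {"c"},
     "run": lambda: store.update(c=store["a"] + 10)},
    {"reads": {"b"}, "writes": {"b"},
     "run": lambda: store.update(b=store["b"] * 2)},
]

# The declared accesses are disjoint, so an implementation could run
# these two tasks in parallel; executing them in program order gives
# the same result, which is the sequential semantics Jade preserves.
assert not conflicts(tasks[0], tasks[1])
for t in tasks:
    t["run"]()
assert store == {"a": 1, "b": 4, "c": 11}
```

Because the access declarations make each task's footprint explicit, the developer never writes synchronization or communication code; detecting conflicts and ordering conflicting tasks is entirely the implementation's job.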