Michael Ernst's research interests

I am recruiting strong students who are interested in software engineering, programming languages, security, testing, and related topics.

Also see:

My primary research area is programmer productivity, which spans the spectrum from software engineering, to program analysis (both static and dynamic), to programming language design. My broader computer science research interests range over a wide variety of fascinating topics, with no end to the fun in sight. Here are the ones about which I have published. (Superseded papers are not listed.)

Contents:

Programming language design

IR '95: Intermediate Representations Workshop Proceedings (1995)
This workshop examined current trends and research in the design and use of intermediate representations for manipulating computer programs.
Predicate dispatching: A unified theory of dispatch (ECOOP '98, the 12th European Conference on Object-Oriented Programming, 1998)
Predicate dispatching generalizes and subsumes previous method dispatch mechanisms (object-oriented dispatch, pattern matching, and more) by permitting arbitrary predicates to control method applicability and by using logical implication between predicates as the overriding relationship.
Using predicate fields in a highly flexible industrial control system (Object-Oriented Programming Systems, Languages, and Applications (OOPSLA 2005), 2005)
Predicate-oriented programming has not previously been evaluated outside a research context. This paper describes a case study that evaluates the use of predicate fields in a large-scale application that is in daily use.
Practical pluggable types for Java (ISSTA 2008, Proceedings of the 2008 International Symposium on Software Testing and Analysis, 2008)
We have created a framework for pluggable type-checking in Java (including backward-compatible syntax). Checkers built with the framework found real errors in existing software.
Type Annotations specification (JSR 308) (2008)
This specification extends Java's annotation system to permit annotations on any use of a type (including generic type arguments, method receivers, etc.). It is planned for inclusion in Java 7, but it is usable immediately with full backward compatibility with existing compilers and JVMs.

Immutability (side effects)

A practical type system and language for reference immutability (Object-Oriented Programming Systems, Languages, and Applications (OOPSLA 2004), 2004)
This paper presents a type system, language, implementation, and evaluation of a safe mechanism for enforcing reference immutability, where an immutable reference may not be used to cause side effects to any object reachable from it.
Javari: Adding reference immutability to Java (Object-Oriented Programming Systems, Languages, and Applications (OOPSLA 2005), 2005)
A compiler can guarantee that an immutable reference is not used to cause side effects to any reachable object. This paper extends previous proposals in many ways, including formal type rules and support for Java generics.
Object and reference immutability using Java generics (ESEC/FSE 2007: Proceedings of the 11th European Software Engineering Conference and the 15th ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2007)
This paper shows how to provide compiler-checked guarantees of both object immutability and reference immutability, without changes to Java syntax, by providing a single “immutability” generic parameter for each class.
Inference of reference immutability (ECOOP 2008 — Object-Oriented Programming, 22nd European Conference, 2008)
This paper presents a precise, scalable, novel algorithm for inference of reference immutability (as defined by the Javari language), and an experimental evaluation on substantial programs.
A formal definition and evaluation of parameter immutability. (Automated Software Engineering, 2009)
This paper defines reference immutability formally and precisely. The paper gives new algorithms for inferring immutability that are more scalable and precise than previous approaches. The paper compares our algorithms, previous algorithms, and the formal definition.

Static analysis

Value dependence graphs: Representation without taxation (Proceedings of the 21st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1994)
This paper describes the value dependence graph (VDG), a sparse, functional, dataflow-like program representation based on demand dependences. It also presents some of the analyses and transformations that the VDG simplifies.
Practical fine-grained static slicing of optimized code (TR, 1994)
Slicing helps visualize dependences and restrict attention relevant program components. This paper describes techniques, and an implementation, for slicing on arbitrary program values, omitting irrelevant parts of statements, and exploiting optimizations.
Serializing parallel programs by removing redundant computation (TR, 1994)
This paper introduces and evaluates methods for automatically transforming a parallel algorithm into a less parallel one that takes advantage of only the parallelism available at run time, thus performing less computation to produce the same results.
Slicing pointers and procedures (abstract) (TR, 1995)
This paper describes how to efficiently extend program slicing to arbitrary pointer manipulations, aggregate values, and procedure calls. The implementation is the first for an entire practical programming language.
An empirical analysis of C preprocessor use (IEEE Transactions on Software Engineering, 2002)
Undisciplined use of the C preprocessor can greatly hinder program understanding and manipulation. This paper examines 1.2 million lines of C code to determine what types of problematic uses appear in practice.
Selecting, refining, and evaluating predicates for program analysis (TR, 2003)
This paper investigates several techniques, notably dynamic ones based on random selection and machine learning, for predicate abstraction — selecting the predicates that are most effective for a program analysis.
Static deadlock detection for Java libraries (ECOOP 2005 — Object-Oriented Programming, 19th European Conference, 2005)
This paper gives a technique for determining whether any invocation of a library can lead to deadlock in the client program; it can also be extended to the closed-world (whole-program) case.

Security

A simulation-based proof technique for dynamic information flow (PLAS 2007: ACM SIGPLAN Workshop on Programming Languages and Analysis for Security, 2007)
A dynamic analysis of information flow (e.g., end-to-end confidentiality) is sound if its estimates of the amount of information revealed can never be too low. This paper proves such a soundness result by simulating an analyzed program with a pair copies connected by a limited-bandwidth channel.
Quantitative information flow as network flow capacity (PLDI 2008, Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008)
To obtain a more precise measurement of the amount of secret information a program might reveal, we replace the usual technique of tainting (reachability from secret inputs) with a maximum-flow computation on a graph representation of execution with edge capacities. With appropriate optimizations, the technique scales to check realistic properties in several large C programs.
Automatic creation of SQL injection and cross-site scripting attacks (TR, 2008)
By generating inputs that cause the program to execute particular statements, then modifying those inputs into attack vectors, it is possible to prove the presence of real, exploitable security vulnerabilities in PHP programs.

Refactoring

Automated support for program refactoring using invariants (ICSM 2001, Proceedings of the International Conference on Software Maintenance, 2001)
This paper shows how program invariants can productively be used to identify candidate refactorings, which are small-scale program transformations that improve program structure, performance, or other features.
Converting Java programs to use generic libraries (Object-Oriented Programming Systems, Languages, and Applications (OOPSLA 2004), 2004)
When type parameters are added to library code, client code should be upgraded to supply parameters at each use of library classes. This paper presents a sound and precise combined pointer and type-based analysis that does so.
Refactoring for parameterizing Java classes (ICSE'07, Proceedings of the 29th International Conference on Software Engineering, 2007)
Programmers must change their programs to take advantage of generic types. This paper presents a type-constraint-based analysis that converts non-generic classes such as List into generic classes such as List<T>.
How do programs become more concurrent? A story of program transformations (TR, 2008)
This paper is a historical analysis of the transformations that programmers used to convert sequential programs into concurrent versions.
Refactoring sequential Java code for concurrency via concurrent libraries (TR, 2008)
This paper presents a refactoring tool, Concurrencer, that transforms sequential code into parallel code that uses the java.util.concurrent libraries.

Testing

Improving test suites via operational abstraction (ICSE'03, Proceedings of the 25th International Conference on Software Engineering, 2003)
This paper proposes a technique for selecting test cases that is similar to structural code coverage techniques, but operates in the semantic domain of program behavior rather than in the lexical domain of program text. The technique outperforms branch coverage in test suite size and in fault detection.
Reducing wasted development time via continuous testing (Fourteenth International Symposium on Software Reliability Engineering, 2003)
Early notification of problems enables cheaper fixes. This paper evaluates how much developer time could be saved by continuous testing, which uses extra CPU cycles to continuously run tests in the background.
Continuous testing in Eclipse (2nd Eclipse Technology Exchange Workshop (eTX), 2004)
This paper describes the design principles, user interface, architecture, and implementation of a publicly-available continuous testing plug-in for the Eclipse development environment.
Automatic mock object creation for test factoring (ACM SIGPLAN/SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE'04), 2004)
A set of small, fast-running tests can be more useful than a single larger test. This paper proposes a way to automatically factor a large test case into smaller tests, using mock objects to model the environment.
An experimental evaluation of continuous testing during development (ISSTA 2004, Proceedings of the 2004 International Symposium on Software Testing and Analysis, 2004)
Continuous testing during development provides early notification of errors. This paper reports a controlled experiment to measure whether and how programmers benefit from continuous testing.
Eclat: Automatic generation and classification of test inputs (ECOOP 2005 — Object-Oriented Programming, 19th European Conference, 2005)
This paper presents an automatic mechanism for selecting tests that are likely to expose errors — tests whose run-time behavior is maximally different from succeeding runs. The paper also gives techniques for test input generation and for converting a test input into a test case.
Automatic test factoring for Java (ASE 2005: Proceedings of the 20th Annual International Conference on Automated Software Engineering, 2005)
Test factoring creates fast, focused unit tests from slow system-wide tests. Each new unit test exercises only a subset of the functionality exercised by the system test.
An empirical comparison of automated generation and classification techniques for object-oriented unit testing (ASE 2006: Proceedings of the 21st Annual International Conference on Automated Software Engineering, 2006)
This paper experimentally evaluates four test generation strategies: the four combinations of model-based vs. exception-based, and symbolic vs. random. The model-based symbolic combination is new.
Finding the needles in the haystack: Generating legal test inputs for object-oriented programs (M-TOOS 2006: 1st Workshop on Model-Based Testing and Object-Oriented Systems, 2006)
An automatically inferred model of legal method call sequences can bias test generation toward legal method sequences, increasing coverage and creating data structures beyond the ability of undirected random generation.
Feedback-directed random test generation (ICSE'07, Proceedings of the 29th International Conference on Software Engineering, 2007)
This paper presents a technique that improves random test generation by incorporating feedback from executing previously-generated inputs. The technique found serious, previously-unknown errors in widely-deployed applications.
Theories in practice: Easy-to-write specifications that catch bugs (TR, 2008)
A “theory” is a claim about one or more objects — an executable predicate that generalizes over a (possibly infinite) set of tests. Theories serve as a type of specification that can complement or supersede individual tests.
ReCrash: Making software failures reproducible by preserving object states (ECOOP 2008 — Object-Oriented Programming, 22nd European Conference, 2008)
ReCrash is a lightweight technique that monitors a program for failures such as crashes. When a failure occurs in the field, ReCrash creates multiple unit tests that reproduce it. This eases debugging and fixing the errors.
Finding bugs in dynamic web applications (ISSTA 2008, Proceedings of the 2008 International Symposium on Software Testing and Analysis, 2008)
This paper extends dynamic test generation, based on interleaved testing and symbolic execution, to the new domain of web appliations, and finds 214 bugs in 4 PHP web applications.

Bug prediction

Predicting problems caused by component upgrades (ESEC/FSE 2003: Proceedings of the 9th European Software Engineering Conference and the 11th ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2003)
A software upgrade may break a customer's system because of differences between it and the vendor's test environment. This paper shows how to predict such problems without having to integrate and test.
Finding latent code errors via machine learning over program executions (ICSE'04, Proceedings of the 26th International Conference on Software Engineering, 2004)
This paper shows the efficacy of a technique that performs machine learning over correct and incorrect programs, then uses the resulting models to identify latent errors in other programs.
Early identification of incompatibilities in multi-component upgrades (ECOOP 2004 — Object-Oriented Programming, 18th European Conference, 2004)
This paper extends the technique of “Predicting problems caused by component upgrades” [ESEC/FSE 2003] to deal with multi-module upgrades and arbitrary calling patterns. It also presents 4 other enhancements and a case study of upgrading the C standard library as used by 48 Unix programs.
Formalizing lightweight verification of software component composition (SAVCBS 2004: Specification and Verification of Component-Based Systems, 2004)
This paper formalizes (and corrects a few mistakes in) previous work on predicting problems caused by component upgrades. It presents an outline for a proof that the technique permits only sound upgrades.
Prioritizing warnings by analyzing software history (MSR 2007: International Workshop on Mining Software Repositories, 2007)
Bug-finding tools prioritize their warning messages, but often do so poorly. This paper gives a new prioritization, based on which problems programmers have fixed most quickly in the past — those are likely to be the most important ones.
Which warnings should I fix first? (ESEC/FSE 2007: Proceedings of the 11th European Software Engineering Conference and the 15th ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2007)
This paper determines how a bug-finding tool should prioritize its warnings. The prioritization is computed from the bug fix history of the program — the problems programmers considered important enough to fix.

Invariant detection

Dynamically discovering pointer-based program invariants (TR, 1999)
This paper extends dynamic invariant detection to pointer-directed data structures by linearizing those structures and, more significantly, by permitting inference of disjunctive invariants (like “A or B”).
Quickly detecting relevant program invariants (ICSE 2000, Proceedings of the 22nd International Conference on Software Engineering, 2000)
Dynamic invariant detection can output many unhelpful properties as well as those of interest. This paper gives four techniques that eliminate uninteresting invariants or add relevant ones, thus increasing the technique's usefulness to programmers.
Dynamically Discovering Likely Program Invariants (Ph.D. dissertation, 2000)
This dissertation overviews dynamic invariant detection as of summer 2000. It includes materials from the papers on the basic invariant detection techniques (IEEE TSE, 2001), relevance improvements (ICSE 2000), and extensions to pointer-based data structures (UW TR, 1999).
Dynamically discovering likely program invariants to support program evolution (IEEE Transactions on Software Engineering, 2001)
Program properties (such as formal specifications or assert statements) are useful for a variety of programming tasks. This paper shows how to dynamically infer program properties by looking for patterns and relationships among values computed at run time.
Efficient incremental algorithms for dynamic detection of likely invariants (Proceedings of the ACM SIGSOFT 12th Symposium on the Foundations of Software Engineering (FSE 2004), 2004)
This paper shows how to perform dynamic invariant detection both incrementally, efficiently, and without using only a trivial grammar; previous implementations suffered from one or more such problems.
The Daikon system for dynamic detection of likely invariants (Science of Computer Programming, 2007)
This paper discusses the Daikon tool, including its features, applications, architecture, and development process. It is not a paper about dynamic invariant detection per se.

Dynamic analysis

Static and dynamic analysis: Synergy and duality (WODA 2003: ICSE Workshop on Dynamic Analysis, 2003)
This position paper is intended to provoke thought and discussion regarding the relationship between static and dynamic analysis, which it claims are not as different as many have assumed.
WODA 2003: ICSE Workshop on Dynamic Analysis (2003)
This workshop brought together practitioners and academics to discuss topics from the structure of the dynamic analysis field, to how to enable better and easier progress, to specific new analysis ideas.
Improving adaptability via program steering (ISSTA 2004, Proceedings of the 2004 International Symposium on Software Testing and Analysis, 2004)
Program steering selects modalities for a program that may operate in several modes. This paper's experiments show that program steering can substantially improve performance in unanticipated situations.
Learning from executions: Dynamic analysis for software engineering and program understanding (2005)
A broad overview of the field of dynamic analysis, including applications of dynamic analysis, dynamic analysis algorithms, and implementation details. The slides capture only part of the tutorial, and naturally the tutorial only captures part of this exciting field.
Detection of web service substitutability and composability (WS-MaTe 2006: International Workshop on Web Services — Modeling and Testing, 2006)
Programmers, end users, and even program would like to discover and reuse software services. This paper presents an approach to indicate when Web Services may be substituted for one another or composed.
Dynamic inference of abstract types (ISSTA 2006, Proceedings of the 2006 International Symposium on Software Testing and Analysis, 2006)
This paper presents a dynamic analysis for computing abstract types, which indicate which values and variables may interact at run time. The paper also presents implementations and experiments for C++ and Java.
Inference and enforcement of data structure consistency specifications (ISSTA 2006, Proceedings of the 2006 International Symposium on Software Testing and Analysis, 2006)
This paper presents an automatic technique for recovering from data structure corruption errors. It involves inferring correctness constraints, then repairing any errors that occur at run time. The evaluation includes a hostile “Red Team” evaluation.

Software engineering

Panel: Perspectives on software engineering (ICSE 2001, Proceedings of the 23rd International Conference on Software Engineering, 2001)
This talk argues that tools are key to having impact on software engineering; they must be published; case studies are more fruitful than experiments; researchers have the responsibility to target the state of the art; static and dynamic tools should be combined; and lightweight formal tools are the right place to start.
ACM SIGPLAN/SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2005) (2005)
PASTE 2005 brought together the program analysis, software tools, and software engineering communities to focus on applications of program analysis techniques in software tools.
The Groupthink specification exercise (Software Engineering Education in the Modern Age: Challenges and Possibilities, 2006)
The Groupthink specification exercise is a fun group activity that teaches students about specifications, teamwork, and communication. This paper describes both the goals of the activity (along with an assessment of it), and the mechanics of how to run it.

Verification

Static verification of dynamically detected program invariants: Integrating Daikon and ESC/Java (Proceedings of RV'01, First Workshop on Runtime Verification, 2001)
Dynamic invariant detection proposes likely (not certain) invariants; static verification requires program explicit identification of a goal and program annotation. This paper combines the two techniques to overcome the weaknesses of each.
Verifying distributed algorithms via dynamic analysis and theorem proving (TR, 2002)
Automatic theorem-provers require human direction in the form of lemmas and invariants. We investigate a dynamic technique for providing these intermediate steps. In three case studies, they reduced or eliminated human effort.
Automatic generation of program specifications (ISSTA 2002, Proceedings of the 2002 International Symposium on Software Testing and Analysis, 2002)
Sound program verifiers generally require program specifications, which are tedious and difficult to generate. A dynamic analysis can automatically produce unsound specifications. Combining the two techniques overcomes both weaknesses and demonstrates that the dynamic step, while unsound, can be quite accurate in practice.
Invariant inference for static checking: An empirical evaluation (Proceedings of the ACM SIGSOFT 10th International Symposium on the Foundations of Software Engineering (FSE 2002), 2002)
Tool unsoundness and incompleteness may hinder users performing a task, or users may benefit even from imperfect output. In a study of program verification aided by dynamic invariant detection, even very poor output from the invariant detector aided users to a statistically significant degree.
Using simulated execution in verifying distributed algorithms (Software Tools for Technology Transfer, 2004)
This paper proposes integration of dynamic analysis with traditional theorem-proving in ways that extend beyond mere testing. Generalizing over the test runs can reveal necessary lemmas, and the structure of the proof can mirror the structure of the execution.
An overview of JML tools and applications (Software Tools for Technology Transfer, 2005)
This paper overviews the Java Modeling Language (JML) notation for detailed design and gives a brief description of some of the tools that take it as an input or produce it as an output.
Verification for legacy programs (VSTTE: Verified Software: Theories, Tools, Experiments, 2005)
Legacy (under-documented and -specified) code will be with us forever. This paper suggests ways to cope with such systems, both to ameliorate short-term problems and to advance toward a future in which all code is verified.

Artificial intelligence

Image/map correspondence using curve matching (AAAI Spring Symposium on Robot Navigation, 1989)
Correspondences between a real-world image and a map or terrain model can assist in navigation. This paper describes a technique to find such correspondences by focusing only on salient curves (roads, rivers, ridgelines, etc.) rather than the entire image.
Automatic SAT-compilation of planning problems (IJCAI-97, Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, 1997)
Planning problems can be solved by transformation into a propositional satisfiability problem, solution, and transformation of the SAT solution into a plan. This paper describes the first automatic translator from planning to SAT and evaluates the space of encodings of planning problems as SAT formulas.

Theory

ML typechecking is not efficient (Papers of the MIT ACM Undergraduate Conference, 1989)
ML programmers have the intuition that ML's typechecking algorithm runs in linear time. This paper is an early description of the surprising result (not due to me) that ML typechecking runs in time much worse than exponential in the size of its input.
Adequate Models for Recursive Program Schemes (Bachelors thesis, 1989)
This paper is a pedagogical exposition of adequacy for recursive program schemes in the monotone frame. Adequacy states that for any term of base type, the operational and denotational meanings are identical, and it is typically proved in the continuous frame.
Heraclitean encryption (TR, 1994)
Heraclitean encryption permits an encryptor to create many independent decryption keys for converting a particular codetext into plaintext. This permits tracing of decryption keys and, in the event of a compromise, determination of which decryptor leaked his or her key.
Playing Konane mathematically: A combinatorial game-theoretic analysis (UMAP Journal, 1995)
This paper presents a combinatorial game-theoretic analysis of Konane, an ancient Hawaiian stone-jumping game. Combinatorial game theory indicates, for a given board position, which player wins, and how great that player's advantage is.
Method and system for controlling unauthorized access to information distributed to users (1996)
Software patents are outside the intent of the patent system, and the patent office's review of them is a joke. That said, here is my patent, on a system for having unique decryption keys for each purchaser of an encrypted product, so as to determine who revealed a decryption key if one is made public.
Graphs induced by Gray codes (Discrete Mathematics, 2002)
A Gray code on n bits induces a graph over n vertices such that vertices i and j are adjacent when bit positions i and j flip consecutively during the code. This paper answers some questions about what sorts of graphs can (and cannot) be induced by Gray codes.

Miscellaneous

Self-reference in English (1989)
This paper shows that certain types of linguistic paradoxes cannot themselves be expressed without special linguistic conventions; this fact has since been dubbed “Ernst's Paradox”. This paper is recapitulated by “Quotational Ambiguity,” by George Boolos, in Proceedings of the Conference in Honor of W. V. O. Quine, San Marino, May 1990 (edited by Paulo Leonardi).
Intellectual property in computing: (How) should software be protected? An industry perspective (Memo, 1992)
This was the first colloquium on intellectual property in computing to bring the debate to the technical and social realm; previous analyses had been from a narrow legal perspective. Computer industry executives discussed how "look and feel" user interface copyrights and software patents could affect the industry.

Michael Ernst