Lecture 2: Introduction to Inductive Synthesis
One of the simplest interfaces for program synthesis is inductive synthesis. In inductive synthesis, the goal is to generate a function that matches a given set of input/output examples. The literature makes a distinction between Programming by Example (PBE) and Programming by Demonstration (PBD). In programming by example, the goal is to infer a function given only a set of inputs and outputs, whereas in programming by demonstration, the user also provides a trace of how the output was computed.
For example, in programming by example, if I want to convey to the system that I want it to synthesize the factorial function, I may give it an example:

in: 5
out: 120
History
The idea of directing a computer through examples dates back to the 1970s, when Patrick Winston at MIT published the seminal work “Learning Structural Descriptions from Examples” [1]. This work was among the first to look into the problem of generalizing from a set of observations, although it was not really about trying to automate programming. A good candidate for “first PBD system” is Pygmalion [2]. Pygmalion was framed as an "interactive 'remembering' editor for iconic data structures"; the high-level idea was that the state of a program would be described through icons and relationships between icons. Computation would then correspond to manipulation of these icons, as stated in their "basic Pygmalion metaphor".
The idea of a "remembering editor" was that if you perform such a manipulation by hand once, the editor can remember how to perform the manipulation so you can apply a similar manipulation in other contexts. Like a lot of early AI work, PYGMALION was very heavy on philosophy and metaphor and very weak on algorithms, particularly around the crucial question of how to generalize from a given demonstration so that the learned program could apply in other situations.
At around the same time, Summers looked more systematically at the question of how to generalize from a demonstration to a program, particularly at the question of how to derive looping structure [3, 4]. His algorithm was based on pattern matching and was relatively brittle, but it is still considered an important algorithm in the field.
Over time, much of the interest from the AI community shifted to Machine Learning, and to approaches that infer functions from large amounts of noisy data instead of a small number of careful demonstrations, with very little progress being made in the PBD and PBE space. There was another burst of interest in the mid 1990s, best exemplified by the work of Tessa Lau, which tried to bring insights from machine learning back into PBE/PBD. Tessa Lau started this work as a graduate student at UW working with Daniel Weld, and she continued it as a researcher at IBM [5]. The goal was to move away from ad-hoc and brittle approaches and to develop general techniques that could be adapted to a variety of PBE problems.
The work focused on two major techniques: Version space generalization and Inductive logic programming, both of which will be covered later. This line of work caused a lot of excitement for a while, but it petered out after it became clear that the techniques were not solving the problem well enough to be practical. Interestingly, only a couple of years before FlashFill launched the modern wave of programming-by-example systems, Tessa Lau published an article titled "Why PBD systems fail: Lessons learned for usable AI" [6], which articulated many of the pitfalls that prevented the success of PBD systems.
Framing the PBD/PBE Problem
In general, inductive synthesis involves two distinct challenges: searching for a program that matches the given observations, and ensuring that the program found will generalize beyond them. In traditional machine learning, the focus has historically been on the second challenge. The trick has been either to pick spaces of programs that are extremely expressive (e.g. neural networks), so that there are many different ways to match any set of observations and the challenge is how to avoid over-training; or to pick a space that is quite restricted (e.g. SVMs), such that it is more or less impossible to match all the observations, but since you assume some observations are wrong anyway, you can trade off how many samples you match against other criteria that make it more likely that your solution will work well enough in general.
The modern emphasis in PBE, however, has been to focus more on the space of programs itself. The focus on restricting the space of programs is not new; many early systems took this to an extreme by simply having a short list of programs that the system would scan through, looking for one that matched the examples. What recent advances in synthesis have brought to the table are powerful mechanisms to search arbitrary program spaces. This has allowed us to design the space of programs in a way that excludes undesirable solutions and focuses the search on "reasonable" programs. The ability to carefully control the program space does not completely eliminate the need to rank programs to give priority to the most likely ones; even with a carefully designed program space, the problem is still underspecified. But the idea is that if you can search a large but highly constrained space efficiently, you are more likely to get what you are looking for. By having the ability to reason about arbitrary (and very large) spaces of programs, you can get the benefits of the list-of-programs approach without its inherent brittleness.
What is a program?
This is a good point to consider what we actually mean when we talk about a program in the context of program synthesis. A program is a description of how to perform a computation. In general, describing a program requires a notation, a programming language that allows you to describe many different computations by composing individual syntactic elements, each with a well defined meaning. We are all familiar with popular programming languages such as Python, JavaScript or C. At the other extreme, the notation of arithmetic, for example, can also be considered a programming language; it includes syntactic elements such as numbers and arithmetic operators (+, −, ×), each with a well defined meaning, which can be used to describe a particular kind of computation. Unlike a language like Python, the language of arithmetic is not universal; it can only be used to describe a very narrow class of computations, so we can use it to compute the tip on a restaurant bill, but not to determine the smallest element in a list of numbers.
As we will see over the rest of this course, the choice of an appropriate notation is crucial to the success of program synthesis. Even if your end goal is to generate code in Python or C, it is often useful to frame the program synthesis problem in terms of a narrower notation that more precisely captures only those programs that are actually relevant to the task at hand. We will often refer to these more specialized languages as Domain Specific Languages (DSLs). It is usually convenient to think of them simply as subsets of more general languages, where the programmer is prevented from using particular constructs and is provided only with a limited set of functions or subroutines.
Throughout the rest of this course, we will be defining small DSLs as synthesis targets. Often it will be enough to define their semantics informally or through examples; where more precision is warranted, we will define them in terms of a general purpose programming language. In many settings, we will be using the notation of functional programming, which will be familiar to anyone who has programmed in Haskell or OCaml, but may seem a bit foreign to some. We will say more about this notation when we use it, but at a high level, these are a few things to keep in mind about this notation, and why it makes a good target for synthesis:
- No side effects. Computation in a functional language happens by evaluating pure functions, with no side effects and no mutation. This can be annoying when programming by hand, but it often simplifies the job of the program synthesizer. A consequence is that functions are referentially transparent: if you give them the same inputs, they will produce the same outputs. This can also simplify the reasoning process significantly.
- Concise and expressive. Think about a program in Java that reverses a list. It's a relatively long program that involves a class declaration, some method declarations, some loops, some constructors; maybe it would look something like this:

```java
class ListReverser {
    static List reverseList(List myList) {
        List output = new ArrayList();
        for (int i = 0; i < myList.size(); i++) {
            output.add(myList.get(myList.size() - 1 - i));
        }
        return output;
    }
}
```

You can probably write code that is slightly cleaner than the code above, but not by much. From a synthesis perspective, that is a lot of code to write, with a lot of opportunities to get it wrong. By contrast, in Haskell, a function to reverse a list can be defined like this:

```haskell
reverse lst = case lst of
    []        -> []
    head:rest -> (reverse rest) ++ [head]
```

The whole function is a single expression that says that if the list is an empty list, you just return the empty list; if it is not empty, then it will have a head followed by the rest of the list, and you should reverse the rest of the list and concatenate that with a list containing only the head. This conciseness is very useful when synthesizing programs, because it allows you to synthesize non-trivial programs while only having to discover small amounts of code.
There are long-running debates, sometimes very religious in nature, about the best programming language for this or that purpose. For this course, though, we are not really interested in the question of what language you should use for writing a particular system. What we really care about is the notation we are going to use when framing a synthesis problem; this notation will often have to be problem specific, but it is very important that it be concise and have enough expressiveness to solve our problem, but not much more.
Representing a program
When programming, we are used to representing programs as text: strings of characters with indentation and special characters to indicate, for example, the beginning of a block of code. When synthesizing or manipulating code, however, we want to represent code as a data structure. The most common representation is an Abstract Syntax Tree (AST), which is just a tree with different kinds of nodes for different kinds of constructs in the language. There is usually a very close correspondence between the structure of the AST and the structure of a parse tree of the program. What makes it abstract is that the data structure can usually ignore information about things like spacing, or special characters like brackets, colons and semicolons, that are there just to make the program more readable. As an example, consider the language of arithmetic expressions. In Haskell, the ASTs for this language can be represented with the following datatype:

```haskell
data AST = Num Int
         | Plus AST AST
         | Times AST AST
```

The same structure can be represented in Java as a small class hierarchy:

```java
abstract class AST { ... }
class Num extends AST {
    int val;
    public Num(int val) { this.val = val; }
    ...
}
class Plus extends AST {
    AST left;
    AST right;
    public Plus(AST left, AST right) { this.left = left; this.right = right; }
    ...
}
class Times extends AST {
    AST left;
    AST right;
    ...
}
```

For example, the expression 5 + 3 * 2 would be constructed as Plus (Num 5) (Times (Num 3) (Num 2)), while (5 + 3) * 2 would be constructed as Times (Plus (Num 5) (Num 3)) (Num 2), or in Java, new Plus(new Num(5), new Times(new Num(3), new Num(2))) and new Times(new Plus(new Num(5), new Num(3)), new Num(2)) respectively.
In order to represent the structure of a DSL in a language independent way, we will often use context free grammar notation to describe its ASTs. So, for example, for the arithmetic language above, we may simply write:

```
expr := N
      | expr + expr
      | expr * expr
```
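To make the idea of each node having a well defined meaning concrete, here is a small, self-contained sketch (our own illustration, not from the original notes) of the Java class hierarchy above, extended with a hypothetical eval method that interprets each node:

```java
// Arithmetic ASTs with an interpreter. The `eval` method is our own
// addition for illustration; it assigns each node its arithmetic meaning.
abstract class AST {
    abstract int eval();
}

class Num extends AST {
    int val;
    Num(int val) { this.val = val; }
    int eval() { return val; }
}

class Plus extends AST {
    AST left, right;
    Plus(AST left, AST right) { this.left = left; this.right = right; }
    int eval() { return left.eval() + right.eval(); }
}

class Times extends AST {
    AST left, right;
    Times(AST left, AST right) { this.left = left; this.right = right; }
    int eval() { return left.eval() * right.eval(); }
}

public class EvalDemo {
    public static void main(String[] args) {
        // The two expressions from the text: 5 + 3 * 2 and (5 + 3) * 2.
        AST a = new Plus(new Num(5), new Times(new Num(3), new Num(2)));
        AST b = new Times(new Plus(new Num(5), new Num(3)), new Num(2));
        System.out.println(a.eval()); // prints 11
        System.out.println(b.eval()); // prints 16
    }
}
```

Note how the two ASTs evaluate to different values even though they contain the same leaves; the tree structure, not punctuation or parentheses, determines the meaning.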
Search techniques
Explicit Enumeration
One class of search techniques is explicit enumeration. At a high level, the idea is to explicitly construct different programs until one finds a program that satisfies the observations. In general, though, the space of possible programs that one can generate to satisfy a given specification is too large to enumerate efficiently, so a key aspect of these approaches is how to avoid generating programs that have no hope of satisfying the observations, or that can be shown to be redundant with programs we have already enumerated. An important distinction among explicit enumeration techniques is whether they are top down or bottom up. In bottom-up enumeration, the idea is to start by discovering low-level components and then discover how to assemble them into larger programs. By contrast, top-down enumeration starts by trying to discover the high-level structure of the program first, and from there it tries to enumerate the low-level fragments. Essentially, in both cases we are explicitly constructing ASTs, but in one case we are constructing them from the root down, and in the other from the leaves up.
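As a toy illustration of bottom-up enumeration (a sketch under our own assumptions: a tiny expression language over a single variable x and the constants 0 through 2, with a hypothetical Prog wrapper class), the following Java snippet grows a pool of programs from small components and checks each candidate against a set of input/output examples:

```java
import java.util.*;
import java.util.function.IntUnaryOperator;

public class BottomUp {
    // A candidate program: an executable function of x plus a printable description.
    static class Prog {
        final String desc;
        final IntUnaryOperator fn;
        Prog(String desc, IntUnaryOperator fn) { this.desc = desc; this.fn = fn; }
    }

    // A program matches if it maps every example input to the expected output.
    static boolean matches(Prog p, int[][] examples) {
        for (int[] ex : examples)
            if (p.fn.applyAsInt(ex[0]) != ex[1]) return false;
        return true;
    }

    static Prog search(int[][] examples) {
        // Level 0: the leaves of the grammar (the variable x and small constants).
        List<Prog> pool = new ArrayList<>();
        pool.add(new Prog("x", x -> x));
        for (int c = 0; c <= 2; c++) {
            final int k = c;
            pool.add(new Prog(Integer.toString(k), x -> k));
        }
        // Grow the pool bottom-up: combine every pair of known programs with + and *.
        for (int round = 0; round < 2; round++) {
            List<Prog> next = new ArrayList<>(pool);
            for (Prog a : pool)
                for (Prog b : pool) {
                    next.add(new Prog("(" + a.desc + " + " + b.desc + ")",
                                      x -> a.fn.applyAsInt(x) + b.fn.applyAsInt(x)));
                    next.add(new Prog("(" + a.desc + " * " + b.desc + ")",
                                      x -> a.fn.applyAsInt(x) * b.fn.applyAsInt(x)));
                }
            pool = next;
            for (Prog p : pool)
                if (matches(p, examples)) return p;
        }
        return null; // nothing found within the size bound
    }

    public static void main(String[] args) {
        // Examples consistent with the function 2*x + 1.
        int[][] examples = { {1, 3}, {2, 5}, {3, 7} };
        Prog p = search(examples);
        System.out.println(p == null ? "no program found" : p.desc);
    }
}
```

Note how quickly the pool fills with redundant candidates such as (x + 0); practical bottom-up enumerators prune programs that behave identically on the examples, a point we will return to when discussing symmetries.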
Symbolic search
In explicit search, the synthesizer always maintains one or more partially constructed programs that it is currently considering. By contrast, in symbolic search techniques the synthesizer maintains a symbolic representation of the space of all programs that are considered valid. Different symbolic representations lead to different search algorithms. Two of the most popular symbolic representations in use today are Version Space Algebras and Constraint Systems. As an analogy, suppose we want to search for an integer value of n such that 4*n = 28. An enumerative search would try all the values one by one until it got to n = 7, and then it would declare success. By contrast, a symbolic search technique may perform some algebraic manipulation to deduce that n = 28/4 = 7. In this case, symbolic search is clearly better, but even for arithmetic, symbolic manipulation is not always the best choice. Binary search, for example, can be considered a form of explicit search that is actually quite effective in finding solutions to equations that may be too complicated to manipulate algebraically efficiently.
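To make the analogy concrete, here is a small sketch (our own illustration) contrasting plain enumeration with binary search when solving 4*n = 28 over a bounded integer range. Both are explicit searches over candidate values, but the second exploits the fact that 4*n is monotone in n:

```java
public class SearchDemo {
    // Explicit enumeration: try candidate values one by one.
    static int enumerate(int target, int lo, int hi) {
        for (int n = lo; n <= hi; n++)
            if (4 * n == target) return n;
        return Integer.MIN_VALUE; // not found in range
    }

    // Binary search: still an explicit search, but it uses monotonicity
    // of 4*n to discard half the remaining range at every step.
    static int binarySearch(int target, int lo, int hi) {
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int v = 4 * mid;
            if (v == target) return mid;
            if (v < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(enumerate(28, 0, 1000));    // prints 7
        System.out.println(binarySearch(28, 0, 1000)); // prints 7
    }
}
```

The enumerator inspects eight candidates before succeeding; the binary search needs only about log2(1000) probes, which is the kind of structural shortcut that makes non-symbolic methods competitive on problems that resist algebraic manipulation.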
Defining the space of programs
One of the key design decisions when using program synthesis is the space of programs that will be considered. One way to do this is to simply define a small domain specific language and then consider all possible programs within this language. Rather than using a specific programming language to describe the ASTs for the target language, it is common to describe the ASTs in the form of a context free grammar. Defining the space of programs as all the valid programs in a DSL has a number of advantages, especially when the language has a simple structure that can be defined as a context free grammar. First, it is easy to enumerate all programs in the DSL, or to sample randomly from the space. This makes the approach especially popular with enumerative search strategies. In addition to a context free grammar, the language may have a type system that can help rule out solutions that are clearly illegal. In contrast, constraint-based approaches often rely on parametric representations of the space, where different choices of parameters correspond to different choices for what the program will look like. Parametric representations are more general than grammars; you can usually encode the space represented by a grammar with a parametric representation as long as you are willing to bound the length of the programs you want to consider. These parametric programs are often referred to as generative models, especially when the choices controlled by the free parameters have probabilities associated with them. In future lectures we will further explore these different design choices.
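As a toy sketch of a parametric representation (our own illustration; the encoding and the search procedure are hypothetical, not a standard one), the following Java snippet encodes a small space of programs of the form op(arg1, arg2) with three integer parameters, so that searching the program space reduces to searching over parameter values:

```java
public class Parametric {
    // A bounded "template" for programs op(arg1, arg2), where each hole
    // is controlled by an integer parameter:
    //   params[0]: 0 = plus, 1 = times
    //   params[1], params[2]: 0 = x, 1 = the constant 1, 2 = the constant 2
    static int run(int[] params, int x) {
        int a = operand(params[1], x);
        int b = operand(params[2], x);
        return params[0] == 0 ? a + b : a * b;
    }

    static int operand(int choice, int x) {
        switch (choice) {
            case 0:  return x;
            case 1:  return 1;
            default: return 2;
        }
    }

    // Searching the program space = enumerating parameter assignments.
    static int[] search(int[][] examples) {
        for (int op = 0; op < 2; op++)
            for (int p1 = 0; p1 < 3; p1++)
                for (int p2 = 0; p2 < 3; p2++) {
                    int[] params = { op, p1, p2 };
                    boolean ok = true;
                    for (int[] ex : examples)
                        if (run(params, ex[0]) != ex[1]) { ok = false; break; }
                    if (ok) return params;
                }
        return null;
    }

    public static void main(String[] args) {
        // Examples consistent with doubling the input.
        int[][] examples = { {1, 2}, {3, 6}, {5, 10} };
        int[] p = search(examples);
        System.out.println(java.util.Arrays.toString(p)); // prints [0, 0, 0], i.e. x + x
    }
}
```

A constraint-based synthesizer would hand exactly this kind of parameter space to a solver instead of enumerating it; attaching probabilities to the choices turns the same template into a simple generative model.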
Example
As a running example, consider the following language: On the one hand, the language is very rich; it includes recursion, concatenation, sorting, search; you can write a ton of interesting programs with this language. For example, the program to reverse a list would be written as follows:

in: [1,2,3,4,5,6,7,8]
out: [8,7,6,5,4,3,2,1]
Symmetries
One important aspect of defining the space of programs is the question of symmetries. In program synthesis, we say that a program space has a lot of symmetries if there are many different ways of representing the same program. For example, consider the following grammar:

References

[1] Patrick H. Winston. Learning Structural Descriptions from Examples. 1970.
[2] David Canfield Smith. PYGMALION: A Creative Programming Environment. 1976.
[3] Phillip D. Summers. A Methodology for LISP Program Construction from Examples. 1976.
[4] Phillip D. Summers. A Methodology for LISP Program Construction from Examples.
[5] Tessa A. Lau, Daniel S. Weld. Programming by Demonstration: An Inductive Learning Formulation. 1999.
[6] Tessa Lau. Why Programming-By-Demonstration Systems Fail: Lessons Learned for Usable AI. 2009.