Lecture 2: Introduction to Inductive Synthesis
One of the simplest interfaces for program synthesis is inductive synthesis. In inductive synthesis, the goal is to generate a function that matches a given set of input/output examples. The literature distinguishes between Programming by Example (PBE) and Programming by Demonstration (PBD). In Programming by Example, the goal is to infer a function given only a set of inputs and outputs, whereas in Programming by Demonstration, the user also provides a trace of how the output was computed. For example, in Programming by Example, if I want to convey to the system that I want it to synthesize the factorial function, I may give it an example:
$factorial(6) = 720$
As one can see, this is a highly under-specified problem, since there is an enormous number
of possible functions that may return $720$ given the input $6$. By contrast,
with programming-by-demonstration, one may provide a more detailed trace of the computation:
$factorial(6) = 6 * (5 * (4 * (3 * (2 * 1)))) = 720$
In general, the full trace in programming-by-demonstration contains more information that
makes it easier to infer the intended computation. The line between PBD and PBE can be blurry,
however. For some domains (like string manipulation), it’s relatively easy to derive a trace from an output,
and different systems may capture the trace at different levels of fidelity.
History
The idea of directing a computer through examples dates back to the 1970s, when Patrick Winston at MIT published the seminal work “Learning Structural Descriptions from Examples” Winston:1970. This work was among the first to look into the problem of generalizing from a set of observations, although it was not really about trying to automate programming. A good candidate for “first PBD system” is Pygmalion Smith:1976. Pygmalion was framed as an "interactive 'remembering' editor for iconic data structures"; the high-level idea was that the state of a program would be described through icons and relationships between icons. Computation would then correspond to manipulation of these icons, as stated in the paper's "Basic Pygmalion metaphor":
a program is a series of EDITING CHANGES to a DISPLAY DOCUMENT. Input to a program is an initial display document, i.e. a display screen containing images. Programming consists of editing the document. The result of a computation is a modified document containing the desired information. (emphasis in the original) Smith:1976
The idea of a "remembering editor" was that if you perform such a manipulation
by hand once, the editor can remember how to perform the manipulation so you can
apply a similar manipulation in other contexts. Like a lot of early AI work,
Pygmalion was very heavy on philosophy and metaphor and very weak on algorithms,
particularly around the crucial question of how to generalize from a given
demonstration so that the learned program could apply in other situations.
At around the same time, Summers looked more systematically at the question of how to
generalize from a demonstration to a program, particularly how to derive
looping structure Summers:1976 Summers:1977. His algorithm
was based on pattern matching, and was relatively brittle, but is still considered
an important algorithm in the field.
Over time, much of the interest from the AI community shifted to Machine Learning, and to approaches
to infer functions from large amounts of noisy data instead of a small number of careful demonstrations,
with very little progress being made in the PBD and PBE space. There was another burst of interest in the
mid 1990s that is best exemplified by the work of Tessa Lau, which tried to bring back insights from
machine learning into PBE/PBD. Tessa Lau started this work as a graduate student at UW while working with
Daniel Weld and she continued as a researcher at IBM Lau:1998. The goal was to move away from ad-hoc and brittle
approaches and to develop general techniques that could be adapted to a variety of PBE problems.
The work focused on two major techniques: Version space generalization and Inductive logic programming,
both of which will be covered later. This line of work generated a lot of excitement for a while,
but it petered out after it became clear that these techniques were not solving the problem well enough to be practical.
Interestingly, only a couple of years before FlashFill launched the modern wave of programming-by-example systems,
Tessa Lau published an article titled "Why PBD systems fail: Lessons learned for usable AI" Lau:2009,
which articulated many of the pitfalls that prevented the success of PBD systems.
Framing the PBD/PBE Problem
What is a program?
This is a good point to consider the question of what we actually mean when we talk about a program in the context of program synthesis. A program is a description of how to perform a computation. In general, describing a program requires a notation, a programming language that allows you to describe many different computations by composing individual syntactic elements, each with a well defined meaning. We are all familiar with popular programming languages such as Python, JavaScript or C.
At the other extreme, the notation of arithmetic, for example, can also be considered a programming language; it includes
syntactic elements such as numbers and arithmetic operators ($+$, $-$, $\times$), each with
a well defined meaning, which can be used to describe a particular kind of computation. Unlike a language
like Python, the language of arithmetic is not universal; it can only be used to describe
a very narrow class of computations, so we can use it to compute the tip on a restaurant bill, but not
to determine the smallest element in a list of numbers.
As we will see over the rest of this course, the choice of an appropriate notation is crucial to the
success of program synthesis. Even if your end goal is to generate code in Python or C,
it is often useful to frame the program synthesis problem in terms of a narrower notation that
more precisely captures only those programs that are actually relevant to the task at hand.
We will often refer to these more specialized languages as Domain Specific Languages (DSLs).
It is usually convenient to think of them simply as subsets of more general languages, where
the programmer is prevented from using particular constructs, and is provided only with a limited
set of functions or subroutines.
Throughout the rest of this course, we will be defining small DSLs as synthesis targets. Often, it will
be enough to define their semantics informally or through examples; where more precision is warranted,
we will define them in terms of a general purpose programming language. In many settings, we will be using
the notation of functional programming, which will be familiar to anyone who has programmed in Haskell
or OCaml, but may seem a bit foreign to some. We will say more about this notation when we use it,
but at a high level, these are a few things to keep in mind about this notation, and why it makes a good
target for synthesis:
- No side effects. Computation in a functional language happens by evaluating pure functions, with no side effects and no mutation. This can be annoying when programming by hand, but it often simplifies the job of the program synthesizer. A consequence is that functions are truly pure: if you give them the same inputs, they will produce the same outputs. This can also simplify the reasoning process significantly.
- Concise and expressive. Think about a program in Java that reverses a list. It's a relatively long program that involves a class declaration, some method declarations, some loops, some constructors; maybe it would look something like this:

    class ListReverser {
        static List reverseList(List myList) {
            List output = new ArrayList();
            for (int i = 0; i < myList.size(); i++) {
                output.add(myList.get(myList.size() - 1 - i));
            }
            return output;
        }
    }

You can probably write code that is slightly cleaner than the code above, but not by much. From a synthesis perspective, that is a lot of code to write, with a lot of opportunities to get it wrong. By contrast, in Haskell, a function to reverse a list can be defined like this:

    reverse lst = case lst of
        [] -> []
        head:rest -> (reverse rest) ++ [head]

The whole function is a single expression that says that if the list is an empty list, you just return the empty list, and if it is not empty then it will have a head followed by the rest of the list, and then you should reverse the rest of the list and concatenate that with a list containing only the head. This conciseness is very useful when synthesizing programs, because it allows you to synthesize non-trivial programs while only having to discover small amounts of code.
Representing a program
When programming, we are used to representing programs as text, strings of characters with indentation and special characters to indicate, for example, the beginning of a block of code. When synthesizing or manipulating code, however, we want to represent code as a data-structure. The most common representation is an Abstract Syntax Tree (AST), which is just a tree with different kinds of nodes for different kinds of constructs in the language. There is usually a very close correspondence between the structure of the AST and the structure of a parse tree of the program. What makes it abstract is that the data-structure can usually ignore information about things like spacing or special characters like brackets, colons and semicolons that are there just to make the program more readable. As an example, consider the language of arithmetic expressions. The syntax of such a language can be represented as a context free grammar.
$ \begin{array}{lcl}
expr & := & term ~ | \\
~&~& term + expr \\
term & := & ( expr ) ~ | \\
~&~& term * term \\
~&~& N \\
\end{array}
$
The grammar captures a lot of syntactic information about the language. It describes, for example, that
in an expression $ 5 + 3 * 2 $, the multiplication takes precedence over the addition, but we can change
that by adding parentheses around $5 + 3$.
An AST, however, can afford to ignore these syntactic details. For this example, we can define an AST as a
data-structure with three different types of nodes: a Num node for numeric constants, and Plus and Times nodes,
each with two children for the corresponding operands. The expression $5 + 3 * 2$ would then be constructed
as Plus (Num 5) (Times (Num 3) (Num 2)), while $(5 + 3) * 2$ would be constructed
as Times (Plus (Num 5) (Num 3)) (Num 2), or in Java,
new Plus( new Num(5), new Times( new Num(3), new Num(2)))
and
new Times( new Plus( new Num(5), new Num(3)), new Num(2))
respectively. Note that neither AST needs a node for the parentheses; the tree structure itself captures the grouping.
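To make this concrete, the three node types can be written down directly as an algebraic data type. Here is a minimal sketch in Haskell; the eval function is our own illustration of how each node is given meaning, not part of the definition above:

    -- The arithmetic AST from the text as a Haskell data type.
    data Expr = Num Int          -- numeric constant
              | Plus Expr Expr   -- addition node
              | Times Expr Expr  -- multiplication node
              deriving (Show)

    -- Evaluation is a recursive walk over the tree; note that no
    -- precedence or parenthesis rules are needed at this point.
    eval :: Expr -> Int
    eval (Num n)     = n
    eval (Plus a b)  = eval a + eval b
    eval (Times a b) = eval a * eval b

    -- eval (Plus (Num 5) (Times (Num 3) (Num 2)))   == 11
    -- eval (Times (Plus (Num 5) (Num 3)) (Num 2))   == 16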
In order to represent the structure of a DSL in a language independent way, we will often use
context free grammar notation to describe its ASTs. So for example, for the arithmetic language above,
we may simply write:
$
\begin{array}{lll}
\begin{array}{lcl}
expr & := & N \\
~&~&|~ expr + expr \\
~&~&|~ expr * expr \\
\end{array} &
~~~~~~~~~~~~~
\mbox{or}
~~~~~~~~~~~~~
&
\begin{array}{lcl}
expr & := & N \\
~&~&|~ Plus(expr, expr) \\
~&~&|~ Times(expr, expr) \\
\end{array}
\end{array}
$
Search techniques
Explicit Enumeration
One class of search technique is explicit enumeration. At a high level, the idea is to explicitly construct different programs until one finds a program that satisfies the observations. In general, though, the space of possible programs that one can generate to satisfy a given specification is too large to enumerate efficiently, so a key aspect of these approaches is how to avoid generating programs that have no hope of satisfying the observations, or which can be shown to be redundant with programs we have already enumerated.
An important distinction in explicit enumeration techniques is whether they are top down or bottom up. In bottom up enumeration, the idea is to start by discovering low-level components and then discover how to assemble them into larger programs. By contrast, top-down enumeration starts by trying to discover the high-level structure of the program first, and from there it tries to enumerate the low-level fragments. Essentially, in both cases we are explicitly constructing ASTs, but in one case we are constructing them from the root down, and in the other we are constructing them from the leaves up. For example, suppose we want to discover the program $reduce ~ (map ~ in ~ \lambda x. x + 5) ~ 0 ~ (\lambda x. \lambda y. (x + y))$. In a bottom up search, you start with expressions like $(x+y)$ and $(x+5)$, build from those expressions to functions such as $\lambda x. x + 5$, and from there assemble the full program. In contrast, a top down search would start with an expression such as $reduce ~ \Box ~ \Box ~ \Box$, then discover that the first parameter to $reduce$ is $map ~ \Box ~ \Box$, and progressively complete the program down to the low-level expressions.
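To illustrate the bottom up flavor on the arithmetic language from earlier, here is a small sketch reusing the Expr and eval declarations above. It grows a pool of ASTs round by round and stops at the first one matching a goal value; the choice of constants 1 through 5 and the absence of any pruning are simplifications for the example:

    import Data.List (find)

    -- Leaves of the search: constant expressions. (The range 1..5 is an
    -- arbitrary choice for this sketch.)
    leaves :: [Expr]
    leaves = [ Num n | n <- [1 .. 5] ]

    -- One round of bottom up growth: keep everything built so far and
    -- add every way of combining two built ASTs with each operator.
    -- A real enumerator would prune semantically redundant programs.
    grow :: [Expr] -> [Expr]
    grow built = built ++ [ op a b | op <- [Plus, Times], a <- built, b <- built ]

    -- After d rounds of growth, return the first program whose value
    -- matches the goal, if any.
    bottomUp :: Int -> Int -> Maybe Expr
    bottomUp d goal = find (\e -> eval e == goal) (iterate grow leaves !! d)

    -- bottomUp 2 11 returns the first enumerated AST that evaluates to 11.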
Symbolic search
In explicit search, the synthesizer always maintains one or more partially constructed programs that it is currently considering. By contrast, in symbolic search techniques the synthesizer maintains a symbolic representation of the space of all programs that are considered valid. Different symbolic representations lead to different search algorithms; two of the most popular symbolic representations in use today are Version Space Algebras and Constraint Systems. As an analogy, suppose we want to search for an integer value of $n$ such that $4*n = 28$. An enumerative search would try the values one by one until it got to $n=7$, and then it would declare success. By contrast, a symbolic search technique may perform some algebraic manipulation to deduce that $n=28/4=7$. In this case, symbolic search is clearly better, but even for arithmetic, symbolic manipulation is not always the best choice. Binary search, for example, can be considered a form of explicit search, and it is quite effective at finding solutions to equations that are too complicated to manipulate algebraically, as sketched below.
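To make that last point concrete, here is a sketch of binary search as explicit search: it solves $4*n = 28$ by testing concrete values of $n$, using only the fact that the function is monotone to discard half of the remaining range at each step. The solve function and its bounds are our own illustration:

    -- Binary search for n in [lo, hi] with f n == target, assuming f is
    -- monotonically increasing. This is still explicit search: we only
    -- ever evaluate f on concrete candidate values.
    solve :: (Int -> Int) -> Int -> Int -> Int -> Maybe Int
    solve f target lo hi
      | lo > hi         = Nothing
      | f mid == target = Just mid
      | f mid < target  = solve f target (mid + 1) hi
      | otherwise       = solve f target lo (mid - 1)
      where mid = (lo + hi) `div` 2

    -- solve (* 4) 28 0 100 == Just 7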
Defining the space of programs
One of the key design decisions when using program synthesis is the space of programs that will be considered. One way to do this is to simply define a small domain specific language, and then consider all possible programs within this language. Rather than using a specific programming language to describe the ASTs for the target language, it is common to describe the ASTs in the form of a context free grammar. Defining the space of programs as all the valid programs in a DSL has a number of advantages, especially when the language has a simple structure that can be defined as a context free grammar. First, it is easy to enumerate all programs in the DSL, or to sample randomly from the space, which makes this approach especially popular with enumerative search strategies. In addition to a context free grammar, the language may have a type system that can help rule out solutions that are clearly illegal. In contrast, constraint-based approaches often rely on parametric representations of the space, where different choices of parameters correspond to different choices for what the program will look like. Parametric representations are more general than grammars; you can usually encode the space represented by a grammar as a parametric representation as long as you are willing to bound the length of the programs you want to consider. These parametric programs are often referred to as generative models, especially when the choices controlled by the free parameters have probabilities associated with them. In future lectures we will further explore these different design choices.
Example
As a running example, consider the following language:
$
\begin{array}{rcll}
lstExpr & := & sort(lstExpr) & \mbox{sorts a list given by lstExpr.} \\
~ & ~ & lstExpr[intExpr,intExpr] & \mbox{selects sub-list from the list given by the start and end position}\\
~ & ~ & lstExpr + lstExpr & \mbox{concatenates two lists}\\
~ & ~ & recursive(lstExpr) & \mbox{calls the program recursively on its argument list; if the list is empty, returns empty without a recursive call} \\
~ & ~ & [0] & \mbox{a list with a single entry containing the number zero} \\
~ & ~ & in & \mbox{the input list } in \\
intExpr &:= & firstZero(lstExpr) & \mbox{position of the first zero in a list} \\
~ & ~ & len(lstExpr) & \mbox{length of a given list} \\
~ & ~ & 0 & \mbox{constant zero} \\
~ & ~ & intExpr + 1 & \mbox{adds one to a number} \\
\end{array}
$
In this language, there are two types of expressions, list expressions $lstExpr$, which evaluate to a list, and
integer expressions $intExpr$ which evaluate to an integer. Programs in this language have only one input,
a list $in$.
The language is quite rich; it includes recursion, concatenation, sorting and search, and you can write a ton of interesting programs with it. For example, the program to reverse a list would be written as follows:
$ recursive(in[0 + 1, len(in)]) + in[0, 0] $
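For extra precision, the sketch below gives one possible interpreter for this DSL in Haskell. The grammar above does not pin down the exact semantics of $lstExpr[intExpr, intExpr]$, so this sketch assumes zero-based positions, inclusive on both ends, with the end position clamped to the last element; under that reading the reverse program above behaves as described. The constructor names and the behavior of $firstZero$ on a list with no zeros are also our own assumptions:

    import Data.List (sort)

    -- ASTs for the two kinds of expressions. (Constructor names are ours.)
    data LstExpr = Sort LstExpr                    -- sort(lstExpr)
                 | Slice LstExpr IntExpr IntExpr   -- lstExpr[intExpr, intExpr]
                 | Concat LstExpr LstExpr          -- lstExpr + lstExpr
                 | Recursive LstExpr               -- recursive(lstExpr)
                 | ZeroList                        -- [0]
                 | In                              -- the input list

    data IntExpr = FirstZero LstExpr               -- firstZero(lstExpr)
                 | Len LstExpr                     -- len(lstExpr)
                 | Zero                            -- 0
                 | Succ IntExpr                    -- intExpr + 1

    -- The whole program is threaded through so that Recursive can
    -- re-run it on a new input list.
    evalL :: LstExpr -> LstExpr -> [Int] -> [Int]
    evalL prog e input = case e of
        Sort a      -> sort (go a)
        Slice a i j -> let lst = go a
                           lo  = evalI prog i input
                           hi  = min (evalI prog j input) (length lst - 1)
                       in [ lst !! k | k <- [lo .. hi] ]  -- inclusive, clamped
        Concat a b  -> go a ++ go b
        Recursive a -> let arg = go a
                       in if null arg then [] else evalL prog prog arg
        ZeroList    -> [0]
        In          -> input
      where go x = evalL prog x input

    evalI :: LstExpr -> IntExpr -> [Int] -> Int
    evalI prog e input = case e of
        FirstZero a -> length (takeWhile (/= 0) (evalL prog a input))
                       -- returns the length if no zero is present (assumption)
        Len a       -> length (evalL prog a input)
        Zero        -> 0
        Succ i      -> evalI prog i input + 1

    run :: LstExpr -> [Int] -> [Int]
    run prog = evalL prog prog

    -- The reverse program from the text, recursive(in[0+1, len(in)]) + in[0, 0]:
    -- run (Concat (Recursive (Slice In (Succ Zero) (Len In)))
    --             (Slice In Zero Zero)) [3, 1, 2]  ==  [2, 1, 3]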
Symmetries
One important aspect in defining the space of programs is the question of symmetries. In program synthesis, we say that a program space has a lot of symmetries if there are many different ways of representing the same program. For example, consider the following grammar:
$ \begin{array}{lcl}expr & := & var * N ~ | \\ ~&~&expr + expr \end{array} $
Now, if we wanted to generate the expression $w*5+ x*2 + y*3 + z*2$, the grammar above
allows us to generate it in many different ways.
$ \begin{array}{c}
(w*5+ x*2) + (y*3 + z*2) \\
w*5+ (x*2 + (y*3 + z*2)) \\
w*5+ ((x*2 + y*3) + z*2) \\
((w*5+ x*2) + y*3) + z*2 \\
\ldots
\end{array} $
So the grammar above is said to have a lot of symmetries. By contrast, we can define a program
space with the grammar below.
$ \begin{array}{lcl}expr & := & var * N ~ | \\ ~&~&(var * N) + expr \end{array} $
Now, only the second expression in the list above can be generated by this grammar. This grammar
in effect forces right associativity of arithmetic expressions, significantly reducing the symmetries
in the search space. There are still symmetries due to commutativity of addition, but we have
eliminated at least one source of them.
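To get a feel for how fast these symmetries multiply, note that under the first grammar every way of fully parenthesizing a sum of $n$ terms is a distinct AST for the same expression, and the counts follow the Catalan numbers. A short sketch to compute them:

    -- Number of distinct ASTs the ambiguous grammar assigns to a sum
    -- of n terms: each AST splits the sum into a left part of k terms
    -- and a right part of n - k terms.
    brackets :: Int -> Integer
    brackets 1 = 1
    brackets n = sum [ brackets k * brackets (n - k) | k <- [1 .. n - 1] ]

    -- brackets 4 == 5: the four ASTs listed above plus
    -- (w*5 + (x*2 + y*3)) + z*2. With 10 terms there are already
    -- 4862 ASTs; the right-associative grammar admits exactly one.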
Does this matter? It depends on the search technique and on the representation of the search space.
Constraint based techniques and some enumerative techniques can be extremely sensitive to symmetries, and will benefit enormously from
a representation of the space that eliminates as many of them as possible. On the other hand, there are
some techniques that we will study that are mostly oblivious to symmetries.