Introduction to Program Synthesis

© Armando Solar-Lezama. 2018, 2025. All rights reserved. © Theo X. Olausson. 2025. All rights reserved.

Lecture 2: Introduction to Inductive Synthesis

One of the simplest interfaces for program synthesis is inductive synthesis. In inductive synthesis, the goal is to generate a function that matches a given set of input/output examples. The literature makes a distinction between Programming by Example (PBE) and Programming by Demonstration (PBD). In Programming by Example, the goal is to infer a function given only a set of inputs and outputs, whereas in Programming by Demonstration, the user also provides a trace of how the output was computed.

For example, in Programming by Example, if I want to convey to the system that I want it to synthesize the factorial function, I may give it an example:

$\text{factorial}(6) = 720$
As one can see, this is a highly under-specified problem, since there is an enormous number of possible functions that may return $720$ given the input $6$. By contrast, with Programming by Demonstration, one may provide a more detailed trace of the computation:
$\text{factorial}(6) = 6 * (5 * (4 * (3 * (2 * 1)))) = 720$
In general, the full trace in programming-by-demonstration contains more information that makes it easier to infer the intended computation. The line between PBD and PBE can be blurry, however. For some domains (like string manipulation), it’s relatively easy to derive a trace from an output, and different systems may capture the trace at different levels of fidelity.

History

The idea of directing a computer through examples dates back to the 1970s, when Patrick Winston at MIT published the seminal work “Learning Structural Descriptions from Examples”Winston:1970. This work was among the first to look into the problem of generalizing from a set of observations, although it was not really about trying to automate programming.

A good candidate for “first PBD system” is PygmalionSmith:1976. Pygmalion was framed as an "interactive 'remembering' editor for iconic data structures"; the high-level idea was that a program would move icons around and establish relationships between them, but the user would not write the program itself; instead, the user would manipulate the icons directly and the editor would remember the manipulations and be able to apply them in other contexts. This was explained through their "Basic pygmalion metaphor":

a program is a series of EDITING CHANGES to a DISPLAY DOCUMENT. Input to a program is an initial display document, i.e. a display screen containing images. Programming consists of editing the document. The result of a computation is a modified document containing the desired information. (emphasis in the original)Smith:1976

The idea of a "remembering editor" was that if you perform such a manipulation by hand once, the editor can remember how to perform the manipulation so you can apply a similar manipulation in other contexts. Like a lot of early AI work, PYGMALION was very heavy on philosophy and metaphor and very weak on algorithms, particularly around the crucial question of how to generalize from a given demonstration so that the learned program could apply in other situations.

At around the same time, Summers looked more systematically at the question of how to generalize from a demonstration to a program, particularly at how to derive looping structure Summers:1976Summers:1977. His algorithm was based on pattern matching and was relatively brittle, but it is still considered an important algorithm in the field. Lecture2:Slide8; Lecture2:Slide9; Lecture2:Slide11;

Over time, much of the interest from the AI community shifted to machine learning and to approaches that infer functions from large amounts of noisy data instead of a small number of careful demonstrations, and very little progress was made in the PBD and PBE space. There was another burst of interest in the mid-1990s that is best exemplified by the work of Tessa Lau, which tried to bring insights from machine learning back into PBE/PBD. Tessa Lau started this work as a graduate student at UW while working with Daniel Weld, and she continued it as a researcher at IBM Lau:1998. The goal was to move away from ad-hoc and brittle approaches and to develop general techniques that could be adapted to a variety of PBE problems.

The work focused on two major techniques: Version space generalization (which will be discussed later) and Inductive logic programming (which is beyond the scope of this course). This line of work generated a lot of excitement for a while, but it petered out once it became clear that these techniques were not solving the problem well enough to be practical. Interestingly, only a couple of years before FlashFill launched the modern wave of programming-by-example systems, Tessa Lau published an article titled "Why PBD systems fail: Lessons learned for usable AI"Lau09, which articulated many of the pitfalls that prevented the success of PBD systems.

Framing the PBD/PBE Problem

Lecture2:Slide12 There are two core challenges in the PBE/PBD paradigm. The first challenge is: how do you find a program that matches the observations, where the observations can be either input/output examples or richer execution traces as in PBD? The second challenge is: how do you know the program you found is the one you were actually looking for? At the end of the day, both PBD and PBE are fundamentally under-specified problems, and there is a potentially large space of possible programs that match the given observations, so how do we know which one is the one the user actually wants?

In traditional machine learning, the focus has historically been on the second challenge. The trick has been to pick spaces of programs that are either: extremely expressive (e.g. neural networks), so that there are many different ways to match any set of observations and the challenge reduces to avoiding over-training; or too restricted (e.g. SVMs or Gaussian Linear Models), such that it is more or less impossible to match all the observations, but you assume some observations are wrong anyway, so you can trade off how many samples you match against other criteria that make it more likely that your solution will work well enough in general.

The modern emphasis in PBE, however, has been to focus more on the space of programs itself. The focus on restricting the space of programs is not new; many early systems did this to an extreme degree by just having a short list of programs that the system would scan through looking for one that matched the examples. What recent advances in synthesis have brought to the table are powerful mechanisms to search arbitrary program spaces. This has allowed us to design the space of programs in a way that excludes undesirable solutions from the space and focuses the search on "reasonable" programs. The ability to carefully control the program space does not completely eliminate the need to rank programs to give priority to the most likely ones; even with a carefully designed program space, the problem is still underspecified. But the idea is that if you can search a large but highly constrained space efficiently, you are more likely to get what you are looking for. By having the ability to reason about arbitrary (and very large) spaces of programs, you can get the benefits of the list-of-programs approach without its inherent brittleness.

What is a program?

This is a good point to consider the question of what we actually mean when we talk about a program in the context of program synthesis. A program is a description of how to perform a computation. In general, describing a program requires a notation, a programming language that allows you to describe many different computations by composing individual syntactic elements, each with a well defined meaning. We are all familiar with popular programming languages such as Python, JavaScript or C. At the other extreme, the notation of arithmetic, for example, can also be considered a programming language; it includes syntactic elements such as numbers and arithmetic operators ($+$, $-$, $\times$), each with a well defined meaning, which can be used to describe a particular kind of computation. Unlike a language like Python, the language of arithmetic is not universal; it can only be used to describe a very narrow class of computations, so we can use it to compute the tip on a restaurant bill, but not to determine the smallest element in a list of numbers.

As we will see over the rest of this course, the choice of an appropriate notation is crucial to the success of program synthesis. Even if your end goal is to generate code in Python or C, it is often useful to frame the program synthesis problem in terms of a narrower notation that more precisely captures only those programs that are actually relevant to the task at hand. We will often refer to these more specialized languages as Domain Specific Languages (DSLs). It is usually convenient to think of them simply as subsets of more general languages, where the programmer is prevented from using particular constructs, and is provided only with a limited set of functions or subroutines.

Throughout the rest of this course, we will be defining small DSLs as synthesis targets. Often, it will be enough to define their semantics informally or through examples; where more precision is warranted, we will define them in terms of a general purpose programming language. In many settings, we will be using the notation of functional programming, which will be familiar to anyone who has programmed in Haskell or OCaml, but may seem a bit foreign to some. We will say more about this notation, and about why it makes a good target for synthesis, when we use it.

There are long-running debates that sometimes get very religious in nature as to what is the best programming language for this or that purpose. For this course, though, we are not really interested in the question of what language you should use for writing a particular system. What we really care about is the notation that we are going to use when framing a synthesis problem; this notation will often have to be problem specific, but it is very important that this notation be concise and have enough expressiveness to solve our problem, but not much more.

Representing Programs

In standard programming, programs are represented as strings of text that must follow a grammar, but this representation used to be frowned upon for program synthesis because it is very sparse (most strings are not valid programs), and because it is wasteful to represent programs as strings just to immediately parse those strings into a data structure. In recent years, though, string representations have gained significant popularity thanks to the success of language models. Today, strings and structured representations are likely to coexist in systems that combine symbolic and neural techniques.

For techniques that predate language models, the preferred representation has traditionally been a data structure known as an Abstract Syntax Tree (AST), which is just a tree with different kinds of nodes for different kinds of constructs in the language. There is usually a very close correspondence between the structure of the AST and the structure of a parse tree of the program. What makes it abstract is that the data structure can usually ignore information about things like spacing or special characters like brackets, colons and semicolons, since the information they encode can instead be captured by the structure of the AST. As an example, consider the language of arithmetic expressions. The syntax of such a language can be represented as the following CFG:

$ \begin{array}{lcl} expr & := & term ~ | \\ ~&~& term + expr \\ term & := & ( expr ) ~ | \\ ~&~& term * term \\ ~&~& N \\ \end{array} $
The grammar captures a lot of syntactic information about the language. It describes, for example, that in an expression $5 + 3 * 2$, the multiplication takes precedence over the addition, but we can change that by adding parentheses around $5 + 3$. An AST, however, can afford to ignore these syntactic details. For this example, we could instead define an AST as a data structure with three different types of nodes:

data AST = Num Int | Plus AST AST | Times AST AST

Note that the distinction between expressions and terms is no longer relevant in this abstract notation; it was only introduced in the grammar for the purpose of disambiguating the parsing of expressions like $5 + (3 * 2)$ and $(5 + 3) * 2$. Instead, this difference is directly reflected in the structure of the tree. The first would be constructed as Plus (Num 5) (Times (Num 3) (Num 2)), while the second would be constructed as Times (Plus (Num 5) (Num 3)) (Num 2).
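To make the connection between an AST and the computation it describes concrete, here is a minimal evaluator for this datatype written in Haskell. The function eval is our own addition for illustration and is not part of the definition above:

-- The AST datatype from above, plus an evaluator that assigns a meaning
-- (an integer value) to every tree.
data AST = Num Int | Plus AST AST | Times AST AST

eval :: AST -> Int
eval (Num n)     = n
eval (Plus a b)  = eval a + eval b
eval (Times a b) = eval a * eval b

-- eval (Plus (Num 5) (Times (Num 3) (Num 2)))  evaluates to 11
-- eval (Times (Plus (Num 5) (Num 3)) (Num 2))  evaluates to 16

Note that the evaluator never needs to consult parentheses or precedence rules; that information is already encoded in the shape of the tree.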

As mentioned before, synthesis approaches based on deep learning generally prefer to represent programs as strings. In the early days, it was not entirely clear that this was the right approach due to all the aforementioned advantages of structured representations. But string representations made it possible to benefit from the significant engineering efforts being put into neural architectures for natural language processing, and they also made it possible to benefit from pretrained models trained on large amounts of data from the internet, so strings (or more precisely token sequences as we will see in Unit 2) eventually replaced all other representations when it came to working with neural models. Nevertheless, structured representations remain essential for symbolic approaches, and they are also useful in hybrid approaches that combine symbolic and neural techniques.

Solving the Invention Challenge in PBE

Let us now return to the problem at hand. Having decided to represent specifications as input/output pairs, we can shift our attention to the Invention challenge. This will require answering two questions: what the space of programs is, and how it is going to be searched.

Defining the Program Space

Having decided on a program representation, the next step is to define the space of programs that the synthesizer is allowed to consider.

Option 1: Domain-Specific Languages Perhaps the most natural way to define a space of programs is to construct a small domain-specific language (DSL), and then set the program space to be the set of all possible programs that lie within this language. A DSL will typically be defined through a context free grammar (CFG) or directly as an AST datatype, with semantics associated with each element and their composition. Both the CFG representation and ASTs have the advantage that it is easy to derive all of the possible programs in the language by simply expanding the non-terminal symbols in the grammar, which makes this representation particularly suitable for enumerative search strategies (which we will discuss shortly). In addition to a context free grammar and a semantics, it is often very useful to have a type system associated with the language. The type system provides an efficient mechanism to rule out programs that, while legal with respect to the grammar, are not well formed with respect to the semantics of the language.

Option 2: Parametric Representations In contrast, constraint-based approaches often rely on parametric representations of the space, where different choices of parameters correspond to different choices for what the program will look like. Parametric representations are more general than grammars; you can usually encode the space represented by a grammar with a parametric representation as long as you are willing to bound the length of programs you want to consider---this is because parametric representations assume the set of parameters is fixed. Sketches, which we will discuss in Lecture 5, are a good example of this kind of representation.

Option 3: Ad-hoc restrictions In some cases, it is necessary to restrict the space of programs in an ad-hoc manner, for example, ruling out programs beyond a certain length, or allowing them to use a particular construct only a few times. This is often done in order to make the search more efficient, or to avoid generating programs that are too complex to be useful. These kinds of ad-hoc restrictions are often used on top of a DSL or a parametric representation, but in principle they can be used directly on top of a general programming language such as Python or Java.

All the options above provide different ways to define a space of programs that is constrained enough to be searched efficiently. When using traditional synthesis techniques, it is relatively easy to focus the synthesis process to consider only programs in a particular space. With neural techniques, however, it can be harder to force the synthesizer to consider only programs within the desired space, but techniques such as constrained decoding and prompt engineering can be effective as we will see in Lecture xx.

Example

As a running example, consider the following language:
$ \begin{array}{rcll} lstExpr & := & sort(lstExpr) & \mbox{sorts a list given by lstExpr.} \\ ~ & ~ & lstExpr[intExpr,intExpr] & \mbox{selects sub-list from the list given by the start and end position}\\ ~ & ~ & lstExpr + lstExpr & \mbox{concatenates two lists}\\ ~ & ~ & recursive(lstExpr) & \mbox{calls the program recursively on its argument list; if the list is empty, returns empty without a recursive call} \\ ~ & ~ & [0] & \mbox{a list with a single entry containing the number zero} \\ ~ & ~ & in & \mbox{the input list } in \\ intExpr &:= & firstZero(lstExpr) & \mbox{position of the first zero in a list} \\ ~ & ~ & len(lstExpr) & \mbox{length of a given list} \\ ~ & ~ & 0 & \mbox{constant zero} \\ ~ & ~ & intExpr + 1 & \mbox{adds one to a number} \\ \end{array} $
In this language, there are two types of expressions, list expressions $lstExpr$, which evaluate to a list, and integer expressions $intExpr$ which evaluate to an integer. Programs in this language have only one input, a list $in$. On the one hand, the language is very rich; it includes recursion, concatenation, sorting, search; you can write a ton of interesting programs with this language. For example, the program to reverse a list would be written as follows:
$ recursive(in[0 + 1, len(in)]) + in[0, 0] $
Now, consider the following input/output example:

in:  [1,2,3,4,5,6,7,8]
out: [8,7,6,5,4,3,2,1]

If I had the full expressiveness of a general purpose language, say Haskell or Python, there would be an infinite number of programs that could potentially match the example above. But our sample DSL is much more restricted; in fact, it is so restricted that the shortest program matching the example above is actually the correct reversal program. So we can see that the right choice of language can have a significant impact on our ability to discover programs.
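To give a sense of how such a DSL would be represented inside a synthesizer, here is one possible encoding of the grammar above as a pair of Haskell datatypes, together with the reversal program written as a value of that type. The constructor names are our own choice; the grammar itself is the one given above:

-- One possible AST encoding of the list DSL (constructor names are ours).
data LstExpr = Sort LstExpr                    -- sort(lstExpr)
             | Slice LstExpr IntExpr IntExpr   -- lstExpr[intExpr, intExpr]
             | Concat LstExpr LstExpr          -- lstExpr + lstExpr
             | Recursive LstExpr               -- recursive(lstExpr)
             | ZeroList                        -- [0]
             | In                              -- the input list in

data IntExpr = FirstZero LstExpr               -- firstZero(lstExpr)
             | Len LstExpr                     -- len(lstExpr)
             | Zero                            -- 0
             | PlusOne IntExpr                 -- intExpr + 1

-- The reversal program recursive(in[0+1, len(in)]) + in[0,0] from above:
reverseProg :: LstExpr
reverseProg =
  Concat (Recursive (Slice In (PlusOne Zero) (Len In)))
         (Slice In Zero Zero)

Searching this program space then amounts to enumerating or otherwise exploring values of type LstExpr and checking them against the examples, which is the topic of the next section.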

Searching the Program Space

Lecture2:Slide17

Explicit Enumeration

One class of search techniques is Explicit enumeration. At a high level, the idea is to explicitly construct different programs until one finds a program that satisfies the observations. In general, though, the space of possible programs that one can generate to satisfy a given specification is too large to enumerate efficiently, so a key aspect of these approaches is how to avoid generating programs that have no hope of satisfying the observations, or which can be shown to be redundant with other programs we have already enumerated. An important distinction in explicit enumeration techniques is whether they are top down or bottom up. In bottom-up enumeration, the idea is to start with low-level components and then discover how to assemble them together into larger programs. By contrast, top-down enumeration starts by trying to discover the high-level structure of the program first, and from there it tries to enumerate the low-level fragments. Essentially, in both cases we are explicitly constructing ASTs, but in one case we are constructing them from the root down, and in the other case we are constructing them from the leaves up.

For example, suppose we want to discover the program $reduce ~ (map ~ in ~ \lambda x. x + 5) ~ 0 ~ (\lambda x. \lambda y. (x + y))$. In a bottom-up search, you start with expressions like $(x+y)$ and $(x+5)$, build up from those expressions to functions such as $\lambda x. x + 5$, and from there assemble the full program. In contrast, a top-down search would start with an expression such as $reduce ~ \Box ~ \Box ~ \Box$, then discover that the first parameter to reduce is $map ~ \Box ~ \Box$, and progressively complete the program down to the low-level expressions.
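To make the bottom-up flavor concrete, here is a minimal sketch (not any particular system's algorithm) of a bottom-up enumerator in Haskell, over a variant of the arithmetic AST from the previous section extended with an input variable. The names grow and bottomUp, the choice of leaf constants, and the fixed growth depth are our own simplifications; practical enumerators also aggressively prune programs that are observationally equivalent on the examples, which this sketch omits:

-- A toy program space: arithmetic expressions over one integer input.
data Expr = Input | Num Int | Plus Expr Expr | Times Expr Expr
  deriving Show

run :: Int -> Expr -> Int
run x Input       = x
run _ (Num n)     = n
run x (Plus a b)  = run x a + run x b
run x (Times a b) = run x a * run x b

-- One bottom-up step: keep every existing program and add every way of
-- combining two existing programs with a binary operator.
grow :: [Expr] -> [Expr]
grow ps = ps ++ [op a b | op <- [Plus, Times], a <- ps, b <- ps]

-- Grow the population a fixed number of times and return the first program
-- that agrees with every input/output example.
bottomUp :: Int -> [(Int, Int)] -> Maybe Expr
bottomUp depth examples =
  let leaves    = Input : map Num [0, 1, 2]
      pool      = iterate grow leaves !! depth
      matches p = all (\(i, o) -> run i p == o) examples
  in case filter matches pool of
       (p : _) -> Just p
       []      -> Nothing

-- For example, bottomUp 2 [(1, 6), (2, 7)] finds a program that behaves
-- like \x -> x + 5.

Even this naive version illustrates why pruning matters: the pool roughly squares with every growth step, which is exactly the blow-up that the techniques in the next few lectures are designed to control.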

While enumeration may sound a bit brutish, its power should not be underestimated. Often, it is the simplest techniques that lend themselves to the most efficient implementations, and as Richard Sutton highlighted in the Bitter Lesson, making effective use of computational resources will in time often trump any cleverness in the design of the algorithm. Furthermore, as we will see in the next few lectures, there are many ways to constrain the space which we must enumerate without sacrificing much in terms of simplicity.

Symbolic Search

In explicit search, the synthesizer always maintains one or more partially constructed programs that it is currently considering. By contrast, in symbolic search techniques the synthesizer maintains a symbolic representation of the space of all programs that are considered valid. Different symbolic representations lead to different search algorithms. Two of the most popular symbolic representations in use today are Version Space Algebras and Constraint Systems, which we will briefly touch on in Lecture 7.

As an analogy, suppose we want to search for an integer value of $n$ such that $4*n = 28$. An enumerative search would try all the values one by one until it got to $n=7$, and then it would declare success. By contrast, a symbolic search technique may perform some algebraic manipulation to deduce that $n=28/4=7$. In this case, symbolic search is clearly better, but even for arithmetic, symbolic manipulation is not always the best choice. Binary search, for example, can be considered a form of explicit search that is actually quite effective in finding solutions to equations that may be too complicated to do algebraic manipulation efficiently.

A Brief Note on Symmetries

One important aspect of defining the space of programs which we have so far glossed over is the question of symmetries. In program synthesis, we say that a program space has a lot of symmetries if there are many different ways of representing the same program. For example, consider the following grammar:
$ \begin{array}{lcl}expr & := & var * N ~ | \\ ~&~&expr + expr \end{array} $
Now, if we wanted to generate the expression $w*5+ x*2 + y*3 + z*2$, the grammar above allows us to generate it in many different ways.
$ \begin{array}{c} (w*5+ x*2) + (y*3 + z*2) \\ w*5+ (x*2 + (y*3 + z*2)) \\ w*5+ ((x*2 + y*3) + z*2) \\ ((w*5+ x*2) + y*3) + z*2 \\ \ldots \end{array} $
So the grammar above is said to have a lot of symmetries. By contrast, we can define a program space with the grammar below.
$ \begin{array}{lcl}expr & := & var * N ~ | \\ ~&~&(var * N) + expr \end{array} $
Now, only the second expression in the list above can be generated by this grammar. This grammar in effect forces right associativity of arithmetic expressions, significantly reducing the symmetries in the search space. There are still symmetries due to commutativity of addition, but we have eliminated at least one source of them. Does this matter? It depends on the search technique and on the representation of the search space we are using. Constraint-based techniques and some enumerative techniques can be extremely sensitive to symmetries, and will benefit enormously from a representation of the space that eliminates as many of them as possible. On the other hand, there are some techniques that we will study that are mostly oblivious to symmetries.

A (Bayesian) Probabilistic View of PBE

Lecture2:Bayes; Lecture2:BayesForPrograms; So far, we have assumed that the goal of the synthesizer is to find a program that matches all the given input/output examples, but the problem can also be framed in probabilistic terms. This view has many advantages, from being more tolerant of errors to giving us another avenue for incorporating prior knowledge. The key idea is to use Bayes' theorem to derive the probability of a program given some observed evidence. The evidence will generally be input/output examples, but it could also be other observations about the behavior of the program. Bayes' theorem allows us to express the desired probability in terms of a prior probability distribution $P(p)$ over the programs, as well as a likelihood function $P(e|p)$ that tells us how likely it is that a program $p$ will produce the evidence $e$. Using Bayes' rule, we can then obtain the posterior probability of a program given the evidence, $P(p|e)$, at least up to a normalization constant. Supposing we then had an efficient algorithm for finding the program $p^*$ that maximizes the posterior probability, we would have a solution to the synthesis problem. This formulation can thus be seen as a strict generalization of the original PBE problem, since PBE is equivalent to the case where we take the evidence $e$ to be a set of $N$ input/output examples $\{(in_i, out_i)\}_{i=1}^N$ and we define the likelihood function to be $P(e\mid p) = 1$ if $p$ produces the output $out_i$ for the input $in_i$ for all $i$, and $0$ otherwise.
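In symbols, this simply restates the argument above, writing $p(in_i)$ for the output of program $p$ on input $in_i$:
$ P(p \mid e) = \frac{P(e \mid p)\, P(p)}{P(e)} \propto P(e \mid p)\, P(p), \qquad p^* = \arg\max_{p} P(e \mid p)\, P(p) $
and exact PBE corresponds to the likelihood
$ P(e \mid p) = \begin{cases} 1 & \mbox{if } p(in_i) = out_i \mbox{ for all } i \in \{1, \ldots, N\} \\ 0 & \mbox{otherwise.} \end{cases} $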

Lecture20:Slide10; Lecture20:Slide11; Lecture20:Slide12; Lecture20:Slide13; There are a few advantages to this probabilistic view. The first is that it allows us to deal with the case of noisy evidence, by defining the likelihood function to be non-zero even when the program does not match all the examples. For example, an interesting case to consider is the case where off-by-one errors are possible in the data. The figure illustrates a possible distribution under such an assumption. An interesting observation is that the possibility of errors in the data introduces a necessary tradeoff between the probability of a function and the amount of error it generates.
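As a purely illustrative example (not taken from any particular system), suppose each output is a single number and off-by-one errors occur with some small rate $\epsilon$; one could then define the likelihood as
$ P(e \mid p) = \prod_{i=1}^{N} \begin{cases} 1 - \epsilon & \mbox{if } p(in_i) = out_i \\ \epsilon / 2 & \mbox{if } |p(in_i) - out_i| = 1 \\ 0 & \mbox{otherwise.} \end{cases} $
Under such a likelihood, a program that misses a few examples by one can still end up with the highest posterior, provided its prior probability is sufficiently higher than that of any program that matches the examples exactly.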

Ellis et al.Ellis2018 explored such a tradeoff between the prior over functions and the output error in their 2018 paper on synthesizing drawing programs from hand-drawn diagrams. This paper uses a neural network to translate a hand-drawn diagram into straight-line code that issues individual drawing commands, and then relies on the Sketch synthesizer to produce a program that is semantically equivalent to the generated sequence of commands. A prior over the space of programs favors programs with loops rather than long sequences of individual drawing commands, and penalizes branches that special-case individual loop iterations.

Lecture20:Slide15; Lecture20:Slide16 This prior over programs can be traded off against a simple error model that tolerates small errors in the drawing commands, allowing the synthesizer to make up for small errors in perception. This is illustrated in the second panel of the figure on the right. The first column shows two hand drawings, and the middle column shows the result of running the commands generated by the neural network. The last column shows the result of the most likely program, which trades off accuracy against the result of the neural network in exchange for producing a simpler (and therefore higher prior probability) program.

A second advantage of this probabilistic view is that it allows us to think about situations analogous to unsupervised learning, where the evidence consists only of output examples, and no input examples are provided. We will not cover this in detail in this course, but the interested reader can refer to an early work by Ellis, Tenenbaum and Solar-LezamaEllisST15. Finally, this probabilistic view allows us to bring learning into the search problem. For example, we could fit a probability distribution over the grammar, giving us a learnt prior $p_\theta$ that we could use to alleviate the issue of under-specification (since the prior tells us which of the correct programs we prefer). We could even have a neural network learn the posterior probability directly as $p_\theta(p \mid e)$, and then use that to guide the search. This is essentially the approach taken by the DeepCoder project of Balog et al. BalogGBNT17, which used a neural network to encode certain features of the specification into a vector, and then decode it into a vector where each entry corresponds to the probability of each element in the grammar. Such a network can then be trained to identify which components are more likely to appear in a program given a particular specification. However, there is a catch to this framework: finding $p^*$ is intractable for most combinations of the prior and likelihood functions. As mentioned above, we could learn an approximate posterior $p_\theta(p \mid e)$ and use that to guide the search for a high-likelihood program $p$, but then we would be forced to inherit all of the limitations of (deep) learning.
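As a concrete instantiation of the learnt prior over the grammar mentioned above, one common choice (though by no means the only one) is a probabilistic context-free grammar, which attaches a probability $\theta_r$ to each production rule $r$ of the DSL and scores a program by the rules used in its derivation:
$ p_\theta(p) = \prod_{r} \theta_r^{\,c_r(p)} $
where $c_r(p)$ is the number of times rule $r$ appears in the derivation of $p$, and the probabilities of the rules for each non-terminal sum to one. Fitting the prior then amounts to estimating the $\theta_r$, either from a corpus of existing programs or, in the spirit of DeepCoder, from a neural network that conditions on the examples.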