Lecture 3: Bottom Up Explicit Search.
Simple bottom up search
The simplest bottom up synthesis algorithm works by explicitly constructing all possible programs from a grammar starting with the terminals in the language. As one can imagine, this can be very inefficient, since the space of all expressions grows very large even with very small programs. The key idea behind this algorithm is to prune the set of candidate expressions at every step by eliminating those that are deemed to be "observationally equivalent"; i.e. those which produce the same outputs on those inputs that were given as a specification. The key ideas of this algorithm were first presented in a paper by Albarghouthi, Gulwani and Kincaid AlbarghouthiGK13, although a very similar algorithm was discovered independently and presented only a few months later by Udupa et al. Udupa:2013. The high-level algorithm is shown below. Its two key operations are the grow operation, which uses the non-terminals in the grammar to construct new terms from all the terms in plist, and the elimEquivalents step, which eliminates all terms that are deemed to be redundant by virtue of being equivalent to other terms in the list.
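A minimal Python sketch of that loop might look as follows; grow, elim_equivalents and is_correct are stand-ins for the operations just described, not code from the original presentation.

def bottom_up_search(terminals, grammar, examples):
    # plist starts out holding just the terminals of the language.
    plist = list(terminals)
    while True:
        # grow: construct new terms by applying every production rule
        # of the grammar to the terms already in plist.
        plist = grow(plist, grammar)
        # elimEquivalents: keep one representative from each class of
        # observationally equivalent terms (same outputs on the examples).
        plist = elim_equivalents(plist, examples)
        # Return the first term that matches all the input/output examples.
        for p in plist:
            if is_correct(p, examples):
                return p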
A key idea behind this algorithm is that the check of equivalence is not a real equivalence check, which would
be expensive. Instead, the expressions are tested on the target inputs, and any two expressions that produce
the same outputs on these inputs are deemed equivalent, regardless of whether they are truly equivalent or not.
This is what is referred to as "observational equivalence", the idea being that since we only care about the
behavior of the synthesized program on the given inputs, any behavior difference on other inputs is irrelevant.
Initially, plist will contain all the terminals in the grammar: in, [0], and 0
. After the first call to grow
, the
set of expressions grows quite dramatically, as it now includes all expressions that can be created by composing
the original terminals using the different production rules in the grammar. As we can see in the figure, however,
many of these expressions are clearly equivalent. For example, sort([0])
is equivalent to [0]
,
and firstZero([0])
is equivalent to 0
. More interestingly, in[0,0]
and
[0][0,0]
may not be equivalent in general, but if I only have two inputs [0,7,3,2,5,6,3]
and [0,2,13,5,9,1,0]
,
then in[0,0]
and [0][0,0]
are observationally equivalent, since they are equivalent
on all available inputs; in any program that uses in[0,0]
, we can replace it with
[0][0,0]
, and it will produce the same output on the given inputs.
For each equivalence class of observationally equivalent
programs, elimEquivalents
eliminates all but one of them. Each equivalent program that is eliminated
actually leads to exponential savings, since it spares us from having to construct the exponentially many programs
that could be constructed from that sub-program.
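A simple way to implement this pruning is to key every candidate by the outputs it produces on the example inputs and keep only the first candidate seen for each key. The sketch below assumes an evaluate function that runs a candidate term on a single input.

def elim_equivalents(plist, examples):
    representatives = {}
    for p in plist:
        # The signature is the sequence of outputs on the example inputs;
        # repr makes list-valued outputs usable as dictionary keys.
        signature = tuple(repr(evaluate(p, inp)) for (inp, _) in examples)
        # Two terms with the same signature are observationally equivalent,
        # so only the first one encountered is kept.
        if signature not in representatives:
            representatives[signature] = p
    return list(representatives.values())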
This algorithm as described is extremely simple, but it is already quite powerful. First, it naturally explores
small programs before large programs, so it automatically finds the smallest program satisfying the specification.
Additionally, it is easy to introduce heuristics into grow
and elimEquivalents
to direct
the search for programs so that programs that are deemed more desirable are discovered first, or to speed up the search
based on prior knowledge of which programs are more likely to be correct. A second benefit of this algorithm is that
it works with black-box language building blocks. For example, for the language above, there is no need to have source
code for sort
or firstZero
; the algorithm just needs to be able to execute them.
The algorithm also does a good job of coping with symmetries, and is able to exploit properties of the building
blocks even without access to their source code; for example, even without being told that sort
is idempotent, the algorithm would immediately discard sort(sort(in))
after deeming it equivalent
to sort(in)
. However, the algorithm is not a magic bullet; there are some important conditions that need
to be met for the algorithm to work correctly, conditions that our language in the running example doesn't actually meet!
Formal Requirements
The bottom-up search algorithm comes with some important formal requirements. The most important requirement is that the semantics of a program fragment should not depend on the context. The algorithm relies on being able to prune an expression $e_2$ on the basis of there being another expression $e_1$ with the same semantics already in the search space. But this only works if the following condition is satisfied:
Context independent equivalence: Given two expressions $e_1$ and $e_2$ evaluated on inputs $\sigma$, bottom up search requires that the following condition be true:
$\forall \sigma. ObsEquiv(e_1, e_2,\sigma) \Rightarrow \forall \mathcal{C}.
ObsEquiv(\mathcal{C}[e_1], \mathcal{C}[e_2],\sigma)$
Where $ObsEquiv(e_1, e_2, inputs)$ is the observational equivalence of two expressions under a given set of inputs.
In the notation $\mathcal{C}[e]$, $\mathcal{C}$ is a context, i.e. an expression with a hole,
and $\mathcal{C}[e]$ is the expression you get by filling the hole in $\mathcal{C}$ with the expression
$e$. In other words, if two expressions $e_1$ and $e_2$ are equivalent for a given set of inputs,
then any expression that includes $e_1$ as a sub-expression will be equivalent to the same expression with $e_2$ replacing $e_1$.
As an example, the property clearly holds for arithmetic expressions. For example, the expression
$(x+2)/5$ equals $(x-2)$ when $x=3$. Therefore, in any expression that includes $(x+2)/5$ as a
sub-expression we can replace $(x+2)/5$ by $(x-2)$ and its value will remain unchanged for $x=3$.
This follows from the fact that arithmetic expressions are evaluated in a bottom up fashion irrespective
of their context.
On the other hand, the property will generally not hold for imperative programs. For example, even though the expressions above are equal when $x=3$, two program fragments that evaluate them after the surrounding code has modified $x$ can produce different results, even when the initial value is $x=3$.
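The small Python sketch below is a hypothetical illustration of this effect: in isolation the two expressions agree at x=3, but a context that increments x before evaluating them makes them disagree.

def e1(x): return (x + 2) / 5   # evaluates to 1.0 when x == 3
def e2(x): return x - 2         # also evaluates to 1 when x == 3

def context(e, x):
    x = x + 1        # the surrounding code modifies x before the expression runs
    return e(x)

print(e1(3), e2(3))                    # 1.0 1  -> equal on the given input
print(context(e1, 3), context(e2, 3))  # 1.2 2  -> not equal inside this context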
In our running example, the problem comes from the recursive call rec. Consider, for example, the expressions rec(in[1, len(in)]) and []. For the input
list [1,2,3]
, the two expressions evaluate to the empty list. But they will not be equivalent in all contexts.
In particular, the expressions rec(in[1, len(in)])+[0]
and []+[0]
will produce very different outcomes.
Does this mean that we will not be able to use bottom-up search with observational equivalence on this language?
Not quite, but we will have to be more careful in defining the notion of observational equivalence that will be used to prune the search space; doing
this correctly can be quite tricky.
For example, one seemingly quick fix is to simply enforce that expressions with rec
will never be observationally equivalent to anything else.
This will prevent the context sensitive calls from being merged without affecting the ability to merge context insensitive calls. This will
handle the case where $e_1$ and $e_2$ have rec
calls, but will not guarantee context independent equivalence. Why not?
Because the context may also have calls to rec
. So for example, consider the expressions in[0,0]
and
[0]
. For the input [0,1,2,3]
the two expressions are the same, but under that input, the expressions
rec(in[1, len(in)])+in[0,0]
and rec(in[1, len(in)])+[0]
are not equivalent.
A different way to salvage bottom up search for this language, or for any language that involves recursive calls, is to
require trace completeness. This means requiring that the user-provided examples include an example for every recursive call the desired program will make.
For example, if I want to provide examples for the reverse
function, I would provide the example [1,2,3]=>[3,2,1]
, but trace completeness would also require me to provide the examples [2,3]=>[3,2], [3]=>[3] and []=>[]
. Now, when constructing expressions bottom-up,
any expression that includes rec
can be checked to see if there is an example with those arguments. If there is, then
we know from the example what the result of the rec
call should be, so the call does not actually need to be evaluated. And if the arguments
are not in the list of examples, the expression can be ruled out.
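A minimal sketch of this check, assuming the examples are stored in a dictionary keyed by the argument of the recursive call, might look as follows; the Prune exception is just a convenient way of signaling that the enclosing expression should be discarded.

class Prune(Exception):
    pass   # raised when a rec call uses arguments not covered by the examples

def eval_rec(arg, examples):
    # examples maps inputs (as tuples) to expected outputs, e.g.
    # {(1, 2, 3): [3, 2, 1], (2, 3): [3, 2], (3,): [3], (): []}
    key = tuple(arg)
    if key in examples:
        return examples[key]   # the example already tells us the result of rec
    raise Prune()              # not trace complete for this argument: rule the expression out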
Trace completeness may sound like a cop-out, and in some ways it is. For programming by example problems, requiring trace completeness may be
asking too much from the user. But there are many applications of program synthesis, for example for reverse engineering, where it is relatively
cheap to request additional examples, so for those applications, trace completeness may be an adequate solution.
Even when context independent equivalence is satisfied, there is another important limitation
of the algorithm: scalability.
Even with very aggressive pruning it is hard to scale the algorithm to programs with more than a handful of terms. Moreover,
while the algorithm does a good job at discovering how to connect discrete components, it does very poorly when there
is a need to discover constants in a program, whether integers or reals.
Synthesis through Unification (STUN)
One way to address the scalability challenge is to modularize the search: rather than trying to synthesize a program that works for all inputs in one shot, one can search for multiple programs that work for different situations and then find a way of combining them into a program that works for all inputs. This idea was first formalized by Alur, Cerny and Radhakrishna Alur2015 in an algorithm called Synthesis through Unification (STUN). To illustrate the idea, suppose you are given a set of input-output pairs for a function that takes a pair of numbers and returns the smaller of the two, together with the following grammar:
$
\begin{array}{rcll}
intExpr &:= & fst & \mbox{first element of the input pair} \\
~ & ~ & snd & \mbox{second element of the input pair} \\
~ & ~ & 0 & \mbox{constant zero} \\
~ & ~ & intExpr ~+~ 1 & \mbox{adds one to a number} \\
~ & ~ & if(boolExpr)~ intExpr~ else~ intExpr & \mbox{conditional expression} \\
boolExpr & := & intExpr ~>~ intExpr & \mbox{integer comparison} \\
~ & ~ & boolExpr ~\&~ boolExpr & \mbox{conjunction}
\end{array}
$
The simple bottom-up search would require enumerating a fairly large set of programs, before hitting on the correct program,
but we can also observe that the program $fst$ actually works for half of the given inputs, and the program snd
works for all
the others. Moreover, we can see that all the inputs on which fst
produces the correct answer have the property that
fst < snd
, so from that we can construct the program if(fst < snd) fst else snd
. The STUN approach makes this intuition systematic,
by providing a strategy for synthesizing programs that work for subsets of the inputs and then discovering how to combine
them into a complete program that works for all inputs. The general strategy works even for cases where we cannot
simply introduce arbitrary branches, but there is also a more specialized version of the approach that works particularly well
in cases where you can introduce branches Alur2017.
At a high level, the STUN algorithm first synthesizes a program Prog that works on some of the inputs, and then recursively calls the STUN procedure on those inputs on which
the current program Prog
does not work. This will produce a new program Prog'
, which works
on all those inputs. Then the two programs Prog
and Prog'
need to be unified into a single
program that works for all inputs.
The full algorithm also needs to deal with the scenario where either the recursive call to STUN or the generation
of a better program fail to find a program. In some cases, unification may require the programs to satisfy some
additional conditions in order to succeed, so the algorithm needs to track those as well. Both of these are low-level
details not captured by the figure. In the paper, the crucial decision of whether to try to continue to refine the current solution,
or to recursively call the STUN procedure is handled by a simple heuristic: pick a random input; if the current solution fails on that input,
use that input to search for a better solution, and if it succeeds, perform the recursive call.
This is a crude heuristic based on the intuition
that the recursive call only happens when the current solution already works for a high-enough fraction of the current inputs.
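A rough Python sketch of this loop is shown below; synthesize_partial, works_on, refine and unify stand in for the corresponding operations and are not the paper's actual procedures.

import random

def stun(inputs, examples):
    # Start with a program that works on at least some of the inputs.
    prog = synthesize_partial(inputs, examples)
    while True:
        failing = [x for x in inputs if not works_on(prog, x, examples)]
        if not failing:
            return prog
        x = random.choice(inputs)
        if not works_on(prog, x, examples):
            # The current solution still fails often, so use the failing
            # input to search for a better single program.
            prog = refine(prog, x, examples)
        else:
            # The current solution works on most inputs: solve the remaining
            # inputs recursively and unify the two programs.
            prog2 = stun(failing, examples)
            return unify(prog, prog2)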
This is the general framework, but any specific instance of the algorithm needs to address the question of
how to actually perform the unification of the two programs that work on different sets
of inputs.
Example: Arithmetic with top-level branches
For this example, consider the following problem. We have a range of values, represented by its two extremes $(a,b)$, and we want to discover the lower bound of the new range when we multiply this range by another value $c$. The examples that serve as the input to our algorithm take the form (a, b) * c => result, for instance (8, 11) * -1 => -11, and the grammar for the space of possible programs is shown below.
$
\begin{array}{rcl}
expr &=& expr + expr \\
~&~& |~ expr * expr \\
~&~& |~ a ~|~ b ~|~ c ~ |~ - expr \\
~&~& |~ if(bexp) ~ expr ~ else ~ expr \\
bexp &=& expr > expr ~|~ expr > 0 \\
\end{array}
$
In the animation above, it is possible to see how after reaching only depth 2
of an explicit search, we can already find an expression that works for a subset of the
inputs. In fact, we can find expressions that together work for all the inputs.
So now, the challenge is to unify them into a single expression that works for all inputs.
In this case, we do the unification by discovering a branch condition that separates the inputs on which one expression works correctly from the inputs on which the other does.
In this case, we can easily find that the expression c > 0
precisely separates
the two cases, so we can unify the two solutions into a general program
if(c > 0) a * c else b * c
.
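Concretely, the unification step amounts to searching for a predicate that is true exactly on the inputs handled by the first expression; a minimal sketch, assuming an evaluate function for both programs and predicates, is shown below.

def unify_with_branch(prog1, prog2, predicates, examples):
    # Inputs on which prog1 already produces the expected output.
    covered = [inp for (inp, out) in examples if evaluate(prog1, inp) == out]
    for p in predicates:
        # A predicate separates the two cases if it holds exactly on the
        # inputs covered by prog1 and fails on all the others.
        if all(evaluate(p, inp) == (inp in covered) for (inp, _) in examples):
            return ("if", p, prog1, prog2)   # i.e. if(p) prog1 else prog2
    return None   # no separating predicate in the candidate set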
What if the language does not have top level branches?
Consider the following alternative variant of the language above.
$
\begin{array}{rcl}
var &=& a ~|~ b ~|~ c\\
expr &=& expr + expr \\
~&~& |~ expr * expr \\
~&~& |~ var ~ |~ - expr \\
~&~& |~ if(bexp) ~ var ~ else ~ var \\
bexp &=& expr > expr ~|~ expr > 0 \\
\end{array}
$
The language is very similar, but now we are left without the option of performing unification
by introducing top-level branches.
This means we need a different mechanism for performing unification, that is, for joining
the two programs that work for different subsets of the input into a new program that
works for all inputs. In order to do this, we will use something that in
the literature is known as antiunification (even though the STUN paper Alur2015
does not use that terminology).
Unification and Antiunification
In the literature, the term unification generically means finding a common structure for two different expressions. More precisely, however, the literature distinguishes between Unification, where you find the common structure by replacing variables with expressions, and Antiunification, where you find the common structure by replacing expressions with variables. So for example, if I have two expressions $x + 5$ and $7 + y$, unification finds the common structure as $Unify(x+5, 7+y) = [7 + 5, (x->7, y->5)]$. That is, it tells us that the two expressions can be turned into the same expression by replacing $x$ with 7 and $y$ with 5. By contrast, if you have two expressions $7*3 + 2$ and $5*3+2$, antiunification identifies the common structure by introducing variables as $Antiunify(7*3+2, 5*3+2) = x*3+2$. The STUN paper does not actually use this terminology, but it is fairly standard, particularly in the inductive logic programming literature. So just keep in mind when reading the STUN paper that it uses Unification in the more generic sense that can potentially mean antiunification depending on the context.
STUN without top-level branches
Without the ability to introduce branches, we need an alternative way of combining two expressions. In the example above, once we discover that there are two expressions a*c
and b*c
that together cover all the inputs,
we can use antiunification to produce a common expression $v$*c
where
$v$ stands for a fragment of code, so now we need to discover what this
missing code fragment is by recursively solving a (hopefully smaller) synthesis problem.
In this case, we will discover that $v$=if(c>0) a else b
.
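For expressions represented as nested tuples such as ('*', 'a', 'c'), a minimal antiunification that introduces a hole wherever the two expressions disagree can be sketched as follows; this is an illustration of the idea rather than the algorithm from the paper.

def antiunify(e1, e2, holes):
    # Identical sub-expressions are kept as they are.
    if e1 == e2:
        return e1
    # Applications of the same operator are antiunified argument by argument.
    if (isinstance(e1, tuple) and isinstance(e2, tuple)
            and len(e1) == len(e2) and e1[0] == e2[0]):
        return tuple([e1[0]] + [antiunify(a, b, holes)
                                for a, b in zip(e1[1:], e2[1:])])
    # Anything else becomes a fresh hole, to be synthesized recursively later.
    holes.append((e1, e2))
    return ("hole", len(holes) - 1)

holes = []
print(antiunify(('*', 'a', 'c'), ('*', 'b', 'c'), holes))
# ('*', ('hole', 0), 'c')   with holes[0] == ('a', 'b')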
One thing to note is that when the expression $b*c$ was discovered, the synthesizer
could have just as easily discovered the expression $-b$. This would have been a problem,
because whereas antiunification of $b*c$ and $a*c$ worked exactly as expected,
the expression $-b$ cannot be antiunified with $a*c$ except by making the whole
expression a variable. The way STUN deals with this is that when a recursive call to
STUN is performed, the algorithm can also pass additional constraints that the
expression discovered by the recursive call must satisfy. Therefore, when recursively
calling STUN with the example (8, 11)* -1 => -11
, we can impose the
additional constraint that the discovered expression must be antiunified with
$a*c$. This would force the recursive call to STUN to produce $b*c$ instead of $-b$.
The original STUN paper Alur2015 describes other examples of the Unification
operation $\oplus$. In particular, it describes an approach for bit-vectors that
is also based on antiunification, but we will not describe it here.
Hierarchical Search
The approaches described earlier provide a way of modularizing the search by first discovering components that work on some fragments of the input space and then discovering how to weave them together to cover the entire space. This input-based modularization is only one of many different approaches of breaking up the search into independent search spaces. An alternative approach to improve the scalability of bottom-up search is to search the space hierarchically. This can be done in cases where the program can be split into different levels of abstraction, and where one can perform the search at each level independently. One example of this is the recent work by Wang, Cheung and Bodik Wang:2017. In this case, their goal is to synthesize complex SQL queries from examples. The key insight in their paper is that the problem can be decomposed in a hierarchical way. To understand how this can work, consider the following query language, which is a much simplified version of the language their system actually supports.
$
\begin{array}{rcl}
Rel &=& T \\
&|& \mbox{Select } Fields \mbox{ from } Rel \mbox{ where } Pred\\
&|& Rel , Rel \\
Pred & = & exp = exp \\
& | & exp > exp \\
& | & Pred ~\&~ Pred \\
Fields & = & table.name \mbox{ as } name \\
& | & table.name \mbox{ as } name, Fields\\
\end{array}
$
In the language above, a relation is either the name of a table $T$, a query on a
relation, or a cross product on two different relations.
Now, suppose you are given some tables together with the output that the desired query should produce on them. The difficult part of synthesizing such a query directly is discovering the predicates, for example the fact that we are filtering on Building=A1, or the fact that we are joining on matching the Dept code.
The key idea in their hierarchical search approach is to perform the search in two levels.
First, use a simple bottom-up search to discover the structure of the query,
in a language that replaces all predicates with holes as shown below.
For each query in that language that produces output in the right shape,
we can now search to see if there is a set of predicates that can be
inserted into the holes that will cause the query to produce the right
set of records.
$
\begin{array}{rcl}
Rel &=& T \\
&|& \mbox{Select } Fields \mbox{ from } Rel \mbox{ where } \Box\\
&|& Rel , Rel \\
Fields & = & table.name \mbox{ as } name \\
& | & table.name \mbox{ as } name, Fields\\
\end{array}
$
An important ingredient in making this work is that we need to be able
to evaluate the queries with holes and tell whether they can potentially
produce an output with the right shape. The key idea is to define a semantics
for queries with holes that is guaranteed to produce a superset of the
records that any instantiation of the holes may produce. This is easy
to do in the language above by simply treating every hole as $True$,
although in the paper it gets a little more tricky because they also support
aggregates. In general, though, if the queries with holes are guaranteed
to produce a superset of the results, then we can search the space of
queries with holes and safely rule out any query that does not produce
a superset of our desired results.
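A rough sketch of that check, assuming relations are represented as lists of rows and that run_with_holes_as_true evaluates a query after replacing every hole predicate with True, is shown below.

def may_produce(query_with_holes, tables, desired_rows):
    # With every hole treated as True, the query returns a superset of the
    # rows that any concrete instantiation of the holes could return.
    overapproximation = run_with_holes_as_true(query_with_holes, tables)
    # If even this superset misses a desired row, no choice of predicates
    # can make the query correct, so the query structure can be pruned.
    return all(row in overapproximation for row in desired_rows)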
In the case of the example above, we may find that both the queries
below produce results of the right shape.