Lecture 7: Constraint-based Synthesis with Sketch.
The techniques described in the previous lecture used symbolic representations of
the program space, but they still involved a fair amount of enumeration. We now
focus on a class of techniques that are "more symbolic", and have more flexibility
in capturing complex program spaces, although at the expense of a significant computational
cost.
For this lecture and the next, we will be using the Sketch synthesis system
as a canonical example [sketchthesis], although there are other
systems, such as Brahma, which are based on similar principles [Jha:2010]. The similarities
and differences between these systems will be elaborated at the end of the unit.
Constraint-based synthesis at a glance
The high-level idea in constraint-based synthesis is to represent the program space as
a parametric program $P[c]$, so that different values of the parameters correspond
to different programs in the space. The idea is to translate requirements on the behavior
of the program $P[c]$ into constraints on the parameters $c$, so that any value of $c$
that satisfies the constraints $\varphi(c)$ is guaranteed to lead to a program $P[c]$ satisfying all
the requirements.
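As a minimal illustration of this idea, the following Python model (not Sketch itself) takes the parametric program to be $P[c](x) = c_0 x + c_1$ and expresses the requirements as a few input/output examples. The names `P`, `phi`, `solve`, the examples and the parameter range are all assumptions made for this illustration; a real system would hand the constraints to a solver rather than enumerate.

```python
from itertools import product

def P(c, x):
    """Parametric program P[c]: each choice of c = (c0, c1) is one program."""
    return c[0] * x + c[1]

# Requirements on behavior, given here as input/output examples (assumed).
EXAMPLES = [(0, 3), (1, 5), (4, 11)]

def phi(c):
    """Constraint on the parameters: P[c] must agree with every example."""
    return all(P(c, x) == y for x, y in EXAMPLES)

def solve(domain=range(-8, 8)):
    """Naive stand-in for a constraint solver: enumerate a small finite domain."""
    return [c for c in product(domain, repeat=2) if phi(c)]

solutions = solve()
print(solutions)
```

Any `c` returned by `solve` satisfies `phi(c)`, so `P[c]` meets all the requirements by construction; for these examples the only such value in the range is `(2, 3)`.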
In order for this approach to work, we need three ingredients.
First, we need a mechanism for creating parametric programs from a high-level
definition of the program space. Second, we need a mechanism for constructing
constraint systems from these parametric programs and their requirements, and finally,
we need efficient mechanisms for solving the resulting constraint systems.
We start by addressing the first point.
From program spaces to parametric programs.
There are two major approaches for defining the parametric programs that are
the starting point of constraint-based synthesis. The first approach is to
provide the user with a high-level notation for describing a program
space, and then have a compiler that converts this definition into a
parametric program. This is the approach taken by Brahma or by the SyGuS solvers.
In the case of Brahma, the user simply provides a bag of components, and
the system automatically produces a parametric program where different choices
of parameters correspond to different ways of connecting the components together.
In the case of the SyGuS solvers, the user provides a context-free grammar
for a space of expressions, and the solver generates a parametric program
from this grammar.
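The component-based view can be sketched in Python as follows. This is a deliberately simplified model and not the encoding used by any particular system: the component order is fixed, the parameters only choose which earlier value feeds each component input, and the search is plain enumeration. The component bag, the target function $3x + 2$ and the names `run` and `all_wirings` are assumptions made for the example.

```python
from itertools import product

# A "bag of components" (assumed for illustration): (name, arity, function).
COMPONENTS = [
    ("inc", 1, lambda a: a + 1),
    ("dbl", 1, lambda a: 2 * a),
    ("add", 2, lambda a, b: a + b),
]

def run(wiring, x):
    """Parametric program: line k applies component k to earlier values.

    wiring[k] lists, for each input of component k, the index of the value
    it reads: index 0 is the program input, index j > 0 is the output of
    the j-th component. The program returns the last value computed.
    """
    values = [x]
    for (_, _, fn), srcs in zip(COMPONENTS, wiring):
        values.append(fn(*(values[s] for s in srcs)))
    return values[-1]

def all_wirings():
    """Every way of connecting each component input to an earlier value."""
    per_line = [list(product(range(k + 1), repeat=arity))
                for k, (_, arity, _) in enumerate(COMPONENTS)]
    return product(*per_line)

# Requirements: the program should compute 3*x + 2 on these inputs (assumed).
EXAMPLES = [(0, 2), (1, 5), (5, 17)]

solutions = [w for w in all_wirings()
             if all(run(w, x) == y for x, y in EXAMPLES)]
print(solutions)
```

Each wiring is one value of the parameters, and each value of the parameters is one program built from the same bag of components.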
The alternative approach, implemented in Sketch,
is to provide the user with a rich and expressive language for directly writing
parametric programs. This expressiveness provides the programmer with significant
control over the program space and its encoding as a parametric program. That control
allows an expert user to carefully engineer a program space to maximize the efficiency
of the synthesis process, but it also adds complexity for
less experienced users, who must define their
program space directly as a parametric program. Sketch tries to alleviate this burden by
providing powerful abstraction facilities that allow potentially complex definitions
of program spaces to be encapsulated and reused across many different programs.
Sketch: a language for parametric programs.
The most authoritative source for the Sketch language is the Sketch manual.
In this section, we provide a brief overview of the key principles behind the language.
At a high level, Sketch is a simple imperative language with support for many
of the features we have come to expect from modern languages, including heap-allocated
structures, higher-order functions and polymorphism (known as generics in Java).
There are three features, however, that distinguish Sketch from other languages:
unknown constants, harnesses and generator functions.
Unknown constants. An unknown constant in Sketch is expressed as $??$.
The type of this constant is inferred from context; it can be an integer,
a boolean, a character or a fixed-size array of any of these. At synthesis time, Sketch
replaces each unknown constant with a fixed constant so that all the
requirements are satisfied. For example, the simplest Sketch program
that illustrates the main ideas in the language is shown below.
int doublevalue(int in){
    int t = in * ??;      // ?? is an unknown integer constant
    assert t == in + in;  // requirement that the constant must satisfy
    return t;
}
In the program, the unknown constant must be replaced with an integer constant.
The assertion imposes the requirement that t == in + in, which clearly
forces the unknown constant to resolve to the number 2. The assertion, however,
is only valid in the context of a test harness.
Test harnesses. A test harness is simply a function that when invoked
must not trigger any assertion violations. For example, in order to force
the doublevalue function above to synthesize to the correct function,
we can use the following test harness.
harness void test1(){
    doublevalue(5);
    doublevalue(7);
    doublevalue(3);
}
We could have also excluded the assert inside the doublevalue function
itself and instead placed the assertion in the test harness.
harness void test1(){
    assert doublevalue(5) == 10;
    assert doublevalue(7) == 14;
    assert doublevalue(3) == 6;
}
Since we are focusing on the inductive synthesis case, we will restrict attention to
test harnesses that do not take any inputs, and instead just invoke the desired functions
using fixed values. Later in the course we will explore more general test harnesses that
can impose constraints that must hold for all inputs.
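The role of the harness can be modeled in a few lines of Python. This is only a model of the behavior described above, not how Sketch is implemented: the unknown constant becomes an explicit argument `hole`, a failed assertion becomes an exception, and the search range and the helper name `passes` are assumptions.

```python
def doublevalue(x, hole):
    """Model of the sketch: `hole` stands in for the unknown constant ??."""
    t = x * hole
    if t != x + x:  # models `assert t == in + in;`
        raise AssertionError("assert t == in + in failed")
    return t

def harness(hole):
    """Model of `harness void test1()`: fixed inputs, no parameters."""
    doublevalue(5, hole)
    doublevalue(7, hole)
    doublevalue(3, hole)

def passes(hole):
    """A candidate is acceptable iff the harness triggers no assertion."""
    try:
        harness(hole)
        return True
    except AssertionError:
        return False

# Try every candidate constant in a small (assumed) range.
candidates = [h for h in range(0, 32) if passes(h)]
print(candidates)
```

Only the value 2 survives all three calls, which matches the intuition that the assertion forces the constant to resolve to 2.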
[Figure: Lecture 7, slides 5 and 6]
Generator functions. At this point, we already have a language expressive enough
to discover some interesting aspects of a program. For example, if a program
involves an affine expression, say over a variable x, but we do not want to have to think about the
constants involved, we can just express it as x*??+??. Or, for example,
if at some point we are not sure whether we should use variable x or variable y, we can use
?? ? x : y, using the ?: ternary operator like the one available in C.
In order to support the description of more general program
spaces, we need some additional machinery, which we borrow from the generative programming literature.
In particular, Sketch uses the notion of a generator, which looks like a function,
but with the property that it will get fully inlined and partially evaluated into
its calling context.
As a simple example directly from the Sketch manual, consider the problem of specifying the set of linear functions of two
parameters x and y.
That space of functions can be described with the following simple
generator function:
generator int legen(int i, int j){
    return ??*i + ??*j + ??;
}
The generator function can be used anywhere in the code in the same way a function would, but the
semantics of generators are different from functions. In particular, every call to the generator
will be replaced by a concrete piece of code in the space of code fragments defined by the
generator. Different calls to the generator function can produce different code fragments. For
example, consider the following use of the generator.
harness void main(int x, int y){
    assert legen(x, y) == 2*x + 3;
    assert legen(x, y) == 3*x + 2*y;
}
Calling the solver on the above code produces the following output:
void _main (int x, int y){
    assert ((((2 * x) + (0 * y)) + 3) == ((2 * x) + 3));
    assert (((3 * x) + (2 * y)) == ((3 * x) + (2 * y)));
}
Note that each invocation of the generator function was replaced by a concrete code fragment
in the space of code fragments defined by the generator.
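The fact that every call site receives its own copy of the unknowns can be mimicked in Python as below. This is a model under stated assumptions rather than the solver's actual procedure: the three holes are passed explicitly as `holes`, a handful of concrete inputs stands in for "all x and y", and `solve_call_site` with its small search domain is a hypothetical helper.

```python
from itertools import product

def legen(i, j, holes):
    """Body of the generator, with its three ?? made explicit as `holes`."""
    a, b, c = holes
    return a * i + b * j + c

# A few concrete inputs stand in for "all x, y" (an assumption of this model).
INPUTS = [(0, 0), (1, 0), (0, 1), (2, 5), (7, 3)]

def solve_call_site(spec, domain=range(0, 4)):
    """Find hole values for ONE call site so legen matches spec on INPUTS."""
    for holes in product(domain, repeat=3):
        if all(legen(x, y, holes) == spec(x, y) for x, y in INPUTS):
            return holes
    return None

# Each call to the generator is inlined separately, so each gets its own holes.
site1 = solve_call_site(lambda x, y: 2 * x + 3)
site2 = solve_call_site(lambda x, y: 3 * x + 2 * y)
print(site1, site2)
```

The two call sites resolve to different constants, `(2, 0, 3)` and `(3, 2, 0)`, which correspond to the two different code fragments in the solver output above.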
[Figure: Lecture 7, slides 9 and 11]
Up to this point, though, the generator may seem like a type-safe macro, little more
than syntactic sugar. What gives generators their real power is the ability to
be recursive. For example, the generator in the figure describes a grammar of
expressions, which can either be a variable x,
an unknown bit-vector constant, or the bitwise combination or bitwise negation of recursively
generated expressions. Each recursive invocation of the generator can have its own distinct
values for the unknown constants. This idiom of using generators to define a
space of programs as a context-free grammar is quite common
across many different applications of Sketch.
In addition to being recursive, generators can also be higher-order,
meaning that they can take other functions or even other generators as parameters.
An example of this is the rep generator, also shown in the figure.
This generator takes as a parameter a function or a generator f, and
applies it $n$ times.
generator void rep(int n, fun f){
    if(n>0){
        f();
        rep(n-1, f);
    }
}
This very simple generator actually implements a very important
computational pattern, one where a particular kind of operation needs to be performed
multiple times, but each time may actually correspond to a distinct operation.
For example, consider the code below:
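bit[32] reverseSketch(bit[32] in) {
    bit[32] t = in;
    int s = 1;
    generator void tmp(){
        bit[32] m = ??;  // unknown mask, chosen independently for each step
        t = ((t << s) & m )| ((t >> s) & (~m));
        s = s*??;        // unknown multiplier for the shift amount
    }
    rep(??, tmp);        // unknown number of steps
    return t;
}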
The goal of the sketch above is to reverse the bits in a 32-bit word through
a combination of shifts and masks.
The generator tmp reflects the basic computational pattern
for each step, where the word is shifted left and right by some amount,
and a mask determines which bits to keep from the left shift and which
from the right shift. After that, the shift amount is multiplied by a constant.
We know the computation involves some number of such operations, but not how
many. The generator rep is ideally suited for that purpose.
Note that the first parameter n, which defines
the depth of the recursion, does not have to be a
constant; the number of iterations is part of what the synthesizer needs to discover.
The result of solving this sketch against a suitable harness would look something like this:
void reverseSketch (bit[32] in, ref bit[32] _out) implements reverse/*reverse.sk:7*/
{
    bit[32] __sa0 = {0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1};
    _out = ((in << 1) & __sa0) | ((in >> 1) & (~(__sa0)));
    bit[32] __sa0_0 = {0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1};
    _out = ((_out << 2) & __sa0_0) | ((_out >> 2) & (~(__sa0_0)));
    bit[32] __sa0_1 = {0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1};
    _out = ((_out << 4) & __sa0_1) | ((_out >> 4) & (~(__sa0_1)));
    bit[32] __sa0_2 = {0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1};
    _out = ((_out << 8) & __sa0_2) | ((_out >> 8) & (~(__sa0_2)));
    bit[32] __sa0_3 = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
    _out = ((_out << 16) & __sa0_3) | ((_out >> 16) & (~(__sa0_3)));
    return;
}
An important thing to note in the generated code is that much of the control
structure in the generator has completely disappeared. All the branches
and all the recursive calls have been partially evaluated away, so what is left is
just the code the user actually wants, albeit with somewhat ugly variable names.
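To see that these hole values do reverse a word, the sketch can be modeled in Python with its unknowns passed in explicitly. Two conventions here are assumptions of the model and should be checked against the Sketch manual: element i of a bit array is taken to be bit i of the integer, and the shifts are taken to behave like Python's integer shifts truncated to 32 bits. The multiplier value 2 is inferred from the shift amounts 1, 2, 4, 8, 16 in the output; the names `to_mask`, `reverse_sketch` and `reverse_reference` are hypothetical.

```python
def rep(n, f):
    """Model of the `rep` generator: apply f n times, recursing as the sketch does."""
    if n > 0:
        f()
        rep(n - 1, f)

def to_mask(bits):
    """Assumed convention: element i of the bit array is bit i of the word."""
    return sum(b << i for i, b in enumerate(bits))

def reverse_sketch(x, n, masks, multipliers):
    """reverseSketch with its holes explicit (one mask and multiplier per step)."""
    state = {"t": x, "s": 1, "step": 0}
    full = 0xFFFFFFFF

    def tmp():
        m = masks[state["step"]]
        t, s = state["t"], state["s"]
        state["t"] = (((t << s) & m) | ((t >> s) & (~m & full))) & full
        state["s"] = s * multipliers[state["step"]]
        state["step"] += 1

    rep(n, tmp)
    return state["t"]

# Mask patterns read off the solved sketch above, repeated to 32 elements.
MASKS = [to_mask(p * (32 // len(p))) for p in (
    [0, 1], [0, 0, 1, 1], [0] * 4 + [1] * 4, [0] * 8 + [1] * 8, [0] * 16 + [1] * 16)]

def reverse_reference(x):
    """Straightforward specification: reverse the 32-bit binary string."""
    return int(format(x, "032b")[::-1], 2)

samples = [0, 1, 0x80000000, 0x12345678, 0xDEADBEEF, 0xFFFFFFFF]
ok = all(reverse_sketch(x, 5, MASKS, [2] * 5) == reverse_reference(x) for x in samples)
print(ok)
```

With five steps, the masks above and a multiplier of 2, the model agrees with the reference reversal on every sample; this checks the solution on a few inputs and is not a proof for all words.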
Formalizing generator functions.
[Figure: Lecture 7, slides 16, 17, 20 and 21]
The first step in understanding how constraints are produced from a program written in Sketch
is to state more precisely the semantics of generators. This will
be done following the formalization in a 2013 journal paper on Sketch [Solar-Lezama13].
As mentioned before, Sketch is really just a notation for writing parametric programs.
A program in Sketch can be thought of as a parametric function, parameterized by a function
$\phi$. This function $\phi$ is really just a table that tells us the value of each of the
different constants inside a Sketch.
If it were not for recursive generators, it would be straightforward to simply assign a unique
name to each distinct unknown constant in the sketch, and then make $\phi$ just a
mapping from that name to a corresponding value as illustrated in the figure. However,
recursive generators introduce a wrinkle into this story because the same syntactic
instance of a hole is supposed to have different values for different instances of the
generator. We formalize this by making $\phi$ a function of a context in addition
to a hole.
The idea is that when you write a sketch with generators, the compiler internally
assigns each callsite for a generator a unique code. When a generator is called,
it is assigned a calling context which summarizes where the generator
was called. This context is then passed to $\phi$ as illustrated in the figure.
Note that when a generator is called from another generator, the new callsite
name is appended to the existing context, but when it is called from a function,
there is no prior context to consider, and only the callsite name is used.
Also, if a hole is used outside of a generator, just within a normal
function, then it will just have the empty context.
The result, as illustrated in the last frame in the figure,
is that a sketch with recursive generators can have a potentially unbounded
set of unknown values. In practice, Sketch avoids this by bounding the depth of recursion
for generators, a bound that is defined by the command-line flag "--bnd-inline-amnt".
With a bound on the depth of recursion, $\phi$ becomes once again just a table, mapping
a finite set of
hole names and calling contexts to values. Over the next lecture, then,
we focus on the process of generating constraints on $\phi$, and solving them
in order to find values that allow the sketch to satisfy all its assertions.
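The context-indexed table can be made concrete with a small Python model. Everything named here is an assumption for illustration: the generator `gen` (which either returns x plus a constant or the sum of two recursive calls), the hole names "stop" and "const", the call-site names "f1", "g1" and "g2", and the choice to model exceeding the inlining bound as an assertion failure.

```python
def gen(x, phi, ctx, depth, max_depth):
    """Model of a recursive generator:  gen(x) := ?? ? x + ?? : gen(x) + gen(x).

    Holes are looked up as phi[(hole_name, ctx)], so the same syntactic hole
    can take different values in different instances of the generator.
    """
    if depth > max_depth:
        raise AssertionError("inlining bound exceeded")
    if phi[("stop", ctx)]:
        return x + phi[("const", ctx)]
    # Two call sites inside the generator, named "g1" and "g2": a nested call
    # appends its call-site name to the caller's context.
    left = gen(x, phi, ctx + ("g1",), depth + 1, max_depth)
    right = gen(x, phi, ctx + ("g2",), depth + 1, max_depth)
    return left + right

def contexts(max_depth, root=("f1",)):
    """All calling contexts reachable within the bound, starting at call site f1."""
    frontier, found = [root], []
    for _ in range(max_depth):
        found += frontier
        frontier = [c + (s,) for c in frontier for s in ("g1", "g2")]
    return found

# With the bound set to 2, phi is a finite table over (hole, context) pairs.
PHI = {
    ("stop", ("f1",)): 0, ("const", ("f1",)): 0,
    ("stop", ("f1", "g1")): 1, ("const", ("f1", "g1")): 1,
    ("stop", ("f1", "g2")): 1, ("const", ("f1", "g2")): 4,
}

# The call from an ordinary function uses only its own call-site name as context.
result = gen(10, PHI, ("f1",), 1, 2)
print(result, len(contexts(3)))
```

Here the "const" hole takes the value 1 in one instance and 4 in another, so the call evaluates to (10 + 1) + (10 + 4) = 25. Raising the bound to 3 gives 7 contexts and therefore 14 table entries for the two holes, still a finite set.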