Introduction to Program Synthesis

© Armando Solar-Lezama. 2018. All rights reserved.

Lecture 7: Constraint-based Synthesis with Sketch.

The techniques described in the previous lecture used symbolic representations of the program space, but they still involved a fair amount of enumeration. We now turn to a class of techniques that are "more symbolic" and more flexible in capturing complex program spaces, although at a significant computational cost.

For this lecture and the next, we will use the Sketch synthesis system as a canonical example [sketchthesis], although other systems, such as Brahma, are based on similar principles [Jha:2010]. The similarities and differences between these systems will be discussed at the end of the unit.

Constraint-based synthesis at a glance

The high-level idea in constraint-based synthesis is to represent the program space as a parametric program $P[c]$, so that different values of the parameters $c$ correspond to different programs in the space. Requirements on the behavior of $P[c]$ are then translated into a constraint $\varphi(c)$ on the parameters, such that any value of $c$ that satisfies $\varphi(c)$ is guaranteed to yield a program $P[c]$ that meets all the requirements.
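In the inductive setting this unit focuses on, the requirements are often a finite set $E$ of input/output examples. Under that simplifying assumption (Sketch assertions can express more general requirements), the constraint can be written with one conjunct per example:

$$\varphi(c) \;\equiv\; \bigwedge_{(x_i,\, y_i) \in E} P[c](x_i) = y_i$$

For instance, with the hypothetical parametric program $P[c](x) = c \cdot x$ and $E = \{(5, 10), (7, 14), (3, 6)\}$, this gives $5c = 10 \wedge 7c = 14 \wedge 3c = 6$, whose only solution is $c = 2$.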

For this approach to work, we need three ingredients: a mechanism for creating parametric programs from a high-level definition of the program space; a mechanism for constructing constraint systems from these parametric programs and their requirements; and efficient mechanisms for solving the resulting constraint systems. We start by addressing the first point.

From program spaces to parametric programs.

There are two major approaches for defining the parametric programs that are the starting point of constraint-based synthesis. The first approach is to provide the user with a high-level notation for describing a program space, and then have a compiler convert this description into a parametric program. This is the approach taken by Brahma and by SyGuS (syntax-guided synthesis) solvers. In the case of Brahma, the user simply provides a bag of components, and the system automatically produces a parametric program where different choices of parameters correspond to different ways of connecting the components together. In the case of SyGuS solvers, the user provides a context-free grammar for a space of expressions, and the solver generates a parametric program from this grammar.
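To make the component-based idea concrete, here is a deliberately simplified Python model. It is not Brahma's actual method: Brahma encodes the wiring choices as constraints for an SMT solver, whereas this sketch simply enumerates them. The two components, their fixed order, and the examples are all hypothetical. The parameters are the operand choices of each component, and each assignment to them is a different program.

```python
from itertools import product

# Hypothetical component library: each component takes two operands.
COMPONENTS = [("add", lambda a, b: a + b), ("mul", lambda a, b: a * b)]
NUM_INPUTS = 2  # the program inputs are x (slot 0) and y (slot 1)


def run(wiring, inputs):
    """Run the straight-line program selected by `wiring`.

    Slots 0..NUM_INPUTS-1 hold the inputs; each component then appends its
    output. wiring[k] = (i, j) says which slots component k reads.
    """
    slots = list(inputs)
    for (_name, fn), (i, j) in zip(COMPONENTS, wiring):
        slots.append(fn(slots[i], slots[j]))
    return slots[-1]  # the last component's output is the program's result


def synthesize(examples):
    """Search all wirings (the parameter space) for one consistent with the examples."""
    # Component k may read any input or the output of any earlier component.
    choices = [list(product(range(NUM_INPUTS + k), repeat=2))
               for k in range(len(COMPONENTS))]
    for wiring in product(*choices):
        if all(run(wiring, inputs) == out for inputs, out in examples):
            return wiring
    return None


# Examples of the target (x + y) * x.
examples = [((1, 2), 3), ((2, 3), 10), ((3, 1), 12)]
print(synthesize(examples))  # ((0, 1), (0, 2)): add(x, y), then mul(x, add's output)
```

The search is exhaustive here only to keep the sketch short; the point is that a wiring is just a tuple of parameter values, which is exactly what a constraint solver would be asked to find.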

The alternative approach, implemented in Sketch, is to provide the user with a rich and expressive language for directly writing parametric programs. This expressiveness gives the programmer significant control over the program space and its encoding as a parametric program. That control allows an expert user to carefully engineer a program space to maximize the efficiency of the synthesis process, but it also burdens less sophisticated users, who must take on the extra work of defining their program space as a parametric program. Sketch tries to alleviate this burden by providing powerful abstraction facilities that allow potentially complex definitions of program spaces to be encapsulated and reused across many different programs.

Sketch: a language for parametric programs.

The most authoritative source for the Sketch language is the Sketch manual. In this section, we provide a brief overview of the key principles behind the language. At a high level, Sketch is a simple imperative language that supports many of the features we have come to expect from modern languages, including heap-allocated structures, higher-order functions, and polymorphism (known as generics in Java). Three features, however, distinguish Sketch from other languages: unknown constants, harnesses, and generator functions.

Unknown constants. An unknown constant (or hole) in Sketch is written as ??. Its type is inferred from context; it can be an integer, a boolean, a character, or a fixed-size array of any of these. At synthesis time, Sketch replaces each unknown constant with a concrete constant such that all the requirements are satisfied. For example, the simplest Sketch program that illustrates the main ideas of the language is shown below.

```
int doublevalue(int in){
    int t = in * ??;
    assert t == in + in;
    return t;
}
```

In this program, the unknown constant must be replaced with an integer constant. The assertion imposes the requirement that t == in + in, which, for any nonzero input, forces the unknown constant to resolve to 2. The assertion, however, only constrains the hole in the context of a test harness, which determines the inputs on which doublevalue is actually invoked.

Test harnesses. A test harness is simply a function that, when invoked, must not trigger any assertion violations. For example, to force the doublevalue function above to synthesize to the correct function, we can use the following test harness.

```
harness void test1(){
    doublevalue(5);
    doublevalue(7);
    doublevalue(3);
}
```

We could also have removed the assert from the doublevalue function itself and instead placed the assertions in the test harness.

```
harness void test1(){
    assert doublevalue(5) == 10;
    assert doublevalue(7) == 14;
    assert doublevalue(3) == 6;
}
```

Since we are focusing on inductive synthesis, we will consider only test harnesses that take no inputs and instead invoke the desired functions on fixed values. Later in the course we will explore more general test harnesses that can impose constraints that must hold for all inputs.

(See Lecture 7, slides 5 and 6.) Generator functions. At this point, we already have a language expressive enough to discover some interesting aspects of a program. For example, if a program involves an affine expression, say over a variable x, but we do not want to think about the constants involved, we can simply write it as x*?? + ??. Similarly, if at some point we are not sure whether to use variable x or variable y, we can write ?? ? x : y, using the ternary ?: operator familiar from C. To describe more general program spaces, we need some additional machinery, which we borrow from the generative programming literature. In particular, Sketch uses the notion of a generator, which looks like a function but is fully inlined and partially evaluated into its calling context. As a simple example taken directly from the Sketch manual, consider the problem of specifying the set of linear functions of two parameters x and y. That space of functions can be described with the following simple generator function:

```
generator int legen(int i, int j){
    return ??*i + ??*j + ??;
}
```

A generator can be used anywhere in the code in the same way a function would, but its semantics are different. In particular, every call to the generator is replaced by a concrete piece of code from the space of code fragments defined by the generator, and different calls to the same generator can produce different code fragments. For example, consider the following use of the generator.
```
harness void main(int x, int y){
    assert legen(x, y) == 2*x + 3;
    assert legen(x, y) == 3*x + 2*y;
}
```

Unlike test1, this harness takes inputs x and y, so the assertions must hold for all values of x and y. Calling the solver on the above code produces the following output:

```
void _main (int x, int y){
    assert ((((2 * x) + (0 * y)) + 3) == ((2 * x) + 3));
    assert (((3 * x) + (2 * y)) == ((3 * x) + (2 * y)));
}
```

Note that each invocation of the generator was replaced by a different concrete code fragment from the space of code fragments defined by the generator.
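A small Python model makes the difference between generators and ordinary functions visible. It is an illustration under stated assumptions, not how Sketch solves the problem: each inlined copy of legen gets its own triple of hole values, each hole is assumed to range over -4..4, and the "for all x and y" requirement is approximated by a finite grid of sample inputs. The last line shows what would happen if both calls had to share a single triple of holes, as they would if legen were an ordinary function.

```python
from itertools import product

# Assumptions of this model: each ?? ranges over -4..4, and "for all x, y"
# is approximated by checking a finite grid of sample inputs.
HOLE_RANGE = range(-4, 5)
SAMPLES = list(product(range(-3, 4), repeat=2))


def legen(holes, i, j):
    """One inlined copy of legen; `holes` supplies that copy's three ?? values."""
    a, b, c = holes
    return a * i + b * j + c


def solve(specs):
    """Find one hole triple satisfying every spec on all samples, or None."""
    for holes in product(HOLE_RANGE, repeat=3):
        if all(legen(holes, x, y) == spec(x, y)
               for spec in specs for x, y in SAMPLES):
            return holes
    return None


def spec1(x, y):
    return 2 * x + 3


def spec2(x, y):
    return 3 * x + 2 * y


# Generator semantics: each call is inlined with its own fresh holes,
# so in this example each assertion can be solved on its own.
print(solve([spec1]), solve([spec2]))  # (2, 0, 3) (3, 2, 0)

# If both calls shared one triple of holes, no solution would exist.
print(solve([spec1, spec2]))  # None
```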

(See Lecture 7, slides 9 and 11.) Up to this point, though, a generator may seem like little more than a type-safe macro, that is, syntactic sugar. What gives generators their real power is that they can be recursive. For example, the generator in the figure describes a grammar of expressions, each of which is either the variable x, an unknown bit-vector constant, or a bitwise combination or bitwise negation of recursively generated expressions. Each recursive invocation of the generator can have its own distinct values for the unknown constants. This idiom of using generators to define a space of programs as a context-free grammar is common across many different applications of Sketch.
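The generator from the figure is not reproduced in the text, so the following Python sketch is a hypothetical stand-in. It assumes 4-bit words and reads "bitwise combination" as & or |. It enumerates every expression that a recursive generator of this shape could produce within a nesting bound, where each recursive call independently chooses its own production and its own constant, and returns the first expression consistent with a few input/output examples. Sketch encodes these choices as holes rather than enumerating them.

```python
from itertools import product

WIDTH = 4                   # assumption: 4-bit words keep the space tiny
MASK = (1 << WIDTH) - 1


def const(c):
    return lambda x: c


def neg(f):
    return lambda x: ~f(x) & MASK


def band(f, g):
    return lambda x: f(x) & g(x)


def bor(f, g):
    return lambda x: f(x) | g(x)


def expressions(depth):
    """Yield (text, fn) for each expression with at most `depth` nesting levels.

    Every recursive call chooses independently, so two subexpressions of the
    same expression can pick different productions and different constants.
    """
    yield "x", lambda x: x
    for c in range(1 << WIDTH):         # an unknown bit-vector constant
        yield str(c), const(c)
    if depth > 1:
        subs = list(expressions(depth - 1))
        for text, f in subs:            # bitwise negation
            yield f"~{text}", neg(f)
        for (ta, fa), (tb, fb) in product(subs, repeat=2):
            yield f"({ta} & {tb})", band(fa, fb)
            yield f"({ta} | {tb})", bor(fa, fb)


def synthesize(examples, depth):
    """Return the first expression (in enumeration order) matching all examples."""
    for text, fn in expressions(depth):
        if all(fn(x) == y for x, y in examples):
            return text
    return None


examples = [(0, 3), (4, 7), (8, 11), (12, 15)]
print(synthesize(examples, 2))  # (x | 3)
```

Even at nesting depth 2 this toy grammar already yields 612 candidate expressions, which hints at why bounding the recursion matters.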

In addition to being recursive, generators can also be higher-order, meaning that they can take other functions or even other generators as parameters. An example of this is the rep generator, also shown in the figure. This generator takes as a parameter a function or generator f and applies it $n$ times.

```
generator void rep(int n, fun f){
    if(n>0){
        f();
        rep(n-1, f);
    }
}
```

This very simple generator implements an important computational pattern: a particular kind of operation needs to be performed multiple times, but each repetition may actually be a distinct operation. For example, consider the code below:

```
bit[32] reverseSketch(bit[32] in) {
    bit[32] t = in;
    int s = 1;
    generator void tmp(){
        bit[32] m = ??;
        t = ((t << s) & m) | ((t >> s) & (~m));
        s = s*??;
    }
    rep(??, tmp);
    return t;
}
```

The goal of this sketch is to reverse the bits of a 32-bit word through a combination of shifts and masks. The generator tmp captures the basic computational pattern of each step: the word is shifted left and right by some amount s, and a mask m determines which bits to keep from the left shift and which from the right shift. The shift amount is then multiplied by a constant. We know the computation involves some number of such steps, but not how many, and the rep generator is ideally suited for that purpose. Note that the first argument of rep, which determines the depth of the recursion, does not have to be a concrete constant: here it is itself a hole, so the number of iterations is part of what the synthesizer needs to discover. Because tmp is a generator, each iteration also gets its own mask m and its own multiplier.
The result of solving this sketch against a suitable harness (here, a reference implementation reverse, as the implements clause in the output indicates) would look something like this:

```
void reverseSketch (bit[32] in, ref bit[32] _out) implements reverse/*reverse.sk:7*/
{
    bit[32] __sa0 = {0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1};
    _out = ((in << 1) & __sa0) | ((in >> 1) & (~(__sa0)));
    bit[32] __sa0_0 = {0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1};
    _out = ((_out << 2) & __sa0_0) | ((_out >> 2) & (~(__sa0_0)));
    bit[32] __sa0_1 = {0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1};
    _out = ((_out << 4) & __sa0_1) | ((_out >> 4) & (~(__sa0_1)));
    bit[32] __sa0_2 = {0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1};
    _out = ((_out << 8) & __sa0_2) | ((_out >> 8) & (~(__sa0_2)));
    bit[32] __sa0_3 = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
    _out = ((_out << 16) & __sa0_3) | ((_out >> 16) & (~(__sa0_3)));
    return;
}
```

An important thing to note is that the control structure of the generators has disappeared entirely from the generated code. All the branches and recursive calls have been partially evaluated away: rep was unrolled into five rounds, with the shift amount doubling from 1 to 16, and what is left is just the code the user actually wants, albeit with somewhat ugly variable names.
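As a sanity check on the synthesized code, the following Python sketch replays it on ordinary integers and compares the result with a naive bit reversal. It assumes that index 0 of a Sketch bit[32] array is the least significant bit, under which the five masks in the output read as 0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0, 0xFF00FF00, and 0xFFFF0000; the shift amounts 1, 2, 4, 8, 16 correspond to rep running five times with s doubling each round.

```python
WORD = 0xFFFFFFFF  # keep Python's unbounded ints to 32 bits

# Masks from the synthesized code, assuming index 0 of a bit[32] is the LSB.
MASKS = [0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0, 0xFF00FF00, 0xFFFF0000]


def reverse_sketch(word):
    """Replay the solved sketch: one round of tmp() per mask, s = 1, 2, 4, 8, 16."""
    t, s = word & WORD, 1
    for m in MASKS:
        # m selects bits taken from the left shift; ~m those from the right shift.
        t = ((t << s) & m) | ((t >> s) & (~m & WORD))
        s *= 2
    return t


def reverse_naive(word):
    """Reference: reverse the 32-character binary string."""
    return int(format(word & WORD, "032b")[::-1], 2)


for w in (1, 0x12345678, 0xDEADBEEF):
    assert reverse_sketch(w) == reverse_naive(w)
print(hex(reverse_sketch(0x12345678)))  # 0x1e6a2c48
```

Each round swaps adjacent blocks of s bits, so after swapping blocks of 1, 2, 4, 8, and 16 bits the whole word is reversed.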

Formalizing generator functions.

(See Lecture 7, slides 16, 17, 20, and 21.) The first step in understanding how constraints are produced from a program written in Sketch is to state the semantics of generators more precisely. We follow the formalization in a 2013 journal paper on Sketch [Solar-Lezama13]. As mentioned before, Sketch is really just a notation for writing parametric programs. A program in Sketch can be thought of as a parametric function, parameterized by a function $\phi$. This function $\phi$ is really just a table that gives the value of each unknown constant in the sketch.

If it were not for recursive generators, it would be straightforward to assign a unique name to each distinct unknown constant in the sketch and make $\phi$ simply a mapping from that name to a corresponding value, as illustrated in the figure. Recursive generators, however, introduce a wrinkle, because the same syntactic hole can take different values in different invocations of the generator. We formalize this by making $\phi$ a function of a calling context as well as of a hole.

The idea is that when you write a sketch with generators, the compiler internally assigns each generator callsite a unique name. When a generator is called, it is assigned a calling context that summarizes where it was called from, and this context is then passed to $\phi$, as illustrated in the figure. When a generator is called from another generator, the new callsite name is appended to the existing context; when it is called from a function, there is no prior context to consider, so only the callsite name is used. Finally, a hole used outside any generator, directly within a normal function, simply has the empty context.

The result, as illustrated in the last frame of the figure, is that a sketch with recursive generators can have a potentially unbounded number of unknown values. In practice, Sketch avoids this by bounding the depth of recursion for generators, using a bound set by the command-line flag "--bnd-inline-amnt". With a bound on the recursion depth, $\phi$ once again becomes just a table, mapping a finite set of pairs of hole names and calling contexts to values. In the next lecture, then, we focus on the process of generating constraints on $\phi$ and solving them to find values that allow the sketch to satisfy all its assertions.
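The context bookkeeping can be sketched in Python. Everything here is hypothetical: a function main that uses one hole (h0) directly and calls a generator gen at callsite c0, where gen has one hole (h) and two recursive callsites (c1 and c2). The parameter bound is only a stand-in for the inlining bound; we do not model the exact counting convention Sketch uses for "--bnd-inline-amnt". The sketch lists every (calling context, hole) pair to which $\phi$ must assign a value.

```python
def phi_keys(bound):
    """List the (context, hole) pairs phi needs when gen is inlined to depth `bound`."""
    keys = [((), "h0")]  # a hole used directly in main: empty context

    def inline_gen(context):
        keys.append((context, "h"))            # gen's own hole, one entry per context
        if len(context) < bound:               # stop inlining at the bound
            for site in ("c1", "c2"):          # generator-to-generator call:
                inline_gen(context + (site,))  # append the new callsite name

    inline_gen(("c0",))  # called from a function: context is just the callsite name
    return keys


for key in phi_keys(2):
    print(key)
# ((), 'h0')
# (('c0',), 'h')
# (('c0', 'c1'), 'h')
# (('c0', 'c2'), 'h')

# The table stays finite: 1 + (2**bound - 1) entries for this example.
print([len(phi_keys(b)) for b in range(1, 5)])  # [2, 4, 8, 16]
```

Without the bound, inline_gen would recurse forever, which mirrors the unbounded set of values described above.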