Introduction to Program Synthesis

© Armando Solar-Lezama. 2018. All rights reserved.

Lecture 8: Constraint based inductive synthesis and SAT.

Lecture8:Slide3 In this lecture, we describe the approach used by Sketch to generate constraints from sketches Solar-Lezama13. We also describe some alternative approaches for generating constraints and then elaborate on the basics of how these constraints are solved.

From sketch to constraints

Before we jump into the details of how to generate constraints from a sketch, we need to understand something about how to define the semantics of a simple language. There are many different formalisms for doing this, but a very popular one in the context of imperative languages is to define the semantics of an expression as a function from a state to a value. Specifically, the denotational semantics for expressions is a function \[ \newcommand{\esem}{\mathcal{A}} \esem[\![ \cdot ]\!] : Expr \rightarrow \Sigma \rightarrow Val \] where $\Sigma$ is just the set of possible states for a program. In other words, for a given expression, say $x + 5$, $\esem[\![ x+5 ]\!] $ is a function that, given a state containing values for each variable, gives us back a value, in this case the value of $x$ in the state plus five.
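To make this concrete, here is a minimal Python sketch of the expression semantics $\esem[\![ \cdot ]\!]$, where states are plain dicts mapping variable names to values and expressions are nested tuples. The encoding and function names are illustrative, not Sketch's actual representation.

```python
# Minimal sketch of expression semantics A[[e]] : Expr -> State -> Val.
# States are dicts from variable names to values; expressions are tuples.

def eval_expr(e, sigma):
    """The denotation of e: given a state sigma, return a value."""
    kind = e[0]
    if kind == "const":              # A[[n]](sigma) = n
        return e[1]
    if kind == "var":                # A[[x]](sigma) = sigma(x)
        return sigma[e[1]]
    if kind == "plus":
        return eval_expr(e[1], sigma) + eval_expr(e[2], sigma)
    if kind == "lt":                 # 1 stands in for true, as in the lecture
        return 1 if eval_expr(e[1], sigma) < eval_expr(e[2], sigma) else 0
    raise ValueError("unknown expression: " + kind)

# A[[x + 5]] applied to a state where x = 3 yields 8.
print(eval_expr(("plus", ("var", "x"), ("const", 5)), {"x": 3}))  # prints 8
```

Note that `eval_expr(e, ...)` can be read as the curried function $\esem[\![ e ]\!]$: fixing the expression yields a function from states to values.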

Similarly, the semantics of a statement (also often called a command in the literature) are represented as a function that maps the state of a program to a new state. Specifically, the denotational semantics for a statement is a function \[ \newcommand{\csem}{\mathcal{C}} \csem[\![ \cdot ]\!] : cmd \rightarrow \Sigma \rightarrow \Sigma \] Lecture8:Slide5; Lecture8:Slide6; Lecture8:Slide7 For example, the figure illustrates the semantics of a very simple imperative language. Note that the semantics are described recursively, following the syntactic structure of the language. For example, the semantics of a constant is just the value of the constant, and the semantics of a variable is defined in terms of the state, which assigns a value to every variable. For simplicity, the language elides the distinction between booleans and integers, just using 1 as a stand-in for true.

For commands, the semantics are also defined recursively. The most interesting rule is the rule for an assignment of the form x:=expr, which produces a new state that is just like the state before the assignment, but with $x$ now mapped to a new value corresponding to the result of evaluating $expr$ on the initial state. Sequential composition is just what one would expect: the semantics are the result of chaining together the semantics of the two component statements. The rule for if is interesting notation-wise. To understand the notation, remember that a state is a mapping from variable names to values. So the if rule produces a new mapping that returns either the values created by the then branch or the values created by the else branch, depending on whether the condition evaluated to true or not.

Loops are interesting because they necessarily involve recursion. The recursion stops only when we reach a state where the loop condition evaluates to false.
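The command semantics $\csem[\![ \cdot ]\!]$ can be sketched in the same style. The following illustrative interpreter (tuple encoding and names are assumptions, matching the expression evaluator above) shows how assignment copies the state, sequencing chains the two functions, and the while rule recurses until the condition evaluates to false:

```python
# Sketch of command semantics C[[c]] : Cmd -> State -> State.
# The while rule recurses; recursion stops when the condition is false (0).

def eval_expr(e, s):
    k = e[0]
    if k == "const": return e[1]
    if k == "var":   return s[e[1]]
    if k == "plus":  return eval_expr(e[1], s) + eval_expr(e[2], s)
    if k == "lt":    return 1 if eval_expr(e[1], s) < eval_expr(e[2], s) else 0

def exec_cmd(c, sigma):
    kind = c[0]
    if kind == "skip":
        return sigma
    if kind == "assign":                 # x := e
        x, e = c[1], c[2]
        new_sigma = dict(sigma)          # like sigma, but with x remapped
        new_sigma[x] = eval_expr(e, sigma)
        return new_sigma
    if kind == "seq":                    # c1; c2 chains the two functions
        return exec_cmd(c[2], exec_cmd(c[1], sigma))
    if kind == "if":                     # pick a branch based on the condition
        taken = c[2] if eval_expr(c[1], sigma) == 1 else c[3]
        return exec_cmd(taken, sigma)
    if kind == "while":                  # recurse until the condition is false
        if eval_expr(c[1], sigma) == 1:
            return exec_cmd(c, exec_cmd(c[2], sigma))
        return sigma

# while (i < 3) { s := s + i; i := i + 1 }   starting from i = 0, s = 0
loop = ("while", ("lt", ("var", "i"), ("const", 3)),
        ("seq", ("assign", "s", ("plus", ("var", "s"), ("var", "i"))),
                ("assign", "i", ("plus", ("var", "i"), ("const", 1)))))
print(exec_cmd(loop, {"i": 0, "s": 0}))  # {'i': 3, 's': 3}
```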

Symbolic execution of a sketch

Lecture8:Slide8; Lecture8:Slide9; Lecture8:Slide10 The basic idea when creating constraints from a sketch is that we want to perform symbolic execution. Unlike the standard execution which runs a program and produces states mapping variables to values, our symbolic execution will run a program and produce symbolic values and constraints.

The formalism is similar to the one before, but for expressions, the semantics are now a function that takes a state mapping variable names to symbolic values and produces a symbolic value, which is really just a symbolic representation of a function from an assignment $\phi$ to the holes to a concrete value. An important thing to note is that the denotation function is parametric on the context $\tau$, which is important for correctly generating constraints for generators.

For example, the figure shows the semantics of a few basic expressions. The most interesting is the semantics for a hole with a unique label $??_i$. Just like we saw before, given an assignment $\phi$, the value of the hole is simply whatever $\phi$ assigns to that hole under the current context.

More interesting still are the semantics of commands. Unlike before, two things happen when we evaluate a command. First, the state may change, as variables are assigned new values. But also, the set of valid assignments may be restricted, for example, by an assertion. So the semantics of a command take in a state and a representation of a set of valid assignments and produce a new state and a new set of valid assignments. For example, after an assignment statement, the set of valid assignments remains unchanged, but the state is updated with the assigned variable now mapping to a new symbolic value. When an assert is executed, on the other hand, the state remains unchanged, but the set of valid assignments is restricted to only those assignments that cause the expression to evaluate to true under the current state.
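A minimal sketch of this symbolic semantics for straight-line code: symbolic values are Python closures from a hole assignment $\phi$ to a concrete value, and the set of valid assignments is itself a predicate on $\phi$. All names here are illustrative assumptions, not Sketch's internals.

```python
# Symbolic values: functions from a hole assignment phi to a concrete value.
def const(n):   return lambda phi: n
def hole(i):    return lambda phi: phi[i]        # ??_i looks itself up in phi
def plus(a, b): return lambda phi: a(phi) + b(phi)
def eq(a, b):   return lambda phi: 1 if a(phi) == b(phi) else 0

def run_assign(state, valid, x, sym_val):
    # Assignment updates the state; the valid set is unchanged.
    new_state = dict(state)
    new_state[x] = sym_val
    return new_state, valid

def run_assert(state, valid, cond):
    # Assert leaves the state unchanged; the valid set is restricted to
    # assignments that make cond evaluate to 1 under the current state.
    return state, (lambda phi: valid(phi) and cond(phi) == 1)

# x := ?? + 2; assert x == 7   -- only assignments with hole 0 = 5 survive.
state, valid = {}, (lambda phi: True)
state, valid = run_assign(state, valid, "x", plus(hole(0), const(2)))
state, valid = run_assert(state, valid, eq(state["x"], const(7)))
print(valid({0: 5}), valid({0: 3}))  # True False
```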

Lecture8:Slide11; Lecture8:Slide12; Lecture8:Slide13; Lecture8:Slide14; Lecture8:Slide15; Lecture8:Slide16 The semantics of branches and loops are a little more involved. In the case of branches, each branch is evaluated on the set of assignments that satisfy its branch condition, and the results of the two branches are combined at the end, as illustrated by the animation. The state is also modified by the branch in the same way as it was in the case of the simple imperative language.
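The merge step at the end of a symbolic if can be sketched as follows: each branch runs on a copy of the state, and each variable's final value becomes a mux that selects between the two branch results based on the symbolic condition. The helper names are assumptions for illustration.

```python
# Symbolic values are closures from a hole assignment phi to a value.
def hole(i):  return lambda phi: phi[i]
def const(n): return lambda phi: n

def mux(cond, a, b):
    # Select a when cond evaluates to 1 (true), otherwise b.
    return lambda phi: a(phi) if cond(phi) == 1 else b(phi)

def merge_states(cond, st_then, st_else):
    # Every variable touched by either branch gets a mux over the results.
    return {x: mux(cond, st_then[x], st_else[x])
            for x in set(st_then) | set(st_else)}

# if (??0) x := 1 else x := 2
merged = merge_states(hole(0), {"x": const(1)}, {"x": const(2)})
print(merged["x"]({0: 1}), merged["x"]({0: 0}))  # 1 2
```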

Loops follow the same logic, but with the caveat that loop evaluation is recursive. This is problematic because unlike standard execution, where we could stop the recursion as soon as we reached a state where the condition evaluated to false, in this case we are doing symbolic execution, so even if we wanted to, we would not be able to tell when the condition evaluates to false. Note that the definition does not even guard the recursion with a conditional. In principle, we could compute the expression on the right recursively until we reach a fixpoint; that is, sooner or later, additional recursive calls to $W$ will stop contributing anything to the resulting set, and at that point we can stop recursing. In practice, Sketch simply continues this process until it reaches a hard-coded limit determined by the compile-time flag "--bnd-unroll-amnt".
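The effect of the bounded unrolling can be sketched syntactically: a while loop is rewritten into a chain of nested ifs of fixed depth, cutting the recursion off at the bound. The tuple encoding and the small interpreter below are illustrative assumptions in the spirit of "--bnd-unroll-amnt", not Sketch's actual transformation.

```python
def unroll(w, k):
    """Rewrite ("while", cond, body) into nested ifs of depth k."""
    _, cond, body = w
    out = ("skip",)
    for _ in range(k):
        out = ("if", cond, ("seq", body, out), ("skip",))
    return out

# A tiny concrete interpreter (no while case needed after unrolling).
def eval_expr(e, s):
    k = e[0]
    if k == "const": return e[1]
    if k == "var":   return s[e[1]]
    if k == "plus":  return eval_expr(e[1], s) + eval_expr(e[2], s)
    if k == "lt":    return 1 if eval_expr(e[1], s) < eval_expr(e[2], s) else 0

def exec_cmd(c, sigma):
    kind = c[0]
    if kind == "skip":
        return sigma
    if kind == "assign":
        new_sigma = dict(sigma)
        new_sigma[c[1]] = eval_expr(c[2], sigma)
        return new_sigma
    if kind == "seq":
        return exec_cmd(c[2], exec_cmd(c[1], sigma))
    if kind == "if":
        return exec_cmd(c[2] if eval_expr(c[1], sigma) == 1 else c[3], sigma)

# while (i < 3) { s := s + i; i := i + 1 }, unrolled 5 times: the bound
# exceeds the 3 iterations actually needed, so the result is exact.
loop = ("while", ("lt", ("var", "i"), ("const", 3)),
        ("seq", ("assign", "s", ("plus", ("var", "s"), ("var", "i"))),
                ("assign", "i", ("plus", ("var", "i"), ("const", 1)))))
print(exec_cmd(unroll(loop, 5), {"i": 0, "s": 0}))  # {'i': 3, 's': 3}
```

If the bound were too small (say `unroll(loop, 2)`), the residual iterations would simply be dropped here; the real solver instead inserts an assertion that the bound was not exceeded.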

Representing Sets and Symbolic Expressions

Lecture8:Slide17 The symbolic execution defined earlier relies on our ability to compactly represent both the symbolic values $\Psi$ and the set of viable candidates $\Phi$. For the symbolic values, the representation is simply an AST with unknowns at the leaves. For the sets $\Phi$, we represent them as predicates. The idea of representing sets as predicates is very common in many different areas of program analysis and synthesis. The idea is to represent a set $\Phi$ as a predicate $P_\Phi(\phi)$ such that $P_\Phi(\phi) \mbox{ iff } \phi\in \Phi$. Thus, for example, the predicate $true$ corresponds to the universal set, and the standard operations of union and intersection correspond to or and and of the corresponding predicates respectively.
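The set-as-predicate correspondence is a one-liner in each direction, sketched here with illustrative helper names:

```python
# Sets of hole assignments represented as predicates on phi:
# the universal set is the predicate 'true', union is 'or',
# and intersection is 'and'.

def universal():     return lambda phi: True
def union(p, q):     return lambda phi: p(phi) or q(phi)
def intersect(p, q): return lambda phi: p(phi) and q(phi)

# Two example sets over a single hole, written directly as predicates.
even  = lambda phi: phi[0] % 2 == 0
small = lambda phi: phi[0] < 10

both   = intersect(even, small)
either = union(even, small)
print(both({0: 4}), both({0: 14}), either({0: 7}))  # True False True
```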

From the semantics, for example, we see that assert restricts the set to those $\phi$ for which the expression is true. This means that if we initially had a set represented by a predicate $P_\Phi(\phi)$, then after the assert, the set would be represented by the new predicate \[ P_\Phi(\phi) \wedge f(\phi)=1 \mbox{ where } f(\phi) = \esem[\![ e ]\!] ^\tau \sigma \phi \] Similarly, the union in the semantics of if would be represented as \[ P_{\Phi_1}(\phi) \vee P_{\Phi_2}(\phi) \] The figure above provides an example of the representation of the set of valid assignments for a given sketch. The nodes labeled as mux simply select from their two inputs based on whether a condition is true or false. The and joins together the constraints from the two asserts, each of which involves an or because the asserts are guarded by if conditions. Another point to note is that the representation is not quite a tree, but a DAG. This is simply an optimization to exploit sharing in the underlying expression.

Optimizing the representation

Lecture8:Slide46; Lecture8:Slide47; Lecture8:Slide48; Lecture8:Slide49; Lecture8:Slide50; Lecture8:Slide51; Lecture8:Slide52; Lecture8:Slide53; Lecture8:Slide54; Lecture8:Slide55; Lecture8:Slide56; Lecture8:Slide57; Lecture8:Slide58; Lecture8:Slide59; Lecture8:Slide60; Lecture8:Slide61 There are two major optimizations that are used to reduce the size of the representation: structural hashing and algebraic simplification. Structural hashing is illustrated by the animation. The idea is simply to identify common sub-expressions and represent them with the same node. Because the representation is a DAG, it is sufficient to traverse it in one pass from the leaves to the root. For every node, we record its type and the IDs of its parents. If two nodes of the same type share the same parents, they get merged into a single node.
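Structural hashing is essentially hash consing: nodes are interned in a table keyed by their type and the IDs of their inputs, so structurally identical sub-expressions map to the same node. A minimal sketch, with illustrative names:

```python
# Hash-consing sketch: a node is (op, input ids); building the same node
# twice returns the same id, so common sub-expressions are shared.

table = {}   # (op, input ids) -> node id
nodes = []   # node id -> (op, input ids)

def mk(op, *inputs):
    key = (op, inputs)
    if key not in table:
        table[key] = len(nodes)
        nodes.append(key)
    return table[key]

a  = mk("var_x")
b  = mk("var_y")
e1 = mk("plus", a, b)
e2 = mk("plus", a, b)        # merged with e1: same op, same inputs
print(e1 == e2, len(nodes))  # True 3
```

Building from the leaves up guarantees that by the time a node is interned, its inputs already have canonical IDs, which is why a single pass suffices.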

Structural hashing is most powerful when combined with algebraic simplification. This involves rewriting the DAG based on algebraic equalities. Each rewrite simplifies the representation, but also potentially helps uncover additional shared structure, as illustrated in the figure. The current Sketch solver release has a hand-crafted simplifier, although in recent work, we have explored the automatic synthesis of the simplification layer Rohit0002S16.
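To give a flavor of what such rewrites look like, here is a toy bottom-up simplifier applying a few identities ($x+0 \rightarrow x$, $x \cdot 1 \rightarrow x$, $x \cdot 0 \rightarrow 0$). This is only an illustration of the shape of the pass; Sketch's hand-crafted simplifier is far richer.

```python
# Toy algebraic simplifier over tuple-encoded expressions, applied bottom-up
# so that simplified children can enable rewrites higher in the tree.

def simplify(e):
    if not isinstance(e, tuple):          # leaf: variable name or constant
        return e
    op, *args = e
    args = [simplify(a) for a in args]    # simplify children first
    if op == "plus" and args[1] == 0:
        return args[0]                    # x + 0 -> x
    if op == "plus" and args[0] == 0:
        return args[1]                    # 0 + x -> x
    if op == "times" and 0 in args:
        return 0                          # x * 0 -> 0
    if op == "times" and args[1] == 1:
        return args[0]                    # x * 1 -> x
    if op == "times" and args[0] == 1:
        return args[1]                    # 1 * x -> x
    return (op, *args)

print(simplify(("plus", ("times", "x", 1), 0)))  # x
```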