Lecture 8: Constraint-based inductive synthesis and SAT.
Lecture8:Slide3
In this lecture, we describe the approach used by Sketch to generate constraints from sketches
Solar-Lezama13.
We also describe some alternative approaches for generating constraints and then elaborate on the basics of how they are solved.
From sketch to constraints
Before we jump into the details of how to generate constraints from a Sketch, we need to understand something about how
to define the semantics of a simple language. There are many different formalisms for doing this, but a very popular
one in the context of imperative languages is to define the semantics of an expression as a function from a state to
a value. Specifically, the
denotational semantics for expressions is a function
\[
\newcommand{\esem}{\mathcal{A}}
\esem[\![ \cdot ]\!] : Expr \rightarrow \Sigma \rightarrow Val
\]
where $\Sigma$ is just the set of possible states for a program. In other words, for a given expression, say $x + 5$,
$\esem[\![ x+5 ]\!] $ will give us a function that, given a state containing values for each variable,
will give us back a value, hopefully corresponding to the value of $x$ in the state plus five.
Similarly, the semantics of a statement (also often called command in the literature)
are represented as a function that maps the state of a program to a new
state. Specifically, the denotational semantics for a statement is a function
\[
\newcommand{\csem}{\mathcal{C}}
\csem[\![ \cdot ]\!] : Cmd \rightarrow \Sigma \rightarrow \Sigma
\]
Lecture8:Slide5;
Lecture8:Slide6;
Lecture8:Slide7
For example, the figure illustrates the semantics for a very simple imperative language. Note
that the semantics are described recursively following the syntactic structure of the language.
For example, the semantics of constants are just the value of the constant, and the semantics for
variables are defined in terms of the state, which assigns a value to every variable. For simplicity,
the language elides the distinction between booleans and integers, just using 1 as a stand-in for true.
For commands, the semantics are also defined recursively. The most interesting rule is the
rule for assignment of the form
x:=expr
, which produces a new state that is just like the
state before the assignment, but with $x$ now mapped to a new value corresponding to the result of
evaluating $expr$ on the initial state. Sequential composition is just what one would expect: the semantics
is simply the result of chaining together the semantics of the two corresponding statements.
The rule for
if
is interesting notationwise. To understand the notation, remember that a state
is a mapping from variable names to values. So the if rule produces a new mapping that will return either
the values created by the
then branch, or the values created by the
else branch
depending on whether the condition evaluated to true or not.
Loops are interesting because they necessarily involve recursion. The recursion stops only when we
reach a state where the loop condition evaluates to false.
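To make these definitions concrete, here is a minimal interpreter in Python written in the denotational style: the denotation of an expression is a function from a state (a dict) to a value, and the denotation of a command is a function from a state to a state. The functions `A` and `C` mirror $\esem$ and $\csem$; the tuple-based AST encoding is an assumption of this sketch, not part of the lecture's formalism.

```python
# A sketch of denotational semantics for the simple imperative language.
# A[[e]] : State -> Val and C[[c]] : State -> State, where a state is a
# dict from variable names to values. Following the language in the
# figure, the integer 1 stands in for true. (Encoding is our own.)

def A(e):
    """Denotation of an expression: a function State -> Val."""
    kind = e[0]
    if kind == "const": return lambda s: e[1]
    if kind == "var":   return lambda s: s[e[1]]
    if kind == "plus":  return lambda s: A(e[1])(s) + A(e[2])(s)
    if kind == "lt":    return lambda s: 1 if A(e[1])(s) < A(e[2])(s) else 0
    raise ValueError(kind)

def C(c):
    """Denotation of a command: a function State -> State."""
    kind = c[0]
    if kind == "assign":          # x := e  -- new state with x remapped
        _, x, e = c
        return lambda s: {**s, x: A(e)(s)}
    if kind == "seq":             # c1; c2  -- chain the two denotations
        _, c1, c2 = c
        return lambda s: C(c2)(C(c1)(s))
    if kind == "if":              # pick a branch based on the condition
        _, e, ct, cf = c
        return lambda s: C(ct)(s) if A(e)(s) == 1 else C(cf)(s)
    if kind == "while":           # recursion stops when the condition is false
        _, e, body = c
        def w(s):
            return w(C(body)(s)) if A(e)(s) == 1 else s
        return w
    raise ValueError(kind)

# while (x < 5) x := x + 1, starting from {x: 0}, ends with x == 5.
loop = ("while", ("lt", ("var", "x"), ("const", 5)),
        ("assign", "x", ("plus", ("var", "x"), ("const", 1))))
print(C(loop)({"x": 0}))  # {'x': 5}
```

Note how the `while` case is the only one that recurses on its own denotation, exactly as in the formal definition.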
Symbolic execution of a sketch
Lecture8:Slide8;
Lecture8:Slide9;
Lecture8:Slide10
The basic idea when creating constraints from a sketch is that we want to perform
symbolic execution. Unlike the standard execution which runs a program and produces
states mapping variables to values, our symbolic execution will run a program and produce
symbolic values and
constraints.
The formalism is similar to the one from before, but for expressions, the semantics are now a function that
takes in a state mapping variable names to symbolic values and produces a symbolic value, which
is really just a symbolic representation of a function from an assignment to the holes $\phi$ to
a concrete value. An important thing to note is that the denotation function is parametric on
the context $\tau$, which is important in correctly generating constraints for generators.
For example, the figure shows the semantics of a few basic expressions. The most interesting
is the semantics for a hole with a unique label $??_i$. Just like we saw before, given an
assignment $\phi$, the value of the hole is simply whatever $\phi$ assigns to that
hole under the current context.
More interesting still are the semantics of commands. Unlike before, two things happen when
we evaluate a command. First, the state may change, as variables are assigned new values.
But also, the set of valid assignments may be restricted, for example, by an assertion.
So the semantics of a command take in a state and a representation of a set of valid
assignments and produce a new state and a new set of valid assignments.
For example, after an assignment statement, the set of valid assignments remains unchanged,
but the state is updated with the assigned variable now mapping to a new symbolic value.
When an assert is executed, on the other hand, the state remains unchanged, but the
set of valid assignments is now restricted to only those assignments that cause the expression
to evaluate to true under the current state.
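As a rough illustration, the following Python sketch implements this for straight-line code: symbolic values are small ASTs whose leaves may be holes, and the set of valid assignments is kept as a list of constraints interpreted as a conjunction. The encoding and function names are our own, and branches, loops, and the context $\tau$ are omitted for brevity.

```python
# A sketch of symbolic execution for straight-line code with holes.
# Symbolic values are ASTs whose leaves may be holes ??_i; the set of
# valid assignments Phi is a list of constraints (a conjunction).

def sym_expr(e, state):
    """Evaluate an expression to a symbolic value (an AST)."""
    kind = e[0]
    if kind == "const": return e
    if kind == "hole":  return e                  # ??_i stays symbolic
    if kind == "var":   return state[e[1]]        # look up its symbolic value
    if kind == "plus":  return ("plus", sym_expr(e[1], state), sym_expr(e[2], state))
    if kind == "eq":    return ("eq", sym_expr(e[1], state), sym_expr(e[2], state))
    raise ValueError(kind)

def sym_cmd(c, state, phi):
    """Execute a command: returns (new state, new constraint list)."""
    kind = c[0]
    if kind == "assign":     # state updated, Phi unchanged
        _, x, e = c
        return {**state, x: sym_expr(e, state)}, phi
    if kind == "seq":
        _, c1, c2 = c
        state, phi = sym_cmd(c1, state, phi)
        return sym_cmd(c2, state, phi)
    if kind == "assert":     # state unchanged, Phi restricted
        _, e = c
        return state, phi + [sym_expr(e, state)]
    raise ValueError(kind)

# x := ??0 + 2; assert x == 5   yields the constraint (??0 + 2) == 5
prog = ("seq", ("assign", "x", ("plus", ("hole", 0), ("const", 2))),
        ("assert", ("eq", ("var", "x"), ("const", 5))))
state, phi = sym_cmd(prog, {}, [])
print(phi)  # [('eq', ('plus', ('hole', 0), ('const', 2)), ('const', 5))]
```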
Lecture8:Slide11;
Lecture8:Slide12;
Lecture8:Slide13;
Lecture8:Slide14;
Lecture8:Slide15;
Lecture8:Slide16
The semantics of branches and loops are a little more involved. In the case of branches, each branch
is evaluated on the set of values that satisfy the branch, and the results of the two branches are
combined at the end as illustrated by the animation. The state is also modified by the branch in
the same way as it was in the case of the simple imperative language.
Loops follow the same logic, but with the caveat that loop evaluation is recursive. This is problematic
because unlike the standard execution, where we could stop the recursion as soon as we got a state
where the condition evaluated to false, in this case we are doing symbolic execution, so even if
we wanted to we would not be able to tell when the condition evaluates to false. Note that the
definition does not even guard the recursion by a conditional. In principle, we could compute the
expression to the right recursively until we reach a fixpoint, that is, sooner or later,
additional recursive calls to $W$ will stop contributing anything to the resulting set, and at that point
we can stop recursing. In practice, Sketch simply continues this process until we reach a hard-coded limit
determined by a compile-time flag "
--bnd-unroll-amnt
".
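A rough sketch of what this bounded unrolling does, assuming a tuple-based AST of our own invention: the loop is rewritten into nested conditionals up to a fixed depth, and one common choice beyond the bound is to assert that the loop condition has become false.

```python
# A sketch of bounded loop unrolling: `while (cond) body` becomes a chain
# of nested `if (cond) { body; ... }` up to a fixed depth, mirroring what
# a flag like --bnd-unroll-amnt controls. (Encoding is our own.)

def unroll(cond, body, depth):
    """Rewrite a while loop into `depth` nested conditionals."""
    if depth == 0:
        # Beyond the bound, assert the loop must already be done.
        return ("assert", ("not", cond))
    return ("if", cond,
            ("seq", body, unroll(cond, body, depth - 1)),
            ("skip",))

cond = ("lt", ("var", "x"), ("const", 3))
body = ("assign", "x", ("plus", ("var", "x"), ("const", 1)))
prog = unroll(cond, body, 2)
```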
Representing Sets and Symbolic Expressions
Lecture8:Slide17
The symbolic execution defined earlier relies on our ability to compactly represent both the
symbolic values $\Psi$ and the set of viable candidates $\Phi$. For the symbolic values,
the representation is simply as an AST with unknowns at the leaves. For the set $\Phi$, we
represent them as predicates. The idea of representing sets as predicates is very common
in many different areas of program analysis and synthesis. The idea is to represent
a set $\Phi$ as a predicate $P_\Phi(\phi)$ such that $P_\Phi(\phi) \mbox{ iff } \phi\in \Phi$.
Thus, for example, the predicate $true$ corresponds to the universal set, and the
standard set operations of union and intersection correspond to the or and and of the corresponding
predicates, respectively.
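This correspondence is easy to state in code. The following Python fragment (all names are our own) represents sets of assignments as predicates:

```python
# A sketch of the sets-as-predicates idea: a set Phi of assignments is a
# predicate P(phi); intersection is `and`, union is `or`.

universal = lambda phi: True            # the predicate `true`

def intersect(p, q):
    return lambda phi: p(phi) and q(phi)

def union(p, q):
    return lambda phi: p(phi) or q(phi)

def restrict(p, f):
    """The effect of an assert: keep only phi for which f(phi) holds."""
    return lambda phi: p(phi) and f(phi)

# e.g. assignments where hole 0 is even, intersected with those where it is > 2
p = restrict(universal, lambda phi: phi[0] % 2 == 0)
q = restrict(universal, lambda phi: phi[0] > 2)
both = intersect(p, q)
print(both({0: 4}), both({0: 2}))  # True False
```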
From the semantics, for example, we see that assert restricts the set to those $\phi$ for which
the expression is true. This means that if we initially had a set represented by
a predicate $P_\Phi(\phi)$, then after the assert, the set would be represented
by the new predicate
\[
P_\Phi(\phi) \wedge f(\phi)=1 \mbox{ where } f(\phi) = \esem[\![ e ]\!] ^\tau \sigma \phi
\]
Similarly, the union in the semantics of
if
would be represented as
\[
P_{\Phi_1}(\phi) \vee P_{\Phi_2}(\phi)
\]
The figure above provides an example of the representation of the set of valid assignments
for a given sketch. The nodes labeled as
mux
simply select from their
two inputs based on whether a condition is true or false. The
and
joins
together the constraints from the two asserts, each of which involves an
or
because the asserts are guarded by
if
conditions.
Another point to note is that the representation is not quite a tree, but a DAG.
This is simply an optimization to exploit sharing in the underlying expression.
Optimizing the representation
Lecture8:Slide46;
Lecture8:Slide47;
Lecture8:Slide48;
Lecture8:Slide49;
Lecture8:Slide50;
Lecture8:Slide51;
Lecture8:Slide52;
Lecture8:Slide53;
Lecture8:Slide54;
Lecture8:Slide55;
Lecture8:Slide56;
Lecture8:Slide57;
Lecture8:Slide58;
Lecture8:Slide59;
Lecture8:Slide60;
Lecture8:Slide61
There are two major optimizations that are used to reduce the size of the
representation: structural hashing and algebraic simplification.
Structural hashing is illustrated by the animation. The idea is simply to identify
common sub-expressions and represent them with the same node. Because
the representation is a DAG, it is sufficient to traverse it from the leaves up to the root in one pass.
For every node, we record its type and the IDs of its children.
If two nodes of the same type share the same children, they get merged into a single node.
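A minimal sketch of the idea in Python, assuming a tuple-based AST of our own: each node is interned by its operator together with the IDs of its already-interned operands, so structural duplicates collapse into a single node.

```python
# A sketch of structural hashing (hash-consing): two nodes with the same
# operator and the same operand IDs are merged, turning the expression
# into a DAG with maximal sharing. (Encoding is our own.)

def intern(expr, table, nodes):
    """Return the ID of `expr`, reusing an existing node when possible."""
    op, args = expr[0], expr[1:]
    if op in ("var", "const"):
        key = expr                            # leaves carry their payload
    else:                                     # interior nodes carry operand IDs
        key = (op,) + tuple(intern(a, table, nodes) for a in args)
    if key not in table:
        table[key] = len(nodes)
        nodes.append(key)
    return table[key]

table, nodes = {}, []
# (x + y) * (x + y): the two identical subtrees become one shared node.
e = ("mul", ("plus", ("var", "x"), ("var", "y")),
            ("plus", ("var", "x"), ("var", "y")))
root = intern(e, table, nodes)
print(len(nodes))  # 4 nodes: x, y, x+y, and the mul
```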
Structural hashing is most powerful when combined with algebraic simplification. This involves
rewriting the DAG based on algebraic equalities. Each rewrite simplifies the representation, but
also potentially helps uncover additional shared structure as illustrated in the figure.
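As a small illustration (the rule set and encoding are our own), a bottom-up rewriter applying identities like $x + 0 \rightarrow x$ and $x \cdot 1 \rightarrow x$ can turn syntactically different subtrees into identical ones, which structural hashing can then merge.

```python
# A sketch of algebraic simplification: rewrite rules applied bottom-up.
# After rewriting, previously distinct subtrees may become structurally
# identical, exposing more sharing for structural hashing to exploit.

def simplify(e):
    if e[0] in ("var", "const"):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if op == "plus" and b == ("const", 0):   # x + 0 -> x
        return a
    if op == "mul" and b == ("const", 1):    # x * 1 -> x
        return a
    return (op, a, b)

# (x + 0) and (x * 1) both simplify to x, exposing shared structure.
print(simplify(("plus", ("var", "x"), ("const", 0))))  # ('var', 'x')
print(simplify(("mul", ("var", "x"), ("const", 1))))   # ('var', 'x')
```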
The current Sketch solver release has a hand-crafted simplifier, although in recent work,
we have explored the automatic synthesis of the simplification layer
Rohit0002S16.