Lecture 13: Functional Synthesis with Sketch.
In the last lecture we already saw an example where Sketch was used to synthesize a code fragment that
we wanted to be correct for all inputs. In Sketch, this can be done by adding parameters to the test harness.
These parameters will be treated as universally quantified variables, so the goal of the synthesizer will
be to solve for a piece of code that works for all values of these inputs within a prescribed domain.
In the presence of such universally quantified variables, Sketch uses the CEGIS algorithm described in
Lecture 10.
In this lecture, we will focus on three aspects of Sketch that relate to the synthesis with universally quantified
variables: how to deal with loops and recursion, some important cases where CEGIS fails, and some limitations of the theory
reasoning in Sketch.
Loops and recursion.
Lecture13:Slide7
In the last lecture, we saw one approach for dealing with loops: generate a verification condition and synthesize loop
invariants and variants that allow us to prove total correctness of the synthesized code. This approach is very powerful
and has the advantage of generating a provably correct program. However, it also has some important limitations.
First, for some programs, the loop invariants and variants may end up being much more complex than the code itself, so
forcing the synthesizer to synthesize these in addition to the code itself can make the synthesis problem much harder.
For programs manipulating heap-based data structures, or for programs where the correctness condition involves proving
the equivalence of two different algorithms, synthesizing all the artifacts necessary for verification will simply be too hard.
The Sketch solver follows a different approach: unrolling loops. The idea is to exploit the following identity:
while(b){
    C;
}

is equivalent to

if(b){
    C;
    while(b){
        C;
    }
}
Lecture13:Slide13
Unrolling the loop means applying this equivalence repeatedly. At some point we must stop the expansion and simply replace
the remaining while loop with
assert false;
This assertion means that if there is any input that would have caused the loop
to iterate further, then that input will cause an assertion failure.
This means that the loop unrolling generally has to be accompanied by an assumption on the inputs that will prevent inputs
that would have caused the loop to iterate too much. For example, if a program contains a loop of the form:
i = 0;
while(i < N){
i = i+1;
}
Then if the loop is being unrolled 5 times, we need an assume that ensures that $N$ cannot be greater than 5.
In sketch, there are two ways to impose such bounds. One is to use the
assume
construct;
in this case we could add
assume N <= 5;
right before the loop.
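To make the transformation concrete, here is a small Python rendering of the example above (an illustration of the semantics, not Sketch itself; the names `original`, `unrolled`, and `AssumptionViolated` are ours). The unrolled program agrees with the original on every input admitted by the assumption, and inputs that would iterate past the bound are excluded.

```python
class AssumptionViolated(Exception):
    pass

def original(N):
    i = 0
    while i < N:
        i = i + 1
    return i

def unrolled(N, bound=5):
    # assume N <= bound: rule out inputs that would iterate past the bound
    if not (N <= bound):
        raise AssumptionViolated
    i = 0
    for _ in range(bound):   # the `bound` unrolled copies of the loop body
        if i < N:
            i = i + 1
    assert not (i < N)       # the residual loop, replaced by `assert false`
    return i

# On every admitted input, the unrolled program agrees with the original.
for N in range(0, 6):
    assert unrolled(N) == original(N)
```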
Alternatively, one can do this by bounding the number of bits of input values. In Sketch, any integer
input to the test harness defaults to a 5-bit non-negative integer, so it ranges from 0 to 31, but this range
can be extended or restricted by using the
--bnd-inbits n
flag, which restricts the number
of bits of integer inputs to
n
. The unrolling amount itself is controlled by a flag
--bnd-unroll-amnt n
which determines how many times a given loop will be unrolled.
An obvious question is why replace the last instance of the loop with
assert false
instead of
assume false
? If we used assume instead of assert, then any input that would have caused the loop
to iterate more than the unroll amount would automatically be ignored, and there would be no need for an extra
assumption about the ranges of input values. The reason Sketch does not do this is that with an assume,
it would be too easy for the synthesizer to solve a sketch by causing its loops to iterate
beyond their iteration bound.
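The following toy Python scenario illustrates the danger (the problem, hole, and names are invented for the illustration). The hole `h` is a step size the synthesizer chooses; `h = 0` makes the loop spin past any unroll bound, so under the assume-false semantics every input is discarded and a wrong candidate "verifies" vacuously.

```python
class AssumptionViolated(Exception):
    pass

BOUND = 8

def candidate(N, h, residual):
    """Unrolled version of: i=0; s=0; while(i<N){ s+=i; i+=h; } return s."""
    i = s = 0
    for _ in range(BOUND):
        if i < N:
            s += i
            i += h
    if i < N:                          # residual loop still has work left
        if residual == "assume":
            raise AssumptionViolated   # input silently discarded
        else:
            raise AssertionError       # candidate rejected
    return s

def verifies(h, residual):
    for N in range(1, 6):              # spec: result == 0 + 1 + ... + (N-1)
        try:
            if candidate(N, h, residual) != sum(range(N)):
                return False
        except AssumptionViolated:
            continue                   # vacuously satisfied
        except AssertionError:
            return False
    return True

assert verifies(0, "assume")           # wrong step "verifies" vacuously!
assert not verifies(0, "assert")       # assert false correctly rejects it
assert verifies(1, "assert")           # the real solution still passes
```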
There is a similar story with recursion. Recursive functions get inlined into their calling context
up to a depth determined by the
--bnd-inline-amnt n
flag.
It is worth noting that both in the case of loops and in the case of recursion, if the number
of iterations or the number of recursive calls can be determined at compile time, then
the respective flags will be ignored in favor of this known amount. So for example,
if a program contains a loop of the form
for(int i=0; i<100; ++i)
, this
loop will be unrolled 100 times regardless of the value of the
--bnd-unroll-amnt
flag.
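Bounded inlining can be rendered in Python much like bounded unrolling (again an illustration with invented names, not Sketch's machinery): each inlined copy consumes one unit of depth, and exhausting the bound behaves like an assert false.

```python
def rec_sum(n, depth=3):
    # `depth` plays the role of --bnd-inline-amnt: each nested call is one
    # inlined copy, and exhausting the budget acts like `assert false`.
    assert depth > 0
    if n == 0:
        return 0
    return n + rec_sum(n - 1, depth - 1)

assert rec_sum(2) == 3    # needs 3 nested calls: within the bound
# rec_sum(3) would require a 4th inlined copy and trip the assertion
```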
Beyond CEGIS.
Lecture13:Slide15;
Lecture13:Slide16;
Lecture13:Slide17;
Lecture13:Slide18;
Lecture13:Slide19;
Lecture13:Slide20;
Lecture13:Slide21;
Lecture13:Slide22
As mentioned in
Lecture 10, CEGIS can sometimes fail
in cases where a single counterexample eliminates only a small number of candidate solutions.
One particularly important example where this failure happens was first observed by Jha, Gulwani,
Seshia and Tiwari
Jha:2010.
They were focusing on the problem of assembling a set of components to satisfy some specification.
A simple way of doing this in Sketch is shown in the figure. It involves replicating each component
$N$ times and adding a set of assertions to impose constraints ensuring, for example, that each component is used
only once. A seemingly more efficient approach is to invoke each component only once with a temporary input,
and to assert that the output condition is satisfied assuming that the inputs
are equal to the corresponding outputs.
This formulation of the problem is seemingly more efficient, since it does not require invoking
each of the $N$ components $N$ times, but it is a very bad fit for CEGIS. The problem is that
the assumptions are too strong and can be controlled by the unknown parameters, so it is easy for the synthesizer
to dispose of a counterexample by making the assumptions invalid. For example, consider the concrete
case illustrated in the figure, where the components correspond to
out=in+10, out=in*2
and
out=in+1
.
Now suppose the specification we want to satisfy is
out == (in+10)*2 + 1
. Given an incorrect solution
that first multiplies by 2, then adds 10, and then adds 1, the checker can produce a counterexample. The problem
is that the counterexample includes not just the counterexample input, but also the temporary values that correspond
to this particular input and this particular order of the components. Unfortunately, these temporary values are so specific,
that even swapping the +1 and the +10 components (which has no effect on the semantics of the composition) is enough
to invalidate the counterexample and cause it to fail the assumptions.
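A small Python reconstruction of this scenario (a sketch of the argument, with our own encoding of candidates as orderings of the three components) shows the fragility directly: the checker's counterexample, which includes the temporaries, fails to eliminate a semantically identical wrong candidate.

```python
COMPONENTS = {"add10": lambda x: x + 10,
              "mul2":  lambda x: x * 2,
              "add1":  lambda x: x + 1}

def spec(inp):
    return (inp + 10) * 2 + 1

def run(order, inp):
    """Execute the composition, returning the list of temporaries."""
    temps, v = [], inp
    for name in order:
        v = COMPONENTS[name](v)
        temps.append(v)
    return temps

def P(order, inp, temps):   # wiring assumptions: temps match the composition
    return temps == run(order, inp)

def Q(inp, temps):          # specification on the final output
    return temps[-1] == spec(inp)

wrong   = ["mul2", "add10", "add1"]   # computes 2*in + 11
swapped = ["mul2", "add1", "add10"]   # also computes 2*in + 11

# The checker finds a counterexample to `wrong`, including the temporaries:
cex_in = 0
cex_temps = run(wrong, cex_in)
assert P(wrong, cex_in, cex_temps) and not Q(cex_in, cex_temps)

# The same counterexample fails to eliminate the equivalent candidate
# `swapped`: its temporaries violate P, so the implication holds vacuously.
assert not P(swapped, cex_in, cex_temps)
```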
This is a general issue with synthesis problems that have the form
\[
\exists \phi . \forall in, temp. P(\phi, in, temp) \Rightarrow Q(\phi, in, temp)
\]
where the condition $P(\phi, in, temp)$ is very strong and for a given counterexample, it is easier
for the synthesizer to invalidate $P$ than to find the $\phi$ that satisfies $Q$.
The solution proposed by Jha et al. is to change the form of the constraint to
\[
\exists \phi . \forall in. \exists temp. P(\phi, in, temp) \wedge Q(\phi, in, temp)
\]
In general, the two formulas are not equivalent: the first requires that $Q$ be valid for
all $temp$ that satisfy $P$, whereas the second only requires there to exist some value
of $temp$ that satisfies $P$ and $Q$. Conversely, the second formula demands the
existence of a $temp$ that satisfies $P$, whereas the first can be satisfied even if no such
$temp$ exists. However, the two formulas are equivalent if $P(\phi, in, temp)$ is satisfied
by one and only one value of $temp$ for any $\phi, in$ pair.
What do we gain from adding an additional quantifier alternation?
In the context of CEGIS, this extra quantification has the benefit that the counterexample no longer
has to include the temporary values. For the synthesis phase, the counterexample can be limited to
$in$. For example, when we have 2 counterexample inputs $in_1, in_2$, the constraint on $\phi$ would now be:
\[
\exists \phi . \exists temp_1, temp_2. P(\phi, in_1, temp_1) \wedge Q(\phi, in_1, temp_1) \wedge P(\phi, in_2, temp_2) \wedge Q(\phi, in_2, temp_2)
\]
Note that there is no additional quantifier alternation because the universal quantifier is expanded out into the individual examples.
This basically means that the synthesizer is now responsible for coming up with the temporary values that match the counterexample
and its choice of control values.
The extra quantifier alternation would cause problems for the verification phase, but there it is
easy to simply use the original form of the constraint.
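The resulting loop can be sketched in Python for the running example (our own toy encoding: candidates are orderings of the three components, and since the wiring determines the temporaries uniquely, the synthesizer "chooses" them by computing them forward). Note that the verifier now returns only an input, never temporaries.

```python
from itertools import permutations

COMPONENTS = {"mul2":  lambda x: x * 2,
              "add10": lambda x: x + 10,
              "add1":  lambda x: x + 1}

def spec(inp):
    return (inp + 10) * 2 + 1

def run(order, inp):
    v = inp
    for name in order:
        v = COMPONENTS[name](v)
    return v

def cegis(input_domain=range(16)):
    examples = []
    while True:
        # Synthesis phase: pick a phi consistent with all examples; the
        # temporaries exist by construction (computed by running forward).
        phi = next((p for p in permutations(COMPONENTS)
                    if all(run(p, e) == spec(e) for e in examples)), None)
        if phi is None:
            return None
        # Verification phase: a counterexample is now just an input.
        cex = next((i for i in input_domain if run(phi, i) != spec(i)), None)
        if cex is None:
            return phi
        examples.append(cex)

assert cegis() == ("add10", "mul2", "add1")
```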
The same idea is used by Sketch as part of its
model
functionality
SinghSXKS14.
To explain this idea, it is important to first understand the notion of
uninterpreted functions.
An uninterpreted function is a function about which we know nothing other than the fact that it is a function,
meaning that when given the same input it will produce the same output.
These are very useful in program verification when we need to verify the equivalence of two pieces of code
that both use the same complex routine, but we do not want to have to reason about the details of
the routine. For example, if we know that $f$ is a function, we can prove that $f(3+x + 2)$ is equivalent
to $f(x + 5)$ without having to know anything about $f$ other than the fact that it is a function
that will produce the same output given the same integer input.
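One way to picture an uninterpreted function is as a memo table that hands out a fresh, arbitrary value the first time each argument is seen and repeats it afterwards; functionality is the only property it has. A toy Python model (the fresh values stand in for the solver's unknowns):

```python
import itertools

_fresh = itertools.count(1000)   # arbitrary, "unknown" outputs
_table = {}

def f(x):
    # Equal inputs get equal outputs; nothing else is known about f.
    if x not in _table:
        _table[x] = next(_fresh)
    return _table[x]

# f(3 + x + 2) == f(x + 5) holds for every x, for any functional f:
for x in range(100):
    assert f(3 + x + 2) == f(x + 5)
```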
In some cases, uninterpreted functions can be too restrictive: we may want to state additional properties
of the output of a function. Such uninterpreted functions with extra properties are often called
partially interpreted functions.
Sketch allows you to
model the behavior of a complex function through an uninterpreted function with additional properties.
For example, we can model the behavior of a square root function with the model shown below.
model int msqrt(int i){
    int rv = sqrtuf(i);
    if(i==0){
        assert rv == 0;
    }
    assert rv*rv <= i;
    assert (rv+1)*(rv+1) > i;
    return rv;
}
The internal call to
sqrtuf
is to an uninterpreted function. The assertions that come after that
call relate the return value to the inputs. In the case of this example, they ensure that the value is the
integer square root of the input. The assertions in the model act exactly like the
predicate $P$ in the general formula above and could lead to the exact same problems. In order to avoid this, Sketch
uses the approach outlined earlier to ensure that CEGIS works properly.
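It is worth observing that for each input these assertions pin down exactly one return value, the integer square root; this is precisely the uniqueness condition under which the two quantified formulations above agree. A quick Python check of that claim on a small range (using `math.isqrt` as ground truth):

```python
import math

def satisfies_model(i, rv):
    # The assertions of the msqrt model, as a predicate on (i, rv).
    if i == 0 and rv != 0:
        return False
    return rv * rv <= i and (rv + 1) * (rv + 1) > i

# For each input, exactly one rv satisfies the model: the integer sqrt.
for i in range(200):
    sols = [rv for rv in range(i + 2) if satisfies_model(i, rv)]
    assert sols == [math.isqrt(i)]
```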
The approach used by Sketch to cope with these models is slightly more complex than what has been described here, but
the high-level idea is the same. Students who want to learn more are encouraged to read the original paper by Singh et al.
SinghSXKS14.
Limitations of theory reasoning.
In Sketch, the inputs to a harness are limited to integers, booleans (type
bit
in Sketch) and arrays of them.
Additionally, Sketch can invoke uninterpreted functions that produce any of these types as output.
However, as has already been alluded to, the verification phase of Sketch will not actually check all possible values
of these inputs, only values within a particular bound. The flag
--bnd-inbits n
determines how many
bits the inputs are allowed to have, and in the case of variable length arrays, the
additional flag
--bnd-arr-size n
limits the maximum size of these arrays (default is 32).
It is also important to note that Sketch only considers non-negative values for any of its integer inputs.
In addition to the bounds on the input sizes, it is worth recalling that by default, Sketch uses a
one-hot representation for all
internal integer values. This can lead to situations where the internal representation of an integer grows to a very large size
despite the fact that when executing the program, all integers could be expected to stay within a given range.
This problem will manifest itself as sketch running out of memory, usually during the synthesis phase.
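To see why intermediate values are the problem, here is a toy Python model of a unary (one-hot) encoding; it is an illustration of the growth phenomenon, not Sketch's actual data structure. A value ranging over $[0, k]$ needs $k+1$ indicator variables, and each addition roughly doubles the range the representation must cover.

```python
def onehot(v, size):
    # One indicator per possible value; exactly one of them is set.
    return [int(i == v) for i in range(size)]

def add(a, b):
    # Sum of two one-hot values: the result can land anywhere in
    # [0, len(a) + len(b) - 2], so the representation grows.
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if ai and bj:
                out[i + j] = 1
    return out

x = onehot(3, 8)   # x in [0, 7]: 8 indicators
y = add(x, x)      # x + x in [0, 14]: 15 indicators
z = add(y, y)      # 29 indicators: the size keeps growing
assert y.index(1) == 6 and len(y) == 15
assert z.index(1) == 12 and len(z) == 29
```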
One way to deal with this problem is to use the
--bnd-int-range k
flag. This tells Sketch that for all inputs in range,
we do not expect any value anywhere in the computation to fall outside the range $[-k, k]$. Sketch then uses this information
to keep the representations from growing beyond $2k$ in size.
Alternatively, Sketch also has a native integer solver that can be enabled with the flag
--slv-nativeints
. This
solver does not rely on a one-hot encoding, and therefore scales better for problems where larger integers are expected.