Introduction to Program Synthesis

© Armando Solar-Lezama. 2018. All rights reserved.

Lecture 10: Introduction to functional synthesis.

We are now ready to move beyond inductive synthesis to richer forms of specification. Within these richer forms of specification, the literature distinguishes between functional synthesis, where the goal is to synthesize functions that map inputs to outputs, or more generally an input state to an output state, and reactive synthesis, where the goal is to synthesize a system that will run continuously for an unbounded amount of time and will react to an ongoing stream of inputs with a sequence of outputs. Reactive synthesis is itself a broad field that is unfortunately beyond the scope of this course. The focus for this unit will be on functional synthesis.

The general synthesis problem is to find a program $P$ that satisfies a specification. There are three major issues in this form of synthesis: The first is to establish the form of the specification. The second is the approach to be used to establish the correctness of the eventual solution. And finally, there is the search strategy. All three of these issues are tightly intertwined. Different forms of specification can enable different verification strategies, and the interplay between the verification strategy and the search is crucial in enabling scalability to complex problems. At one extreme of this interplay are correct-by-construction techniques that ensure that potentially incorrect programs are never even considered; at the other extreme are generate-and-check approaches that completely decouple the search from the verification problem. In between, though, there is a rich space of techniques where the correctness constraints drive the search and steer the synthesizer towards solutions that are actually easy to verify.

Framing the synthesis problem

Lecture10:Slide4; Lecture10:Slide5; Lecture10:Slide6; Lecture10:Slide7 The most traditional form of specification in this space is through the use of pre-conditions and post-conditions. A pre-condition is a predicate that all valid inputs to a function must satisfy, while a post-condition is a predicate that all outputs must satisfy. Informally speaking, the precondition is a promise that the environment makes to the function about the parameters it will pass and the context in which the function will be invoked. The post-condition is a promise that the function makes, conditional on the environment having lived up to its promise.

One of the challenges in attempting full functional specification of behavior is that it can sometimes be difficult to pin down what the expected behavior of a function ought to be. As an example, illustrated in the figure, consider the problem of fully specifying the behavior of a sort function. Attempt 1 seems reasonable, but it is clearly not enough. For example, given the input [3, 2, 4, 2, 1], an output of the form [5, 5, 5, 5, 5] would clearly satisfy the specification despite not conforming to our expectation of the behavior of sort. Attempt 2 seems more promising, since it also requires that every element of the output matches some element of the input, but it is still insufficient, since it would accept [3, 3, 3, 3, 3] as a valid output. Attempt 3 further strengthens the definition with the requirement that every element in the input also be present in the output, but even that is still insufficient, as it would allow an output such as [1, 2, 3, 4, 4]. Attempt 4 is actually fully consistent with our expectation of the behavior of sort, but it is also fairly unintuitive. It introduces a helper function $p$ that is required to be a permutation function that fully characterizes which output came from which input.
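The progression of attempts can be made concrete by encoding each one as a predicate and checking that the counterexamples from the text really do satisfy it. The encodings below (the names `attempt1` through `attempt3` are ours, sketched from the descriptions above) show how each strengthening still admits an unintended output:

```python
def attempt1(inp, out):
    # Attempt 1: the output must be sorted.
    return all(out[i] <= out[i + 1] for i in range(len(out) - 1))

def attempt2(inp, out):
    # Attempt 2: sorted, and every output element occurs in the input.
    return attempt1(inp, out) and all(o in inp for o in out)

def attempt3(inp, out):
    # Attempt 3: additionally, every input element occurs in the output.
    return attempt2(inp, out) and all(i in out for i in inp)

inp = [3, 2, 4, 2, 1]
print(attempt1(inp, [5, 5, 5, 5, 5]))  # True: sorted, but unrelated to inp
print(attempt2(inp, [3, 3, 3, 3, 3]))  # True: every 3 does occur in inp
print(attempt3(inp, [1, 2, 3, 4, 4]))  # True: drops a 2, duplicates the 4
print(attempt3(inp, sorted(inp)))      # True: the intended output also passes
```

None of these predicates tracks multiplicities, which is exactly the gap the permutation function $p$ closes.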

Lecture10:Slide8 At some level, this problem is unavoidable, at least until the synthesizer is capable of reading our mind. However, it can be significantly ameliorated through multi-modal interaction. The idea of multi-modal synthesis, first articulated in the work of Singh and Solar-Lezama on storyboard programmingSinghS12, is that for complex programs, it will be difficult to fully articulate the behavior of a program in any one formalism. However, different formalisms can make it easier to articulate different properties, so by allowing the programmer to provide specifications in multiple different formalisms, a user can fully constrain the behavior of the desired program. For the running example, Attempt 3 may not be enough to fully specify the behavior of the program, but combined with a concrete input/output example that exposes the expected behavior when there are multiple instances of the same value, it may be enough to fully constrain the implementation to the correct one, even given a very general template of the desired implementation.

Interplay of verification and synthesis

At a high level, there are four main mechanisms that we will be exploring to ensure the correctness of programs: constraint-based techniques based on symbolic execution, abstract interpretation, type-based analysis, and deductive verification.

Lecture10:Slide10 The constraint-based techniques actually constitute a broad category of verification techniques, but the general setup is illustrated in the figure. In this approach, a program is converted into a predicate of the form $\forall in. Q(in)$ that must hold for all inputs (and also potentially for all values of temporary variables introduced by the symbolic execution process). If the predicate indeed holds, then we can conclude that the program is correct with respect to its specification.

The method of generating verification conditions that we will present in Lecture 13 is a sound but incomplete member of this category. Sound means that if it claims correctness (if the universally quantified formula is valid), then we can be sure that the program is correct; but if the formula does not hold (if we can find an $in$ for which $Q$ is not true), then the program may or may not be incorrect, so the method is incomplete because it cannot verify all correct programs. Bounded model-checking, which will be presented in Lecture 12, is by contrast an unsound but complete instance of this class of techniques.
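The unsound-but-complete flavor of bounded checking can be illustrated with a toy sketch (the program and predicate below are our own invented example, not one from the lecture): the program is compiled into a predicate $Q(in)$, and the checker simply tests $Q$ on every input up to a bound. Any counterexample it reports is a real bug, but a clean pass says nothing about inputs beyond the bound:

```python
def program(x):
    # A deliberately buggy candidate: correct except for very large inputs.
    return abs(x) if abs(x) < 1000 else 0

def Q(x):
    # Correctness predicate: the output equals the true absolute value.
    return program(x) == abs(x)

def bounded_check(bound):
    # Check Q on every input with |x| <= bound.
    for x in range(-bound, bound + 1):
        if not Q(x):
            return x  # a genuine counterexample input
    return None       # no bug found *within the bound*

print(bounded_check(256))  # None: looks correct up to the bound...
print(Q(5000))             # False: ...yet the program is incorrect
```

Raising the bound past the bug would expose it, which is why bounded checking is complete in the limit but unsound at any fixed bound.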

Lecture10:Slide11 These solver-based techniques generalize easily to the synthesis case with the introduction of an extra quantifier over the space of programs. Historically, however, the problem of solving general formulas with quantifier alternation has proven quite difficult, and even though there are off-the-shelf solvers for quantified boolean formulas with arbitrary quantifier alternation, they are not widely used in the synthesis community because they tend not to be very efficient for synthesis problems. Instead, there are two major techniques to deal with this quantifier alternation: quantifier elimination and Counterexample Guided Inductive Synthesis (CEGIS).

The main idea in quantifier elimination is to algebraically eliminate any variables that appear in the universal quantifier. The major downside of quantifier elimination is that it can be very expensive. For boolean variables, each variable eliminated can grow the size of the formula exponentially, and once we go beyond linear arithmetic there may not even be effective procedures for variable elimination. However, in some instances it can be very efficient, and even when it cannot be applied fully, it can be a useful preprocessing step before attempting other methods. Lecture10:Slide13

As an example, consider the simple program in the figure. In the example, the correctness condition reduces to a simple predicate, and simple algebraic manipulation completely eliminates the universally quantified variables from the predicate, leaving us with a constraint that all valid assignments $\phi$ must satisfy.
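Since the figure is not reproduced here, a hand-worked substitute (entirely our own example) shows the same shape of reasoning. Suppose the correctness condition reduces to $\forall x \in [1,10].\ \phi \cdot x \geq x$. Because $x$ is strictly positive, dividing through by $x$ algebraically eliminates the universally quantified variable, leaving the constraint $\phi \geq 1$ on the unknown alone. A brute-force check confirms the elimination preserved the solution set:

```python
def original_constraint(phi):
    # The quantified condition: forall x in [1, 10], phi * x >= x.
    return all(phi * x >= x for x in range(1, 11))

def eliminated_constraint(phi):
    # After eliminating x (legal since x > 0): phi >= 1.
    return phi >= 1

# The two constraints agree on every candidate value of phi we try:
for phi in range(-5, 6):
    assert original_constraint(phi) == eliminated_constraint(phi)
print("elimination preserved the solution set")
```

The payoff is that a solver now faces a quantifier-free constraint on $\phi$, which is a much easier satisfiability query.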

Counterexample Guided Inductive Synthesis

The key ideas behind CEGIS were first described by Solar-Lezama, Tancau, Bodik, Seshia and Saraswat in a 2006 papersketch06, although it wasn't until a later paper in 2008 sketch08 that the term CEGIS was coined by Solar-Lezama, Bodik and Jones and the algorithm was explained in the general form used here.

CEGIS is a form of generate and check, where a synthesizer generates candidate programs that are checked by an off-the-shelf checking procedure. The key idea in CEGIS, however, is to use a checker capable of producing counterexample inputs. This allows us to use an inductive synthesis procedure instead of simply producing proposals blindly. The inductive synthesis procedure is forced to produce proposals that work for all the counterexample inputs discovered so far. The idea is to avoid producing new candidates that will fail in ways similar to previously rejected programs.

Lecture10:Slide15; Lecture10:Slide16; Lecture10:Slide17; Lecture10:Slide18; Lecture10:Slide19; Lecture10:Slide20; Lecture10:Slide21; Lecture10:Slide22; Lecture10:Slide23; Lecture10:Slide24; Lecture10:Slide25; Lecture10:Slide26; Lecture10:Slide27; Lecture10:Slide28; Lecture10:Slide29; Lecture10:Slide30; Lecture10:Slide31 As the figure illustrates, every time a candidate proposal is rejected by the checker and a counterexample is generated, the counterexample has the effect of not just eliminating the offending program, but also of ruling out every other program that would have failed on that same input. In the figure, the red circle represents the set of viable programs. Initially, the synthesizer searches for a program that works for an initial random input $in_0$. Every subsequent input $in_i$ is the result of solving a checking problem where we look for an input that fails for the current candidate program. The process continues until it is no longer possible to find additional counterexamples, at which point the process stops with a verified candidate.

CEGIS was first proposed in the context of constraint-based synthesis, where the inductive synthesis problem itself could be expressed as a series of constraints on the parameters $\phi$, but the idea is actually more general. One of the features that has made CEGIS so popular is the ability to mix and match different inductive synthesis procedures with different checking mechanisms. In that way, CEGIS provides a bridge between all the inductive synthesis techniques studied in the previous unit and the expressive specifications that will be explored in this unit.
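The loop structure is compact enough to write out in full. The sketch below (all names and the toy problem are ours) synthesizes the unknown constant $c$ in $f(x) = x + c$ against the spec $\forall x \in [0,100).\ f(x) > x \wedge f(x) \leq x + 5$, with a brute-force inductive synthesizer and a brute-force checker standing in for the constraint solvers a real implementation would use:

```python
import random

CANDIDATES = range(-10, 11)   # finite space of values for the unknown c
INPUTS = range(100)           # finite input domain for the checker

def spec(x, y):
    # The correctness condition relating input x to output y.
    return y > x and y <= x + 5

def synthesize(counterexamples):
    # Inductive synthesis: find a c that works on all inputs seen so far.
    for c in CANDIDATES:
        if all(spec(x, x + c) for x in counterexamples):
            return c
    return None

def check(c):
    # Checker: return a counterexample input, or None if c is correct.
    for x in INPUTS:
        if not spec(x, x + c):
            return x
    return None

def cegis():
    counterexamples = [random.choice(list(INPUTS))]  # initial random input
    while True:
        c = synthesize(counterexamples)
        if c is None:
            return None          # no candidate satisfies the spec
        cex = check(c)
        if cex is None:
            return c             # verified for the entire input domain
        counterexamples.append(cex)

c = cegis()
print(c)  # a correct value of c, i.e. one with 0 < c <= 5
```

Swapping `synthesize` for any inductive synthesizer from the previous unit, and `check` for a real verifier, is exactly the mix-and-match flexibility described above.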

Beyond CEGIS

CEGIS is very powerful, but it is not magic, and there are well-documented instances where it fails very dramatically. The simplest program that illustrates the failures of CEGIS is the program below:

void foo(int x, int y){ assert x != y+??; }

Lecture10:Slide32; Lecture10:Slide33 There is clearly no value for the unknown constant that would guarantee that x != y + ?? for all inputs x and y. However, for any finite set of (x,y) pairs, it is possible to find a constant that will make the inequality hold. To better understand the nature of this example, consider the grid in the figure. As the figure illustrates, CEGIS works best in cases where a buggy program will fail on large sets of inputs. By contrast, the example above has the property that a buggy program will seem to work fine on most inputs, failing only on a single very specific input pair. The biggest strength of CEGIS, the fact that it only has to consider a small number of inputs at a time, is also its biggest weakness when it comes to these kinds of problems.
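The degenerate behavior is easy to reproduce. In the sketch below (names and the bounded candidate space are ours), any counterexample for candidate $c$ must be a pair with $x = y + c$, so each counterexample eliminates exactly one candidate, and the loop must burn through the entire space before it can report that no solution exists:

```python
N = 50
CANDIDATES = range(N)   # a finite space of values for the unknown constant

def check(c):
    # Any counterexample for candidate c satisfies x == y + c; pick y = 0.
    return (c, 0)

def synthesize(counterexamples):
    # Find a c consistent with every counterexample pair seen so far.
    for c in CANDIDATES:
        if all(x != y + c for (x, y) in counterexamples):
            return c
    return None

counterexamples = []
iterations = 0
while True:
    c = synthesize(counterexamples)
    if c is None:
        break                    # CEGIS finally reports "no solution"
    iterations += 1
    counterexamples.append(check(c))

print(iterations)  # equals N: one full iteration per candidate in the space
```

Each counterexample here carries almost no information about the rest of the space, which is precisely the opposite of the regime where CEGIS shines.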

In contrast, a number of other synthesis approaches have been proposed that rely on the other kinds of verification mentioned at the beginning: abstract interpretation, type-based analysis and deductive verification. The key feature in these kinds of approaches is a much tighter coupling between the synthesis and verification techniques. The key observation in all of these approaches is that synthesis can in fact be easier than verification, because unlike traditional verification, where the verifier has to be able to certify whatever crazy program the developer writes, the synthesizer can be steered to produce programs that are easy to verify by the chosen verification technique.

Lecture10:Slide34 Abstraction. In Lecture 19, we will talk in more detail about abstract interpretation and abstraction-based analysis, but the high-level idea is illustrated by the figure. In this example, we are approximating sets of values with ranges. So for example, the inputs are stated to range from one to ten. It is relatively easy to propagate ranges through a program, even one with unknowns. So in this case, we can see that $t$ ranges from $\phi_1$ to $10*\phi_1$, and that $v$ ranges from $\phi_2$ to $10*\phi_2$. From this information we can infer that as long as $\phi_1 > 10*\phi_2$, $t$ will be greater than $v$. This example is quite contrived, but in general the combination of abstraction and synthesis can be very powerful because the synthesizer can be pushed to choose values that even a relatively weak analysis is able to prove correct.
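The range propagation itself is a few lines of interval arithmetic. The sketch below reconstructs the reasoning in the paragraph under the assumption (implicit in the stated ranges) that $t$ and $v$ are the inputs scaled by positive unknowns $\phi_1$ and $\phi_2$:

```python
def mul_const(interval, k):
    # Multiply an interval by a constant k, assumed positive here.
    lo, hi = interval
    return (k * lo, k * hi)

def always_greater(a, b):
    # Interval a certainly exceeds interval b iff a's low end > b's high end.
    return a[0] > b[1]

x = y = (1, 10)              # the inputs range from one to ten
phi1, phi2 = 101, 10         # candidate values satisfying phi1 > 10 * phi2

t = mul_const(x, phi1)       # t ranges over [phi1, 10 * phi1] = (101, 1010)
v = mul_const(y, phi2)       # v ranges over [phi2, 10 * phi2] = (10, 100)
print(always_greater(t, v))  # True: the abstraction proves t > v
```

A synthesizer searching over $\phi_1, \phi_2$ only needs this cheap interval check, not a full verifier, to discard candidates the analysis cannot prove.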

Type-based analysis. In the context of inductive synthesis, we saw already how type information can help us aggressively prune the space of possible programs. We can take this idea a step further by focusing on even more expressive type systems. Later, in Lecture 15, we will explore a more expressive type system based on refinement types that will allow us to synthesize programs with non-trivial properties in a modular way.