Introduction to Program Synthesis

© Armando Solar-Lezama. 2018. All rights reserved.

Lecture 16: Neurosymbolic programming

Earlier in the course, we contrasted machine learning, where the goal is to learn a function from data, with inductive program synthesis, where we are also interested in discovering a function that fits data, but where we want the function to be expressed as a program in a given programming language, and where we may have additional structural or behavioral constraints that the program needs to satisfy. More recently, in this unit, we saw how deep neural networks in general, and large language models in particular, can be used to solve program synthesis problems. But even though deep learning is an essential component of these approaches, the final goal is still to find a program that meets the desired behavioral and structural constraints. In this lecture, we explore a middle point between machine learning and program synthesis that we term Neurosymbolic Programming (NSP). As in program synthesis, the goal is to discover a function from data that meets certain structural and behavioral constraints, but we are no longer fully restricted to programs expressible in a given programming language. The programs can be hybrids of traditional code and neural networks, and in some cases they can be compositions of neural networks with constraints that give them some of the modularity of programs. In short, the goal is to learn models that capture symbolic knowledge in the form of program structures.

In order to do this, we will rely on a set of algorithmic building blocks that allow us to combine the benefits of deep learning with the benefits of symbolic program synthesis.

Program search and symbolic abstraction have already been covered in this course, so for the rest of the lecture, we focus on relaxation, distillation and symbolic guided deep learning.

Relaxation

The idea of relaxation is to take a symbolic program and make it differentiable. There are two basic approaches to doing this. The first approach is to do it symbolically, by replacing each program element with a differentiable approximation. Swarat Chaudhuri and I wrote an early paper showing how to do this [ChaudhuriS10]. The goal in that paper was to derive a smooth approximation of a program that could be used to optimize the parameters of the program through numerical optimization. More recently, there has been significant interest in the area of differentiable programming, which provides differentiable analogs of common programming constructs that can be composed to form differentiable programs [AbadiP20].
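As a concrete illustration, here is a minimal sketch of the symbolic approach on a toy program with a single branch. The sigmoid-based blending and the sharpness parameter beta are just one possible smoothing scheme, chosen for illustration rather than taken from the papers above.

    import math

    def program(x, theta):
        # Original program: the hard branch makes the output non-differentiable in theta.
        if x < theta:
            return 0.0
        else:
            return x - theta

    def smooth_program(x, theta, beta=10.0):
        # Differentiable relaxation: replace the branch with a sigmoid-weighted
        # blend of the two branch results. As beta grows, the approximation
        # approaches the original program.
        gate = 1.0 / (1.0 + math.exp(-beta * (x - theta)))  # ~0 when x < theta, ~1 otherwise
        return gate * (x - theta) + (1.0 - gate) * 0.0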

An alternative approach is to use the program as a data generator to train a neural network that imitates the behavior of the program. The basic idea is simple: generate random inputs for the program, run the program to obtain the corresponding outputs, and train a neural network to predict the outputs from the inputs. The resulting neural network can then be used as a differentiable approximation of the program.
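A minimal sketch of this data-generator approach is shown below, assuming PyTorch; the particular program being imitated, the input distribution, and the network architecture are all placeholders chosen for illustration.

    import torch
    import torch.nn as nn

    def program(x):
        # The program to be approximated; the hard branch and the floor make its
        # gradient useless (zero almost everywhere).
        return torch.where(x < 0.0, torch.zeros_like(x), torch.floor(4.0 * x) / 4.0)

    # Use the program as a data generator: sample random inputs, record outputs.
    xs = torch.rand(10000, 1) * 4.0 - 2.0
    ys = program(xs)

    # Train a small network to imitate the program on the sampled data.
    surrogate = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
    for _ in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(surrogate(xs), ys)
        loss.backward()
        opt.step()
    # surrogate is now a differentiable stand-in for program.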

Alex Renda, Yi Ding, and Michael Carbin explored several applications of these neural surrogates [Renda0C21]. The most straightforward application is to use neural surrogates to optimize program parameters through gradient descent, but they also highlight other applications, such as speeding up the execution of an otherwise slow algorithm. In a different paper, the same authors also show how to better leverage program structure in order to obtain samples that better represent the different behaviors of the program [RendaDC23].
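As a sketch of the first of these applications: assuming we have a surrogate that takes both the program input and a tunable program parameter, we can hold the surrogate's weights fixed and run gradient descent on the parameter alone. The network, data, and target below are made up purely for illustration (in practice the surrogate would first be trained as above).

    import torch
    import torch.nn as nn

    # Hypothetical surrogate for program(x, theta); it takes the input and the
    # parameter as a joint input (left untrained here for brevity).
    surrogate = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

    xs = torch.rand(256, 1)
    ys = 0.5 * xs                        # desired program outputs (made up)
    theta = torch.zeros(1, requires_grad=True)
    opt = torch.optim.SGD([theta], lr=1e-2)
    for _ in range(100):
        opt.zero_grad()
        pred = surrogate(torch.cat([xs, theta.expand(xs.shape[0], 1)], dim=1))
        loss = nn.functional.mse_loss(pred, ys)
        loss.backward()                  # gradients flow back through the surrogate
        opt.step()                       # but only the program parameter theta is updated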

Combining relaxation and program search

Relaxation can be combined with program search not only to find programs that match a set of input-output examples, but also to find programs that combine symbolic and neural elements. This was demonstrated by Shah et al. in their work on the NEAR algorithm [ShahAdmissibleHeuristics2020].

The algorithm builds on the idea of angelic non-determinism, which had previously been proposed as a way to determine whether a partial program could possibly be completed to satisfy a given specification [BodikCGKTBR10]. The idea is that, given a partial program, completing it with pieces of code that work for all inputs may be quite hard; but imagine if we could execute the partial program such that every time execution reaches an unknown piece of code, we ask an oracle to produce a value that allows the program to satisfy the specification. The job of the oracle is simpler than the overall synthesis problem, because it does not have to figure out a program that works for all inputs, only a value that works for the current input. If no such value can be found, then clearly there is no way the partial program could be completed to satisfy the specification for all inputs.
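To make this concrete, here is a toy sketch of an angelic check, using exhaustive search over a small set of candidate hole values in place of a real oracle; the partial program and the specification are hypothetical.

    def angelically_satisfiable(partial_program, spec, inputs, candidates):
        # For each input, ask whether SOME choice of hole value makes the spec
        # hold. If any input cannot be rescued by any choice, no completion of
        # the partial program can work, and the search can prune it.
        return all(
            any(spec(x, partial_program(x, h)) for h in candidates)
            for x in inputs
        )

    # Partial program  f(x) = 2 * ??   with an unknown hole ??.
    partial = lambda x, hole: 2 * hole
    spec = lambda x, out: out == x * x           # specification: f(x) == x^2

    print(angelically_satisfiable(partial, spec, [0, 2, 4], range(20)))  # True
    print(angelically_satisfiable(partial, spec, [0, 3], range(20)))     # False: 2 * ?? is never 9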

NEAR builds on this idea by training a neural network to serve as the oracle. The algorithm works like a traditional top-down program search, but every time it generates a new partial program as part of the search, it trains a neural network to serve as the oracle for each hole in the partial program. If no such oracle can be trained, the assumption is that the partial program cannot be completed to satisfy the specification, so the search chooses a different branch to explore.

If we assume that neural networks can approximate any function representable by a program fragment in the language, and if we assume we are training the neural network to convergence, then the error made by the hybrid neural-symbolic program is a lower bound on the error made by any completion of the partial program. This allows us to use the error of the neural-symbolic hybrid program as an admissible heuristic for a graph search over the space of partial programs.
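The sketch below shows the overall shape of such a best-first search; it is schematic rather than the authors' implementation, and expand, is_complete, train_with_neural_holes, and true_error are hypothetical callbacks standing in for the grammar, the neural completion and training step, and the evaluation of a finished program.

    import heapq

    def near_style_search(root, expand, is_complete, train_with_neural_holes,
                          true_error, data):
        # Best-first search over partial programs, ordered by the error of their
        # neural completions. If that error lower-bounds the error of any
        # symbolic completion, the first complete program popped is optimal.
        frontier = [(train_with_neural_holes(root, data), 0, root)]
        counter = 1                       # tie-breaker so programs are never compared
        while frontier:
            _, _, prog = heapq.heappop(frontier)
            if is_complete(prog):
                return prog
            for child in expand(prog):
                h = (true_error(child, data) if is_complete(child)
                     else train_with_neural_holes(child, data))
                heapq.heappush(frontier, (h, counter, child))
                counter += 1
        return None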

The algorithm has an added advantage, which is that it is an anytime algorithm. The process can be stopped at any time, and the resulting neurosymbolic program will have some neural components in the middle of a symbolic structure.

Symbolic guided deep learning

In neural relaxation, we already have a concrete program, and the goal is to find a neural network that behaves like that program. A more general version of this is when we don't have a program, but we have a set of structural constraints, and we want the neural network to behave like some program satisfying those structural constraints. One example of this is