Lecture 16: Neurosymbolic programming
Earlier in the course, we contrasted machine learning—where the goal is to learn a function from data—
with inductive program synthesis, where we are also interested in discovering a function that fits data,
but where we want the function to be expressed as a program in a given programming language, and where we
may have additional structural or behavioral constraints that the program needs to satisfy.
More recently, in this unit, we saw how deep neural networks in general—and large language models in particular—
can be used to solve program synthesis problems. But even though deep learning is an essential component in these approaches,
the final goal is still to find a program that meets the desired behavioral and structural constraints.
In this lecture, we explore a middle ground between machine learning and program synthesis that we term
Neurosymbolic Programming (NSP).
Similar to program synthesis, the goal is to discover a function from data that meets certain structural and behavioral constraints,
but we are not fully restricted to the programs that are expressible in a given programming language. The programs can be hybrids
between traditional code and neural networks, and in some cases they can be compositions of neural networks with constraints
that give them some of the modularity of programs. In short, the goal is to
learn models that capture symbolic knowledge
in the form of program structures.
Neurosym:BuildingBlocks
To do this, we will rely on a set of algorithmic building blocks that allow us to combine the benefits
of deep learning with those of symbolic program synthesis.
- Program search. The ability to search for programs that satisfy a set of examples using the techniques
from the past two units is a key building block in NSP.
- Relaxation. This is the ability to take a traditional symbolic program and make it differentiable. This
can be done either by symbolically “smoothing” the program element by element ChaudhuriS10
or by training a neural network to approximate the behavior of the program.
- Symbolically guided deep learning. This is the ability to train a neural network while constraining it
to behave like a program written in a symbolic programming language.
- Distillation. This is the ability to extract a symbolic program using the neural network as a teacher.
- Symbolic abstraction (a.k.a. component discovery). This corresponds to the component discovery techniques that
we introduced in Lecture 9.
Program search and symbolic abstraction have already been covered in this course, so for the rest of the lecture, we focus on
relaxation, distillation, and symbolically guided deep learning.
Relaxation
Neurosym:symbolicVsNeural1;
Neurosym:symbolicVsNeural2
The idea of relaxation is to take a symbolic program and make it differentiable. There are two basic approaches to doing this.
The first approach is to do it symbolically, by replacing each program element with a differentiable approximation.
Swarat Chaudhuri and I wrote an early paper showing how to do this
ChaudhuriS10. The goal in that paper was to
derive a smooth approximation of a program that could be used to optimize the parameters of the program through numerical optimization.
More recently, there has been significant interest in the area of
differentiable programming, which provides differentiable analogs
to common programming constructs that can be composed to form differentiable programs
AbadiP20.
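To make the symbolic route concrete, here is a minimal sketch, not taken from either cited paper, that relaxes a single conditional: the hard branch is replaced by a sigmoid gate, and the temperature parameter (an illustrative choice) controls how closely the relaxation tracks the original program.

```python
import math

def branch_program(x, a, b):
    # Original symbolic program: a hard, non-differentiable branch.
    return a if x > 0 else b

def smoothed_branch(x, a, b, temperature=0.1):
    # Relaxed version: the branch condition becomes a sigmoid gate, so the
    # output varies smoothly (and differentiably) with x.
    gate = 1.0 / (1.0 + math.exp(-x / temperature))
    return gate * a + (1.0 - gate) * b

print(branch_program(0.3, 1.0, -1.0))    # 1.0
print(smoothed_branch(0.3, 1.0, -1.0))   # approximately 0.91
print(smoothed_branch(-0.3, 1.0, -1.0))  # approximately -0.91
```

As the temperature goes to zero, the relaxed version recovers the original branch, so the approximation can be made arbitrarily tight at the cost of steeper gradients.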
An alternative approach is to use the program as a data generator to train a neural network that imitates the behavior of the program.
The basic idea is simple: one can generate random inputs for the program and then train a neural network to predict the outputs
from the inputs. The resulting neural network can then be used as a differentiable approximation of the program.
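The sketch below illustrates this recipe with PyTorch; the toy target program, the network architecture, and the training budget are placeholder choices made to keep the example self-contained rather than details taken from the cited work.

```python
import torch
import torch.nn as nn

def program(x):
    # The symbolic program we want a differentiable surrogate for
    # (a toy piecewise computation with a hard branch).
    return torch.where(x > 0, torch.sin(x), x ** 2)

# 1. Use the program as a data generator on random inputs.
xs = torch.empty(10_000, 1).uniform_(-3.0, 3.0)
ys = program(xs)

# 2. Train a small network to imitate the program's input/output behavior.
surrogate = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                          nn.Linear(64, 64), nn.ReLU(),
                          nn.Linear(64, 1))
optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(2000):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(surrogate(xs), ys)
    loss.backward()
    optimizer.step()

# 3. The surrogate is now a differentiable stand-in for the program, e.g. for
#    use inside gradient-based optimization of upstream parameters.
```

The quality of such a surrogate hinges on how well the sampled inputs cover the program's behaviors, which is the sampling issue discussed below.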
Alex Renda, Yi Ding, and Michael Carbin explored some applications of these neural surrogates
Renda0C21.
The most straightforward application is to use neural surrogates to optimize program parameters through gradient descent,
but they also highlight other applications such as speeding up the execution of an otherwise slow algorithm.
In a different paper, the same authors also show how to leverage program structure to obtain samples that better
represent the different behaviors of the program
RendaDC23.
Combining relaxation and program search
Relaxation can be combined with program search not only to find programs that match a set of input-output examples, but also
to find programs that combine symbolic and neural elements. This was demonstrated by Shah et al. in their work on
the NEAR algorithm
ShahAdmissibleHeuristics2020.
The algorithm builds on the idea of
angelic non-determinism, which had previously been proposed as a way to determine whether
a partial program could possibly be completed to satisfy a given specification
BodikCGKTBR10.
The idea is that given a partial program, completing the program with pieces of code that work for all inputs may be quite hard,
but imagine that we could execute the partial program so that every time execution reaches an unknown piece of code,
we ask an oracle to produce a value that allows the program to satisfy the specification. The job of the oracle is simpler than the
overall synthesis problem, because it does not have to figure out a program that works for all inputs, only a value that works
for the current input. If no such value can be found, then clearly there is no way the partial program could be completed to satisfy
the specification for all inputs.
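The toy sketch below, which is an illustration rather than anything from the cited paper, models the unknown piece of code as a hole whose value an oracle may pick separately for each input; a partial program is rejected only if some input admits no value at all.

```python
def partial_program(x, hole):
    # A partial program with one unknown piece of code; the oracle supplies
    # the hole's value each time execution reaches it.
    return x * hole

def spec(x, output):
    # Behavioral specification: the output must equal the absolute value of x.
    return output == abs(x)

def angelically_satisfiable(inputs, candidate_values):
    # The oracle only needs a value that works for the *current* input,
    # not a single completion that works for every input.
    for x in inputs:
        if not any(spec(x, partial_program(x, v)) for v in candidate_values):
            return False  # no value works for this input: prune the partial program
    return True

# hole = 1 handles non-negative inputs and hole = -1 handles negative ones, so
# the partial program survives angelic checking even though no single constant
# completes it for all inputs.
print(angelically_satisfiable([-2, 0, 3], candidate_values=[-1, 0, 1]))  # True
```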
Neurosym:near1–near13
NEAR builds on this idea by training a neural network to serve as the oracle.
The algorithm works like a traditional
top-down program search, but every time it generates a new
partial program as part of the search, it trains a neural network to serve as the oracle for each hole in the partial program.
If no such oracle can be trained, the assumption is that the partial program cannot be completed to satisfy the specification,
so the search chooses a different branch to explore.
If we assume that neural networks can approximate any function representable by a program fragment in the language,
and if we assume we are training the neural network to convergence,
then the error made by the hybrid neural-symbolic program is a lower bound on the error made by any completion of the partial program.
This allows us to use the error of the neural-symbolic hybrid program as an admissible heuristic for a graph search
over the space of partial programs.
The algorithm has an added advantage: it is an anytime algorithm. The process can be stopped at any time,
and the resulting neurosymbolic program will have some neural components in the middle of a symbolic structure.
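To show how these pieces fit together, here is a schematic best-first search in the style of NEAR; the helper functions passed in (expand, train_oracle, structural_cost, is_complete, error) are hypothetical stand-ins for the corresponding components, and the sketch conveys the shape of the algorithm rather than its actual implementation.

```python
import heapq

def near_style_search(root, expand, train_oracle, structural_cost,
                      is_complete, error):
    """Best-first search over partial programs, in the spirit of NEAR.

    expand(p)          -> child partial programs (one top-down refinement of p)
    train_oracle(p)    -> validation error of p with neural nets filling its holes
    structural_cost(p) -> cost of the symbolic structure chosen so far
    is_complete(p)     -> True when p has no holes left
    error(p)           -> validation error of a fully symbolic program p
    """
    frontier = [(train_oracle(root) + structural_cost(root), 0, root)]
    best, best_cost, tie = None, float("inf"), 1
    while frontier:
        f, _, program = heapq.heappop(frontier)
        if f >= best_cost:
            continue  # admissibility: no completion of this node beats the incumbent
        if is_complete(program):
            cost = error(program) + structural_cost(program)
            if cost < best_cost:
                best, best_cost = program, cost
            continue
        for child in expand(program):
            # The trained oracles' error lower-bounds the error of any symbolic
            # completion of the child, so it serves as an admissible heuristic.
            h = train_oracle(child) + structural_cost(child)
            heapq.heappush(frontier, (h, tie, child))
            tie += 1
    # An anytime variant would also track the best hybrid (partial program plus
    # its trained oracles) seen so far, so the loop could be interrupted early.
    return best
```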
Symbolically guided deep learning
In neural relaxation, we already have a concrete program, and the goal is to find a neural network that behaves like that program.
A more general version of this is when we don't have a program, but we have a set of structural constraints, and we want the neural
network to behave like some program satisfying those structural constraints. One example of this is