Lecture 1: Introduction and Definitions
Back in 2021, with the introduction of Copilot, the broader development community got a first taste of what it means for a machine to automatically write code for you. Since then, programming tools based on large language models (LLMs) have only grown in capability; just a year after Copilot, in 2022, AlphaCode was already claiming to perform on par with the median competitor in a programming competition [AlphaCode]. All of these advances have been made possible by the rapid progress in LLMs. But LLMs by themselves are only part of the story. Code presents unique challenges that derive from the extreme precision required to write correct programs: a few characters can be the difference between a correct program and one that contains a dangerous vulnerability. But code also presents unique opportunities; unlike other tasks for which LLMs have shown promise, code benefits from precise semantics, which allow us to test it and to reason formally about its behavior.

This course aims to introduce students to the broad field of program synthesis, including techniques based on large language models and reinforcement learning, but also symbolic techniques with complementary capabilities. The goal is to provide a comprehensive view of the modern program synthesis toolkit, with an emphasis on the benefits and limitations of different techniques, so that practitioners can pick the best combination of tools for a specific task. But before diving into algorithms, this lecture aims to provide some historical context and to define the field of program synthesis.

What is program synthesis
The dream of automating software development has been present since the early days of the computer age. As early as 1945, as part of his vision for the Automatic Computing Engine, Alan Turing argued that
Instruction tables will have to be made up by mathematicians with computing
experience and perhaps a certain puzzle-solving ability…
This process of constructing instruction tables should be very fascinating.
There need be no real danger of it ever becoming a drudge, for any processes
that are quite mechanical may be turned over to the machine itself.
[copeland2012alan]
The developers of FORTRAN similarly hoped to make it possible for the machine to "code problems for itself and produce as good programs as human coders (but without the errors)" [Backus:1957]. Compilation and synthesis are closely related in their goals: both aim to support the generation of software from a high-level description of its behavior. In general, though, we expect a synthesizer to do more than translate a program from one notation to another, as traditional compilers do; we expect it to discover how to perform the desired task. The line can be blurry, since some aggressive optimizing compilers can be argued to actually discover how to perform a computation that was specified at a higher level of abstraction; autoparallelization is one such example, where the compiler seeks to discover how to parallelize a set of operations that were described sequentially. Historically, one distinguishing feature between a compiler and a synthesizer was the use of search; this distinction, too, has become less clear in recent years, with many compilers leveraging search techniques to optimize the generated code, and with the advent of Large Language Models that can synthesize entire programs without any explicit search.

Another class of techniques closely associated with synthesis is declarative programming, and in particular logic programming. The dream of logic programming was that programmers would express the requirements of their computation in logical form, and when given an input, the runtime system would derive an output satisfying the logical constraints through a combination of search and deduction. The goals are thus closely related to those of program synthesis, but there are some important distinctions. First, rather than trying to discover an algorithm to solve a particular problem, logic programming systems rely on a generic algorithm to search for a solution to every problem. This means that for many problems they can be dramatically slower than a specialized program.
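To make this trade-off concrete, here is a toy sketch (not any real logic programming system) of computing by generic search over a declarative specification: sorting is specified as "an ordered permutation of the input", and a generic enumerator blindly searches for an output that satisfies the spec.

```python
from itertools import permutations

def spec(inp, out):
    """Declarative spec for sorting: `out` is an ordered permutation of `inp`."""
    is_permutation = sorted(inp) == sorted(out)
    is_ordered = all(a <= b for a, b in zip(out, out[1:]))
    return is_permutation and is_ordered

def declarative_sort(inp):
    """Generic search over candidate outputs: no sorting algorithm in sight,
    but O(n!) candidates in the worst case."""
    for candidate in permutations(inp):
        out = list(candidate)
        if spec(inp, out):
            return out

print(declarative_sort([3, 1, 2]))  # [1, 2, 3]
```

A specialized program (such as a library sort) solves the same problem in O(n log n); the generic search pays for its generality, which is exactly the slowdown noted above.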
Additionally, if the problem is under-specified, the user may get a solution that is very far from the one they expected.

Finally, the field of machine learning itself forms a third class of approaches that are closely related to program synthesis. The canonical problem in machine learning is finding a predictor (i.e., a function) $f : \mathcal{X} \to \mathcal{Y}$ whose behavior closely matches a given dataset $D = \{(x_i, y_i)\}_{i=1}^N$. In some sense, these datapoints $(x_i, y_i)$ can often be thought of as "input-output" pairs, in which case machine learning becomes a form of program synthesis, where we seek the program $f$ that (perhaps imperfectly) maps each input $x_i$ to its output $y_i$. However, there are some important distinctions between program synthesis and machine learning. Perhaps the biggest one is that in machine learning, the space of functions under consideration is typically restricted to those that adhere to a particular structure. For example, we may assume that $f$ is a linear function, or a decision tree, or a neural network with a fixed architecture. By contrast, a core goal of program synthesis is to discover the structure that is needed to solve a particular problem, whether that involves loops, branches, or recursion. (Although, as we will see later, we will still need to impose other forms of constraints on the space of programs.) Relatedly, another distinction is that because the function space is so tightly prescribed in machine learning, each class of functions has its own set of highly specialized (and optimized) algorithms. Meanwhile, program synthesis typically takes a broader view, leading to algorithms that are (in principle) more general. Finally, there was traditionally a very important distinction in that program synthesis aspired to discover programs that always precisely matched the specification.
This has not been the case in machine learning, where the notions of learning from noisy data and using real-valued measures of success have been deeply ingrained in the literature from the very start through the languages of probability, statistics, and optimization. However, this distinction is somewhat less relevant today, since there is growing interest within the synthesis community in algorithms that are robust to noise, or that behave well in the presence of incomplete or informal specifications.

In addition to thinking of machine learning as a form of program synthesis, which is perhaps more of an illuminating exercise than an idea from which we can immediately derive practically useful techniques, recent years have seen a sharp increase in the use of machine learning techniques to support general program synthesis. In particular, the use of pre-trained Large Language Models (LLMs) such as GPT-4, Claude, and Gemini to support program synthesis has been one of the most significant developments in the field over the past five years. We will learn more about this in Unit 2, where we will cover the essential techniques that allow LLMs to synthesize programs, as well as the strengths and weaknesses of learning-based techniques relative to symbolic approaches.
A working definition of program synthesis
So if program synthesis is not compilation, it is not logic programming, and it is not machine learning, then what is program synthesis? As mentioned before, different people in the community have different working definitions of what they would describe as program synthesis, but I believe the definition below is one that both captures most of what we understand today as program synthesis and also excludes some of the aforementioned classes of approaches.

Program Synthesis Today
In 2025, chances are that many of you have heard of or even used program synthesis, although you may not have known it by that name. Large Language Models (LLMs) have become an integral part of the software development process, whether through a web interface such as ChatGPT, Claude, or Gemini, auto-complete on steroids in your favorite IDE with GitHub Copilot, Tabnine, or Cursor, or even end-to-end coding "agents" such as Claude Code and Cursor's Agent mode. Indeed, for many of us, the fact that we can now generate a piece of code by simply describing it in natural language has become something we almost take for granted.

Beyond consumer-level systems, research-level models have achieved some impressive accomplishments through the use of massive amounts of search and compute. For example, in 2022 the AlphaCode system from DeepMind demonstrated performance comparable to that of the median human participant in a competitive programming contest [AlphaCode], and shortly after, the AlphaTensor paper [fawzi2022alphatensor] showed that it was possible to use reinforcement learning to discover new algorithms for matrix multiplication that were faster than the best known algorithms at the time.

But there is more to program synthesis than LLMs. As powerful as they are, LLMs still have some important limitations: they come with weak-to-nonexistent guarantees about the code they generate; they can be difficult to adapt to new domains; and they require enormous amounts of data and infrastructure to train, and consume significant energy and compute when deployed. Before the advent of LLMs, the focus of the field was on efficient search techniques that could explore large spaces of possible programs to find one that satisfied a set of requirements. Those techniques were limited to synthesizing fairly small programs, and could not take advantage of unstructured means of specification such as natural language. Despite these limitations, they achieved some impressive results.
For example, early success stories included the ability to synthesize Karatsuba big-integer multiplication [sketchthesis], Strassen's matrix multiplication [Srivastava:2010], and the functional Cartesian product algorithm of Barron and Strachey, which is considered the first functional pearl [Feser:2015]. Search-based techniques proved to be very effective for things like bit-vector manipulations: the winner of a program synthesis competition back in 2019 was able to synthesize every bit-vector manipulation that the organizers threw at it. Search-based techniques were also designed to work well with verification, enabling the synthesis of provably correct implementations of fairly complex algorithms; in a few years, the field was able to move from things like sorting and list reversal to algorithms and data-structure manipulations such as insertion into red-black trees or binary heaps [Polikarpova:2016].

Amidst all the hype, one would be forgiven for thinking that the advent of LLMs has made all of these techniques obsolete. In fact, they are perhaps more relevant than ever. One reason is that search-based program synthesis techniques fill an important niche in the synthesis landscape, one in which LLMs cannot (yet) compete. In stark contrast to LLMs, which are trained on large amounts of data and require significant compute resources to run, search-based techniques can be engineered to be extremely efficient and effective as long as they are tailored to a specific domain. There are many specialized applications where training data is unavailable, or where the cost of running an LLM is simply prohibitive, in which case search-based techniques are still the best option. Another, perhaps more important, reason is that search-based techniques are also being used to improve the performance of LLMs themselves.
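To give a taste of what such search looks like, here is a minimal bottom-up enumerative synthesizer over a tiny 4-bit expression grammar (a toy sketch; real competition solvers use far smarter pruning and solver-backed reasoning). Given three input-output examples, it rediscovers the classic trick for clearing the lowest set bit:

```python
# I/O examples for an unknown 4-bit trick (here: clear the lowest set bit).
examples = [(0b1011, 0b1010), (0b1000, 0b0000), (0b0110, 0b0100)]

OPS = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
    "+": lambda a, b: (a + b) & 0xF,  # 4-bit wrap-around arithmetic
    "-": lambda a, b: (a - b) & 0xF,
}
LEAVES = ["x", "0", "1"]

def eval_expr(expr, x):
    """Evaluate an expression tree: a leaf string or a nested (op, l, r) tuple."""
    if expr == "x":
        return x
    if expr in ("0", "1"):
        return int(expr)
    op, l, r = expr
    return OPS[op](eval_expr(l, x), eval_expr(r, x))

def synthesize(examples, depth=2):
    """Grow the expression pool bottom-up; return the first expression
    consistent with every example."""
    level = list(LEAVES)
    for _ in range(depth):
        level = level + [(op, l, r) for op in OPS for l in level for r in level]
        for expr in level:
            if all(eval_expr(expr, x) == y for x, y in examples):
                return expr
    return None

print(synthesize(examples))  # ('&', 'x', ('-', 'x', '1')), i.e. x & (x - 1)
```

Three examples suffice to pin down `x & (x - 1)` here, but the pool already has thousands of candidates at depth 2; the growth of this search space is exactly what limited these techniques to small programs.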
Indeed, as we will see towards the end of Unit 2, many techniques developed by the program synthesis community are already being used to improve the performance of LLMs, and to make them more reliable and easier to use even outside the context of generating programs. So, despite the success of LLMs, program synthesis remains an active area of research, with papers published every year in all the major programming systems conferences (PLDI, POPL, OOPSLA), as well as in formal methods (CAV, TACAS) and machine learning (NeurIPS, ICLR, ICML).

Program Synthesis Applications
One of the most obvious uses of program synthesis is as a software engineering aid. This is an application with which most of you will already be familiar, as noted earlier. However, there are other applications of program synthesis that may not be as obvious, but which have proven to be very impactful.

Challenges
The advent of LLMs has transformed the field of program synthesis, allowing us to attack larger and more complex programs and to support a wider variety of specifications, which have expanded beyond the traditional formal specifications and examples to include natural language and even visual inputs. As we will see in this course, the field is now mature enough that we can leverage off-the-shelf tools for many synthesis applications, and even problems for which the existing off-the-shelf solutions do not work can usually be attacked by bringing together the algorithmic building blocks that we cover in this course. But despite these advances, there remain a number of open challenges, whether one's goal is to support software engineering or to synthesize programs to serve as interpretable models. Back in 2018 we proposed to group the challenges of Machine Programming into three pillars [GottschlichSTCR18], and while the technology has advanced significantly since then, these pillars remain a useful grouping of the challenges we will be discussing in this course.

Intention. The first challenge is what we have termed the Intention challenge: how do users tell you their goals? The definition of synthesis talks about semantic and syntactic constraints, but the exact form of these will influence all subsequent decisions about the synthesis system. Historically, early successes such as the FlashFill system [gulwani:2011:flashfill] popularized the use of input-output examples as a means of specification. Examples come with a number of advantages, such as the ability to treat them as a set of unit tests against which to validate your code, but there are many tasks for which the rigidity and verbosity of input-output examples make them unsuitable.
Indeed, the rise of the language modeling paradigm has led most recent work to instead adopt natural language as the primary means of specification, but natural language lacks precision and is difficult to automatically check for correctness.
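Even a precise format like input-output examples leaves the intended program underdetermined. The two candidate functions below (hypothetical, chosen purely for illustration) agree on every provided example yet diverge on unseen inputs; a synthesizer must somehow break such ties, often by preferring the simpler candidate:

```python
examples = [(0, 0), (1, 1), (2, 2)]

def f1(x):
    return x        # the identity: probably what the user meant

def f2(x):
    return x % 3    # an equally "correct" program with respect to the examples

# Both candidates satisfy every example (the examples double as unit tests)...
assert all(f(x) == y for x, y in examples for f in (f1, f2))

# ...yet the specification never pinned down the behavior at x = 3.
print(f1(3), f2(3))  # 3 0
```

With only three examples, nothing in the specification distinguishes these two programs; any preference between them has to come from outside the examples themselves.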
No matter what format the specifications are given in, one big aspect of the intention challenge is how to cope with under-specification. Ultimately, the only way to completely and unambiguously characterize a program is to write down the program itself (although perhaps not in a form that is immediately executable by your machine). Thus, in program synthesis we are almost always dealing with a situation in which there are multiple programs that satisfy the requirements. How can we tell which one the user actually wants? Of course, one solution is to simply ignore this problem; if the user provides a partial specification, they have no right to complain if they get a different program from the one they wanted. In practice, though, making a good choice can make the difference between a system that is useful and one that is not.

Invention. Once we know what the user wants, the second challenge is to actually discover a piece of code that will satisfy those requirements. Arguably this is the central challenge of synthesis, as it potentially involves inventing new algorithmic solutions to a problem. One of the key themes of this course will be the different techniques that the community has developed to tackle the inherent complexity of this task.
It is important to note that while LLMs have demonstrated impressive capabilities in this regard, even solving programming competition problems that require significant algorithmic ingenuity, they are not a magic bullet. They exhibit significant deficiencies on problems farther outside their training distribution, as well as when generating solutions in unfamiliar domain-specific languages, and they often require prohibitive amounts of compute to search for a solution.

Adaptation. The canonical view of synthesis is that the user is creating a brand new algorithm from scratch, and wants to leverage a synthesizer to create a correct implementation of the desired algorithm. However, most software development involves working in the context of existing software systems: fixing bugs, optimizing code, and performing other kinds of maintenance tasks. This pillar deals with the question of synthesis in a broader context, and the application of synthesis ideas to software development tasks beyond green-field software creation. There are a number of compelling applications of program synthesis in support of the broader software development process, especially debugging and optimization.