Introduction to Program Synthesis

© Theo X. Olausson. 2025. All rights reserved.

TODO:

Changelog:

Lecture 9: Component Discovery

As we have seen, search-based program synthesis can be a very powerful tool, especially when combined with various ways of pruning and efficiently exploring the search space. However, one thing we have not yet paid much attention to is the role of the domain-specific language (DSL) itself. Fundamentally, the DSL is the most important part of the synthesis process, as it defines and shapes the search space: if the DSL is too restrictive, it may not be able to express the solutions we are looking for, and if it is too broad, it may lead to an intractable search space.

In this lecture, we will explore component discovery: the task of automatically discovering new components (functions, operators, etc.) that can be added to a DSL to improve its expressiveness and efficiency. This relieves some of the burden from the programmer, who no longer has to worry quite so much about the design of the DSL, and allows the synthesis system to become more powerful and flexible over time. (For this reason, component discovery is sometimes referred to as abstraction learning, but we will use the term component discovery here to avoid confusion with the more specific meaning of "learning" in the machine learning literature.) While the idea has its roots in inductive logic programming (TODO I need to find a reference for this; I remember some ILP person at Chalmers complained about it when I presented Stitch there, and also I think it showed up in the POPL reviews for Stitch), using component discovery "in the loop" together with synthesis is a relatively new idea that first gained traction in the early 2020s with the introduction of DreamCoder Ellis2021DreamCoder.

DreamCoder

DreamCoder is a program synthesis system which alternates between three phases: a wake phase, in which it searches for programs that solve the given tasks, guided by a neural recognition model; an abstraction phase, in which it compresses the solutions it has found into new components that are added to the DSL; and a dreaming phase, in which it retrains the recognition model on the solutions found so far (and on samples drawn from the current DSL). This process is repeated in a loop, allowing the system to continuously improve its performance and discover new components that can be used in future synthesis tasks. The hope is that tasks which are too difficult to solve with the initial DSL may become solvable as the system discovers new components that can be used to express the solutions more succinctly. For example, a task such as retrieving the Kth largest element of a list may be out of reach at first, but becomes much easier once components for sorting and indexing have been discovered. DreamCoder introduced several novel ideas, in particular the pairing of program synthesis with component discovery as a form of continual learning. However, it also had several limitations; in particular, it was prone to getting stuck, since bootstrapping the DSL required a carefully curated curriculum of tasks that allowed the system to discover new components in a controlled manner. Under the hood, DreamCoder's component discovery algorithm was also very expensive... TODO: write more about DreamCoder

Stitch

After DreamCoder had shown the potential of combining synthesis with component discovery, several follow-up systems were developed to address its limitations. One of these was Stitch bowers2023top, which massively improved the efficiency of the component discovery stage by treating it as a search problem in its own right.

The core idea behind Stitch was simple. Following DreamCoder's intuition, the optimal components to add to the DSL are those that most reduce the search space on future problems. While we cannot measure this directly, we can approximate it by instead looking at the programs we have already synthesized and measuring how much smaller they would have been if we had had access to a new component. Thus, for a given component $C$, we can measure its utility $U(C)$ as the total size reduction of all programs in our corpus that would be achieved by substituting in $C$: $U(C) = \sum_{p \in P} (|p| - |p[C]|)$, where $P$ is the set of programs in our corpus, $|p|$ is the size of program $p$, and $|p[C]|$ is the size of program $p$ after rewriting it with component $C$.
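To make this concrete, here is a minimal sketch (with an invented nested-tuple representation of programs; Stitch itself operates on lambda-calculus terms) that computes $U(C)$ for a component with no holes, by rewriting every occurrence of $C$ to a single fresh symbol:

```python
def size(p):
    """Number of AST nodes; programs are nested tuples like ("+", "x", "y")."""
    if not isinstance(p, tuple):
        return 1
    return 1 + sum(size(c) for c in p[1:])

def rewrite(p, comp, name):
    """p[C]: replace every occurrence of the subtree `comp` with the leaf `name`."""
    if p == comp:
        return name
    if not isinstance(p, tuple):
        return p
    return (p[0],) + tuple(rewrite(c, comp, name) for c in p[1:])

def utility(comp, corpus):
    """U(C) = sum over p in P of (|p| - |p[C]|)."""
    return sum(size(p) - size(rewrite(p, comp, "f")) for p in corpus)

corpus = [
    ("+", ("*", "x", "x"), "y"),             # x*x + y
    ("-", ("*", "x", "x"), ("*", "x", "x")), # x*x - x*x
]
sq = ("*", "x", "x")
print(utility(sq, corpus))  # → 6: each of the 3 occurrences saves 3 - 1 = 2 nodes
```

Note that a component with $k$ holes would instead be rewritten to a call with $k$ arguments, so the per-occurrence saving is smaller; the hole-free case above keeps the sketch short.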

Stitch then uses this utility to guide a top-down search for new components. It starts with a component consisting of a single hole, which it then grows by iteratively expanding holes with non-terminal or terminal symbols from the DSL. Crucially, Stitch does not need to enumerate all possible components; instead, it cleverly constructs an upper bound $U^*(C)$ on the utility of a partial component $C$ as $U^*(C) = |C| \cdot N_C(P)$, where $N_C(P)$ is the number of expressions in the corpus $P$ that can be rewritten with $C$. This allows Stitch to prune the search space significantly: once we have seen a complete component $C$ with utility $U(C)$, we can immediately discard any partial component $C'$ whose utility upper bound satisfies $U^*(C') < U(C)$. Formally, this makes Stitch an instance of branch-and-bound search, a classic algorithm for solving combinatorial optimization problems.
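The pruning can be illustrated with a toy branch-and-bound search in the same spirit (heavily simplified, and not Stitch's actual algorithm or bound: components here end up fully concrete, with holes used only as search placeholders, and the upper bound is the crude observation that a rewrite can at most collapse each matched subtree to a single symbol):

```python
HOLE = "?"
LEAVES = ["x", "y", "1"]
OPS = [("+", 2), ("*", 2), ("-", 2)]

def subtrees(p):
    yield p
    if isinstance(p, tuple):
        for c in p[1:]:
            yield from subtrees(c)

def size(p):
    return 1 if not isinstance(p, tuple) else 1 + sum(size(c) for c in p[1:])

def matches(pat, p):
    if pat == HOLE:
        return True
    if not isinstance(pat, tuple):
        return pat == p
    return (isinstance(p, tuple) and len(p) == len(pat) and p[0] == pat[0]
            and all(matches(a, b) for a, b in zip(pat[1:], p[1:])))

def match_sites(pat, corpus):
    return [s for p in corpus for s in subtrees(p) if matches(pat, s)]

def holes(pat):
    if pat == HOLE:
        return 1
    return sum(holes(c) for c in pat[1:]) if isinstance(pat, tuple) else 0

def concrete_size(pat):
    if pat == HOLE:
        return 0
    return 1 + sum(concrete_size(c) for c in pat[1:]) if isinstance(pat, tuple) else 1

def fill_first_hole(pat, sub):
    """Replace the leftmost hole in pat with sub; returns (new_pat, done)."""
    if pat == HOLE:
        return sub, True
    if not isinstance(pat, tuple):
        return pat, False
    out, filled = [pat[0]], False
    for c in pat[1:]:
        if not filled:
            c, filled = fill_first_hole(c, sub)
        out.append(c)
    return tuple(out), filled

def best_component(corpus, max_nodes=4):
    best, best_u = None, 0
    frontier = [HOLE]
    while frontier:
        pat = frontier.pop()
        sites = match_sites(pat, corpus)
        # Bound: each rewrite saves at most |matched subtree| - 1 nodes.
        if sum(size(s) - 1 for s in sites) <= best_u:
            continue  # pruned: no completion of pat can beat the best so far
        if holes(pat) == 0:
            u = (concrete_size(pat) - 1) * len(sites)
            if u > best_u:
                best, best_u = pat, u
            continue
        if concrete_size(pat) >= max_nodes:
            continue  # size cap on this toy search
        for leaf in LEAVES:
            frontier.append(fill_first_hole(pat, leaf)[0])
        for op, arity in OPS:
            frontier.append(fill_first_hole(pat, (op,) + (HOLE,) * arity)[0])
    return best, best_u

corpus = [
    ("+", ("*", "x", "x"), "y"),
    ("-", ("*", "x", "x"), ("*", "x", "x")),
]
print(best_component(corpus))  # → (('*', 'x', 'x'), 6)
```

On this corpus the search settles on the squaring component, pruning, for example, every partial component headed by "+" once the bound shows it cannot beat a utility of 6.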

Stitch also had several advantages compared to DreamCoder, for example that it was an anytime algorithm: it could be interrupted at any time and still return a useful result, since it would always return the best component found so far. This, in addition to runtime and memory improvements on the order of 100x-10,000x, made Stitch a much more practical system for component discovery. However, Stitch did make some compromises in terms of the expressiveness of the components it could discover. Since components were only matched against the programs on a syntactic level, it could not discover components that required more complex reasoning about the equivalence of expressions, such as higher-order functions. Bowers et al. were able to provide a proof of concept that remedied this by running Stitch on top of a version space constructed by rewriting, allowing it to discover some higher-order components, but this was not the main focus of the paper.

Babble

Contemporary with Stitch, another system called Babble cao2023babble was developed, which took an altogether different approach to component discovery. Instead of taking the purely syntax-driven approach of Stitch, Babble sought to retain the expressivity of DreamCoder's component discovery algorithm, while still being able to scale to larger problems.

Babble's key technical insight was the development of Library Learning Modulo (Equational) Theories (LLMT), a component discovery algorithm that put semantic equivalence at the forefront. LLMT works as follows. Alongside the DSL, the user provides a set of equational theories: sets of equations that describe equivalences between expressions in the DSL. For example, in a graphical DSL, we might have an equation stating that no matter how we rotate a circle, it is still the same circle; or, in an arithmetic DSL, we might have an equation stating that $x + y - y = x$ for any $x$ and $y$. LLMT then uses these equations to rewrite the programs in the corpus into other, equivalent programs. Importantly, the results are not stored naively as individual programs, as that would lead to an exponential blowup. In fact, the number of resulting programs could even be infinite, as would be the case with the graphical example above, where we could rotate the circle by any angle. Instead, LLMT uses e-graphs, a data structure that--like a version space--efficiently stores equivalence classes of expressions.
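To see why storing the rewritten corpus naively is hopeless, here is a deliberately naive sketch that materializes every term equivalent to $(a + b) + c$ under just one equation, commutativity of $+$. Adding more equations (or any equation with infinitely many instances, like rotation) makes this set explode, which is exactly what e-graphs avoid by sharing equivalence classes:

```python
def rewrites(e):
    """All expressions reachable from e by one application of x + y = y + x."""
    out = set()
    if not isinstance(e, tuple):
        return out
    if e[0] == "+":
        out.add(("+", e[2], e[1]))          # swap at the root
    for i, c in enumerate(e[1:], start=1):
        for c2 in rewrites(c):              # swap somewhere in a child
            out.add(e[:i] + (c2,) + e[i + 1:])
    return out

def saturate(e):
    """Materialize the whole equivalence class of e -- the blowup e-graphs avoid."""
    seen, frontier = {e}, [e]
    while frontier:
        for nxt in rewrites(frontier.pop()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

cls = saturate(("+", ("+", "a", "b"), "c"))
print(len(cls))  # → 4: one equation already quadruples a three-leaf term
```

With associativity added as well, the class of an $n$-leaf sum grows super-exponentially, while an e-graph represents it in space polynomial in $n$.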

Once the e-graph has been constructed, LLMT still needs to extract useful components from it. To do so, Cao et al. first generate a set of candidate components by applying anti-unification to the equivalence classes in the e-graph. At a high level, their anti-unification algorithm works by taking two equivalence classes and checking if they share a constructor; if they do, anti-unification is recursively applied to the sub-expressions of the constructor; if they don't, the expressions are replaced with a variable. Applying this procedure to all pairs of equivalence classes can be done efficiently through bottom-up dynamic programming (since this ensures that, at each point, the recursive calls require no duplicate work). Once the candidate components have been generated, LLMT then applies them as equational theories to the e-graph, rewriting it with the new components and thus yielding an even larger e-graph. Finally, LLMT picks the optimal set of components to add to the DSL by identifying those that are used to construct the smallest term in the e-graph.
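Here is a minimal sketch of syntactic anti-unification over plain terms (Cao et al.'s version works pairwise over e-classes with bottom-up dynamic programming, which this does not capture): shared constructors are kept, and mismatched subterms are replaced by variables, with the same mismatch always mapped to the same variable:

```python
def antiunify(a, b, subst=None):
    """Least general generalization of two terms represented as nested tuples."""
    if subst is None:
        subst = {}          # maps each mismatched pair (a, b) to a variable
    if a == b:
        return a
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        # Shared constructor: recurse into the sub-expressions.
        return (a[0],) + tuple(antiunify(x, y, subst)
                               for x, y in zip(a[1:], b[1:]))
    # Mismatch: introduce (or reuse) a variable for this pair.
    if (a, b) not in subst:
        subst[(a, b)] = "?%d" % len(subst)
    return subst[(a, b)]

print(antiunify(("+", ("*", 2, "x"), 1), ("+", ("*", 3, "x"), 1)))
# → ("+", ("*", "?0", "x"), 1)
```

The generalization $(\texttt{+}\ (\texttt{*}\ \texttt{?0}\ \texttt{x})\ 1)$ is exactly the kind of candidate component (here with one parameter, ?0) that LLMT would then try to apply across the e-graph.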

LLMT thus also differs from Stitch in how it discovers sets of new components. Since Stitch treats each component as a separate entity, it can only discover one component at a time. Although it does so with a guarantee that the chosen component is locally optimal (that is, that it is the component that reduces the search space the most at that point in time), this greedy stitching together of components can lead to globally suboptimal results (in particular, if choosing a locally suboptimal component would have allowed Stitch to discover a better component later on). LLMT, on the other hand, can jointly extract a set of components that are all useful together, by considering the utility of the entire set rather than of each component in isolation. However, it does so without any guarantees of either local or global optimality. (TODO: Is this accurate? I'm a bit fuzzy on the details here.)

LEMMA

While Stitch and Babble represent the current state of the art in component discovery as a general-purpose mechanism in the synthesis loop, other systems have been developed in specialized domains. One such system is LEMMA li2022lemma, a component discovery system for mathematical proofs. The observation behind LEMMA is that component discovery is arguably the core activity of mathematicians, who often care far more about discovering a new technique or theorem that can be used to prove many other theorems than about proving any specific result in isolation. Component discovery is also key to the success of human learning in mathematics: imagine if, when teaching a student about the fundamental theorem of calculus, we had to prove each result from the Peano axioms all the way up to the theorem itself, without ever being able to use any of the results we had already proved!

In LEMMA, Li et al. adopt an approach similar to Stitch's, focusing on syntactic rewriting of proof traces (that is, sequences of proof steps) to discover new components. Unlike a general-purpose DSL, the arithmetic setting of LEMMA allows for a simpler strategy, since the proofs are straight-line programs without control flow. The key idea is to look for common sequences in the proof traces, which can then be abstracted into new components. LEMMA does borrow a bit from Babble, though, in that it uses theories to expose commonalities in the proof traces before searching for the components. However, these theories only describe simple projections of the proof traces, such as removing all the "arguments" of the steps being applied (yielding, for example, the sequence "subtract, associativity, eval" from the proof trace "subtract(1), associativity((x + 1) - 1), eval(1 - 1)").
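The projection-then-search idea can be sketched as follows (the representation and function names are illustrative, not LEMMA's actual implementation): project each proof step down to the rule it applies, then count frequent contiguous subsequences as candidate components:

```python
from collections import Counter

def project(trace):
    """Drop the arguments, keeping only the rule applied at each step."""
    return tuple(rule for rule, _args in trace)

def common_ngrams(traces, n):
    """Count contiguous length-n rule sequences across all projected traces."""
    counts = Counter()
    for t in traces:
        proj = project(t)
        for i in range(len(proj) - n + 1):
            counts[proj[i:i + n]] += 1
    return counts

traces = [
    [("subtract", "1"), ("associativity", "(x + 1) - 1"), ("eval", "1 - 1")],
    [("subtract", "3"), ("associativity", "(y + 3) - 3"), ("eval", "3 - 3")],
]
best, freq = common_ngrams(traces, 3).most_common(1)[0]
print(best, freq)  # the "subtract, associativity, eval" pattern occurs twice
```

The frequent sequence would then be abstracted into a single new proof tactic, with the dropped arguments becoming its parameters.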

In summary, LEMMA is a specialized component discovery system that focuses on one particular domain: simple, arithmetic proofs. It is not as general-purpose as Stitch or Babble, but it is able to discover components that are useful in its context, and it does so with a much simpler algorithm. TODO should also discuss Isil's new component discovery paper (https://arxiv.org/abs/2503.24036), which Maddy said is closer to Stitch than LEMMA was.