Artifact for Paper 64

Summary

This artifact accompanies Paper #64: "Program Synthesis from Polymorphic Refinement Types".

The paper presents a program synthesizer called Synquid. Synquid generates recursive functions that provably satisfy a specification given in the form of a refinement type. The tool operates by combining a novel, modular approach to refinement type reconstruction with explicit term enumeration and condition abduction.

Artifact Content

We have evaluated Synquid on a suite of synthesis benchmarks. Table 1 in the paper reports the metrics collected in this evaluation: sizes of specifications and code, as well as synthesis times and how they are affected by various features of the tool. The goal of this artifact is to make it possible to reproduce and verify the results of this evaluation, as well as to assess the general quality of the tool.

To this end, we provide the following materials:

A virtual machine image equipped with Synquid, the benchmark suite, and a script that executes Synquid on the benchmarks: this is all you need in order to reproduce the results in Table 1
A web interface to Synquid: this is the easiest way to play around with the tool and test it on synthesis problems beyond the benchmarks used in the paper
A stable branch of the Synquid source code repository: a convenient way to browse the source code of the tool and the benchmarks

Reproducing experiment results

Using the virtual machine to reproduce the experiment results in Table 1:

Start the virtual machine (password: 123). A terminal window will open with the top-level Synquid directory as the current directory.
(Optional) Build Synquid from sources:
cabal clean
cabal install
Navigate to the benchmarks directory:
cd benchmarks/paper
Invoke one of the three scripts that run Synquid on the benchmarks:
python2 run_small.py: runs only a few small benchmarks, takes a couple of seconds, dumps solutions into run_small.log
python2 run_medium.py: runs all the benchmarks but no algorithm variants, takes several minutes, dumps solutions into run_medium.log
python2 run_all.py: the real deal, runs all algorithm variants on all benchmarks, takes over an hour (since "slow" algorithm variants run until the timeout of 120 s), dumps solutions into run_all.log and produces the LaTeX table in results.tex
Apart from producing the LaTeX table, the scripts also print the results for each benchmark to standard output. For example, the output:
IncList-Merge: merge ['-h']
3.12 OK 53.52 FAIL 22.88 OK 6.26 OK 120.04 TIMEOUT 2.68 OK
means that the benchmark IncList-Merge, described in the table as merge, was run with option -h; synthesis succeeded for four of the algorithm variants (taking the number of seconds listed before each OK), while one variant ran out of memory (FAIL) and one timed out (TIMEOUT). Algorithm variants are executed in the same order as they are listed in Table 1.
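
To compare many such lines against Table 1, it can help to parse them mechanically. The following is a minimal sketch, not part of the artifact: it assumes only the alternating "time status" layout of the example line above, and the function names parse_result_line and summarize are hypothetical. It is written for Python 3, whereas the artifact's own scripts use python2.

```python
def parse_result_line(line):
    """Split 'time status time status ...' into (seconds, status) pairs.

    Assumes the layout of the example result line shown above; one pair
    corresponds to one algorithm variant, in Table 1 order.
    """
    tokens = line.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating time/status tokens")
    return [(float(t), s) for t, s in zip(tokens[0::2], tokens[1::2])]


def summarize(results):
    """Count how many variants ended in each status (OK, FAIL, TIMEOUT)."""
    counts = {}
    for _, status in results:
        counts[status] = counts.get(status, 0) + 1
    return counts


# The example result line from the text above.
results = parse_result_line(
    "3.12 OK 53.52 FAIL 22.88 OK 6.26 OK 120.04 TIMEOUT 2.68 OK"
)
print(summarize(results))
```

For the example line, the summary reports four OK variants, one FAIL, and one TIMEOUT, matching the reading given above.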

The results reported in the paper were produced on a 2.60 GHz CPU with 4 MB cache (running on a single core) and DDR3L 1600 MHz RAM. Running the experiment inside the virtual machine and/or on different hardware may affect the synthesis times slightly.

Note that the paper also presents a comparison between Synquid and existing synthesizers (Section 4.3 and Table 2). The data in Table 2 originates partly from the papers describing the respective synthesizers and is partly copied over from Table 1; thus we do not provide special means for generating this table in the artifact.

Differences from the submitted version

The current version of the paper differs slightly from the submitted version: we have fixed typos and improved the writing in the evaluation section, and restructured and extended the benchmark suite. Here is a summary of the changes to Table 1:

Some benchmarks were given more understandable names, e.g. dedup subsequences renamed to remove adjacent duplicates
Some benchmarks with multiple synthesis goals were split: reverse split into insert at end and reverse; length with fold split into fold, length using fold, and append using fold
In the Heap category: added 2-element constructor
In the RBT category: previously we managed to synthesize only left-balancing, and insert used as a component a balancing function with a stronger type; the updated version features a fully functioning RBT insertion with both left- and right-balancing
In the User category: (a) we removed the desugar AST with variables benchmark because it was very similar to another benchmark from our suite, desugar AST; (b) we added two more benchmarks from the Leon example set: creating and merging address books

Using Synquid

The best way to get started with Synquid is to go through the first two or three examples in the web interface and try modifying them. The Overview section of the paper gives an introduction to the input language and the main concepts.

Inside the virtual machine we provide, you can run Synquid from the command line using: synquid my_example.sq (use synquid --help for a full list of options). If you would like to install Synquid on your machine, get the source code from BitBucket and build it by running cabal install from its top-level directory; you will need recent versions of GHC and Cabal (we are using GHC 7.10.2 and Cabal 1.22.6.0).
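
If you want to time your own examples in the same spirit as the benchmark scripts, the following Python 3 sketch wraps the command-line invocation above with a timeout. It is an illustration under stated assumptions, not the artifact's script: the helper names build_command and run_with_timeout are hypothetical, the 120-second default mirrors the timeout mentioned for run_all.py, and treating a nonzero exit code as FAIL is an assumption about how synquid reports failure.

```python
import subprocess
import sys
import time

TIMEOUT_SECONDS = 120  # same timeout as quoted for the benchmark scripts


def build_command(spec_file):
    """Command line for one Synquid run, as given in the text above."""
    return ["synquid", spec_file]


def run_with_timeout(command, timeout=TIMEOUT_SECONDS):
    """Run a command and return (elapsed_seconds, status).

    Status is "TIMEOUT" if the process exceeds the timeout, "OK" if it
    exits with code 0, and "FAIL" otherwise. Mapping a nonzero exit code
    to FAIL is an assumption, not documented Synquid behavior.
    """
    start = time.time()
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return time.time() - start, "TIMEOUT"
    status = "OK" if completed.returncode == 0 else "FAIL"
    return time.time() - start, status
```

Inside the virtual machine, calling run_with_timeout(build_command("my_example.sq")) would then give the elapsed time and a status for that one specification.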