This virtual machine contains programs that you can run to evaluate the results presented in our paper.

After logging in (password: sle2017), open a terminal by clicking on the black rectangular icon on the left. Go to the artifact directory by entering cd ~/sle17.rifl.artifact/

Source code of an RIFL interpreter (section 3.3)

We present the source code for an Ocaml RIFL interpreter in rifl/interpreter/. The RIFL language is presented briefly in the paper (section 3). Its full syntax and operational semantics are presented in our accompanying technical report (section 3).

The source code corresponds to RIFL as follows:

File rifl/interpreter/parser.mly implements the RIFL syntax.
File rifl/interpreter/rifl.ml implements the RIFL operational semantics.

We will describe how to run this interpreter later with the benchmark applications (section 5).

Controlled experiment (section 4)

The major results of our paper are from the controlled experiment (section 4). We present the steps to obtain these results.

Study setup (section 4.1)

We included the original virtual machines and instructions used in the controlled experiment, in the separate folder "full_design_raw_data" (not in this virtual machine). This separate folder contains a readme file that points to the materials used to set up the controlled experiment. The folder also contains the raw data that we collected.

For your convenience, we have copied the relevant files into this virtual machine, under experiment/. This directory contains the following files:

Starter files for the tasks in the study: experiment/starter/
Original programs that the developers submitted during the study: experiment/submissions/*/prog*.stu, among which prog3.stu is the thumbnail programs discussed in the paper. More information about the other files are available in the separate folder "full_design_raw_data" (not in this virtual machine).
Programs that we used for analysis: experiment/submissions/*/thumbnail.stu. As discussed in the paper (section 4.2 second paragraph), we fixed minor bugs of the original programs prog3.stu. The fixed programs are named as thumbnail.stu. To see our slight modifications, run diff prog3.stu thumbnail.stu.
Three different interpreters for the study:
- Two interpreters used in the study: experiment/interpreters/{inspect,control}/, for the inspect group and the control group, respectively.
- Another interpreter for running the control group programs in failure-oblivious mode (section 4.1 "comparison with failure-oblivious executions") during analysis: experiment/interpreters/foc/
Input files for testing the developer programs: experiment/inputs/, which are used in the paper (table 5) and presented in the technical report (appendix A).
Outputs from running the developer programs: experiment/outputs/, which are described in the paper (table 5).
Scripts to run the developer programs during analysis: experiment/{test,run,complexity}.sh, which we will describe below.

Outputs (table 5)

Please first compile the interpreters as follows:

cd experiment/interpreters/
make
cd ../

To run a single program with a single input file, use ./run.sh [interpreter version] [developer id] [input name]. For example:

./run.sh inspect i1 bufovfint1
./run.sh control c1 bufovfint1
./run.sh foc c1 bufovfint1

To test all the thumbnail programs with all the input files, use ./test.sh. You should see the following in the terminal:

Program submissions/i1/thumbnail.stu ==> outputs/i1.out
Program submissions/i2/thumbnail.stu ==> outputs/i2.out
Program submissions/i3/thumbnail.stu ==> outputs/i3.out
Program submissions/i4/thumbnail.stu ==> outputs/i4.out
Program submissions/i5/thumbnail.stu ==> outputs/i5.out
Program submissions/c1/thumbnail.stu ==> outputs/c1.out and outputs/c1.foc.out
Program submissions/c2/thumbnail.stu ==> outputs/c2.out and outputs/c2.foc.out
Program submissions/c3/thumbnail.stu ==> outputs/c3.out and outputs/c3.foc.out
Program submissions/c4/thumbnail.stu ==> outputs/c4.out and outputs/c4.foc.out
Program submissions/c5/thumbnail.stu ==> outputs/c5.out and outputs/c5.foc.out

The outputs of the thumbnail programs are stored in experiments/outputs/ for each developer program. These outputs are consistent with the behavior presented in the paper (table 5), with the few exceptions that some programs with long-running loops may time out and output "[Aborted for time out]" with the current test script, though these programs could have terminated in a longer time.

You may compare these outputs to the correct outputs presented in our accompanying technical report (appendix A). You may also add other input files to experiments/inputs/, run test.sh again, and see the updated outputs in experiments/outputs/.

Defects (table 4)

We identified defects in the developer thumbnail programs by running tests and analyzing code manually. Please read the programs in experiment/submissions/*/thumbnail.stu and refer to the test outputs (table 5) to verify the defects that we found (table 4).

Code complexity (figure 6)

We provide parsers to analyze the code complexity.

cd experiment/complexity/
make
cd ../
./complexity.sh

You should see the following output:

developer | group | cyclomatic | cond-clauses | all-stmt | uncond-stmt | loc
i1 | inspect | 15 | 6 | 61 | 58 |  109
i2 | inspect | 12 | 7 | 49 | 45 |   64
i3 | inspect | 13 | 2 | 63 | 62 |   92
i4 | inspect | 15 | 5 | 59 | 56 |   92
i5 | inspect | 13 | 6 | 41 | 37 |   56
c1 | control | 20 | 13 | 50 | 43 |   74
c2 | control | 24 | 11 | 85 | 76 |  118
c3 | control | 43 | 25 | 93 | 84 |  178
c4 | control | 34 | 21 | 99 | 80 |  136
c5 | control | 22 | 12 | 63 | 57 |  100

These numbers are presented in the paper (figure 6) and used in statistical analysis.

Statistical analysis (section 4.2)

We provide a script experiment/statistics.r, written in R, to perform statistical analysis presented in the paper.

To use this script, first launch the R software by clicking at the "R" icon on the left. In the R terminal, enter the following command:

source ("/home/rifl/sle17.rifl.artifact/experiment/statistics.r", echo = TRUE)

You should see at least these outputs in the R terminal:

> # Unconditional statements
> wilcox.test(inspect_uncond, control_uncond, paired=TRUE, alternative="l")
 
    Wilcoxon signed rank test
 
data:  inspect_uncond and control_uncond
V = 1, p-value = 0.0625
alternative hypothesis: true location shift is less than 0
 
There were 14 warnings (use warnings() to see them)

Please scroll up and find other outputs that did not fit in the screen. The p-values produced in R is consistent with the results reported in the paper (section 4.2).

To quit the R software, enter q() in the R terminal. When prompted to save workspace, enter n.

Other input formats (section 5)

We present the benchmark applications discussed in the paper (section 5) and the technical report (section 5). Each application was implemented in four different versions that have the same functionality (technical report section 5.1).

You may run these programs using the full RIFL interpreter. To do so, first compile the full RIFL interpreter.

cd rifl/interpreter/
make
cd ../applications/
./run.sh csv plain # The conventional version
./run.sh csv idir # The full RIFL version

Detailed steps to run each application is as follows.

Programs to process binary formats ZIP, PCAP, and PNG:

cd rifl/interpreter/
make
cd ../applications/
./run.sh zip plain # The conventional version
./run.sh zip idir # The full RIFL version
./run.sh zip edir # Explicit error detection (explicit assertions for all errors) and implicit error recovery
./run.sh zip eder # Explicit error detection and error recovery, using iterators
diff zipplain zipidir
diff zipplain zipedir
diff zipplain zipeder

Note the output files are the same across all four versions

Programs to process text formats JSON, OBJ, and CSV:

cd rifl/interpreter/
make
cd ../applications/
./run.sh json plain # The conventional version
./run.sh json idir # The full RIFL version
./run.sh json edir # Explicit error detection (explicit assertions for all errors) and implicit error recovery
./run.sh json eder # Explicit error detection and error recovery, using iterators

Note the outputs on the screen are the same across all four versions.

Code complexity

We provide a parser to compute the code complexity for benchmark programs.

cd rifl/complexity/
make
cd ../applications/
./complexity.sh

You should see the following:

app | version | cyclomatic | cond-clauses | all-stmt | uncond-stmt | loc
png | idir | 40 | 38 | 69 | 61 |   97
png | edir | 41 | 49 | 79 | 61 |  107
png | eder | 62 | 54 | 97 | 78 |  144
png | plain | 65 | 57 | 103 | 82 |  153
 
app | version | cyclomatic | cond-clauses | all-stmt | uncond-stmt | loc
json | idir | 26 | 19 | 94 | 85 |  130
json | edir | 33 | 31 | 99 | 85 |  135
json | eder | 64 | 45 | 164 | 141 |  238
json | plain | 88 | 67 | 183 | 155 |  264
 
app | version | cyclomatic | cond-clauses | all-stmt | uncond-stmt | loc
zip | idir | 25 | 18 | 109 | 101 |  138
zip | edir | 30 | 42 | 128 | 101 |  157
zip | eder | 55 | 44 | 148 | 119 |  204
zip | plain | 62 | 51 | 155 | 124 |  214
 
app | version | cyclomatic | cond-clauses | all-stmt | uncond-stmt | loc
csv | idir | 8 | 4 | 35 | 31 |  48
csv | edir | 8 | 11 | 42 | 31 |  55
csv | eder | 23 | 15 | 57 | 47 |   81
csv | plain | 39 | 28 | 69 | 58 |   97
 
app | version | cyclomatic | cond-clauses | all-stmt | uncond-stmt | loc
obj | idir | 17 | 20 | 80 | 64 |  105
obj | edir | 18 | 24 | 83 | 64 |  108
obj | eder | 42 | 29 | 115 | 93 |  172
obj | plain | 50 | 36 | 129 | 106 |  190
 
app | version | cyclomatic | cond-clauses | all-stmt | uncond-stmt | loc
pcap | idir | 35 | 29 | 179 | 159 |  242
pcap | edir | 41 | 48 | 192 | 159 |  255
pcap | eder | 81 | 59 | 257 | 221 |  351
pcap | plain | 86 | 64 | 266 | 227 |  364

Which are the numbers used to calculate the percentage numbers in the paper (section 5). The calculation is discussed in more detail in the technical report (section 5.2).

Error logs (footnote 2)

The default setting for all interpreters in this artifact is to produce only the standard outputs of the programs, without generating error logs.

To enable the runtime error logs mentioned in the paper (footnote 2), run the interpreter with verbose level 1 or 2. For example, you may modify the script rifl/applications/run.sh at lines 21 and 23 to use the interpreter with option 1 or 2 instead of 0:

    ../interpreter/a.out $prog text 1
    ../interpreter/a.out $prog binary 2; mv output.bin $app$version