COS 320
Assignment 11
Eirik Bakke's Documentation

NOTE: Substantial improvements have been made over what was submitted in
      Assignment 10. The bytecode interpreter and compiler retargeting is now
      completely done and well-working, and can be easily tested by following
      the procedure below. 

My project for the last assignments is one of my own devising; the goal was to
be able to compile the fun language for real hardware that I can program in
Verilog.

I originally planned on retargeting the fun compiler for a virtual machine
running on the Anitra computer, which I designed and built several years ago
(http://people.csail.mit.edu/ebakke/anitra/article.html). However, since I no
longer have a working prototype of this machine, and since the machine was
severly limited by its 8-bit data words, I decided to instead design a new
machine specifically for the purposes of this project.

*** Instructions for testing the FIC target ************************************
* Copy /fun/everything.fun to /src
* Run 'CM.make "sources.cm"; CompileFic.compile "everything.fun";' in SML
  This generates "everything.fun.fas"
* From current directory /src/fictools, run "make test"
  The output should be equivalent to that listed at the beginning of
  everything.fun, including return value, plus the 0xFEEDFEED magic number at
  the end.
*******************************************************************************

=== About my Project ====

A lot of work was involved in this project, of which the actual modification of
ML code was probably the easiest. Below is a summary of the work I did.

1) Define computer architecture (README.ebakke.Specifications.pdf)

The entire machine, called FIC (four-instruction-computer), is specified in the
file "README.ebakke.Specifications.pdf". Its design is simple enough that it
should be possible to implement it with less than a page's worth of Verilog
code. The machine has a 30-bit address bus and a 32-bit data bus, and contains
three special-purpose registers: the program counter (PC), the address register
(A), and an accumulator (R). Of these, only R is directly visible from software.
The memory is 32-bit word addressible, and each instruction takes one word; the
two most significant bits specifies one of four possible opcodes while the 30
least significant bits specify an address to be taken as an argument to the
instruction. The add and nor instructions load the contents at the specified
address and adds/NORs it to the current accumulator value. The sav instruction
saves the current accumulator value to the specified address. The jcz
instruction jumps (sets PC) to the specified address if there was no carry from
the previous add instruction or if the carry flag was reset by a nand or
previous jcz instruction. Regular I/O is performed during execution of the
instruction at address zero; at this point the contents of R is exchanged with
external input/output registers.

While the machine's instruction set may seem by far inadequate, this is not the
case. Besides simply being Turing-complete, the computer can perform many common
operations in a relatively small number of operations. For instance, pointer
dereferencing can be performed through a standard sequence of self-modifying
code, and pushing a value at location x onto some stack (growing with decreasing
addresses) can be performed with code like the following:

  nor const_c0;     / nor with ones to reset accumulator (and carry) 
  add stackp;       / load stack pointer
  add const_n1;     / decrease by one (note: will activate carry, reset below)
  sav stackp;       / save stack pointer
  add const_op_sav; / attach opcode for "sav"
  sav _deref;       / write a "sav" instruction to dereference stack pointer
  nor const_c0;     / reset accumulator again
  add x;            / load the word to be pushed
  _deref: 0;        / write to dereferenced location using self-modified code

  / In data section
  :const_c0 :const_n1 0xFFFFFFFF; / negative one / not-zero
  :const_op_sav sav;

2) Write assembler for computer (ficasm.c)

I implemented a simple assembler for the computer in C. The grammar was designed
so that very little effort would be needed to parse it correctly. Words in
memory are delimited by semicolons, and each word is defined by one or more
numeric literals and/or identifiers (constants and labels) which are ORed
together. ":somelabel" defines a new label referring to the current memory
address. "=someconst" indicates that the contents of the next word will define
a constant rather than go into memory. ".1234" skips forward to an absolute
memory address.

The assembler has no special knowledge of the computer's instruction set;
instead, the programmer can define appropriate constants as opcodes. This allows
the same assembler to be used to assemble both the native bytecode interpreter
and the virtual machine bytecode itself.

3) Write simulator for computer (ficsim.c)

The simulator loads a memory image file generated by the assembler and follows
the state transition diagram in the FIC specifications closely. It accepts the
following commands on stdin:

  "e" continues the simulation forever
  "s1234" continues the simulation until cycle 1234 is reached
  "i42" sets the value of the input register to 42
  
The value of the output register is printed every time a hardware I/O exchange
is performed. In addition, the value at address 0x1 is printed every time this
location is modified; this was extremely useful during debugging of the bytecode
interpreter. The simulator automatically exits whenever the magic word
0xFEEDFEED is seen on either the output register or the 0x1 watchpoint location.

4) Write bytecode interpreter for computer (vm.pfa)

This was the most challenging part of the project. It was not realistic to
retarget the compiler directly to the FIC instruction set as it lacks basic
abstractions such as function calls and indirection; generated code would have
been extremely large and the code generator would have been very cluttered.
Instead, I decided to write a bytecode interpreter for a virtual machine with a
dual stack-based instruction set tailored for the fun language, sort of similar
to that of MuP21 or other Forth machines (www.ultratechnology.com/p21.html).
This approach makes the retargeted Fun compiler very simple; code generation is
more straightforward than for MIPS, and register allocation/spilling is not
necessary due to the stack-oriented nature of the virtual machine.

Each virtual instruction is either one or two memory words (32-bit) long. The
first word is an opcode, the second provides an immediate argument for some
instructions. Instructions may push and/or pop values on the data stack; call
and ret uses a return stack that is kept separate from the data stack. In
addition there exists internally a "heap" stack which is used to allocate
sections of memory. (There is no garbage collector, so this will eventually fill
up.) See the comments in the bytecode interpreter for a description of the
various various instructions.

Some sanity checks are performed on the three stack pointers after the main
function generated by the compiler returns. This is useful for detecting
compiler bugs.

I used the M4 macro language to assist in coding of the bytecode interpreter.
This makes the source quite intuitive to read despite the simplicity of both the
assembler and the underlying architecture. The script src/fictools/assemble.bash
concatenates the preprocessed version of the bytecode interpreter with the
output generated by the compiler and invokes the assembler /src/fictools/ficasm
on this to produce a single "linked" memory image. This memory image can then be
simulated by /src/fictools/ficsim, or uploaded to real hardware.

5) Port mips.sml and mips.sig to equivalent FIC data structures
   (fic.sml and fic.sig), and codegen.sml to produce FIC code (codegen-fic.sml)

This was relatively straightforward. The instruction set is simpler and more
adapted to the Fun language than that of MIPS, so code generation is somewhat
simpler as well. A slightly tricky point is that local variables are addressed
relative to the top of the stack at any point in execution, so it is necessary
to keep track of the relative depth of the stack between the time a variable is
declared (in a "let" statement) and it is referenced.

6) Integrate with rest of group's compiler (compile-fic.sml)

See below.

=== Contributions to Group Effort ====
I integrated my own extension without affecting the operation of anyone else's
code. I collected 14 fun test programs written by others into a single file
(everything.fun) for ease of testing, and ran it on each of the targets. I found
a bug in the inlining code and submitted a unit test for it.

=== How well my thing worked in isolation ===

My retargeted compiler works perfectly with all available tests, plus some
simpler tests of individual features I did underways. See the testing procedure
above.

=== Integration ===
Coloring and coalescing optimizations were not applicable as the FIC virtual
machine does not make use of registers that must be allocated. The loop
and tail-call optimizations could not be integrated as they were written in a
target-specific manner; getting them to work with FIC would have required most
of the code to be rewritten. Hanjun Kim's inlining optimization worked
immediately and without any adverse effects, as it operated on the abstract
syntax tree rather than the assembler code.