COS 320 Assignment 11 Eirik Bakke's Documentation NOTE: Substantial improvements have been made over what was submitted in Assignment 10. The bytecode interpreter and compiler retargeting is now completely done and well-working, and can be easily tested by following the procedure below. My project for the last assignments is one of my own devising; the goal was to be able to compile the fun language for real hardware that I can program in Verilog. I originally planned on retargeting the fun compiler for a virtual machine running on the Anitra computer, which I designed and built several years ago (http://people.csail.mit.edu/ebakke/anitra/article.html). However, since I no longer have a working prototype of this machine, and since the machine was severly limited by its 8-bit data words, I decided to instead design a new machine specifically for the purposes of this project. *** Instructions for testing the FIC target ************************************ * Copy /fun/everything.fun to /src * Run 'CM.make "sources.cm"; CompileFic.compile "everything.fun";' in SML This generates "everything.fun.fas" * From current directory /src/fictools, run "make test" The output should be equivalent to that listed at the beginning of everything.fun, including return value, plus the 0xFEEDFEED magic number at the end. ******************************************************************************* === About my Project ==== A lot of work was involved in this project, of which the actual modification of ML code was probably the easiest. Below is a summary of the work I did. 1) Define computer architecture (README.ebakke.Specifications.pdf) The entire machine, called FIC (four-instruction-computer), is specified in the file "README.ebakke.Specifications.pdf". Its design is simple enough that it should be possible to implement it with less than a page's worth of Verilog code. The machine has a 30-bit address bus and a 32-bit data bus, and contains three special-purpose registers: the program counter (PC), the address register (A), and an accumulator (R). Of these, only R is directly visible from software. The memory is 32-bit word addressible, and each instruction takes one word; the two most significant bits specifies one of four possible opcodes while the 30 least significant bits specify an address to be taken as an argument to the instruction. The add and nor instructions load the contents at the specified address and adds/NORs it to the current accumulator value. The sav instruction saves the current accumulator value to the specified address. The jcz instruction jumps (sets PC) to the specified address if there was no carry from the previous add instruction or if the carry flag was reset by a nand or previous jcz instruction. Regular I/O is performed during execution of the instruction at address zero; at this point the contents of R is exchanged with external input/output registers. While the machine's instruction set may seem by far inadequate, this is not the case. Besides simply being Turing-complete, the computer can perform many common operations in a relatively small number of operations. For instance, pointer dereferencing can be performed through a standard sequence of self-modifying code, and pushing a value at location x onto some stack (growing with decreasing addresses) can be performed with code like the following: nor const_c0; / nor with ones to reset accumulator (and carry) add stackp; / load stack pointer add const_n1; / decrease by one (note: will activate carry, reset below) sav stackp; / save stack pointer add const_op_sav; / attach opcode for "sav" sav _deref; / write a "sav" instruction to dereference stack pointer nor const_c0; / reset accumulator again add x; / load the word to be pushed _deref: 0; / write to dereferenced location using self-modified code / In data section :const_c0 :const_n1 0xFFFFFFFF; / negative one / not-zero :const_op_sav sav; 2) Write assembler for computer (ficasm.c) I implemented a simple assembler for the computer in C. The grammar was designed so that very little effort would be needed to parse it correctly. Words in memory are delimited by semicolons, and each word is defined by one or more numeric literals and/or identifiers (constants and labels) which are ORed together. ":somelabel" defines a new label referring to the current memory address. "=someconst" indicates that the contents of the next word will define a constant rather than go into memory. ".1234" skips forward to an absolute memory address. The assembler has no special knowledge of the computer's instruction set; instead, the programmer can define appropriate constants as opcodes. This allows the same assembler to be used to assemble both the native bytecode interpreter and the virtual machine bytecode itself. 3) Write simulator for computer (ficsim.c) The simulator loads a memory image file generated by the assembler and follows the state transition diagram in the FIC specifications closely. It accepts the following commands on stdin: "e" continues the simulation forever "s1234" continues the simulation until cycle 1234 is reached "i42" sets the value of the input register to 42 The value of the output register is printed every time a hardware I/O exchange is performed. In addition, the value at address 0x1 is printed every time this location is modified; this was extremely useful during debugging of the bytecode interpreter. The simulator automatically exits whenever the magic word 0xFEEDFEED is seen on either the output register or the 0x1 watchpoint location. 4) Write bytecode interpreter for computer (vm.pfa) This was the most challenging part of the project. It was not realistic to retarget the compiler directly to the FIC instruction set as it lacks basic abstractions such as function calls and indirection; generated code would have been extremely large and the code generator would have been very cluttered. Instead, I decided to write a bytecode interpreter for a virtual machine with a dual stack-based instruction set tailored for the fun language, sort of similar to that of MuP21 or other Forth machines (www.ultratechnology.com/p21.html). This approach makes the retargeted Fun compiler very simple; code generation is more straightforward than for MIPS, and register allocation/spilling is not necessary due to the stack-oriented nature of the virtual machine. Each virtual instruction is either one or two memory words (32-bit) long. The first word is an opcode, the second provides an immediate argument for some instructions. Instructions may push and/or pop values on the data stack; call and ret uses a return stack that is kept separate from the data stack. In addition there exists internally a "heap" stack which is used to allocate sections of memory. (There is no garbage collector, so this will eventually fill up.) See the comments in the bytecode interpreter for a description of the various various instructions. Some sanity checks are performed on the three stack pointers after the main function generated by the compiler returns. This is useful for detecting compiler bugs. I used the M4 macro language to assist in coding of the bytecode interpreter. This makes the source quite intuitive to read despite the simplicity of both the assembler and the underlying architecture. The script src/fictools/assemble.bash concatenates the preprocessed version of the bytecode interpreter with the output generated by the compiler and invokes the assembler /src/fictools/ficasm on this to produce a single "linked" memory image. This memory image can then be simulated by /src/fictools/ficsim, or uploaded to real hardware. 5) Port mips.sml and mips.sig to equivalent FIC data structures (fic.sml and fic.sig), and codegen.sml to produce FIC code (codegen-fic.sml) This was relatively straightforward. The instruction set is simpler and more adapted to the Fun language than that of MIPS, so code generation is somewhat simpler as well. A slightly tricky point is that local variables are addressed relative to the top of the stack at any point in execution, so it is necessary to keep track of the relative depth of the stack between the time a variable is declared (in a "let" statement) and it is referenced. 6) Integrate with rest of group's compiler (compile-fic.sml) See below. === Contributions to Group Effort ==== I integrated my own extension without affecting the operation of anyone else's code. I collected 14 fun test programs written by others into a single file (everything.fun) for ease of testing, and ran it on each of the targets. I found a bug in the inlining code and submitted a unit test for it. === How well my thing worked in isolation === My retargeted compiler works perfectly with all available tests, plus some simpler tests of individual features I did underways. See the testing procedure above. === Integration === Coloring and coalescing optimizations were not applicable as the FIC virtual machine does not make use of registers that must be allocated. The loop and tail-call optimizations could not be integrated as they were written in a target-specific manner; getting them to work with FIC would have required most of the code to be rewritten. Hanjun Kim's inlining optimization worked immediately and without any adverse effects, as it operated on the abstract syntax tree rather than the assembler code.