Interpreter Latency
"Interpreter Speed" explains about the necessities who bore the SCM interpreter. But focussing only on speed executing repetitive computations would produce a much less useful programming environment than SCM.
Many people who use computers program them, although they may be unaware that their activity could be classified as such. Spreadsheets and scripts, databases and report generation can involve changing instructions (often stored with the data) to the application; in a word, programming.
There are computer users who don't program. When something doesn't work correctly, they are stuck. For this group, startup latency is less of an issue.
The rest of us spend significant amounts of time repeating the sequence of editing source code, restarting the program, and testing the change.
For software developers, this cycle time is often the primary impediment to quick debugging and progress. If one has a very fast restart, then it is often quicker to try several variations or polarities than it is to stare at the code and count the fenceposts. I find this especially true when debugging code written by others.
Some would argue that program interactivity obviates the need for frequent restarting. But I would be surprised to find an experienced software developer who hasn't wasted hours of interactive modifications in a session because those modifications crashed the program on restart.
I have used low-latency programming techniques from SCM's inception, although I hadn't thought of it in those terms until reading a posting from Tom Lord about Guile (a descendant of SCM): "Re: Rscheme etc."
For another thing, all this talk about "replacing Guile", especially by an implementation based on compiling, is a little silly. Guile deliberately has all sorts of yummy dynamic behaviors that would be difficult and pointless to try to get right in a compiler (examples: lazily computed top-levels, Guile's low-level macro system, and very-low-latency eval).
What follows describes how SCM achieves its low restart latency. These are not profound ideas. Success is simply a matter of addressing the latency of each link in the chain between Scheme source and execution. Most of these techniques should readily apply to interpreters for other languages.
Code the read procedure using the low-level implementation language, not the target language (Scheme).
Source code must be processed by read before being executed.
When creating an interpreter, developers often try to write as much of the implementation as possible in the target language. In order to reduce latency, critical functions must be either rewritten in, or compiled to, the implementation language. If recoded by hand, then the first version in the target language was wasted effort.
If the code is to be compiled, then the compiler's performance on every language construct used in critical sections potentially gates achievement of low latency. And use of compilers for critical code sections also destroys independence between developers, as it turns most compiler issues into critical code issues.
A fast read brings other benefits. Being able to read 1 MB Electronic Design Interchange Format (EDIF) files in under 1 s facilitates easy scripting of design tools.
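As a sketch of what such a workload looks like (the filename is hypothetical, and timing the call is left to whatever clock the host Scheme provides), the whole of an s-expression file can be pulled through read in a simple loop:

    ;; Read every s-expression in FILENAME, returning how many were read.
    ;; Timing this call measures read latency directly.
    (define (read-all filename)
      (call-with-input-file filename
        (lambda (port)
          (let loop ((form (read port)) (count 0))
            (if (eof-object? form)
                count
                (loop (read port) (+ count 1)))))))

    (read-all "design.edif")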
If possible, macro-expand only source which contains macros. Otherwise, code the macro-expansion using the low-level implementation language, not the target language (Scheme).
Macro expansion is another process through which source must pass before execution. The read discussion applies here as well.
Lazy processing reduces latency.
SCM macro-expands and optimizes each form the first time it is executed. The time spent doing so is spread through early execution. Although uncalled procedures will usually be read along with procedures which are called, other processing will not be wasted on the unused procedures.
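A minimal sketch of the in-place idea follows (not SCM's actual code; the *expanders* table and the swap! macro are invented for illustration). The first evaluation pays for expansion by destructively splicing the expansion into the source pair, so later executions of the same form skip the expander entirely:

    ;; Toy macro table: swap! expands to a let that exchanges two variables.
    (define *expanders*
      (list (cons 'swap!
                  (lambda (form)
                    (let ((a (cadr form)) (b (caddr form)))
                      `(let ((tmp ,a)) (set! ,a ,b) (set! ,b tmp)))))))

    ;; Expand FORM in place: the expansion overwrites the original pair,
    ;; so the source itself is memoized and is never expanded again.
    (define (expand-in-place! form)
      (let ((entry (and (pair? form) (assq (car form) *expanders*))))
        (if entry
            (let ((expansion ((cdr entry) form)))
              (set-car! form (car expansion))
              (set-cdr! form (cdr expansion))
              (expand-in-place! form))
            form)))

    (define src (list 'swap! 'x 'y))
    (expand-in-place! src)   ; src is now (let ((tmp x)) (set! x y) (set! y tmp))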
Interpret some derived forms directly.
Just because derived forms can be rewritten in terms of primitives doesn't necessarily mean that the rewritten forms will be smaller or interpreted faster.
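As a standard example, let is a derived form; the two expressions below are equivalent, yet the rewrite costs a naive interpreter a closure construction and application that a direct interpretation of let avoids:

    ;; Interpreted directly: the interpreter extends the environment
    ;; with x and y and evaluates the body.
    (let ((x 1) (y 2))
      (+ x y))

    ;; The equivalent rewrite in terms of lambda: a naive interpreter
    ;; must construct a closure and then apply it to 1 and 2.
    ((lambda (x y) (+ x y))
     1 2)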
Even with 12 years' experience tinkering with interpreters, prediction of the effects of changes on SCM's interpreter eludes me. Benchmark measurement is the only reliable method for gauging speed. Make sure your benchmarks are up to the task.
Provide and use the ability to dynamically load files.
Large application programs do not use all their abilities at boot-up. Requiring all the modules, source or compiled, to be initially loaded makes the user wait and occupies RAM, needlessly displacing other processes.
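A usage sketch with SLIB: require locates and loads a module the first time it is demanded, so an application which never reaches this code never pays for it. (The printf feature is a real SLIB module; the arguments are illustrative.)

    (require 'printf)        ; the printf module is located and loaded here, on demand
    (printf "%s took %d milliseconds" "startup" 180)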
Precompute source (or compiled) file locations. Do not scan search-paths.
Scanning search paths may seem like a convenience, but as library directories grow, the multiple disk seeks add tens or hundreds of milliseconds of latency. These gratuitous disk reads also displace more deserving blocks from the disk cache.
Search-paths are dangerous.
Developers may unwittingly have obsolete versions of files in directories shadowing the intended source directory, leading to paradoxical differences in program behavior seen by users and developers.
Use of mnemonic filenames risks spoofing by users or other developers. Show me the programmer who has never named a file "test1"!
In the SLIB Scheme Library, precomputing module file locations has worked well with a dozen Scheme implementations for over a decade. But several implementations persist in the folly of search-paths for their non-SLIB libraries.
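A simplified sketch of the idea (the paths are hypothetical, and SLIB's actual slibcat entries carry more information): the installation step records an absolute filename for every feature, so require consults a precomputed table and opens exactly one file, with no directory search.

    ;; Hypothetical, simplified catalog built once at installation time.
    (define *catalog*
      '((printf       . "/usr/share/slib/printf.scm")
        (pretty-print . "/usr/share/slib/pp.scm")))

    ;; Looking up a feature is an in-memory assq, not a disk search.
    (define (feature->filename feature)
      (let ((entry (assq feature *catalog*)))
        (and entry (cdr entry))))

    (feature->filename 'printf)   ; => "/usr/share/slib/printf.scm"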
Group source for procedures with similar functions into one file.
Each file accessed incurs milliseconds of overhead from the file-system. Amortize this cost by grouping procedures into files.
Sorry to spend time stating the obvious, but I have seen projects which consign each tiny function to its own file!
Use moderation in literate programming and source licenses.
If the license comments or documentation in many files are larger than the source code being documented, then consider a different venue or organization for these texts.
Save the interpreter program image using unexec.
SCM's unexec feature was adapted from GNU Emacs circa 1994. It creates an executable image whose read-only text (program) segment is identical to the compiled program, but whose data segment is a dump from a running program. The time savings realized are for the code which initializes memory, tables, etc.
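A usage sketch, assuming an SCM built with the dump feature enabled; the output path and the pre-loaded module are illustrative, not a recipe from this article:

    ;; Load whatever the saved image should already contain ...
    (require 'pretty-print)
    ;; ... then write a new executable whose data segment is the current
    ;; heap; restarting it skips the initialization just performed.
    (dump "/usr/local/bin/myscm")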
On (Pentium II) RedHat-7.3 Linux with a 2.4.18 kernel, the dumped scm executable takes 180 ms real-time to start versus 265 ms for the undumped udscm5. After both executables are resident in RAM, the dumped scm executable takes only 9 ms real-time to start versus 48 ms for the undumped udscm5.
/bin/bash's (resident) startup latency is 8 ms on the same platform. SCM is thus well suited to scripting, which has been my focus lately for the Docupage project.
Low-latency programming principally requires consideration of disk seek times and cache activity. Early, thoughtful organization of source files is a good investment of one's time. Restraining expectations for and dependence on compilers and core interpreters also help prevent the bottlenecks which lead to poor latency.
Precomputing file locations and lazy in-place optimizations were ideas I had not seen elsewhere. The others are natural consequences of looking beyond the CPU when thinking about programs.
Copyright © 2003 Aubrey Jaffer
I am a guest and not a member of the MIT Computer Science and Artificial Intelligence Laboratory.
My actions and comments do not reflect in any way on MIT.