Disco: Running Commodity Operating Systems on Scalable Multiprocessors
E. Bugnion, S. Devine, and M. Rosenblum (CSL, Stanford)

In this paper Bugnion et al. describe Disco, a system designed to make it easier to extend "commodity" operating systems to scalable multiprocessor machines. Disco allows systems programmers to reuse existing operating systems by inserting a virtual machine abstraction between them and the hardware. As such, it is nothing new -- a great deal of work was done on virtualization in the 1970s (e.g. IBM's VM/370 and early hosted versions of Cray's Unicos). The new twist the authors bring to the subject is that their system is designed to handle scalability, fault isolation, and non-uniform memory access times without a complete rewrite of a traditional operating system for the new hardware. By minimizing the work required to get an existing system running on new hardware, the reliability of the resulting system is increased and the time taken to produce a working system is reduced.

The virtual machine approach has several benefits, including the ability to share memory across VM boundaries with relatively small changes to existing system software, and the ability to run multiple operating systems on the same physical machine. The latter is particularly useful for migrating to a new system and for supporting special-purpose operating systems for tasks such as scientific computation. Virtualization is not a panacea, however: its costs include (1) the overhead of virtualizing hardware resources (CPUs, disks, etc.), (2) resource management, and (3) communication among processors.

Disco attacks the problem by running multiple independent VMs simultaneously on the same hardware. It virtualizes the kernel address space, uses dynamic page migration and replication to hide the non-uniformity of memory access times, and virtualizes I/O devices, providing a special abstraction for SCSI and network device interfaces. To achieve reasonable performance, Disco uses direct execution for most operations. The difficult and expensive part, however, is detecting and emulating the services that cannot safely be exported in raw form. For instance, to virtualize memory, Disco maintains a set of physical-to-machine address mappings (a pmap per virtual machine) and performs the necessary translations by entering the remapped entries into the MIPS processor's software-managed TLB. The trouble with this approach is two-fold: (1) some kernel segments on the MIPS are traditionally direct-mapped and bypass the TLB entirely, and (2) TLB misses become both more frequent and more expensive. The first problem is addressed by modifying and relinking the client operating system so that its kernel runs in mapped space; the second by maintaining a second-level software cache of TLB entries.
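To make that translation path concrete, here is a minimal sketch in C of how a monitor in the style of Disco might service a guest TLB miss. This is our illustration, not code from the paper: it collapses Disco's actual protocol (forward the miss to the guest's own refill handler, then intercept the guest's privileged TLB write and remap its physical address to a machine address) into a single routine, and every name, along with the direct-mapped organization of the second-level TLB, is an assumption made for the example.

#include <stdint.h>

#define L2TLB_SIZE 4096               /* entries in the software TLB cache */

typedef struct {
    int      vm_id;                   /* which VM this entry belongs to */
    uint64_t guest_vpn;               /* guest virtual page number */
    uint64_t machine_pfn;             /* machine page frame number */
    int      valid;
} l2tlb_entry_t;

static l2tlb_entry_t l2tlb[L2TLB_SIZE];

/* pmap: per-VM table mapping guest-physical pages to machine pages. */
extern uint64_t pmap_lookup(int vm_id, uint64_t guest_pfn);

/* Emulate the guest's own translation to get a guest-physical page. */
extern uint64_t guest_translate(int vm_id, uint64_t guest_vpn);

/* Write an entry into the hardware TLB (software-managed on MIPS). */
extern void tlb_write_random(uint64_t vpn, uint64_t mpfn);

void handle_tlb_miss(int vm_id, uint64_t guest_vpn)
{
    l2tlb_entry_t *e = &l2tlb[guest_vpn % L2TLB_SIZE];

    /* Fast path: the second-level software TLB already holds the
     * virtual-to-machine mapping, so no emulation is needed. */
    if (e->valid && e->vm_id == vm_id && e->guest_vpn == guest_vpn) {
        tlb_write_random(guest_vpn, e->machine_pfn);
        return;
    }

    /* Slow path: obtain the guest's physical address, then remap it
     * to a machine address through the pmap before inserting it. */
    uint64_t guest_pfn   = guest_translate(vm_id, guest_vpn);
    uint64_t machine_pfn = pmap_lookup(vm_id, guest_pfn);

    e->vm_id       = vm_id;
    e->guest_vpn   = guest_vpn;
    e->machine_pfn = machine_pfn;
    e->valid       = 1;

    tlb_write_random(guest_vpn, machine_pfn);
}

The point of the second-level structure is visible in the fast path: a re-miss on a recently used page costs one array probe instead of a full emulation of the guest's refill sequence.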
The authors also briefly describe their NUMA memory-management scheme, which attempts to hide the unusual aspects of the architecture from clients running in a Disco VM. In addition to memory and CPU virtualization, Disco provides virtual DMA, network devices, and disks.

In the last two sections of the paper, the authors present their experimental results and related work. The results were produced by running the system on the SimOS machine simulator rather than on real hardware, and the overhead of simulation forced the experiments to be smaller in duration and scope than would otherwise have been possible. That said, Bugnion et al. provide reasonably good performance numbers, and on the basis of these results they conclude that the overhead due to virtualization is acceptable for many applications (the range they report is 3 to 16%, depending upon the application).


Disco: Running Commodity Operating Systems on Scalable Multiprocessors (Stanford, 1997)
Jonathan Ledlie
cs736 Operating Systems
February 4, 2000

After one member of our reading group (Brian Forney) explained what ccNUMA is and how it differs from SMP/UMA, the ideas in this paper made much more sense and seemed like a practical solution to a difficult problem. The problem is that the hardware jocks keep coming out with new hardware that they believe is better, but their empirical tests are limited by the fact that no operating system (and hence no benchmarking software) runs on the brand-new machines; they must often wait years for an OS to take advantage of their hardware. This Stanford group's pragmatic solution to the dilemma is to coat the new hardware with a thin veneer of an OS, which simulates older, familiar hardware to the layers above. They call this base coat Disco.

Disco then allows several operating systems to run on top of it simultaneously, each using the hardware in the way it knows how. If some OS knows how to deal with the ccNUMA hardware, it is given access to it; otherwise Disco presents the more traditional hardware view that the OS knows how to handle. Particular examples of this are locality of reference (when one CPU repeatedly asks for a page that is far away, the page is replicated locally; a sketch of such a policy follows this review) and the fact that memory references do not all take the same amount of time, varying with the number of hops to where the data actually lives (the ccNUMA hardware presents a single flat address space, so nothing warns the OS about the difference). Using virtual machines is not a new idea, but one key, unstated point of the paper is that ideas we dropped in the 1970s, like virtual machines, may have new applications today.

One difficulty we had with this paper's concept is that even though OSs may be moving to a HAL (hardware abstraction layer) to let them port and add new devices more easily, running on top of Disco still requires some tinkering in the HAL: "powerful software companies" must still be convinced "that running on their hardware is worth the effort of the port." We also found the idea of starting up a commodity OS just to run some piece of software and then shutting that OS down troubling, because afterwards either the remaining OSs would not be using all the resources of the machine, or Disco would have to hand them the newly released resources on the fly (try plugging more RAM into your computer while it is running), and most OSs are built to account for their resources only at boot time.
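To close, the page-placement policy both reviews touch on is easy to sketch. The C fragment below is our own illustration of the general idea, not the algorithm from the paper; the thresholds and the helpers migrate_page and replicate_page are hypothetical. The shape of the policy, though, follows the paper's description: a hot page used mostly by one remote node is migrated there, a hot read-shared page is replicated so each reader gets a local copy, and write-shared pages are left where they are.

#include <stdint.h>

#define NODES         8               /* nodes in the ccNUMA machine */
#define HOT_THRESHOLD 256             /* misses before we act on a page */

typedef struct {
    uint32_t miss_count[NODES];       /* per-node cache-miss counters */
    int      home_node;               /* node currently holding the page */
    int      write_shared;            /* written from more than one node? */
} page_stats_t;

extern void migrate_page(uint64_t mpfn, int node);   /* move the page   */
extern void replicate_page(uint64_t mpfn, int node); /* read-only copy  */

void rebalance_page(uint64_t mpfn, page_stats_t *ps)
{
    uint64_t total = 0;
    int hottest = 0;

    for (int n = 0; n < NODES; n++) {
        total += ps->miss_count[n];
        if (ps->miss_count[n] > ps->miss_count[hottest])
            hottest = n;
    }

    if (total < HOT_THRESHOLD || hottest == ps->home_node)
        return;                        /* cold, or already local */
    if (ps->write_shared)
        return;                        /* keeping copies coherent would
                                          cost more than the remote misses */

    if (ps->miss_count[hottest] > (3 * total) / 4)
        migrate_page(mpfn, hottest);   /* one dominant remote consumer */
    else
        replicate_page(mpfn, hottest); /* read-shared: give the hottest
                                          node its own local copy */
}

Because the guests never see machine addresses, Disco can apply a policy like this underneath an unmodified (or nearly unmodified) OS, which is exactly the flat-memory-image point made above.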