"Implementing Remote Procedure Calls": Andrew Birrel and Bruce Nelson Birrel and Nelson describe their implementation of RPC in the context of the Cedar system in "Implementing Remote Procedure Calls". The basic motivation the authors provide for the implementation of an RPC package is to ease the process of writing distributed applications. Before writing their RPC package, the authors observed that few programmers in their research environment tackled distributed computation as the overhead of understanding and writing the necessary communication software was prohibitive. The authors chose to facility distributed computation through a RPC package because the procedure call is a well-understood mechanism in the context of a single machine; the RPC package merely attempts to extend this mechanism to multiple independent machines. The authors particular emphasized the simple semantics, efficiency and generality attainable with RPC. Given this motivation, the authors stressed replicating the semantics of the usual local procedure call in the distributed environment in so far as possible. The RPC package was structured in five components: the user program, user stub, RPC communications package (RPCRuntime), server stub, and server program. A program called Lupine, notionally similar to Sun's rpcgen, was responsible for constructing the stub functions automatically. This process was directed by the use of Mesa modules (Mesa is the native programming language under Cedar). As is often the case in distributed systems, the most difficult problems in the RPC package revolve around naming. A name, in this case, consists of a type and an instance. The type specifies the interface in question and the instance to a particular machine. The type and instance information is distributed with the Grapevine distributed database, which provides a hierarchy of names, the mechanisms to retrieve entries, and access control. Two important architectural choices in the design of the RPC package include the decision to have no timeouts for procedure calls and that the importation of a remote interface does not reserve resources on the exporting server. The first of these decisions means that program execution in a single process on the client is synchronous with the RPC, just as for a normal procedure call. It also means that deadlock on the remote machine halts further progress locally. (Server crashes are detected and an exception thrown in that case). The second decision means that the server can store much less state and need not handle client crashes, the simplifying its design and implementation. The authors next discuss their special purpose, reliable internet protocol for RPC, built with an eye toward good common-case performance. They do acknowledge that RPC can be layered on top of existing reliable protocols and, in fact, suggest that future work could entail efficient RPC implementations on top of more traditional network protocols. The most important features of their protocol include: (1) no communications required for an idle connection (2) no explicit termination protocol (3) a unique identifier to detect duplicates and handle server crashes. The result, they claim, is an efficient, lightweight protocol in the context of RPC (it is not suitable for bulk data transfer and it is not clear to me from the paper how well it performs on a lossy WAN). 
The authors do provide some performance numbers to back up their claims regarding the underlying network protocol, although I would have liked a greater emphasis here and less verbosity elsewhere. Finally, the authors discuss a few shortcomings and directions for future work. The principal weakness the authors identify is that distributed applications that want some form of broadcast or multicast do not fit well into the RPC model. Given that the purpose of the RPC package was to make programming easier by extending an existing mechanism, one cannot complain too strenuously about this limitation.

Implementing Remote Procedure Calls (Xerox, 1984)
Jonathan Ledlie
March 22, 2000
CS 736

RPC has proven to be an elegant and simple solution to a fairly complex problem: how to allow a programmer to transparently spread work over a network. In its paradigm, a function caller is the client and the callee is the server. Both the client and server code are written against networking stubs, which are then filled out by an automatic code generator, here called Lupine and later called RPCGen (for C, I think). The programmer is then able to use the functions as if they were local, with a few limitations, such as needing to catch networking exceptions (like server failure) and bounds on types (usually only simple primitive return types are allowed). Once this networking code is generated at compile time, only a small module, called RPCRuntime, needs to do the actual transport work at runtime.

To locate a server at runtime, the client uses a distributed list of available services, stored as type-instance pairs (here the list is kept in Grapevine). Once a server is located, the client does not need to contact the list again. While this reduction in lookups is one (probably -- see below) good feature, another is the ability to add security on top of RPC -- they have intentionally left this up to the user. They also optimize for the common case, which is small parameters and return values. While this does not work well for the large objects that are common today, such objects were much less common in 1984. This emphasis on the common case is exemplified by their effort to limit the use of acknowledgement packets, allowing receipt of the return value to imply success.

Two more good ideas are 1) keeping around a few idle server processes to reduce forking on the server end of the calls, and 2) not limiting their idea to just Mesa (unlike another familiar project at Xerox). Two bad points are 1) their testing seemed very limited -- on a lightly loaded machine connected to a lightly used network -- and 2) because the clients only talk to Grapevine once, it is difficult to see how adding a new physical server would rebalance the load, as long as the original server stayed up. In other words, the clients would all still try to use the original, heavily loaded server, and they do not seem to have a mechanism to look for other servers except in the case of total failure.
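To make that last criticism concrete, the sketch below (my own illustration, not code from the paper; the lookup and send functions are assumed placeholders) shows the bind-once behaviour: the importer asks the name database for the (type, instance) binding a single time, caches the resulting server address, and keeps using it until that server actually fails.

    # Illustrative sketch of bind-once client behaviour: the binding database
    # (Grapevine in the paper) is consulted only on the first call for a given
    # (type, instance); afterwards the cached address is reused, so a newly
    # added server goes unnoticed unless the bound server fails outright.
    class BindingCache:
        def __init__(self, lookup, send):
            self.lookup = lookup   # assumed: (type, instance) -> server address
            self.send = send       # assumed: (address, request) -> response
            self.bound = {}        # (type, instance) -> cached server address

        def call(self, iface_type, instance, request):
            key = (iface_type, instance)
            if key not in self.bound:
                self.bound[key] = self.lookup(iface_type, instance)  # one-time import
            try:
                return self.send(self.bound[key], request)
            except ConnectionError:
                del self.bound[key]   # rebind only after the server fails completely
                raise

Because rebinding happens only on failure, existing clients stay pinned to whichever server they first imported, which is exactly the load-balancing limitation noted above.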