Scale and Performance in a Distributed System - John Howard - 1988

Kind of Paper:
--------------
-- design experience + methodology (plan to throw one away :)
-- performance analysis

General Summary:
----------------
The paper describes the experience of designing, building, and optimizing the Andrew File System. The file system is intended as a location-transparent distributed system. The primary design concerns were scalability and reasonable performance. The authors designed a benchmark for measuring distributed file system performance and used the results to evaluate their prototype. The Andrew benchmark results were also used to identify system bottlenecks, which were addressed by redesigning the system. The benchmark was also used to compare the Andrew File System (AFS) with the popular NFS, and the results indicated that AFS is more scalable and performs better than NFS except under very low loads.

Design notables of the improved system:
---------------------------------------
-- client (Venus) / server (Vice) architecture based on user-space lightweight processes and an RPC abstraction; control is gained through kernel syscall interception
-- workstations cache the entire file on open and send it back to the server on close; directory structures are cached as well (a sketch appears after the comments section below)
-- cache callbacks instead of a status check on every open
-- most of the CPU-intensive pathname resolution work is pushed to the clients and files are retrieved by FID; also, syscalls for retrieving files by inode
-- volumes - relatively small (a few megabytes) groups of files introduced to improve performance and operability

Performance measurement and NFS comparison:
-------------------------------------------
-- the improved AFS scales quite well to 20 load units (100 users) per server
-- low network, low disc, and high CPU utilization
-- scales and performs better than NFS; more CLEARLY DEFINED consistency constraints than NFS

Comments on ideas and methodology:
----------------------------------
-- This paper is a great example of how the design process of a complex system should work: careful definition of system goals, prototype construction, measurement, bottleneck identification, redesign, improvement measurement.
-- Unlike the NFS designers, the AFS creators were not afraid of keeping state on the servers and the clients (cache callbacks, volume location database). The authors, however, did not deal much with the implications of using a state-aware protocol: there is no discussion of recovery after a crash, of what happens to callbacks, etc.
-- I am not sure how good it is that both the clients and servers run as user-level processes and gain control only when particular syscalls are intercepted. Even though lightweight processes are used, there is still considerable context-switching and kernel-entry overhead. I don't really buy the argument that an in-kernel implementation would be less efficient because of added overhead.
-- Since files are cached in their entirety, simple LRU replacement is not necessarily the best way to go. It would have been interesting if the paper had explored more sophisticated schemes that take other parameters (e.g. file size) into account.
-- The way it is presented, the comparison with NFS is "unfair". It is contradictory that the authors acknowledge that NFS was designed as a small system for mutually trusting workstations but then demonstrate that it doesn't scale as well as a system specifically designed with scalability in mind.
-- The performance sections were not very good since there was little to no interpretation of the results beyond the really obvious.
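To make the whole-file caching and callback scheme from the design notables concrete, here is a minimal client-side sketch in Python. All names (VenusLikeCache, fetch, store, break_callback) are hypothetical illustrations, not the paper's actual interfaces, and the real Venus operates through syscall interception rather than an explicit API; the server is assumed to expose whole-file fetch/store RPCs that register a callback on fetch.

    class CacheEntry:
        """One whole file held in the workstation's local cache."""
        def __init__(self, fid, data):
            self.fid = fid
            self.data = data              # entire file contents
            self.callback_valid = True    # server's promise: nobody else changed it
            self.dirty = False

    class VenusLikeCache:
        """Hypothetical client-side cache manager (Venus analogue)."""
        def __init__(self, server):
            self.server = server
            self.entries = {}             # fid -> CacheEntry

        def open(self, fid):
            entry = self.entries.get(fid)
            if entry is None or not entry.callback_valid:
                # Miss or broken callback: fetch the whole file in one large RPC.
                data = self.server.fetch(fid, self)
                entry = CacheEntry(fid, data)
                self.entries[fid] = entry
            # With a valid callback, open() involves no server traffic at all.
            return entry

        def write(self, entry, new_data):
            entry.data = new_data
            entry.dirty = True

        def close(self, entry):
            if entry.dirty:
                # Ship the entire file back in one large store operation.
                self.server.store(entry.fid, entry.data, self)
                entry.dirty = False

        def break_callback(self, fid):
            # Invoked by the server when another workstation stores this file.
            if fid in self.entries:
                self.entries[fid].callback_valid = False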
Scale and Performance in a Distributed System (AFS) (CMU, 1988)
Jonathan Ledlie
CS 736
April 7, 2000

The CMU group exploits two main principles to make their system scale well: they combine the generic idea of caching for performance with Ousterhout's specific observation "that most files in a 4.2BSD environment are read in their entirety." The result is to copy a file in its entirety to the client on open and to return it in its entirety to the server on close (if it has changed). One outcome of this decision is fewer, but larger, operations than NFS. "NFS generates nearly three times as many packets as Andrew" and demands more of the server because of the increased number of operations. From this paper, AFS definitely seems to scale better than NFS. The AFS mechanism is to have a centralized group of servers on which Vice runs. Clients each run a copy of Venus, which contacts Vice to request and return complete files and directories.
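As a companion to the client-side sketch after the first review, here is a hypothetical server-side view of the same open/close protocol, again with invented names (ViceLikeServer and its methods): fetch hands over the whole file and records a callback promise, and store accepts the whole file in a single large operation and breaks the callbacks held by other workstations.

    class ViceLikeServer:
        """Hypothetical server-side (Vice analogue) file store with callback bookkeeping."""
        def __init__(self):
            self.files = {}        # fid -> file contents
            self.callbacks = {}    # fid -> set of clients holding a callback

        def fetch(self, fid, client):
            # One large transfer per cache miss instead of many small reads.
            self.callbacks.setdefault(fid, set()).add(client)
            return self.files[fid]

        def store(self, fid, data, client):
            # One large transfer per close instead of many small writes.
            self.files[fid] = data
            for other in self.callbacks.get(fid, set()) - {client}:
                other.break_callback(fid)   # invalidate stale whole-file copies
            self.callbacks[fid] = {client}

Wired together, these two sketches show the behavior both reviews describe: a second workstation that opens a file after another workstation has closed a modified copy finds its callback broken and re-fetches the whole file, which is the "more clearly defined" consistency constraint the first review contrasts with NFS.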