Scale and Performance in a Distributed System - John Howard - 1988

Kind of Paper:
--------------
-- design experience + methodology (plan to throw one away :)
-- performance analysis

General Summary:
----------------
The paper describes the experience of designing, building, and optimizing the Andrew File System. The file system is intended as a location-transparent distributed system. The primary design concerns were scalability and reasonable performance. The authors designed a benchmark for measuring distributed file system performance and used the results to evaluate their prototype. The Andrew benchmark results were also used to identify system bottlenecks, which were addressed by redesigning the system. The benchmark was also used to compare the Andrew File System (AFS) with the popular NFS, and the results indicated that AFS is more scalable and performs better than NFS except under very low loads.

Design notables of the improved system:
---------------------------------------
-- client (Venus) / server (Vice) architecture based on user-space lightweight processes and an RPC abstraction; control is gained through kernel syscall interception
-- workstations cache the entire file on open and send it back to the server on close; directory structures are cached as well (a sketch appears after the comments section below)
-- cache callbacks instead of a status check on every open
-- most of the CPU-intensive pathname resolution work is pushed to the clients and files are retrieved by FID; also, syscalls for retrieving files by inode
-- volumes - relatively small (a few megabytes) groups of files introduced to improve performance and operability

Performance measurement and NFS comparison:
-------------------------------------------
-- the improved AFS scales quite well to 20 load units (100 users) per server
-- low network, low disc, and high CPU utilization
-- scales and performs better than NFS; more CLEARLY DEFINED consistency constraints than NFS

Comments on ideas and methodology:
----------------------------------
-- This paper is a great example of how the design process of a complex system should work: careful definition of system goals, prototype construction, measurement, bottleneck identification, redesign, improvement measurement.
-- Unlike the NFS designers, the AFS creators were not afraid of keeping state on the servers and the clients (cache callbacks, volume location database). The authors, however, did not deal much with the implications of using a state-aware protocol: there is no discussion of recovery after a crash, of what happens to callbacks, etc.
-- I am not sure how good it is that both the clients and servers run as user-level processes and gain control only when particular syscalls are intercepted. Even though lightweight processes are used, there is still considerable context-switching and kernel-entry overhead. I don't really buy the argument that an in-kernel implementation would be less efficient because of added overhead.
-- Since files are cached in their entirety, simple LRU replacement is not necessarily the best way to go. It would have been interesting if the paper had explored more sophisticated schemes that take other parameters (e.g. file size) into account.
-- The way it is presented, the comparison with NFS is "unfair". It is contradictory that the authors acknowledge that NFS was designed as a small system for mutually trusting workstations but then demonstrate that it doesn't scale as well as a system specifically designed with scalability in mind.
-- The performance sections were not very good since there was little to no interpretation of the results beyond the really obvious.
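To make the whole-file caching and callback scheme from the design notables concrete, here is a minimal client-side sketch in Python. All names (VenusLikeCache, fetch, store, break_callback) are hypothetical illustrations, not the paper's actual interfaces, and the real Venus operates through syscall interception rather than an explicit API; the server is assumed to expose whole-file fetch/store RPCs that register a callback on fetch.

    class CacheEntry:
        """One whole file held in the workstation's local cache."""
        def __init__(self, fid, data):
            self.fid = fid
            self.data = data              # entire file contents
            self.callback_valid = True    # server's promise: nobody else changed it
            self.dirty = False

    class VenusLikeCache:
        """Hypothetical client-side cache manager (Venus analogue)."""
        def __init__(self, server):
            self.server = server
            self.entries = {}             # fid -> CacheEntry

        def open(self, fid):
            entry = self.entries.get(fid)
            if entry is None or not entry.callback_valid:
                # Miss or broken callback: fetch the whole file in one large RPC.
                data = self.server.fetch(fid, self)
                entry = CacheEntry(fid, data)
                self.entries[fid] = entry
            # With a valid callback, open() involves no server traffic at all.
            return entry

        def write(self, entry, new_data):
            entry.data = new_data
            entry.dirty = True

        def close(self, entry):
            if entry.dirty:
                # Ship the entire file back in one large store operation.
                self.server.store(entry.fid, entry.data, self)
                entry.dirty = False

        def break_callback(self, fid):
            # Invoked by the server when another workstation stores this file.
            if fid in self.entries:
                self.entries[fid].callback_valid = False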
Scale and Performance in a Distributed System (AFS) (CMU, 1988)
Jonathan Ledlie
CS 736
April 7, 2000

The CMU group exploits two main principles to make their system scale well: they combine the generic idea of caching for performance with Ousterhout's specific observation "that most files in a 4.2BSD environment are read in their entirety." The result is to copy a file in its entirety to the client on open and to return it in its entirety to the server on close (if it has changed). One outcome of this decision is fewer, but larger, operations than NFS. "NFS generates nearly three times as many packets as Andrew" and demands more of the server because of the increased number of operations. From this paper, AFS definitely seems to scale better than NFS. The AFS mechanism is to have a centralized group of servers on which Vice runs. Clients each run a copy of Venus, which contacts Vice to request and return complete files and directories.
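As a companion to the client-side sketch after the first review, here is a hypothetical server-side view of the same open/close protocol, again with invented names (ViceLikeServer and its methods): fetch hands over the whole file and records a callback promise, and store accepts the whole file in a single large operation and breaks the callbacks held by other workstations.

    class ViceLikeServer:
        """Hypothetical server-side (Vice analogue) file store with callback bookkeeping."""
        def __init__(self):
            self.files = {}        # fid -> file contents
            self.callbacks = {}    # fid -> set of clients holding a callback

        def fetch(self, fid, client):
            # One large transfer per cache miss instead of many small reads.
            self.callbacks.setdefault(fid, set()).add(client)
            return self.files[fid]

        def store(self, fid, data, client):
            # One large transfer per close instead of many small writes.
            self.files[fid] = data
            for other in self.callbacks.get(fid, set()) - {client}:
                other.break_callback(fid)   # invalidate stale whole-file copies
            self.callbacks[fid] = {client}

Wired together, these two sketches show the behavior both reviews describe: a second workstation that opens a file after another workstation has closed a modified copy finds its callback broken and re-fetches the whole file, which is the "more clearly defined" consistency constraint the first review contrasts with NFS.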