Design and Implementation of the Sun Network Filesystem (Sandberg et al.)

This paper reports on the design and implementation of the Sun Network
Filesystem. It also presents some performance results.

The authors list the design goals to include: machine and operating system
independence; crash recovery; transparent access; UNIX semantics maintained
on client; performance approximately 80% of local disk.

The design is partitioned into protocol, server side and client side
components.

The protocol is further divided into NFS protocol and mount protocol (both
built on the Sun Remote Procedure Call (RPC) mechanism, which in turn is
built on the eXternal Data Representation (XDR) specification). The reason
for the separation is to ease the addition of file system access checking
methods, and also to compartmentalize the use of UNIX pathname semantics to
the mount protocol. The NFS protocol is stateless, which helps in crash
recovery and also helps in reducing the resources required on the server.
Because the protocol is stateless, the server needs to perform synchronous
writes (a performance bottleneck, for which they later describe solutions).
A data structure called the file handle is used to access files. It is a
combination of the inode number, the inode generation number (which is
incremented every time an inode is freed, so that stale file handles can be
detected) and the filesystem id. The client side 'mounts' an NFS filesystem,
which is the only time the client is aware of the hostname and pathname
conventions of the server (hence the separation of the NFS and mount
protocols). Mounting also allows the server to check client credentials.
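
The role of the generation number can be illustrated with a small sketch.
This is a toy model, not the paper's code: the names FileHandle, ToyServer
and StaleFileHandle are hypothetical, and the server is just a pair of
in-memory dictionaries. It only shows why reusing an inode number does not
let an old handle reach the new file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    # Hypothetical layout: the three fields named in the review.
    fsid: int
    inode: int
    generation: int


class StaleFileHandle(Exception):
    """Raised when a handle no longer names a live file."""


class ToyServer:
    """In-memory stand-in for a server; an assumption for illustration."""

    def __init__(self, fsid):
        self.fsid = fsid
        self.generations = {}   # inode number -> current generation
        self.data = {}          # inode number -> contents (allocated only)
        self.free_inodes = []
        self.next_inode = 1

    def create(self, contents):
        # Reuse a freed inode number when one is available.
        if self.free_inodes:
            ino = self.free_inodes.pop()
        else:
            ino = self.next_inode
            self.next_inode += 1
        self.generations.setdefault(ino, 0)
        self.data[ino] = contents
        return FileHandle(self.fsid, ino, self.generations[ino])

    def remove(self, fh):
        self._check(fh)
        del self.data[fh.inode]
        # Bump the generation when the inode is freed.
        self.generations[fh.inode] += 1
        self.free_inodes.append(fh.inode)

    def read(self, fh):
        self._check(fh)
        return self.data[fh.inode]

    def _check(self, fh):
        if (fh.fsid != self.fsid
                or fh.inode not in self.data
                or self.generations[fh.inode] != fh.generation):
            raise StaleFileHandle(fh)
```

In this sketch a handle obtained before remove() is rejected afterwards,
even once the same inode number has been handed out to a new file.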

In order to implement this design, they had to modify the file system
interface to introduce a Virtual File System (VFS) structure (containing
the operations that can be done on a file system) and a vnode structure
(containing the operations that can be done on a file or directory).
The authors deferred implementation of locking and time synchronization.
They describe tricks used to improve performance, such as reducing the
number of bcopy calls and treating paged-in programs as swapped in if their
size is below a threshold.
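
The VFS/vnode split described above can be sketched as two abstract
interfaces with a local and a remote implementation behind them. This is a
rough illustration under stated assumptions, not the kernel interface from
the paper: the class names, the reduced operation set (root, lookup, read),
and the fake_rpc helper standing in for the server are all hypothetical.

```python
from abc import ABC, abstractmethod


class Vnode(ABC):
    """Operations on a single file or directory."""

    @abstractmethod
    def lookup(self, name): ...

    @abstractmethod
    def read(self): ...


class Vfs(ABC):
    """Operations on a whole filesystem."""

    @abstractmethod
    def root(self): ...


class LocalVnode(Vnode):
    def __init__(self, node):
        self.node = node  # dict for a directory, bytes for a file

    def lookup(self, name):
        return LocalVnode(self.node[name])

    def read(self):
        return self.node


class LocalVfs(Vfs):
    def __init__(self, tree):
        self.tree = tree

    def root(self):
        return LocalVnode(self.tree)


class RemoteVnode(Vnode):
    def __init__(self, call, handle):
        self.call = call      # function that performs a remote call
        self.handle = handle  # opaque handle returned by the server

    def lookup(self, name):
        return RemoteVnode(self.call, self.call("lookup", self.handle, name))

    def read(self):
        return self.call("read", self.handle)


class RemoteVfs(Vfs):
    def __init__(self, call, root_handle):
        self.call = call
        self.root_handle = root_handle

    def root(self):
        return RemoteVnode(self.call, self.root_handle)


def fake_rpc(tree):
    """Hypothetical in-process 'server'; handles are tuples of names."""
    def call(op, handle, name=None):
        node = tree
        for part in handle:
            node = node[part]
        if op == "lookup":
            node[name]  # raises KeyError if the entry does not exist
            return handle + (name,)
        if op == "read":
            return node
        raise ValueError(op)
    return call


def read_path(vfs, path):
    # Generic code: walks one component at a time through the vnode
    # interface without knowing whether the filesystem is local or remote.
    vnode = vfs.root()
    for component in path.strip("/").split("/"):
        vnode = vnode.lookup(component)
    return vnode.read()
```

The point of the sketch is that read_path is written once against the
interface, so the same caller works unchanged over either implementation.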

The paper describes the design and implementation issues well. The
descriptions of how they tracked down and identified bottlenecks and devised
workarounds were good. I question the decision to have a stateless protocol.
Could the write performance be improved by having more state information and
asynchronous writes? Unfortunately, the performance tests did not include
data on the effects of network traffic, large numbers of clients, or
varied application types.


Implementation of the Sun Network Filesystem
Jonathan Ledlie CS 736 April 5, 2000  

The paramount concept behind Sun's NFS is statelessness, with both its
benefits and drawbacks.  The server maintains virtually no state
between calls from the clients which have mounted it (it does save the
client's credentials).  To write to a file, the client
sends the data over the wire to the server, the client blocks, the
server checks the client's credentials, the server writes the data to
disk (not to a cache or memory), and the server returns, unblocking
the client.  The client knows at this point that its data has made it
to disk; if it doesn't hear back from the server, it cannot assume the
data has, at which point the client can repeat the idempotent call.
While this may sound arduous, it is simple.  The authors introduce
several optimizations to make NFS go faster, especially for reads.
They increase the UDP datagram size to allow transport in 8k chunks,
add two client caches, an attribute cache and a directory name lookup
cache, and add read-ahead to the server.  To enable their system to
work with many types of UNIX, they add a layer of indirection and make
use of commonly available parts.  VFS makes a file appear the same to
applications whether or not it is local - allowing most applications
to use NFS without change.  The downside to this is that they have to
modify the kernel and some inode-specific tools.  The parts the
authors build on are RPC and XDR, a machine-independent data
representation.  Clearly the success of NFS is grounded in their
observation that statelessness is the best policy for most
applications - in accordance with general UNIX philosophy.  However,
in situations where speed is more important than immediate
consistency, NFS's write-through semantics may be overkill.  Another
difficulty, mentioned above, is the fact that a port requires changing
the kernel, often not the simplest task.
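
The write path described above (send, block, retry if no reply) can be
sketched as follows.  This is a toy model under stated assumptions, not
the NFS implementation: ToyWriteServer, LossyTransport and
write_with_retry are hypothetical names, the "disk" is a bytearray, and a
lost reply is simulated by raising TimeoutError.  It illustrates why a
write at an absolute offset is safe to repeat.

```python
class ToyWriteServer:
    """Hypothetical server that keeps no per-client state between calls."""

    def __init__(self):
        self.disk = bytearray()
        self.writes_applied = 0

    def write(self, offset, data):
        # The request carries an absolute offset, so applying it twice
        # leaves the same bytes on "disk" as applying it once.
        end = offset + len(data)
        if len(self.disk) < end:
            self.disk.extend(b"\0" * (end - len(self.disk)))
        self.disk[offset:end] = data
        self.writes_applied += 1
        return len(data)


class LossyTransport:
    """Delivers every request but drops the first few replies."""

    def __init__(self, server, drop_replies):
        self.server = server
        self.drop_replies = drop_replies

    def call_write(self, offset, data):
        reply = self.server.write(offset, data)
        if self.drop_replies > 0:
            self.drop_replies -= 1
            raise TimeoutError("no reply from server")
        return reply


def write_with_retry(transport, offset, data, max_tries=5):
    # The client blocks until it gets a reply, resending the identical
    # request after each timeout.
    for _ in range(max_tries):
        try:
            return transport.call_write(offset, data)
        except TimeoutError:
            continue
    raise TimeoutError("server not responding")
```

In this sketch the server may apply the same request several times when
replies are lost, yet the final contents are the same as if each write had
been applied exactly once.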