Design and Implementation of a Log-Structured File System

This paper presents a fundamentally new paradigm of file system design. Based on a hypothesis about future disk access patterns and cache performance (namely that future traffic will be dominated by writes as caches increasingly handle read requests), the authors propose batching writes to different files together into a single log entry on disk. This eliminates costly seek and rotational latency because, instead of writing to several different locations on disk, each for a different file, all writes occur in the same location on disk. Log summaries are written in order to keep track of metadata such as inode locations and free space. This method introduces the idea of temporal locality on disk, in contrast to the logical locality of existing file systems: files that were written at approximately the same time will tend to be near each other on disk, rather than files that are, for instance, in the same directory. To avoid disastrous amounts of fragmentation over time (which would eventually negate the benefits of such a system), the authors propose a cleaner that periodically runs in the background and reclaims partially or fully invalidated disk "segments," thereby creating larger free regions and compacting existing data so that new data does not have to be strewn across the disk. Finally, the authors note the ease with which recovery can be performed under their scheme.

This paper is thorough and well thought out. The authors tended to anticipate counter-arguments. For instance, just as I was about to critique their performance numbers running without the cleaner, the next section described cleaner performance. More impressively, they related their counter-intuitive discovery that a higher degree of locality results in worse performance, and they both explained the reason for this anomaly and altered their theory and implementation to reflect it. Their use of diagrams was helpful in understanding their arguments. My main critique is that I question their basic assumption. A system that can write metadata asynchronously, or whose traffic consists mostly of ordinary reads and writes of data files rather than creation and deletion of files, may in fact be more bound by read traffic than write traffic, especially with low locality. Every file that is read must be read at least once from disk, whereas, depending on a system's policies, a write need not touch the disk until it is forced out of the cache. Thus, the lesson for system building in this paper is not necessarily that LFS is the be-all and end-all of file systems, but that one should tailor a file system to the kind of traffic it is empirically likely to encounter rather than to abstract notions of how a disk should look.

The log-structured file system stems from a central idea: the cost of updating file metadata is high in FFS, even though FFS moved this data nearer to the files themselves than the original Unix file system did. This is particularly true when the files are short-lived: we have gone to the effort of creating and maintaining them just like their larger, persistent neighbors, only to have them soon be deleted.
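As a concrete illustration of the write batching and in-memory inode map described in the summary above, here is a minimal Python sketch. It is not the Sprite LFS implementation; the names (LogFS, write_block, flush_segment, SEGMENT_SIZE) are hypothetical, and the "disk" is just a Python list standing in for an append-only log.

```python
# Toy sketch, assuming a flat append-only log of (inode, block_no, data) records:
# dirty blocks from many files accumulate in memory and are flushed as one
# contiguous segment write; an inode map records where the latest copy landed.

SEGMENT_SIZE = 8  # blocks per segment (tiny, for illustration)

class LogFS:
    def __init__(self):
        self.disk = []          # the log: a list of (inode, block_no, data) records
        self.inode_map = {}     # inode -> {block_no: log address of latest copy}
        self.pending = []       # dirty blocks buffered in memory

    def write_block(self, inode, block_no, data):
        """Buffer a write; nothing touches the 'disk' yet."""
        self.pending.append((inode, block_no, data))
        if len(self.pending) >= SEGMENT_SIZE:
            self.flush_segment()

    def flush_segment(self):
        """One sequential write: append all buffered blocks, then update the map."""
        base = len(self.disk)
        self.disk.extend(self.pending)
        for offset, (inode, block_no, _) in enumerate(self.pending):
            self.inode_map.setdefault(inode, {})[block_no] = base + offset
        self.pending = []

    def read_block(self, inode, block_no):
        addr = self.inode_map[inode][block_no]
        return self.disk[addr][2]

fs = LogFS()
for i in range(8):                      # writes to several "files" land adjacently in the log
    fs.write_block(inode=i % 3, block_no=0, data=f"rev{i}")
print(fs.read_block(2, 0))              # -> "rev5", the latest copy wins
```

The point of the sketch is only that writes to different files become one sequential append, and that reads go through the inode map to find the newest copy; stale copies simply remain behind in the log until something reclaims them.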
The LFS combines two ideas: (1) writing changes out to the disk sequentially to the greatest extent possible (minimizing seeks), and (2) organizing files into long-lived and short-lived categories (an idea borrowed from generational memory management). Other points in the system's favor are that it can treat a file access like a transaction (borrowing techniques from database work); it allows quick crash recovery (thanks to its two checkpoint areas); and it is fairly simple, because there is no bitmap or free list to manage. Points against the LFS are its large memory buffering requirements and a garbage collector that can kick in at inopportune times; on an installation with predictable low-usage periods, garbage collection could be scheduled for those times. The way the LFS works is that it keeps an in-memory inode map, which records where the live inodes are located, and checkpoints this map at intervals – every thirty seconds in the paper. The garbage collector cleans segments by copying the data that is still in use out of partially used segments into fresh ones, essentially defragmenting the disk on the fly.
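The cleaning step described above can be sketched in the same toy style. This is only an illustration of the copy-and-compact idea under simplified assumptions; the paper's cleaner actually picks victim segments with a cost-benefit policy and age-sorts live data, both of which are omitted here, and every name below is made up.

```python
# Hedged sketch of segment cleaning: the "log" is a list of segments, each a list
# of (inode, block_no, data) records; the inode map holds the (segment, offset)
# of the latest copy of every block. Cleaning copies only live records out of a
# victim segment to the log tail, then frees the whole victim.

def clean(segments, inode_map, victim):
    """Copy live records out of segments[victim], update the map, free the victim."""
    survivors = []
    for offset, (inode, block_no, data) in enumerate(segments[victim]):
        if inode_map.get((inode, block_no)) == (victim, offset):   # still live?
            survivors.append((inode, block_no, data))
    new_seg = len(segments)
    segments.append(survivors)                 # one sequential write at the log tail
    for offset, (inode, block_no, _) in enumerate(survivors):
        inode_map[(inode, block_no)] = (new_seg, offset)
    segments[victim] = []                      # the entire victim segment is now free

# Toy usage: segment 0 holds a stale copy of file 1's block plus a live block of
# file 2; cleaning segment 0 relocates only the live block and empties the segment.
segments = [[(1, 0, "old"), (2, 0, "keep")], [(1, 0, "new")]]
inode_map = {(1, 0): (1, 0), (2, 0): (0, 1)}
clean(segments, inode_map, victim=0)
print(inode_map[(2, 0)], segments)   # (2, 0) now lives in segment 2; segment 0 is empty
```

Because liveness is decided entirely by whether the inode map still points at a record, the cleaner never needs a bitmap or free list: anything the map no longer references is garbage by definition.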