Baker's Measurements

This paper details a trace-driven (and partly simulation-driven) analysis of the Sprite distributed operating system and its file system. It is a follow-up to an earlier trace study of the BSD UNIX file system, and it asks how well that study's results still hold after several years of evolution in systems and hardware. The main finding was that most file accesses were still to small files and most bytes transferred still belonged to large files. However, the large files had gotten larger, so the share of bytes belonging to large files had grown as well. The authors also note that throughput had increased by at least an order of magnitude and had become more bursty. The paper also reports several new findings. Despite intuition to the contrary, the group found that process migration for load balancing did not reduce the effectiveness of caching. They also found that paging traffic was not particularly heavy, though it is bursty. They suggest that local disks would do little to improve performance and might even decrease it, because data can be fetched over the network faster than from a local disk. Finally, through simulations, the group found that a more sophisticated and efficient cache consistency mechanism would not significantly improve performance. Sprite's strong consistency guarantees do matter, however, since their data suggests that less stringent systems like NFS may often serve stale data. This paper was well written and easy to follow. I wish more papers used graphical aids like Figure 5. The paper flows well, presents a structured argument, and uses evidence to back up its claims.
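The central measurement contrasts two ways of counting: by number of accesses and by bytes transferred. The short Python sketch that follows illustrates why the two counts can disagree so sharply. The `Access` record, the `small_file_shares` function, the 10,000-byte cutoff for "small", and the toy trace are all hypothetical choices for illustration. They are not the paper's definitions or data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    """One open-to-close file access in a hypothetical trace."""
    file_size: int          # size of the file, in bytes
    bytes_transferred: int  # bytes read or written during this access

def small_file_shares(trace, threshold):
    """Return (fraction of accesses, fraction of bytes) for files below threshold."""
    total_bytes = sum(a.bytes_transferred for a in trace)
    if not trace or total_bytes == 0:
        raise ValueError("trace must contain at least one byte transferred")
    small = [a for a in trace if a.file_size < threshold]
    small_bytes = sum(a.bytes_transferred for a in small)
    return len(small) / len(trace), small_bytes / total_bytes

# Made-up toy trace (not data from the paper): nine whole-file reads of
# 2,000-byte files and one whole-file read of a 1,000,000-byte file.
trace = [Access(2_000, 2_000)] * 9 + [Access(1_000_000, 1_000_000)]
access_share, byte_share = small_file_shares(trace, threshold=10_000)
print(f"small files: {access_share:.0%} of accesses, {byte_share:.1%} of bytes")
# prints "small files: 90% of accesses, 1.8% of bytes"
```

A single large transfer is enough to make small files dominate the access count while contributing almost nothing to the byte count. As large files grow, that gap widens.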
The only significant fault I found was the paper's reliance on a small computer science research cluster for its conclusions. This seems to be a theme in such papers. Analyses of system and file system performance usually draw on computer science environments rather than more "real world" applications, which makes it hard to generalize their conclusions to the computing world at large.

Measurements of a Distributed File System
Jonathan Ledlie
CS 736
April 12, 2000

This Berkeley group offers a fairly comprehensive methodology and some interesting results on caching and user behavior in a distributed file system. They repeat, on Sprite, a six-year-old study of users' file access habits under BSD UNIX. Unlike the other distributed operating systems we have seen, Sprite was measured with diskless clients. This would not work with AFS as designed, since AFS clients cache whole files on their local disks. The results were very similar to those of the original study, except that large files had become an order of magnitude larger, skewing other characteristics in that direction. The most noticeable characteristics are:

- Most bytes transferred over the network come from large files.
- Most files opened are small.
- Most file opens are short-lived, i.e., a quick open-read-close.
- Most data lives longer than 30 seconds, Sprite's write-back delay. As a result, it is not deleted while still in the cache and must actually be written back to the server.

The second half of the paper discusses the performance of Sprite's main-memory caches, which are naturally well suited to diskless operation. While the study does present common characteristics of files, the data comes from a single research institution and covers only eight days' worth of data. Who knows how Sprite, NFS, and AFS would perform in a different environment, like a busy corporation? Their cache coherency semantics are also puzzling. If one client has a file open for writing while other clients also have it open, caching is disabled for that file and every client must go directly to the server. This seems useful only in limited applications.
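The coherency rule Ledlie finds puzzling, in which caching is turned off when a file is write-shared between clients, amounts to a small amount of bookkeeping on the file server. Below is a minimal Python sketch of that idea under a simplified model. The server counts read and write opens per client, and it treats a file as uncacheable whenever the file is open on more than one client with at least one writer. The class and method names are hypothetical. The sketch also re-enables caching as soon as the sharing ends and ignores sequential write-sharing, so Sprite's actual protocol may differ in those details.

```python
from collections import defaultdict

class WriteSharingTracker:
    """Hypothetical server-side bookkeeping (a simplified model, not Sprite's code).

    A file is cacheable on clients unless it is open on more than one
    client and at least one of those opens is for writing
    (concurrent write-sharing).
    """

    def __init__(self):
        # path -> client id -> [read opens, write opens]
        self._opens = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def open(self, path, client, for_write):
        """Record an open and report whether the file may be cached."""
        self._opens[path][client][1 if for_write else 0] += 1
        return self.cacheable(path)

    def close(self, path, client, for_write):
        """Record a close; drop entries once a client has no opens left."""
        idx = 1 if for_write else 0
        counts = self._opens.get(path, {}).get(client)
        if counts is None or counts[idx] == 0:
            raise ValueError("close without a matching open")
        counts[idx] -= 1
        if counts == [0, 0]:
            del self._opens[path][client]
            if not self._opens[path]:
                del self._opens[path]

    def cacheable(self, path):
        """True unless the file is concurrently write-shared."""
        clients = self._opens.get(path, {})
        any_writer = any(writes > 0 for _, writes in clients.values())
        return not (len(clients) > 1 and any_writer)
```

For example, a read open on client `c1` followed by a write open on client `c2` makes `open` return `False` for `c2`. In this simplified model, closing `c2`'s write open makes the file cacheable again.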
Process migration did not incur the caching penalties they expected. They believed this is because a migrated process usually performs one specific task and therefore gets good locality. They also surmised that the same machines were repeatedly chosen as migration targets, so some of the data would already be in those machines' caches. Finally, for coherency reasons, Sprite does not cache directories on clients, even though directories seem to me to be something you would access very often.