\reviewtitle{High Availability, Scalable Storage, Dynamic Peer Networks: Pick Two}
\reviewlabel{blake03picktwo}
\reviewauthor{Charles Blake and Rodrigo Rodrigues}

This is a mini-summary of a discussion with David Holland, Lex Stein, Matt Amis, and Jeff Bezanson.

The authors argue that building a p2p file system with reliability guarantees is a bogus idea because there simply is not enough bandwidth to support the required redundancy. They set up a simple model that accounts only for the amount of data a node must download when it first joins the system. They then multiply that by two for a reason I can't remember off the top of my head (perhaps because each copied byte consumes bandwidth at both the sending and the receiving node); in any case, there is a fair bit of number fuzziness. From this they show that, to support an interesting amount of data, these maintenance downloads and uploads -- not counting any searches or user-initiated transfers -- would exceed the bandwidth available to the average node. There is a lot of averaging behind their numbers, but they do seem to have a good point. They extend the model to show how things change if, for example, only the most reliable 5\% of nodes are used. (A back-of-the-envelope reconstruction of the arithmetic appears at the end of these notes.)

The data on system behavior comes from Gnutella. It is unclear whether it is directly applicable, because users of a cooperative backup system might have more reason to stay connected than the average Gnutella user. If, however, Gnutella clients connect every time the user dials up -- whether or not the user explicitly asks them to -- then the trace may reflect general user behavior after all. It might be interesting to get this kind of data from an ISP. Strangely, they stop at 30 hours for the largest membership lifetime $\tau$, when it seems many people are Gnutella members for far longer than that. Another odd measurement: they claim to have measured the exact Gnutella membership, whereas an earlier paper on measuring Gnutella was unable to determine the exact membership -- that seems sketchy.

We also discussed how computation could be spread out to other machines in a p2p network:
\begin{itemize}
  \item Security, i.e.\ verifying that the computation has actually been done correctly, is a big problem (a toy sketch of one possible approach appears at the end of these notes).
  \item Sandboxing (FreeBSD jails, virtual machines) could be used to run untrusted code.
\end{itemize}
Open questions: How does the current movement in distributed computation relate to Amoeba? Which financial companies are running their models on borrowed cycles? What about moving data to a group of machines that would run a job together, e.g.\ four machines that together suffice for one job?
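As an aside, here is a back-of-the-envelope reconstruction of the kind of arithmetic they do; the notation ($D$, $k$, $N$, $\tau$) is mine and the exact bookkeeping is a guess, not necessarily the paper's. Suppose the system stores $D$ bytes of unique data with redundancy factor $k$ spread over $N$ nodes, so each node holds roughly $S = kD/N$ bytes. If the average membership lifetime is $\tau$, a joining node must download its $S$ bytes, and a departing node's $S$ bytes must be re-created elsewhere, costing an upload at some surviving node and a download at the replacement. Counting each copied byte at both endpoints gives a per-node maintenance bandwidth of roughly
\[
  B \;\approx\; \frac{2S}{\tau} \;=\; \frac{2kD}{N\tau},
\]
so the unique data each node can serve is at most about $B\tau/(2k)$. With home-broadband upload rates and membership lifetimes on the order of the hours observed in their Gnutella trace, that bound comes out far below what a scalable storage service would need, which is their point.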
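On the verification problem from the discussion: one approach used in cycle-scavenging systems (not something this paper proposes) is to run the same job on several peers and accept the result only when a majority agree. A minimal sketch in Python, assuming jobs are deterministic, peers return raw bytes, and peers do not collude; the \texttt{run\_on\_peer} callable is a hypothetical stand-in for whatever RPC the p2p layer provides:

\begin{verbatim}
import hashlib
from collections import Counter

def verified_result(run_on_peer, peers, job, quorum=3):
    """Run `job` on `quorum` peers via the caller-supplied
    run_on_peer(peer, job) callable; accept the answer only if a
    majority of those peers agree on the result digest."""
    chosen = peers[:quorum]
    digest_by_peer = {}
    result_by_digest = {}
    for peer in chosen:
        result = run_on_peer(peer, job)      # raw bytes from the peer
        digest = hashlib.sha256(result).hexdigest()
        digest_by_peer[peer] = digest
        result_by_digest[digest] = result
    winner, votes = Counter(digest_by_peer.values()).most_common(1)[0]
    if votes > len(chosen) // 2:
        return result_by_digest[winner]
    raise RuntimeError("peers disagree; result cannot be verified")

# Toy usage: local functions stand in for remote nodes.
if __name__ == "__main__":
    honest = lambda peer, job: str(sum(job)).encode()
    print(verified_result(honest, peers=["a", "b", "c"], job=[1, 2, 3]))
\end{verbatim}

This obviously assumes failures are independent; colluding peers defeat it, and it says nothing about sandboxing the job on the peer side, which is where jails or virtual machines would come in.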