\reviewtitle{High Availability, Scalable Storage, Dynamic Peer Networks: Pick Two}
\reviewlabel{blake03picktwo}
\reviewauthor{Charles Blake and Rodrigo Rodrigues}

This is a mini-summary of a discussion with David Holland, Lex Stein, Matt Amis, and Jeff Bezanson.

The authors argue that building a p2p file system with reliability guarantees is a bogus idea because there simply is not enough bandwidth to support the required redundancy. They set up a simple model that accounts only for the amount of data a node must download when it first joins the system. They then multiply that by two for a reason I can't remember off the top of my head (perhaps because each copied byte consumes bandwidth at both the sending and the receiving node); in any case, there is a fair bit of number fuzziness. From this they show that, to support an interesting amount of data, these maintenance downloads and uploads -- not counting any searches or user-initiated transfers -- would exceed the bandwidth available to the average node. There is a lot of averaging behind their numbers, but they do seem to have a good point. They extend the model to show how things change if, for example, only the most reliable 5\% of nodes are used. (A back-of-the-envelope reconstruction of the arithmetic appears at the end of these notes.)

The data on system behavior comes from Gnutella. It is unclear whether it is directly applicable, because users of a cooperative backup system might have more reason to stay connected than the average Gnutella user. If, however, Gnutella clients connect every time the user dials up -- whether or not the user explicitly asks them to -- then the trace may reflect general user behavior after all. It might be interesting to get this kind of data from an ISP. Strangely, they stop at 30 hours for the largest membership lifetime $\tau$, when it seems many people are Gnutella members for far longer than that. Another odd measurement: they claim to have measured the exact Gnutella membership, whereas an earlier paper on measuring Gnutella was unable to determine the exact membership -- that seems sketchy.

We also discussed how computation could be spread out to other machines in a p2p network:
\begin{itemize}
  \item Security, i.e.\ verifying that the computation has actually been done correctly, is a big problem (a toy sketch of one possible approach appears at the end of these notes).
  \item Sandboxing (FreeBSD jails, virtual machines) could be used to run untrusted code.
\end{itemize}
Open questions: How does the current movement in distributed computation relate to Amoeba? Which financial companies are running their models on borrowed cycles? What about moving data to a group of machines that would run a job together, e.g.\ four machines that together suffice for one job?
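As an aside, here is a back-of-the-envelope reconstruction of the kind of arithmetic they do; the notation ($D$, $k$, $N$, $\tau$) is mine and the exact bookkeeping is a guess, not necessarily the paper's. Suppose the system stores $D$ bytes of unique data with redundancy factor $k$ spread over $N$ nodes, so each node holds roughly $S = kD/N$ bytes. If the average membership lifetime is $\tau$, a joining node must download its $S$ bytes, and a departing node's $S$ bytes must be re-created elsewhere, costing an upload at some surviving node and a download at the replacement. Counting each copied byte at both endpoints gives a per-node maintenance bandwidth of roughly
\[
  B \;\approx\; \frac{2S}{\tau} \;=\; \frac{2kD}{N\tau},
\]
so the unique data each node can serve is at most about $B\tau/(2k)$. With home-broadband upload rates and membership lifetimes on the order of the hours observed in their Gnutella trace, that bound comes out far below what a scalable storage service would need, which is their point.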
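On the verification problem from the discussion: one approach used in cycle-scavenging systems (not something this paper proposes) is to run the same job on several peers and accept the result only when a majority agree. A minimal sketch in Python, assuming jobs are deterministic, peers return raw bytes, and peers do not collude; the \texttt{run\_on\_peer} callable is a hypothetical stand-in for whatever RPC the p2p layer provides:

\begin{verbatim}
import hashlib
from collections import Counter

def verified_result(run_on_peer, peers, job, quorum=3):
    """Run `job` on `quorum` peers via the caller-supplied
    run_on_peer(peer, job) callable; accept the answer only if a
    majority of those peers agree on the result digest."""
    chosen = peers[:quorum]
    digest_by_peer = {}
    result_by_digest = {}
    for peer in chosen:
        result = run_on_peer(peer, job)      # raw bytes from the peer
        digest = hashlib.sha256(result).hexdigest()
        digest_by_peer[peer] = digest
        result_by_digest[digest] = result
    winner, votes = Counter(digest_by_peer.values()).most_common(1)[0]
    if votes > len(chosen) // 2:
        return result_by_digest[winner]
    raise RuntimeError("peers disagree; result cannot be verified")

# Toy usage: local functions stand in for remote nodes.
if __name__ == "__main__":
    honest = lambda peer, job: str(sum(job)).encode()
    print(verified_result(honest, peers=["a", "b", "c"], job=[1, 2, 3]))
\end{verbatim}

This obviously assumes failures are independent; colluding peers defeat it, and it says nothing about sandboxing the job on the peer side, which is where jails or virtual machines would come in.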