Object load balancing. We have oriented our examination of
load balancing around routing, where a node request must reach the
destination ID for it to be successful. If the network is instead
being used for storage, other load balancing techniques can be
applied. Byers et al. describe a technique that hashes data to be
stored using two distinct hash functions, providing two potential
locations [4]. The less loaded of the two
possibilities is chosen. During data lookup, the query must contact
both possible storage locations, or appropriate forwarding pointers
must be used. Under uniform workload and capacity assumptions and
with no churn, they have recently generalized this result to show
that the maximum load at any server is
where
is the number of choices [5]. Their
method is an example of ``the power of two
choices'' [31]. Our ID selection process
is similar in spirit, in that we also use multiple hash functions,
although here we do so to provide VSs with a menu of identifiers.
Objects may be cached in the network to reduce hot spots or overload. Roussopoulos and Baker develop a cooperative request scheme where nodes direct requests toward the highest capacity replica [37]. They assume that the source of each lookup is aware of the capacity of each possible replica holder. Sources of requests learn the replicas by first contacting the root of the query, a key's primary storage node, so it must still perform some work for their method to function.
These storage-oriented load balancing techniques are orthogonal and complementary to the methods examined in this paper, including k-Choices. For example, k-Choices reduces an overburdened node's namespace, preventing it from being contacted in the first place, and Roussopoulos' technique prevents it from being contacted frequently after the replica set is known.
Namespace balancing. While the simple VSs per node
achieves
namespace balance per node,
more recent algorithms have achieved tighter bounds with fewer
virtual servers. These algorithms are based on the assumptions that
the capacity of nodes and workloads are uniform; they do not include
any workload scaling parameter. Because of these factors, they
would approximate the behavior and results of the
VS
algorithm. If they did achieve perfect namespace balancing at zero
cost, they could be expected to perform as Balanced does in Table
I.
Four algorithms fall into this category. First, Karger and Ruhl
propose that each node has potential IDs, only one of which
is activated at once [26]. Nodes activate and
deactivate their VSs to balance the distance between themselves and
their successor. Because this algorithm allocates nodes a limited
number of IDs, it has stronger security properties than the
remainder of this group. Second, Manku's algorithm reduces the ratio
of the largest to the smallest partition to at most 4 w.h.p. and has
low arrival and departure cost [30]. Third and
fourth, Adler [1] and Naor [32]
also have low cost algorithms to achieve namespace balancing based
on unlimited virtual server movement. Both algorithms depend on the
history of node IDs that each node has used and their analyses are
given only for the insertions, not deletions, cases.
Range queries. While we have examined uniform and Zipf query distributions in our simulations, we have not examined load balancing algorithms targeted at p2p systems when performing range queries are common. However, if one considers using a p2p system more like a typical database where each node is analagous to a disk, it is clear that ordering data by key might be warranted. We are aware of two load balancing algorithms that are targeted for this new domain [20,26]. We evaluated Ganesan's Threshold in this paper. Both require unlimited ID selection and, therefore, suffer from Sybil attack liabilities, making them unsuitable for non-cooperative environments. However, it is unlikely that a load balancing technique for range queries exists that supports scalable secure IDs.
Jonathan Ledlie 2006-01-06