Related Work

Object load balancing. We have oriented our examination of load balancing around routing, where a node request must reach the destination ID for it to be successful. If the network is instead being used for storage, other load balancing techniques can be applied. Byers et al. describe a technique that hashes data to be stored using two distinct hash functions, providing two potential locations [4]. The less loaded of the two possibilities is chosen. During data lookup, the query must contact both possible storage locations, or appropriate forwarding pointers must be used. Under uniform workload and capacity assumptions and with no churn, they have recently generalized this result to show that the maximum load at any server is $log log(N) / log(d) + O(1)$ where $d$ is the number of choices [5]. Their method is an example of ``the power of two choices'' [31]. Our ID selection process is similar in spirit, in that we also use multiple hash functions, although here we do so to provide VSs with a menu of identifiers.

Objects may be cached in the network to reduce hot spots or overload. Roussopoulos and Baker develop a cooperative request scheme where nodes direct requests toward the highest capacity replica [37]. They assume that the source of each lookup is aware of the capacity of each possible replica holder. Sources of requests learn the replicas by first contacting the root of the query, a key's primary storage node, so it must still perform some work for their method to function.

These storage-oriented load balancing techniques are orthogonal and complementary to the methods examined in this paper, including k-Choices. For example, k-Choices reduces an overburdened node's namespace, preventing it from being contacted in the first place, and Roussopoulos' technique prevents it from being contacted frequently after the replica set is known.

Namespace balancing. While the simple $log(N)$ VSs per node achieves $O(1/N)$ namespace balance per node, more recent algorithms have achieved tighter bounds with fewer virtual servers. These algorithms are based on the assumptions that the capacity of nodes and workloads are uniform; they do not include any workload scaling parameter. Because of these factors, they would approximate the behavior and results of the $log(N)$ VS algorithm. If they did achieve perfect namespace balancing at zero cost, they could be expected to perform as Balanced does in Table I.

Four algorithms fall into this category. First, Karger and Ruhl propose that each node has $log(N)$ potential IDs, only one of which is activated at once [26]. Nodes activate and deactivate their VSs to balance the distance between themselves and their successor. Because this algorithm allocates nodes a limited number of IDs, it has stronger security properties than the remainder of this group. Second, Manku's algorithm reduces the ratio of the largest to the smallest partition to at most 4 w.h.p. and has low arrival and departure cost [30]. Third and fourth, Adler [1] and Naor [32] also have low cost algorithms to achieve namespace balancing based on unlimited virtual server movement. Both algorithms depend on the history of node IDs that each node has used and their analyses are given only for the insertions, not deletions, cases.

Range queries. While we have examined uniform and Zipf query distributions in our simulations, we have not examined load balancing algorithms targeted at p2p systems when performing range queries are common. However, if one considers using a p2p system more like a typical database where each node is analagous to a disk, it is clear that ordering data by key might be warranted. We are aware of two load balancing algorithms that are targeted for this new domain [20,26]. We evaluated Ganesan's Threshold in this paper. Both require unlimited ID selection and, therefore, suffer from Sybil attack liabilities, making them unsuitable for non-cooperative environments. However, it is unlikely that a load balancing technique for range queries exists that supports scalable secure IDs.

Jonathan Ledlie 2006-01-06