Locality-Aware Request Distribution in Cluster-Based Network Servers

The authors present a new heuristic for load balancing in clustered network servers. They focus in particular on content-based request distribution, in which the front end uses the content of the request, in addition to information about cluster load, to direct the request to a back-end node. Their specific proposal is locality-aware request distribution (LARD), in which the front end distributes incoming requests so as to achieve high locality in the back-end servers' main-memory and disk caches. They see this as a special case of content-based request distribution, which has as its goals: (1) increased performance due to improved cache hit rates, (2) increased secondary-storage scalability due to partitioning, and (3) the ability to use back-end servers specialized for particular tasks (e.g., video).

The main technical difficulty the authors address is that naively maximizing locality can actually reduce performance: if a single machine or a small number of machines in the back end is forced to service most of the cluster's incoming requests, the advantages of cache locality are negated by poor load balancing. In the discussion and analysis of their system, the authors make the following assumptions: (1) the front end is responsible for handing off new connections and passing inbound (but not outbound) data, (2) the front end is responsible for admission control, and (3) any back-end node is capable of serving any request, if more slowly.

The underlying idea behind LARD is roughly this: the initial assignment of targets to back-end nodes yields a partition of the namespace, and thus of each node's working set. Only if a significant load imbalance is detected are "targets" in the database redistributed across the back end. Happily, the authors provide a detailed discussion of what "significant" means, as well as an intuition for how to tune the relevant parameters. They also discuss extending their basic LARD approach to one in which some parts of the database namespace may be replicated across multiple nodes. (A minimal sketch of this dispatch heuristic appears at the end of this summary.)

The second half of the paper is devoted to a very thorough analysis of the proposed system, using both trace-driven simulation and a prototype implementation. In the simulations, they compare LARD and LARD with replication to a purely locality-based strategy and to weighted round-robin; LARD is a clear winner. They also analyze the sensitivity of the various algorithms to CPU and disk speed, noting that because CPU speeds are expected to improve much faster than disk speeds, at least for the foreseeable future, caching and locality awareness will become more important, not less. Thus, there is reason to believe that LARD will continue to be a better strategy than existing ones.
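
To make the dispatch heuristic concrete, here is a minimal sketch of the front-end logic described above: each target is assigned to a back-end node, and it is reassigned only when a significant imbalance is detected. The class names, threshold values, and the exact imbalance test are illustrative assumptions, not the authors' precise parameters.

    # Hypothetical sketch of a LARD-style dispatch loop; thresholds and the
    # imbalance test below are assumed for illustration, not taken verbatim
    # from the paper.

    class Backend:
        def __init__(self, name):
            self.name = name
            self.load = 0          # active connections on this node

    class LardFrontEnd:
        def __init__(self, backends, t_low=25, t_high=65):
            self.backends = backends
            self.t_low = t_low     # below this load, a node counts as lightly loaded
            self.t_high = t_high   # above this load, a node counts as overloaded
            self.server = {}       # target (e.g. URL path) -> assigned back-end node

        def _least_loaded(self):
            return min(self.backends, key=lambda b: b.load)

        def dispatch(self, target):
            """Choose a back-end node for a request naming `target`."""
            node = self.server.get(target)
            if node is None:
                # First request for this target: assign it to the least-loaded
                # node, incrementally building the namespace partition.
                node = self._least_loaded()
                self.server[target] = node
            else:
                # Reassign the target only on a significant imbalance: the
                # current node is overloaded while another node is lightly
                # loaded, or the current node is severely overloaded outright.
                candidate = self._least_loaded()
                imbalanced = node.load >= self.t_high and candidate.load < self.t_low
                severely_overloaded = node.load >= 2 * self.t_high
                if imbalanced or severely_overloaded:
                    node = candidate
                    self.server[target] = node
            node.load += 1         # connection handed off to `node`
            return node

The dictionary mapping targets to nodes is what preserves cache locality (repeated requests for a target keep hitting the same node's cache), while the threshold test is the tunable notion of a "significant" imbalance that triggers redistribution; the replicated variant would allow that dictionary to map a hot target to a set of nodes instead of a single one.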