Algorithms and Complexity Seminar
Monday, April 14, 2008, 4:00-5:15pm in 32-G575.
In-Network Aggregation of Holistic Functions
Fabian Kuhn (ETH Zurich)
In today's computing environments, data is no longer stored at one
central server. Often, data is distributed across many heterogeneous
locations that are connected through a network. Nevertheless, there
need to be efficient methods to query large data sets. Basic
aggregation queries such as computing the maximum, the sum, or the
average of a set of values can be answered quickly by a simple
convergecast on a spanning tree of the network where the data is
aggregated from the leaves towards the root of the tree. The time
needed for this computation is proportional to the depth of the
spanning tree and thus, if the spanning tree is chosen appropriately,
proportional to the diameter D of the network.
Aggregation functions that can be decomposed into functions over
subsets of the set of all values and that can then be computed using a
single convergecast are often called distributive functions. Functions
that cannot be decomposed in such a way are known as non-distributive
or holistic functions. Typical important examples of holistic
functions are the median of a set of values or the mode (the most
frequent element) of a set of elements. In the talk, I will show a
simple randomized distributed algorithm that computes the median (and
more generally the element with a given rank) in time
O(D*log_D(n)). We will see that this is optimal, i.e., computing the
median requires time Omega(D*log_D(n)). Further, I will present the
basic ideas of a distributed algorithm that computes the mode in time
O(D+F_2/m^2*log n). Here, m is the frequency of the mode and F_2 is a
parameter (called the second frequency moment) of the frequency
distribution. Generalizing a known communication complexity result, it
is possible to prove a lower bound that is weaker but depends on the
frequency distribution in a similar way.
This is joint work with Thomas Locher and Roger Wattenhofer (median)
and with Thomas Locher and Stefan Schmid (mode).
Host: Nancy Lynch
(The Algorithms
and Complexity Seminar series talks are usually held Thursdays 4:00-5:15pm in 32-G575.)