Algorithms and Complexity Seminar

Monday, April 14, 2008, 4:00-5:15pm in 32-G575.

In-Network Aggregation of Holistic Functions

Fabian Kuhn (ETH Zurich)

In today's computing environments, data is no longer stored at one central server. Often, data is distributed across many heterogeneous locations that are connected through a network. Nevertheless, there need to be efficient methods to query large data sets. Basic aggregation queries such as computing the maximum, the sum, or the average of a set of values can be answered quickly by a simple convergecast on a spanning tree of the network where the data is aggregated from the leaves towards the root of the tree. The time needed for this computation is proportional to the depth of the spanning tree and thus, if the spanning tree is chosen appropriately, proportional to the diameter D of the network.

Aggregation functions that can be decomposed into functions over subsets of the set of all values and that can then be computed using a single convergecast are often called distributive functions. Functions that cannot be decomposed in such a way are known as non-distributive or holistic functions. Typical important examples of holistic functions are the median of a set of values or the mode (the most frequent element) of a set of elements. In the talk, I will show a simple randomized distributed algorithm that computes the median (and more generally the element with a given rank) in time O(D*log_D(n)). We will see that this is optimal, i.e., computing the median requires time Omega(D*log_D(n)). Further, I will present the basic ideas of a distributed algorithm that computes the mode in time O(D+F_2/m^2*log n). Here, m is the frequency of the mode and F_2 is a parameter (called the second frequency moment) of the frequency distribution. Generalizing a known communication complexity result, it is possible to prove a lower bound that is weaker but depends on the frequency distribution in a similar way.

This is joint work with Thomas Locher and Roger Wattenhofer (median) and with Thomas Locher and Stefan Schmid (mode).

Host: Nancy Lynch

(The Algorithms and Complexity Seminar series talks are usually held Thursdays 4:00-5:15pm in 32-G575.)