Massively Parallel Algorithms, ETH Zurich, Spring 2019

Units: 2V+1U+2A = 6CP

Lecture Time & Place: Fridays 10:00-12:00 at CAB G51

Exercise Session Time & Place: Fridays 17:00-18:00 at CAB G59

Grading: A semester-long project, delivered in three parts: an initial proposal (20%), a presentation (20%), and final report (60%).

Other Links: Information on the Course Catalogue of ETH Zurich

Course Content: Data sets are growing faster than the capacity of single processors. This makes it almost a certainty that the future of computation relies on parallelism. In this new graduate-level course, we study the theoretical foundations of modern parallel computation, with an emphasis on the algorithmic tools and techniques. In particular, we will discuss the growing body of algorithmic results in the Massively Parallel Computation (MPC) model. This model is a mathematical abstraction of some of the popular large-scale data processing settings such as MapReduce, Hadoop, Spark, etc. The course will cover a sampling of the recent developments (and open questions) at the frontier of research in massively/modern parallel algorithms. The related papers will be provided throughout the semester. By the end of the semester, the students will know all the standard tools of this area, as well as the state of the art on a number of the central problems. Our hope is that the course prepares the students for independent research in this area, and we will take concrete steps in that direction with the course projects.

Tentative Topics: Massively parallel algorithms for various problems including sorting, maximum matching approximations, maximal independent set, connected components, minimum spanning tree, minimum cut, graph coloring, geometric problems, some dynamic programming problems, various clusterings, submodular maximization, triangle counting, densest subgraph, and coreness decomposition, as well as some impossibility results and conditional lower bounds and some discussion of composobale coresets.

Prerequisite: The class does not assume any prior knowledge in parallel algorithms/computing. Our only prerequisite is the undergraduate class Algorithms, Probability, and Computing (APC) or any other course that can be seen as the equivalent. In particular, much of what we will discuss uses randomized algorithms and therefore, we will assume that the students are familiar and comfortable with the basic tools and techniques in randomized algorithms and analysis (to the extent covered in the APC class). Please check Problem Set 0 to see if you are comfortable with the required background material. If you're not sure whether you're ready for this class, please consult with the instructor.

Projects:

Step 1 --- Proposals: We have had one-on-one meetings with each of the teams regarding project proposals. If you need another meeting, please send me an email. Your final project proposals should be submitted (via email to ghaffari@inf.ethz.ch) by the midnight of Monday, April 8. Please see the email from March 25 regarding guidlines for the proposal.

Step 2 --- Presentations: We will have the presentations on July 24 in CAB G51. Details and the schedule will be announced later.

Step 3 --- Final Reports: The final reports should be submitted (via email to ghaffari@inf.ethz.ch) by August 15.

Here is a compilation of the scribe notes, courtesy of Davin Choo. These notes will be updated regularly, throughout the semester. Last Update: June 19, 2019.

Lectures:

Problem Set 0

(02/22) Lecture 01: Introduction + Matching Approximation in the Strongly Superlinear Memory Regime
- A model of computation for MapReduce by Karloff et al. (SODA'10)
- Filtering: a method for solving graph problems in MapReduce by Lattanzi et al. (SPAA'11).

(03/01) Lecture 02: Matching Approximation in the Near-Linear Memory Regime
- Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover by Ghaffari et al. (PODC'18).
- Round compression for parallel matching algorithms by Czumaj et al. (STOC'18)
- Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs by Assadi et al. (SODA'19)

(03/08) Lecture 03: Matching Approximation in the Strongly Sublinear Memory Regime
- Sparsifying Distributed Algorithms with Ramifications in Massively Parallel Computation ... by Ghaffari & Uitto (SODA'19).
- Round Compression for Parallel Graph Algorithms in Strongly Sublinear Space by Onak (Manuscript 2018)

(03/15) Lecture 04: Approximation Improvement for Matching via Augmenting Paths
- An n^{5/2} Algorithm for Maximum Matchings in Bipartite Graphs by Hopcroft and Karp (SICOMP 1973).
- Finding Graph Matchings in Data Streams by McGregor (Approx/Random'05)
- Weighted Matchings via Unweighted Augmentations by Gamlath et al. (PODC'19)

(03/22) Lecture 05: Connected Components and Minimum Spanning Tree
- Analyzing Graph Structure via Linear Measurements by Ahn, Guha, and McGregor (SODA'12).
- Graph sketches: sparsification, spanners, and subgraphs by Ahn, Guha, and McGregor (PODS'12).
- MST in O(1) Rounds of the Congested Clique by Jurdinski and Nowicki (SODA'18).
- Filtering: a method for solving graph problems in MapReduce by Lattanzi et al. (SPAA'11).

(03/29) Lecture 06: Log Diameter Connectivity and Euclidean MST
- Parallel Graph Connectivity in Log Diameter Rounds by Andoni et al. (FOCS'18).
- Parallel Algorithms for Geometric Graph Problems by Andoni et al. (STOC'14).

(04/12) Lecture 07: Lower Bounds and Conditional Hardness Results
- Shuffles and Circuits (On Lower Bounds for Modern Parallel Computation) by Roughgarden, Vassilvitskii, and Wang (SPAA'16; JACM'18).
- Conditional Hardness Results for Massively Parallel Computation from Distributed Lower Bounds by Ghaffari, Kuhn, and Uitto (FOCS'19).

(05/03) Lecture 08: Parallelism in Dynamic Programming
- Efficient Massively Parallel Methods for Dynamic Programming by Im, Moseley, and Sun (STOC'17).

(05/10) Lecture 09: Parallelism in Submodular Maximization
- Distributed Submodular Maximization: Identifying Representative Elements in Massive Data by Mirzasoleiman et al. (NIPS'13).
- The Power of Randomization: Distributed Submodular Maximization on Massive Datasets by Barbosa et al. (ICML'15).
- Fast Greedy Algorithms in MapReduce and Streaming by Kumar et al. (SPAA'13).

(05/17) Lecture 10: Parallelism in Submodular Maximization, Cont.
- A New Framework for Distributed Submodular Maximization by Barbosa et al. (FOCS'16).
- Submodular Optimization in the MapReduce Model by Liu and Vondrak (SOSA'19).
- Randomized Composable Core-sets for Distributed Submodular Maximization by Mirrokni and Zadimoghaddam (STOC'15).

(05/24) Lecture 11: Parallelism in Data Clustering
- Scalable k-Means ++ by Bahmani et al. (VLDB'12).
- k-means++: The Advantages of Careful Seeding by Arthur and Vassilvitskii (SODA'07).
- The Effectiveness of Lloyd-Type Methods for the k-Means Problem by Ostrovsky et al. (FOCS'06).

(05/31) Lecture 12: Exact Minimum Cut
- Faster Algorithms for Edge Connectivity via Random 2-Out Contractions by Ghaffari, Nowicki, and Thorup (SODA'20).

(06/11) Lecture 13: Delta+1 Coloring with Near-Linear Memory
- Sublinear Algorithms for (Delta+1) Vertex Coloring by Assadi, Chen, and Khanna (SODA'19).
- Distributed (Delta+1)-Coloring in Sublogarithmic Rounds by Harris, Schneider, and Su (STOC'16).
- Chapter 15 (titled "The Structural Decomposition") of the book by Molloy and Reed on Graph coloring and the Probabilistic Method.
- The Complexity of (Delta + 1) Coloring in Congested Clique, Massively Parallel Computation, and Centralized Local Computation by Chang et al. (PODC'19).