6.827 Algorithm Engineering Spring 2022

Basic Information

Instructor: Julian Shun
TA: Tom Tseng
Schedule: Tuesdays and Thursdays 4-5:30pm ET
Location: 36-156
Office hours: By appointment
Email: jshun AT mit.edu, tomtseng AT mit.edu
Piazza
Units: 3-0-9
Prerequisites: 6.046, 6.172

Course Description

This subject qualifies as a Computer Systems concentration subject.

This is a research-oriented course on algorithm engineering, which will cover both the theory and practice of algorithms and data structures. Students will learn about models of computation, algorithm design and analysis, and performance engineering of algorithm implementations. We will study the design and implementation of sequential, parallel, cache-efficient, external-memory, and write-efficient algorithms for fundamental problems in computing. Many of the principles of algorithm engineering will be illustrated in the context of parallel algorithms and graph problems. Students will read and present research papers, write paper reviews, complete problem sets, participate in classroom discussions, and complete a semester-long research project. Class time will consist of lectures, student presentations, and group project meetings. This course is suitable for graduate students or advanced undergraduates who have taken 6.046 and 6.172. Mathematical maturity and familiarity with algorithm analysis and performance engineering will be assumed. Lectures will consist of instructor and student presentations. Lecture attendance is required and participation counts toward the grade.

Skip to Course Policy

Announcements

Schedule (tentative)

This is a tentative schedule that will be updated before the course begins. However, it should give you an idea of the topics that we will cover.

DateTopicSpeakerRequired ReadingOptional Reading
Tuesday 2/1Course IntroductionJulian Shun Algorithm Engineering - An Attempt at a Definition

A Theoretician's Guide to the Experimental Analysis of Algorithms

Slides
Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

A Guide to Experimental Algorithmics

Algorithm engineering: an attempt at a definition using sorting as an example

Algorithm Engineering for Parallel Computation

Distributed Algorithm Engineering

Experimental algorithmics

Programming Pearls

Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time
Thursday 2/3Parallel AlgorithmsJulian ShunParallel Algorithms

Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques (Chapters 4-8)

CLRS Chapter 27

Slides
Prefix Sums and Their Applications

Algorithm Design: Parallel and Sequential

Introduction to Parallel Algorithms

Scheduling Multithreaded Computations by Work Stealing

Thread Scheduling for Multiprogrammed Multiprocessors

Problem Based Benchmark Suite
Tuesday 2/8Parallel Graph Traversal Andrew Feldman

Edmund Williams

Julian Shun
Direction-Optimizing Breadth-First Search* Slides

The More the Merrier: Efficient Multi-Source Graph Traversal* Slides

A Faster Algorithm for Betweenness Centrality Slides
A Work-Efficient Parallel Breadth-First Search Algorithm (or How to Cope with the Nondeterminism of Reducers)

Internally Deterministic Parallel Algorithms Can Be Fast

SlimSell: A Vectorizable Graph Representation for Breadth-First Search

Chapter 3.6 of Networks, Crowds, and Markets (describes Betweenness Centrality with an example)

Better Approximation of Betweenness Centrality

ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages

KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation

Fast approximation of betweenness centrality through sampling

Scalable Betweenness Centrality Maximization via Sampling

Articulation Points Guided Redundancy Elimination for Betweenness Centrality

Betweenness Centrality: Algorithms and Implementations
Thursday 2/10Shared-Memory Graph AlgorithmsJulian Shun A Simple and Practical Linear-Work Parallel Algorithm for Connectivity* Slides

Chapters 7-8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable* Slides
Parallel Graph Decompositions Using Random Shifts

Improved Parallel Algorithms for Spanners and Hopsets

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

Shared Memory Graph Frameworks

Chapter 14 of Networks, Crowds, and Markets
Tuesday 2/15Sorting AlgorithmsJulian Shun In-place Parallel Super Scalar Samplesort (IPS4o)*

Theoretically-Efficient and Practical Parallel In-Place Radix Sorting* Slides
Super Scalar Sample Sort

Parallel Merge Sort

Parallelism in Randomized Incremental Algorithms

Optimal and sublogarithmic time randomized parallel sorting algorithms

A comparison of sorting algorithms for the connection machine CM-2

Efficient implementation of sorting on multi-core SIMD CPU architecture

Sorting Data on Ultra-Large Scale with RADULS

Even Faster Sorting of (Not Only) Integers

Highly scalable parallel sorting

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Designing efficient sorting algorithms for manycore GPUs

TritonSort: A Balanced Large-Scale Sorting System

A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort

Practical Massively Parallel Sorting

Engineering a Multi-core Radix Sort

PARADIS: An Efficient Parallel Algorithm for In-place Radix Sort

Parallel In-Place Algorithms: Theory and Practice

Sort Benchmark
Thursday 2/17External-Memory AlgorithmsAndrew Feldman

Zhi Wei Gan
The input/output complexity of sorting and related problems* Slides

A Functional Approach to External Graph Algorithms* Slides
Algorithms and Data Structures for External Memory

I/O-Complexity of Graph Algorithms

External-Memory Graph Algorithms

Beyond Synchronous: New Techniques for External-Memory Graph Connectivity and Minimum Spanning Forest

Fundamental parallel algorithms for private-cache chip multiprocessors

Parallel External Memory Graph Algorithms

Efficient Parallel and External Matching

External-Memory Computational Geometry

External Memory Geometric Data Structures

A Bulk-Parallel Priority Queue in External Memory with STXXL
Tuesday 2/22No class
Thursday 2/24Cache-Oblivious Algorithms Ricardo Gayle Jr.

Kliment Serafimov
Cache-Oblivious Algorithms* Slides

Engineering a cache-oblivious sorting algorithm* Slides
Cache-Oblivious Algorithms and Data Structures

Cache Oblivious Distribution Sweeping

Cache-oblivious databases: Limitations and opportunities

An Optimal Cache-Oblivious Priority Queue and Its Application to Graph Algorithms

Cache-Oblivious B-Trees

Cache-oblivious streaming B-trees

An experimental comparison of cache-oblivious and cache-conscious programs
Monday 2/28Problem Set due today
Tuesday 3/1Bucketing and the Graph Based Benchmark SuiteLaxman Dhulipala
(Guest Lecture)
Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing*

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable*

The Graph Based Benchmark Suite (GBBS)

Slides
Fast algorithms for determining (generalized) core groups in social networks

K-Core Decomposition of Large Networks on a Single PC

Patterns and Anomalies in k-Cores of Real-World Graphs with Applications

Delta-stepping: a parallelizable shortest path algorithm

Parallel Shortest Paths Using Radius Stepping

GBBS Website
Thursday 3/3Pre-proposal Meeting
Tuesday 3/8Graph Compression and Parallel Cache-Oblivious Algorithms

Project proposal due today
Julian Shun Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+* Slides

Multicore Triangle Computations Without Tuning* Slides

Low Depth Cache-Oblivious Algorithms

Parallel and I/O efficient set covering algorithms

Resource Oblivious Sorting on Multicores

Scheduling Irregular Parallel Computations on Hierarchical Caches

The Cache Complexity of Multithreaded Cache Oblivious Algorithms

Cache-efficient dynamic programming algorithms for multicores

Provably good multicore cache performance for divide-and-conquer algorithms

Concurrent cache-oblivious B-trees

The Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework, Parallelization and Experimental Evaluation

Autogen: Automatic Discovery of Efficient Recursive Divide-&-Conquer Algorithms for Solving Dynamic Programming Problems

Provably Efficient Scheduling of Cache-oblivious Wavefront Algorithms

Oblivious algorithms for multicores and networks of processors

Algorithmic Aspects of Triangle-Based Network Analysis

Triangle Listing Algorithms: Back from the Diversion

The Input/Output Complexity of Triangle Enumeration

I/O-efficient join dependency testing, Loomis-Whitney join, and triangle enumeration

2-3 Cuckoo Filters for Faster Triangle Listing and Set Intersection
Thursday 3/10Stencil Computations and HypergraphsRicardo Gayle Jr.

Julian Shun
Cache oblivious stencil computations* Slides

Practical Parallel Hypergraph Algorithms* Slides
The cache complexity of multithreaded cache oblivious algorithms

The pochoir stencil compiler

Fast Stencil Computations using Fast Fourier Transforms
Tuesday 3/15Distributed Graph FrameworksYosef Mihretie

Ian Limarta
Pregel: A System for Large-Scale Graph Processing* Slides

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs* Slides
Distributed Memory Graph Frameworks
Tuesday 3/22Spring Break
Thursday 3/24Spring Break
Tuesday 3/29Locality in Graph Processing and the GraphIt DSLJulian Shun Making Caches Work for Graph Analytics*

GraphIt: A High-Performance DSL for Graph Analytics*

Optimizing Ordered Graph Algorithms with Graphlt*

Slides
Compiling Graph Applications for GPUs with GraphIt

Taming the Zoo: A Unified Graph Compiler Framework for Novel Architectures

Locality Optimizations

Shared Memory Graph Frameworks
Thursday 3/31Dynamic and Streaming Graph AlgorithmsQing Feng

Julian Shun
A New Parallel Algorithm for Connected Components in Dynamic Graphs* Slides

Low-Latency Graph Streaming Using Compressed Purely-Functional Trees* Slides
Dynamic Graph Algorithms

Streaming Graph Frameworks
Tuesday 4/5Spatial ClusteringShangdi Yu
(Guest Lecture)
Parallel algorithms for hierarchical clustering*

ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering using Nearest-Neighbor Chain*
Multi-Threaded Hierarchical Clustering by Parallel Nearest-Neighbor Chaining
Thursday 4/7NVRAM Graph FrameworksYosef Mihretie

Julian Shun
Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory* Slides

Sage: Parallel Semi-Asymmetric Graph Algorithms for NVRAMs* Slides
An Empirical Guide to the Behavior and Use of Scalable Persistent Memory
Tuesday 4/12Subgraph Finding Frameworks

Midterm report due today
Edmund Williams

Kliment Serafimov
Tesseract: distributed, general graph pattern mining on evolving graphs* Slides

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra* Slides
Subgraph Finding
Thursday 4/14Parallel Subgraph ComputationsJessica Shi
(Guest Lecture)
Parallel Clique Counting and Peeling Algorithms*

Theoretically and Practically Efficient Parallel Nucleus Decomposition*
Arboricity and Subgraph Listing Algorithms

Butterfly Counting in Bipartite Networks

Peeling Bipartite Networks for Dense Subgraph Discovery

Listing k-cliques in Sparse Real-World Graphs

KClist++: a simple algorithm for finding k-clique densest subgraphs in large graphs

Ordering Heuristics for k-clique Listing

The K-clique Densest Subgraph Problem

Densest subgraph in streaming and MapReduce
Tuesday 4/19Graph Compression and ReorderingQing Feng

Zhi Wei Gan
Log(graph): a near-optimal high-performance graph representation* Slides

Locality Analysis of Graph Reordering Algorithms* Slides
An Experimental Analysis of a Compact Graph Representation

Compact Representations of Separable Graphs

Chapter 8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable

Graph Compression

Graph Partitioning and Reordering
Thursday 4/21Concurrent Data StructuresSiddhartha Jayanti
(Guest Lecture)
A Randomized Concurrent Algorithm for Disjoint Set Union*

Fast Arrays: Atomic Arrays with Constant Time Initialization*
Randomized Concurrent Set Union and Generalized Wake-Up

ConnectIt: A Framework for Static and Incremental Parallel Graph Connectivity Algorithms

Exploring the Design Space of Static and Incremental Graph Connectivity Algorithms on GPUs

Work-Efficient Parallel Union-Find
Tuesday 4/26Systems for Graph Neural NetworksKliment Serafimov

Yosef Mihretie
Marius: Learning Massive Graph Embeddings on a Single Machine* Slides

ThunderRW: An In-Memory Graph Random Walk Engine* Slides
LightNE: A Lightweight Graph Processing System for Network Embedding

Random Walks on Huge Graphs at Cache Efficiency

Graph Representation Learning Book

CS224W: Machine Learning with Graphs (Stanford course)
Thursday 4/28Spatial ClusteringYiqiu Wang
(Guest Lecture)
Theoretically-Efficient and Practical Parallel DBSCAN*

Fast Parallel Algorithms for Euclidean Minimum Spanning Tree and Hierarchical Spatial Clustering*
A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise

DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN

OPTICS: ordering points to identify the clustering structure

Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection

Fast Euclidean OPTICS with Bounded Precision in Low Dimensional Space

Faster DBScan and HDBScan in Low-Dimensional Euclidean Spaces

Geometric Algorithms for Density-Based Data Clustering

A Faster Algorithm for DBSCAN On the Hardness and Approximation of Euclidean DBSCAN
Tuesday 5/3Graph ClusteringTom Tseng Parallel Index-Based Structural Graph Clustering and Its Approximation* Dynamic Structural Clustering on Graphs

Clustering and Community Detection
Thursday 5/5Suffix Arrays and TreesEdmund Williams

Julian Shun
Simple Linear Work Suffix Array Construction* Slides

A Simple Parallel Cartesian Tree Algorithm and its Application to Parallel Suffix Tree Construction* Slides
Better external memory suffix array construction

Engineering a Lightweight External Memory Suffix Array Construction Algorithm

Fast parallel skew and prefix-doubling suffix array construction on the GPU

Parallel suffix array and least common prefix for the GPU

Fast Parallel Computation of Longest Common Prefixes

ERA: Efficient Serial and Parallel Suffix Tree Construction for Very Long Strings

Parallel lightweight wavelet tree, suffix array and FM-index construction

Practical Parallel Lempel-Ziv Factorization

Suffix arrays: a new method for on-line string searches

A taxonomy of suffix array construction algorithms
Tuesday 5/10Project Presentations

Final Report due today

Course Policy

This is a graduate-level course where we will cover research in algorithm engineering. Advanced undergraduates may enroll if they have taken 6.046 and 6.172. The course units are 3-0-9.

Assignments will be posted and submitted on Canvas.

Lectures

For each lecture we will study several research papers; for each paper one student will give a presentation, which will be followed by a discussion.

Grading

The assignments consist of answering questions about papers, writing paper reviews, completing a problem set, doing several paper presentations, class participation, and completing an open-ended research project. The grading breakdown is as follows.

Grading Breakdown
Paper Reviews15%
Problem Set10%
Paper Presentations20%
Research Project45%
Class Participation10%

You must complete all assignments to pass the class.

Paper Reviews and Problem Set

Prior to each lecture, you are expected to read the papers under "Required Reading" in the schedule for that lecture. In addition, students are required to submit one paper review before every lecture, due at 12:00pm on the day of the lecture. The review should be on a paper chosen from any of the starred papers (*) under "Required Reading" for the lecture.

The review should first describe the problem the paper is trying to solve, why it is important, the main ideas proposed, and the results obtained. The review should then describe how the ideas are novel compared to existing work at the time, the strengths and weaknesses of the paper, and any ideas you may have for improving the techniques and/or evaluation criteria, and any open problems or directions for further work that you can think of. The length should be 3-4 paragraphs.

Finally, there will be a problem set on parallel algorithms to be released in the second week of the semester and due at 11:59pm on Monday 2/28.

Paper Presentations

Students will give several presentations on assigned papers throughout the semester. These presentations should be 25-30 minutes long with slides, followed by a discussion. The presentation should discuss the motivation for the problem being solved, any definitions needed to understand the paper, key technical ideas in the paper, theoretical results and proofs, experimental results, and existing work. Theoretical proofs may be presented on the board. The presentation should also include any relevant content from related work that would be needed to fully understand the paper being presented. The presenter should also present his or her own thoughts on the paper (perceived strengths and weaknesses, directions for future work, etc.), and pose several questions for discussion. The presenter will be expected to lead the discussion on the paper.

Research Project

A large part of the work in this course is in proposing and completing an open-ended research project. The project may involve (but is not limited to) any of the following tasks: implementation of non-trivial and theoretically-efficient algorithms; analyzing and optimizing the performance of existing algorithm implementations; designing and implementing new algorithms that are theoretically and/or practically efficient; applying algorithms in the context of larger applications; and improving or designing new programming frameworks/systems for classes of algorithms. The project may explore parallelism, cache-efficiency, I/O-efficiency, and memory-efficiency. There should be an implementation component to the project. The project can be related to research that you are currently doing, subject to instructor approval.

The project will be done in groups of 1-3 people and consist of a proposal, mid-term report, final presentation, and final report. The timeline for the project is as follows.

AssignmentDue Date
Pre-proposal Meeting3/3
Proposal3/8
Weekly Progress Reports3/18, 4/1, 4/8, 4/15, 4/22, 4/29, 5/6
Mid-term Report4/12
Project Presentations5/10
Final Report5/10

Piazza

We will be using Piazza for answering questions. You may publicly post questions about the papers or any cool ideas that you have. Discussions among students is encouraged. If possible, please use Piazza instead of emailing the course staff. You may use a private note if you want your post to be visible only to the course staff.

Collaboration Policy

You are welcome to read the papers with other students, but all of your written work should be your own. You should carefully acknowledge any collaborators and any outside sources you use to complete the paper reviews, problem set, and the project reports.

Computing Resources

We will be using Amazon Web Services.

Useful Resources

Introduction to Algorithms, 3rd Edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein (CLRS)

Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

Algorithm Design: Parallel and Sequential by Umut Acar and Guy Blelloch

Computational Geometry - Algorithms and Applications, Third Edition by Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars

Networks, Crowds, and Markets by David Easley and Jon Kleinberg

A list of papers related to graph analytics