6.886 Algorithm Engineering Spring 2021

Basic Information

Instructor: Julian Shun
Guest Facilitator: Laxman Dhulipala
Schedule: Tuesdays and Thursdays 2:30-4pm ET on Zoom
Office hours: By appointment
Email: jshun AT mit.edu, laxman AT mit.edu
Units: 3-0-9
Prerequisites: 6.046, 6.172

Course Description

This subject qualifies as a Computer Systems concentration subject.

This is a research-oriented course on algorithm engineering, which will cover both the theory and practice of algorithms and data structures. Students will learn about models of computation, algorithm design and analysis, and performance engineering of algorithm implementations. We will study the design and implementation of sequential, parallel, cache-efficient, external-memory, and write-efficient algorithms for fundamental problems in computing. Many of the principles of algorithm engineering will be illustrated in the context of parallel algorithms and graph problems. Students will read and present research papers, write paper reviews, complete assignments that involve both theory and implementation, participate in classroom discussions, and complete a semester-long research project. Class time will consist of lectures, student presentations, and group project meetings. This course is suitable for graduate students or advanced undergraduates who have taken 6.046 and 6.172. Mathematical maturity and familiarity with algorithm analysis and performance engineering will be assumed.

Lectures will consist of instructor and student presentations and will happen live on Zoom. Lecture attendance is required and participation counts toward the grade. For students in distant timezones where lecture attendance is difficult, their presentations can be prerecorded, and their participation score can be made up by completing additional assignments.

Skip to Course Policy


Schedule (tentative)

DateTopicSpeakerRequired ReadingOptional Reading
Tuesday 2/16Course IntroductionJulian Shun Algorithm Engineering - An Attempt at a Definition

A Theoretician's Guide to the Experimental Analysis of Algorithms

Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

A Guide to Experimental Algorithmics

Algorithm engineering: an attempt at a definition using sorting as an example

Algorithm Engineering for Parallel Computation

Distributed Algorithm Engineering

Experimental algorithmics

Programming Pearls

Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time
Thursday 2/18Parallel AlgorithmsJulian ShunParallel Algorithms

Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques (Chapters 4-8)

CLRS Chapter 27

Prefix Sums and Their Applications

Algorithm Design: Parallel and Sequential

Introduction to Parallel Algorithms

Scheduling Multithreaded Computations by Work Stealing

Thread Scheduling for Multiprogrammed Multiprocessors

Problem Based Benchmark Suite
Tuesday 2/23Parallel Graph TraversalSiddhartha Jayanti

Victor Ying
Direction-Optimizing Breadth-First Search* Slides

A Faster Algorithm for Betweenness Centrality Slides

The More the Merrier: Efficient Multi-Source Graph Traversal* Slides

A Work-Efficient Parallel Breadth-First Search Algorithm (or How to Cope with the Nondeterminism of Reducers)

Internally Deterministic Parallel Algorithms Can Be Fast

SlimSell: A Vectorizable Graph Representation for Breadth-First Search

Chapter 3.6 of Networks, Crowds, and Markets (describes Betweenness Centrality with an example)

Better Approximation of Betweenness Centrality

ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages

KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation

Fast approximation of betweenness centrality through sampling

Scalable Betweenness Centrality Maximization via Sampling

Articulation Points Guided Redundancy Elimination for Betweenness Centrality

Betweenness Centrality: Algorithms and Implementations
Thursday 2/25Shared-Memory Graph AlgorithmsVictor Ying

Julian Shun
Parallel Graph Decompositions Using Random Shifts* Slides

A Simple and Practical Linear-Work Parallel Algorithm for Connectivity* Slides

Chapters 7-8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable Slides
Improved Parallel Algorithms for Spanners and Hopsets

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

Shared Memory Graph Frameworks

Chapter 14 of Networks, Crowds, and Markets
Tuesday 3/2Sorting and In-Place AlgorithmsJulian Shun

Terryn Brunelle
Parallel In-Place Algorithms: Theory and Practice* Slides

Theoretically-Efficient and Practical Parallel In-Place Radix Sorting* Slides

The Case for a Learned Sorting Algorithm* Slides
In-place Parallel Super Scalar Samplesort (IPS4o)

Super Scalar Sample Sort

Parallel Merge Sort

Parallelism in Randomized Incremental Algorithms

Optimal and sublogarithmic time randomized parallel sorting algorithms

A comparison of sorting algorithms for the connection machine CM-2

Efficient implementation of sorting on multi-core SIMD CPU architecture

Sorting Data on Ultra-Large Scale with RADULS

Even Faster Sorting of (Not Only) Integers

Highly scalable parallel sorting

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Designing efficient sorting algorithms for manycore GPUs

TritonSort: A Balanced Large-Scale Sorting System

A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort

Practical Massively Parallel Sorting

Engineering a Multi-core Radix Sort

PARADIS: An Efficient Parallel Algorithm for In-place Radix Sort

The Case for Learned Index Structures

Sort Benchmark
Thursday 3/4External-Memory AlgorithmsJulian Shun

Tao Sun
The input/output complexity of sorting and related problems*

A Functional Approach to External Graph Algorithms* Slides
Algorithms and Data Structures for External Memory

I/O-Complexity of Graph Algorithms

External-Memory Graph Algorithms

Beyond Synchronous: New Techniques for External-Memory Graph Connectivity and Minimum Spanning Forest

Fundamental parallel algorithms for private-cache chip multiprocessors

Parallel External Memory Graph Algorithms

Efficient Parallel and External Matching

External-Memory Computational Geometry

External Memory Geometric Data Structures

A Bulk-Parallel Priority Queue in External Memory with STXXL
Tuesday 3/9No class
Thursday 3/11Cache-Oblivious Algorithms Viktor Urvantsev

Sualeh Asif
Cache-Oblivious Algorithms* Slides

Engineering a cache-oblivious sorting algorithm* Slides
Cache-Oblivious Algorithms and Data Structures

Cache Oblivious Distribution Sweeping

Cache-oblivious databases: Limitations and opportunities

An Optimal Cache-Oblivious Priority Queue and Its Application to Graph Algorithms

Cache-Oblivious B-Trees

Cache-oblivious streaming B-trees

An experimental comparison of cache-oblivious and cache-conscious programs
Tuesday 3/16Graph Based Benchmark SuiteLaxman Dhulipala Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing*

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable*

Fast algorithms for determining (generalized) core groups in social networks

K-Core Decomposition of Large Networks on a Single PC

Patterns and Anomalies in k-Cores of Real-World Graphs with Applications

Delta-stepping: a parallelizable shortest path algorithm

Parallel Shortest Paths Using Radius Stepping

The Graph Based Benchmark Suite (GBBS)

GBBS Website
Thursday 3/18Pre-proposal Meeting
Tuesday 3/23No class
Thursday 3/25Parallel Cache-Oblivious Algorithms

Project proposal due today
Jingnan Shi

Julian Shun
Low Depth Cache-Oblivious Algorithms* Slides

Multicore Triangle Computations Without Tuning* Slides
Parallel and I/O efficient set covering algorithms

Resource Oblivious Sorting on Multicores

Scheduling Irregular Parallel Computations on Hierarchical Caches

The Cache Complexity of Multithreaded Cache Oblivious Algorithms

Cache-efficient dynamic programming algorithms for multicores

Provably good multicore cache performance for divide-and-conquer algorithms

Concurrent cache-oblivious B-trees

The Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework, Parallelization and Experimental Evaluation

AUTOGEN: automatic discovery of cache-oblivious parallel recursive algorithms for solving dynamic programs

Provably Efficient Scheduling of Cache-oblivious Wavefront Algorithms

Oblivious algorithms for multicores and networks of processors

Algorithmic Aspects of Triangle-Based Network Analysis

Triangle Listing Algorithms: Back from the Diversion

The Input/Output Complexity of Triangle Enumeration

I/O-efficient join dependency testing, Loomis-Whitney join, and triangle enumeration

2-3 Cuckoo Filters for Faster Triangle Listing and Set Intersection
Tuesday 3/30Distributed Graph FrameworksSualeh Asif

Nellie Wu
Pregel: A System for Large-Scale Graph Processing* Slides

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs* Slides
Distributed Memory Graph Frameworks
Thursday 4/1Disk-based Graph FrameworksJulian Shun

Terryn Brunelle

Tao Sun
Scalability! But at what COST? Slides

GraphChi: Large-Scale Graph Computation on Just a PC* Slides

GraphWalker: An I/O-Efficient and Resource-Friendly Graph Analytic System for Fast and Scalable Random Walks* Slides
External Memory Graph Frameworks
Friday 4/2Problem Set due today
Tuesday 4/6Locality in Graph Processing and the GraphIt DSLNellie Wu

Julian Shun
Making Caches Work for Graph Analytics* Slides

GraphIt: A High-Performance DSL for Graph Analytics* Slides

Optimizing Ordered Graph Algorithms with Graphlt*
Compiling Graph Applications for GPUs with GraphIt

Locality Optimizations

Shared Memory Graph Frameworks
Thursday 4/8NVRAM and Streaming Graph FrameworksLaxman Dhulipala Sage: Parallel Semi-Asymmetric Graph Algorithms for NVRAMs*

Low-Latency Graph Streaming Using Compressed Purely-Functional Trees*

Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory

An Empirical Guide to the Behavior and Use of Scalable Persistent Memory

Streaming Graph Frameworks
Tuesday 4/13Dynamic and Distributed Subgraph CountingQuanquan Liu Parallel Batch-Dynamic k-Clique Counting*

Parallel Algorithms for Small Subgraph Counting*

Dynamic Graph Algorithms
Thursday 4/15Subgraph Finding FrameworksJingnan Shi

Xuhao Chen
EmptyHeaded: A Relational Engine for Graph Processing* Slides

Pangolin: An Efficient and Flexible Graph Mining System on CPU and GPU* Slides
Subgraph Finding
Tuesday 4/20No class
Thursday 4/22Parallel Subgraph Finding and PeelingJessica Shi Parallel Algorithms for Butterfly Computations*

Parallel Clique Counting and Peeling Algorithms*

Arboricity and Subgraph Listing Algorithms

Butterfly Counting in Bipartite Networks

Peeling Bipartite Networks for Dense Subgraph Discovery

Listing k-cliques in Sparse Real-World Graphs

KClist++: a simple algorithm for finding k-clique densest subgraphs in large graphs

Ordering Heuristics for k-clique Listing

The K-clique Densest Subgraph Problem

Densest subgraph in streaming and MapReduce
Tuesday 4/27Graph Coloring and GPU Graph Processing

Mid-term Report due today
Viktor Urvantsev

Julian Shun
High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality* Slides

Compiling Graph Applications for GPUs with GraphIt* Slides
Ordering heuristics for parallel graph coloring

A Parallel Graph Coloring Heuristic

Scalable parallel graph coloring algorithms

A Scalable Parallel Graph Coloring Algorithm for Distributed Memory Computers

Parallel Graph Coloring for Manycore Architectures

Algorithms for Balanced Graph Colorings with Applications in Parallel Computing

Effective and Efficient Dynamic Graph Coloring

Executing Dynamic Data-Graph Computations Deterministically Using Chromatic Scheduling

Effective and Efficient Dynamic Graph Coloring

GPU Graph Frameworks
Thursday 4/29Aggregation and JoinTao Sun

Terryn Brunelle
Cache-Efficient Aggregation: Hashing Is Sorting* Slides

Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited* Slides
A Top-Down Parallel Semisort

Main-Memory Hash Joins on Modern Processor Architectures

Sort vs. Hash Join Revisited for Near-Memory Execution

Many-query join: efficient shared execution of relational joins on modern hardware

An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory

Optimal and sublogarithmic time randomized parallel sorting algorithms
Tuesday 5/4Union Find Siddhartha Jayanti

Laxman Dhulipala
A Randomized Concurrent Algorithm for Disjoint Set Union* Slides

ConnectIt: A Framework for Static and Incremental Parallel Graph Connectivity Algorithms* Slides
Randomized Concurrent Set Union and Generalized Wake-Up

Exploring the Design Space of Static and Incremental Graph Connectivity Algorithms on GPUs

Work-Efficient Parallel Union-Find
Thursday 5/6Graph Compression and ReorderingNellie Wu

Victor Ying
An Experimental Analysis of a Compact Graph Representation* Slides

Combining Data Duplication and Graph Reordering to Accelerate Parallel Graph Processing* Slides
Compact Representations of Separable Graphs

Chapter 8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable

Graph Compression

Graph Partitioning and Reordering
Tuesday 5/11Graph ClusteringLaxman Dhulipala Affinity clustering: hierarchical clustering at scale* Slides Clustering and Community Detection
Thursday 5/13Suffix Arrays and TreesSualeh Asif

Julian Shun
Simple Linear Work Suffix Array Construction* Slides

A Simple Parallel Cartesian Tree Algorithm and its Application to Parallel Suffix Tree Construction* Slides
Better external memory suffix array construction

Engineering a Lightweight External Memory Suffix Array Construction Algorithm

Fast parallel skew and prefix-doubling suffix array construction on the GPU

Parallel suffix array and least common prefix for the GPU

Fast Parallel Computation of Longest Common Prefixes

ERA: Efficient Serial and Parallel Suffix Tree Construction for Very Long Strings

Parallel lightweight wavelet tree, suffix array and FM-index construction

Practical Parallel Lempel-Ziv Factorization

Suffix arrays: a new method for on-line string searches

A taxonomy of suffix array construction algorithms
Tuesday 5/18Local Graph ClusteringLaxman Dhulipala Local Graph Partitioning using PageRank Vectors*

Local Higher-Order Graph Clustering*
A Local Clustering Algorithm for Massive Graphs and Its Application to Nearly Linear Time Graph Partitioning

Parallel Local Graph Clustering
Thursday 5/20Project Presentations

Final Report due today

Course Policy

This is a graduate-level course where we will cover research in algorithm engineering. Advanced undergraduates may enroll if they have taken 6.046 and 6.172. The course units are 3-0-9.

Assignments will be posted and submitted on Canvas.


For each lecture we will study several research papers; for each paper one student will give a presentation, which will be followed by a discussion.


The assignments consist of answering questions about papers, writing paper reviews, completing a problem set, doing several paper presentations, class participation, and completing an open-ended research project. The grading breakdown is as follows.

Grading Breakdown
Paper Reviews15%
Problem Set10%
Paper Presentations20%
Research Project45%
Class Participation10%

You must complete all assignments to pass the class.

Paper Reviews and Problem Set

Prior to each lecture, you are expected to read the papers under "Required Reading" in the schedule for that lecture. In addition, students are required to submit one paper review before every lecture, due at 12:00pm on the day of the lecture. The review should be on a paper chosen from any of the starred papers (*) under "Required Reading" for the lecture.

The review should first describe the problem the paper is trying to solve, why it is important, the main ideas proposed, and the results obtained. The review should then describe how the ideas are novel compared to existing work at the time, the strengths and weaknesses of the paper, and any ideas you may have for improving the techniques and/or evaluation criteria, and any open problems or directions for further work that you can think of. The length should be 3-4 paragraphs.

Finally, there will be a problem set on parallel algorithms to be released several weeks into the semester and due at 11:59pm on 4/2.

Paper Presentations

Students will give several presentations on assigned papers throughout the semester. These presentations should be 25-30 minutes long with slides, followed by a discussion. The presentation should discuss the motivation for the problem being solved, any definitions needed to understand the paper, key technical ideas in the paper, theoretical results and proofs, experimental results, and existing work. Theoretical proofs may be presented on the board. The presentation should also include any relevant content from related work that would be needed to fully understand the paper being presented. The presenter should also present his or her own thoughts on the paper (perceived strengths and weaknesses, directions for future work, etc.), and pose several questions for discussion. The presenter will be expected to lead the discussion on the paper.

Research Project

A large part of the work in this course is in proposing and completing an open-ended research project. The project may involve (but is not limited to) any of the following tasks: implementation of non-trivial and theoretically-efficient algorithms; analyzing and optimizing the performance of existing algorithm implementations; designing and implementing new algorithms that are theoretically and/or practically efficient; applying algorithms in the context of larger applications; and improving or designing new programming frameworks/systems for classes of algorithms. The project may explore parallelism, cache-efficiency, I/O-efficiency, and memory-efficiency. There should be an implementation component to the project. The project can be related to research that you are currently doing, subject to instructor approval.

The project will be done in groups of 1-3 people and consist of a proposal, mid-term report, final presentation, and final report. The timeline for the project is as follows.

AssignmentDue Date
Pre-proposal Meeting3/18
Weekly Progress Reports4/2, 4/9, 4/16, 4/23, 4/30, 5/7, 5/14
Mid-term Report4/27
Project Presentations5/20
Final Report5/20


We will be using Piazza for answering questions. You may publicly post questions about the papers or any cool ideas that you have. Discussions among students is encouraged. If possible, please use Piazza instead of emailing the course staff. You may use a private note if you want your post to be visible only to the course staff.

Collaboration Policy

You are welcome to read the papers with other students, but all of your written work should be your own. You should carefully acknowledge any collaborators and any outside sources you use to complete the paper reviews, problem set, and the project reports.

Computing Resources

We will be using Amazon Web Services.

Useful Resources

Introduction to Algorithms, 3rd Edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein (CLRS)

Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

Algorithm Design: Parallel and Sequential by Umut Acar and Guy Blelloch

Computational Geometry - Algorithms and Applications, Third Edition by Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars

Networks, Crowds, and Markets by David Easley and Jon Kleinberg

A list of papers related to graph analytics