6.886 Algorithm Engineering Spring 2020

Basic Information

Instructor: Julian Shun
TA: Yiqiu Wang
Schedule: Tuesdays and Thursdays 2:30-4pm in room 1-150
Office hours: By appointment
Email: jshun AT mit.edu, yiqiuw AT mit.edu
Units: 3-0-9
Prerequisites: 6.046, 6.172

Course Description

This subject qualifies as a Computer Systems concentration subject.

This is a research-oriented course on algorithm engineering, which will cover both the theory and practice of algorithms and data structures. Students will learn about models of computation, algorithm design and analysis, and performance engineering of algorithm implementations. We will study the design and implementation of sequential, parallel, cache-efficient, external-memory, and write-efficient algorithms for fundamental problems in computing. Many of the principles of algorithm engineering will be illustrated in the context of parallel algorithms and graph problems. Students will read and present research papers, write paper reviews, complete assignments that involve both theory and implementation, participate in classroom discussions, and complete a semester-long research project. Class time will consist of lectures, student presentations, and group project meetings. This course is suitable for graduate students or advanced undergraduates who have taken 6.046 and 6.172. Mathematical maturity and familiarity with algorithm analysis and performance engineering will be assumed.

Skip to Course Policy


Schedule (tentative)

DateTopicSpeakerRequired ReadingOptional Reading
Tuesday 2/4Course IntroductionJulian Shun Algorithm Engineering - An Attempt at a Definition

A Theoretician's Guide to the Experimental Analysis of Algorithms

Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

A Guide to Experimental Algorithmics

Algorithm engineering: an attempt at a definition using sorting as an example

Algorithm Engineering for Parallel Computation

Distributed Algorithm Engineering

Experimental algorithmics

Programming Pearls

Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time
Thursday 2/6Parallel AlgorithmsJulian ShunParallel Algorithms

Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques (Chapters 4-8)

CLRS Chapter 27

Prefix Sums and Their Applications

Algorithm Design: Parallel and Sequential

Introduction to Parallel Algorithms

Scheduling Multithreaded Computations by Work Stealing

Thread Scheduling for Multiprogrammed Multiprocessors
Tuesday 2/11Parallel Graph TraversalRahul Yesantharao

Shwetark Patel

Sai Sameer Pusapaty
Direction-Optimizing Breadth-First Search* Slides

A Faster Algorithm for Betweenness Centrality Slides

The More the Merrier: Efficient Multi-Source Graph Traversal* Slides

A Work-Efficient Parallel Breadth-First Search Algorithm (or How to Cope with the Nondeterminism of Reducers)

Internally Deterministic Parallel Algorithms Can Be Fast

SlimSell: A Vectorizable Graph Representation for Breadth-First Search

Chapter 3.6 of Networks, Crowds, and Markets (describes Betweenness Centrality with an example)

Better Approximation of Betweenness Centrality

ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages

KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation

Fast approximation of betweenness centrality through sampling

Scalable Betweenness Centrality Maximization via Sampling

Articulation Points Guided Redundancy Elimination for Betweenness Centrality

Betweenness Centrality: Algorithms and Implementations
Thursday 2/13Shared-Memory Graph AlgorithmsJulian Shun Parallel Graph Decompositions Using Random Shifts*

A Simple and Practical Linear-Work Parallel Algorithm for Connectivity* Slides

Chapters 7-8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable Slides
Improved Parallel Algorithms for Spanners and Hopsets

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

Shared Memory Graph Frameworks

Chapter 14 of Networks, Crowds, and Markets
Tuesday 2/18No Class (Presidents Day)
Thursday 2/20Parallel SortingJessica Zhu

Endrias Kahssay
In-place Parallel Super Scalar Samplesort (IPS4o)* Slides

Theoretically-Efficient and Practical Parallel In-Place Radix Sorting* Slides
Super Scalar Sample Sort

Parallel Merge Sort

Parallelism in Randomized Incremental Algorithms

Optimal and sublogarithmic time randomized parallel sorting algorithms

A comparison of sorting algorithms for the connection machine CM-2

Efficient implementation of sorting on multi-core SIMD CPU architecture

Sorting Data on Ultra-Large Scale with RADULS

Even Faster Sorting of (Not Only) Integers

Highly scalable parallel sorting

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Designing efficient sorting algorithms for manycore GPUs

TritonSort: A Balanced Large-Scale Sorting System

A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort

Practical Massively Parallel Sorting

Engineering a Multi-core Radix Sort

PARADIS: An Efficient Parallel Algorithm for In-place Radix Sort

Sort Benchmark
Tuesday 2/25Delaunay Triangulation Anton Cao

Rahul Yesantharao
Computational Geometry - Algorithms and Applications (Chapter 9)* Slides

One machine, one minute, three billion tetrahedra* Slides
Design and Implementation of a Practical Parallel Delaunay Algorithm

Parallel d-D Delaunay Triangulations in Shared and Distributed Memory

Engineering a compact parallel delaunay algorithm in 3D

Internally Deterministic Parallel Algorithms Can Be Fast

Parallelism in Randomized Incremental Algorithms

TIPP: parallel Delaunay triangulation for large-scale datasets
Thursday 2/27Spatial ClusteringShwetark Patel

Yiqiu Wang
On the Hardness and Approximation of Euclidean DBSCAN* Slides

Theoretically-Efficient and Practical Parallel DBSCAN* Slides
A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise

DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN

OPTICS: ordering points to identify the clustering structure

Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection

Fast Euclidean OPTICS with Bounded Precision in Low Dimensional Space

Faster DBScan and HDBScan in Low-Dimensional Euclidean Spaces

Geometric Algorithms for Density-Based Data Clustering

A Faster Algorithm for DBSCAN
Tuesday 3/3External-Memory AlgorithmsBryan Chen

Sophia Luo
The input/output complexity of sorting and related problems* Slides

A Functional Approach to External Graph Algorithms* Slides

Algorithms and Data Structures for External Memory

I/O-Complexity of Graph Algorithms

External-Memory Graph Algorithms

Beyond Synchronous: New Techniques for External-Memory Graph Connectivity and Minimum Spanning Forest

Fundamental parallel algorithms for private-cache chip multiprocessors

Parallel External Memory Graph Algorithms

Efficient Parallel and External Matching

External-Memory Computational Geometry

External Memory Geometric Data Structures

A Bulk-Parallel Priority Queue in External Memory with STXXL
Thursday 3/5Cache-Oblivious Algorithms Obada Alkhatib

William Qian
Cache-Oblivious Algorithms* Slides

Engineering a cache-oblivious sorting algorithm* Slides
Cache-Oblivious Algorithms and Data Structures

Cache Oblivious Distribution Sweeping

Cache-oblivious databases: Limitations and opportunities

An Optimal Cache-Oblivious Priority Queue and Its Application to Graph Algorithms

Cache-Oblivious B-Trees

Cache-oblivious streaming B-trees

An experimental comparison of cache-oblivious and cache-conscious programs
Tuesday 3/10Pre-proposal Meeting
Thursday 3/12Parallel Cache-Oblivious Algorithms Julian Shun Low Depth Cache-Oblivious Algorithms* Slides

Multicore Triangle Computations Without Tuning* Slides

Parallel and I/O efficient set covering algorithms

Resource Oblivious Sorting on Multicores

Scheduling Irregular Parallel Computations on Hierarchical Caches

The Cache Complexity of Multithreaded Cache Oblivious Algorithms

Cache-efficient dynamic programming algorithms for multicores

Provably good multicore cache performance for divide-and-conquer algorithms

Concurrent cache-oblivious B-trees

The Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework, Parallelization and Experimental Evaluation

AUTOGEN: automatic discovery of cache-oblivious parallel recursive algorithms for solving dynamic programs

Provably Efficient Scheduling of Cache-oblivious Wavefront Algorithms

Oblivious algorithms for multicores and networks of processors

Algorithmic Aspects of Triangle-Based Network Analysis

Triangle Listing Algorithms: Back from the Diversion

The Input/Output Complexity of Triangle Enumeration

I/O-efficient join dependency testing, Loomis-Whitney join, and triangle enumeration

2-3 Cuckoo Filters for Faster Triangle Listing and Set Intersection
Tuesday 3/17No class
Thursday 3/19No class
Tuesday 3/24Spring break
Thursday 3/26Spring break
Tuesday 3/31Project Proposal due today.
Tuesday 3/31Distributed Graph FrameworksShwetark Patel

Rahul Yesantharao
Pregel: A System for Large-Scale Graph Processing* Slides

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs* Slides
Distributed Memory Graph Frameworks
Thursday 4/2Disk-based Graph FrameworksAbdullah Alomar

Jessica Zhu

Endrias Kahssay
Scalability! But at what COST? Slides

GraphChi: Large-Scale Graph Computation on Just a PC* Slides

LUMOS: Dependency-Driven Disk-based Graph Processing* Slides
External Memory Graph Frameworks
Friday 4/3Problem Set due today.
Tuesday 4/7Dynamic Graph AlgorithmsJessica Zhu

Aron Ricardo Perez-Lopez

Obada Alkhatib
A New Parallel Algorithm for Connected Components in Dynamic Graphs* Slides

Work-Efficient Parallel Union-Find* Slides

Exact and Parallel Triangle Counting in Dynamic Graphs* Slides
Dynamic Graph Algorithms
Thursday 4/9Streaming Graph FrameworksAnton Cao

Julian Shun
GraphOne: A Data Store for Real-time Analytics on Evolving Graphs* Slides

Low-Latency Graph Streaming Using Compressed Purely-Functional Trees* Slides
Streaming Graph Frameworks
Tuesday 4/14Subgraph FindingSai Sameer Pusapaty

Abdullah Alomar
EmptyHeaded: A Relational Engine for Graph Processing* Slides

AutoMine: Harmonizing High-Level Abstraction and High Performance for Graph Mining* Slides
Skew Strikes Back: New Developments in the Theory of Join Algorithms

Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins

Subgraph Finding
Thursday 4/16JoinsWilliam Qian

Aron Ricardo Perez-Lopez
Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited* Slides

An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory* Slides
Main-Memory Hash Joins on Modern Processor Architectures

Sort vs. Hash Join Revisited for Near-Memory Execution

Many-query join: efficient shared execution of relational joins on modern hardware
Tuesday 4/21Mid-term Report due today.
Tuesday 4/21Making Graph Computations Fast, Simple, and PortableYunming Zhang
(Guest Lecture)
Reducing Pagerank Communication via Propagation Blocking*

Making Caches Work for Graph Analytics*

GraphIt: A High-Performance DSL for Graph Analytics*

Locality Optimizations

Shared Memory Graph Frameworks
Thursday 4/23Graph Compression and ReorderingEndrias Kahssay

Sophia Luo
An Experimental Analysis of a Compact Graph Representation* Slides

Speedup Graph Processing by Graph Ordering* Slides
Compact Representations of Separable Graphs

Chapter 8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable

Graph Compression

Graph Partitioning and Reordering
Tuesday 4/28Compression

Problem Set Review
Abdullah Alomar

Julian Shun
An Experimental Study of Bitmap Compression vs. Inverted List Compression* Slides

Problem Set Review
Decoding billions of integers per second through vectorization*

Stream VByte: Faster byte-oriented integer compression

Roaring bitmaps: Implementation of an optimized software library

SIMD compression and the intersection of sorted integers

A General SIMD-Based Approach to Accelerating Compression Algorithms

Lightweight Data Compression Algorithms: An Experimental Survey

Compact inverted index storage using general-purpose compression libraries

An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication

SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms
Thursday 4/30FiltersAron Ricardo Perez-Lopez

William Qian
Don't thrash: how to cache your hash on flash* Slides

Morton filters: fast, compressed sparse cuckoo filters* Slides

Space/time trade-offs in hash coding with allowable errors

Network Applications of Bloom Filters: A Survey

Compressed Bloom filters

Cache-, Hash-, and Space-Efficient Bloom Filters

Cuckoo Filter: Practically Better Than Bloom

2-3 Cuckoo Filters for Faster Triangle Listing and Set Intersection

The Dynamic Bloom Filters

A General-Purpose Counting Filter: Making Every Bit Count

Performance-Optimal Filtering: Bloom Overtakes Cuckoo at High Throughput

The Case for Learned Index Structures

A Model for Learned Bloom Filters, and Optimizing by Sandwiching
Tuesday 5/5Parallel Bucketing AlgorithmsJulian Shun Fast algorithms for determining (generalized) core groups in social networks

Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing* Slides

Parallel Algorithms for Butterfly Computations* Slides
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

K-Core Decomposition of Large Networks on a Single PC

Patterns and Anomalies in k-Cores of Real-World Graphs with Applications

Delta-stepping: a parallelizable shortest path algorithm

Parallel Shortest Paths Using Radius Stepping

Arboricity and Subgraph Listing Algorithms

Butterfly Counting in Bipartite Networks

Peeling Bipartite Networks for Dense Subgraph Discovery
Thursday 5/7Suffix Arrays and Trees Linear work suffix array construction* Slides

A Simple Parallel Cartesian Tree Algorithm and its Application to Parallel Suffix Tree Construction* Slides
Better external memory suffix array construction

Engineering a Lightweight External Memory Suffix Array Construction Algorithm

Fast parallel skew and prefix-doubling suffix array construction on the GPU

Parallel suffix array and least common prefix for the GPU

Fast Parallel Computation of Longest Common Prefixes

ERA: Efficient Serial and Parallel Suffix Tree Construction for Very Long Strings

Parallel lightweight wavelet tree, suffix array and FM-index construction

Practical Parallel Lempel-Ziv Factorization

Suffix arrays: a new method for on-line string searches

A taxonomy of suffix array construction algorithms
Tuesday 5/12Project Presentations

Course Policy

This is a graduate-level course where we will cover research in algorithm engineering. Advanced undergraduates may enroll if they have taken 6.046 and 6.172. The course units are 3-0-9.


For each lecture we will study several research papers; for each paper one student will give a presentation, which will be followed by a discussion.


The assignments consist of answering questions about papers, writing paper reviews, completing a problem set, doing several paper presentations, class participation, and completing an open-ended research project. The grading breakdown is as follows.

Grading Breakdown
Paper Reviews10%
Paper Questions and Problem Set15%
Paper Presentations20%
Research Project50%
Class Participation5%

You must complete all assignments to pass the class.

Paper Reviews, Paper Questions, and Problem Set

Prior to each lecture, you are expected to read the papers under "Required Reading" in the schedule for that lecture. Students are required to answer a question about each paper, due at 12:00pm on the day of the lecture.

In addition, students are required to submit one paper review every week, due 11:59pm on Mondays. The review should be on a paper chosen from any of the starred papers (*) under "Required Reading" for the two lectures the week (i.e., the Tuesday and Thursday immediately after the due date).

The review should first describe the problem the paper is trying to solve, why it is important, the main ideas proposed, and the results obtained. The review should then describe how the ideas are novel compared to existing work at the time, the strengths and weaknesses of the paper, and any ideas you may have for improving the techniques and/or evaluation criteria, and any open problems or directions for further work that you can think of. The length should be 3-4 paragraphs.

The paper reviews will be submitted on Learning Modules. The paper reviews will be made visible after each submission deadline, and you are encouraged to read other reviews to improve your understanding and to prepare for the class discussion.

Finally, there will be a problem set on parallel algorithms to be released several weeks into the semester and due on 3/20.

Paper Presentations

Students will give several presentations on assigned papers throughout the semester. These presentations should be 25-30 minutes long with slides, followed by a discussion. The presentation should discuss the motivation for the problem being solved, any definitions needed to understand the paper, key technical ideas in the paper, theoretical results and proofs, experimental results, and existing work. Theoretical proofs may be presented on the board. The presentation should also include any relevant content from related work that would be needed to fully understand the paper being presented. The presenter should also present his or her own thoughts on the paper (perceived strengths and weaknesses, directions for future work, etc.), and pose several questions for discussion. The presenter will be expected to lead the discussion on the paper.

Research Project

A large part of the work in this course is in proposing and completing an open-ended research project. The project may involve (but is not limited to) any of the following tasks: implementation of non-trivial and theoretically-efficient algorithms; analyzing and optimizing the performance of existing algorithm implementations; designing and implementing new algorithms that are theoretically and/or practically efficient; applying algorithms in the context of larger applications; and improving or designing new programming frameworks/systems for classes of algorithms. The project may explore parallelism, cache-efficiency, I/O-efficiency, and memory-efficiency. There should be an implementation component to the project. The project can be related to research that you are currently doing, subject to instructor approval.

The project will be done in groups of 1-3 people and consist of a proposal, mid-term report, final presentation, and final report. The timeline for the project is as follows.

AssignmentDue Date
Pre-proposal Meeting3/10
Weekly Progress Reports4/3, 4/10, 4/24, 5/1, 5/8
Mid-term Report4/21
Project Presentations5/12
Final Report5/12


We will be using Piazza for answering questions. You may publicly post questions about the papers or any cool ideas that you have. Discussions among students is encouraged. If possible, please use Piazza instead of emailing the course staff. You may use a private note if you want your post to be visible only to the course staff.

Collaboration Policy

You are welcome to read the papers with other students, but all of your written work should be your own. You should carefully acknowledge any collaborators and any outside sources you use to complete the paper questions, paper reviews, and the project reports. Please do not look at anyone else's answers before submitting your answers to the paper questions and your paper reviews.

Computing Resources

We will be using Amazon Web Services.

Useful Resources

Introduction to Algorithms, 3rd Edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein (CLRS)

Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

Algorithm Design: Parallel and Sequential by Umut Acar and Guy Blelloch

Computational Geometry - Algorithms and Applications, Third Edition by Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars

Networks, Crowds, and Markets by David Easley and Jon Kleinberg

A list of papers related to graph analytics