6.886 Algorithm Engineering Spring 2019

Basic Information

Instructor: Prof. Julian Shun
Schedule: Tuesdays and Thursdays 2:30-4pm in room 1-150
Office hours: By appointment
Email: jshun AT mit.edu
Units: 3-0-9
Prerequisites: 6.046, 6.172

Course Description

This subject qualifies as a Computer Systems concentration subject.

This is a research-oriented course on algorithm engineering, which will cover both the theory and practice of algorithms and data structures. Students will learn about models of computation, algorithm design and analysis, and performance engineering of algorithm implementations. We will study the design and implementation of sequential, parallel, cache-efficient, external-memory, and write-efficient algorithms for fundamental problems in computing. Many of the principles of algorithm engineering will be illustrated in the context of parallel algorithms and graph problems. Students will read and present research papers, participate in class discussions, and complete a semester-long research project. Class time will consist of lectures, student presentations, and group project meetings. This course is suitable for graduate students or advanced undergraduates who have taken 6.046 and 6.172. Mathematical maturity and familiarity with performance engineering will be assumed.

Skip to Course Policy


Schedule (tentative)

DateTopicSpeakerRequired ReadingOptional Reading
Tuesday 2/5Course IntroductionJulian Shun Algorithm Engineering - An Attempt at a Definition

A Theoretician's Guide to the Experimental Analysis of Algorithms

Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

A Guide to Experimental Algorithmics

Algorithm engineering: an attempt at a definition using sorting as an example

Algorithm Engineering for Parallel Computation

Distributed Algorithm Engineering

Experimental algorithmics

Programming Pearls

Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time
Thursday 2/7Parallel AlgorithmsJulian ShunParallel Algorithms

Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques (Chapters 4-8)

CLRS Chapter 27

Prefix Sums and Their Applications

Algorithm Design: Parallel and Sequential

Introduction to Parallel Algorithms
Tuesday 2/12Parallel Graph TraversalPatrick Insinger

Taylor Andrews

Jeremy Bogle
Direction-Optimizing Breadth-First Search* Slides

A Faster Algorithm for Betweenness Centrality Slides

The More the Merrier: Efficient Multi-Source Graph Traversal* Slides
A Work-Efficient Parallel Breadth-First Search Algorithm (or How to Cope with the Nondeterminism of Reducers)

Internally Deterministic Parallel Algorithms Can Be Fast

SlimSell: A Vectorizable Graph Representation for Breadth-First Search

Chapter 3.6 of Networks, Crowds, and Markets (describes Betweenness Centrality with an example)

Better Approximation of Betweenness Centrality

ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages

KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation

Fast approximation of betweenness centrality through sampling

Scalable Betweenness Centrality Maximization via Sampling

Articulation Points Guided Redundancy Elimination for Betweenness Centrality

Betweenness Centrality: Algorithms and Implementations
Thursday 2/14Shared-Memory Graph AlgorithmsJessica Shi

Julian Shun
Parallel Graph Decompositions Using Random Shifts* Slides

A Simple and Practical Linear-Work Parallel Algorithm for Connectivity* Slides

Chapters 7-8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable Slides
Improved Parallel Algorithms for Spanners and Hopsets

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

Shared Memory Graph Frameworks

Chapter 14 of Networks, Crowds, and Markets
Tuesday 2/19No Class (Presidents Day)
Thursday 2/21External-Memory Algorithms Charlotte Chen

Amartya Shankha Biswas
The input/output complexity of sorting and related problems* Slides

A Functional Approach to External Graph Algorithms* Slides
Algorithms and Data Structures for External Memory

I/O-Complexity of Graph Algorithms

External-Memory Graph Algorithms

Beyond Synchronous: New Techniques for External-Memory Graph Connectivity and Minimum Spanning Forest

Fundamental parallel algorithms for private-cache chip multiprocessors

Parallel External Memory Graph Algorithms

Efficient Parallel and External Matching

External-Memory Computational Geometry

External Memory Geometric Data Structures

A Bulk-Parallel Priority Queue in External Memory with STXXL
Tuesday 2/26Cache-Oblivious Algorithms William Kuszmaul

Rawn Henry
Cache-Oblivious Algorithms* Slides

Engineering a cache-oblivious sorting algorithm* Slides
Cache-Oblivious Algorithms and Data Structures

Cache Oblivious Distribution Sweeping

Cache-oblivious databases: Limitations and opportunities

An Optimal Cache-Oblivious Priority Queue and Its Application to Graph Algorithms

Cache-Oblivious B-Trees

Cache-oblivious streaming B-trees

An experimental comparison of cache-oblivious and cache-conscious programs
Thursday 2/28Parallel Cache-Oblivious Algorithms Endrias Kahssay

Julian Shun
Low depth cache-oblivious algorithms* Slides

Multicore Triangle Computations Without Tuning* Slides
Parallel and I/O efficient set covering algorithms

Resource Oblivious Sorting on Multicores

Scheduling Irregular Parallel Computations on Hierarchical Caches

The Cache Complexity of Multithreaded Cache Oblivious Algorithms

Cache-efficient dynamic programming algorithms for multicores

Provably good multicore cache performance for divide-and-conquer algorithms

Concurrent cache-oblivious B-trees

The Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework, Parallelization and Experimental Evaluation

AUTOGEN: automatic discovery of cache-oblivious parallel recursive algorithms for solving dynamic programs

Provably Efficient Scheduling of Cache-oblivious Wavefront Algorithms

Oblivious algorithms for multicores and networks of processors

Algorithmic Aspects of Triangle-Based Network Analysis

Triangle Listing Algorithms: Back from the Diversion

The Input/Output Complexity of Triangle Enumeration

I/O-efficient join dependency testing, Loomis-Whitney join, and triangle enumeration

2-3 Cuckoo Filters for Faster Triangle Listing and Set Intersection
Tuesday 3/5Parallel SortingTim Kralj

Helen He
In-place Parallel Super Scalar Samplesort (IPS4o)* Slides

PARADIS: An Efficient Parallel Algorithm for In-place Radix Sort* Slides
Super Scalar Sample Sort

Parallel Merge Sort

Parallelism in Randomized Incremental Algorithms

Optimal and sublogarithmic time randomized parallel sorting algorithms

A comparison of sorting algorithms for the connection machine CM-2

Efficient implementation of sorting on multi-core SIMD CPU architecture

Sorting Data on Ultra-Large Scale with RADULS

Even Faster Sorting of (Not Only) Integers

Highly scalable parallel sorting

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Designing efficient sorting algorithms for manycore GPUs

TritonSort: A Balanced Large-Scale Sorting System

A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort

Practical Massively Parallel Sorting

Engineering a Multi-core Radix Sort

Sort Benchmark
Thursday 3/7Suffix Arrays and TreesRoshni Sahoo Linear work suffix array construction* Slides Better external memory suffix array construction

Engineering a Lightweight External Memory Suffix Array Construction Algorithm

Fast parallel skew and prefix-doubling suffix array construction on the GPU

Parallel suffix array and least common prefix for the GPU

Fast Parallel Computation of Longest Common Prefixes

ERA: Efficient Serial and Parallel Suffix Tree Construction for Very Long Strings

Parallel lightweight wavelet tree, suffix array and FM-index construction

Practical Parallel Lempel-Ziv Factorization

Suffix arrays: a new method for on-line string searches

A taxonomy of suffix array construction algorithms
Tuesday 3/12Pre-proposal Meeting
Thursday 3/14Delaunay TriangulationYiqiu Wang

Julian Shun

Jared Di Carlo
Computational Geometry - Algorithms and Applications (Chapter 9)* Slides Slides

Parallel d-D Delaunay Triangulations in Shared and Distributed Memory* Slides
Engineering a compact parallel delaunay algorithm in 3D

Internally Deterministic Parallel Algorithms Can Be Fast

Parallelism in Randomized Incremental Algorithms

One machine, one minute, three billion tetrahedra

TIPP: parallel Delaunay triangulation for large-scale datasets
Friday 3/15Project Proposal due today. Due 3/18 if you go to the Comm Lab to discuss your proposal by 3/14.
Tuesday 3/19JoinsTaylor Andrews

Haonan Wang
Main-Memory Hash Joins on Modern Processor Architectures* Slides

Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited* Slides
Sort vs. Hash Join Revisited for Near-Memory Execution

Many-query join: efficient shared execution of relational joins on modern hardware

An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory
Thursday 3/21AggregationYihan Sun (Guest Lecture)

Divya Gopinath
A Top-Down Parallel Semisort* Slides

Cache-Efficient Aggregation: Hashing Is Sorting* Slides
Optimal and sublogarithmic time randomized parallel sorting algorithms
Tuesday 3/26Spring break
Thursday 3/28Spring break
Tuesday 4/2Distributed Graph FrameworksRamya Nagarajan

Omar Obeya

Julian Shun
Pregel: A System for Large-Scale Graph Processing* Slides

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs* Slides

A Simple Parallel Cartesian Tree Algorithm and its Application to Parallel Suffix Tree Construction* Slides
Distributed Memory Graph Frameworks
Thursday 4/4Disk-based Graph FrameworksPatrick Insinger

Joana M. F. da Trindade

Divya Gopinath
Scalability! But at what COST? Slides

GraphChi: Large-Scale Graph Computation on Just a PC* Slides

Wonderland: A Novel Abstraction-Based Out-Of-Core Graph Processing System* Slides
External Memory Graph Frameworks
Tuesday 4/9Dynamic Graph AlgorithmsTim Kralj

Jessica Shi

Jeremy Bogle
A New Parallel Algorithm for Connected Components in Dynamic Graphs* Slides

Work-Efficient Parallel Union-Find* Slides

Exact and Parallel Triangle Counting in Dynamic Graphs* Slides
Dynamic Graph Algorithms
Thursday 4/11Relational and Streaming Graph FrameworksAaron Huang

Ramya Nagarajan
EmptyHeaded: A Relational Engine for Graph Processing* Slides

Distributed evaluation of subgraph queries using worst-case optimal low-memory dataflows* Slides
Skew Strikes Back: New Developments in the Theory of Join Algorithms

Streaming Graph Frameworks
Tuesday 4/16No Class (Patriots Day)
Thursday 4/18Locality in Graph ProcessingYunming Zhang (Guest Lecture) Reducing Pagerank Communication via Propagation Blocking*

Making Caches Work for Graph Analytics*

GraphIt: A High-Performance DSL for Graph Analytics*

Slides   Slides
Locality Optimizations

Shared Memory Graph Frameworks
Friday 4/19Mid-term Report due today. Due 4/22 if you go to the Comm Lab to discuss your report by 4/18.
Tuesday 4/23Graph Partitioning and ReorderingJared Di Carlo

Haonan Wang
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs* Slides

Speedup Graph Processing by Graph Ordering* Slides
Graph Partitioning and Reordering
Thursday 4/25CompressionRoshni Sahoo

Rawn Henry
Decoding billions of integers per second through vectorization* Slides

SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms* Slides
Stream VByte: Faster byte-oriented integer compression

Roaring bitmaps: Implementation of an optimized software library

SIMD compression and the intersection of sorted integers

A General SIMD-Based Approach to Accelerating Compression Algorithms

An Experimental Study of Bitmap Compression vs. Inverted List Compression

Lightweight Data Compression Algorithms: An Experimental Survey

An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication

Compact Representations of Separable Graphs

An Experimental Analysis of a Compact Graph Representation

Chapter 8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable
Tuesday 4/30No class
Thursday 5/2Parallel SchedulingYan Gu (Guest Lecture)

Omar Obeya
Scheduling Multithreaded Computations by Work Stealing*

Thread Scheduling for Multiprogrammed Multiprocessors* Slides

The Data Locality of Work Stealing* Slides
Cilk: An Efficient Multithreaded Runtime System

Scheduling Irregular Parallel Computations on Hierarchical Caches

Provably Efficient Scheduling for Languages with Fine-Grained Parallelism

Effectively sharing a cache among threads

Experimental analysis of space-bounded schedulers

Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers
Tuesday 5/7Write-Efficient AlgorithmsYan Gu (Guest Lecture) Sorting with Asymmetric Read and Write Costs*

Parallel Algorithms with Asymmetric Read and Write Costs*

Write-Avoiding Algorithms

Algorithmic Building Blocks for Asymmetric Memories

Parallel Write-Efficient Algorithms and Data Structures for Computational Geometry

Implicit Decomposition for Write-Efficient Connectivity Algorithms

Efficient Algorithms with Asymmetric Read and Write Costs

Lower Bounds in the Asymmetric External Memory Model
Thursday 5/9Parallel Binary TreesYihan Sun (Guest Lecture) Just Join for Parallel Ordered Sets*

PAM: parallel augmented maps*

Parallel Range, Segment and Rectangle Queries with Augmented Maps*
Computational Geometry - Algorithms and Applications (Chapters 5 and 10)
Tuesday 5/14Parallel Shortest Paths and Core DecompositionHelen He

Julian Shun
Delta-stepping: a parallelizable shortest path algorithm* Slides

Fast algorithms for determining (generalized) core groups in social networks

Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing* Slides
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

K-Core Decomposition of Large Networks on a Single PC

Patterns and Anomalies in k-Cores of Real-World Graphs with Applications

Parallel Shortest Paths Using Radius Stepping

An Experimental Study of a Parallel Shortest Path Algorithm for Solving Large-Scale Graph Instances

DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem

Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths

A Randomized Parallel Algorithm for Single-Source Shortest Path

A Parallel Priority Queue with Constant Time Operations
Thursday 5/16Project Presentations

Course Policy

This is a graduate-level course where we will cover research in algorithm engineering. Advanced undergraduates may enroll if they have taken 6.046 and 6.172. The course units are 3-0-9.


For each lecture we will study several research papers; for each paper one student will give a presentation, which will be followed by a discussion.


The assignments consist of writing weekly paper reviews, doing several paper presentations, class participation, and completing an open-ended research project. The grading breakdown is as follows.

Grading Breakdown
Paper Reviews15%
Paper Presentations20%
Research Project60%
Class Participation5%

You must complete all assignments to pass the class.

Paper Reviews

Prior to each lecture, you are expected to read the papers under "Required Reading" in the schedule for that lecture. In addition, students are required to submit one paper review every week, due 11:59pm on Mondays. The review should be on a paper chosen from any of the starred papers (*) under "Required Reading" for the two lectures the week (i.e., the Tuesday and Thursday immediately after the due date).

The review should first describe the problem the paper is trying to solve, why it is important, the main ideas proposed, and the results obtained. The review should then describe how the ideas are novel compared to existing work at the time, the strengths and weaknesses of the paper, and any ideas you may have for improving the techniques and/or evaluation criteria, and any open problems or directions for further work that you can think of. The length should be 3-4 paragraphs.

The paper reviews will be submitted on Learning Modules. The paper reviews will be made visible after each submission deadline, and you are encouraged to read other reviews to improve your understanding and to prepare for the class discussion.

Paper Presentations

Students will give several presentations on assigned papers throughout the semester. These presentations should be 25-30 minutes long with slides, followed by a discussion. The presentation should discuss the motivation for the problem being solved, any definitions needed to understand the paper, key technical ideas in the paper, theoretical results and proofs, experimental results, and existing work. Theoretical proofs may be presented on the board. The presentation should also include any relevant content from related work that would be needed to fully understand the paper being presented. The presenter should also present his or her own thoughts on the paper (perceived strengths and weaknesses, directions for future work, etc.), and pose several questions for discussion. The presenter will be expected to lead the discussion on the paper.

Research Project

A large part of the work in this course is in proposing and completing an open-ended research project. The project may involve (but is not limited to) any of the following tasks: implementation of non-trivial and theoretically-efficient algorithms; analyzing and optimizing the performance of existing algorithm implementations; designing and implementing new algorithms that are theoretically and/or practically efficient; applying algorithms in the context of larger applications; and improving or designing new programming frameworks/systems for classes of algorithms. The project may explore parallelism, cache-efficiency, I/O-efficiency, and memory-efficiency. There should be an implementation component to the project. The project can be related to research that you are currently doing, subject to instructor approval.

The project will be done in groups of 1-3 people and consist of a proposal, mid-term report, final presentation, and final report. The timeline for the project is as follows.

AssignmentDue Date
Pre-proposal Meeting3/12
Weekly Progress Reports3/22, 4/5, 4/12, 4/26, 5/3, 5/10
Mid-term Report4/19
Project Presentations5/16
Final Report5/16


We will be using Piazza for answering questions. You may publicly post questions about the papers or any cool ideas that you have. Discussions among students is encouraged. If possible, please use Piazza instead of emailing the course staff. You may use a private note if you want your post to be visible only to the course staff.

Collaboration Policy

You are welcome to read the papers with other students, but all of your written work should be your own. You should carefully acknowledge any collaborators and any outside sources you use to complete the paper questions, paper reviews, and the project reports. Please do not look at anyone else's answers before submitting your answers to the paper questions and your paper reviews.

Computing Resources

We will be using Amazon Web Services.

Useful Resources

Introduction to Algorithms, 3rd Edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein (CLRS)

Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

Algorithm Design: Parallel and Sequential by Umut Acar and Guy Blelloch

Computational Geometry - Algorithms and Applications, Third Edition by Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars

Networks, Crowds, and Markets by David Easley and Jon Kleinberg

A list of papers related to graph analytics