6.506 Algorithm Engineering Spring 2023

Basic Information

Instructors: Charles Leiserson and Julian Shun
TA: Amartya Shankha Biswas
Schedule: Tuesdays and Thursdays 11am-12:30pm ET
Location: 24-121
Office hours: By appointment
Email: cel AT mit.edu, jshun AT mit.edu, asbiswas AT mit.edu
Piazza
Units: 3-0-9
Prerequisites: 6.122 (6.046), 6.106 (6.172)

Course Description

This is a research-oriented course on algorithm engineering, which will cover both the theory and practice of algorithms and data structures. Students will learn about models of computation, algorithm design and analysis, and performance engineering of algorithm implementations. We will study the design and implementation of sequential, parallel, cache-efficient, external-memory, and write-efficient algorithms for fundamental problems in computing. Many of the principles of algorithm engineering will be illustrated in the context of parallel algorithms and graph problems. Students will read and present research papers, write paper reviews, complete problem sets, participate in classroom discussions, and complete a semester-long research project. Class time will consist of lectures, student presentations, and group project meetings. This course is suitable for graduate students or advanced undergraduates who have taken 6.122 (6.046) and 6.106 (6.172). Mathematical maturity and familiarity with algorithm analysis and performance engineering will be assumed. Lectures will consist of instructor and student presentations. Lecture attendance is required and participation counts toward the grade.

Skip to Course Policy

Announcements

Schedule (tentative)

This is a tentative schedule that will be updated before the course begins. However, it should give you an idea of the topics that we will cover.

DateTopicSpeakerRequired ReadingOptional Reading
Tuesday 2/7Course IntroductionJulian Shun Algorithm Engineering - An Attempt at a Definition

A Theoretician's Guide to the Experimental Analysis of Algorithms

Slides
Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

A Guide to Experimental Algorithmics

Algorithm engineering: an attempt at a definition using sorting as an example

Algorithm Engineering for Parallel Computation

Distributed Algorithm Engineering

Experimental algorithmics

Programming Pearls

Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time
Thursday 2/9Parallel AlgorithmsJulian ShunParallel Algorithms

Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques (Chapters 4-8)

CLRS Chapter 26 (Parallel Algorithms)

Slides
Prefix Sums and Their Applications

Algorithm Design: Parallel and Sequential

Introduction to Parallel Algorithms

Scheduling Multithreaded Computations by Work Stealing

Thread Scheduling for Multiprogrammed Multiprocessors

Multidimensional Included and Excluded Sums

Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums

Problem Based Benchmark Suite
Tuesday 2/14Cache-Oblivious Algorithms Charles Leiserson Cache-Oblivious Algorithms

Cache-Oblivious Algorithms and Data Structures

Slides
Engineering a cache-oblivious sorting algorithm

Cache Oblivious Distribution Sweeping

Cache-oblivious databases: Limitations and opportunities

An Optimal Cache-Oblivious Priority Queue and Its Application to Graph Algorithms

Cache-Oblivious B-Trees

Cache-oblivious streaming B-trees

An experimental comparison of cache-oblivious and cache-conscious programs
Thursday 2/16CilkAlexandros-Stavros Iliopoulos The Implementation of the Cilk-5 Multithreaded Language

Scheduling Multithreaded Computations by Work Stealing

Slides
Thread Scheduling for Multiprogrammed Multiprocessors

The Data Locality of Work Stealing
Tuesday 2/21No class
Thursday 2/23Stencil ComputationsRezaul Chowdhury Cache oblivious stencil computations

Fast Stencil Computations using Fast Fourier Transforms

Slides
The cache complexity of multithreaded cache oblivious algorithms

The pochoir stencil compiler
Tuesday 2/28Shared-Memory Graph ProcessingJulian Shun A Simple and Practical Linear-Work Parallel Algorithm for Connectivity Slides

Chapters 7-8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable Slides
Direction-Optimizing Breadth-First Search

Parallel Graph Decompositions Using Random Shifts

Shared Memory Graph Frameworks

Chapter 14 of Networks, Crowds, and Markets
Thursday 3/2Storage SystemsMartin Farach-Colton Don't thrash: how to cache your hash on flash

SplinterDB: Closing the Bandwidth Gap for NVMe Key-Value Stores
The input/output complexity of sorting and related problems

Algorithms and Data Structures for External Memory
Monday 3/6Problem Set due today
Tuesday 3/7Software Productivity Tools for Fork-Join ProgramsCharles Leiserson Efficient detection of determinacy races in Cilk programs

The Cilkprof Scalability Profiler

Slides
Thursday 3/9Pre-proposal Meeting
Tuesday 3/14Sorting AlgorithmsChristopher Rinard

Julian Shun
Engineering In-place (Shared-memory) Sorting Algorithms Slides

Theoretically-Efficient and Practical Parallel In-Place Radix Sorting Slides
Super Scalar Sample Sort

Parallel Merge Sort

Parallelism in Randomized Incremental Algorithms

Optimal and sublogarithmic time randomized parallel sorting algorithms

A comparison of sorting algorithms for the connection machine CM-2

Efficient implementation of sorting on multi-core SIMD CPU architecture

Sorting Data on Ultra-Large Scale with RADULS

Even Faster Sorting of (Not Only) Integers

Highly scalable parallel sorting

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Designing efficient sorting algorithms for manycore GPUs

TritonSort: A Balanced Large-Scale Sorting System

A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort

Practical Massively Parallel Sorting

Engineering a Multi-core Radix Sort

PARADIS: An Efficient Parallel Algorithm for In-place Radix Sort

Parallel In-Place Algorithms: Theory and Practice

Sort Benchmark
Thursday 3/16Shared-Memory Graph Algorithms Eeshan Tripathii

Ethan LaBelle
Parallel Graph Decompositions Using Random Shifts Slides

Ordering heuristics for parallel graph coloring Slides
Improved Parallel Algorithms for Spanners and Hopsets

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality
Friday 3/17Project Proposal due today
Tuesday 3/21Distributed Graph FrameworksJesse Yang

Richard Sollee
Pregel: A System for Large-Scale Graph Processing Slides

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs Slides
Distributed Memory Graph Frameworks
Thursday 3/23Single-Machine Graph FrameworksJulian Shun

CJ Quines

Nikola Samardzic
Scalability! But at what COST? Slides

GraphChi: Large-Scale Graph Computation on Just a PC* Slides

Executing Dynamic Data-Graph Computations Deterministically Using Chromatic Scheduling* Slides

(Pick one of the starred papers to write your review on.)
Shared Memory Graph Frameworks

External Memory Graph Frameworks
Tuesday 3/28Spring Break
Thursday 3/30Spring Break
Tuesday 4/4Locality in Graph ProcessingIsabelle Quaye

Kevin Tong
Reducing Pagerank Communication via Propagation Blocking Slides

Making Caches Work for Graph Analytics Slides
GraphIt: A High-Performance DSL for Graph Analytics Locality Optimizations

Shared Memory Graph Frameworks
Thursday 4/6Subgraph FindingDeniz Sert

Collin Warner
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra Slides

ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations Slides
Subgraph Finding
Tuesday 4/11Graph Compression and ReorderingEric Wang

Nick Dow
Log(graph): a near-optimal high-performance graph representation Slides

Locality Analysis of Graph Reordering Algorithms Slides
An Experimental Analysis of a Compact Graph Representation

Compact Representations of Separable Graphs

Chapter 8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable

Graph Compression

Graph Partitioning and Reordering
Thursday 4/13CompressionKrit Boonsiriseth

Giorgi Kldiashvili
Decoding billions of integers per second through vectorization Slides

Techniques for Inverted Index Compression Slides
An Experimental Study of Bitmap Compression vs. Inverted List Compression

Stream VByte: Faster byte-oriented integer compression

Roaring bitmaps: Implementation of an optimized software library

SIMD compression and the intersection of sorted integers

A General SIMD-Based Approach to Accelerating Compression Algorithms

Lightweight Data Compression Algorithms: An Experimental Survey

Compact inverted index storage using general-purpose compression libraries

An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication

SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms
Tuesday 4/18Dynamic Graph Algorithms

Midterm report due today
Temi Taylor

Thomas Bergamaschi
A New Parallel Algorithm for Connected Components in Dynamic Graphs Slides

Work-Efficient Parallel Union-Find Slides
Dynamic Graph Algorithms

Streaming Graph Frameworks
Thursday 4/20Transactional MemoryLuka Govedic

Elie Cuevas
Software transactional memory Slides

A simple deterministic algorithm for guaranteeing the forward progress of transactions Slides
Tuesday 4/25AggregationAnton Ni

Adam Janicki
Cache-Efficient Aggregation: Hashing Is Sorting Slides

Efficient Sorting, Duplicate Removal, Grouping, and Aggregation Slides
A Top-Down Parallel Semisort

Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Main-Memory Hash Joins on Modern Processor Architectures

Sort vs. Hash Join Revisited for Near-Memory Execution

Many-query join: efficient shared execution of relational joins on modern hardware

An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory

Optimal and sublogarithmic time randomized parallel sorting algorithms
Thursday 4/27Functional TreesJulian Shun Low-Latency Graph Streaming Using Compressed Purely-Functional Trees Slides

PaC-trees: supporting parallel and compressed purely-functional collections
Streaming Graph Frameworks
Tuesday 5/2Graph Neural NetworksSteven Raphael

Helen Yang
LightNE: A Lightweight Graph Processing System for Network Embedding Slides

Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining Slides
Marius: Learning Massive Graph Embeddings on a Single Machine

ThunderRW: An In-Memory Graph Random Walk Engine

Random Walks on Huge Graphs at Cache Efficiency

Graph Representation Learning Book

CS224W: Machine Learning with Graphs (Stanford course)
Thursday 5/4Random Number Generation and Multidimensional SumsMaximo Machado

Derrick Liang
Deterministic parallel random-number generation for dynamic-multithreading platforms Slides

Multidimensional Included and Excluded Sums Slides
Tuesday 5/9Progress in algorithms: the big trendsNeil Thompson How Fast Do Algorithms Improve?

Building the algorithm commons: Who discovered the algorithms that underpin computing in the modern enterprise?
Thursday 5/11Revisiting Matrix MultiplicationTao SchardlSlides
Tuesday 5/16Project Presentations

Final report due today

Course Policy

This is a graduate-level course where we will cover research in algorithm engineering. Advanced undergraduates may enroll if they have taken 6.122 (6.046) and 6.106 (6.172). The course units are 3-0-9.

Assignments will be posted and submitted on Canvas.

Lectures

For each lecture we will study several research papers; for each paper one student will give a presentation, which will be followed by a discussion.

Grading

The assignments consist of answering questions about papers, writing paper reviews, completing a problem set, doing several paper presentations, class participation, and completing an open-ended research project. The grading breakdown is as follows.

Grading Breakdown
Paper Reviews20%
Problem Set10%
Paper Presentations15%
Research Project45%
Class Participation10%

You must complete all assignments to pass the class.

Paper Reviews and Problem Set

Prior to each lecture, you are expected to read the papers under "Required Reading" in the schedule for that lecture. In addition, students are required to submit one paper review before every lecture, due at 10am on the day of the lecture. The review should be on a paper chosen from any of the papers under "Required Reading" for the lecture.

The review should first describe the problem the paper is trying to solve, why it is important, the main ideas proposed, and the results obtained. The review should then describe how the ideas are novel compared to existing work at the time, the strengths and weaknesses of the paper, and any ideas you may have for improving the techniques and/or evaluation criteria, and any open problems or directions for further work that you can think of. The length should be 3-4 paragraphs.

Finally, there will be a problem set on parallel algorithms to be released in the second week of the semester and due at 11:59pm on Monday 3/6.

Paper Presentations

Students will give presentations on assigned papers during the semester. These presentations should be 25-30 minutes long with slides, followed by a discussion. The presentation should discuss the motivation for the problem being solved, any definitions needed to understand the paper, key technical ideas in the paper, theoretical results and proofs, experimental results, and existing work. Theoretical proofs may be presented on the board. The presentation should also include any relevant content from related work that would be needed to fully understand the paper being presented. The presenter should also present his or her own thoughts on the paper (perceived strengths and weaknesses, directions for future work, etc.), and pose several questions for discussion. The presenter will be expected to lead the discussion on the paper.

Students are required to meet with an instructor at least 3 business days before their presentation to go over their slides and get feedback.

Research Project

A large part of the work in this course is in proposing and completing an open-ended research project. The project may involve (but is not limited to) any of the following tasks: implementation of non-trivial and theoretically-efficient algorithms; analyzing and optimizing the performance of existing algorithm implementations; designing and implementing new algorithms that are theoretically and/or practically efficient; applying algorithms in the context of larger applications; and improving or designing new programming frameworks/systems for classes of algorithms. The project may explore parallelism, cache-efficiency, I/O-efficiency, and memory-efficiency. There should be an implementation component to the project. The project can be related to research that you are currently doing, subject to instructor approval.

The project will be done in groups of 1-3 people and consist of a proposal, mid-term report, final presentation, and final report. The timeline for the project is as follows.

AssignmentDue Date
Pre-proposal Meeting3/9
Proposal3/17
Weekly Progress Reports3/24, 4/7, 4/14, 4/21, 4/28, 5/5
Mid-term Report4/18
Project Presentations5/16
Final Report5/16

Piazza

We will be using Piazza for answering questions. You may publicly post questions about the papers or any cool ideas that you have. Discussions among students is encouraged. If possible, please use Piazza instead of emailing the course staff. You may use a private note if you want your post to be visible only to the course staff.

Collaboration Policy

You are welcome to read the papers with other students, but all of your written work should be your own. You should carefully acknowledge any collaborators and any outside sources you use to complete the paper reviews, problem set, and the project reports.

Computing Resources

You can obtain student credits on a cloud computing provider, such as Google Cloud Platform.

Useful Resources

Introduction to Algorithms, 4th Edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein (CLRS)

Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

Networks, Crowds, and Markets by David Easley and Jon Kleinberg

A list of papers related to graph analytics