6.506 Algorithm Engineering Spring 2024

Basic Information

Instructor: Julian Shun
TA: Amy Hu
Schedule: Tuesdays and Thursdays 11am-12:30pm ET
Location: 3-370
Office hours: By appointment
Email: jshun AT mit.edu, amyhu AT mit.edu
Piazza
Units: 3-0-9
Prerequisites: 6.122 (6.046), 6.106 (6.172)

Course Description

This is a research-oriented course on algorithm engineering, covering both the theory and practice of algorithms and data structures. Students will learn about models of computation, algorithm design and analysis, and performance engineering of algorithm implementations. We will study the design and implementation of sequential, parallel, cache-efficient, external-memory, and write-efficient algorithms for fundamental problems in computing, and many of the principles of algorithm engineering will be illustrated through parallel algorithms and graph problems. Students will read and present research papers, write paper reviews, complete a problem set, participate in classroom discussions, and carry out a semester-long research project. Class time will consist of instructor and student presentations as well as group project meetings. The course is suitable for graduate students and for advanced undergraduates who have taken 6.122 (6.046) and 6.106 (6.172); mathematical maturity and familiarity with algorithm analysis and performance engineering will be assumed. Lecture attendance is required, and participation counts toward the grade.


Announcements

Schedule (tentative)

This is a tentative schedule that will be updated before the course begins. However, it should give you an idea of the topics that we will cover.

Each entry below lists the date and topic, the speaker(s), the required reading, and the optional reading.
Tuesday 2/6: Course Introduction. Speaker: Julian Shun.
Required Reading:
Algorithm Engineering - An Attempt at a Definition

A Theoretician's Guide to the Experimental Analysis of Algorithms

Slides
Optional Reading:
Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

A Guide to Experimental Algorithmics

Algorithm engineering: an attempt at a definition using sorting as an example

Algorithm Engineering for Parallel Computation

Distributed Algorithm Engineering

Experimental algorithmics

Programming Pearls

Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time
Thursday 2/8: Parallel Algorithms. Speaker: Julian Shun.
Required Reading:
Parallel Algorithms

Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques (Chapters 4-8)

CLRS Chapter 26 (Parallel Algorithms)

Slides
Optional Reading:
Prefix Sums and Their Applications

Algorithm Design: Parallel and Sequential

Introduction to Parallel Algorithms

Scheduling Multithreaded Computations by Work Stealing

Thread Scheduling for Multiprogrammed Multiprocessors

Multidimensional Included and Excluded Sums

Work-Efficient Parallel Algorithms for Accurate Floating-Point Prefix Sums

Problem Based Benchmark Suite
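The scan (prefix-sum) primitive recurs throughout these parallel-algorithms readings. As an illustration only (not course-provided code), here is a Python sketch of Blelloch's work-efficient exclusive prefix sum; the loops marked `parallel_for` have independent iterations and would be spawned in parallel in a real Cilk/C++ implementation.

```python
def plus_scan(a):
    """Exclusive prefix sum ("plus-scan") via Blelloch's two-pass
    upsweep/downsweep scheme: O(n) work and O(log n) span.
    Loops tagged parallel_for have independent iterations."""
    n = len(a)
    size = 1                      # pad to a power of two for simplicity
    while size < n:
        size *= 2
    t = list(a) + [0] * (size - n)
    stride = 1                    # upsweep: build a tree of partial sums
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):   # parallel_for
            t[i] += t[i - stride]
        stride *= 2
    t[size - 1] = 0               # downsweep: push prefixes back down
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):   # parallel_for
            left = t[i - stride]
            t[i - stride] = t[i]
            t[i] += left
        stride //= 2
    return t[:n]
```

For example, `plus_scan([1, 2, 3, 4])` returns `[0, 1, 3, 6]`.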
Tuesday 2/13: Parallel Graph Traversal. Speakers: Daniel Zeng and Joseph Zhang.
Required Reading:
Direction-Optimizing Breadth-First Search Slides

The More the Merrier: Efficient Multi-Source Graph Traversal Slides

(Presentations postponed to 2/15 due to MIT closing.)
Optional Reading:
A Work-Efficient Parallel Breadth-First Search Algorithm (or How to Cope with the Nondeterminism of Reducers)

Internally Deterministic Parallel Algorithms Can Be Fast

SlimSell: A Vectorizable Graph Representation for Breadth-First Search
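For readers new to the direction-optimizing idea in the first required paper, the following is a simplified Python sketch (my own, not the paper's code): small frontiers take top-down steps, large frontiers switch to bottom-up. The switch test here (frontier size versus n/alpha) is a stand-in; Beamer et al. compare edge counts and use two thresholds.

```python
def bfs_hybrid(adj, source, alpha=4):
    """Direction-optimizing BFS sketch. In a bottom-up step, every
    unvisited vertex scans its neighbors for a current-frontier member
    and stops at the first hit, which can skip most edge traversals on
    large frontiers. `adj` is an adjacency list of an undirected graph."""
    n = len(adj)
    parent = [-1] * n
    parent[source] = source
    frontier = [source]
    while frontier:
        next_frontier = []
        if len(frontier) * alpha > n:           # bottom-up step
            in_frontier = [False] * n
            for u in frontier:
                in_frontier[u] = True
            for v in range(n):                  # parallelizable over v
                if parent[v] == -1:
                    for u in adj[v]:
                        if in_frontier[u]:
                            parent[v] = u       # first frontier neighbor wins
                            next_frontier.append(v)
                            break
        else:                                   # top-down step
            for u in frontier:                  # parallelizable over u
                for v in adj[u]:
                    if parent[v] == -1:
                        parent[v] = u
                        next_frontier.append(v)
        frontier = next_frontier
    return parent
```

On a 4-vertex path graph, `bfs_hybrid([[1], [0, 2], [1, 3], [2]], 0)` returns the parent array `[0, 0, 1, 2]`.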
Thursday 2/15: Parallel Graph Traversal. Presentations from 2/13.
Tuesday 2/20: No class.
Thursday 2/22: External-Memory Algorithms. Speakers: Alex Bardon and Nguyen Le.
Required Reading:
The input/output complexity of sorting and related problems Slides

A Functional Approach to External Graph Algorithms Slides
Optional Reading:
Algorithms and Data Structures for External Memory

I/O-Complexity of Graph Algorithms

External-Memory Graph Algorithms

Beyond Synchronous: New Techniques for External-Memory Graph Connectivity and Minimum Spanning Forest

Fundamental parallel algorithms for private-cache chip multiprocessors

Parallel External Memory Graph Algorithms

Efficient Parallel and External Matching

External-Memory Computational Geometry

External Memory Geometric Data Structures

A Bulk-Parallel Priority Queue in External Memory with STXXL
Tuesday 2/27: Cache-Oblivious Algorithms. Speakers: Alex Bardon and Julian Shun.
Required Reading:
Cache-Oblivious Algorithms

Engineering a cache-oblivious sorting algorithm Slides

A Simple and Practical Linear-Work Parallel Algorithm for Connectivity Slides

Optional Reading:
Cache-Oblivious Algorithms and Data Structures

Cache Oblivious Distribution Sweeping

Cache-oblivious databases: Limitations and opportunities

An Optimal Cache-Oblivious Priority Queue and Its Application to Graph Algorithms

Cache-Oblivious B-Trees

Cache-oblivious streaming B-trees

An experimental comparison of cache-oblivious and cache-conscious programs
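The signature move in these papers is divide-and-conquer that never mentions cache parameters. As a toy illustration (my own sketch, not from the readings), here is a cache-oblivious matrix transpose: recursion halves the larger dimension, so every submatrix eventually fits in every cache level; `base` is only a constant-factor tuning knob, not a cache size.

```python
def co_transpose(A, B, r0=0, r1=None, c0=0, c1=None, base=16):
    """Cache-oblivious transpose of A (n x m) into B (m x n). No cache
    size or line size appears anywhere; the recursion alone yields the
    optimal O(nm/B) cache complexity (with a tall cache)."""
    if r1 is None:
        r1, c1 = len(A), len(A[0])
    if (r1 - r0) <= base and (c1 - c0) <= base:
        for i in range(r0, r1):       # base case: copy the small block
            for j in range(c0, c1):
                B[j][i] = A[i][j]
    elif (r1 - r0) >= (c1 - c0):      # split the taller dimension
        mid = (r0 + r1) // 2
        co_transpose(A, B, r0, mid, c0, c1, base)
        co_transpose(A, B, mid, r1, c0, c1, base)
    else:                             # split the wider dimension
        mid = (c0 + c1) // 2
        co_transpose(A, B, r0, r1, c0, mid, base)
        co_transpose(A, B, r0, r1, mid, c1, base)
```

The two recursive halves are also independent, so the same structure parallelizes directly.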
Thursday 2/29: Sorting Algorithms. Speakers: Kasra Mazaheri and Alex Wang.
Required Reading:
Engineering In-place (Shared-memory) Sorting Algorithms Slides

High-Performance and Flexible Parallel Algorithms for Semisort and Related Problems Slides

Optional Reading:
Theoretically-Efficient and Practical Parallel In-Place Radix Sorting

Super Scalar Sample Sort

Parallel Merge Sort

Parallelism in Randomized Incremental Algorithms

Optimal and sublogarithmic time randomized parallel sorting algorithms

A comparison of sorting algorithms for the connection machine CM-2

Efficient implementation of sorting on multi-core SIMD CPU architecture

Sorting Data on Ultra-Large Scale with RADULS

Even Faster Sorting of (Not Only) Integers

Highly scalable parallel sorting

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Designing efficient sorting algorithms for manycore GPUs

TritonSort: A Balanced Large-Scale Sorting System

A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort

Practical Massively Parallel Sorting

Engineering a Multi-core Radix Sort

PARADIS: An Efficient Parallel Algorithm for In-place Radix Sort

Parallel In-Place Algorithms: Theory and Practice

Sort Benchmark
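Several of the sorting papers above (Super Scalar Sample Sort and its relatives) share one skeleton: sample splitters, route keys to buckets, then sort buckets independently. The following is a deliberately simplified sequential Python sketch of that skeleton; the function name and the `num_buckets`/`oversample` parameters are illustrative choices, not values from any paper.

```python
import bisect
import random

def sample_sort(a, num_buckets=8, oversample=8):
    """Sample-sort skeleton: (1) pick splitters from a random oversampled
    set, (2) route each key to a bucket by binary search over the
    splitters, (3) sort buckets independently (in parallel, in a real
    implementation) and concatenate."""
    if len(a) <= num_buckets * oversample:
        return sorted(a)              # tiny inputs: no point partitioning
    sample = sorted(random.sample(a, num_buckets * oversample))
    splitters = sample[oversample::oversample][:num_buckets - 1]
    buckets = [[] for _ in range(num_buckets)]
    for x in a:                       # bucket i gets keys <= splitters[i]
        buckets[bisect.bisect_left(splitters, x)].append(x)
    out = []
    for b in buckets:                 # each bucket sorts independently
        out.extend(sorted(b))
    return out
```

Oversampling makes severely unbalanced buckets unlikely; the engineering papers above are largely about making the routing step branch-free and cache-friendly.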
Monday 3/4: Problem Set due today.
Tuesday 3/5: Pre-proposal Meeting.
Thursday 3/7: Distributed Graph Frameworks. Speakers: Julian Shun, Miranda Cai, and Daniel Zeng.
Required Reading:
Scalability! But at what COST? Slides

Pregel: A System for Large-Scale Graph Processing* Slides

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs* Slides

(Pick one of the starred papers to write a paper review for.)
Optional Reading:
Distributed Memory Graph Frameworks
Tuesday 3/12: Single-Machine Graph Frameworks. Speakers: Jonathan Li and Daniel Schaffer.
Required Reading:
GraphChi: Large-Scale Graph Computation on Just a PC Slides

CoroGraph: Bridging Cache Efficiency and Work Efficiency for Graph Algorithm Execution Slides
Optional Reading:
External Memory Graph Frameworks

Shared Memory Graph Frameworks
Thursday 3/14: Scheduling and Triangle Counting. Speakers: Jason Liu and Julian Shun.
Required Reading:
Thread Scheduling for Multiprogrammed Multiprocessors Slides

Multicore Triangle Computations Without Tuning Slides
Optional Reading:
The Implementation of the Cilk-5 Multithreaded Language

Scheduling Multithreaded Computations by Work Stealing

The Data Locality of Work Stealing

OpenCilk: A Modular and Extensible Software Infrastructure for Fast Task-Parallel Code
Friday 3/15: Project Proposal due today.
Tuesday 3/19: Locality in Graph Processing. Speakers: Miranda Cai and Julian Shun.
Required Reading:
Making Caches Work for Graph Analytics Slides

Chapters 7-8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable Slides
Optional Reading:
Reducing Pagerank Communication via Propagation Blocking

GraphIt: A High-Performance DSL for Graph Analytics Locality Optimizations

Shared Memory Graph Frameworks
Thursday 3/21: Subgraph Finding. Speakers: Raymond Pan and Xander Morgan.
Required Reading:
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra Slides

ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations Slides
Optional Reading:
Subgraph Finding
Tuesday 3/26: Spring Break.
Thursday 3/28: Spring Break.
Tuesday 4/2: Graph Compression and Reordering. Speakers: Raymond Pan and Anika Cheerla.
Required Reading:
An Experimental Analysis of a Compact Graph Representation Slides

CompressGraph: Efficient Parallel Graph Analytics with Rule-Based Compression Slides
Optional Reading:
Compact Representations of Separable Graphs

Chapter 8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable

Graph Compression

Graph Partitioning and Reordering
Thursday 4/4: Compression. Speakers: Xander Morgan and Joseph Zhang.
Required Reading:
Decoding billions of integers per second through vectorization Slides

Techniques for Inverted Index Compression Slides
Optional Reading:
An Experimental Study of Bitmap Compression vs. Inverted List Compression

Stream VByte: Faster byte-oriented integer compression

Roaring bitmaps: Implementation of an optimized software library

SIMD compression and the intersection of sorted integers

A General SIMD-Based Approach to Accelerating Compression Algorithms

Lightweight Data Compression Algorithms: An Experimental Survey

Compact inverted index storage using general-purpose compression libraries

An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication

SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms
Tuesday 4/9: Dynamic Graph Algorithms. Speakers: Ian Gatlin and Kasra Mazaheri.
Required Reading:
A New Parallel Algorithm for Connected Components in Dynamic Graphs Slides

Work-Efficient Parallel Union-Find Slides
Optional Reading:
Dynamic Graph Algorithms

Streaming Graph Frameworks
Thursday 4/11: Stencil Computations. Speakers: Vishaal Ram and Alex Wang.
Required Reading:
Cache oblivious stencil computations Slides

Fast Stencil Computations using Fast Fourier Transforms Slides
Optional Reading:
The cache complexity of multithreaded cache oblivious algorithms

The pochoir stencil compiler
Tuesday 4/16: Aggregation and Sorting (Mid-term report due today). Speakers: Ian Gatlin and Jonathan Li.
Required Reading:
Efficient Sorting, Duplicate Removal, Grouping, and Aggregation Slides

Parallel Integer Sort: Theory and Practice Slides
Optional Reading:
A Top-Down Parallel Semisort

Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

Main-Memory Hash Joins on Modern Processor Architectures

Sort vs. Hash Join Revisited for Near-Memory Execution

Many-query join: efficient shared execution of relational joins on modern hardware

An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory

Optimal and sublogarithmic time randomized parallel sorting algorithms
Thursday 4/18: Static and Streaming Graph Processing. Speaker: Julian Shun.
Required Reading:
GraphIt: A High-Performance DSL for Graph Analytics

Low-Latency Graph Streaming Using Compressed Purely-Functional Trees

Slides
Optional Reading:
Streaming Graph Frameworks

PaC-trees: supporting parallel and compressed purely-functional collections
Tuesday 4/23: Concurrent Algorithms. Speaker: Yuanhao Wei.
Required Reading:
The Art of Multiprocessor Programming, Sections 3.1-3.6 and Chapter 9

Slides
Optional Reading:
Linearizability: A Correctness Condition for Concurrent Objects

A Lazy Concurrent List-Based Set Algorithm

A Pragmatic Implementation of Non-Blocking Linked-Lists

Non-blocking binary search trees
Thursday 4/25: Sparse Computations on GPUs. Speaker: Changwan Hong.
Required Reading:
Globally Homogeneous, Locally Adaptive Sparse Matrix-vector Multiplication on the GPU

spECK: Accelerating GPU Sparse Matrix-matrix Multiplication through Lightweight Analysis

Slides
Optional Reading:
Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

yaSpMV: yet another SpMV framework on GPUs

Merge-Based Parallel Sparse Matrix-Vector Multiplication

Balanced Hashing and Efficient GPU Sparse General Matrix-Matrix Multiplication

Adaptive Sparse Matrix-Matrix Multiplication on the GPU
Tuesday 4/30: Clustering. Speaker: Julian Shun.
Required Reading:
Theoretically-Efficient and Practical Parallel DBSCAN

Parallel Index-Based Structural Graph Clustering and Its Approximation

Slides
Optional Reading:
Fast Parallel Algorithms for Euclidean Minimum Spanning Tree and Hierarchical Spatial Clustering

ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering using Nearest-Neighbor Chain
Thursday 5/2: Nearest Neighbors.
Required Reading:
Parallel Nearest Neighbors in Low Dimensions with Batch Updates

DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
Optional Reading:
Scaling Graph-Based ANNS Algorithms to Billion-Size Datasets: A Comparative Analysis

Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations
Tuesday 5/7: Graph Neural Networks.
Required Reading:
Scalable and Efficient Full-Graph GNN Training for Large Graphs

MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks
Optional Reading:
Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning

LightNE: A Lightweight Graph Processing System for Network Embedding

Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining

Marius: Learning Massive Graph Embeddings on a Single Machine

ThunderRW: An In-Memory Graph Random Walk Engine

Random Walks on Huge Graphs at Cache Efficiency

Graph Representation Learning Book

CS224W: Machine Learning with Graphs (Stanford course)
Thursday 5/9: Batch-Dynamic Algorithms.
Required Reading:
Parallel Batch-Dynamic Algorithms for k-Core Decomposition and Related Graph Problems
Optional Reading:
Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing
Tuesday 5/14: Project Presentations. Final report due today.

Course Policy

This is a graduate-level course where we will cover research in algorithm engineering. Advanced undergraduates may enroll if they have taken 6.122 (6.046) and 6.106 (6.172). The course units are 3-0-9.

Assignments will be posted and submitted on Canvas.

Lectures

For each lecture we will study several research papers; for each paper one student will give a presentation, which will be followed by a discussion.

Grading

The assignments consist of answering questions about papers, writing paper reviews, completing a problem set, doing several paper presentations, class participation, and completing an open-ended research project. The grading breakdown is as follows.

Grading Breakdown
Paper Reviews: 20%
Problem Set: 10%
Paper Presentations: 15%
Research Project: 45%
Class Participation: 10%

You must complete all assignments to pass the class.

Paper Reviews and Problem Set

Prior to each lecture, you are expected to read the papers under "Required Reading" in the schedule for that lecture. In addition, students are required to submit one paper review before every lecture, due at 10am on the day of the lecture. The review should be on a paper chosen from any of the papers under "Required Reading" for the lecture.

The review should first describe the problem the paper is trying to solve, why it is important, the main ideas proposed, and the results obtained. The review should then describe how the ideas are novel compared to existing work at the time, the strengths and weaknesses of the paper, any ideas you may have for improving the techniques and/or evaluation, and any open problems or directions for further work that you can think of. The review should be 3-4 paragraphs, written in your own words. Please do not copy text from the paper, your classmates, or ChatGPT.

Finally, there will be a problem set on parallel algorithms to be released in the second week of the semester and due at 11:59pm on Monday 3/4.

Paper Presentations

Students will give presentations on assigned papers during the semester. Each presentation should be 25-30 minutes long with slides, followed by a discussion. The presentation should cover the motivation for the problem being solved, any definitions needed to understand the paper, the key technical ideas, theoretical results and proofs, experimental results, and how the paper relates to existing work; theoretical proofs may be presented on the board. The presentation should also include any relevant content from related work needed to fully understand the paper being presented. The presenter should share their own thoughts on the paper (perceived strengths and weaknesses, directions for future work, etc.), pose several questions for discussion, and be prepared to lead the discussion on the paper.

Research Project

A large part of the work in this course is in proposing and completing an open-ended research project. The project may involve (but is not limited to) any of the following tasks: implementation of non-trivial and theoretically-efficient algorithms; analyzing and optimizing the performance of existing algorithm implementations; designing and implementing new algorithms that are theoretically and/or practically efficient; applying algorithms in the context of larger applications; and improving or designing new programming frameworks/systems for classes of algorithms. The project may explore parallelism, cache-efficiency, I/O-efficiency, and memory-efficiency. There should be an implementation component to the project. The project can be related to research that you are currently doing, subject to instructor approval.

The project will be done in groups of 1-3 people and consist of a proposal, mid-term report, final presentation, and final report. The timeline for the project is as follows.

Pre-proposal Meeting: 3/5
Proposal: 3/15
Weekly Progress Reports: 3/22, 4/5, 4/12, 4/19, 4/26, 5/3, 5/10
Mid-term Report: 4/16
Project Presentations: 5/14
Final Report: 5/14

Piazza

We will be using Piazza for answering questions. You may publicly post questions about the papers or any cool ideas that you have. Discussion among students is encouraged. If possible, please use Piazza instead of emailing the course staff. You may use a private note if you want your post to be visible only to the course staff.

Collaboration Policy

You are welcome to read the papers with other students, but all of your written work should be your own. You should carefully acknowledge any collaborators and any outside sources you use to complete the paper reviews, problem set, and the project reports.

Computing Resources

You can obtain student credits on a cloud computing provider, such as Google Cloud Platform.

Useful Resources

Introduction to Algorithms, 4th Edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein (CLRS)

Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice

Networks, Crowds, and Markets by David Easley and Jon Kleinberg

A list of papers related to graph analytics