6.886 Graph Analytics Spring 2018

Basic Information

Instructor: Prof. Julian Shun
TA: Sherry (Mengjiao) Yang
Schedule: Wednesdays and Fridays 2:30-4pm in room 56-154
Office hours: By appointment
Emails: jshun AT mit.edu, mengjiao AT mit.edu (please try to use Piazza)
Units: 3-0-9
Prerequisites: 6.046, 6.172

Course Description

This subject qualifies as a Computer Systems concentration subject.

Graph analytics have applications in a variety of domains, such as social network and Web analysis, computational biology, machine learning, and computer networking. This course will cover research topics in graph analytics including algorithms, optimizations, frameworks, and applications. Students will learn about both the theory and practice of designing efficient graph algorithms (parallel, cache-efficient, external-memory, etc.). We will also study design choices in high-level graph processing frameworks. Students will read and present research papers, and also complete a research project. This course is suitable for graduate students or advanced undergraduates who have taken 6.046 and 6.172/6.871.

Skip to Course Policy

Announcements

Schedule (tentative)

DateTopicSpeakerRequired ReadingOptional Reading
Wednesday 2/7Course Introduction and Graph AlgorithmsJulian ShunReview CLRS Chapters 22-24

Chapter 1-2 of Networks, Crowds, and Markets

Slides
Algorithm Engineering - An Attempt at a Definition

The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing
Friday 2/9Parallel AlgorithmsJulian ShunParallel Algorithms

CLRS Chapter 27

Slides
Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques

Scheduling Multithreaded Computations by Work Stealing

Thread Scheduling for Multiprogrammed Multiprocessors

Provably Efficient Scheduling for Languages with Fine-Grained Parallelism

Prefix Sums and Their Applications

Book: Introduction to Parallel Algorithms by Joseph JaJa
Wednesday 2/14Real-World and Synthetic GraphsAndrew Xia
Joana M. F. da Trindade
Yijiang Huang
Chapters 13, 18, and 20 of Networks, Crowds, and Markets Slides

The Graph Structure in the Web - Analyzed on Different Aggregation Levels* Slides

Kronecker Graphs: An Approach to Modeling Networks* Slides
Graph structure in the Web

Power laws and the AS-level internet topology

R-MAT: A Recursive Model for Graph Mining

Towards a Theory of Scale-Free Graphs: Definition, Properties, and Implications

Statistical mechanics of complex networks

Collective dynamics of 'small-world' networks

The Small-World Phenomenon: An Algorithmic Perspective

Four Degrees of Separation
Friday 2/16Parallel Graph TraversalJason Priest
Joana M. F. da Trindade
Victor Ying
Direction-Optimizing Breadth-First Search* Slides

A Faster Algorithm for Betweenness Centrality Slides

The More the Merrier: Efficient Multi-Source Graph Traversal* Slides
A Work-Efficient Parallel Breadth-First Search Algorithm (or How to Cope with the Nondeterminism of Reducers)

Internally Deterministic Parallel Algorithms Can Be Fast

SlimSell: A Vectorizable Graph Representation for Breadth-First Search

Chapter 3.6 of Networks, Crowds, and Markets (describes Betweenness Centrality with an example)

Better Approximation of Betweenness Centrality

ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages

KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation

Fast approximation of betweenness centrality through sampling

Scalable Betweenness Centrality Maximization via Sampling

Articulation Points Guided Redundancy Elimination for Betweenness Centrality

Betweenness Centrality: Algorithms and Implementations

Wednesday 2/21PageRank, Connectivity, and Triangle ComputationsBishesh Khadka
Julian Shun
Chapter 14 of Networks, Crowds, and Markets Slides

A Simple and Practical Linear-Work Parallel Algorithm for Connectivity* Slides

Multicore Triangle Computations Without Tuning* Slides
A Survey of PageRank Computing

Mining the link structure of the World Wide Web

Parallel Graph Decompositions Using Random Shifts

Improved Parallel Algorithms for Spanners and Hopsets

Algorithmic Aspects of Triangle-Based Network Analysis

Triangle Listing Algorithms: Back from the Diversion

Cache-Oblivious Algorithms

Cache-Oblivious Algorithms and Data Structures

Scheduling Irregular Parallel Computations on Hierarchical Caches

The Input/Output Complexity of Triangle Enumeration

I/O-efficient join dependency testing, Loomis-Whitney join, and triangle enumeration
Friday 2/23Graph Processing FrameworksBrian Wheatman
Hyun Ryong Lee
Omar Obeya
Pregel: A System for Large-Scale Graph Processing* Slides

GraphLab: A New Framework For Parallel Machine Learning* Slides

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs* Slides
Thinking Like a Vertex: A Survey of Vertex-Centric Frameworks for Large-Scale Distributed Graph Processing (Survey)

Big Graph Analytics Platforms (Survey)

Parallel Graph Analytics

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets

How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis

Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation

An Experimental Comparison of Pregel-like Graph Processing Systems

Architectural Implications on the Performance and Cost of Graph Analytics Systems

An Evaluation and Analysis of Graph Processing Frameworks on Five Key Issues

The Parallel BGL: A Generic Library for Distributed Graph Computations

Signal/Collect: Graph Algorithms for the (Semantic) Web

GPS: A Graph Processing System

Simplifying Scalable Graph Processing with a Domain-Specific Language

Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud

PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs

Exploring the Hidden Dimension in Graph Processing

GraphX: Graph Processing in a Distributed Dataflow Framework

Asynchronous Large-Scale Graph Processing Made Easy

ASPIRE: Exploiting Asynchronous Parallelism in Iterative Algorithms using a Relaxed Consistency based DSM

Efficient Processing of Large Graphs via Input Reduction

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

GoFFish: A Sub-graph Centric Framework for Large-Scale Graph Analytics

KLA: A New Algorithmic Paradigm for Parallel Graph Computations

Computation and Communication Efficient Graph Processing with Distributed Immutable View

SYNC or ASYNC: Time to Fuse for Distributed Graph-Parallel Computation

Scaling Iterative Graph Computations with GraphMap

Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices

From "Think Like a Vertex" to "Think Like a Graph"

NScale: neighborhood-centric large-scale graph analytics in the cloud

High Performance Graph Processing with Locality Oriented Design

TuX2: Distributed Graph Computation for Machine Learning

Latency-Tolerant Software Distributed Shared Memory

Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs

PrIter: A Distributed Framework for Prioritized Iterative Computations

Fast Iterative Graph Computation with Block Updates

Maiter: An Asynchronous Graph Processing Framework for Delta-Based Accumulative Iterative Computation

Parallelizing Sequential Graph Computations

GraM: Scaling Graph Computation to the Trillions

LazyGraph: Lazy Data Coherency for Replicas in Distributed Graph-Parallel Computation

One Trillion Edges: Graph Processing at Facebook-Scale

An Experimental Comparison of Partitioning Strategies in Distributed Graph Processing

PGX.D: A Fast Distributed Graph Processing Engine

CoRAL: Confined Recovery in Distributed Asynchronous Graph Processing

Zorro: zero-cost reactive failure recovery in distributed graph processing

PAGE: A Partition Aware Engine for Parallel Graph Computation

MOCgraph: Scalable Distributed Graph Processing Using Message Online Computing

Vertexica: Your Relational Friend for Graph Analytics!

Graph Analytics using Vertica Relational Database

GraphiQL: A Graph Intuitive Query Language for Relational Databases

The case against specialized graph analytics engines
Wednesday 2/28Shared-Memory FrameworksJulian Shun
Jason Priest
Chapters 7-8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable* Slides

Scalability! But at what COST? Slides

EmptyHeaded: A Relational Engine for Graph Processing* Slides
Green-Marl: A DSL for Easy and Efficient Graph Analysis

A Lightweight Infrastructure for Graph Analytics

GraphMat: High performance graph analytics made productive

Ringo: Interactive Graph Analytics on Big-Memory Machines

Graph Analytics Through Fine-Grained Parallelism

Software and Algorithms for Graph Queries on Multithreaded Architectures

SNAP, Small-world Network Analysis and Partitioning: an open-source parallel graph framework for the exploration of large-scale networks

Everything you always wanted to know about multicore graph processing but were afraid to ask

To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations

Making Pull-Based Graph Processing Performant

Congra: Towards Efficient Processing of Concurrent Graph Queries on Shared-Memory Machines

Using Domain-Specific Languages For Analytic Graph Databases
Friday 3/2Disk-based FrameworksGuest lecture by Sang-Woo Jun GraphChi: Large-Scale Graph Computation on Just a PC*

MOSAIC: Processing a Trillion-Edge Graph on a Single Machine*

BigSparse: High-performance external graph analytics*

Slides
X-Stream: Edge-centric Graph Processing using Streaming Partitions

TurboGraph: A Fast Parallel Graph Engine Handling Billion-scale Graphs in a Single PC

MMap: Fast Billion-Scale Graph Computation on a PC via Memory Mapping

PrefEdge: SSD Prefetcher for Large-Scale Graph Traversal

PathGraph: A Path Centric Graph Processing System

GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning

VENUS: A System for Streamlined Graph Computation on a Single PC

NXgraph: An Efficient Graph Processing System on a Single Machine

Chaos: Scale-out graph processing from secondary storage

FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs

Squeezing out All the Value of Loaded Data: An Out-of-core Graph Processing System with Reduced Disk I/O

Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing

Graphene: Fine-Grained IO Management for Graph Computing

Load the Edges You Need: A Generic I/O Optimization for Disk-based Graph Processing

Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code

Wonderland: A Novel Abstraction-Based Out-Of-Core Graph Processing System

Algorithms and Data Structures for External Memory

I/O-Complexity of Graph Algorithms

External-Memory Graph Algorithms

A Functional Approach to External Graph Algorithms

Beyond Synchronous: New Techniques for External-Memory Graph Connectivity and Minimum Spanning Forest

Parallel External Memory Graph Algorithms

Efficient Parallel and External Matching
Wednesday 3/7No lectureUse lecture time and classroom to meet with project groups
Friday 3/9Streaming AlgorithmsOmar Obeya
Brian Wheatman
Endrias Kahssay
A New Parallel Algorithm for Connected Components in Dynamic Graphs* Slides

Work-Efficient Parallel Union-Find* Slides

Faster Betweenness Centrality Updates in Evolving Networks* Slides
STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation

STINGER: High Performance Data Structure for Streaming Graphs

Static and dynamic parallel computation of connected components

Sparsification-A Technique for Speeding Up Dynamic Graph Algorithms

Randomized Fully Dynamic Graph Algorithms with Polylogarithmic Time per Operation

Faster Worst Case Deterministic Dynamic Connectivity

QUBE: a Quick algorithm for Updating BEtweenness centrality

A Fast Algorithm for Streaming Betweenness Centrality

Incremental Algorithm for Updating Betweenness Centrality in Dynamically Growing Networks

Scalable Online Betweenness Centrality in Evolving Graphs

Fully Dynamic Betweenness Centrality Maintenance on Massive Networks

Approximating Betweenness Centrality in Fully Dynamic Networks

Fully Dynamic Betweenness Centrality

ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages

An Experimental Analysis of Self-Adjusting Computation
Wednesday 3/14Project pre-proposal meeting in 32-G736
Friday 3/16Linear Algebra and Graphs

(Project proposal due today. Due 3/19 with Comm Lab appointment.)
Guest lecture by Jeremy Kepner (MIT Lincoln Laboratory) Mathematical Foundations of the GraphBLAS*

Enabling Massive Deep Neural Networks with the GraphBLAS*

GraphMat: High performance graph analytics made productive*

The Combinatorial BLAS: design, implementation, and applications

A Flexible Open-Source Toolbox for Scalable Complex Graph Analysis

High-Productivity and High-Performance Analysis of Filtered Semantic Graphs

GraphPad: Optimized Graph Primitives for Parallel and Distributed Platforms

PEGASUS: Mining Peta-Scale Graphs

Graphulo: Linear Algebra Graph Kernels for NoSQL Databases

Some papers on sparse matrix storage formats and optimizing sparse matrix-vector multiply (SpMV)

Design of the GraphBLAS API for C

GBTL-CUDA: Graph Algorithms and Primitives for GPUs

Graph Algorithms in the Language of Linear Algebra

Graph BLAS Website
Wednesday 3/21Streaming FrameworksEdward Fan
Endrias Kahssay
Edward Park
LLAMA: Efficient graph analytics using Large Multiversioned Arrays* Slides

GraphIn: An Online High Performance Incremental Graph Processing Framework* Slides

KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations* Slides
STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation

STINGER: High Performance Data Structure for Streaming Graphs

DISTINGER: A Distributed Graph Data Structure for Massive Dynamic Graph Processing

On Querying Historical Evolving Graph Sequences

Naiad: A Timely Dataflow System

Differential dataflow

Distributed Evaluation of Subgraph Queries Using Worst-case Optimal Low-Memory Dataflows

Facilitating Real-Time Graph Mining

Kineograph: Taking the Pulse of a Fast-Changing and Connected World

Tornado: A System For Real-Time Iterative Analysis Over Evolving Data

Chronos: a graph engine for temporal graph analysis

Time-Evolving Graph Processing at Scale

Real-time Analytics for Fast Evolving Social Graphs

Synergistic Analysis of Evolving Graphs

Towards Large-Scale Graph Stream Processing Platform

ImmortalGraph: A System for Storage and Analysis of Temporal Graphs

Efficient Snapshot Retrieval over Historical Graph Data

Storing and Analyzing Historical Graph Data at Scale

Towards a Distributed Large-Scale Dynamic Graph Data Store

GraphJet: Real-Time Content Recommendations at Twitter
Friday 3/23LocalityYunming Zhang Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server*

Making Caches Work for Graph Analytics*

Reducing Pagerank Communication via Propagation Blocking*

Slides
Optimizing Indirect Memory References with milk

Parallel Graph Processing: Prejudice and State of the Art
Wednesday 3/28Spring break!
Friday 3/30Spring break!
Wednesday 4/4Locality IISherry (Mengjiao) Yang Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask*

Gemini: A Computation-Centric Distributed Graph Processing System*

GraphGrind: Addressing Load Imbalance of Graph Partitioning*

Slides
NUMA-Aware Graph-Structured Analytics

Friday 4/6CompressionEdward Park
Helen Xu
An Experimental Analysis of a Compact Graph Representation* Slides

The WebGraph Framework I: Compression Techniques* Slides

Compressed representations for web and social graphs* Slides
Compact Representations of Separable Graphs

Chapter 8 of Shared-Memory Parallelism Can Be Simple, Fast, and Scalable

The WebGraph Framework II: Codes For The World Wide Web

Fast and Compact Web Graph Representations

Speeding up Algorithms on Compressed Web Graphs

Tight and simple Web graph compression for forward and reverse neighbor queries

Exploiting Computation-Friendly Graph Compression Methods for Adjacency-Matrix Multiplication

Graph Summarization Methods and Applications: A Survey
Wednesday 4/11Graph Partitioning and ReorderingYijiang Huang
Victor Ying
Bishesh Khadka
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs* Slides

Compressing Graphs and Indexes with Recursive Graph Bisection* Slides

Speedup Graph Processing by Graph Ordering* Slides
Chapter 3 of Networks, Crowds, and Markets

Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks

On Compressing Social Networks

SlashBurn: Graph Compression and Mining beyond Caveman Communities

Order or Shuffle: Empirically Evaluating Vertex Order Impact on Parallel Graph Computations

ReCALL: Reordered Cache Aware Locality based Graph Processing

Rabbit Order: Just-in-time Parallel Reordering for Fast Graph Analysis

A Multilevel Algorithm for Partitioning Graphs

Recent Advances in Graph Partitioning (Survey of graph partitioning methods)

A Tutorial on Spectral Clustering

Spectral partitioning with multiple eigenvectors

Weighted Graph Cuts without Eigenvectors: A Multilevel Approach

A Local Clustering Algorithm for Massive Graphs and its Application to Nearly Linear Time Graph Partitioning

Geometry, Flows, and Graph-Partitioning Algorithms

METIS Software and Publications

Karlsruhe High Quality Partitioning Software and Publications
Friday 4/13More Parallel Graph Algorithms

(Project mid-term report due today. Due 4/16 with Comm Lab appointment.)
Julian Shun An Evaluation of Parallel Eccentricity Estimation Algorithms on Undirected Real-World Graphs* Slides

Greedy Sequential Maximal Independent Set and Matching are Parallel on Average* Slides
ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs

HADI: Mining Radii of Large Graphs

HyperANF: Approximating the Neighbourhood Function of Very Large Graphs on a Budget

Computing the Eccentricity Distribution of Large Graphs

Fast Approximation Algorithms for the Diameter and Radius of Sparse Graphs

Better Approximation Algorithms for the Graph Diameter

A Simple Parallel Algorithm for the Maximal Independent Set Problem

Tight Analysis of Parallel Randomized Greedy MIS

Efficient Parallel and External Matching

Notes on simple analysis of parallel maximal independent set and maximal matching algorithms
Wednesday 4/18Subgraph findingBishesh Khadka
Hyun Ryong Lee
Omar Obeya
DualSim: Parallel Subgraph Enumeration in a Massive Graph on a Single Machine* Slides

ScaleMine: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph* Slides

ESCAPE: Efficiently Counting All 5-Vertex Subgraphs* Slides
Biological network motif detection: principles and practice

Network Motifs: Simple Building Blocks of Complex Networks

Biomolecular network motif counting and discovery by color coding

Network Motif Discovery Using Subgraph Enumeration and Symmetry-Breaking

MODA: an efficient algorithm for network motif discovery in biological networks

NeMoFinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs

Efficient sampling algorithm for estimating sub-graph concentrations and detecting network motifs

Efficient detection of network motifs

Kavosh: a new algorithm for finding network motifs

Efficient Subgraph Frequency Estimation with G-Tries

Frequency Concepts and Pattern Detection for the Analysis of Motifs in Networks

Parallel discovery of network motifs

Network Motif Discovery: A GPU Approach

EmptyHeaded: A Relational Engine for Graph Processing

Arabesque: A System for Distributed Graph Mining

Skew Strikes Back: New Developments in the Theory of Join Algorithms

Distributed Evaluation of Subgraph Queries Using Worst-case Optimal Low-Memory Dataflows

TurboISO: Towards UltraFast and Robust Subgraph Isomorphism Search in Large Graph Databases

Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

Distributed Estimation of Graph 4-Profiles

Estimation of Local Subgraph Counts

Counting Graphlets: Space vs Time

Path Sampling: A Fast and Provable Method for Estimating 4-Vertex Subgraph Counts

Graphlet decomposition: framework, algorithms, and applications

Mining Graphlet Counts in Online Social Networks

Scalable Distributed Subgraph Enumeration

GraphQ: Graph Query Processing with Abstraction Refinement-Scalable and Programmable Analytics over Very Large Graphs on a Single PC

GraMi: Frequent Subgraph and Pattern Mining in a Single Large Graph

Parallel Subgraph Listing in a Large-Scale Graph

Parallel Subgraph Counting for Multicore Architectures

SAHAD: Subgraph Analysis in Massive Networks Using Hadoop

Complex Network Analysis using Parallel Approximate Motif Counting

Shared Memory Parallel Subgraph Enumeration

A Multi-Level Parallel Implementation of a Program for Finding Frequent Patterns in a Large Sparse Graph

Subgraph Matching: on Compression and Computation

A survey of frequent subgraph mining algorithms (Survey)

A Survey of Algorithms for Dense Subgraph Discovery (Survey)

The K-clique Densest Subgraph Problem

On Finding Dense Subgraphs

Denser than the Densest Subgraph: Extracting Optimal Quasi-Cliques with Quality Guarantees

Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling

More dense subgraph discovery papers
Friday 4/20Clustering and Community DetectionJulian Shun Ch 10.2-10.5 of Mining Massive Data Sets

Parallel Local Graph Clustering* Slides

Community detection in graphs (Survey of methods)

Community structure in social and biological networks

Modularity and community structure in networks

Uncovering the overlapping community structure of complex networks in nature and society

Fast unfolding of communities in large networks

On Modularity Clustering

Fast algorithm for detecting community structure in networks

Community detection algorithms: a comparative analysis

Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters

Empirical Comparison of Algorithms for Network Community Detection

Think locally, act locally: Detection of small, medium-sized, and large communities in large networks

A Local Clustering Algorithm for Massive Graphs and its Application to Nearly Linear Time Graph Partitioning

Local Graph Partitioning using PageRank Vectors

Scalable Motif-aware Graph Clustering

Local Higher-Order Graph Clustering

An Optimization Approach to Locally-Biased Graph Algorithms

A Simple and Strongly-Local Flow-Based Method for Cut Improvement

Flow-Based Algorithms for Local Graph Clustering

A Local Algorithm for Finding Well-Connected Clusters

Heat Kernel Based Community Detection

Higher-order organization of complex networks

Graph Clustering Via a Discrete Uncoupling Process

Scalable Discovery of Best Clusters on Large Graphs

Community detection in large-scale networks: a survey and empirical evaluation

Local Search of Communities in Large Graphs
Wednesday 4/25Graph StoresEndrias Kahssay
Andrew Xia
Joana M. F. da Trindade
TAO: Facebook's Distributed Data Store for the Social Graph* Slides

Fast and Concurrent RDF Queries with RDMA-Based Distributed Graph Exploration* Slides

Sub-millisecond Stateful Stream Querying over Fast-evolving Linked Data* Slides
RDF in the clouds: a survey (Survey)

A Survey and Experimental Comparison of Distributed SPARQL Engines for Very Large RDF Data (Survey)

Survey of Graph Database Models (Survey)

ZipG: A Memory-efficient Graph Store for Interactive Queries

Trinity: A Distributed Graph Engine on a Memory Cloud

A Distributed Graph Engine for Web Scale RDF Data

Hexastore: Sextuple Indexing for Semantic Web Data Management

Weaver: A High-Performance, Transactional Graph Database Based on Refinable Timestamps

Managing Large Graphs on Multi-Cores With Graph Awareness

G-SQL: Fast Query Processing via Graph Exploration

A General-Purpose Query-Centric Framework for Querying Big Graphs

G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs

gStore: a graph-based SPARQL query engine

G-Store: High-Performance Graph Store for Trillion-Edge Processing

Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs

Scalable SPARQL Querying of Large RDF Graphs

S2RDF: RDF Querying with SPARQL on Spark

Combining Vertex-Centric Graph Processing with SPARQL for Large-Scale RDF Data Analytics

Processing SPARQL queries over distributed RDF graphs

EAGRE: Towards Scalable I/O Efficient SPARQL Query Evaluation on the Cloud

Scaling Queries over Big RDF Graphs with Semantic Hash Partitioning

Accelerating SPARQL queries by exploiting hash-based locality and adaptive partitioning

The RDF-3X engine for scalable management of RDF data

Cypher: An Evolving Query Language for Property Graphs

PGQL: a Property Graph Query Language

Graphs-at-a-time: Query Language and Access Methods for Graph Databases

T-SPARQL: A TSQL2-Like Temporal Query Language for RDF

The G* graph database: efficiently managing large distributed dynamic graphs

Fast In-Memory SQL Analytics on Typed Graphs

TriAD: A Distributed Shared-Nothing RDF Engine based on Asynchronous Message Passing

Unicorn: A System for Searching the Social Graph

GraphCache: A Caching System for Graph Queries

H2RDF+: High-performance Distributed Joins over Large-scale RDF Graphs

Graph-Aware, Workload-Adaptive SPARQL Query Caching

Scalable Join Processing on Very Large RDF Graphs

Building an Efficient RDF Store Over a Relational Database

SQLGraph: An Efficient Relational-Based Property Graph Store

TripleBit: a Fast and Compact System for Large Scale RDF Data

Matrix "Bit"loaded: A Scalable Lightweight Join Query Processor for RDF Data

Towards Effective Partition Management for Large Graphs

Taming Subgraph Isomorphism for RDF Query Processing

EmptyHeaded: A Relational Engine for Graph Processing An Analytics-Aware Conceptual Model For Evolving Graphs

Friday 4/27GPUBrian Wheatman
Edward Fan
Garaph: Efficient GPU-accelerated Graph Processing on a Single Machine with Balanced Replication* Slides

A Distributed Multi-GPU System for Fast Graph Processing* Slides

Tigr: Transforming Irregular Graphs for GPU-Friendly Graph Processing* Slides
Graph Processing on GPUs: A Survey (Survey of GPU graph processing)

Gunrock: GPU Graph Analytics

Multi-GPU Graph Analytics

Puffin: Graph Processing System on Multi-GPUs

Medusa: Simplified Graph Processing on GPUs

MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs

CuSha: Vertex-Centric Graph Processing on GPUs

Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems

Efficient graph computation on hybrid CPU and GPU systems

Optimizing Graph Processing on GPUs

Graph Processing on GPUs: Where are the Bottlenecks?

Performance Characterization of High-Level Programming Models for GPU Graph Analytics

GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs

Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model

GBTL-CUDA: Graph Algorithms and Primitives for GPUs

LightHouse: An Automatic Code Generator for Graph Algorithms on GPUs

GraphReduce: Processing Large-Scale Graphs on Accelerator-Based Systems

EvoGraph: On-the-Fly Efficient Mining of Evolving Graphs on GPU

cuSTINGER: Supporting Dynamic Graph Aigorithms for GPUs

Autonomous, Independent Management of Dynamic Graphs on GPUs

Accelerating Dynamic Graph Analytics on GPUs

Graphie: Large-Scale Asynchronous Graph Traversals on Just a GPU
Wednesday 5/2Graph ColoringGuest lecture by Tim Kaler Ordering heuristics for parallel graph coloring*

Executing Dynamic Data-Graph Computations Deterministically Using Chromatic Scheduling*
A Parallel Graph Coloring Heuristic

Scalable parallel graph coloring algorithms

A Scalable Parallel Graph Coloring Algorithm for Distributed Memory Computers

Parallel Graph Coloring for Manycore Architectures

Algorithms for Balanced Graph Colorings with Applications in Parallel Computing

Effective and Efficient Dynamic Graph Coloring
Friday 5/4Graph Decomposition and PrefetchingJulian Shun
Hyun Ryong Lee
Fast algorithms for determining (generalized) core groups in social networks*

Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing* Slides

Graph Prefetching Using Data Structure Knowledge* Slides
K-Core Decomposition of Large Networks on a Single PC

Incremental k-core decomposition: algorithms and evaluation

Patterns and Anomalies in k-Cores of Real-World Graphs with Applications

I/O Efficient Core Graph Decomposition at Web Scale

Truss Decomposition in Massive Networks

Truss decomposition on shared-memory parallel systems

Nucleus Decompositions for Identifying Hierarchy of Dense Subgraphs

Parallel Local Algorithms for Core, Truss, and Nucleus Decompositions

The k-peak Decomposition: Mapping the Global Structure of Graphs

Novel Approaches for Analyzing Biological Networks

When Engagement Meets Similarity: Efficient (k,r)-Core Computation on Social Networks

Density-friendly graph decomposition

IMP: Indirect Memory Prefetcher

Software Prefetching for Indirect Memory Accesses

An Event-Triggered Programmable Prefetcher for Irregular Workloads
Wednesday 5/9Parallel Shortest Paths and Priority Queues Guest lecture by Justin Kopinsky Delta-stepping: a parallelizable shortest path algorithm*

The SprayList: A Scalable Relaxed Priority Queue*
Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing

Parallel Shortest Paths Using Radius Stepping

An Experimental Study of a Parallel Shortest Path Algorithm for Solving Large-Scale Graph Instances

DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem

Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths

A Randomized Parallel Algorithm for Single-Source Shortest Path

Randomized Speedup of the Bellman-Ford Algorithm

A Parallel Priority Queue with Constant Time Operations

Skiplist-Based Concurrent Priority Queues

Mounds: Array-Based Concurrent Priority Queues

A Skiplist-Based Concurrent Priority Queue with Minimal Memory Contention

The Adaptive Priority Queue with Elimination and Combining

A Heap-Based Concurrent Priority Queue with Mutable Priorities for Faster Parallel Algorithms

CBPQ: High Performance Lock-Free Priority Queue

MultiQueues: Simpler, Faster, and Better Relaxed Concurrent Priority Queues

Priority Queues Are Not Good Concurrent Priority Schedulers

The Power of Choice in Priority Scheduling
Friday 5/11ArchitectureGuest lectures by Mark Jeffrey and Anurag Mukkara A Scalable Architecture for Ordered Parallelism*

Cache-Guided Scheduling: Exploiting Caches to Maximize Locality in Graph Processing*
Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics

Novel Graph Processor Architecture, Prototype System, and Results

A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing

GraphOps: A Dataflow Library for Graph Analytics Acceleration

FPGP: Graph Processing Framework on FPGA

TuNao: A High-Performance and Energy-Efficient Reconfigurable Accelerator for Graph Processing

GraphR: Accelerating Graph Processing Using ReRAM

GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition

Energy Efficient Architecture for Graph Analytics Accelerators

ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture

GraphGen: An FPGA Framework for Vertex-Centric Graph Computation

High-throughput and Energy-efficient Graph Processing on FPGA

Fractal: An Execution Model for Fine-Grain Nested Speculative Parallelism

Data-Centric Execution of Speculative Parallel Programs

SAM: optimizing multithreaded cores for speculative parallelism

Accelerating Graph Analytics on CPU-FPGA Heterogeneous Platform

OSCAR: Optimizing SCrAtchpad Reuse for Graph Processing
Monday 5/14Poster Session

Course Policy

This is a graduate-level course where we will cover the latest research on graph analytics. Advanced undergraduates may enroll if they have taken 6.046 and 6.172. The course units are 3-0-9.

Lectures

For each lecture we will study several research papers; for each paper one student will give a presentation, which will be followed by a discussion.

Grading

The assignments consist of writing weekly paper reviews, doing several paper presentations, class participation, and completing an open-ended research project. The grading breakdown is as follows.

Grading Breakdown
Paper Questions and Reviews20%
Paper Presentations25%
Research Project50%
Class Participation5%

Paper Questions and Reviews

Prior to each lecture, you are expected to read the papers under "Required Reading" in the schedule for that lecture. Students are required to answer a posted question about each paper, due at 12:00pm on the day of the lecture.

In addition, students are required to submit one paper review every week, due 11:59pm on Tuesdays. The review should be on a paper chosen from any of the starred papers (*) under "Required Reading" for the two lectures the week (i.e., the Wednesday and Friday immediately after the due date).

The review should first describe the problem the paper is trying to solve, why it is important, the main ideas proposed, and the results obtained. The review should then describe how the ideas are novel compared to existing work at the time, the strengths and weaknesses of the paper, and any ideas you may have for improving the techniques and/or evaluation criteria, and any open problems or directions for further work that you can think of.

The answers to the paper questions as well as the paper reviews will be submitted on Learning Modules. The paper reviews will be made visible after each submission deadline, and you are encouraged to read other reviews to improve your understanding and to prepare for the class discussion.

Paper Presentations

Students will give several presentations on assigned papers throughout the semester. These presentations should be 20 minutes long with slides, followed by a discussion. The presentation should discuss the motivation for the problem being solved, any definitions needed to understand the paper, key technical ideas in the paper, theoretical and/or experimental results, and existing work at the time (doing a literature search and including a discussion of recent related work would be a bonus). The presenter should also present his or her own thoughts on the paper (perceived strengths and weaknesses, directions for future work, etc.), and pose several questions for discussion. The presenter will be expected to lead the discussion on the paper.

Research Project

A large part of the work in this course is in proposing and completing an open-ended research project. This will give you a taste of doing research in the area of graph analytics. The project may involve (but is not limited to) any of the following tasks: implementation of non-trivial and theoretically-efficient graph algorithms; analyzing and optimizing the performance of existing graph codes; designing new graph algorithms that are theoretically and/or practically efficient; applying graph algorithms in the context of larger applications, e.g., in social network and Web analysis, machine learning, or computational biology; exploring new problems for analyzing the structure of graphs; improving or designing new graph processing frameworks/systems; and conducting a survey of a topic. The project may explore parallelism, cache-efficiency, I/O-efficiency, and memory-efficiency. The project can be related to research that you are currently doing, subject to instructor approval.

The project will be done in groups of 2-3 people and consist of a proposal, mid-term report, poster presentation, and final report. The timeline for the project is as follows.

AssignmentDue Date
Pre-proposal meeting3/14
Proposal3/16
Mid-term Report4/13
Poster Session5/14
Final Report5/17

Piazza

We will be using Piazza for answering questions. You may publicly post questions about the papers or any cool ideas that you have about graph analytics. Discussions among students is encouraged. If possible, please use Piazza instead of emailing the course staff. You may use a private note if you want your post to be visible only to the course staff.

Collaboration Policy

You are welcome to read the papers with other students, but all of your written work should be your own. You should carefully acknowledge any collaborators and any outside sources you use to complete the paper questions, paper reviews, and the project reports. Please do not look at anyone else's answers before submitting your answers to the paper questions and your paper reviews.

Computing Resources

We will be using Amazon Web Services.

Textbooks

Some of the readings will be from the following textbooks.

Introduction to Algorithms, 3rd Edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein (CLRS)

Networks, Crowds, and Markets by David Easley and Jon Kleinberg

Some Other Courses on Graph Analytics

Networks (Daron Acemoglu and Asu Ozdaglar, MIT)
Analysis of Networks (Jure Leskovec, Stanford)
Networks (David Easley and Jon Kleinberg, Cornell)
Topics in Social Data (Johan Ugander, Stanford)
Network Theory (Mark Newman, University of Michigan)
Graphs and Networks (Dan Spielman, Yale)
Statistical Network Analysis (Jennifer Neville, Purdue)
Network Analysis and Modeling (Aaron Clauset, Sante Fe Institute)
Parallel Graph Analysis (George Slota, RPI)
Large-Scale Graph Mining (A. Erdem Sariyuce, University of Buffalo)
Mining Large-scale Graph Data (Danai Koutra, University of Michigan)
Data Mining meets Graph Mining (Leman Akoglu, Stony Brook)
Graphs and Networks (Charalampos Tsourakakis, Aalto University)
Large-Scale Graph Processing (Keval Vora, Simon Fraser University)