Shoaib Kamil

(skamil AT csail DOT mit DOT edu)

[Curriculum Vitae] [Research Statement] [Teaching Statement]

Research Interests

My research is primarily in the area of programming systems (compilers, languages, and runtimes) for efficient computation using domain-specific optimizations. I work on domain-specific languages, compilers, and infrastructure that makes it easier to build DSL compilers, as well as new strategies for integrating domain-specific knowledge into general-purpose compilers.

I am currently a principal research scientist in the Creative Intelligence Lab at Adobe Research.

Previously, I was a research scientist at MIT CSAIL, working on the D-TEC X-STACK project and with Prof. Saman Amarasinghe and Prof. Armando Solar-Lezama.

I completed my PhD in December 2012, and was co-advised by Prof. Armando Fox and Prof. Kathy Yelick, working with the BeBOP Group in the Parallel Computing Laboratory. I was previously affiliated with the Future Technologies Group at LBNL.


taco - the Tensor Algebra COmpiler, a library and code generator for high performance tensor and linear algebra, for any mixture of sparse and dense formats.

Metalift - a framework for building verified lifting systems that translate from existing source code to domain-specific languages using a sound, synthesis-based technique.

Simit - a language for computing on sparse systems using linear algebra, combining graph operations and traditional linear algebra operations on meshes.

OpenTuner - an extensible framework for building domain-specific multi-objective program auto-tuners, using customizable configuration representations and ensembles of search techniques.

Asp (Asp is SEJITS for Python) - an implementation of Selective Embedded Just-in-Time Specialization for Python, which bridges the gap between productivity and performance using domain-specific embedded compilers. Asp's goal is to simplify the creation of DSLs in Python, and enable expert programmers in a domain (who are not language experts) to write DSLs or auto-tuned libraries appropriate for their domain. Current results show non-expert programmers can utilize these DSLs and auto-tuned libraries to meet or beat state-of-the-art hand-tuned low-level code, while still writing in a high-level productive language.

Stanza Triad - a modified version of STREAM Triad that tests the effectiveness of prefetch engines. Download v. 0.4

Stencil Probe - a small, easily modifiable probe for simulating the behavior of stencil applications, used as a testbed for evaluating optimizations for stencil codes.


6.005: Software Construction, Spring 2012. Co-lecturer with Saman Amarasinghe and Max Goldman.

CS169: Software Engineering, Fall 2010 (Instructor: Armando Fox)

CS267: Applications of Parallel Computers, Fall 2008 (Instructor: Horst Simon)

CS164: Compilers and Programming Languages, Fall 2002 (Instructor: Richard Fateman)

CS170: Efficient Algorithms and Intractable Problems, Spring 2001 (Instructors: James Demmel and Jonathan Shewchuk)


PhD Dissertation

Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages
PhD Dissertation, EECS Dept, University of California, Berkeley (Tech Report EECS-2012-255), 2012

Peer-Reviewed Publications

A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra
Ryan Senanayake, Changwon Hong, Ziheng Wang, Amalee Wilson, Stephen Chou, Shoaib Kamil, Saman Amarasinghe, Fredrik Kjolstad
OOPSLA, Proceedings of ACM Programming Languages, 2020

Verifying and Improving Halide’s Term Rewriting System with Program Synthesis
Julie Newcomb, Andrew Adams, Steven Johnson, Rastislav Bodik, Shoaib Kamil
OOPSLA, Proceedings of ACM Programming Languages, 2020

NASOQ: Numerically Accurate Sparsity-Oriented QP Solver
Kazem Cheshmi, Danny Kaufman, Shoaib Kamil, Maryam Mehri Dehnavi

Optimizing Ordered Graph Algorithms with GraphIt
Yunming Zhang, Ajay Brahmakshatriya, Xinyi Chen, Laxman Dhulipala, Shoaib Kamil, Saman Amarasinghe, Julian Shun
Code Generation and Optimization (CGO), 2020

Automatically Translating Image Processing Libraries to Halide
Maaz Bin Safeer Ahmad, Jonathan Ragan-Kelley, Alvin Cheung, Shoaib Kamil
SIGGRAPH Asia, 2019

Modular Verification of Web Page Layout
Pavel Panchekha, Michael Ernst, Zachary Tatlock, Shoaib Kamil
OOPSLA, Proceedings of ACM Programming Languages, 2019

Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code
Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, Saman Amarasinghe
Code Generation and Optimization (CGO), 2019

Tensor Algebra Compilation with Workspaces
Fredrik Kjolstad, Peter Ahrens, Shoaib Kamil, Saman Amarasinghe
Code Generation and Optimization (CGO), 2019

GraphIt - A High-Performance Graph DSL
Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, Saman Amarasinghe
OOPSLA, Proceedings of ACM Programming Languages, 2018

ParSy: Inspection and Transformation of Sparse Matrix Computations for Parallelism
Kazem Cheshmi, Shoaib Kamil, Michelle Strout, Maryam Mehri Dehnavi
Supercomputing: The International Conference for High Performance Computing Networking, Storage, and Analysis (SC), 2018

Verifying that Web Pages have Accessible Layout
Pavel Panchekha, Adam Geller, Michael Ernst, Zachary Tatlock, Shoaib Kamil
Programming Language Design and Implementation (PLDI), 2018

The Tensor Algebra Compiler
Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe
OOPSLA, Proceedings of ACM Programming Languages (Distinguished Paper), 2017

Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis
Kazem Cheshmi, Shoaib Kamil, Michelle Strout, Maryam Mehri Dehnavi
Supercomputing: The International Conference for High Performance Computing Networking, Storage, and Analysis (SC), 2017

Parallel Associative Reductions in Halide
Patricia Suriana, Andrew Adams, Shoaib Kamil
International Symposium on Code Generation and Optimization (CGO), 2017

Verified Lifting of Stencil Computations
Shoaib Kamil, Alvin Cheung, Shachar Itzhaky, Armando Solar-Lezama
Programming Language Design and Implementation (PLDI), 2016

Simit: A Language for Physical Simulation
Fredrik Kjolstad, Shoaib Kamil, Jonathan Ragan-Kelley, David I. W. Levin, Shinjiro Sueda, Desai Chen, Etienne Vouga, Danny M. Kaufman, Gurtej Kanwar, Wojciech Matusik, Saman Amarasinghe
ACM Transactions on Graphics (TOG), 2016

Distributed Halide
Tyler Denniston, Shoaib Kamil, Saman Amarasinghe
Principles and Practice of Parallel Programming (PPoPP), 2016

Bridging the Gap Between General-Purpose and Domain-Specific Compilers with Synthesis
Alvin Cheung, Shoaib Kamil, Armando Solar-Lezama
Summit oN Advances in Programming Languages (SNAPL), 2015

Helium: Lifting High-Performance Stencil Kernels from Stripped x86 Binaries to Halide DSL Code
Charith Mendis, Jeffrey Bosboom, Kevin Lu, Shoaib Kamil, Jonathan Ragan-Kelley, Qin Zhao, Sylvain Paris, Saman Amarasinghe
Programming Language Design and Implementation (PLDI), 2015

MSL: A Synthesis-Enabled Language for Distributed Implementations
Zhilei Xu, Shoaib Kamil, Armando Solar-Lezama
Supercomputing: The International Conference for High Performance Computing Networking, Storage, and Analysis (SC), 2014

OpenTuner: An Extensible Framework for Program Autotuning
Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Una-May O'Reilly, Saman Amarasinghe
Parallel Architectures and Compilation Techniques (PACT), 2014

Parallel Processing of Filtered Queries in Attributed Semantic Graphs
Adam Lugowski, Shoaib Kamil, Aydin Buluc, Samuel Williams, Erika Duriakova, Leonid Oliker, Armando Fox, John Gilbert
Journal of Parallel and Distributed Computing (JPDC), 2014

Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication
James Demmel, David Eliahu, Armando Fox, Shoaib Kamil, Benjamin Lipshitz, Oded Schwartz, Omer Spillinger
International Parallel and Distributed Processing Symposium (IPDPS), 2013

High-Productivity and High-Performance Analysis of Filtered Semantic Graphs
Aydin Buluc, Erika Duriakova, Armando Fox, John Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams
International Parallel and Distributed Processing Symposium (IPDPS), 2013

Auto-tuning the Matrix Powers Kernel with SEJITS
Jeffrey Morlan, Shoaib Kamil, Armando Fox
Seventh International Workshop on Automatic Performance Tuning (iWAPT), 2012

Parallel High Performance Statistical Bootstrapping in Python
Aakash Prasad, David Howard, Shoaib Kamil, Armando Fox
Scientific Computing with Python Conference, 2012

Portable Parallel Performance from Sequential, Productive, Embedded Domain Specific Languages
S. Kamil, D. Coetzee, S. Beamer, H. Cook, E. Gonina, J. Harper, J. Morlan, A. Fox
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Extended Abstract, 2012

Bringing Parallel Performance to Python with Domain-Specific Selective Embedded Just-in-Time Specialization
Shoaib Kamil, Derrick Coetzee, Armando Fox
10th Python for Scientific Computing Conference, 2011

CUDA-level Performance with Python-level Productivity for Gaussian Mixture Model Applications
H. Cook, E. Gonina, S. Kamil, G. Friedland, D. Patterson, A. Fox
USENIX Workshop on Hot Topics in Parallelism (HotPar), 2011

Hardware/Software Co-design of Global Cloud System Resolving Models
M. F. Wehner, L. Oliker, J. Shalf, D. Donofrio, L. A. Drummond, R. Heikes, S. Kamil, C. Kono, N. Miller, H. Miura, M. Mohiyuddin, D. Randall, W.-S. Yang
Journal of Advances in Modeling Earth Systems, 2011

Silicon Nanophotonic Network-On-Chip Using TDM Arbitration
G. Hendry, J. Chan, S. Kamil, L. Oliker, J. Shalf, L. P. Carloni, K. Bergman
IEEE Symposium on High Performance Interconnects (HOTI), 2011

An Auto-tuning Framework for Parallel Multicore Stencil Computations
Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, Samuel Williams
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2010

SEJITS: Getting Productivity and Performance with Selective Embedded JIT Specialization
Bryan Catanzaro, Shoaib Kamil, Yunsup Lee, Krste Asanovic, James Demmel, Kurt Keutzer, John Shalf, Kathy Yelick, Armando Fox
Workshop on Programming Models for Emerging Architectures (PMEA), 2009

A Generalized Framework for Auto-tuning Stencil Computations
Shoaib Kamil, Cy Chan, Sam Williams, Leonid Oliker, John Shalf, Mark Howison, E. Wes Bethel, Prabhat
Cray User Group Conference, 2009
Best Paper Award

Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications
Gilbert Hendry, Shoaib Kamil, A. Biberman, J. Chan, B. Lee, M. Mohiyuddin, A. Jain, K. Bergman, L. Carloni, J. Kubiatowicz, L. Oliker, J. Shalf
International Symposium on Networks-on-Chip (NOCS), 2009

Communication Requirements and Interconnect Optimization for High-End Scientific Applications
Shoaib Kamil, Leonid Oliker, Ali Pinar, John Shalf
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2009

Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors
Kaushik Datta, Shoaib Kamil, Sam Williams, Leonid Oliker, John Shalf, Katherine Yelick
SIAM Review, 2009

Power Efficiency in High Performance Computing
Shoaib Kamil, John Shalf, Erich Strohmaier
International Parallel and Distributed Processing Symposium (IPDPS), 2008

Performance and Energy Comparison of Electrical and Hybrid Photonic Networks for CMPs
Ankit Jain, Shoaib Kamil, Marghoob Mohiyuddin, John Shalf, John Kubiatowicz
High Performance Embedded Computing (HPEC), 2008

Reconfigurable Hybrid Interconnection for Static and Dynamic Scientific Applications
Shoaib Kamil, Ali Pinar, Daniel Gunter, Michael Lijewski, Leonid Oliker, John Shalf
ACM International Conference on Computing Frontiers, 2007

Scientific Application Performance on Candidate PetaScale Platforms
Leonid Oliker, Andrew Canning, Jonathan Carter, Costin Iancu, Michael Lijewski, Shoaib Kamil, John Shalf, H. Shan, Erich Strohmaier, Stephane Ethier, Tim Goodale
International Parallel and Distributed Processing Symposium (IPDPS), 2007
Best Paper Award

Scientific Computing Kernels on the Cell Processor
Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, Katherine Yelick
International Journal of Parallel Programming (IJPP), 2007

Implicit and Explicit Optimizations for Stencil Computations
Shoaib Kamil, Kaushik Datta, Samuel Williams, Leonid Oliker, John Shalf, Katherine Yelick
Memory Systems Performance and Correctness (MSPC), 2006

The Potential of the Cell Processor for Scientific Computing
Sam Williams, John Shalf, Parry Husbands, Shoaib Kamil, Leonid Oliker, Katherine Yelick
ACM International Conference on Computing Frontiers, 2006

Analyzing Ultra-Scale Application Communication Requirements for a Reconfigurable Hybrid Interconnect
John Shalf, Shoaib Kamil, Leonid Oliker, David Skinner
Supercomputing: The International Conference for High Performance Computing Networking, Storage, and Analysis (SC), 2005

Understanding Ultra-Scale Application Communication Requirements
Shoaib Kamil, Leonid Oliker, John Shalf, David Skinner
IEEE International Symposium on Workload Characterization (IISWC), 2005

Impact of Modern Memory Subsystems on Cache Optimizations for Stencil Computations
Shoaib Kamil, Parry Husbands, Leonid Oliker, John Shalf, Katherine Yelick
ACM SIGPLAN Workshop on Memory Systems Performance (MSP), 2005

Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply
Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh Nishtala, Benjamin Lee
Supercomputing: The International Conference for High Performance Computing Networking, Storage, and Analysis (SC), 2002
Finalist, Best Student Paper

Automatic Performance Tuning and Analysis of Sparse Triangular Solve
Richard Vuduc, Shoaib Kamil, Jen Hsu, Rajesh Nishtala, James W. Demmel, Katherine A. Yelick
Workshop on Performance Optimization of High-level Languages and Libraries (POHLL), 2002
Best Student Paper, Best Presentation

Other Publications

StencilMark: Towards a Benchmark for Stencil Computations
Shoaib Kamil
1st Workshop on Stencil Computations (WOSC), 2013

Ubiquitous Dynamic Code Generation and Compilation on Future Computing Devices
Shoaib Kamil and Armando Fox
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Provocative Ideas Session, 2012

Energy-Efficient Computing for Extreme Scale Science
David Donofrio, Leonid Oliker, John Shalf, Michael Wehner, Chris Rowen, Jens Krueger, Shoaib Kamil, Marghoob Mohiyuddin
IEEE Computer Magazine, 2009

Invited Talks

Computer-Aided Programming: Productivity and High Performance
Rutgers Department of Electrical and Computer Engineering Colloquium, 2015

Recent Results, Insights, and Lessons from Auto-tuning Three Motifs
Center for Scalable Application Development Software (CScADS), 2008

Bridging the Productivity-Performance Gap with Selective Embedded Just-in-Time Specialization
IEEE International Symposium on Embedded Multicore SoCs, 2012

SEJITS - Bridging the Productivity-Performance Gap
Workshop on Domain Specific Multicore Computing (DSMC) at ICCAD, 2012