Shoaib Kamil

(skamil AT csail DOT mit DOT edu)

[Curriculum Vitae]


Research Interests

Scientific computing, parallel programming languages, software synthesis, programming systems for productive parallel programming, software engineering, auto-tuning, embedded DSLs, power-efficient parallel computing, software as a service (SaaS)

I am currently a research scientist at MIT CSAIL, working on the D-TEC X-STACK project and with Prof. Saman Amarasinghe and Prof. Armando Solar-Lezama.

I completed my PhD in December 2012 and was co-advised by Prof. Armando Fox and Prof. Kathy Yelick, working with the BeBOP Group in the Parallel Computing Laboratory. I was previously affiliated with the Future Technologies Group at LBNL.

Projects

OpenTuner - an extensible framework for building domain-specific, multi-objective program auto-tuners using customizable configuration representations and ensembles of search techniques.

Asp (Asp is SEJITS for Python) - an implementation of Selective Embedded Just-in-Time Specialization for Python, which bridges the gap between productivity and performance using domain-specific embedded compilers. Asp's goal is to simplify the creation of DSLs in Python and to enable domain experts who are not language experts to write DSLs or auto-tuned libraries for their domain. Current results show that non-expert programmers can use these DSLs and auto-tuned libraries to meet or beat state-of-the-art hand-tuned low-level code while still writing in a high-level, productive language.

Stanza Triad - a modified version of the STREAM Triad benchmark that tests the effectiveness of hardware prefetch engines. Download v0.4.

Stencil Probe - a small, easily modifiable probe for simulating the behavior of stencil applications, used as a testbed for evaluating optimizations for stencil codes.

Teaching

6.005: Software Construction, Spring 2012. Co-lecturer with Saman Amarasinghe and Max Goldman.

CS169: Software Engineering, Fall 2010 (Instructor: Armando Fox)

CS267: Applications of Parallel Computers, Fall 2008 (Instructor: Horst Simon)

CS164: Compilers and Programming Languages, Fall 2002 (Instructor: Richard Fateman)

CS170: Efficient Algorithms and Intractable Problems, Spring 2001 (Instructors: James Demmel and Jonathan Shewchuk)

Publications

PhD Dissertation

Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages
PhD Dissertation, EECS Dept, University of California, Berkeley (Tech Report EECS-2012-255), 2012

Peer-Reviewed Publications

MSL: A Synthesis-Enabled Language for Distributed Implementations
Zhilei Xu, Shoaib Kamil, Armando Solar-Lezama
Supercomputing: The International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2014

OpenTuner: An Extensible Framework for Program Autotuning
Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Una-May O'Reilly, Saman Amarasinghe
Parallel Architectures and Compilation Techniques (PACT), 2014

Parallel Processing of Filtered Queries in Attributed Semantic Graphs
Adam Lugowski, Shoaib Kamil, Aydin Buluc, Samuel Williams, Erika Duriakova, Leonid Oliker, Armando Fox, John Gilbert
Journal of Parallel and Distributed Computing (JPDC), 2014

Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication
James Demmel, David Eliahu, Armando Fox, Shoaib Kamil, Benjamin Lipshitz, Oded Schwartz, Omer Spillinger
International Parallel and Distributed Processing Symposium (IPDPS), 2013

High-Productivity and High-Performance Analysis of Filtered Semantic Graphs
Aydin Buluc, Erika Duriakova, Armando Fox, John Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams
International Parallel and Distributed Processing Symposium (IPDPS), 2013

Auto-tuning the Matrix Powers Kernel with SEJITS
Jeffrey Morlan, Shoaib Kamil, Armando Fox
Seventh International Workshop on Automatic Performance Tuning (iWAPT), 2012

Parallel High Performance Statistical Bootstrapping in Python
Aakash Prasad, David Howard, Shoaib Kamil, Armando Fox
Scientific Computing with Python Conference, 2012

Portable Parallel Performance from Sequential, Productive, Embedded Domain Specific Languages
S. Kamil, D. Coetzee, S. Beamer, H. Cook, E. Gonina, J. Harper, J. Morlan, A. Fox
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Extended Abstract, 2012

Bringing Parallel Performance to Python with Domain-Specific Selective Embedded Just-in-Time Specialization
Shoaib Kamil, Derrick Coetzee, Armando Fox
10th Python for Scientific Computing Conference, 2011

CUDA-level Performance with Python-level Productivity for Gaussian Mixture Model Applications
H. Cook, E. Gonina, S. Kamil, G. Friedland, D. Patterson, A. Fox
USENIX Workshop on Hot Topics in Parallelism (HotPar), 2011

Hardware/Software Co-design of Global Cloud System Resolving Models
M. F. Wehner, L. Oliker, J. Shalf, D. Donofrio, L. A. Drummond, R. Heikes, S. Kamil, C. Kono, N. Miller, H. Miura, M. Mohiyuddin, D. Randall, W.-S. Yang
Journal of Advances in Modeling Earth Systems, 2011

Silicon Nanophotonic Network-On-Chip Using TDM Arbitration
G. Hendry, J. Chan, S. Kamil, L. Oliker, J. Shalf, L. P. Carloni, K. Bergman
IEEE Symposium on High Performance Interconnects (HOTI), 2011

An Auto-tuning Framework for Parallel Multicore Stencil Computations
Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, Samuel Williams
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2010

SEJITS: Getting Productivity and Performance with Selective Embedded JIT Specialization
Bryan Catanzaro, Shoaib Kamil, Yunsup Lee, Krste Asanovic, James Demmel, Kurt Keutzer, John Shalf, Kathy Yelick, Armando Fox
Workshop on Programming Models for Emerging Architectures (PMEA), 2009

A Generalized Framework for Auto-tuning Stencil Computations
Shoaib Kamil, Cy Chan, Sam Williams, Leonid Oliker, John Shalf, Mark Howison, E. Wes Bethel, Prabhat
Cray User Group Conference, 2009
Best Paper Award

Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications
Gilbert Hendry, Shoaib Kamil, A. Biberman, J. Chan, B. Lee, M. Mohiyuddin, A. Jain, K. Bergman, L. Carloni, J. Kubiatowicz, L. Oliker, J. Shalf
International Symposium on Networks-on-Chip (NOCS), 2009

Communication Requirements and Interconnect Optimization for High-End Scientific Applications
Shoaib Kamil, Leonid Oliker, Ali Pinar, John Shalf
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2009

Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors
Kaushik Datta, Shoaib Kamil, Sam Williams, Leonid Oliker, John Shalf, Katherine Yelick
SIAM Review, 2009

Power Efficiency in High Performance Computing
Shoaib Kamil, John Shalf, Erich Strohmaier
International Parallel and Distributed Processing Symposium, 2008

Performance and Energy Comparison of Electrical and Hybrid Photonic Networks for CMPs
Ankit Jain, Shoaib Kamil, Marghoob Mohiyuddin, John Shalf, John Kubiatowicz
High Performance Embedded Computing (HPEC), 2008

Reconfigurable Hybrid Interconnection for Static and Dynamic Scientific Applications
Shoaib Kamil, Ali Pinar, Daniel Gunter, Michael Lijewski, Leonid Oliker, John Shalf
ACM International Conference on Computing Frontiers, 2007

Scientific Application Performance on Candidate PetaScale Platforms
Leonid Oliker, Andrew Canning, Jonathan Carter, Costin Iancu, Michael Lijewski, Shoaib Kamil, John Shalf, H. Shan, Erich Strohmaier, Stephane Ethier, Tim Goodale
International Parallel and Distributed Processing Symposium (IPDPS), 2007
Best Paper Award

Scientific Computing Kernels on the Cell Processor
Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, Katherine Yelick
International Journal of Parallel Programming (IJPP), 2007

Implicit and Explicit Optimizations for Stencil Computations
Shoaib Kamil, Kaushik Datta, Samuel Williams, Leonid Oliker, John Shalf, Katherine Yelick
Memory Systems Performance and Correctness (MSPC), 2006

The Potential of the Cell Processor for Scientific Computing
Sam Williams, John Shalf, Parry Husbands, Shoaib Kamil, Leonid Oliker, Katherine Yelick
ACM International Conference on Computing Frontiers, 2006

Analyzing Ultra-Scale Application Communication Requirements for a Reconfigurable Hybrid Interconnect
John Shalf, Shoaib Kamil, Leonid Oliker, David Skinner
Supercomputing: The International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2005

Understanding Ultra-Scale Application Communication Requirements
Shoaib Kamil, Leonid Oliker, John Shalf, David Skinner
IEEE International Symposium on Workload Characterization (IISWC), 2005

Impact of Modern Memory Subsystems on Cache Optimizations for Stencil Computations
Shoaib Kamil, Parry Husbands, Leonid Oliker, John Shalf, Katherine Yelick
ACM SIGPLAN Workshop on Memory Systems Performance (MSP), 2005

Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply
Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh Nishtala, Benjamin Lee
Supercomputing: The International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2002
Finalist, Best Student Paper

Automatic Performance Tuning and Analysis of Sparse Triangular Solve
Richard Vuduc, Shoaib Kamil, Jen Hsu, Rajesh Nishtala, James W. Demmel, Katherine A. Yelick
Workshop on Performance Optimization of High-level Languages and Libraries (POHLL), 2002
Best Student Paper, Best Presentation

Other Publications

StencilMark: Towards a Benchmark for Stencil Computations
Shoaib Kamil
1st Workshop on Stencil Computations (WOSC), 2013

Ubiquitous Dynamic Code Generation and Compilation on Future Computing Devices
Shoaib Kamil and Armando Fox
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Provocative Ideas Session, 2012

Energy-Efficient Computing for Extreme Scale Science
David Donofrio, Leonid Oliker, John Shalf, Michael Wehner, Chris Rowen, Jens Krueger, Shoaib Kamil, Marghoob Mohiyuddin
IEEE Computer Magazine, 2009

Invited Talks

Recent Results, Insights, and Lessons from Auto-tuning Three Motifs
Center for Scalable Application Development Software (CScADS), 2008

Bridging the Productivity-Performance Gap with Selective Embedded Just-in-Time Specialization
IEEE International Symposium on Embedded Multicore SoCs, 2012

SEJITS - Bridging the Productivity-Performance Gap
Workshop on Domain Specific Multicore Computing (DSMC) at ICCAD, 2012