Suvinay Subramanian

Computer Architect @ Google
Email: suvinay@csail.mit.edu

CV   •   Scholar   •   LinkedIn   •   Twitter

TL;DR

  • 👨‍💻 Building AI Accelerator Systems (TPUs) @ Google
  • 🎓 EECS Ph.D., S.M. @ MIT | EE B.Tech @ IIT-Madras
  • 🎙️ Co-host the Computer Architecture Podcast
  • 🔖 Interests: Computing | Music | Cooking | Word Games | Trivia | Hiking | Books

About Me

I work at Google building hardware systems (TPUs) to accelerate machine learning and AI. My expertise is in hardware-software codesign: I have worked on solutions spanning the hardware-software stack, including circuits, microarchitecture, architecture, programming models, and interconnection networks.

I received a Ph.D. from MIT (CSAIL), advised by Daniel Sanchez. I helped develop new programming models and multi-core architectures to exploit challenging forms of parallelism in applications. I completed my master's at MIT, advised by Li-Shiuan Peh, working on high-performance interconnection networks. Prior to MIT, I spent a wonderful four years as an undergraduate at IIT Madras.

Interests and Activities

I co-host the Computer Architecture Podcast with Lisa Hsu (Microsoft). Check out our episodes on your favorite podcast player (iTunes, Spotify, Stitcher, etc.).

I also give technical talks at universities, conferences, and other forums on TPUs and other topics I work on.

Outside work, I enjoy hiking, traveling, cooking, reading, word games, and most genres of music. I am passionate about Carnatic music, and learn Carnatic vocal music and the Mridangam, an Indian percussion instrument. I actively support these art forms through operating roles in some Bay Area organizations (e.g., South India Fine Arts).

While at MIT, I was part of a South Asian fusion a cappella group, MIT Ohms. We enjoyed performing live and participated in several inter-collegiate a cappella competitions. I have also enjoyed singing with others for local events through my high school, undergraduate, and graduate school years [select performances playlist below].

Publications

  • Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
    Tian Jin, Ellie Y. Cheng, Zack Ankner, Nikunj Saunshi, Blake M. Elias, Amir Yazdanbakhsh, Jonathan Ragan-Kelley, Suvinay Subramanian, Michael Carbin
    ICML 2025 (To appear) [X (Twitter) Thread]
  • RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving
    Wenqi Jiang, Suvinay Subramanian, Cat Graves, Gustavo Alonso, Amir Yazdanbakhsh, Vidushi Dadu
    ISCA 2025 (To appear)
  • Understanding and Optimizing Multi-Stage AI Inference Pipelines
    Abhimanyu Rajeshkumar Bambhaniya, Hanjiang Wu, Suvinay Subramanian, Sudarshan Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna
    arXiv
  • Effective Interplay between Sparsity and Quantization: From Theory to Practice
    Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, Babak Falsafi, Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh
    ICLR 2025 (Spotlight)
  • The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
    Tian Jin, Ahmed Imtiaz Humayun, Utku Evci, Suvinay Subramanian, Amir Yazdanbakhsh, Dan Alistarh, Gintare Karolina Dziugaite
    ICLR 2025
  • Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers
    Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna
    CPAL 2025
  • TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
    Norman P. Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, Cliff Young, Xiang Zhou, Zongwei Zhou, David Patterson
    ISCA 2023 (Industry Track)
  • FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
    Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, Tushar Krishna
    ASPLOS 2023
  • STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
    Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh
    ICML 2023
  • Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask
    Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani Agrawal, Utku Evci, Tushar Krishna
    SNN 2022
  • Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism
    Mark C. Jeffrey, Suvinay Subramanian, Victor A. Ying, Hyun Ryong Lee, Joel Emer, Daniel Sanchez
    MICRO 2018
  • SAM: Optimizing Multithreaded Cores for Speculative Parallelism
    Maleen Abeydeera, Suvinay Subramanian, Mark C. Jeffrey, Joel Emer, Daniel Sanchez
    PACT 2017
  • Fractal: An Execution Model for Fine-Grain Nested Speculative Parallelism
    Suvinay Subramanian, Mark C. Jeffrey, Maleen Abeydeera, Hyun Ryong Lee, Victor A. Ying, Joel Emer, Daniel Sanchez
    ISCA 2017
    MIT News Article, Hacker News Discussion
  • Data-Centric Execution of Speculative Parallel Programs
    Mark C. Jeffrey, Suvinay Subramanian, Maleen Abeydeera, Joel Emer, Daniel Sanchez
    MICRO 2016
  • Programmable Packet Scheduling at Line Rate
    Anirudh Sivaraman, Suvinay Subramanian, Anurag Agrawal, Sharad Chole, Shang-Tse Chuang, Tom Edsall, Mohammad Alizadeh, Sachin Katti, Nick McKeown, Hari Balakrishnan
    SIGCOMM 2016
    Web site
  • Unlocking Ordered Parallelism with the Swarm Architecture
    Mark C. Jeffrey, Suvinay Subramanian, Cong Yan, Joel Emer, Daniel Sanchez
    IEEE Micro 2016
    Top Picks from the Computer Architecture Conferences
    MIT News Article, EE Journal Article
  • A Scalable Architecture for Ordered Parallelism
    Mark C. Jeffrey, Suvinay Subramanian, Cong Yan, Joel Emer, Daniel Sanchez
    MICRO 2015
    Selected for IEEE Micro's Top Picks special issue of "most significant papers in computer architecture based on novelty and long-term impact" from 2015
  • Towards Programmable Packet Scheduling
    Anirudh Sivaraman, Suvinay Subramanian, Anurag Agrawal, Sharad Chole, Shang-Tse Chuang, Tom Edsall, Mohammad Alizadeh, Sachin Katti, Nick McKeown, Hari Balakrishnan
    HotNets 2015
  • SCORPIO: A 36-core Research Chip Demonstrating Snoopy Coherence on a Scalable Mesh NoC with In-Network Ordering
    Chia-Hsin Owen Chen, Sunghyun Park, Suvinay Subramanian, Tushar Krishna, Bhavya K. Daya, Woo-Cheol Kwon, Brett Wilkerson, John Arends, Anantha P. Chandrakasan, Li-Shiuan Peh
    HotChips 2014
  • SCORPIO: A 36-core Research Chip Demonstrating Snoopy Coherence on a Scalable Mesh NoC with In-Network Ordering
    Bhavya K. Daya, Chia-Hsin Owen Chen, Suvinay Subramanian, Woo-Cheol Kwon, Sunghyun Park, Tushar Krishna, Anantha P. Chandrakasan, Li-Shiuan Peh
    ISCA 2014
    MIT News Article
  • No Silver Bullet: Extending SDN to the Data Plane
    Anirudh Sivaraman, Keith Winstein, Suvinay Subramanian, Hari Balakrishnan
    HotNets 2013
    Selected for the final round of the Qualcomm Innovation Fellowship
    Web site
  • Single-Cycle Multihop Asynchronous Repeated Traversal: A SMART Future for Reconfigurable On-Chip Networks
    Tushar Krishna, Chia-Hsin Owen Chen, Sunghyun Park, Woo-Cheol Kwon, Suvinay Subramanian, Anantha P. Chandrakasan, Li-Shiuan Peh
    IEEE Computer, October 2013
  • SMART: A Single-Cycle Reconfigurable NoC for SoC Applications
    Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramanian, Anantha P. Chandrakasan, Li-Shiuan Peh
    DATE 2013

Theses

  • Architectural Techniques to Unlock Ordered and Nested Speculative Parallelism
    Ph.D. Dissertation, MIT, September 2018
  • Ordered Mesh Network Interconnect (OMNI): Design and Implementation of In-Network Coherence
    S.M. Thesis, MIT, June 2013
    Poster
