Books/Book Chapters

  • Efficient processing of deep neural networks.
    Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer.
    Synthesis Lectures on Computer Architecture, 15(2):1--341, 2020.
    [DOI]

  • DEC Alpha
    Joel Emer, Tryggve Fossum.
    Encyclopedia of Parallel Computing, Springer, 535--545, 2011.
    [DOI]

Journal Publications

2020s

  • Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing
    M. Pellauer, J. Clemons, V. Balaji, N. Crago, A. Jaleel, D. Lee, M. O’Connor, A. Parashar, S. Treichler, P-A. Tsai, S. W. Keckler, and J. S. Emer
    ACM Transactions on Computer Systems, Vol. 41, No. 1-4, Article 4. December 2023.
    [paper]

  • There's plenty of room at the Top: What will drive computer performance after Moore's law?
    Charles E. Leiserson, Neil C. Thompson, Joel S. Emer, Bradley C. Kuszmaul, Butler W. Lampson, Daniel Sanchez, and Tao B. Schardl.
    Science, 368(6495), 2020.
    [web]

  • How to evaluate deep neural network processors: TOPS/W (alone) considered harmful,
    V. Sze, Y. H. Chen, T. J. Yang, and J. S. Emer.
    IEEE Solid-State Circuits Magazine, 12(3):28--41, 2020.
    [paper]

  • A 0.32-128 TOPS, scalable multi-chip-module-based deep neural network inference accelerator with ground-referenced signaling in 16 nm.
    B. Zimmer, R. Venkatesan, Y. S. Shao, J. Clemons, M. Fojtik, N. Jiang, B. Keller, A. Klinefelter, N. Pinckney, P. Raina, S. G. Tell, Y. Zhang, W. J. Dally, J. S. Emer, C. T. Gray, S. W. Keckler, and B. Khailany.
    IEEE Journal of Solid-State Circuits, 55(4):920--932, 2020.
    (JSSC "Best Paper" of 2021)
    [paper]

2010s

  • Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
    Y. Chen, T. Yang, J. Emer, and V. Sze.
    IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 9(2):292-308, June 2019.
    [paper]

  • Efficient Processing of Deep Neural Networks: A Tutorial and Survey.
    Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel S. Emer,
    Proceedings of the IEEE, Volume: 105, Issue: 12, December 2017.
    [paper]

  • Using Dataflow to Optimize Energy Efficiency of Deep Neural Network Accelerators,
    Yu-Hsin Chen, Joel Emer, Vivienne Sze,
    IEEE Micro's Top Picks from the Computer Architecture Conferences, May/June 2017.
    [paper]

  • Scavenger: Automating the Construction of Application-Optimized Memory Hierarchies,
    Hsin-Jung Yang, Kermin Fleming, Felix Winterstein, Michael Adler, Joel Emer,
    ACM Transactions on Reconfigurable Technology and Systems (TRETS) March 2017.
    [paper]

  • Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks,
    Yu-Hsin Chen, Tushar Krishna, Joel Emer, Vivienne Sze,
    IEEE Journal of Solid-State Circuits (JSSC), ISSCC Special Issue, Vol. 52, No. 1, pp. 127-138, January 2017.
    [paper]

  • Unlocking Ordered Parallelism with the Swarm Architecture,
    Mark C. Jeffrey, Suvinay Subramanian, Cong Yan, Joel Emer, Daniel Sanchez,
    in IEEE Micro's Top Picks from the Computer Architecture Conferences, May/June 2016.
    [paper]

  • Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures,
    Michael Pellauer, Angshuman Parashar, Michael Adler, Bushra Ahsan, Randy Allmon, Neal Crago, Kermin Fleming, Mohit Gambhir, Aamer Jaleel, Tushar Krishna, Daniel Lustig, Stephen Maresh, Vladimir Pavlov, Rachid Rayess, Antonia Zhai, and Joel Emer,
    ACM Transactions on Computer Systems, May 2015.
    [paper]

  • Efficient Spatial Processing Element Control via Triggered Instructions,
    Angshuman Parashar, Michael Pellauer, Michael Adler, Bushra Ahsan, Neal Crago, Daniel Lustig, Vladimir Pavlov, Antonia Zhai, Mohit Gambhir, Aamer Jaleel, Randy Allmon, Rachid Rayess, Stephen Maresh, Joel Emer,
    in IEEE Micro's Top Picks from the Computer Architecture Conferences, May 2014.
    [paper]

  • Using In-flight Chains to Build a Scalable Cache Coherence Protocol,
    Samantika Subramaniam, Simon C. Steely Jr., William Hasenplaugh, Aamer Jaleel, Carl Beckmann, Tryggve Fossum, Joel Emer,
    ACM Transactions on Architecture and Code Optimization (TACO).
    Presented at International Conference on High Performance and Embedded Architectures and Compilers (HiPEAC), Vienna, Austria, January 2014.
    [paper]

  • The Gradient-based Cache Replacement Algorithm,
    William Hasenplaugh, Pritpal Ahuja, Aamer Jaleel, Simon Steely, Joel Emer,
    ACM Transactions on Architecture and Code Optimization (TACO),
    Presented at International Conference on High Performance and Embedded Architectures and Compilers (HiPEAC), January 2012.
    [paper]

2000s

  • A-Port Networks: Preserving the Timed Behavior of Synchronous Systems for Modeling on FPGAs,
    Michael Pellauer, Muralidaran Vijayaraghavan, Michael Adler, Arvind, Joel Emer,
    ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 2, Issue 3, September 2009.
    [paper]

  • Set-Dueling-Controlled Adaptive Insertion for High-Performance Caching,
    Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely Jr., Joel Emer,
    in IEEE Micro's Top Picks from the Computer Architecture Conferences, January 2008.
    [paper]

  • Reducing the Soft-Error Rate of a High-Performance Microprocessor,
    Christopher Weaver, Joel Emer, Shubhendu Mukherjee, Steven Reinhardt,
    in IEEE Micro's Top Picks from the Computer Architecture Conferences, November 2004.
    [paper]

  • Measuring Architectural Vulnerability Factors
    Shubhendu Mukherjee, Christopher Weaver, Joel Emer, Steven Reinhardt, Todd Austin,
    in IEEE Micro's Top Picks from the Computer Architecture Conferences, November 2003.
    [paper]

1990s

  • Simultaneous Multithreading: A Platform for Next-generation Processors,
    Susan Eggers, Joel Emer, Henry Levy, Jack Lo, Rebecca Stamm, and Dean Tullsen,
    IEEE Micro, September/October 1997.
    [paper]

  • Converting Thread-Level Parallelism Into Instruction-Level Parallelism via Simultaneous Multithreading,
    Jack Lo, Susan Eggers, Joel Emer, Henry Levy, Rebecca Stamm, Dean Tullsen,
    ACM Transactions on Computer Systems, August 1997.
    [paper]

1980s

  • Performance Analysis of Mass Storage Service Alternatives for Distributed Systems,
    K.K. Ramakrishnan, J.S. Emer,
    IEEE Transactions on Software Engineering, February 1989.

  • Design and Implementation of the VAX Distributed File Service,
    W.G. Nichols, J.S. Emer,
    Digital Technical Journal, June 1989.

  • Performance of the VAX-11/780 Translation Buffer: Simulation and Measurement,
    J. Emer and D.W. Clark,
    ACM Transactions on Computer Systems, February 1985.
    (Reprinted in "Readings in Computer Architecture" by Hill, Jouppi and Sohi, 2000)

Conference Publications

2024

  • CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling Tool
    Tanner Andrulis, Joel S Emer, Vivienne Sze
    In Proceedings of the 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), May 2024.
    To appear.

  • Onyx: A Programmable Accelerator for Sparse Tensor Algebra
    Kalhan Koul, Maxwell Strange, Jackson Melchert, Alex Carsello, Yuchen Mei, Olivia Hsu, Taeyoung Kong, Po-Han Chen, Huifeng Ke, Keyi Zhang, Qiaoyi Liu, Gedeon Nyengele, Akhilesh Balasingam, Jayashree Adivarahan, Ritvik Sharma, Zhouhua Xie, Christopher Torng, Joel S Emer, Fredrik Kjolstad, Mark Horowitz, Priyanka Raina
    In Proceedings of the 2024 IEEE Symposium on VLSI Technology & Circuits (VLSI), June 2024.
    To appear.

  • Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications
    Yifan Yang, Joel S Emer, Daniel Sanchez
    In Proceedings of the 51st Annual International Symposium on Computer Architecture (ISCA-51), June 2024.
    To appear.

  • Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms
    Qijing (Jenny) Huang, Po-An Tsai, Joel S Emer, Angshuman Parashar
    In Proceedings of the 51st Annual International Symposium on Computer Architecture (ISCA-51), June 2024.
    To appear.

2023

  • Accelerating RTL Simulation with Hardware-Software Co-Design
    Fares Elsabbagh, Shabnam Sheikha, Victor A Ying, Quan M Nguyen, Joel S Emer, Daniel Sanchez
    In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2023.
    [paper] [slides]

  • SecureLoop: Design Space Exploration of Secure DNN Accelerators
    Kyungmi Lee, Mengjia Yan, Joel S Emer, Anantha Chandrakasan
    In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2023.
    [paper]

  • HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity
    Yannan (Nellie) Wu, Po-An Tsai, Saurav Muralidharan, Angshuman Parashar, Vivienne Sze, Joel S Emer
    In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2023.
    [paper] [slides] [project website]

  • TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
    Nandeeka Nayak, Toluwanimi O Odemuyiwa, Shubham Ugare, Christopher W Fletcher, Michael Pellauer, Joel S Emer
    In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2023.
    (IEEE Micro’s Top Picks in Computer Architecture - Honorable Mention, 2023)
    [paper]

  • Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity
    Fisher Xue, Yannan (Nellie) Wu, Joel S Emer, Vivienne Sze
    In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2023.
    [paper] [slides] [project website]

  • Unified Convolution Framework: A Compiler-Based Approach to Support Sparse Convolutions
    Jaeyeon Won, Changwan Hong, Charith Mendis, Joel S Emer, Saman Amarasinghe
    In Proceedings of the Sixth Conference on Machine Learning and Systems (MLSYS-2023), June 2023.
    [paper]

  • RAELLA: Reforming the Arithmetic for Efficient, Low-Resolution, and Low-Loss Analog PIM: No Retraining Required!
    Tanner Andrulis, Joel S Emer, Vivienne Sze
    In Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA-50), June 2023.
    [paper]

  • Metior: A Comprehensive Model to Evaluate Obfuscating Side-Channel Defense Schemes
    Peter W. Deutsch, Weon Taek Na, Thomas Bourgeat, Joel S Emer, Mengjia Yan
    In Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA-50), June 2023.
    [paper]

  • Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling
    T. Odemuyiwa, H. Asghari-Moghaddam, M. Pellauer, K. Hegde, P. Tsai, N. Crago, A. Jaleel, J. Owens, E. Solomonik, J. S. Emer, and C. Fletcher
    In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023.
    [paper]

  • The Sparse Abstract Machine
    Olivia Hsu, Maxwell Strange, Ritvik Sharma, Jaeyeon Won, Kunle Olukotun, Joel S Emer, Mark Horowitz, Fredrik Kjølstad
    In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023.
    [paper]

  • WACO: Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program
    Jaeyeon Won, Charith Mendis, Joel S. Emer, Saman Amarasinghe
    In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023.
    [paper]

  • Optimizing Compression Schemes for Parallel Sparse Tensor Algebra
    Helen Xu, Tao B. Schardl, Michael Pellauer, Joel S. Emer
    In Proceedings of the Data Compression Conference (DCC), March 2023.
    [paper]

  • ISOSceles: Accelerating Sparse CNNs through Inter-Layer Pipelining
    Yifan Yang, Joel S. Emer, Daniel Sanchez
    In Proceedings of the 29th International Symposium on High Performance Computer Architecture (HPCA-29), February 2023.
    [paper]

2022

  • Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling
    Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel Emer
    In 55th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2022.
    (SIGMICRO Distinguished Artifact Award)
    [paper] [project website]

  • Ruby: Improving Hardware Efficiency for Tensor Algebra Accelerators Through Imperfect Factorization
    Mark Horeni, Pooria Taheri, Po-An Tsai, Angshuman Parashar, Joel Emer, Siddharth Joshi
    In 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), May 2022.
    [paper]

  • DAGguise: Mitigating Memory Timing Side Channels
    Peter W. Deutsch*, Yuheng Yang*, Thomas Bourgeat, Jules Drean, Joel Emer, Mengjia Yan
    In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2022.
    [paper] [MIT News] [NSF Research News]

2021

  • Mentoring Opportunities in Computer Architecture: Analyzing the Past to Develop the Future
    Elba Garza, Gururaj Saileshwar, Tianyi Liu, Abdulrahman Mahmoud, Saugata Ghose, Joel Emer
    In Workshop on Computer Architecture Education (WCAE) at the International Symposium on Computer Architecture (ISCA), June 2021.
    [paper] [slides] [lightning talk] [talk+panel]

  • SpZip: Architectural Support for Effective Data Compression In Irregular Applications.
    Yifan Yang, Joel Emer, Daniel Sanchez
    In Proceedings of the 48th Annual International Symposium on Computer Architecture (ISCA-48), June 2021.
    [paper]

  • Gamma: Exploiting Gustavson's Algorithm to Accelerate Sparse Matrix Multiply.
    Guowei Zhang, Nithya Attaluri, Joel Emer, Daniel Sanchez
    In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2021.
    [paper]

2020

  • Casa: End-to-end quantitative security analysis of randomly mapped caches.
    T. Bourgeat, J. Drean, Y. Yang, L. Tsai, J. Emer, and M. Yan.
    In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1110--1123, 2020.
    (Intel Hardware Security Academic Award Finalist)
    (Top Picks in Hardware and Embedded Security Honorable Mention)
    [paper]

  • An architecture-level energy and area estimator for processing-in-memory accelerator designs.
    Y. N. Wu, V. Sze, and J. S. Emer.
    In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 116--118, 2020.

  • Digital optical neural networks for large-scale machine learning.
    L. Bernstein, A. Sludds, R. Hamerly, V. Sze, J. Emer, and D. Englund.
    In 2020 Conference on Lasers and Electro-Optics (CLEO), pages 1--2, 2020.

2019

  • MAGNet: A Modular Accelerator Generator for Neural Networks.
    Rangharajan Venkatesan, Yakun Sophia Shao, Miaorong Wang, Jason Clemons, Steve Dai, Matthew Fojtik, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Yanqing Zhang, Brian Zimmer, William J. Dally, Joel Emer, Stephen W. Keckler, and Brucek Khailany.
    In Proceedings of the International Conference on Computer-Aided Design (ICCAD), November 2019.
    [paper]

  • Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs.
    Yannan Nellie Wu, Joel Emer, and Vivienne Sze.
    In Proceedings of the International Conference on Computer-Aided Design (ICCAD), November 2019.
    [paper] [project website]

  • Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture.
    Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel Emer, C. Thomas Gray, Brucek Khailany, and Stephen W. Keckler.
    In 2019 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2019.
    (Micro 2019 "Best Paper" and ACM 2020 "Research Highlight")
    [paper]

  • ExTensor: An Accelerator for Sparse Tensor Algebra.
    Kartik Hegde, Hadi Asghari-Moghaddam, Michael Pellauer, Neal Crago, Aamer Jaleel, Edgar Solomonik, Joel Emer, and Christopher W. Fletcher.
    In 2019 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2019.
    (IEEE Micro’s Top Picks in Computer Architecture - Honorable Mention, 2019)
    [paper]

  • A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology.
    R. Venkatesan, Y. S. Shao, B. Zimmer, J. Clemons, M. Fojtik, N. Jiang, B. Keller, A. Klinefelter, N. Pinckney, P. Raina, S. G. Tell, Y. Zhang, W. J. Dally, J. S. Emer, C. T. Gray, S. W. Keckler, and B. Khailany.
    In Hot Chips, August 2019.

  • A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm.
    B. Zimmer, R. Venkatesan, Y. S. Shao, J. Clemons, M. Fojtik, N. Jiang, B. Keller, A. Klinefelter, N. Pinckney, P. Raina, S. G. Tell, Y. Zhang, W. J. Dally, J. S. Emer, C. T. Gray, S. W. Keckler, and B. Khailany.
    In 2019 Symposium on VLSI Circuits, June 2019.
    [paper]

  • Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration.
    Michael Pellauer, Yakun Sophia Shao, Jason Clemons, Neal Crago, Kartik Hegde, Rangharajan Venkatesan, Stephen W. Keckler, Christopher W. Fletcher, and Joel Emer.
    In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 2019.
    (IEEE Micro’s Top Picks in Computer Architecture - Honorable Mention, 2019)
    [paper]

  • Timeloop: A Systematic Approach to DNN Accelerator Evaluation.
    A. Parashar, P. Raina, Y. S. Shao, Y. Chen, V. A. Ying, A. Mukkara, R. Venkatesan, B. Khailany, S. W. Keckler, and J. Emer.
    In 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2019.
    [paper] [project website]

2018

  • Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism.
    M. C. Jeffrey, V. A. Ying, S. Subramanian, H. R. Lee, J. Emer, and D. Sanchez.
    In 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2018.
    [paper]

  • DAWG: A Defense Against Cache Timing Attacks in Speculative Execution Processors.
    V. Kiriansky, I. Lebedev, S. Amarasinghe, S. Devadas, and J. Emer.
    In 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2018.
    [paper]

  • A Modular Digital VLSI Flow for High-Productivity SoC Design.
    B. Khailany, E. Krimer, R. Venkatesan, J. Clemons, J. S. Emer, M. Fojtik, A. Klinefelter, M. Pellauer, N. Pinckney, Y. S. Shao, S. Srinath, C. Torng, S. L. Xi, Y. Zhang, and B. Zimmer.
    In 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), June 2018.
    [paper]

  • Understanding the Limitations of Existing Energy-Efficient Design Approaches for Deep Neural Networks.
    Y. Chen, T. Yang, J. Emer, and V. Sze.
    In SysML Conference, February 2018.
    [paper]

  • Stitch-X: An Accelerator Architecture for Exploiting Unstructured Sparsity in Deep Neural Networks.
    C. Lee, Y. Shao, J. Zhang, A. Parashar, J. Emer, S. Keckler, and Z. Zhang.
    In SysML Conference, February 2018.
    [paper]

2017

  • Efficient Processing of Deep Neural Networks: A Tutorial and Survey.
    V. Sze, Y. Chen, T. Yang, and J. S. Emer.
    Proceedings of the IEEE, 105(12):2295-2329, December 2017.
    [paper]

  • Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications.
    Guanpeng Li, Siva Kumar Sastry Hari, Michael Sullivan, Timothy Tsai, Karthik Pattabiraman, Joel Emer, and Stephen W. Keckler.
    In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2017.
    (IEEE Top Picks in Test and Reliability, 2023)
    [paper]

  • A method to estimate the energy consumption of deep neural networks.
    T. Yang, Y. Chen, J. Emer, and V. Sze.
    In 2017 51st Asilomar Conference on Signals, Systems, and Computers, October 2017.
    [paper]

  • SAM: Optimizing Multithreaded Cores for Speculative Parallelism,
    Maleen Abeydeera, Suvinay Subramanian, Mark C. Jeffrey, Joel Emer, Daniel Sanchez,
    in Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques (PACT-26), September 2017.
    [paper]

  • SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks,
    Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W Keckler, William J Dally
    Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA-44), June 2017.
    [paper]

  • Fractal: An Execution Model for Fine-Grain Nested Speculative Parallelism,
    Suvinay Subramanian, Mark C. Jeffrey, Maleen Abeydeera, Hyun Ryong Lee, Victor A. Ying, Joel Emer, Daniel Sanchez,
    Proceedings of the 44th International Symposium on Computer Architecture (ISCA-44), June 2017.
    [paper]

  • Towards Closing the Energy Gap Between HOG and CNN Features for Embedded Vision,
    Amr Suleiman, Yu-Hsin Chen, Joel Emer, Vivienne Sze,
    IEEE International Symposium on Circuits and Systems (ISCAS), Invited Paper, May 2017.
    [paper]

  • Hardware for Machine Learning: Challenges and Opportunities,
    Vivienne Sze, Yu-Hsin Chen, Joel Emer, Amr Suleiman, Zhengdong Zhang,
    IEEE Custom Integrated Circuits Conference (CICC), Invited Paper, April 2017.
    (Received award for best invited paper).
    [paper]

  • SASSIFI: An architecture-level fault injection tool for GPU application resilience evaluation
    Siva Kumar Sastry Hari, Timothy Tsai, Mark Stephenson, Stephen W Keckler, Joel Emer,
    2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2017.
    [paper]

  • Automatic Construction of Program-Optimized FPGA Memory Networks,
    Hsin-Jung Yang, Kermin Fleming, Felix Winterstein, Annie I Chen, Michael Adler, Joel Emer,
    Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2017.
    [paper]

2016

  • Data-Centric Execution of Speculative Parallel Programs,
    Mark C. Jeffrey, Suvinay Subramanian, Maleen Abeydeera, Joel Emer, Daniel Sanchez,
    in Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-49), October 2016.
    (IEEE Micro’s Top Picks in Computer Architecture - Honorable Mention, 2016)
    [paper]

  • CLARA: Circular Linked-List Auto and Self Refresh Architecture,
    Aditya Agrawal, Mike O'Connor, Evgeny Bolotin, Niladrish Chatterjee, Joel Emer, Stephen Keckler,
    Proceedings of the Second International Symposium on Memory Systems, October 2016.
    [paper]

  • Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks,
    Yu-Hsin Chen, Joel Emer, Vivienne Sze,
    International Symposium on Computer Architecture (ISCA), June 2016.
    (IEEE Micro's Top Picks in Computer Architecture, 2016)
    (Selected for ISCA@50 25-year retrospective 1996-2020, 2023)
    [paper] [retrospective]

  • LMC: Automatic Resource-Aware Program-Optimized Memory Partitioning,
    Hsin-Jung Yang, Kermin Fleming, Michael Adler, Felix Winterstein, Joel Emer,
    24th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (ISFPGA), February 2016.
    [paper]

  • Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks,
    Yu-Hsin Chen, Tushar Krishna, Joel Emer, Vivienne Sze,
    IEEE International Solid-State Circuits Conference (ISSCC), February 2016.
    [paper] [slides] [project website]

2015

  • A Fast and Accurate Analytical Technique to Compute the AVF of Sequential Bits in a Processor
    Steven Raasch, Arijit Biswas, Jon Stephan, Paul Racunas, Joel Emer,
    International Symposium on Microarchitecture (MICRO), December 2015.
    [paper]

  • A Scalable Architecture for Ordered Parallelism 
    Mark C. Jeffrey, Suvinay Subramanian, Cong Yan, Joel Emer, Daniel Sanchez,
    International Symposium on Microarchitecture (MICRO), December 2015.
    (IEEE Micro's Top Picks in Computer Architecture, 2015).
    [paper]

  • Scavenger: Automating the Construction of Application-Optimized Memory Hierarchies
    Hsin-Jung Yang, Kermin Fleming, Michael Adler, Felix Winterstein, Joel Emer,
    International Conference on Field Programmable Logic and Applications (FPL), London, England, September 2015.
    [paper]

  • SASSIFI: Evaluating Resilience of GPU Applications
    Siva Kumar Sastry Hari, Timothy Tsai, Mark Stephenson, Steve Keckler and Joel Emer,
    SELSE, March 2015.

  • High Performing Cache Hierarchies for Server Workloads -- Relaxing Inclusion to Capture the Latency Benefits of Exclusive Caches,
    Aamer Jaleel, Joseph Nuzman, Adrian Moga, Simon Steely, and Joel Emer,
    Industry Session of International Symposium on High Performance Computer Architecture (HPCA), San Francisco, CA, February 2015.
    [paper]

2014

  • The LEAP FPGA Operating System,
    Kermin Fleming, Hsin-Jung Yang, Michael Adler, Joel Emer,
    International Conference on Field Programmable Logic and Applications (FPL), Munich, Germany, September 2014.
    [paper]

  • LEAP Shared Memories: Automating the Construction of FPGA Coherent Memories,
    Hsin-Jung Yang, Kermin Fleming, Michael Adler, Joel Emer,
    IEEE International Symposium on Field-Programmable Custom Computing Machines, Boston, Massachusetts, May 2014.
    [paper]

  • Exploiting Spatial Architectures for Edit Distance Algorithms,
    Jesmin Tithi, Neal Crago, Joel Emer,
    ISPASS 2014, March 2014.
    Best paper nomination.
    [paper]

  • Triggered Instructions: A Control Paradigm for Spatially-Programmed Architectures,
    Angshuman Parashar, Michael Pellauer, Michael Adler, Bushra Ahsan, Neal Crago, Daniel Lustig, Vladimir Pavlov, Antonia Zhai, Mohit Gambhir, Aamer Jaleel, Randy Allmon, Rachid Rayess, Stephen Maresh, Joel Emer,
    In International Symposium on Computer Architecture (ISCA), June 2013.
    (IEEE Micro's Top Picks in Computer Architecture, 2014).
    [paper]

2013

  • A Hierarchical Architectural Framework for Reconfigurable Logic Computing,
    Peng Li, A. Parashar, M. Pellauer, Tao Wang, J. Emer,
    IEEE 27th International Parallel and Distributed Processing Symposium (IPDPSW), January 2013.

  • Optimizing under abstraction: Using prefetching to improve FPGA performance,
    Hsin-Jung Yang, K. Fleming, M. Adler, J. Emer,
    23rd International Conference on Field Programmable Logic and Applications (FPL), January 2013.
    [paper]

2012

  • Scheduling Heterogeneous Multi-Cores through Performance Impact Estimation (PIE),
    Kenzo Van Craeynest, Aamer Jaleel, Lieven Eeckhout, Paolo Narvaez, Joel Emer,
    In International Symposium on Computer Architecture (ISCA), June 2012.
    (Selected for ISCA@50 25-year retrospective 1996-2020, 2023)
    [paper] [retrospective]

  • CRUISE: Cache Replacement and Utility-aware Scheduling,
    Aamer Jaleel, Hashem H. Najaf-abadi, Samantika Subramaniam, Simon C. Steely, Joel Emer,
    Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2012.
    [paper]

  • Leveraging Latency-insensitivity to Ease Multiple FPGA Design,
    Kermin Elliott Fleming, Michael Adler, Michael Pellauer, Angshuman Parashar, Arvind, and Joel S. Emer,
    Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays (FPGA 2012), February 2012.
    [paper]

  • ZIP-IO: Architecture for application-specific compression of Big Data,
    Sang Woo Jun, K.E. Fleming, M. Adler, J. Emer,
    2012 International Conference on Field-Programmable Technology (FPT), January 2012.
    [paper]

2011

  • SHiP: Signature-based Hit Predictor for High Performance Caching,
    Carole-Jean Wu, Aamer Jaleel, William Hasenplaugh, Margaret Martonosi, Simon C. Steely Jr, Joel Emer,
    International Symposium on Microarchitecture (MICRO), December 2011.
    [paper]

  • PACMan: Prefetch-Aware Cache Management for High Performance Caching,
    Carole-Jean Wu, Aamer Jaleel, Margaret Martonosi, Simon C. Steely Jr, Joel Emer,
    International Symposium on Microarchitecture (MICRO), December 2011.
    [paper]

  • Leap Scratchpads: Automatic Memory and Cache Management for Reconfigurable Logic,
    Michael Adler, Kermin Fleming, Angshuman Parashar, Michael Pellauer, Joel S. Emer,
    Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays (FPGA 2011), February, 2011.
    [paper]

  • HAsim: FPGA-based High-detail Multicore Simulation Using Time-division Multiplexing,
    Michael Pellauer, Michael Adler, Michel A. Kinsy, Angshuman Parashar, and Joel S. Emer,
    Proceedings of the 17th International Symposium on High-Performance Computer Architecture (HPCA-17), February 2011.
    [paper]

2010

  • Achieving Non-Inclusive Cache Performance With Inclusive Caches -- Temporal Locality Aware (TLA) Cache Management Policies,
    Aamer Jaleel, Eric Borch, Malini Bhandaru, Simon C. Steely Jr, Joel Emer,
    International Symposium on Microarchitecture (MICRO), Atlanta, Georgia, December 2010.
    [paper]

  • High Performance Cache Replacement Using Re-Reference Interval Prediction (RRIP),
    Aamer Jaleel, Kevin Theobald, Simon Steely Jr., Joel Emer,
    Proceedings of the 37th Annual Symposium on Computer Architecture (ISCA-37), June 2010.
    (Selected for ISCA@50 25-year retrospective 1996-2020, 2023)
    [paper] [retrospective]

2009

  • CAMP: A Technique to Estimate Per-Structure Power at Run-time using a Few Simple Parameters,
    Michael D. Powell, Arijit Biswas, Joel Emer, Shubhendu S. Mukherjee, Basit R. Sheikh, Shrirang Yardi,
    HPCA-15, February 2009.
    [paper]

  • Soft connections: Addressing the Hardware-design Modularity Problem,
    Michael Pellauer, Michael Adler, Derek Chiou, Joel S. Emer,
    Proceedings of the 46th Design Automation Conference (DAC 2009), July, 2009.
    [paper]

2008

  • Quick Performance Models Quickly: Closely-Coupled Partitioned Simulation on FPGAs,
    Michael Pellauer, Muralidaran Vijayaraghavan, Michael Adler, Arvind, Joel Emer,
    Proceedings of ISPASS-2008, April 2008.
    [paper]

  • Adaptive Insertion Policies for Managing Shared Caches on CMPs,
    Aamer Jaleel, William Hasenplaugh, Moinuddin Qureshi, Julien Sebot, Simon C. Steely Jr, and Joel Emer,
    In the International Conference on Parallel Architectures and Compilation Techniques (PACT), Toronto, Canada, October 2008.
    [paper]

  • A-Ports: An Efficient Abstraction for Cycle-Accurate Models on FPGAs,
    Michael Pellauer, Muralidaran Vijayaraghavan, Michael Adler, Arvind, Joel Emer,
    Proceedings of FPGA-08, February 2008.
    [paper]

2007

  • Adaptive Insertion Policies for High Performance Caching,
    Moinuddin Qureshi, Aamer Jaleel, Yale Patt, Simon Steely, Joel Emer,
    Proceedings of the 34th Annual Symposium on Computer Architecture, June 2007.
    (IEEE Micro's Top Picks in Computer Architecture, 2008).
    (Selected for ISCA@50 25-year retrospective 1996-2020, 2023)
    [paper] [retrospective]

  • Late-Binding: Enabling Unordered Load-Store Queues,
    Simha Sethumadhavan, Franzi Roesner, Joel Emer, Doug Burger, Steve Keckler,
    Proceedings of the 34th Annual Symposium on Computer Architecture, June 2007.
    [paper]

2006

  • Computing the Architectural Vulnerability Factor for Address-Based Structures,
    Arijit Biswas, Paul Racunas, Raz Cheveresan, Joel Emer, Shubhendu S. Mukherjee, Ram Rangan,
    Proceedings of the 32nd Annual Symposium on Computer Architecture, June 2005.
    [paper]

2005

  • The Soft Error Problem: an Architectural Perspective,
    Shubhendu S. Mukherjee, Joel Emer, Steven K. Reinhardt,
    HPCA, Feb. 2005.
    [paper]

2004

  • Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor,
    Christopher Weaver, Joel Emer, Shubhendu S. Mukherjee, Steven K. Reinhardt,
    Proceedings of the 31st Annual Symposium on Computer Architecture (ISCA-31), June 2004.
    (IEEE Micro's Top Picks in Computer Architecture, 2004).
    [paper]

  • Cache Scrubbing in Microprocessors: Myth or Necessity?,
    Shubhendu S. Mukherjee, Joel Emer, Tryggve Fossum, and Steven K. Reinhardt,
    10th IEEE PRDC, March 2004.
    [paper]

2003

  • A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor,
    Shubhendu Mukherjee, Christopher Weaver, Joel Emer, Steve Reinhardt, and Todd Austin,
    MICRO, December 2003.
    (IEEE Micro's Top Picks in Computer Architecture, 2003).
    (Winner MICRO Test of Time award, 2023).
    [paper]

2002

  • A Comparative Study of Arbitration Algorithms for the Alpha 21364 Router Pipeline,
    Shubhendu Mukherjee, Federico Silla, Peter Bannon, Joel Emer, Steve Lang, Dave Webb,
    Tenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-10), October 2002.
    [paper]

  • Tarantula: A Vector Extension to the Alpha Architecture,
    Roger Espasa, Federico Ardanaz, Joel Emer, Stephen Felix, Julio Gago, Roger Gramunt, Isaac Hernandez, Toni Juan, Geoff Lowney, Matthew Mattina, and Andre Seznec,
    Proceedings of the 29th Annual Symposium on Computer Architecture (ISCA-29), June 2002.
    [paper]

  • ASIM: A performance model framework,
    Joel Emer, Pritpal Ahuja, Eric Borch, Artur Klauser, Chi-Keung Luk, Srilatha Manne, Shubhendu S. Mukherjee, Harish Patil, Steven Wallace, Nathan Binkert, Roger Espasa, Toni Juan,
    IEEE Computer, Feb. 2002.
    [paper]

  • Loose Loops Sink Chips,
    Eric Borch, Eric Tune, Srilatha Manne, Joel Emer,
    HPCA8, Feb. 2002.
    [paper]

2000

  • Combining Static and Dynamic Branch Prediction to Reduce Destructive Aliasing,
    Harish Patil, Joel Emer,
    HPCA6, Jan. 2000.
    [paper]

1990s

  • The Use of Multithreading for Exception Handling,
    Craig B. Zilles, Joel S. Emer, Gurindar S. Sohi,
    Micro-32, Nov. 1999.
    [paper]

  • Memory Dependence Prediction using Store Sets,
    George Chrysos, Joel Emer,
    Proceedings of the 25th Annual Symposium on Computer Architecture (ISCA-25), June 1998.
    [paper]

  • A Language for Describing Predictors and its Application to Automatic Synthesis,
    Joel Emer, Nick Gloy,
    Proceedings of the 24th Annual Symposium on Computer Architecture (ISCA-24), June 1997.
    [paper]

  • Architecture of a Flexible Real-time Video Encoder/Decoder: The DECchip 21230,
    Matthew Adiletta, Debra Bernstein, Samuel Ho, Joel Emer, Bill Wheeler,
    Proceedings of SPIE/IS&T Electronic Imaging Symposium, October 1996.

  • Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor,
    Dean Tullsen, Susan Eggers, Joel Emer, Hank Levy, Jack Lo, and Rebecca Stamm,
    Proceedings of the 23rd Annual Symposium on Computer Architecture (ISCA-23), May 1996.
    (Reprinted in "Readings in Computer Architecture" by Hill, Jouppi and Sohi, 2000)
    (Winner ISCA Test of Time award, 2011).
    (Selected for ISCA@50 25-year retrospective 1996-2020, 2023)
    [paper] [retrospective]

  • Predictive Sequential Associative Cache,
    Brad Calder, Dirk Grunwald, Joel Emer,
    Proceedings of HPCA-2, February 1996.
    [paper]

  • A System Level Perspective on Branch Architecture Performance,
    Brad Calder, Dirk Grunwald, Joel Emer,
    Proceedings of Micro-28, November 1995.
    [paper]

  • Instruction Fetching: Coping with Code Bloat,
    Rich Uhlig, David Nagle, Trevor Mudge, Stuart Sechrest, Joel Emer,
    Proceedings of the 22nd Annual Symposium on Computer Architecture (ISCA-22), June 1995.
    [paper]

  • Integrated Access to Heterogeneous Distributed Services,
    J.S. Emer, W.E. Weihl,
    Proceedings of Winter USENIX, January 1990.

1980s

  • Performance Considerations for Distributing Services - A Case Study: Mass Storage
    J.S. Emer, K.K. Ramakrishnan,
    Proceedings of the 8th International Conference on Distributed Computing Systems, June 1988.

  • A Model of File Server Performance for a Heterogeneous Distributed System,
    K.K. Ramakrishnan, J.S. Emer,
    Proceedings of the ACM SIGCOMM '86 Symposium, August 1986.

  • A Programmable Interface Language for Heterogeneous Distributed Systems,
    J.S. Emer, J.R. Falcone,
    DEC Technical Report, DEC-TR-371, August 1985.

  • A Characterization of Processor Performance in the VAX-11/780
    J.S. Emer, D. W. Clark,
    Proceedings of the 11th International Symposium on Computer Architecture, May 1984.
    (Reprinted in 25 Years of the International Symposium on Computer Architecture, 1999)
    (Reprinted in "Readings in Computer Architecture" by Hill, Jouppi and Sohi, 2000)
    [paper] [retrospective]

1970s

  • Control Store Organization for Multiple Stream Pipelined Processors,
    J.S. Emer, E. S. Davidson,
    Proceedings of the 1978 International Conference on Parallel Processing, August 1978.

Other Publications

2020s

  • The EDGE Language: Extended General Einsums for Graph Algorithms
    Toluwanimi O. Odemuyiwa, Joel S. Emer, John D. Owens
    arXiv, 2024.
    [arXiv]

  • Modeling Analog-Digital-Converter Energy and Area for Compute-In-Memory Accelerator Design
    Tanner Andrulis, Ruicong Chen, Hae-Seung Lee, Joel S. Emer, Vivienne Sze
    arXiv, 2024.
    [arXiv]

  • Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity
    Zi Yu Xue, Yannan Nellie Wu, Joel S. Emer, Vivienne Sze
    arXiv, 2023.
    [arXiv]

  • Penetrating Shields: A Systematic Analysis of Memory Corruption Mitigations in the Spectre Era
    Weon Taek Na, Joel S. Emer, Mengjia Yan
    arXiv, 2023.
    [arXiv]

  • HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity
    Yannan Nellie Wu, Po-An Tsai, Saurav Muralidharan, Angshuman Parashar, Vivienne Sze, Joel S. Emer
    arXiv, 2023.
    [arXiv]

  • TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
    Nandeeka Nayak, Toluwanimi O. Odemuyiwa, Shubham Ugare, Christopher W. Fletcher, Michael Pellauer, Joel S. Emer
    arXiv, 2023.
    [arXiv]

  • RAELLA: Reforming the Arithmetic for Efficient, Low-Resolution, and Low-Loss Analog PIM: No Retraining Required!
    Tanner Andrulis, Joel S. Emer, Vivienne Sze
    arXiv, 2023.
    [arXiv]

  • The Sparse Abstract Machine
    Olivia Hsu, Maxwell Strange, Ritvik Sharma, Jaeyeon Won, Kunle Olukotun, Joel Emer, Mark Horowitz, Fredrik Kjolstad
    arXiv, 2023.
    [arXiv]

  • Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling
    Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel S. Emer
    arXiv, 2022.
    [arXiv]

  • Freely scalable and reconfigurable optical hardware for deep learning,
    Liane Bernstein, Alexander Sludds, Ryan Hamerly, Vivienne Sze, Joel Emer, and Dirk Englund.
    arXiv, 2020.
    [arXiv]

  • Estimating Silent Data Corruption Rates Using a Two-Level Model
    Siva Kumar Sastry Hari, Paolo Rech, Timothy Tsai, Mark Stephenson, Arslan Zulfiqar, Michael Sullivan, Philip Shirvani, Paul Racunas, Joel Emer, Stephen W. Keckler.
    arXiv, 2020.
    [arXiv]

2010s

  • Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
    Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, Vivienne Sze
    arXiv, 2019.
    [arXiv]

  • SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks
    Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, William J. Dally
    arXiv, 2017.
    [arXiv]

  • Towards Closing the Energy Gap Between HOG and CNN Features for Embedded Vision
    Amr Suleiman, Yu-Hsin Chen, Joel Emer, Vivienne Sze
    arXiv, 2017.
    [arXiv]

  • Hardware for Machine Learning: Challenges and Opportunities
    Vivienne Sze, Yu-Hsin Chen, Joel Emer, Amr Suleiman, Zhengdong Zhang
    arXiv, 2017.
    [arXiv]

  • Efficient Processing of Deep Neural Networks: A Tutorial and Survey
    Vivienne Sze, Tien-Ju Yang, Yu-Hsin Chen, Joel Emer.
    arXiv, August 2017.
    [paper]

Notes

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without explicit permission of the copyright holder.