Tushar Krishna

Spoon boy: Do not try and bend the spoon. That's impossible. Instead... only try to realize the truth.
Neo: What truth?
Spoon boy: There is no spoon.
Neo: There is no spoon?
Spoon boy: Then you'll see, that it is not the spoon that bends, it is only yourself.

-The Matrix-

Research

Goal: My research focuses on on-chip interconnects for homogeneous and heterogeneous many-core systems. My goal is to enable shared networks to deliver the performance (delay and throughput) of an ideal but impractical fully-connected fabric.

PhD Thesis:
Enabling dedicated single-cycle connections over a shared Network-on-Chip

Abstract:
Adding multiple processing cores on the same chip has become the de facto design choice as we continue extracting more and more performance/watt from our chips in every technology generation. In this context, the interconnect fabric connecting the cores starts gaining paramount importance. A high-latency network can create performance bottlenecks and limit scalability. Thus conventional wisdom forces coherence protocol and software designers to develop techniques to optimize for locality and keep communication to a minimum. This dissertation challenges this conventional wisdom. We show that on-chip networks can be designed to provide extremely low latencies while handling bursts of high-bandwidth traversals, thus reversing the trade-offs one typically associates with Private vs. Shared caches, or Broadcast vs. Directory protocols.

The dissertation progressively builds a network-on-chip fabric that dynamically creates single-cycle network paths across multiple hops, for both unicast and collective (1-to-Many and Many-to-1) communication flows. We start with a prototype chip demonstrating single-cycle per-hop traversals over a mesh network-on-chip. This design is then enhanced to support 1-to-Many (multicast) and Many-to-1 (acknowledgement) traffic flows through intelligent forking and aggregation, respectively, at network routers. Finally, we leverage clock-less repeated wires on the data-path and propose a dynamic cycle-by-cycle network reconfiguration methodology to provide single-cycle traversals across 9-11 hops at GHz frequencies. The network architectures proposed in this thesis provide performance that is within 12% of that provided by an idealized contention-free fully-connected single-cycle network. Going forward, we believe that the ideas proposed in this thesis can pave the way for locality-oblivious shared-memory design.
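As a rough illustration of why the number of hops covered per cycle matters, the sketch below compares zero-load traversal latency on a 2D mesh under three regimes: a multi-cycle-per-hop router, a single-cycle-per-hop router, and single-cycle multi-hop traversal. It is not taken from the thesis. The function names, the 3-cycles-per-hop baseline, and the 8x8 mesh are assumptions chosen for the example, and the model ignores contention, path setup, and any restrictions on turns.

```python
import math


def mesh_hops(src, dst):
    """Hop count between two (x, y) tiles under minimal routing on a 2D mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def zero_load_cycles(hops, cycles_per_stop=1, hops_per_traversal=1):
    """Idealized zero-load latency in cycles (hypothetical model).

    hops_per_traversal: how many hops one uninterrupted traversal covers
                        (1 for a hop-by-hop network; the abstract reports
                        9-11 for the multi-hop design).
    cycles_per_stop:    cycles spent for each such traversal
                        (assumed 3 for the multi-cycle baseline below).
    """
    if hops == 0:
        return 0
    stops = math.ceil(hops / hops_per_traversal)
    return stops * cycles_per_stop


if __name__ == "__main__":
    # Corner-to-corner traffic on an assumed 8x8 mesh.
    hops = mesh_hops((0, 0), (7, 7))
    print("hops:", hops)                                          # 14
    print("multi-cycle per hop:", zero_load_cycles(hops, 3, 1))   # 42
    print("single-cycle per hop:", zero_load_cycles(hops, 1, 1))  # 14
    print("single-cycle multi-hop:", zero_load_cycles(hops, 1, 9))  # 2
```

Under these assumptions, the multi-hop case approaches the single cycle that an ideal fully-connected fabric would take, which is the gap the thesis aims to close.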

Relevant Publications:

Single-Cycle Per-Hop NoC for 1-to-1 Traffic:
Tapeout of a prototype chip called SWIFT in 90nm, with advanced flow control to bypass buffers along the control path and a low-swing crossbar circuit to provide low-power datapath traversal. [abstract] | [ICCD 2010 paper]

Single-Cycle Per-Hop NoC for 1-to-Many and Many-to-1 Traffic:
Network-on-Chip Architectures FANOUT and FANIN to efficiently handle 1-to-Many and Many-to-1 traffic flows, respectively, that occur in cache coherence protocols and MPI routines. [abstract] | [MICRO 2011 paper]

Single-Cycle Multi-Hop NoC for 1-to-1 Traffic:
Network-on-Chip Architecture called SMART to enable single-cycle traversals across multiple hops of the network. [abstract] | [HPCA 2013 paper]

Single-Cycle Multi-Hop NoC for 1-to-Many and Many-to-1 Traffic:
Network-on-Chip Architecture extending SMART to perform single-cycle multicast and reductions on-chip. [abstract] | [NOCS 2014 paper]