Locality-Oblivious Cache Organization leveraging Single-Cycle Multi-Hop NoCs
Woo-Cheol Kwon, Tushar Krishna, and Li-Shiuan Peh
Locality has always been a critical factor in on-chip data
placement on CMPs as accessing further-away caches has in
the past been more costly than accessing nearby ones. Substantial
research on locality-aware designs has thus focused
on keeping a copy of the data private. However, this complicates
the problem of data tracking and search/invalidation;
tracking the state of a line in all on-chip caches at a directory
and performing full-chip broadcasts are both non-scalable and
extremely expensive solutions. In this paper, we make the
case for Locality-Oblivious Cache Organization (LOCO), a
CMP cache organization that leverages the on-chip network
to create virtual single-cycle paths between distant caches,
thus redefining the notion of locality. LOCO is a clustered
cache organization, supporting both homogeneous and heterogeneous
cluster sizes, and provides near single-cycle accesses
to data anywhere within the cluster, just like a private
cache. Globally, LOCO dynamically creates a virtual mesh
connecting all the clusters, and performs an efficient global
data search and migration over this virtual mesh, without
having to resort to full-chip broadcasts or perform expensive
directory lookups. Trace-driven and full-system simulations
running SPLASH-2 and PARSEC benchmarks show
that LOCO improves application run time by up to 44.5%
over baseline private and shared caches.