I am a fifth year graduate student in CSAIL
My advisor is Professor Srinivas Devadas
. I am fortunate to work closely with
Professor Andrew Pavlo
Professor Daniel Sanchez
Professor Michael Stonebraker
Before coming to MIT, I earned my Bachelor degree from Electronic Engineering from
, Beijing, China.
I worked with Professor
at CMU as a research intern in the summer of 2011.
I am interested in building large-scale computer systems through optimizations at all levels across the hardware/software stack.
As a computer architect, I design hardware for better performance, scalability, energy efficiency and security.
I also did extensive work in database management systems (DBMS). Below is a summary of my recent projects.
Tardis Cache Coherence Protocol
is a cache coherence protocol that only requires Θ(log(N)) (for an N-core processor)
storage per cache block
while maintaining high performance and simplicity. Tardis is unique because it enforces the global memory
order using physiological time (a combination of physical and logical time) instead of pure physical time.
Tardis has no invalidations. Each read takes a logical lease on a cache block and a write jumps (or time travels)
to the end of the lease. Tardis eliminates broadcasting or multicasting, and yet does not require a synchronized clock.
the correctness of Tardis and extended it to
relaxed consistency models
. Recently, we are prototyping Tardis on FPGA boards.
TicToc Concurrency Control Algorithm
is a new optimistic concurrency control algorithm that avoids the scalability
and concurrency bottlenecks of prior Timestamp Ordering (T/O) schemes. TicToc relies on a novel and provably
correct data-driven timestamp management protocol. Instead of assigning timestamps to transactions, this protocol
assigns read and write timestamps to data items and uses them to lazily compute a valid commit timestamp for each transaction.
TicToc removes the need for centralized timestamp allocation, and commits transactions that would be aborted by
conventional T/O schemes. On a 40-core machine, TicToc outperforms other state-of-the-art concurrency control algorithms
by up to 92% while reducing the abort rate by 3.3x.
1000-core Concurrency Control Evaluation
Computer architectures are moving towards an era dominated by many-core machines with dozens or even hundreds of cores on a single chip.
Current database management systems (DBMSs), however, are not designed for this level of parallelism.
In this project,
we performed an evaluation of concurrency control for on-line transaction processing (OLTP) workloads on many-core chips.
shows that all algorithms fail to scale to this magnitude but for different reasons.
In each case, we identify fundamental bottlenecks that are independent of the particular database implementation.
We conclude that rather than pursuing incremental solutions, many-core chips may require a completely
redesigned DBMS architecture that is built from ground up and is tightly coupled with the hardware.
Our DBMS is open source on github (DBx1000)
Indirect Memory Prefetcher
Machine learning, graph analytics and sparse linear algebra-based applications are dominated by irregular memory accesses
which have little temporal or spatial locality and thus cannot be captured by traditional prefetchers.
A majority of these irregular accesses come from indirect patterns of the form A[B[i]].
We propose an efficient hardware indirect memory prefetcher (IMP)
to capture this access pattern and hide latency.
We also propose a partial cacheline accessing mechanism for these prefetches to reduce the network and DRAM bandwidth
pressure from the lack of spatial locality. Evaluated on 7 applications, IMP shows 56% speedup on average (up to 2.3X)
compared to a baseline 64 core system with streaming prefetchers.
Path Oblivious RAM
Oblivious RAM (ORAM) is a technique to hide the access pattern to an untrusted storage system.
With ORAM, a curious adversary cannot tell what address the user is accessing when observing the bits moving
between the user and the storage system. Path ORAM
is an extremely simple
Oblivious RAM protocol with a small amount of client storage.
I have worked on Path ORAM design space exploration
in secure processors,
its performance optimizations
, integrity verification
and timing leakage protection
. I have also proposed worked on secure ways to
interact with tamper-resistant hardware
with bounded information leakage.
Xiangyao Yu, Hongzhe Liu, Ethan Zou, Srinivas Devadas
Tardis 2.0: Optimized Time Traveling Coherence for Relaxed Consistency Models
Proceedings of the 25th International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2016.
Xiangyao Yu, Andrew Pavlo, Daniel Sanchez, Srinivas Devadas
TicToc: Time Traveling Optimistic Concurrency Control
Proceedings of SIGMOD, June 2016.
Xiangyao Yu, Christopher Hughes, Nadathur Satish, Srinivas Devadas
IMP: Indirect Memory Prefetcher
Proceedings of the 48th International Symposium on Microarchitecture (MICRO), December 2015.
Xiangyao Yu, Srinivas Devadas
Tardis: Time Traveling Coherence Algorithm for Distributed Shared Memory
Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT), October 2015.
Presented in the Best Paper Session
Xiangyao Yu, Syed Kamran Haider, Ling Ren, Christopher Fletcher, Albert Kwon, Marten van Dijk, Srinivas Devadas
PrORAM: Dynamic Prefetcher for Oblivious RAM
International Symposium on Computer Architecture (ISCA), June 2015.
Xiangyao Yu, George Bezerra, Andrew Pavlo, Srinivas Devadas, Michael Stronebraker
Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores
Proceedings of the VLDB Endowment, vol. 8, iss. 3, pages. 209—220, November 2014.
Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, Kevin Chang, Greg Nazario,
Reetuparna Das, Gabriel Loh, and Onur Mutlu
Design and Evaluation of Hierarchical Rings with Deflection Routing
Proceedings of the 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), October 2014.
Christopher Fletcher, Ling Ren, Xiangyao Yu, Marten van Dijk, Omer Khan, and Srinivas Devadas
Suppressing the Oblivious RAM Timing Channel While Making Information Leakage and Program Efficiency Trade-offs
Proceedings of the Int'l Symposium on High Peformance Computer Architecture (HPCA), February 2014
Xiangyao Yu, Christopher Fletcher, Ling Ren, Marten Van Dijk, and Srinivas Devadas
Generalized External Interaction with Tamper-Resistant Hardware with Bounded Information Leakage
Proceedings of the Cloud Computing Security Workshop (CCSW), November 2013.
Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher Fletcher,
Ling Ren, Xiangyao Yu, and Srinivas Devadas
Path ORAM: An Extremely Simple Oblivious RAM Protocol
Proceedings of the 20th Computer and Communication Security Conference (CCS), November 2013.
Best Student Paper Award
Ling Ren, Christopher W. Fletcher, Xiangyao Yu,
Marten van Dijk, and Srinivas Devadas
Integrity Verification for Path Oblivious-RAM
Proceedings of the 17th IEEE High Performance Extreme Computing Conference (HPEC), September 2013
Ling Ren, Xiangyao Yu, Christopher W. Fletcher,
Marten van Dijk, and Srinivas Devadas
Design Space Exploration and Optimization of
Path Oblivious RAM in Secure Processors
40th International Symposium on Computer Architecture (ISCA), Jun, 2013
Yuan Lin Yeoh, Bo Wang, Xiangyao Yu, Tony Tae Hyoung Kim
A 0.4V 7T SRAM with Write Through Virtual Ground and
Ultra-fine Grain Power Gating Switches
IEEE International Symposium on Circuits and Systems (ISCAS), May 2013
Chris Fallin, Greg Nazario, Xiangyao Yu, Kevin Chang,
Rachata Ausavarungnirun, Onur Mutlu
MinBD: Minimally-Buffered Deflection Routing
for Energy-Efficient Interconnect
Proceedings of the 6th ACM/IEEE International Symposium on Networks on Chip (NOCS), May 2012
One of the five papers nominated for the Best Paper Award by the Program Committee