Xiangyao Yu

Ph.D. Candidate
Department of Electrical Engineering and Computer Science
Massachusetts Institue of Technology
Address: 32G-826, 32 Vassar St, Cambridge, MA, 02139
Email: yxy@mit.edu
[C.V.]


I am a fifth year graduate student in CSAIL at MIT. My advisor is Professor Srinivas Devadas. I am fortunate to work closely with Professor Andrew Pavlo (CMU), Professor Daniel Sanchez (MIT) and Professor Michael Stonebraker (MIT).

Before coming to MIT, I earned my Bachelor degree from Electronic Engineering from Tsinghua University, Beijing, China. I worked with Professor Onur Mutlu at CMU as a research intern in the summer of 2011.

I am on the academic job market now. Here are my application materials:
    C.V.
    Research Statement
    Teaching Statement

Research Projects

I am interested in building large-scale computer systems through optimizations at all levels across the hardware/software stack. As a computer architect, I design hardware for better performance, scalability, energy efficiency and security. I also did extensive work in database management systems (DBMS). Below is a summary of my recent projects.

Tardis Cache Coherence Protocol


Tardis is a cache coherence protocol that only requires Θ(log(N)) (for an N-core processor) storage per cache block while maintaining high performance and simplicity. Tardis is unique because it enforces the global memory order using physiological time (a combination of physical and logical time) instead of pure physical time. Tardis has no invalidations. Each read takes a logical lease on a cache block and a write jumps (or time travels) to the end of the lease. Tardis eliminates broadcasting or multicasting, and yet does not require a synchronized clock. We proved the correctness of Tardis and extended it to relaxed consistency models. Recently, we are prototyping Tardis on FPGA boards.


TicToc Concurrency Control Algorithm

TicToc is a new optimistic concurrency control algorithm that avoids the scalability and concurrency bottlenecks of prior Timestamp Ordering (T/O) schemes. TicToc relies on a novel and provably correct data-driven timestamp management protocol. Instead of assigning timestamps to transactions, this protocol assigns read and write timestamps to data items and uses them to lazily compute a valid commit timestamp for each transaction. TicToc removes the need for centralized timestamp allocation, and commits transactions that would be aborted by conventional T/O schemes. On a 40-core machine, TicToc outperforms other state-of-the-art concurrency control algorithms by up to 92% while reducing the abort rate by 3.3x.

1000-core Concurrency Control Evaluation

Computer architectures are moving towards an era dominated by many-core machines with dozens or even hundreds of cores on a single chip. Current database management systems (DBMSs), however, are not designed for this level of parallelism. In this project, we performed an evaluation of concurrency control for on-line transaction processing (OLTP) workloads on many-core chips. Our analysis shows that all algorithms fail to scale to this magnitude but for different reasons. In each case, we identify fundamental bottlenecks that are independent of the particular database implementation. We conclude that rather than pursuing incremental solutions, many-core chips may require a completely redesigned DBMS architecture that is built from ground up and is tightly coupled with the hardware. Our DBMS is open source on github (DBx1000).

Indirect Memory Prefetcher

Machine learning, graph analytics and sparse linear algebra-based applications are dominated by irregular memory accesses which have little temporal or spatial locality and thus cannot be captured by traditional prefetchers. A majority of these irregular accesses come from indirect patterns of the form A[B[i]]. We propose an efficient hardware indirect memory prefetcher (IMP) to capture this access pattern and hide latency. We also propose a partial cacheline accessing mechanism for these prefetches to reduce the network and DRAM bandwidth pressure from the lack of spatial locality. Evaluated on 7 applications, IMP shows 56% speedup on average (up to 2.3X) compared to a baseline 64 core system with streaming prefetchers.

Path Oblivious RAM

Oblivious RAM (ORAM) is a technique to hide the access pattern to an untrusted storage system. With ORAM, a curious adversary cannot tell what address the user is accessing when observing the bits moving between the user and the storage system. Path ORAM is an extremely simple Oblivious RAM protocol with a small amount of client storage. I have worked on Path ORAM design space exploration in secure processors, its performance optimizations, integrity verification and timing leakage protection. I have also proposed worked on secure ways to interact with tamper-resistant hardware with bounded information leakage.

Publication

[Google Scholar] [DBLP]

Teaching