Edmond Lau

C-Store - A Column-Oriented High Performance Database

C-Store is a read-optimized, column-oriented, distributed data warehouse architecture that supports ad-hoc queries on terabytes of data. Initial experiments indicate that the architecture can run 10 to 100 times faster than commercial databases on a TPC-H decision support benchmark.

The key contributions of the design include:

  • a hybrid architecture that pairs a write-oriented component, geared toward reasonable performance on inserts and updates, with a read-optimized component that provides high performance on ad-hoc queries.
  • redundant storage of data in overlapping projections with different sort orders, so that a variety of queries can each find a projection that suits them.
  • heavily compressed columns, which reduce I/O bandwidth at the price of extra (but comparatively cheap) CPU time for decompression.
  • a column-oriented optimizer and executor.
  • high availability through K-safety and overlapping projections.
  • the use of snapshot isolation to avoid two-phase commit and locking.
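
To make the link between sort order and compression concrete, here is a minimal Python sketch of run-length encoding applied to a single column. It is an illustration only, not code from C-Store: the function names and the sample `region` values are hypothetical, and run-length encoding stands in for whichever compression schemes a real column store would choose per column.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical column values, in arrival order.
region = ["east", "west", "east", "north", "west", "east", "west", "east"]

# In arrival order no two neighbours match, so every value is its own run.
unsorted_runs = rle_encode(region)

# Stored in sorted order, the same values collapse to one run per distinct value.
sorted_runs = rle_encode(sorted(region))

print(len(unsorted_runs), "runs unsorted;", len(sorted_runs), "runs sorted")
print(sorted_runs)
```

The sorted copy needs three pairs where the unsorted copy needs eight, which is the intuition behind keeping the same data in several projections with different sort orders: each projection can be stored compactly on its own sort column, and decoding costs only CPU time.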

My specific contributions to this project have centered on:

  • prototyping major parts of the column-oriented executor and relational operators.
  • refining and correcting the algorithms for maintaining the high and low watermarks used for snapshot isolation and site recovery.
  • designing and evaluating new commit protocols and recovery protocols (based on HARBOR) for the distributed version of the C-Store database.

I've co-authored a paper on this work:

  • Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik.
    "C-Store: A Column-oriented DBMS." Very Large Data Bases Conference 2005. [pdf]

Copyright © 2006 Edmond Lau