C-Store - A Column-Oriented, High-Performance Database
C-Store is a read-optimized, column-oriented, distributed data warehouse architecture for supporting ad-hoc queries on terabytes of data. Initial experiments indicate that the architecture can be 10 to 100 times faster than commercial databases on a TPC-H decision support benchmark.
The key contributions of the design include:
- a hybrid architecture that pairs a write-oriented component, geared toward reasonable performance on inserts and updates, with a read-optimized component that provides high performance on ad-hoc queries.
- redundant storage of data in overlapping projections with different sort orders, to provide better performance across a variety of queries.
- heavily compressed columns, which reduce I/O bandwidth at the cost of additional (but relatively cheaper) CPU time for decompression.
- a column-oriented optimizer and executor.
- high availability through K-safety and overlapping projections.
- the use of snapshot isolation to avoid two-phase commit and locking.
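To make the interaction between sort orders and compression concrete, here is a minimal Python sketch. It is not C-Store code: the table, the column names, and the helper functions `rle_encode`, `rle_decode`, and `count_equal` are all hypothetical, and run-length encoding is used only as one simple example of a column compression scheme. The sketch shows that sorting a projection on a column produces long runs that compress well, and that some predicates can be answered on the compressed form without decompressing it.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


def count_equal(runs, target):
    """Count rows equal to target by reading run lengths only."""
    return sum(length for value, length in runs if value == target)


# Hypothetical (region, amount) rows, for illustration only.
rows = [("east", 10), ("west", 7), ("east", 3),
        ("north", 5), ("west", 1), ("east", 8)]

# A projection sorted on region: each column is stored separately,
# with all columns kept in the same row order.
by_region = sorted(rows, key=lambda r: r[0])
region_column = [r[0] for r in by_region]
amount_column = [r[1] for r in by_region]

# The sorted column collapses to one run per distinct value.
region_runs = rle_encode(region_column)

# The same column in arrival order has no adjacent repeats here,
# so run-length encoding saves nothing.
unsorted_runs = rle_encode([r[0] for r in rows])
```

In this toy data the sorted region column needs three runs while the unsorted one needs six, which is the kind of gain that motivates keeping several projections with different sort orders.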
My specific contributions to this project have centered around:
- prototyping major parts of the column-oriented executor and relational operators.
- refining and correcting the algorithms for maintaining the high and low watermarks used for snapshot isolation and site recovery.
- designing and evaluating new commit protocols and recovery protocols (based on HARBOR) for the distributed version of the C-Store database.
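The role of the watermarks can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the project's actual algorithm: the `VersionedTuple` class, the `visible` and `snapshot_read` functions, and the exact visibility rule (a tuple is visible at an epoch if it was inserted at or before that epoch and not deleted at or before it) are hypothetical. The idea it shows is that a read-only query picks an epoch between the low and high watermarks and then sees a consistent snapshot without taking locks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionedTuple:
    """A stored value tagged with insertion and deletion epochs (hypothetical)."""
    value: str
    inserted_at: int
    deleted_at: Optional[int] = None  # None means the tuple is still live


def visible(t, epoch):
    """Assumed rule: inserted at or before epoch, and not yet deleted at epoch."""
    inserted = t.inserted_at <= epoch
    not_deleted = t.deleted_at is None or t.deleted_at > epoch
    return inserted and not_deleted


def snapshot_read(tuples, epoch, low_watermark, high_watermark):
    """Return the values visible at epoch, which must lie within the watermarks."""
    if not (low_watermark <= epoch <= high_watermark):
        raise ValueError("epoch is outside the readable range")
    return [t.value for t in tuples if visible(t, epoch)]


# Hypothetical data: "b" is deleted at epoch 4, "c" arrives at epoch 5.
table = [
    VersionedTuple("a", inserted_at=1),
    VersionedTuple("b", inserted_at=2, deleted_at=4),
    VersionedTuple("c", inserted_at=5),
]

at_epoch_3 = snapshot_read(table, epoch=3, low_watermark=1, high_watermark=4)
at_epoch_4 = snapshot_read(table, epoch=4, low_watermark=1, high_watermark=4)
```

Under these assumptions, a read at epoch 3 sees "a" and "b", a read at epoch 4 sees only "a", and a read at epoch 5 is rejected because it lies above the high watermark.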
I've co-authored a paper on this work:
- Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik.
"C-Store: A Column-oriented DBMS." Very Large Data Bases Conference 2005. [pdf]