Matei Zaharia

Assistant Professor
Douglas T. Ross Career Development Professor of Software Technology

I’m an assistant professor at MIT CSAIL, where I work on computer systems and big data as part of the PDOS and bigdata@CSAIL groups. I’m also co-founder and CTO of Databricks, the big data company commercializing Apache Spark.

You can contact me at or find me in Stata Center office 32G-996.



I work on systems and algorithms for large-scale data-intensive computing. My projects include:

Spark: As big data analytics evolves beyond simple batch jobs, applications increasingly need both more complex multi-stage computations (e.g. machine learning algorithms) and more interactive ad-hoc queries. Spark provides an efficient abstraction for in-memory cluster computing called Resilient Distributed Datasets (RDDs), and can run up to 100x faster than Hadoop for these workloads. (homepage) (short paper) (NSDI’12 paper)

Shark: This high-speed query engine runs Hive SQL queries on top of Spark up to 100x faster than Hive, and supports fault recovery and complex analytics (e.g. machine learning). (homepage) (SIGMOD’13)

Mesos: Clusters are running increasingly diverse applications, from batch jobs to interactive services. Mesos is a cluster manager that efficiently supports diverse applications by letting them control their own scheduling. The project is open source in the Apache Incubator. (homepage) (NSDI’11 paper)

Multi-Resource Fairness: Life is not fair, but with a little help, your computer system can be: fair sharing ensures predictable time-sharing between users. However, past work on fair sharing considered only a single resource (e.g. CPU), while cluster applications have demands across multiple resources (memory, IO, CPU, etc.). Dominant Resource Fairness (DRF) generalizes max-min fairness to this case by equalizing users' dominant shares, where a user's dominant share is the largest fraction of any single resource it has been allocated. (NSDI’11) (SIGCOMM’12)
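To make the dominant-share idea concrete, here is a minimal Python sketch of DRF using progressive filling: repeatedly launch one task for the user with the lowest dominant share until no user's next task fits. The function name `drf_allocate`, the demand vectors, and the cluster sizes are hypothetical. The sketch assumes each user runs identical tasks with positive demands, and it drops a user once that user's next task no longer fits while the others keep filling.

```python
from typing import Dict, List


def drf_allocate(capacity: List[float], demands: Dict[str, List[float]]) -> Dict[str, int]:
    """Hypothetical helper: tasks launched per user under progressive-filling DRF.

    capacity: total amount of each resource, e.g. [cpus, memory_gb]
    demands:  per-task demand vector for each user (same order as capacity)
    """
    for user, d in demands.items():
        if len(d) != len(capacity) or not any(x > 0 for x in d):
            raise ValueError(f"user {user!r} needs a positive demand per resource vector")

    used = [0.0] * len(capacity)           # resources allocated so far
    tasks = {u: 0 for u in demands}        # tasks launched per user
    dominant = {u: 0.0 for u in demands}   # current dominant share per user
    active = set(demands)                  # users whose next task might still fit

    while active:
        # Pick the user with the lowest dominant share (ties broken by name).
        user = min(active, key=lambda u: (dominant[u], u))
        d = demands[user]
        if all(used[i] + d[i] <= capacity[i] for i in range(len(capacity))):
            for i in range(len(capacity)):
                used[i] += d[i]
            tasks[user] += 1
            # Dominant share: the largest fraction of any one resource this user holds.
            dominant[user] = max(tasks[user] * d[i] / capacity[i] for i in range(len(capacity)))
        else:
            # This user's next task does not fit; stop considering it.
            active.discard(user)
    return tasks
```

On a hypothetical cluster with 9 CPUs and 18 GB of memory, where user A's tasks need <1 CPU, 4 GB> and user B's need <3 CPUs, 1 GB>, `drf_allocate([9, 18], {"A": [1, 4], "B": [3, 1]})` returns `{"A": 3, "B": 2}`: A holds 12/18 of the memory and B holds 6/9 of the CPUs, so both end with a dominant share of 2/3.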

MapReduce Scheduling: I’ve worked on several scheduling algorithms for MapReduce, including the LATE algorithm for straggler mitigation (OSDI’08) and delay scheduling for data locality (EuroSys’10). Both algorithms are now included in Hadoop. I also developed the Hadoop Fair Scheduler.

To learn more about my graduate research, you can also read my job application materials.

Full Publication List and Technical Reports


Open Source

Almost all of my work is open source:

I’m also a committer on the Apache Hadoop, Spark and Mesos projects.

Adapted from a template by Andreas Viklund.