Matei Zaharia’s Publications
2016
- F. Abuzaid, J. Bradley, F. Liang, A. Feng, L. Yang, M. Zaharia and A. Talwalkar. Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale, to appear at NIPS 2016.
- R.B. Zadeh, X. Meng, A. Staple, B. Yavuz, L. Pu, S. Venkataraman, E. Sparks, A. Ulanov and M. Zaharia. Matrix Computations and Optimizations in Apache Spark, to appear at KDD 2016. Best Paper Award Runner-Up.
- A. Dave, A. Jindal, L.E. Li, R. Xin, J. Gonzalez and M. Zaharia. GraphFrames: An Integrated API for Mixing Graph and Relational Queries, GRADES 2016.
- M. Vartak, H. Subramanyam, W.E. Lee, S. Viswanathan, S. Husnoo, S. Madden and M. Zaharia. ModelDB: A System for Machine Learning Model Management, HILDA 2016.
- S. Venkataraman, Z. Yang, D. Liu, E. Liang, X. Meng, R. Xin, A. Ghodsi, M. Franklin, I. Stoica and M. Zaharia. SparkR: Scaling R Programs with Spark, SIGMOD 2016.
- X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine Learning in Apache Spark, JMLR, 17(34):1–7, 2016.
- Q. Pu, H. Li, M. Zaharia, A. Ghodsi, and I. Stoica. FairRide: Near-Optimal, Fair Cache Sharing, NSDI 2016.
2015
- J. van den Hooff, D. Lazar, M. Zaharia and N. Zeldovich. Vuvuzela: Scalable Private Messaging Resistant to Traffic Analysis, SOSP 2015, October 2015.
- M. Armbrust, T. Das, A. Davidson, A. Ghodsi, A. Or, J. Rosen, I. Stoica, P. Wendell, R. Xin and M. Zaharia. Scaling Spark in the Real World: Performance and Usability, VLDB 2015, August 2015.
- M. Armbrust, R. Xin, C. Lian, Y. Huai, D. Liu, J. Bradley, X. Meng, T. Kaftan, M. Franklin, A. Ghodsi and M. Zaharia. Spark SQL: Relational Data Processing in Spark, SIGMOD 2015, June 2015.
2014
- H. Li, A. Ghodsi, M. Zaharia, S. Shenker and I. Stoica, Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks, SOCC 2014, November 2014.
- S.N. Naccache, S. Federman, N. Veeraraghavan, M. Zaharia, D. Lee, E. Samayoa, J. Bouquet, A.L. Greninger, K. Luk, B. Enge, D.A. Wadford, S.L. Messenger, G.L. Genrich, K. Pellegrino, G. Grard, E. Leroy, B.S. Schneider, J.N. Fair, M.A. Martinez, P. Isa, J.A. Crump, J.L. DeRisi, T. Sittler, J. Hackett Jr., S. Miller and C.Y. Chiu, A Cloud-Compatible Bioinformatics Pipeline for Ultrarapid Pathogen Identification from Next-Generation Sequencing of Clinical Samples, Genome Research, June 2014.
2013
- M. Zaharia. An Architecture for Fast and General Data Processing on Large Clusters (PhD Dissertation).
- M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized Streams: Fault-Tolerant Streaming Computation at Scale, SOSP 2013, November 2013.
- K. Ousterhout, P. Wendell, M. Zaharia and I. Stoica. Sparrow: Distributed, Low-Latency Scheduling, SOSP 2013, November 2013.
- R. Xin, J. Rosen, M. Zaharia, M. Franklin, S. Shenker, and I. Stoica. Shark: SQL and Rich Analytics at Scale, SIGMOD 2013, June 2013.
- A. Ghodsi, M. Zaharia, S. Shenker and I. Stoica. Choosy: Max-Min Fair Sharing for Datacenter Jobs with Constraints, EuroSys 2013, April 2013.
2012
- A. Ghodsi, V. Sekar, M. Zaharia and I. Stoica. Multi-Resource Fair Queueing for Packet Processing, SIGCOMM 2012, August 2012. Best Paper Award.
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M.J. Franklin, S. Shenker, I. Stoica. Fast and Interactive Analytics over Hadoop Data with Spark, USENIX ;login:, August 2012.
- A.N. Rafferty, M. Zaharia and T.L. Griffiths. Optimally Designing Games for Cognitive Science Research, Annual Conf. of the Cognitive Science Society, August 2012.
- M. Zaharia, T. Das, H. Li, S. Shenker and I. Stoica. Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters, HotCloud 2012, June 2012.
- L. Martignoni, P. Poosankam, M. Zaharia, J. Han, S. McCamant, D. Song, V. Paxson, A. Perrig, S. Shenker, I. Stoica. Cloud Terminal: Secure Access to Sensitive Applications from Untrusted Systems, USENIX ATC 2012, June 2012.
- C. Engle, A. Lupher, R. Xin, M. Zaharia, M. Franklin, S. Shenker, I. Stoica. Shark: Fast Data Analysis Using Coarse-grained Distributed Memory (demo), SIGMOD 2012, May 2012. Best Demo Award.
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M.J. Franklin, S. Shenker, I. Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, NSDI 2012, April 2012. Best Paper Award and Honorable Mention for Community Award.
2011
- T. Hunter, T. Moldovan, M. Zaharia, S. Merzgui, J. Ma, M.J. Franklin, P. Abbeel, and A.M. Bayen. Scaling the Mobile Millennium System in the Cloud, SOCC 2011, October 2011.
- M. Chowdhury, M. Zaharia, J. Ma, M.I. Jordan and I. Stoica, Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM 2011, August 2011.
- B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A.D. Joseph, R. Katz, S. Shenker and I. Stoica, Mesos: Flexible Resource Sharing for the Cloud, USENIX ;login:, August 2011.
- M. Zaharia, B. Hindman, A. Konwinski, A. Ghodsi, A.D. Joseph, R. Katz, S. Shenker and I. Stoica, The Datacenter Needs an Operating System, HotCloud 2011, June 2011.
- B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A.D. Joseph, R. Katz, S. Shenker and I. Stoica, Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, NSDI 2011, March 2011.
- A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, Dominant Resource Fairness: Fair Allocation of Multiple Resource Types, NSDI 2011, March 2011.
2010
- M. Zaharia, M. Chowdhury, M.J. Franklin, S. Shenker and I. Stoica. Spark: Cluster Computing with Working Sets, HotCloud 2010, June 2010.
- M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker and I. Stoica. Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling, EuroSys 2010, April 2010.
- M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R.H. Katz, A. Konwinski, G. Lee, D.A. Patterson, A. Rabkin, I. Stoica and M. Zaharia, Above the Clouds: A View of Cloud Computing, Communications of the ACM, April 2010.
- S. Guo, M. Derakhshani, M.H. Falaki, U. Ismail, R. Luk, E.A. Oliver, S. Ur Rahman, A. Seth, M.A. Zaharia, S. Keshav, Design and Implementation of the KioskNet System, Computer Networks, ISSN 1389-1286, DOI: 10.1016/j.comnet.2010.08.001
2009
- B. Hindman, A. Konwinski, M. Zaharia and I. Stoica, A Common Substrate for Cluster Computing, HotCloud 2009, June 2009.
- R. Luk, M. Zaharia, M. Ho, B. Levine and P. Aoki, ICTD for Healthcare in Ghana: Two Parallel Case Studies, ICTD 2009, April 2009.
2008
- M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz and I. Stoica, Improving MapReduce Performance in Heterogeneous Environments, OSDI 2008, December 2008.
Before 2008
- S. Guo, M.H. Falaki, E.A. Oliver, S. Ur Rahman, A. Seth, M. Zaharia, U. Ismail, and S. Keshav, Design and Implementation of the KioskNet System, ICTD 2007, December 2007.
- S. Guo, M.H. Falaki, E.A. Oliver, S. Ur Rahman, A. Seth, M. Zaharia, and S. Keshav, Very Low-Cost Internet Access Using KioskNet, ACM Computer Communication Review, October 2007.
- M. Zaharia and S. Keshav, Gossip-based Search Selection in Hybrid Peer-to-Peer Networks, J. Concurrency and Computation: Practice and Experience, 2007.
- M. Zaharia, A. Chandel, S. Saroiu, and S. Keshav, Finding Content in File-Sharing Networks When You Can’t Even Spell, Proc. IPTPS, February 2007.
- A. Seth, D. Kroeker, M. Zaharia, S. Guo, S. Keshav, Low-cost Communication for Rural Internet Kiosks Using Mechanical Backhaul, Proc. MOBICOM 2006, September 2006.
- M. Zaharia and S. Keshav, Gossip-Based Search Selection in Hybrid Peer-to-Peer Networks, Proc. IPTPS, February 2006.
PhD Dissertation
An Architecture for Fast and General Data Processing on Large Clusters
Technical Reports
- M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing, UC Berkeley Technical Report UCB/EECS-2012-259, December 2012.
- R. Xin, J. Rosen, M. Zaharia, M.J. Franklin, S. Shenker, I. Stoica, and D. Song. Shark: SQL and Rich Analytics at Scale, UC Berkeley Technical Report UCB/EECS-2012-214, November 2012.
- M. Zaharia, S. Katti, C. Grier, V. Paxson, S. Shenker, I. Stoica, and D. Song. Hypervisors as a Foothold for Personal Computer Security: An Agenda for the Research Community, UC Berkeley Technical Report UCB/EECS-2012-12, January 2012.
- M. Zaharia, W.J. Bolosky, K. Curtis, A. Fox, D. Patterson, S. Shenker, I. Stoica, R.M. Karp, and T. Sittler, Faster and More Accurate Sequence Alignment with SNAP, arXiv:1111.5572v1, November 2011.
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M.J. Franklin, S. Shenker, and I. Stoica, Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, UC Berkeley Technical Report UCB/EECS-2011-82, July 2011.
- A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, Dominant Resource Fairness: Fair Allocation of Multiple Resource Types, UC Berkeley Technical Report UCB/EECS-2011-18, March 2011.
- B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A.D. Joseph, R. Katz, S. Shenker, and I. Stoica, Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, UC Berkeley Technical Report UCB/EECS-2010-87, May 2010.
- M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, Job Scheduling for Multi-User MapReduce Clusters, UC Berkeley Technical Report UCB/EECS-2009-55, April 2009.
- M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R.H. Katz, A. Konwinski, G. Lee, D.A. Patterson, A. Rabkin, I. Stoica and M. Zaharia, Above the Clouds: A Berkeley View of Cloud Computing, UC Berkeley Technical Report UCB/EECS-2009-28, February 2009.
- S. Guo, M.H. Falaki, U. Ismail, E.A. Oliver, S. Ur Rahman, A. Seth, M. Zaharia, and S. Keshav, Design and Implementation of the KioskNet System (Extended Version), University of Waterloo Technical Report CS-2007-40, November 2007.