I am an Associate Professor of Electrical Engineering and Computer Science in MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), founding co-director of the Data Systems and AI Lab (DSAIL) at MIT, and co-founder of einblick analytics, inc.
My group aims to dramatically increase the efficiency of data-intensive systems and to democratize data science by developing a new generation of algorithms and systems that let a broader range of users unlock the potential of their data. This entails exploring how we can build systems that better support recent advances in machine learning (Systems for ML) and how we can leverage machine learning to improve systems (ML for Systems). For example, with our work on SageDB we started to explore how core system components can be enhanced or even replaced using machine learning models, and early results suggest that we can improve on the state of the art by more than an order of magnitude in performance. With Northstar, on the other hand, we are exploring new user interfaces and infrastructure to democratize data science by enabling visual, interactive, and assisted data exploration and model building. One particular focus of this work is to help all types of users analyze data and build models faster, while also making data exploration and model building safer by automatically steering users away from common pitfalls.
Our work has been featured several times by the media (TechCrunch, Science, and O'Reilly, among others), and we are proud to have had significant impact on both academia and industry. For example, Northstar is now being commercialized by einblick analytics, backed by venture capital, and our ML for Systems work is being extended by many researchers around the world (for a slightly outdated overview, see our SIGMOD 2019 tutorial on Learned Data Structures and Algorithms) and is even finding its way into some cloud products of leading internet companies.
I am fortunate to be working with an outstanding team of graduate students, undergraduates, and postdocs, as well as numerous collaborators from academia and industry, and I am grateful for the research funding we have been receiving from NSF, DARPA, the Air Force, Google, Microsoft, and Intel.
Current Research Interests
- ML-enhanced data structures and algorithms
- Systems for interactive data exploration and model building
- Infrastructure for rack-scale analytics and machine learning
- Transaction processing over high-speed networks
- Hybrid human-machine data management systems
Research Projects
The following is a list of my current and past research projects:
- Learned Systems Components - How to enhance traditional data structures and algorithms through machine learning
- Northstar - A System for Interactive Data Science
- NAM - Redefining Databases for the Next Generation of Networks
- QUDE - Quantifying the Uncertainty in Data Exploration
- Tupleware - Redefining Modern Analytics on Modern Hardware
- MLBase - The Distributed Machine-Learning Management System
- S-Store - A streaming OLTP system for big velocity applications
- MDCC - The Fastest Strongly Consistent Multi-Data Center Replication Protocol
- CrowdDB - Answering Queries with Crowdsourcing
- PIQL - Performance Insightful Query Language
- Cloudy/Smoky - a distributed storage and streaming service in the cloud
- Building a database on cloud infrastructure
- CloudBench - a benchmark for the cloud
- Zorba - a general-purpose XQuery processor implemented in C++
- MXQuery - A lightweight, full-featured Java XQuery Engine
- Mapping Data to Queries (MDQ) - data integration with XQuery
- XQIB - XQuery In the Browser