The Data Science Machine

Max Kanter, Kalyan Veeramachaneni
CSAIL, MIT

Why did we create the Data Science Machine?

In recent years, ever more data has been collected, and more and more of it is coming online through cloud infrastructure. As data scientists who work with this data regularly, we noticed a few important aspects:

[Figure: Different data problems]

With this kind of data and the growing need to use it, and as we set out to build data-driven solutions for multiple problems, we noticed:

[Figure: Different data problems]

We wanted to address our own need to scale up our data science efforts, and also to answer two questions: How can we reduce the time it takes to transform raw data into a format that machine learning algorithms can use? And how can we make this process more systematic while still accounting for as much of the data's complexity as possible?

Why fully automate?

We think it is important to challenge ourselves to build something that could replace our manual effort on each new data problem. A challenge like that pushes us to develop technologies we would not otherwise conceive. Once developed, however, the technologies built as part of the system are often used to enhance and aid humans rather than to replace them. Still, starting with the goal of full automation allowed us to think big.