Manasi Vartak

PhD Student, Database Group, MIT CSAIL

Interests: Data Analytics, Data Visualization,
Systems to Support Machine Learning



Email: mvartak _at_ csail.mit.edu
Twitter: @DataCereal


I am a PhD student in the MIT Database Group where I work on systems to support data analytics, including visualization and machine learning. My research focuses on automating and speeding up analytic tasks while keeping the human in the loop. My advisor is Sam Madden.

In the past, I've worked/interned at Microsoft, Google, and Facebook. I've also worked on genomics, clothing recommendations, and GI health monitoring devices. I am a recipient of the Facebook PhD Fellowship (2016) and the Google Anita Borg Scholarship (2013).


SeeDB: A Data-Driven Visualization Recommender System



Data analysts often build visualizations as the first step in their analytical workflow. However, when working with high-dimensional datasets, identifying visualizations that show relevant or desired trends in data can be laborious. We propose SeeDB, a visualization recommendation engine to facilitate fast visual analysis: given a subset of data to be studied, SeeDB intelligently explores the space of visualizations, evaluates promising visualizations for trends, and recommends those it deems most “useful” or “interesting”. The two major obstacles in recommending interesting visualizations are (a) scale: evaluating a large number of candidate visualizations while responding within interactive time scales, and (b) utility: identifying an appropriate metric for assessing the interestingness of visualizations. For the former, SeeDB introduces pruning optimizations to quickly identify high-utility visualizations and sharing optimizations to maximize sharing of computation across visualizations. For the latter, as a first step, we adopt a deviation-based metric for visualization utility, while indicating how we may be able to generalize it to other factors influencing utility. We implement SeeDB as a middleware layer that can run on top of any DBMS. Our experiments show that our framework can identify interesting visualizations with high accuracy. Our optimizations lead to multiple orders of magnitude speedup on relational row and column stores and provide recommendations at interactive time scales. Finally, we demonstrate via a user study the effectiveness of our deviation-based utility metric and the value of recommendations in supporting visual analytics.
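
To make the idea of a deviation-based utility metric concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the SeeDB implementation: it assumes each candidate visualization is a (dimension, measure) pair aggregated with SUM, that the reference is the full dataset, and that deviation is measured as Euclidean distance between normalized distributions. The function names, column names, and toy data are all hypothetical, and the pruning and sharing optimizations described above are omitted.

```python
from collections import defaultdict
from math import sqrt


def normalized_sums(rows, dimension, measure):
    """Group rows by `dimension`, sum `measure`, and normalize so values sum to 1."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {key: 0.0 for key in totals}
    return {key: value / grand_total for key, value in totals.items()}


def deviation(target_rows, reference_rows, dimension, measure):
    """Euclidean distance between the target and reference distributions.

    Euclidean distance is an assumption here; any distance between
    distributions could be substituted.
    """
    target = normalized_sums(target_rows, dimension, measure)
    reference = normalized_sums(reference_rows, dimension, measure)
    groups = set(target) | set(reference)
    return sqrt(sum((target.get(g, 0.0) - reference.get(g, 0.0)) ** 2 for g in groups))


def recommend(target_rows, reference_rows, dimensions, measures, k=3):
    """Score every (dimension, measure) pair and return the k highest-deviation ones."""
    scored = [
        (deviation(target_rows, reference_rows, d, m), d, m)
        for d in dimensions
        for m in measures
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical toy data: the reference is the whole dataset, the target is
# the subset the analyst is studying.
reference_rows = [
    {"region": "east", "channel": "web", "sales": 10.0},
    {"region": "west", "channel": "web", "sales": 10.0},
    {"region": "east", "channel": "store", "sales": 10.0},
    {"region": "west", "channel": "store", "sales": 10.0},
]
target_rows = [
    {"region": "east", "channel": "web", "sales": 9.0},
    {"region": "east", "channel": "store", "sales": 9.0},
    {"region": "west", "channel": "web", "sales": 1.0},
    {"region": "west", "channel": "store", "sales": 1.0},
]

ranking = recommend(target_rows, reference_rows, ["region", "channel"], ["sales"], k=2)
for score, dimension, measure in ranking:
    print(f"SUM({measure}) BY {dimension}: deviation = {score:.3f}")
```

In this toy data, the target subset splits its sales across channels in the same proportions as the reference but skews heavily toward one region, so the sales-by-region view ranks above sales-by-channel. This brute-force loop scores every candidate independently; it is exactly the cost that the pruning and sharing optimizations are meant to avoid.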

Publications:
SeeDB Demo, VLDB 2014
BigDawg Demo, VLDB 2015
Full Paper, PVLDB Volume 8, Issue 13

User Preferences in Visualization Recommendation

In this ongoing work, we extend SeeDB to incorporate user preferences for visualizations, expressed in terms of visualization types, visual encodings, and trends and patterns of interest. We adapt and apply techniques from traditional recommender systems to discover user preferences in visual analysis.