UPDATE: I am on leave from MIT and currently working at Instabase.

I am a Ph.D. student in the Computer Science & Artificial Intelligence Laboratory (CSAIL) at MIT, co-advised by David Karger and Samuel Madden. I received an M.S. in Computer Science from Stanford University and a B.E. in Computer Engineering from the University of Pune. At Stanford, I worked with Scott Klemmer and Jeff Heer.

My primary interest these days is developing systems and tools for data management. My projects draw on ideas from fields such as databases, distributed systems, algorithms, machine learning, and human-computer interaction.


DataHub is a unified, managed, collaborative platform for making data processing easy. It consists of (1) a flexible data store (files and relational databases, extensible to other storage backends) with sharing and collaboration capabilities, managed on behalf of different users and groups, and (2) an app ecosystem that hosts apps for data-processing activities such as ingestion, curation, integration, discovery, query, analytics, visualization, and machine learning. DataHub users can process their data with any of the apps in the App Center, as their needs dictate.

GitHub Repo:


Confer is a tool for conference planning. It helps (a) conference attendees find interesting papers and talks, organize their schedules, and discover people with similar interests, and (b) conference organizers schedule sessions, plan community interactions, understand community structure, and discover new areas of research, practice, methodologies, and emerging applications. It has been deployed at 13 academic conferences, including CHI, CSCW, KDD, ACM MM, SIGMOD, SIGIR, and WSDM, and has more than 18,000 unique users.



Distill is a general-purpose, example-based data cleaning and extraction tool for converting semi-structured text into a structured table. A user provides a few examples (two or three) by specifying the desired tabular output for a sample of the input text. The system uses these examples to automatically infer a model, which it then applies to extract the complete table from the raw file.


Barista is a distributed, synchronously replicated, fault-tolerant relational data store. It runs as a middleware service over database instances to provide the abstraction of a single distributed relational store. It replicates data across many sets of Paxos state machines in replica groups to provide fault tolerance and recovery. Replication also enables load balancing and high availability: clients automatically fail over between replicas. Barista exposes a SQL interface, so client applications can keep using the same SQL code they used before while Barista transparently guarantees replication, consistency, and fault tolerance under the hood.




Anant Bhardwaj, Amol Deshpande, Aaron Elmore, David Karger, Sam Madden, Aditya Parameswaran, Harihar Subramanyam, Eugene Wu, and Rebecca Zhang. Collaborative Data Analytics with DataHub. VLDB 2015.

Anant Bhardwaj, Souvik Bhattacherjee, Amit Chavan, Amol Deshpande, Aaron J. Elmore, Samuel Madden, and Aditya G. Parameswaran. DataHub: Collaborative Data Science & Dataset Version Management at Scale. CIDR 2015.

Anant Bhardwaj, Juho Kim, Steven P. Dow, David Karger, Sam Madden, Robert C. Miller, and Haoqi Zhang. Attendee-Sourcing: Exploring the Design Space of Community-Informed Conference Scheduling. HCOMP 2014.

Juho Kim, Haoqi Zhang, Paul Andre, Lydia B. Chilton, Anant Bhardwaj, David Karger, Steven P. Dow, and Robert C. Miller. Cobi: Community-Informed Conference Scheduling. HCOMP 2013.

Anant P. Bhardwaj, Dave Luciano, and Scott R. Klemmer. Redprint: Integrating API specific "instant example" and "instant documentation" display interface in IDEs. UIST 2011.