Research

Here are projects I've worked on with others, the most recent appearing first:
  • Qurk, a Crowd-powered Database (with Eugene Wu, David Karger, Samuel Madden, and Rob Miller) - Crowdsourcing platforms such as Amazon's Mechanical Turk make it possible to organize crowd workers to perform tasks like translation or image labelling on demand. Building these workflows is challenging: How much should you pay crowd workers? Can you trust the output of each worker? How can you coordinate workers to perform complicated high-level tasks? Qurk helps you build crowd-powered data processing workflows using a SQL-like language while tackling these challenges on your behalf.
  • Twitter Stream Processing and Visualization (with Osama Badar, Michael Bernstein, David Karger, Samuel Madden, and Rob Miller) - Social streams like Twitter offer us a great source of unstructured data. This data can be aggregated to summarize the events that make up a news story or to detect events such as earthquakes and flu outbreaks.
  • Sync Kit: A Persistent Client-Side Database Caching Toolkit for Data Intensive Websites (with Edward Benson, David Karger, and Samuel Madden) - Increase throughput and drive down bandwidth consumption by using HTML5-style in-browser relational databases to cache data.
  • FeedMe: Understanding and Supporting Directed Link-Sharing (with Michael Bernstein, David Karger, and Rob Miller) - Receive recommendations in Google Reader as to which of your friends might like a news feed item.
  • PhotoCalorie: Picture your Diet (with Mark Boguski, Vincent Fusaro, and Larry Istrail) - iPhone food journal. Take pictures of what you eat, and we'll estimate the nutritional content of the food.
  • DataPress: Data and Visualizations for Blogs (with Edward Benson, Fabian Howahl, and David Karger) - WordPress plugin that gives bloggers WYSIWYG tools to embed data and data visualizations and to share their datasets.
  • Interactive Alerts: Cellphone Patient Alert/Identification System using RFID (with many people across MIT, MGH, and Indus Hospital, Pakistan) - Prototyped a cellphone alert system that identified infants by reading their RFID cards with NFC-equipped cellphones. This enabled a team to treat patients found to have pneumonia.
  • BlendDB: A Relational Database that Supports Efficient Web Browsing Queries (with David Karger and Samuel Madden) - Increase database query throughput by reducing disk seeks when users browse related items in the database.
  • Scalable Semantic Web Triple Stores - The semantic web is a concept that was proposed by the W3C and is being used in a growing number of applications, including biology and libraries. While the potential uses for semantic web technologies are readily apparent, an essential step in realizing the semantic web vision is building systems that store, index, and query semantic web data (RDF). We are researching and benchmarking methods of storing semantic web data efficiently in a relational database. This work was done with Daniel Abadi, Kate Hollenbach, and Samuel Madden at MIT. Relevant Documents:
  • Collaborative Information Organization - Information workers such as scientists often generate data and findings that they wish to share with their collaborators. They then annotate their data to add useful metadata that describes their findings. As an undergraduate at RPI, I helped create an initial prototype of a system that allows metamorphic petrologists to collaborate and annotate their findings in this way. The project continues and has grown, but while I was at RPI, my collaborators were Sibel Adali, Boleslaw Szymanski, Frank Spear, and Bouchra Bouqata. Relevant Documents:
  • Effective Web-Scale Crawling - Web crawlers traverse the web to find new or updated content to be indexed by search engines and other organizations. I spent a summer internship at IBM's Almaden Research Center working on making the WebFountain web crawler "intelligent" so that it could prioritize which websites to crawl and recrawl. Collaborators included Roberto Bayardo, David Blackman, Ian Bergman, Ivan Gonzalez, Daniel Meredith, and Linda Nguyen. Relevant Document: