Research
Here are projects I've worked on with others, the most recent appearing first:
- Qurk, a Crowd-powered Database (with Eugene Wu, David Karger, Samuel
Madden, and Rob Miller) - Crowdsourcing platforms such as Amazon's
Mechanical Turk make it possible to organize crowd workers to perform tasks
like translation or image labelling on demand. Building these workflows
is challenging: How much should you pay crowd workers? Can you trust
the output of each worker? How can you coordinate workers to perform
complicated high-level tasks? Qurk helps you build crowd-powered
data processing workflows using a SQL-like language while tackling these
challenges on your behalf.
- Twitter Stream Processing and Visualization (with Osama Badar,
Michael Bernstein, David Karger, Samuel Madden, and Rob Miller) - Social
streams like Twitter offer us a great source of unstructured data. This
data can be aggregated to summarize the events that make up a news story or
to detect events such as earthquakes and flu outbreaks.
- Sync Kit: A Persistent Client-Side Database Caching Toolkit for
Data Intensive Websites (with Edward Benson, David Karger, and Samuel
Madden) - Increase throughput and drive down bandwidth consumption by using
HTML5-style in-browser relational databases to cache data.
- FeedMe: Understanding and Supporting Directed Link-Sharing (with Michael
Bernstein, David Karger, and Rob Miller) - Receive recommendations in
Google Reader as to which of your friends might like a news feed item.
- PhotoCalorie: Picture your Diet (with Mark Boguski, Vincent
Fusaro, and Larry Istrail) - iPhone food journal. Take pictures of what
you eat, and we'll estimate the nutritional content of the food.
- DataPress: Data and Visualizations for Blogs (with Edward
Benson, Fabian Howahl, and David Karger) - WordPress plugin that gives
bloggers a WYSIWYG way to embed data and data visualizations and to share their datasets.
- Interactive Alerts: Cellphone Patient Alert/Identification System using RFID (with many people across
MIT, MGH, and Indus Hospital, Pakistan) - Prototyped a cellphone alert system that identified infants using RFID cards read by cellphones with NFC readers. This enabled a team to treat patients found to have pneumonia.
- BlendDB: A Relational Database that Supports Efficient Web
Browsing Queries (with David Karger and Samuel Madden) - Increase
database query throughput by reducing disk seeks when users browse
related items in the database.
- Scalable Semantic Web Triple Stores - The semantic web is a
concept that was proposed by the W3C and is being used in a growing
number of applications, including biology and libraries. While the
potential uses for semantic web
technologies are readily apparent, an essential step in realizing the
semantic web vision is making systems to store, index, and query
semantic web data (RDF). We are researching and benchmarking
methods of storing semantic web data in a relational database
efficiently. This work was done with Daniel Abadi, Kate
Hollenbach, and Samuel Madden at MIT. Relevant Documents:
- Collaborative Information Organization - Information
workers such as scientists often generate data and findings that they
wish to share with their collaborators. They then annotate their data
to add useful metadata which describes their findings. As an
undergraduate at RPI, I helped create an initial prototype of a system
that allows metamorphic petrologists to collaborate and annotate their
findings in this way. The project continues and has grown, but while I
was at RPI, my collaborators were Sibel Adali, Boleslaw Szymanski,
Frank Spear, and Bouchra Bouqata. Relevant Documents:
- Effective Web-Scale Crawling - Web crawlers are used to
traverse the web and find new or updated content to be indexed by
search engines and other organizations. I spent a summer internship at
IBM's Almaden Research Center working on making the WebFountain web
crawler "intelligent" so that it could prioritize websites to
crawl and recrawl. Collaborators included Roberto Bayardo, David
Blackman, Ian Bergman, Ivan Gonzalez, Daniel Meredith, and Linda
Nguyen. Relevant Document: