Research Statement

This essay provides an overview of my research goals. It aims to provide a narrative that thematically stitches together the individual papers you can find listed in my publication list. I've also peppered it with pointers to the various tools we built as part of our research.

At a high level, what I'm passionate about is making it easier for people to manage the information they care about. I believe that people have lots of insight about their information: how it should be structured, what they want to say about it, and how they want to understand, organize, present, share, and manipulate it. But they are limited by tools that don't let them use their knowledge, skill, insight, and creativity. My goal is to build tools that give people more power to do what they want, and to demonstrate that these tools make people more effective. This has led me through a variety of different areas of Computer Science including Information Retrieval, Machine Learning, Databases, and the Semantic Web, but primarily HCI because I found that the biggest barriers to empowering users are in the interfaces they use.

Right now I tackle this from two distinct perspectives that I hope will converge at some point in the future: working with structured data and working with online group discussion.

Structured Data

I've been working at the structured data angle almost since I started at MIT 20 years ago; if you want to go all the way back you can find the earliest paper we published on the Haystack system, which gave my group its name, in 1999. It laid out the general idea that the only way to make tools flexible enough to let every user manage information the way they want is to let every user create their own information management tools. And of course, if every user is going to be able to do this, it has to be doable without programming.

So we've done a long series of projects aimed at empowering end users to build their own specialized information management tools without programming, starting with the Haystack system I mentioned above (which kept us busy for several years; probably the best summary is this book chapter, or you can read this short 4-page summary), then moving on to a series of web-based tools: Exhibit (WWW 07), Dido (UIST 09), Cascading Tree Sheets (WWW 13) and Quilt (UIST 14). Exhibit ended up in use on about 2000 web sites, which let us write an interesting followup study of how and why people were using it (CHI 14). The most recent undertaking in this line is Mavo (UIST 16), which lets you build complete functional web applications just by editing an HTML document. These tools all aimed at a “web application” style of interaction; a rather different line of work led to Sieuferd (CHI 11, SIGMOD 16), a tool that generalizes a plain old spreadsheet to a graphical interface that gives non-programmers the full power of SQL database systems without requiring any SQL knowledge or programming.

These tools all focused on the interfaces, but it's equally important to think about the back ends, the computation, and access to the data that people need. I'm dissatisfied with the current web ecology where most data (even the data that people create and should “own”) is locked up inside particular web sites you have to visit to access that data (using and limited by the interfaces to those web sites). I want to move us to a different ecology where data is open and migrates to where it's needed, allowing you to access it and combine it according to your own needs, then interact with it using your own custom interfaces (again, all without programming). This led me to spend a good amount of time in the Semantic Web community, although I haven't worked as much in that space recently because I think they're concentrating on the wrong problems. Anyway, to explore those ideas of open and transparent data we built tools like Piggy Bank (ISWC 05), Sifter (UIST 06), Potluck (ISWC 07) and Atomate (WWW 10). The interface tools I listed previously also all reflect this same open data perspective. More recently, inverting that idea, we pursued a project called Datahub (VLDB 15), a GitHub for data, which explored the idea of having a single hosted environment where all sorts of people could store, manage, and connect all different kinds of data and easily build lightweight applications over it. Datahub is an active project, but somewhat stalled because it was so successful that the student who was building it left to turn it into a startup, Instabase. I'd love to get it rolling again.

This work all focused on managing structured data. Along the way we also explored information management more broadly, looking at the big mess of “information scraps” that people manage in sloppy ways (TOIS 08), and built a tool called List.it to help people tackle those scraps and to let us study how they do it (CHI 11). At its peak, List.it had 25,000 users, and we still have all their data and usage logs waiting to be analyzed.

Online Discussion Tools

The second theme I've been pursuing is online discussion. This started more recently, but my interest in it has been growing steadily and took a giant jump when I saw what a total failure online discussion (and information sharing generally) was in the 2016 presidential election. I'm specifically interested in online discussion as an information sharing tool; I want to find ways to help people learn and understand what is true about the world around them, discuss how to make it better, and arrive at consensus. This includes both online education and civic discourse, which I actually think should be brought much closer together than they are presently.

As with structured data, I believe current interfaces are far too impoverished, offering little more than a linear stream of posts (possibly threaded) and no better curation techniques than upvotes and majority rule. So we've been building a number of tools that aim to enrich the discussion space. We got started around 2007 with work on Nb (CHI 12), which is in most ways a standard discussion forum except that it is intended to help people talk about a given document (PDF, HTML, or video) by anchoring each post in a specific place in the document. We aimed Nb at the classroom, where there are lots of course texts and notes that students want to talk about. It's been used by over 10,000 students in a few hundred classes around the world, and we've got over a million comments in the database waiting to be studied. Although it started as just an exploration of a way to structure discussion, Nb has raised all sorts of questions about online education, around how discussion helps people learn, and what we can do to make that learning happen better through better discussion.

More recently, we've tackled a whole series of systems that try different modes of online discussion. Eyebrowse (CSCW 16) explores ways to help people share and discuss more about the information they're consuming online. With Murmur, we're trying to reinvent the mailing list, which is still a remarkably popular and powerful discussion tool given that it hasn't changed in 40 years. We studied what people liked and disliked about mailing lists (CHI 15), and are building Murmur to try to make a tool that better meets their needs. A descendant project, Squadbox, is using Murmur as a platform to protect victims of email harassment by recruiting a team of friends and volunteers to moderate incoming email for them. We've also been developing Wikum (CSCW 17), a tool that aims to bridge the gap between discussion forums, where people talk without end but never arrive at consensus, and wikis, which are great for showing you the consensus but provide no way to capture or explore all the conversation that led to that consensus. And we built a tool called Confer that is currently in use in many of the main computer science conferences. On the surface it's a conference program, but underneath it's a tool for exploring ways to improve the way people meet and share knowledge in a professional setting.

As you can see, we're very much a system-building group. But we combine the system building with studies of users both before and after we build the system. This means a lot of what we build gets deployed (and open sourced) so we can study its use in the wild (though I have to admit that I also just really enjoy putting stuff out in the real world where it seems to be helping people, independent of what research results it yields). As I discussed, there are two subthemes in the work, one focused on structured data and one focused on online conversation. You may also have noticed that pretty much all the tools are “dumb”—contrary to the current boom in machine learning and intelligent interfaces, the tools we build tend to leave the intelligence to the human users.