Tommi S. Jaakkola, Ph.D.
Professor of Electrical Engineering and Computer Science

MIT Computer Science and Artificial Intelligence Laboratory
Stata Center, Bldg 32-G470
Cambridge, MA 02139

E-mail: tommi at
Our primary research areas include machine learning, computational biology, and information retrieval.

Machine learning: In machine learning, our focus has been on principles, methods, and algorithms for problems that are fundamental to learning and also of substantial value across a number of modern applied domains, especially computational biology and information retrieval. The abstract problem structures we need to solve theoretically are often shared across applied areas. For example, decisions typically have to be made on the basis of multiple heterogeneous, predominantly incomplete or fragmented information sources that may change over time.

  • Large scale (approximate) inference
  • Semi-supervised learning
  • Matrix completion / collaborative filtering
  • On-line learning
  • Reinforcement learning (older)

Computational biology: In computational biology, the motivation comes from the need to understand the cellular mechanisms responsible for transcriptional control, a problem of enormous scientific and practical importance. Our work has focused first on model organisms such as yeast, with the ultimate goal of understanding regulatory control in more complex human cells.

  • Reconstruction of molecular interaction networks
  • Computational experiment design
  • Cell-cycle and time course expression analysis
  • Remote protein homology (older)

Information retrieval: In information retrieval, the goal has been to develop methods for finding a few pieces of relevant information within a large collection of predominantly incomplete and superficially similar items. This involves, for example, understanding active interaction between the user and the system, exploiting partially annotated datasets, and using previously solved retrieval tasks to better solve a new one.

  • Text classification
  • Collaborative filtering
  • Active information retrieval

Some current projects

    Collaborative prediction

    Recommender systems increasingly mediate people's access to information. Many such systems operate collaboratively in the sense that each user's limited experience with the system is strengthened and complemented by the experience of other users. The collaborative prediction approach, we argue, remains attractive even, and especially, when data are collected from a large number of users over many types of items (movies, books, travel, even websites). Recommendations in collaborative prediction are derived in part from co-occurrences (intersections), so the information relevant to any particular prediction remains sparse despite access to seemingly plentiful data; this is precisely the setting that collaborative methods are designed to handle.

    There are two critical issues in collaborative prediction that we seek to address: scaling to realistic problem sizes and robustness. In terms of scaling, the approach we take is, at its core, tied to on-line algorithms for structured prediction and leverages recent advances in distributed incremental optimization. The approach is particularly well-suited for recommending structured items such as meals, vacations, or travel routes. In terms of robustness, typical collaborative prediction methods have difficulty properly incorporating user feedback (due to various selection biases) as well as dealing with adversaries intent on biasing recommendations to their advantage. We develop estimation methods that are invariant to such influences.
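    A common way to instantiate collaborative prediction is low-rank matrix factorization over the observed user-item entries. The sketch below is a generic, illustrative example with made-up toy data, not code or a method from our publications: it fits the observed ratings by stochastic gradient descent and then predicts the missing entries.

```python
import numpy as np

def factorize(R, mask, rank=2, lr=0.05, reg=0.1, epochs=200, seed=0):
    """Fit R ~ U @ V.T on the observed entries by stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = R[i, j] - U[i] @ V[j]
            U[i] += lr * (err * V[j] - reg * U[i])   # gradient step on user factors
            V[j] += lr * (err * U[i] - reg * V[j])   # gradient step on item factors
    return U, V

# Toy ratings: rows are users, columns are items; 0 marks a missing entry.
R = np.array([[5., 4., 0., 1.],
              [4., 5., 1., 0.],
              [0., 1., 5., 4.],
              [1., 0., 4., 5.]])
U, V = factorize(R, R > 0)
pred = U @ V.T   # predictions for every entry, including the missing ones
```

    Even in this toy example the two user groups are linked only through a few shared ratings, which is the sparse co-occurrence structure the paragraph above refers to.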

    Probabilistic inference, learning

    Prediction lies at the heart of engineering and the sciences. The objects of interest for prediction can increasingly be understood in terms of discrete structures such as annotations, rankings, alignments, or arrangements. This is the case, for example, in natural language processing (parsing, tagging), computer vision (stereopsis, scene understanding), computational biology (molecular structures, sequence alignment), materials science (optimization of crystal structures, alloys), as well as recommender systems (rankings). Such structures also appear in various combinations with each other, as in parsing a sentence based on part-of-speech tags, creating a phylogenetic tree based on aligned sequences, or annotating samples following data association (matchings). Predictions over structures are made in a context dependent manner. For example, a parse tree (discrete structure) is derived for a given sentence (the context).

    The key part of the overall prediction methodology involves translating any new context into scores over discrete structures and then finding the highest scoring structure over the space of valid structures, i.e., solving the associated combinatorial optimization problem, parameterized by the context. One of the major issues is the complexity of the discrete structures involved: the complexity of realistic models or scaling requirements often rules out exact calculations in generating predictions. In this project, we create a robust framework for prediction involving discrete structures through linear programming relaxations; develop scalable learning and prediction algorithms for large problems by decomposing them into simpler, exactly solvable but interacting components; characterize when we can and cannot expect accurate predictions over complex structures; and demonstrate the approach on several key applied tasks.
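    As a small illustration of the linear programming relaxation idea (a toy sketch with invented scores, not code from this project), the highest scoring assignment of a tiny pairwise model over a chain of three binary variables can be found by relaxing the 0/1 indicator variables to pseudomarginals tied together by local consistency constraints. Because the chain is a tree, the relaxation is tight and the LP solution is integral.

```python
import numpy as np
from scipy.optimize import linprog

# Toy pairwise model: a chain of three binary variables x0 - x1 - x2.
node = np.array([[0.0, 1.0],     # x0 slightly prefers 1
                 [0.5, 0.0],     # x1 slightly prefers 0
                 [0.0, 0.2]])
edge = {(0, 1): np.array([[2.0, 0.0], [0.0, 2.0]]),   # strong agreement scores
        (1, 2): np.array([[2.0, 0.0], [0.0, 2.0]])}

# Variable layout: 6 node indicators mu_i(x), then 4 per edge mu_ij(x,y).
def nvar(i, x): return 2 * i + x
def evar(e, x, y): return 6 + 4 * e + 2 * x + y

edges = list(edge)
nv = 6 + 4 * len(edges)
c = np.zeros(nv)                                       # linprog minimizes -> negate
for i in range(3):
    for x in range(2):
        c[nvar(i, x)] = -node[i, x]
for e, (i, j) in enumerate(edges):
    for x in range(2):
        for y in range(2):
            c[evar(e, x, y)] = -edge[(i, j)][x, y]

A_eq, b_eq = [], []
for i in range(3):                                     # node simplex: sum_x mu_i(x) = 1
    row = np.zeros(nv); row[[nvar(i, 0), nvar(i, 1)]] = 1
    A_eq.append(row); b_eq.append(1.0)
for e, (i, j) in enumerate(edges):                     # local consistency constraints
    for x in range(2):                                 # sum_y mu_ij(x,y) = mu_i(x)
        row = np.zeros(nv)
        row[evar(e, x, 0)] = row[evar(e, x, 1)] = 1
        row[nvar(i, x)] = -1
        A_eq.append(row); b_eq.append(0.0)
    for y in range(2):                                 # sum_x mu_ij(x,y) = mu_j(y)
        row = np.zeros(nv)
        row[evar(e, 0, y)] = row[evar(e, 1, y)] = 1
        row[nvar(j, y)] = -1
        A_eq.append(row); b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=[(0, 1)] * nv)
assignment = [int(res.x[nvar(i, 1)] > 0.5) for i in range(3)]
```

    On graphs with cycles the same LP is generally only an upper bound, and its solution may be fractional; that gap is one of the issues the project studies.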

    Predictive user modeling

    We model how a user interacts with a mobile device for the purpose of anticipating their actions and potential needs. For example, we can narrow down the set of numbers they are likely to call based on the time of day, the context (e.g., location), and/or current activity (e.g., applications that are running). The ability to make these and other predictions can be further strengthened by looking back on past interactions with the user, such as their preceding actions or locations. The challenge is to abstract and distill the available information about the user -- past experience and current activity -- into a predictive notion of a user state. Once we gain access to the user state, there are many ways to exploit it. For example, knowledge of the state would help us anticipate and facilitate user actions, from keystrokes to applications, help resolve ambiguous or incomplete voice commands, and contribute to recommending services that the user may be most receptive to.

    Our specific goal is to distill user experience into personalized predictions by means of learning to rank alternative information items. Since groups of users have similar preferences, personalized recommendations are best estimated collaboratively across users so as to gain additional statistical power when faced with limited experience with any particular user.
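    A minimal version of learning to rank from pairwise preferences is logistic pairwise training: learn a weight vector so that preferred items score higher than the alternatives they beat. This is an illustrative single-user sketch with made-up features; the collaborative, cross-user estimation described above is more involved.

```python
import numpy as np

def rank_weights(X, prefs, lr=0.1, epochs=500):
    """Learn w so that w @ X[i] > w @ X[j] for each preference pair (i, j)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i, j in prefs:
            d = X[i] - X[j]
            p = 1.0 / (1.0 + np.exp(-(w @ d)))   # P(item i preferred over item j)
            w += lr * (1.0 - p) * d              # gradient of the log-likelihood
    return w

# Toy items described by two features; the user prefers item 0 over 1, and 1 over 2.
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
prefs = [(0, 1), (1, 2)]
w = rank_weights(X, prefs)
scores = X @ w   # higher score means ranked earlier
```

    Estimating such weights jointly across similar users, rather than per user as here, is what provides the additional statistical power when experience with any one user is limited.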

    Learning models for situational awareness

    Situational awareness lies at the core of tactical decision making. The decisions may be geared towards personalizing user experience, assisting users via recommendations, or acquiring additional information to better understand the situations they are in. The awareness results from the ability to represent the key objects in the environment and the possible interactions between them, and to predict the consequences of any actions taken by the user or the system.

    There are two broad problems in this context: 1) developing flexible causal models of the user environment, and 2) mapping an array of sensor readings and the corresponding historical record to an appropriate context dependent model. In this project, we focus primarily on the second part, condensing sensor readings into an effective causal model, represented by dynamic Bayesian models. We limit the model initially to involve user "activities" rather than more detailed descriptions of their environments, such as individuating the objects surrounding them. As a result, the types of actions whose consequences we intend to predict are also restricted, reflecting transitions between user activities. In contrast, we permit the context to vary widely with intermittent and incomplete sensor readings, focusing on learning how to map such user experience to clear-cut models of their environment. At the heart of this project is data integration across contexts. The integration is necessary to create a clear representation of the current context from sparse sensor readings.
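    The simplest dynamic Bayesian model of this kind is a hidden Markov model over user activities. The sketch below uses invented activity and sensor labels and illustrative probabilities, not parameters from our work: it decodes a most likely activity sequence from noisy readings with the Viterbi algorithm.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state sequence given a discrete observation sequence."""
    T, K = len(obs), len(pi)
    logd = np.log(pi) + np.log(B[:, obs[0]])        # log-probs of states at t = 0
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)          # scores[prev, next]
        back[t] = scores.argmax(axis=0)             # best predecessor for each state
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):                   # backtrack through the pointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Hypothetical activities: 0 = "still", 1 = "walking";
# observations: 0 = low, 1 = high accelerometer reading.
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])              # activities tend to persist
B = np.array([[0.8, 0.2], [0.3, 0.7]])              # sensor readings are noisy
obs = [0, 0, 1, 1, 1, 0]
path = viterbi(obs, pi, A, B)
```

    Note that because activities persist, a single low reading after a run of high ones is explained as sensor noise rather than an activity change, which is the kind of smoothing over intermittent readings the paragraph above calls for.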