Given two or more protein-protein interaction (PPI) networks, we aim to find the best overall alignment of the networks, taking into account both the network topologies and the sequence similarities between individual proteins. This network alignment problem is analogous to the global sequence alignment problem: we are interested in the best overall match between the inputs.

We introduce a spectral approach to this problem. Over a series of papers, my collaborators and I have described the IsoRank and IsoRank-N algorithms for finding such alignments. Using these, we are able to predict functional orthologs: cross-species gene correspondences that take into account both sequence and PPI network data. These functional orthologs may provide certain advantages over existing sequence-only orthologs.
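
To illustrate the flavor of a spectral alignment score, the sketch below runs a power iteration over pairwise scores: a protein pair scores highly when its neighbors also pair up well, and this topological term is blended with sequence similarity. It is a toy under stated assumptions, not the published IsoRank implementation. The function names, the `alpha` weight, the dictionary-based graph representation, and the greedy matching step are all illustrative choices, and the networks are assumed to be undirected (symmetric adjacency).

```python
def pairwise_scores(adj1, adj2, seq_sim, alpha=0.6, iters=200, tol=1e-10):
    """Power iteration for pairwise alignment scores (illustrative sketch).

    adj1, adj2: dict mapping node -> set of neighbours (undirected graphs).
    seq_sim:    dict mapping (node1, node2) -> non-negative similarity score.
    alpha:      hypothetical weight on topology versus sequence similarity.
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    total = sum(seq_sim.get(p, 0.0) for p in pairs)
    # Normalised sequence-similarity vector; uniform if no scores are given.
    E = {p: (seq_sim.get(p, 0.0) / total if total > 0 else 1.0 / len(pairs))
         for p in pairs}
    R = {p: 1.0 / len(pairs) for p in pairs}
    for _ in range(iters):
        new = {}
        for (i, j) in pairs:
            # Each neighbouring pair (u, v) spreads its score evenly over
            # all of its own neighbouring pairs.
            topo = sum(R[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                       for u in adj1[i] for v in adj2[j])
            new[(i, j)] = alpha * topo + (1.0 - alpha) * E[(i, j)]
        norm = sum(new.values())
        new = {p: s / norm for p, s in new.items()}
        delta = max(abs(new[p] - R[p]) for p in pairs)
        R = new
        if delta < tol:
            break
    return R


def greedy_alignment(R):
    """Pick the highest-scoring pairs one at a time, using each node once."""
    used1, used2, matches = set(), set(), []
    for (i, j), _ in sorted(R.items(), key=lambda kv: -kv[1]):
        if i not in used1 and j not in used2:
            matches.append((i, j))
            used1.add(i)
            used2.add(j)
    return matches


# Two three-node paths; the middle nodes should be matched to each other.
g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
g2 = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
scores = pairwise_scores(g1, g2, seq_sim={})
alignment = greedy_alignment(scores)
```

Greedy matching is used here only to keep the example short; any one-to-one matching procedure over the score table could be substituted.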



We predict protein-protein interactions (PPIs) computationally, given only the sequences of the two proteins. The goal is to augment existing experimental data, whose coverage remains spotty. We use structure-based approaches to predict whether two proteins interact, and we combine these structure-based predictions with functional genomic data using machine learning techniques.



Discovering the structure and dynamics of signaling networks is a key goal of systems biology. Towards this, we propose an approach to combine PPI and RNA-interference data to produce high-confidence hypotheses about the structure of a signaling network. The work, which is ongoing, was first presented at ISMB 2007. In it, we introduce the idea of using a multi-commodity flow framework to set up constraints on the structure of a signaling network, given PPI data and knock-down information from RNAi experiments. The constraints describe an Integer Linear Program (ILP) whose LP relaxation is then solved.



The Yeast 2-Hybrid (Y2H) protocol is one of the two main experimental approaches to discovering PPIs in a high-throughput way, the other being Co-Immunoprecipitation. The Y2H protocol is susceptible to some systematic biases, the most problematic being that certain proteins can behave "promiscuously" in the assay and be responsible for many false-positive PPI pairs. We describe a Bayesian approach to modeling this systematic error. This approach allows us to combine information across multiple datasets and make more nuanced inferences than existing approaches.



One of the problems with performing gene-perturbation experiments is choosing the right cut-off for the signal-vs-noise threshold in the assay. A threshold that is too high will exclude promising hits; one that is too low will slow down downstream analysis with irrelevant genes. In the context of RNA-interference assays, we started with the intuition that the intended set of hits should share similar functions and hence be well-connected in the PPI network. We designed quantitative measures that express, given the list of all RNAi scores, how changes in the cut-off affect the connectivity of the chosen set of hits relative to a randomly chosen set. This leads to intuitive ways of selecting cut-offs for the experiment.
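
As a rough illustration of the idea, one can count the PPI edges among the hits selected at a given cut-off and compare that count with the average over random gene sets of the same size. The sketch below is a minimal version under stated assumptions, not the measures we actually designed: the function names are hypothetical, higher RNAi scores are assumed to mean stronger hits, and "connectivity" is simplified to the number of edges within the hit set.

```python
import random


def edges_within(nodes, edges):
    """Count edges whose two endpoints both lie in the given node set."""
    nodes = set(nodes)
    return sum(1 for u, v in edges if u in nodes and v in nodes)


def connectivity_vs_random(scores, edges, cutoff, n_random=1000, seed=0):
    """Compare hit-set connectivity with random sets of equal size.

    scores: dict gene -> RNAi score (assumption: higher = stronger hit).
    edges:  iterable of (gene, gene) PPI pairs.
    Returns (observed edge count, mean edge count over random sets).
    """
    hits = [g for g, s in scores.items() if s >= cutoff]
    observed = edges_within(hits, edges)
    rng = random.Random(seed)
    genes = list(scores)
    null = [edges_within(rng.sample(genes, len(hits)), edges)
            for _ in range(n_random)]
    return observed, sum(null) / n_random


# Toy data: three strong hits form a triangle in the PPI network.
toy_scores = {"g1": 0.9, "g2": 0.9, "g3": 0.9, "g4": 0.1, "g5": 0.1, "g6": 0.1}
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g4", "g5")]
obs_strict, null_strict = connectivity_vs_random(toy_scores, toy_edges, cutoff=0.5)
obs_loose, null_loose = connectivity_vs_random(toy_scores, toy_edges, cutoff=0.0)
```

Scanning `cutoff` over a range of values and plotting the observed count against the random baseline gives a simple picture of where the hit set stops being unusually well-connected.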



In protein structure prediction, one of the challenges is efficiently exploring the local neighborhood of a conformation. We propose an approach that uses concepts from inverse kinematics (in robotics) to change a small part of a protein's backbone without changing anything else. This operation can be applied arbitrarily many times to explore the local neighborhood. Using this approach, we construct ensemble models of protein structure that better explain X-ray crystallography data than single-conformer models and are more effective than existing ensemble models.



Microarray experiments can quickly get costly, especially if one has to perform a number of them as part of a time-series study. We use the concept of active learning to compute the optimal points along the timeline at which microarray experiments should be run. The intuition is that sampling should be focused on the time regions where the gene expression curves are least well-characterized.
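
The intuition can be illustrated with a very crude uncertainty-sampling heuristic: given the time points sampled so far, propose the midpoint of the interval in which expression changes the most on average, on the assumption that the curves are least well pinned down there. This is only a sketch. The function name is hypothetical, and the mean absolute change between consecutive samples is a stand-in for a proper model-based uncertainty estimate, not the criterion used in our work.

```python
def next_sampling_time(times, profiles):
    """Propose the next time point to sample (illustrative heuristic).

    times:    sorted list of time points already sampled.
    profiles: list of per-gene expression lists, aligned with `times`.
    The interval with the largest mean absolute change in expression is
    treated as the least well-characterised; its midpoint is returned.
    """
    if len(times) < 2:
        raise ValueError("need at least two sampled time points")
    best_k, best_score = 0, -1.0
    for k in range(len(times) - 1):
        score = sum(abs(p[k + 1] - p[k]) for p in profiles) / len(profiles)
        if score > best_score:
            best_k, best_score = k, score
    return (times[best_k] + times[best_k + 1]) / 2.0


# Two genes that both change sharply between t=1 and t=2.
proposed = next_sampling_time([0, 1, 2, 3], [[0, 0, 5, 5], [1, 1, 4, 4]])
```

Calling the function again after each new experiment gives a simple sequential design in which samples concentrate around rapid transitions.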



  • Beckett Sterner, Bonnie Berger and I wrote a paper on using information theoretic ideas to identify and annotate active sites in proteins.
  • Mitul Saha and I wrote a paper on searching for a 3-D protein fragment in a database of protein structures.