Abstract

Influence Flow: Integrating Pathway-specific RNAi data and Protein Interaction Data

Rohit Singh and Bonnie Berger.

> Understanding the detailed structure of signaling sub-systems is a
> major biological challenge. Often, the core cascade of the sub-system
> is well understood and our goal is to ascertain the other
> genes/proteins involved and the corresponding network topology (e.g.,
> the MAP Kinase signaling network). Towards this goal, we describe
> influence flow: a novel method for generating high-confidence
> hypotheses about a specific signaling network's topology. These
> hypotheses may then be used to direct further experiments.
>
> Our method combines pathway-specific RNA interference (RNAi) data with
> genome-wide protein interaction networks. The RNAi data is generated
> from a functional genomic screen of a specific signaling pathway.
> These screens work as follows: a known end-effector gene of the
> pathway is chosen as the reporter gene (e.g., Erk in the MAPK
> pathway). Every other gene in the genome is systematically
> knocked down using RNAi and the effect on the reporter is measured.
> The experiment produces a list of genes (hits) that significantly
> influence the reporter and, for each hit, a score indicating the
> relative strength of its influence. The second input to our method is
> genome-wide protein-protein interaction (PPI) data (protein-DNA
> interactions can also be included). To minimize false negatives in PPI
> data, we use computational methods to predict new PPIs from other data
> sources and from PPI data in other species. To mitigate the impact of
> false positives in the data, we can estimate confidence values for
> each edge in the PPI network and take these into account during our
> computations. Given these inputs, we search for a directed acyclic
> protein network such that all its edges are consistent with the input
> PPI data and all its nodes and their relative placement are consistent
> with the RNAi data.
> Furthermore, we require that the output topology
> reflect the following biological intuition: for most proteins not in
> the core cascade, their influence on the end-effector protein is
> transmitted via the core cascade. Specifically, given an input PPI
> network N, an RNAi reporter gene T, the corresponding list of RNAi
> hits L = {i} and their scores S = {s_i}, our desired network G must
> satisfy the following conditions A1-A4 and be optimal under condition
> A5:
>
> A1. All the nodes in G are present as RNAi hits (i.e., in L).
>
> A2. Each edge in G is directed. Also, each directed arc a->b in G is
> either in the core cascade or corresponds to an edge a--b in N.
>
> A3. Every node in G has a directed path to the target gene T.
>
> A4. Nodes closer to T should have higher RNAi scores. If G has an arc
> a->b that is not part of the core cascade, then s_a < s_b.
>
> A5. For most nodes, the path(s) towards T should be routed through
> the core cascade, i.e., the last segment of the path(s) should only
> contain edges from the known core cascade.
>
> The optimal network G must satisfy A1-A4 and have the maximum number
> of nodes that satisfy A5. We compute this solution by formulating an
> integer linear program (ILP), borrowing ideas from the multi-commodity
> network flow literature. The constraints A1-A5 are quite simple; yet,
> the inferred influence flow network contains surprisingly plausible
> hypotheses. When supplied only a part of the known MAPK cascade (in
> fly), our method can successfully discover the other known components.
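
To make conditions A1-A5 concrete, the following is a minimal Python sketch that checks a candidate network against A1-A4 and counts the nodes satisfying A5. It is not the authors' ILP: it only verifies a given candidate, it does not search for the optimum, and it does not check acyclicity. The function names, the toy gene labels (`c1`, `g1`, ...), and the scores are hypothetical. The sketch also makes three assumptions the abstract leaves open: the reporter T is exempt from A1, T is treated as having the highest score, and a node satisfies A5 when none of its directed paths reaches T through a final arc outside the core cascade.

```python
from collections import defaultdict


def reaches(arcs, targets):
    """Return all nodes with a directed path to any node in `targets` (targets included)."""
    preds = defaultdict(set)
    for a, b in arcs:
        preds[b].add(a)
    seen, stack = set(targets), list(targets)
    while stack:
        node = stack.pop()
        for p in preds[node] - seen:
            seen.add(p)
            stack.append(p)
    return seen


def check_network(arcs, ppi_edges, core_arcs, hits, scores, target):
    """Return (violations of A1-A4, set of nodes satisfying A5) for a candidate network."""
    nodes = {n for arc in arcs for n in arc}
    ppi = {frozenset(e) for e in ppi_edges}  # PPI edges are undirected
    s = dict(scores)
    s.setdefault(target, float("inf"))  # assumption: T outranks every hit
    violations = []

    # A1: every node must be an RNAi hit (assumption: T itself is exempt).
    for n in sorted(nodes - set(hits) - {target}):
        violations.append(("A1", n))

    for a, b in sorted(arcs):
        if (a, b) in core_arcs:
            continue  # core-cascade arcs are exempt from A2 and A4
        # A2: a non-core arc must correspond to an edge a--b in N.
        if frozenset((a, b)) not in ppi:
            violations.append(("A2", (a, b)))
        # A4: scores must increase along non-core arcs (skipped if a score is missing).
        if a in s and b in s and not s[a] < s[b]:
            violations.append(("A4", (a, b)))

    # A3: every node needs a directed path to T.
    can_reach = reaches(arcs, {target})
    for n in sorted(nodes - can_reach):
        violations.append(("A3", n))

    # A5 (assumed reading): no path to T may end with a non-core arc.
    bypass = {a for a, b in arcs if b == target and (a, b) not in core_arcs}
    off_cascade = reaches(arcs, bypass)
    a5_nodes = can_reach - off_cascade - {target}
    return violations, a5_nodes


# Toy data, invented purely for illustration.
core_arcs = {("c1", "c2"), ("c2", "T")}
ppi_edges = [("g1", "c1"), ("g2", "g1"), ("g3", "T")]
scores = {"g2": 0.2, "g1": 0.4, "g3": 0.5, "c1": 0.6, "c2": 0.8}
hits = set(scores)
arcs = core_arcs | {("g1", "c1"), ("g2", "g1"), ("g3", "T")}

violations, a5_nodes = check_network(arcs, ppi_edges, core_arcs, hits, scores, "T")
```

In this toy network every arc passes A1-A4, and `g3` is the only node that fails A5 because it reaches T directly rather than through the core cascade. An optimizer in the spirit of the abstract would prefer candidate networks in which fewer nodes bypass the cascade in this way.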