TRILOGY: Discovery of sequence-structure patterns across diverse proteins
|
|
Philip Bradley, Peter S. Kim, and Bonnie Berger
|
|
|
|
We describe a new computer program, TRILOGY, for the automated
discovery of sequence-structure patterns in proteins. TRILOGY
implements a pattern discovery algorithm that begins with an
exhaustive analysis of flexible three-residue patterns; a subset
of these patterns is selected as seeds for an extension process
in which longer patterns are identified. A key feature of the
method is explicit treatment of both the sequence and structure
components of these motifs: each TRILOGY pattern is a pair
consisting of a sequence pattern and a structure pattern. Matches
to both these component patterns are identified independently,
allowing the program to assign a significance score to each
sequence-structure pattern that assesses the degree of
correlation between the corresponding sequence and structure
motifs. TRILOGY identifies several thousand high-scoring patterns
that occur across protein families. These include both previously
identified and potentially novel motifs. We expect that these
sequence-structure patterns will be useful in predicting protein
structure from sequence, annotating newly determined protein
structures, and identifying novel motifs of potential functional
or structural significance. Further details on 7,768 significant
patterns identified by TRILOGY can be found at
http://theory.lcs.mit.edu/trilogy
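
As an illustration of how independently identified sequence and structure matches could be turned into a correlation score, the sketch below computes a hypergeometric tail probability for the overlap between the two match sets. This is a minimal sketch under stated assumptions, not TRILOGY's scoring procedure: the abstract does not specify the statistic, and the function name `overlap_p_value` and its parameters are hypothetical.

```python
from math import comb


def overlap_p_value(population, seq_matches, struct_matches, both):
    """Illustrative score (assumed, not taken from the abstract).

    population     -- number of candidate sites examined
    seq_matches    -- sites matching the sequence pattern
    struct_matches -- sites matching the structure pattern
    both           -- sites matching both patterns

    Returns P(X >= both), where X is the overlap expected if the
    struct_matches sites were drawn at random from the population.
    """
    upper = min(seq_matches, struct_matches)
    total = comb(population, struct_matches)
    tail = sum(
        comb(seq_matches, k) * comb(population - seq_matches, struct_matches - k)
        for k in range(both, upper + 1)
    )
    return tail / total


# Toy numbers: 10 sites, 3 sequence matches, 3 structure matches,
# and all 3 coincide. A small value suggests the two are correlated.
p = overlap_p_value(10, 3, 3, 3)
```

A small probability indicates that the sequence and structure patterns co-occur more often than chance would predict, which is the kind of correlation the significance score is described as assessing.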
|
|
http://www.pnas.org/cgi/content/full/99/13/8500