Publications of Sebastian Will

Overview 2010

Mathias Möhl, Raheleh Salari, Sebastian Will, Rolf Backofen, and S. Cenk Sahinalp.
Sparsification of RNA structure prediction including pseudoknots. In Proc. of the 10th Workshop on Algorithms in Bioinformatics (WABI), page 12, 2010. Accepted.
[bib]
Alessandro Dal Palu, Mathias Möhl, and Sebastian Will.
A propagator for maximum weight string matching with arbitrary pairwise dependencies. In Proceedings of the 16th International Conference on Principles and Practice of Constraint Programming (CP-2010), page 8, 2010. Accepted.
[bib]
Alessandro Dal Palu, Mathias Möhl, and Sebastian Will.
Alignnment of RNA with structures of unlimited complexity. In Proceedings of the Workshop on Constraint Based Methods for Bioinformatics (WCB 2010), page 7, 2010. Accepted.
[bib]
Cameron Smith, Steffen Heyne, Andreas S. Richter, Sebastian Will, and Rolf Backofen.
Freiburg RNA tools: A web server integrating IntaRNA, ExpaRNA and LocARNA. Nucleic Acids Research, 2010. Accepted.
[pdf] [bib]
Raheleh Salari, Mathias Möhl, Sebastian Will, S. Cenk Sahinalp, and Rolf Backofen.
Time and space efficient RNA-RNA interaction prediction via sparse folding. In Proc. of RECOMB 2010, 2010. Accepted.
[bib]
Mathias Möhl, Sebastian Will, and Rolf Backofen.
Lifting prediction to alignment of RNA pseudoknots. Journal of Computational Biology, 2010. Accepted.
[bib]

Overview 2009

Mathias Möhl, Sebastian Will, and Rolf Backofen.
Lifting prediction to alignment of RNA pseudoknots. In Serafim Batzoglou, editor, Proc.of the 13^th Annual International Conferences on Computational Molecular Biology (RECOMB'09), volume 5541 of LNBI, pages 285--301. Springer, 2009.
[pdf] [Proofs.pdf] [bib]
Jérôme Waldispühl, Charles W. O'Donnell, Sebastian Will, Srinivas Devadas, Rolf Backofen, and Bonnie Berger.
Simultaneous alignment and folding of protein sequences. In Serafim Batzoglou, editor, Proc.of the 13^th Annual International Conferences on Computational Molecular Biology (RECOMB'09), volume 5541 of LNBI, pages 339--355. Springer, 2009.
[pdf] [bib]
Martin Mann, Rolf Backofen, and Sebastian Will.
Equivalence classes of optimal structures in HP protein models including side chains. In Proceedings of the Fifth Workshop on Constraint Based Methods for Bioinformatics (WCB09), 2009.
[pdf] [arXiv] [bib]
Steffen Heyne, Sebastian Will, Michael Beckstette, and Rolf Backofen.
Lightweight comparison of RNAs based on exact sequence-structure matches. Bioinformatics, 25 no. 16 pp. 2095-2102, 2009.
[pdf] [supplementary_data.pdf] [doi] [pubmed] [bib]
Martin Mann, Cameron Smith, Mohamad Rabbath, Marlien Edwards, Sebastian Will, and Rolf Backofen.
CPSP-web-tools: a server for 3D lattice protein studies. Bioinformatics, 25 no. 5 pp. 676-7, 2009.
[pdf] [doi] [pubmed] [bib]
Jérôme Waldispühl, Charles W. O'Donnell, Sebastian Will, Srinivas Devadas, Rolf Backofen, and Bonnie Berger.
Simultaneous alignment and folding of protein sequences. In Serafim Batzoglou, editor, Proc.of the 13^th Annual International Conferences on Computational Molecular Biology (RECOMB'09), volume 5541 of LNBI, pages 339--355. Springer, 2009.
[pdf] [bib]
Mathias Möhl, Sebastian Will, and Rolf Backofen.
Lifting prediction to alignment of RNA pseudoknots. In Serafim Batzoglou, editor, Proc.of the 13^th Annual International Conferences on Computational Molecular Biology (RECOMB'09), volume 5541 of LNBI, pages 285--301. Springer, 2009.
[pdf] [Proofs.pdf] [bib]

Overview 2008

Stephan H. Bernhart, Ivo L. Hofacker, Sebastian Will, Andreas R. Gruber, and Peter F. Stadler.
RNAalifold: improved consensus structure prediction for RNA alignments. BMC Bioinformatics, 9 pp. 474, 2008.
[pdf] [doi] [pubmed] [bib]
Mathias Möhl, Sebastian Will, and Rolf Backofen.
Fixed parameter tractable alignment of RNA structures including arbitrary pseudoknots. In Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM 2008), LNCS, pages 69-81. Springer-Verlag, 2008.
[pdf] [bib]
Sebastian Will, Anke Busch, and Rolf Backofen.
Efficient sequence alignment with side-constraints by cluster tree elimination. Constraints Journal, 13 no. 1 pp. 110-129, 2008.
[pdf] [doi] [bib]
Wolfgang Otto, Sebastian Will, and Rolf Backofen.
Structure local multiple alignment of RNA. In Proceedings of German Conference on Bioinformatics (GCB'2008), volume P-136 of Lecture Notes in Informatics (LNI), pages 178-188. Gesellschaft für Informatik (GI), 2008.
[pdf] [bib]
Steffen Heyne, Sebastian Will, Michael Beckstette, and Rolf Backofen.
Lightweight comparison of RNAs based on exact sequence-structure matches. In Proceedings of the German Conference on Bioinformatics (GCB'2008), volume P-136 of Lecture Notes in Informatics (LNI), pages 189-198. Gesellschaft für Informatik (GI), 2008.
[pdf] [bib]
Martin Mann, Sebastian Will, and Rolf Backofen.
CPSP-tools - exact and complete algorithms for high-throughput 3D lattice protein studies. BMC Bioinformatics, 9 pp. 230, 2008.
[pdf] [doi] [bib]
A. F. Bompfunewerer, R. Backofen, S. H. Bernhart, J. Hertel, I. L. Hofacker, P. F. Stadler, and S. Will.
Variations on RNA folding and alignment: lessons from Benasque. Journal of Mathematical Biology, 56 no. 1-2 pp. 129-144, 2008.
[pdf] [doi] [pubmed] [bib]

Overview 2007

Sebastian Will, Kristin Reiche, Ivo L. Hofacker, Peter F. Stadler, and Rolf Backofen.
Inferring non-coding RNA families and classes by means of genome-scale structure-based clustering. PLOS Computational Biology, 3 no. 4 pp. e65, 2007.
[pdf] [doi] [pubmed] [bib]
Athanasius F. Bompfünewerer Consortium, Rolf Backofen, Stephan H. Bernhart, Christoph Flamm, Claudia Fried, Guido Fritzsch, Jorg Hackermuller, Jana Hertel, Ivo L. Hofacker, Kristin Missal, Axel Mosig, Sonja J. Prohaska, Dominic Rose, Peter F. Stadler, Andrea Tanzer, Stefan Washietl, and Sebastian Will.
RNAs everywhere: genome-wide annotation of structured RNAs. J Exp Zoolog B Mol Dev Evol, 308 no. 1 pp. 1-25, 2007.
[pdf] [doi] [pubmed] [bib]
Alessandro Dal Palù, Agostino Dovier, François Fages, and Sebastian Will.
Constraint-based methods for bioinformatics. In Frédéric Benhamou, Narendra Jussien, and Barry O'Sullivan, editors, Trends in Constraint Programming, chapter 6, pages 129--150. ISTE, London, UK, May 2007.
[bib]
Martin Mann, Sebastian Will, and Rolf Backofen.
The energy landscape library - a platform for generic algorithms. In BIRD'07 - 1st international Conference on Bioinformatics Research and Development, volume 217, pages 83-86. Oesterreichische Computer Gesellschaft, 2007.
[pdf] [bib]

Overview 2006

Rolf Backofen and Sebastian Will.
A constraint-based approach to fast and exact structure prediction in three-dimensional protein models. Journal of Constraints, 11 no. 1 pp. 5--30, January 2006.
[pdf] [ps.gz] [doi] [bib]
Michael T. Wolfinger, Sebastian Will, Ivo L. Hofacker, Rolf Backofen, and Peter F. Stadler.
Exploring the lower part of discrete polymer model energy landscapes. Europhysics Letters, 74 no. 4 pp. 725-732, 2006.
[pdf] [ps.gz] [doi] [bib]
Sebastian Will and Martin Mann.
Counting protein structures by dfs with dynamic decomposition. In Proc. of the Workshop on Constraint Based Methods for Bioinformatics. http://www.dimi.uniud.it/dovier/WCB06/WCB06_proceedings.pdf, pages 83-90, 2006.
[pdf] [ps.gz] [bib]

Overview 2005

Sebastian Will, Anke Busch, and Rolf Backofen.
Efficient constraint-based sequence alignment by cluster tree elimination. In Proceedings of the Workshop on Constraint Based Methods in Bioinformatics (WCB05), pages 66-74, 2005.
[ps.gz] [bib]
Anke Busch, Sebastian Will, and Rolf Backofen.
SECISDesign: a server to design SECIS-elements within the coding sequence. Bioinformatics, 21 no. 15 pp. 3312-3, 2005.
[pdf] [pubmed] [bib]

Overview 2004

Rolf Backofen and Sebastian Will.
Local sequence-structure motifs in RNA. Journal of Bioinformatics and Computational Biology (JBCB), 2 no. 4 pp. 681-698, 2004.
[pdf] [ps.gz] [pubmed] [bib]
Alessandro Dal Palù, Sebastian Will, Rolf Backofen, and Agostino Dovier.
Constraint based protein structure prediction exploiting secondary structure information. In Convegno Italiano di Logica Computazionale 2004 (CILC 2004), pages 16-17, 2004.
[pdf] [bib]

Overview 2003

Sebastian Will and Rolf Backofen.
Breaking of partial symmetries in the photo and alignment problem. In Barbara Smith, Ian Gent, and Warwick Harvey, editors, Proceedings of the Third International Workshop on Symmetry in Constraint Satisfaction Problems (SymCon 2003), pages 187--194, 2003.
[ps.gz] [bib]
Rolf Backofen and Sebastian Will.
A constraint-based approach to structure prediction for simplified protein models that outperforms other existing methods. In Proceedings of the 19th International Conference on Logic Programming (ICLP 2003), volume 2916 of Lecture Notes in Computer Science, pages 49--71. Springer, 2003.
[ps.gz] [bib]

Overview 2002

Rolf Backofen and Sebastian Will.
Excluding symmetries in constraint-based search. Constraints, 7 no. 3 pp. 333-349, 2002.
[pdf] [ps.gz] [bib]
Sebastian Will.
Constraint-based hydrophobic core construction for protein structure prediction in the face-centered-cubic lattice. In Russ B. Altman, A. Keith Dunker, Lawrence Hunter, and Teri E. Klein, editors, Proceedings of the Pacific Symposium on Biocomputing 2002 (PSB 2002), pages 661--672, Singapore, 2002. World Scientific Publishing Co. Pte. Ltd.
[ps.gz] [pubmed] [bib]

Overview 2001

Rolf Backofen and Sebastian Will.
Optimally compact finite sphere packings --- hydrophobic cores in the FCC. In Proc. of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM2001), volume 2089 of Lecture Notes in Computer Science, pages 257--272, Berlin, 2001. Springer--Verlag.
[pdf] [ps.gz] [bib]
Rolf Backofen and Sebastian Will.
Fast, constraint-based threading of HP-sequences to hydrophobic cores. In Proc. of the 7th International Conference on Principle and Practice of Constraint Programming (CP'2001), volume 2239 of Lecture Notes in Computer Science, pages 494--508, Berlin, 2001. Springer--Verlag.
[pdf] [ps.gz] [bib]

Overview 2000

Rolf Backofen, Sebastian Will, and Peter Clote.
Algorithmic approach to quantifying the hydrophobic force contribution in protein folding. In Russ B. Altman, A. Keith Dunker, Lawrence Hunter, and Teri E. Klein, editors, Proceedings of the Pacific Symposium on Biocomputing (PSB 2000), volume 5, pages 92-103, 2000.
[ps.gz] [bib]
Slim Abdennadher, Matthias Saft, and Sebastian Will.
Classroom assignment using constraint logic programming. In Proceedings of the Second International Conference and Exhibition on The Practical Application of Constraint Technologies and Logic Programming (PACLP 2000), 2000.
[ps.gz] [bib]

Overview 1999

Rolf Backofen, Sebastian Will, and Erich Bornberg-Bauer.
Application of constraint programming techniques for structure prediction of lattice proteins with extended alphabets. Bioinformatics, 15 no. 3 pp. 234-242, 1999.
[ps.gz] [bib]
Rolf Backofen and Sebastian Will.
Excluding symmetries in constraint-based search. In Joxan Jaffar, editor, Proceedings of 5^{\it th} International Conference on Principle and Practice of Constraint Programming (CP'99), volume 1713 of Lecture Notes in Computer Science, pages 73-87, Berlin, 1999. Springer--Verlag.
[ps.gz] [bib]
Rolf Backofen, Sebastian Will, and Peter Clote.
Algorithmic approach to quantifying the hydrophobic force contribution in protein folding. In Proceedings of the German Conference on Bioinformatics (GCB'99), pages 93-106, 1999. http://www.bioinfo.de/isb/gcb99.
[ps.gz] [bib]

Overview 1998

Rolf Backofen and Sebastian Will.
Structure prediction in an HP-type lattice with an extended alphabet. In Proc of German Conference on Bioinformatics (GCB'98), 1998.
[ps.gz] [bib]
Rolf Backofen and Sebastian Will.
Excluding symmetries in concurrent constraint programming. In Workshop on Modeling and Computing with Concurrent Constraint Programming, 1998.
[ps.gz] [bib]

Sparsification of RNA Structure Prediction Including Pseudoknots

Mathias Möhl, Raheleh Salari, Sebastian Will, Rolf Backofen, S. Cenk Sahinalp

In: Proc. of the 10th Workshop on Algorithms in Bioinformatics (WABI), 2010, 12

Although many RNA molecules contain pseudoknots, computational prediction of pseudoknotted RNA structure is still in its infancy due to high running time and space consumption implied by the dynamic programming formulations of the problem. In this paper, we introduce sparsification to significantly speedup the dynamic programming approaches for pseudoknotted RNA structure prediction, which also lower the space requirements. Although sparsification has been applied to a number of RNA-related structure prediction problems in the past few years, we provide the first application of sparsification to pseudoknotted RNA structure prediction specifically and to handling gapped fragments more generally - which has a much more complex recursive structure than other problems to which sparsification has been applied. We show that sparsification, when applied to the fastest, as well as the most general pseudoknotted structure prediction methods available, - respectively the Reeder-Giegerich algorithm and the Rivas-Eddy algorithm - reduces the number of "candidate" substructures to be considered significantly. In fact, experimental results on the sparsified Reeder-Giegerich algorithm suggest a linear speedup over the unsparsified implementation.

Publication note:: Accepted
Available:: BibTeX Entry ( moehl_wabi10:Sparsification )

A Propagator for Maximum Weight String Matching with Arbitrary Pairwise Dependencies

Alessandro Dal Palu, Mathias Möhl, Sebastian Will

In: Proceedings of the 16th International Conference on Principles and Practice of Constraint Programming (CP-2010), 2010, 8

The optimization of weighted string matchings is a well studied problem recurring in a number of application domains and can be solved efficiently. The problem becomes MAX-SNP-hard as soon as arbitrary pairwise dependencies among the matching edges are introduced. We present a global propagator for this problem which is based on efficiently solving a relaxation of it. In the context of bioinformatics, the problem is known as alignment of arc-annotated sequences, which is e.g. used for comparing RNA molecules. For a restricted version of this alignment problem, we show that a constraint program based on our propagator is on par with state of the art methods. For the general problem with unrestricted dependencies, our tool constitutes the first available method with promising applications in this field.

Publication note:: Accepted
Available:: BibTeX Entry ( palu_moehl_will_cp2010 )

Alignnment of RNA with Structures of Unlimited Complexity

Alessandro Dal Palu, Mathias Möhl, Sebastian Will

In: Proceedings of the Workshop on Constraint Based Methods for Bioinformatics (WCB 2010), 2010, 7

Sequence-structure alignment of RNA with arbitrary secondary structure is Max-SNP-hard. Therefore, the problem of RNA alignment is commonly restricted to nested structure, where dynamic programming yields efficient solutions. However, nested structure cannot model pseudoknots or even more complex structural dependencies. Nevertheless those dependencies are essential and conserved features of many RNAs. Only a few existing approaches deal with crossing structures. Here, we present a constraint approach for alignment of structures in the even more general class of unlimited structures. Our central contribution is a new RNA alignment constraint propagator. It is based on an efficient O(n²) relaxation of the RNA alignment problem. Our constraint-based approach Carna solves the alignment problem for sequences with given input structures of unlimited complexity. Carna is implemented using Gecode.

Publication note:: Accepted
Available:: BibTeX Entry ( palu_moehl_will_wcb2010 )

Freiburg RNA Tools: A Web Server Integrating IntaRNA, ExpaRNA and LocARNA

Cameron Smith, Steffen Heyne, Andreas S. Richter, Sebastian Will, Rolf Backofen

In: Nucleic Acids Research, 2010

No Abstract available

Publication note:: Accepted
Available:: pdf (471 KB) BibTeX Entry ( Smith:Heyne:Richter:Freib_RNA_tools:NAR2010 )

Time and space efficient RNA-RNA interaction prediction via sparse folding

Raheleh Salari, Mathias Möhl, Sebastian Will, S. Cenk Sahinalp, Rolf Backofen

In: Proc. of RECOMB 2010, 2010

In the past few years, a large set of new regulatory ncRNAs have been identifi ed, but the number of experimentally verified targets is considerably low. Thus, computational target prediction methods are on high demand. All previous approaches for predicting a general joint structure have a complexity of O(n⁶) running time and O(n⁴) space. However, a more time and space efficient interaction prediction that is able to handle complex joint structures is necessary for genome-wide target prediction problems. In this paper we show how to reduce both the time and space complexity of the RNA-RNA interaction prediction problem as described by Alkan et al. (JCB, 2006) by a linear factor via dynamic programming sparsification - which allows to discard large portions of DP tables without loosing optimality. Applying sparsification techniques reduces the complexity of the original algorithm to O(n⁴ψ(n)) in time and O(n²ψ(n)+n³) in space for some function ψ(n), which turns out to have small values for the range of n that we encounter in practice. By the use of polymer-zeta pro perty for RNA-structures, we demonstrate that ψ(n) = O(n) on average. We evaluate our sparsified algorithm for RNA-RNA interaction prediction throug h total free energy minimization, based on the energy model of Chitsaz et al. (2009), on a set of known interactions. Our results confirm the significant reduction of time and space requirements in practice.

Publication note:: Accepted
Available:: BibTeX Entry ( Salari:etal:_time_and_space_effic_rna:RECOMB2010 )

Lifting Prediction to Alignment of RNA Pseudoknots

Mathias Möhl, Sebastian Will, Rolf Backofen

In: Journal of Computational Biology, 2010

Many computational problems concerning RNA pseudoknot structures, most prominently their prediction and alignment, are NP-hard. For structure prediction, several algorithms have been proposed that are restricted to certain classes of pseudoknots in order to run efficiently. We present a general scheme that yields an efficient alignment algorithm for arbitrary classes of pseudoknots that can be predicted efficiently by dynamic programming. Moreover, we show that such an alignment algorithm benefits from restrictions to certain structure classes in the same way as structure prediction algorithms do. We look at five of these classes in greater detail. Compared to the respective structure prediction algorithm, the time and space complexity of the obtained alignment algorithm is increased by only a linear factor. For four of the classes, there do not exist alignment algorithms so far. For the fifth, most general class, our algorithm improves the time complexity compared to the best algorithm known so far, from O(n⁵m⁵) time to O(nm⁶), where n and m are the length of the two aligned sequences. Finally, we apply the fastest of the generated algorithms with complexity O(nm⁴) time and O(nm²) space to comparative de-novo prediction of pseudoknots.

Publication note:: Accepted
Available:: BibTeX Entry ( Moehl:Will:Backofen:PKalign:JCB2010 )

Lifting Prediction to Alignment of RNA Pseudoknots

Mathias Möhl, Sebastian Will, Rolf Backofen

In: Serafim Batzoglou, Proc.of the 13^th Annual International Conferences on Computational Molecular Biology (RECOMB'09), LNBI, 2009, 5541, 285--301

Available:: pdf (290 KB) Proofs.pdf (78 KB) BibTeX Entry ( Moehl:Will:Backofen:PKalign:RECOMB2009 )

Simultaneous Alignment and Folding of Protein Sequences

Jérôme Waldispühl, Charles W. O'Donnell, Sebastian Will, Srinivas Devadas, Rolf Backofen, Bonnie Berger

In: Serafim Batzoglou, Proc.of the 13^th Annual International Conferences on Computational Molecular Biology (RECOMB'09), LNBI, 2009, 5541, 339--355

One of the central challenges in computational biology is to develop accurate tools for protein structure analysis. Particularly difficult cases of this are sequence alignment and consensus folding of low-homology proteins. In this work, we present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm's complexity is polynomial in time and space. Algorithmically, partiFold-Align additionally exploits sparsity in the set of likely super-secondary structure pairings and alignment candidates for each amino acid to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane beta-barrel proteins, an important yet difficult class of proteins with very few available three-dimensional structures. In tests on sequence alignments derived from structure alignments, partiFold-Align is significantly more accurate than current best approaches for pairwise sequence alignment in the difficult case of low sequence homology and improves secondary structure prediction when current approaches fail. Importantly, partiFold-Align does not require training on transmembrane beta-barrel proteins. The generality of these techniques should allow them to be applied to a wide variety of protein structures.

Available:: pdf (337 KB) BibTeX Entry ( Waldispuehl:etal:TMBalign:RECOMB09 )

Equivalence Classes of Optimal Structures in HP Protein Models Including Side Chains

Martin Mann, Rolf Backofen, Sebastian Will

In: Proceedings of the Fifth Workshop on Constraint Based Methods for Bioinformatics (WCB09), 2009

Lattice protein models, as the Hydrophobic-Polar (HP) model, are a common abstraction to enable exhaustive studies on structure, function, or evolution of proteins. A main issue is the high number of optimal structures, resulting from the hydrophobicity-based energy function applied. We introduce an equivalence relation on protein structures that correlates to the energy function. We discuss the efficient enumeration of optimal representatives of the corresponding equivalence classes and the application of the results.

Available:: pdf (273 KB) arXiv:0910.3848 BibTeX Entry ( Mann:Backofen:Will:_equivalence_classes:WCB09 )

Lightweight Comparison of RNAs Based on Exact Sequence-Structure Matches

Steffen Heyne, Sebastian Will, Michael Beckstette, Rolf Backofen

In: Bioinformatics, 2009, 25(16), 2095-2102

MOTIVATION: Specific functions of ribonucleic acid (RNA) molecules are often associated with different motifs in the RNA structure. The key feature that forms such an RNA motif is the combination of sequence and structure properties. In this article, we introduce a new RNA sequence-structure comparison method which maintains exact matching substructures. Existing common substructures are treated as whole unit while variability is allowed between such structural motifs. Based on a fast detectable set of overlapping and crossing substructure matches for two nested RNA secondary structures, our method ExpaRNA (exact pattern of alignment of RNA) computes the longest collinear sequence of substructures common to two RNAs in O(H*nm) time and O(nm) space, where H << n.m for real RNA structures. Applied to different RNAs, our method correctly identifies sequence-structure similarities between two RNAs. RESULTS: We have compared ExpaRNA with two other alignment methods that work with given RNA structures, namely RNAforester and RNA_align. The results are in good agreement, but can be obtained in a fraction of running time, in particular for larger RNAs. We have also used ExpaRNA to speed up state-of-the-art Sankoff-style alignment tools like LocARNA, and observe a tradeoff between quality and speed. However, we get a speedup of 4.25 even in the highest quality setting, where the quality of the produced alignment is comparable to that of LocARNA alone. AVAILABILITY: The presented algorithm is implemented in the program ExpaRNA, which is available from our website (http://www.bioinf.uni-freiburg.de/Software).

Available:: pdf (425 KB) supplementary_data.pdf (632 KB) doi:10.1093/bioinformatics/btp065 pmid:19189979 BibTeX Entry ( Heyne:Will:Beckstette:Backofen:BI_ExpaRNA_2009 )

CPSP-web-tools: a server for 3D lattice protein studies

Martin Mann, Cameron Smith, Mohamad Rabbath, Marlien Edwards, Sebastian Will, Rolf Backofen

In: Bioinformatics, 2009, 25(5), 676-7

Studies on proteins are often restricted to highly simplified models to face the immense computational complexity of the associated problems. Constraint-based protein structure prediction (CPSP) tools is a package of very fast algorithms for ab initio optimal structure prediction and related problems in 3D HP-models [cubic and face centered cubic (FCC)]. Here, we present CPSP-web-tools, an interactive online interface of these programs for their immediate use. They include the first method for the direct prediction of optimal energies and structures in 3D HP side-chain models. This newest extension of the CPSP approach is described here for the first time. AVAILABILITY AND IMPLEMENTATION: Free access at http://cpsp.informatik.uni-freiburg.de

Available:: pdf (83 KB) doi:10.1093/bioinformatics/btp034 pmid:19151096 BibTeX Entry ( Mann_CPSPweb_2009 )

Simultaneous Alignment and Folding of Protein Sequences

Jérôme Waldispühl, Charles W. O'Donnell, Sebastian Will, Srinivas Devadas, Rolf Backofen, Bonnie Berger

In: Serafim Batzoglou, Proc.of the 13^th Annual International Conferences on Computational Molecular Biology (RECOMB'09), LNBI, 2009, 5541, 339--355

Available:: pdf (337 KB) BibTeX Entry ( Waldispuehl:etal:TMBalign:RECOMB09 )

Lifting Prediction to Alignment of RNA Pseudoknots

Mathias Möhl, Sebastian Will, Rolf Backofen

In: Serafim Batzoglou, Proc.of the 13^th Annual International Conferences on Computational Molecular Biology (RECOMB'09), LNBI, 2009, 5541, 285--301

Available:: pdf (290 KB) Proofs.pdf (78 KB) BibTeX Entry ( Moehl:Will:Backofen:PKalign:RECOMB2009 )

RNAalifold: improved consensus structure prediction for RNA alignments

Stephan H. Bernhart, Ivo L. Hofacker, Sebastian Will, Andreas R. Gruber, Peter F. Stadler

In: BMC Bioinformatics, 2008, 9, 474

Background: The prediction of a consensus structure for a set of related RNAs is an important first step for subsequent analyses. RNAalifold, which computes the minimum energy structure that is simultaneously formed by a set of aligned sequences, is one of the oldest and most widely used tools for this task. In recent years, several alternative approaches have been advocated, pointing to several shortcomings of the original RNAalifold approach. Results: We show that the accuracy of RNAalifold predictions can be improved substantially by introducing a different, more rational handling of alignment gaps, and by replacing the rather simplistic model of covariance scoring with more sophisticated RIBOSUM-like scoring matrices. These improvements are achieved without compromising the computational efficiency of the algorithm. We show here that the new version of RNAalifold not only outperforms the old one, but also several other tools recently developed, on different datasets. Conclusions: The new version of RNAalifold not only can replace the old one for almost any application but it is also competitive with other approaches including those based on SCFGs, maximum expected accuracy, or hierarchical nearest neighbor classifiers.

Available:: pdf (398 KB) doi:10.1186/1471-2105-9-474 pmid:19014431 BibTeX Entry ( Bernhart:RNAalifold:BMC2008 )

Fixed Parameter Tractable Alignment of RNA Structures Including Arbitrary Pseudoknots

Mathias Möhl, Sebastian Will, Rolf Backofen

In: Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM 2008), LNCS, 2008, 69-81

We present an algorithm that computes the edit distance of two RNA structures with arbitrary kinds of pseudoknots. A main benefit of the algorithm is that, despite the problem is NP-hard, the algorithmic complexity adapts to the complexity of the RNA structures. Due to fixed parameter tractability, we can guarantee polynomial run-time for a parameter which is small in practice. Our algorithm can be considered as a generalization of the algorithm of Jiang ηl [Jiang2002] to arbitrary pseudoknots. In their absence, it gracefully degrades to the same polynomial algorithm. A prototypical implementation demonstrates the applicability of the method.

Available:: pdf (257 KB) BibTeX Entry ( Moehl:Will:Backofen:CPM2008 )

Efficient Sequence Alignment with Side-Constraints by Cluster Tree Elimination

Sebastian Will, Anke Busch, Rolf Backofen

In: Constraints Journal, 2008, 13(1), 110-129

Aligning DNA and protein sequences is a core technique in molecular biology. Often, it is desirable to include partial prior knowledge and conditions in an alignment. Going beyond prior work, we aim at the integration of such side constraints in free combination into alignment algorithms. The most common and successful technique for efficient alignment algorithms is dynamic programming (DP). However, a weakness of DP is that one cannot include additional constraints without specifically tailoring a new DP algorithm. Here, we discuss a declarative approach that is based on constraint techniques and show how it can be extended by formulating additional knowledge as constraints. We take special care to obtain the efficiency of DP for sequence alignment. This is achieved by careful modeling and applying proper solving strategies. Finally, we apply our method to the scanning for RNA motifs in large sequences. This case study demonstrates how the new approach can be used in real biological problems. A prototypic implementation of the method is available at http://www.bioinf.uni-freiburg.de/Software/CTE-Alignment.

Available:: pdf (397 KB) doi:10.1007/s10601-007-9032-x BibTeX Entry ( Will:Busch:Backofen:_effic_sequen_align_side_const:Constraints2008 )

Structure Local Multiple Alignment of RNA

Wolfgang Otto, Sebastian Will, Rolf Backofen

In: Proceedings of German Conference on Bioinformatics (GCB'2008), Lecture Notes in Informatics (LNI), 2008, P-136, 178-188

Today, RNA is well known to perform important regulatory and catalytical function due to its distinguished structure. Consequentially, structure is conserved in evolution and state-of-the art RNA multiple alignment algorithms consider structure as well as sequence information. However, existing tools neglect the important aspect of locality. Notably, locality in RNA occurs as similarity of subsequences as well as similarity of only substructures.

We present a novel approach for multiple alignment of RNAs that deals with both kinds of locality. The approach extends LocARNA by structural locality and delegates the construction of multiple alignments to T-Coffee.

The paper systematically investigates structural locality in known RNA families. Benchmarking multiple alignment tools on structurally local families shows the need for algorithmic support of this locality. The improvement in accuracy in special cases is achieved while staying competitive with state-of-the-art alignment tools across the whole Bralibase.

LocARNA and its T-Coffee extended variant LocARNATE are freely available at http://www.bioinf.uni-freiburg.de/Software/LocARNA/.

Available:: pdf (164 KB) BibTeX Entry ( Otto:Will:Backofen:_struc_local_multip_align_of_rna:CGB08 )

Lightweight comparison of RNAs based on exact sequence-structure matches

Steffen Heyne, Sebastian Will, Michael Beckstette, Rolf Backofen

In: Proceedings of the German Conference on Bioinformatics (GCB'2008), Lecture Notes in Informatics (LNI), 2008, P-136, 189-198

Specific functions of RNA molecules are often associated to different motifs in the RNA structure. The key feature is that the combination of sequence and structure properties form such an RNA motif. In this paper we introduce a new RNA sequence-structure comparison method which maintains exact matching substructures. Existing common substructures are treated as whole unit while variability is allowed between such structural motifs. Based on a fast detectable set of overlapping and crossing substructure matches for two nested RNA secondary structures, our method computes the longest colinear sequence of substructures common to two RNAs in O(n^2m^2) time and O(nm) space. Applied to different RNAs, our method correctly identifies sequence-structure similarities between two RNAs. The results of our experiments are in good agreement with existing alignment-based methods, but can be obtained in a fraction of running time, in particular for larger RNAs. The proposed algorithm is implemented in the program expaRNA, which is available from our website (www.bioinf.uni-freiburg.de/Software).

Available:: pdf (1263 KB) BibTeX Entry ( Heyne:Will:Beckstette:Backofen:GCB08 )

CPSP-tools - Exact and Complete Algorithms for High-throughput 3D Lattice Protein Studies

Martin Mann, Sebastian Will, Rolf Backofen

In: BMC Bioinformatics, 2008, 9, 230

Background: The principles of protein folding and evolution pose problems of very high inherent complexity. Often these problems are tackled using simplified protein models, e.g. lattice proteins. The CPSP-tools package provides programs to solve exactly and completely the problems typical of studies using 3D lattice protein models. Among the tasks addressed are the prediction of (all) globally optimal and/or suboptimal structures as well as sequence design and neutral network exploration. Results: In contrast to stochastic approaches, which are not capable of answering many fundamental questions, our methods are based on fast, non-heuristic techniques. The resulting tools are designed for high-throughput studies of 3D-lattice proteins utilizing the Hydrophobic-Polar (HP) model. The source bundle is freely available at http://www.bioinf.uni-freiburg.de/sw/cpsp/ Conclusions: The CPSP-tools package is the first set of exact and complete methods for extensive, high-throughput studies of non-restricted 3D-lattice protein models. In particular, our package deals with cubic and face centered cubic (FCC) lattices.

Available:: pdf (438 KB) doi:10.1186/1471-2105-9-230 BibTeX Entry ( Mann:Will:Backofen:CPSP-tools:BMCB:2008 )

Variations on RNA folding and alignment: lessons from Benasque

A. F. Bompfunewerer, R. Backofen, S. H. Bernhart, J. Hertel, I. L. Hofacker, P. F. Stadler, S. Will

In: Journal of Mathematical Biology, 2008, 56(1-2), 129-144

Dynamic programming algorithms solve many standard problems of RNA bioinformatics in polynomial time. In this contribution we discuss a series of variations on these standard methods that implement refined biophysical models, such as a restriction of RNA folding to canonical structures, and an extension of structural alignments to an explicit scoring of stacking propensities. Furthermore, we demonstrate that a local structural alignment can be employed for ncRNA gene finding. In this context we discuss scanning variants for folding and alignment algorithms.

Available:: pdf (188 KB) doi:10.1007/s00285-007-0107-5 pmid:17611759 BibTeX Entry ( Bompfuenewerer:_variat_rna_foldin_align:JMathB07 )

Inferring Non-Coding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering

Sebastian Will, Kristin Reiche, Ivo L. Hofacker, Peter F. Stadler, Rolf Backofen

In: PLOS Computational Biology, 2007, 3(4), e65

The RFAM database defines families of ncRNAs by means of sequence similarities that are sufficientto establish homology. In some cases, such as microRNAs, box H/ACA snoRNAs, functional commonalities define classes of RNAs that are characterized by structural similarities, and typically consist ofmultiple RNA families. Recent advances in high-throughput transcriptomics and comparative genomics have produced very large sets of putative non-coding RNAs and regulatory RNA signals. For many ofthem, evidence for stabilizing selection acting on their secondary structures has been derived, and at least approximate models of their structures have been computed. The overwhelming majority of these hypo-thetical RNAs cannot be assigned to established families or classes. We present here a structure-based clustering approach that is capable of extracting putative RNA classesfrom genome-wide surveys for structured RNAs. The LocARNA tool implements a novel variant of theSankoff algorithm that is sufficiently fast to deal with several thousand candidate sequences. The method is also robust against false positive predictions, i.e., a contamination of the input data with unstructured ornon-conserved sequences. We have successfully tested the LocARNA-based clustering approach on the sequences of the RFAM-seedalignments. Furthermore, we have applied it to a previously published set of 3332 predicted structured elements in the Ciona intestinalis genomes (Missal et al., Bioinformatics 21(S2), i77-i78). In addition torecovering e.g. tRNAs as a structure-based class, the method identifies several RNA families, including microRNA and snoRNA candidates, and suggests several novel classes of ncRNAs for which to-date norepresentative has been experimentally characterized.

Available:: pdf (4220 KB) doi:10.1371/journal.pcbi.0030065 pmid:17432929 BibTeX Entry ( Will:etal:_infer_non_codin_rna_famil:PLOS2007 )

RNAs everywhere: genome-wide annotation of structured RNAs

Athanasius F. Bompfünewerer Consortium, Rolf Backofen, Stephan H. Bernhart, Christoph Flamm, Claudia Fried, Guido Fritzsch, Jorg Hackermuller, Jana Hertel, Ivo L. Hofacker, Kristin Missal, Axel Mosig, Sonja J. Prohaska, Dominic Rose, Peter F. Stadler, Andrea Tanzer, Stefan Washietl, Sebastian Will

In: J Exp Zoolog B Mol Dev Evol, 2007, 308(1), 1-25

Starting with the discovery of microRNAs and the advent of genome-wide transcriptomics, non-protein-coding transcripts have moved from a fringe topic to a central field research in molecular biology. In this contribution we review the state of the art of "computational RNomics&,uml; i.e., the bioinformatics approaches to genome-wide RNA annotation. Instead of rehashing results from recently published surveys in detail, we focus here on the open problem in the field, namely (functional) annotation of the plethora of putative RNAs. A series of exploratory studies are used to provide non-trivial examples for the discussion of some of the difficulties.

Available:: pdf (1210 KB) doi:10.1002/jez.b.21130 pmid:17171697 BibTeX Entry ( BompfuenewererConsortium:RNAs_every_genom:2007 )

Constraint-based Methods for Bioinformatics

Alessandro Dal Palù, Agostino Dovier, François Fages, Sebastian Will

In: Frédéric Benhamou, Narendra Jussien, Barry O'Sullivan, Trends in Constraint Programming, May 2007, 129--150

No Abstract available

Available:: BibTeX Entry ( wscp:chap6:bioinfo )

The Energy Landscape Library - A Platform for Generic Algorithms

Martin Mann, Sebastian Will, Rolf Backofen

In: BIRD'07 - 1st international Conference on Bioinformatics Research and Development, 2007, 217, 83-86

The study of energy landscapes of biopolymers and their models is an important field in bioinformatics. For instance the investigation of kinetics or folding simulations are done using methods that are based on sampling or exhaustive enumeration. Most of such algorithms are independent of the underlying landscape model. Therefore frameworks for generic algorithms to investigate the landscape properties is needed. Here, we present the Energy Landscape Library (ELL) that allows such a model-independent formulation of generic algorithms dealing with discrete states. The ELL is a completely object-oriented C++ library that is highly modular, easy to extend, and freely available online. It can be used for a fast and easy implementation of new generic algorithms (possibly based on the provided basic method pool) or as a framework to test their properties for different landscape models, which can be formulated straightforward.

Available:: pdf (85 KB) BibTeX Entry ( Mann_ELL_BIRD07 )

A Constraint-Based Approach to Fast and Exact Structure Prediction in Three-Dimensional Protein Models

Rolf Backofen, Sebastian Will

In: Journal of Constraints, January 2006, 11(1), 5--30

Simplified protein models are used for investigating general properties of proteins and principles of protein folding. Furthermore, they are suited for hierarchical approaches to protein structure prediction. A well known protein model is the HPmodel of Lau and Dill [33], which models the important aspect of hydrophobicity. One can define the HP-model for various lattices, among them two-dimensional and three-dimensional ones. Here, we investigate the three-dimensional case. The main motivation for studying simplified protein models is to be able to predict model structures much more quickly and more accurately than is possible for real proteins. However, up to now there was a dilemma: the algorithmically tractable, simple protein models can not model real protein structures with good quality and introduce strong artifacts.

We present a constraint-based method that largely improves this situation. It outperforms all existing approaches for lattice protein folding in HP-models. This approach is the first one that can be applied to two three-dimensional lattices, namely the cubic lattice and the face-centered-cubic (FCC ) lattice. Moreover, it is the only exact method for the FCC lattice. The ability to use the FCC lattice is a significant improvement over the cubic lattice. The key to our approach is the ability to compute maximally compact sets of points (used as hydrophobic cores), which we accomplish for the first time for the FCC lattice.

Keywords: protein structure prediction, HP-model, face-centered cubic lattice, constraint programming

Available:: pdf (517 KB) ps.gz (382 KB) doi:10.1007/s10601-006-6848-8 BibTeX Entry ( Backofen:Will:Constraints2006 )

Exploring the lower part of discrete polymer model energy landscapes

Michael T. Wolfinger, Sebastian Will, Ivo L. Hofacker, Rolf Backofen, Peter F. Stadler

In: Europhysics Letters, 2006, 74(4), 725-732

We present a generic, problem independent algorithm for exploration of the lowenergy portion of the energy landscape of discrete systems and apply it to the energy landscape of lattice proteins. Starting from a set of optimal and near-optimal conformations derived from a constraint-based search technique, we are able to selectively investigate the lower part of lattice protein energy landscapes in two and three dimensions. This novel approach allows, in contrast to exhaustive enumeration, for an eOEcient calculation of optimal and near-optimal structures below a given energy threshold and is only limited by the available amount of memory. A straightforward application of the algorithm is calculation of barrier trees (representing the energy landscape), which then allows dynamics studies based on landscape theory.

Available:: pdf (233 KB) ps.gz (143 KB) doi:10.1209/epl/i2005-10577-0 BibTeX Entry ( Wolfinger:etal:_explor:EPL2006 )

Counting Protein Structures by DFS with Dynamic Decomposition

Sebastian Will, Martin Mann

In: Proc. of the Workshop on Constraint Based Methods for Bioinformatics. http://www.dimi.uniud.it/dovier/WCB06/WCB06_proceedings.pdf, 2006, 83-90

We introduce depth-first search with dynamic decomposition for counting the solutions of a binary CSP completely. In particular, we use the method for computing the number of minimal energy structures for model proteins.

Available:: pdf (208 KB) ps.gz (138 KB) BibTeX Entry ( Will:Mann:WCB2006 )

Efficient Constraint-based Sequence Alignment by Cluster Tree Elimination

Sebastian Will, Anke Busch, Rolf Backofen

In: Proceedings of the Workshop on Constraint Based Methods in Bioinformatics (WCB05), 2005, 66-74

Aligning DNA and protein sequences has become a standard mehtod in molecular biology. Often, it is desirable to include partial prior knowledge and conditions in an alignment. The most common and successful technique for efficient alignment algorithms is dynamic programming (DP). However, a weakness of DP is that one cannot include additional constraints without specifically tailoring a new DP algorithm. Here, we discuss a declarative approach that is based on constraint techniques and show how it can be extended by formulating additional knowledge as constraints. We take special care to obtain the efficiency of DP for sequence alignment. This is achieved by careful modeling and applying proper solving strategies.

Available:: ps.gz (405 KB) BibTeX Entry ( Will:Busch:Backofen:_effic_const_sequen_align_clust_tree_elimin:WCB05 )

SECISDesign: a server to design SECIS-elements within the coding sequence

Anke Busch, Sebastian Will, Rolf Backofen

In: Bioinformatics, 2005, 21(15), 3312-3

SUMMARY: SECISDesign is a server for the design of SECIS-elements and arbitrary RNA-elements within the coding sequence of an mRNA. The element has to satisfy both structure and sequence constraints. At the same time, a certain amino acid similarity to the original protein has to be kept. The designed sequence can be used for recombinant expression of selenoproteins in Escherichia coli. AVAILABILITY: The server is available at http://www.bio.inf.uni-jena.de/Software/SECISDesign/index.html.

Available:: pdf (53 KB) pmid:15919727 BibTeX Entry ( Busch:Will:Backofen:Bioinformatics2005 )

Local Sequence-Structure Motifs in RNA

Rolf Backofen, Sebastian Will

In: Journal of Bioinformatics and Computational Biology (JBCB), 2004, 2(4), 681-698

Ribonuclic acid (RNA) enjoys increasing interest in molecular biology; despite this interest fundamental algorithms are lacking, e.g. for identifying local motifs. As proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or several RNA molecules are often only spatially local, where possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality of RNA and prove that the defined locality means connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengthes n and m, the algorithm has worst-case time complexity of O(n²m²max(n,m)) and a space complexity of only O(nm). An implementation of our algorithm is available at http://www.bio.inf.uni-jena.de. Its runtime is competitive with global sequence-structure alignment.

Keywords: RNA; local alignment; local sequence-structure alignment; lssa

Available:: pdf (161 KB) ps.gz (99 KB) pmid:15617161 BibTeX Entry ( Backofen:Will:_local_sequence_structure:JBCB2004 )

Constraint Based Protein Structure Prediction Exploiting Secondary Structure Information

Alessandro Dal Palù, Sebastian Will, Rolf Backofen, Agostino Dovier

In: Convegno Italiano di Logica Computazionale 2004 (CILC 2004), 2004, 16-17

The protein structure prediction problem is one of the most studied problems in Computational Biology. It can be reasonably abstracted as a minimization problem. The function to be minimized depends on the distances between the various amino-acids composing the protein and on their types. Even with strong approximations, the problem is shown to be computationally intractable. However, the solution of the problem for an arbitrary input size is not needed. Solutions for proteins of length 100-200 would give a strong contribution to Biotechnology. In this paper, we tackle the problem with constraint-based methods, using additional constraints and heuristics coming from the secondary structure of a protein that can be quickly predicted with acceptable approximation. Our prototypic implementation is written using constraints over finite domains in the Mozart programming system. It improves over any previous constraint-based approach and shows the power and flexibility of the method. Especially, it is well suited for further extensions.

Available:: pdf (304 KB) BibTeX Entry ( DalPalu:etal:CILC2004 )

Breaking of Partial Symmetries in the Photo and Alignment Problem

Sebastian Will, Rolf Backofen

In: Barbara Smith, Ian Gent, Warwick Harvey, Proceedings of the Third International Workshop on Symmetry in Constraint Satisfaction Problems (SymCon 2003), 2003, 187--194

Symmetry breaking by adding symmetric constraints during search usually assumes that symmetric constraints are simple. We identify symmetries with complex symmetric constraints, where the symmetries nevertheless can be handled by a similar method. For this aim, we introduce partial symmetries. We identify those symmetries in two problems. The photo problem is a well known example problem, while the alignment problem is a real world problem from bioinformatics.

Available:: ps.gz (61 KB) BibTeX Entry ( Will:Backofen:SymCon2003 )

A Constraint-Based Approach to Structure Prediction for Simplified Protein Models that Outperforms Other Existing Methods

Rolf Backofen, Sebastian Will

In: Proceedings of the 19th International Conference on Logic Programming (ICLP 2003), Lecture Notes in Computer Science, 2003, 2916, 49--71

Lattice protein models are used for hierarchical approaches to protein structure prediction, as well as for investigating general principles of protein folding. So far, one has the problem that either the lattice does not model real protein conformations with good quality, or there is no efficient method known for finding native conformations. We present a constraint-based method that largely improves this situ- ation. It outperforms all existing approaches in lattice protein folding on the type of model we have chosen (namely the HP-model by Lau and Dill[34], which models the important aspect of hydrophobicity). It is the only exact method that has been applied to two different lattices. Furthermore, it is the only exact method for the face-centered cubic lattice. This lattice is important since it has been shown [38] that the FCC lattice can model real protein conformations with coordinate root mean square deviation below 2 Angstrom. Our method uses a constraint-based approach. It works by first calculating maximally compact sets of points (hydrophobic cores), and then threading the given HP-sequence to the hydrophobic cores such that the core is occupied by H-monomers.

Available:: ps.gz (299 KB) BibTeX Entry ( Backofen:Will:_const_based_approac_struc_predic:ICLP2003 )

Excluding symmetries in constraint-based search

Rolf Backofen, Sebastian Will

In: Constraints, 2002, 7(3), 333-349

We introduce a new method for excluding symmetries in constraint based search. To our knowledge, it is the first declarative method that can be applied to arbitrary symmetries. Our method is based on the notion of symmetric constraints, which are used in our modification of a general constraint based search algorithm. The method does not influence the search strategy. Furthermore, it can be used with either the full set of symmetries, or with an subset of all symmetries. We proof correctness, completeness and symmetry exclusion properties of our method. Then, we show how to apply the method in the special case of geometric symmetries (rotations and reflections) and permutation symmetries. Furthermore, we give results from practical applications.

Available:: pdf (282 KB) ps.gz (110 KB) BibTeX Entry ( Bac:Wil:Constraints2002 )

Constraint-based Hydrophobic Core Construction for Protein Structure Prediction in the Face-Centered-Cubic Lattice

Sebastian Will

In: Russ B. Altman, A. Keith Dunker, Lawrence Hunter, Teri E. Klein, Proceedings of the Pacific Symposium on Biocomputing 2002 (PSB 2002), 2002, 661--672

We present an algorithm for exact protein structure prediction in the FCC-HP-model. This model is a lattice protein model on the face-centered-cubic lattice that models the main force of protein folding, namely the hydrophobic force. The structure prediction for this model can be based on the construction of hydrophobic cores. The main focus of the paper is on an algorithm for constructing maximally and submaximally compact hydrophobic cores of a given size. This algorithm treats core construction as a constraint satisfaction problem (CSP), and the paper describes its constraint model. The algorithm employs symmetry excluding constraint-based search [Backofen, Will: CP99] and relies heavily on good upper bounds on the number of contacts. Here, we use and strengthen upper bounds presented earlier. [Backofen, Will: CPM2001] The resulting structure prediction algorithm (including previous work [Backofen, Will: CPM2001, Backofen, Will: CP2001]) handles sequences of sizes in the range of real proteins fast, i.e. we predict a first structure often within a few minutes. The algorithm is the first exact one for the FCC, besides full enumeration which is impracticable for chain lengths greater than about 15. We tested the algorithm succesfully up to sequence length of 160, which is far beyond the capabilities even of previous heuristic approaches.

Available:: ps.gz (82 KB) pmid:11928518 BibTeX Entry ( Will:PSB2002 )
Links:: supplementary, there is a Poster (PostScript) on the topic

Optimally Compact Finite Sphere Packings --- Hydrophobic Cores in the FCC

Rolf Backofen, Sebastian Will

In: Proc. of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM2001), Lecture Notes in Computer Science, 2001, 2089, 257--272

Lattice protein models are used for hierarchical approaches to protein structure prediction, as well as for investigating principles of protein folding. The problem is that there is so far no known lattice that can model real protein conformations with good quality, and for which there is an efficient method to prove whether a conformation found by some heuristic algorithm is optimal. We present such a method for the FCC-HP-Model [Agarwala et al., JCB 97]. For the FCC-HP-Model, we need to find conformations with a maximally compact hydrophobic core. Our method allows us to enumerate maximally compact hydrophobic cores for sufficiently great number of hydrophobic amino-acids. We have used our method to prove the optimality of heuristically predicted structures for HP-sequences in the FCC-HP-model.

Available:: pdf (274 KB) ps.gz (96 KB) BibTeX Entry ( Bac:Wil:CPM2001 )

Fast, Constraint-based Threading of HP-Sequences to Hydrophobic Cores

Rolf Backofen, Sebastian Will

In: Proc. of the 7th International Conference on Principle and Practice of Constraint Programming (CP'2001), Lecture Notes in Computer Science, 2001, 2239, 494--508

Lattice protein models are used for hierarchical approaches to protein structure prediction, as well as for investigating principles of protein folding. So far, one has the problem that there exists no lattice that can model real protein conformations with good quality and for which an efficient method to find native conformations is known. We present the first method for the FCC-HP-Model [Agarwala et al., JCB 97] that is capable of finding native conformations for real-sized HP-sequences. It has been shown [Park and Levitt, JMB 95] that the FCC lattice can model real protein conformations with coordinate root mean square deviation below 2 \AA. Our method uses a constraint-based approach. It works by first calculating maximally compact sets of points (hydrophobic cores), and then threading the given HP-sequence to the hydrophobic cores such that the core is occupied by H-monomers.

Available:: pdf (238 KB) ps.gz (89 KB) BibTeX Entry ( Bac:Wil:CP2001 )

Algorithmic approach to quantifying the hydrophobic force contribution in protein folding

Rolf Backofen, Sebastian Will, Peter Clote

In: Russ B. Altman, A. Keith Dunker, Lawrence Hunter, Teri E. Klein, Proceedings of the Pacific Symposium on Biocomputing (PSB 2000), 2000, 5, 92-103

Though the electrostatic, ionic, van der Waals, Lennard-Jones, hydrogen bonding, and other forces play an important role in the energy function minimized at a protein's native state, it is widely believed that the \em hydrophobic force is the dominant term in protein folding. In this paper, we attempt to quantify the extent to which the hydrophobic force determines the positions of the backbone α-carbon atoms in PDB data, by applying Monte-Carlo and genetic algorithms to determine the predicted conformation with minimum energy, where only the hydrophobic force is considered (i.e. Dill's HP-model, and refinements using Woese's polar requirement). This is done by computing the root mean square deviation between the normalized distance matrix D = (d_i,j) (d_i,j is normalized Euclidean distance between residues r_i and r_j) for PDB data with that obtained from the output of our algorithms. Our program was run on the database of ancient conserved regions drawn from GenBank 101 generously supplied by W. Gilbert's lab [gilbertData,desouzaGilbert], as well as medium-sized proteins (E. Coli RecA, 2reb, Erythrocruorin, 1eca, and Actinidin 2act). The root mean square deviation (RMSD) between distance matrices derived from the PDB data and from our program output is quite small, and by comparison with RMSD between PDB data and random coils, allows a quantification of the hydrophobic force contribution Keywords: lattice, face-centered-cubic, hydrophobic force, automorphism

Available:: ps.gz (107 KB) BibTeX Entry ( Bac:Wil:Clo:PSB2000 )

Classroom Assignment using Constraint Logic Programming

Slim Abdennadher, Matthias Saft, Sebastian Will

In: Proceedings of the Second International Conference and Exhibition on The Practical Application of Constraint Technologies and Logic Programming (PACLP 2000), 2000

The Classroom Assignment problem consists of scheduling a set of courses into a fixed number of rooms, given a fixed timetable. At the University of Munich, a classroom plan has to be created each semester after collecting timetables of all departments and wishes of teachers. This planning is very complex since a lot of constraints have to be met, e.g. room size and equipment. Using constraint-based programming, we developed a prototype, called RoomPlan, that supports automatic creation of classroom plans. With RoomPlan a schedule can be created interactively within some minutes instead of some days. RoomPlan was presented at the Systems'99 Computer exhibition in Munich.

Available:: ps.gz (137 KB) BibTeX Entry ( Abdennadher:Saft:Will:PACLP00 )

Application of Constraint Programming Techniques for Structure Prediction of Lattice Proteins with Extended Alphabets

Rolf Backofen, Sebastian Will, Erich Bornberg-Bauer

In: Bioinformatics, 1999, 15(3), 234-242

Predicting the ground state of biopolymers is a notoriously hard problem in biocomputing. Model systems, such as lattice proteins are simple tools and valuable to test and improve new methods. Best known are HP-type models with sequences composed from a binary (hydrophobic and polar) alphabet. Major drawback is the degeneracy, i.e. the number of different ground state conformations. Here we show how recently developed constraint programming techniques can be used to solve the structure prediction problem efficiently for a higher order alphabet. To our knowledge it is the first report of an exact and computationally feasible solution to model proteins of length up to 36 and without resorting to maximally compact states. We further show that degeneracy is reduced by more than one order of magnitude and that ground state conformations are not necessarily compact. Therefore, more realistic protein simulations become feasible with our model.

Available:: ps.gz (210 KB) BibTeX Entry ( Bac:Wil:Bor:BioInfo99 )

Excluding Symmetries in Constraint-Based Search

Rolf Backofen, Sebastian Will

In: Joxan Jaffar, Proceedings of 5^{\it th} International Conference on Principle and Practice of Constraint Programming (CP'99), Lecture Notes in Computer Science, 1999, 1713, 73-87

We introduce a new method for excluding symmetries in constraint based search. To our knowledge, it is the first declarative method that can be applied to arbitrary symmetries. Our method is based on the notion of symmetric constraints, which are used in our modification of a general constraint based search algorithm. The method does not influence the search strategy. Furthermore, it can be used with either the full set of symmetries, or with an subset of all symmetries. We proof correctness, completeness and symmetry exclusion properties of our method. We then show how to apply the method in the special case of geometric symmetries (rotations and reflections) and permutation symmetries. Furthermore, we give results from practical applications.

Available:: ps.gz (97 KB) BibTeX Entry ( Bac:Wil:CP99 )

Algorithmic approach to quantifying the hydrophobic force contribution in protein folding

Rolf Backofen, Sebastian Will, Peter Clote

In: Proceedings of the German Conference on Bioinformatics (GCB'99), 1999, 93-106

Though the electrostatic, ionic, van der Waals, Lennard-Jones, hydrogen bonding, and other forces play an important role in the energy function minimized at a protein's native state, it is widely believed that the \em hydrophobic force is the dominant term in protein folding. In this paper, we attempt to quantify the extent to which the hydrophobic force determines the positions of the backbone α-carbon atoms in PDB data, by applying Monte-Carlo and genetic algorithms to determine the predicted conformation with minimum energy, where only the hydrophobic force is considered (i.e. Dill's HP-model, and refinements using Woese's polar requirement). This is done by computing the root mean square deviation between the normalized distance matrix D = (d_i,j) (d_i,j is normalized Euclidean distance between residues r_i and r_j) for PDB data with that obtained from the output of our algorithms. Our program was run on the database of ancient conserved regions drawn from GenBank 101 generously supplied by W. Gilbert's lab [gilbertData,desouzaGilbert], as well as medium-sized proteins (E. Coli RecA, 2reb, Erythrocruorin, 1eca, and Actinidin 2act). The root mean square deviation (RMSD) between distance matrices derived from the PDB data and from our program output is quite small, and by comparison with RMSD between PDB data and random coils, allows a quantification of the hydrophobic force contribution The final version of this paper will appear in the proceedings of PSB'2000. Keywords: lattice, face-centered-cubic, hydrophobic force, automorphism

Publication note:: http://www.bioinfo.de/isb/gcb99
Available:: ps.gz (107 KB) BibTeX Entry ( Bac:Wil:Clo:GCB99 )

Structure Prediction in an HP-type Lattice with an Extended Alphabet

Rolf Backofen, Sebastian Will

In: Proc of German Conference on Bioinformatics (GCB'98), 1998

The protein structure prediction problem is one of the most important problems in computational biology. This problem consists of finding the conformation of a protein (given by a sequence of amino-acids) with minimal energy. Because of the complexity of this problem, simplified models like Dill's HP-lattice model have become a major tool for investigating general properties of protein folding. Even for this simplified model, the structure prediction problem has been proven to be NP-complete. A disadvantage of the HP-problem is its high degeneracy. I.e., for every sequence there are a lot of conformations having the minimal energy. For this reason, extended alphabets have been used in the literature. One of these alphabets is the HPNX-alphabet, which considers hydrophobic amino acids as well as positive and negative charged ones. In this paper, we describe an exact algorithm for solving the structure prediction problem for the HPNX-alphabet. To our knowledge, our algorithm is the first exact one for finding the minimal conformation of an lattice protein in a lattice model with an alphabet more complex than HP. We also compare our results with results as given for the HP-model.

Available:: ps.gz (106 KB) BibTeX Entry ( Bac:Wil:GCB98 )
Links:: long version as Technical Report 9810, LMU M�nchen

Excluding Symmetries in Concurrent Constraint Programming

Rolf Backofen, Sebastian Will

In: Workshop on Modeling and Computing with Concurrent Constraint Programming, 1998

In many problems, non-determinism naturally arises in an concurrent constraint-based formulation in form of disjunctive constraints. To check consistency of disjunctive constraints, one needs to perform search by splitting the constraint store into local computation stores. An important problem there is to eliminate symmetries, which give rise to many different but ''similar'' solutions found by the search procedure. Since symmetries will give rise to an (often exponential) amplification of the search space, symmetry exclusion promises to efficiently prune the search tree. Hence, many constraint programmers have constructed mechanisms to exclude at least some of the symmetries of their problem. Unfortunately, symmetry exclusion was often not straightforward to implement and had to be redesigned for every special problem or enumeration strategy, which leaded to an inflexibility in the program structure once it was introduced. Often symmetry exclusion was not done at all or only for a small set of symmetries. In other cases, where symmetry exclusion was implemented it distracted the programmers attention from more important tasks. The contribution of our work is to give a generally applicable method for the exclusion of arbitrary symmetries. To our knowledge, this is the first general and declarative method for excluding symmetries in concurrent constraint programming.

Available:: ps.gz (72 KB) BibTeX Entry ( Bac:Wil:WS-CP98 )