Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model

Jérôme Waldispühl and Peter Clote.

An RNA secondary structure is saturated if no base pairs can be added without violating the definition of secondary structure. Here we describe a new algorithm, RNAsat, which for a given RNA sequence a, an integral temperature 0 < T < 100 in degrees Celsius, and for all integers k, computes the Boltzmann partition function Z(T,k)(a) = ∑ exp(- E(S)/RT ), S in SAT(k)(a), where the sum is over all saturated secondary structures of a which have exactly k base pairs, R is the universal gas constant and E(S) denotes the free energy with respect to the Turner nearest neighbor energy model. By dynamic programming, we compute Z(T,k) simultaneously for all values of k in time O(n^5) and space O(n^3). Additionally, RNAsat computes the partition function Q(T,k)(a) = ∑ exp(- E(S)/RT ), S in S(k)(a), where the sum is over all secondary structures of a which have k base pairs; the latter computation is performed simultaneously for all values of k in O(n^4) time and O(n^3) space. Lastly, using the partition function Z(T,k) [resp. Q(T,k)] with stochastic backtracking, RNAsat rigorously samples the collection of saturated secondary structures [resp. secondary structures] having k base pairs; for Q(T,k) this provides a parametrized form of Sfold sampling (Y. Ding & C. Lawrence, 2002). Using RNAsat, (i) we compute the ensemble free energy for saturated secondary structures having k base pairs, (ii) show cooperativity of the Turner model, (iii) demonstrate a temperature-dependent phase transition, (iv) illustrate the predictive advantage of RNAsat for precursor microRNA cel-mir-72 of C. elegans and for the pseudoknot PKB 00152 of Pseudobase, (v) illustrate the RNA shapes (R. Giegerich, 2005) of sampled secondary structures [resp. saturated structures] having exactly k base pairs. A web server for RNAsat is under construction at bioinformatics.bc.edu/clotelab/RNAsat/