Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model
Jérôme Waldispühl and Peter Clote.
An RNA secondary structure is saturated if no base pairs can be added
without violating the definition of secondary structure. Here we describe a new algorithm,
RNAsat, which for a given RNA sequence a, an integral temperature
0 < T < 100 in degrees Celsius, and for all integers
k, computes the Boltzmann partition function
Z(T,k)(a) = ∑ exp(- E(S)/RT ), S in SAT(k)(a), where
the sum is over all saturated secondary structures of a which have
exactly k base pairs, R is the universal gas constant and
E(S) denotes the free energy with respect to the
Turner nearest neighbor energy model.
By dynamic programming, we compute Z(T,k) simultaneously for
all values of k in time O(n^5) and space O(n^3).
Additionally, RNAsat computes the partition function
Q(T,k)(a) = ∑ exp(- E(S)/RT ), S in S(k)(a), where
the sum is over all secondary structures of a
which have k base pairs;
the latter computation is performed simultaneously for all values of k
in O(n^4) time and O(n^3) space.
Lastly, using the partition function Z(T,k)
[resp. Q(T,k)] with stochastic backtracking, RNAsat
rigorously samples the collection of saturated secondary structures
[resp. secondary structures] having k base pairs; for Q(T,k) this
provides a parametrized form of Sfold sampling (Y. Ding & C. Lawrence, 2002).
Using RNAsat,
(i)
we compute the ensemble free energy
for saturated secondary structures having k base pairs,
(ii)
show cooperativity of the Turner model,
(iii)
demonstrate a temperature-dependent
phase transition,
(iv)
illustrate the predictive advantage of RNAsat for precursor
microRNA cel-mir-72 of C. elegans and for the
pseudoknot PKB 00152 of Pseudobase,
(v)
illustrate the RNA shapes (R. Giegerich, 2005)
of sampled secondary structures [resp. saturated structures] having
exactly k base pairs. A web server for RNAsat
is under construction at bioinformatics.bc.edu/clotelab/RNAsat/