Combinatorial Problems on Strings with Applications to Protein Folding

Alantha Newman and Matthias Ruhl

Latin American Theoretical INformatics (LATIN '04)
Buenos Aires, April 2004

[Postscript, 357KB]

[PDF, 195KB]


We consider the problem of protein folding in the HP model on the 3D square lattice. This problem is combinatorially equivalent to folding a string of 0's and 1's so that the string forms a self-avoiding walk on the lattice and the number of adjacent pairs of 1's is maximized. The previously best-known approximation algorithm for this problem has a guarantee of 3/8 = .375. In this paper, we first present a new 3/8-approximation algorithm for the 3D folding problem that improves on the absolute approximation guarantee of the previous algorithm. We then show a connection between the 3D folding problem and a basic combinatorial problem on binary strings, which may be of independent interest. Given a binary string in {a,b}*, we want to find a long subsequence of the string in which every sequence of consecutive a's is followed by at least as many consecutive b's. We show a non-trivial lower-bound on the existence of such subsequences. Using this result, we obtain an algorithm with a slightly improved approximation ratio of at least .37501 for the 3D folding problem. All of our algorithms run in linear time.

Back to publications