The difficulty and reliability of determining the geometry or spatial structure of a molecule depend greatly on the type of available data. X-ray crystallography is currently the primary source of structural information, followed by nuclear magnetic resonance (NMR) spectroscopy. To derive molecular structure from NMR data, one must resolve atomic positions given a set of approximate pairwise distances. The traditional tools for this problem come from distance geometry [40].
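To make the distance-to-coordinates step concrete, the following is a minimal sketch of a sequential build-up in two dimensions. It is an idealized toy, not the method of [40]: it assumes a complete matrix of exact distances, whereas real measurements are sparse and approximate, and the names `pairwise_distances` and `embed_2d` are hypothetical. The recovered coordinates agree with the true ones only up to rotation, translation, and reflection.

```python
import math


def pairwise_distances(points):
    """Full matrix of Euclidean distances between the given points."""
    return [[math.dist(p, q) for q in points] for p in points]


def embed_2d(d):
    """Recover 2D coordinates from a complete matrix d of exact distances.

    Assumes at least three points and that the first three are not collinear.
    """
    n = len(d)
    a = d[0][1]
    # Point 0 at the origin, point 1 on the positive x-axis.
    pts = [(0.0, 0.0), (a, 0.0)]
    # Point 2 from the law of cosines; taking y2 >= 0 fixes the reflection.
    x2 = (d[0][2] ** 2 + a ** 2 - d[1][2] ** 2) / (2 * a)
    y2 = math.sqrt(max(d[0][2] ** 2 - x2 ** 2, 0.0))
    if y2 < 1e-12:
        raise ValueError("first three points are collinear")
    pts.append((x2, y2))
    for k in range(3, n):
        # Subtracting the circle equations around points 0 and 1 gives x.
        x = (d[0][k] ** 2 + a ** 2 - d[1][k] ** 2) / (2 * a)
        # Subtracting the circle equations around points 0 and 2 gives y.
        y = (d[0][k] ** 2 - d[2][k] ** 2 + x2 ** 2 + y2 ** 2 - 2 * x * x2) / (2 * y2)
        pts.append((x, y))
    return pts


# Illustrative data only: five made-up "atom" positions in the plane.
true_pts = [(1.0, 2.0), (4.0, 6.0), (0.0, 5.0), (3.0, 1.0), (2.5, 3.5)]
recovered = embed_2d(pairwise_distances(true_pts))
```

With noisy or incomplete distances this direct construction breaks down, which is why practical distance geometry methods work with distance bounds and some form of optimization instead.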
Lacking X-ray and NMR data, one can only resort to structural analysis of an unknown protein by looking for patterns in its amino acid residue sequence that match those of known proteins. For evolutionary reasons, the many proteins occurring in nature share a limited number of common internal structures and folds. Recognizing such a pattern and threading the unknown protein onto it greatly simplifies structure determination when X-ray or NMR data are available. Modeling based on such analysis may also be valuable.
The most difficult version of the problem is also the one with the largest
potential benefit.
It is the ab initio determination of structure from the amino acid
residue sequence [127].
While one can in principle use Newtonian mechanics to simulate the natural
folding of the molecule, the sheer scale of the calculation is daunting.
A modest-size protein folding simulation with current algorithms
would require in the neighborhood of floating-point operations.
A variety of heuristic methods are used to find the minimum energy
configuration, including simulated annealing, Monte Carlo,
and search with reduced degrees of freedom.
As yet, none of these methods has come close to a general solution.
Practical methods fail in most cases because the target
function (say, the sum of energy potentials)
has a large number of local minima at any level of detail.
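The obstacle posed by local minima can be illustrated with a small sketch of simulated annealing on a one-dimensional toy function. Everything here is an assumption made for illustration: the function `toy_energy` is not a molecular potential, and the cooling schedule, step size, and names are hypothetical choices rather than those of any particular folding code.

```python
import math
import random


def toy_energy(x):
    """A rugged 1D stand-in for an energy surface: a bowl plus ripples."""
    return 0.1 * x * x + math.sin(5.0 * x)


def count_local_minima(energy, lo, hi, n=2000):
    """Count grid points that lie strictly below both of their neighbors."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    es = [energy(x) for x in xs]
    return sum(1 for i in range(1, n) if es[i] < es[i - 1] and es[i] < es[i + 1])


def anneal(energy, x0, steps=20000, t0=2.0, step_size=0.5, seed=0):
    """Simulated annealing with a linear cooling schedule.

    Returns the lowest-energy point visited and its energy.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(steps):
        # Temperature decreases linearly from t0 toward (almost) zero.
        t = t0 * (1.0 - k / steps) + 1e-9
        cand = x + rng.uniform(-step_size, step_size)
        e_cand = energy(cand)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-dE / t).
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


n_minima = count_local_minima(toy_energy, -10.0, 10.0)
best_x, best_e = anneal(toy_energy, x0=8.0)
```

Even this one-variable function has more than a dozen local minima on the interval shown. Accepting occasional uphill moves lets the search escape some of them, but nothing guarantees that it reaches the global minimum, and the number of minima on a real energy surface grows rapidly with the number of degrees of freedom.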