structural protein comparison http://people.csail.mit.edu/jaffer/GTOL/protein

The Evolution of Proteins


Evolution in the Nucleotide Game argues that the first synthesized non-genetic polymers to evolve will be capsid components. For the rest of this article, DNA and RNA will be the nucleotides, and proteins will be the non-genetic polymers.

Capsids are protective shields, so many amino acid substitutions are likely to leave the performance of capsid proteins unaffected. Likewise, in enzymatic proteins most of the protein serves only to hold the active sites in position.

Big Game Theory -- Evolutionary Breakthroughs argues that the development of the first protein, and of the ribosomal RNA that synthesizes it, is a lengthy process which will not be repeated. But because mutation continually varies protein sequences, the somewhat conserved regions of existing proteins will, over time, produce a great variety of amino acid sequences. "Somewhat conserved" refers to sequences in which one amino acid can be substituted for another, but in which a stop codon cannot be substituted, since a premature stop codon would truncate the protein.
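To make "somewhat conserved" concrete, the sketch below counts how many single-base substitutions in a sense codon turn it into a stop codon. It is a minimal illustration, assuming the standard genetic code (stops TAA, TAG and TGA, written in DNA letters) and treating every one of the nine possible substitutions per codon as equally likely; the function names are hypothetical.

```python
from itertools import product

# Stop codons of the standard genetic code, in DNA letters.
STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"

def all_codons():
    """Return all 64 codons as three-letter strings."""
    return ["".join(c) for c in product(BASES, repeat=3)]

def point_mutants(codon):
    """Yield every codon differing from `codon` at exactly one position."""
    for i, original in enumerate(codon):
        for base in BASES:
            if base != original:
                yield codon[:i] + base + codon[i + 1:]

def count_nonsense_mutations():
    """Count substitutions that turn a sense codon into a stop codon.

    Returns (nonsense, total) over all 61 sense codons x 9 substitutions each.
    """
    nonsense = total = 0
    for codon in all_codons():
        if codon in STOP_CODONS:
            continue  # only mutations starting from sense codons are counted
        for mutant in point_mutants(codon):
            total += 1
            if mutant in STOP_CODONS:
                nonsense += 1
    return nonsense, total

if __name__ == "__main__":
    nonsense, total = count_nonsense_mutations()
    print(f"{nonsense} of {total} substitutions ({nonsense / total:.1%}) create a stop codon")
```

Run as a script, it reports 23 of 549 substitutions, about 4.2%. Under these assumptions, roughly one random point mutation in 24 within a coding region truncates the protein, while the rest merely swap one amino acid for another or leave it unchanged.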

Duplicating or splicing a chromosome partway through a gene provides an opportunity for new genes to be synthesized. So we should expect the nucleotide sequences of new genes to be similar to those of the genes from which they were derived. Constructing a phylogeny of proteins would be very interesting; unfortunately, the origins of most genes are so ancient that the sequence-similarity signal will be very weak. Feature-Based Phylogeny is a project to derive the history of proteins.
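The fading of the similarity signal can be sketched by duplicating a random gene and letting the two copies mutate independently. This is a toy model, assuming uniform random point substitutions and no insertions or deletions (so positions stay aligned); the names and parameter values are hypothetical.

```python
import random

BASES = "ACGT"

def mutate(seq, n_mutations, rng):
    """Apply n_mutations random point substitutions, each to a different base."""
    seq = list(seq)
    for _ in range(n_mutations):
        i = rng.randrange(len(seq))
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(seq)

def identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def divergence_curve(length=3000, steps=(0, 300, 1000, 3000, 10000, 30000), seed=1):
    """Duplicate a random ancestral gene, then mutate both copies independently.

    Returns a list of (mutations per copy, fractional identity) pairs.
    """
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(length))
    return [(n, identity(mutate(ancestor, n, rng), mutate(ancestor, n, rng)))
            for n in steps]

if __name__ == "__main__":
    for n, ident in divergence_curve():
        print(f"{n:6d} mutations per copy: {ident:.0%} identity")
```

Identity starts at 100% and sinks toward 25%, the agreement expected between unrelated sequences over a four-letter alphabet; once it gets there, the shared ancestry of the two copies is no longer visible in their sequences.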

In prokaryotes, most of whose DNA codes for protein, duplications and splices will often fall in coding regions, giving any new gene hidden in the rearranged sequence an opportunity for expression. In eukaryotes, most DNA is non-coding, and there is no selective pressure against stop codons in non-coding DNA. In random sequence, each codon has a 3/64 chance of being a stop codon (TAA, TAG, or TGA), so the chance of n successive codons all being non-stop codons is:
$$\left(\frac{61}{64}\right)^n$$
The median length of a stop-free run of codons in random DNA is then the n at which this probability falls to one half:
$$0.5 = \left(\frac{61}{64}\right)^n \quad\Longrightarrow\quad n = \frac{\log 0.5}{\log(61/64)} \approx 14.4 \text{ codons.}$$
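The arithmetic can be checked numerically. The sketch below is a minimal illustration, assuming codons are drawn uniformly from all 64 possibilities; it evaluates the formula and compares it with a seeded simulation of random DNA. Solving 0.5 = (61/64)^n gives the median length of a stop-free run; the mean is longer, (61/64)/(3/64) = 61/3, or about 20.3 codons, because a few runs are very long. The function names are hypothetical.

```python
import math
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard-code stops, DNA letters
BASES = "ACGT"

def p_no_stop(n):
    """Probability that n random codons contain no stop codon: (61/64)**n."""
    return (61 / 64) ** n

def median_run_length():
    """Solve 0.5 = (61/64)**n for n."""
    return math.log(0.5) / math.log(61 / 64)

def mean_run_length():
    """Mean number of sense codons before the first stop (geometric distribution)."""
    return (61 / 64) / (3 / 64)

def run_length(rng):
    """Draw uniformly random codons; count how many precede the first stop."""
    n = 0
    while "".join(rng.choices(BASES, k=3)) not in STOP_CODONS:
        n += 1
    return n

def simulated_median(trials=10001, seed=1):
    """Empirical median of run_length over many random sequences."""
    rng = random.Random(seed)
    runs = sorted(run_length(rng) for _ in range(trials))
    return runs[trials // 2]

if __name__ == "__main__":
    print(f"formula:   {median_run_length():.1f} codons")
    print(f"simulated: {simulated_median()} codons (whole codons only)")
    print(f"mean:      {mean_run_length():.1f} codons")
```

The formula line prints 14.4 and the mean line 20.3; the simulated median is a whole number of codons and should land next to the formula's value.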
Thus few random codon sequences will be long enough to encode interesting proteins. As non-coding DNA accumulates point mutations, its codons will approach the random frequency of stop codons, 3/64 or about 5%. That DNA is then useless for evolving proteins.
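The drift of mutated non-coding DNA toward 3/64 stop codons can be illustrated with a small simulation. This sketch assumes every point substitution replaces a base with one of the other three chosen uniformly at random (a simple substitution model picked for illustration, not taken from the article) and starts from a hypothetical stop-free sequence of repeated GCT codons.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"

def stop_fraction(seq):
    """Fraction of in-frame codons in seq that are stop codons."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return sum(c in STOP_CODONS for c in codons) / len(codons)

def decay(n_codons=5000, mutations_per_base=5, checkpoints=5, seed=1):
    """Start from stop-free DNA and apply random point substitutions.

    Returns (substitutions applied, stop-codon fraction) at evenly spaced checkpoints.
    """
    rng = random.Random(seed)
    seq = list("GCT" * n_codons)  # hypothetical stop-free starting sequence
    total = mutations_per_base * len(seq)
    interval = total // checkpoints
    history = [(0, stop_fraction("".join(seq)))]
    for step in range(1, total + 1):
        i = rng.randrange(len(seq))
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
        if step % interval == 0:
            history.append((step, stop_fraction("".join(seq))))
    return history

if __name__ == "__main__":
    for step, frac in decay():
        print(f"{step:6d} substitutions: {frac:.2%} stop codons (3/64 = {3 / 64:.2%})")
```

The starting fraction is exactly zero; after a few substitutions per base the sequence has forgotten its origin and its stop-codon fraction hovers near 3/64.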

Eukaryotes, with their large proportion of non-coding DNA, required a method of mutation that doesn't pollute DNA sequences with stop codons. That method is genetic recombination.
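A toy comparison shows how recombination can vary a sequence without adding stop codons. The sketch assumes crossovers occur in frame, at codon boundaries of two aligned, stop-free parent sequences; it does not model crossovers that fall inside a codon. It counts stop codons created by such crossovers and, for contrast, by a single random point mutation in each parent. All names and parameter values are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"
SENSE_CODONS = [a + b + c for a in BASES for b in BASES for c in BASES
                if a + b + c not in STOP_CODONS]

def has_stop(seq):
    """True if any in-frame codon of seq is a stop codon."""
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))

def random_orf(n_codons, rng):
    """A random stop-free reading frame built from sense codons only."""
    return "".join(rng.choice(SENSE_CODONS) for _ in range(n_codons))

def crossover(a, b, codon_index):
    """Exchange the tails of two aligned sequences at a codon boundary."""
    cut = 3 * codon_index
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def point_mutate(seq, rng):
    """Replace one random base with one of the other three."""
    i = rng.randrange(len(seq))
    base = rng.choice([x for x in BASES if x != seq[i]])
    return seq[:i] + base + seq[i + 1:]

def trial(n_pairs=1000, n_codons=100, seed=1):
    """Return (crossover offspring with a stop, point-mutated parents with a stop)."""
    rng = random.Random(seed)
    from_crossover = from_mutation = 0
    for _ in range(n_pairs):
        a, b = random_orf(n_codons, rng), random_orf(n_codons, rng)
        c, d = crossover(a, b, rng.randrange(1, n_codons))
        from_crossover += has_stop(c) + has_stop(d)
        from_mutation += has_stop(point_mutate(a, rng)) + has_stop(point_mutate(b, rng))
    return from_crossover, from_mutation

if __name__ == "__main__":
    crossed, mutated = trial()
    print(f"crossover offspring with a stop codon: {crossed} of 2000")
    print(f"point-mutated parents with a stop codon: {mutated} of 2000")
```

Because an in-frame crossover only reshuffles whole codons that were already sense codons, the first count is always zero, while roughly one point mutation in 24 creates a stop codon.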


Copyright © 2004, 2007 Aubrey Jaffer

I am a guest and not a member of the MIT Computer Science and Artificial Intelligence Laboratory.  My actions and comments do not reflect in any way on MIT.
The Game Theory of Life
agj @ alum.mit.edu
Go Figure!