As we've mentioned
before, amino acid chains are very bendy. The carbon-to-carbon
and carbon-to-nitrogen bonds make a very sharp turn (109°)
at each link in the peptide chain, and there is a great deal of
rotational flexibility at the bonds (unless bulky side chains prevent
it).
Proteins can form into rigid structures,
but they usually do that by wandering around and looping back and
forth in 3D space, and then holding the structure together with
linkages between molecules in different parts of the chain (either
by hydrophobic bonding, polar bonding or cysteine sulphur bonds).
In general, protein enzymes work by having
an occasional polar amino acid that sticks out and interacts with
some other amino acid further down the chain (the two molecules just
happen to be physically close because the chain looped around and
back). Enzymes rarely have more than one or two amino acids in a
row that are part of an active group, simply because it's difficult
to get the branch chains from three consecutive amino acids to be
physically close to each other.
Problems with Fred
What that flexibility means for protein transcription
is that proteins are not the ideal structures for reading genetic
chains. Fred and Roscoe managed to do it, but it was going to be
a design challenge for them to ever expand into wider reading frames
that were sufficient to code for proteins with a larger variety of
amino acids.
It probably could be done, but once Cassius
started using complementary nucleotides, there was a better way.
Chains Reading Chains
From a structural point of view, the ideal
way to match up with a short stretch of backbone chain is to use
another chain. It's equally compact, and rigid, and thanks
to complementary pairing, just the right shape to line up with the
first chain. So this might (finally!) be the right time to start
thinking about protein transcription that is closer to the modern system.
A few chapters back we already looked at
Fatcat, which managed versions of Fred that were each a 'transfer
molecule' that brought in one amino acid.
What if Fatcat managed RNA-based carriers
instead? And what if those carriers matched up to more than one
chain molecule at a time? Matching two base pairs at a time would
allow proteins to include up to 15 different amino acids (plus
a stop codon), while a triplet system would allow up to 63 amino
acid choices.
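The arithmetic behind those numbers can be sketched in a few lines. This is only an illustration: it assumes a four-letter chain alphabet and one code reserved as a stop signal, and the function name `amino_acid_choices` is our own label, not part of the story.

```python
def amino_acid_choices(frame_width, alphabet_size=4, stop_codes=1):
    """Codes left over for amino acids once the stop codes are reserved."""
    total_codes = alphabet_size ** frame_width
    return total_codes - stop_codes


# Compare the doublet and triplet reading frames discussed above.
for width in (2, 3):
    print(f"frame width {width}: {amino_acid_choices(width)} amino acid choices")
```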
That sounds great, but there was one big
problem. For a while at least, the new system still needed to coexist
with the older single-reading-frame system. Cassius still needed
its 'legacy' genes such as Sofia and Sorrel, and any
new triplet system that interfered with the transcription of Fred
and Roscoe would be lethal.
Triplet Transition
There are two possible ways that Cassius
could have switched to three-molecule reading frames, with no clear
evidence for which way it actually went.
One possibility is that the transition happened
very early, right after the assimilation of a Caleb with an alt-Caleb.
There was a brief 'window of opportunity' when a new
system could have slipped in, with reduced risk of interfering with
legacy proteins.
Another option, and probably the more likely one,
is that the shift happened later, with the triplet coding arising
from an entirely different pathway that was unrelated to the old
Fatcat/Fred transcription system.
We'll consider these two paths in more detail, now.
Gradual Triplets
If the transition to complementary base pairs
and triplet coding happened soon enough after the assimilation of
a Caleb and alt-Caleb, there was a relatively easy way to start using
a few triplets to add a few new amino acids-- similar to the
method that we mentioned in the previous chapter, when looking at
double-wide Fred.
The 'legacy' genes that came from the original Calebs
would have contained just the two nucleotides from Sofia, while the
alt-Caleb genes would have contained just the two complementary nucleotides
from alt-Sofia. So any new triplet-Freds that matched with a mixture
of nucleotides and alt-nucleotides would have been guaranteed not
to interfere with any of the original genes.
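A small sketch can check that "guaranteed" claim. Everything specific here is an assumption for illustration: we pretend the legacy chains used only A and G, the alt chains used only U and C, and the two example genes are made up.

```python
from itertools import product

# Placeholder alphabets: the story does not say which nucleotides
# Sofia and alt-Sofia actually used.
LEGACY = "AG"
ALT = "UC"
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def target_site(fred_triplet):
    """Chain sequence that a triplet Fred would pair with (strand direction ignored)."""
    return "".join(COMPLEMENT[base] for base in fred_triplet)


def is_mixed(sequence):
    """True if the sequence uses letters from both nucleotide sets."""
    return any(b in LEGACY for b in sequence) and any(b in ALT for b in sequence)


triplets = ["".join(p) for p in product(LEGACY + ALT, repeat=3)]
mixed = [t for t in triplets if is_mixed(t)]

# Made-up genes: one purely legacy, one purely alt.
legacy_gene = "AGGAGAAGAGGA"
alt_gene = "UCCUCUUCUCCU"

# Mixed triplet Freds whose pairing site appears in either pure gene.
hits = [t for t in mixed
        if target_site(t) in legacy_gene or target_site(t) in alt_gene]
print(len(triplets), len(mixed), len(hits))
```

Under these assumptions, the complement of a mixed triplet is itself mixed, so it can never line up with a gene built from only one nucleotide pair.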
Let's look at how a fifth amino acid might end up in a four-molecule
protein, via a combination of triplet reading frame, and complementary
base pairing.
1. We'll start with the usual 'alternating Freds' transcription
of a protein from a chain. Fatcat and Fred transcribe a bunch of
amino acids using one chain molecule at a time, just like normal.
2. After a while, Fatcat comes to just the
right chain spot, but a triplet Fred pops into the spot instead of
a regular Fred. This particular Fred contains three chain molecules
which happen to match up with the next three base pairs. It also
contains a knee which attracts a new, fifth amino acid. Fred bonds
complementarily, and adds the brand new amino acid to the protein.
3. Fatcat and an old Fred then continue with
normal transcription.
4. The result is a chain that contains a
new, fifth amino acid.
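The four steps above can be sketched as a toy reader. The code tables, the amino acid labels and the `AUG` triplet are all hypothetical, matching is modeled as a plain lookup rather than real complementary pairing, and we simply assume that a triplet Fred wins whenever it fits.

```python
# Hypothetical tables: four old single-molecule Freds and one triplet Fred.
SINGLE_FREDS = {"A": "aa1", "G": "aa2", "U": "aa3", "C": "aa4"}
TRIPLET_FREDS = {"AUG": "aa5"}


def transcribe(chain):
    """Read a chain one molecule at a time, unless a triplet Fred fits."""
    protein = []
    position = 0
    while position < len(chain):
        triplet = chain[position:position + 3]
        if triplet in TRIPLET_FREDS:
            # Step 2: the triplet Fred pops in and adds the fifth amino acid.
            protein.append(TRIPLET_FREDS[triplet])
            position += 3
        else:
            # Steps 1 and 3: ordinary transcription with an old Fred.
            protein.append(SINGLE_FREDS[chain[position]])
            position += 1
    return protein


print(transcribe("AAGAUGGA"))
```

With this made-up chain, the reader produces three old amino acids, then the new fifth one, then two more old ones.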
Chain Results
Of course, as with all the previous changes
in protein transcription, this new protein
was probably not beneficial, and may even
have been lethal.
However, after many false attempts, the triplet-coding
system eventually might have produced a useful
new protein. So the organism that contained
it would have thrived.
Over a long period of time, the beneficial
effect of having more amino acid choices
would have given a selective advantage to
any Cassius which was capable of using the
new triplet system. Because of that, the
system would become established.
Triple Wide Origins
The jump from single-wide Fred to triple-wide
Fred might seem like an implausibly large
change, but there is a bit of existing cell
chemistry that would make it much more possible.
A few chapters back, we mentioned the use
of 'carrier' molecules that would
bring in a small, raw material molecule,
and link it up to a blueprint chain via complementary
pairing.
Those carrier molecules had a function that
was almost identical to Fred's, so
it's quite possible that they might
have intruded accidentally into a protein
transcription, and then had a beneficial
function.
Sixth to Twentieth Amino Acids
Once Fatcat and Fred established the fifth
amino acid (and the first to be coded from
triplets), the same system would have gradually
become established in more and more polypeptides,
introducing more and more new amino acids.
Some triplet Freds might have picked up a
different chain triplet, and added additional
triplets that would code for the same amino acid.
Some of those new triplets may have mutated
at the knee end, and started bringing in
new amino acids.
Presumably, at some point the whole system
gradually evolved into our current 20 amino
acids coded from 61 of the 64 possible triplet
combinations, with a few leftovers which
we'll discuss later.
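Those modern numbers are easy to verify. The sketch below lays out the standard genetic code in its usual compact form (DNA letters, codons in T, C, A, G order, with `*` marking a stop) and counts the entries. The variable names are our own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the third base fastest, which matches the table layout.
codon_table = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

coding = {c: a for c, a in codon_table.items() if a != "*"}
stops = sorted(c for c, a in codon_table.items() if a == "*")

print("triplets:", len(codon_table))
print("coding triplets:", len(coding))
print("amino acids:", len(set(coding.values())))
print("stop triplets:", stops)
```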
Replacing Fred
Over time, the triplet-based amino acids
would have gradually become the dominant
components of proteins in early organisms,
and the original single-frame transcriptions
would have become less and less important.
At some point the old Fred system would have
probably become so non-essential that it
could simply disappear.
The question is, what happened to the four
original amino acids from the original Fred
and alt-Fred polypeptides?
The odds are good that triplet coding would
have already been introduced for those amino
acids-- having two different systems
coding for some presumably essential amino
acids would not be harmful in any way.
It's also possible that one or more
of the very original amino acids were not
quite as useful to newer organisms, for whatever
reason. In that case, they would have simply
disappeared when the triplet system became
the only form of protein coding. By then
they would have been contained in only a
few 'legacy' proteins anyhow.
Mixed Reading Frames
Unfortunately, the gradual approach to triplet
encoding still has most of the disadvantages
of the double-wide Fred system. The biggest
problem is that there would sometimes be
ambiguous reads, since the three base pairs
for a triplet would sometimes be read by
three old Freds, producing an entirely different
protein with three 'old' amino
acids substituting for the one new one.
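To see how quickly the ambiguity shows up, here is a sketch that lists every protein a chain could produce if either kind of Fred is free to bind. The code tables, the amino acid labels and the example chain are hypothetical.

```python
# Hypothetical tables: four old single-molecule Freds and one triplet Fred.
SINGLE_FREDS = {"A": "aa1", "G": "aa2", "U": "aa3", "C": "aa4"}
TRIPLET_FREDS = {"AUG": "aa5"}


def all_readings(chain):
    """Every protein that could result when old and triplet Freds compete."""
    if not chain:
        return [[]]
    readings = []
    # An old Fred reads a single chain molecule.
    for rest in all_readings(chain[1:]):
        readings.append([SINGLE_FREDS[chain[0]]] + rest)
    # A triplet Fred reads three molecules at once, if one matches here.
    if chain[:3] in TRIPLET_FREDS:
        for rest in all_readings(chain[3:]):
            readings.append([TRIPLET_FREDS[chain[:3]]] + rest)
    return readings


for reading in all_readings("GAUGC"):
    print("-".join(reading))
```

Even this five-molecule chain yields two different proteins: one with three old amino acids in the middle, and one with the single new amino acid in their place.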
There may have been clever ways for cells
to prevent that, but it's also possible
that the modern triplet-reading system evolved
from a completely different pathway. Let's
take a look at that now.
Independent Triplets
A while back, we mentioned the 'blueprint' chains,
which used a short length of RNA and complementary
base pairing to position several enzymes
into a supercatalyst. We also talked about
the use of RNA as a 'carrier' for
bringing in coenzymes and raw materials for
synthetic reactions, and positioning them
via complementary base pairing. And we mentioned
that the raw material carriers probably used
a fairly short RNA match sequence, since
they needed to 'pop' out of the
complementary pairing, once they had delivered
a molecule of raw material.
As we mentioned in the previous section,
a blueprint carrier molecule might have introduced
a new triplet-coded amino acid into the existing 'legacy' Fred/Fatcat
system. But it's also possible that
the blueprint system could have turned into
a triplet-based transcription system, completely
independently of Fred.
Let's take a look at how it would work.
First Triplet Transcription
The very first triplet blueprint might have
looked something like this:
1. A Fatcat protein matches to the beginning
of a blueprint chain. Some carrier molecules
link up to the remainder of the blueprint
chain, bringing in a series of amino acids.
2. Fatcat moves down the line of amino acids,
and assembles them into a short polypeptide.
Mission accomplished.
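As a rough sketch, those two steps might look like the following. The docking sequence `AAA`, the carrier table and the function name are invented for illustration, and complementary pairing is simplified to a direct lookup on the blueprint triplet.

```python
# Hypothetical carriers: each three-letter match sequence brings one cargo.
CARRIERS = {"GGU": "glycine", "GCU": "alanine", "GAU": "aspartate"}


def transcribe_blueprint(blueprint, start_site="AAA"):
    """Assemble a short polypeptide from the carriers lined up on a blueprint."""
    # Step 1: Fatcat must match the beginning of the blueprint chain.
    if not blueprint.startswith(start_site):
        return []
    coding_region = blueprint[len(start_site):]

    # Step 2: Fatcat moves down the line, one carrier (triplet) at a time.
    polypeptide = []
    for i in range(0, len(coding_region) - 2, 3):
        cargo = CARRIERS.get(coding_region[i:i + 3])
        if cargo is None:
            break  # no carrier matches this triplet, so assembly stops
        polypeptide.append(cargo)
    return polypeptide


print(transcribe_blueprint("AAAGGUGCU"))
```

Stopping at the first unmatched triplet is a design choice of this sketch, not something the story specifies.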
First Useful Triplet Proteins
The first polypeptide synthesized by the
new triplet system was probably very short-- it
may have been composed of just two or three
amino acids. That wouldn't have been
long enough to make an enzyme, but it might
have served as a coenzyme, or a messenger
molecule, or it may have been a component
in some longer, structural polymer.
As a temporary expedient, cells may have
also created short polypeptides with new
amino acids, and then added them to the regular
four-molecule proteins created by Fred and
Fatcat.
For example, cells probably already had carriers
for glycine, since it's very common,
and it has some useful properties. They would
have probably gained a huge advantage from
adding glycine to existing proteins since
a Gly-xx-Gly-xx-xx-Gly sequence is very good
at linking to the phosphate portion of an
ATP molecule. Glycine is a little too small
and simple to make a good ingredient for
a two-molecule or four-molecule protein chain,
but it is so common that it probably was
a part of early cell metabolism, and would
have been very beneficial to manage.
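For readers who want to hunt for that glycine pattern in a sequence, here is a minimal sketch. It assumes that each "xx" in the pattern stands for any single amino acid, and it uses one-letter amino acid codes (G for glycine). The example sequence is made up.

```python
import re

# Gly-x-Gly-x-x-Gly, wrapped in a lookahead so overlapping hits are found.
GLYCINE_LOOP = re.compile(r"(?=G.G..G)")


def find_glycine_loops(protein):
    """Return the 0-based start positions of each Gly-x-Gly-x-x-Gly match."""
    return [match.start() for match in GLYCINE_LOOP.finditer(protein)]


# Hypothetical sequence containing one glycine loop.
print(find_glycine_loops("MKLGAGSFGTVY"))
```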
Once the triplet system was building a few
useful short polypeptides, it could have
gradually started reading longer and longer
chains, and building longer proteins.
There would have been a period of overlap,
when Cassius was building some proteins from
the old four-molecule Fred/Fatcat system,
and some multiple-molecule proteins using
triplets. However, the triplet
system would have eventually taken over completely,
simply because it was able to build proteins
from a much larger range of amino acid components.
Amino Acid Repertoire
It's quite possible that 8-molecule
versions of Cassius were already using some
additional amino acids-- as coenzymes,
as a component in cell membranes, or for
some other minor function.
In that case, Cassius probably already had
carriers for those additional molecules.
So the first triplet-based protein synthesis
may have started right out with more than
four amino acids to build with.
After that, to add a new amino acid to its
repertoire, all Cassius really needed to
do was develop a carrier for that new molecule,
along with a different match sequence for
it, in the genetic chains.
Each new amino acid would provide a significant
advantage to the cells that could build it
into proteins, but some would have given
more of a boost than others. For example,
amino acids with aromatic rings would have
allowed proteins to manage electrons, which
would have expanded their use as enzymes.
The sulfur-containing amino acids would have
added new chemistry that was not present
in the early amino acids, nor in the nucleic
acids.
Because there were 64 possible permutations
of triplets, the new triplet system could
handle a much wider range of amino acids
than Fred ever could.
Then a cellular Cassius just had to wait
for a genetic chain to appear that created
a useful polypeptide from that new amino
acid, and one more molecule would be on its
way to becoming the next fad in protein-building.
Triplets and Fred
The new triplet system could have started
out creating just a few obscure, small polypeptides,
since the most basic life proteins were still
being created by Fred and Fatcat from the
four original amino acids.
As the triplet system increased the number
of amino acids it could handle, eventually
it would have started producing a wider and
wider range of useful proteins. At some point,
it could have completely replaced the Fred-based
transcription system. However, there was
no need for that to occur until the triplet
system had a full repertoire of proteins
that was sufficient to take over from Fred
and Roscoe.
Reading Frame
We were vague about the length of the match
sequences for the blueprint chains and carrier
molecules. There's no specific length
that is necessary for either task, and it
is possible that organisms developed protein-coding
systems that used different reading frame
lengths.
Organisms with a reading frame of four or
more would have been able to uniquely identify
a larger number of amino acid components,
but that would only help if there was a selective
advantage to using a larger choice of molecules,
when building proteins.
The wider reading frame organisms would have
had some mild disadvantages, as well. Their
genetic chains would have to be longer, consuming
more energy and raw materials than their
shorter-chained cousins. They also would
have had a more difficult time finding matches
for all possible permutations (there are
256 choices for a reading frame of four,
and 1024 choices for a reading frame of five).
On the other hand, a reading frame of two
molecules would have only allowed 16 different
amino acids (possibly fewer if some permutations
were duplicates, or used for other tasks).
Since the triplet reading frame is 'just
right', any organisms that used it
would have had an eventual selective advantage,
which would have established that system.
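The trade-off can be tabulated with a short sketch. It assumes four nucleotides and, purely as an example, a protein of 100 amino acids; the function name and the dictionary keys are our own.

```python
def frame_tradeoff(width, protein_length=100, alphabet_size=4):
    """Number of possible codes, and chain length needed, for one frame width."""
    return {
        "width": width,
        "codes": alphabet_size ** width,          # carriers that may be needed
        "chain_length": width * protein_length,   # nucleotides per protein
    }


for width in range(2, 6):
    row = frame_tradeoff(width)
    print(row["width"], row["codes"], row["chain_length"])
```

Each extra molecule in the reading frame multiplies the number of codes by four, while the genetic chain for the same protein grows only in proportion to the width.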
Modern Transcription
The new triplet-coding system was basically
the same as the modern transcription process.
Later generations would refine many of the
details of the process, but we're now
close enough to the final version that we
won't need any more creative molecular
evolution, at least not for protein transcription.
Presumably the introduction of the 'final
20' amino acids took quite a while,
and it's likely that there were many
alternative organisms that used different
sets of amino acids that eventually proved
inferior.
The triplet system would have had some interesting
times in its early days, since some RNA sequences
would not match with any carrier molecules.
But we won't try to follow the details
of that chemistry in this story.
Transfer RNA
Whether triplet coding for proteins began
as a gradual substitution or an independent
process, it still needed some sort of carrier
molecules to bring in each amino acid, and
add them to a new protein chain.
The carriers could have been proteins, but
because of the tight fit at the triplet end,
RNA chains would have worked much better.
That would have been easy to arrange, since
this was the peak of the 'RNA world',
and there were probably already plenty of
RNA-based carriers for other molecules, that
could have shifted into the role of amino
acid carriers.
So it seems likely that the new triplet system
would have started right out using some precursor
to tRNA, the modern RNA chains which still
carry amino acids as part of the transcription
process.
Whenever a new tRNA sequence introduced a
new amino acid, its Cassius would have either
prospered (if it created useful new proteins)
or died (if it interfered lethally with existing
metabolism). Over time, that means that organisms
using the 'best' amino acids
would have increased in number, along with
the tRNA genes that brought them.
Eventually, proteins would have bulked out
to their full complement of 20 amino acids,
with a full set of tRNA molecules to match
any triplet in the genes. However, that process
may have been extremely gradual, possibly
extending over many millions of years.
tRNA Components
It is interesting that tRNA uses several
non-standard nucleosides (such as pseudouridine
and inosine, whose base is hypoxanthine). Modern cells first
build their tRNA from normal RNA components,
and then modify some of the molecules later.
There is some chance that the alternate RNA
molecules represent vestiges of some earlier
alt-Caleb chains that were assimilated prior
to the final DNA/RNA system, or some 'RNA
world' chains that were absorbed from
some separate, more RNA-based organisms,
and then used to help make the transition
from Fred to modern ribosomes.
However, it seems more likely that the non-standard
nucleosides are a later modification that
allows some of the tRNA carriers to more
effectively carry their amino acid load.
The Protein World
The first Cassius was probably built from
just four amino acids-- and, most likely,
fairly simple amino acids at that. So it
had to rely heavily on its nucleic acids
to handle most of the chemistry in its enzymatic
actions.
However, once cells started to use triplets
to carry amino acids, then the chemical repertoire
of proteins could gradually expand, until
they took over much of the enzymatic chemistry
that was formerly handled by the aromatic
nucleotides.
Better Protein Chemistry
Once proteins included aromatic components
(histidine, phenylalanine, tryptophan and
tyrosine), they could start to take over
the 'electron management' and 'proton
management' that was necessary for
most enzymatic reactions, and that formerly
had been handled by the nucleic acids.
The addition of the two sulfur-containing
amino acids (cysteine and methionine) added
some extra chemical repertoire that wasn't
available in the nucleic acids. Cysteine
also provided better positioning and structural
possibilities, since it's capable of
forming strong cross-link 'bridges' between
different parts of the protein chain.
Adding proline and glycine made it easier
to 'design' proteins with a specific
structural shape. And the eight remaining
amino acids provided 'design flexibility' for
the evolution of new proteins, since they
had chemical properties that were similar
to existing amino acids, but in a slightly
larger or smaller size.
Protein Domination
In many cases, a purely protein enzyme could
have been more effective than either a ribozyme,
or a combination of a protein with a chain.
That would happen since it is easier for
a protein to precisely position each catalytic
molecule in an active group, thanks to the
extreme flexibility of the peptide chain.
So it seems likely that proteins would have
become increasingly dominant, as cells were
built from a larger and larger selection
of amino acids. Some ribozymes and helper
chains continued to exist, but the 'RNA
world' gradually was replaced with
the modern 'protein world'.
Amino Acid Evolution
As organisms added new amino acid components,
there would have been plain old Darwinian
selection between them and the older organisms
that were based on fewer components. There
may also have been competition between different
classes of cells that used different 'new' amino
acids.
It seems likely that some cell lines used
other compounds which simply didn't
provide a sufficient advantage, so they were
eventually eliminated.
At some unknown point in the early evolution
of life, cells fixed on the current twenty
amino acids that are used by all modern life
forms (with a few minor exceptions). And
modern, protein-based organisms took their
place in the primordial seas.
Farewell to Helper Chains?
Once proteins started to include amino acids
with aromatic side chains, it would have
been possible to create fully functional
enzymes without the use of RNA helper chains
to provide their chemical oomph.
As enzymes grew larger and more sophisticated,
it also would have stopped being so important
to use RNA chains to position multiple small
enzymes. Cells could use a plain old amino
acid chain to contain multiple active groups
and position them properly, rather than relying
on several short chains positioned with RNA.
Likewise, cells could have started to replace
ribozymes with proteins for many metabolic
functions-- though not completely. RNA
still remains as a carrier for amino acids
in protein synthesis (called tRNA), and it's
still used as an enzyme to this day in a
few special roles.
Although it's unlikely that intron-based
helper chains would have disappeared completely,
it's also probable that they became
less vital for cell functioning, and probably
declined in number.
Farewell to Roscoe
Eventually, of course, it was time for cells
to replace the 'alternating Roscoe' system
with a modern system of DNA replication and
RNA transcription, using complementary base
pairs.
Most likely that change happened in parallel
with the switch to triplet pairing, since
many of the same enzyme systems could have
worked for both transcription and replication.
However, there was no real pressure to make
the switch, and it's possible that
Roscoe still continued for eons later. Since
replication happens just one molecule at
a time, there was never a need for a dramatic
shift in the process, as happened with protein
transcription.