As cells evolved,
it's pretty certain that diseases and parasites evolved right
along with them. Many chapters back, we looked at how Nathaniel
and Serena invaded the very early Fred/Roscoe system. It's
very likely that they were just the
first of many attackers that took over a living system, had their
moment of triumph, and then gradually found themselves
assimilated.
Viruses and 'selfish genes' are one type of attacker
that works on the genetic chain itself, and they probably appeared
on the scene right from the beginning, even in the days of Clem and
Caleb. Some would have killed off the life forms in their neighborhood,
and some would have been a severe problem for many generations of
cells. But some of the attackers would have coevolved, and eventually
provided something useful for the emerging life forms.
One particular type of 'selfish gene' infection was much
more interesting than most: the transposons. They
probably started out doing no good to cells, but eventually became
a vital part of genetic expression.
Transposons, or 'jumping genes', are stretches of DNA
which can either duplicate or remove themselves from one place in
the genetic chain, and then insert themselves somewhere else. To
do that, they use a variety of methods, all of which require protein
enzymes (often transcribed from genes within the transposon itself).
Let's take a look at how transposons work, and then explore
a couple of different ways in which they would have helped early
cells.
Self Splicing 2.0
In some ways, transposons are similar to
introns, since they can remove themselves from the genetic chain,
and leave no trace behind. However, when transposons leave, it's
more permanent, since they 'pop' out of the DNA chain,
rather than the mRNA.
That exiting would result in a quick end
to the transposon's life, except that the transposon can also
pop back into the genome at some different location, with the help
of a transposase enzyme. In fact, many transposons can replicate
themselves, and pop back into the genome multiple times.
Transposons carry 'direct repeats' at either end of their
genetic sequence.
The transposase enzyme splits the complementary
pairs within each of the direct repeats,
and removes the transposon.
The transposase enzyme then binds the two
overlapping direct repeats together, so the
main genetic chain is left intact.
Plasmids
While outside the main DNA chain, the two
ends of the transposon can link up and form
a small loop of DNA. Because it has no free
ends, it's more durable than a loose
chain.
These short loops of DNA are called plasmids.
Reinsertion
Unlike introns, transposons also have the
ability to insert themselves into the main
DNA chain. They use a process that is exactly
the reverse of their removal.
First, the transposase carrying the transposon
links up with the direct repeat that it is
designed for. It breaks the complementary
pairing in the repeat region, and cuts the
main genetic chain.
Then the transposase enzyme inserts the transposon
into the gap. The transposon fills the space
seamlessly, since its 'loose ends' match
up with the direct repeating regions on the
main chain.
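As a rough illustration, the cut-and-paste cycle described above can be
modeled on plain strings. This is only a toy sketch, not real biochemistry:
the repeat sequence GATTACA, the sample genome, and the helper names
excise and insert are all made up for the example.

```python
REPEAT = "GATTACA"  # hypothetical direct repeat recognized by the transposase

def excise(genome: str, repeat: str = REPEAT):
    """Cut out the stretch between two copies of the repeat.

    Returns (remaining_genome, payload). One copy of the repeat stays
    behind, so the main chain is left intact.
    """
    start = genome.find(repeat)
    if start == -1:
        raise ValueError("no repeat found")
    end = genome.find(repeat, start + len(repeat))
    if end == -1:
        raise ValueError("no second repeat found")
    payload = genome[start + len(repeat):end]
    remaining = genome[:start + len(repeat)] + genome[end + len(repeat):]
    return remaining, payload

def insert(genome: str, payload: str, site: int, repeat: str = REPEAT) -> str:
    """Paste the payload (plus a second repeat) after the repeat at `site`."""
    if genome[site:site + len(repeat)] != repeat:
        raise ValueError("no repeat at that site")
    cut = site + len(repeat)
    return genome[:cut] + payload + repeat + genome[cut:]

genome = "AAAA" + REPEAT + "CCCGGG" + REPEAT + "TTTT"
parked, cargo = excise(genome)                         # transposon leaves
restored = insert(parked, cargo, parked.find(REPEAT))  # and pops back in
```

In this sketch, reinsertion at the same repeat rebuilds the starting
sequence, which is the sense in which insertion is the reverse of removal.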
Transposons probably started out as just
another bit of selfish DNA, attempting to
take advantage of the cell's chemical
machinery to replicate themselves. But their
insertion method provided a convenient 'handle' which
allowed some early cells to take advantage
of their presence, and they soon became valuable
as a 'gene shifter'.
Script Carriers
Instead of thinking of the transposon as
a parasite, let's make a radical shift,
and think of it as a convenient delivery
method for DNA.
We mentioned earlier that it wasn't
a good idea to carry long Foxy scripts within
introns, since that made the protein-coding
exons too far apart. But what if a clever
cell could store the script elsewhere, and
then insert it immediately before it was
needed?
Let's take a closer look at how a transposon
could do that, step by step:
- A cell prepares to create a Foxy
protein. As the DNA transcriptase
starts transcribing
the protein, it hits an intron,
which pops out of the RNA
chain.
- The intron triggers the production
of a transposase enzyme (or
perhaps it carries
a protein-coding stretch
of base pairs to actually
create the transposase from scratch).
- The transposase diffuses to a section
of DNA which contains the
transposon's
inverted repeats, surrounding
a script.
- The transposase cuts the DNA chain,
and removes the section of
DNA that contains the Foxy script. It
then
uses the direct
repeats to patch together
the remainder of the DNA
chain.
- The transposase diffuses back to
the Foxy gene, carrying the
script. It links up to
its match sequence within
a different intron that is
further down the gene.
- The transposase inserts the transposon
into the 'later' intron that
has not yet been converted to RNA
by the transcriptase.
- The transcriptase resumes transcribing
the DNA chain into RNA. When
it reaches the script-containing
intron, it makes an RNA
copy of the script. The intron
then pops out of the RNA
chain, and delivers the script
to Foxy.
- After the transcriptase has passed
the script-containing intron,
the transposase cuts the script sequence
out of that intron,
restoring the gene to its
original sequence.
- The transposase carries the transposon
back to the satellite area
of the genome. When it finds the matching
inverted repeat,
it cuts the satellite DNA,
and inserts the transposon. This could
happen at the original
location, or at some other
place in the DNA chain.
- Meanwhile, Foxy grabs the intron that
contains its script, and
starts to read it.
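The steps above can be sketched as a small Python function. It is a
deliberately simplified model under several assumptions: the satellite
DNA is a dictionary keyed by a made-up script name, the gene is a list of
segments with a placeholder marking the later intron, and transcription
is reduced to swapping T for U. None of these names come from real biology.

```python
def express_with_parked_script(satellite: dict, gene: list, script_id: str) -> str:
    """Fetch a script, deliver it, transcribe the gene, and park it again.

    `satellite` maps a hypothetical script name to its DNA; `gene` is a
    list of DNA segments in which "INTRON_SLOT" marks the later intron.
    """
    script = satellite.pop(script_id)       # transposase cuts the script out
    slot = gene.index("INTRON_SLOT")
    gene[slot] = script                     # script inserted into the later intron
    # Simplified transcription: copy the sequence, writing U in place of T.
    rna = "".join(gene).replace("T", "U")
    gene[slot] = "INTRON_SLOT"              # script cut back out of the gene
    satellite[script_id] = script           # and returned to satellite parking
    return rna

satellite = {"foxy_script": "TTAA"}
foxy_gene = ["ATG", "INTRON_SLOT", "GGC"]
rna = express_with_parked_script(satellite, foxy_gene, "foxy_script")
```

After the call, both the gene and the satellite store are back in their
starting state, while the returned RNA still carries a copy of the script.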
The Satellite Parking Lot
You might think of the transposon as a system
of valet parking.
When not needed, the script is stored in
a remote portion of the DNA, in a safe place
that doesn't interrupt any protein-coding
genes.
When it is needed, a transposon fetches the
script, and delivers it to an intron, right
where it's needed. It sits there just
long enough to be transcribed into mRNA,
and when done, it's back to satellite
parking.
By using a transposase protein to do the
shifting, cells can be fairly 'smart' about
the transport and delivery. That means that
the transposon action can be managed by the
same regulatory systems that guide the expression
of the primary gene.
Transposon IDs
There is one problem to consider with this
new system of 'parking' long
scripts in the satellite parking lot, and
that's finding the darned thing!
The first transposon that carried a script
solved this problem temporarily by using
its own match sequence to locate the script
(and also the place in the gene to deliver
it). As long as the genome wasn't too
huge, the transposase could read along the
DNA chain until it reached its own identifying 'marker' sequence,
and grab what it needed.
The first cells using satellite scripts could
have added a few transposases with different
sequences, and used them to fetch different
scripts. But they couldn't keep adding
more transposon scripts indefinitely.
The problem is that each transposon sequence
introduced the potential for conflicts with
other parts of the genome. If some other
random gene contained the same sequence used
by the transposon, then there might sometimes
be an accidental removal of some important
gene segment, probably with fatal results.
Cells could 'drift' the sequence
in any given gene to avoid conflict with
a couple of transposons, but if there were
too many separate transposon sequences, eventually
it would become impossible to code anything
in the real genes.
Clearly, cells needed a way to use only a
small number of different transposon match
sequences, and yet still be able to handle
the 'inventory control' of larger
numbers of scripts.
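To get a feel for the size of the conflict problem, here is a small
back-of-the-envelope sketch. It assumes a genome of random bases with all
four bases equally likely, which real genomes are not, so treat the
numbers as rough guides only. The function name and the sample genome
size are invented for the example.

```python
def expected_chance_matches(genome_length: int, match_length: int) -> float:
    """Expected number of accidental copies of one specific match sequence
    in a random genome (one strand, equal base frequencies assumed)."""
    positions = genome_length - match_length + 1
    return positions / 4 ** match_length

genome_length = 1_000_000  # hypothetical genome size, in base pairs
for match_length in (6, 8, 10, 12):
    print(match_length, expected_chance_matches(genome_length, match_length))
```

Each extra base in the match sequence cuts the expected number of
accidental hits by a factor of four, so short match sequences are the
ones most likely to collide with real genes.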
Script ID
Most likely, at some point some lucky cell
stumbled upon a slightly more clever transposase,
which worked with two different ID sequences.
Such a protein would have given its cell
an enormous selective advantage, since it
would have been a much better manager for
multiple scripts.
This new style of transposase would have
both its own match sequence (the direct repeats,
as usual), plus a separate ID sequence that
was unique to each specific script.
It wouldn't have been hard for cells
to develop this system, since they probably
already had some sort of ID marker sequence
at the beginning of each script, to make
it easier to deliver it properly to Foxy,
and so Foxy would know where to begin reading
it. So all the transposase needed to do was
to carry along a relatively short sequence
of DNA (perhaps 10 to 30 base pairs?) and
then use it to match up with the ID sequence
for a specific script. Of course, it was
also still matching its own match
sequence.
Once a transposase started to match both
its own direct repeat sequence and the script
ID, then it could also start to transport
more than one script. Because its total match
sequence was longer, it also had less risk
of accidentally matching with the same sequence
that was located in an incorrect portion
of the genome.
You might think of the combination of transposon
match sequence and 'script ID' sequence
as an address for the script. Modern computers
use 32-bit or 64-bit addresses to locate
their data in RAM or on a hard drive, and
you might say that DNA uses a similar system
to locate its own data. A total transposon-plus-script-ID
match sequence of perhaps 30 or 40 base pairs
would result in an 'address space' that
could handle plenty of scripts, without much
risk of conflicting with protein-coding gene
sequences.
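The computer comparison can be made concrete with a little arithmetic.
Each position in a DNA sequence holds one of four bases, which is two
bits of information. The sketch below just does that conversion; the
function name is made up for the example.

```python
def address_bits(base_pairs: int) -> int:
    """Bits of information in a sequence: 4 choices per base = 2 bits."""
    return 2 * base_pairs

for n in (16, 30, 40):
    print(n, "bp =", address_bits(n), "bits,", 4 ** n, "possible sequences")
```

By this count, a 16-base sequence carries the same information as a
32-bit address, and a match sequence of 30 to 40 base pairs corresponds
to 60 to 80 bits.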
Transposon Families
Having multiple scripts 'managed' by
each transposase protein meant that cells
could develop 'families' of scripts-- each
carried by the same transposase, but marked
with a different script ID. It probably would
have been easier for cells to develop new
scripting systems, if they could just take
an existing transposon system, and then simply
change the ID of the script it delivered.
There may have been other benefits to having
a group of scripts all managed by the same
transposase. For example, a mutation in the
script ID in a gene might switch it to a
different script, which might create an interesting
new mutation with some sort of survival value.
Transposon Stacking
When a transposon returns its script to remote
parking, you might notice that it doesn't
necessarily return it to the same exact place.
There's no real reason it would have
to, since it can locate the DNA from anywhere
on the genome, thanks to its combination
of transposon marker sequence and script
ID.
By coincidence, the transposon would probably
tend to store groups of scripts together.
That's because each script would have
one copy of the transposon marker at each
end, increasing the odds that a transposon
would add yet another script in the same
neighborhood.
It makes for an orderly system, not unlike
a library-- with similar bits of genetic
material grouped together, automatically.
Script ID Saturation
Earlier, we talked about how protein-coding
triplets use up all of the 64 possible combinations
of bases. That's very advantageous,
from an evolutionary point of view, since
it means that almost any mutation will result in
some amino acid fitting into that space.
Apart from the three 'stop' triplets, there are
no 'dud' permutations that might halt the
process for lack of a suitable match.
It's possible that script IDs
are also 'saturated' within a
particular transposon family, using up all
possible values in the script ID portion
of the sequence. That would not be hard to
arrange if the script ID sequence is relatively
short.
Having some script matching each possible
sequence means that a Foxy gene with a mutation
in its script ID would still fetch some sort
of script. It might be dramatically different
from the original script, but at least the
Foxy protein would be able to create something.
Over enough generations, it's possible
that each scripted gene would link up with
a transposon whose 'family' of
scripts would all be relatively reasonable
to use, for that function.
In that case, mutations in the script ID
would rarely be lethal, and would more often
result in an improvement for the organism.
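Here is a minimal sketch of what 'saturation' would mean in practice.
It assumes a script ID only three bases long, so that all 64 possible IDs
can be listed, and it uses placeholder script names. The point is only
that a lookup with a mutated ID can never come up empty.

```python
from itertools import product
import random

BASES = "ACGT"
ID_LENGTH = 3  # hypothetical script ID length, kept short for the example

# A saturated family: every possible ID points at some (placeholder) script.
family = {
    "".join(p): f"script_{i}"
    for i, p in enumerate(product(BASES, repeat=ID_LENGTH))
}

def mutate(script_id: str, rng: random.Random) -> str:
    """Replace one base of the ID with a different base."""
    pos = rng.randrange(len(script_id))
    new_base = rng.choice([b for b in BASES if b != script_id[pos]])
    return script_id[:pos] + new_base + script_id[pos + 1:]

rng = random.Random(0)
mutant = mutate("ACG", rng)
print(mutant, "->", family[mutant])  # the lookup always finds a script
```

With a longer ID the table grows by a factor of four per base, which is
why saturation is only plausible when the ID is short.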
From Intron to Transposon
We've just described a rather complex
and well-regulated set of steps that uses
transposons to deliver scripts when needed.
Of course this system is way too complicated
to have evolved in one swell foop, so it
must have developed in some more gradual
way.
Introns were the natural storage area for
the first scripts, and as scripts grew in
size, cells would have still kept them there
for quite a while, despite the disadvantages.
At the same time, transposons were probably
lurking in the genome already, as 'selfish
DNA' that merely propagated itself.
So it's possible that they got together
via a process something like this:
1. A long intron-based script accidentally
picks up the inverted repeats that made it
a target for a transposase-- either
as an accidental mutation of base pairs,
or by crossing over.
2. The transposase pops the script in and
out of its 'home intron' randomly.
3. When the Foxy gene is expressed, sometimes
it has the script handy, and sometimes it
doesn't. When it doesn't, Foxy
does nothing, and the cell regulators keep
trying to express the Foxy gene. Eventually
the script pops back into the intron, and
Foxy works correctly.
4. The cells with this new system survive
slightly better than the original version,
since having the exons closer together is
enough of an advantage, to offset the nuisance
of having an unreliable script.
5. Later generations of the cell add better
management to the transposon delivery process,
and definitely turn it into an evolutionary
advantage.
Long Term Scripts
So far we've talked about transposons
that splice a script into an intron, so it
will be transcribed into RNA and used as
a script. That system could have evolved
easily from the first parasitic transposon.
Unfortunately, there were disadvantages to
using RNA as a script, particularly for longer
ones that might take a while to read. For
one thing, such a long RNA chain could easily
get tangled up with itself, or with other
RNA chains. Even worse, it might be digested
prematurely as part of the cell's general
tidying up of unused mRNA.
Fortunately, there was a convenient solution
available. Since DNA is so much more stable,
it would have been a much better storage
device for longer scripts. The double helix
doesn't tangle with other chains, and
it's much less likely to be destroyed
as part of the cell's maintenance of
excess RNA.
So we can imagine the appearance of a slightly
different scripting system that might go
something like this:
- A cell prepares to create a Foxy
protein. As the DNA transcriptase starts
transcribing
the protein, it hits an intron, which
pops out of the RNA chain.
- The intron carries an ID match
sequence, plus a trigger
for a transposase enzyme.
The intron and the transposase
link up, and diffuse to a
section of DNA which has both the
transposon's inverted repeats,
and the correct match sequence from the
intron.
Meanwhile, the transcriptase
stops its action temporarily.
- The transposase cuts the DNA chain,
and removes a section of
DNA that contains the
Foxy script.
- The transposase acts as a carrier
for the transposon, and diffuses
back to the
Foxy gene.
- Rather than inserting the script
into an intron, the transposase
binds the script's
inverted repeats to each
other, forming a plasmid
(a small loop of DNA). It binds the
plasmid to the Foxy protein.
- Foxy gradually reads the DNA chain.
- When it's finished, the transposase
carries the script back to
a stretch of satellite DNA with the right
marker sequence, and parks
the script.
Retrotransposons
Unfortunately, there is a big disadvantage
to the use of DNA chains as scripts. Because
the script spends so much time disconnected
from the main genetic chain, there is always
a chance that it will be lost, somehow. For
example, if the cell divides while the script
is still in use, the script DNA will not
be replicated by mitosis, so one of the daughter
cells will not get a copy of it.
Fortunately, there's a relatively easy
way to solve that problem. Instead of removing
the script from the genome, a cell could
simply duplicate it, and then work with a
copy of the script (inside its 'carrier' transposon).
Retrotransposons are a variety of transposon
that accomplishes just that. They start by
making a regular RNA transcription from the
DNA chain. Then they use a reverse transcriptase
enzyme to make a second DNA copy from that
RNA chain.
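The copy-and-paste idea can be sketched in a few lines of Python. This
is a toy model with invented function names: it pairs each base with its
complement, ignores strand direction entirely, and treats the genome as
a plain string.

```python
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}
RNA_TO_DNA = {"A": "T", "U": "A", "C": "G", "G": "C"}

def transcribe(dna: str) -> str:
    """Make the complementary RNA chain from a stretch of DNA."""
    return "".join(DNA_TO_RNA[base] for base in dna)

def reverse_transcribe(rna: str) -> str:
    """Make the complementary DNA chain from an RNA chain."""
    return "".join(RNA_TO_DNA[base] for base in rna)

def copy_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Copy genome[start:end] via RNA and insert the copy at `target`.

    The original stretch is never removed from the genome.
    """
    rna = transcribe(genome[start:end])
    dna_copy = reverse_transcribe(rna)
    return genome[:target] + dna_copy + genome[target:]

genome = "AAACCGTTT"
bigger = copy_and_paste(genome, 3, 6, len(genome))
```

Unlike the cut-and-paste sketch of an ordinary transposon, the genome
only ever grows here, since the original stays in place.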
Retrotransposons almost certainly started
out as a virus or as 'selfish' DNA,
but it would have been relatively easy for
the cell to subvert their self-propagating
system, and use it as a delivery system for
scripts, instead.
Here's how it might work:
- A cell prepares to create a Foxy protein.
As the DNA transcriptase starts transcribing
the protein, it hits an intron, which
pops out of the RNA chain.
- The intron carries an ID match
sequence, plus a trigger
for a transposase enzyme.
The intron and the transposase
link up, and diffuse to a
section of DNA which has both
the transposon's match sequence,
and the correct match sequence from the
intron.
- The transposase copies a section
of DNA that contains the
Foxy script into RNA, and
then makes a DNA copy from
the RNA.
- The transposase acts as a carrier
for the DNA sequence, and
diffuses back to the
Foxy gene.
- Rather than inserting the script
into an intron, the transposase
binds the script's
inverted repeats to each
other, forming a plasmid (a small loop
of DNA). It binds the
plasmid to the Foxy protein.
- Foxy gradually reads the DNA chain.
- When Foxy is finished, the DNA
chain is released, and
destroyed.
Controlled Mutations
Unfortunately, there was one big problem
with using transposons. There was no guaranteed
way that a transposon could always return
to the same location in the genome, since
the only thing it leaves behind is its repeat
sequence. That means that a transposon will
pretty much just insert itself at any instance
of that sequence. The problem is even worse
for retrotransposons, since they make a second
copy, which might insert itself at some random,
new place.
If a regular, protein-coding gene happens
to have the same sequence as the transposon's
match, the transposase might very well end
up inserting a hunk of new DNA there. That
means that some important protein will suddenly
have an extra chunk of arbitrary amino acids
in some random place. The odds are very good
that this will not be an improvement.
Even worse, if the transposon's length
is not a multiple of 3, there will
be a frame shift. When the cell reads the
triplets that are further downstream from
the transposon, everythin gwil lb eshifted
by one or two places in the triplet sequence, and
the triplet for every single amino acid will
be read incorrectly.
The end result is almost certain to be a
non-functional protein, and a very unhappy
and possibly dead cell.
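A frame shift is easy to see with a short example. The sketch below uses
an invented gene sequence and two invented inserts, one whose length is a
multiple of 3 and one whose length is not, and splits each result into
triplets.

```python
def codons(seq: str) -> list:
    """Split a sequence into complete triplets, dropping leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

gene = "ATGGCCAAATTTGGG"                    # hypothetical gene, 5 triplets
in_frame = gene[:6] + "CCCCCC" + gene[6:]   # insert of 6 bases (multiple of 3)
shifted = gene[:6] + "CCCC" + gene[6:]      # insert of 4 bases (not a multiple)

print(codons(gene))
print(codons(in_frame))   # downstream triplets are unchanged
print(codons(shifted))    # every downstream triplet is different
```

With the 6-base insert, the triplets after the insertion point read
exactly as before. With the 4-base insert, none of them do.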
Beneficial Shutoffs
Fortunately, disabling a gene is not always
a bad thing. Sometimes, there is an advantage
to the cell when a transposon jumps in and
temporarily zaps a gene.
For example, consider the first transposons,
as discovered by Barbara McClintock back
in the 1940s. While the kernels (seeds)
of maize (Indian corn) are forming, transposons
sometimes 'jump' into the middle
of a pigment-producing gene, causing it to
malfunction and create a section of the seed
coat that has a different color. The result
is a random, variegated pattern on the seed.
Having a mottled color probably made it more
difficult for birds to see the seeds, back
in some earlier parent of maize, which would
have given those plants a definite survival
advantage.
You might say that transposons offer a 'randomizer',
for any genes that would be more beneficial,
if applied irregularly.
Controlled Disabling
There are 64 possible permutations of nucleotide
triplets, and only 20 different amino acids.
Three of the triplets code for a 'stop' codon,
but that still leaves an average of about
three triplets coding for each amino acid.
That means that a cell can modify a gene
sequence slightly, and create the same exact
protein sequence.
As cells mutated, natural selection would
have gradually created cells with the 'just
right' amount of transposon action
in their genes.
If it was advantageous to randomly disable
a particular gene, cells that included a
transposon match sequence within the protein-coding
portion of the gene could count on some random
transposons that would link to the sequence,
insert some additional DNA, and probably
disable the gene.
If the gene was vital, the cell simply needed
to avoid the match sequences for transposons,
within its genetic sequence, and it would
never suffer transposon attack.
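Here is a small sketch of how a vital gene could dodge a transposon
match sequence without changing its protein. It uses a partial codon
table (three amino acids from the standard genetic code), and both the
sample gene and the match sequence CGGC are invented for the example.

```python
from itertools import product

# Partial standard codon table, just enough for the example.
SYNONYMS = {
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Leu": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
}

def recode_without(protein: list, match: str):
    """Return DNA coding for `protein` that does not contain `match`,
    or None if no combination of synonymous triplets avoids it."""
    for choice in product(*(SYNONYMS[aa] for aa in protein)):
        dna = "".join(choice)
        if match not in dna:
            return dna
    return None

protein = ["Ala", "Gly", "Leu"]
original = "GCCGGCCTG"     # codes for Ala-Gly-Leu, but contains CGGC
match_sequence = "CGGC"    # hypothetical transposon match sequence
safe = recode_without(protein, match_sequence)
```

The brute-force search is only practical for tiny examples like this
one. The point is that the spare triplets give a gene room to spell the
same protein in more than one way.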
Protein Changes
So far we have only talked about transposons
that deliver scripts, and those that deliver
something useless that disables a gene. However,
transposons could also move sequences of
nucleotides that coded for the amino acids
in a plain old protein. It would have worked via
the same processes that we've already
described, just with a different 'texture' to
the DNA carried in the transposon.
Chameleon Proteins
For example, a cell that occasionally changed
the proteins on the exterior of its cell
membrane might avoid diseases or predators
that would otherwise cue in on its chemical
scent. The cell could include a static, stable
sequence of DNA that built the main part
of the protein, and then randomly change
another part of the protein via transposons.
If the change happened quickly and frequently,
the cell could have a mixture of several
different proteins on the membrane, all built
from the same primary gene.
If the change was rarer, then each cell might
have just one protein, but different generations
of the cell might be different from each
other.
During this stage of life, cells already
had well developed mechanisms for controlling
gene expression. That means that they could
have controlled the frequency of protein
changes, simply by regulating the number
of transposase enzymes that were produced.
Cells probably would have used different
match sequences and transposases for different
functions-- since the protein-coding
fragments would have a different 'texture' than
scripts, and it was unlikely to be beneficial
to insert a protein sequence into a script,
or vice-versa.
Mix & Match Enzymes
When a cell developed a useful protein sequence
that added a new function, transposons would
have been a great way to expand that innovation
into other cell proteins. All it would take
is the right marker sequence, and a retrotransposon
would duplicate the DNA coding for a useful
enzyme segment, and then insert it into other
genes.
That would have been a great way for cells
to 'design' new enzymes, by adding
functional groups from other enzymes into
a different base protein that might give
it an entirely different function.
It could also work for structural proteins.
Script Mutations
We've already mentioned replication
slippage as a way for cells to gradually 'fine
tune' the size and shape of cell structures
that were formed via a script. New cells
might have slightly longer or shorter scripts,
which would result in a slightly larger or
smaller structure. Then natural selection
would favor whichever size was optimal, and
the species would gradually drift into relative
perfection.
Replication slippage was fine for small changes,
but there would also have been some advantage
to cells that would sometimes make larger
changes to scripts. Even though it was more
likely to prove lethal to a cell, it might
also result in a larger-scale innovation
that wouldn't have been possible to
reach via replication slippage.
Transposons could have served as a 'scrambler' of
scripts to keep up the pace of evolutionary
change in the safest possible way.
Moving a copy of an existing script into
a different script would have an important
advantage over random mutations or insertions-- it
would be more likely to introduce some sort
of 'useful' pattern that was
already used for specifying some other beneficial
structure.
What makes a 'useful' pattern?
Well, simple straight repetitions (AAAAA)
or alternating structures (ATATATAT) would
frequently create a useful pattern, but there
might be other combinations that would also
increase survival value in a particular type
of structure.
In a sense, plain old natural selection chooses
the 'useful' patterns, whatever
they happen to be-- and then retrotransposons
occasionally duplicate them and take them
somewhere else, in the hope that the same
pattern will be useful in an entirely different
structure.
Script ID
Once cells started using transposons to
deliver scripts, they had another potential
way to mutate, and change a script. That
would be if the ID sequence used to retrieve
the script changed, and resulted in a transposon
that brought in an entirely different script,
instead.
It would be a very drastic change-- probably
lethal, but occasionally something different
and better.
Satellite Tricks
Satellite scripts offer some advantages over
intron scripts, since they are independently
accessible from more than one gene.
Multiple Scripts, One ID
For example, there might be more than one
script in the satellite DNA that would be
marked with the same ID number. If each of
the scripts were part of a different operon,
repressors that worked with some other genes
could turn on different scripts at different
times. That would allow a Foxy protein to
have different actions at different times
in the life cycle of the organism, or when
there were different metabolic conditions.
There could also be multiple scripts with
the same ID that would both be on at the
same time. In that case, sometimes one script
would be used and sometimes the other, more
or less randomly. That might be advantageous
in some cases-- for example, having
random positioning of surface molecules might
help a protozoan to be less visible to predators
or parasites.
One Script, Multiple Foxys
It's also possible that more than one
Foxy protein would use the same script, simply
by referencing the same 'script ID' sequence.
Expressing two different functions from one
script would be a powerful way to link two
different evolutionary items that need to
be linked-- for example, the pattern
of surface proteins on a cell, and a matching
recognition site that would allow cells to
recognize others of their species.
Such genetic linkages would not be easy to
set up, but once they were in place, a species
could diverge much more quickly, since changes
would happen simultaneously in two complementary
structures.
Transposons and Gene Headers
Since the movement of transposons is controlled
by the transposase protein, cells had the
potential to tie them in with the regular
machinery of gene control.
For example, a transposase might carry a
short RNA 'gene ID' match sequence
that would first link it to a specific gene.
The transposase protein would then be ideally
positioned to physically deliver the transposon
to the correct part of the gene, where it
would match up with its transposon match
sequence, and then insert into the proper
place within the proper gene.
The cell could manage the chemical activity
of the transposase, exactly the same way
as it managed other regulatory proteins.
For example, it might release the transposon
at exactly the right time during protein
transcription, and then retrieve it when
transcription was finished.
Mutation Regulation
It's also possible that transposase
proteins interacted with gene headers as
part of the process of 'controlled
mutations'.
For example, an organism could mark some
genes for frequent mutation by using a sequence
of base pairs in the gene header that a transposase
would consider 'attractive'.
Likewise, it could mark other genes with
a different sequence that would prevent transposons
from changing them.
Transposons could also increase the chances
of beneficial mutations by simply incrementing
some kind of 'counter' within
each gene that they change (perhaps an intron
with a repeating sequence that acted as a
counter). Over time, genes that had frequent
beneficial mutations would show a high count,
while sensitive genes that couldn't
change successfully would show a zero count
(since any organisms with changes there would
die). Transposons could then weight their
changing action, based on the counts of past
successes for each gene.
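The 'counter' idea amounts to a weighted random choice. The sketch
below is purely speculative, like the mechanism itself: the gene names
and counts are invented, and the weighting rule (chance proportional to
the count) is just one simple possibility.

```python
import random

def pick_gene_to_change(success_counts: dict, rng: random.Random) -> str:
    """Pick a gene with probability proportional to its past-success count.

    Genes with a zero count are never picked.
    """
    genes = list(success_counts)
    weights = [success_counts[g] for g in genes]
    if sum(weights) == 0:
        raise ValueError("no gene has a nonzero count")
    return rng.choices(genes, weights=weights, k=1)[0]

# Hypothetical counters, one per gene.
success_counts = {"surface_protein": 12, "pigment": 3, "ribosome_core": 0}
rng = random.Random(1)
picks = [pick_gene_to_change(success_counts, rng) for _ in range(200)]
```

In this model the sensitive gene with a zero count is left alone, while
the gene with the highest count absorbs most of the changes.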
Of course, having transposases interacting
with gene headers is a much more complex
system than just relying on the transposon
match sequences that we talked about earlier.
Cells may never have developed such
a system, or it may not have appeared until
much later. However, fitting mutability into
the gene header system would certainly have
made it easier for early organisms to evolve.
Perhaps we'll know more about some
of the details of 'controlled mutations',
as we learn more about the chemistry of transposons,
and the transposase proteins that manage
them.