Evolution of DNA


Introduction
First Protein Transcription
First Genetic Replication
First Feedback
Puddle Evolution
First Dispersal & Evolution
First Parasite
First Organism
First Cell Metabolism
First Self-Sufficiency
Aromatic Assistants
First Assimilation
First Transfer Molecules
Eight Molecule Life
Complementary Base Pairs
Energy Sources
Conquering the Oceans
First Cells
Cellular Explosion
Gene Regulation
Chromosomes
First DNA
Introns
Wider Reading Frames
Complementary Triplets
Cellular Scripts
The Spread of Foxy
Another Parasite-- Transposons
First Schism
Improved Gene Regulation
Cell Structures
Eukaryote Explosion
Multi-Cellular Scripts
Cambrian Explosion
Epilog
Appendix 1-- Prebiotic Earth
Appendix 2-- Primordial Puddles
Appendix 3-- Primordial Catalysts
Appendix 4-- C Value Enigma
Cast of Characters

Transposons

As cells evolved, it's pretty certain that diseases and parasites evolved right along with them. Many chapters back, we looked at how Nathaniel and Serena invaded the very early Fred/Roscoe system. It's very likely that they were just the first of many attackers that took over a living system, had their moment of triumph, and then gradually found themselves assimilated.

Viruses and 'selfish genes' are one type of attacker that works on the genetic chain itself, and they probably appeared on the scene right from the beginning, even in the days of Clem and Caleb. Some would have killed off the life forms in their neighborhood, and some would have been a severe problem for many generations of cells. But some of the attackers would have coevolved, and eventually provided something useful for the emerging life forms.

One type of 'selfish gene' infection was much more interesting than most: the transposons. They probably started out doing cells no good at all, but eventually became an extremely vital part of genetic expression.

Transposons, or 'jumping genes', are stretches of DNA which can either duplicate or remove themselves from one place in the genetic chain, and then insert themselves somewhere else. To do that, they use a variety of methods, all of which require protein enzymes (often transcribed from genes within the transposon itself).

Let's take a look at how transposons work, and then explore a couple of different ways in which they would have helped early cells.

Self Splicing 2.0

In some ways, transposons are similar to introns, since they can remove themselves from the genetic chain, and leave no trace behind. However, when transposons leave, it's more permanent, since they 'pop' out of the DNA chain, rather than the mRNA.

Popping out of the genome would quickly end the transposon's life, except that the transposon can also pop back into the genome at some different location, with the help of a transposase enzyme. In fact, many transposons can replicate themselves, and pop back into the genome multiple times.

Transposons carry 'direct repeats' at either end of their genetic sequence.

The transposase enzyme splits the complementary pairs within each of the direct repeats, and removes the transposon.

The transposase enzyme then binds the two overlapping direct repeats together, so the main genetic chain is left intact.
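If we treat DNA as a plain string of letters, the cut-and-join step can be sketched in a few lines of Python. This is a toy model under our own assumptions, not real chemistry: the function name `excise`, the sample sequences, and the rule that joining the two repeats leaves exactly one copy of the repeat behind are all illustrative choices.

```python
from typing import Tuple


def excise(genome: str, repeat: str) -> Tuple[str, str]:
    """Toy model of a transposase cutting a transposon out of the chain.

    Assumes the transposon sits between the first two copies of `repeat`,
    and that `repeat` does not occur inside the transposon itself.
    Returns (genome_after_excision, transposon_body).
    """
    left = genome.find(repeat)
    if left == -1:
        raise ValueError("no direct repeat found")
    body_start = left + len(repeat)
    right = genome.find(repeat, body_start)
    if right == -1:
        raise ValueError("no second direct repeat found")
    body = genome[body_start:right]
    # Joining the two overlapping repeats leaves a single copy in the chain,
    # so the left and right flanks stay connected.
    remaining = genome[:left] + repeat + genome[right + len(repeat):]
    return remaining, body


if __name__ == "__main__":
    genome = "AAAA" + "TTAC" + "GCGCGC" + "TTAC" + "CCCC"
    print(excise(genome, "TTAC"))  # ('AAAATTACCCCC', 'GCGCGC')
```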

Plasmids

While outside the main DNA chain, the two ends of the transposon can link up and form a small loop of DNA. Because it has no free ends, it's more durable than a loose chain.

These short loops of DNA are called plasmids.

Reinsertion

Unlike introns, transposons also have the ability to insert themselves into the main DNA chain. They use a process that is exactly the reverse of their removal.

First, the transposase carrying the transposon links up with the direct repeat that it is designed for. It breaks the complementary pairing in the repeat region, and cuts the main genetic chain.

Then the transposase enzyme inserts the transposon into the gap. The transposon fills the space seamlessly, since its 'loose ends' match up with the direct repeating regions on the main chain.
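In the same toy string model, reinsertion really is the reverse of removal: find a copy of the repeat, open the chain there, and slip the transposon in so that the repeat ends up on both sides. The function name `insert_transposon` and the choice of the first matching site are our own assumptions for illustration.

```python
def insert_transposon(genome: str, repeat: str, body: str) -> str:
    """Toy model of reinsertion, the reverse of excision.

    The transposase finds a copy of `repeat`, cuts the chain there, and
    inserts the body; the repeat then appears on both sides of it.
    This sketch uses the first matching site; a real transposase could
    land on any copy of the repeat.
    """
    site = genome.find(repeat)
    if site == -1:
        return genome  # no matching site, so nothing happens
    cut = site + len(repeat)
    return genome[:cut] + body + repeat + genome[cut:]


if __name__ == "__main__":
    print(insert_transposon("AAAATTACCCCC", "TTAC", "GCGCGC"))
    # AAAATTACGCGCGCTTACCCCC
```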

Transposons probably started out as just another bit of selfish DNA, attempting to take advantage of the cell's chemical machinery to replicate themselves. But their insertion method provided a convenient 'handle' which allowed some early cells to take advantage of their presence, and they soon became valuable as a 'gene shifter'.

Script Carriers

Instead of thinking of the transposon as a parasite, let's make a radical shift, and think of it as a convenient delivery method for DNA.

We mentioned earlier that it wasn't a good idea to carry long Foxy scripts within introns, since that made the protein-coding exons too far apart. But what if a clever cell could store the script elsewhere, and then insert it immediately before it was needed?

Let's take a closer look at how a transposon could do that, step by step:

  1. A cell prepares to create a Foxy protein. As the DNA transcriptase starts transcribing the gene, it hits an intron, which pops out of the RNA chain.
  2. The intron triggers the production of a transposase enzyme (or perhaps it carries a protein-coding stretch of base pairs to actually create the transposase from scratch).
  3. The transposase diffuses to a section of DNA which contains the transposon's inverted repeats, surrounding a script.
  4. The transposase cuts the DNA chain, and removes the section of DNA that contains the Foxy script. It then uses the direct repeats to patch together the remainder of the DNA chain.
  5. The transposase diffuses back to the Foxy gene, carrying the script. It links up to its match sequence within a different intron that is further down the gene.
  6. The transposase inserts the transposon into the 'later' intron that has not yet been converted to RNA by the transcriptase.
  7. The transcriptase resumes transcribing the DNA chain into RNA. When it reaches the script-containing intron, it makes an RNA copy of the script. The intron then pops out of the RNA chain, and delivers the script to Foxy.
  8. After the transcriptase has passed the script-containing intron, the transposase cuts the script sequence out of that intron, restoring the gene to its original sequence.
  9. The transposase carries the transposon back to the satellite area of the genome. When it finds the matching inverted repeat, it cuts the satellite DNA, and inserts the transposon. This could happen at the original location, or at some other place in the DNA chain.
  10. Meanwhile, Foxy grabs the intron that contains its script, and starts to read it.
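To make the ten steps concrete, here is a toy walk-through in Python. Everything in it is an assumption made for illustration: the gene is a list of ("exon" or "intron", sequence) segments, the satellite parking lot is a dictionary keyed by a transposon marker, and the name `express_foxy` is ours. It skips the chemistry entirely and just tracks where the script is at each step.

```python
from typing import Dict, List, Optional, Tuple

Segment = Tuple[str, str]  # ("exon" or "intron", sequence)


def express_foxy(gene: List[Segment], satellite: Dict[str, str], marker: str
                 ) -> Tuple[str, Optional[str], List[Segment], Dict[str, str]]:
    """Toy walk-through of steps 1-10 for one round of Foxy expression.

    Returns (mrna, script_read_by_foxy, gene_afterwards, satellite_afterwards).
    """
    gene = list(gene)          # work on copies so the inputs stay untouched
    parking = dict(satellite)  # the satellite "parking lot"
    mrna: List[str] = []
    script: Optional[str] = None      # script currently out of parking
    loaded_at: Optional[int] = None   # index of the intron holding the script
    original_intron = ""
    foxy_script: Optional[str] = None

    for i, (kind, seq) in enumerate(gene):
        if kind == "exon":
            mrna.append(seq)  # exons end up in the mRNA
            continue
        # Every intron pops out of the RNA chain (step 1).
        if i == loaded_at:
            # Step 7: this intron's RNA copy carries the script to Foxy (step 10).
            foxy_script = script
            # Steps 8-9: cut the script back out and return it to parking.
            gene[i] = ("intron", original_intron)
            parking[marker] = script
        elif script is None and marker in parking:
            # Steps 2-6: the first intron sets off the transposase, which
            # fetches the script and inserts it into the next intron downstream.
            later = next((j for j in range(i + 1, len(gene))
                          if gene[j][0] == "intron"), None)
            if later is not None:
                script = parking.pop(marker)
                original_intron = gene[later][1]
                gene[later] = ("intron", original_intron + script)
                loaded_at = later
    return "".join(mrna), foxy_script, gene, parking
```

Note that the gene and the parking lot come out exactly as they went in; only Foxy's copy of the script is new, which is the whole point of the valet system.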

The Satellite Parking Lot

You might think of the transposon as a system of valet parking.

When not needed, the script is stored in a remote portion of the DNA, in a safe place that doesn't interrupt any protein-coding genes.

When it is needed, a transposon fetches the script, and delivers it to an intron, right where it's needed. It sits there just long enough to be transcribed into mRNA, and when done, it's back to satellite parking.

By using a transposase protein to do the shifting, cells can be fairly 'smart' about the transport and delivery. That means that the transposon action can be managed by the same regulatory systems that guide the expression of the primary gene.

Transposon IDs

There is one problem to consider with this new system of 'parking' long scripts in the satellite parking lot, and that's finding the darned thing!

The first transposon that carried a script solved this problem temporarily by using its own match sequence to locate the script (and also the place in the gene to deliver it). As long as the genome wasn't too huge, the transposase could read along the DNA chain until it reached its own identifying 'marker' sequence, and grab what it needed.

The first cells using satellite scripts could have added new transposases with different sequences, and used them to fetch different scripts. But they couldn't keep adding more transposon scripts indefinitely.

The problem is that each transposon sequence introduced the potential for conflicts with other parts of the genome. If some other random gene contained the same sequence used by the transposon, then there might sometimes be an accidental removal of some important gene segment, probably with fatal results.

Cells could 'drift' the sequence in any given gene to avoid conflict with a couple of transposons, but if there were too many separate transposon sequences, eventually it would become impossible to code anything in the real genes.
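To get a feel for the size of the conflict problem, here is a back-of-the-envelope sketch. It assumes, purely for illustration, a random genome in which all four bases are equally likely and independent, and it counts one strand only; real genomes are far from random, so treat the numbers as rough. The function name is ours.

```python
def expected_chance_matches(genome_length: int, match_length: int,
                            markers: int = 1) -> float:
    """Rough expected number of accidental matches in a random genome.

    Assumes every base is A, C, G or T with equal probability, independently
    of its neighbors, and counts one strand only. A specific sequence of
    length k then turns up at a given position with probability 1 / 4**k,
    and there are genome_length - k + 1 positions to check. Each extra
    marker sequence adds its own expected matches.
    """
    positions = max(genome_length - match_length + 1, 0)
    return markers * positions / 4 ** match_length
```

For a million-base genome, an 8-base marker is expected to turn up by accident about 15 times, while a 12-base marker turns up only about 0.06 times. But the total grows with every new marker: fifty different 12-base markers already add up to about 3 accidental matches, which is the squeeze the paragraph above describes.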

Clearly, cells needed a way to use only a small number of different transposon match sequences, and yet still be able to handle the 'inventory control' of larger numbers of scripts.

Script ID

Most likely, at some point some lucky cell stumbled upon a slightly more clever transposase, which worked with two different ID sequences. Such a protein would have given its cell an enormous selective advantage, since it would have been a much better manager for multiple scripts.

This new style of transposase would have both its own match sequence (the direct repeats, as usual), plus a separate ID sequence that was unique to each specific script.

It wouldn't have been hard for cells to develop this system, since they probably already had some sort of ID marker sequence at the beginning of each script, to make it easier to deliver it properly to Foxy, and so Foxy would know where to begin reading it. So all the transposase needed to do was to carry along a relatively short sequence of DNA (perhaps 10 to 30 base pairs?) and then use it to match up with the ID sequence for a specific script. Of course, it still had to match its own transposon sequence as well.

Once a transposase started to match both its own direct repeat sequence and the script ID, then it could also start to transport more than one script. Because its total match sequence was longer, it also had less risk of accidentally matching with the same sequence that was located in an incorrect portion of the genome.
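Here is a toy sketch of the two-part lookup. The satellite layout (marker, then script ID, then the script, then the marker again) and the function name `fetch_script` are our assumptions for illustration; the text only says that the transposase matches both sequences.

```python
from typing import Optional


def fetch_script(satellite: str, marker: str, script_id: str) -> Optional[str]:
    """Find the script tagged with marker + script_id in satellite DNA.

    Assumed layout: marker + script_id + script + marker. The script runs
    from the end of the ID up to the next copy of the marker.
    """
    start = satellite.find(marker + script_id)
    if start == -1:
        return None  # no script with this address
    body_start = start + len(marker) + len(script_id)
    end = satellite.find(marker, body_start)
    if end == -1:
        return None  # malformed: no closing marker
    return satellite[body_start:end]


# Two hypothetical scripts parked side by side under the same marker "TTAC".
SATELLITE = "TTAC" "AAG" "GGGCCC" "TTAC" "TTAC" "CGT" "ACACAC" "TTAC"
```

Both scripts share one marker and differ only in their IDs, so a single transposase can fetch either one.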

You might think of the combination of transposon match sequence and 'script ID' sequence as an address for the script. Modern computers use 32-bit or 64-bit addresses to locate their data in RAM or on a hard drive, and you might say that DNA uses a similar system to locate its own data. Since each base pair can be one of four possibilities, it carries 2 bits of information, so a total transposon-plus-script-ID match sequence of perhaps 30 or 40 base pairs would work like a 60- or 80-bit address. That 'address space' could handle plenty of scripts, without much risk of conflicting with protein-coding gene sequences.
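The address analogy can be made literal: with four possible bases, each base pair carries 2 bits, so a sequence can be packed into an ordinary integer. The A=0, C=1, G=2, T=3 mapping below is an arbitrary choice of ours, and `dna_to_int` is a hypothetical helper, not a standard function.

```python
BASE_VALUES = {"A": 0, "C": 1, "G": 2, "T": 3}  # arbitrary 2-bit code


def dna_to_int(seq: str) -> int:
    """Pack a DNA sequence into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_VALUES[base]
    return value


def address_bits(length_bp: int) -> int:
    """Number of bits an address of length_bp bases can hold."""
    return 2 * length_bp
```

So 16 bases fill a 32-bit address and 32 bases fill a 64-bit one. A 30 to 40 base pair address holds 60 to 80 bits: 4**30 is about 1.15 × 10^18 distinct values, and 4**40 is about 1.21 × 10^24.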

Transposon Families

Having multiple scripts 'managed' by each transposase protein meant that cells could develop 'families' of scripts-- each carried by the same transposase, but marked with a different script ID. It probably would have been easier for cells to develop new scripting systems, if they could just take an existing transposon system, and then simply change the ID of the script it delivered.

There may have been other benefits to having a group of scripts all managed by the same transposase. For example, a mutation in the script ID in a gene might switch it to a different script, which might create an interesting new mutation with some sort of survival value.

Transposon Stacking

When a transposon returns its script to remote parking, you might notice that it doesn't necessarily return it to the same exact place. There's no real reason it would have to, since it can locate the DNA from anywhere on the genome, thanks to its combination of transposon marker sequence and script ID.

By coincidence, the transposon would probably tend to store groups of scripts together. That's because each script would have one copy of the transposon marker at each end, increasing the odds that a transposon would add yet another script in the same neighborhood.

It makes for an orderly system, not unlike a library-- with similar bits of genetic material grouped together, automatically.

Script ID Saturation

Earlier, we talked about how protein-coding triplets use up all of the 64 possible combinations of bases. That's very advantageous, from an evolutionary point of view, since it means that a mutation will almost always result in some amino acid fitting into that space (the exceptions are the three 'stop' triplets, which end the protein). There are no 'dud' permutations that would stall the protein-building process for lack of a suitable match.

It's possible that script IDs are also 'saturated' within a particular transposon family, using up all possible values in the script ID portion of the sequence. That would not be hard to arrange if the script ID sequence is relatively short.

Having some script matching each possible sequence means that a Foxy gene with a mutation in its script ID would still fetch some sort of script. It might be dramatically different from the original script, but at least the Foxy protein would be able to create something.

Over enough generations, it's possible that each scripted gene would link up with a transposon whose 'family' of scripts would all be relatively reasonable to use, for that function.

In that case, mutations in the script ID would rarely be lethal, and would more often result in an improvement for the organism.
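A quick way to see the saturation argument is to count, for a family of script IDs, how many single-base mutations of an existing ID still land on some script in the family. The sketch assumes 3-base IDs (so 64 possible values, the same count as codons), and the helper names are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Set

BASES = "ACGT"


def all_ids(length: int) -> List[str]:
    """Every possible script ID of the given length (4**length of them)."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def point_mutations(seq: str) -> Iterator[str]:
    """Every sequence that differs from seq at exactly one position."""
    for i, base in enumerate(seq):
        for other in BASES:
            if other != base:
                yield seq[:i] + other + seq[i + 1:]


def fraction_still_matching(family: Set[str]) -> float:
    """Share of single-base mutations of family IDs that hit another family ID."""
    total = hits = 0
    for script_id in family:
        for mutant in point_mutations(script_id):
            total += 1
            if mutant in family:
                hits += 1
    return hits / total if total else 0.0
```

With a fully saturated family, every mutation still fetches a script. Keep only the IDs that start with A or C, and 2 mutations in 9 come up empty.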

From Intron to Transposon

We've just described a rather complex and well-regulated set of steps that uses transposons to deliver scripts when needed. Of course this system is way too complicated to have evolved in one swell foop, so it must have developed in some more gradual way.

Introns were the natural storage area for the first scripts, and as scripts grew in size, cells would have still kept them there for quite a while, despite the disadvantages.

At the same time, transposons were probably lurking in the genome already, as 'selfish DNA' that merely propagated themselves.

So it's possible that they got together via a process something like this:

1. A long intron-based script accidentally picks up the inverted repeats that make it a target for a transposase-- either through an accidental mutation of base pairs, or by crossing over.

2. The transposase pops the script in and out of its 'home intron' randomly.

3. When the Foxy gene is expressed, sometimes it has the script handy, and sometimes it doesn't. When it doesn't, Foxy does nothing, and the cell regulators keep trying to express the Foxy gene. Eventually the script pops back into the intron, and Foxy works correctly.

4. The cells with this new system survive slightly better than the original version, since having the exons closer together is enough of an advantage to offset the nuisance of having an unreliable script.

5. Later generations of the cell add better management to the transposon delivery process, and definitely turn it into an evolutionary advantage.

Long Term Scripts

So far we've talked about transposons that splice a script into an intron, so it will be transcribed into RNA and used as a script. That system could have evolved easily from the first parasitic transposon.

Unfortunately, there were disadvantages to using RNA as a script, particularly for longer ones that might take a while to read. For one thing, such a long RNA chain could easily get tangled up with itself, or with other RNA chains. Even worse, it might be digested prematurely as part of the cell's general tidying up of unused mRNA.

Fortunately, there was a convenient solution available. Since DNA is so much more stable, it would have been a much better storage device for longer scripts. The double helix doesn't tangle with other chains, and it's much less likely to be destroyed as part of the cell's maintenance of excess RNA.

So we can imagine the appearance of a slightly different scripting system that might go something like this:

  1. A cell prepares to create a Foxy protein. As the DNA transcriptase starts transcribing the gene, it hits an intron, which pops out of the RNA chain.
  2. The intron carries an ID match sequence, plus a trigger for a transposase enzyme. The intron and the transposase link up, and diffuse to a section of DNA which has both the transposon's inverted repeats, and the correct match sequence from the intron. Meanwhile, the transcriptase stops its action temporarily.
  3. The transposase cuts the DNA chain, and removes a section of DNA that contains the Foxy script.
  4. The transposase acts as a carrier for the transposon, and diffuses back to the Foxy gene.
  5. Rather than inserting the script into an intron, the transposase binds the script's inverted repeats to each other, forming a plasmid (a small loop of DNA). It binds the plasmid to the Foxy protein.
  6. Foxy gradually reads the DNA chain.
  7. When it's finished, the transposase carries the script back to a stretch of satellite DNA with the right marker sequence, and parks the script.

Retrotransposons

Unfortunately, there is a big disadvantage to the use of DNA chains as scripts. Because the script spends so much time disconnected from the main genetic chain, there is always a chance that it will be lost, somehow. For example, if the cell divides while the script is still in use, the script DNA will not be copied along with the chromosomes, so one of the daughter cells will not get a copy of it.

Fortunately, there's a relatively easy way to solve that problem. Instead of removing the script from the genome, a cell could simply duplicate it, and then work with a copy of the script (inside its 'carrier' transposon).

Retrotransposons are a variety of transposon that accomplish just that. They start by making a regular RNA transcription from the DNA chain. Then they use a reverse transcriptase enzyme to make a second DNA copy from that RNA chain.

Retrotransposons almost certainly started out as a virus or as 'selfish' DNA, but it would have been relatively easy for the cell to subvert their self-propagating system, and use it as a delivery system for scripts, instead.

Here's how it might work:

  1. A cell prepares to create a Foxy protein. As the DNA transcriptase starts transcribing the gene, it hits an intron, which pops out of the RNA chain.
  2. The intron carries an ID match sequence, plus a trigger for a transposase enzyme. The intron and the transposase link up, and diffuse to a section of DNA which has both the transposon's match sequence, and the correct match sequence from the intron.
  3. The transposase copies a section of DNA that contains the Foxy script into RNA, and then makes a DNA copy from the RNA.
  4. The transposase acts as a carrier for the DNA sequence, and diffuses back to the Foxy gene.
  5. Rather than inserting the script into an intron, the transposase binds the script's inverted repeats to each other, forming a plasmid (a small loop of DNA). It binds the plasmid to the Foxy protein.
  6. Foxy gradually reads the DNA chain.
  7. When Foxy is finished, the DNA chain is released, and destroyed.
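The difference between the cut-and-paste delivery described earlier and the retrotransposon's copy-and-paste can be sketched as follows. It is a toy model under stated assumptions: the script is a string on one strand, transcription is reduced to swapping T for U (reading the coding strand), and at division the chromosome is copied into both daughters while the loose, in-use script goes to only one of them. All function names are hypothetical.

```python
from typing import Tuple


def cut_out(genome: str, script: str) -> Tuple[str, str]:
    """Cut-and-paste delivery: the script leaves the genome while in use."""
    start = genome.find(script)
    if start == -1:
        raise ValueError("script not found")
    return genome[:start] + genome[start + len(script):], script


def copy_out(genome: str, script: str) -> Tuple[str, str]:
    """Copy-and-paste delivery: DNA -> RNA -> new DNA copy; genome untouched."""
    if script not in genome:
        raise ValueError("script not found")
    rna = script.replace("T", "U")    # transcription (coding-strand view)
    dna_copy = rna.replace("U", "T")  # reverse transcription back to DNA
    return genome, dna_copy


def daughters_without_script(genome_during_use: str, script: str) -> int:
    """Count daughter cells left with no copy of the script at all.

    Assumes the chromosome is replicated into both daughters, but the loose
    script copy that is in use goes to daughter 0 only.
    """
    missing = 0
    for daughter in (0, 1):
        in_chromosome = script in genome_during_use
        has_loose_copy = daughter == 0
        if not (in_chromosome or has_loose_copy):
            missing += 1
    return missing
```

If the cell divides mid-use, cut-and-paste leaves one daughter with no script at all, while copy-and-paste leaves both daughters with the script still in their chromosomes.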

Controlled Mutations

Unfortunately, there was one big problem with using transposons. There was no guaranteed way that a transposon could always return to the same location in the genome, since the only thing it leaves behind is its repeat sequence. That means that a transposon will pretty much just insert itself at any instance of that sequence. The problem is even worse for retrotransposons, since they make a second copy, which might insert itself at some random, new place.

If a regular, protein-coding gene happens to have the same sequence as the transposon's match, the transposase might very well end up inserting a hunk of new DNA there. That means that some important protein will suddenly have an extra chunk of arbitrary amino acids at some random place. The odds are very good that this will not be an improvement.

Even worse, if the transposon's length is not a multiple of 3, there will be a frame shift. When the cell reads the triplets that are further downstream from the transposon, everythin gwil lb eshifted by one or two places in the triplet sequence, and nearly every amino acid after the insertion point will be read incorrectly.

The end result is almost certain to be a non-functional protein, and a very unhappy and possibly dead cell.
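A few lines of Python show why the length of the insert matters. The sketch just splits a sequence into triplets from the start and checks whether the triplets after the insertion point are still read the same way. It assumes the insertion lands on a triplet boundary; the sequences and function names are made up for illustration.

```python
from typing import List


def read_frame(seq: str) -> List[str]:
    """Split a sequence into triplets from the start, dropping leftover bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def downstream_intact(gene: str, pos: int, insertion: str) -> bool:
    """True if the triplets after the insertion are still read as before.

    Assumes `pos` falls on a triplet boundary of `gene`.
    """
    downstream = read_frame(gene[pos:])
    reading = read_frame(gene[:pos] + insertion + gene[pos:])
    return reading[len(reading) - len(downstream):] == downstream


if __name__ == "__main__":
    gene = "ATGGCCAAATTTGGG"
    print(read_frame(gene))                       # ['ATG', 'GCC', 'AAA', 'TTT', 'GGG']
    print(read_frame(gene[:6] + "C" + gene[6:]))  # ['ATG', 'GCC', 'CAA', 'ATT', 'TGG']
```

Inserting one or two bases scrambles every downstream triplet, while inserting three adds one extra amino acid and leaves the rest of the reading intact.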

Beneficial Shutoffs

Fortunately, disabling a gene is not always a bad thing. Sometimes, there is an advantage to the cell when a transposon jumps in and temporarily zaps a gene.

For example, consider the first transposons ever discovered, found by Barbara McClintock back in the 1940s. While the kernels (seeds) of maize (Indian corn) are forming, transposons sometimes 'jump' into the middle of a pigment-producing gene, causing it to malfunction and create a section of the seed coat that has a different color. The result is a random, variegated pattern on the seed.

Having a mottled color probably made it more difficult for birds to see the seeds, back in some earlier ancestor of maize, which would have given those plants a definite survival advantage.

You might say that transposons offer a 'randomizer', for any genes that would be more beneficial, if applied irregularly.

Controlled Disabling

There are 64 possible permutations of nucleotide triplets, and only 20 different amino acids. Three of the triplets are 'stop' codons, but that still leaves 61 triplets for 20 amino acids, an average of about three triplets coding for each amino acid.

That means that a cell can modify a gene sequence slightly, and create the same exact protein sequence.

As cells mutated, natural selection would have gradually created cells with the 'just right' amount of transposon action in their genes.

If it was advantageous to randomly disable a particular gene, cells that included a transposon match sequence within the protein-coding portion of the gene could count on some random transposons that would link to the sequence, insert some additional DNA, and probably disable the gene.

If the gene was vital, the cell simply needed to avoid the match sequences for transposons, within its genetic sequence, and it would never suffer transposon attack.
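Here's a sketch of that trick: search the synonymous spellings of a gene for one that avoids a transposon's match sequence while still coding for the same protein. The codon assignments come from the standard genetic code, but only the handful needed for the example are included. The match sequence AAATTT and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional

# A small slice of the standard genetic code (coding-strand DNA codons),
# just enough for the example gene below.
CODON_TABLE: Dict[str, str] = {
    "ATG": "M",                                      # methionine
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "AAA": "K", "AAG": "K",                          # lysine
    "TTT": "F", "TTC": "F",                          # phenylalanine
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
}

# Group codons by the amino acid they code for.
SYNONYMS: Dict[str, List[str]] = {}
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS.setdefault(amino_acid, []).append(codon)


def translate(gene: str) -> str:
    """Translate a gene (length a multiple of 3) into one-letter amino acids."""
    return "".join(CODON_TABLE[gene[i:i + 3]] for i in range(0, len(gene), 3))


def avoid_motif(gene: str, motif: str) -> Optional[str]:
    """Return a synonymous version of `gene` that lacks `motif`, or None.

    Brute force over every synonymous spelling, so only for short genes.
    """
    choices = [SYNONYMS[CODON_TABLE[gene[i:i + 3]]] for i in range(0, len(gene), 3)]
    for combo in product(*choices):
        candidate = "".join(combo)
        if motif not in candidate:
            return candidate
    return None
```

Flipping the test (keeping a candidate only if it does contain the motif) would model the opposite strategy: deliberately planting a match sequence in a gene that benefits from being disabled now and then.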

Protein Changes

So far we have only talked about transposons that deliver scripts, and those that deliver something useless that disables a gene. However, transposons could also move sequences of nucleotides that coded for the amino acids in a plain old protein. It would work via the same processes that we've already described, just with a different 'texture' to the DNA carried in the transposon.

Chameleon Proteins

For example, a cell that occasionally changed the proteins on the exterior of its cell membrane might avoid diseases or predators that would otherwise key in on its chemical scent. The cell could include a static, stable sequence of DNA that built the main part of the protein, and then randomly change another part of the protein via transposons.

If the change happened quickly and frequently, the cell could have a mixture of several different proteins on the membrane, all built from the same primary gene.

If the change was rarer, then each cell might have just one protein, but different generations of the cell might be different from each other.

During this stage of life, cells already had well developed mechanisms for controlling gene expression. That means that they could have controlled the frequency of protein changes, simply by regulating the number of transposase enzymes that were produced.

Cells probably would have used different match sequences and transposases for different functions-- since the protein-coding fragments would have a different 'texture' than scripts, and it was unlikely to be beneficial to insert a protein sequence into a script, or vice-versa.

Mix & Match Enzymes

When a cell developed a useful protein sequence that added a new function, transposons would have been a great way to expand that innovation into other cell proteins. All it would take is the right marker sequence, and a retrotransposon would duplicate the DNA coding for a useful enzyme segment, and then insert it into other genes.

That would have been a great way for cells to 'design' new enzymes, by adding functional groups from other enzymes into a different base protein that might give it an entirely different function.

It could also work for structural proteins.

Script Mutations

We've already mentioned replication slippage as a way for cells to gradually 'fine tune' the size and shape of cell structures that were formed via a script. New cells might have slightly longer or shorter scripts, which would result in a slightly larger or smaller structure. Then natural selection would favor whichever size was optimal, and the species would gradually drift into relative perfection.

Replication slippage was fine for small changes, but there would also have been some advantage to cells that would sometimes make larger changes to scripts. Even though it was more likely to prove lethal to a cell, it might also result in a larger-scale innovation that wouldn't have been possible to reach via replication slippage.

Transposons could have served as a 'scrambler' of scripts to keep up the pace of evolutionary change in the safest possible way.

Moving a copy of an existing script into a different script would have an important advantage over random mutations or insertions-- it would be more likely to introduce some sort of 'useful' pattern that was already used for specifying some other beneficial structure.

What makes a 'useful' pattern? Well, simple straight repetitions (AAAAA) or alternating structures (ATATATAT) would frequently create a useful pattern, but there might be other combinations that would also increase survival value in a particular type of structure.

In a sense, plain old natural selection chooses the 'useful' patterns, whatever they happen to be-- and then retrotransposons occasionally duplicate them and take them somewhere else, in the hope that the same pattern will be useful in an entirely different structure.

Script ID

Once cells started using transposons to deliver scripts, they had another potential way to mutate, and change a script. That would be if the ID sequence used to retrieve the script changed, resulting in a transposon that brought in an entirely different script instead.

It would be a very drastic change-- probably lethal, but occasionally something different and better.

Satellite Tricks

Satellite scripts offer some advantages over intron scripts, since they are independently accessible from more than one gene.

Multiple Scripts, one ID

For example, there might be more than one script in the satellite DNA that would be marked with the same ID number. If each of the scripts were part of a different operon, repressors that worked with some other genes could turn on different scripts at different times. That would allow a Foxy protein to have different actions at different times in the life cycle of the organism, or when there were different metabolic conditions.

There could also be multiple scripts with the same ID that would both be on at the same time. In that case, sometimes one script would be used and sometimes the other, more or less randomly. That might be advantageous in some cases-- for example, having random positioning of surface molecules might help a protozoan to be less visible to predators or parasites.
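The two cases above can be sketched as a lookup that filters scripts by ID and by which operons are currently switched on, then picks at random among whatever is left. The library layout, the operon names, and the function name are all hypothetical.

```python
import random
from typing import List, Optional, Set, Tuple

# Hypothetical satellite library: (script ID, operon that controls it, script)
Library = List[Tuple[str, str, str]]

EXAMPLE_LIBRARY: Library = [
    ("CGT", "larval", "ACACAC"),
    ("CGT", "adult", "GGGCCC"),
    ("AAG", "adult", "TTTTTT"),
]


def choose_script(library: Library, script_id: str, active_operons: Set[str],
                  rng: random.Random) -> Optional[str]:
    """Pick one active script carrying this ID, at random if several qualify."""
    candidates = [script for sid, operon, script in library
                  if sid == script_id and operon in active_operons]
    if not candidates:
        return None
    return rng.choice(candidates)
```

With only one operon switched on, the same ID always fetches that operon's script (the life-cycle case); with both on, the choice becomes random (the camouflage case).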

One Script, Multiple Foxys

It's also possible that more than one Foxy protein would use the same script, simply by referencing the same 'script ID' sequence.

Expressing two different functions from one script would be a powerful way to link two different evolutionary items that need to be linked-- for example, the pattern of surface proteins on a cell, and a matching recognition site that would allow cells to recognize others of their species.

Such genetic linkages would not be easy to set up, but once they were in place, a species could diverge much more quickly, since changes would happen in two complementary structures at the same time.

Transposons and Gene Headers

Since the movement of transposons is controlled by the transposase protein, cells had the potential to tie them in with the regular machinery of gene control.

For example, a transposase might carry a short RNA 'gene ID' match sequence that would first link it to a specific gene. The transposase protein would then be ideally positioned to physically deliver the transposon to the correct part of the gene, where it would match up with its transposon match sequence, and then insert into the proper place within the proper gene.

The cell could manage the chemical activity of the transposase, exactly the same way as it managed other regulatory proteins. For example, it might release the transposon at exactly the right time during protein transcription, and then retrieve it when transcription was finished.

Mutation regulation

It's also possible that transposase proteins interacted with gene headers as part of the process of 'controlled mutations'.

For example, an organism could mark some genes for frequent mutation by using a sequence of base pairs in the gene header that a transposase would consider 'attractive'. Likewise, it could mark other genes with a different sequence that would prevent transposons from changing them.

Transposons could also increase the chances of beneficial mutations by simply incrementing some kind of 'counter' within each gene that they change (perhaps an intron with a repeating sequence that acted as a counter). Over time, genes that had frequent beneficial mutations would show a high count, while sensitive genes that couldn't change successfully would show a zero count (since any organisms with changes there would die). Transposons could then weight their changing action, based on the counts of past successes for each gene.
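The counter idea amounts to weighted random selection. In this sketch the counters are just numbers in a dictionary (the text imagines them as repeat-based introns), a gene's chance of being picked is proportional to its count, and genes with a zero count are never touched. The gene names and functions are hypothetical.

```python
import random
from typing import Dict, Optional


def record_success(counts: Dict[str, int], gene: str) -> None:
    """Bump a gene's counter after a change that the organism survived."""
    counts[gene] = counts.get(gene, 0) + 1


def pick_gene_to_change(counts: Dict[str, int],
                        rng: random.Random) -> Optional[str]:
    """Pick a gene with probability proportional to its success count."""
    genes = [gene for gene, count in counts.items() if count > 0]
    if not genes:
        return None
    weights = [counts[gene] for gene in genes]
    return rng.choices(genes, weights=weights, k=1)[0]
```

A gene with a count of 5 gets picked about five times as often as one with a count of 1, and a sensitive gene stuck at zero is left alone.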

Of course, having transposases interacting with gene headers is a much more complex system than just relying on the transposon match sequences that we talked about earlier. Cells may never have developed such a system, or it may not have appeared until much later. However, fitting mutability into the gene header system would certainly have made it easier for early organisms to evolve.

Perhaps we'll know more about some of the details of 'controlled mutations', as we learn more about the chemistry of transposons, and the transposase proteins that manage them.


Site launched 8/7/07
