Evolution of DNA
Back in the days of Caleb and Cassius, we mentioned that there was an ongoing problem caused by helper chains-- since they needed to be replicated by Roscoe, rather than transcribed into proteins by Fred. We mentioned that a special 'header' on the helper chains might have allowed Fred to avoid them.
Introducing a 'master copy' of the genes, and then using throw-away copies, was also a good thing for the helper chains. Since the mRNA was being replicated anyhow, as part of gene expression, it no longer took any extra effort to get the helper chains replicated for daily use.
Unfortunately, combining genes into operons made the problem worse again, since the protein-coding genes and helper chain genes were all slammed together in a single sequence, with no simple way to distinguish them. Cells needed some way to separate the helpers from the protein-coding RNA, and then position them, before use.
Fortunately, some lucky cell stumbled upon an interesting RNA sequence which allowed cells to mix two different types of coding within a single gene. This new feature couldn't have arisen before complementary pairing appeared, but it probably developed as soon as cells started combining genes into operons, and using DNA for the main genetic sequence.
We've already talked about the ability of RNA strands to act as enzymes. As it happens, some just-right RNA sequences are capable of using a small bit of catalysis to exactly solve the problem of separating helper chains from protein-coding RNA. Under the right conditions, those RNA chains can do the following:
1. We start with an mRNA chain, before it is transcribed into a protein. The striped nucleotides are the intron, a self-splicing portion of the gene which includes a helper chain, plus a few clever nucleotides on either side.
2. The beginning of the intron bends around and binds chemically to a section near the end of the intron.
3. That exposes some active groups on the two ends of the intron, which catalyze a split between the first part of the gene and the intron.
4. The intron then merges the two parts of the protein-coding gene, and removes itself entirely from the genetic chain.
5. Now the intron (dark nucleotides) can 'carry' the helper chain to wherever it is needed. Meanwhile, Fred and Fatcat transcribe the protein-coding portion of the mRNA (white nucleotides) into a protein.
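The five steps above can be pictured as a toy string operation. This is only a sketch under loud assumptions: the marker sequences `INTRON_START` and `INTRON_END` and the function name `self_splice` are invented here for illustration, and a real intron is recognized by its folded shape rather than by a short literal motif.

```python
# Hypothetical boundary motifs, invented for this sketch. Real self-splicing
# introns are defined by their folded structure, not by a fixed short motif.
INTRON_START = "GUGCG"
INTRON_END = "AUCAC"


def self_splice(pre_mrna: str) -> tuple[str, list[str]]:
    """Cut out every START...END stretch and rejoin the flanking pieces.

    Returns the joined protein-coding RNA and the list of released introns
    (each one still carrying its helper chain between the two motifs).
    """
    coding_parts = []
    introns = []
    pos = 0
    while True:
        start = pre_mrna.find(INTRON_START, pos)
        if start == -1:
            break
        end = pre_mrna.find(INTRON_END, start + len(INTRON_START))
        if end == -1:
            break  # no closing motif, so nothing more can be spliced
        end += len(INTRON_END)
        coding_parts.append(pre_mrna[pos:start])  # step 3: split at the intron
        introns.append(pre_mrna[start:end])       # step 5: intron is released
        pos = end
    coding_parts.append(pre_mrna[pos:])
    return "".join(coding_parts), introns         # step 4: coding parts merged


# Toy gene: coding RNA + intron wrapping the helper chain "UUUU" + coding RNA.
pre_mrna = "AUGGCC" + INTRON_START + "UUUU" + INTRON_END + "GGAUAA"
mature, released = self_splice(pre_mrna)
```

Here `mature` holds the rejoined protein-coding portion, and `released` holds the intron that carries the helper chain away.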
There are actually two different self-splicing methods used by modern introns. Group I introns use a guanine nucleotide as an intermediary, and release the intron as a separate piece of RNA, which can later close into a loop. Group II introns need no outside help, and produce a characteristic 'lariat' shape with the beginning of the intron bonded a short distance from the end of the intron.
Interestingly, both forms of intron removal require no net energy loss (the energy it takes to break the two bonds is returned when two new bonds form). That means that intron self-splicing could have occurred even in early forms of Caleb or Cassius, despite their lack of energy management skills. Because of that, it's possible that cells found introns useful for management and delivery of helper chains very early on, even before genes started to consolidate into a single chain.
Early cells may have stumbled upon both types of introns, purely by accident. Or the two alternate forms may have evolved separately in different organisms, and then been passed on to other organisms by assimilation, or by 'horizontal transfer' of genetic material.
What are introns good for?
Well, between the required beginning and ending sequences which perform the self-splicing action, there is a length of RNA which can contain any sequence at all. So you might think of an intron as a convenient way to pop out a helper chain, right when and where it is needed.
Fatcat and Fred simply need to initiate the self-splicing reaction, and they then can continue along with the main transcription of the protein-coding portions of the gene, automatically releasing each helper chain at just the right time and place to be useful.
Introns and DNA
Self-splicing introns would have actually been a serious problem, back when RNA was the main genetic chain. Since they could pop out but not pop back, they would have permanently removed themselves from future generations-- and that would hardly have been beneficial for Cassius. That means that introns were a good solution only after the appearance of a 'master copy' of the genetic material.
In fact, the 'self removing' properties of intron sequences would have given early cells a strong incentive to switch to DNA as their main genetic chain, as early as possible. Because the DNA chain is more rigid, the same self-splicing sequence is not capable of self-removing, when expressed as DNA rather than RNA.
In other words, introns probably pre-date DNA, and may have been the primary driving force for its creation.
Origins of Introns
How did introns first appear?
It is tempting to think they might originally have been some sort of RNA parasites (and in fact early drafts of this book used that notion, since it made such a cute sequence to have both Nathaniel and introns as successive parasites). Later on we will talk about some additional (and equally important) 'mobile' DNA which probably was introduced by a gene parasite.
Unfortunately, neither of the self-splicing introns has any way to insert itself into genes-- all it can do is remove itself. It would be a rather incompetent parasite that could only exit its host, without ever infecting it in the first place.
Since it's unlikely that introns somehow started out with an insertion sequence and then lost it, it seems simpler to conclude that the self-splicing RNA sequences appeared strictly by accident, and just happened to 'wrap' a backbone chain so it could be used more effectively.
Speaking of intron insertions, there is still an awkward transition to consider.
Cassius almost certainly contained a goodly number of helper chains before the consolidation of its separate RNA genes into a single DNA strand, and the question is, what happened to them during the transition period when RNA genes switched to DNA?
Helper RNA chains could have floated around individually for a while after DNA became a plasmid, but at some point they needed to join the main genetic chain for all the reasons discussed earlier in this chapter.
Probably the best way to do that would have been an 'intron inserter' protein that would wrap a chain with the self-splicing sequence, and then insert it into the DNA chain via a reverse polymerase.
Of course such a protein would have no way to 'know' where to put any given chain. However, inserting intron chains into random locations was a relatively low-risk move, since the intron would pop itself out of the mRNA before being transcribed, and would never affect any protein sequences.
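A minimal sketch of why random insertion is low-risk, under stated assumptions: the boundary motifs and the names `wrap_helper`, `insert_intron` and `splice_out` are hypothetical, and for simplicity the gene is written in RNA letters throughout rather than being stored as DNA and then copied.

```python
import re

# Hypothetical boundary motifs, invented for this sketch.
INTRON_START = "GUGCG"
INTRON_END = "AUCAC"


def wrap_helper(helper: str) -> str:
    """Wrap a helper chain in the (made-up) self-splicing motifs."""
    return INTRON_START + helper + INTRON_END


def insert_intron(gene: str, helper: str, position: int) -> str:
    """Play the 'intron inserter': drop the wrapped helper in at position."""
    return gene[:position] + wrap_helper(helper) + gene[position:]


def splice_out(pre_mrna: str) -> str:
    """Remove every START...END stretch (shortest match) from the copy."""
    pattern = re.escape(INTRON_START) + ".*?" + re.escape(INTRON_END)
    return re.sub(pattern, "", pre_mrna)


gene = "AUGGCCGGAUAA"   # toy coding sequence that contains neither motif
helper = "UUUU"

# Try every possible insertion point and check that the coding sequence
# comes back unchanged once the intron has popped itself out.
unchanged = [
    splice_out(insert_intron(gene, helper, i)) == gene
    for i in range(len(gene) + 1)
]
```

For this toy gene every entry of `unchanged` is true. That only holds because the gene does not happen to contain a piece of either motif, which is one way to picture the rare insertions that do cause trouble.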
About the worst that could happen is that a helper chain would be embedded in the wrong enzyme, and somehow cause havoc during the folding of that enzyme, or during the construction of an enzyme complex from a protein and short helper sequences.
On the other hand, any Cassius that managed to pop the right chain into the right gene would have gained an enormous advantage. Not only were its helper chains now securely stored in the genome, but they were also in just the right place to help with protein folding or assembly. That would have given them quite an advantage over any earlier Cassius that had to rely on simple diffusion to bring in the right 'helper' chains from somewhere else.
An 'intron inserter' protein would have been beneficial for Cassius even after it had tucked away its 'legacy' chains into the right parts of its genome.
Adding a new intron to a gene would have continued to be a fairly low-risk mutation, as compared to changing the sequence of the protein-coding portion of the gene.
When Fatcat ran into a newly inserted stretch of RNA, it would have popped out a new chain sequence, which generally would do nothing lethal to the functionality of the main protein (unlike the probable consequences of a change in its amino acid sequence). Frequently the chain would have no effect, but occasionally it just might link up with the protein to add some functionality, or bend it into a more productive form.
Of course, over many generations, natural selection works on introns just the same as on regular protein-coding genes. Lethal introns would disappear quickly as their carriers died. Deleterious introns would still fade away over the course of a few generations as their hosts survived less well than their neighbors. And even neutral introns would eventually become less common, since they still imposed a metabolic cost on the cells that contained them.
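One way to picture the three cases is the standard textbook model of selection in a haploid population. It is not taken from this chapter, and the names `next_frequency`, `generations_until` and `cost` are invented for the sketch. Here `cost` is an assumed fraction by which an intron reduces its carriers' reproductive success: 1.0 for a lethal intron, a tiny value for a nearly neutral one that only wastes a little metabolism.

```python
def next_frequency(p: float, cost: float) -> float:
    """Share of intron carriers after one generation of selection.

    p    -- current share of cells carrying the intron (0..1)
    cost -- assumed fractional loss in reproductive success (0..1)
    """
    carriers = p * (1.0 - cost)      # carriers reproduce a bit less well
    total = carriers + (1.0 - p)     # non-carriers reproduce at full rate
    if total == 0.0:                 # everyone carried a lethal intron
        return 0.0
    return carriers / total


def generations_until(p0: float, cost: float, threshold: float,
                      max_generations: int = 1_000_000) -> int:
    """Count generations until the carrier share drops to the threshold."""
    p = p0
    generations = 0
    while p > threshold and generations < max_generations:
        p = next_frequency(p, cost)
        generations += 1
    return generations


# Start with half the population carrying the intron in each case.
lethal = generations_until(0.5, 1.0, 0.01)
deleterious = generations_until(0.5, 0.1, 0.01)
nearly_neutral = generations_until(0.5, 0.001, 0.01)
```

In this model the lethal intron is gone after a single generation, while the smaller the cost, the longer the intron lingers before it fades.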
A few chapters back, we mentioned the development of gene ID, and the appearance of a regulatory header on each gene.
As cells grew more complex, the advantage to 'linking' genes to each other would have also grown. For example, a gene that managed the splitting of a cell would need to trigger several other processes before it could initiate the split-- replicating the DNA, adding some extra cell membrane, building a 'splitter' version of Nathaniel, and so on.
The easiest way for a gene to send off a messenger to activate some other, related gene would be to include it in an intron. Part of the intron sequence would be the ID 'number' of the other gene, so the intron messenger could find the right gene to activate. And part would be whatever it took to get the cell to recognize the RNA sequence as a trigger meant to 'fire' some other gene's action.
These 'cell messengers' may have been pure RNA, possibly folded up into a more compact shape via internal complementary pairing. They may also have been some sort of protein-RNA combination.
We've come pretty far with our simple, eight-molecule organisms. They're using DNA, and creating cells that are capable of splitting and metabolizing. They can do quite a bit with their simple four-molecule proteins, helper chains, and ribozymes.
But it's time to consider the next stage of life-- organisms using fancier proteins, that are built from more than four amino acids. As usual, it's not a trivial transition, but let's go back to the protein transcription process for a while, and consider how to make the switch.