Cassius and the Fred/Roscoe
system were pretty cool, but as the number of genes increased,
they also had a problem.
In the early days of Caleb and before, there
weren't that many genetic chains, so it wasn't very hard
to regulate them. Fred could pretty much just diffuse around and
copy polypeptides randomly from whichever chains happened to meet.
Likewise, Roscoe could just randomly replicate
chains, and produce more or less what was needed to assemble more
Calebs.
However, as Caleb turned to Cassius and the
number of genes increased, there was more and more incentive to manage
exactly when each gene was expressed. Any Cassius that could control
the timing of protein transcription and gene replication would have
gained an enormous selective advantage over its less organized neighbors.
For example, when a Cassius first got to
a new puddle, its best strategy was to first create enzymes to make
raw materials, and only start making Fatcat, Fred and Roscoe proteins
after it had plenty of amino acids and nucleotides to build with.
A Cassius that could do things in that order would survive much better
than a Cassius that wasted resources on enzymes for protein transcription
before there were raw materials.
Of course to do that, Cassius needed a way
to distinguish a specific gene from all the other genetic chains.
How would that happen?
Promoters and Repressors
We have already talked about having a complementary
header or 'landing site' at the beginning of each protein-coding
chain, which would help Fred avoid working on chains that coded for
RNA enzymes or helper chains rather than proteins.
In a sense, the header provided a permanent 'on/off' switch
that helped Cassius avoid the replication of genes that were expressed
in RNA form rather than as proteins. So it wouldn't have been
hard to convert it to a more temporary switch so Fred would know
when to transcribe each chain into proteins.
Cassius may have used any of several approaches
to gene regulation in early gene headers: it may have changed a sequence,
'blocked' the header with a short stretch of complementary RNA, or
inserted some sort of blocking molecule to prevent transcription.
Of course Cassius also needed a way to turn
genes on when they were needed. It may have done that with a promoter
protein which removed the temporary blocking from the gene header.
For example, a promoter protein for expression
of the Sofia gene might have responded to concentrations of amino
acid raw materials: when they were high, it would remove any
repressors, and when they were low, it would restore them.
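The on/off behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a model of real chemistry: the GeneHeader class, the promoter_step function and the numeric threshold are all hypothetical names and values made up for the example.

```python
from dataclasses import dataclass


@dataclass
class GeneHeader:
    """A gene's 'landing site', which may be temporarily blocked (hypothetical model)."""
    name: str
    blocked: bool = True  # assume a repressor sits on the header by default


def promoter_step(header: GeneHeader, amino_acid_level: float, threshold: float) -> GeneHeader:
    """Remove the repressor when raw materials are plentiful, restore it when they are scarce."""
    header.blocked = amino_acid_level < threshold
    return header


sofia = GeneHeader("Sofia")
promoter_step(sofia, amino_acid_level=0.9, threshold=0.5)  # plenty of amino acids: gene is switched on
promoter_step(sofia, amino_acid_level=0.1, threshold=0.5)  # supplies run low: gene is switched off again
```

The only point of the sketch is that a single sensed quantity is enough to flip the header between its two states.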
Gene ID
Before Cassius could turn its genes 'on' or 'off',
it needed a way to identify them.
Genetic UPC
One possible way to 'mark' each gene is to install an
ID string at the beginning of each genetic chain-- a unique
sequence of nucleotides that an incoming Fatcat can match up with,
to make sure it is making proteins from the correct gene. You might
think of it as a genetic 'UPC code' for an RNA 'scanner' to
read.
The ID sequence might be a fixed length,
or it could have a variable length, as long as there was some way
to locate where the ID tag ended and where the actual gene began.
Fatcat and Gene ID
How would gene ID work?
With the help of complementary base pairs,
it would be easy to create a complementary match sequence that would
bind to the ID sequence and put a Fatcat in the right place to begin
transcription.
When a regulator protein wanted to start
production of an enzyme, it might do something like the following:
1. When the regulatory protein is created,
it includes the regular 'landing site' sequence, plus
a short RNA sequence that is complementary to the regulated gene.
2. When the regulatory protein senses that
conditions require more enzymes, it links
to a Fatcat, and then attaches the RNA sequence
to the RNA-connecting portion of the Fatcat.
3. The Fatcat complex diffuses until it contacts
the proper RNA chain. Its sequence matches,
and Fatcat is positioned in just the right
place to begin transcription.
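The matching in step 3 can be sketched as a simple string search. This is a rough illustration that assumes exact Watson-Crick pairing over the whole ID and treats the chain as plain text; the function names and the sample sequences are invented for the example.

```python
# Watson-Crick pairing rules for RNA
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement(rna: str) -> str:
    """Return the antiparallel strand that would base-pair with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_start(chain: str, probe: str):
    """Return the index just past the ID that `probe` pairs with, or None if there is no match.

    `probe` plays the role of the short RNA carried by the Fatcat complex.
    """
    target = complement(probe)       # the ID sequence the probe can bind to
    position = chain.find(target)
    if position == -1:
        return None                  # wrong chain: keep diffusing
    return position + len(target)    # Fatcat is parked at the start of the gene


gene_id = "GAUUCAGC"                      # made-up ID sequence
chain = "AAAA" + gene_id + "AUGGCC"       # made-up chain: filler, ID, then the gene
probe = complement(gene_id)
start = find_start(chain, probe)          # index where the gene itself begins
```

A chain that lacks the ID returns None, which corresponds to the complex drifting on until it meets the right chain.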
How Large was the ID?
It's possible to distinguish between
1,000 different genes with an ID sequence
of only 5 RNA nucleotides (4^5 = 1,024). An
organism with 30,000 genes would require
a sequence of 8 nucleotides to give each
gene a unique ID code (4^8 = 65,536).
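The arithmetic is easy to check: with four possible nucleotides per position, an ID of length n can take 4^n different values. A short Python sketch (the function name is just for illustration) finds the shortest length that covers a given number of genes.

```python
def min_id_length(num_genes: int) -> int:
    """Shortest ID length n such that 4**n >= num_genes."""
    length = 0
    capacity = 1  # number of distinct IDs available at the current length
    while capacity < num_genes:
        capacity *= 4  # each extra nucleotide multiplies the choices by four
        length += 1
    return length


short_id = min_id_length(1_000)   # 5, since 4**5 = 1,024
long_id = min_id_length(30_000)   # 8, since 4**7 = 16,384 is too few and 4**8 = 65,536
```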
However, those quantities assume that there
is some kind of numbering system to make
sure each gene has a unique ID sequence.
It's hard to imagine any sort of biochemical
system that could keep track of unused ID
sequences before assigning one to a new gene.
If each new gene used an ID sequence that
was selected randomly, then the ID sequence
would need to be much longer, to reduce the
odds of conflicting with another gene. Reducing
the odds of duplication down to one in a
million for any new gene sequence would require
10 additional nucleotides.
So a good guess is that a minimum gene ID
size would probably be somewhere between
15 and 18 base pairs long.
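Here is a rough check of that estimate. It assumes that a new gene picks its ID uniformly at random, so the chance of hitting one of the existing IDs is at most (number of genes) / 4^n. The function name and the one-in-a-million target are taken from the reasoning above, not from any measured biology.

```python
def random_id_length(num_genes: int, odds_denominator: int) -> int:
    """Shortest ID length n where a random new ID collides with an
    existing one with probability at most 1 / odds_denominator.

    Uses the bound P(collision) <= num_genes / 4**n, kept in integer
    arithmetic to avoid rounding trouble.
    """
    length = 1
    while num_genes * odds_denominator > 4 ** length:
        length += 1
    return length


small_genome = random_id_length(1_000, 1_000_000)    # 15
large_genome = random_id_length(30_000, 1_000_000)   # 18
```

Both results are 10 nucleotides longer than the minimum lengths of 5 and 8, because 4^10 is just over a million.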
Where is the ID sequence?
It certainly seems most logical to put the
gene ID into the 'landing site' region
just before the beginning of actual protein-coding
base pairs.
The bacterial promoter includes a highly
conserved TTGACA sequence 35 base pairs upstream
of the start of each gene, with a 'spacer' of
16 to 18 base pairs, then a highly conserved
TATAAT sequence that is 10 base pairs upstream
of the gene.
Since the 'spacer' size is a
close match to the ideal gene ID size, and
since it is marked so conspicuously on either
side, it seems highly likely that this portion
of the bacterial promoter serves as a unique
ID sequence.
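To make the layout concrete, here is a small Python sketch that pulls the spacer out of a DNA string. It assumes an exact match to both consensus sequences, which real promoters often do not have, and the sample sequence is invented. Treating the spacer as a gene ID is the hypothesis of this page, not an established fact.

```python
import re

# -35 box, then a 16 to 18 base pair spacer, then the -10 box
PROMOTER = re.compile(r"TTGACA([ACGT]{16,18})TATAAT")


def extract_spacer(dna: str):
    """Return the spacer between the two consensus boxes, or None if no promoter is found."""
    match = PROMOTER.search(dna)
    return match.group(1) if match else None


spacer = "GCTAGCTAGGATCCGTA"  # made-up 17 base pair spacer
dna = "AAGG" + "TTGACA" + spacer + "TATAAT" + "GGCATCGATGCC"
candidate_id = extract_spacer(dna)  # the made-up spacer comes back out
```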
Eukaryotes include a seven-pair 'basal
promoter' marker (usually TATAAAA)
that is 30 base pairs upstream from the start
of each gene, along with an 'upstream
promoter' GGCCAATCT sequence
that is 50 to 130 base pairs upstream. That
probably provides a similar space for an
ID sequence, most likely on the upstream
side of the 'TATA box', or the
same relative location as the bacterial promoter's
ID marker. We'll talk later about
some possible uses that eukaryotes may have
for that extra data in the header.
What would Gene IDs look like?
With tens of thousands of genes to manage,
it seems likely that modern cells would include
many gene ID chains (in RNA form) as part
of their day to day metabolism. What would
they look like?
The main requirement for an ID sequence would
be uniqueness-- so it would probably
not be repetitive, and would also not code
for any 'sensible' sequence of
amino acids. Its overall 'information
density' would be about the same as
protein-coding portions of the gene.
Transient RNA carriers of gene ID might include
a base pair sequence marking them as an ID
(an ID ID, so to speak). On the other hand,
they might also be linked with a distinctive
carrier protein so they would only need the
approximately 17 base pair ID sequence.
Operons and Stop Codons
Some of Cassius's proteins worked together
in groups-- for example, Fred and Fatcat
worked together until Fred was replaced by
tRNA, and many enzymes would have used several
different proteins and some helper chains
to put together a supercatalyst.
Once genes came under the control of promoters
and repressors, it also would have made sense
for Cassius to group related proteins together
into a single backbone chain. That way one
gene promoter could manage more than one
protein at the same time.
The technical term for a group of genes linked
with a single promoter is an 'operon'.
End of Gene Markers
For Cassius to be able to combine genes for
more than one protein in an RNA chain, it
needed a way to mark the boundaries between
one gene and the next.
Since operons probably started to evolve
before all 64 triplet permutations were snapped
up by proteins, the natural solution would
be to reserve one or more triplets as a 'stop
marker' to indicate that Fatcat should
stop creating a protein from a gene.
Modern genes contain three 'stop codons':
UAA, UAG and UGA (when written in RNA form).
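As an illustration of how stop markers divide one chain into several genes, here is a minimal Python sketch. It assumes a single reading frame starting at the first base and genes sitting back to back, which is a simplification. The function name and the sample chain are made up.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_operon(rna: str) -> list:
    """Split an RNA chain into protein-coding stretches, reading triplets in one frame."""
    genes = []
    current = []
    usable = len(rna) - len(rna) % 3  # ignore any incomplete trailing triplet
    for i in range(0, usable, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            if current:                # a stop marker closes the gene in progress
                genes.append("".join(current))
            current = []
        else:
            current.append(codon)
    if current:                        # keep an unterminated final stretch as-is
        genes.append("".join(current))
    return genes


operon = "AUGGCCUAAAUGAAAUGA"          # two tiny made-up genes, each ending in a stop codon
pieces = split_operon(operon)          # ["AUGGCC", "AUGAAA"]
```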
Operons and Nathaniel
Combining several genes on one chain made
life much easier for Nathaniel and for Roscoe,
and probably added some serious survival
value to the first Caleb or Cassius that
started using the system.
When cells divided, Nathaniel would have
had an easier time assembling a complete
set of genetic chains, since there were fewer
chains to find.
Meanwhile, it would have taken fewer passes
by Roscoe to replicate all the genes in a
Caleb or Cassius, improving its chances of
creating sufficient genes to fill in complete
sets of new Caleb or Cassius to send out
into the world.
Gene Messengers
Once genes were marked with an ID sequence
and consolidated into operons, it would have
been much easier for cells to regulate their
action, and start to have more of a modern
metabolism.
When a cell wanted to accomplish something,
it would create a 'messenger' that
consisted of a Fatcat and a short RNA chain
that acted as an ID match. The messenger
would diffuse until it ran into the complementary
sequence. At that point, it would initiate
protein synthesis.
With gene ID, cells started having the potential
to coordinate genes, and live their lives
in a more regulated and orderly fashion.
Helper Chain Maintenance
We've already talked about the use
of backbone chains for purposes other than protein
coding-- for guidance in the tertiary
folding of enzymes, as an aid in placement
of multiple enzymes in complexes, and as
direct enzymes (ribozymes).
Unfortunately, these non-coding forms of
RNA would not have fit into a multi-gene
world of operons quite as tidily as the protein-coding
genes.
The problem is that they had to be replicated
by Roscoe to become useful, rather than being
transcribed by Fred.
Back when every chain was on its own, they
probably could have just attached close to
a Roscoe, and been pretty sure they'd
be replicated in sufficient quantity to fill
in when needed as helpers or enzymes.
However, it probably would not have worked
to insert them into an operon, even if they
were used with the other genes included there.
As Fred ran along the chain, it couldn't
do anything with the helper gene. And Roscoe
would have had no way to know it needed to
jump in and create an RNA chain while the
proteins were being made.
Presumably the helper chains stayed on their
own near Roscoe, but it sure would have been
convenient for them to be linked together
with their protein-coding genes, so a promoter
would work on them at the same time.
As it turns out, there was a solution to
this problem, but first, let's look
at one other problem that was also facing
early cells. Nothing like building up some
dramatic tension!