One of the many mysteries
in modern genetics is called the
C-value enigma (formerly called the
C-Value Paradox). In a nutshell, the total amount of DNA in organisms
is not directly related to their
complexity, as you might think it
should be.
For example, genome size varies among different
groups of organisms as follows, with all measurements in billions
of base pairs:
Group            Smallest                     Largest
Prokaryotes
  Archaea        0.0005 (Nanoarchaeum)        0.002 (Archaeoglobus)
  Bacteria       0.0006 (Mycoplasma)          0.007 (Rhizobacteria)
Invertebrates
  Protozoa       0.003 (Encephalitozoon)      0.02 (malaria parasite)
  Nematodes      0.03 (root-knot nematode)    1.9 (horse roundworm)
  Molluscs       0.4 (owl limpet)             5.7 (Antarctic whelk)
  Echinoderms    0.5 (sea star)               5.3 (sea cucumber)
  Crustaceans    0.2 (water flea)             36.7 (deep-sea shrimp)
  Insects        0.1 (Braconid wasp)          16.3 (mountain grasshopper)
Vertebrates
  Fish           0.4 (puffer fish)            128.3 (lung fish)
  Amphibians     0.9 (burrowing frog)         115.8 (mudpuppies)
  Reptiles       1.1 (skink)                  5.2 (Greek tortoise)
  Birds          1.0 (cut-throat weaver)      2.1 (ostrich)
  Mammals        1.6 (bent-wing bat)          6.1 (echimyid rodents)
Plants
  Algae          0.01 (phytoplankton)         19.6 (stonewort)
  Mosses         0.2 (guiana moss)            2.0 (horn calcareous moss)
  Ferns          0.06 (spikemoss)             72.7 (whisk fern)
  Gymnosperms    2.3 (gnetum vine)            36.0 (Mexican white pine)
  Angiosperms    0.1 (creamy strawberry)      127.4 (fritillaria)
Overall, the minimum genome size seems to
increase with complexity as you would expect. However, the average
and largest genome sizes are not at all logical. For example, they
are much larger in many of the ‘simpler’ organisms (particularly
in some fish and amphibians) than in the more ‘complex’ birds
and mammals.
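To make that comparison concrete, here is a minimal Python sketch that
ranks the vertebrate groups by their largest recorded genome. The numbers
are copied from the table above; the function names are just illustrative
choices, not part of any standard tool.

```python
# Smallest and largest genome sizes in billions of base pairs,
# copied from the vertebrate rows of the table above.
GENOME_SIZES = {
    "Fish": (0.4, 128.3),
    "Amphibians": (0.9, 115.8),
    "Reptiles": (1.1, 5.2),
    "Birds": (1.0, 2.1),
    "Mammals": (1.6, 6.1),
}


def fold_range(smallest, largest):
    """Return how many times larger the largest genome is than the smallest."""
    return largest / smallest


def rank_by_largest(sizes):
    """Return group names ordered from the biggest maximum genome to the smallest."""
    return sorted(sizes, key=lambda group: sizes[group][1], reverse=True)


if __name__ == "__main__":
    for group in rank_by_largest(GENOME_SIZES):
        smallest, largest = GENOME_SIZES[group]
        print(f"{group:<11} max {largest:>6.1f}  range x{fold_range(smallest, largest):.0f}")
```

Fish and amphibians come out on top, and birds come out last, which is
the opposite of what a simple "complexity" ranking would predict.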
Humans are near the midpoint
for mammals, with 3.3 billion base pairs.
Most of the excess genome length in the ‘big genome’ organisms
is made up of repetitive DNA, which we have already theorized is
generally used for scripting data. So the question is, why does the
C-value vary so much, and how might Foxy and Moxy be to blame?
Duplication and Debris
Before we look at the impact of script data
on genome size, let’s first consider some alternate explanations
for the variation between species.
Polyploidy
One condition which can add to the genome
size is polyploidy: extra sets of chromosomes, which can result when two
species merge their entire genomes. Polyploidy is rare in animals,
but common in plants (for example, wheat is a hexaploid species which
contains the merged genomes of three different species of grasses,
and maize is probably a tetraploid formed from the merger of two
precursor species).
Over a long period of time, most polyploid
species gradually lose some chromosomes or duplicated genes, but
for a while they will contain two or three times the normal amount
of DNA.
Duplicated Genes
Another condition which adds to genome size
is simple duplication of genes, caused by transposons or replication
accidents.
Many species include multiple copies of important
genes. The advantage of that is that it helps reduce lethality if
one copy is accidentally disabled by a mutation or replication accident.
The disadvantage is the higher metabolic cost of the extra DNA, plus
the slower removal of bad copies of the genes (since individuals
with a mutation in one copy don’t drop dead so promptly).
Duplicating a few protein-coding genes would
probably not cause a huge increase in genome size, but it would have
some effect.
Junk DNA
So far we have talked about the use of most
repetitive or ‘junk’ DNA as data for Foxy or Moxy scripts.
However, it’s also likely that some of the DNA in most organisms
really is junk— sequences inserted randomly by transposons
or gene parasites, dud genes that are no longer functional, or any
other genetic debris that hasn’t been weeded out by natural
selection yet.
It is possible that organisms going through
rapid evolutionary change will accumulate more junk than normal,
since they’ll be changing scripts and enzymes at a more rapid
rate than usual. Organisms in a cushy ecological niche may also be
able to accumulate more ‘junk’, since they are under
less competitive pressure.
Truly junk DNA probably can’t account for the huge variations
in genome size, but it may account for some of it.
Script Sizing
Now let’s take a look at the impact of Foxy and Moxy scripts
on genome size.
Number of Scripts
One way that the amount of script data might
vary between species is in the number of scripts used by the organism.
That could vary quite a bit, depending on the ‘algorithm’ used
to develop the structures in each organism within a species.
For example, a ‘simple’ plant might simply send out new
roots and leaves from a central point, and continue to do so in an
infinite loop for a fixed time interval or until some environmental
condition appeared. Then it might switch to flower and seed production
until its death. It would be a very short script. Arabidopsis thaliana,
a simple plant used in many genetic experiments, would have a simple ‘life
script’ like that, and it does have very little repetitive
DNA in its genome.
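As a rough illustration of how short such a ‘life script’ could be, here
is a toy Python sketch. Everything in it (the function name, the day-based
timer, the two action strings) is a hypothetical stand-in for the idea
described above, not a model of any real plant.

```python
def simple_life_script(growing_days, lifespan_days):
    """Yield one action per day: vegetative growth first, then reproduction.

    growing_days  -- hypothetical fixed interval spent on roots and leaves
    lifespan_days -- hypothetical total number of days before the plant dies
    """
    for day in range(lifespan_days):
        if day < growing_days:
            yield "grow roots and leaves"
        else:
            yield "produce flowers and seeds"


if __name__ == "__main__":
    for day, action in enumerate(simple_life_script(growing_days=3, lifespan_days=5)):
        print(day, action)
```

The whole ‘program’ is one loop and one switch, which is the sense in
which a simple plant might need very little script data.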
A ‘complex’ and longer-lived plant might have many types
of tissues, complex branching patterns, and different behavior under
different environmental conditions, all controlled by multiple scripts
telling each type of tissue what to do. That might result in hundreds
or thousands of times as many scripts as the simple plant.
Script Sizes
Another factor that influences genome size
is the length of each script.
A large and long-lived organism would probably
have longer scripts, on average, though the script lengths would
also depend on the ‘efficiency’ of programming found
in each script. We’ll talk more about script efficiency next.
Alternate Scripts
Yet another possible source of extra genome
size is the possibility that some species might include ‘alternative’ scripts
that are not currently in use, but that are still conserved.
For example, many parts of the world suffer
extreme changes of conditions during the ice age cycles. As an adaptation
to that, some species may carry dormant copies of genes that apply
only to a set of conditions in some other part of the cycle. That
way, a simple mutation in a script ID could shift a population drastically
and quickly.
Code Compression
Humans contain a total of about 6 x 10^13
(60 trillion) cells. With only about 3 x 10^9 (3 billion) DNA base
pairs, that means there is only one base pair for every 20,000 cells.
Based on that math, it might appear that
having full control over the placement of each cell would require
a genome that is tens of thousands of times larger. However, it is actually
possible that our repetitive DNA does code for the placement of each
individual cell.
How could it do that?
As with computer programming, there are some ‘tricks of the
trade’ that DNA might use to get maximum value from the data
in Moxy scripts. Let’s take a look at some possibilities now.
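Before looking at the tricks, the cell-to-base-pair arithmetic at the
start of this section can be checked directly. This short Python sketch
uses the same rough figures quoted above (about 6 x 10^13 cells and about
3 x 10^9 base pairs); the constant and function names are our own.

```python
CELLS_IN_BODY = 6e13  # about 60 trillion cells, the rough figure quoted above
BASE_PAIRS = 3e9      # about 3 billion base pairs, the rough figure quoted above


def cells_per_base_pair(cells, base_pairs):
    """Return how many cells each base pair would have to account for."""
    return cells / base_pairs


if __name__ == "__main__":
    ratio = cells_per_base_pair(CELLS_IN_BODY, BASE_PAIRS)
    print(f"{ratio:,.0f} cells per base pair")
```

With these inputs the ratio comes to 20,000 cells per base pair, so a
genome that spent even one base pair on every cell would need to be about
20,000 times longer than ours.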
Count Compression
In previous chapters we talked about Foxy
and Moxy scripts using lengths of repetitive DNA as a way to set
the length of a structure.
The simplest way to specify a mixture of
cells in a tissue would be to have a long script, with one or more
base pairs specifying each type of cell. It’s a system that
encourages effective evolution, since there is a bit of slippage
at each generation that would ‘nudge’ the length larger
or smaller, so the structure could drift to its optimum size.
Once a species reached its ‘perfect’ size, it could switch
to a different length-coding system which would be less evolvable,
but more compact.
For example, an extremely clever reader of
short scripts might be able to ‘parse’ a number from
a short DNA chain, similar to the way modern computers read numerical
data. In theory, a length of 5 nucleotides could code for up to 4^5
(1,024) different lengths, sufficient for any lengths under 1,000. 10 nucleotides
could code for over one million values, and 15 nucleotides could
manage over one billion.
Of course there may not be a protein capable
of reading nucleotide sequences into true positional (base-4) numbers like that
(it would need to somehow give each possible nucleotide a value between
0 and 3, and then multiply the second nucleotide by 4, the third
by 16, the fourth by 64 and so on).
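The number-reading scheme just described is ordinary base-4 notation, and
it can be sketched in a few lines of Python. The assignment of values to
A, C, G and T below is an arbitrary assumption made for illustration;
nothing in biology is known to work this way.

```python
# Arbitrary, purely illustrative assignment of a value (0 to 3) to each nucleotide.
NUCLEOTIDE_VALUES = {"A": 0, "C": 1, "G": 2, "T": 3}


def parse_length(sequence):
    """Read a nucleotide string as a base-4 number.

    The first nucleotide counts as units, the second is multiplied by 4,
    the third by 16, the fourth by 64, and so on.
    """
    total = 0
    for position, nucleotide in enumerate(sequence):
        total += NUCLEOTIDE_VALUES[nucleotide] * 4 ** position
    return total


def capacity(nucleotide_count):
    """Return how many distinct values a sequence of this length can encode."""
    return 4 ** nucleotide_count


if __name__ == "__main__":
    print(parse_length("TTTTT"))  # largest value that fits in 5 nucleotides
    for n in (5, 10, 15):
        print(n, capacity(n))
```

Five nucleotides give 1,024 possible values (0 through 1,023), ten give
1,048,576, and fifteen give 1,073,741,824.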
However, an organism could also code for
lengths compactly by storing a variety of scripts, each with a different
length and a different script ID. Multiple scripts coding for length
could then reference one of the ‘constant’ sequences,
and Moxy could grab a copy, and use it as a traditional length-based
script. That way a short gene ID of maybe 17 or 20 base pairs could ‘represent’ a
script that might be thousands of elements long.
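Programmers would recognize this as a lookup table of named constants.
The Python sketch below compares the cost of writing a long script out
in full each time with the cost of storing it once and pointing to it by
ID. The two 17-letter IDs, the script lengths, and the use of the letter
"S" for one script element are all invented for the example.

```python
# Hypothetical library of stored 'constant' scripts, keyed by a 17-letter script ID.
# Each stored script is written as one letter per element.
CONSTANT_SCRIPTS = {
    "ACGTTGCAACGTTGCAA": "S" * 3000,
    "TTGACCATGGTCAATGC": "S" * 1200,
}


def expand(script_id):
    """Look up the full length-based script that a short ID stands for."""
    return CONSTANT_SCRIPTS[script_id]


def inline_cost(references):
    """Base pairs needed if every use wrote its script out in full."""
    return sum(len(expand(script_id)) for script_id in references)


def referenced_cost(references):
    """Base pairs needed if each script is stored once (with its ID tag)
    and every use carries only the short ID."""
    stored = sum(len(script_id) + len(CONSTANT_SCRIPTS[script_id])
                 for script_id in set(references))
    pointers = sum(len(script_id) for script_id in references)
    return stored + pointers


if __name__ == "__main__":
    uses = ["ACGTTGCAACGTTGCAA"] * 3 + ["TTGACCATGGTCAATGC"]
    print(inline_cost(uses), referenced_cost(uses))
```

In this made-up case, four uses cost 10,200 base pairs written out in
full, but only 4,302 when stored once and referenced, and the saving
grows with every additional use.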
Script Looping
We’ve already talked about Moxy calling other Moxy scripts.
The use of script ‘routines’ calling ‘subroutines’ also
allows organisms to reduce the total length of scripts.
For example, one way to specify a stretch
of 10,000 skin cells would be to use a script that is 10,000 base
pairs long. Doing that offers a high degree of precision, but it
may be much more detail than an organism would really need.
An alternate approach would be to run a master
script that is 100 base pairs long, and use it to call a lesser script
that is also 100 base pairs long— and run it 100 times.
By necessity, the result would have a ‘repeat pattern’ of
100 cells, with no way to vary the pattern individually in each block.
On the other hand, the system would also require only 200 base pairs
to manage the placement of 10,000 cells.
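The routine-and-subroutine arrangement above maps directly onto a nested
loop. In this Python sketch the letters standing for cell types ("S" and
"H") and the function name are hypothetical; only the 100-by-100
structure comes from the example in the text.

```python
def run_looped(master, sub):
    """Lay out cells by running the whole sub-script once for every
    element of the master script."""
    cells = []
    for _ in master:
        cells.extend(sub)
    return cells


# A 100-element master script and a 100-element sub-script.
# "S" and "H" are placeholder cell types chosen for the example.
MASTER = "M" * 100
SUB = "S" * 99 + "H"

if __name__ == "__main__":
    layout = run_looped(MASTER, SUB)
    print(len(layout), "cells placed from", len(MASTER) + len(SUB), "script elements")
```

The output is 10,000 cells from 200 script elements, and every block of
100 cells is identical, which is exactly the loss of per-cell control
described above.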
Analog Controls
Still another way to reduce the length of
the genome would be to abandon precise ‘digital’ control,
and go back to a more ‘analog’ system.
For example, to fill in a bunch of skin cells,
an organism might simply repeat a 100-element script indefinitely,
until it runs out of places to put the skin, or until some sort of
timer runs out.
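A minimal Python sketch of that ‘analog’ approach follows. The two
stopping conditions (a count of free places and a step limit acting as
the timer) are our own simplification of the idea, and the names are
hypothetical.

```python
from itertools import cycle


def fill_until_full(script, free_places, max_steps):
    """Repeat the script over and over, placing one cell per step, until
    there is no room left or the timer (max_steps) runs out."""
    placed = []
    for step, cell in enumerate(cycle(script)):
        if len(placed) >= free_places or step >= max_steps:
            break
        placed.append(cell)
    return placed


if __name__ == "__main__":
    skin_script = "S" * 100  # a 100-element script; "S" is a placeholder cell type
    print(len(fill_until_full(skin_script, free_places=250, max_steps=10_000)))
    print(len(fill_until_full(skin_script, free_places=10_000, max_steps=120)))
```

The script itself never says how many cells to make. The final count is
set entirely by the surroundings (250 free places in the first call) or
by the timer (120 steps in the second).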
Script Reuse
Another way to reduce the volume of the genome
is to use lengthy scripts for more than one function.
We previously talked about using a single
script from two different Foxy or Moxy proteins as a way to ‘link’ two
properties, but of course, the same thing would also shrink the number
of base pairs required in the genome.
Compression and C-Value
All the script compression techniques have
one thing in common: they reduce the size of the genome, but
they also reduce the ease with which an organism can evolve.
Using a number or a constant instead of a
full script means it’s impossible to throw one or two unusual
cells into the middle of a tissue. In many settings that might actually
be a good thing, but it does reduce an organism’s capacity
to come up with an interesting new structure via random mutations.
Likewise, looping and analog controls mean
that an organism loses some precision in the way it lays out its
tissues.
Using the same script for more than one function
doesn’t take away any details, but it causes its own evolutionary
problems, since the two different tissues will be linked in the evolutionary
sense. Any mutation that is beneficial for one of the tissues may
be detrimental for the other.
Design Tradeoffs
The C-value enigma may simply represent
differences in evolutionary strategy between different organisms,
along with some possible practical consequences of the varying amounts
of genetic material.
An organism with small amounts of genetic
material would represent a highly optimized species— with highly
efficient coding of its organs and tissues, but with less potential
to evolve quickly into new forms.
On the opposite extreme, organisms with huge
amounts of genetic material would have more of their structural information
coded in simple scripts, which could evolve more quickly if conditions
changed.
Evolutionary Timing
Differences in the C-value may also represent
organisms that are in different stages of evolutionary development.
Shortly after new Hoxy genes are introduced,
a species may start using many new scripts, which might take up large
amounts of genetic material simply because they haven’t had
time to be optimized yet.
Later on, metabolic pressures would reward
individuals that used more efficient coding, particularly after the
optimum scripts and organ sizes had developed for the species.