The field of human genetics has seen many breakthroughs in the past few decades. Thanks to the Human Genome Project, we now have a nearly complete sequence of the DNA that makes us human. And thanks to other researchers, we now understand many of the details of how our genes create human cells.

Of course, as often happens in science, learning more about our genetics has raised as many new questions as it has answered. There are still some enormous gaps in our understanding of DNA-- how it works, and how it came to be.

Genetic Questions

Let's take a look at some of the currently unsolved riddles about DNA.

From Soup to DNA

The first question is simple-- how did DNA ever manage to come into existence in the first place? It's a long and complex molecule, and extremely unlikely to have formed spontaneously, back in the days of the primordial soup.

Even worse, DNA is useless without a complex set of proteins that synthesize it, duplicate it, and then 'read' it. Those proteins, in turn, couldn't have formed without the DNA genes that now produce them. It's a 'chicken and egg' problem… so, which came first, the proteins or the DNA?

It is not a problem of raw materials, since there is strong evidence that the Earth of 4 billion years ago contained the chemical ingredients for life. Amino acids, nucleobases and other 'life friendly' materials have been found in meteorites and comets, and simple organic molecules have been detected in interstellar space. It's also likely that such materials were synthesized on Earth by various natural processes.

However, finding a pathway from the primordial soup of Oparin and Haldane to the formation of DNA strands is not so easy. Scientists have proposed many theories for the early origins of life-- from Darwin's 'warm little pond', to the currently popular 'RNA world'. But so far, nobody has described a full set of chemical steps capable of making the jump from chaos to living organisms.

As a possible answer, the first half of this book will explore a sequence of early, self-replicating molecules that bridge the gap from random organic soup to life. They will start simple enough that they could have formed on their own, even in a world with no true 'natural selection'. Then they will gradually become living cells with modern DNA.

We will use some parts of the 'RNA world' theory, but we'll offer a clearer explanation for the formation of the first genetic chains, and their eventual conversion to the DNA that we have today.

Introns and Protein Positioning

Another mystery of modern genetics is the presence of introns-- lengths of non-coding DNA that are smack in the middle of almost every gene. Cells copy the whole gene into RNA, and then remove the introns from that RNA copy before it is translated into protein-- either via complex molecular machines called spliceosomes, or by a clever 'self-splicing' action that lets introns act as a catalyst for their own removal.

To put it mildly, this is a completely ridiculous system! Why on earth would living organisms go to so much trouble to muck up their genetic code with such a Rube Goldbergian contraption?

Since introns are present in nearly all living organisms, the only reasonable conclusion is that they serve some extremely important function-- one that must be just as vital as protein-coding. What could it possibly be?

When we explore the early evolution of DNA, we'll see a possible answer, since it's likely that the earliest genetic chains were used for much more than just genetic code. We'll look at some ways in which our genetic material could have directly helped proteins to function, particularly in the very early days of life's evolution. As it happens, some of those functions are still necessary, and those 'legacy' uses for RNA are preserved in the introns.

A large part of the story to come is about the coevolution of regular protein-coding genes, and the other types of genetic material that help them to work more effectively.

Gene Count and Satellite DNA

Another mystery that has come out of the Human Genome Project is the fact that humans have far fewer genes than was originally expected. In fact, our gene count is about the same as that of a simple 959-cell roundworm, and not much more than a fruit fly's. How can that be possible?

Intuition would suggest that there are simply too many pieces to a human being to be specified by a mere 23,000 genes. After all, our body contains dozens of organs, each built from hundreds of different cell types. Each of those cells contains hundreds of specialized structures. If you do the math, that ought to mean hundreds of thousands of genes, merely to specify our cells and organs.
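For readers who would like to see that math written out, here is a rough sketch in Python. The three counts are placeholder assumptions picked only to match the words 'dozens' and 'hundreds' above, and the one-gene-per-structure rule is the naive intuition itself, not a measured fact.

```python
# Back-of-envelope version of the "do the math" intuition.
# Every number below is an illustrative assumption, not a measurement.

def naive_gene_estimate(organs, cell_types_per_organ, structures_per_cell):
    """Assume one gene is needed for every distinct structure."""
    return organs * cell_types_per_organ * structures_per_cell

ORGANS = 24                  # "dozens of organs" (assumed value)
CELL_TYPES_PER_ORGAN = 200   # "hundreds of different cell types" (assumed value)
STRUCTURES_PER_CELL = 200    # "hundreds of specialized structures" (assumed value)
ACTUAL_GENES = 23_000        # gene count quoted in the text

estimate = naive_gene_estimate(ORGANS, CELL_TYPES_PER_ORGAN, STRUCTURES_PER_CELL)
shortfall = estimate / ACTUAL_GENES

print(f"Naive estimate: {estimate:,} genes")
print(f"Actual count:   {ACTUAL_GENES:,} genes")
print(f"The naive estimate is about {shortfall:.0f} times too large")
```

Different guesses for the three counts will move the estimate up or down, but any choice in the 'dozens' and 'hundreds' range lands far above 23,000.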

Even worse, let's consider one of our more complicated organs-- our brains. Humans have about 100 billion neurons. Those cells are arranged just right so we can do some remarkable and complex tricks. For example, we are hard-wired so we can recognize a familiar face in a crowd almost instantly, and then filter that person's voice from a dozen others when we talk to them. Or we can learn how to look at patterns on a page, and translate them into abstract concepts such as this one.

To accomplish such subtle tasks, our brains need to have some very clever wiring, which ought to require at least a few genes for each bit of behavior. There are many thousands of complex processing tasks that our brains do routinely, so shouldn't it take a huge number of genes to make that happen?

In fact, to make behavior more 'evolvable', it would be very useful to have some sort of 'programming language' controlling neuron connections, so a good connection that produces a useful new behavior can actually be passed along to the next generation. It's hard to imagine how any protein could ever provide that sort of control.

On the other hand, protein-coding genes make up less than 2% of our DNA. The remaining 98% is frequently very repetitive, and currently without any known use. It is often called 'junk DNA', or 'satellite DNA' when scientists are being polite.

Could there be a connection?

The small number of human genes is no longer a mystery if satellite DNA also serves a role in our genetics. In fact, it would be quite logical if the extra 98% of our genome just happened to contain 98% of our genetic coding.

The human genome includes approximately 2 million distinct chunks of satellite DNA that are each enclosed within a 'jumping gene', or transposon. Altogether they make up more than 40% of our DNA. If each piece of satellite DNA functions as a gene, then we'd have plenty of extra information for coding the myriad details of our complex bodies. But, how could such simple, repetitive pieces of genetic code ever manage to do that?

In the last half of this book we'll talk about the evolution of genetic scripts, stored in introns and satellite DNA-- first as a way to manage the size, content and position of structures within cells, and then as a way to specify the details of tissues and organs in multi-cellular organisms.

By their gross appearance, these scripts are much simpler than the protein-coding portions of our genome, but they still carry extremely valuable information. From an evolutionary point of view, they act just like genes, but in a more specialized way.

As it will turn out, satellite DNA is probably the most important part of our genome, and well deserving of its dominant presence in our DNA. You might even say that it's far more clever than the protein-coding parts, since it is data for a much more complex 'programming language' than the one that turns DNA into proteins.

Speed of Evolution and Transposons

One final problem in modern genetics is the speed and reliability of evolution itself.

Simply put, nearly all random changes in a protein-coding gene will either have no effect, or be harmful and sometimes lethal. Only an extremely small percentage of mutations will ever result in an improvement.

Because of that inefficiency, it seems necessary that there be either an extremely high mutation rate, or an extremely slow rate of evolutionary change.

And yet, in practice, even very complex organisms are able to create functional offspring with more than a 90% success rate, and species still manage to evolve quickly enough that they can handle all but the most cataclysmic of changes.

A simple phenomenon called 'replication slippage' provides most of the answer. It randomly changes the length of repetitive DNA in the genome, making some scripts a bit longer in each generation, and some shorter. That is a safe way to make minor changes in the details of multi-cellular creatures, so a species can ease into the optimum size, shape and position for each structure.
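To make the idea a little more concrete, here is a toy simulation in Python. It is only a sketch under assumed numbers: the repeat unit, the starting length, and the chance of a slip per generation are all made up for illustration, and real slippage rates vary widely from one repeat to another.

```python
import random

def slip(repeat_count, slip_probability, rng):
    """One generation: occasionally gain or lose a single repeat unit."""
    if rng.random() < slip_probability:
        repeat_count += rng.choice((-1, 1))
    # A repeat cannot shrink below one unit in this toy model.
    return max(repeat_count, 1)

def simulate_lineage(start_count, generations, slip_probability=0.1, seed=0):
    """Follow one repeat down a single line of descent.

    slip_probability and seed are hypothetical parameters chosen for the demo.
    """
    rng = random.Random(seed)
    counts = [start_count]
    for _ in range(generations):
        counts.append(slip(counts[-1], slip_probability, rng))
    return counts

REPEAT_UNIT = "CAG"  # assumed repeat unit, for display only
history = simulate_lineage(start_count=20, generations=200)

print("Start:", history[0], "repeats")
print("End:  ", history[-1], "repeats")
print("Sequence now:", REPEAT_UNIT * history[-1])
```

The point to notice is that no single generation ever changes the length by more than one unit, so each offspring differs only slightly from its parent, while the length can still drift a long way over many generations.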

In fact, most evolutionary change in complex organisms probably happens via script changes, or by mutations in the ID sequences that pick which script to use.

A more dramatic part of the answer may result from transposons, or 'jumping genes', which sometimes make more or less random changes to our genetic material. That might seem like a very bad idea-- unless it somehow increases the chances for beneficial changes in some 'directed' way.

In the last half of this book, we'll talk about the use of transposons as a 'delivery vehicle' for scripts stored in the satellite DNA. Most likely, that is their primary function, but it looks like they may have an important fringe benefit, as well.

What if transposons are, in fact, a sophisticated method to create large mutations with a higher than usual probability of success? As we'll see, there are some fairly simple genetic 'tricks' that DNA can use to greatly increase the effectiveness of evolutionary changes, thanks to the scrambling effect of transposons.

It might make sense to consider transposons as a way to cause 'smart' mutations that greatly increase the odds of successful evolution. We'll talk more about their evolutionary role near the end of this book.

What's to Come

You might say that this is a love story.

It starts with two molecules that just happened to meet, about 4 billion years ago. There was some interesting 'chemistry' between them, so they spent some time together, and eventually created Life. It's much like one of those multi-generation romance novels, only the characters are way smaller, and their bodice-ripping passion happens only on a molecular level.

We'll trace the birth of several generations of chemical characters, starting from pure random 'primordial soup', and eventually becoming self-replicating organisms, primitive cells, and then multi-cellular organisms. Along the way, they'll also develop the modern system of DNA transcription and protein synthesis.

Most of the characters in this story are rather long and complicated molecules, despite their sub-microscopic size. However, we'll treat them casually enough that they should be understandable, even for readers who can't tell a pyrimidine from a potato.

Our main story line is all about plot and chemistry. However, if you'd like to have a gradual build-up to the main action, then you might want to start with the first three chapters in the Appendix. They set the scene by looking at conditions on Earth four billion years ago. One chapter pays close attention to tidal puddles and pools, since they are such a hospitable location for our story to take place, and an excellent, romantic vacation destination, besides. But you can skip those chapters completely if you only want to read the 'juicy' parts!

We'll make many guesses and use plenty of imagination as we attempt to 'reverse engineer' our DNA heritage. There is no way to reach firm conclusions when speculating about conditions 4 billion years ago, or even 1 billion years ago. It's highly unlikely that every single detail of this story will prove to be correct. But with luck, maybe we'll manage to get a few things right.

Perhaps some of the insights in this story will make it easier to understand how we humans are created from our genes, and how those genes came to be.