Welcome to the Genome. Michael Yudell
go to make up living matter and hence understand how these proteins perform their specific functions on which the processes of life depend.” He also hoped that his work “may reveal changes that take place in disease, and that our efforts may be of more practical use to humanity.” (12) This connection between proteins, genes, and medicine, uncovered in part by Sanger and his techniques, is at the heart of what lies ahead in genomics. Fred Sanger died at the age of 95 in 2013. His legacy is immense, including an institute in the UK named after him and two Nobel Prizes. He was, as he said about himself, “a chap who messed about in his lab,” but he was also a chap who really made a difference to humankind. (13)
RESEARCH MILESTONE 2: DECIPHERING THE GENETIC CODE
The most basic mechanisms and building blocks of heredity were, by the late 1950s, either solved or theoretically understood. But the link between genes and proteins was still not fully established. After all, nobody had yet explained exactly how DNA could produce a protein. The growing awareness that proteins were linear arrangements of amino acids and that genes were linear arrangements of nucleotides suggested to many scientists that this could mean only one thing—there was some code that connected the information in DNA to the production of proteins. But this was no simple code to crack, and scientists had been working on variations of this problem for at least a decade before the discovery of the structure of the double helix.
The intellectual spark that was a foundation for the solution of the DNA/protein code came from an unlikely source. Soon after the 1953 publication in Nature of their famous paper on the structure of DNA, Watson and Crick received a letter from George Gamow, a theoretical physicist and one of the architects of the big bang theory of the universe. Gamow’s letter sketched out an explanation for how an array of nucleic acids determined an array of amino acids. Gamow’s model, which detailed a list of 25 amino acids, turned out to be wrong. Paring down Gamow’s list to 20, Watson and Crick came up with the correct number of amino acids that make up proteins. (14) Over the next decade scientists conducted experiments that confirmed Watson and Crick’s list of amino acids and uncovered the DNA/protein coding scheme.
In DNA there are four linearly arranged nucleic acids (G, A, T, and C), whereas proteins are constructed from 20 linearly arranged amino acids. It was apparent from basic mathematics that the code was not based on a 1:1 relationship: the connection between DNA and proteins could not be one nucleic acid to one amino acid, because that would require at least 20 different nucleic acids. Nor could the code be based on a 2:1 ratio, because there are only 16 (4 × 4) ways to arrange G, A, T, and C in ordered pairs, still fewer than the 20 amino acids.
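The counting argument can be checked directly. The short Python sketch below is an illustration added here, not something from the original experiments; the function name `count_words` is hypothetical. It enumerates every ordered arrangement of the four bases at word lengths one and two and compares each total with the 20 amino acids.

```python
from itertools import product

BASES = "GATC"          # the four DNA bases named in the text
AMINO_ACID_COUNT = 20   # amino acids that the code must be able to specify

def count_words(length):
    """Count the ordered arrangements of the four bases at a given word length."""
    return sum(1 for _ in product(BASES, repeat=length))

for length in (1, 2):
    total = count_words(length)
    verdict = "enough" if total >= AMINO_ACID_COUNT else "too few"
    print(f"{length} base(s) per word: {total} arrangements ({verdict} for 20 amino acids)")
```

Both word lengths fall short of 20, which is why neither a 1:1 nor a 2:1 scheme could work.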
It turned out that the code is based on a 3:1 relationship and is therefore a series of nonoverlapping triplets of nucleic acids that code for single amino acids. Basic mathematics shows that there are 64 (4 × 4 × 4) different ways to arrange four different bases in triplets, yet there are only 20 types of amino acids. The surplus exists because the code is redundant: many of the triplets, which are called codons, are just different ways to code for the same amino acid. Most amino acids have either two or four synonymous codons, although there are several exceptions. The amino acids methionine and tryptophan have no synonymous codons. Isoleucine has three, and serine, arginine, and leucine all have six.
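The redundancy can be tallied from the standard genetic code. The sketch below is an added illustration in Python: it assumes the standard code written as a 64-letter string of one-letter amino acid symbols in T, C, A, G order, with `*` marking a terminator. The variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Assumed: the standard genetic code, one letter per codon, in TCAG order.
# "*" marks a terminator codon rather than an amino acid.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 triplets to the amino acid (or terminator) it specifies.
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons specify each amino acid, leaving out terminators.
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print("triplets:", len(CODON_TABLE))
print("amino acids:", len(codons_per_amino_acid))
for amino_acid, n in sorted(codons_per_amino_acid.items(), key=lambda kv: (kv[1], kv[0])):
    print(amino_acid, n)
```

In this table M (methionine) and W (tryptophan) each appear once, I (isoleucine) three times, and S, R, and L (serine, arginine, and leucine) six times each.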
Deciphering the genetic code allowed scientists to scan stretches of DNA sequences and look for genes. The language spelled out by nucleic and amino acids has rules similar to the rules of punctuation. Just as you can scan this paragraph for capital letters and periods, you can look for the first word in a DNA sentence to find what is called an initiator codon and read on until you find the end of the sentence or period, which in genetic terminology is called the terminator codon. Everything between these points is part of the same gene.
In a genetic sentence the initiator codon is almost always a triplet of the nucleic acids A, T, and G, which codes for the amino acid methionine (also known as Met or M). Thus, when you look at the amino acids that make up proteins, you will, with a few exceptions, see an M as the first letter in the protein. Experiments by Cambridge University biologists Sydney Brenner and Francis Crick, and by Alan Garen at Yale University, showed that there were three terminator codons, or three ways to put a period at the end of a protein sentence: TAG, TAA, and TGA. (15)
A sample genetic sentence:
ATG (initiator codon) GCA AGT TCT T … GC ATA AGT TAG (terminator codon)
This sounds easier than it actually is, however. As with the English language, a capital letter does not always indicate the beginning of a sentence. Once an ATG is located, scientists must determine whether the suspected gene is actually a gene at all. The suspected gene is called an open reading frame (ORF) and this process is called annotation.
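The scanning idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a real annotation tool: the function name `find_orfs` and the example sequence are made up, only one strand is read, and every ATG followed in frame by a terminator is reported as a candidate.

```python
START = "ATG"
STOPS = {"TAG", "TAA", "TGA"}  # the three terminator codons named in the text

def find_orfs(dna):
    """Return (start_index, sequence) for each ATG that is followed, in the
    same reading frame, by a terminator codon."""
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != START:
            continue
        # Read triplet by triplet from the codon after the ATG.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOPS:
                orfs.append((start, dna[start:pos + 3]))
                break
    return orfs

example = "CCATGGCAAGTTCTTAGGG"  # hypothetical sequence for illustration
print(find_orfs(example))
```

A scan like this only produces candidates. Deciding whether a candidate is a real gene is the harder step that annotation addresses.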
It took nearly a decade of work for experiments to confirm the triplet model of protein synthesis. In 1961 at the U.S. National Institutes of Health biochemists Johann Heinrich Matthaei and Marshall Nirenberg verified the first word of the genetic code. Matthaei and Nirenberg’s experiment was relatively simple. In a test tube, they provoked nucleic acids they had synthesized to produce a protein. Placing only one type of nucleic acid, all Ts, into a test tube, they were able to produce a protein made up of only the amino acid phenylalanine, or F, meaning that the triplet TTT coded for phenylalanine. (16) Later that year at New York University School of Medicine biochemist Severo Ochoa began similar experiments constructing random strings of nucleotides, placing them in cell extracts, and determining the kind of amino acids that were incorporated into the subsequent protein. (17) By comparing the results of these and other experiments, scientists cracked the entire code of triplets by 1965.
Breaking the genetic code alone couldn’t explain the relationship between genes and proteins. By the late 1950s scientists recognized that some type of intracellular intermediary was bringing genetic information from DNA to ribosomes, which are the cellular mechanisms that assemble proteins. The link between DNA and proteins turned out to be a cellular material known as ribonucleic acid or RNA. (18)
RNA is a versatile molecule; it acts as structural scaffolding, as an enzyme, and as a messenger. Its general structure is the same as that of DNA, but its sugar ring is slightly different, hence the deoxyribo‐ in DNA and just plain ribo‐ in RNA. Also, like DNA, RNA has four kinds of bases. However, instead of T, or thymine, RNA has U, or uracil, which complements A when RNA binds to DNA.
There are two steps in translating genetic instructions into a protein. The first is called transcription. RNA molecules assemble along a stretch of DNA that constitutes a gene. The strand of RNA is complementary to the strand of DNA by the same rules that dictate the formation of a double helix.
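The pairing rule for transcription can be written as a simple lookup. The Python sketch below is an added illustration with hypothetical names. It assumes the DNA template strand is supplied in the orientation in which it is read, so that the RNA is produced base by base in the same left-to-right order.

```python
# Pairing rules when RNA forms against DNA: U takes the place of T opposite A.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_strand):
    """Build the RNA strand complementary to a DNA template strand.

    Assumption: the template is written in the direction it is read, so the
    output needs no reversal.
    """
    return "".join(DNA_TO_RNA[base] for base in template_strand.upper())

template = "TACCGTTCAAGAATC"  # hypothetical template strand
print(transcribe(template))   # AUGGCAAGUUCUUAG
```

The result reads like the coding strand of the gene, with U wherever the DNA would have T.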
Figure 2.3 Proteins are made in two steps. Messenger RNA first assembles along a gene (transcription). The mRNA molecule then moves out of the nucleus to a ribosome (pictured here), where it is translated into a protein (translation).
Credit: Exhibitions Department, American Museum of Natural History
Once formed, this strand of RNA, known as messenger RNA, or mRNA, moves out of the nucleus of a cell to a ribosome, where the genetic sentence is read and translated into a protein. This stage in protein formation is known as translation. This molecular mystery was solved by some of the same scientists working on deciphering the genetic code: Sydney Brenner at Cambridge, François Jacob and Jacques Monod at the Institut Pasteur in Paris, and Matthew Meselson at Caltech. (19) The breaking of the genetic code allowed scientists to interpret DNA information by providing them with an accurate DNA-to-protein dictionary. This innovation was an important component of the assembly line of technologies that eventually shaped gene sequencing.
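The dictionary lookup that a ribosome performs can be imitated in Python. The sketch below is an added illustration: it assumes the standard genetic code written as a 64-letter string in T, C, A, G order, with `*` marking a terminator, and the function name `translate` is hypothetical. Real translation involves far more machinery than a table lookup.

```python
from itertools import product

BASES = "TCAG"
# Assumed: the standard genetic code, one letter per codon, in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Read an mRNA from its first AUG, three bases at a time, and stop at the
    first terminator codon. Returns one-letter amino acid symbols."""
    dna = mrna.upper().replace("U", "T")  # look codons up in DNA letters
    start = dna.find("ATG")
    if start == -1:
        return ""  # no initiator codon, so nothing is translated
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGCAAGUUCUUAG"))  # MASS
```

Here AUG gives M, GCA gives A, AGU and UCU both give S, and UAG ends the sentence.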
RESEARCH MILESTONE 3: SYNTHESIZING DNA
Since the early part of the twentieth century, scientists had been aware of the vital connection between genes and enzymes, a type of protein that usually accelerates chemical reactions in an organism. As early as 1901, Archibald Garrod, a London physician studying metabolic disorders, recognized that patients