
Upload: nguyenminh

Post on 12-May-2018


Concept of DNA & RNA

Dr. Satish Kumar Anthropological Survey of India Manav Bhavan, Bogadi, Mysore

CONTENTS
Nucleic Acids - Introduction
Nucleic Acids and Heredity
DNA is the Genetic Material of Bacteria
DNA is the Genetic Material of Viruses
DNA is the Genetic Material of Eukaryotic Cells
Composition of Nucleic Acids
The DNA, RNA difference
The structure of DNA
Physical Properties of DNA
Denaturation and Renaturation of DNA; Hybridization
Circular DNA
Great length versus tiny breadth
Entropic stretching behavior
Different helix geometries
Supercoiled DNA
Sugar pucker
DNA replication
Mutation – the sequence change in DNA
Single-base substitutions
Missense mutations
Nonsense mutations
Silent mutations
Splice-site mutations
Insertions and Deletions (Indels)
Duplications
Translocations
Inversion
The structure and function of RNA
Messenger RNA (mRNA)
Ribosomal RNA (rRNA)
Transfer RNA (tRNA)
Noncoding RNA (ncRNA)
Protein synthesis
The genetic code
Transcription – the mRNA synthesis

Initiation of Transcription
Termination of Transcription
Fate of synthesized mRNA
Translation

Nucleic Acids - Introduction

The heredity of all cellular life forms and viruses is defined by the genome, a long sequence of nucleic acid that contains the genetic instructions specifying the biological development of the organism. Nucleic acids are macromolecules that occur as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA is the molecule of heredity in all cellular forms of life and in many viruses; some viruses, such as tobacco mosaic virus and poliovirus, contain only RNA, which then acts as their hereditary material.

Nucleic acids are linear polymers composed of four different building blocks, the nucleotides. The genetic information resides in the sequence of the nucleotides along the polymer. This information is transmitted by transcription from DNA to RNA molecules, which are then used in the synthesis of proteins. In fact, the central dogma of modern biology is:

DNA → RNA → protein

In complex cells (eukaryotes), such as those of plants, animals, fungi and protists, most of the DNA is located in the cell nucleus. By contrast, in the simpler cells called prokaryotes (the eubacteria and archaea), DNA is not separated from the cytoplasm by a nuclear envelope. The cellular organelles known as chloroplasts and mitochondria also carry DNA. RNA is found both in the nucleus, where it is synthesized, and in the cytoplasm, where the synthesis of proteins occurs.

Background

DNA, chromosomes and genes: how do these terms relate to one another? Aren't they just different terms for the same thing? Well, yes and no. Having listened to this discussion among my younger colleagues, I felt it necessary to give in this chapter the background of these terms and to describe how DNA was discovered to be the genetic material. Once it had been accepted that traits are transmitted genetically, the search began for the factor that carries the information. It was established that genetic material must have the following characteristics:

• It must contain information for replicating itself, in order to be in each cell of a growing organism.

• It must be able to control the expression of traits. Traits were known to be determined by the enzymes and other proteins that act within us, and the unique amino acid sequence of each protein makes it specific to its function; the genetic material must therefore be able to encode the sequences of proteins.

• It must be capable of mutational change in a controlled way, in order to ensure evolution and survival of a species in a changing environment.

Hereditary Material is Bound on Chromosomes

The identity of Mendel's "factors" remained unsubstantiated until the turn of the century, some forty years after Mendel's painstaking experiments. At that time, two exciting methodological developments, the construction of increasingly powerful microscopes and the discovery of dyes or stains that selectively colored the various components of the cell, made it possible to examine cell nuclei, which led to the discovery of long, thin, rod-like structures. These nuclear structures were termed chromosomes. Many more microscopic observations clarified the role of chromosomes:

1. A variety of chromosome types, as defined by relative size and shape, were found to be present in the nucleus of each cell. Furthermore, there usually were two copies of each type of chromosome. A cell with two copies of each chromosome type is called a diploid cell.

2. All of the cells of an organism, excluding sperm cells, egg cells, and red blood cells, and all organisms of the same species, were observed to have the same number of chromosomes.

3. The number of chromosomes in any cell appeared to double immediately prior to the cell division processes of mitosis and cytokinesis, in which a single cell splits to form two identical offspring cells.

4. The sex or germ cells appeared to have exactly half the number of chromosomes found in the somatic cells of the organism (i.e. just one copy of each chromosome type). Such cells are called haploid cells.

5. The fertilization of an egg by a sperm cell produces a diploid cell called a zygote, which has the same number of chromosomes as the somatic cells of that organism.

Suddenly, the implications of Mendel's work became obvious: chromosomes behaved like the particles or factors that Mendel described. Mendel's hereditary factors were either located on the newly discovered chromosomes or were the chromosomes themselves. Proof that the chromosomes carried Mendel's hereditary factors did not come until 1905, when the first physical trait was shown to result from the presence of specific chromosomal material and, conversely, the absence of that chromosome was shown to mean the absence of the trait. Microscopic observations had revealed what have come to be called the sex chromosomes. These chromosomes, distinguished from the other chromosomes and from each other by their size, were named "X" and "Y." Researchers in 1905, working with insects, were surprised to observe that somatic cells from females always contained two copies of the X chromosome, while somatic cells from males always contained one copy of the X chromosome and one copy of the Y chromosome; the same pattern was later found in humans. All of the other chromosomes in the nucleated cells of both sexes appeared identical. Though the mechanism was not known, it seemed quite clear that the sex of an individual was directly related to the identity of the chromosomes in that organism's cells. Thus, sex was shown to be the direct result of a specific combination of chromosomal material, and sex became the first phenotype (physical characteristic) to be assigned a chromosomal location, specifically the X and Y chromosomes.

Chromosomal Subunit that Carries Hereditary Information

Quantitatively, DNA makes up about 40 per cent of a chromosome, whereas protein accounts for about 60 per cent. It now had to be determined which component of the chromosome, DNA or protein, was the genetic material. At first, many scientists were sure that it was protein: not only is protein present in larger quantities than DNA, but protein molecules are built from twenty different subunits (the amino acids), while DNA molecules are built from only four. A protein molecule, it seemed, could encode both more information and a greater variety of information, because it had a substantially larger set of ingredients to work with. DNA was considered a boring molecule.

NUCLEIC ACIDS AND HEREDITY

DNA was first identified in 1868 by Friedrich Miescher, a Swiss biologist, in the nuclei of pus cells obtained from discarded surgical bandages. He called the substance nuclein, noted the presence of phosphorus, and separated the substance into a basic part and an acidic part.

DNA is the Genetic Material of Bacteria

In 1928, Frederick Griffith performed an experiment using pneumonia bacteria and mice. This was one of the first experiments to hint that DNA was the genetic material. He used two strains of Streptococcus pneumoniae: the S strain, which has a polysaccharide coating that makes it look smooth under the microscope, and the R strain, which lacks the coating and therefore looks rough. When he injected live S strain into mice, the mice contracted pneumonia and died. When he injected live R strain, which typically does not cause illness, the mice stayed healthy, as predicted. Thinking that the polysaccharide coating might somehow cause the illness, and knowing that polysaccharides are not affected by heat, Griffith then killed some S strain bacteria with heat and injected the dead bacteria into mice. The mice survived, indicating that the disease was caused not by the polysaccharide coating but by something within the living cell. Since heat denatures protein, he next hypothesized that some protein in the living cells, denatured by the heat, caused the disease. He then injected another group of mice with a mixture of heat-killed S and live R bacteria, and the mice died! When he performed a necropsy on the dead mice, he isolated live S strain bacteria from the corpses. Griffith concluded that the live R strain bacteria must have absorbed genetic material from the dead S strain bacteria. Since heat denatures protein, the argument ran, the protein in the bacterial chromosomes could not be the genetic material, and the evidence therefore pointed toward DNA. Transformation is the process whereby one strain of a bacterium absorbs genetic material from another strain and turns into the type of bacterium whose genetic material it absorbed. Because DNA was so poorly understood, scientists remained skeptical up through the 1940s.

In 1944, Oswald Avery, Colin MacLeod, and Maclyn McCarty revisited Griffith's experiment and concluded the transforming factor was DNA (Fig. 1). Their evidence was strong but not totally conclusive. The then-current favorite for the hereditary material was protein; DNA was not considered by many scientists to be a strong candidate.

Fig. 1: Transforming Principle - DNA Might be the Genetic Material after experiments of Griffith 1928 and Oswald Avery, Colin MacLeod, and Maclyn McCarty 1944 (Modified from http://www.accessexcellence.org/)

The breakthrough in the quest to determine the hereditary material came from the work of Max Delbrück and Salvador Luria in the 1940s. Bacteriophages are viruses that attack bacteria; the ones that Delbrück and Luria worked with attack Escherichia coli, a bacterium found in human intestines. Bacteriophages consist of a protein coat covering DNA. A bacteriophage infects a cell by injecting its DNA into the host. This viral DNA then "disappears" while taking over the bacterial machinery, which begins to make new virus instead of new bacteria. After 25 minutes the host cell bursts, releasing hundreds of new bacteriophages. Because phages contain both DNA and protein, they are ideal for resolving the nature of the hereditary material.

DNA is the Genetic Material of Viruses

In 1952, Alfred Hershey and Martha Chase performed an experiment so significant that it has come to be known simply as the Hershey-Chase experiment. At that time, it was known that viruses are composed of DNA (or RNA) inside a protein coat, or shell, called a capsid. It was also known that viruses replicate by taking over the metabolic functions of the host cell to make more virus. We are used to thinking and talking about viruses that invade our bodies and make us sick, but other kinds of viruses infect other animals, still others infect plants, and some even infect bacteria. A virus that infects a bacterium is called a bacteriophage ("bacteria eater") because the host cell is killed as the new virus particles leave it. To do all this, the virus must inject its genetic material into the host cell. It was therefore realized that the viral genetic material had to be either its DNA or its protein capsid. Hershey and Chase sought to answer the question: is it the viral DNA or the viral protein coat (capsid) that carries the genetic information and is injected into the host bacterium? To answer it, they used the bacterium Escherichia coli, or E. coli for short (named after Theodor Escherich, who first described it), and T2, a bacteriophage that infects E. coli. Isolated T2, like other viruses, is just an inert particle of DNA and protein, so it must enter E. coli in order to make more viruses like itself. When the new T2 viruses are ready to leave the host cell (and go on to infect others), they burst the E. coli cell open, killing it. The results that Hershey and Chase obtained indicated that the viral DNA, not the protein, is the genetic material.
Hershey and Chase used radioactive isotopes to label the protein capsid and the DNA of T2 separately, so that they could tell which of those molecules entered the E. coli cells. Since some amino acids contain sulfur in their side chains, T2 grown in E. coli with a source of radioactive sulfur incorporates the sulfur into its protein coat, making the coat radioactive. Since DNA has plenty of phosphorus in its phosphate (PO4) groups, T2 grown in E. coli with a source of radioactive phosphorus incorporates the phosphorus into its DNA, making the DNA radioactive. Hershey and Chase grew two batches of T2 in E. coli, one with radioactive sulfur and one with radioactive phosphorus, to obtain T2 labeled with either radioactive S or radioactive P. These labeled T2 were then added to separate, fresh batches of E. coli and left there for only 10 minutes. This gave the T2 time to inject their genetic material into the bacteria, but not to reproduce. In the next step, still in separate batches, the mixtures were agitated in a kitchen blender to knock loose any viral parts that were stuck on the outer surface of the E. coli rather than inside the cells. The aim was to separate the protein portion of the virus from the DNA portion. Each mixture was then spun in a centrifuge to separate the heavy bacteria (with any viral parts that had entered them) from the liquid they were suspended in (including any viral parts that had not entered the bacteria). Centrifugation pulls the heavier bacteria to the bottom of the tube, where they form a pellet, while the lightweight viral leftovers stay suspended in the liquid portion, called the supernatant. Finally, the pellet and supernatant from each tube were separated and tested for radioactivity. Radioactive sulfur was found in the supernatant, indicating that the viral protein did not go into the bacteria. Radioactive phosphorus was found in the bacterial pellet, indicating that viral DNA did go into the bacteria (Fig. 2).

Fig. 2: Showing DNA is the Genetic Material of Viruses, after Hershey-Chase Experiment 1952 (Modified from http://www.accessexcellence.org/). Panels: batch of T2 bacteriophage grown in radioactive sulfur; batch of T2 bacteriophage grown in radioactive phosphorus.

Based on these results, Hershey and Chase concluded that DNA, not protein as many people believed, must be the genetic material. Once their experiment was published and DNA was finally acknowledged as the genetic material, there was intense competition to be the first to discover its chemical structure.

DNA is the Genetic Material of Eukaryotic Cells

When nucleic acid is added to populations of cells growing in culture, the nucleic acid enters the cells, and in some of them results in the production of new proteins. When a purified DNA is used, its incorporation leads to the production of a particular protein. Although for historical reasons these experiments are described as transfection when performed with eukaryotic cells, they are the direct counterpart of bacterial transformation. The DNA that is introduced into the recipient cell becomes part of its genetic material and is inherited in the same way as any other part. Its expression confers a new trait upon the cells. At first, these experiments were successful only with individual cells adapted to grow in a culture medium. Since then, however, transfection has become a powerful tool for studying and controlling gene expression. Cloned genes can be transfected into cells for biochemical characterization, mutational analysis, investigation of the effects of gene expression on cell growth, investigation of gene regulatory elements, and production of a specific protein for purification. Transfection of RNA can be used either to induce protein expression or to repress it using antisense or RNA interference (RNAi) procedures. Such experiments show directly not only that DNA is the genetic material in eukaryotes, but also that it can be transferred between different species and yet remain functional.

The genetic material of all known organisms and many viruses is DNA. However, some viruses use an alternative type of nucleic acid, ribonucleic acid (RNA), as the genetic material. The general principle of the nature of the genetic material, then, is that it is always nucleic acid; in fact, it is DNA except in the RNA viruses.

COMPOSITION OF NUCLEIC ACIDS

During the 1920s, biochemist P.A. Levene analyzed the components of the DNA molecule. He found it contained four nitrogenous bases: cytosine, thymine, adenine, and guanine; deoxyribose sugar; and a phosphate group.
He concluded that the basic unit (nucleotide) was composed of a base attached to a sugar, with the phosphate also attached to the sugar. Unfortunately, he also concluded, erroneously, that the four bases were present in equal proportions and that a tetranucleotide was the repeating structure of the molecule. Erwin Chargaff later analyzed the nitrogenous bases of many different forms of life and found that the four bases are not present in equal proportions, contrary to Levene's proposal, although the amount of adenine equals that of thymine and the amount of guanine equals that of cytosine.

It is now clear that nucleic acids are formed of a sugar moiety, the pentose (Fig. 3); nitrogenous bases, the purines and pyrimidines (Fig. 4); and phosphoric acid. The nucleotides are the monomeric units of the nucleic acid. They result from the covalent bonding of a phosphate and a heterocyclic base to the pentose (Fig. 5). Within the nucleotide, the combination of a base with the pentose constitutes a nucleoside. For example, adenine is a purine base; adenosine (adenine + ribose) is the corresponding nucleoside, and adenosine monophosphate (AMP) is the nucleotide.

Nucleic acids are linear polymers in which the nucleotides are linked together by phosphodiester bridges between the pentose moieties. These bonds link the 3' carbon of the pentose in one nucleotide to the 5' carbon of the pentose in the adjacent nucleotide. The backbone of a nucleic acid therefore consists of alternating phosphates and pentoses (Fig. 6). The nitrogenous bases are attached to the sugars of this backbone.
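The base proportions discussed above can be illustrated with a short Python sketch. It is only an illustration: the two sequences are hypothetical (chosen so that the second strand is the base-paired partner of the first), and the function name `base_composition` is an assumption introduced here, not something from the original chapter. It shows that the four bases need not occur in equal proportions, while in a double-stranded molecule A matches T and G matches C.

```python
from collections import Counter

PURINES = {"A", "G"}

def base_composition(sequence):
    """Return the fraction of each base (A, C, G, T) in a DNA sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    return {base: counts[base] / total for base in "ACGT"}

# Hypothetical duplex: strand_two is the base-paired partner of strand_one
# (both written 5' to 3').
strand_one = "AAAGGCAAGT"
strand_two = "ACTTGCCTTT"

single = base_composition(strand_one)
duplex = base_composition(strand_one + strand_two)
purine_fraction = sum(duplex[base] for base in PURINES)

print("single strand:", single)   # bases are far from equal proportions
print("duplex:", duplex)          # A matches T, G matches C
print("purine fraction in duplex:", purine_fraction)
```

In the duplex the purines make up half of all bases, even though no base is present at the 25 per cent that a repeating tetranucleotide would require.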

Fig 3: DNA and RNA sugars: ribose (RNA backbone) and deoxyribose (DNA backbone)

Fig 4: DNA and RNA bases. Purines: adenine, guanine. Pyrimidines: thymine (DNA only), uracil (RNA only), cytosine

Fig 5: DNA and RNA nucleotides (base pairs shown: C-G, A-T, A-U)

Fig 6: Sugar Backbone of nucleic acids

The phosphoric acid uses two of its three acid groups in the 3', 5' diester links. The remaining negative group gives the polynucleotide its acidic properties and enables the molecule to form ionic bonds with basic proteins. In eukaryotic cells, DNA is associated with histones (i.e., basic proteins rich in arginine or lysine), forming a nucleoprotein. This anionic group also causes nucleic acids to be highly basophilic, i.e., they stain readily with basic dyes.

The DNA, RNA difference:

Pentoses are of two types: ribose in RNA, and deoxyribose in DNA. The only difference between these two sugars is that the oxygen on the 2' carbon is lacking in deoxyribose. The bases found in nucleic acids are either pyrimidines or purines. Pyrimidines have a single heterocyclic ring, whereas purines have two fused rings (Fig. 4). In DNA the pyrimidines are thymine (T) and cytosine (C); the purines are adenine (A) and guanine (G) (Fig. 4). RNA contains uracil (U) instead of thymine. There are therefore two main differences between RNA and DNA: the pentose moiety (ribose and deoxyribose, respectively) and one pyrimidine base (uracil instead of thymine). This explains why radioactive thymidine (i.e., the nucleoside) is used to label DNA, and radioactive uridine to label RNA, in various experiments. All the genetic information of a living organism is stored in the linear sequence of the four bases. A four-letter alphabet (A, T, C, G) must therefore code for the primary structure of all proteins, which are composed of 20 amino acids. All the excitement in molecular biology that led to the unraveling of the genetic code began when the structure of DNA was understood.

THE STRUCTURE OF DNA

DNA had been shown to be the genetic material by the Hershey-Chase experiments, but how DNA served as genes was not yet certain. DNA must carry information from parent cell to daughter cell. It must contain information for replicating itself. It must be chemically stable and relatively unchanging. However, it must also be capable of mutational change; without mutations there would be no evolution. Many scientists were interested in deciphering the structure of DNA; among them were Francis Crick, James Watson, Rosalind Franklin, and Maurice Wilkins. Watson and Crick gathered all available data in an attempt to develop a model of DNA structure. Franklin took X-ray diffraction photographs of crystalline DNA extract, the key to the puzzle. The data available at the time were that DNA was a long molecule, that proteins were helically coiled (as determined by the work of Linus Pauling), Chargaff's base composition data, and the X-ray diffraction data of Franklin and Wilkins.
In 1953, based on the available data, Watson and Crick proposed a model of DNA structure that explained Chargaff's base composition data and the biological properties of DNA, particularly its duplication in the cell. In the Watson and Crick model, two right-handed helical polynucleotide chains form a double helix around a central axis. The two strands are antiparallel, i.e., their 3'-5' phosphodiester links run in opposite directions. Furthermore, the bases are stacked inside the helix in planes perpendicular to the helical axis (Fig. 7). The two strands are held together by hydrogen bonds between the pairs of bases. Since there is a fixed distance (i.e., 1.08 nm) between the two sugar moieties in the opposite strands, only certain base pairs can fit into the structure. As may be seen in Figure 7, the only two pairs that are possible are AT and CG. Two hydrogen bonds are formed between A and T, and three between C and G. In addition to hydrogen bonds, hydrophobic interactions between the stacked bases are also important in maintaining the double helical structure. In the Watson and Crick model the distance between the stacked bases is 3.4 Å (0.34 nm), which corresponds to the primary period demonstrated by X-ray diffraction. Furthermore, a turn of the double helix is completed in 34 Å (3.4 nm), a length that corresponds to 10 nucleotide residues. This distance corresponds to a secondary period along the axis. The double helix has a mean diameter of ~20 Å (2.0 nm); furthermore, two grooves (a major or deep groove, and a minor or shallower one) are observed (Fig. 8).

Fig 7: Structure of DNA after Watson and Crick 1953

Fig 8: Space filling model of a segment of DNA showing major and minor grooves on the surface, after Feughelman et al., 1955 (taken from internet). Labels in Figs 7 and 8: base pairs A-T and G-C, phosphates (P), helical turn of 34 Å, diameter of 20 Å, MAJOR GROOVE, MINOR GROOVE.
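The helix dimensions quoted above (0.34 nm between stacked bases, 10 residues and 3.4 nm per turn) can be turned into a quick length estimate. The Python sketch below is only illustrative: the function names are assumptions introduced here, and the figure of roughly 6.4 billion base pairs for a diploid human cell is an outside assumption that does not come from this chapter.

```python
# Dimensions of the Watson and Crick model as quoted in the text.
RISE_NM_PER_BP = 0.34   # distance between adjacent stacked bases
BP_PER_TURN = 10        # nucleotide residues per helical turn in the model

def contour_length_nm(n_bp):
    """Length along the helix axis, in nanometres, of n_bp base pairs."""
    return n_bp * RISE_NM_PER_BP

def helical_turns(n_bp):
    """Number of complete turns made by n_bp base pairs."""
    return n_bp / BP_PER_TURN

# One turn of the helix: 10 residues x 0.34 nm = 3.4 nm (34 angstroms).
PITCH_NM = RISE_NM_PER_BP * BP_PER_TURN

# Assumption (not from the text): about 6.4e9 bp in a diploid human cell.
ASSUMED_DIPLOID_HUMAN_BP = 6.4e9
total_m = contour_length_nm(ASSUMED_DIPLOID_HUMAN_BP) * 1e-9  # nm to m

print(f"pitch = {PITCH_NM:.1f} nm")
print(f"turns in 1000 bp = {helical_turns(1000):.0f}")
print(f"estimated total DNA length per cell = {total_m:.2f} m")
```

Under that assumed genome size the estimate comes out at roughly two metres of DNA per cell, packed into a helix only about 2 nm wide.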

The axial sequence of bases along one polynucleotide chain may vary considerably, but on the other chain the sequence must be complementary, as in the following example:

First chain:    sugar - phosphate - sugar - phosphate - sugar - phosphate - sugar ...
                3’  T     A     C     G  5’
                    ¦     ¦     ¦     ¦
                5’  A     T     G     C  3’
Second chain:   sugar - phosphate - sugar - phosphate - sugar - phosphate - sugar ...
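The pairing in the example above can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the original chapter: the function names are hypothetical, and sequences are assumed to be written 5' to 3', so the first chain of the example (3' T A C G 5') is entered as "GCAT".

```python
# Watson-Crick pairs: A with T (two hydrogen bonds), G with C (three).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(strand_5to3):
    """Return the partner strand, also written 5' to 3'.

    The strands are antiparallel, so the template is walked in
    reverse while each base is swapped for its partner.
    """
    return "".join(PAIRS[base] for base in reversed(strand_5to3))

def hydrogen_bonds(strand):
    """Count the hydrogen bonds holding a strand to its complement."""
    return sum(2 if base in "AT" else 3 for base in strand)

first_chain = "GCAT"                      # 3' T A C G 5' read from its 5' end
second_chain = complementary_strand(first_chain)

print(second_chain)                       # partner strand, 5' to 3'
print(hydrogen_bonds(first_chain))        # bonds across the four base pairs
```

Applying `complementary_strand` twice returns the original sequence, which is the property that lets either chain act as the template for the other.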

Because of this property, the order of bases on one chain fully determines the order on the other. During DNA duplication, the two chains dissociate, and each one serves as a template for the synthesis of a new complementary chain. In this way two DNA molecules are produced, each having exactly the same molecular constitution. The varying sequence of the four bases along the DNA chains forms the basis for genetic information. Four bases can produce thousands of different hereditary characters, because DNA molecules are long polymers along which an immense number of combinations may be produced.

Physical Properties of DNA

Denaturation and Renaturation of DNA; Hybridization

The hydrogen bonds between the strands of the double helix are weak enough that the strands can be easily separated by enzymes. Enzymes known as helicases unwind the strands to facilitate the advance of sequence-reading enzymes such as DNA polymerase. Helicases break only the hydrogen bonds between the bases; the torsional strain created by unwinding is relieved by topoisomerases, which transiently cleave the phosphate backbone of one of the strands so that it can swivel around the other. The strands can also be separated by gentle heating, as used in PCR, provided they have fewer than about 10,000 base pairs (10 kilobase pairs, or 10 kbp). The intertwining of the DNA strands makes long segments difficult to separate.

Circular DNA

When the ends of a piece of double-helical DNA are joined so that it forms a circle, as in plasmid DNA, the strands are topologically linked. This means they cannot be separated by gentle heating or by any process that does not involve breaking a strand. The task of unlinking these strands falls to enzymes known as topoisomerases. Some of these enzymes unknot circular DNA by cleaving both strands so that another double-stranded segment can pass through. Unknotting is required for the replication of circular DNA as well as for various types of recombination in linear DNA.
Great length versus tiny breadth

The narrow breadth of the double helix makes it impossible to detect by conventional electron microscopy, except with heavy staining. At the same time, the DNA found in many cells is macroscopic in length: about 2 meters in total for the DNA of a single human cell. Consequently, cells must compact or "package" DNA to carry it within them. This is one of the functions of the chromosomes, which contain spool-like proteins known as histones, around which DNA winds.

Entropic stretching behavior

When DNA is in solution, it undergoes conformational fluctuations due to the energy available in the thermal bath. For entropic reasons, floppy states are thermally more accessible than stretched-out states, so a single molecule of DNA stretches much like a rubber band. The entropic stretching behavior of DNA has been studied with optical tweezers and analyzed from a polymer physics perspective, and DNA has been found to behave like the Kratky-Porod worm-like chain model with a persistence length of about 53 nm. Furthermore, DNA undergoes a stretching phase transition at a force of 65 pN; above this force, DNA is thought to take the form that Linus Pauling originally hypothesized, with the phosphates in the middle and the bases splayed outward. This proposed structure for overstretched DNA has been called "P-form DNA," in honor of Pauling.

Different helix geometries

The DNA helix can assume one of three slightly different geometries, of which the "B" form described by James D. Watson and Francis Crick is believed to predominate in cells. It is 2 nanometres wide and extends 3.4 nanometres per 10 bp of sequence. This is also the approximate length of sequence in which the double helix makes one complete turn about its axis. This frequency of twist (known as the helical pitch) depends largely on the stacking forces that each base exerts on its neighbors in the chain.

Supercoiled DNA

In the absence of strain, the B form of the DNA helix twists 360° per roughly 10.5 bp. But many molecular biological processes can induce strain. A DNA segment with excess or insufficient helical twisting is referred to, respectively, as positively or negatively "supercoiled". DNA in vivo is typically negatively supercoiled, which facilitates the unwinding of the double helix required for RNA transcription.

Sugar pucker

There are four conformations that the ribofuranose rings in nucleotides can adopt:

1. C-2' endo

2. C-2' exo

3. C-3' endo

4. C-3' exo

Ribose is usually in the C-3' endo conformation, while deoxyribose is usually in the C-2' endo sugar pucker conformation. The A and B forms differ mainly in their sugar pucker. In the A form, the C3' atom lies above the plane of the sugar ring, whilst C2' lies below it; thus, the A form is described as "C3'-endo." Likewise, in the B form, C2' lies above the sugar ring, whilst C3' lies below; this is called "C2'-endo." The altered sugar puckering in A-DNA shortens the distance between adjacent phosphates by around one angstrom. This gives 11 to 12 base pairs per helical turn, instead of 10.5 in B-DNA. This sugar pucker gives A-DNA a uniform ribbon shape, a cylindrical open core, and a deep major groove that is narrower and more pronounced than the grooves found in B-DNA.

Conditions for formation of A and Z helices

The two other known double-helical forms of DNA, called A and Z, differ modestly from the B form in their geometry and dimensions. The A form appears likely to occur only in dehydrated samples of DNA, such as those used in crystallographic experiments, and possibly in hybrid pairings of DNA and RNA strands. Segments of DNA that cells have methylated for regulatory purposes may adopt the Z geometry, in which the strands turn about the helical axis like a mirror image of the B form, that is, as a left-handed helix.

Non-helical forms

Other forms of DNA, including non-helical ones, have been described, for example a side-by-side (SBS) configuration. Indeed, it is far from certain that the B-form double helix is the dominant form in living cells.

DNA REPLICATION

Watson and Crick were particularly excited about their model because the complementary nature of the DNA molecule suggested a way in which it might self-replicate. The two strands could separate from one another, each still containing the complete information, and each could serve as the template for a new strand.
It was only in 1957, however, that Matthew Meselson and Franklin Stahl performed an experiment to determine whether the two strands unwind and each act as a template for a new strand (semiconservative replication, as proposed by Watson and Crick) or whether the strands do not unwind but somehow generate a new double-stranded DNA (conservative replication). They labeled DNA with the heavy isotope of nitrogen (N-15), allowed it to go through one round of replication in the presence of N-14, and then centrifuged the mixture so that heavy DNA would form a band lower in the tube, while intermediate DNA (one N-15 strand and one N-14 strand) and light DNA (all N-14) would form bands higher in the tube. After one round all of the DNA banded at the intermediate position, which the conservative model cannot produce. With this result Meselson and Stahl could show that DNA replication is semiconservative: each old strand acts as the template for the synthesis of a new one (Fig. 9).

Fig 9: Showing semiconservative and conservative models of DNA replications.
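The two models make different predictions about the bands, which can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the starting point is assumed to be a single fully heavy (N-15) duplex, and every later round is assumed to use only N-14.

```python
from fractions import Fraction


def semiconservative_bands(generations):
    """Fractions of heavy, hybrid and light duplexes if each old strand
    pairs with a newly made strand in every round."""
    if generations == 0:
        return {"heavy": Fraction(1), "hybrid": Fraction(0), "light": Fraction(0)}
    total_duplexes = 2 ** generations
    # The two original N-15 strands persist, each inside one hybrid duplex.
    hybrid = Fraction(2, total_duplexes)
    return {"heavy": Fraction(0), "hybrid": hybrid, "light": 1 - hybrid}


def conservative_bands(generations):
    """Fractions expected if the parental duplex stayed intact and an
    entirely new duplex were made in every round."""
    total_duplexes = 2 ** generations
    heavy = Fraction(1, total_duplexes)
    return {"heavy": heavy, "hybrid": Fraction(0), "light": 1 - heavy}


for n in (1, 2):
    print(n, semiconservative_bands(n), conservative_bands(n))
```

After one round the semiconservative model predicts only hybrid DNA, whereas the conservative model predicts equal amounts of heavy and light DNA and no hybrid band at all.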

DNA replication is not a passive and spontaneous process; it requires the two strands of the parental duplex to separate. The disruption of the structure is only transient, however, and is reversed as the daughter duplexes form. The process is catalyzed by a number of enzymes.

Replication begins with the activity of a topoisomerase, which initiates unwinding by nicking a single strand of DNA and releasing the tension that holds the helix in its coiled and supercoiled structure. Once supercoiling has been relieved, the enzyme DNA helicase unwinds the original double strand. Because hydrogen bonding between complementary bases holds the two strands together, helicase activity requires energy (in the form of ATP) to separate them. The region where the double helix is partially unwound is known as the replication fork (Fig. 10). Under the electron microscope the unwound section appears as a "bubble" and is thus also known as a replication bubble.

As the two DNA strands separate and the bases are exposed, the enzyme DNA polymerase III moves into position at the point where synthesis will begin. The start point for DNA polymerase is a short segment of RNA known as an RNA primer. The term "primer" indicates its role, which is to "prime" or start DNA synthesis at certain points. The primer is laid down complementary to the DNA template by a specialized RNA polymerase known as primase. Starting from the primer, DNA polymerase then adds nucleotides one by one in an exactly complementary manner, A to T and G to C. DNA polymerase is described as "template dependent" in that it "reads" the sequence of bases on the template strand and "synthesizes" the complementary strand. The template strand is always read in the 3' to 5' direction.
The new DNA strand, being complementary and antiparallel to its template, must therefore be synthesized in the 5' to 3' direction. Hydrogen bonding between each arriving nucleotide and its complementary base on the template strand selects the correct nucleotide; these hydrogen bonds form spontaneously and do not need to be catalyzed. What DNA polymerase catalyzes is the reaction between the 5' phosphate on the incoming nucleotide and the free 3' OH on the growing polynucleotide, forming a phosphodiester bond. As a result, new DNA strands can grow only in the 5' to 3' direction, and strand growth must begin at the 3' end of the template.

Because the original DNA strands are complementary and run antiparallel, only one new strand can begin at the 3' end of the template DNA and grow continuously as the point of replication (the replication fork) moves along the template DNA. The other strand must grow in the opposite direction, away from the fork, and is therefore made discontinuously as a series of short sections of new DNA called Okazaki fragments (after their discoverer). The RNA primers are eventually removed by RNase H and the gaps are filled in by DNA polymerase I. The sections are then joined into a continuous strand by the enzyme DNA ligase, which forms the missing phosphodiester bonds.

Since each new strand is complementary to its old template strand, two identical new copies of the DNA double helix are produced during replication. In each new helix, one strand is the old template and the other is newly synthesized, a result described by saying that replication is semi-conservative.

Fig 10: The DNA replication fork (From http://www.accessexcellence.org/)
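The base-pairing rule and the semi-conservative outcome described above can be sketched as follows. This is a minimal illustration, not a model of the enzymes: the names `pair` and `replicate` are hypothetical, and a duplex is represented by its top strand written 5' to 3' and its bottom strand written 3' to 5', so that paired bases sit at the same position.

```python
# Watson-Crick pairing used by DNA polymerase: A with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pair(strand):
    """Return the strand that pairs position by position with `strand`.

    If `strand` is written 5'->3', the result is written 3'->5',
    and vice versa, because the two strands are antiparallel.
    """
    return "".join(COMPLEMENT[base] for base in strand)


def replicate(top_5to3, bottom_3to5):
    """Return two daughter duplexes, each holding one old strand
    and one newly synthesized strand."""
    daughter_1 = (top_5to3, pair(top_5to3))        # old top + new bottom
    daughter_2 = (pair(bottom_3to5), bottom_3to5)  # new top + old bottom
    return daughter_1, daughter_2


parent = ("ATGCCA", "TACGGT")
for daughter in replicate(*parent):
    print(daughter, daughter == parent)
```

Both daughters are identical to the parent duplex, which is the point of using each old strand as a template.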

Prokaryotes

The single molecule of DNA that is the E. coli genome contains 4.7 x 10^6 nucleotide pairs. DNA replication begins at a single, fixed location in this molecule, the replication origin, proceeds at about 1000 nucleotides per second, and thus is done in no more than 40 minutes. Thanks to the precision of the process (which includes a "proof-reading" function), the job is done with only about one incorrect nucleotide for every 10^9 nucleotides inserted. In other words, more often than not, the E. coli genome (4.7 x 10^6 nucleotide pairs) is copied without error!

Eukaryotes

The average human chromosome contains 150 x 10^6 nucleotide pairs, which are copied at about 50 base pairs per second. The process would take a month (rather than the hour it actually does) but for the fact that there are many places on the eukaryotic chromosome where replication can begin. Replication begins at some replication origins earlier in S phase than at others, but the process is completed for all by the end of S phase. As replication nears completion, "bubbles" of newly replicated DNA meet and fuse, finally forming two new molecules.

MUTATION – THE SEQUENCE CHANGE IN DNA

All organisms suffer a certain number of structural changes in their DNA as the result of normal cellular operations (changes result when DNA polymerase makes a mistake, which happens about once every 100,000,000 bases) or of random interactions with environmental factors such as ultraviolet light, nuclear radiation, and certain chemicals. The number of such changes that remain incorporated in the DNA is far lower than their rate of occurrence, because cells contain special DNA repair proteins that fix many of the changes. The changes that escape the repair mechanism and become incorporated into the DNA are called spontaneous mutations; the rate at which they occur is characteristic for any particular organism and is sometimes called the background level.
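The replication figures quoted above for E. coli and for a human chromosome can be checked with simple arithmetic. The sketch below uses a hypothetical helper, `replication_time_seconds`. The 40-minute figure for E. coli is reproduced only under the assumption, not stated in the text, that two forks travel in opposite directions from the single origin.

```python
def replication_time_seconds(base_pairs, rate_per_fork, forks=1):
    """Time to copy `base_pairs` when `forks` forks each copy
    `rate_per_fork` nucleotides per second."""
    return base_pairs / (rate_per_fork * forks)


E_COLI_BP = 4.7e6            # nucleotide pairs in the E. coli genome
HUMAN_CHROMOSOME_BP = 150e6  # nucleotide pairs in an average human chromosome

# Assumption: bidirectional replication from one origin, i.e. two forks.
ecoli_minutes = replication_time_seconds(E_COLI_BP, 1000, forks=2) / 60

# A single fork on a human chromosome at 50 base pairs per second.
human_days = replication_time_seconds(HUMAN_CHROMOSOME_BP, 50) / 86400

# One wrong nucleotide per 10^9 inserted, over one pass of the genome.
expected_errors = E_COLI_BP * 1e-9

print(f"E. coli: about {ecoli_minutes:.0f} minutes")
print(f"Human chromosome, one fork: about {human_days:.0f} days")
print(f"Expected errors per E. coli replication: {expected_errors:.4f}")
```

An expected error count far below one per genome copy is what "copied without error more often than not" means, and the month-long single-fork estimate shows why eukaryotic chromosomes need many origins.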
Mutations are rare events, and of course those that damage a gene are selected against during evolution. It is therefore difficult to obtain large numbers of spontaneous mutants to study from natural populations.

Some of these changes occur in cells of the body, such as in skin cells as a result of sun exposure, and are not passed on to children; these are called somatic mutations. Other errors occur in the DNA of the cells that produce the eggs and sperm. These are called germline mutations and can be passed from parent to child. If a child inherits a germline mutation from a parent, every cell in the child's body will carry this error in its DNA. Germline mutations are what cause diseases to run in families and are responsible for hereditary diseases.

A gene is essentially a sequence of the bases A, T, G and C, and it is in the sequence of these bases that the information describing how to make a protein lies. Any change in the sequence can alter the gene's meaning and change the protein that is made, or how or when a cell makes that protein. There are many different ways to alter a gene. The following are examples of some types of mutations:

Single-base substitutions

A single base, say an A, becomes replaced by another. Single base substitutions are also called point mutations. If one purine (A or G) or pyrimidine (C or T) is replaced by the other, the substitution is called a transition. If a purine is replaced by a pyrimidine or vice-versa, the substitution is called a transversion.

Missense mutations

With a missense mutation, the new nucleotide alters the codon so as to produce an altered amino acid in the protein product. An example is sickle-cell disease: the replacement of A by T at the 17th nucleotide of the gene for the beta chain of hemoglobin changes the codon GAG (for glutamic acid) to GTG (which encodes valine). Thus the 6th amino acid in the chain becomes valine instead of glutamic acid.

Nonsense mutations

With a nonsense mutation, the new nucleotide changes a codon that specified an amino acid to one of the STOP codons (TAA, TAG, or TGA). Therefore, translation of the messenger RNA transcribed from this mutant gene will stop prematurely. The earlier in the gene that this occurs, the more truncated the protein product and the more likely that it will be unable to function.

Silent mutations

Most amino acids are encoded by several different codons. For example, if the third base in the TCT codon for serine is changed to any one of the other three bases, serine will still be encoded. Such mutations are said to be silent because they cause no change in their product and cannot be detected without sequencing the gene (or its mRNA).

Splice-site mutations

The removal of intron sequences, as pre-mRNA is being processed to form mRNA, must be done with great precision. Nucleotide signals at the splice sites guide the enzymatic machinery. If a mutation alters one of these signals, then the intron is not removed and remains as part of the final RNA molecule. The translation of its sequence alters the sequence of the protein product.
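The distinction between transitions and transversions made under single-base substitutions above is mechanical enough to express in code. The following is a small sketch with a hypothetical function name, `classify_substitution`, restricted to the four DNA bases.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(original, mutant):
    """Classify a single-base substitution in DNA as a transition
    (purine <-> purine or pyrimidine <-> pyrimidine) or a transversion
    (purine <-> pyrimidine)."""
    bases = PURINES | PYRIMIDINES
    if original not in bases or mutant not in bases:
        raise ValueError("expected one of A, C, G, T")
    if original == mutant:
        raise ValueError("no substitution: the bases are identical")
    both_purines = {original, mutant} <= PURINES
    both_pyrimidines = {original, mutant} <= PYRIMIDINES
    return "transition" if both_purines or both_pyrimidines else "transversion"


# The sickle-cell change GAG -> GTG replaces A (a purine) with T (a pyrimidine).
print(classify_substitution("A", "T"))
print(classify_substitution("C", "T"))
```

By this rule the sickle-cell substitution is a transversion.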
Insertions and Deletions (Indels)

Extra base pairs may be added (insertions) or removed (deletions) from the DNA of a gene. The number can range from one to thousands. Collectively, these mutations are called indels. Indels whose length is not a multiple of three base pairs (one or two, for example) can have devastating consequences for the gene because translation of the gene is "frameshifted": when the reading frame is shifted by one nucleotide, the same sequence of nucleotides encodes a different sequence of amino acids. The mRNA is translated in new groups of three nucleotides, and the protein specified by these new codons will be worthless. Frameshifts often create new STOP codons and thus generate nonsense mutations. Perhaps that is just as well, as the protein would probably be too garbled anyway to be useful to the cell.

Indels of three nucleotides or multiples of three may be less serious because they preserve the reading frame. However, a number of inherited human disorders are caused by the insertion of many copies of the same triplet of nucleotides. Huntington's disease and the fragile X syndrome are examples of such trinucleotide repeat diseases.

Duplications

Duplications are a doubling of a section of the genome. During meiosis, crossing over between sister chromatids that are out of alignment can produce one chromatid with a duplicated gene and another with a deleted gene.

Translocations

Translocations are the transfer of a piece of one chromosome to a nonhomologous chromosome. Translocations are often reciprocal; that is, the two nonhomologues swap segments. Translocations can alter the phenotype in several ways:

• the break may occur within a gene, destroying its function

• translocated genes may come under the influence of different promoters and enhancers so that their expression is altered. The translocations in Burkitt's lymphoma are an example.

• the breakpoint may occur within a gene, creating a hybrid gene. This may be transcribed and translated into a protein with an N-terminal of one normal cell protein coupled to the C-terminal of another. The Philadelphia chromosome found so often in the leukemic cells of patients with chronic myelogenous leukemia (CML) is the result of a translocation which produces a compound gene (bcr-abl).
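Returning briefly to the frameshift effect described under Insertions and Deletions, the regrouping of codons can be made concrete with a short sketch. The helper names `codons` and `delete` and the example sequence are hypothetical, chosen only to show how the triplets change.

```python
def codons(sequence, frame=0):
    """Split `sequence` into consecutive triplets starting at `frame`;
    leftover bases that do not fill a triplet are dropped."""
    usable = sequence[frame:]
    end = len(usable) - len(usable) % 3
    return [usable[i:i + 3] for i in range(0, end, 3)]


def delete(sequence, start, length):
    """Remove `length` bases beginning at index `start`."""
    return sequence[:start] + sequence[start + length:]


mrna = "AUGGCCUUUGAA"
print(codons(mrna))                # the original reading frame
print(codons(delete(mrna, 3, 1)))  # one base lost: every later codon changes
print(codons(delete(mrna, 3, 3)))  # three bases lost: one codon missing, frame kept
```

Deleting one base regroups every downstream triplet, while deleting three removes a single codon and leaves the rest of the message in frame.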

Inversion

In an inversion mutation, an entire section of DNA is reversed. A small inversion may involve only a few bases within a gene, while longer inversions involve large regions of a chromosome containing several genes.

THE STRUCTURE AND FUNCTION OF RNA

Ribonucleic acid, or RNA, gets its name from the sugar group in the molecule's backbone, ribose. The primary structure of RNA is similar to that of DNA, but several important differences exist between the two. Like DNA, RNA has a sugar-phosphate backbone with nucleotide bases attached to it. Like DNA, RNA contains the bases adenine (A), cytosine (C) and guanine (G); however, RNA does not contain thymine and instead contains the base uracil (U). Unlike the double-stranded DNA molecule, RNA is a single-stranded molecule, so its base composition does not follow Chargaff's rule. Nevertheless, there is some degree of secondary structure in the different RNA types, because the molecule can form hairpin loops of hydrogen-bonded A-U or G-C pairs. The actual sequence of ribonucleotides in RNA is called its primary structure. With the loops included it is said to have a secondary structure. It can also fold into a three-dimensional shape referred to as its tertiary structure.

RNA is the genetic material of some viruses, and RNA is also important in the production of proteins in other living organisms. RNA can move around the cells of living organisms and thus serves as a sort of genetic messenger, relaying the information stored in the cell's DNA out from the nucleus to other parts of the cell, where it is used to help make proteins. There are three main types of RNA molecules:

• messenger RNA (mRNA)

• ribosomal RNA (rRNA)

• transfer RNA (tRNA)

There are also many other types of RNA molecules that are not directly involved in protein synthesis. They are sometimes called noncoding RNA.

Messenger RNA (mRNA)

Messenger RNA is the type of RNA most familiar to people; it carries the information from DNA to the site of protein synthesis. The term messenger RNA (mRNA), proposed by Jacob and Monod in 1961, refers to the fact that this is a template molecule copied from DNA and has a rapid turnover. The information stored in mRNA is used to make proteins.

When mRNA is first created in eukaryotes it is called precursor mRNA, because it needs to be modified before it can pass on the information it carries for the formation of protein. The first two modifications are capping and the addition of a poly A tail. The third type of modification involves the removal of introns and the splicing together of exons. Segments of DNA that contain information for the formation of proteins are called exons. Exons are typically separated from each other by other segments of DNA, called introns. The precursor mRNA contains both the exons and the introns. The introns need to be cut out and the exons need to be connected back together. Some human genes for proteins are split up into as many as 79 different exons. A spliceosome is a complex of proteins and small RNA molecules, and is where the removal of introns and the splicing together of exons take place.

Messenger RNA makes up only about 5% of all RNA in a typical cell and consists of small amounts of thousands of different mRNA molecules.

In bacteria mRNA is modified very little, if at all. Since bacteria do not have a nucleus, translation starts before transcription has even ended, so there is no time for RNA splicing, nor any need for it, as prokaryotic genes are not split into separate exons.

Ribosomal RNA (rRNA)

Ribosomes are made of protein and ribosomal RNA (rRNA) and are where translation of RNA to protein takes place. In E. coli, ribosomes contain three kinds of rRNA: 23S, 16S and 5S. In eukaryotes, there are four kinds of rRNA: 18S, 28S, 5.8S, and 5S. One 18S molecule is used to make the small subunit of the ribosome, with the help of several proteins. The 28S, 5.8S, and 5S rRNA molecules are involved in the construction of the large subunit of the ribosome. The 28S, 18S, and 5.8S molecules are made by processing a single precursor RNA.

Transfer RNA (tRNA)

There are at least 32 different kinds of tRNA in a eukaryotic cell. They are relatively small molecules, each made up of only 73-93 ribonucleotides. Although tRNA is a single strand of RNA, it bends around in certain places, so that some ribonucleotides pair up with others in the same chain, forming three loops (Fig. 11). Each tRNA molecule has one amino acid attached to its 3' end. Since there are only 20 amino acids and around 32 different kinds of tRNA, some amino acids are carried by more than one type of tRNA. On one of the three loops is what is called an anticodon. Anticodons are made up of three bases and are involved in translation. The particular amino acid attached to a tRNA molecule is determined by its anticodon sequence.

Fig. 11: Diagram of tRNA showing the aminoacyl binding site (acceptor); the anticodon loop, which binds to mRNA at a specific codon; the ribosomal recognition site (TψC loop); and the D loop (From http://motif.stanford.edu/thesis/tRNA.html).

Noncoding RNA (ncRNA)

Noncoding RNA is not involved (at least not directly) in protein synthesis. Instead it is involved in many other cell processes, including the regulation of transcription, DNA replication, and RNA processing and modification. Noncoding RNAs can be anywhere from 21 ribonucleotides to 10,000 or more ribonucleotides in length. In bacteria ncRNA is sometimes referred to as small RNA (sRNA). Some examples of ncRNA are:

• XIST RNA - inactivates one of the two X chromosomes in females

• snRNA - involved with the processing of larger precursor RNA molecules

• snoRNA - involved in making ribosomes and telomeres

• miRNA - involved with the regulation of the expression of mRNA

• siRNA - small; binds to complementary RNA sequences, targeting them for destruction

PROTEIN SYNTHESIS

The genetic code

Living organisms are complex systems. Hundreds of thousands of proteins exist inside each one of us to help carry out our daily functions. These proteins are produced locally, assembled piece by piece to exact specifications. An enormous amount of information is required to manage this complex system correctly. This information, detailing the specific structure of the proteins inside our bodies, is stored in the DNA molecule.

At the molecular level it has been found that the codons, i.e., the hereditary units that contain the information to code for a single amino acid, are made of three nucleotides (a triplet). This information is first transcribed into messenger RNA (mRNA), which has a sequence of bases complementary to the DNA from which it is copied. mRNA, like DNA, has only four bases, whereas proteins may contain up to 20 different amino acids. Permutation of the 4 bases in triplets yields 4^3 = 64 codons, more than enough to code for 20 amino acids. If the genetic code consisted of doublets, the number of codons would be insufficient (4^2 = 16). The mRNA in turn serves as an intermediary that carries the same genetic information, which is then translated into the amino acid sequence of the protein.

It is important to remember some of the fundamental experiments that facilitated the discovery of the genetic code. In 1961 Nirenberg and Matthaei made the basic observation that synthetic polyribonucleotides could act as artificial mRNAs and could stimulate the incorporation of amino acids into polypeptides. The first one used was polyuridylic acid (poly U), and the result was the synthesis of polyphenylalanine (a peptide chain made of phenylalanine). Thus, it was deduced that the codon for phenylalanine was UUU. Other homopolymers behaved similarly: poly A stimulated the uptake of lysine, and poly C that of proline. The use of synthetic RNAs of known composition was made possible by a previous discovery by Ochoa that the enzyme polynucleotide phosphorylase can link the specific nucleotides added to the medium. By 1963, the experiments with synthetic RNAs done in the laboratories of Nirenberg and Ochoa had established most of the codon sequences. The recognition of codons was later made possible by the use of trinucleotide templates of known base composition. When ribosomes are incubated with 14C-AA-tRNA and such trinucleotides, complexes are formed that can easily be detected by filtration. In the laboratory of Khorana, polyribonucleotides with alternating doublets or triplets of known sequence were synthesized and used in cell-free systems.

As shown in Table 1, several RNA codons may code for a single amino acid, a fact that is called the degeneracy of the genetic code. Leucine, for example, may be coded by CUU, CUC, CUA, and CUG. In most cases the synonymous codons differ only in the base occupying the third position of the triplet. The first two bases of the codon are apparently more important in coding. Since the same amino acid is coded by synonymous codons, it is logical to assume that mutations due to replacement of the third base may go unnoticed.
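The counting argument for triplets can be verified by enumerating every possible codon of a given length. This is a brief sketch; the function name `possible_codons` is hypothetical.

```python
from itertools import product


def possible_codons(length, bases="UCAG"):
    """List every distinct codon of `length` bases over the four RNA bases."""
    return ["".join(combo) for combo in product(bases, repeat=length)]


AMINO_ACIDS = 20

for length in (1, 2, 3):
    count = len(possible_codons(length))
    verdict = "enough" if count >= AMINO_ACIDS else "too few"
    print(f"{length} base(s) per codon: {count} codons ({verdict} for 20 amino acids)")
```

Only at a length of three does the number of codons reach, and exceed, the twenty needed, and the surplus is what makes degeneracy possible.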

The initiation signal for the synthesis of a protein is the AUG codon. When the AUG codon is at the beginning of the message (starting codon), in bacteria, it will code for N-formylmethionine. If the AUG codon is in another position, it will code for methionine. The termination signal is provided by the so-called nonsense codons UAG, UAA, and UGA (Table 1).

Table 1: The genetic code

1st     2nd Base                                                    3rd
Base    U              C              A              G              Base

U       UUU Phe        UCU Ser        UAU Tyr        UGU Cys        U
        UUC Phe        UCC Ser        UAC Tyr        UGC Cys        C
        UUA Leu        UCA Ser        UAA Nonsense   UGA Nonsense   A
        UUG Leu        UCG Ser        UAG Nonsense   UGG Trp        G

C       CUU Leu        CCU Pro        CAU His        CGU Arg        U
        CUC Leu        CCC Pro        CAC His        CGC Arg        C
        CUA Leu        CCA Pro        CAA Gln        CGA Arg        A
        CUG Leu        CCG Pro        CAG Gln        CGG Arg        G

A       AUU Ile        ACU Thr        AAU Asn        AGU Ser        U
        AUC Ile        ACC Thr        AAC Asn        AGC Ser        C
        AUA Ile        ACA Thr        AAA Lys        AGA Arg        A
        AUG Met/F-Met  ACG Thr        AAG Lys        AGG Arg        G

G       GUU Val        GCU Ala        GAU Asp        GGU Gly        U
        GUC Val        GCC Ala        GAC Asp        GGC Gly        C
        GUA Val        GCA Ala        GAA Glu        GGA Gly        A
        GUG Val        GCG Ala        GAG Glu        GGG Gly        G

Although most of our knowledge about the genetic code comes from experiments with E. coli, essentially similar results have been obtained with other systems such as amphibian, mammalian liver, and plant tissue. It may be said that the genetic code is largely universal, i.e., there is essentially a single code for all living organisms. As Nirenberg has pointed out, the genetic code may have developed at the same time as the first bacteria, some three billion years ago, and since then it has changed relatively little throughout the evolution of living organisms.
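Table 1 can be encoded compactly and queried, for example to count how many codons specify each amino acid. The sketch below is only one possible encoding. It uses one-letter amino acid codes and "*" for the nonsense codons, which are notational choices made here rather than the three-letter names of the table.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # same order as the rows and columns of Table 1

# One letter per codon, in the order UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base U
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: the number of synonymous codons per amino acid.
degeneracy = Counter(CODON_TABLE.values())

print(CODON_TABLE["AUG"], CODON_TABLE["UUU"], CODON_TABLE["UAA"])
print(sorted(degeneracy.items(), key=lambda item: -item[1]))
```

Leucine, serine and arginine each have six codons, while methionine and tryptophan have only one.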

Transcription – the mRNA synthesis

The process of converting the information contained in a DNA segment into proteins begins with the synthesis of mRNA molecules containing anywhere from several hundred to several thousand ribonucleotides, depending on the size of the protein to be made. Each of the 100,000 or so proteins in the human body is synthesized from a different mRNA that has been synthesized in the cell nucleus by transcription of DNA (a gene), a process highly analogous to DNA replication. Of course, there are different effectors, or proteins, that direct transcription. Primary among these is the RNA polymerase holoenzyme, an agglomeration of many different factors that together direct the synthesis of mRNA on a DNA template.

Initiation of Transcription

RNA polymerase must be able to recognize the beginning of a gene so that it knows where to start synthesizing an mRNA. It is directed to the start site of transcription by the affinity of one of its subunits for a particular DNA sequence that appears at the beginning of genes. Such a directional sequence on one strand of DNA is called a promoter site. In bacteria these sites are recognized by a factor called sigma. It is sigma's job to recognize the promoter sites and "tell" the DNA-dependent RNA polymerase both where to start and in which direction (that is, on which strand) to continue synthesis. Once the RNA polymerase has been directed to the start point of the gene by sigma, the sigma factor is released and the RNA polymerase carries out the process of transcription. The bacterial promoter almost always contains some version of the elements shown in Fig. 12. The RNA polymerase then stretches open the double helix at that point in the DNA and begins synthesis of an RNA strand complementary to one of the strands of DNA. The DNA strand from which the RNA is copied is called the antisense or template strand, and the other strand, to which the RNA is identical apart from having U in place of T, is called the sense or coding strand. The RNA polymerase recruits rNTPs (ribonucleoside triphosphates) in the same way that DNA polymerase recruits dNTPs. However, since only one strand is synthesized and synthesis proceeds only in the 5' to 3' direction, there is no need for Okazaki fragments.
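The relationship between the template strand, the coding strand and the mRNA can be sketched as follows. The function names are hypothetical. The template is assumed to be written 3' to 5', so that the mRNA comes out written 5' to 3'.

```python
# Base pairing during transcription: the RNA carries U wherever the template has A.
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5):
    """Build the mRNA (written 5'->3') that is complementary to a DNA
    template strand given in the 3'->5' direction."""
    return "".join(RNA_PARTNER[base] for base in template_3to5)


def coding_strand_to_mrna(coding_5to3):
    """The mRNA has the same sequence as the coding (sense) strand,
    with U in place of T."""
    return coding_5to3.replace("T", "U")


template = "TACCTCATT"  # 3'->5'
coding = "ATGGAGTAA"    # 5'->3', the partner of the template
print(transcribe(template))
print(coding_strand_to_mrna(coding))
```

Both routes give the same mRNA, which is why a gene's sequence is usually quoted from the coding strand.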

Fig. 12: Transcription initiation site showing promoter sequences.

Termination of Transcription

How does RNA polymerase know when to stop transcribing a gene? Like the promoter sequence, there are other base sequences at the end of a gene that signal a stop to mRNA synthesis. Just as there is a sigma factor to help signal the beginning of a gene, another factor called "Rho" aids in terminating the process of transcription. When the end of the gene is near, the Rho factor binds to the mRNA and interacts with the RNA polymerase. This interaction causes the enzyme to "fall off" the DNA template strand, thus stopping transcription.

Fate of synthesized mRNA

The average life span of some of the mRNAs in E. coli is about two minutes, after which the molecules are broken down by ribonucleases. In fact, in bacteria one end of an mRNA may be translated while the other end is still being transcribed. The mRNA may also be degraded at its starting end while translation is finishing at the other. In contrast, the origin and fate of mRNA in eukaryotic cells is much more complex. In eukaryotes the formation of a functionally active mRNA is the consequence of a complex series of steps comprising (1) the actual transcription of DNA into mRNA precursors, (2) the intracellular processing of these precursors and (3) the transport of the mRNAs into the cytoplasm and their association with the ribosomes to initiate the process of translation, or protein synthesis.

Translation

The cellular factory responsible for synthesizing proteins is the ribosome. The ribosome consists of structural RNA and about 80 different proteins. In its inactive state, it exists as two subunits, a large subunit and a small subunit. When the small subunit encounters an mRNA, the process of translation of the mRNA to protein begins. There are two sites in the large subunit. The first is the site where the growing peptide (another word for protein) will reside; it is known as the P site. The second, just in the 3' direction from the P site, is known as the A site. This is where the incoming tRNA will attach itself.

As discussed previously, the adaptor molecule that acts as a translator between mRNA and protein is a specific RNA molecule, tRNA (transfer RNA). Each tRNA has a specific anticodon and an acceptor site. Each tRNA also has a specific charger protein; this protein can only bind to that particular tRNA and attach the correct amino acid to the acceptor site. The energy to make this bond comes from ATP. These charger proteins are called aminoacyl tRNA synthetases.

The first AUG codon from the 5' end of the mRNA acts as a "start" signal for the translation machinery and codes for the introduction of a methionine amino acid. Initiation is complete when the methionine tRNA occupies one of the two binding sites on the ribosome. The next incoming tRNA will bind to the A site (next to the P site, where the tRNA carrying the methionine sits). All available tRNAs will approach the site and try to attach, but the only tRNA that will attach successfully is the one whose anticodon is complementary to the codon at the A site on the mRNA.

In order for a protein chain to form, the amino acids must be linked together. The link between amino acids is called a peptide bond, and its formation is catalyzed by the peptidyl transferase activity of the ribosome. Amino acids continue to be linked until the protein is finished. Once the bond has formed between the first two amino acids, the tRNA on the P site passes its amino acid to the tRNA on the A site and leaves. The ribosome then slides along the mRNA by three bases (one codon) through the action of a translocase, so that the tRNA carrying the two amino acids now sits on the P site (because it is holding the growing protein) and a new A site is exposed. The next appropriate tRNA molecule "lands", bringing its amino acid right next to the tRNA holding the two amino acids. At this point the process repeats itself: a peptide bond forms between the amino acids already joined together and the newly arrived amino acid; the tRNA on the P site leaves and the chain of amino acids is passed to the tRNA on the A site; and the ribosome slides down another codon. The procedure repeats until the termination event occurs. A "stop" codon (UAA, UGA, or UAG) signals the end of the process (Fig. 13). There is no tRNA that is complementary to a stop codon, so the process of building the protein stops. A protein called a release factor then frees the newly made polypeptide chain, also known as the protein, from the last tRNA. The mRNA molecule is released from the ribosome as the small and large subunits fall apart.

The mRNA can then be re-translated or it may be degraded, depending on how much of that particular protein is needed. All mRNA messages are eventually degraded when the protein no longer needs to be made. Even though every protein begins with the amino acid methionine, not all proteins will ultimately have methionine at one end. If the "start" methionine is not needed, it is removed before the new protein goes to work (either inside or outside the cell, depending on the type of protein synthesized).

Fig. 13: Showing different stages of protein synthesis (from http://web.mit.edu/esgbio/www/7001main.html)
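The start, elongate and stop logic of translation can be sketched in code. This is a deliberately simplified illustration: the function name `translate` is hypothetical, amino acids are written with one-letter codes, and ribosomes, tRNAs and release factors are not modelled. Only the rule "begin at the first AUG, read one codon at a time, end at a stop codon" is kept.

```python
from itertools import product

# Standard genetic code in compact form ("*" marks the three stop codons).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG up to, but not including, the first
    in-frame stop codon; return '' if the message has no AUG."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # no tRNA matches a stop codon
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("CCAUGUUUGAGUAAGG"))  # AUG UUU GAG UAA
print(translate("CCAUGUUUGUGUAAGG"))  # GAG -> GUG, as in the missense example
```

The second message differs from the first by a single base and yields valine (V) where the first yields glutamic acid (E).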

THE GENETIC REGULATION

All the cells of an organism have the same DNA content, and the DNA of a cell specifies its activities and characteristics. Nevertheless, there are different cell types in our bodies, and the activities of these cells change with time. The hormone-producing cells in the pituitary gland produce growth hormone only during childhood and adolescence. These same cells remain in the pituitary in adulthood, but they no longer produce growth hormone. How do they know when they are needed or not needed? This question, as it applied to large, complex organisms like humans, was very daunting for scientists in the first half of the 20th century.

The Lac Operon - An Inducible Operon

Francois Jacob and Jacques Monod were the first scientists to postulate the existence of a transcriptionally regulated system, the operon. While working on the lactose metabolism of E. coli, they established that the Lac Operon comprises three structural genes (LacZ, LacY, and LacA) and produces a polycistronic mRNA, which codes for the following enzymes:

• beta-galactosidase: This enzyme hydrolyzes the bond between the two sugars that make up lactose, glucose and galactose. It is coded for by the gene LacZ.

• Lactose Permease: This enzyme spans the cell membrane and brings lactose into the cell from the outside environment. The membrane is otherwise essentially impermeable to lactose. It is coded for by the gene LacY.

• Thiogalactoside transacetylase: The function of this enzyme is not known. It is coded for by the gene LacA.

The structural genes responsible for these three enzymes lie adjacent to each other on the E. coli genome. The region that precedes them is responsible for the regulation of the lactose metabolic genes. It contains the LacI (regulatory) gene and the promoter and operator regions of the lac operon. The LacI gene codes for a repressor, a soluble protein that binds specifically to the lac operator region. Each subunit of the repressor has one binding site for the inducer. The promoter segment is the region to which RNA polymerase first attaches to initiate transcription of the structural genes (Fig. 14a).

When lactose is present, it acts as an inducer of the operon. It enters the cell and binds to the Lac repressor, inducing a conformational change that causes the repressor to fall off the operator segment. RNA polymerase is now free to move along the DNA, and RNA can be made from the three genes. Lactose can now be metabolized (Fig. 14b).

When the inducer (lactose) is removed, the repressor returns to its original conformation and binds to the operator. No RNA and no protein are made. Note that RNA polymerase can still bind to the promoter, though it cannot move past the operator region because the repressor is already in position and blocks the transcriptional path (Fig. 14c). This means that when the cell is ready to use the operon, RNA polymerase is already there and waiting to begin transcription; the promoter does not have to wait for the holoenzyme to bind.

When levels of glucose (a catabolite) in the cell are high, the formation of a molecule called cyclic AMP (cAMP) is inhibited. So when glucose levels drop, more cAMP forms. cAMP binds to a protein called CAP (catabolite activator protein), which is then activated to bind to the CAP binding site. This activates transcription, perhaps by increasing the affinity of the site for RNA polymerase. This phenomenon is called catabolite repression, a misnomer since it involves activation, but understandable since it seemed that the presence of glucose repressed all the other sugar metabolism operons (Fig. 14d).

Fig. 14a: Diagram representing the Lac Operon (modified from http://web.mit.edu/esgbio/www/7001main.html)

Key to symbols: region coding for protein; regulatory region; diffusible regulatory proteins.

Operator (LacO): Binding site for repressor

Promoter (LacP): Binding site for RNA polymerase

LacI: Gene encoding lac repressor protein

Repressor (LacI): Binds to DNA at operator and blocks binding of RNA polymerase at promoter

Pi: Promoter for LacI

CAP: Binding site for cAMP/CAP complex

Fig. 14b: Diagram representing the regulation of Lac Operon in the presence of inducer (modified from http://web.mit.edu/esgbio/www/7001main.html).

Fig. 14c: Diagram representing the regulation of Lac Operon in the absence of inducer (modified from http://web.mit.edu/esgbio/www/7001main.html).

Fig. 14d: Diagram representing the mechanism by which cAMP regulates the Lac Operon (modified from http://web.mit.edu/esgbio/www/7001main.html).
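The two controls on the Lac Operon described above (release of the repressor by lactose, and activation by cAMP/CAP when glucose is low) can be summarized as a small truth table. The following Python sketch is only an illustration under simplifying assumptions: both sugars are treated as simply present or absent, and the function name `lac_operon_state` and the output labels "off", "low", and "high" are hypothetical, not terms from the text.

```python
def lac_operon_state(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative transcription level of the lac structural genes.

    Simplified boolean model: real cells respond to concentrations,
    not to presence or absence.
    """
    # Without the inducer, the repressor sits on the operator and
    # blocks the path of RNA polymerase.
    repressor_on_operator = not lactose_present

    # High glucose inhibits cAMP formation; only when glucose is low
    # does the cAMP/CAP complex bind its site and activate transcription.
    cap_bound = not glucose_present

    if repressor_on_operator:
        return "off"
    return "high" if cap_bound else "low"


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            state = lac_operon_state(lactose, glucose)
            print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {state}")
```

In this model the operon is fully active only when lactose is available and glucose is scarce, which is when metabolizing lactose is most useful to the cell.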

The Tryptophan Operon: A Repressor

When should a bacterium transcribe the genes for the synthesis of the amino acid tryptophan? When levels of tryptophan in the cell are low, the bacterium must make its own. However, if tryptophan is abundant in the cell or is being provided in the medium, synthesizing it is a waste of energy. The Trp repressor protein can bind to the operator of the Trp operon, which contains the tryptophan biosynthetic genes. When tryptophan is in abundance, it binds to the repressor and induces a conformational change so that the repressor can bind to the operator DNA segment. When tryptophan levels are low, the tryptophan falls off the repressor, and the repressor goes back to its original conformation, losing its ability to bind to the DNA. The operator is now free for RNA polymerase and transcription proceeds, expressing the tryptophan biosynthetic genes and replenishing the cell's supply of tryptophan. This kind of feedback inhibition of transcription is very common.

The Lambda Phage Cycle: Decision Control

A bacteriophage can choose between the lytic and lysogenic cycles. When there are many bacteria around to infect, and they are growing well, the phage takes advantage and replicates itself as much as possible. However, when there are few bacteria around and little growth potential, the phage is better off integrating into the bacterial genome and waiting until conditions are good again, so that its progeny will have another bacterium to infect. How does the phage make these decisions? There are two competing proteins in the lambda bacteriophage. One protein, cI, promotes the lysogenic cycle. The other protein, Cro, promotes the lytic cycle. These two proteins compete directly with each other for sites on the "right" promoter of lambda. Because cI is synthesized continuously at low levels, its concentration builds up when few bacteria are available. It then binds to the promoter and inhibits the lytic cycle (Fig. 15).

Fig. 16: Diagram representing the Lambda Phage Cycle: Decision Control (modified from http://web.mit.edu/esgbio/www/7001main.html).
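The feedback control of the tryptophan operon described earlier in this section can be sketched as a toy simulation. This is only an illustration: the function name `simulate_trp_feedback` and every number in it (the threshold, the amount made per step, the amount consumed per step) are hypothetical values chosen to make the behaviour visible, not measurements.

```python
def simulate_trp_feedback(level, steps, threshold=5, synthesis=3, consumption=1):
    """Toy model of a repressible operon.

    At each step the operon is transcribed only while the tryptophan
    level is below `threshold`; above it, tryptophan binds the repressor,
    which then occupies the operator. All units are arbitrary.
    """
    history = []
    for _ in range(steps):
        # Repressor is active (bound to the operator) when tryptophan is abundant.
        transcribing = level < threshold
        if transcribing:
            level += synthesis  # biosynthetic enzymes replenish tryptophan
        level = max(level - consumption, 0)  # the cell uses tryptophan
        history.append((level, transcribing))
    return history


if __name__ == "__main__":
    for level, transcribing in simulate_trp_feedback(level=0, steps=6):
        print(f"tryptophan={level:2d} transcribing={transcribing}")
```

Starting from an empty pool, the operon stays on until the level passes the threshold and then switches off, the opposite logic to the lactose-induced Lac Operon.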

The promoter and enhancer and transcription control in eukaryotes

Initiation of transcription requires the enzyme RNA polymerase and transcription factors. Any protein that is needed for the initiation of transcription, but which is not itself part of RNA polymerase, is defined as a transcription factor. Many transcription factors act by recognizing cis-acting sites on DNA. However, binding to DNA is not the only means of action for a transcription factor. A factor may recognize another factor, or may recognize RNA polymerase, or may be incorporated into an initiation complex only in the presence of several other proteins. The ultimate test for membership of the transcription apparatus is functional: a protein must be needed for transcription to occur at a specific promoter or set of promoters.

A significant difference between the transcription of eukaryotic and prokaryotic mRNAs is that initiation at a eukaryotic promoter involves a large number of factors that bind to a variety of cis-acting elements. The promoter is defined as the region containing all these binding sites, that is, the region that can support transcription at the normal efficiency and with the proper control. So the major feature defining the promoter for a eukaryotic mRNA is the location of binding sites for transcription factors. RNA polymerase itself binds around the startpoint, but does not directly contact the extended upstream region of the promoter. By contrast, the bacterial promoters discussed earlier in this section are largely defined in terms of the binding site for RNA polymerase in the immediate vicinity of the startpoint.

Transcription in eukaryotic cells is divided into three classes, each transcribed by a different RNA polymerase:

• RNA polymerase I transcribes rRNA.

• RNA polymerase II transcribes mRNA.

• RNA polymerase III transcribes tRNA and other small RNAs.

Transcription factors are needed for initiation, but are not required subsequently. For the three eukaryotic enzymes, the factors, rather than the RNA polymerases themselves, are principally responsible for recognizing the promoter. This is different from bacterial RNA polymerase, where it is the enzyme that recognizes the promoter sequences. For all eukaryotic RNA polymerases, the factors create a structure at the promoter to provide the target that is recognized by the enzyme. For RNA polymerases I and III, these factors are relatively simple, but for RNA polymerase II they form a sizeable group collectively known as the basal factors. The basal factors join with RNA polymerase II to form a complex surrounding the startpoint, and they determine the site of initiation. The basal factors together with RNA polymerase constitute the basal transcription apparatus. The promoters for RNA polymerases I and II are (mostly) upstream of the startpoint, but some promoters for RNA polymerase III lie downstream of the startpoint.
Each promoter contains characteristic sets of short conserved sequences that are recognized by the appropriate class of factors. RNA polymerases I and III each recognize a relatively restricted set of promoters, and rely upon a small number of accessory factors. Promoters utilized by RNA polymerase II show more variation in sequence, and have a modular organization. Short sequence elements that are recognized by transcription factors lie upstream of the startpoint. These cis-acting sites usually are spread out over a region of >200 bp. Some of these elements and the factors that recognize them are common: they are found in a variety of promoters and are used constitutively. Others are specific: they identify particular classes of genes and their use is regulated. The elements occur in different combinations in individual promoters.

All RNA polymerase II promoters have sequence elements close to the startpoint that are bound by the basal apparatus and that establish the site of initiation. The sequences farther upstream determine whether the promoter is expressed in all cell types or is specifically regulated. Promoters that are constitutively expressed (their genes are sometimes called housekeeping genes) have upstream sequence elements that are recognized by ubiquitous activators. No element/factor combination is an essential component of the promoter, which suggests that initiation by RNA polymerase II may be sponsored in many different ways. Promoters that are expressed only at certain times or places have sequence elements that require activators that are available only at those times or places.

Sequence components of the promoter are defined operationally by the demand that they must be located in the general vicinity of the startpoint and are required for initiation. The enhancer is another type of site involved in initiation. It is identified by sequences that stimulate initiation, but that are located a considerable distance from the startpoint. Enhancer elements are often targets for tissue-specific or temporal regulation. The components of an enhancer resemble those of the promoter; they consist of a variety of modular elements. However, the elements are organized in a closely packed array. The elements in an enhancer function like those in the promoter, but the enhancer does not need to be near the startpoint. However, proteins bound at enhancer elements interact with proteins bound at promoter elements. The distinction between promoters and enhancers is operational, rather than implying a fundamental difference in mechanism. This view is strengthened by the fact that some types of element are found in both promoters and enhancers.

Eukaryotic transcription is most often under positive regulation: a transcription factor is provided under tissue-specific control to activate a promoter or set of promoters that contain a common target sequence. Regulation by specific repression of a target promoter is less common.

Literature cited

Avery, O. T., MacLeod, C. M., and McCarty, M. (1944). Studies on the chemical nature of the substance inducing transformation of pneumococcal types. J. Exp. Med. 79, 137-158.

Benzer, S., and Champe, S. P. (1961). Ambivalent rII mutants of phage T4. Proc. Nat. Acad. Sci. USA 47, 403-416.

Cairns, J., Stent, G., and Watson, J. D. (1966). Phage and the Origins of Molecular Biology. Cold Spring Harbor Symp. Quant. Biol.

Chomet, S. (1994). DNA: Genesis of a Discovery. Newman-Hemisphere Press, London.

Coulondre, C. et al. (1978). Molecular basis of base substitution hotspots in E. coli. Nature 274, 775-780.

Crick, F. H. C., Barnett, L., Brenner, S., and Watts-Tobin, R. J. (1961). General nature of the genetic code for proteins. Nature 192, 1227-1232.

Delmonte, C. S. and Mann, L. R. B.: a recent research paper summarises some key experimental data which are better explained by SBS models than by the double helix. http://www.ias.ac.in/currsci/dec102003/1564.pdf

Delmonte, C. S., A detailed study of the experimental results remaining to be explained by the double helix model. http://www.notahelix.com/delmonte/new_struct_mol_biol.pdf

Drake, J. W. (1991). A constant rate of spontaneous mutation in DNA-based microbes. Proc. Nat. Acad. Sci. USA 88, 7160-7164.

Drake, J. W., and Baltz, R. H. (1976). The biochemistry of mutagenesis. Ann. Rev. Biochem. 45, 11-37.

Drake, J. W., Charlesworth, B., Charlesworth, D., and Crow, J. F. (1998). Rates of spontaneous mutation. Genetics 148, 1667-1686.

Griffith, F. (1928). The significance of pneumococcal types. J. Hyg. 27, 113-159.

Grogan, D. W., Carver, G. T., and Drake, J. W. (2001). Genetic fidelity under harsh conditions: analysis of spontaneous mutation in the thermoacidophilic archaeon Sulfolobus acidocaldarius. Proc. Nat. Acad. Sci. USA 98, 7928-7933.

Hershey, A. D., and Chase, M. (1952). Independent functions of viral protein and nucleic acid in growth of bacteriophage. J. Gen. Physiol. 36, 39-56.

Holmes, F. (2001). Meselson, Stahl, and the Replication of DNA: A History of The Most Beautiful Experiment in Biology. Yale University Press.

http://biocrs.biomed.brown.edu/Books/Chapters/Ch%208/DH-Paper.html: Text of the original paper that Watson and Crick published in 1953.

http://darwin.cshl.org/: (Cold Spring Harbor Laboratory) This site has several excellent animations (Shockwave enhanced) as well as information about their favorite molecule, DNA.

http://web.mit.edu/esgbio/www/7001main.html

http://www.accessexcellence.org/

http://www.gdb.org/Dan/DOE/prim6.html: (DOE) Terms peculiar to molecular genetics.

Judson, H. (1978). The Eighth Day of Creation. Knopf, New York.

Maki, H. (2002). Origins of Spontaneous Mutations: Specificity and Directionality of Base-Substitution, Frameshift, and Sequence-Substitution Mutageneses. Ann. Rev. Genet. 36, 279-303.

Meselson, M. and Stahl, F. W. (1958). The replication of DNA in E. coli. Proc. Nat. Acad. Sci. USA 44, 671-682.

Millar, C. B., Guy, J., Sansom, O. J., Selfridge, J., MacDougall, E., Hendrich, B., Keightley, P. D., Bishop, S. M., Clarke, A. R., and Bird, A. (2002). Enhanced CpG mutability and tumorigenesis in MBD4-deficient mice. Science 297, 403-405.

Olby, R. (1974). The Path to the Double Helix. MacMillan, London.

Pamela Peters, from "Biotechnology: A Guide To Genetic Engineering." Wm. C. Brown Publishers, Inc., 1993.

Pellicer, A., Wigler, M., Axel, R., and Silverstein, S. (1978). The transfer and stable integration of the HSV thymidine kinase gene into mouse cells. Cell 14, 133-141.

Richard Dawkins (1990). The Selfish Gene, Oxford University Press.

Roth, J. R. (1974). Frameshift mutations. Ann. Rev. Genet. 8, 319-346.

Watson, J. D., and Crick, F. H. C. (1953): A structure for DNA. Nature 171, 737-738.

Watson, J. D., and Crick, F. H. C. (1953): Genetic implications of the structure of DNA. Nature 171, 964-967.

Wilkins, M. F. H., Stokes, A. R., and Wilson, H. R. (1953): Molecular structure of DNA. Nature 171, 738-740.

Suggested Readings:

1. Lewin, B. (2004): Genes VIII. Pearson Prentice Hall, Pearson Edu. Inc., NJ

Updated Internet version of the book is maintained at www.ergito.com.

2. Alvin S., Laura S., Virginia B. S. (2002): DNA. Twenty-First Century Books, A division of Lerner Publishing Group, USA

3. Kumar A. and Srivastava A.K. (2001): Advanced Topics in Molecular Biology. Horizon Scientific Press

4. Brown T. A. (1992): Genetics: A Molecular Approach. Chapman and Hall, London.

5. Hartl D.L. and Jones E.W. (2001): Genetics: Analysis of Genes and Genomes. Jones and Bartlett Publishers.

6. Strachan T. and Read A.P. (1996): Human Molecular Genetics. John Wiley & Sons Ltd. NY.