Why Does DNA Use T Instead of U? - Harvard University · sites.fas.harvard.edu/~lsci1a/10-3notes.pdf


Page 1: Why Does DNA Use T Instead of U? - Harvard Universitysites.fas.harvard.edu/~lsci1a/10-3notes.pdf · 2009. 6. 15. · Why Does DNA Use T Instead of U? Problem: deaminated C is identical

Professor David Liu and Brian Tse, Life Sciences 1a page 32

Why Does DNA Use T Instead of U?

[Figure: chemical structures of deoxycytidine (C) and deoxyuridine (U)]

Problem: deaminated C is identical to U (and pairs with A)

[Diagram: G–C → (deamination, ~100 per day per cell) → G–U → (DNA repair machinery removes all Us from DNA) → G–C]

• DNA uses T instead of U to avoid mutations arising from damaged (deaminated) C, which is identical to U

Why does DNA use thymine instead of uracil?

Why did DNA evolve to use T instead of U? Once again, this etiological question may never be definitively answered, but scientists have formulated reasonable hypotheses. Even though the four DNA bases are all quite stable, it turns out that cytosine is the least stable of the bases under physiological conditions. Under these conditions, about 100 cytosines per day in your genome will undergo a spontaneous chemical reaction called a deamination reaction. The deamination of cytosine is particularly dangerous to life because the product of cytosine deamination is uracil, which, just like T, forms base pairs with A. When DNA containing a C:G base pair undergoes cytosine deamination followed by DNA replication, a U:A base pair can result, creating a possible mutation if the incorrect U:A pair is replicated before it is repaired.

Because 100 such mutations introduced every day into your genome would eventually be lethal, cells use a variety of molecular machines to constantly scan your DNA for the presence of uracil. When these machines find a deaminated cytosine (that is, a uracil) in DNA, they remove the uracil, allowing the complementary strand’s guanine to guide the repair of the DNA back to a correct C:G base pair. This repair mechanism would not be possible if DNA used uracil instead of thymine, because it would be impossible for the repair machinery to distinguish a natural uracil in DNA from a uracil arising from cytosine deamination. Fortunately, this repair machinery can distinguish uracil from thymine by virtue of thymine’s methyl group. Therefore DNA’s use of thymine allows the cell to identify deaminated cytosine as a threat to the genome that must be removed. This system is a striking example of a crucial biological process that ultimately relies on the recognition of a very small chemical difference.
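The repair logic described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the sequence `GATCGA`, the function names, and the rule "flag every U" are hypothetical simplifications, not a description of how the real repair enzymes work. The sketch shows that flagging uracil pinpoints the damage in T-based DNA, whereas in a hypothetical U-based DNA the same rule would also flag legitimate bases.

```python
# Toy model (hypothetical names and sequence): why scanning for U only works
# if undamaged DNA never contains U.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def deaminate(strand, position):
    """Turn the C at `position` into U, mimicking spontaneous deamination."""
    if strand[position] != "C":
        raise ValueError("only cytosine deaminates to uracil in this toy model")
    return strand[:position] + "U" + strand[position + 1:]


def flag_uracils(strand):
    """Return the positions a uracil-scanning machine would flag."""
    return [i for i, base in enumerate(strand) if base == "U"]


def repair_uracils(damaged, partner):
    """Replace each U with the base that pairs with the opposite base.

    `partner` is written so that partner[i] sits opposite damaged[i].
    """
    return "".join(
        PAIR[opposite] if base == "U" else base
        for base, opposite in zip(damaged, partner)
    )


strand = "GATCGA"
partner = "".join(PAIR[base] for base in strand)  # position-by-position partner
damaged = deaminate(strand, 3)

print(flag_uracils(damaged))             # only the damaged position is flagged
print(repair_uracils(damaged, partner))  # the opposite G restores the C

# Hypothetical U-based DNA: the same strand with U in place of T.
u_based_damaged = deaminate(strand.replace("T", "U"), 3)
print(flag_uracils(u_based_damaged))     # legitimate U and damaged C both flagged
```

In the T-based strand the flagged list contains only the deaminated position, while in the hypothetical U-based strand it contains two positions that the scanner cannot tell apart.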


Lectures 3-5: Nucleic acids & the chemical requirements for replicating information

1. The primary biological roles of nucleic acids

2. The molecular components of DNA and RNA

a. The primary structure of deoxyribonucleic acid

b. The phosphate group in DNA; equilibrium, acidity, and protonation states

c. The sugar group in DNA; strand orientation and macromolecular chirality

d. The bases of DNA

e. The primary structure of ribonucleic acid

f. Why does DNA use deoxyribose? Why T?

3. The factors behind DNA base pairing

a. DNA hybridization as an equilibrium

b. The role of hydrogen bonding

c. The role of the hydrophobic effect and base stacking

4. The molecular basis of DNA replication

a. DNA replication; chemical reactions, substrates, and products

b. The role of DNA polymerase: faster and more accurate DNA replication

c. The polymerase chain reaction (PCR) and its impact on the life sciences


DNA Replication Relies on Base Pairing

[Figure: a double helix with base pairs A-T, T-A, G-C, G-C, T-A, C-G, A-T, G-C undergoes DNA replication to give two identical copies of the same double helix]

3. The factors behind DNA base pairing

DNA replication is made possible by the ability of one strand of DNA to form base pairs with a complementary strand. This process is also called DNA hybridization or DNA annealing. DNA base pairing enables a strand of DNA to serve as a template for the creation of its complementary strand. The process of DNA replication is simply the successive linkage of DNA nucleotides (i.e., the polymerization) against a complementary template such that the correct base-pairing partner is chosen out of the four possibilities for every base of the template. When this process is carried out using each strand of a molecule of double-stranded DNA as a template, the end result is that a DNA double helix has been replicated into two identical double helices. To understand how this process works requires an understanding of the factors underlying DNA hybridization.
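As a rough illustration of templated copying, the following Python sketch builds the complementary strand for each strand of a duplex and checks that the two daughter duplexes match the parent. The function names are hypothetical, and reading the slide's sequence as 5'-ATGGTCAG-3' is an assumption made only for this example.

```python
# Minimal sketch (hypothetical helper names): each strand of a duplex serves
# as the template for a new complementary strand.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template):
    """Return the strand that pairs with `template`, written 5'->3'.

    The two strands of a duplex run in opposite directions, so the
    complement is read in reverse.
    """
    return "".join(PAIR[base] for base in reversed(template))


def replicate(duplex):
    """Copy a duplex (strand, partner) into two daughter duplexes."""
    strand, partner = duplex
    daughter_1 = (strand, complementary_strand(strand))
    daughter_2 = (complementary_strand(partner), partner)
    return daughter_1, daughter_2


top = "ATGGTCAG"  # assumed 5'->3' reading of the slide's left-hand strand
parent = (top, complementary_strand(top))
daughter_1, daughter_2 = replicate(parent)

print(parent)
print(daughter_1 == parent and daughter_2 == parent)
```

Each daughter duplex keeps one parental strand and gains one newly built strand, yet both carry the same sequence information as the parent.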


DNA Hybridization Equilibrium

A + B ⇌ C   (A and B: complementary single strands; C: double-stranded DNA)

$$K_{eq} = \frac{[C]}{[A][B]} \gg 1 \quad (\text{e.g., } 10{,}000\ \mathrm{M}^{-1})$$

• Under typical conditions, Keq >> 1 and there are many more double-stranded molecules than single-stranded molecules

• What causes DNA hybridization to be favorable?

What causes two strands of DNA to hybridize? This question has been the subject of a great deal of past and current research. The most important factors behind base pairing have been identified even though refinements of current models continually appear in the scientific literature. Consider the equilibrium between unpaired and paired DNA strands in a test tube. When unpaired and paired DNA strands are in equilibrium, the number of unpaired strands that become paired strands during any time window is identical to the number of paired strands that become unpaired strands.

Why does this happen? Consider the interconversion between the two complementary DNA strands 5'-GATGGTCA-3' and 5'-TGACCATC-3' and their double-stranded form. DNA hybridization is generally a very favorable process, and therefore under physiological conditions and ambient temperature the value of Keq for the equilibrium written in the direction in which the paired DNA is to the right of the arrows is approximately 10,000 M⁻¹. This means that at equilibrium under typical DNA concentrations in a cell, there are many more double-stranded forms of these DNA sequences than single-stranded forms.

If you mix together each of two complementary single DNA strands under physiological conditions, they will begin to pair together until the double-stranded forms outnumber the single strands by a ratio that satisfies the Keq for hybridization of those DNA strands. At this equilibrium state there are so many more double-stranded DNA molecules that their rare dissociation into single strands occurs about as frequently as the pairing together of the very small number of single-stranded molecules remaining to form a double-stranded molecule. The more favorable the double-stranded form (i.e., the higher the value of Keq), the higher the ratio of double-stranded to single-stranded DNA will be once this equilibrium is reached.
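To make the equilibrium concrete, here is a small Python sketch that solves Keq = [C]/([A][B]) for the duplex concentration when equal amounts of the two strands are mixed. The Keq of 10,000 M⁻¹ comes from the text; the strand concentrations, the function name, and the assumption of equal starting amounts are hypothetical choices for illustration only.

```python
import math


def duplex_concentration(k_eq, total_each):
    """Equilibrium duplex concentration (M) for A + B <-> C.

    Assumes each strand starts at `total_each` (M), so that
    k_eq = x / (total_each - x)**2, and returns the physical root x.
    """
    b = 2.0 * k_eq * total_each + 1.0
    discriminant = 4.0 * k_eq * total_each + 1.0
    return (b - math.sqrt(discriminant)) / (2.0 * k_eq)


K_EQ = 1.0e4  # M^-1, the example value from the text

# Hypothetical starting concentrations of each strand, in mol/L.
for total in (1e-4, 1e-3, 1e-2, 1e-1):
    paired = duplex_concentration(K_EQ, total)
    print(f"{total:.0e} M of each strand -> {paired / total:.1%} paired")

# A higher Keq gives a higher paired fraction at the same concentration.
for k_eq in (1e2, 1e4, 1e6):
    paired = duplex_concentration(k_eq, 1e-3)
    print(f"Keq = {k_eq:.0e} M^-1 -> {paired / 1e-3:.1%} paired")
```

The sketch suggests two trends consistent with the text: at a fixed concentration a larger Keq means more double-stranded DNA, and for a fixed Keq the paired fraction rises with strand concentration.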


Hydrogen Bonding: Matched Base Pairs

[Figure: equilibrium between unpaired A and T, each hydrogen bonded to water (single-stranded: 4 hydrogen bonds), and an A:T pair within a double helix plus water molecules hydrogen bonded to each other (double-stranded: 4 hydrogen bonds)]

Hydrogen bonding alone may not strongly favor either side

• The same # of H-bonds are possible on both sides, although scientists continue to debate how these H-bonds differ in strength

DNA hybridization: the role of hydrogen bonding

Earlier we described the ways in which the DNA bases hydrogen bond with each other during base pairing. Hydrogen bonding is an important factor behind DNA hybridization, but the formation of hydrogen bonds may not provide a strong energetic incentive for hybridization under conditions relevant to living systems. The reason is that living systems contain a great deal of water, and a water molecule is an excellent hydrogen bond donor as well as an excellent hydrogen bond acceptor. As a result, the hydrogen bonds that form between A and T, or between C and G, when DNA undergoes base pairing do not simply vanish in the single-stranded state; instead, they are replaced by hydrogen bonds that the DNA bases make with water in the unpaired state. We can draw an equilibrium, shown here, to describe this fact.

If we make the overly simplistic assumption that the hydrogen bonds between A and T are about as strong as the hydrogen bonds between water and A, water and T, or water and water, then there are a comparable number of hydrogen bonds on each side of the equilibrium, and hydrogen bonding alone would not significantly favor either the single-stranded or double-stranded states for the pairing of “matched” (A+T or C+G) bases. Although scientists acknowledge that these hydrogen bonds are not all equal in strength for reasons that will be presented later in this course, they continue to debate the extent to which hydrogen bonding helps to favor base pairing. An emerging consensus, however, suggests that hydrogen bonding by itself is not the major factor behind the strong tendency of complementary DNA strands to pair.

This analysis predicts that hydrogen bonding should be quite favorable in solvents that are not capable of forming strong hydrogen bonds (such as chloroform, CHCl3) because there is no hydrogen bonding alternative to base pairing. Indeed, in chloroform, adenine and thymine form two strong hydrogen bonds with each other, and their pairing in this solvent is largely driven by the formation of these hydrogen bonds.


Hydrogen Bonding: Mismatched Base Pairs

[Figure: equilibrium between unpaired A and C, each hydrogen bonded to water (single-stranded: 4 hydrogen bonds), and a mismatched A:C pair within a double helix, where there is no room for water molecules (double-stranded: 2 hydrogen bonds)]

Keq = ~0.01

Hydrogen bonding alone favors the unpaired side for mismatched bases

• Hydrogen bonding alone may not induce DNA hybridization, but the loss of hydrogen bonds disfavors mismatched pairing

Even though hydrogen bonding in water may not provide a major energetic incentive to induce base pairing, it is still a very important component of DNA hybridization. In double-stranded DNA, there is no room for water molecules to make hydrogen bonds with the parts of the DNA bases involved in base pairing. The loss of hydrogen bonding between two DNA bases that could otherwise each hydrogen bond with water is disfavored. To demonstrate this point, let’s consider the equilibrium between the mis-paired and unpaired forms of adenine and cytosine in single-stranded DNA in water.

In this example, adenine and cytosine can either hydrogen bond with water (as single-stranded DNA), or can adopt a double-stranded conformation resembling that of an A:T pair but without the possibility of forming any Watson-Crick hydrogen bonds. Clearly, forming hydrogen bonds with water is preferred to the loss of these hydrogen bonds, so A:C pairing is disfavored in water. For most of the mismatched pairs of bases, Keq for mis-pairing is roughly 0.01, meaning that incorrect pairing is disfavored 100 to 1. As we’ll see later in the course, living systems have evolved many layers of proofreading and error correction to enhance this modest intrinsic base pairing preference by about 10 million-fold!
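The bookkeeping behind this argument can be written out in a few lines of Python. The hydrogen bond counts are the ones labeled on the matched and mismatched slides, and the Keq and enhancement factor are the rounded values quoted in the text; the dictionary layout and function names are hypothetical, and treating 1/Keq as the odds against mis-pairing follows the text's "100 to 1" reading rather than a rigorous thermodynamic treatment.

```python
# Hydrogen bond counts as labeled on the slides for each side of the equilibrium.
HBOND_COUNTS = {
    "matched A:T": {"single_stranded": 4, "double_stranded": 4},
    "mismatched A:C": {"single_stranded": 4, "double_stranded": 2},
}

K_MISPAIR = 0.01          # Keq for mis-pairing, from the text
ENHANCEMENT = 10_000_000  # "about 10 million-fold", from the text


def bonds_lost_on_pairing(counts):
    """Hydrogen bonds given up when the bases go from unpaired to paired."""
    return counts["single_stranded"] - counts["double_stranded"]


def odds_against(k_eq):
    """Read a small Keq as 'disfavored N to 1', as the text does."""
    return 1.0 / k_eq


for name, counts in HBOND_COUNTS.items():
    print(f"{name}: {bonds_lost_on_pairing(counts)} hydrogen bonds lost on pairing")

intrinsic = odds_against(K_MISPAIR)
print(f"intrinsic preference for correct pairing: about {intrinsic:.0f} to 1")
print(f"after the quoted enhancement: about {intrinsic * ENHANCEMENT:.0e} to 1")
```

Matched pairing loses no hydrogen bonds in this count while mismatched pairing loses two, which is the sense in which hydrogen bonding enforces specificity without being the main driving force.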

Our reasoning reveals that hydrogen bonding helps enforce the specificity of DNA base pairing (ensuring that A pairs with T and C pairs with G) because only correct pairing avoids the loss of hydrogen bonds that could otherwise be made to water. The molecular origins of the Watson-Crick pairing rules therefore lie in not wasting the opportunity to form hydrogen bonds. But if hydrogen bonding may not be the major driving force behind DNA hybridization, a very favorable process, what other factors cause double-stranded DNA to form?


The Hydrophobic Effect

[Figure: two separate hydrocarbon chains, each surrounded by ordered water, in equilibrium with the same chains gathered together plus released, disordered water]

Water forms an ordered “lattice” around hydrophobic (oily) surfaces

• Ordered waters are disfavored because they are capable of less motion (they have less entropy)

• Fewer water molecules are ordered when hydrophobic groups are gathered together

DNA hybridization: the role of the hydrophobic effect and base stacking

A major energetic driving force behind the formation of double-stranded DNA is the hydrophobic effect. As you know, oil and water do not mix; oil is hydrophobic, or "water-fearing". The reason that oil and water tend to separate is actually related to the driving force behind DNA hybridization.

Molecules of oil are largely made up of hydrocarbon chains. As we learned earlier, hydrocarbons cannot participate in hydrogen bonding, and therefore oil cannot form hydrogen bonds with water. As a result, when a molecule of oil interacts with water, the water molecules at the surface of the oil molecule must orient themselves in a specific way that allows them to hydrogen bond with each other to avoid losing the opportunity to form as many hydrogen bonds as possible. This ordering of water at the oil-water interface is unfavorable because it limits the water molecules' freedom to rotate or move. Trapping a molecule in one of many possible orientations is disfavored under the Second Law of Thermodynamics, which states that the disorder, or entropy, of a system tends to increase, not decrease.

To minimize the number of entropically disfavored ordered water molecules, a mixture of oil and water naturally tends to separate in a manner that minimizes the surface area of the oil-water interface. This surface area is at a minimum when all of the oil molecules are together, and all of the water molecules are together. The term hydrophobic is, chemically speaking, a misnomer. Oil and water separate not because oil "fears" water, but rather because water must adopt an unfavorable ordered state when an oil molecule is nearby.
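The surface-area argument can be illustrated with a deliberately crude toy model in Python: each oil molecule is treated as a unit cube on a grid, and every cube face not touching another cube counts as water-exposed surface. The cube picture, the grid coordinates, and the function name are all hypothetical simplifications chosen for this sketch, not a physical simulation.

```python
# Toy model: oil molecules as unit cubes; exposed faces stand in for the
# oil-water interface that forces nearby water into an ordered state.
NEIGHBOR_OFFSETS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def exposed_faces(cells):
    """Count cube faces that do not touch another cube."""
    occupied = set(cells)
    return sum(
        (x + dx, y + dy, z + dz) not in occupied
        for (x, y, z) in occupied
        for (dx, dy, dz) in NEIGHBOR_OFFSETS
    )


# Eight cubes spread far apart versus the same eight packed into a 2x2x2 block.
dispersed = [(3 * i, 0, 0) for i in range(8)]
clustered = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]

print("dispersed:", exposed_faces(dispersed), "exposed faces")
print("clustered:", exposed_faces(clustered), "exposed faces")
```

Gathering the cubes together halves the exposed surface in this toy case, which mirrors why fewer water molecules need to be ordered once the hydrophobic groups cluster.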


The Nucleic Acid Bases have Hydrophobic Surfaces

[Figure: a base shown face-on and rotated 90°; the top and bottom surfaces are hydrophobic. Bases stacked as in double-stranded DNA; the stacked faces are not accessible to water]

• Water exposure of these hydrophobic surfaces is minimized by stacking the bases

• In double-stranded DNA, the bases are largely stacked

A single strand of DNA also contains hydrophobic groups. Although the phosphate and ribose groups are capable of interacting strongly with water through hydrogen bonding and polar interactions, the bases are fairly hydrophobic, especially at their faces. Each base contains several atoms that are not capable of forming favorable interactions with water. These bases are surrounded by significant amounts of water in the case of single-stranded DNA, but are largely surrounded by other bases above, below, and to the side in double-stranded DNA. The bases therefore behave as an oil-like part of DNA, and prefer to cluster together as much as possible. DNA hybridization minimizes the number of entropically disfavored water molecules that surround the hydrophobic parts of the DNA bases. Researchers now believe that the hydrophobic effect is the major energetic factor behind the formation of double-stranded DNA in water.


DNA Hybridization is Largely Driven by the Hydrophobic Effect

[Figure: stacked bases inside the double helix. No room for water molecules!]

• DNA hybridization minimizes water-exposed hydrophobic surfaces

When discussing DNA hybridization, the role of the hydrophobic effect is often conflated with a separate factor called base stacking. This confusion probably arises because the fact that the bases stack on top of each other in double-stranded DNA is a major way that the hydrophobic faces of the bases minimize contact with water. However, there is also a special kind of favorable interaction that arises when DNA bases lie on top of each other. This interaction is actually electrostatic in origin and results from a partial negative charge near the centers of the bases, and a partial positive charge around the edges of the bases. When the electron-rich middle of a base overlaps with the electron-poor edge of an adjacent base, a favorable electrostatic interaction results which is more accurately referred to as a "base stacking" interaction.

Not all base pairs stack equally well. Scientists have experimentally measured and theoretically calculated the ability of different base pairs to stack, and conclude that in most cases, G:C base pairs stack more favorably with other base pairs than do A:T base pairs. Indeed, this difference in base stacking ability can explain why double-stranded DNA with more C and G bases pairs more favorably than double-stranded DNA rich in A and T bases.


How Does DNA Replicate?

[Figure: a template with a hybridized primer, plus deoxynucleotide triphosphates (dATP, dCTP, dGTP, dTTP), undergoes polymerization to give the template with an extended primer]

4. The molecular basis of DNA replication

DNA replication; chemical reactions, substrates, and products

Now that we have an understanding of the factors behind DNA hybridization, we are ready to describe the way in which DNA replicates at the molecular level. DNA replicates through a single chemical reaction repeated thousands of times in succession. This reaction, called DNA polymerization, is the addition of a single nucleotide to a growing strand of DNA to yield a longer strand of DNA. Chemical reactions are written with starting materials or substrates on the left side of an arrow or above the arrow, and products on the right side of the arrow. The DNA polymerization reaction can be written as shown above.


Base Pairing Determines Selectivity During DNA Polymerization

[Figure: a template-primer complex whose next unpaired template base is G ("G - ?"). Of dATP, dCTP, dGTP, and dTTP, dCTP is added to the primer, releasing pyrophosphate and giving an extended primer with a new 3’ OH group]

• The nucleotide capable of forming a base pair with the template is added to the primer

This reaction has several notable features. It requires two different molecules as substrates: a DNA primer that is being extended into the product strand, and a DNA nucleotide in a special activated form called a triphosphate. The products of this reaction also consist of two molecules: the DNA primer extended by one nucleotide unit, and pyrophosphate (two phosphate groups bonded together). In addition to the starting materials and products that undergo covalent bond formation or bond breakage during the course of the reaction, a third molecule is also required for the reaction to take place in a manner that is useful to the cell: a long DNA template. The DNA primer is complementary to a portion of the DNA template and when combined these two molecules undergo DNA hybridization to form a template-primer complex. Although this DNA template is not altered during the course of DNA polymerization, it serves the crucial purpose of telling the cell which nucleotide must be added to the growing DNA strand at every position. The heart of the DNA polymerization reaction is the formation of a covalent bond between the 3' oxygen atom of the DNA primer and the phosphate group of the nucleotide triphosphate located closest to the ribose ring.

The nucleotide triphosphate incorporated at the end of the growing DNA primer must be chosen with extremely high accuracy to preserve the fidelity of an organism's genome. At each nucleotide position along the DNA template, one of the four DNA nucleotide triphosphates in the cell (abbreviated dATP, dCTP, dGTP, and dTTP) is chosen based on its ability to form a Watson-Crick pair with the corresponding base on the template strand. Once the 3’ end of the primer is successfully bonded with the new nucleotide, the 3’ oxygen atom of the new nucleotide forms a covalent bond with the next carefully selected nucleotide triphosphate, and the polymerization reaction continues. Because the newly synthesized DNA strand grows only at its 3’ end, the directionality of DNA polymerization in nature is designated 5’-to-3’.
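The primer-extension cycle can be sketched in Python as follows. The template 5'-GATGGTCA-3' is the example strand used earlier in these notes; the four-base primer 5'-TGAC-3' and the function name are hypothetical choices for illustration, and the sketch ignores the chemistry (pyrophosphate release, enzyme catalysis) and models only which nucleotide is selected at each step.

```python
# Minimal sketch: extend a primer one nucleotide at a time, choosing at each
# step the dNTP whose base pairs with the next template base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template, primer):
    """Extend `primer` along `template`; both are written 5'->3'.

    The primer pairs with the 3' end of the template, so the template is
    read 3'->5' while the new strand grows 5'->3'.
    """
    template_3_to_5 = template[::-1]
    for primer_base, template_base in zip(primer, template_3_to_5):
        if PAIR[template_base] != primer_base:
            raise ValueError("primer is not complementary to the template")

    product = primer
    added = []
    for template_base in template_3_to_5[len(primer):]:
        base = PAIR[template_base]       # the Watson-Crick partner is selected
        added.append(f"d{base}TP")       # it arrives as a triphosphate
        product += base                  # and is joined at the 3' end
    return product, added


product, added = extend_primer("GATGGTCA", "TGAC")
print(added)
print(product)
```

For this template the finished strand is the complementary sequence 5'-TGACCATC-3' given earlier in the notes, built up strictly by additions at the 3’ end.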


DNA Polymerization, South Park-Style

[Animation: a primer being extended 5’ to 3’ along a template. ® Comedy Central (don’t sue me, please)]


DNA Polymerization in Action I

[Figure: the template-primer complex, with 5’ and 3’ ends labeled, surrounded by free dATP, dCTP, dGTP, and dTTP]

DNA Polymerization in Action II

[Figure: dCTP pairs with the unpaired template G next to the 3’ end of the primer]

DNA Polymerization in Action III

[Figure: C is joined to the primer, releasing pyrophosphate and leaving a new 3’ OH group on the growing strand]

DNA Polymerization in Action IV

[Figure: the extended primer surrounded by free dATP, dCTP, dGTP, and dTTP]

DNA Polymerization in Action V

[Figure: a second dCTP pairs with the next unpaired template G]

DNA Polymerization in Action VI

[Figure: the second C is joined to the growing strand]

Note that the new strand is growing in the 5’ to 3’ direction


DNA Polymerase Accelerates Replication

[Figure: template-primer complex + dCTP → extended primer + pyrophosphate]

Without DNA polymerase: very slow (no observable reaction)

With DNA polymerase: fast (~50 bases added per second)

The role of DNA polymerase: faster and more accurate DNA replication

If you combine a DNA template, a DNA primer, and four DNA nucleotide triphosphates in a test tube under physiological conditions, the DNA primer will hybridize to the DNA template, but nothing else will happen on any reasonable time scale. Efficient DNA replication requires the action of a protein called DNA polymerase. DNA polymerase is an enzyme, a macromolecule that catalyzes (accelerates) a chemical reaction. There are tens of thousands of enzymes in your body that are necessary to catalyze all of the chemical reactions that must occur for your survival. The vast majority of enzymes are proteins, although scientists discovered more recently (in the past 20 years) that RNA can also catalyze chemical reactions. We will learn much more about enzymes and how they accelerate chemical reactions in later lectures on the proteins involved in HIV and cancer. For now, simply appreciate that DNA polymerase is responsible for accelerating the rate of DNA polymerization in humans to a rate of about 50 bases per second.

While this speed is impressive, it would still take almost two years to copy your entire 3,000,000,000-base genome at this rate. In contrast, your growing cells can divide in about 24 hours, and each cell division requires replicating your entire genome. Therefore DNA replication occurs at many locations on each chromosome simultaneously, and the pieces of copied DNA are then connected together (“ligated”) to yield a complete copy of the genome. In the case of humans, genome replication is complete within a few hours.
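The "almost two years" figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (3,000,000,000 bases, about 50 bases per second). The `sites_needed` helper and its 8-hour target are illustrative assumptions, not values from the lecture, and it assumes every replication site runs at the same rate the whole time.

```python
import math

GENOME_BASES = 3_000_000_000        # genome size quoted in the text
BASES_PER_SECOND = 50               # approximate polymerase rate quoted in the text
SECONDS_PER_YEAR = 365 * 24 * 3600


def single_site_copy_time_years(bases=GENOME_BASES, rate=BASES_PER_SECOND):
    """Years needed if one polymerase copied the whole genome end to end."""
    return bases / rate / SECONDS_PER_YEAR


def sites_needed(target_hours, bases=GENOME_BASES, rate=BASES_PER_SECOND):
    """Minimum number of simultaneous replication sites to finish in target_hours.

    Hypothetical helper: assumes all sites copy at the same constant rate.
    """
    total_seconds = bases / rate
    return math.ceil(total_seconds / (target_hours * 3600))


if __name__ == "__main__":
    print(f"One site: {single_site_copy_time_years():.1f} years")   # about 1.9 years
    print(f"Sites for an assumed 8-hour copy: {sites_needed(8)}")
```

Under these assumptions, finishing within a few hours requires on the order of thousands of replication sites working in parallel, which is why replication must start at many locations at once.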


DNA Polymerization in Cells, CG-Style

[Animation still: a primer extending along a DNA template, with the 5’ and 3’ ends labeled]

• DNA polymerization in cells requires DNA polymerase plus several other proteins to unwind double-stranded DNA and to perform several other essential tasks

Animation by Drew Berry, used with permission


DNA Replication is Extremely Accurate

Method                                          Error Rate
Selectivity based on hydrogen bonding alone     ~1 in 100
With DNA polymerase                             ~1 in 100,000,000
With all cellular machinery                     ~1 in 10,000,000,000

In addition to accelerating the rate of DNA polymerization, DNA polymerase also plays a crucial role in improving the fidelity of DNA replication. As we learned earlier in this lecture, the hydrogen bonds that are formed (or not formed) upon juxtaposing two bases in a double helix create an intrinsic selectivity of roughly 100 to 1 favoring matched over mismatched base pairs. Without any additional source of fidelity, hydrogen bonding alone would therefore result in an error rate of ~1% during DNA replication, creating a base substitution (also called a mutation) once every ~100 bases. Such an error rate is far too high to sustain life for reasons that will become apparent during the next several lectures.

DNA polymerase improves this selectivity tremendously (to an error rate of roughly 1 in 1,000,000 bases) by accelerating the incorporation of the correct nucleotide to a much greater extent than incorporation of an incorrect nucleotide. Many DNA polymerases can also correct the rare errors they do make by hydrolyzing the incorrectly incorporated nucleotide and trying again, an activity called proofreading. As a result, DNA polymerase incorporates an incorrect base only once every ~100,000,000 bases polymerized. In addition, cells generate a small army of other proteins that constantly survey the genome and correct errors.

When all of these ways of avoiding mutations are combined, it is estimated that your cells introduce mutations into your genome with a frequency of roughly 1 in 10,000,000,000 bases copied. The net result is that, on average, less than one mutation is introduced into your genome every time a cell divides. In Life Sciences 1b you will learn the role of the mutation rates of organisms in their evolution.
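To see why these error rates matter, it helps to multiply each one by the genome size. The sketch below is a rough illustration that uses the three error rates above and the 3,000,000,000-base genome size quoted earlier in these notes; the names `ERROR_RATES` and `expected_errors` are made up for this example.

```python
GENOME_BASES = 3_000_000_000   # genome size quoted earlier in these notes

# Error rates per base copied, taken from the table above.
ERROR_RATES = {
    "hydrogen bonding alone": 1 / 100,
    "DNA polymerase with proofreading": 1 / 100_000_000,
    "all cellular machinery": 1 / 10_000_000_000,
}


def expected_errors(error_rate, bases=GENOME_BASES):
    """Expected number of miscopied bases in one copy of the genome."""
    return error_rate * bases


if __name__ == "__main__":
    for method, rate in ERROR_RATES.items():
        print(f"{method}: ~{expected_errors(rate):g} errors per genome copy")
```

The results are roughly 30,000,000 errors with hydrogen bonding alone, about 30 with DNA polymerase and proofreading, and about 0.3 with all cellular machinery, consistent with "less than one mutation" per cell division.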


The Polymerase Chain Reaction (PCR)

[Figure: PCR ingredients, including primers, polymerase, and the four dNTPs]

The polymerase chain reaction (PCR)

Although DNA replication in cells is both very efficient and highly accurate, it is not useful for many scientific applications because researchers have little control over which of the DNA sequences within a cell are copied. Most of the applications of DNA in the laboratory, including the recently completed Human Genome Project, require a much more controllable method for rapidly replicating small pieces of DNA (typically less than several thousand base pairs in length) into usable quantities. The standard laboratory method for achieving the rapid replication of a precisely specified segment of DNA is the polymerase chain reaction, or PCR. PCR has the remarkable feature of amplifying a desired DNA sequence in an exponential fashion, meaning that each cycle of PCR in principle doubles the number of DNA molecules present in a test tube. As a result, minute quantities of DNA (as little as a single molecule) can be rapidly amplified into billions of molecules in just a few hours using relatively simple equipment. This method has proven to be extremely important for nearly all scientific applications involving DNA. PCR was developed by Kary Mullis, who earned the 1993 Nobel Prize in Chemistry for his achievement. We will end this lecture with a description of how PCR works at the molecular level.


The PCR Cycle

[Figure: the PCR cycle. Melt template (heat), hybridize primers (cool), extend primers, repeat. Strand ends are labeled 5’ and 3’.]

You are already familiar with the starting materials that are required for PCR. A PCR reaction contains one or more DNA template molecules, many copies of each of two shorter DNA primers, dATP, dCTP, dGTP, dTTP, and a DNA polymerase. The DNA primers are typically short enough that they can be manufactured on a machine called a DNA synthesizer. The sequences of the two DNA primers are carefully chosen to hybridize to each of the template strands at locations that surround the desired region to be amplified, and with the 3’ ends of the primers pointing at each other.

At the start of the PCR reaction, the test tube containing all of the above ingredients is heated to 95 °C, nearly the temperature at which water boils. At such a high temperature, the two strands of the DNA template melt into single strands. The test tube is then cooled down considerably (typically to ~60 °C, the temperature of uncomfortably hot, but not scalding, water). At this intermediate temperature, the DNA primers hybridize to the dissociated template strands, forming two template-primer complexes.

So far no DNA polymerization has taken place. But once the template-primer complexes form, DNA polymerase begins to extend each primer along its template, growing a strand complementary to the template in the 5’-to-3’ direction and consuming the nucleotide triphosphates in the test tube. When both of the polymerization reactions are complete, the end result is that the desired region of the template molecule has been copied, such that two copies of this DNA now exist for every one copy that existed at the start of the PCR process.
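The way two primers bracket the amplified region can be sketched in code. The example below is a simplified model, not a primer design tool: it requires exact matches, ignores temperature and primer length effects, and all sequences and function names are invented for illustration. The template is written as the top strand, 5’ to 3’.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def pcr_product(template, forward_primer, reverse_primer):
    """Return the top-strand sequence of the region bounded by the two primers.

    forward_primer matches the top strand (it hybridizes to the bottom strand).
    reverse_primer matches the bottom strand, so its reverse complement
    appears in the top strand downstream of the forward primer. With this
    arrangement the 3' ends of the primers point at each other.
    Returns None if either primer site is missing.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


if __name__ == "__main__":
    # Hypothetical template: flanking DNA on both sides of the target region.
    template = "GGATCC" + "ATGGCTAGC" + "AAACCCGGGTTT" + "GAATTCGCA" + "TTAAGG"
    forward = "ATGGCTAGC"
    reverse = reverse_complement("GAATTCGCA")
    print(pcr_product(template, forward, reverse))
```

Only the stretch from the start of the forward primer site to the end of the reverse primer site is returned; the flanking DNA outside the primers is not part of the product.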


PCR Exponentially Amplifies DNA

One DNA template molecule after 25 PCR rounds gives 2^25 = 33,554,432 molecules!

Once DNA polymerization is complete, the test tube is again heated to 95 °C so that the double-stranded products from the first polymerization melt into single strands. Now these amplified single strands can serve as the templates for the next cycle of PCR. The test tube is once again cooled to ~60 °C, a new set of primers anneals to the templates, and DNA polymerization takes place once again. The net result of this process is that each cycle (template melting, primer hybridization, and DNA polymerization) of PCR can double the number of desired double-stranded DNA molecules in the tube. After 25 PCR cycles, in theory a single template molecule can be copied more than 30,000,000 times!
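The doubling arithmetic is easy to reproduce. In the sketch below, the `efficiency` parameter is an assumption added for illustration: a value of 1.0 means every molecule is copied in every cycle (perfect doubling, as in the text), while lower values model cycles in which only some molecules are copied.

```python
def copies_after(cycles, starting_copies=1, efficiency=1.0):
    """Number of copies after the given number of PCR cycles.

    efficiency=1.0 is perfect doubling; the parameter itself is a
    hypothetical knob for this sketch, not something from the lecture.
    """
    return starting_copies * (1 + efficiency) ** cycles


if __name__ == "__main__":
    print(f"{copies_after(25):,.0f}")                  # 33,554,432 with perfect doubling
    print(f"{copies_after(25, efficiency=0.9):,.0f}")  # fewer copies if doubling is imperfect
```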


Thermus aquaticus (Taq) DNA Polymerase

• Taq DNA polymerase: operates at 74 °C (extension temperature), tolerant of 95 °C (melting temperature)

• Thermostable polymerases (and the diversity of life on earth) made PCR practical

One difficulty with the original form of PCR is that most DNA polymerase enzymes are destroyed at high temperatures, and therefore had to be added during each PCR cycle. A clever solution to this problem was to make use of DNA polymerase enzymes taken from organisms that naturally grow at very high temperatures, in environments such as hot springs and deep sea vents. In general, the enzymes within these thermophilic (“heat-loving”) organisms have evolved to operate at very high temperatures. Thermophilic DNA polymerases are stable enough at high temperatures that a small amount added at the start of a PCR reaction can last through 25 or more cycles of PCR. As a result, PCR has truly become a “set up and forget” process in which a researcher simply drops a small amount of desired DNA and primers into a PCR cocktail, presses a button on a computer-controlled heating and cooling machine, and returns an hour or two later to pick up millions of copies of the desired DNA. In this sense, PCR resembles science fiction (the Star Trek replicator comes to mind), all made possible by an understanding of DNA hybridization and the molecular basis of DNA replication.


PCR Applications: Molecular Biology

• A gene of interest can be amplified and inserted into a model organism: genomic DNA, PCR using primers that match the ends of the gene, introduce gene into bacteria, bacteria use gene to make protein, isolate protein, study its function

• A specific mutation in a protein-encoding gene can be introduced: PCR with mutation-containing primer, reassemble mutant gene, make mutant protein; compare properties to wild-type protein (mutant protein vs. wild-type protein)

PCR has revolutionized the way that scientists study and manipulate genes and the proteins they encode. Over the next two slides we’ll summarize some of the most important ways in which PCR has advanced the life sciences. First, PCR enables molecular biologists to isolate and amplify genes from any organism’s genome and insert these cloned genes into the DNA of a model organism such as E. coli. As a result, genes that would be difficult or impossible to study in their native contexts (such as human genes) can be examined much more easily and efficiently.

As a second example of a very powerful molecular biology manipulation enabled by PCR, scientists routinely introduce carefully designed mutations into DNA sequences in order to effect the corresponding changes in the proteins encoded by those genes. Examining how the functions of the resulting mutant proteins differ from those of the non-mutated (“wild-type”) proteins has provided key insights into how many proteins work (including HIV protease, which we will discuss in detail later in this course). PCR makes this process of mutagenesis much easier than before by enabling scientists to add primer sequences containing a desired mutation into a PCR reaction with the wild-type gene as the DNA template. The resulting PCR products have incorporated the mutation into a segment of the gene of interest. The mutated gene segment is then reassembled into a whole gene containing the desired mutation.
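The reason a mutation-containing primer ends up in the product is that the primer itself becomes the start of the new strand; only the bases after it are copied from the template. The sketch below illustrates this under strong simplifications: everything is written in top-strand coordinates, a single base is changed, and the sequence, flank length, and function names are hypothetical. Real primer design also has to account for hybridization strength, which is ignored here.

```python
def mutagenic_primer(wild_type, position, new_base, flank=8):
    """Build a primer matching wild_type around `position`, with one base changed.

    Returns the primer and the index in wild_type where the primer starts.
    """
    start = max(0, position - flank)
    end = min(len(wild_type), position + flank + 1)
    primer = wild_type[start:position] + new_base + wild_type[position + 1:end]
    return primer, start


def extend_from_primer(wild_type, primer, start):
    """Model the new strand: the primer as written, followed by the bases
    that the polymerase copies from the wild-type template downstream."""
    return primer + wild_type[start + len(primer):]


if __name__ == "__main__":
    wild_type = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"   # hypothetical gene segment
    primer, start = mutagenic_primer(wild_type, position=12, new_base="C")
    product = extend_from_primer(wild_type, primer, start)
    print(wild_type[start:])
    print(product)   # identical except at the single mutated position
```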


PCR Applications: HIV Detection

• Because PCR can amplify extremely small quantities of DNA, it can be used as a sensitive method of pathogen detection

[Figure: blood sample, isolate HIV genome (RNA), convert to DNA, PCR. PCR amplification indicates presence of HIV; no PCR product indicates no detectable HIV levels.]

• Sensitivity limit = ~10-100 copies of HIV genome

PCR has also profoundly impacted medical diagnosis. Because PCR can amplify DNA millions of times in a single experiment that lasts 1-3 hours, and because all pathogens have an associated DNA or RNA genome, PCR can be used to detect the presence of a pathogen in a human patient’s blood. Indeed, HIV detection (especially in infants born to HIV-positive mothers) has relied heavily on PCR-based methods in which HIV RNA in human blood is converted to DNA (using reverse transcriptase, an enzyme that will be discussed later in this course), which then serves as a template for a PCR reaction. This PCR reaction contains primer sequences that only match the sequence of the HIV genome. Therefore, PCR products can form only if HIV RNA is present in the original blood sample. Scientists can even measure the number of PCR cycles required to generate a given amount of PCR product to estimate the number of copies of HIV RNA originally present in the blood, a method called quantitative PCR.
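The idea behind quantitative PCR follows from the doubling arithmetic: if a sample needs fewer cycles to reach a given amount of product, it must have started with more template. The sketch below assumes perfect doubling by default and treats the "given amount" as a known copy number (`threshold_copies`), which is a simplification made for illustration; the function names and numbers are hypothetical.

```python
def starting_copies(threshold_copies, cycles_to_threshold, efficiency=1.0):
    """Estimate the starting copy number from the cycles needed to reach a threshold.

    Assumes each cycle multiplies the copy number by (1 + efficiency).
    """
    return threshold_copies / (1 + efficiency) ** cycles_to_threshold


def fold_difference(cycles_a, cycles_b, efficiency=1.0):
    """How many times more template sample A had than sample B,
    given the cycles each needed to reach the same threshold."""
    return (1 + efficiency) ** (cycles_b - cycles_a)


if __name__ == "__main__":
    # Hypothetical numbers: the threshold is 2**30 copies, reached after 20 cycles.
    print(starting_copies(2 ** 30, 20))   # 1024.0 starting copies
    # A sample that reaches the threshold 3 cycles earlier had 8x more template.
    print(fold_difference(20, 23))        # 8.0
```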

Although in principle PCR can generate a product starting from a single copy of HIV RNA, in practice the sensitivity limit of this approach is typically ~10 to 100 copies of the HIV genome in a small blood sample. While patients who harbor even smaller numbers of HIV virions in their blood are still infected with HIV and can still eventually develop AIDS, this PCR-based detection method has proven very valuable in evaluating the possible progression of HIV replication in patients and the effect of anti-HIV therapies such as those discussed later in this course.

This week in lab you will have the wonderful opportunity to apply what you’ve learned in this lecture to perform a PCR-based DNA analysis experiment of your own.


Key Points: How DNA Meets the Requirements for the Blueprint of Life

• Resist degradation: negatively charged phosphates; no 2’ OH (equilibrium, acidity, Ka, pKa, pH, Henderson-Hasselbalch)

• Be recognized by cellular machinery: phosphate groups and bases (ionic bonds and hydrogen bonds)

• Contain multiple possible structures (bits) at each position: four possible bases per nucleotide

• Possess redundancy for error correction and replication: base pairing (hydrogen bonding, hydrophobic effect)

• Knowledge of DNA replication led to PCR, a key invention

• Understanding the chemistry of DNA is crucial to the life sciences

Despite the central role that DNA and RNA play both in nature and in science, James Watson (of Watson and Crick) noted at a recent Harvard lecture that “DNA is the script, but proteins are the actors.” In the next several lectures we will understand what Watson meant by describing the molecular basis of how the information in your DNA is used to program the synthesis of the proteins in your body, a fundamental concept known as the Central Dogma.