evo 1 instructor guide s10 final - iris.nyit.eduiris.nyit.edu/~apetro01/unit-6 genetics/class-21...


Instructor Guide - Evolution Lecture 1

HUMAN GENOMES This is an instructor-only guide to seminar planning

Week of April 12, 2010

Abstract

While individual lives end, the descendants of one initial form of life, DNA, have persisted for about 4 billion years of chemical continuity and common history. Our species is inextricably embedded in this web of DNA-based relationships. Each of us is more or less closely related to every other living thing. Our genome encodes meaning, but how? This lecture features the basics and frontiers of how DNA is chemical, textual, improbable and historical.

Lecture Key Points

I. DNA: Life is Chemical

The vast majority of life’s chemistry involves atoms of six elements: Carbon (C), Hydrogen (H), Nitrogen (N), Oxygen (O), Phosphorus (P), and Sulfur (S).

DNA is the molecular basis for life. It consists of two sugar-phosphate strands held together by base pairs. The base pairing is quite specific: Cytosine (C) and Guanine (G) lock together to form a pair that has the same width as the pair formed by Adenine (A) and Thymine (T). This allows DNA to form linear crystals (i.e., molecules whose physical structure repeats) while allowing arbitrary combinations of bases to be strung along a given strand. This property is at the heart of what allows DNA to convey genetic information.

RNA is another essential molecule. It is usually a single strand of sugars attached to RNA bases, which are the same as the bases in DNA except that Thymine (T) is replaced by Uracil (U). Some RNAs encode proteins. Others regulate transcription and translation. Still others may be infectious (e.g., retroviruses).

Proteins are constructed from strings of amino acids that fold up into particular shapes. These molecules are the principal actors in the cell. They are involved in everything from forming the scaffolding that keeps the cell in a stable shape, to fostering the chemical reactions that underlie cellular metabolism, to regulating the expression of genetic information.

II. DNA and RNA: Life is Meaningful

The meaning of DNA resides in what it encodes. Expressing that meaning begins with transcription: a strand of RNA, called messenger RNA (mRNA), is constructed using one of the two DNA strands as a template. The mRNA string is then translated into a string of amino acids by a ribosome. Triplets of RNA bases, called codons, translate to particular amino acids; the ribosome reads them using adaptor molecules (tRNAs), each of which recognizes particular codons and carries the corresponding amino acid. Once the amino acid string is completed, it folds up and is termed a protein. The protein’s function depends on its shape; its shape is determined by its amino-acid sequence.

For simpler organisms (prokaryotes), most genetic information that specifies biological form and phenotype (a phenotype is any observable characteristic, or trait, of an organism) is expressed as proteins. This has also been presumed to be true for more complex organisms (eukaryotes). Thus, the focus of genetic research has been on DNA sequences encoding proteins (protein-coding genes), and the vast amount of DNA sequence NOT encoding proteins has been regarded as simply “junk” DNA. However, recent research in genomics (the study of the genomes of organisms) has discovered that most of the mammalian genome is transcribed, including what has been known as “junk” DNA. The critical question then becomes: what biological role do the transcribed RNA molecules play if they are not involved in protein synthesis?

RNA molecules that do not encode proteins are known as non-coding RNAs (ncRNAs), and understanding their biological function has become an important and challenging frontier in molecular and evolutionary biology. A very small percentage of these non-coding RNAs are known and named, such as short interfering RNAs (siRNAs) and micro RNAs (miRNAs). These RNA molecules are involved in RNA interference, a process that regulates the expression of protein-coding genes.

In some cases, RNA can be reverse-transcribed into DNA, which is sometimes inserted into the genome. In this way retroviruses (RNA viruses) can leave traces of infection in the form of DNA that is inherited.
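For instructors who want to demonstrate transcription and translation concretely, the two steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original lecture: the function names and the 12-base template strand are hypothetical, the template is assumed to be written 3'->5' so that the mRNA is simply its base-by-base complement, and translation is assumed to start at the first base and stop at the first stop codon.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated with the
# bases in the order U, C, A, G (third base varying fastest), and each letter
# below is the one-letter amino-acid symbol for that codon ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# DNA base on the template strand -> complementary RNA base in the mRNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the mRNA (5'->3') complementary to a template strand given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon or the end of the string."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: the amino acid string is complete
            break
        protein.append(aa)
    return "".join(protein)


template = "TACGAACAAATT"  # hypothetical template strand, written 3'->5'
mrna = transcribe(template)
print(mrna)             # AUGCUUGUUUAA
print(translate(mrna))  # MLV (methionine-leucine-valine)
```

Changing a single base in the template and re-running the sketch is a quick way to show students the difference between a synonymous mutation and one that changes the amino acid.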
In some sense the meaning of a genome is the organism for which it provides the blueprint. However, any sense of ultimate meaning is a mirage, since a segment of DNA that does not appear to be useful in the context of one environment may gain usefulness in others. The notion of pre-adaptation will be explored more thoroughly in Lecture 2 and Seminar 2.

III. DNA’s Meaning is Improbable

Given an alphabet with N letters, the number of different strings of length L that one could compose is N^L. The exponential role of length implies that the number of combinations of letters grows very fast as the length of the string increases. For DNA, N = 4, so there are already 4^10 = 1,048,576 possible strings of just 10 bases. The probability of picking a particular string of length L at random from the pool of all possible strings of length L is the inverse of the number of combinations: 1/N^L. This is an exponentially decreasing function of the length of the string. So even a string of modest size is highly improbable to construct at random.

The meaning in DNA is encoded by strings of the bases A, T, G, C and is transcribed into RNA strings consisting of the bases A, U, G, C. A typical protein is a string of hundreds of amino acids, so the gene encoding it is usually thousands of bases long. The chances of such strings forming at random and being of use in constructing an organism are extraordinarily tiny.

The imaginary space that contains all possible DNA combinations is called DNA design space. For strings long enough to encode even a small organism such as a bacterium, the design space is already enormous.

IV. DNA is Historical: Semi-conservative replication

The probability of sharing long meaningful sequences of DNA with other organisms (including individuals within a species) by chance is exceedingly tiny. This suggests that organisms that share such stretches of DNA are related to one another.

The structure of DNA suggests the method by which it can be inherited. DNA is copied via semi-conservative replication. In this process the two strands unzip and a new strand is built to complement each old one. Thus one DNA molecule becomes two, each containing one old strand and one new strand.

Errors can occasionally creep into the duplication process. This is a source of novelty. The same aspects of the process that keep the DNA sequence stable ensure that these mutations are conserved as well.

Readings

Required:

1. Check, Erika. 2007. Hitting the on switch. Nature 448: 855-858. This article reports on the challenge two scientists faced in publishing their study showing that micro RNAs (miRNAs) not only dampen gene expression but can also potentially activate gene expression in cells.

2. Chang, K. 2008. A guiding glow to track what was once invisible. The New York Times, October 14, 2008. This short NY Times article highlights the work on the green fluorescent protein (GFP). The scientists behind this work won a Nobel Prize for making roundworms glow.

Optional (Note: both articles are very challenging):

1. Coleman, J. R., D. Papamichail, S. Skiena, B. Futcher, E. Wimmer, S. Mueller. 2008. Virus attenuation by genome-scale changes in codon pair bias. Science 320: 1784-1787. The authors describe an experiment in which the poliovirus genome can be modified such that it can be used as a vaccine to build immunity to polio. This article serves as the base for the seminar activity.

2. Chalfie, M., Y. Tu, G. Euskirchen, W.W. Ward, D.C. Prasher. 1994. Green fluorescent protein as a marker for gene expression. Science 263: 802-805. This original paper describes the utility and methodology of using the green fluorescent protein to monitor gene expression and protein distribution in cells.

For instructors (highly recommended):

Pearson, H. 2006. What is a gene? Nature 441: 398-401. This news feature article gives a great synopsis of what genes really are in a historical context.

Homework

1. Transcription, Translation, and Mutation
2. Non-coding DNA and RNA

Seminar Activity

This week’s activity will focus on the study by Coleman et al. 2008 (see optional readings for full article). In the article, the authors explain how they generated a harmless version of polio that still produces the same proteins as original polio, and thus could potentially be used as a vaccine. The key idea is that a given amino acid can be coded for by multiple “synonymous” codons. The same is true for pairs of amino acids. If the different synonymous codon pairs were all equally effective at producing the amino acid pairs, then one would expect a certain statistical distribution of these codon pairs in the virus’s genome. The authors note that in fact there is a deviation away from this distribution: certain codon pairs are over- or under-represented in the virus’s genome. This would indicate that natural selection has been at work: certain codon pairs provide advantages to the virus. The authors replace the codon pairs found in the wild-type virus with under-represented synonymous pairs to create a non-virulent form of polio.

Habits

Sense of Scale (Ch 1) will be emphasized in lecture with respect to DNA design space. Probability (Ch 4) will be emphasized in this week’s activity.

Proposed Seminar Outline

I. Review lecture materials with emphasis on key points.
II. Seminar activity will take at least an hour. Specific instructions and suggestions on how to run the activity are in the activity directions below. The activity has 3 parts; instructors who would like to shorten it can easily modify it.

EVOLUTION 1: DNA is Textual and Meaningful

Activity guide for instructors only

CODON PAIR BIAS AND THE CREATION OF SAFER VACCINES

This activity is based on the Coleman et al. 2008 paper on virus attenuation. Instructors might want to read through the paper itself (on CourseWorks/optional reading for students) for more background information. Since this is a 3-question activity in which each question builds on the previous one, we suggest having students work in groups, one question at a time. Specifically, we suggest passing out each question separately and reviewing each question before moving on to the next.

As scientists unraveled the sequence of base pairs (the text) of the human genome, they noted that some synonymous codons are used more frequently than others, i.e., there is a codon bias in the human genome. The bias extends, as expected, to codon pairs, so that some codons are used in tandem more frequently than their synonyms to code for adjacent amino acids in a protein. No one knows why codon bias or codon pair bias exists, but some geneticists have found utility in it through the manufacture of a new vaccine for polio. In this activity we will examine their work in light of probability and experimental design.

Question 1. Synonymous Codons and Codon Pairs

A) Refer to Table 1. On average, how many synonymous codons are there that code for an amino acid?

61 of the 64 codons code for amino acids. That means that the average number of synonymous codons for an amino acid will be 61/20 = 3.05, or about 3.1. If students use 64 instead of 61, that is okay too: 64/20 = 3.2. Students can count codons from the table or realize that there are 4^3 = 64 possible codons, 3 of which are stop codons.

B) List two synonymous codon pairs and the amino acids they encode.

There are many possibilities, but one is CUU-GUU and CUC-GUC, both of which code for leucine-valine.

C) On average, how many different synonymous codon pairs can code for the same pair of amino acids?

On average you can have 3.1 synonymous codons for each amino acid. So the average number of synonymous codon pairs is:

3.1 * 3.1 = 9.6 (or 3.2 * 3.2 = 10.2)

D) Given the result in C, what is the probability that an adjacent amino acid pair in a protein is coded for by one particular pair of codons? Assume that each amino acid pair appears in roughly equal numbers.

1/9.6 = 0.10 (or 1/10.2 = 0.098)
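Instructors who would like to check the Question 1 arithmetic directly against Table 1 can use a short script such as the one below. It is a sketch rather than part of the original activity: the dictionary is a transcription of Table 1, and the names `TABLE_1` and `synonymous_pairs` are hypothetical. Note that it works with the unrounded average (61/20 = 3.05), so its answers for parts C and D come out slightly different from the rounded figures above (about 9.3 pairs and a probability of about 0.11).

```python
# Table 1 transcribed as data: amino acid -> list of synonymous codons.
TABLE_1 = {
    "Phenylalanine": ["UUU", "UUC"],
    "Leucine": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "Isoleucine": ["AUU", "AUC", "AUA"],
    "Methionine": ["AUG"],
    "Serine": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "Proline": ["CCU", "CCC", "CCA", "CCG"],
    "Valine": ["GUU", "GUC", "GUA", "GUG"],
    "Threonine": ["ACU", "ACC", "ACA", "ACG"],
    "Alanine": ["GCU", "GCC", "GCA", "GCG"],
    "Tyrosine": ["UAU", "UAC"],
    "Histidine": ["CAU", "CAC"],
    "Glutamine": ["CAA", "CAG"],
    "Asparagine": ["AAU", "AAC"],
    "Lysine": ["AAA", "AAG"],
    "Aspartic acid": ["GAU", "GAC"],
    "Glutamic acid": ["GAA", "GAG"],
    "Cysteine": ["UGU", "UGC"],
    "Tryptophan": ["UGG"],
    "Arginine": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Glycine": ["GGU", "GGC", "GGA", "GGG"],
}
STOP_CODONS = ["UAA", "UAG", "UGA"]


def synonymous_pairs(first: str, second: str) -> list[str]:
    """All codon pairs that encode the amino acid pair first-second."""
    return [f"{a}-{b}" for a in TABLE_1[first] for b in TABLE_1[second]]


# Part A: average number of synonymous codons per amino acid.
sense_codons = sum(len(codons) for codons in TABLE_1.values())
avg_codons = sense_codons / len(TABLE_1)

# Part B: one concrete example, leucine followed by valine.
leu_val = synonymous_pairs("Leucine", "Valine")

# Part C: average number of synonymous codon pairs per amino acid pair.
avg_pairs = avg_codons ** 2

# Part D: chance that one particular codon pair is used, assuming all
# synonymous pairs are equally likely.
p_particular_pair = 1 / avg_pairs

print(sense_codons, len(STOP_CODONS))  # 61 3
print(f"{avg_codons:.2f}")             # 3.05
print(len(leu_val))                    # 24
print(f"{avg_pairs:.2f}")              # 9.30
print(f"{p_particular_pair:.3f}")      # 0.107
```

The leucine-valine example is also useful for discussion: with 24 synonymous pairs, it sits well above the average, which shows how unevenly the redundancy is spread across amino acids.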

Question 2. Using Synonymous Codons to Create a Polio Vaccine

A virus is an encapsulated sequence of DNA or RNA that causes disease by entering a host’s cells and replicating itself. A 2008 paper in the journal Science, “Virus Attenuation by Genome-Scale Changes in Codon Pair Bias” by Coleman et al., describes an experiment in which the genome of the poliovirus was modified to try to develop a safer polio vaccine. There is no cure for polio, a viral disease that causes paralysis, but infection can be prevented through vaccination. The polio vaccine currently used is developed by rendering the virus innocuous through multiple point mutations. Though it is rare, the vaccine virus can mutate and revert to virulence; that is to say, vaccination can result in paralysis.

In lecture, Prof Pollack stated, “DNA-based life is chemical and meaningful.” In their experiment, the scientists change the “chemistry” of the RNA in the virus by creating specific mutations in the virus’s genome to see if this changes the meaning of the virus, i.e., its harmfulness to a host.

Starting from the poliovirus genome typically found in nature (i.e., the wild-type strain), the scientists manipulated codon pair bias in a region of the genome to create two additional viral strains: one with codon pairs that were under-represented and another with codon pairs that were over-represented. They made use of the redundant genetic code to make only synonymous changes; that is to say, protein translation of the altered codon pairs will yield the same amino acid pair in all three viral strains. For example, let’s say that the majority of codon pairs for valine-leucine are GUU-CUU and a small percentage are GUC-CUC. In the strain with under-represented codon pairs, all codon pairs for valine-leucine would be changed to GUC-CUC; in the strain with over-represented codon pairs, they would all become GUU-CUU.

After growing the viral cultures and testing them, the scientists found that the strain with under-represented codon pairs was no longer harmful! But how? The codon pair changes were synonymous. To investigate further, the researchers measured protein translation and replication in the three viral strains. The strain with under-represented codon pairs showed a decreased rate of protein translation compared with the wild type and the strain with over-represented codon pairs.

Figure 1. Center for Disease Control's poster from 1963 to encourage immunization against polio.

A) Speculate on what the scientists would need to know in order to genetically manipulate a virus to affect its virulence.

The goal is to change the base pairs so that the product of gene expression no longer results in disease in the host. To create specific mutations, they would first need to identify the regions that affect virulence (i.e., virulence factors) and then would need to know the sequence of base pairs in those regions that, when expressed, cause virulence (disease in the host). They would also need to know how to induce very specific changes in those regions. In this experiment, the viral genome was synthesized de novo: the scientists did not alter the naturally existing poliovirus but instead built a virus that they had designed. Lastly, they would need to know the outcome of their mutations, i.e., test whether or not the mutated virus is virulent (e.g., test in tissue cultures or live animals).

B) What hypothesis did the scientists test?

They tested whether changing codon pairs to synonymous under-represented or over-represented codon pairs changed the harmfulness of the virus.

C) With respect to experimental design, why did the scientists create two viral strains?

They needed a control. If they had tested only one altered strain, the change in virulence could have been caused by the genetic manipulation itself rather than by the specific changes to codon pair bias.

D) Replacing the under-represented codon pairs in the wild-type strain with the over-represented codon pairs did not have any significant effect on protein translation. What would you have expected the effect to be? What might explain why this effect is not seen?

Given that under-represented codon pairs debilitate the virus, one would expect that adding over-represented codon pairs would make the virus better at producing its proteins. It could be that the wild type already has an optimal level of protein production. This suggests that the wild-type ratio of synonymous codon pairs is a result of natural selection.


E) Consider how a decreased rate of protein translation would affect viral replication rate. Sketch a graph, with appropriately labeled axes, to denote the expected replication of the three viral strains over time.

F) Discuss how the scientists’ findings on protein translation relate to the harmfulness of the virus.

As proteins are responsible for structure and function, a decreased rate of protein translation (production) would result in limited viral replication. Viruses cause disease through replication in the host’s cells; if the virus has limited ability to replicate, it has limited ability to cause disease.

Question 3. The Stability of the Polio Vaccine

Let’s look at the probability that a vaccine mutates and returns to a harmful state. While the polio vaccine currently in use has dozens of point mutations, only 5 of them are needed to render the virus harmless. Let’s suppose that the new vaccine made with changes to codon pair bias had 100 nucleotides (i.e., bases) of the genome changed, each with small, additive effects, i.e., all 100 changed bases need to mutate back to their original base for the virus to regain virulence. Assume that there is a 1% chance that each base will randomly mutate again (note: the true chance of mutation is usually < 1%!). Let us also assume that a mutating base has a 1/3 chance of changing to its original base (e.g., an A can change to G, C, or T), so the probability of one specific base mutating back to its original state is 0.01 * 1/3 = 3.33 x 10^-3.

A) With 5 base mutations, what is the probability per replication event that the current polio vaccine genome will revert back to its original form? Assuming a total of approximately 20 replication events after vaccination, what is the total probability of a reversion to its original form?

Probability per replication event: (3.33 x 10^-3)^5 = 4.12 x 10^-13
Total probability: 4.12 x 10^-13 x 20 replication events = 8.2 x 10^-12

B) What is the chance that the scientists’ new poliovirus vaccine will revert back to its original form?

(3.33 x 10^-3)^100 = (1.76 x 10^52) x (10^-300) = 1.76 x 10^-248, or roughly 10^-248

C) Which version of the poliovirus appears to provide a better vaccine? Use the orders of magnitude of your answers in making your argument.

Clearly, assuming that no other complications arise from the authors’ technique, the paper presents a better form of poliovirus. The chance that the conventional vaccine reverts is on the order of 10^-11, roughly one in a hundred billion. This is pretty small, but if the vaccine is administered widely enough, it is not impossible that some unfortunate person might end up being infected with a reversion to virulent polio. The authors’ technique produces a polio vaccine whose chance of reversion has an order of magnitude of 10^-248. This is such an incredibly tiny number that it is practically impossible for anybody to be infected with a reversion to virulent polio.

From the paper: “The correlation between the degree of codon pair deoptimization and the degree of viral attenuation, as well as the lack of viral reversion upon passaging, are consistent with the idea that many of the 631 mutations in PV-Min cause small, additive defects. Thus, these attenuated viruses should be stable genetically, because an increase in virulence might require dozens or hundreds of reversions. This genetic stability is important; for example, the oral poliovirus vaccine (OPV) has 51 mutations, but only 5 have been shown to contribute to attenuation (15). Thus, OPV can (rarely) revert to neurovirulence in vaccine recipients, causing vaccine-associated paralytic poliomyelitis.”

D) Do your calculations support the statement in lecture that “the design space for DNA sequences is too large for similarity by accident or choice”? Why or why not?

Yes. The probability that a sequence will by chance change to a specific sequence is extremely low, even if the sequence is only 5 bases long.
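The Question 3 numbers are easiest to compare on a logarithmic scale, and instructors may find a short script helpful for checking students' work. The sketch below is not part of the original activity; the function names are hypothetical, and it assumes, as the question does, that sites mutate independently. It uses the unrounded per-base probability 0.01/3, so part B comes out as about 1.9 x 10^-248 rather than 1.76 x 10^-248; the order of magnitude is unchanged. The last function applies the same idea to the size of DNA design space, N^L, from the lecture.

```python
import math

P_MUTATE = 0.01   # chance that a given base mutates (value assumed in the activity)
P_BACK = 1 / 3    # chance that the mutation restores the original base
P_SITE = P_MUTATE * P_BACK   # about 3.33e-3


def log10_reversion(n_sites: int) -> float:
    """log10 of the chance that all n_sites revert independently in one replication."""
    return n_sites * math.log10(P_SITE)


def over_events(p_event: float, n_events: int) -> float:
    """Chance of at least one reversion across n_events independent replications."""
    # 1 - (1 - p)^n, written with log1p/expm1 so tiny probabilities keep precision.
    return -math.expm1(n_events * math.log1p(-p_event))


def log10_design_space(n_letters: int, length: int) -> float:
    """log10 of the number of strings of a given length, N^L."""
    return length * math.log10(n_letters)


# Part A: current vaccine, 5 attenuating sites, about 20 replication events.
p_current = 10 ** log10_reversion(5)
total_current = over_events(p_current, 20)

# Part B: hypothetical new vaccine with 100 sites that must all revert.
log10_new = log10_reversion(100)

print(f"{p_current:.2e}")       # about 4.12e-13
print(f"{total_current:.2e}")   # about 8.23e-12
print(f"10^{log10_new:.1f}")    # about 10^-247.7, i.e. on the order of 10^-248
print(f"10^{log10_design_space(4, 1000):.0f}")  # about 10^602 strings of 1000 bases
```

For probabilities this small, multiplying the per-event probability by 20 (as in the answer to part A) and computing 1 - (1 - p)^20 give the same result to many decimal places, which is itself a useful point to raise with students.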


Table 1. Correspondence between amino acids and codons. Methionine and Tryptophan are the only amino acids that uniquely correspond to one codon. The other amino acids can all be coded by several “synonymous” codons.

Amino acid      Codons                          Amino acid                Codons

Phenylalanine   UUU, UUC                        Isoleucine                AUU, AUC, AUA
Leucine         UUA, UUG, CUU, CUC, CUA, CUG    Methionine (start codon)  AUG
Serine          UCU, UCC, UCA, UCG, AGU, AGC    Proline                   CCU, CCC, CCA, CCG
Valine          GUU, GUC, GUA, GUG              Threonine                 ACU, ACC, ACA, ACG
Alanine         GCU, GCC, GCA, GCG              Tyrosine                  UAU, UAC
Histidine       CAU, CAC                        Glutamine                 CAA, CAG
Asparagine      AAU, AAC                        Lysine                    AAA, AAG
Aspartic acid   GAU, GAC                        Glutamic acid             GAA, GAG
Cysteine        UGU, UGC                        Tryptophan                UGG
Arginine        CGU, CGC, CGA, CGG, AGA, AGG    Glycine                   GGU, GGC, GGA, GGG

Stop codons     UAA, UAG, UGA