Evolution of Proteins and Genomes: select subset of slides

Posted on 31-Dec-2015



TRANSCRIPT

Biochemistry and Molecular Genetics
Computational Bioscience Program

Consortium for Comparative Genomics
University of Colorado School of Medicine

David.Pollock@uchsc.edu
www.EvolutionaryGenomics.com

Evolution of Proteins and Genomes

select subset of slides

Evolution of Proteins

Jason de Koning

Description: Focus on protein structure, sequence, and functional evolution.

Subjects: structural comparison and prediction, biochemical adaptation, evolution of protein complexes, probabilistic methods for detecting patterns of sequence evolution, effects of population structure on protein evolution, lattice and other computational models of protein evolution, protein folding and energetics, mutagenesis experiments, directed evolution, coevolutionary interactions within and between proteins, and detection of adaptation, diversifying selection, and functional divergence.

Reconstruction of Ancestral Function

Comparative Sequence Analysis: Looking at sets of sequences

Mouse:   …TLSPGLKIVSNPL…
Rat:     …TLTPGLKLVSDTL…
Baboon:  …TVSPGLRIVSDGV…
Chimp:   …TISPGLVIVSENL…

Annotations: a conserved proline (the fourth residue shown, P in every species) and variable, “high entropy” positions.
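To make the “conserved” versus “variable (high entropy)” labels concrete, the sketch below scores each column of the four-species fragment by its Shannon entropy in bits. It assumes the fragments are already aligned with no gaps and drops the surrounding ellipses; the name `column_entropies` is hypothetical. It also treats each sequence as an independent sample, which is the simplifying assumption the slides go on to question.

```python
import math
from collections import Counter

# Four-species fragment from the slide, ellipses removed; assumed aligned and gap-free.
ALIGNMENT = {
    "Mouse": "TLSPGLKIVSNPL",
    "Rat": "TLTPGLKLVSDTL",
    "Baboon": "TVSPGLRIVSDGV",
    "Chimp": "TISPGLVIVSENL",
}


def column_entropies(seqs):
    """Return the Shannon entropy (bits) of each alignment column."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    n = len(seqs)
    entropies = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        # H = sum of p * log2(1/p); exactly 0 for a fully conserved column
        entropies.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return entropies


if __name__ == "__main__":
    h = column_entropies(list(ALIGNMENT.values()))
    for pos, value in enumerate(h, start=1):
        label = "conserved" if value == 0 else "variable"
        print(f"position {pos:2d}: H = {value:.2f} bits ({label})")
```

On this fragment the proline column (position 4) scores 0 bits, while the column holding P, T, G, and N (position 12) reaches the four-sequence maximum of 2 bits.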

A common but wrong assumption: sequences are a random sample from the set of all possible sequences.

In reality, proteins are related through an evolutionary process (shared ancestry), so they are not independent samples.

Comparative Sequence Analysis: Looking at sets of sequences

[Figure: selective pressure (stability, folding, function; labels A, B, C) acts through selection on fitness, and the Mouse, Rat, Baboon, and Chimp sequences are stochastic realizations of that process.]

[Figure: a model of the selective pressure (stability, folding, function) is compared with the sequence data (Mouse, Rat, Baboon, Chimp) to reach understanding.]

Mutations result in genetic variation

Original sequence: …UGUACAAAG…

Genetic changes:
Substitution: …UGUAUAAAG…
Insertion: …UGUUACAAAG…
Deletion: …UGUAAAAG…
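For single-base events, the three kinds of change can be told apart mechanically. The helper below is an illustrative sketch (the name `classify_change` is hypothetical, not from the slides), and it assumes the two sequences differ by at most one substitution, insertion, or deletion; anything else is reported as "complex".

```python
def _one_deletion_apart(longer: str, shorter: str) -> bool:
    """True if deleting exactly one base from `longer` yields `shorter`."""
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def classify_change(original: str, mutant: str) -> str:
    """Label a single-base change between two sequences (assumes at most one event)."""
    if len(mutant) == len(original):
        diffs = sum(a != b for a, b in zip(original, mutant))
        if diffs == 0:
            return "identical"
        return "substitution" if diffs == 1 else "complex"
    if len(mutant) == len(original) + 1 and _one_deletion_apart(mutant, original):
        return "insertion"
    if len(mutant) == len(original) - 1 and _one_deletion_apart(original, mutant):
        return "deletion"
    return "complex"


if __name__ == "__main__":
    original = "UGUACAAAG"
    for mutant in ("UGUAUAAAG", "UGUUACAAAG", "UGUAAAAG"):
        print(mutant, classify_change(original, mutant))
```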

Substitutions can be:

Transitions: between two purines (A ↔ G) or between two pyrimidines (C ↔ T)

Transversions: between a purine (A, G) and a pyrimidine (C, T)
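A minimal sketch of the transition/transversion rule, assuming single-nucleotide substitutions only. The function name `substitution_type` is hypothetical, and RNA U is treated as T so that either alphabet can be used.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a: str, b: str) -> str:
    """Classify a single-nucleotide substitution as a transition or a transversion."""
    # Treat RNA uracil as thymine so DNA and RNA bases can be compared.
    a, b = a.upper().replace("U", "T"), b.upper().replace("U", "T")
    if a == b or not {a, b} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a substitution between two bases: {a}->{b}")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for a, b in (("A", "G"), ("C", "T"), ("A", "C"), ("G", "U")):
        print(f"{a}->{b}: {substitution_type(a, b)}")
```

Of the 12 possible ordered substitutions among A, C, G, and T, this rule labels four as transitions and eight as transversions.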

Substitutions in coding regions can be:

Original: UGU/AGA/AAG (Cys Arg Lys)

Silent: UGU/CGA/AAG (Cys Arg Lys)

Nonsense: UGU/UGA/AAG (Cys STOP Lys)

Missense: UGU/GGA/AAG (Cys Gly Lys)
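The three examples can be reproduced by translating each codon with the standard genetic code and comparing the resulting peptides. This is a sketch under stated assumptions: codons are written as RNA and separated by "/", the table is the standard nuclear code, and the names `translate` and `classify_coding_change` are hypothetical. Cases not covered here, such as a stop codon being lost, simply fall through to "missense" in this simplified version.

```python
BASES = "UCAG"
# Standard genetic code: one-letter amino acids ("*" = stop) for codons UUU, UUC, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(codons: str) -> str:
    """Translate slash-separated codons, e.g. 'UGU/AGA/AAG' -> 'CRK'."""
    return "".join(CODE[c] for c in codons.split("/"))


def classify_coding_change(original: str, mutant: str) -> str:
    """Silent, nonsense, or missense, judged from the translated peptides."""
    before, after = translate(original), translate(mutant)
    if before == after:
        return "silent"
    if after.count("*") > before.count("*"):
        return "nonsense"  # a new stop codon appears
    return "missense"


if __name__ == "__main__":
    original = "UGU/AGA/AAG"
    for mutant in ("UGU/CGA/AAG", "UGU/UGA/AAG", "UGU/GGA/AAG"):
        print(mutant, translate(mutant), classify_coding_change(original, mutant))
```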

First position: 4% of all changes are silent

Second position: no changes are silent

Third position: 70% of all changes are silent (the wobble position)
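The per-position percentages can be checked by brute force: mutate every base of every sense codon to each of the three alternatives and ask whether the amino acid is unchanged. The sketch assumes the standard genetic code and counts changes that create a stop codon as non-silent (they stay in the denominator); `silent_fraction_by_position` is a hypothetical name.

```python
BASES = "UCAG"
# Standard genetic code: one-letter amino acids ("*" = stop) for codons UUU, UUC, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def silent_fraction_by_position():
    """For codon positions 1-3, the fraction of single-base changes in sense codons that are silent."""
    sense_codons = [c for c, aa in CODE.items() if aa != "*"]
    fractions = []
    for pos in range(3):
        silent = total = 0
        for codon in sense_codons:
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1  # changes to a stop codon count in the denominator
                if CODE[mutant] == CODE[codon]:
                    silent += 1
        fractions.append(silent / total)
    return fractions


if __name__ == "__main__":
    for pos, frac in enumerate(silent_fraction_by_position(), start=1):
        print(f"position {pos}: {100 * frac:.1f}% silent")
```

Under these assumptions it prints about 4.4%, 0.0%, and 68.9% for positions 1, 2, and 3 (8, 0, and 126 silent changes out of 183 each), in line with the rounded figures above.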

Unequal crossover, leading to gene deletion and duplication

Homologous crossover

Gene conversion

Fate of a duplicated gene

Keep on doing whatever it originally was doing

Lose the ability to do anything (become a pseudogene)

Learn to do something new (neofunctionalization)

Split old functions among new genes (subfunctionalization)

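Homologies

[Figure: an ancestral hemoglobin gene undergoes gene duplication; speciation then gives a Mouse Hb and a Rat Hb copy of each duplicate.]

Orthologs: homologs separated by a speciation event (e.g. Mouse Hb and Rat Hb descended from the same duplicate).

Paralogs: homologs separated by a gene duplication event (e.g. the two hemoglobin copies within the mouse).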
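The ortholog/paralog distinction depends only on the kind of event at the two genes' most recent common ancestor in the gene tree. A minimal sketch, assuming a hand-built gene tree that mirrors the hemoglobin figure; the node names and the "copy 1"/"copy 2" labels are hypothetical, since the slide labels both copies simply "Hb".

```python
# Hypothetical gene tree mirroring the figure: one duplication, then speciation in each copy.
PARENT = {
    "Mouse Hb (copy 1)": "speciation-1",
    "Rat Hb (copy 1)": "speciation-1",
    "Mouse Hb (copy 2)": "speciation-2",
    "Rat Hb (copy 2)": "speciation-2",
    "speciation-1": "duplication",
    "speciation-2": "duplication",
}
EVENT = {"speciation-1": "speciation", "speciation-2": "speciation", "duplication": "duplication"}


def lineage(node: str) -> list:
    """The node followed by all of its ancestors, up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def relationship(gene_a: str, gene_b: str) -> str:
    """For two distinct genes: 'orthologs' if their last common ancestor is a speciation, else 'paralogs'."""
    ancestors_a = set(lineage(gene_a))
    for node in lineage(gene_b):
        if node in ancestors_a:  # first shared node = last common ancestor
            return "orthologs" if EVENT[node] == "speciation" else "paralogs"
    raise ValueError("genes are not in the same tree")


if __name__ == "__main__":
    print(relationship("Mouse Hb (copy 1)", "Rat Hb (copy 1)"))
    print(relationship("Mouse Hb (copy 1)", "Mouse Hb (copy 2)"))
```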

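Probability of fixation of a new mutation with selective advantage s in a population of N diploid individuals:

$$u(s) = \frac{1 - e^{-2s}}{1 - e^{-4Ns}}$$

$$u \approx \frac{1}{2N} \ \text{when } |s| \ll \frac{1}{2N}, \qquad u \approx 2s \ \text{when } s > 0 \text{ is small and } N \text{ is large } (Ns \gg 1)$$

[Figure: fixation probability (log scale, 10^-14 to 1) versus selective advantage s (-0.01 to 0.02) for N = 10, 100, 1000, and 10,000]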
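A sketch of Kimura's fixation-probability formula, assuming a diploid population whose effective size equals N and a new mutant starting at frequency 1/(2N). With 2N gene copies the exponent in the denominator is 4Ns, which is what makes the neutral limit come out at 1/(2N) and the strongly beneficial limit at about 2s. `fixation_probability` is a hypothetical name; `math.expm1` keeps the formula accurate when s or Ns is tiny, and a separate branch avoids overflow for strongly deleterious mutations in large populations.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for a single new mutant.

    Assumes a diploid population of size n (effective size = n), so the
    mutant starts at frequency 1/(2n):
        u(s) = (1 - exp(-2s)) / (1 - exp(-4ns))
    """
    if s == 0.0:
        return 1.0 / (2 * n)  # neutral limit
    x = -4.0 * n * s
    if x > 700.0:
        # Strongly deleterious: exp(x) would overflow, and 1/(e^x - 1) ~ e^(-x) here.
        return math.expm1(-2.0 * s) * math.exp(-x)
    # expm1(y) = e^y - 1, so this ratio equals (1 - e^(-2s)) / (1 - e^(-4ns)).
    return math.expm1(-2.0 * s) / math.expm1(x)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000):
        row = [fixation_probability(s, n) for s in (-0.01, 0.0, 0.01)]
        print(n, ["%.3g" % u for u in row])
```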

The Rate of Evolution Depends on Constraints

Human vs. Rodent Comparison

Highest substitution rates: pseudogenes; introns; 3’ flanking regions (not transcribed to mature mRNA); 4-fold degenerate sites

Intermediate substitution rates: 5’ flanking regions (contain the promoter); 3’ and 5’ untranslated regions (transcribed to mRNA); 2-fold degenerate sites

Lowest substitution rates: nondegenerate sites
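The degeneracy classes can be computed directly from the genetic code: a site is k-fold degenerate when k of the four possible bases at that position encode the same amino acid. A sketch assuming the standard genetic code, with changes to a stop codon counted as non-synonymous; `degeneracy` and `LABELS` are hypothetical names.

```python
BASES = "UCAG"
# Standard genetic code: one-letter amino acids ("*" = stop) for codons UUU, UUC, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
LABELS = {1: "nondegenerate", 2: "2-fold", 3: "3-fold", 4: "4-fold"}


def degeneracy(codon: str) -> list:
    """Fold-degeneracy of each position: how many of the 4 bases keep the same amino acid."""
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("stop codons have no amino-acid degeneracy")
    return [
        sum(CODE[codon[:pos] + b + codon[pos + 1:]] == aa for b in BASES)
        for pos in range(3)
    ]


if __name__ == "__main__":
    for codon in ("CUG", "AUA", "UGG"):
        print(codon, CODE[codon], [LABELS[k] for k in degeneracy(codon)])
```

For CUG (Leu) it returns [2, 1, 4]: the first position is 2-fold (UUG is also Leu), the second is nondegenerate, and the third is 4-fold. Isoleucine's AUA shows the less common 3-fold case at its third position.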

Selection of Species for DNA Comparisons (human versus each species)

Species      Time since divergence  Size (Gbp)  Conservation (coding regions)  Aids identification of…
Pufferfish   ~450 MYA               0.4         ~65%                           Primarily coding sequences
Opossum      ~150 MYA               4.2         ~70-75%                        Both coding and non-coding sequences
Mouse        ~65 MYA                2.5         ~80%                           Both coding and non-coding sequences
Chimpanzee   ~5 MYA                 3.0         >99%                           Recently changed sequences and genomic rearrangements


UCSC Genome Browser

Comparative analysis of multi-species sequences from targeted genomic regions


Nature, 2003

Looking backward from the human genome: how much is still there after 450 million years (Fugu)?


Transposable Elements Gone Wild!

Using 12 species, 561 multi-species conserved sequences (MCSs) were found.

How many can be found using just the mouse genome (rather than all 12)?

Identifying Functionally Important Regions

How many comparative genomes do we need? Can’t we just use the mouse?

[Figure labels: false positives, false negatives, true positives]
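The false positive / false negative / true positive comparison can be sketched as interval overlap: a reference element (for example, one of the 12-species MCSs) counts as found if any prediction from a mouse-only comparison overlaps it, and a prediction that overlaps no reference element is a false positive. The coordinates below are invented for illustration, the overlap criterion (any shared base) is an assumption, and `evaluate` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open genomic interval [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def evaluate(predicted: List[Interval], reference: List[Interval]) -> Dict[str, float]:
    """Score predicted conserved elements against a reference set of elements."""
    tp = sum(any(overlaps(r, p) for p in predicted) for r in reference)
    fn = len(reference) - tp
    fp = sum(not any(overlaps(p, r) for r in reference) for p in predicted)
    return {
        "TP": tp,
        "FN": fn,
        "FP": fp,
        "sensitivity": tp / len(reference) if reference else 0.0,
        "precision": (len(predicted) - fp) / len(predicted) if predicted else 0.0,
    }


if __name__ == "__main__":
    reference_mcs = [(100, 150), (400, 460), (900, 940)]  # hypothetical multi-species MCSs
    mouse_only = [(110, 140), (600, 650), (905, 930)]  # hypothetical human-mouse calls
    print(evaluate(mouse_only, reference_mcs))
```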

Interpreting Evolutionary Changes Requires a Model

…IGTLS…

…IGRLS…

In evolution, what is the rate R(T→R) at which Ts (threonines) become Rs (arginines)? For example, 0.00005 per million years, one entry of a 20 × 20 substitution matrix.
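One way to see why a model is needed: under the simplest equal-rates ("Poisson") amino-acid model, every off-diagonal entry of the 20 × 20 rate matrix equals the same rate r, and the probability that a residue has become a specific other residue after time t is (1 − e^(−20rt))/20. For small t this is close to r·t, but it saturates at 1/20 because of multiple substitutions at the same site. The sketch assumes the 0.00005 per million years is a per-site instantaneous rate; real substitution matrices have different entries for different amino-acid pairs, so this is a deliberate simplification, and the function names are hypothetical.

```python
import math


def p_change(rate: float, t: float, n_states: int = 20) -> float:
    """P(residue i has become a specific residue j != i after time t), equal-rates model."""
    return (1.0 - math.exp(-n_states * rate * t)) / n_states


def p_same(rate: float, t: float, n_states: int = 20) -> float:
    """P(residue i is still i after time t); each row of the matrix sums to 1."""
    return 1.0 / n_states + (n_states - 1) / n_states * math.exp(-n_states * rate * t)


if __name__ == "__main__":
    r = 0.00005  # T -> R rate per million years, taken from the slide's example
    for t in (1, 100, 1000, 10_000):
        print(f"t = {t:>6} my: P(T->R) = {p_change(r, t):.5f}, naive r*t = {r * t:.5f}")
```

At t = 10,000 million years the naive product r·t would be 0.5, while the model probability levels off just under 1/20.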
