distances


Upload: lacy-jennings

Post on 30-Dec-2015



Page 1: Distances

Distances

Page 2: Distances

A natural or ideal measure of distance between two sequences should have an evolutionary meaning.

One such measure may be the number of nucleotide substitutions that have accumulated in the two sequences since they have diverged from each other.

Page 3: Distances
Page 4: Distances
Page 5: Distances

To derive a measure of distance, we need to make several simplifying assumptions regarding the probability of substitution of a nucleotide by another.

Page 6: Distances
Page 7: Distances

Jukes & Cantor one-parameter model

Page 8: Distances

Assumption:

• Substitutions occur with equal probabilities among the four nucleotide types.

Page 9: Distances

Kimura’s two-parameter model

Page 10: Distances

Assumptions:

• The rate of transitional substitution at each nucleotide site is α per unit time.

• The rate of each type of transversional substitution is β per unit time.

Page 11: Distances
Page 12: Distances

NUMBER OF NUCLEOTIDE SUBSTITUTIONS BETWEEN TWO DNA SEQUENCES

Page 13: Distances

After two nucleotide sequences diverge from each other, each of them will start accumulating nucleotide substitutions.

If two sequences of length N differ from each other at n sites, then the proportion of differences, n/N, is referred to as the degree of divergence or Hamming distance.

Degrees of divergence are usually expressed as percentages (n/N × 100%).
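As a minimal sketch of the definition above, the degree of divergence can be computed by counting mismatched positions in an ungapped alignment. The function name `p_distance` and the two example sequences are hypothetical and chosen only for illustration.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of sites (n/N) at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    # n = number of sites at which the two sequences carry different symbols
    n = sum(a != b for a, b in zip(seq1.upper(), seq2.upper()))
    return n / len(seq1)


# Two hypothetical aligned sequences that differ at 2 of 10 sites.
print(f"{p_distance('ATCGGACCCG', 'ATCGGTCCCA') * 100:.0f}%")  # prints 20%
```

This sketch assumes the sequences are already aligned without gaps; gapped columns would need to be dropped before counting.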

Page 14: Distances
Page 15: Distances

The observed number of differences is likely to be smaller than the actual number of substitutions due to multiple hits at the same site.

Page 16: Distances

13 mutations = 3 differences

Page 17: Distances
Page 18: Distances

Number of substitutions between two noncoding sequences

Page 19: Distances

The one-parameter model

In this model, it is sufficient to consider only I(t), which is the probability that the nucleotide at a given site at time t is the same in both sequences.
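The role of I(t) can be made concrete with a short derivation. The sketch below assumes the usual parameterization of the one-parameter model, in which each nucleotide changes to each of the other three at rate α per unit time; the symbol α and the intermediate steps are not in the transcript and are supplied here as assumptions. K denotes the expected number of substitutions per site between the two sequences.

```latex
% Probability that a site is identical in two sequences that diverged t time units ago
I(t) = \frac{1}{4} + \frac{3}{4}\, e^{-8\alpha t}

% Expected proportion of differing sites
p = 1 - I(t) = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)

% Each lineage accumulates 3\alpha t substitutions per site, so K = 2 \cdot 3\alpha t = 6\alpha t,
% hence 8\alpha t = \tfrac{4}{3}K and
p = \frac{3}{4}\left(1 - e^{-4K/3}\right)
```

Because p approaches 3/4 as K grows, the observed proportion of differences saturates, which is why it underestimates the true number of substitutions for distantly related sequences.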

Page 20: Distances

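The number of substitutions per site, K, is estimated as

$$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$$

where p is the observed proportion of different nucleotides between the two sequences.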

Page 21: Distances

$$V(K) \approx \frac{p - p^2}{L\left(1 - \frac{4}{3}p\right)^2}$$

L = number of sites compared in the ungapped alignment between the two sequences.
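The one-parameter correction and its sampling variance can be sketched in a few lines. The estimator used here is the standard Jukes-Cantor formula K = -(3/4) ln(1 - 4p/3); the function name `jc69_distance` and the example values p = 0.2 and L = 1000 are hypothetical.

```python
import math


def jc69_distance(p: float, L: int) -> tuple[float, float]:
    """Return (K, V(K)) under the one-parameter model.

    p: observed proportion of differing sites
    L: number of sites compared in the ungapped alignment
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    if L <= 0:
        raise ValueError("L must be positive")
    w = 1 - 4 * p / 3
    K = -0.75 * math.log(w)            # corrected number of substitutions per site
    var_K = (p - p * p) / (L * w * w)  # large-sample variance of K
    return K, var_K


K, var_K = jc69_distance(p=0.2, L=1000)
print(f"K = {K:.4f}, standard error = {math.sqrt(var_K):.4f}")
```

The corrected distance always exceeds p for p > 0, reflecting the multiple hits that the raw count misses.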

Page 22: Distances

The two-parameter model

Page 23: Distances

The differences between two sequences are classified into transitions and transversions.

P = proportion of transitional differences

Q = proportion of transversional differences

ATCGGACCCG

P = 0.2, Q = 0.2
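One way to obtain P and Q from an alignment is to classify each mismatch by whether it stays within the purines (A, G) or within the pyrimidines (C, T). In the sketch below, the function name and the second sequence are hypothetical; the second sequence was constructed to differ from the slide's sequence by two transitions and two transversions.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def transition_transversion_proportions(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (P, Q): proportions of transitional and transversional differences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            raise ValueError(f"unexpected symbol in pair {a}/{b}")
        if a == b:
            continue
        # A transition stays within purines (A<->G) or within pyrimidines (C<->T);
        # every other mismatch is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    L = len(seq1)
    return transitions / L, transversions / L


# The second sequence is a made-up partner for the sequence shown on the slide.
P, Q = transition_transversion_proportions("ATCGGACCCG", "ATTGGGCCAC")
print(P, Q)  # prints 0.2 0.2
```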

Page 24: Distances
Page 25: Distances

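$$V(K) \approx \frac{1}{L}\left[ P\left(\frac{1}{1-2P-Q}\right)^2 + Q\left(\frac{1}{2-4P-2Q} + \frac{1}{2-4Q}\right)^2 - \left(\frac{P}{1-2P-Q} + \frac{Q}{2-4P-2Q} + \frac{Q}{2-4Q}\right)^2 \right]$$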
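The two-parameter calculation can be sketched as follows. The distance formula K = (1/2) ln[1/(1 - 2P - Q)] + (1/4) ln[1/(1 - 2Q)] is the standard two-parameter estimate; it does not survive in the transcript and is supplied here as an assumption. The function name `k2p_distance` and the inputs are hypothetical.

```python
import math


def k2p_distance(P: float, Q: float, L: int) -> tuple[float, float]:
    """Return (K, V(K)) under the two-parameter model.

    P: proportion of transitional differences
    Q: proportion of transversional differences
    L: number of sites compared
    """
    w1 = 1 - 2 * P - Q
    w2 = 1 - 2 * Q
    if w1 <= 0 or w2 <= 0 or L <= 0:
        raise ValueError("distance is undefined for these values of P, Q, and L")
    K = 0.5 * math.log(1 / w1) + 0.25 * math.log(1 / w2)
    a = 1 / w1                   # coefficient of P in the variance
    b = 0.5 * (1 / w1 + 1 / w2)  # equals 1/(2-4P-2Q) + 1/(2-4Q)
    var_K = (a * a * P + b * b * Q - (a * P + b * Q) ** 2) / L
    return K, var_K


K, var_K = k2p_distance(P=0.2, Q=0.2, L=1000)
print(f"K = {K:.4f}, standard error = {math.sqrt(var_K):.4f}")
```

A useful sanity check is that when transitions make up one third of all differences (P = p/3, Q = 2p/3), the estimate coincides with the one-parameter distance.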

Page 26: Distances
Page 27: Distances

Numerical example (2P-model)

Page 28: Distances

- Substitution schemes with more than two parameters.

- Parameter-free substitution schemes.

Page 29: Distances

Number of substitutions between two protein-coding genes

Page 30: Distances
Page 31: Distances

$$K_S = \frac{\text{number of synonymous substitutions}}{\text{number of synonymous sites}}$$

$$K_A = \frac{\text{number of nonsynonymous substitutions}}{\text{number of nonsynonymous sites}}$$

Page 32: Distances

Difficulties with denominator:

1. The classification of a site changes with time. For example, the third position of CGG (Arg) is synonymous. However, if the first position changes to T, then the third position of the resulting codon, TGG (Trp), becomes nonsynonymous.

Page 33: Distances

[Figure: codon TGG (Trp), in which the third position is nonsynonymous.]

Page 34: Distances

Difficulties with denominator:

2. Many sites are neither completely synonymous nor completely nonsynonymous. For example, a transition in the third position of GAT (Asp) will be synonymous, while a transversion to either GAG or GAA will alter the amino acid.

Page 35: Distances
Page 36: Distances

Difficulties with numerator:

1. The classification of the change depends on the order in which the substitutions occurred.

Page 37: Distances
Page 38: Distances

Difficulties with numerator:

2. Transitions occur with different frequencies than transversions.

3. The type of substitution depends on the mutation. Transitions result in synonymous substitutions more frequently than transversions do.

Page 39: Distances
Page 40: Distances

Miyata & Yasunaga (1980) and Nei & Gojobori (1986) method

Page 41: Distances

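The standard genetic code (first position in the left column, second position across the top, third position in the right column):

First   U          C          A          G          Third
U       UUU Phe    UCU Ser    UAU Tyr    UGU Cys    U
U       UUC Phe    UCC Ser    UAC Tyr    UGC Cys    C
U       UUA Leu    UCA Ser    UAA Stop   UGA Stop   A
U       UUG Leu    UCG Ser    UAG Stop   UGG Trp    G
C       CUU Leu    CCU Pro    CAU His    CGU Arg    U
C       CUC Leu    CCC Pro    CAC His    CGC Arg    C
C       CUA Leu    CCA Pro    CAA Gln    CGA Arg    A
C       CUG Leu    CCG Pro    CAG Gln    CGG Arg    G
A       AUU Ile    ACU Thr    AAU Asn    AGU Ser    U
A       AUC Ile    ACC Thr    AAC Asn    AGC Ser    C
A       AUA Ile    ACA Thr    AAA Lys    AGA Arg    A
A       AUG Met    ACG Thr    AAG Lys    AGG Arg    G
G       GUU Val    GCU Ala    GAU Asp    GGU Gly    U
G       GUC Val    GCC Ala    GAC Asp    GGC Gly    C
G       GUA Val    GCA Ala    GAA Glu    GGA Gly    A
G       GUG Val    GCG Ala    GAG Glu    GGG Gly    G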

Step 1: Classify nucleotide sites as nondegenerate, twofold degenerate, or fourfold degenerate. The numbers of sites in the three classes are denoted L0, L2, and L4, respectively.
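The classification in Step 1 can be sketched by asking, for each codon position, how many of the three alternative nucleotides leave the encoded amino acid unchanged. The code below uses the standard genetic code written in DNA letters; the function names are hypothetical. One assumption is labeled explicitly: the third position of the isoleucine codons ATT, ATC, and ATA is threefold degenerate, and this sketch counts it as twofold, a common simplification that the transcript does not confirm.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def site_degeneracy(codon: str, pos: int) -> int:
    """Return 0, 2, or 4 for the degeneracy class of position pos (0-2) in codon."""
    aa = CODON_TABLE[codon]
    synonymous = sum(
        CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == aa
        for base in BASES
        if base != codon[pos]
    )
    # 0 synonymous changes -> nondegenerate, 1 -> twofold, 3 -> fourfold.
    # Assumption: 2 synonymous changes (threefold, Ile only) are counted as twofold.
    return {0: 0, 1: 2, 2: 2, 3: 4}[synonymous]


def count_site_classes(coding_seq: str) -> dict[int, int]:
    """Count L0, L2, and L4 over the complete codons of an in-frame coding sequence."""
    seq = coding_seq.upper().replace("U", "T")
    counts = {0: 0, 2: 0, 4: 0}
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        for pos in range(3):
            counts[site_degeneracy(codon, pos)] += 1
    return counts


print(site_degeneracy("CGG", 2), site_degeneracy("TGG", 2))  # prints 4 0
print(count_site_classes("GATCGG"))  # prints {0: 3, 2: 2, 4: 1}
```

The first printed line reproduces the example from the earlier slide: the third position of CGG is fourfold degenerate, while the third position of TGG is nondegenerate.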

Page 42: Distances

$$K_S = \frac{L_2 A_2 + L_4 A_4}{L_2 + L_4} + B_4$$

$$V(K_S) \approx \frac{L_2^2\, V(A_2) + L_4^2\, V(A_4)}{(L_2 + L_4)^2} + V(B_4) - \frac{2 b_4 Q_4 \left[a_4 P_4 - c_4 (1 - Q_4)\right]}{L_2 + L_4}$$

Page 43: Distances

$$K_A = A_0 + \frac{L_0 B_0 + L_2 B_2}{L_0 + L_2}$$

$$V(K_A) \approx V(A_0) + \frac{L_0^2\, V(B_0) + L_2^2\, V(B_2)}{(L_0 + L_2)^2} - \frac{2 b_0 Q_0 \left[a_0 P_0 - c_0 (1 - Q_0)\right]}{L_0 + L_2}$$
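The point estimates of K_S and K_A are weighted averages over the site classes and can be sketched directly. This sketch assumes that A_i and B_i denote the estimated numbers of transitional and transversional substitutions per site in degeneracy class i, which is the usual reading of these symbols but is not defined in the transcript. The function name and all input values are hypothetical.

```python
def ks_ka(L0: float, L2: float, L4: float,
          A0: float, B0: float,
          A2: float, B2: float,
          A4: float, B4: float) -> tuple[float, float]:
    """Return (K_S, K_A) from per-class site counts and distances.

    Assumption: A_i / B_i are transitional / transversional substitutions
    per site in degeneracy class i (i = 0, 2, 4).
    """
    if L2 + L4 <= 0 or L0 + L2 <= 0:
        raise ValueError("site counts must be positive")
    # Synonymous: transitions at twofold and fourfold sites, plus transversions at fourfold sites
    K_S = (L2 * A2 + L4 * A4) / (L2 + L4) + B4
    # Nonsynonymous: transitions at nondegenerate sites, plus transversions
    # at nondegenerate and twofold sites
    K_A = A0 + (L0 * B0 + L2 * B2) / (L0 + L2)
    return K_S, K_A


# Made-up inputs, for illustration only.
K_S, K_A = ks_ka(L0=600, L2=200, L4=200,
                 A0=0.01, B0=0.02, A2=0.10, B2=0.03, A4=0.20, B4=0.05)
print(f"K_S = {K_S:.4f}, K_A = {K_A:.4f}")
```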

Page 44: Distances
Page 45: Distances

Number of Amino-Acid Replacements between Two Proteins

• The observed proportion of different amino acids between the two sequences (p) is

p = n /L

• n = number of amino acid differences between the two sequences

• L = length of the aligned sequences.

Page 46: Distances

Number of Amino-Acid Replacements between Two Proteins

The Poisson model is used to convert p into the number of amino acid replacements per site between two sequences (d):

d = -ln(1 - p)

The variance of d is estimated as

V(d) = p / [L(1 - p)]
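The Poisson correction and its variance can be sketched as a single function. The name `poisson_distance` and the example counts (20 differences over 100 aligned positions) are hypothetical.

```python
import math


def poisson_distance(n: int, L: int) -> tuple[float, float]:
    """Return (d, V(d)) for n amino acid differences over L aligned positions."""
    if L <= 0 or not 0 <= n < L:
        raise ValueError("need L > 0 and 0 <= n < L")
    p = n / L                    # observed proportion of different amino acids
    d = -math.log(1 - p)         # Poisson-corrected replacements per site
    var_d = p / (L * (1 - p))    # variance of d
    return d, var_d


d, var_d = poisson_distance(n=20, L=100)
print(f"d = {d:.4f}, standard error = {math.sqrt(var_d):.4f}")
```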

Page 47: Distances

How do you detect adaptive evolution at the genetic level?

Page 48: Distances

Theoretical Expectations

Deleterious mutations

Advantageous mutations

Neutral mutations

Overdominant mutations
