Lecture 1, 31/10/2001 - Weizmann Institute of Science


1

Lecture 1, 31/10/2001:

• Introduction to sequence alignment

• The Needleman-Wunsch algorithm for global sequence alignment: description and properties

2

Computational sequence-analysis

The major goal of computational sequence analysis is to predict the function and structure of genes and proteins from their sequence.

This is made possible since organisms evolve by mutation, duplication and selection of their genes.

Thus, sequence similarity often indicates functional and structural similarity.

3

Sequence alignment

5’ ATCAGAGTC 3’
5’ TTCAGTC 3’

ATC ≠ CTA

AG ≠ GA

etc.

4

Sequence alignment

We wish to identify which regions of the two sequences are most similar to each other. The sequences are shifted relative to each other and gaps are introduced, to cover all possible alignments. The shifts and gaps provide the steps by which one sequence can be converted into the other.

ATCAGAGTC
TTCA--GTC
 +++^^+++

5

Sequence alignment: dot-plot

  A T C A G A G T C
T . • . . . . . • .
T . • . . . . . • .
C . . • . . . . . •
A • . . • . • . . .
G . . . . • . • . .
T . • . . . . . • .
C . . • . . . . . •

ATCAGAGTC
TTCA--GTC
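The dot-plot can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function names `dot_plot` and `render` are made up here, and the rule assumed is the simplest one, a dot wherever the row residue is identical to the column residue.

```python
def dot_plot(horizontal, vertical):
    """Return a grid of booleans: True where the two residues are identical."""
    return [[h == v for h in horizontal] for v in vertical]


def render(horizontal, vertical):
    """Draw the grid as text, using '•' for a match and '.' for an empty cell."""
    lines = ["  " + " ".join(horizontal)]
    for v, row in zip(vertical, dot_plot(horizontal, vertical)):
        lines.append(v + " " + " ".join("•" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("ATCAGAGTC", "TTCAGTC"))
```

Diagonal runs of dots correspond to stretches that align without gaps; a jump from one diagonal to another corresponds to a gap.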

6

Sequence alignment: scoring

Substitution matrix - the similarity value between each pair of residues

  A C G T
A 2 0 0 0
C 0 2 0 0
G 0 0 2 0
T 0 0 0 2

Gap penalty - the cost of introducing gaps

Gap penalty -2

ATCAGAGTC
TTCA--GTC
•+++^^+++ : 0+2+2+2-2-2+2+2+2 = 8
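The column-by-column sum above can be reproduced with a short Python sketch. The names `SUBSTITUTION`, `GAP_PENALTY` and `column_scores` are illustrative choices, not from the lecture; the values are the ones given on the slide (2 for identical residues, 0 otherwise, -2 for each gap position).

```python
# Substitution matrix from the slide: 2 for identical residues, 0 otherwise.
SUBSTITUTION = {(x, y): (2 if x == y else 0) for x in "ACGT" for y in "ACGT"}
GAP_PENALTY = -2  # charged for every column that contains a gap character


def column_scores(top, bottom):
    """Return the score contributed by each column of a gapped alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            scores.append(GAP_PENALTY)
        else:
            scores.append(SUBSTITUTION[(x, y)])
    return scores


cols = column_scores("ATCAGAGTC", "TTCA--GTC")
print(cols, sum(cols))
```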

7

Sequence alignment: Needleman-Wunsch global alignment

  A C G T
A 2 0 0 0
C 0 2 0 0
G 0 0 2 0
T 0 0 0 2

Gap penalty -2

Initialization

    A   T   C   A   G   A   G   T   C
T   0   2   0   0   0   0   0   2   0
T   0   2   0   0   0   0   0   2   0
C   0   0   2   0   0   0   0   0   2
A   2   0   0   2   0   2   0   0   0
G   0   0   0   0   2   0   2   0   0
T   0   2   0   0   0   0   0   2   0
C   0   0   2   0   0   0   0   0   2

Position 3,2 :

[T2T1]  ATC
        -TT

[C3T1]  ATC-
        --TT

[T2T2]  ATC
        TT-

[ab]  [a-]  [-b]

8

Sequence alignment: Needleman-Wunsch global alignment

  A C G T
A 2 0 0 0
C 0 2 0 0
G 0 0 2 0
T 0 0 0 2

Gap penalty -2

Initialization

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   2   0   0   0   0   0   2   0
T  -4   0   2   0   0   0   0   0   2   0
C  -6   0   0   2   0   0   0   0   0   2
A  -8   2   0   0   2   0   2   0   0   0
G -10   0   0   0   0   2   0   2   0   0
T -12   0   2   0   0   0   0   0   2   0
C -14   0   0   2   0   0   0   0   0   2

Directionality of score calculation

[ab]  [a-]  [-b]

9

Sequence alignment: Needleman-Wunsch global alignment

  A C G T
A 2 0 0 0
C 0 2 0 0
G 0 0 2 0
T 0 0 0 2

Gap penalty -2

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   2   0   0   0   0   0   2   0
T  -4   0   2   0   0   0   0   0   2   0
C  -6   0   0   2   0   0   0   0   0   2
A  -8   2   0   0   2   0   2   0   0   0
G -10   0   0   0   0   2   0   2   0   0
T -12   0   2   0   0   0   0   0   2   0
C -14   0   0   2   0   0   0   0   0   2

10

Sequence alignment: Needleman-Wunsch global alignment

  A C G T
A 2 0 0 0
C 0 2 0 0
G 0 0 2 0
T 0 0 0 2

Gap penalty -2

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6  -4   0   4   2   0  -2  -4  -6  -6
A  -8  -4  -2   2   6   4   2   0  -2  -4
G -10  -6  -4   0   4   8   6   4   2   0
T -12  -8  -4  -2   2   6   8   6   6   4
C -14 -10  -6  -2   0   4   6   8   6   8

11

Sequence alignment: Needleman-Wunsch algorithm

σ[ab] : score of aligning a pair of residues a and b

σ[a-] : score of aligning residue a with a gap (gap penalty: -q)

S : score matrix

S(i,j) : optimal score of aligning the residues at positions 1 to i of one sequence with the residues at positions 1 to j of the other sequence

12

Sequence alignment: Needleman-Wunsch algorithm

S(0,0) ⇐ 0
for j ⇐ 1 to N do
    S(0,j) ⇐ S(0,j-1) + σ[-bj]
for i ⇐ 1 to M do
{   S(i,0) ⇐ S(i-1,0) + σ[ai-]
    for j ⇐ 1 to N do
        S(i,j) ⇐ max (S(i-1, j-1) + σ[aibj],
                      S(i-1, j) + σ[ai-],
                      S(i, j-1) + σ[-bj])
}

Pearson & Miller, Meth Enz 210:575, '92
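The pseudocode translates almost line for line into Python. The sketch below is one possible rendering, not the lecture's own code: the function name `nw_fill` and its keyword parameters are assumptions, and the substitution scores are reduced to a single match value and a single mismatch value, as in the class example.

```python
def nw_fill(a, b, match=2, mismatch=0, gap=-2):
    """Fill the Needleman-Wunsch score matrix S for sequences a (rows) and b (columns).

    S[i][j] is the optimal score of aligning a[:i] with b[:j];
    every gap position costs `gap` (a linear gap penalty).
    """
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):            # first row: b residues against gaps
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap      # first column: a residues against gaps
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,   # [ab]
                          S[i - 1][j] + gap,       # [a-]
                          S[i][j - 1] + gap)       # [-b]
    return S


S = nw_fill("TTCAGTC", "ATCAGAGTC")
for row in S:
    print(" ".join(f"{v:3d}" for v in row))
print("optimal score:", S[-1][-1])
```

With the class sequences and scores, the bottom-right cell holds the optimal global score of 8.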

13

Sequence alignment: Needleman-Wunsch global alignment

Filling the matrix gives the optimal score; more steps are needed to find the corresponding alignment or alignments. This is a time-saving property in database searches and other applications that need only the score.

Only a single pass through the alignment matrix is needed.

14

Sequence alignment: Needleman-Wunsch global alignment

  A C G T
A 2 0 0 0
C 0 2 0 0
G 0 0 2 0
T 0 0 0 2

Gap penalty -2

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6  -4   0   4   2   0  -2  -4  -6  -6
A  -8  -4  -2   2   6   4   2   0  -2  -4
G -10  -6  -4   0   4   8   6   4   2   0
T -12  -8  -4  -2   2   6   8   6   6   4
C -14 -10  -6  -2   0   4   6   8   6   8

15

Sequence alignment: Needleman-Wunsch global alignment

The traceback

  A C G T
A 2 0 0 0
C 0 2 0 0
G 0 0 2 0
T 0 0 0 2

Gap penalty -2

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6  -4   0   4   2   0  -2  -4  -6  -6
A  -8  -4  -2   2   6   4   2   0  -2  -4
G -10  -6  -4   0   4   8   6   4   2   0
T -12  -8  -4  -2   2   6   8   6   6   4
C -14 -10  -6  -2   0   4   6   8   6   8

16

Sequence alignment: Needleman-Wunsch global alignment

The traceback

  A C G T
A 2 0 0 0
C 0 2 0 0
G 0 0 2 0
T 0 0 0 2

Gap penalty -2

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6  -4   0   4   2   0  -2  -4  -6  -6
A  -8  -4  -2   2   6   4   2   0  -2  -4
G -10  -6  -4   0   4   8   6   4   2   0
T -12  -8  -4  -2   2   6   8   6   6   4
C -14 -10  -6  -2   0   4   6   8   6   8

ATCAGAGTC
TTCAG--TC
•++++^^++ : 0+2+2+2+2-2-2+2+2=8

17

Sequence alignment: Needleman-Wunsch global alignment

The traceback

  A C G T
A 2 0 0 0
C 0 2 0 0
G 0 0 2 0
T 0 0 0 2

Gap penalty -2

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6  -4   0   4   2   0  -2  -4  -6  -6
A  -8  -4  -2   2   6   4   2   0  -2  -4
G -10  -6  -4   0   4   8   6   4   2   0
T -12  -8  -4  -2   2   6   8   6   6   4
C -14 -10  -6  -2   0   4   6   8   6   8

ATCAGAGTC
TTC--AGTC
•++^^++++ : 0+2+2-2-2+2+2+2+2=8

18

Sequence alignment: Needleman-Wunsch global alignment

The traceback

  A C G T
A 2 0 0 0
C 0 2 0 0
G 0 0 2 0
T 0 0 0 2

Gap penalty -2

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6  -4   0   4   2   0  -2  -4  -6  -6
A  -8  -4  -2   2   6   4   2   0  -2  -4
G -10  -6  -4   0   4   8   6   4   2   0
T -12  -8  -4  -2   2   6   8   6   6   4
C -14 -10  -6  -2   0   4   6   8   6   8

ATCAGAGTC
TTCAG--TC : 8

ATCAGAGTC
TTC--AGTC : 8

ATCAGAGTC
TTCA--GTC : 8
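A traceback that collects every optimal alignment can be sketched as follows. This is an illustrative implementation under the class scoring (match 2, mismatch 0, gap -2), not the lecture's own code; the function name `nw_all_alignments` is hypothetical. From the bottom-right cell it steps back to every neighbour whose score plus the step's cost equals the current cell's score, so ties produce several alignments.

```python
def nw_all_alignments(a, b, match=2, mismatch=0, gap=-2):
    """Return (optimal score, list of all optimal global alignments of a and b)."""
    m, n = len(a), len(b)
    # Matrix fill with a linear gap penalty.
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub, S[i - 1][j] + gap, S[i][j - 1] + gap)

    results = []

    def walk(i, j, top, bottom):
        """Follow every predecessor that explains S[i][j]; strings are built reversed."""
        if i == 0 and j == 0:
            results.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if S[i][j] == S[i - 1][j - 1] + sub:
                walk(i - 1, j - 1, top + a[i - 1], bottom + b[j - 1])
        if i > 0 and S[i][j] == S[i - 1][j] + gap:
            walk(i - 1, j, top + a[i - 1], bottom + "-")
        if j > 0 and S[i][j] == S[i][j - 1] + gap:
            walk(i, j - 1, top + "-", bottom + b[j - 1])

    walk(m, n, "", "")
    return S[m][n], results


score, alignments = nw_all_alignments("ATCAGAGTC", "TTCAGTC")
for top, bottom in alignments:
    print(top, bottom, score, sep="\n", end="\n\n")
```

For the class sequences this yields the three alignments listed above, each with score 8. The number of optimal alignments can grow quickly for longer sequences, so enumerating all of them is mainly useful for small examples.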

19

Sequence alignment: Needleman-Wunsch global alignment

The algorithm calculates the score of the optimal global sequence alignments, penalizes end gaps, and penalizes each residue in a gap equally.

ATCAGAGTC                           CAGAGTC
--TTCAGTC   has a lower score than  TTCAGTC

ATCACAGTC                           ATCACAGTC
T-C--AGTC   has the same score as   T---CAGTC

ATCACAGTC                           ACACAGTC
T---CAGTC   has a lower score than  T--CAGTC
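These three comparisons can be checked numerically. The sketch below assumes the class scoring (match 2, mismatch 0, and -2 for every gap position); the helper name `alignment_score` is illustrative only.

```python
def alignment_score(top, bottom, match=2, mismatch=0, gap=-2):
    """Score a gapped alignment column by column; every gap position costs `gap`."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        gap if "-" in (x, y) else (match if x == y else mismatch)
        for x, y in zip(top, bottom)
    )


# End gaps are penalized: 4 versus 8.
print(alignment_score("ATCAGAGTC", "--TTCAGTC"), alignment_score("CAGAGTC", "TTCAGTC"))
# Three gap positions cost the same however they are grouped: 4 and 4.
print(alignment_score("ATCACAGTC", "T-C--AGTC"), alignment_score("ATCACAGTC", "T---CAGTC"))
# A longer gap costs more than a shorter one: 4 versus 6.
print(alignment_score("ATCACAGTC", "T---CAGTC"), alignment_score("ACACAGTC", "T--CAGTC"))
```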

20

Sequence alignment: Needleman-Wunsch global alignment

In order to score gaps with a penalty q that is independent of the gap length, i.e.

ACACAGTC    ATCACAGTC    AGCTTTCACAGTC
T--CAGTC    T---CAGTC    T-------CAGTC    all have the same score,

the algorithm we presented is modified to extend alignments in more than the three ways we considered.

21

Sequence alignment: Needleman-Wunsch global alignment

    A   T   C   A   G   A   G   T   C
T   0   2   0   0   0   0   0   2   0
T   0   2   0   0   0   0   0   2   0
C   0   0   2   0   0   0   0   0   2
A   2   0   0   2   0   2   0   0   0
G   0   0   0   0   2   0   2   0   0
T   0   2   0   0   0   0   0   2   0
C   0   0   2   0   0   0   0   0   2

[ab]  [a-]  [-b]

22

Sequence alignment: Needleman-Wunsch algorithm

S(0,0) ⇐ 0
for j ⇐ 1 to N do
    S(0,j) ⇐ -q
for i ⇐ 1 to M do
{   S(i,0) ⇐ -q
    for j ⇐ 1 to N do
        S(i,j) ⇐ max (S(i-1, j-1) + σ[aibj],
                      max {S(0, j)...S(i-1, j)} - q,
                      max {S(i, 0)...S(i, j-1)} - q)
}

Pearson & Miller, Meth Enz 210:575, '92
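A direct Python rendering of this length-independent gap variant might look like the sketch below. The function name `nw_constant_gap_score` and the default values (match 2, mismatch 0, q = 2) are assumptions for illustration. Each cell looks back over its whole column and its whole row, so a gap of any length costs q once; the price is an extra inner scan per cell.

```python
def nw_constant_gap_score(a, b, match=2, mismatch=0, q=2):
    """Optimal global alignment score when each gap costs q regardless of its length."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        S[0][j] = -q                     # one leading gap, whatever its length
    for i in range(1, m + 1):
        S[i][0] = -q
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag = S[i - 1][j - 1] + sub
            up = max(S[k][j] for k in range(i)) - q     # gap ending at row i, any length
            left = max(S[i][k] for k in range(j)) - q   # gap ending at column j, any length
            S[i][j] = max(diag, up, left)
    return S[m][n]


for seq in ("ACACAGTC", "ATCACAGTC", "AGCTTTCACAGTC"):
    print(seq, nw_constant_gap_score(seq, "TCAGTC"))
```

With these parameters all three sequences from the previous slide reach the same optimal score against TCAGTC, which is the behaviour the modification is meant to give.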

23

Sequence alignment: Needleman-Wunsch global alignment

Caveats

Every algorithm is limited by the model it is built upon.

For example, the NW dynamic programming algorithm guarantees us optimal global alignments with the parameters we supply (substitution matrix, gap penalty and gap scoring).

However:

• Different parameters can give different alignments.

• The correct alignment might not be the optimal one.

• The correct alignment might correspond only to part of the global alignment.

24

More details, sources and things to do for next class

Source: Pearson WR & Miller W, "Dynamic programming algorithms for biological sequence comparison." Methods in Enzymology, 210:575-601 (1992).

Assignment: Calculate NW alignments with a constant gap penalty and observe the effect of different gap penalties and match/mismatch scores. In all cases use substitution matrices that have only two types of scores: a value for an exact match and a lower value for mismatches. Try the nucleotide sequences used in class and the following amino acid sequences: “ACDGSMF” & “AMDFR”.
