Lecture 1, 31/10/2001 - Weizmann Institute of Science


Lecture 1, 31/10/2001:

• Introduction to sequence alignment

• The Needleman-Wunsch algorithm for global sequence alignment: description and properties


Computational sequence-analysis

The major goal of computational sequence analysis is to predict the function and structure of genes and proteins from their sequence.

This is made possible since organisms evolve by mutation, duplication and selection of their genes.

Thus, sequence similarity often indicates functional and structural similarity.


Sequence alignment

5' ATCAGAGTC 3'
5' TTCAGTC 3'

ATC ≠ CTA

AG ≠ GA

etc.


Sequence alignment

We wish to identify which regions are most similar to each other in the two sequences. Sequences are shifted one by the other and gaps introduced, to cover all possible alignments. The shifts and gaps provide the steps by which one sequence can be converted into the other.

ATCAGAGTC
TTCA--GTC
 +++^^+++


Sequence alignment: dot-plot

    A   T   C   A   G   A   G   T   C
T   .   •   .   .   .   .   .   •   .
T   .   •   .   .   .   .   .   •   .
C   .   .   •   .   .   .   .   .   •
A   •   .   .   •   .   •   .   .   .
G   .   .   .   .   •   .   •   .   .
T   .   •   .   .   .   .   .   •   .
C   .   .   •   .   .   .   .   .   •

ATCAGAGTC
TTCA--GTC
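A dot-plot of this kind can be sketched in a few lines of Python. This is only an illustration of the idea: the function names `dot_plot` and `render` are made up for this example, and a dot is placed wherever the two residues are identical.

```python
def dot_plot(top, left):
    """Return a grid of booleans: grid[r][c] is True where left[r] == top[c]."""
    return [[row_residue == col_residue for col_residue in top]
            for row_residue in left]


def render(top, left):
    """Draw the grid as text, one row per residue of `left`."""
    lines = ["  " + " ".join(top)]
    for residue, row in zip(left, dot_plot(top, left)):
        lines.append(residue + " " + " ".join("•" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # The two nucleotide sequences used in class.
    print(render("ATCAGAGTC", "TTCAGTC"))
```

Runs of dots along a diagonal correspond to stretches that align without gaps; a jump from one diagonal to another corresponds to a gap.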


Sequence alignment: scoring

ATCAGAGTC
TTCA--GTC
•+++^^+++ : 0+2+2+2-2-2+2+2+2 = 8

Substitution matrix - the similarity value between each pair of residues

    A   C   G   T
A   2   0   0   0
C   0   2   0   0
G   0   0   2   0
T   0   0   0   2

Gap penalty - the cost of introducing gaps

Gap penalty: -2
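The scoring of a given alignment can be sketched as follows. The function name `score_alignment` and its parameter names are assumptions made for this example; the default values are the ones used on the slide (match 2, mismatch 0, gap penalty -2), and the marks `+`, `•` and `^` follow the slide's notation for match, mismatch and gap.

```python
def score_alignment(top, bottom, match=2, mismatch=0, gap=-2):
    """Score two already-aligned strings column by column.

    Returns the mark string ('+' match, '•' mismatch, '^' gap) and the total.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            marks.append("^")
            total += gap
        elif x == y:
            marks.append("+")
            total += match
        else:
            marks.append("•")
            total += mismatch
    return "".join(marks), total


if __name__ == "__main__":
    print(score_alignment("ATCAGAGTC", "TTCA--GTC"))
```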


Sequence alignment: Needleman-Wunsch global alignment

Initialization

    A   T   C   A   G   A   G   T   C
T   0   2   0   0   0   0   0   2   0
T   0   2   0   0   0   0   0   2   0
C   0   0   2   0   0   0   0   0   2
A   2   0   0   2   0   2   0   0   0
G   0   0   0   0   2   0   2   0   0
T   0   2   0   0   0   0   0   2   0
C   0   0   2   0   0   0   0   0   2

Substitution matrix: match 2, mismatch 0

Gap penalty: -2

An alignment is extended in one of three ways: [ab], [a-], [-b]

Position 3,2:

[T2T1]   ATC
         -TT

[C3T1]   ATC-
         --TT

[T2T2]   ATC
         TT-


Sequence alignment: Needleman-Wunsch global alignment

Initialization

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   2   0   0   0   0   0   2   0
T  -4   0   2   0   0   0   0   0   2   0
C  -6   0   0   2   0   0   0   0   0   2
A  -8   2   0   0   2   0   2   0   0   0
G -10   0   0   0   0   2   0   2   0   0
T -12   0   2   0   0   0   0   0   2   0
C -14   0   0   2   0   0   0   0   0   2

Substitution matrix: match 2, mismatch 0

Gap penalty: -2

Directionality of score calculation: [ab], [a-], [-b]


Sequence alignment: Needleman-Wunsch global alignment

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6  -4   0   4   2   0  -2  -4  -6  -6
A  -8  -4  -2   2   6   4   2   0  -2  -4
G -10  -6  -4   0   4   8   6   4   2   0
T -12  -8  -4  -2   2   6   8   6   6   4
C -14 -10  -6  -2   0   4   6   8   6   8

Substitution matrix: match 2, mismatch 0

Gap penalty: -2


Sequence alignment: Needleman-Wunsch algorithm

σ[ab] : score of aligning a pair of residues a and b

σ[a-] : score of aligning residue a with a gap (gap penalty: -q)

S : score matrix

S(i,j) : optimal score of aligning the residues at positions 1 to i of one sequence with the residues at positions 1 to j of the other sequence


Sequence alignment: Needleman-Wunsch algorithm

S(0,0) ⇐ 0
for j ⇐ 1 to N do
    S(0,j) ⇐ S(0,j-1) + σ[-bj]
for i ⇐ 1 to M do
{
    S(i,0) ⇐ S(i-1,0) + σ[ai-]
    for j ⇐ 1 to N do
        S(i,j) ⇐ max (S(i-1, j-1) + σ[aibj],
                      S(i-1, j) + σ[ai-],
                      S(i, j-1) + σ[-bj])
}

Pearson & Miller, Meth Enz 210:575, '92
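The pseudocode above can be turned into a short Python sketch. The function name `needleman_wunsch_matrix` and its parameters are assumptions made for this example; σ is reduced to a single match score and a single mismatch score, and every gapped residue costs the same penalty, as in the class example.

```python
def needleman_wunsch_matrix(a, b, match=2, mismatch=0, gap=-2):
    """Fill the score matrix S, where S[i][j] is the optimal score of
    aligning a[:i] with b[:j]. `a` indexes the rows and `b` the columns."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    # First row: b[:j] aligned entirely against gaps.
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        # First column: a[:i] aligned entirely against gaps.
        S[i][0] = S[i - 1][0] + gap
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,  # [ab]
                          S[i - 1][j] + gap,      # [a-]
                          S[i][j - 1] + gap)      # [-b]
    return S


if __name__ == "__main__":
    # Rows TTCAGTC, columns ATCAGAGTC, as laid out on the slides.
    for row in needleman_wunsch_matrix("TTCAGTC", "ATCAGAGTC"):
        print(" ".join(f"{value:3d}" for value in row))
```

The bottom-right cell holds the optimal global score; for the two sequences used in class it is 8.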


Sequence alignment: Needleman-Wunsch global alignment

The optimal score is found first; more steps are needed to find the corresponding alignment(s). This is a time-saving property in database searches and other applications.

Only a single pass through the alignment matrix is needed.


Sequence alignment: Needleman-Wunsch global alignment, the traceback

        A   T   C   A   G   A   G   T   C
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T  -2   0   0  -2  -4  -6  -8 -10 -12 -14
T  -4  -2   2   0  -2  -4  -6  -8  -8 -10
C  -6  -4   0   4   2   0  -2  -4  -6  -6
A  -8  -4  -2   2   6   4   2   0  -2  -4
G -10  -6  -4   0   4   8   6   4   2   0
T -12  -8  -4  -2   2   6   8   6   6   4
C -14 -10  -6  -2   0   4   6   8   6   8

Substitution matrix: match 2, mismatch 0

Gap penalty: -2


ATCAGAGTC
TTCAG--TC
•++++^^++ : 0+2+2+2+2-2-2+2+2 = 8


ATCAGAGTC
TTC--AGTC
•++^^++++ : 0+2+2-2-2+2+2+2+2 = 8


ATCAGAGTC
TTCAG--TC : 8

ATCAGAGTC
TTC--AGTC : 8

ATCAGAGTC
TTCA--GTC : 8
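The traceback that yields these three alignments can be sketched as below. This is a minimal illustration, not the lecture's own code: the names `fill_matrix` and `optimal_alignments` are assumptions, the scoring is the match 2 / mismatch 0 / gap -2 scheme used in class, and the recursion follows every predecessor cell that could have produced the current score. The number of co-optimal alignments can grow very quickly for longer sequences, so enumerating all of them is only practical for small examples.

```python
def fill_matrix(a, b, match, mismatch, gap):
    """Score matrix S with S[i][j] = optimal score of a[:i] versus b[:j]."""
    S = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for j in range(1, len(b) + 1):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, len(a) + 1):
        S[i][0] = S[i - 1][0] + gap
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    return S


def optimal_alignments(a, b, match=2, mismatch=0, gap=-2):
    """Return the optimal score and every alignment that achieves it."""
    S = fill_matrix(a, b, match, mismatch, gap)
    found = []

    def walk(i, j, top, bottom):
        # Walk back from (i, j) towards (0, 0), prepending one column per step.
        if i == 0 and j == 0:
            found.append((top, bottom))
            return
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if S[i][j] == S[i - 1][j - 1] + sub:
                walk(i - 1, j - 1, a[i - 1] + top, b[j - 1] + bottom)
        if i > 0 and S[i][j] == S[i - 1][j] + gap:
            walk(i - 1, j, a[i - 1] + top, "-" + bottom)
        if j > 0 and S[i][j] == S[i][j - 1] + gap:
            walk(i, j - 1, "-" + top, b[j - 1] + bottom)

    walk(len(a), len(b), "", "")
    return S[len(a)][len(b)], found


if __name__ == "__main__":
    score, alignments = optimal_alignments("ATCAGAGTC", "TTCAGTC")
    for top, bottom in alignments:
        print(top, bottom, score, sep="\n", end="\n\n")
```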


Sequence alignment: Needleman-Wunsch global alignment

The algorithm calculates the score(s) of optimal global sequence alignments, penalizes end gaps, and penalizes each residue in a gap equally.

ATCAGAGTC                            CAGAGTC
--TTCAGTC   has a lower score than   TTCAGTC

ATCACAGTC                            ATCACAGTC
T-C--AGTC   has the same score as    T---CAGTC

ATCACAGTC                            ACACAGTC
T---CAGTC   has a lower score than   T--CAGTC
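As a worked check of the three comparisons above, the column-by-column sums are written out here, assuming the scoring used in class (match 2, mismatch 0, gap penalty -2 for each gapped residue):

```latex
\begin{align*}
\text{ATCAGAGTC / --TTCAGTC} &: -2-2+0+0+0+2+2+2+2 = 4 \\
\text{CAGAGTC / TTCAGTC}     &: 0+0+0+2+2+2+2 = 8 \\[4pt]
\text{ATCACAGTC / T-C--AGTC} &: 0-2+2-2-2+2+2+2+2 = 4 \\
\text{ATCACAGTC / T---CAGTC} &: 0-2-2-2+2+2+2+2+2 = 4 \\[4pt]
\text{ACACAGTC / T--CAGTC}   &: 0-2-2+2+2+2+2+2 = 6
\end{align*}
```

The first pair shows the end gaps being penalized (4 < 8), the second shows that three gapped residues cost the same whether or not they are contiguous (4 = 4), and the third shows that a longer gap costs more (4 < 6).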


Sequence alignment: Needleman-Wunsch global alignment

In order to score gaps with a penalty q that is independent of the gap length, i.e.

ACACAGTC    ATCACAGTC    AGCTTTCACAGTC
T--CAGTC    T---CAGTC    T-------CAGTC    all have the same score,

the algorithm we presented is modified to extend alignments in more than the three ways we considered.


Sequence alignment: Needleman-Wunsch global alignment

[Figure: the match matrix of ATCAGAGTC (columns) against TTCAGTC (rows), with the alignment extensions [ab], [a-] and [-b].]


Sequence alignment: Needleman-Wunsch algorithm

S(0,0) ⇐ 0
for j ⇐ 1 to N do
    S(0,j) ⇐ -q
for i ⇐ 1 to M do
{
    S(i,0) ⇐ -q
    for j ⇐ 1 to N do
        S(i,j) ⇐ max (S(i-1, j-1) + σ[aibj],
                      max {S(0, j)...S(i-1, j)} - q,
                      max {S(i, 0)...S(i, j-1)} - q)
}

Pearson & Miller, Meth Enz 210:575, '92
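A Python sketch of this length-independent variant follows. The function name `nw_constant_gap` and its parameters are assumptions made for this example, with σ again reduced to one match score and one mismatch score. Instead of rescanning the whole row and column for every cell, the sketch keeps running maxima, which gives the same values as the two inner max {...} terms of the pseudocode.

```python
def nw_constant_gap(a, b, match=2, mismatch=0, q=2):
    """Optimal global score when every gap costs q regardless of its length."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        S[0][j] = -q
    for i in range(1, m + 1):
        S[i][0] = -q
    # col_best[j] holds max{S(0,j) ... S(i-1,j)} while row i is being filled.
    col_best = S[0][:]
    for i in range(1, m + 1):
        # row_best holds max{S(i,0) ... S(i,j-1)} while column j is being filled.
        row_best = S[i][0]
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,
                          col_best[j] - q,
                          row_best - q)
            row_best = max(row_best, S[i][j])
            col_best[j] = max(col_best[j], S[i][j])
    return S[m][n]


if __name__ == "__main__":
    # One gap of two residues now costs 2 instead of 4.
    print(nw_constant_gap("ATCAGAGTC", "TTCAGTC"))
```

With the sequences used in class the optimal score rises from 8 to 10, because the two-residue gap is charged only once.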


Sequence alignment: Needleman-Wunsch global alignment

Caveats

Every algorithm is limited by the model it is built upon.

For example, the NW dynamic programming algorithm guarantees us optimal global alignments with the parameters we supply (substitution matrix, gap penalty and gap scoring).

However -

• Different parameters can give different alignments.

• The correct alignment might not be the optimal one.

• The correct alignment might correspond only to part of the global alignment.


More details, sources and things to do for next class

Source: Pearson WR & Miller W. "Dynamic programming algorithms for biological sequence comparison." Methods in Enzymology, 210:575-601 (1992).

Assignment: Calculate NW alignments with a constant gap penalty, observing the effect of different gap penalties and match/mismatch scores. In all cases use substitution matrices that have only two types of scores: a value for an exact match and a lower value for mismatches. Try the nucleotide sequences used in class and the following amino acid sequences: “ACDGSMF” & “AMDFR”.