lcs and extensions to global and local alignment

22
LCS and Extensions to Global and Local Alignment Dr. Nancy Warter-Perez June 26, 2003

Upload: anevay

Post on 05-Feb-2016

63 views

Category:

Documents


0 download

DESCRIPTION

LCS and Extensions to Global and Local Alignment. Dr. Nancy Warter-Perez June 26, 2003. Overview. Recursion Recursive solution to hydrophobicity sliding window problem LCS Smith-Waterman Algorithm Extensions to LCS Global Alignment Local Alignment Affine Gap Penalties - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: LCS and Extensions to Global and Local Alignment

LCS and Extensions to Global and Local Alignment

Dr. Nancy Warter-PerezJune 26, 2003

Page 2: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 2

Overview Recursion Recursive solution to hydrophobicity

sliding window problem LCS Smith-Waterman Algorithm Extensions to LCS

Global Alignment Local Alignment Affine Gap Penalties

Programming Workshop 6

Page 3: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 3

Project References http://www.sbc.su.se/~arne/kurser/swell/pair

wise_alignments.html Computational Molecular Biology – An

Algorithmic Approach, Pavel Pevzner Introduction to Computational Biology –

Maps, sequences, and genomes, Michael Waterman

Algorithms on Strings, Trees, and Sequences – Computer Science and Computational Biology, Dan Gusfield

Page 4: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 4

Recursion Problems can be solved iteratively

or recursively Recursion is useful in cases where

you are building upon a partial solution

Consider the hydrophobicity problem

Page 5: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 5

Main.cpp#include <iostream>#include <string>using namespace std;#include "hydro.h"

double hydro[25] = {1.8,0,2.5,-3.5,-3.5,2.8,-0.4,-3.2,4.5,0,-3.9,3.8,1.9,-3.5,0,-1.6,-3.5,-4.5,-0.8,-0.7,0,4.2,-0.9,0,-1.3};

void main () { string seq; int ws, i; cout << "This program will compute the hydrophobicity of an sequence of

amino acids.\n"; cout << "Please enter the sequence: "<< flush; cin >> seq;

for(i = 0; i < seq.size(); i++) if((seq.data()[i] >= 'a') && (seq.data()[i] <= 'z')) seq.at(i) = seq.data()[i] - 32;

cout << "Please enter the window size: "<< flush; cin >> ws; compute_hydro(seq, ws);}

Page 6: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 6

Hydro.cpp#include <iostream>#include <string>using namespace std;#include "hydro.h"

void print_hydro(string seq, int ws, int i, double sum);void compute_hydro(string seq, int ws) { cout << "\n\nThe hydrophocity values are:" << endl;

print_hydro(seq, ws, seq.size()-1, 0);}void print_hydro(string seq, int ws, int i, double sum) {

if(i == -1)return;

if(i > seq.size() - ws)sum += hydro[seq.data()[i] - 'A'];

elsesum = sum - hydro[seq.data()[i+ws] - 'A'] + hydro[seq.data()

[i] - 'A'];print_hydro(seq, ws, i-1, sum);

if (i <= seq.size() - ws)cout << "Hydrophocity value:\t" << sum/ws << endl;

}

Page 7: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 7

hydro.h

extern double hydro[25];

void compute_hydro(string seq, int ws);

Page 8: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 8

Dynamic Programming Applied to optimization problems Useful when

Problem can be recursively divided into sub-problems Sub-problems are not independent

Page 9: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 9

Longest Common Subsequence (LCS) Problem Reference: Pevzner Can have insertion and deletions

but no substitutions (no mismatches)

Ex: V: ATCTGAT W: TGCATALCS:TCTA

Page 10: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 10

LCS Problem (cont.) Similarity score

si-1,j

si,j = max { si,j-1

si-1,j-1 + 1, if vi = wj

On board example: Pevzner Fig 6.1

Page 11: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 11

Indels – insertions and deletions (e.g., gaps)

alignment of V and W V = rows of similarity matrix (vertical axis) W = columns of similarity matrix (horizontal

axis) Space (gap) in W (UP)

insertion Space (gap) in V (LEFT)

deletion Match (no mismatch in LCS) (DIAG)

Page 12: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 12

LCS(V,W) Algorithmfor i = 0 to n

si,0 = 0for j = 1 to m

s0,j = 0for i = 1 to n

for j = 1 to mif vi = wj

si,j = si-1,j-1 + 1; bi,j = DIAGelse if si-1,j >= si,j-1

si,j = si-1,j; bi,j = UPelse

si,j = si,j-1; bi,j = LEFT

Page 13: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 13

Print-LCS(V,i,j)if i = 0 or j = 0

returnif bi,j = DIAG

PRINT-LCS(V, i-1, j-1)print vi

else if bi,j = UPPRINT-LCS(V, i-1, j)

elsePRINT-LCS(V, I, j-1)

Page 14: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 14

Classic Papers Needleman, S.B. and Wunsch

, C.D. A General Method Applicable to the Search for Similarities in Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970.(http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/needlemanandwunsch1970.pdf)

Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981.(http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smithandwaterman1981.pdf)

Smith, T.F. The History of the Genetic Sequence Databases. Genomics, 6, pp. 701-707, 1990. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smith1990.pdf)

Page 15: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 15

Smith-Waterman (1 of 3)Algorithm

The two molecular sequences will be A=a1a2 . . . an, and B=b1b2 . . . bm. A similarity s(a,b) is given between sequence elements a and b. Deletions of length k are given weight Wk. To find pairs of segments with high degrees of similarity, we set up a matrix H . First set

Hk0 = Hol = 0 for 0 <= k <= n and 0 <= l <= m.

Preliminary values of H have the interpretation that H i j is the maximum similarity of two segments ending in ai and bj. respectively. These values are obtained from the relationship

Hij=max{Hi-1,j-1 + s(ai,bj), max {Hi-k,j – Wk}, max{Hi,j-l - Wl }, 0} ( 1 ) k >= 1 l >= 1

1 <= i <= n and 1 <= j <= m.

Page 16: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 16

Smith-Waterman (2 of 3)

The formula for Hij follows by considering the possibilities for ending the segments at any ai and bj.

(1) If ai and bj are associated, the similarity is

Hi-l,j-l + s(ai,bj).

(2) If ai is at the end of a deletion of length k, the similarity is

Hi – k, j - Wk .

(3) If bj is at the end of a deletion of length 1, the similarity is

Hi,j-l - Wl. (typo in paper)

(4) Finally, a zero is included to prevent calculated negative similarity, indicating no similarity up to ai and bj.

Page 17: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 17

Smith-Waterman (3 of 3)The pair of segments with maximum similarity is found by first locating the maximum element of H. The other matrix elements leading to this maximum value are than sequentially determined with a traceback procedure ending with an element of H equal to zero. This procedure identifies the segments as well as produces the corresponding alignment. The pair of segments with the next best similarity is found by applying the traceback procedure to the second largest element of H not associated with the first traceback.

Page 18: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 18

Extend LCS to Global Alignment

si-1,j + (vi, -)si,j = max { si,j-1 + (-, wj)

si-1,j-1 + (vi, wj)

(vi, -) = (-, wj) = - = fixed gap penalty(vi, wj) = score for match or mismatch –

can be fixed, from PAM or BLOSUM

Page 19: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 19

Extend to Local Alignment0 (no negative scores)si-1,j + (vi, -)

si,j = max { si,j-1 + (-, wj)si-1,j-1 + (vi, wj)

(vi, -) = (-, wj) = - = fixed gap penalty(vi, wj) = score for match or mismatch –

can be fixed, from PAM or BLOSUM

Page 20: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 20

Discussion on adding affine gap penalties Affine gap penalty

Score for a gap of length x-( + x)

Where > 0 is the insert gap penalty > 0 is the extend gap penalty

Page 21: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 21

Alignment with Gap Penalties Can apply to global or local (w/ zero) algorithms

si,j = max { si-1,j - si-1,j - ( + )

si,j = max { si1,j-1 - si,j-1 - ( + )

si-1,j-1 + (vi, wj)si,j = max { si,j

si,jNote: keeping with traversal order in Figure 6.1, is replaced by

, and is replaced by

Page 22: LCS and Extensions to Global and Local Alignment

June 26, 2003 LCS and Extensions 22

Programming Workshop 6 Implement LCS

LCS(V,W) b and s are global matrices

Print-LCS(V,i,j) Write a program that uses LCS and

Print-LCS. The program should prompt the user for 2 sequences and print the longest common sequence.