A Table-Driven, Full-Sensitivity Similarity Search Algorithm
Gene Myers and Richard Durbin
Presented by Wang, Jia-Nan and Huang, Yu-Feng
Outline
- Introduction
- Background
- Preliminaries
- Method
- Experiment
Introduction
- Given a query and a database, perform local alignment
- Smith-Waterman: guaranteed to find all local alignments, but expensive
- BLAST and FASTA: faster heuristics that can miss alignments
Improvement
- Hardware: invest in more computers and faster CPUs
- Software: Phil Green's SWAT appeals to sparsity and some machine-level coding tricks
  - 60% of the dynamic programming matrix entries have value 0
  - SWAT avoids computing most of these unproductive entries
- Focus on improving protein similarity searches
- This approach examines and computes only 4% of the underlying dynamic programming matrix
Recall
- Sequence alignment
  - Local sequence alignment
  - Global sequence alignment
- Goal: the matching path with the highest score
- Table-based computation and dynamic programming
Dynamic Programming
- Three basic components
  - Recurrence relation
  - Tabular computation
  - Traceback
Smith-Waterman Method
- Dynamic programming algorithm
- Finds the most similar subsequences of two sequences
- Problem
  - The amount of computation is enormous (a "googol")
  - Programmers will be driven crazy
  - Why? How can it be accelerated?
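To make the cost concrete, here is a minimal sketch of the Smith-Waterman score computation under the simple scoring scheme. The function name and the default scores (+8, -5, gap 20, taken from the example values on the scoring slides) are illustrative assumptions, not the presenters' code. Every cell of the P × N table is visited, which is the work the table-driven method tries to avoid.

```python
def smith_waterman_score(query, target, match=8, mismatch=-5, gap=20):
    """Best local alignment score with a constant per-symbol gap penalty.

    Hypothetical helper for illustration; keeps only two rows of the table.
    """
    prev = [0] * (len(query) + 1)
    best = 0
    for t in target:                      # one table row per target letter
        cur = [0] * (len(query) + 1)
        for i, q in enumerate(query, start=1):
            diag = prev[i - 1] + (match if q == t else mismatch)
            # 0 restarts the alignment; the other terms extend a path
            cur[i] = max(0, diag, prev[i] - gap, cur[i - 1] - gap)
            best = max(best, cur[i])
        prev = cur
    return best
```

For example, `smith_waterman_score("ACGT", "TTACGTTT")` scores the four exact matches and ignores the flanking letters.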
Background
- Scoring systems
  - Simple scoring scheme
  - Affine gap penalty scoring scheme
  - PAM120 (PAMn)
  - BLOSUM62 (BLOSUMn)
Simple Scoring Scheme
- Match (e.g. +8)
- Mismatch (e.g. -5)
- Constant gap penalty (e.g. -20)
Affine Gap Penalty Scoring Scheme
- Match (e.g. +8)
- Mismatch (e.g. -5)
- Gap symbol (e.g. -5)
- Gap open penalty (e.g. -10)
PAM
- PAM: Percent Accepted Mutation, Dayhoff et al. (1978)
- PAM unit
  - Evolutionary time corresponding to an average of 1 mutation per 100 residues (1% accepted)
- PAMn
  - Relates to mutation probabilities in an evolutionary interval of n PAM units
- Some information from: http://www.apl.jhu.edu/~przytyck/CAMS_2004_1b.pdf
PAM120
Source: http://eta.embl-heidelberg.de:8000/misc/mat/pam120.html
BLOSUM62
- BLOSUM: BLOcks SUbstitution Matrix, Steven Henikoff and Jorja G. Henikoff (1992)
- Paper: Amino acid substitution matrices from protein blocks [PubMed]
- BLOSUMn
  - Relates to substitution probabilities observed in blocks of related proteins, with sequences sharing at least n% identity clustered together
- Some information from: http://www.apl.jhu.edu/~przytyck/CAMS_2004_1b.pdf
BLOSUM62
  C S T P A G N D E Q H R K M I L V F Y W
C 9 -1 -1 -3 0 -3 -3 -3 -4 -3 -3 -3 -3 -1 -1 -1 -1 -2 -2 -2
S -1 4 1 -1 1 0 1 0 0 0 -1 -1 0 -1 -2 -2 -2 -2 -2 -3
T -1 1 4 1 -1 1 0 1 0 0 0 -1 0 -1 -2 -2 -2 -2 -2 -3
P -3 -1 1 7 -1 -2 -1 -1 -1 -1 -2 -2 -1 -2 -3 -3 -2 -4 -3 -4
A 0 1 -1 -1 4 0 -1 -2 -1 -1 -2 -1 -1 -1 -1 -1 -2 -2 -2 -3
G -3 0 1 -2 0 6 -2 -1 -2 -2 -2 -2 -2 -3 -4 -4 0 -3 -3 -2
N -3 1 0 -2 -2 0 6 1 0 0 -1 0 0 -2 -3 -3 -3 -3 -2 -4
D -3 0 1 -1 -2 -1 1 6 2 0 -1 -2 -1 -3 -3 -4 -3 -3 -3 -4
E -4 0 0 -1 -1 -2 0 2 5 2 0 0 1 -2 -3 -3 -3 -3 -2 -3
Q -3 0 0 -1 -1 -2 0 0 2 5 0 1 1 0 -3 -2 -2 -3 -1 -2
H -3 -1 0 -2 -2 -2 1 1 0 0 8 0 -1 -2 -3 -3 -2 -1 2 -2
R -3 -1 -1 -2 -1 -2 0 -2 0 1 0 5 2 -1 -3 -2 -3 -3 -2 -3
K -3 0 0 -1 -1 -2 0 -1 1 1 -1 2 5 -1 -3 -2 -3 -3 -2 -3
M -1 -1 -1 -2 -1 -3 -2 -3 -2 0 -2 -1 -1 5 1 2 -2 0 -1 -1
I -1 -2 -2 -3 -1 -4 -3 -3 -3 -3 -3 -3 -3 1 4 2 1 0 -1 -3
L -1 -2 -2 -3 -1 -4 -3 -4 -3 -2 -3 -2 -2 2 2 4 3 0 -1 -2
V -1 -2 -2 -2 0 -3 -3 -3 -2 -2 -3 -3 -2 1 3 1 4 -1 -1 -3
F -2 -2 -2 -4 -2 -3 -3 -3 -3 -3 -1 -3 -3 0 0 0 -1 6 3 1
Y -2 -2 -2 -3 -2 -3 -2 -3 -2 -1 2 -2 -2 -1 -1 -1 -1 3 7 2
W -2 -3 -3 -4 -3 -2 -4 -4 -3 -2 -2 -3 -3 -1 -3 -2 -3 1 2 11
Preliminaries
- Σ: the alphabet over which sequences are composed
- |Σ| × |Σ| substitution matrix S giving the score
- Uniform gap penalty g > 0
- Query = q1 q2 ... qP of P letters
- Target = t1 t2 ... tN of N letters
- Threshold T > 0
Score Table Edit Graph
Picture source: http://searchlauncher.bcm.tmc.edu/help/Pictures/S-Wexample.gif
Problem
- Find a high-scoring local alignment between Query and Target whose path score is ≥ T
- Edit graph (Figure 1)
- Limit our attention to prefix-positive paths
  - If there is a path of score T or greater in the edit graph, then there is a prefix-positive path of score T or greater
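The claim about prefix-positive paths can be checked on a toy example. The sketch below assumes the usual reading of "prefix-positive" (every non-empty prefix of the path has a positive score) and represents a path simply as a list of edge scores; both function names are hypothetical. Dropping everything up to the last point where the running score hits its minimum leaves a prefix-positive path whose score is at least as large.

```python
from itertools import accumulate


def is_prefix_positive(edge_scores):
    """True if every non-empty prefix of the path has a positive score."""
    return all(p > 0 for p in accumulate(edge_scores))


def trim_to_prefix_positive(edge_scores):
    """Drop the leading part of a path up to the last minimum of its running score.

    The remaining suffix is prefix-positive, and because the dropped part
    scores <= 0, the suffix scores at least as much as the whole path.
    """
    running = [0] + list(accumulate(edge_scores))
    low = min(running)
    start = max(k for k, p in enumerate(running) if p == low)
    return edge_scores[start:]
```

With edge scores `[-5, 8, -5, -5, 8, 8, 8]` (total 17), the trimmed path is `[8, 8, 8]` with score 24.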
Definition
- A set P of index-value pairs { (i, v) : i ∈ [0, P]
The start and extension tables
- Consider a vertex x in row j of the edit graph of Query vs. Target
Start Trimming
- Limiting the dynamic programming to the startable vertices requires a table Start(w), where w ranges over the $|\Sigma|^{k_S}$ words of length $k_S$
Start Trimming
- Worst case
- Let α be the expected percentage of vertices that are seeds
Extension Trimming
- A table that eliminates vertices that are not extendable
- (i, j) is an extendable vertex iff $C(i,j) > \mathit{Extend}(i, \mathit{Target}[j+1 \ldots j+k_E])$
A Table-Driven Scheme for DP
- Goal: restrict the SW computation to productive vertices
- Jump table: captures the effect of Advance and Delete over $k_J > 0$ rows
  - $\mathit{Jump}(i, w) = \{(k, u) : \text{the maximal path from } (i, 0) \text{ to } (k, k_J) \text{ in the edit graph of Query versus } w \text{ has score } u\}$
  - $O(|\Sigma|^{k_J} P^2)$ space: unmanageably large
  - But only record those $(k, u)$ for which $u > -(T-1)$
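As a rough illustration of what one Jump entry holds, the sketch below computes, for a fixed query position i and a word w of length k_J, the best path score u from (i, 0) to each (k, k_J) in the edit graph of Query versus w, and keeps only the pairs with u > -(T-1). It is a simplified reading of the definition under the simple scoring scheme with a uniform gap penalty g. The function name and default scores are assumptions, and the real tables are precomputed for all words rather than built on demand.

```python
def jump(query, i, w, T, match=8, mismatch=-5, g=20):
    """Set of (k, u): best score u of a path from (i, 0) to (k, len(w)).

    Hypothetical sketch: columns are query positions, rows are letters of w.
    Entries with u <= -(T - 1) are dropped, as on the slide.
    """
    NEG = float("-inf")
    P = len(query)
    row = [NEG] * (P + 1)
    row[i] = 0
    for k in range(i + 1, P + 1):         # row 0: query letters against gaps
        row[k] = row[k - 1] - g
    for c in w:                           # advance one row per letter of w
        new = [NEG] * (P + 1)
        for k in range(i, P + 1):
            best = row[k] - g             # letter of w against a gap
            if k > i:
                sub = match if query[k - 1] == c else mismatch
                best = max(best, row[k - 1] + sub, new[k - 1] - g)
            new[k] = best
        row = new
    return {(k, u) for k, u in enumerate(row) if u > -(T - 1)}
```

For Query = "AC", i = 0, w = "A" and T = 10, only the pair (1, 8) survives the threshold.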
Jump table
Start table
Space-saving version for Jump and Start tables
Check for paths scoring T or more
- [Formula: a test involving v and Peak(i, Target[j+1 …]), applied to each (i, v) in Cand_j]
Recall – Affine Gap Penalty
- Score
  - Match
  - Mismatch
  - Gap symbol penalty: gsp
  - Gap open penalty: gop
- Affine cost of a gap of length k: g + kh, where g = gop and h = gsp
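A small worked example of the g + kh rule: the sketch below scores an already-aligned pair of gapped strings, charging gop once per run of gap symbols and gsp for every gap symbol. The function name and the default values (+8, -5, gop 10, gsp 5, written as positive penalties) are illustrative choices based on the example numbers given earlier.

```python
def score_alignment(x, y, match=8, mismatch=-5, gop=10, gsp=5):
    """Score two aligned rows of equal length; '-' marks a gap symbol.

    A gap run of length k in one row costs gop + k * gsp in total.
    """
    assert len(x) == len(y)
    total = 0
    prev_gap_row = None                   # row holding the gap in the previous column
    for a, b in zip(x, y):
        if a == "-" or b == "-":
            gap_row = "x" if a == "-" else "y"
            if gap_row != prev_gap_row:   # a new gap run opens here
                total -= gop
            total -= gsp
            prev_gap_row = gap_row
        else:
            total += match if a == b else mismatch
            prev_gap_row = None
    return total
```

Aligning "ACGT" with "A--T" gives 8 + 8 - (10 + 2·5) = -4, whereas two separate gaps of length 1 would each pay the opening penalty.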
Diagram of Affine Gap Penalty
[Figure: edit graph in which each vertex has C, D and I nodes; edge weights -h, -g-h and δ(a_i, b_j)]
Source: kmchao's lecture note
Recurrence system - Gotoh
$$D(i,j) = \max\{\, D(i-1,j) - h,\; C(i-1,j) - g - h \,\}$$
$$I(i,j) = \max\{\, I(i,j-1) - h,\; C(i,j-1) - g - h \,\}$$
$$C(i,j) = \max\{\, C(i-1,j-1) + \delta(a_i, b_j),\; D(i,j),\; I(i,j) \,\}$$
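The Gotoh recurrences translate almost line for line into code. The following is a minimal sketch for the global alignment score, with D fed from (i-1, j), I fed from (i, j-1), and C taking the best of a substitution, D and I. The boundary values and the default scores (+8, -5, g = 10, h = 5) are assumptions made for the example.

```python
def gotoh_score(a, b, match=8, mismatch=-5, g=10, h=5):
    """Global alignment score with affine gap cost g + k*h (Gotoh recurrences)."""
    NEG = float("-inf")
    m, n = len(a), len(b)
    C = [[NEG] * (n + 1) for _ in range(m + 1)]
    D = [[NEG] * (n + 1) for _ in range(m + 1)]
    I = [[NEG] * (n + 1) for _ in range(m + 1)]
    C[0][0] = 0
    for i in range(1, m + 1):             # leading gap of length i
        C[i][0] = -(g + i * h)
    for j in range(1, n + 1):             # leading gap of length j
        C[0][j] = -(g + j * h)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j] - h, C[i - 1][j] - g - h)
            I[i][j] = max(I[i][j - 1] - h, C[i][j - 1] - g - h)
            delta = match if a[i - 1] == b[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + delta, D[i][j], I[i][j])
    return C[m][n]
```

For "ACGT" against "AGT" the best alignment has three matches and one gap of length 1, so the score is 24 - 15 = 9.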
The Case of Affine Gap Costs
- From the simple scoring scheme to the affine gap penalty scheme
- Affine edit graph and vertex structure
- Question: how should the equations defined above be modified?
Recurrence System for Affine Gap Costs
- Two observations
  - Computing the jth row from the (j-1)st requires knowing only the vectors of C and I values in row j-1, and not the D values in that row
  - If $I(i,j) < C(i,j) - g$, then the I value at vertex $(i,j)$ need not be recorded, as any maximal path through its I-vertex will have score less than the maximal path passing through the corresponding C-vertex
Recurrence System
Results
Experiment
- Method: edit-graph-based approach vs. SWAT
- Scoring matrix: PAM120
- Affine gap cost: 8+4n
- Database (target): 3 million residue subset of the PIR database
- Query
  - A periodic clock protein of length 173 (pcp)
  - A lactate dehydrogenase of length 319 (dehydro)
  - A cGMP kinase of length 670 (kinase)
  - A growth factor of length 1210 (g factor)
PAM120 & Gap Cost 8+4n
BLOSUM62 & Gap Cost 8+2n
Thanks for Your Attention
Ending