BLAT – The BLAST-Like Alignment Tool
Kent, W.J.
Genome Res. 2002 12: 656-664
Presenter: 巨彥霖 田知本
BLAT overview
• Use an index to find regions in the genome homologous to the query.
• Do a detailed alignment between the query and the homologous regions.
• Use dynamic programming to stitch the detailed alignment regions together into a detailed alignment of the whole query.
Index
• Database: indexed as non-overlapping K-mers
• Query: scanned as overlapping K-mers
Example
• Database: cacaattatcacgaccgc
  3-mers (non-overlapping): cac aat tat cac gac cgc
  Index: aat 3; cac 0, 9; cgc 15; gac 12; tat 6
• Query: aattctcac
  3-mers (overlapping): aat att ttc tct ctc tca cac
  Positions:            0   1   2   3   4   5   6
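The example above can be reproduced with a short sketch. The function names `build_index` and `find_hits` are illustrative only and are not taken from the BLAT source; positions are 0-based as on the slide.

```python
from collections import defaultdict

def build_index(database, k):
    """Map each non-overlapping k-mer of the database to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(database) - k + 1, k):
        index[database[pos:pos + k]].append(pos)
    return dict(index)

def find_hits(query, index, k):
    """Look up every overlapping k-mer of the query.

    Returns (query_position, database_position) pairs.
    """
    hits = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits

index = build_index("cacaattatcacgaccgc", 3)
hits = find_hits("aattctcac", index, 3)
print(index)
print(hits)
```

Only the query 3-mers aat (position 0) and cac (position 6) occur in the index, so the query yields three hits: one for aat and two for cac.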
Search Criteria
• Single Perfect Matches
• Single Near Perfect Matches
• Multiple Perfect Matches
Notation
• K : K-mer size
• M : The match ratio between homologous areas
• H : Homologous region size
• G : Database (genome) size
• Q : Query sequence size
• A : The alphabet size
Single Perfect Matches (1)
Probability that one K-mer within the homologous region is a perfect match:
$p_1 = M^K$
Single Perfect Matches (2)
The homologous region of size H contains H/K non-overlapping K-mers.
The probability of at least one perfect K-mer match (sensitivity):
$P = 1 - (1 - M^K)^{H/K}$
Single Perfect Matches (3)
• The number of K-mers in the database = G / K
• The number of K-mers in the query = Q – K + 1
The number of K-mers expected to match by chance (specificity):
$F = (Q - K + 1)\,(G/K)\,(1/A)^K$
Single Perfect Nucleotide K-mer Matches as Search Criterion
Case (perfect match)
• Comparing mouse and human coding sequences at the nucleotide level :
H = 100
M = 86%
Sensitivity = 0.99
max K = 7
chance matches = 13078962
(query = 500 , database = 3 billion)
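The case above can be checked numerically. This is a minimal sketch of the single-perfect-match formulas P = 1 - (1 - M^K)^(H/K) and F = (Q - K + 1)(G/K)(1/A)^K, assuming the exponent H/K is rounded down to a whole number of K-mers; the function names are illustrative.

```python
def sensitivity(K, M, H):
    """P = 1 - (1 - M^K)^(H/K), with H/K rounded down (an assumption)."""
    p1 = M ** K
    return 1 - (1 - p1) ** (H // K)

def chance_matches(K, Q, G, A):
    """F = (Q - K + 1) * (G / K) * (1 / A)^K."""
    return (Q - K + 1) * (G / K) * (1 / A) ** K

H, M = 100, 0.86                  # homologous region size, match ratio
Q, G, A = 500, 3_000_000_000, 4   # query size, database size, alphabet size

# Largest K that still reaches a sensitivity of 0.99.
max_k = max(K for K in range(1, H + 1) if sensitivity(K, M, H) >= 0.99)
f_formula = chance_matches(max_k, Q, G, A)
# Same expression with Q in place of Q - K + 1.
f_simplified = Q * (G / max_k) * (1 / A) ** max_k

print(max_k, round(sensitivity(max_k, M, H), 4))
print(round(f_formula), int(f_simplified))
```

With these inputs the largest K comes out as 7. The formula as written gives roughly 1.29 × 10^7 chance matches; the slide's figure of 13078962 appears to correspond to using Q in place of Q – K + 1.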
Single Near Perfect Matches (1)
Almost perfect: one letter may mismatch.
Probability that one K-mer within the homologous region is a near-perfect match:
$p_1 = K\,M^{K-1}(1 - M) + M^K$
Single Near Perfect Matches (2)
• Sensitivity: $P = 1 - (1 - p_1)^{H/K}$
• Specificity: $F = (Q - K + 1)\,(G/K)\,\bigl(K\,(1/A)^{K-1}(1 - 1/A) + (1/A)^K\bigr)$
Case (near perfect match)
• Comparing mouse and human coding sequences at the nucleotide level :
H = 100
M = 86%
Sensitivity = 0.99
max K = 12
chance matches = 275671
(query = 500 , database = 3 billion)
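The near-perfect case can be checked the same way. This sketch evaluates p1 = K·M^(K-1)·(1 - M) + M^K, P = 1 - (1 - p1)^(H/K) and F = (Q - K + 1)(G/K)(K(1/A)^(K-1)(1 - 1/A) + (1/A)^K), again assuming H/K is rounded down; the function names are illustrative.

```python
def near_perfect_p1(K, M):
    """Probability that a K-mer matches with at most one mismatch."""
    return K * M ** (K - 1) * (1 - M) + M ** K

def sensitivity(K, M, H):
    """P = 1 - (1 - p1)^(H/K), with H/K rounded down (an assumption)."""
    return 1 - (1 - near_perfect_p1(K, M)) ** (H // K)

def chance_probability(K, A):
    """Probability that two random K-mers match with at most one mismatch."""
    return K * (1 / A) ** (K - 1) * (1 - 1 / A) + (1 / A) ** K

def chance_matches(K, Q, G, A):
    """F = (Q - K + 1) * (G / K) * chance_probability(K, A)."""
    return (Q - K + 1) * (G / K) * chance_probability(K, A)

H, M = 100, 0.86                  # homologous region size, match ratio
Q, G, A = 500, 3_000_000_000, 4   # query size, database size, alphabet size

max_k = max(K for K in range(1, H + 1) if sensitivity(K, M, H) >= 0.99)
f_formula = chance_matches(max_k, Q, G, A)
# Same expression with Q in place of Q - K + 1.
f_simplified = Q * (G / max_k) * chance_probability(max_k, A)

print(max_k, round(sensitivity(max_k, M, H), 4))
print(round(f_formula), int(f_simplified))
```

Here the largest K reaching a sensitivity of 0.99 is 12. The formula as written gives roughly 2.7 × 10^5 chance matches; the slide's 275671 appears to correspond to using Q in place of Q – K + 1.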
Single Near Perfect Nucleotide K-mer Matches as Search Criterion
Multiple Perfect Matches
• A hit is triggered when:
  – there are N perfect matches,
  – each no further than W letters from each other in the database coordinate,
  – all having the same diagonal coordinate.
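The trigger rule can be sketched as follows. This is one possible reading, not BLAT's actual implementation: the diagonal is taken as database position minus query position, and "no further than W letters" is applied to neighbouring hits after sorting along the database coordinate. The function name and the sample coordinates are hypothetical.

```python
from collections import defaultdict

def triggered_groups(hits, n, w):
    """Return runs of at least n hits that share a diagonal, where each hit
    lies no more than w database letters after the previous one.

    hits is a list of (query_pos, database_pos) pairs.
    """
    by_diagonal = defaultdict(list)
    for qpos, dpos in hits:
        by_diagonal[dpos - qpos].append((qpos, dpos))

    groups = []
    for diagonal_hits in by_diagonal.values():
        diagonal_hits.sort(key=lambda h: h[1])
        run = [diagonal_hits[0]]
        for hit in diagonal_hits[1:]:
            if hit[1] - run[-1][1] <= w:
                run.append(hit)
            else:
                if len(run) >= n:
                    groups.append(run)
                run = [hit]
        if len(run) >= n:
            groups.append(run)
    return groups

# Hypothetical coordinates: only the second and fourth hits share a diagonal.
sample_hits = [(2, 40), (10, 20), (30, 5), (25, 35)]
groups = triggered_groups(sample_hits, n=2, w=20)
print(groups)
```

Grouping by diagonal first keeps the check cheap: hits on different diagonals can never satisfy the criterion together, so only hits within one group need to be compared.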
Example
[Figure: hits a, b, c, and d plotted by target coordinate and query coordinate, with a window of width W.]
The hits a, b, c, and d are all K letters long. Hits b and d have the same diagonal coordinate and lie within W letters of each other, so they satisfy the two-perfect-K-mer search criterion.
Multiple Perfect Nucleotide K-mer Matches as Search Criterion
Default
• Nucleotide
  – two perfect 11-mers
• Protein
  – single perfect 5-mer for the standalone version
  – three perfect 4-mers for the client/server version
BLAST
1) Build the hash table for Sequence A.
2) Scan Sequence B for hits.
3) Extend hits.
BLAST Step 1: Build the hash table for Sequence A. (3-tuple example)
For DNA sequences:
Seq. A = AGATCGAT
         12345678
Hash table (entries run from AAA to TTT; only non-empty ones shown):
AGA 1
ATC 3
CGA 5
GAT 2, 6
TCG 4
For protein sequences:
Seq. A = ELVIS
Add xyz to the hash table if Score(xyz, ELV) ≥ T;
Add xyz to the hash table if Score(xyz, LVI) ≥ T;
Add xyz to the hash table if Score(xyz, VIS) ≥ T;
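The DNA part of this step can be sketched in a few lines. The function name `build_word_table` is illustrative; positions are 1-based to match the slide. The protein case is not covered here because it needs a substitution matrix and a threshold T.

```python
from collections import defaultdict

def build_word_table(seq, w=3):
    """Map every overlapping w-tuple of seq to its 1-based start positions."""
    table = defaultdict(list)
    for i in range(len(seq) - w + 1):
        table[seq[i:i + w]].append(i + 1)  # 1-based, as on the slide
    return dict(table)

table = build_word_table("AGATCGAT")
for word in sorted(table):
    print(word, table[word])
```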
BLAST Step 2: Scan sequence B for hits.
BLAST Step 2: Scan sequence B for hits.
Step 3: Extend hits.
Terminate if the score of the extension fades away, that is, when we reach a segment pair whose score falls a certain distance below the best score found for shorter extensions.
BLAST 2.0 saves the time spent in extension and considers gapped alignments.
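The termination rule for extension can be sketched as below. This is a simplified, one-directional, ungapped illustration: the scores of +1 per match and -1 per mismatch and the parameter name `x_drop` are assumptions, not BLAST's actual defaults.

```python
def extend_right(a, b, i, j, x_drop, match=1, mismatch=-1):
    """Extend an ungapped alignment to the right from a[i], b[j].

    Stops once the running score is x_drop or more below the best score
    seen so far. Returns (best_score, length_of_best_extension).
    """
    score = best = 0
    best_len = length = 0
    while i + length < len(a) and j + length < len(b):
        score += match if a[i + length] == b[j + length] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= x_drop:
            break
    return best, best_len

# Seven matching letters followed by mismatches: the extension stops early.
print(extend_right("ACGTACGTTTTT", "ACGTACGAAAAA", 0, 0, x_drop=3))
# A single mismatch is tolerated because the score recovers afterwards.
print(extend_right("ACGTTACG", "ACGATACG", 0, 0, x_drop=3))
```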
Algorithm
1. Search Stage
  – Use an index to find regions in the genome homologous to the query
2. Alignment Stage
  – Do a detailed alignment between the query and the homologous regions
3. Stitching and Filling In
  – Use dynamic programming to stitch the detailed alignment regions together into a detailed alignment of the whole query
Search Stage
• Build an index which contains positions of each K-mer in database.
• Step through each overlapping K-mer in query and look it up in index
• Get list of ‘hits’ - positions in query and in database that match for K bases
• Cluster hits to find homologous regions
Search Stage
• Clump hits
Search Stage
• Clump ‘clumps’
• Eliminate small clumps; the remaining clumps define the homologous regions
Alignment Stage (nucleotide)
• Start from scratch with regions defined with K-mers
• Index on smaller K-mers, but extend each K-mer until it becomes specific
• Extend in both directions without mismatches or gaps and merge overlapping or contiguous alignments
• Recurse on gaps with smaller K until the gap is eliminated or no hits remain
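The extend-and-merge step can be illustrated with a minimal sketch. It covers only exact extension of a K-mer hit and merging of blocks on the same diagonal; the recursion on gaps is left out. The function names and the block layout (query start, target start, length) are assumptions made for this example.

```python
def extend_exact(query, target, qpos, tpos, k):
    """Grow an exact k-mer hit in both directions while letters stay identical.

    Returns (query_start, target_start, length) of the ungapped block.
    """
    qs, ts = qpos, tpos
    while qs > 0 and ts > 0 and query[qs - 1] == target[ts - 1]:
        qs -= 1
        ts -= 1
    qe, te = qpos + k, tpos + k
    while qe < len(query) and te < len(target) and query[qe] == target[te]:
        qe += 1
        te += 1
    return (qs, ts, qe - qs)

def merge_blocks(blocks):
    """Merge blocks that share a diagonal and overlap or touch."""
    merged = []
    # Sort by diagonal, then by query start, so mergeable blocks are adjacent.
    for qs, ts, length in sorted(blocks, key=lambda b: (b[1] - b[0], b[0])):
        if merged:
            pqs, pts, plen = merged[-1]
            if pts - pqs == ts - qs and qs <= pqs + plen:
                new_end = max(pqs + plen, qs + length)
                merged[-1] = (pqs, pts, new_end - pqs)
                continue
        merged.append((qs, ts, length))
    return merged

block = extend_exact("GGACGTACGTCC", "TTACGTACGTAA", 4, 4, 3)
print(block)
print(merge_blocks([(2, 2, 3), (4, 4, 3), (9, 1, 2)]))
```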
Alignment Stage (nucleotide)
recursive
Alignment Stage (protein)
• Extend hits into maximal scoring ungapped alignments (HSPs) with a +2/-1 scoring scheme
• Create a graph of all possible HSP merges
• Use dynamic programming to traverse the graph
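The first bullet can be sketched as follows: a hit is extended in both directions and the best-scoring extent is kept, with +2 per match and -1 per mismatch. The function names are illustrative, and this sketch scans to the ends of the sequences rather than stopping early, which a real implementation would likely avoid.

```python
def best_extension(a, b, i, j, step, match=2, mismatch=-1):
    """Walk from (i, j) in direction step (+1 or -1).

    Returns (best_score_gain, number_of_letters_in_best_extension).
    """
    score = best = 0
    best_len = n = 0
    while 0 <= i < len(a) and 0 <= j < len(b):
        score += match if a[i] == b[j] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        i += step
        j += step
    return best, best_len

def extend_to_hsp(query, target, qpos, tpos, k, match=2, mismatch=-1):
    """Extend a k-letter hit into a maximal scoring ungapped alignment.

    Returns (query_start, target_start, length, score).
    """
    seed = sum(match if query[qpos + x] == target[tpos + x] else mismatch
               for x in range(k))
    left, left_len = best_extension(query, target, qpos - 1, tpos - 1, -1,
                                    match, mismatch)
    right, right_len = best_extension(query, target, qpos + k, tpos + k, +1,
                                      match, mismatch)
    return (qpos - left_len, tpos - left_len,
            left_len + k + right_len, seed + left + right)

hsp = extend_to_hsp("MKVLAAGIW", "QKVLSAGIP", 1, 1, 3)
print(hsp)
```

In this toy example the hit KVL grows rightward across one mismatch to cover KVLAAGI against KVLSAGI: six matches and one mismatch.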
Alignment Stage (protein)
Alignment Stage (protein)
query
homologous region
HSP
Stitching and Filling In
• The alignment of a gene is often scattered across multiple homologous regions found in the search stage
query
database
Stitching and Filling In
query
database
homologous region
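Stitching by dynamic programming can be sketched as a chaining problem: pick the highest-scoring set of blocks that appear in the same order in both query and database. This is a simplified illustration that ignores gap penalties and the filling-in step; the block layout (query start, database start, length, score) and all sample values are hypothetical.

```python
def stitch(blocks):
    """Chain ungapped blocks into the best-scoring colinear series.

    blocks is a list of (q_start, d_start, length, score).
    Returns (total_score, chain) with the chain in query order.
    """
    if not blocks:
        return 0, []
    order = sorted(blocks, key=lambda b: (b[0], b[1]))
    best = [b[3] for b in order]   # best chain score ending at each block
    prev = [None] * len(order)     # back-pointer to the previous block
    for j, (qj, dj, _, sj) in enumerate(order):
        for i in range(j):
            qi, di, li, _ = order[i]
            # Block i may precede block j only if it ends before j starts
            # in both the query and the database.
            if qi + li <= qj and di + li <= dj and best[i] + sj > best[j]:
                best[j] = best[i] + sj
                prev[j] = i
    end = max(range(len(order)), key=best.__getitem__)
    total = best[end]
    chain = []
    while end is not None:
        chain.append(order[end])
        end = prev[end]
    return total, chain[::-1]

sample = [(0, 100, 50, 100), (60, 400, 40, 80),
          (55, 160, 45, 90), (110, 230, 30, 60)]
total, chain = stitch(sample)
print(total, chain)
```

The block at database position 400 is left out of the chain because it cannot be placed in order with the blocks at 160 and 230.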
Evaluation
• Comparison with Other Tools:
  – mRNA/Genome Alignments
  – Remapped 713 mRNAs corresponding to annotated chromosome 22
  – BLAT took 26 sec while Sim4 took 17,468 sec (almost 5 h)

| | Est_genome | Sim4 | BLAT |
|---|---|---|---|
| Relative speed | 1 | 333 | 223,000 |
| Base accuracy | N/A | 99.66% | 99.99% |
| Gene accuracy | 77.7% | 93.4% | 99.5% |
Evaluation
• Comparison with Other Tools:
  – Translated Mouse/Human Alignments
  – 13 million mouse genomic reads vs. human chromosome 22

| | WU-TBLASTX | BLAT |
|---|---|---|
| Relative speed | 1x | 73x |
| % RefSeq covered | 84.5% | 86.7% |
| % genome covered | 2.67% | 2.89% |
BLAT vs. BLAST
• Index: BLAST indexes the query; BLAT indexes the database
• Hits: perfect vs. near perfect
• Alignment: BLAST reports homologous areas as separate alignments; BLAT stitches them together
Magic Time !
Reference
• http://amber.cs.umd.edu/class/838-s04/nada.ppt
• http://bioportal.weizmann.ac.il/course/ATIB/ATIB03_lecture3.print.pdf