Approximate Boyer-Moore String Matching
Source : SIAM Journal on Computing, Vol. 22, No. 2, 1993, pp.243-260
J. Tarhio and E. Ukkonen
Advisor: Prof. R. C. T. Lee
Speaker: Kuei-hao Chen
• The k mismatches problem
• The k differences problem
Definition of the k mismatches problem
• Given a pattern string P of length m and a text string T of length n, we would like to find all approximate occurrences of P in T with at most k mismatches.
[Figure: for k = 1, an occurrence of the pattern in the text that differs in one position (a against b).]
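As a reference point for this definition, a brute-force check can be sketched as follows. The function name `k_mismatch_occurrences` and the convention of reporting 1-based end positions are assumptions made for illustration; this is the naive O(mn) method, not the Boyer-Moore variant of the paper.

```python
def k_mismatch_occurrences(text, pattern, k):
    """Return the 1-based end positions of all windows of `text` that
    differ from `pattern` in at most k positions (naive method)."""
    n, m = len(text), len(pattern)
    ends = []
    for start in range(n - m + 1):
        window = text[start:start + m]
        # Count the positions where window and pattern disagree.
        mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
        if mismatches <= k:
            ends.append(start + m)
    return ends
```

Every faster algorithm for the k mismatches problem should report exactly the same positions as this check, which makes it a convenient oracle for testing.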
Consider the following situation, in which a pattern P is aligned with a window W of T and (k+1) mismatches have already been found:
[Figure: T with window W, P aligned below it, and the k+1 mismatches marked.]
Since there are already (k+1) mismatches, we must move the pattern. The following is clear:
P must be moved far enough that, in the new alignment, a suffix S of W and the substring S' of P aligned with it have at most k mismatches.
[Figure: T, the shifted P, the k+1 mismatches of the old alignment, the suffix S of W and the aligned substring S' of P.]
Our trick is as follows: Consider the (k+1)-suffix of W. There are two cases:
Case 1: Some character x in this (k+1)-suffix also occurs in P to the left of the position currently aligned with it, as shown below. Move the pattern to the right until the two occurrences of x are aligned. After this move, there are at most k mismatches between the (k+1)-suffix and its corresponding substring in P, because at least one of the k+1 positions matches.
[Figure: T and P before and after the shift; a character x in the (k+1)-suffix of W is aligned with an occurrence of x in P.]
Case 2: No such character exists. Move the pattern so that the k-prefix of P aligns with the k-suffix of W, as shown below. In this situation, again, there are at most k mismatches between the k-suffix of W and the k-prefix of P, because only k positions overlap.
[Figure: T, the (k+1)-suffix of W, and P shifted so that its k-prefix lies under the k-suffix of W.]
The generalization of the BM algorithm for the k mismatches problem will be very natural: for k=0 the generalized algorithm is exact string matching.
Recall that the k mismatches problem asks for finding all occurrences of P in T such that in at most k positions of P, T and P have different characters.
We simply scan the pattern from right to left until we have found k+1 mismatches (unsuccessful search) or the beginning of the pattern is reached (successful search).
Preprocessing phase for approximate matching
Dk table
For each end position i in m-k, …, m and each character a of the alphabet Σ, Dk[i, a] is the distance from position i back to the rightmost occurrence of a in p1…p(i-1). If a does not occur there, Dk[i, a] = m.
Example: Let k=1, m=8, a ∈ Σ
j   1 2 3 4 5 6 7 8
P:  G C A G A G A G
For i=7, only the prefix p1…p7 = G C A G A G A is considered.
Σ            A  C  G  *
D1[i=8, a]   1  6  2  8
D1[i=7, a]   2  5  1  8
(* stands for any other character.)
Algorithm for preprocessing phase
P = p1p2…pm, T = t1t2…tn
Preprocessing
For each a ∈ Σ Do
    For j = m downto m-k Do Begin
        dk[j, a] ← m
        Find the occurrence of a in P that is closest to pj on its left. If it is found, compute the distance between the position of that a and j, and store it in dk[j, a].
    End
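The preprocessing step can be sketched in Python as follows. The name `build_dk`, the dictionary layout and the explicit `alphabet` argument are assumptions made for illustration, and the loop is a direct O(km) version rather than an optimized one. Positions are 1-based to follow the slides.

```python
def build_dk(pattern, k, alphabet):
    """Return dk as {i: {a: distance}} for i = m-k .. m (1-based).

    dk[i][a] is the smallest s >= 1 with p_(i-s) == a, or m if the
    character a does not occur to the left of position i.
    """
    m = len(pattern)
    dk = {}
    for i in range(max(1, m - k), m + 1):
        row = {a: m for a in alphabet}      # default: a not found
        for s in range(1, i):               # s = 1 .. i-1
            a = pattern[i - s - 1]          # p_(i-s) in 0-based indexing
            if a in row and row[a] == m:    # keep the nearest occurrence only
                row[a] = s
        dk[i] = row
    return dk
```

For P = GCAGAGAG and k = 1 this yields the rows 1 6 2 8 (i = 8) and 2 5 1 8 (i = 7) for the characters A, C, G and any character not in P.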
Algorithm for searching phase
P = p1p2…pm, T = t1t2…tn
Searching
j = m;
While j ≤ n + k Do Begin
    h = j; i = m; mismatch = 0; d = m - k;
    While i > 0 and mismatch ≤ k Do Begin
        If i ≥ m - k Then d = min(d, dk[i, th]);
        If th ≠ pi Then mismatch = mismatch + 1;
        i = i - 1; h = h - 1
    End of while;
    If mismatch ≤ k Then report match at position j;
    j = j + d
End of while
For k = 1 the shift is d = min(dk[m, tj], dk[m-1, tj-1], m-1).
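A minimal Python sketch of the searching phase follows. It makes some assumptions that are not taken from the slides: the outer loop stops at j ≤ n so that every text index is valid, the shift d starts at m - k (the Case 2 shift), the alphabet defaults to the characters seen in the text and pattern, and the names `build_dk` and `search_k_mismatches` are illustrative. Reported positions are 1-based end positions.

```python
def build_dk(pattern, k, alphabet):
    """dk[i][a]: smallest s >= 1 with p_(i-s) == a, else m (i = m-k .. m)."""
    m = len(pattern)
    dk = {}
    for i in range(max(1, m - k), m + 1):
        row = {a: m for a in alphabet}
        for s in range(1, i):
            a = pattern[i - s - 1]
            if a in row and row[a] == m:
                row[a] = s
        dk[i] = row
    return dk


def search_k_mismatches(text, pattern, k, alphabet=None):
    """Return 1-based end positions where pattern occurs with <= k mismatches."""
    n, m = len(text), len(pattern)
    if not 0 <= k < m:
        raise ValueError("this sketch assumes 0 <= k < m")
    if alphabet is None:
        alphabet = set(text) | set(pattern)
    dk = build_dk(pattern, k, alphabet)
    matches = []
    j = m                                  # text position under p_m
    while j <= n:
        h, i, mismatches = j, m, 0
        d = m - k                          # largest shift ever allowed
        while i > 0 and mismatches <= k:
            if i >= m - k:                 # last k+1 pattern positions
                d = min(d, dk[i][text[h - 1]])
            if text[h - 1] != pattern[i - 1]:
                mismatches += 1
            i -= 1
            h -= 1
        if mismatches <= k:
            matches.append(j)
        j += d
    return matches
```

A shift smaller than d would leave all of the last k+1 text characters of the window mismatched, which is why no occurrence can be skipped.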
Complete example for approximate string matching
Example 1: Let k=1, m=4, n=17
T: T T A A C G T A A T G C A G C T A
P: A G C T
Σ            A  C  G  T
D1[i=4, a]   3  1  2  4
D1[i=3, a]   2  3  1  4
Example 1 (1/6 to 6/6)
[Figures: six successive alignments of P against T.]
In the last step (6/6), j = 16:
j ← 16 + d, j ← 16 + 3, j ← 19
Since j = 19 is greater than n + k = 18, we jump out of the while loop.
Example 2: Let k=1, m=8, n=24
T: G C A T C G C A G A G A G T A T G C A G A G C G
P: G C A G A G A G
Σ            A  C  G  *
D1[i=8, a]   1  6  2  8
D1[i=7, a]   2  5  1  8
Example 2 (1/14 to 5/14)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
T: G C A T C G C A G A G A G T A T G C A G A G C G
P: G C A G A G A G
[Figures: successive alignments of P against T.]
In step 5/14, P is aligned with t6…t13 and mismatch ≤ k, so we report a match at position j = 13.
j ← 13 + d, j ← 13 + 2, j ← 15
Example 2 (6/14 to 14/14)
[Figures: successive alignments of P against T.]
In step 14/14, j = 24 and mismatch ≤ k, so we report a match at position j = 24.
j ← 24 + d, j ← 24 + 2, j ← 26
Since j = 26 is greater than n + k = 25, we jump out of the while loop.
Time complexity
• The preprocessing phase takes O(m + kc) time and O(kc) space, where c is the size of the alphabet Σ.
• The searching phase takes O(mn) time in the worst case.
Definition of the k differences problem
• Given a pattern string P of length m and a text string T of length n, we would like to find all approximate occurrences of P in T with edit distance not larger than k.
The basic approach to solving the problem is to find, for every i, the minimum edit distance between P and the substrings of T ending at ti [Ukk85b]:
Let Edit be an (m+1) by (n+1) table such that Edit(i, j) is the minimum edit distance between p1p2…pj and any substring of T ending at ti.
$$
\mathrm{Edit}(i, 0) = 0, \quad 0 \le i \le n, \qquad \mathrm{Edit}(0, j) = j, \quad 0 \le j \le m
$$
$$
\mathrm{Edit}(i, j) = \min
\begin{cases}
\mathrm{Edit}(i-1,\, j) + 1 \\
\mathrm{Edit}(i,\, j-1) + 1 \\
\mathrm{Edit}(i-1,\, j-1) + (\text{if } p_j = t_i \text{ then } 0 \text{ else } 1)
\end{cases}
$$
In this basic approach, table Edit is evaluated completely, column by column, in O(mn) time.
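The full evaluation of the table can be sketched as follows. The names `edit_table` and `k_difference_ends` are assumptions for illustration; `table[j][i]` stores Edit(i, j), so rows follow the pattern and columns follow the text.

```python
def edit_table(text, pattern):
    """Return the (m+1) x (n+1) table with table[j][i] = Edit(i, j)."""
    n, m = len(text), len(pattern)
    table = [[0] * (n + 1) for _ in range(m + 1)]   # Edit(i, 0) = 0
    for j in range(1, m + 1):
        table[j][0] = j                              # Edit(0, j) = j
    for i in range(1, n + 1):                        # column by column
        for j in range(1, m + 1):
            cost = 0 if pattern[j - 1] == text[i - 1] else 1
            table[j][i] = min(
                table[j][i - 1] + 1,         # Edit(i-1, j) + 1
                table[j - 1][i] + 1,         # Edit(i, j-1) + 1
                table[j - 1][i - 1] + cost,  # Edit(i-1, j-1) + 0 or 1
            )
    return table


def k_difference_ends(text, pattern, k):
    """1-based text positions i with Edit(i, m) <= k."""
    last_row = edit_table(text, pattern)[-1]
    return [i for i in range(1, len(text) + 1) if last_row[i] <= k]
```

Only the last row is needed to report occurrences, so a real implementation could keep a single column in memory; the full table is kept here because it is easier to inspect.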
If we can identify the positions i at which the edit distance between P and every substring of T ending at ti must be larger than k, we may skip those positions.
This paper is based upon Rule 7 proposed by Professor Lee.
Rule 7
• If k characters in String A do not appear in String B, Distance(A,B) is not smaller than k.
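Rule 7 can be checked on small inputs with a sketch like the one below. Both function names are hypothetical, and `edit_distance` is the standard Levenshtein distance between two whole strings, used here only to compare against the bound.

```python
def rule7_lower_bound(a, b):
    """Number of positions of a whose character never occurs in b.
    By Rule 7 the edit distance between a and b is at least this value."""
    present = set(b)
    return sum(1 for ch in a if ch not in present)


def edit_distance(a, b):
    """Standard Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]
```

Each counted character has to be deleted or substituted by its own edit operation, which is why the count can never exceed the true distance.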
Before describing the scanning phase, we define some terms.
A diagonal h of Edit, for h = -m, …, n, consists of all Edit(i, j) such that i - j = h.
For every Edit(i, j), there is a minimizing arc from Edit(i-1, j) to Edit(i, j) if Edit(i, j) = Edit(i-1, j)+1, from Edit(i, j-1) to Edit(i, j) if Edit(i, j) = Edit(i, j-1)+1, and from Edit(i-1, j-1) to Edit(i, j) if Edit(i, j) = Edit(i-1, j-1) where pj = ti, or if Edit(i, j) = Edit(i-1, j-1)+1 where pj ≠ ti. The costs of the arcs are 1, 1, 0 and 1, respectively.
[Figure: the minimizing arcs into Edit(i, j) from Edit(i, j-1), Edit(i-1, j) and Edit(i-1, j-1), labelled deletion, insertion and substitution; the diagonal arc costs 0 when pj = ti and 1 when pj ≠ ti.]
A minimizing path is any path that consists of minimizing arcs and leads from an entry Edit(i, 0) on the first row of Edit to an entry Edit(h, m) on the last row of Edit.
A minimizing path is successful if it leads to an entry Edit(h, m)≤k.
Lemma 1: The entries on a successful minimizing path M are contained in ≤ k+1 successive diagonals of Edit.
Proof: M moves to a neighbouring diagonal only through an insertion or a deletion. If M touched more than (k+1) diagonals, it would contain more than k insertions or deletions, so its cost would be larger than k and M would not be successful. Thus M cannot touch more than (k+1) diagonals.
[Figure: the Edit table with the text t1t2…tn along the top and the pattern p1…pm down the side, showing a successful minimizing path inside a group of successive diagonals.]
T: ABCABBA
P: CBABAC
        A  B  C  A  B  B  A
     0  0  0  0  0  0  0  0
C    1  1  1  0  1  1  1  1
B    2  2  1  1  1  1  1  2
A    3  2  2  2  1  2  2  1
B    4  3  2  3  2  1  2  2
A    5  4  3  3  3  2  2  2
C    6  5  4  3  4  3  3  3
S: C-AB--
P: CBABAC
EDIT(P, S)=3
There are (k+1)=3+1=4 successive diagonals because there are three deletions.
T: BCABDAB
P: CBADB
k=3
        B  C  A  B  D  A  B
     0  0  0  0  0  0  0  0
C    1  1  0  1  1  1  1  1
B    2  1  1  1  1  2  2  1
A    3  2  2  1  2  2  2  2
D    4  3  3  2  2  2  3  3
B    5  4  4  3  3  3  3  3
S: C-ABDAB
P: CBA-D-B
EDIT(P, S)=3
There are 1+2=3 < (k+1)=3+1=4 successive diagonals because there is one deletion and two insertions.
By Lemma 1, any successful minimizing path that starts at the top of a diagonal h stays within a band of 1+k+k = 2k+1 diagonals: diagonal h itself and k diagonals on either side of it.
[Figure: the Edit table with a minimizing path M starting on diagonal h and a band of width 2k+1 around h.]
T: ABCABBA
P: CBABAC
k=3
S: C-AB--
P: CBABAC
EDIT(P, S)=3
Result: In the Edit table for this T and P, the successful minimizing path lies entirely within a band of 2k+1 = 7 diagonals.
The part of the pattern that falls inside this band around a pattern symbol is called its k-environment.
For each j = 1, …, m, let the k-environment of the pattern symbol pj be the string Cj = pj-k…pj+k, where pa = ε for a < 1 and a > m.
[Figure: P with the k-environment pj-k … pj-1 pj pj+1 … pj+k marked.]
The longest vertical path in any minimizing path has length not greater than 2k+1. Hence, for a text character ti aligned with pj, we only have to determine whether ti appears in the k-environment of pj.
[Figure: the Edit table with the band of width 2k+1 around diagonal h, and a text character ti together with the pattern symbol pj aligned with it.]
Given T=ATGCGAGAGAT, P=GCAGAGAGATG, and k=2. We select three text characters t5, t8 and t11, which are aligned with p5, p8 and p11.
The 2-environment of p5 is C5=p3p4p5p6p7=AGAGA.
The 2-environment of p8 is C8=p6p7p8p9p10=GAGAT.
The 2-environment of p11 is C11=p9p10p11=ATG.
[Figure: T above P, with t5, t8, t11 and p5, p8, p11 marked.]
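The k-environment is simply a clipped slice of the pattern, which the following sketch makes explicit. The function name `k_environment` is an assumption, and j is 1-based as on the slides.

```python
def k_environment(pattern, j, k):
    """Return C_j = p_(j-k) ... p_(j+k) for 1-based j, clipped to the
    pattern boundaries (symbols outside 1..m count as empty)."""
    m = len(pattern)
    if not 1 <= j <= m:
        raise ValueError("j must be between 1 and m")
    left = max(0, j - 1 - k)    # 0-based index of p_(j-k), clipped at p_1
    right = min(m, j + k)       # one past the index of p_(j+k), clipped at p_m
    return pattern[left:right]
```

With P = GCAGAGAGATG and k = 2, the calls for j = 5, 8 and 11 reproduce the three environments listed above.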
We now obtain a stronger version of Rule 7.
Lemma 2: Let a successful minimizing path M go through some entry on a diagonal h of Edit. Then for at most k indexes j, 1≤j≤m, character th+j does not occur in the k-environment Cj.
A formal proof can be found in the paper. In the following, we give some intuition for it.
In this case, although there are two mismatches, deleting the character a, which mismatches x, may produce a perfect match. Thus the edit distance between T and P may still be 1.
[Figure: T and P with k=1, showing the characters x, y, a and b around the two mismatches.]
In this case, deleting one character in P cannot produce a perfect match. Thus, the edit distance between T and P must be larger than 1.
[Figure: T and P with k=1, showing the characters x, y, a, b and c around the mismatches.]
The shift table is based on table Dk. We determine the first diagonal after h, say h+d, where at least one of the characters th+m, th+m-1, …, th+m-k matches the corresponding character of P. Finally, the length of the shift is the maximum of k+1 and d.
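The shift rule can be sketched as below. It assumes that Dk is the same table as in the k mismatches preprocessing (distance back to the nearest earlier occurrence, or m if there is none); the helper names `dk_entry` and `shift_length` are hypothetical, and `end` is the 1-based text position h+m under the last pattern symbol.

```python
def dk_entry(pattern, i, a):
    """Dk[i, a]: smallest s >= 1 with p_(i-s) == a, or m if there is none."""
    m = len(pattern)
    for s in range(1, i):
        if pattern[i - s - 1] == a:
            return s
    return m


def shift_length(text, pattern, k, end):
    """Shift for a window whose last character is text position `end`:
    max(k+1, d), where d is the minimum of Dk[m-s, t_(end-s)] for s = 0..k."""
    m = len(pattern)
    d = min(dk_entry(pattern, m - s, text[end - s - 1]) for s in range(k + 1))
    return max(k + 1, d)
```

Computing each entry on demand keeps the sketch short; a real implementation would look the values up in the precomputed table instead.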
When the algorithm finds a possible occurrence of P in T, the dynamic programming (DP) approach is used immediately to compute the alignment result.
Algorithm
Input: P = p1p2…pm, T = t1t2…tn and k
Output: All occurrences of P in T
Initially, the start position h of T = 0;
While h + m ≤ n + k do begin
    i = h + m; j = m; bad = 0;
    While j > k and bad ≤ k do begin
        If ti does not occur in Cj then bad = bad + 1;
        j = j - 1; i = i - 1
    end;
    If bad ≤ k then
        W is the sequence from th+1-k to th+m+k.
        Use dynamic programming to align W with P.
        Output the alignment result.
    Calculate the shift d = min(Dk[m, th+m], Dk[m-1, th+m-1], …, Dk[m-k, th+m-k]);
    h = h + max(k+1, d)
end;
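The filtering test inside the scanning loop can be sketched as follows. The names `k_environment` and `count_bad` are assumptions. Unlike the pseudocode, this sketch examines every pattern position j = m, …, 1, which Lemma 2 permits, and it stops as soon as more than k bad characters have been seen. `end` is the 1-based text position under pm and must be at least m.

```python
def k_environment(pattern, j, k):
    """C_j = p_(j-k) ... p_(j+k) for 1-based j, clipped to the pattern."""
    return pattern[max(0, j - 1 - k):min(len(pattern), j + k)]


def count_bad(text, pattern, k, end):
    """Scan the window ending at text position `end` from right to left and
    count text characters that do not occur in the k-environment of the
    pattern symbol they are aligned with. Stops once the count exceeds k."""
    bad = 0
    i = end                                   # 1-based text position
    for j in range(len(pattern), 0, -1):      # j = m, m-1, ..., 1
        if text[i - 1] not in k_environment(pattern, j, k):
            bad += 1
            if bad > k:
                break                         # window cannot hold an occurrence
        i -= 1
    return bad
```

A window is handed to dynamic programming only when `count_bad(...) <= k`; otherwise it is skipped and the pattern is shifted.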
Complete example for approximate string matching
For example: Let k=1, m=8, n=24
T: G C A T C G C A G A G A G T A T G C A G A G C G
P: G C A G A G A G
Σ            A  C  G  *
D1[i=8, a]   1  6  2  8
D1[i=7, a]   2  5  1  8
Example (1/15)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
T: G C A T C G C A G A G A G T A T G C A G A G C G
P: G C A G A G A G
k=1
t8=A appears in P(7,8)
t7=C does not appear in P(6,8)
t6=G appears in P(5,7)
t5=C does not appear in P(4,6)
The number of bad characters is > k. Shifting is needed now.
Example (2/15)
k=1
t9=G appears in P(7,8)
t8=A appears in P(6,8)
t7=C does not appear in P(5,7)
t6=G appears in P(4,6)
t5=C does not appear in P(3,5)
The number of bad characters is > k. Shifting is needed now.
Example (3/15)
k=1
t11=G appears in P(7,8)
t10=A appears in P(6,8)
t9=G appears in P(5,7)
t8=A appears in P(4,6)
t7=C does not appear in P(3,5)
t6=G appears in P(2,4)
t5=C appears in P(1,3)
t4=T does not appear in P(1,2)
The number of bad characters is > k. Shifting is needed now.
Example (4/15)
The window now passes the test (bad ≤ k), so W is aligned with P by dynamic programming.
W= CGCAGAGAGT
P= GCAGAGAG
k=1
        C  G  C  A  G  A  G  A  G  T
     0  0  0  0  0  0  0  0  0  0  0
G    1  1  0  1  1  0  1  0  1  0  1
C    2  1  1  0  1  1  1  1  1  1  1
A    3  2  2  1  0  1  1  2  1  2  2
G    4  .  .  2  1  0  1  1  2  1  2
A    5  .  .  .  2  1  0  1  1  2  2
G    6  .  .  .  .  2  1  0  1  1  2
A    7  .  .  .  .  .  2  1  0  1  2
G    8  .  .  .  .  .  .  2  1  0  1
(. marks an entry that is not shown on the slide.)
Output:
GCAGAGAG
GCAGAGAG
Example (5/15)
W= CGCAGAGAGT
P= GCAGAGAG
(Same Edit table as in Example (4/15).)
Output:
GCAGAGA-
GCAGAGAG
Example (6/15)
W= CGCAGAGAGT
P= GCAGAGAG
(Same Edit table as in Example (4/15).)
Output:
-CAGAGAG
GCAGAGAG
Example (7/15)
W= CGCAGAGAGT
P= GCAGAGAG
(Same Edit table as in Example (4/15).)
Output:
CGCAGAGAG
-GCAGAGAG
![Page 63: 1 Approximate Boyer-Moore String Matching Source : SIAM Journal on Computing, Vol. 22, No. 2, 1993, pp.243-260 J. Tarhio and E. Ukkonen Advisor: Prof](https://reader036.vdocuments.mx/reader036/viewer/2022062417/55149f8f550346f06e8b5994/html5/thumbnails/63.jpg)
63
Example(8/15)
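W = CGCAGAGAGT
P = GCAGAGAG

|   |   | C | G | C | A | G | A | G | A | G | T |
|---|---|---|---|---|---|---|---|---|---|---|---|
|   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| G | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| C | 2 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| A | 3 | 2 | 2 | 1 | 0 | 1 | 1 | 2 | 1 | 2 | 2 |
| G | 4 |   |   | 2 | 1 | 0 | 1 | 1 | 2 | 1 | 2 |
| A | 5 |   |   |   | 2 | 1 | 0 | 1 | 1 | 2 | 2 |
| G | 6 |   |   |   |   | 2 | 1 | 0 | 1 | 1 | 2 |
| A | 7 |   |   |   |   |   | 2 | 1 | 0 | 1 | 2 |
| G | 8 |   |   |   |   |   |   | 2 | 1 | 0 | 1 |

Output:
GCAGAGAGT
GCAGAGAG-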
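Each output on Example (5/15) through (8/15) is a pair of equal-length rows in which `-` marks a gap. As a quick sanity check (a sketch, assuming the top row is taken from W and the bottom row is P; `alignment_cost` is a hypothetical helper, not part of the slides), one can count the columns in which the two rows differ and confirm that each alignment uses at most k = 1 difference.

```python
def alignment_cost(top, bottom):
    """Number of columns where the two alignment rows differ.
    A gap ('-') opposite a character counts as one difference."""
    if len(top) != len(bottom):
        raise ValueError("alignment rows must have equal length")
    return sum(1 for a, b in zip(top, bottom) if a != b)


W = "CGCAGAGAGT"
P = "GCAGAGAG"
# The four outputs shown on the slides, as (top row, bottom row) pairs.
ALIGNMENTS = [
    ("GCAGAGA-", "GCAGAGAG"),
    ("-CAGAGAG", "GCAGAGAG"),
    ("CGCAGAGAG", "-GCAGAGAG"),
    ("GCAGAGAGT", "GCAGAGAG-"),
]

for top, bottom in ALIGNMENTS:
    # Removing the gaps must give a substring of W and the pattern itself.
    in_window = top.replace("-", "") in W
    is_pattern = bottom.replace("-", "") == P
    print(top, bottom, alignment_cost(top, bottom), in_window, is_pattern)
```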
Example(9/15)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
T:
G C A T C G C A G A G A G T A T G C A G A G C G
P:
G C A G A G A G
a A C G *
D[i=8, a] 1 6 2 8
D[i=7, a] 2 5 1 8
k = 1
t15=A appears in P(7,8)
t14=T does not appear in P(6,8)
t13=G appears in P(5,7)
t12=A appears in P(4,6)
t11=G appears in P(3,5)
t10=A appears in P(2,4)
t9=G appears in P(1,3)
t8=A does not appear in P(1,2)
More than k characters do not appear (> k). Shifting is needed now.
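The checks listed on this slide compare each text character with a small neighbourhood of the pattern: the character aligned with pattern position i is tested against P(i-k, i+k), clipped to the pattern boundaries. A minimal sketch of that test follows; the name `is_bad` and the 1-based indexing convention are assumptions made for this illustration.

```python
def is_bad(pattern, i, a, k):
    """True if character a does not appear in pattern positions i-k .. i+k
    (1-based, clipped to 1 .. len(pattern))."""
    lo = max(1, i - k)
    hi = min(len(pattern), i + k)
    return a not in pattern[lo - 1:hi]


P = "GCAGAGAG"
print(is_bad(P, 8, "A", 1))  # t15=A against P(7,8)
print(is_bad(P, 7, "T", 1))  # t14=T against P(6,8)
print(is_bad(P, 1, "A", 1))  # t8=A against P(1,2)
```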
Example(10/15)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
T:
G C A T C G C A G A G A G T A T G C A G A G C G
P:
G C A G A G A G
a A C G *
D[i=8, a] 1 6 2 8
D[i=7, a] 2 5 1 8
k = 1
t16=T does not appear in P(7,8)
t15=A appears in P(6,8)
t14=T does not appear in P(5,7)
More than k characters do not appear (> k). Shifting is needed now.
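The D[i, a] rows shown on these slides can be reproduced by taking, for each position i and character a, the smallest s with p(i-s) = a, and defaulting to m = 8 when no such s exists (the `*` column). The sketch below rebuilds the table under that reading; the function name `shift_table` and the dictionary layout are assumptions for illustration, with T standing in for the `*` column.

```python
def shift_table(pattern, k, alphabet):
    """Return {(i, a): shift} for i = m-k .. m (1-based).
    shift is the distance from position i back to the nearest earlier
    occurrence of a in the pattern, or m if there is none."""
    m = len(pattern)
    table = {}
    for i in range(m - k, m + 1):
        for a in alphabet:
            shift = m
            for s in range(1, i):           # keeps i - s >= 1 and s < m
                if pattern[i - s - 1] == a:
                    shift = s
                    break
            table[(i, a)] = shift
    return table


table = shift_table("GCAGAGAG", 1, "ACGT")
for i in (8, 7):
    print(i, [table[(i, a)] for a in "ACGT"])
```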
Example(11/15)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
T:
G C A T C G C A G A G A G T A T G C A G A G C G
P:
G C A G A G A G
a A C G *
D[i=8, a] 1 6 2 8
D[i=7, a] 2 5 1 8
k = 1
t18=C does not appear in P(7,8)
t17=G appears in P(6,8)
t16=T does not appear in P(5,7)
More than k characters do not appear (> k). Shifting is needed now.
Example(12/15)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
T:
G C A T C G C A G A G A G T A T G C A G A G C G
P:
G C A G A G A G
a A C G *
D[i=8, a] 1 6 2 8
D[i=7, a] 2 5 1 8
k = 1
t19=A appears in P(7,8)
t18=C does not appear in P(6,8)
t17=G appears in P(5,7)
t16=T does not appear in P(4,6)
More than k characters do not appear (> k). Shifting is needed now.
Example(13/15)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
T:
G C A T C G C A G A G A G T A T G C A G A G C G
P:
G C A G A G A G
a A C G *
D[i=8, a] 1 6 2 8
D[i=7, a] 2 5 1 8
k = 1
t20=G appears in P(7,8)
t19=A appears in P(6,8)
t18=C does not appear in P(5,7)
t17=G appears in P(4,6)
t16=T does not appear in P(3,5)
More than k characters do not appear (> k). Shifting is needed now.
Example(14/15)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
T:
G C A T C G C A G A G A G T A T G C A G A G C G
P:
G C A G A G A G
a A C G *
D[i=8, a] 1 6 2 8
D[i=7, a] 2 5 1 8
k = 1
t22=G appears in P(7,8)
t21=A appears in P(6,8)
t20=G appears in P(5,7)
t19=A appears in P(4,6)
t18=C does not appear in P(3,5)
t17=G appears in P(2,4)
t16=T does not appear in P(1,3)
More than k characters do not appear (> k). Shifting is needed now.
Example(15/15)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
T:
G C A T C G C A G A G A G T A T G C A G A G C G
P:
G C A G A G A G
a A C G *
D[i=8, a] 1 6 2 8
D[i=7, a] 2 5 1 8
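Jump out of the while loop.
W = TGCAGAGCG
P = GCAGAGAG
k = 1

|   |   | T | G | C | A | G | A | G | C | G |
|---|---|---|---|---|---|---|---|---|---|---|
|   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| G | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| C | 2 | 2 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| A | 3 |   | 2 | 1 | 0 | 1 | 1 | 2 | 1 | 1 |
| G | 4 |   |   | 2 | 1 | 0 | 1 | 1 | 2 | 1 |
| A | 5 |   |   |   | 2 | 1 | 0 | 1 | 2 | 2 |
| G | 6 |   |   |   |   | 2 | 1 | 0 | 1 | 2 |
| A | 7 |   |   |   |   |   | 2 | 1 | 1 | 2 |
| G | 8 |   |   |   |   |   |   | 2 | 2 | 1 |

Result:
GCAGAGCG
GCAGAGAG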
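The walk-through in Example (9/15) through (15/15) can be replayed with a short filtering loop. The sketch below is inferred from the window positions shown on the slides rather than taken from the paper: it assumes the scan runs right to left until more than k characters fail the P(i-k, i+k) test, and that the shift is the minimum of D[i, a] over the scanned positions i >= m-k. All function names are hypothetical, and advancing by 1 after a candidate window is a further assumption.

```python
def is_bad(pattern, i, a, k):
    """True if a does not occur in pattern positions i-k .. i+k (1-based, clipped)."""
    lo, hi = max(1, i - k), min(len(pattern), i + k)
    return a not in pattern[lo - 1:hi]


def shift_value(pattern, i, a):
    """Smallest s with pattern position i-s equal to a, or m if there is none."""
    for s in range(1, i):
        if pattern[i - s - 1] == a:
            return s
    return len(pattern)


def scan_window(text, pattern, k, end):
    """Scan right to left from text position end (1-based).
    Returns (number of failed tests, shift)."""
    m = len(pattern)
    bad, shift = 0, m
    i, h = m, end
    while i > 0 and bad <= k:
        a = text[h - 1]
        if i >= m - k:
            shift = min(shift, shift_value(pattern, i, a))
        if is_bad(pattern, i, a, k):
            bad += 1
        i -= 1
        h -= 1
    return bad, shift


def filter_windows(text, pattern, k, start_end):
    """Return (window ends that were shifted, window ends passed on to the DP)."""
    shifted, candidates = [], []
    end = start_end
    while end <= len(text):
        bad, shift = scan_window(text, pattern, k, end)
        if bad > k:
            shifted.append(end)
            end += shift
        else:
            candidates.append(end)
            end += 1  # assumption: move on by one position after a candidate
    return shifted, candidates


def dp_window(text, pattern, k, end):
    """Text window of length m + k ending at position end, checked by the DP."""
    return text[max(0, end - len(pattern) - k):end]


TEXT = "GCATCGCAGAGAGTATGCAGAGCG"
PATTERN = "GCAGAGAG"
shifted, candidates = filter_windows(TEXT, PATTERN, 1, 15)
print(shifted, candidates)
print([dp_window(TEXT, PATTERN, 1, end) for end in candidates])
```

Starting from the window ending at t15, this visits the same window ends as the slides and hands the window TGCAGAGCG to the dynamic-programming step.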
Time complexity
• The preprocessing and searching phases together take O(mn/k) time and O(|Σ|n) space.
THANK YOU