Post on 21-Dec-2015
KMP algorithm
Advisor: Prof. R. C. T. Lee
Reporter: C. W. Lu
KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R., Fast pattern matching in strings, SIAM Journal on Computing 6(1), 1977, pp. 323-350.
KMP Table
• The KMP algorithm constructs a table in the preprocessing phase.
• In the searching phase, the window shift is determined simply by looking up the table.
• Example:
P = bcbabcbaebcbabcba

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
KMPtable  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1  0 -1  1
• Once the KMP table is constructed, whenever a mismatch occurs at location i of P, the KMP algorithm moves the pattern i - KMPtable(i) steps to the right (locations are numbered from 0).
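The shift rule can be tried out with a short Python sketch. It is a minimal illustration, not the authors' code: `kmp_first_match` is a hypothetical name, the table is hard-coded with the KMPtable values of P = bcbabcbaebcbabcba, and the search stops at the first occurrence because the slides do not say how far to move the pattern after a complete match.

```python
def kmp_first_match(text, pattern, kmp_table):
    """Return the first index where pattern occurs in text, or -1.

    Hypothetical helper name. kmp_table[i] holds KMPtable(i) for each
    location i of the pattern. Assumption: the search stops at the first
    occurrence, since the shift after a complete match is not covered here.
    """
    m, n = len(pattern), len(text)
    s = 0  # window start: pattern[0] is aligned with text[s]
    i = 0  # pattern[0:i] is already known to match text[s:s + i]
    while s + m <= n:
        while i < m and pattern[i] == text[s + i]:
            i += 1
        if i == m:
            return s
        # Mismatch at location i: move the pattern i - KMPtable(i) steps.
        s += i - kmp_table[i]
        # P(0, KMPtable(i) - 1) still matches after the move; for -1,
        # the comparison restarts at pattern[0] just past the mismatch.
        i = max(kmp_table[i], 0)
    return -1


P = "bcbabcbaebcbabcba"
KMP = [-1, 0, -1, 1, -1, 0, -1, 1, 4, -1, 0, -1, 1, -1, 0, -1, 1]
print(kmp_first_match("bcbabcbab" + P, P, KMP))  # 9
```

On the text bcbabcbab followed by P, the first mismatch is at location 8, so the window moves 8 - 4 = 4 steps; P(4) = b then matches the text, and the next mismatch (at location 5, where KMPtable[5] = 0) moves the window 5 steps to the occurrence at index 9.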
Mismatch occurs at location 4 of P: P(0, 3) = bcba matches the text T, but P(4) = b does not match the corresponding character of T.
Move P (4 - KMPtable[4]) = 4 - (-1) = 5 steps, so that P(0) lines up with the text character right after the mismatched one.
Mismatch occurs at location 8 of P: P(0, 7) = bcbabcba matches T, but T has b where P(8) = e.
Move P (8 - KMPtable[8]) = 8 - 4 = 4 steps; P(0, 3) = bcba then lies under the last four matched characters of T, and the comparison resumes at P(4).
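The Definition of the KMP Table
• For location i, let j be the largest number with 0 ≤ j < i such that P(0, j-1) is a suffix of P(0, i-1) and P(i) ≠ P(j); then KMPtable(i) = j. Here j = 0 stands for the empty prefix. If no such j exists, KMPtable(i) = -1; in particular, KMPtable(0) = -1.
• Example (a pattern that begins with bcbabcb and has P(7) ≠ P(3)):

i         0  1  2  3  4  5  6  7
KMPtable -1  0 -1  1 -1  0 -1  3

∵ P(0, 2) = bcb is the longest prefix that is equal to a suffix of P(0, 6) = bcbabcb, and P(7) ≠ P(3),
∴ KMPtable[7] = 3.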
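To make the definition concrete, the following Python sketch evaluates it literally: for every location i it tries j = i-1, i-2, ..., 0 and keeps the first j that qualifies. It is only a brute-force check for small patterns (roughly cubic time), and `kmp_table_by_definition` is a hypothetical name.

```python
def kmp_table_by_definition(p):
    """Brute-force KMPtable straight from the definition (hypothetical name).

    KMPtable(i) is the largest j < i such that P(0, j-1) is a suffix of
    P(0, i-1) and P(j) != P(i); j = 0 stands for the empty prefix.
    If no such j exists, KMPtable(i) = -1.
    """
    table = []
    for i in range(len(p)):
        value = -1
        for j in range(i - 1, -1, -1):  # try the largest j first
            if p[:j] == p[i - j:i] and p[j] != p[i]:
                value = j
                break
        table.append(value)
    return table


print(kmp_table_by_definition("bcbabcbaebcbabcba"))
# [-1, 0, -1, 1, -1, 0, -1, 1, 4, -1, 0, -1, 1, -1, 0, -1, 1]
```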
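Condition for KMPtable[i] = -1
Condition A: P(0) = P(i)
Condition B(j): P(0, j) is a proper suffix of P(0, i-1), i.e. j < i-1
Condition C(j): P(j+1) = P(i)
KMPtable(i) = -1 if and only if
$$\bigl(A \land \lnot\exists j\,B(j)\bigr) \lor \bigl(A \land \forall j\,(B(j) \Rightarrow C(j))\bigr) \;\equiv\; A \land \bigl(\lnot\exists j\,B(j) \lor \forall j\,(B(j) \Rightarrow C(j))\bigr)$$
In words: P(0) = P(i), and every prefix P(0, j) that is also a suffix of P(0, i-1) is followed by the same character as P(i).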
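The condition can be checked mechanically. In this Python sketch (hypothetical helper names `borders_b` and `is_minus_one`), B(j) is taken to mean that the prefix P(0, j) is a proper suffix of P(0, i-1), and `all()` over an empty list is true, which covers the case where no j satisfies B.

```python
def borders_b(p, i):
    """Hypothetical helper: all j with Condition B, i.e. P(0, j) is a proper suffix of P(0, i-1)."""
    return [j for j in range(i - 1) if p[: j + 1] == p[i - 1 - j : i]]


def is_minus_one(p, i):
    """Hypothetical helper: True when A holds and C(j) holds for every j with B(j)."""
    cond_a = p[0] == p[i]                                   # Condition A
    all_c = all(p[j + 1] == p[i] for j in borders_b(p, i))  # C(j) for each such j
    return cond_a and all_c  # all() of an empty list is True: the "no j" case


P = "bcbabcbaebcbabcba"
print([i for i in range(len(P)) if is_minus_one(P, i)])  # [0, 2, 4, 6, 9, 11, 13, 15]
```

For P = bcbabcbaebcbabcba it flags exactly the locations where the table holds -1 (location 0 is included because Condition A is trivially true there), including the examples i = 4 (no j satisfies B) and i = 15 (B holds for j = 1 and j = 5, and C holds for both).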
• There is no proper suffix of P(0, 3) = bcba that is equal to a prefix of P(0, 3), so ¬∃j B(j).
• P(0) = P(4) = b, so Condition A holds.
• KMPtable[4] = -1 because i = 4 satisfies the condition A ∧ ¬∃j B(j).
• There are two suffixes of P(0, 14) that are equal to a prefix of P(0, 14):
bcbabc (P(0, 5)), and P(6) = P(15);
bc (P(0, 1)), and P(2) = P(15).
So C(j) holds for every j with B(j).
• P(0) = P(15) = b, so Condition A holds.
• KMPtable[15] = -1 because i = 15 satisfies the condition A ∧ ∀j (B(j) ⇒ C(j)).
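Condition for KMPtable[i] = 0
• Condition A: P(0) = P(i)
• Condition B(j): P(0, j) is a proper suffix of P(0, i-1)
• Condition C(j): P(j+1) = P(i)
KMPtable(i) = 0 if and only if
$$\bigl(\lnot A \land \lnot\exists j\,B(j)\bigr) \lor \bigl(\lnot A \land \forall j\,(B(j) \Rightarrow C(j))\bigr) \;\equiv\; \lnot A \land \bigl(\lnot\exists j\,B(j) \lor \forall j\,(B(j) \Rightarrow C(j))\bigr)$$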
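The same test with ¬A in place of A picks out the zero entries. Again this is a sketch that assumes B(j) refers to proper suffixes; `is_zero` is a hypothetical name.

```python
def is_zero(p, i):
    """Hypothetical helper: True when A fails and C(j) holds for every j with B(j)."""
    cond_a = p[0] == p[i]  # Condition A
    # Condition B(j): P(0, j) is a proper suffix of P(0, i-1)
    b_js = [j for j in range(i - 1) if p[: j + 1] == p[i - 1 - j : i]]
    all_c = all(p[j + 1] == p[i] for j in b_js)  # Condition C(j) for every such j
    return not cond_a and all_c


P = "bcbabcbaebcbabcba"
print([i for i in range(len(P)) if is_zero(P, i)])  # [1, 5, 10, 14]
```

For P = bcbabcbaebcbabcba this gives the locations 1, 5, 10 and 14, matching the zeros of the KMP table; the case i = 14 is the hand-worked example that follows.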
• There are two suffixes of P(0, 13) that are equal to a prefix of P(0, 13):
bcbab (P(0, 4)), and P(5) = P(14);
b (P(0)), and P(1) = P(14).
So C(j) holds for every j with B(j).
• P(0) = b ≠ P(14) = c, so Condition A does not hold.
• KMPtable[14] = 0 because i = 14 satisfies the condition ¬A ∧ ∀j (B(j) ⇒ C(j)).
How to Construct the KMP Table Efficiently?
• Note that the KMP algorithm is an improvement of the MP (Morris-Pratt) algorithm. Therefore, let us first look at the table used in the MP algorithm.
• We call the table used in the MP algorithm the prefix table.

The Definition of the Prefix Table
• For location i, let j be the largest number, if it exists, such that P(0, j-1) is a proper suffix of P(0, i) (that is, j ≤ i); then Prefix(i) = j.
• If, for P(0, i), no proper prefix is equal to a suffix, Prefix(i) = 0.
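A direct, brute-force reading of this definition in Python (hypothetical name `prefix_table_by_definition`): for each i it tries the longest proper prefix length first and shortens it until the prefix equals the suffix of the same length. It is only meant as a reference for checking the efficient construction on small inputs.

```python
def prefix_table_by_definition(p):
    """Hypothetical helper: Prefix(i) = length of the longest proper prefix of
    P(0, i) that is also a suffix of P(0, i), or 0 if there is none."""
    table = []
    for i in range(len(p)):
        sub = p[: i + 1]  # P(0, i)
        j = i             # proper prefix, so at most i characters
        while j > 0 and sub[:j] != sub[-j:]:
            j -= 1
        table.append(j)
    return table


print(prefix_table_by_definition("bcbabcbaebcbabcba"))
# [0, 0, 1, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8]
```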
Example
• Note that, in the MP algorithm, when P(0, i) has been matched and a mismatch occurs at the next location i+1, we move the pattern i - Prefix(i) + 1 steps.

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0  1  0  1  2  3  4  0  1  2  3  4  5  6  7  8
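A minimal Python sketch of the MP searching phase under the same assumptions as the KMP sketch (hypothetical name `mp_first_match`, stop at the first occurrence, Prefix table hard-coded for P = bcbabcbaebcbabcba). The variable k counts matched characters, so P(0, k-1) has been matched and the shift k - Prefix(k-1) is the rule i - Prefix(i) + 1 with i = k - 1. When nothing has matched (k = 0) the sketch moves one step, an assumption the slide does not state.

```python
def mp_first_match(text, pattern, prefix):
    """Hypothetical helper: first index where pattern occurs in text, or -1.

    prefix[i] holds Prefix(i). Assumption: stop at the first occurrence.
    """
    m, n = len(pattern), len(text)
    s, k = 0, 0  # window start; pattern[0:k] is known to match text[s:s + k]
    while s + m <= n:
        while k < m and pattern[k] == text[s + k]:
            k += 1
        if k == m:
            return s
        if k == 0:
            s += 1  # assumption: nothing matched, so move one step
        else:
            # P(0, k-1) matched: move (k-1) - Prefix(k-1) + 1 steps and keep
            # the Prefix(k-1) characters that still line up with the text.
            s += k - prefix[k - 1]
            k = prefix[k - 1]
    return -1


P = "bcbabcbaebcbabcba"
PREFIX = [0, 0, 1, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8]
print(mp_first_match("bcbabcbab" + P, P, PREFIX))  # 9
```

Compared with the KMP sketch, MP needs one extra window on this text: after the mismatch at location 5 it moves only 5 - Prefix(4) = 4 steps and then fails again at P(1), because the Prefix table ignores the fact that P(1) = P(5) = c.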
How can we construct the Prefix Table Efficiently?
• To compute Prefix(i), we look at Prefix(i-1).
• In the following example, since Prefix(11) = 4, we know that the prefix of length 4 is equal to the suffix of length 4 of P(0, 11). Besides, P(4) = P(12). We may conclude that Prefix(12) = Prefix(11) + 1 = 4 + 1 = 5.

i          0  1  2  3  4  5  6  7  8  9 10 11 12
P          a  g  c  a  a  c  c  g  a  g  c  a  a
Prefix     0  0  0  1  1  0  0  0  1  2  3  4  5
Another Case
• Consider the following example.

i          0  1  2  3  4  5  6  7  8  9 10
P          c  g  c  g  a  g  c  g  c  g  c
Prefix     0  0  1  2  0  0  1  2  3  4  ?

• Prefix(9) = 4. But P(4) ≠ P(10).
• Can we conclude that Prefix(10) = 0?
• No, we cannot.
• There exists a shorter prefix, of length 2, that is equal to a suffix of P(0, 9), and P(10) = P(2). We should conclude that Prefix(10) = 2 + 1 = 3.
• In other words, we may use the following pointer idea. Let P(0, j) be the longest prefix that is equal to a suffix of P(0, i-1). If P(j+1) ≠ P(i), it may be necessary to examine P(0, j) to see whether there exists a prefix of P(0, j) equal to a suffix of P(0, j).
• Thus the Prefix function can be found recursively.
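The pointer idea can be written as a recurrence. The chain t_0, t_1, ... is notation introduced here only for illustration; each t_k is the length of a shorter and shorter prefix that is also a suffix of P(0, i-1).

$$t_0 = \mathrm{Prefix}(i-1), \qquad t_{k+1} = \mathrm{Prefix}(t_k - 1) \quad \text{while } t_k > 0 \text{ and } P(i) \neq P(t_k),$$
$$\mathrm{Prefix}(i) = \begin{cases} t_k + 1 & \text{for the first } k \text{ with } P(i) = P(t_k), \\ 0 & \text{if the chain reaches } t_k = 0 \text{ with } P(i) \neq P(0). \end{cases}$$

For the pattern cgcgagcgcgc and i = 10: t_0 = Prefix(9) = 4 and P(10) = c ≠ P(4) = a; t_1 = Prefix(3) = 2 and P(10) = P(2) = c, so Prefix(10) = t_1 + 1 = 3.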
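Construct the Prefix Function f
/* Prefix function: f[i] = Prefix(i) for P[0..m-1] */
f[0] = 0;
for (i = 1; i < m; i++) {
    t = f[i-1];                /* length of the longest border of P(0, i-1) */
    while (t >= 0) {
        if (P[i] == P[t]) {    /* the border can be extended by P[i] */
            f[i] = t + 1;
            break;
        } else if (t != 0) {
            t = f[t-1];        /* recursive: try the next shorter border */
        } else {
            f[i] = 0;          /* no border can be extended */
            break;
        }
    }
}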
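For readers who want to run the construction, here is a direct Python transcription of the loop above (the function name `prefix_function` is hypothetical). It follows the same three cases: extend the current border, fall back to a shorter border through f[t-1], or stop with 0.

```python
def prefix_function(p):
    """Hypothetical name: Python transcription of the slide's construction of f."""
    m = len(p)
    f = [0] * m
    for i in range(1, m):
        t = f[i - 1]  # length of the longest border of P(0, i-1)
        while True:
            if p[i] == p[t]:
                f[i] = t + 1  # extend that border by P(i)
                break
            if t == 0:
                f[i] = 0      # no border can be extended
                break
            t = f[t - 1]      # recursive: next shorter border
    return f


print(prefix_function("bcbabcbaebcbabcba"))
# [0, 0, 1, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The inner loop can run several times for one i, but each fallback strictly shortens t while each step of i lengthens it by at most one, so the total number of fallbacks over the whole pattern is bounded by m; this is where the O(m) preprocessing bound comes from.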
Example:
t = f[i-1] = f[0] = 0;
∵ P[1] = c ≠ P[t] = P[0] = b, ∴ f[1] = 0.
Example:
t = f[i-1] = f[4] = 1;
∵ P[5] = c = P[t] = P[1] = c, ∴ f[5] = t + 1 = 2.

Example:
t = f[i-1] = f[7] = 4;
∵ P[8] = e ≠ P[t] = P[4] = b, and t ≠ 0;
t = f[t-1] = f[3] = 0;
∵ P[8] = e ≠ P[t] = P[0] = b, ∴ f[8] = 0.
Example:
t = f[i-1] = f[14] = 6;
∵ P[15] = b = P[t] = P[6] = b, ∴ f[15] = t + 1 = 7.
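The KMPtable
/* Build KMPtable[0..m-1] from the prefix function f */
KMPtable[0] = -1;
for (i = 1; i < m; i++) {
    t = f[i-1];                  /* length of the longest border of P(0, i-1) */
    while (t > 0 && P[i] == P[t])
        t = f[t-1];              /* recursive: next shorter border */
    if (t > 0)
        KMPtable[i] = t;         /* P(0, t-1) is a suffix of P(0, i-1) and P[t] != P[i] */
    else if (P[i] == P[0])
        KMPtable[i] = -1;
    else
        KMPtable[i] = 0;
}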
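The same construction in Python, as a sketch that mirrors the loop above (the name `kmp_table` is hypothetical; the prefix function is recomputed inside so that the block runs on its own):

```python
def kmp_table(p):
    """Hypothetical name: build KMPtable from the prefix function f."""
    m = len(p)
    # Prefix function f, as in the previous slides.
    f = [0] * m
    for i in range(1, m):
        t = f[i - 1]
        while t > 0 and p[i] != p[t]:
            t = f[t - 1]
        f[i] = t + 1 if p[i] == p[t] else 0

    table = [-1] * m  # KMPtable[0] = -1
    for i in range(1, m):
        t = f[i - 1]
        # Walk down the borders of P(0, i-1) while the next character equals P(i).
        while t > 0 and p[i] == p[t]:
            t = f[t - 1]
        if t > 0:
            table[i] = t  # largest border followed by a different character
        else:
            table[i] = -1 if p[i] == p[0] else 0
    return table


print(kmp_table("bcbabcbaebcbabcba"))
# [-1, 0, -1, 1, -1, 0, -1, 1, 4, -1, 0, -1, 1, -1, 0, -1, 1]
```

For P = bcbabcbaebcbabcba it reproduces the KMP table from the first slide; at i = 16 it walks t = 7, 3, 1 before stopping at P(1) = c ≠ P(16) = a, as in the last worked example.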
Example:
t = f[i-1] = f[2] = 1;
∵ P[3] = a ≠ P[t] = P[1] = c, ∴ KMPtable[3] = t = 1.
Example:
t = f[i-1] = f[6] = 3;
∵ P[7] = a = P[t] = P[3] = a;
t = f[t-1] = f[2] = 1;
∵ P[7] = a ≠ P[t] = P[1] = c;
∴ KMPtable[7] = t = 1.
Example:
t = f[i-1] = f[12] = 4;
∵ P[13] = b = P[t] = P[4] = b;
t = f[t-1] = f[3] = 0;
∵ P[13] = b = P[0] = b;
∴ KMPtable[13] = -1.
Example:
t = f[i-1] = f[15] = 7;
∵ P[16] = a = P[t] = P[7] = a;
t = f[t-1] = f[6] = 3;
∵ P[16] = a = P[t] = P[3] = a;
t = f[t-1] = f[2] = 1;
∵ P[16] = a ≠ P[t] = P[1] = c;
∴ KMPtable[16] = t = 1.
Time Complexity
Preprocessing phase: O(m) time and space, where m is the length of P.
Searching phase: O(n + m) time, where n is the length of the text T.
References
• AHO, A.V., 1990, Algorithms for finding patterns in strings, in Handbook of Theoretical Computer Science, Volume A, Algorithms and complexity, J. van Leeuwen ed., Chapter 5, pp. 255-300, Elsevier, Amsterdam.
• AOE, J.-I., 1994, Computer algorithms: string pattern matching strategies, IEEE Computer Society Press.
• BAASE, S., VAN GELDER, A., 1999, Computer Algorithms: Introduction to Design and Analysis, 3rd Edition, Chapter 11, pp. ??-??, Addison-Wesley Publishing Company.
• BAEZA-YATES, R., NAVARRO, G., RIBEIRO-NETO, B., 1999, Indexing and Searching, in Modern Information Retrieval, Chapter 8, pp. 191-228, Addison-Wesley.
• BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp. 337-377, Masson, Paris.
• CORMEN, T.H., LEISERSON, C.E., RIVEST, R.L., 1990, Introduction to Algorithms, Chapter 34, pp. 853-885, MIT Press.
• CROCHEMORE, M., 1997, Off-line serial exact string searching, in Pattern Matching Algorithms, A. Apostolico and Z. Galil eds., Chapter 1, pp. 1-53, Oxford University Press.
• CROCHEMORE, M., HANCART, C., 1999, Pattern Matching in Strings, in Algorithms and Theory of Computation Handbook, M.J. Atallah ed., Chapter 11, pp. 11-1--11-28, CRC Press Inc., Boca Raton, FL.
• CROCHEMORE, M., LECROQ, T., 1996, Pattern matching and text compression algorithms, in CRC Computer Science and Engineering Handbook, A. Tucker ed., Chapter 8, pp. 162-202, CRC Press Inc., Boca Raton, FL.
• CROCHEMORE, M., RYTTER, W., 1994, Text Algorithms, Oxford University Press.
• GONNET, G.H., BAEZA-YATES, R.A., 1991, Handbook of Algorithms and Data Structures in Pascal and C, 2nd Edition, Chapter 7, pp. 251-288, Addison-Wesley Publishing Company.
• GOODRICH, M.T., TAMASSIA, R., 1998, Data Structures and Algorithms in JAVA, Chapter 11, pp. 441-467, John Wiley & Sons.
• GUSFIELD, D., 1997, Algorithms on strings, trees, and sequences: Computer Science and Computational Biology, Cambridge University Press.
• HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France, pp. 99-110.
• HANCART, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph.D. Thesis, University Paris 7, France.
• KNUTH, D.E., MORRIS (Jr), J.H., PRATT, V.R., 1977, Fast pattern matching in strings, SIAM Journal on Computing 6(1):323-350.
• SEDGEWICK, R., 1988, Algorithms, Chapter 19, pp. 277-292, Addison-Wesley Publishing Company.
• SEDGEWICK, R., 1988, Algorithms in C, Chapter 19, Addison-Wesley Publishing Company.
• SEDGEWICK, R., FLAJOLET, P., 1996, An Introduction to the Analysis of Algorithms, Chapter ?, pp. ??-??, Addison-Wesley Publishing Company.
• STEPHEN, G.A., 1994, String Searching Algorithms, World Scientific.
• WATSON, B.W., 1995, Taxonomies and Toolkits of Regular Language Algorithms, Ph.D. Thesis, Eindhoven University of Technology, The Netherlands.
• WIRTH, N., 1986, Algorithms & Data Structures, Chapter 1, pp. 17-72, Prentice-Hall.