1

KMP algorithm

Advisor: Prof. R. C. T. Lee

Reporter: C. W. Lu

KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R., Fast pattern matching in strings, SIAM Journal on Computing 6(2), 1977, pp. 323-350.

2

KMP Table

• The KMP algorithm constructs a table in the preprocessing phase.

• In the searching phase, the window shift is determined by looking up the table.

• Example:

P = bcbabcbaebcbabcba

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
KMPtable  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1  0 -1  1

3

• Once the KMP table is constructed, whenever a mismatch occurs at location i of P, the KMP algorithm moves the pattern i - KMPtable(i) steps, assuming that locations are numbered from 0.
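The shift rule can be sketched in Python as follows. This is a minimal sketch, not the slides' own code: it assumes the table for P = bcbabcbaebcbabcba is the one listed in the example, it only reports the first occurrence, and the names `kmp_find_first`, `PATTERN` and `KMP_TABLE` are chosen here for illustration.

```python
PATTERN = "bcbabcbaebcbabcba"
# KMP table for PATTERN, indexed from 0 (values copied from the example table).
KMP_TABLE = [-1, 0, -1, 1, -1, 0, -1, 1, 4, -1, 0, -1, 1, -1, 0, -1, 1]


def kmp_find_first(text, pattern, kmp_table):
    """Return the start of the first occurrence of a non-empty pattern, or -1."""
    n, m = len(text), len(pattern)
    s = 0  # window start: pattern[0] is aligned with text[s]
    i = 0  # location in the pattern that is compared next
    while s + m <= n:
        if text[s + i] == pattern[i]:
            i += 1
            if i == m:
                return s
        else:
            k = kmp_table[i]
            s += i - k       # move the pattern i - KMPtable(i) steps
            i = max(k, 0)    # the first k pattern characters already match
    return -1
```

For the text bcbabcbab followed by P, the first mismatch is at location 8 (shift 8 - 4 = 4), the next one is at location 5 (shift 5 - 0 = 5), and the sketch then finds P at position 9.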

4

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
KMPtable  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1  0 -1  1

T: b c b a c c b a e b c b a b c b a …
P: b c b a b c b a e b c b a b c b a

Mismatch occurs at location 4 of P.

Move P (4 - KMPtable[4]) = 4 - (-1) = 5 steps.

T: b c b a c c b a e b c b a b c b a …
P:           b c b a b c b a e b c b a b c b a

5

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
KMPtable  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1  0 -1  1

T: b c b a b c b a b b c b a b c b a …
P: b c b a b c b a e b c b a b c b a

Mismatch occurs at location 8 of P.

Move P (8 - KMPtable[8]) = 8 - 4 = 4 steps.

T: b c b a b c b a b b c b a b c b a …
P:         b c b a b c b a e b c b a b c b a

6

The Definition of the KMP Table

• For location i, if j is the largest value with j < i such that P(0, j-1) is a suffix of P(0, i-1) and P(i) ≠ P(j), then KMPtable(i) = j. (For j = 0 the prefix P(0, j-1) is empty.) If no such j exists, KMPtable(i) = -1.

• Example

i          0  1  2  3  4  5  6  7  8
P          b  c  b  a  b  c  b  b  e
KMPtable  -1  0 -1  1 -1  0 -1  3

∵P(0, 2) is the longest prefix which is equal to a suffix of P(0, 6), and P(7)≠P(3).

∴KMPtable[7] = 3.
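The definition can be checked directly with a brute-force Python sketch. It simply tries every j from the largest downwards, so it is far slower than the construction discussed later; the function name `kmp_table_by_definition` is an assumption made for this illustration.

```python
def kmp_table_by_definition(p):
    """KMPtable(i) = largest j < i such that p[:j] is a suffix of p[:i]
    and p[i] != p[j]; -1 if no such j exists."""
    table = []
    for i in range(len(p)):
        entry = -1
        for j in range(i - 1, -1, -1):      # try the largest j first
            if p[:i].endswith(p[:j]) and p[i] != p[j]:
                entry = j
                break
        table.append(entry)
    return table
```

Applied to the example pattern bcbabcbbe, the entry for location 7 comes out as 3, in agreement with the reasoning above.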

7

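Condition for KMPtable[i] = -1

Condition A: P(0) = P(i)

Condition B: P(0, j) is a suffix of P(0, i-1)

Condition C: P(j+1) = P(i)

KMPtable(i) = -1 :

(A & ¬B) ∨ ((∀j) A & B & C)

= (A & ¬B) ∨ ((∀j) A & (B & C))

= A & (¬B ∨ ((∀j) B & C))

Here ¬B means that no j satisfies Condition B, and (∀j) ranges over every j that satisfies Condition B.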

8

• There is no proper suffix of P(0, 3) which is equal to a prefix of P(0, 3). (¬B)

• P(0) = P(4). (A)

• KMPtable[4] = -1 because it satisfies the condition (A & ¬B).

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
KMPtable  -1  0 -1  1 -1

9

• There are two suffixes of P(0, 14) which are equal to a prefix of P(0, 14):

bcbabc (P(0, 5)), and P(6) = P(15);

bc (P(0, 1)), and P(2) = P(15). ((∀j) B & C)

• P(0) = P(15). (A)

• KMPtable[15] = -1 because it satisfies the condition ((∀j) A & B & C).

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
KMPtable  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1  0 -1

10

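Condition for KMPtable[i] = 0

• Condition A: P(0) = P(i)

• Condition B: P(0, j) is a suffix of P(0, i-1)

• Condition C: P(j+1) = P(i)

KMPtable(i) = 0 :

(¬A & ¬B) ∨ ((∀j) ¬A & B & C)

= (¬A & ¬B) ∨ ((∀j) ¬A & (B & C))

= ¬A & (¬B ∨ ((∀j) B & C))

Here ¬B means that no j satisfies Condition B, and (∀j) ranges over every j that satisfies Condition B.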
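The conditions for the entries -1 and 0 can be tested mechanically. The Python sketch below reads them as follows, which is an interpretation rather than something the slides state verbatim: Condition C must hold for every j that satisfies Condition B (trivially true when no such j exists), and Condition A then decides between -1 and 0. The names `classify`, `PATTERN` and `KMP_TABLE` are illustrative.

```python
PATTERN = "bcbabcbaebcbabcba"
KMP_TABLE = [-1, 0, -1, 1, -1, 0, -1, 1, 4, -1, 0, -1, 1, -1, 0, -1, 1]


def classify(p, i):
    """Return -1 or 0 when the conditions for that entry hold, else None."""
    a = p[0] == p[i]                                   # Condition A
    # every j that satisfies Condition B: P(0, j) is a proper suffix of P(0, i-1)
    js = [j for j in range(i - 1) if p[:i].endswith(p[:j + 1])]
    if not all(p[j + 1] == p[i] for j in js):          # Condition C for each such j
        return None                                    # the entry is some j > 0
    return -1 if a else 0


results = [classify(PATTERN, i) for i in range(1, len(PATTERN))]
```

With this reading, `classify` returns -1 for locations 4 and 15 and 0 for location 14, and returns None exactly where the table holds a positive entry.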

11

• There are two suffixes of P(0, 13) which are equal to a prefix of P(0, 13):

bcbab (P(0, 4)), and P(5) = P(14);

b (P(0)), and P(1) = P(14). ((∀j) B & C)

• P(0) ≠ P(14). (¬A)

• KMPtable[14] = 0 because it satisfies the condition ((∀j) ¬A & B & C).

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
KMPtable  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1  0

12

How to Construct the KMP Table Efficiently?

• Note that the KMP algorithm is an improvement of the MP (Morris-Pratt) algorithm. Therefore, we first take a look at the table used in the MP algorithm.

• We call the table used in the MP algorithm the prefix table.

13

The Definition of the Prefix Table.

• For location i, let j be the largest value with j ≤ i, if it exists, such that P(0, j-1) is a suffix of P(0, i). Then Prefix(i) = j.

• If no proper prefix of P(0, i) is equal to a suffix of P(0, i), then Prefix(i) = 0.
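As a reference point, the definition can be evaluated by brute force. The Python sketch below is only meant to pin down the meaning of Prefix(i); it is quadratic or worse, and the function name `prefix_by_definition` is an assumption made here.

```python
def prefix_by_definition(p):
    """Prefix(i) = largest j <= i such that p[:j] is a suffix of p[:i + 1];
    0 if no such non-empty proper prefix exists."""
    table = []
    for i in range(len(p)):
        entry = 0
        for j in range(i, 0, -1):           # proper prefixes only, longest first
            if p[:i + 1].endswith(p[:j]):
                entry = j
                break
        table.append(entry)
    return table
```

For P = bcbabcbaebcbabcba this yields 0 0 1 0 1 2 3 4 0 1 2 3 4 5 6 7 8.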

14

Example

• Note that, in the MP algorithm, we move the pattern i - Prefix(i) + 1 steps when P(0, i) has matched and the mismatch occurs at location i+1.

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0  1  0  1  2  3  4  0  1  2  3  4  5  6  7  8
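To see how the two tables differ in practice, the Python sketch below compares the MP shift with the KMP shift for a mismatch at each location of P. It assumes the MP rule is read with i as the last matched location, so a mismatch at location i uses Prefix(i-1); the names `mp_shift` and `kmp_shift` are illustrative, and both tables are copied from the examples.

```python
PATTERN = "bcbabcbaebcbabcba"
PREFIX = [0, 0, 1, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8]
KMP_TABLE = [-1, 0, -1, 1, -1, 0, -1, 1, 4, -1, 0, -1, 1, -1, 0, -1, 1]


def mp_shift(i):
    """MP shift when P(0, i-1) matched and location i mismatched (i >= 1)."""
    last = i - 1                       # last matched location
    return last - PREFIX[last] + 1


def kmp_shift(i):
    """KMP shift for a mismatch at location i."""
    return i - KMP_TABLE[i]


# (location, MP shift, KMP shift) for every mismatch location from 1 on
shifts = [(i, mp_shift(i), kmp_shift(i)) for i in range(1, len(PATTERN))]
```

At location 4 MP moves 4 steps and KMP moves 5; at location 7 MP moves 4 and KMP moves 6; at location 8 both move 4. The KMP shift is never smaller, because KMPtable(i) never exceeds Prefix(i-1).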

15

How can we construct the Prefix Table Efficiently?

• To compute Prefix(i), we look at Prefix(i-1).

• In the following example, since Prefix(11)=4, we know that there exists a prefix of length 4 which is equal to a suffix with length 4 of P(0,11). Besides, P(4)=P(12). We may conclude that Prefix(12)=Prefix(11)+1=4+1=5.

i          0  1  2  3  4  5  6  7  8  9 10 11 12
P          a  g  c  a  a  c  c  g  a  g  c  a  a
Prefix     0  0  0  1  1  0  0  0  1  2  3  4  5

16

Another Case

• Consider the following example.

• Prefix(9)=4. But P(4)≠P(10).

• Can we conclude that Prefix(10)=0?

• No, we cannot.

i          0  1  2  3  4  5  6  7  8  9 10
P          c  g  c  g  a  g  c  g  c  g  c
Prefix     0  0  1  2  0  0  1  2  3  4  ?

17

• There exists a shorter prefix with length 2 which is equal to a suffix of P(0, 9), and P(10)=P(2). We should conclude that Prefix(10)=2+1=3.

i          0  1  2  3  4  5  6  7  8  9 10
P          c  g  c  g  a  g  c  g  c  g  c
Prefix     0  0  1  2  0  0  1  2  3  4  ?

18

• In other words, we may use the pointer idea expressed below:

• It may be necessary to examine P(0, j) to see whether there exists a prefix of P(0, j) equal to a suffix of P(0, j).

• Thus the Prefix function can be found recursively.

[Figure: the pattern with pointers j and i-1 and two regions labeled Y and X]
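The recursive idea can be sketched in Python by following the Prefix values as a chain of pointers. The example assumes the pattern of the previous slides is cgcgagcgcgc with the Prefix values shown there; `border_lengths` and `next_prefix_value` are hypothetical helper names.

```python
PATTERN = "cgcgagcgcgc"
PREFIX = [0, 0, 1, 2, 0, 0, 1, 2, 3, 4, 3]


def border_lengths(prefix, i):
    """Lengths of all non-empty prefixes of P(0, i) that are also proper
    suffixes of P(0, i), longest first, found by chaining Prefix values."""
    lengths = []
    t = prefix[i]
    while t > 0:
        lengths.append(t)
        t = prefix[t - 1]      # examine P(0, t-1) for a shorter candidate
    return lengths


def next_prefix_value(pattern, prefix, i):
    """Prefix(i) computed from the entries before i."""
    for t in border_lengths(prefix, i - 1):
        if pattern[t] == pattern[i]:       # this prefix can be extended by one
            return t + 1
    return 1 if pattern[0] == pattern[i] else 0
```

Here `border_lengths(PREFIX, 9)` gives the lengths 4 and 2. Length 4 fails because P(4) ≠ P(10), length 2 succeeds because P(2) = P(10), so Prefix(10) = 3.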

19

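Construct the Prefix Function f

/* P: pattern of length m (m >= 1). On return, f[i] is the length of the
   longest proper prefix of P(0, i) that is also a suffix of P(0, i).
   The function name build_prefix is chosen for this listing. */
void build_prefix(const char *P, int m, int *f)
{
    int i, t;

    f[0] = 0;
    for (i = 1; i < m; i++) {
        t = f[i-1];                 /* candidate taken from P(0, i-1) */
        while (t >= 0) {
            if (P[i] == P[t]) {     /* the candidate can be extended */
                f[i] = t + 1;
                break;
            } else if (t != 0) {
                t = f[t-1];         /* recursive: next shorter candidate */
            } else {
                f[i] = 0;           /* nothing can be extended */
                break;
            }
        }
    }
}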
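For quick experiments, the same construction can be written compactly in Python. This is a sketch of the procedure above, not a different algorithm; the name `build_prefix` is chosen here, and the inner loop folds the two fallback branches into a single condition.

```python
def build_prefix(p):
    """Prefix table of p, built from left to right in O(len(p)) time."""
    f = [0] * len(p)
    for i in range(1, len(p)):
        t = f[i - 1]
        while t > 0 and p[i] != p[t]:
            t = f[t - 1]                       # fall back to a shorter candidate
        f[i] = t + 1 if p[i] == p[t] else 0    # extend it, or give up at t == 0
    return f
```

For P = bcbabcbaebcbabcba the result is 0 0 1 0 1 2 3 4 0 1 2 3 4 5 6 7 8.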

20

Example:

t = f[i-1] = f[0] = 0;

∵ P[1] = c ≠ P[t] = P[0] = b, ∴ f[1] = 0.

Before:

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0

After:

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0

21

Example:

t = f[i-1] = f[4] = 1;

∵ P[5] = c = P[t] = P[1] = c, ∴ f[5] = t + 1 = 2.

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0  1  0  1  2

t = f[i-1] = f[7] = 4;

∵ P[8] = e ≠ P[t] = P[4] = b, and t != 0;

t = f[t-1] = f[3] = 0;

∵ P[8] = e ≠ P[t] = P[0] = b, ∴ f[8] = 0.

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0  1  0  1  2  3  4  0

22

Example:

t = f[i-1] = f[14] = 6;

∵ P[15] = b = P[t] = P[6] = b, ∴ f[15] = t + 1 = 7.

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0  1  0  1  2  3  4  0  1  2  3  4  5  6  7

The completed table:

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0  1  0  1  2  3  4  0  1  2  3  4  5  6  7  8

23

• The KMP Table can also be constructed recursively.

24

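The KMPtable

/* P: pattern of length m (m >= 1); f: the prefix table of P.
   build_kmp_table and UNDEFINED are names chosen for this listing. */
#define UNDEFINED (-2)              /* never a valid table entry */

void build_kmp_table(const char *P, int m, const int *f, int *KMPtable)
{
    int i, t;

    KMPtable[0] = -1;
    for (i = 1; i < m; i++) {
        KMPtable[i] = UNDEFINED;
        t = f[i-1];
        while (t > 0) {
            if (P[i] != P[t]) {
                KMPtable[i] = t;
                break;
            } else {
                t = f[t-1];         /* recursive */
            }
        }
        if (KMPtable[i] == UNDEFINED) {
            if (P[i] == P[0])
                KMPtable[i] = -1;
            else
                KMPtable[i] = 0;
        }
    }
}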
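A compact Python rendering of this construction is sketched below, so that the table can be checked against the worked examples. It assumes the prefix table is built first; the function names are chosen here, and the loop exit value of t replaces the explicit "undefined" test.

```python
def build_prefix(p):
    """Prefix table of p."""
    f = [0] * len(p)
    for i in range(1, len(p)):
        t = f[i - 1]
        while t > 0 and p[i] != p[t]:
            t = f[t - 1]
        f[i] = t + 1 if p[i] == p[t] else 0
    return f


def build_kmp_table(p):
    """KMP table of p, derived from its prefix table."""
    f = build_prefix(p)
    table = [-1] * len(p)
    for i in range(1, len(p)):
        t = f[i - 1]
        while t > 0 and p[i] == p[t]:
            t = f[t - 1]               # P(i) = P(t): this candidate is useless
        if t > 0:
            table[i] = t               # found t with P(i) != P(t)
        else:
            table[i] = -1 if p[i] == p[0] else 0
    return table
```

For P = bcbabcbaebcbabcba the sketch produces -1 0 -1 1 -1 0 -1 1 4 -1 0 -1 1 -1 0 -1 1.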

25

Example:

t = f[i-1] = f[2] = 1;

∵ P[3] = a ≠ P[t] = P[1] = c, ∴ KMPtable[3] = t = 1.

Initially:

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0  1  0  1  2  3  4  0  1  2  3  4  5  6  7  8
KMPtable  -1

After computing KMPtable[3]:

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0  1  0  1  2  3  4  0  1  2  3  4  5  6  7  8
KMPtable  -1  0 -1  1

26

Example:

t = f[i-1] = f[6] = 3;

∵ P[7] = a = P[t] = P[3] = a;

t = f[t-1] = f[2] = 1;

∵ P[7] = a ≠ P[t] = P[1] = c;

∴ KMPtable[7] = t = 1.

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0  1  0  1  2  3  4  0  1  2  3  4  5  6  7  8
KMPtable  -1  0 -1  1 -1  0 -1  1

27

Example:

t = f[i-1] = f[12] = 4;

∵ P[13] = b = P[t] = P[4] = b;

t = f[t-1] = f[3] = 0;

∵ P[13] = b = P[0] = b;

∴ KMPtable[13] = -1.

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0  1  0  1  2  3  4  0  1  2  3  4  5  6  7  8
KMPtable  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1

28

Example:

t = f[i-1] = f[15] = 7;

∵ P[16] = a = P[t] = P[7] = a;

t = f[t-1] = f[6] = 3;

∵ P[16] = a = P[t] = P[3] = a;

t = f[t-1] = f[2] = 1;

∵ P[16] = a ≠ P[t] = P[1] = c;

∴ KMPtable[16] = t = 1.

i          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix     0  0  1  0  1  2  3  4  0  1  2  3  4  5  6  7  8
KMPtable  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1  0 -1  1

29

Time Complexity

The preprocessing phase takes O(m) time and space, where m is the length of the pattern.

The searching phase takes O(n+m) time, where n is the length of the text.
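The linear bound on the searching phase can be observed by counting character comparisons. The Python sketch below is an illustration under stated assumptions: the pattern is non-empty, the search stops at the first occurrence, and the names `build_kmp_table` and `search_with_count` are chosen here. Every comparison either advances the text position or shifts the pattern, so the count stays at or below 2n.

```python
def build_kmp_table(p):
    """KMP table of a non-empty pattern p (prefix table computed inline)."""
    f = [0] * len(p)
    for i in range(1, len(p)):
        t = f[i - 1]
        while t > 0 and p[i] != p[t]:
            t = f[t - 1]
        f[i] = t + 1 if p[i] == p[t] else 0
    table = [-1] * len(p)
    for i in range(1, len(p)):
        t = f[i - 1]
        while t > 0 and p[i] == p[t]:
            t = f[t - 1]
        table[i] = t if t > 0 else (-1 if p[i] == p[0] else 0)
    return table


def search_with_count(text, pattern):
    """Return (position of first occurrence or -1, number of comparisons)."""
    table = build_kmp_table(pattern)
    n, m = len(text), len(pattern)
    s = i = comparisons = 0
    while s + m <= n:
        comparisons += 1
        if text[s + i] == pattern[i]:
            i += 1                     # a match advances the text position
            if i == m:
                return s, comparisons
        else:
            s += i - table[i]          # a mismatch shifts the pattern
            i = max(table[i], 0)
    return -1, comparisons
```

Running it on a long text that never contains the pattern, such as bcbabcba repeated 50 times, scans the whole text while keeping the comparison count within twice the text length.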

32

Thank You!
