Overcoming Computer Word Size Limitation in Bit-parallel Pattern Matching


Page 1: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

5/2/2009 LSD&LAW'09, King's College, London, UK 1/48

Overcoming Computer Word Size Limitation in Bit-parallel Pattern Matching

M. Oğuzhan Külekci

TÜBİTAK-UEKAE

National Research Institute of Electronics & Cryptology, Turkey

[email protected]

www.busillis.com/o_kulekci

Page 2: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

The area of research

Pattern Matching

On-line

Off-line

Exact

Approximate

Using Bit-parallelism

Other techniques

. . .

. . .

Page 3: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Bit-parallelism?

• Computers perform bitwise operations very fast.

• Bit-parallelism means designing algorithms that benefit from this intrinsic property of processors.

Page 4: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Previous bit-parallel pattern matching algorithms

• Shift-or algorithm
  – Shift-or (SO) (Baeza-Yates & Gonnet, 1992)
  – Fast (FSO), Average optimal (AOSO), and Fast AOSO (FAOSO) (Fredriksson & Grabowski, 2005)

• BNDM algorithm
  – Actually (BDM + SO) → BNDM (Navarro & Raffinot, 2000)
  – SBNDM (Peltola & Tarhio, 2003)
  – SBNDM2 (Holub & Durian, 2005)
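As a point of reference for the Shift-or family listed above, here is a minimal C sketch of the classic Shift-or search. The function name `shift_or_count` and the 64-bit word are assumptions made for illustration, not code from the talk. The assertion on `m` makes the word size limit explicit: one bit is spent per pattern position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts the occurrences of p[0..m-1] in t[0..n-1] with the classic
   Shift-or automaton. Hypothetical helper: requires 1 <= m <= 64. */
size_t shift_or_count(const char *t, size_t n, const char *p, size_t m)
{
    assert(m >= 1 && m <= 64);
    uint64_t B[256];
    uint64_t D = ~UINT64_C(0);          /* all states inactive (1 = inactive) */
    size_t count = 0;

    for (int c = 0; c < 256; c++)
        B[c] = ~UINT64_C(0);
    for (size_t j = 0; j < m; j++)      /* bit j is 0 where p[j] equals the character */
        B[(unsigned char)p[j]] &= ~(UINT64_C(1) << j);

    uint64_t accept = UINT64_C(1) << (m - 1);
    for (size_t i = 0; i < n; i++) {
        D = (D << 1) | B[(unsigned char)t[i]];
        if ((D & accept) == 0)          /* an occurrence ends at t[i] */
            count++;
    }
    return count;
}
```

A call such as `shift_or_count("abaabaabab", 10, "abaab", 5)` counts overlapping occurrences as well. A pattern longer than 64 characters would need several words per state vector.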

Page 5: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Problems in Bit-parallel Pattern Matching

• Lack of shift mechanism (in original idea)
  – BNDM solved it.
  – Recent SO variants (AOSO, FAOSO) also include shift mechanisms.

• Patterns are required to be no longer than the computer word size!

Page 6: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

What causes that limitation?

• The way that the bits are used!

• In previous approaches, each bit marks the position of a character in the pattern.

• If the pattern is longer than the computer word size, more words are needed, which causes a significant drop in efficiency.

Page 7: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Bitmasks in previous algorithms

• Mask creation in BNDM (SO is similar):

unsigned long B[ALPHABET_SIZE];
for (a = 0; a < ALPHABET_SIZE; a++) B[a] = 0;
for (j = 1; j <= m; j++) B[p[j]] |= 1UL << (m-j);   /* p[1..m] is the pattern */

• Bits in mask B[c] express the locations of character c in the pattern,
  – e.g. for pattern P = abaab:

B[a] = 0....10110
B[b] = 0....01001
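The mask construction above can be checked with a small runnable C sketch. The function name `bndm_masks`, the 0-based indexing, and the 256-symbol alphabet are assumptions for illustration. The logic is the same as the slide's loop, with the 1-based position j replaced by a 0-based one.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum { ALPHABET_SIZE = 256 };   /* assumed byte alphabet */

/* Sets bit (m-1-j) of B[c] for every 0-based position j with p[j] == c.
   Hypothetical helper: m must not exceed the number of bits in unsigned long,
   which is exactly the word size limitation discussed in the talk. */
void bndm_masks(const char *p, size_t m, unsigned long B[ALPHABET_SIZE])
{
    assert(m >= 1 && m <= sizeof(unsigned long) * CHAR_BIT);
    for (int a = 0; a < ALPHABET_SIZE; a++)
        B[a] = 0;
    for (size_t j = 0; j < m; j++)
        B[(unsigned char)p[j]] |= 1UL << (m - 1 - j);
}
```

For P = abaab this yields B[a] = 10110 and B[b] = 01001 in binary, the values shown on the slide.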

Page 8: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

How to overcome ?

• Load a different kind of information, which is length independent, into a single bit.

• In the proposed bit-parallel length independent matching (BLIM), each bit carries information about the whole pattern, right-shifted by some amount.

Page 9: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Basic notation

• Text T = t_0 t_1 t_2 t_3 ... t_{n-1}

• Pattern P = p_0 p_1 ... p_{m-1}

• Computer word size W

• Σ denotes the alphabet

Page 10: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Off-line Pattern Matching (in general...)

• Slide a window over the text

• Check & Shift

[Figure: the pattern aligned under the text.]

Page 11: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Sliding window

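[Figure: W copies of P = p_0 p_1 ... p_{m-1} stacked in rows 0 to W-1, each row shifted one character further to the right than the row above; columns are numbered from 0.]

• The window that is to be slid over T.

• W rows, ws = W+m-1 columns.

• The i-th row contains P shifted right by i characters.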

Page 12: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Sliding window

P = abaab, W = 8

row \ col   0  1  2  3  4  5  6  7  8  9 10 11
    0       a  b  a  a  b
    1          a  b  a  a  b
    2             a  b  a  a  b
    3                a  b  a  a  b
    4                   a  b  a  a  b
    5                      a  b  a  a  b
    6                         a  b  a  a  b
    7                            a  b  a  a  b

Page 13: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Bitmask Creation

[Figure: the sliding window for P = abaab, W = 8 (rows 0 to 7, columns 0 to 11) laid over the text characters t_j ... t_{j+11}; the character b is observed at position 6, i.e. at t_{j+6}.]

For ch = b at pos = 6, row i contributes bit b_i:

b_0 = 1, b_1 = 1, b_2 = 1, b_3 = 0, b_4 = 0, b_5 = 1, b_6 = 0, b_7 = 1

Mask[b][6] = 1 0 1 0 0 1 1 1 = A7

Page 14: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Bitmask

• Mask[ch][pos] is a bit vector of W bits,

b_{W-1} b_{W-2} ... b_1 b_0

where ch ∈ Σ and 0 ≤ pos < (W+m-1).

• The bits denote which of the alignments in the investigation window remain possible when one observes character ch at position pos.

Page 15: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Bitmask

• The i-th bit of Mask[ch][pos] records whether the placement of the pattern shifted right by i characters matches the observed ch at position pos.

b_i = 0, if (0 ≤ pos-i < m) and (ch ≠ p_{pos-i})

b_i = 1, otherwise
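The bit rule above translates directly into a table-building routine. The following C sketch is an illustration under assumptions: the name `blim_build_mask`, the flat `ch * ws + pos` layout, the byte alphabet, and the run-time word size `W` (at most 64) are all hypothetical choices made so that the W = 8 example from the slides can be reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { SIGMA = 256 };   /* assumed byte alphabet */

/* Returns a table of SIGMA * ws words, ws = W + m - 1; the entry at
   [ch * ws + pos] plays the role of Mask[ch][pos]. The caller frees it.
   Returns NULL if the allocation fails. */
uint64_t *blim_build_mask(const char *p, size_t m, unsigned W)
{
    assert(m >= 1 && W >= 1 && W <= 64);
    size_t ws = W + m - 1;
    uint64_t all = (W == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << W) - 1);
    uint64_t *mask = malloc(SIGMA * ws * sizeof *mask);
    if (mask == NULL)
        return NULL;

    for (size_t k = 0; k < SIGMA * ws; k++)
        mask[k] = all;                          /* b_i = 1 by default */

    for (unsigned i = 0; i < W; i++)            /* copy of P shifted right by i */
        for (size_t k = 0; k < m; k++) {        /* p[k] sits at pos = i + k */
            size_t pos = i + k;
            for (int ch = 0; ch < SIGMA; ch++)
                if (ch != (unsigned char)p[k])  /* mismatch: clear bit i */
                    mask[ch * ws + pos] &= ~(UINT64_C(1) << i);
        }
    return mask;
}
```

With `blim_build_mask("abaab", 5, 8)` the entry for ch = b, pos = 6 is A7 in hexadecimal, matching the bitmask creation example. Preprocessing here costs W·m·|Σ| steps, which is a simplification chosen for clarity rather than speed.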

Page 16: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Sample Bitmask

P = abaab, W = 8, Σ = {a,b,c,d} (ch), ws = W + m - 1 = 12 (pos)

Mask[ch][pos] =

ch \ pos   0  1  2  3  4  5  6  7  8  9 10 11
a         FF FE FD FB F6 ED DB B7 6F DF BF 7F
b         FE FD FA F4 E9 D3 A7 4F 9F 3F 7F FF
c         FE FC F8 F0 E0 C1 83 07 0F 1F 3F 7F
d         FE FC F8 F0 E0 C1 83 07 0F 1F 3F 7F

Page 17: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

So far, we have created the sliding window and the associated bitmasks.

How are those masks used for matching, followed by a shift procedure?

Page 18: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Checking...

[Figure: the pattern window laid over the text; character ch is read at window position pos.]

flag = 1111 ... 1 (W bits)

flag = flag & Mask[ch][pos]

Page 19: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Checking...

[Figure: the pattern window laid over the text.]

Continue until flag becomes zero or all the positions are visited.

If all positions are visited and flag is still nonzero, there are some matches. The indices of the bits that are 1 in flag determine which of the alignments match.
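Reading the matches off the flag can be sketched as follows. The helper name `blim_flag_positions` and the 64-bit flag type are assumptions for illustration. Bit k of the flag stands for the alignment that starts k characters after the beginning of the window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes window_start + k into out for every set bit k of flag and returns
   the number of positions written. out must have room for 64 entries. */
size_t blim_flag_positions(uint64_t flag, size_t window_start, size_t *out)
{
    assert(out != NULL);
    size_t count = 0;
    for (unsigned k = 0; flag != 0; k++, flag >>= 1)
        if (flag & 1)
            out[count++] = window_start + k;
    return count;
}
```

For example, a flag with bits 0 and 3 set on a window that starts at text position 100 reports occurrences at 100 and 103.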

In what order do we visit the positions in the window?

Page 20: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Scan Order

• A heuristic approach to visit the minimum number of characters in case of a mismatch

ScanOrder = {m-i, 2m-i, ..., km-i}
  – i = 1, 2, ..., m
  – (km-i) < ws (ws = W+m-1)

Page 21: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Scan Order

[Figure: the sliding window for P = abaab, W = 8; columns 0 to 11.]

ScanOrder = 4, 9, 3, 8, 2, 7, 1, 6, 11, 0, 5, 10
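The scan order above can be generated mechanically from the definition. This is a minimal C sketch; the function name `blim_scan_order` and the output-array interface are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fills order[0..ws-1] with the window positions in visiting order:
   m-1, 2m-1, ..., then m-2, 2m-2, ..., and finally 0, m, 2m, ...
   Returns the number of positions written, which is always ws. */
size_t blim_scan_order(size_t m, size_t ws, size_t *order)
{
    assert(m >= 1 && ws >= m && order != NULL);
    size_t count = 0;
    for (size_t i = 1; i <= m; i++)
        for (size_t pos = m - i; pos < ws; pos += m)
            order[count++] = pos;
    return count;
}
```

Calling it with m = 5 and ws = 12 produces 4, 9, 3, 8, 2, 7, 1, 6, 11, 0, 5, 10. Every position appears exactly once, because each residue class modulo m is walked once.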

Page 22: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Shifting...

The amount of shift?

[Figure: the pattern window over the text, before and after the shift.]

Page 23: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Shift Mechanism

• Same as Sunday’s quick search

• Move right according to the text character immediately succeeding the current window under investigation

Page 24: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Shift Mechanism

[Figure: the window for P = abaab, W = 8 over t_j ... t_{j+11}, and the candidate alignments of P that overlap the next text character t_{j+12}.]

Character    Shift value
a            9
b            8
others       13
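The shift values in the table are consistent with a Sunday-style rule: ws minus the 0-based index of the last occurrence of the character in P, and ws + 1 for characters that do not occur in P. That rule is inferred from the example values, not quoted from the talk. The C sketch below implements it, with the function name `blim_build_shift` and the byte alphabet as further assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { SIGMA = 256 };   /* assumed byte alphabet */

/* shift[c] = ws - (0-based index of the last occurrence of c in p),
   or ws + 1 when c does not occur in p. */
void blim_build_shift(const char *p, size_t m, size_t ws, size_t shift[SIGMA])
{
    assert(m >= 1 && ws >= m);
    for (int c = 0; c < SIGMA; c++)
        shift[c] = ws + 1;
    for (size_t k = 0; k < m; k++)      /* later positions overwrite earlier ones */
        shift[(unsigned char)p[k]] = ws - k;
}
```

For P = abaab and ws = 12 this gives 9 for a, 8 for b, and 13 for every other character. The smallest possible shift is ws - (m-1) = W, so consecutive windows never test the same alignment twice.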

Page 25: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

BLIM Algorithm

ws = W+m-1;
Compute Mask;
Compute ScanOrder;
Compute Shift;
Pad text T with ws NULL characters;
i = 0;
while (i < n) {
    flag = Mask[T[i+ScanOrder[0]]][ScanOrder[0]];
    for (j = 1; flag && j < ws; j++)     /* stop early once every alignment is ruled out */
        flag &= Mask[T[i+ScanOrder[j]]][ScanOrder[j]];
    if (flag) {
        Check bits of the flag to locate occurrences
    }
    i += Shift[T[i+ws]];
}
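Putting the pieces together, the pseudocode above can be turned into a self-contained C sketch. Everything outside the slides is an assumption made for illustration: the name `blim_search`, the fixed W = 64, the byte alphabet, the zero-padded copy of the text (so the pattern must not contain a NUL byte), and the simple W·m·|Σ| mask preprocessing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { SIGMA = 256, W = 64 };   /* assumed byte alphabet and 64-bit words */

/* Stores up to max_out starting positions of p in t and returns the total
   number of occurrences, or (size_t)-1 if memory cannot be allocated. */
size_t blim_search(const char *t, size_t n, const char *p, size_t m,
                   size_t *out, size_t max_out)
{
    assert(m >= 1);
    size_t ws = W + m - 1, found = 0, cnt = 0;
    size_t shift[SIGMA];
    uint64_t *mask = malloc(SIGMA * ws * sizeof *mask);
    size_t *order = malloc(ws * sizeof *order);
    unsigned char *text = calloc(n + ws + 1, 1);    /* zero-padded copy of t */
    if (!mask || !order || !text) {
        free(mask); free(order); free(text);
        return (size_t)-1;
    }
    memcpy(text, t, n);

    /* Mask: bit i of Mask[ch][i+k] is cleared whenever ch != p[k]. */
    memset(mask, 0xFF, SIGMA * ws * sizeof *mask);
    for (unsigned i = 0; i < W; i++)
        for (size_t k = 0; k < m; k++)
            for (size_t ch = 0; ch < SIGMA; ch++)
                if (ch != (unsigned char)p[k])
                    mask[ch * ws + i + k] &= ~(UINT64_C(1) << i);

    /* ScanOrder: m-1, 2m-1, ..., m-2, 2m-2, ..., 0, m, 2m, ... */
    for (size_t i = 1; i <= m; i++)
        for (size_t pos = m - i; pos < ws; pos += m)
            order[cnt++] = pos;

    /* Shift: Sunday-style table on the character right after the window. */
    for (size_t c = 0; c < SIGMA; c++) shift[c] = ws + 1;
    for (size_t k = 0; k < m; k++) shift[(unsigned char)p[k]] = ws - k;

    for (size_t i = 0; i < n; i += shift[text[i + ws]]) {
        uint64_t flag = ~UINT64_C(0);
        for (size_t j = 0; flag && j < ws; j++)
            flag &= mask[text[i + order[j]] * ws + order[j]];
        for (unsigned k = 0; flag != 0; k++, flag >>= 1)   /* bit k: match at i+k */
            if (flag & 1) {
                if (found < max_out) out[found] = i + k;
                found++;
            }
    }
    free(mask); free(order); free(text);
    return found;
}
```

Nothing in the search loop depends on m being at most W, so the same code handles a 100-character pattern with 64-bit words. The price is a mask table of |Σ|·(W+m-1) words.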

Page 26: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Sample Run

Page 27: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Complexity

• Best case: minimum number of character comparisons, maximum shift

• Worst case: maximum number of character comparisons, minimum shift

Page 28: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Experimental Results

• On DNA sequences (Manzini’s DNA compression corpus)

• On natural language text (enwik8.txt)

• 100 sample patterns tested for each length

• gcc -O3

• Intel Xeon 2.4 GHz, 3 GB memory

Page 29: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

BLIM vs. SO Family on DNA

[Chart: BLIM vs. SO Family on Short DNA Pattern Search. x-axis: Length; y-axis: Average Elapsed Time in sec. (0.000 to 0.600); series: BLIM, SO, FSO, AOSO, FAOSO.]

Page 30: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

BLIM vs. BNDM Family on DNA

[Chart: BLIM vs. BNDM Family on Short DNA Pattern Search. x-axis: Length; y-axis: Average Elapsed Time in sec. (0.000 to 0.600); series: BLIM, BNDM, SBNDM, SBNDM2.]

Page 31: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Overall Performance on DNA

[Chart: Overall Performance of Bit-parallel Algorithms on DNA Sequence Search. x-axis: Algo. (BLIM, BNDM, SBNDM, SBNDM2, SO, FSO, AOSO, FAOSO); y-axis: Average Elapsed Time (0 to 0.25).]

Page 32: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

BLIM vs. SO Family on Nat.Lan.

[Chart: BLIM vs. SO Family on Short Natural Language Pattern Search. x-axis: Length; y-axis: Average Elapsed Time in sec. (0.000 to 0.700); series: BLIM, SO, FSO, AOSO, FAOSO.]

Page 33: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

BLIM vs. BNDM Family on Nat.Lan.

[Chart: BLIM vs. BNDM Family on Short Natural Language Pattern Search. x-axis: Length; y-axis: Average Elapsed Time in sec. (0.000 to 0.400); series: BLIM, BNDM, SBNDM, SBNDM2.]

Page 34: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Overall performance on Nat. Lan.

[Chart: Overall Performance of Bit-parallel Algorithms on Short Natural Language Pattern Search. x-axis: Algo. (BLIM, SO, FSO, AOSO, FAOSO, BNDM, SBNDM, SBNDM2); y-axis: Average Elapsed Time (0.000 to 0.180).]

Page 35: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Multi-pattern Case

• Bit-parallel approaches suffer more in multi-pattern case, as the total length is more likely to exceed the word size.

• BLIM serves as a good basis for that case, with its ability to search up to W patterns of any length in a common bit-parallel fashion.

Page 36: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Multi-pattern BLIM

[Figure: the W = 8 rows (0 to 7) of the window hold R = 4 right-shifted copies of each pattern; columns 0 to 6.]

P = {abaa, bba}

R = W / |P| = 8/2 = 4

pivot = min{R-1+|Pi| : Pi ∈ P} = 6

ws = max{R-1+|Pi| : Pi ∈ P} = 7
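The three quantities above depend only on the pattern lengths and the word size, so they can be computed with a few lines of C. The struct `multi_blim_layout` and the function `multi_blim_params` are hypothetical names introduced for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* R rows per pattern, window width ws, pivot, and the shortest length s. */
typedef struct { size_t R, pivot, ws, s; } multi_blim_layout;

/* len[0..count-1] holds the pattern lengths; W is the word size in bits.
   pivot and ws are the minimum and maximum of R-1+|Pi| over all patterns. */
multi_blim_layout multi_blim_params(const size_t *len, size_t count, size_t W)
{
    assert(len != NULL && count >= 1 && count <= W);
    multi_blim_layout lay;
    size_t longest = len[0];
    lay.R = W / count;
    lay.s = len[0];
    for (size_t k = 1; k < count; k++) {
        if (len[k] < lay.s) lay.s = len[k];
        if (len[k] > longest) longest = len[k];
    }
    lay.pivot = lay.R - 1 + lay.s;
    lay.ws = lay.R - 1 + longest;
    return lay;
}
```

For P = {abaa, bba} and W = 8 this gives R = 4, pivot = 6, and ws = 7, as on the slide.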

Page 37: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Multi-pattern BLIM

• Bitmask creation: straightforward, as before.

• ScanOrder
  – Let s = min{ |Pi| : Pi ∈ P }
  – S1 = { s-i, 2s-i, ..., ks-i }, i = 1, 2, ..., s, with (ks-i) < pivot
  – ScanOrder = S1 ∪ {pivot, pivot+1, ..., ws-1}

Page 38: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Multi-BLIM Scan Order

[Figure: the multi-pattern window for P = {abaa, bba}, W = 8; rows 0 to 7, columns 0 to 6.]

s = min{4, 3} = 3

ScanOrder = 2, 5, 1, 4, 0, 3, 6
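The multi-pattern scan order can be generated the same way as in the single-pattern case, with the shortest pattern length s as the stride and the positions from pivot onward appended at the end. The function name `multi_blim_scan_order` is an assumption for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Fills order[] with S1 = {s-i, 2s-i, ...} (all below pivot) for i = 1..s,
   followed by pivot, pivot+1, ..., ws-1. Returns the number of positions
   written, which is ws. */
size_t multi_blim_scan_order(size_t s, size_t pivot, size_t ws, size_t *order)
{
    assert(s >= 1 && pivot <= ws && order != NULL);
    size_t count = 0;
    for (size_t i = 1; i <= s; i++)
        for (size_t pos = s - i; pos < pivot; pos += s)
            order[count++] = pos;
    for (size_t pos = pivot; pos < ws; pos++)
        order[count++] = pos;
    return count;
}
```

With s = 3, pivot = 6, and ws = 7 the result is 2, 5, 1, 4, 0, 3, 6.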

Page 39: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Multi-BLIM Shift Mechanism

[Figure: the multi-pattern window for P = {abaa, bba}, W = 8, over the text characters t_j, t_{j+1}, ..., t_{j+7}, ..., with the candidate alignments of both patterns after the shift.]

Character    Shift value
aa           4
ab           6
ba           6
bb           6
{c,d}a       7
{c,d}b       7
...
else         8

Page 40: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Experimental Results on Multi_BLIM

• Multi_BLIM is compared with the Aho&Corasick and Commentz&Walter algorithms via the SPARE Parts 2003 toolkit.

• DNA pattern lengths between 4 and 30.
• NL pattern lengths between 2 and 20.
• Up to 32 patterns randomly collected for each test.
• Intel Xeon 2.4 GHz, 3 GB memory
• Manzini’s DNA corpus & enwik8.txt

Page 41: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Multi_BLIM Performance on DNA

[Chart: Multi_BLIM vs. Aho-Corasick & Commentz-Walter on DNA Sequences. x-axis: Number of patterns (1 to 31); y-axis: CPU Time (0.00 to 450.00); series: AC, M_BLIM, CW_NAIVE, CW_NLA, CW_NORM, CW_OPT, CW_RLA, CW_WBM.]

Page 42: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Multi_BLIM Performance on Nat. Lan.

[Chart: Multi_BLIM vs. Aho-Corasick & Commentz-Walter on Natural Language. x-axis: Number of patterns (1 to 31); y-axis: CPU Time (0.00 to 160.00); series: AC, M_BLIM, CW_NAIVE, CW_NLA, CW_NORM, CW_OPT, CW_RLA, CW_WBM.]

Page 43: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

About q-gram utilization?

• Instead of reading one character at a time, read more with the help of recent advances in CPU architecture
  – Fredriksson, ‘Shift-or string matching with super alphabets’, 2003
  – Durian et al., ‘Tuning BNDM with q-grams’, 2009

• Unfortunately, there is not much gain, because of BLIM’s random access structure

Page 44: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

About q-gram utilization

Mainly two reasons for the low gain when using q-grams in BLIM:

1. BLIM does not pass over the text sequentially, but instead performs distant reads on the investigation window.

2. Mask is of size |Σ|*(W+m-1). As Σ grows with q-gram usage, Mask becomes too large to fit into the first-level cache.
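To make the second point concrete, the size formula can be evaluated for a few settings. The C sketch below is a rough estimate under assumptions that are not from the talk: a q-gram is treated as one symbol of an alphabet of size |Σ|^q, each mask word takes W/8 bytes, and the window width is left at W+m-1. The function name `blim_mask_bytes` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated size in bytes of Mask: sigma^q rows of (W + m - 1) words,
   each word W/8 bytes wide. */
size_t blim_mask_bytes(size_t sigma, unsigned q, size_t W, size_t m)
{
    assert(sigma >= 1 && q >= 1 && m >= 1 && W >= 8 && W % 8 == 0);
    size_t symbols = 1;
    for (unsigned k = 0; k < q; k++)
        symbols *= sigma;
    return symbols * (W + m - 1) * (W / 8);
}
```

Under these assumptions, a DNA alphabet (|Σ| = 4) with W = 32 and m = 20 needs 816 bytes for q = 1 but 52,224 bytes for q = 4, and a byte alphabet with q = 2 needs about 13 MB.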

Page 45: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Conclusion

• An initial attempt to solve the computer word size limitation in bit-parallel pattern matching

• The speed is in the range of SBNDM and SBNDM2, with the additional advantage that nothing special is required when the input pattern is longer than W.

Page 46: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Conclusion

• It must be noted that, in general, it is slower than Lecroq’s new algorithm; also, for lengths longer than 100, backward (suffix) oracle matching is a better alternative (this applies to all bit-parallel algorithms as well).

• Multi-pattern BLIM shows good performance and may be a strong alternative to classical multi-pattern search algorithms.

Page 47: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Acknowledgement

Thanks to
  – Jorma Tarhio,
  – Kimmo Fredriksson,
  – Thierry Lecroq,
for sharing their code and comments.

Page 48: Overcoming  Computer Word Size Limitation in  Bit-parallel Pattern Matching

Thank you!

Any questions?