String Matching in Lempel-Ziv Compressed Strings
DESCRIPTION
String Matching in Lempel-Ziv Compressed Strings. Paper review. Algorithmica (1998) 20: 388-404. M. Farach and M. Thorup. Presented by Takuya Kida, second-year master's student, Takeda Laboratory. Preliminaries. Terminology: prefix, substring, suffix. F.E.R.C. For a string w = xyz, x is a prefix, y is a substring, and z is a suffix of w.
TRANSCRIPT
-
String Matching in Lempel-Ziv Compressed Strings. M. Farach and M. Thorup. Algorithmica (1998) 20: 388-404
-
Preliminaries
-
prefix, substring, suffix
-
prefix, substring, suffix. Example: w = nobinobita
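The three terms can be made concrete with a small Python sketch. The helper names (`prefixes`, `suffixes`, `split_as_xyz`) are illustrative assumptions and do not come from the slides or the paper.

```python
def prefixes(w):
    """All prefixes of w, from the empty string up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]


def suffixes(w):
    """All suffixes of w, from w itself down to the empty string."""
    return [w[i:] for i in range(len(w) + 1)]


def split_as_xyz(w, y):
    """Return (x, y, z) with w == x + y + z for the first occurrence of y, or None.

    In that decomposition x is a prefix, y a substring, and z a suffix of w.
    """
    i = w.find(y)
    if i < 0:
        return None
    return w[:i], y, w[i + len(y):]


if __name__ == "__main__":
    w = "nobinobita"
    print(prefixes(w)[:4])
    print(split_as_xyz(w, "bino"))
```

Here `split_as_xyz("nobinobita", "bino")` yields the decomposition x = "no", y = "bino", z = "bita".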
-
Pattern Matching
-
Data Compression
-
Goal of this paper
-
Idea
-
Previous studies
year  researcher                   compression method
1988  Eilam-Tsoreff and Vishkin    run-length
1992  Amir, Landau, and Vishkin    two-dimensional run-length
1992  Amir and Benson              two-dimensional run-length
1995  Farach and Thorup            LZ77
1996  Gasieniec, et al.            LZ77
1996  Amir, Benson and Farach      LZW
1997  Karpinski, et al.            straight-line programs
1997  Miyazaki, et al.             straight-line programs
1998  Kida, et al.                 LZW
-
LZ77 Compression
-
contents
-
example Z = a b c
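As a rough illustration of what an LZ77-style parse looks like, here is a minimal Python sketch. It assumes a common textbook variant in which each phrase is the longest match starting at an earlier position followed by one literal character. The exact encoding used on the slides and in the paper may differ, and the names `lz77_factorize` and `lz77_decode` are hypothetical. The quadratic search is for clarity only.

```python
def lz77_factorize(text):
    """Greedy LZ77-style parse (assumed textbook variant).

    Each phrase is (source_start, copy_length, next_char): copy copy_length
    characters starting at source_start in the text decoded so far, then
    append next_char.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, 0
        for src in range(i):
            length = 0
            # The copy may run into the phrase itself (self-overlap);
            # one character is always left over to serve as the literal.
            while i + length < n - 1 and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        phrases.append((best_src, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases):
    """Rebuild the text from the phrases produced by lz77_factorize."""
    out = []
    for src, length, ch in phrases:
        for k in range(length):
            out.append(out[src + k])  # one char at a time so overlapping copies work
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    t = "ababcababcbacababbcab"
    z = lz77_factorize(t)
    print(z)
    assert lz77_decode(z) == t
```

The sample text is the one that appears on the later Winding Phase and Unwinding Phase slides. Decoding the phrases reproduces it exactly.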
-
useful property: b a c a
-
Main Algorithm
-
Basic Idea: existence problem
-
Basic Idea: i, prefix substring; i, suffix substring; i, substring; Yes
-
Basic Idea. Pattern: b a c a. i, i+1. a b c c b c b a c a c a c b a b. b a c a
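The reason phrase boundaries matter can be sketched as follows. In an LZ77-style parse where every phrase is a copy of earlier text plus one literal character (an assumed variant, not necessarily the one in the paper), the first occurrence of a pattern cannot lie entirely inside the copied part of a phrase, because the copy source would then hold an earlier occurrence. So it is enough to look at a window of the text around each phrase end. The sketch below works on the uncompressed text purely for illustration, whereas the paper's algorithm works on the compressed representation. The function names are hypothetical.

```python
def phrase_ends(text):
    """End positions (exclusive) of greedy LZ77-style phrases.

    Assumed variant: each phrase is the longest match starting earlier
    in the text, followed by one literal character.
    """
    ends, i, n = [], 0, len(text)
    while i < n:
        best = 0
        for src in range(i):
            length = 0
            while i + length < n - 1 and text[src + length] == text[i + length]:
                length += 1
            best = max(best, length)
        i += best + 1
        ends.append(i)
    return ends


def occurs_near_boundary(text, pattern):
    """Existence test that only inspects windows around phrase ends.

    Any first occurrence must cover the last (literal) character of some
    phrase, so a window of len(pattern) - 1 characters on each side of
    that character is sufficient.
    """
    m = len(pattern)
    if m == 0:
        return True
    for e in phrase_ends(text):
        lo = max(0, e - m)
        hi = min(len(text), e + m - 1)
        if pattern in text[lo:hi]:
            return True
    return False


if __name__ == "__main__":
    t = "ababcababcbacababbcab"
    print(occurs_near_boundary(t, "baca"))
```

On the sample text from the later slides, the boundary-window test agrees with a plain substring search for the pattern b a c a.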
-
Winding Phase
-
Winding Phase: a b a b c a b a b c b a c a b a b b c a b, i = 3
-
Unwinding Phase
-
Unwinding Phase. Pattern: b a c a. a b c. a b a b c a b a b c b a c a b a b b c a b
-
Unwinding Phase. Pattern: b a c a. a b a b c a b a b c b a c a b a b b c a b
-
Unwinding Phase. Pattern: b a c a. a b a b c a b a b c b a c a b a b b c a b. 55
-
Unwinding Phase. Pattern: b a c a. a b a b c a b a b c b a c a b a b b c a b. 6
-
Final operation
-
Complexity of Algorithm
-
Winding Phase
-
Winding Phase. L: O(log |Z|), balanced tree
-
Winding Phase. L: O(log |Z|), O(1)
-
Winding Phase. L: O(|Z| log |T|), Segment-Merge
-
Winding Phase
-
Unwinding Phase. Pattern: b a c a. a b a b c a b a b c b a c a b a b b c a b. 55
-
Unwinding Phase. Pattern: b a c a. a b a b c a b a b c b a c a b a b b c a b. O(|Z|), O(log |P|). 55
-
Unwinding Phase: O(|P| + |Z| (|Z| + log |P|))
-
final operation: O(log |P|) each, O(|Z| log |P|) in total
-
Total
-
Conclusion
-
String Matching in Lempel-Ziv Compressed Strings. W; X, Y, Z; nob, nobin. Existence problem: YES or NO. All-occurrences problem. KMP, BM.
KMP. Farach, Thorup. LZ77: 0 a -3,1 b -2,1 ab 0 2 0,2 c -1,1 ababc 0 5 0,5. Substring: baca. P+, P-: for i, P-i and P+i+1; between i and i+1, the suffix substring is P- and the prefix substring is P+; P-i P+i+1: YES. Winding Phase: P- c, P+ a c a, i = 3: YES. Unwinding Phase.
ba substring, ca substring; P+, P-; i, 5: YES. LZ77. Winding Phase: balanced tree O(log |Z|), Segment-Merge O(|Z| log |T|). Unwinding Phase: substring, O(|P|), O(log |P|); P+, P-. T/Z, log |P|, O(|P| + |Z|) (competitive and opportunistic).