String Matching in Lempel-Ziv Compressed Strings

String Matching in Lempel-Ziv Compressed Strings. Paper review. M. Farach and M. Thorup, Algorithmica (1998) 20: 388-404. Presented by Takuya Kida, second-year master's student, Takeda Laboratory.

DESCRIPTION

String Matching in Lempel-Ziv Compressed Strings. Paper review. Algorithmica (1998) 20: 388-404. M. Farach and M. Thorup. Presented by Takuya Kida, second-year master's student, Takeda Laboratory. Preliminaries. Terminology: prefix, substring, suffix. F.E.R.C. For a string w = xyz, x is a prefix, y is a substring, and z is a suffix of w.

TRANSCRIPT

  • String Matching in Lempel-Ziv Compressed Strings. M. Farach and M. Thorup. Algorithmica (1998) 20: 388-404

  • Preliminaries

  • prefix, substring, suffix

  • prefix, substring, suffix (example: w = nobinobita)
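
    The three terms from the slides can be checked mechanically. The sketch below is illustrative only: the helper names and the particular split of w into x, y, z are assumptions made here; only the string w = nobinobita comes from the slide.

```python
def is_prefix(x: str, w: str) -> bool:
    """True if w = x + (something)."""
    return w.startswith(x)


def is_suffix(z: str, w: str) -> bool:
    """True if w = (something) + z."""
    return w.endswith(z)


def is_substring(y: str, w: str) -> bool:
    """True if w = (something) + y + (something)."""
    return y in w


w = "nobinobita"
# One possible decomposition w = xyz (chosen here for illustration).
x, y, z = "nobi", "nobi", "ta"
assert x + y + z == w
assert is_prefix(x, w) and is_substring(y, w) and is_suffix(z, w)
```

    Note that every prefix and every suffix is also a substring, and that the empty string is a prefix, a suffix, and a substring of any w.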

  • Pattern Matching

  • Data Compression

  • Data Compression: aldoghqu3850pcxps;lafdjaeqw09bjzpafq05^@62:vzZIAPF(90rwDEVcx0832nkvl;pzp99OPF:eDfja

  • Goal of this paper

  • Idea: aldoghqu3850pcxps;lafdjaeqw09bjzpafq05^@62:vzZIAPF(90rwDEVcx0832nkvl;pzp99OPF:eDfja

  • Previous studies (year: researchers, compression method)

    1988: Eilam-Tsoreff and Vishkin, run-length
    1992: Amir, Landau, and Vishkin, two-dimensional run-length
    1992: Amir and Benson, two-dimensional run-length
    1995: Farach and Thorup, LZ77
    1996: Gasieniec, et al., LZ77
    1996: Amir, Benson and Farach, LZW
    1997: Karpinski, et al., straight-line programs
    1997: Miyazaki, et al., straight-line programs
    1998: Kida, et al., LZW

  • LZ77 Compression

  • contents

  • example Z = a b c
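
    To make the LZ77 slides concrete, here is a minimal sketch of one common formalization: the compressed string Z is a list of phrases (position, length, character), and each phrase copies `length` characters starting at an earlier `position` and then appends one explicit character. The triple layout, the 0-based positions, the greedy longest-match rule, and the function names are assumptions made for illustration; the slides' exact notation did not survive extraction. The example text is the one shown on the Winding Phase slide.

```python
from typing import List, Tuple

# (source position, copy length, explicit character); 0-based positions (assumed layout)
Phrase = Tuple[int, int, str]


def lz77_compress(text: str) -> List[Phrase]:
    """Greedy factorization, quadratic time; for illustration only."""
    phrases: List[Phrase] = []
    n = len(text)
    i = 0
    while i < n:
        best_pos, best_len = 0, 0
        for start in range(i):
            length = 0
            # Stop one character early so the phrase can end with an explicit character.
            while i + length < n - 1 and text[start + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        phrases.append((best_pos, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decompress(phrases: List[Phrase]) -> str:
    out: List[str] = []
    for pos, length, ch in phrases:
        # Copy one character at a time so a copy may overlap the text being produced.
        for k in range(length):
            out.append(out[pos + k])
        out.append(ch)
    return "".join(out)


T = "ababcababcbacababbcab"
Z = lz77_compress(T)
assert lz77_decompress(Z) == T
print(len(T), len(Z))
```

    The compressed representation has one phrase per explicit character, so |Z| can be much smaller than |T| when the text is repetitive.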

  • useful property (pattern: b a c a)

  • Main Algorithm

  • Basic Idea: existence problem

  • Basic Idea: i, prefix substring; i, suffix substring; i, substring; Yes

  • Basic Idea. Pattern: b a c a. [Figure: the text around positions i and i+1]
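
    The existence problem asks only whether the pattern occurs in the text. Under the phrase format assumed here (copy from an earlier position, then one explicit character), an occurrence that lies entirely inside a copied part also occurs earlier in the text, so the first occurrence must contain the explicit character of some phrase. The sketch below illustrates that observation and nothing more: it decompresses the text, which the paper's algorithm avoids, and the phrase list, function names, and triple layout are assumptions made for illustration. Whether this is exactly the "useful property" on the slides cannot be confirmed from the surviving text.

```python
from typing import List, Tuple

Phrase = Tuple[int, int, str]  # (source position, copy length, explicit character), assumed layout


def decode(phrases: List[Phrase]) -> str:
    out: List[str] = []
    for pos, length, ch in phrases:
        for k in range(length):
            out.append(out[pos + k])
        out.append(ch)
    return "".join(out)


def explicit_positions(phrases: List[Phrase]) -> List[int]:
    """0-based text position of the explicit character that ends each phrase."""
    positions, end = [], -1
    for _, length, _ in phrases:
        end += length + 1
        positions.append(end)
    return positions


def occurs(phrases: List[Phrase], pattern: str) -> bool:
    """Existence check that only inspects windows around explicit characters."""
    text = decode(phrases)  # illustration only; the real algorithm does not decompress
    m = len(pattern)
    if m == 0:
        return True
    for e in explicit_positions(phrases):
        lo = max(0, e - (m - 1))
        hi = min(len(text), e + m)
        # Any occurrence containing position e lies inside text[lo:hi].
        if pattern in text[lo:hi]:
            return True
    return False


# Hand-built phrases for the slide's example text a b a b c a b a b c b a c a b a b b c a b.
Z = [(0, 0, "a"), (0, 0, "b"), (0, 2, "c"), (0, 5, "b"), (0, 1, "c"), (0, 4, "b"), (4, 2, "b")]
assert decode(Z) == "ababcababcbacababbcab"
assert occurs(Z, "baca")
assert not occurs(Z, "bacc")
```

    Only |Z| windows of length at most 2|P| - 1 are examined, which hints at why the running time can depend on |Z| and |P| rather than on |T|.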

  • Winding Phase

  • Winding Phase. Text: a b a b c a b a b c b a c a b a b b c a b. i = 3

  • Unwinding Phase

  • Unwinding Phase: a b c. Text: a b a b c a b a b c b a c a b a b b c a b. Pattern: b a c a

  • Unwinding Phase. Pattern: b a c a. Text: a b a b c a b a b c b a c a b a b b c a b

  • Final operation

  • Complexity of Algorithm

  • Winding Phase

  • Winding Phase: L: O(log |Z|), balanced tree

  • Winding Phase: L: O(log |Z|), O(1)

  • Winding Phase: L: O(|Z| log |T|), Segment-Merge

  • Winding Phase

  • Unwinding Phase. Pattern: b a c a. Text: a b a b c a b a b c b a c a b a b b c a b. Costs: O(|Z|), O(log |P|)

  • Unwinding Phase: O(|P| + |Z| (|Z| + log |P|))

  • Final operation: O(log |P|) each (shown seven times), O(|Z| log |P|) in total

  • Total

  • Conclusion

  • conclusion

    Speaker notes (only fragments survived extraction): String Matching in Lempel-Ziv Compressed Strings. Strings W, X, Y, Z; prefix, substring, suffix of W; examples: nob, nobin. Existence problem (YES or NO); all-occurrences problem. KMP, BM.

    KMP. Farach and Thorup. LZ77 example fragments: 0 a; -3,1 b; -2,1 ab; 0 2; 0,2 c; -1,1 ababc; 0 5; 0,5. Pattern: baca. P+ and P-: P-i, P+i+1; substring, i, i+1; i, suffix substring, P-; prefix substring, P+; P-i, P+i+1, YES. Winding Phase: P- c, P+ a c a, i = 3, YES. Unwinding Phase.

    Unwinding: ba, substring; ca, substring; P+, P-; i, 5, YES. Winding Phase: balanced tree, O(log |Z|); Segment-Merge, O(|Z| log |T|). Unwinding Phase: substring, O(|P|), O(log |P|); P+, P-. T/Z, log |P|, O(|P| + |Z|) (competitive and opportunistic).