
Upload: phammien

Post on 29-May-2018


Adapted from K. Sayood, “Introduction to Data Compression,” 4th edition, Morgan Kaufmann, 2012

Ch. 5 Dictionary techniques: LZ, LZ77 (or LZ1), LZ78 (or LZ2), LZW

Lempel-Ziv-Welch algorithm

Applications

LZW: Unix compress command, V.42bis, GIF

LZ77 variants: PKZip, Zip, LHarc, PNG, gzip, and ARJ

Text sources / computer commands (sources that generate a relatively small number of patterns quite frequently)

Applications:

Text Compression, Modem Communications, Image Compression.

Techniques that incorporate structure in the data in order to increase Compression

1) Static

2) Dynamic (Adaptive)

Commonly occurring patterns. Develop an index for these.

Dictionary techniques are most useful with sources that generate a relatively small number of patterns quite frequently, such as text sources and computer commands. The class of frequently occurring patterns (the size of the dictionary) must be much smaller than the number of all possible patterns.

DICTIONARY

Ex: Consider four-character words made up of three characters from the lowercase English alphabet (26 letters) and one character from six punctuation marks (, ? . ! ; :).

Alphabet size = 32 (26 letters + 6 punctuation marks)

Number of four-character patterns = $32^4 = 2^{20} = 1{,}048{,}576$

We need 20 bits (5 bits/character) to code each pattern. Assume the 256 most likely patterns are placed into a dictionary.

1-bit flag

0 (In the dictionary) + 8 bits for pattern in the dictionary (Total 9 bits)

1(not in dictionary) + 20 bits for pattern (Total 21 bits)

p = probability of pattern from the dictionary

Average number of bits per pattern = R

R = 9p + 21(1-p) = 21-12p, (5.1)

For R < 20 we need p > 1/12 ≈ 0.083 (that is, p ≥ 0.084 to three decimal places):

20 = 21 - 12p

12p = 1, p = 1/12 ≈ 0.083

p should be as large as possible. Carefully select patterns that are most likely to occur as entries in the dictionary.
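The trade-off in Eq. (5.1) can be checked numerically. The following is a minimal sketch, assuming the figures of the example above (1-bit flag, 8-bit dictionary index, 20-bit raw pattern); the function names `average_bits` and `break_even` are illustrative and not from the text.

```python
def average_bits(p, dict_bits=8, raw_bits=20, flag_bits=1):
    """Average bits per pattern when a fraction p of patterns is in the dictionary."""
    in_dict = flag_bits + dict_bits      # 1 + 8  = 9 bits
    not_in_dict = flag_bits + raw_bits   # 1 + 20 = 21 bits
    return in_dict * p + not_in_dict * (1 - p)


def break_even(dict_bits=8, raw_bits=20, flag_bits=1):
    """Value of p at which the scheme costs exactly raw_bits per pattern."""
    # (flag + dict) p + (flag + raw)(1 - p) = raw  =>  p = flag / (raw - dict)
    return flag_bits / (raw_bits - dict_bits)


if __name__ == "__main__":
    for p in (0.0, break_even(), 0.5, 1.0):
        print(f"p = {p:.4f}  R = {average_bits(p):.3f} bits/pattern")
```

Any p above the break-even value gives R below 20 bits, which is why the dictionary entries should be the most probable patterns.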

Static approach: Dictionary developed before encoding

Adaptive or dynamic approach: Dictionary developed on the fly.

5.3 Static Dictionary

Most appropriate when considerable prior knowledge about the source is available.

Ex. Student records, bank statements, credit card statements

Efficient for a specific application

An application-specific or data-specific static-dictionary-based coding scheme is the most efficient, but a coding scheme designed for one application may not work well for a different application.

5.3.1 Digram Coding

Static Dictionary Coding.

Digrams: pairs of letters

ASCII characters

Digram Coding: static dictionary technique that is less specific to a single application.

Ex 5.3.1/ p 119 (Source)

5-letter alphabet A = {a,b,c,d,r}

Encode ‘abracadabra’

Table 5.1: A sample dictionary

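Code  Entry    Code  Entry
000   a        100   r
001   b        101   ab
010   c        110   ac
011   d        111   ad

The encoder reads two characters at a time. If the pair is in the dictionary, its code is sent; otherwise the code of the first character is sent and the second character becomes the first character of the next pair.

Encoder output: 101100110111101100000

101 (ab)  100 (r)  110 (ac)  111 (ad)  101 (ab)  100 (r)  000 (a)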
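The digram encoding of 'abracadabra' can be reproduced with a short program. This is a minimal sketch, assuming the dictionary of Table 5.1 and the rule "send the pair's code if the pair is listed, otherwise send the first character's code"; the names `TABLE_5_1`, `digram_encode` and `digram_decode` are illustrative.

```python
# Table 5.1 as a mapping from dictionary entry to its 3-bit codeword.
TABLE_5_1 = {
    "a": "000", "b": "001", "c": "010", "d": "011",
    "r": "100", "ab": "101", "ac": "110", "ad": "111",
}


def digram_encode(text, table=TABLE_5_1):
    """Encode text with a static digram dictionary."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in table:
            out.append(table[pair])      # pair found: code two characters at once
            i += 2
        else:
            out.append(table[text[i]])   # pair not listed: code the first character only
            i += 1
    return "".join(out)


def digram_decode(bits, table=TABLE_5_1, width=3):
    """Invert digram_encode for fixed-width codewords."""
    inverse = {code: entry for entry, code in table.items()}
    return "".join(inverse[bits[i:i + width]] for i in range(0, len(bits), width))


if __name__ == "__main__":
    code = digram_encode("abracadabra")
    print(code)
    print(digram_decode(code))
```

Eleven characters are sent with seven 3-bit codewords (21 bits) instead of the 33 bits needed when every character gets its own 3-bit code.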

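A dictionary designed for LaTeX (Table 5.2) is not suitable for C programs (Table 5.3).

Notation in the tables: nl = new line; a separate blank symbol denotes a space.

Table 5.2 (LaTeX document, Ch. 5) and Table 5.3 (C programs) list the most frequently occurring pairs of characters for each source. These tables are different.

What is needed is a technique that generates the dictionary so that it adapts to the characteristics of the source output.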

5.4 Adaptive dictionary-based techniques (LZ77)

Lempel-Ziv 1977: LZ77 (LZ1)

Lempel-Ziv 1978: LZ78 (LZ2)

Lempel-Ziv-Welch: LZW

WAN data communication products use the LZ77 or LZ78 algorithm (see Table 7.4, p. 186, Hoffman, “Data compression in digital systems,” Kluwer, 1995).

Publishing: text, graphics, and print-ready images are compressed with LZW and other lossless algorithms (ibid., p. 292).

Ex. 5.4.4 LZW algorithm decoding

Encoder output sequence

5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4 (see Table 5)

Decoder starts with the same initial dictionary as the encoder (Table 3)

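Table 3 Initial LZW dictionary (indices 1 to 5) with the entries added during decoding (indices 6 to 10); ␢ denotes a space

Index  Entry    Index  Entry
1      ␢        6      wa
2      a        7      ab
3      b        8      bb
4      o        9      ba
5      w        10     a␢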

Start with index 5, which corresponds to w. Decode w (already in the dictionary).

The next decoder input is 2 (index), which corresponds to ‘a’.

Decode ‘a’ and concatenate it with the current pattern to form ‘wa’. This is not in the dictionary, so add it as the 6th element of the dictionary and start a new pattern beginning with ‘a’.

The next four inputs are 3 3 2 1.

They correspond to b b a ␢ (␢ denotes a space).

These generate ab (7), bb (8), ba (9), and a␢ (10).

The next input is 6, which corresponds to wa.

Concatenate ␢ with w to form ␢w (11).

The new pattern starts with w (‘wa’ is already in the dictionary).

The next input is index 8, which corresponds to bb.

Concatenate ‘wa’ with ‘b’ to form wab (12).

Continue the construction (decoding) of the LZW dictionary.
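The dictionary construction walked through above can be traced with a small decoder. This is a minimal sketch, not the textbook's program: it uses the common formulation "new entry = previous decoded pattern + first character of the current one", which yields the same entries as growing the pattern one character at a time, and it assumes every received index is already in the dictionary. The name `lzw_decode_trace` is illustrative, and only the first eight indices of the encoder output are decoded.

```python
def lzw_decode_trace(indices, initial):
    """Decode LZW indices and return (decoded text, dictionary).

    Assumes each index is already present when it arrives; a KeyError
    signals the special case that this sketch does not handle.
    """
    dictionary = dict(enumerate(initial, start=1))   # index -> entry, 1-based
    next_index = len(dictionary) + 1
    previous = dictionary[indices[0]]
    output = [previous]
    for index in indices[1:]:
        entry = dictionary[index]
        # New entry: previous pattern extended by the first character just decoded.
        dictionary[next_index] = previous + entry[0]
        next_index += 1
        output.append(entry)
        previous = entry
    return "".join(output), dictionary


# Initial dictionary of the example: 1 = space, 2 = a, 3 = b, 4 = o, 5 = w.
decoded, table = lzw_decode_trace([5, 2, 3, 3, 2, 1, 6, 8], [" ", "a", "b", "o", "w"])

if __name__ == "__main__":
    print(repr(decoded))
    for index in range(6, 13):
        print(index, repr(table[index]))
```

Entries 6 to 12 come out as wa, ab, bb, ba, a+space, space+w and wab, matching the steps listed above.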

Situation where LZW decoding breaks down

Table 5.10: Initial dictionary for abababab

Index  Entry
1      a
2      b

Table 5.11: Final dictionary for abababab

Index  Entry      Index  Entry
1      a          9      ababa
2      b          10     ababab
3      ab         11     babab
4      ba         12     bababa
5      aba        13     abababa
6      abab       14     abababab
7      bab        15     bababab
8      baba

Source alphabet A = {a,b}

Encode the sequence ababababab -------

Transmitted sequence 1 2 3 5--------
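The transmitted sequence can be reproduced with a small LZW encoder. This is a minimal sketch, assuming 1-based indices and an initial dictionary that holds the source alphabet in order; the name `lzw_encode` is illustrative, and a finite input of ten characters stands in for the open-ended sequence.

```python
def lzw_encode(text, alphabet):
    """Return (list of indices, final dictionary) for text over the given alphabet."""
    dictionary = {symbol: i for i, symbol in enumerate(alphabet, start=1)}
    next_index = len(dictionary) + 1
    output = []
    pattern = ""
    for symbol in text:
        candidate = pattern + symbol
        if candidate in dictionary:
            pattern = candidate                     # keep growing the current pattern
        else:
            output.append(dictionary[pattern])      # send the longest known pattern
            dictionary[candidate] = next_index      # add the extended pattern
            next_index += 1
            pattern = symbol                        # restart from the unmatched symbol
    if pattern:
        output.append(dictionary[pattern])          # flush whatever is left at the end
    return output, dictionary


if __name__ == "__main__":
    codes, final = lzw_encode("ababababab", "ab")
    print(codes)
    print(sorted(final.items(), key=lambda item: item[1]))
```

The fourth index sent is 5, an entry the encoder created only one step earlier; this is the situation the decoder has to cope with.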

Decoding: Begin with initial dictionary (Table 5.10).

(1, 2) is decoded as (a, b), which leads to the 3rd entry, ab. The next input is 3 (gives ab), which leads to the 4th entry, ba. See Table 5.14. The next input is 5, which is not yet in the dictionary.

5.5 Applications: LZW is one of the most widely used compression algorithms.

Table 5.12: Constructing the fifth entry (stage one)

Index  Entry
1      a
2      b
3      ab
4      ba
5      a…

Table 5.13: Constructing the fifth entry (stage two)

Index  Entry
1      a
2      b
3      ab
4      ba
5      ab…

Table 5.14: Completion of the fifth entry

Index  Entry
1      a
2      b
3      ab
4      ba
5      aba
6      a…

See Problem 8, p. 140.

Programs: diffim, huff_enc

(Unix compress command)

The LZW decoder has to contain an exception handler for the special case of decoding an index that does not yet have a corresponding complete entry in the decoder dictionary.
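One common way to write this exception handler is sketched below. It assumes the only legal unknown index is the one about to be created, in which case the missing entry must be the previous pattern followed by its own first character. The name `lzw_decode` is illustrative, and the input 1 2 3 5 4 2 is the LZW encoding of the ten-character sequence ababababab with initial dictionary {1: a, 2: b}.

```python
def lzw_decode(indices, alphabet):
    """Decode a list of 1-based LZW indices, including the special case."""
    dictionary = dict(enumerate(alphabet, start=1))
    next_index = len(dictionary) + 1
    previous = dictionary[indices[0]]
    output = [previous]
    for index in indices[1:]:
        if index in dictionary:
            entry = dictionary[index]
        elif index == next_index:
            # Special case: the index points at the entry still under construction.
            # That entry starts with the previous pattern, so its missing last
            # character equals the previous pattern's first character.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW index {index}")
        dictionary[next_index] = previous + entry[0]
        next_index += 1
        output.append(entry)
        previous = entry
    return "".join(output)


if __name__ == "__main__":
    print(lzw_decode([1, 2, 3, 5, 4, 2], "ab"))
```

When index 5 arrives, the decoder holds only entries 1 to 4; the handler completes entry 5 as aba, in line with the staged construction of the fifth entry shown in the tables.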

(See Tables 4.7 and 4.8)

Table 5.16: Comparison of GIF with arithmetic coding

Image    GIF      Arithmetic Coding    Arithmetic Coding
                  of Pixel Values      of Pixel Differences
Sena     51,085   53,431               31,847
Sensin   60,649   58,306               37,126
Earth    34,276   38,248               32,137
Omaha    61,580   56,061               51,393

5.5.2 GIF (Image Compression)

Developed by CompuServe Information Service to encode graphical images (for details see pages 151, 152). GIF is very popular for encoding all kinds of images, both computer-generated and natural. However, it is not very efficient for lossless compression of natural scenes, photographs, satellite images, etc. (see Table 5.16 above).
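The figures of Table 5.16 can be compared directly. This is a small sketch that only re-uses the numbers from the table; the dictionary keys (`gif`, `ac_pixels`, `ac_diffs`) and function names are illustrative.

```python
# Sizes copied from Table 5.16.
SIZES = {
    "Sena":   {"gif": 51085, "ac_pixels": 53431, "ac_diffs": 31847},
    "Sensin": {"gif": 60649, "ac_pixels": 58306, "ac_diffs": 37126},
    "Earth":  {"gif": 34276, "ac_pixels": 38248, "ac_diffs": 32137},
    "Omaha":  {"gif": 61580, "ac_pixels": 56061, "ac_diffs": 51393},
}


def smallest(sizes=SIZES):
    """Name of the method giving the smallest size for each image."""
    return {image: min(row, key=row.get) for image, row in sizes.items()}


def gif_ratio(sizes=SIZES):
    """GIF size divided by the size from arithmetic coding of pixel differences."""
    return {image: row["gif"] / row["ac_diffs"] for image, row in sizes.items()}


if __name__ == "__main__":
    for image, method in smallest().items():
        print(f"{image:7s} best: {method:9s} GIF/diff ratio: {gif_ratio()[image]:.2f}")
```

For all four images, arithmetic coding of pixel differences gives the smallest size, which supports the remark that GIF is not well suited to natural images.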

References

1. J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Trans. on Information Theory, vol. IT-23, pp. 337-343, May 1977.

2. J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding," IEEE Trans. on Information Theory, vol. IT-24, pp. 530-536, Sept. 1978.

3. J. A. Storer and T. G. Szymanski, "Data Compression via Textual Substitution," Journal of the ACM, pp. 928-951, 1982.

4. T. C. Bell, "Better OPM/L Text Compression," IEEE Trans. on Comm., vol. COM-34, pp. 1176-1182, Dec. 1986.

5. T. A. Welch, "A Technique for High-Performance Data Compression," IEEE Computer, pp. 8-19, June 1984.

6. T. C. Bell, J. G. Cleary, and I. H. Witten, "Text Compression," Advanced Reference Series. Englewood Cliffs, NJ: Prentice Hall, 1990.

7. M. Nelson, "The Data Compression Book," New York: M&T Books, 1991.

8. G. Held and T. R. Marshall, "Data Compression," New York: Wiley, third edition, 1991.

9. P. Marchand, "Graphics and GUI's with MATLAB," Boca Raton, FL: CRC Press, 1996.

10. W. Kou, "Digital Image Compression Algorithms and Standards," Amsterdam, Kluwer Academic, 1995.

11. G. Louchard and W. Szpankowski, "Generalized Lempel-Ziv parsing scheme and its preliminary analysis of the average profile," DCC '95 Data Compression Conf., pp. , Snowbird, UT, March 1995.

12. R. Horspool, "The effect of non-greedy parsing Lempel-Ziv compression methods," DCC '95 Data Compression Conf., pp. , Snowbird, UT, March 1995.

13. G. Louchard and W. Szpankowski, "On the Average Redundancy Rate of the Lempel-Ziv Code," DCC '96, Data Compression Conf., Snowbird, UT, April 1996.

14. J. A. Storer, "Lossless Image Compression Using Generalized LZ1-Type Methods," DCC' 96, Data Compression Conf., UT, April 1996.

15. C. T. Chen and L. G. Chen, "A novel architecture for Lempel-Ziv based data compression," IEEE ICCE, Chicago, IL, June 1996.

16. D. Sheinwald, "On the Ziv-Lempel proof and related topics," Proc. IEEE, vol. 82, pp. 866-871, June 1994.

17. A. D. Wyner and J. Ziv, "The sliding window Lempel-Ziv algorithm is asymptotically optimal," Proc. IEEE, vol. 82, pp. 872-877, June 1994.

18. Y. F. Hu and X. S. Wu, "The methods of improving the compression ratio of LZ77 family data compression algorithms," ICSP, Beijing, China, Oct. 1996.

19. V. G. Ruiz and I. Garcia, "A lossy data compressor based on the LZW algorithm," ICSPAT 96, pp. 1002-1006, Boston, MA, Oct. 1996.

20. S. A. Savari, "Redundancy of the Lempel-Ziv-Welch Code," Data Compression Conf., (DCC 97), Snowbird, UT, March 1997.

21. S. R. Kosaraju and G. Manzini, "Compression of low entropy strings with Lempel-Ziv algorithms," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.

22. J. I. Lathrop and M. Strauss, "A universal upper bound on the performance of the Lempel-Ziv algorithm on maliciously-constructed data," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.

23. D. Greene et al, "A progressive Ziv-Lempel algorithm for image compression," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.

24. M. Cohn and H. Helfgott, "Asymmetry in Ziv-Lempel compression," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.

25. S. De Agostino, "A parallel decoder for LZ2 compression using the ID update heuristic," Compression and Complexity of Sequences 1997, Salerno, Italy, June 1997.

26. R. H. Wyman and P. Y. K. Cheung, "Bit plane differential LZW for the compression of video for variable bandwidth channels," IEEE ISCAS '97, Hong Kong, June 1997.

27. C. Su, C-F. Yan and J-C. Yo, "Hardware efficient updating technique for LZW codec design," IEEE ISCAS '97, Hong Kong, June 1997.

28. C. T. Chen and L. G. Chen, "High-Speed VLSI design of the LZ-based data compression," IEEE ISCAS '97, Hong Kong, June 1997.

29. G. Held, "Data and image compression: Tools and techniques," 4th Edition, New York, NY: Wiley, 1996.

30. P. Tischer, "A modified LZW data compression scheme," Australian Computer Science Commun., vol. 9, pp. 262-272, 1987.

31. R. Hoffman, "Data compression in digital systems," New York, NY: Chapman & Hall, 1997.

32. D.J. Craft, "ADLC and a pre-processor extension, BDLC, provides ultra fast compression for general-purpose bit-mapped image data," Data Compression Conf., p. 400, IEEE Computer Society Press, 1995. (ADLC: adaptive lossless data compression; BDLC: bit-mapped lossless data compression, an LZ77 variant.)

33. T. Kida et al, "Multiple pattern matching in LZW compressed text," IEEE DCC Conf, UT, Mar. 1998.

34. S. Even, "Four value adding algorithms," IEEE Spectrum, vol. 35, pp. 33-38, May 1998.

35. J. C. Kieffer, T.H. Park and Y. Xu, "Progressive lossless image coding via self referential partitions," IEEE ICIP, pp. , Chicago, IL, Oct. 1998.

36. C-Ho Cheung, C. S-Wai and P. Lai-Man, "Predictive lossy LZSS algorithm for fidelity constrained image coding," Intl. Forum cum Conf. on Info. Technology and Commun. at the dawn of the new Millennium, Bangkok, Thailand, Aug. 2000.

37. Y-K. Lai and K-C. Chen, "A novel VLSI architecture for Lempel-Ziv based data compression," IEEE ISCAS, Geneva, Switzerland, May 2000.

38. L.P. Deutsch, "Deflate compressed data format specification," Request for Comments (RFC) 1951, available via ftp at ftp://ftp.uu.net/pub/archiving/zip/doc/, 1996.

39. J. Miano, "Compressed image file formats: JPEG, PNG, GIF, XBM, BMP," Addison Wesley, 1999. (software on disk)

40. H.H. Shih, S.S. Narayanan and C.-C. Jay Kuo, "Automatic main melody extraction from MIDI files with a modified Lempel-Ziv algorithm," IEEE ISIMP 2001, Hong Kong, May 2001.

41. M. J. Weinberger and Ordentlich, "On-line decision making for a class of loss functions via Lempel-Ziv parsing," DCC 2000, Snowbird, UT, March 2000, http://www.cs.brandeis.edu/~dcc

42. Y. Reznik and W. Szpankowski, "On the average redundancy rate of the Lempel-Ziv code with K-error protocol," DCC 2000, Data Compression Conference.

43. S. De Agostino, "Work-optimal parallel decoders for LZ2 data compression," DCC 2000.

44. N. J. Brittain and M. R. El-Sakka, "Grayscale true two-dimensional dictionary based image compression," JVCIR, vol. 18, pp. 35-44, Feb. 2007. (2D-LZ)

45. J.D. Gibson et al, "Digital compression for multimedia," San Diego, CA: Academic Press, 1998 (see Appendices E and F).

46. M. Aboy, R. Hornero, D. Abasolo, and D. Alvarez, "Interpretation of Lempel-Ziv complexity measure in the context of biomedical signal analysis," IEEE Transactions on Biomedical Engineering, 53(11):2282-2288, Nov. 2006.

47. N. Radhakrishnan and B.N. Gangadhar, "Estimating regularity in epileptic seizure time-series data," IEEE Engineering in Medicine and Biology Magazine, 17:89-94, 1998.

48. X.-S. Zhang, R.J. Roy, and E.W. Jensen, "EEG complexity as a measure of depth of anesthesia for patients," IEEE Transactions on Biomedical Engineering, 48(12):1424-1433, Dec. 2001.

49. D. Abasolo, R. Hornero, C. Gomez, M. Garcia, and M. Lopez, "Analysis of EEG background activity in Alzheimer’s disease patients with Lempel-Ziv complexity and central tendency measure," Medical Engineering and Physics, 28(4):315-322, 2006.

50. H. Zhang, Y. Zhu, and Z. Wang, "Complexity measure and complexity rate information based detection of ventricular tachycardia and fibrillation," Medical and Biological Engineering and Computing, 38:553-557, 2000.

51. B. Li, J. Xu and F. Wu, "1D dictionary mode for Screen Content Coding," in Visual Communication and Image Processing Conference, pp. 189-192, Dec. 2014.

52. X. Guo et al, "Wyner-Ziv-based multiview video coding," IEEE Trans. on CSVT, vol. 18, pp. 713-714, June 2008.

53. J.-S. Kim and J.-G. Kim, "Reliability-based selective encoding in pixel-domain Wyner-Ziv residual video codec," Future Information Communication Technology and Applications, Lecture Notes in Electrical Engineering (LNEE), Vol. 235, pp. 359-367, Sep 2013.

54. J.-S. Kim, J.-G. Kim, H. Choi, and K.-D. Seo, "Pixel-domain Wyner-Ziv residual video coder with adaptive binary-to-Gray code converting process," Electronics Letters, Vol. 49, no.3, Jan. 2013.

Further Reading

1. Text Compression, by T.C. Bell, J.G. Cleary, and I.H. Witten. Advanced Reference Series. Prentice Hall, Englewood Cliffs, New Jersey, 1990. This provides an excellent exposition of dictionary-based coding techniques.

2. The Data Compression Book, by M. Nelson and J.-L. Gailly. This also does a good job of describing the Ziv-Lempel algorithms. There is also a very nice description of some of the software implementation aspects.

3. Data Compression, by G. Held and T.R. Marshall. Wiley, third edition, 1991. This contains a description of digram coding under the name “diatomic coding.” The book also includes BASIC programs that help in the design of dictionaries.

4. The PNG algorithm is described in a very accessible manner in “PNG Lossless Compression,” by G. Roelofs. In K. Sayood, editor, Lossless Compression Handbook, pages 371-390. Academic Press, 2003.

5. A more in-depth look at dictionary compression is provided in “Dictionary-Based Data Compression: An Algorithmic Perspective,” by S.C. Sahinalp and N.M. Rajpoot. In K. Sayood, editor, Lossless Compression Handbook, pages 153-168. Academic Press, 2003.