Module 4: Arithmetic Coding

Module 4, Data Compression 1 LISA, NTPU Module 4 Arithmetic Coding Prof. Hung-Ta Pai

Upload: anithabalaprabhu

Post on 29-Jun-2015


TRANSCRIPT

Page 1: Module 4 Arithmetic Coding

Module 4, Data Compression, LISA, NTPU

Module 4: Arithmetic Coding

Prof. Hung-Ta Pai

Page 2: Module 4 Arithmetic Coding

Reals in Binary

Any real number x in the interval [0, 1) can be represented in binary as .b1b2b3..., where each bi is a bit.

Page 3: Module 4 Arithmetic Coding

First Conversion

L := 0; R := 1; i := 1;
while x > L do *
    if x < (L+R)/2 then
        bi := 0; R := (L+R)/2;
    else
        bi := 1; L := (L+R)/2;
    i := i + 1;
end{while}
bj := 0 for all j ≥ i;

* Invariant: x is always in the interval [L, R)
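
The conversion loop above can be sketched in Python. This is only an illustration: the function name `binary_digits` and the `max_bits` cutoff are assumptions added here (the slide's loop has no cutoff, so it would not terminate for numbers such as 1/3), and exact rationals are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def binary_digits(x, max_bits=32):
    """Leading binary digits of x in [0, 1), found by repeated interval halving.

    max_bits is a hypothetical cutoff, not part of the slide's pseudocode.
    """
    x = Fraction(x)
    if not 0 <= x < 1:
        raise ValueError("x must lie in [0, 1)")
    lo, hi = Fraction(0), Fraction(1)
    bits = []
    # Invariant: lo <= x < hi. Stop once x == lo (all remaining bits are 0)
    # or once max_bits digits have been produced.
    while x > lo and len(bits) < max_bits:
        mid = (lo + hi) / 2
        if x < mid:
            bits.append(0)
            hi = mid
        else:
            bits.append(1)
            lo = mid
    return bits

print(binary_digits(Fraction(5, 8)))              # [1, 0, 1], i.e. .101
print(binary_digits(Fraction(1, 3), max_bits=6))  # [0, 1, 0, 1, 0, 1]
```

The two branches are mutually exclusive here; testing both conditions one after the other would re-evaluate the midpoint after R has already changed.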

Page 4: Module 4 Arithmetic Coding

Basic Ideas

Represent each string x of length n by a unique interval [L, R) in [0, 1).
The width of the interval [L, R) represents the probability of x occurring.
The interval [L, R) can itself be represented by any number, called a tag, within the half-open interval.
The k significant bits of the tag .t1t2t3... are the code of x.
    That is, .t1t2t3...tk000... is in the interval [L, R).

Page 5: Module 4 Arithmetic Coding

Example

1. Tag must be in the half-open interval.
2. Tag can be chosen to be (L+R)/2.
3. Code is the significant bits of the tag.

Page 6: Module 4 Arithmetic Coding

Better Tag

Page 7: Module 4 Arithmetic Coding

Example of Codes

P(a) = 1/3, P(b) = 2/3

Page 8: Module 4 Arithmetic Coding

Code Generation from Tag

If the binary tag is .t1t2t3... = (L+R)/2 in [L, R), then we want to choose k to form the code t1t2...tk.

Short code: choose k to be as small as possible so that L ≤ .t1t2...tk000... < R.

Guaranteed code:
    Choose k = ⎡log2(1/(R-L))⎤ + 1.
    L ≤ .t1t2...tkb1b2b3... < R for any bits b1b2b3...
    For fixed-length strings, this provides a good prefix code.

Example: [.000000000..., .000010010...), tag = .000001001...
    Short code: 0
    Guaranteed code: 000001
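
Both ways of choosing k can be sketched in Python as follows. The helper names (`truncate`, `first_bits`, `short_code`, `guaranteed_code`) are hypothetical, the short code is assumed to have at least one bit (as in the slide's example), and ⎡log2(1/(R-L))⎤ is computed with exact integer arithmetic rather than floating-point logarithms.

```python
from fractions import Fraction

def truncate(t, k):
    """Value of t in [0, 1) cut off after k binary digits."""
    return Fraction((t.numerator << k) // t.denominator, 1 << k)

def first_bits(t, k):
    """First k binary digits of t in [0, 1), as a string (k >= 1)."""
    n = (t.numerator << k) // t.denominator  # floor(t * 2^k)
    return format(n, "0{}b".format(k))

def short_code(lo, hi):
    """Smallest k >= 1 with lo <= .t1...tk000... < hi, for the tag (lo+hi)/2."""
    lo, hi = Fraction(lo), Fraction(hi)
    tag = (lo + hi) / 2
    k = 1
    # Truncation never exceeds the tag, so only the lower bound can fail.
    while truncate(tag, k) < lo:
        k += 1
    return first_bits(tag, k)

def guaranteed_code(lo, hi):
    """First ceil(log2(1/(hi-lo))) + 1 bits of the tag (lo+hi)/2."""
    lo, hi = Fraction(lo), Fraction(hi)
    tag = (lo + hi) / 2
    k = 0
    while Fraction(1, 1 << k) > hi - lo:  # smallest k with 2^-k <= width
        k += 1
    return first_bits(tag, k + 1)

# The slide's example: [.000000000, .000010010) = [0, 9/256)
print(short_code(0, Fraction(9, 256)))       # 0
print(guaranteed_code(0, Fraction(9, 256)))  # 000001
```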

Page 9: Module 4 Arithmetic Coding

Guaranteed Code Example

P(a) = 1/3, P(b) = 2/3

Guaranteed code -> Prefix code

Page 10: Module 4 Arithmetic Coding

Coding Algorithm

P(a1), P(a2), ..., P(am)
C(ai) = P(a1) + P(a2) + ... + P(ai-1)

Encode x1x2...xn:

Initialize L := 0; and R := 1;
for i = 1 to n do
    W := R - L;
    L := L + W * C(xi);
    R := L + W * P(xi);
end;
t := (L+R)/2; choose code for the tag
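
A minimal Python sketch of this loop, assuming the model is given as a dictionary from symbols to exact probabilities (the names `cumulative`, `encode_interval` and `PROBS` are illustrative, not from the slides). It returns the final interval [L, R); choosing the code bits for the tag is left out.

```python
from fractions import Fraction

def cumulative(probs):
    """C(a_i) = P(a_1) + ... + P(a_{i-1}), in the dictionary's insertion order."""
    cum, total = {}, Fraction(0)
    for sym, p in probs.items():
        cum[sym] = total
        total += p
    return cum

def encode_interval(message, probs):
    """Narrow [0, 1) once per symbol and return the final interval (L, R)."""
    cum = cumulative(probs)
    lo, hi = Fraction(0), Fraction(1)
    for sym in message:
        width = hi - lo
        lo = lo + width * cum[sym]
        hi = lo + width * probs[sym]  # uses the updated lo, as in the pseudocode
    return lo, hi

# Model with P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
PROBS = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
lo, hi = encode_interval("abca", PROBS)
print(lo, hi)         # 5/32 21/128
print((lo + hi) / 2)  # tag: 41/256
```

Exact fractions keep the sketch faithful to the real-number description; a practical coder would not work this way because the numerators and denominators grow with the message length.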

Page 11: Module 4 Arithmetic Coding

Coding Example

P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4

abca

Page 12: Module 4 Arithmetic Coding

Coding Exercise

P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4

bbbb

Page 13: Module 4 Arithmetic Coding

Decoding (1/3)

Assume the length is known to be 3.
0001, which converts to the tag .0001000...

Page 14: Module 4 Arithmetic Coding

Decoding (2/3)

Assume the length is known to be 3.
0001, which converts to the tag .0001000...

Page 15: Module 4 Arithmetic Coding

Decoding (3/3)

Assume the length is known to be 3.
0001, which converts to the tag .0001000...

Page 16: Module 4 Arithmetic Coding

Decoding Algorithm

P(a1), P(a2), ..., P(am)
C(ai) = P(a1) + P(a2) + ... + P(ai-1)

Decode b1b2...bk, where the number of symbols is n:

Initialize L := 0; and R := 1;
t := .b1b2...bk000...
for i = 1 to n do
    W := R - L;
    find j such that L + W * C(aj) ≤ t < L + W * (C(aj)+P(aj));
    output aj;
    L := L + W * C(aj); R := L + W * P(aj);
end;
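
The decoding loop can be sketched in Python as below. The function name `decode` and the dictionary-based model are assumptions made for illustration; the search for j is a plain linear scan over the symbols, and the number of symbols n must be supplied by the caller.

```python
from fractions import Fraction

def decode(code, n, probs):
    """Decode n symbols from the bit string `code`, read as the tag .b1b2...000..."""
    tag = Fraction(int(code, 2), 1 << len(code)) if code else Fraction(0)
    lo, hi = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        width = hi - lo
        cum = Fraction(0)  # running C(a_j)
        for sym, p in probs.items():
            sym_lo = lo + width * cum
            sym_hi = sym_lo + width * p
            if sym_lo <= tag < sym_hi:  # the tag falls in this symbol's sub-interval
                out.append(sym)
                lo, hi = sym_lo, sym_hi
                break
            cum += p
    return "".join(out)

# Model with P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
PROBS = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
print(decode("00101", 4, PROBS))  # abca
```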

Page 17: Module 4 Arithmetic Coding

Decoding Example

P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4

00101

Page 18: Module 4 Arithmetic Coding

Decoding Issues

There are two ways for the decoder to know when to stop decoding:
    Transmit the length of the string
    Transmit a unique end-of-string symbol

Page 19: Module 4 Arithmetic Coding

Practical Arithmetic Coding

Scaling:
    By scaling we can keep L and R in a reasonable range of values so that W = R–L does not underflow.
    The code can be produced progressively, not at the end.
    Scaling complicates decoding somewhat.

Integer arithmetic coding avoids floating point altogether.
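
One common form of scaling can be sketched as follows; this is an assumed variant, not necessarily the one the course uses. Whenever the interval lies entirely in the lower or upper half of [0, 1), the next code bit is already determined, so it is emitted and the interval is doubled. The name `encode_with_scaling` is hypothetical, exact fractions are used for clarity, and the case where the interval straddles 1/2 while shrinking (which practical finite-precision coders must also handle) is deliberately left out.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def encode_with_scaling(message, probs):
    """Return (bits emitted so far, rescaled L, rescaled R)."""
    cum, total = {}, Fraction(0)
    for sym, p in probs.items():
        cum[sym] = total
        total += p
    lo, hi = Fraction(0), Fraction(1)
    bits = []
    for sym in message:
        width = hi - lo
        lo = lo + width * cum[sym]
        hi = lo + width * probs[sym]
        while True:
            if hi <= HALF:    # interval in [0, 1/2): next bit is 0
                bits.append("0")
                lo, hi = 2 * lo, 2 * hi
            elif lo >= HALF:  # interval in [1/2, 1): next bit is 1
                bits.append("1")
                lo, hi = 2 * lo - 1, 2 * hi - 1
            else:             # interval straddles 1/2: no bit is known yet
                break
    return "".join(bits), lo, hi

# Model with P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
PROBS = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
print(encode_with_scaling("abca", PROBS))  # ('0010100', Fraction(0, 1), Fraction(1, 1))
```

The emitted bits are a prefix shared by every number in the unscaled interval; a complete encoder would still append enough bits of a tag chosen from the remaining rescaled interval. For an input such as bbbb the interval stays centered on 1/2 and no bit is emitted, which is the straddling case this sketch does not address.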

Page 20: Module 4 Arithmetic Coding

Adaptation

Simple solution – Equally Probable Model
    Initially all symbols have frequency 1.
    After symbol x is coded, increment its frequency by 1.
    Use the new model for coding the next symbol.

Example in alphabet a, b, c, d
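
A small Python sketch of this adaptive model, plugged into the interval-narrowing loop. The class name `AdaptiveModel` and the function `adaptive_interval` are hypothetical; a decoder would have to apply the same updates in the same order to stay in step with the encoder.

```python
from fractions import Fraction

class AdaptiveModel:
    """Frequency-count model: every symbol starts at 1 and grows as it is coded."""

    def __init__(self, alphabet):
        self.counts = {sym: 1 for sym in alphabet}

    def prob(self, sym):
        return Fraction(self.counts[sym], sum(self.counts.values()))

    def cum(self, sym):
        """Total probability of the symbols listed before sym."""
        below = 0
        for other, count in self.counts.items():
            if other == sym:
                break
            below += count
        return Fraction(below, sum(self.counts.values()))

    def update(self, sym):
        self.counts[sym] += 1

def adaptive_interval(message, alphabet):
    model = AdaptiveModel(alphabet)
    lo, hi = Fraction(0), Fraction(1)
    for sym in message:
        width = hi - lo
        lo = lo + width * model.cum(sym)
        hi = lo + width * model.prob(sym)
        model.update(sym)  # the next symbol is coded with the new model
    return lo, hi

print(adaptive_interval("ab", "abcd"))  # (Fraction(1, 10), Fraction(3, 20))
```

In this run the first symbol a is coded with probability 1/4; after the update the counts are a:2, b:1, c:1, d:1, so b is coded with probability 1/5.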

Page 21: Module 4 Arithmetic Coding

Zero Frequency Problem

How do we weight symbols that have not occurred yet?
    Equal weight? Not so good with many symbols.
    Escape symbol, but what should its weight be?
    When a new symbol is encountered, send the <esc>, followed by the symbol in the equally probable model (both encoded arithmetically).
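
The escape mechanism can be illustrated by listing which (symbol, probability) pairs would be handed to the arithmetic coder. The sketch below makes two assumptions that the slide leaves open: the escape symbol keeps a fixed weight of 1, and a new symbol is coded with probability 1/(alphabet size) in the equally probable model. The names `ESC` and `coding_events` are hypothetical.

```python
from fractions import Fraction

ESC = "<esc>"

def coding_events(message, alphabet):
    """List the (symbol, probability) pairs that would be arithmetically coded."""
    counts = {ESC: 1}  # assumption: the escape weight stays fixed at 1
    events = []
    for sym in message:
        total = sum(counts.values())
        if sym in counts:
            # Seen before: code it in the frequency model.
            events.append((sym, Fraction(counts[sym], total)))
        else:
            # New symbol: code <esc>, then the symbol in the equally probable model.
            events.append((ESC, Fraction(counts[ESC], total)))
            events.append((sym, Fraction(1, len(alphabet))))
        counts[sym] = counts.get(sym, 0) + 1
    return events

for symbol, p in coding_events("aab", "abcd"):
    print(symbol, p)
```

Each event narrows the coding interval by its probability p, so it contributes about log2(1/p) bits to the output.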

Page 22: Module 4 Arithmetic Coding

End of File Problem

Similar to Zero Frequency Problem

Reasonable solution:
    Add EOF to the post-ESC equally-probable model.
    When done compressing:
        First send ESC
        Then send EOF

What’s the cost of this approach?

Page 23: Module 4 Arithmetic Coding

Arithmetic vs. Huffman

Both compress very well.

For m symbol grouping:
    Huffman is within 1/m of entropy
    Arithmetic is within 2/m of entropy

Context:
    Huffman needs a tree for every context
    Arithmetic needs a small table of frequencies for every context

Adaptation:
    Huffman has an elaborate adaptive algorithm
    Arithmetic has a simple adaptive mechanism