module 4 arithmetic coding

Post on 29-Jun-2015

Module 4, Data Compression
LISA, NTPU

Module 4: Arithmetic Coding

Prof. Hung-Ta Pai

Reals in Binary

Any real number x in the interval [0, 1) can be represented in binary as .b1b2b3..., where each bi is a bit.

First Conversion

L := 0; R := 1; i := 1;
while x > L do *
    if x < (L+R)/2
        then bi := 0; R := (L+R)/2
        else bi := 1; L := (L+R)/2;
    i := i + 1;
end{while}
bj := 0 for all j ≥ i;

* Invariant: x is always in the interval [L, R)
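
The conversion loop can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `binary_expansion` and the `max_bits` cutoff are assumptions added here so that inputs with an infinite expansion (such as 1/3) still terminate, and `fractions.Fraction` is used so that the comparisons are exact.

```python
from fractions import Fraction

def binary_expansion(x, max_bits=32):
    """Return the leading bits b1, b2, ... of x in [0, 1) as a list of 0/1.

    Bits after the returned ones are all 0 when the loop ends because x == L;
    otherwise the expansion was cut off at max_bits (an added safeguard).
    """
    L, R = Fraction(0), Fraction(1)
    bits = []
    # Invariant: x is always in the interval [L, R)
    while x > L and len(bits) < max_bits:
        mid = (L + R) / 2
        if x < mid:
            bits.append(0)   # x lies in the lower half
            R = mid
        else:
            bits.append(1)   # x lies in the upper half
            L = mid
    return bits
```

For example, `binary_expansion(Fraction(5, 8))` gives `[1, 0, 1]`, that is .101 in binary.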

Basic Ideas

- Represent each string x of length n by a unique interval [L, R) in [0, 1).
- The width of the interval [L, R) represents the probability of x occurring.
- The interval [L, R) can itself be represented by any number within the half-open interval; this number is called a tag.
- The first k significant bits of the tag .t1t2t3... form the code of x. That is, .t1t2t3...tk000... is in the interval [L, R).

Example

1. The tag must be in the half-open interval.
2. The tag can be chosen to be (L+R)/2.
3. The code is the significant bits of the tag.

Better Tag

Example of Codes

P(a) = 1/3, P(b) = 2/3

Code Generation from Tag

- If the binary tag is .t1t2t3... = (L+R)/2 in [L, R), then we want to choose k to form the code t1t2...tk.
- Short code: choose k to be as small as possible so that L ≤ .t1t2...tk000... < R.
- Guaranteed code:
  - Choose k = ⌈log2(1/(R-L))⌉ + 1.
  - Then L ≤ .t1t2...tkb1b2b3... < R for any bits b1b2b3...
  - For fixed-length strings this provides a good prefix code.
- Example: [.000000000..., .000010010...), tag = .000001001...
  - Short code: 0
  - Guaranteed code: 000001
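
The two rules for choosing k can be sketched in Python. This is an illustrative sketch under stated assumptions: the helper names `first_bits`, `short_code`, and `guaranteed_code` are hypothetical, the tag is taken to be (L+R)/2, the short code is assumed to have at least one bit, and exact fractions replace the logarithm so that the ceiling is computed without rounding error.

```python
from fractions import Fraction

def first_bits(t, k):
    """First k bits of the binary expansion of t in [0, 1), as a string."""
    return format(int(t * 2 ** k), "0{}b".format(k))

def short_code(L, R):
    """Smallest k >= 1 such that L <= .t1...tk000... < R."""
    tag = (L + R) / 2
    k = 1
    while True:
        bits = first_bits(tag, k)
        value = Fraction(int(bits, 2), 2 ** k)   # the number .t1...tk000...
        if L <= value < R:
            return bits
        k += 1

def guaranteed_code(L, R):
    """First k bits of the tag, with k = ceil(log2(1/(R-L))) + 1."""
    W = R - L
    j = 0
    while W * 2 ** j < 1:      # smallest j with 2**j >= 1/W
        j += 1
    return first_bits((L + R) / 2, j + 1)
```

With the interval from the example, L = 0 and R = .00001001 in binary = 9/256, `short_code` returns `0` and `guaranteed_code` returns `000001`, matching the values on the slide.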

Guaranteed Code Example

P(a) = 1/3, P(b) = 2/3

Guaranteed code -> Prefix code
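
The table from this slide is not reproduced in the transcript. As a hedged illustration of the prefix property, the sketch below builds the guaranteed codes for all strings of a fixed length under P(a) = 1/3, P(b) = 2/3 and checks that no code is a prefix of another. The function names and the choice of length 2 in the usage note are assumptions made here, not taken from the slides.

```python
from fractions import Fraction
from itertools import product

P = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
C = {"a": Fraction(0), "b": Fraction(1, 3)}   # C(b) = P(a)

def interval(string):
    """Interval [L, R) assigned to a string by repeated subdivision."""
    L, R = Fraction(0), Fraction(1)
    for x in string:
        W = R - L
        L = L + W * C[x]
        R = L + W * P[x]
    return L, R

def guaranteed_code(L, R):
    """First k bits of the tag (L+R)/2, k = ceil(log2(1/(R-L))) + 1."""
    W = R - L
    k = 1
    while W * 2 ** (k - 1) < 1:
        k += 1
    tag = (L + R) / 2
    return format(int(tag * 2 ** k), "0{}b".format(k))

def code_table(n):
    """Guaranteed code for every string of length n over {a, b}."""
    return {"".join(s): guaranteed_code(*interval(s))
            for s in product("ab", repeat=n)}

def is_prefix_free(codes):
    """True if no code is a proper prefix of a different code."""
    return not any(u != v and v.startswith(u) for u in codes for v in codes)
```

For length 2, `code_table(2)` yields aa: 00001, ab: 0011, ba: 0111, bb: 110, and `is_prefix_free` confirms that these four codes form a prefix code.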

Coding Algorithm

P(a1), P(a2), ..., P(am)
C(ai) = P(a1) + P(a2) + ... + P(ai-1)
Encode x1x2...xn

Initialize L := 0; and R := 1;
for i = 1 to n do
    W := R - L;
    L := L + W * C(xi);
    R := L + W * P(xi);
end;
t := (L+R)/2; choose code for the tag
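
A direct transcription of this loop into Python might look like the sketch below. The names `cumulative` and `encode_interval` are hypothetical, the model is passed as a dictionary whose insertion order defines a1, a2, ..., am, and exact fractions are used in place of floating point so the interval endpoints can be checked by hand.

```python
from fractions import Fraction

def cumulative(P):
    """C(ai) = P(a1) + ... + P(a(i-1)), in the order the symbols are listed."""
    C, running = {}, Fraction(0)
    for symbol, p in P.items():
        C[symbol] = running
        running += p
    return C

def encode_interval(message, P):
    """Return the final interval [L, R) for the message."""
    C = cumulative(P)
    L, R = Fraction(0), Fraction(1)
    for x in message:
        W = R - L
        L = L + W * C[x]
        R = L + W * P[x]     # uses the updated L, as in the pseudocode
    return L, R

P = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
L, R = encode_interval("abca", P)
tag = (L + R) / 2
```

With P(a) = 1/4, P(b) = 1/2, P(c) = 1/4, the string abca ends in the interval [5/32, 21/128), so the tag is 41/256 = .00101001 in binary.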

Coding Example

P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4
Encode: abca

Coding Exercise

P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4
Encode: bbbb

Decoding (1/3 to 3/3)

Assume the length is known to be 3.
The code 0001 converts to the tag .0001000...

[The three slides step through the decoding in figures that are not reproduced in this transcript.]

Decoding Algorithm

P(a1), P(a2), ..., P(am)
C(ai) = P(a1) + P(a2) + ... + P(ai-1)
Decode b1b2...bk; the number of symbols is n

Initialize L := 0; and R := 1;
t := .b1b2...bk000...
for i = 1 to n do
    W := R - L;
    find j such that L + W * C(aj) ≤ t < L + W * (C(aj) + P(aj));
    output aj;
    L := L + W * C(aj); R := L + W * P(aj);
end;
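
The decoding loop can be sketched in Python as follows. The function name `decode` and its argument layout are assumptions for illustration; the code string is assumed to be non-empty, the model is a dictionary whose insertion order defines a1, ..., am, and the symbol count n must be supplied by the caller.

```python
from fractions import Fraction

def decode(bits, n, P):
    """Decode n symbols from the code string bits (e.g. "00101")."""
    # t = .b1b2...bk000... as an exact fraction
    t = Fraction(int(bits, 2), 2 ** len(bits))
    # Cumulative probabilities C(ai) in listing order
    C, running = {}, Fraction(0)
    for symbol, p in P.items():
        C[symbol] = running
        running += p
    L, R = Fraction(0), Fraction(1)
    output = []
    for _ in range(n):
        W = R - L
        for symbol in P:
            low = L + W * C[symbol]
            high = low + W * P[symbol]
            if low <= t < high:      # t falls in this symbol's sub-interval
                output.append(symbol)
                L, R = low, high
                break
    return "".join(output)

P = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
message = decode("00101", 4, P)
```

With P(a) = 1/4, P(b) = 1/2, P(c) = 1/4 and an assumed length of n = 4, the code 00101 decodes to abca.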

Decoding Example

P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4
Decode: 00101

Decoding Issues

There are two ways for the decoder to know when to stop decoding:

- Transmit the length of the string.
- Transmit a unique end-of-string symbol.

Practical Arithmetic Coding

- Scaling:
  - By scaling we can keep L and R in a reasonable range of values so that W = R - L does not underflow.
  - The code can be produced progressively, not only at the end.
  - Scaling makes decoding somewhat more complicated.
- Integer arithmetic coding avoids floating point altogether.
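
To make the idea of scaling concrete, here is a simplified sketch, not the scheme from the slides. It applies only the two simplest rescalings: when the whole interval lies in the lower half a 0 is emitted, when it lies in the upper half a 1 is emitted, and in both cases the interval is doubled. An interval that keeps straddling 1/2 while shrinking needs an additional rule that is not shown here, and exact fractions are used instead of the integer arithmetic a practical coder would use. The name `encode_progressive` is hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def encode_progressive(message, P, C):
    """Emit code bits as soon as they are determined, rescaling [L, R)."""
    L, R = Fraction(0), Fraction(1)
    emitted = []
    for x in message:
        W = R - L
        L = L + W * C[x]
        R = L + W * P[x]
        while True:
            if R <= HALF:            # interval inside [0, 1/2): next bit is 0
                emitted.append("0")
                L, R = 2 * L, 2 * R
            elif L >= HALF:          # interval inside [1/2, 1): next bit is 1
                emitted.append("1")
                L, R = 2 * L - 1, 2 * R - 1
            else:                    # interval straddles 1/2: nothing to emit yet
                break
    return "".join(emitted), (L, R)

P = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
C = {"a": Fraction(0), "b": Fraction(1, 4), "c": Fraction(3, 4)}
bits, remaining = encode_progressive("abca", P, C)
```

For abca this emits 0010100 and leaves the rescaled interval at [0, 1). Every number whose binary expansion starts with these seven bits lies in [20/128, 21/128), which is the same interval the unscaled algorithm produces, but W never became smaller than 1/8 along the way.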

Adaptation

Simple solution: Equally Probable Model

- Initially all symbols have frequency 1.
- After symbol x is coded, increment its frequency by 1.
- Use the new model for coding the next symbol.
- Example in alphabet a, b, c, d
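
The adaptive rule above can be combined with the interval update in a few lines. The sketch below is an illustration under assumptions: the function name `adaptive_encode` is hypothetical, symbols are ordered as they appear in the `alphabet` argument, and exact fractions are used so the result can be verified by hand.

```python
from fractions import Fraction

def adaptive_encode(message, alphabet):
    """Return the final interval [L, R) using frequency counts that adapt."""
    freq = {s: 1 for s in alphabet}      # initially all symbols have frequency 1
    L, R = Fraction(0), Fraction(1)
    for x in message:
        total = sum(freq.values())
        # Cumulative count of the symbols listed before x
        cum = 0
        for s in alphabet:
            if s == x:
                break
            cum += freq[s]
        W = R - L
        L = L + W * Fraction(cum, total)
        R = L + W * Fraction(freq[x], total)
        freq[x] += 1                     # update the model after coding x
    return L, R
```

For the alphabet a, b, c, d, coding a first uses P(a) = 1/4 and gives [0, 1/4). The counts then become 2, 1, 1, 1, so coding b next uses C(b) = 2/5 and P(b) = 1/5, and `adaptive_encode("ab", "abcd")` returns [1/10, 3/20). A decoder can stay in step by applying the same count updates after each decoded symbol.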

Zero Frequency Problem

How do we weight symbols that have not occurred yet?

- Equal weight? Not so good with many symbols.
- Escape symbol, but what should its weight be?
- When a new symbol is encountered, send <esc>, followed by the symbol in the equally probable model (both encoded arithmetically).

End of File Problem

- Similar to the Zero Frequency Problem
- Reasonable solution:
  - Add EOF to the post-ESC equally probable model.
  - When done compressing:
    - First send ESC.
    - Then send EOF.
- What's the cost of this approach?

Arithmetic vs. Huffman

- Both compress very well. When symbols are grouped into blocks of m:
  - Huffman is within 1/m bits per symbol of the entropy.
  - Arithmetic is within 2/m bits per symbol of the entropy.
- Context:
  - Huffman needs a tree for every context.
  - Arithmetic needs a small table of frequencies for every context.
- Adaptation:
  - Huffman has an elaborate adaptive algorithm.
  - Arithmetic has a simple adaptive mechanism.
