TRANSCRIPT
9/18/2005, J. Liang: SFU ENSC 424
ENSC 424 - Multimedia Communications Engineering
Topic 4: Huffman Coding (Part 2)
Jie Liang, Engineering Science, Simon Fraser University, [email protected]
Outline
- Canonical Huffman code
- Huffman encoding
- Huffman decoding
- Limitations of Huffman code
- Context-adaptive Huffman coding
Canonical Huffman Code
- The Huffman code for a given data set is not unique.
- The Huffman algorithm is needed only to compute the optimal codeword lengths.
- The canonical Huffman code is well structured: given the codeword lengths, we can find a canonical Huffman code.
http://portal.acm.org/citation.cfm?id=77566
Canonical Huffman Code
- Example: codeword lengths 2, 2, 3, 3, 3, 4, 4.
- Verify that they satisfy the Kraft-McMillan inequality:
  ∑_{i=1}^{N} 2^{-l_i} ≤ 1
  Here 2(1/4) + 3(1/8) + 2(1/16) = 1, so a prefix code with these lengths exists.
- A non-canonical example: 01, 11, 000, 001, 100, 1010, 1011.
- The canonical tree: 00, 01, 100, 101, 110, 1110, 1111.
- Rules for building the canonical tree:
  - Assign 0 to the left branch and 1 to the right branch.
  - Build the tree from left to right in increasing order of depth.
  - Each leaf is placed at the first available position.
Canonical Huffman
- Properties:
  - The first code is a series of 0s: 00.
  - Codes of the same length are consecutive: 100, 101, 110.
  - If we pad zeros to the right side so that all codewords have the same length, shorter codes have a lower value than longer codes:
    0000 < 0100 < 1000 < 1010 < 1100 < 1110 < 1111
- Formula from level n to level n+1:
  - C(n+1, 1) = 2(C(n, last) + 1): append a 0 to the next available level-n code. (The first code of length n+1 comes right after the last code of length n.)
- If we go from length n to n+2 directly (e.g., lengths 1, 3, 3, 3, 4, 4):
  - C(n+2, 1) = 4(C(n, last) + 1)
Advantages of Canonical Huffman
1. Reduced memory requirement
- A non-canonical tree needs to store all codewords and the lengths of all codewords: a lot of space for a large table.
- A canonical tree only needs:
  - Min: the shortest codeword length
  - Max: the longest codeword length
  - Distribution: the number of codewords at each level
- For the example above: Min = 2, Max = 4, number per level: 2, 3, 2.
Advantages of Canonical Huffman
2. Simplified decoding
- See Hirschberg & Lelewer's paper for details (not required in this course):
  http://portal.acm.org/citation.cfm?id=77566
- Final note about canonical Huffman: different conventions of the canonical form exist. The tree can also be built from right to left (exchanging 0 and 1); this convention is used in, e.g., "Practical Huffman coding" at http://www.compressconsult.com/huffman
Outline
- Review of last lecture
- Huffman code
- Canonical Huffman code
- Huffman encoding
- Huffman decoding
- Limitations of Huffman code
- Context-adaptive Huffman coding
Computer Implementation
- How to store the Huffman table? Code: a: 01, b: 1, c: 000, d: 0010, e: 0011
- Codewords are stored as integers (8 bits, 16 bits, 32 bits, ...):
  unsigned char Codewords[5] = {1, 1, 0, 2, 3};
- We also need to store the length of each code, because the computer does not know the number of leading zeros (1 and 01 are different codes; 0 and 000 are different codes):
  unsigned char Codelength[5] = {2, 1, 3, 4, 4};
- The decoder also needs to store this table.
Encoding
- Source alphabet A = {a, b, c, d, e}
- Probability distribution: {0.2, 0.4, 0.2, 0.1, 0.1}
- Code: a: 01, b: 1, c: 000, d: 0010, e: 0011
- Sequence: abaddecade
- For each input symbol: find its codeword and output all of its bits.
- This needs bit-level operations: a byte pointer and a bit pointer.
- Compressed bitstream: 01101001 00010001 10000100 100011
- 30 bits: 3 bits/symbol. ASCII: 10 bytes (80 bits). Compression ratio: 2.67 : 1.
Outline
- Review of last lecture
- Huffman code
- Canonical Huffman code
- Huffman encoding
- Huffman decoding
  - Direct approach
  - Table look-up method
  - Multilevel table look-up method
- Limitations of Huffman code
- Context-adaptive Huffman coding
Huffman Decoding
- Direct approach:
  - Read one bit, compare with all codewords... Slow.
- Binary tree approach:
  - Embed the Huffman table into a binary tree data structure.
  - Read one bit: if it's 0, go to the left child; if it's 1, go to the right child. Decode a symbol when a leaf is reached.
  - Still a bit-by-bit approach.
Table Look-up
- N: number of codewords; L: maximum codeword length.
- Expand to a full tree: each level-L node belongs to the subtree of exactly one codeword. This is equivalent to dividing the range [0, 2^L) into N intervals, each corresponding to one codeword.
- Example (codes a: 00, b: 010, c: 011, d: 1; L = 3): interval boundaries 010, 011, 100, 1000 in binary.
- Read L bits and find which interval the value belongs to:
  int Bar[4] = {2, 3, 4, 8};  // binary 010, 011, 100, 1000
  x = ReadBits(3);
  for (k = 0; k <= 3; k++) {
      if (x < Bar[k]) { /* decode codeword k */ break; }
  }
- Still needs conditional operations: slow.
Table Look-up Method
- Code: a: 00, b: 010, c: 011, d: 1 (L = 3). Level-3 prefixes: 000-001 -> a, 010 -> b, 011 -> c, 100-111 -> d.

char HuffDec[8][2] = {
    {'a', 2},
    {'a', 2},
    {'b', 3},
    {'c', 3},
    {'d', 1},
    {'d', 1},
    {'d', 1},
    {'d', 1}
};

x = ReadBits(3);
k = 0;  // number of symbols decoded
while (not EOF) {
    symbol[k++] = HuffDec[x][0];
    length = HuffDec[x][1];
    x = x << length;          // shift out the consumed bits
    newbits = ReadBits(length);
    x = x | newbits;          // refill the window
    x = x & 7;                // keep only the lowest 3 bits
}
Multi-level Table Look-Up
- Look-up table size: 2^L entries.
  - L = 8: 256 entries. L = 16: 65536 entries!
- Multi-level table look-up (divide and conquer): one table for frequently appearing codewords, and further tables for the longer codewords.
- Most codes can be decoded by looking at only a few bits; long codes rarely appear.
- Example (L = 8): a one-level table needs 256 entries; a two-level table needs only 16 + 16 = 32 entries!
More Complicated Cases
- There can be more than one hole in the first table: we need one second-layer table for each hole.
- (Figure: a first-level table, Table 1, with holes pointing to second-layer tables, Table 2 and Table 3.)
Outline
- Huffman code
- Canonical Huffman code
- Huffman decoding
- Limitations of Huffman code
- Context-adaptive Huffman coding
Limitations of Huffman Code
- A probability distribution is needed.
  - Usually estimated from a training set, but the actual data can be quite different.
- Hard to adapt to changing statistics.
  - New codes must be designed on the fly.
  - Context-adaptive methods still need predefined tables.
- The minimum codeword length is 1 bit.
  - A serious penalty for high-probability symbols.
  - Example: a binary source with P(0) = 0.9.
    - Entropy: -0.9 log2(0.9) - 0.1 log2(0.1) = 0.469 bit.
    - Huffman code: 0, 1. Average code length: 1 bit. More than 100% redundancy!
  - Joint coding of several symbols helps, but is not practical for a large alphabet.
Outline
- Huffman code
- Canonical Huffman code
- Huffman decoding
- Limitations of Huffman code
- Context-adaptive Huffman coding
Context-adaptive Huffman Code
- Switch the Huffman table based on previously encoded information (the context). The decoder follows the same rules.
- Used in H.264 (revisited later).
- Example: a binary source with pair probabilities P(X_{2i}, X_{2i+1}):
  - P(0, 0) = 3/8, P(0, 1) = 1/8, P(1, 0) = 1/8, P(1, 1) = 3/8.
  - Encode 2 symbols together; code from the previous example: 1, 00, 010, 011.
- 0s and 1s tend to appear in clusters: use the last two symbols (the context) to predict the next two, and design one set of Huffman codes for each possible context value.
- Input pairing: x0 x1 | x2 x3 | x4 x5 | x6 x7 | ...
- (Table: one assignment of the codewords {1, 00, 010, 011} to the pairs (0,0), (0,1), (1,0), (1,1) for each of the four context values, with the shortest code given to the pair most likely under that context.)
Summary
- Huffman code generation: sort, merge, assign codes.
- Canonical Huffman code: left to right, short to long, take the first valid leaf.
- Huffman decoding: direct, tree, table look-up, multi-level table look-up.
- Limitations of Huffman coding: adaptation is hard; not efficient for high-probability symbols.
- Next: Golomb-Rice coding, arithmetic coding.