HUFFMAN CODING
Transcript: data compression basics
Overview
In this chapter, we describe a very popular coding algorithm called the Huffman coding algorithm. We present a procedure for building Huffman codes when the probability model for the source is known, a procedure for building codes when the source statistics are unknown, and a new technique for code design that is in some sense similar to the Huffman coding approach.
Huffman Coding Algorithm

Minimum Variance Huffman Codes
Huffman Coding (using binary tree)
Algorithm in 5 steps:
1. Find the grey-level probabilities for the image by finding the histogram
2. Order the input probabilities (histogram magnitudes) from smallest to largest
3. Combine the smallest two by addition
4. GOTO step 2, until only two probabilities are left
5. By working backward along the tree, generate code by alternating assignment of 0 and 1
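The five steps above can be sketched in Python. The toy 10x10 image with grey levels 0 to 3 mirrors Example 1 below; the heap-based merging is an implementation choice for this sketch, not part of the original procedure:

```python
import heapq
from collections import Counter

def huffman_codes(probabilities):
    """Build Huffman codewords from a {symbol: probability} map by
    repeatedly combining the two smallest probabilities (steps 2-4);
    the 0/1 assignments of step 5 are made as the merges happen."""
    # Each heap entry: (probability, tie_breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)   # smallest probability
        p2, _, codes2 = heapq.heappop(heap)   # second smallest
        # Prepend 0 on one branch and 1 on the other
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

# Step 1: histogram of a toy 10x10 image with grey levels 0..3
pixels = [0] * 20 + [1] * 30 + [2] * 10 + [3] * 40
hist = Counter(pixels)
probs = {g: n / len(pixels) for g, n in hist.items()}
codes = huffman_codes(probs)
```

With these counts the most probable grey level (3, at 40%) receives a 1-bit codeword and the two least probable levels receive 3-bit codewords, matching Example 1 below.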
Coding Procedures for an N-symbol source
Source reduction:
1. List all probabilities in descending order
2. Merge the two symbols with the smallest probabilities into a new compound symbol
3. Repeat the above two steps N-2 times
Codeword assignment:
Start from the smallest reduced source and work back to the original source; each merging point corresponds to a node in the binary codeword tree
Example 1
We have an image with 2 bits/pixel, giving 4 possible gray levels. The image is 10 rows by 10 columns. In step 1 we find the histogram for the image.
Step 1: Histogram. The counts are converted into probabilities by normalizing to the total number of pixels:
Gray level 0 has 20 pixels (p = 0.2)
Gray level 1 has 30 pixels (p = 0.3)
Gray level 2 has 10 pixels (p = 0.1)
Gray level 3 has 40 pixels (p = 0.4)
Step 2: the probabilities are ordered from smallest to largest: 0.1, 0.2, 0.3, 0.4.
Step 3: combine the smallest two by addition (0.1 + 0.2 = 0.3).
Step 4 repeats steps 2 and 3: reorder (if necessary) and add the two smallest probabilities, until only two values remain (here 0.3 + 0.3 = 0.6, leaving 0.6 and 0.4).
Step 5: the actual code assignment is made. Start on the right-hand side of the tree and assign 0s and 1s: 0 is assigned to the 0.6 branch and 1 to the 0.4 branch.
The assigned 0 and 1 are brought back along the tree, and wherever a branch occurs the code so far is put on both branches.
Assign 0 and 1 to the branches labeled 0.3, appending to the existing code.
Finally, the codes are brought back one more level, and where the branch splits another 0/1 assignment occurs (at the 0.1 and 0.2 branches).
Now we have the Huffman code for this image: two gray levels (g0 and g2) are represented with 3 bits, one (g1) with 2 bits, and one (g3) with 1 bit. The gray level represented by 1 bit, g3, is the most likely to occur (40% of the time) and thus carries the least information in the information-theoretic sense.
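As a quick check on this result (a sketch; the probabilities and code lengths are the ones derived in Example 1):

```python
# Example 1: grey-level probabilities and the Huffman code lengths derived above
probs   = {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}
lengths = {0: 3,   1: 2,   2: 3,   3: 1}

# Average code length in bits/pixel: about 1.9, versus 2 for the fixed-length code
avg = sum(probs[g] * lengths[g] for g in probs)
```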
Exercise
Using Example 1, find a Huffman code using the minimum variance procedure.
(EE465: Introduction to Digital Image Processing)
Example 2
Step 1: Source reduction

symbol x   p(x)
S          0.5     0.5        0.5
W          0.25    0.25       (NEW) 0.5
N          0.125   (NE) 0.25
E          0.125

compound symbols: (NE), (NEW)
Step 2: Codeword assignment
Work back from the final reduction, assigning 0 and 1 at each merging point: 0 to S and 1 to (NEW); within (NEW), 0 to W and 1 to (NE); within (NE), 0 to N and 1 to E.

symbol x   p(x)     codeword
S          0.5      0
W          0.25     10
N          0.125    110
E          0.125    111
The codeword assignment is not unique. In fact, at each merging point (node), we can arbitrarily assign 0 and 1 to the two branches; the average code length is the same. For example, flipping every assignment gives:

symbol x   p(x)     codeword
S          0.5      1
W          0.25     01
N          0.125    001
E          0.125    000
Example 2
Step 1: Source reduction

symbol x   p(x)
e          0.4
a          0.2
i          0.2
o          0.1
u          0.1

Merge o and u into (ou), p = 0.2; then i and (ou) into (iou), p = 0.4; then a and (iou) into (aiou), p = 0.6. Two probabilities remain: 0.6 and 0.4.

compound symbols: (ou), (iou), (aiou)
Step 2: Codeword assignment
Work back from the final reduction: 0 to the 0.6 branch (aiou) and 1 to the 0.4 branch (e); within (aiou), 0 to (iou) and 1 to a; within (iou), 0 to i and 1 to (ou); within (ou), 0 to o and 1 to u.

symbol x   p(x)    codeword
e          0.4     1
a          0.2     01
i          0.2     000
o          0.1     0010
u          0.1     0011

Binary codeword tree representation:

         r
       0/ \1
   (aiou)  e
     0/ \1
  (iou)   a
   0/ \1
   i   (ou)
      0/ \1
      o    u
For this code:

symbol x   p(x)    codeword   length
e          0.4     1          1
a          0.2     01         2
i          0.2     000        3
o          0.1     0010       4
u          0.1     0011       4

H(X) = - sum_i p(i) log2 p(i) = 2.122 bps

Average code length: l = sum_i l(i) p(i) = 1(0.4) + 2(0.2) + 3(0.2) + 4(0.1) + 4(0.1) = 2.2 bps

Redundancy: r = l - H(X) = 0.078 bps

If we use fixed-length codes, we have to spend three bits per sample, which gives a code redundancy of 3 - 2.122 = 0.878 bps.
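The figures above can be reproduced directly (a sketch assuming the probabilities and code lengths tabulated for this example):

```python
from math import log2

# Probabilities and Huffman code lengths from the table above
probs   = {'e': 0.4, 'a': 0.2, 'i': 0.2, 'o': 0.1, 'u': 0.1}
lengths = {'e': 1,   'a': 2,   'i': 3,   'o': 4,   'u': 4}

entropy = -sum(p * log2(p) for p in probs.values())   # H(X), about 2.122 bps
avg_len = sum(probs[s] * lengths[s] for s in probs)   # average length, 2.2 bps
redundancy = avg_len - entropy                        # about 0.078 bps
```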
Example 3
Step 1: Source reduction (compound symbols)
Step 2: Codeword assignment
Adaptive Huffman Coding

Update Procedure

Dynamic Huffman Coding
T
Stage 1 (First occurrence of t)
r
/ \
0 t(1)
Order: 0,t(1)
* r represents the root
* 0 represents the null node
* t(1) denotes the occurrence of t with a frequency of 1
TE
Stage 2 (First occurrence of e)
r
/ \
1 t(1)
/ \
0 e(1)
Order: 0,e(1),1,t(1)
TEN
Stage 3 (First occurrence of n)
r
/ \
2 t(1)
/ \
1 e(1)
/ \
0 n(1)
Order: 0,n(1),1,e(1),2,t(1) : Misfit
Reorder: TEN
r
/ \
t(1) 2
/ \
1 e(1)
/ \
0 n(1)
Order: 0,n(1),1,e(1),t(1),2
TENN
Stage 4 (Repetition of n)
r
/ \
t(1) 3
/ \
2 e(1)
/ \
0 n(2)
Order: 0,n(2),2,e(1),t(1),3 : Misfit
Reorder: TENN
r
/ \
n(2) 2
/ \
1 e(1)
/ \
0 t(1)
Order: 0,t(1),1,e(1),n(2),2
t(1),n(2) are swapped
TENNE
Stage 5 (Repetition of e)
r
/ \
n(2) 3
/ \
1 e(2)
/ \
0 t(1)
Order: 0,t(1),1,e(2),n(2),3
TENNES
Stage 6 (First occurrence of s)
r
/ \
n(2) 4
/ \
2 e(2)
/ \
1 t(1)
/ \
0 s(1)
Order: 0,s(1),1,t(1),2,e(2),n(2),4
TENNESS
Stage 7 (Repetition of s)
r
/ \
n(2) 5
/ \
3 e(2)
/ \
2 t(1)
/ \
0 s(2)
Order: 0,s(2),2,t(1),3,e(2),n(2),5 : Misfit
Reorder: TENNESS
r
/ \
n(2) 5
/ \
3 e(2)
/ \
1 s(2)
/ \
0 t(1)
Order : 0,t(1),1,s(2),3,e(2),n(2),5
s(2) and t(1) are swapped
TENNESSE
Stage 8 (Second repetition of e )
r
/ \
n(2) 6
/ \
3 e(3)
/ \
1 s(2)
/ \
0 t(1)
Order : 0,t(1),1,s(2),3,e(3),n(2),6 : Misfit
Reorder: TENNESSE
r
/ \
e(3) 5
/ \
3 n(2)
/ \
1 s(2)
/ \
0 t(1)
Order: 0,t(1),1,s(2),3,n(2),e(3),5
n(2) and e(3) are swapped
TENNESSEE
Stage 9 (Third repetition of e)
r
0/ \1
e(4) 5
0/ \1
3 n(2)
0/ \1
1 s(2)
0/ \1
0 t(1)
Order: 0,t(1),1,s(2),3,n(2),e(4),5
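The "Misfit" flagged at several stages above is a violation of the sibling property: listed in the order shown, the node weights must be non-decreasing. A minimal sketch of that check:

```python
def is_valid_order(weights):
    """Sibling-property check used at each stage above: the node weights,
    listed in the adaptive-Huffman order, must be non-decreasing."""
    return all(a <= b for a, b in zip(weights, weights[1:]))

# Stage 3 order 0,n(1),1,e(1),2,t(1) has weights 0,1,1,1,2,1: a misfit
assert not is_valid_order([0, 1, 1, 1, 2, 1])
# After reordering, 0,n(1),1,e(1),t(1),2 has weights 0,1,1,1,1,2: valid
assert is_valid_order([0, 1, 1, 1, 1, 2])
```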
ENCODING
The letters can be encoded as follows:
e : 0
n : 11
s : 101
t : 1001
Average Code Length
Average code length = sum over symbols of (code length * frequency) / total frequency
= ( 1(4) + 2(2) + 3(2) + 4(1) ) / (4+2+2+1)
= 18 / 9 = 2 bits/symbol
ENTROPY
Entropy = - sum_i p(i) log2 p(i)
For TENNESSEE: p(e) = 4/9 ≈ 0.44, p(n) = p(s) = 2/9 ≈ 0.22, p(t) = 1/9 ≈ 0.11
= - ( 0.44 log2 0.44 + 0.22 log2 0.22 + 0.22 log2 0.22 + 0.11 log2 0.11 )
= 1.8367 bits/symbol
Ordinary Huffman Coding
TENNESSEE
9
0/ \1
5 e(4)
0/ \1
s(2) 3
0/ \1
t(1) n(2)
ENCODING
E : 1
S : 00
T : 010
N : 011
Average code length = ( 1(4) + 2(2) + 3(2) + 3(1) ) / 9 = 17/9 ≈ 1.89
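The two average code lengths can be checked against each other (a sketch using the letter frequencies of TENNESSEE with the code lengths from the final adaptive tree and from the ordinary Huffman tree):

```python
# Letter frequencies of TENNESSEE and code lengths from the two trees
freqs        = {'e': 4, 'n': 2, 's': 2, 't': 1}
adaptive_len = {'e': 1, 'n': 2, 's': 3, 't': 4}   # stage 9 adaptive tree
ordinary_len = {'e': 1, 's': 2, 't': 3, 'n': 3}   # ordinary Huffman tree

n = sum(freqs.values())                                            # 9 letters
avg_adaptive = sum(adaptive_len[c] * freqs[c] for c in freqs) / n  # 18/9 = 2.0
avg_ordinary = sum(ordinary_len[c] * freqs[c] for c in freqs) / n  # 17/9, about 1.89
```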
SUMMARY
The average code length of ordinary Huffman coding seems to be better than that of the dynamic version in this exercise.
But in practice the performance of dynamic coding is better. The problem with static coding is that the tree has to be constructed in the transmitter and sent to the receiver. The tree may change, because the frequency distribution of English letters differs between plain text, a technical paper, a piece of code, etc.
Since the tree in dynamic coding is constructed at the receiver as well, it need not be sent. Considering this, dynamic coding is better.
Also, the average code length improves as the transmitted text gets bigger.
Summary of Huffman Coding Algorithm
Achieves minimal redundancy subject to the constraint that the source symbols be coded one at a time
Sorting symbols in descending order of probability is the key step in source reduction
The codeword assignment is not unique: exchanging the labels 0 and 1 at any node of the binary codeword tree produces another solution that works equally well
Only works for a source with a finite number of symbols (otherwise, it does not know where to start)