design and analysis of algorithms - iit kanpur
TRANSCRIPT
Design and Analysis of Algorithms(CS345/CS345A)
Lecture 9
Greedy StrategiesHuffman code : A data compression algorithm (continued. From last lecture)
1
Prefix Coding
Definition:
A coding ๐ธ(๐จ) is called prefix coding if there does not exist ๐ฅ, ๐ฆ โ ๐จ such that
๐ธ(๐) is prefix of ๐ธ(๐)
Algorithmic Problem:
Given a set ๐จ of ๐ alphabets and their frequencies ๐, compute coding ๐ธ such that
โข ๐ธ is prefix coding
โข ๐๐๐ ๐ธ is minimum.
2
๐๐๐ ๐ธ =
๐ฅโ๐จ
๐ ๐ฅ . ๐ธ ๐ฅ
A labeled binary tree
leaf nodes โ alphabets
Code of an alphabet =
4
0 1
0
0
0
0
0
0
1
1
1
1
1
1
Label of path from root
A labeled binary tree
leaf nodes โ alphabets
Code of an alphabet =
5
0 1
0
0
0
0
0
0
1
1
1
1
1
1
Label of path from root
01,001,0000,11,100,10110,10111
Prefix Coding
Alphabets Frequency ๐
Encoding๐ธ
๐ 0.45
๐ 0.18
๐ 0.15
๐ 0.12
๐ 0.10
Question:
How to build the labeled tree for a prefix code ?
6
100
0
110
101
111
0 1
๐
{0, 100, 101, 110, 111}
{00,01,10,11}
Prefix Coding
Alphabets Frequency ๐
Encoding๐ธ
๐ 0.45
๐ 0.18
๐ 0.15
๐ 0.12
๐ 0.10
Question:
How to build the labeled tree for a prefix code ?
7
100
0
110
101
111
0 1
๐
{0,1}
10
{0,1}
Prefix Coding
Alphabets Frequency ๐
Encoding๐ธ
๐ 0.45
๐ 0.18
๐ 0.15
๐ 0.12
๐ 0.10
Question:
How to build the labeled tree for a prefix code ?
8
100
0
110
101
111
0 1
1
11
0
0 0
๐
๐ ๐ ๐ ๐
Theorem 1:
For each prefix code of a set ๐จ of ๐ alphabets,
there exists a binary tree ๐ป on ๐ leaves s.t.
โข There is a bijective mapping between the alphabets and the leaves.
โข The label of a path from root to a leaf node corresponds to the prefix code of the corresponding alphabet.
Question: Can you express Average bit length of ๐ธ in terms of its binary tree ๐ป ?
๐๐๐ ๐ธ =
๐ฅโ๐จ
๐ ๐ฅ . ๐ธ ๐ฅ
=
๐ฅโ๐จ
๐ ๐ฅ . ๐๐๐ฉ๐ญ๐ก๐ ๐ฅ
9
Observations on the binary tree of the optimal prefix code
Lemma:
The binary tree corresponding to optimal prefix coding must be a full binary tree:
Every internal node has degree exactly 2.
Question: What next ?
We need to see the influence of frequencies on the optimal binary tree.
12
๐๐ , ๐๐ , โฆ , ๐๐
Non-decreasing order of frequency
๐(๐๐) โค ๐(๐๐+๐)
Observations on the binary tree of the optimal prefix code
Intuitively, more frequent alphabets should be closer to the root and
less frequent alphabets should be farther from the root.
But how to organize them to achieve optimal prefix code ?
โข We shall now make some simple observations about the structure of the binary tree corresponding to the optimal prefix codes.
โข These observations will be about some local property in the tree.
โข Nevertheless, these observations will play a crucial role in the design of a binary tree with optimal prefix code for given ๐จ.
Please pay full attention on the next few slides.
13
More Observations
14
0 1
0 1 10
๐1
0
๐๐
1
0
. . .
Deepest level
Can ๐1be present at a smaller level than the
deepest node ? If not, how to prove it ?
More Observations
15
0 1
0 1 10
๐1
10
0
. . .
๐๐Deepest level
Swapping ๐1 with ๐๐can not increase ABL.
More Observations
16
0 1
0 1 10
๐๐ ๐๐
1
10
. . .
๐๐
0
Deepest level
Since the tree is full binary, ๐๐ must have a sibling. What can we
say about it ?
It must be a leaf node. Otherwise ๐๐ is not at
the deepest level.
What about ๐๐ ?
Swapping ๐2 with ๐๐can not increase ABL.
More Observations
Theorem 2: There exists an optimal prefix coding
in which ๐๐and ๐๐ appear as siblings 17
0 1
0 1 10
๐๐ ๐๐
1
10
. . .
Deepest level
The important observation
Theorem 2: There exists an optimal prefix coding in which ๐๐and ๐๐ appear as siblings.
Important note: It is inaccurate to claim that โIn every optimal prefix coding, ๐๐and ๐๐ appear as siblings in the labeled binary string.โ For example, if there are ๐ alphabets with same frequencies and ๐ is odd.
But algorithmic implication of Theorem 2 mentioned above is quite important:
โWe just need to focus on that binary tree of optimal prefix coding in which ๐๐and ๐๐ appear as siblings.
This lemma is a powerful hint to the design of optimal prefix code.18
๐๐๐๐๐๐ ๐จ = ๐๐๐๐๐๐(๐จโฒ) + ๐ ๐๐ + ๐ ๐๐
Observation:
If this relation is true, we have an algorithm for optimal prefix codes. 23
๐จ
๐จโฒ
๐๐, ๐๐, ๐๐ , โฆ โฆ ๐๐
Greedystep
๐๐ , โฆ โฆ ๐๐๐โฒ
๐ ๐โ = ๐ ๐๐ + ๐ ๐๐
?
Opt(๐จ)
Opt(๐จโฒ)
Relation ?
Proof for
๐๐๐๐๐๐ ๐จ = ๐๐๐๐๐๐(๐จโฒ) + ๐ ๐๐ + ๐ ๐๐
Spend some time thinking about the proof
before moving ahead.
โบ
24
How to prove ๐๐๐๐๐๐ ๐จ = ๐๐๐๐๐๐(๐จโฒ) + ๐ ๐๐ + ๐ ๐๐ ?
Question: Can we derive a prefix coding for ๐จ from ๐๐๐(๐จโฒ) ?
Question: Can we derive a prefix coding for ๐จโฒ from ๐๐๐(๐จ) ?
25
A prefix coding for ๐จ from ๐๐๐ ๐จโฒ
๐ปโฒ: the binary tree corresponding to ๐๐๐๐๐๐(๐จโฒ)
26
. . .
๐โ
1
0 1
0 1 10
๐ปโฒ
A prefix coding for ๐จ from ๐๐๐(๐จโฒ)
๐ปโฒ: the binary tree corresponding to ๐๐๐๐๐๐(๐จโฒ)
This gives a prefix coding for ๐จ with ๐๐๐ = ??
โ ๐๐๐๐๐๐ ๐จ โค ๐๐๐๐๐๐(๐จโฒ) + ๐ ๐๐ + ๐ ๐๐
27
0 1
0 1 10
. . .
1
๐โ
๐ปโฒ
๐๐ ๐๐
10
๐๐๐๐๐๐(๐จโฒ) + ๐ ๐๐ + ๐ ๐๐
A prefix coding for ๐จโฒ from ๐๐๐(๐จ)
๐ป: the binary tree corresponding to ๐๐๐๐๐๐(๐จ)
28
. . .
Deepest level ๐๐ ๐๐
1
10
0 1
0 1 10๐ป
Using Theorem 2
A prefix coding for ๐จโฒ from ๐๐๐(๐จ)
๐ป: the binary tree corresponding to ๐๐๐๐๐๐(๐จ)
This gives a prefix coding for ๐จโฒ with ๐๐๐ = ??
โ ๐๐๐๐๐๐ ๐จโฒ โค ๐๐๐๐๐๐(๐จ) โ ๐ ๐๐ โ ๐ ๐๐29
. . .
Deepest level
1
๐๐ ๐๐
10
0 1
0 1 10๐ป
Using Theorem 1
๐โ
๐๐๐๐๐๐ ๐จ โ ๐ ๐๐ โ ๐ ๐๐
We proved
โข ๐๐๐๐๐๐ ๐จ โค ๐๐๐๐๐๐ ๐จโฒ + ๐ ๐๐ + ๐ ๐๐
โข ๐๐๐๐๐๐ ๐จโฒ โค ๐๐๐๐๐๐(๐จ) โ ๐ ๐๐ โ ๐ ๐๐
๐๐๐๐๐๐ ๐จ โฅ ๐๐๐๐๐๐ ๐จโฒ + ๐ ๐๐ + ๐ ๐๐
Using (1) and (2)
๐๐๐๐๐๐ ๐จ = ๐๐๐๐๐๐ ๐จโฒ + ๐ ๐๐ + ๐ ๐๐
30
(1)
(2)
The algorithm based on๐๐๐๐๐๐ ๐จ = ๐๐๐๐๐๐(๐จโฒ) + ๐ ๐๐ + ๐ ๐๐
OPT(๐จ)
{ If |๐จ|=2, return ;
else
{ Let ๐๐ and ๐2 be the two alphabets with least frequencies.
Remove ๐๐ and ๐๐ from ๐จ;
Create a new alphabet ๐โฒ;
๐ ๐โฒ ๐ ๐๐ + ๐ ๐๐ ;
Insert ๐โฒ into ๐จ;
T OPT(๐จ);
Replace node in T by
return T ;
}
}31
๐โ๐๐ ๐๐
10
๐๐ ๐๐
10
๐ถ(๐๐) time complexity
Homework:Achieve ๐ถ(๐ ๐ฅ๐จ๐ ๐)
time comp lexity
A generic way to prove that a greedy strategy works
P: a given optimization problem
1. Try to establish a relation between OPT(๐จ) and OPT(๐จโฒ);
2. Try to prove the relation formally by
โ deriving a solution of ๐จ from OPT(๐จโฒ)
โ deriving a solution of ๐จโฒ from OPT(๐จ) 32
๐จ
๐จโฒ
instance of size ๐ of problem P
Greedystep
instance of size < ๐ of problem P
Opt(๐จ)
Opt(๐จโฒ)
use Theorem
Prove a suitable Theorem about Opt(๐จ) using this greedy step
If you succeed, you have an algorithm
from construction
instance ๐จ
Lemma 1 : There is a MST with edge (u,v)
35
170
33
3531
41 81
52
42
50
102
50 3060
70
54
57
49
49
80
78
6354
53
44
29
64
42
43
uv
instance ๐จโฒ
36
170
33
3531
41 81
52
42
50
102
50 3060
70
54
57
49
49
80
78
6354
5364
42
44
w
43
This is graph ๐ฎโฒ
How to compute instance ๐จโฒ
Let (u,v) be the least weight edge in ๐ฎ=(๐ฝ, ๐ฌ). Transform ๐ฎ into ๐ฎโฒ as follows.
โข Remove vertices u and v and add a new vertex w
โข For each edge (u,x)ฯต ๐ฌ, add edge (w,x) in ๐ฎโฒ.
โข For each edge (v,x)ฯต ๐ฌ, add edge (w,x) in ๐ฎโฒ.
โข In case of multiple edges between w and x, keep only the lighter weight edge.
Theorem : W_MST(๐ฎ) = W_MST(๐ฎโฒ) + w(u,v)
Proof: (by construction)
1. W_MST(๐ฎ) โค W_MST(๐ฎโฒ) + w(u,v)
2. W_MST(๐ฎโฒ) โค W_MST(๐ฎ) - w(u,v)
This gives an algorithm for MST with ๐ถ(๐๐) time complexity.
Improve it to ๐ถ(๐ ๐ฅ๐จ๐ ๐) by using data structures.
37
Use Lemma 1
From construction