design and analysis of algorithms - iit kanpur

36
Design and Analysis of Algorithms (CS345/CS345A) Lecture 9 Greedy Strategies Huffman code : A data compression algorithm (continued. From last lecture) 1

Upload: others

Post on 20-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Design and Analysis of Algorithms(CS345/CS345A)

Lecture 9

Greedy StrategiesHuffman code : A data compression algorithm (continued. From last lecture)

1

Prefix Coding

Definition:

A coding ๐œธ(๐‘จ) is called prefix coding if there does not exist ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘จ such that

๐œธ(๐’™) is prefix of ๐œธ(๐’š)

Algorithmic Problem:

Given a set ๐‘จ of ๐’ alphabets and their frequencies ๐‘“, compute coding ๐œธ such that

โ€ข ๐œธ is prefix coding

โ€ข ๐€๐๐‹ ๐œธ is minimum.

2

๐€๐๐‹ ๐œธ =

๐‘ฅโˆˆ๐‘จ

๐‘“ ๐‘ฅ . ๐œธ ๐‘ฅ

The novel idea of Huffman

3

Binary coding Binary tree

? 0 1

10 01

Labeled

A labeled binary tree

leaf nodes โ†’ alphabets

Code of an alphabet =

4

0 1

0

0

0

0

0

0

1

1

1

1

1

1

Label of path from root

A labeled binary tree

leaf nodes โ†’ alphabets

Code of an alphabet =

5

0 1

0

0

0

0

0

0

1

1

1

1

1

1

Label of path from root

01,001,0000,11,100,10110,10111

Prefix Coding

Alphabets Frequency ๐‘“

Encoding๐œธ

๐’‚ 0.45

๐’ƒ 0.18

๐’„ 0.15

๐’… 0.12

๐’† 0.10

Question:

How to build the labeled tree for a prefix code ?

6

100

0

110

101

111

0 1

๐’‚

{0, 100, 101, 110, 111}

{00,01,10,11}

Prefix Coding

Alphabets Frequency ๐‘“

Encoding๐œธ

๐’‚ 0.45

๐’ƒ 0.18

๐’„ 0.15

๐’… 0.12

๐’† 0.10

Question:

How to build the labeled tree for a prefix code ?

7

100

0

110

101

111

0 1

๐’‚

{0,1}

10

{0,1}

Prefix Coding

Alphabets Frequency ๐‘“

Encoding๐œธ

๐’‚ 0.45

๐’ƒ 0.18

๐’„ 0.15

๐’… 0.12

๐’† 0.10

Question:

How to build the labeled tree for a prefix code ?

8

100

0

110

101

111

0 1

1

11

0

0 0

๐’‚

๐’ƒ ๐’… ๐’„ ๐’†

Theorem 1:

For each prefix code of a set ๐‘จ of ๐’ alphabets,

there exists a binary tree ๐‘ป on ๐’ leaves s.t.

โ€ข There is a bijective mapping between the alphabets and the leaves.

โ€ข The label of a path from root to a leaf node corresponds to the prefix code of the corresponding alphabet.

Question: Can you express Average bit length of ๐œธ in terms of its binary tree ๐‘ป ?

๐€๐๐‹ ๐œธ =

๐‘ฅโˆˆ๐‘จ

๐‘“ ๐‘ฅ . ๐œธ ๐‘ฅ

=

๐‘ฅโˆˆ๐‘จ

๐‘“ ๐‘ฅ . ๐๐ž๐ฉ๐ญ๐ก๐“ ๐‘ฅ

9

FINDING THE LABELED BINARY TREE FORTHE OPTIMAL PREFIX CODES

10

Is the following prefix coding optimal ?

11

0 1

0

0

0

0

1

1

10

0 1

1

1

full binary tree

Observations on the binary tree of the optimal prefix code

Lemma:

The binary tree corresponding to optimal prefix coding must be a full binary tree:

Every internal node has degree exactly 2.

Question: What next ?

We need to see the influence of frequencies on the optimal binary tree.

12

๐’‚๐Ÿ , ๐’‚๐Ÿ , โ€ฆ , ๐’‚๐’

Non-decreasing order of frequency

๐’‡(๐’‚๐’Š) โ‰ค ๐’‡(๐’‚๐’Š+๐Ÿ)

Observations on the binary tree of the optimal prefix code

Intuitively, more frequent alphabets should be closer to the root and

less frequent alphabets should be farther from the root.

But how to organize them to achieve optimal prefix code ?

โ€ข We shall now make some simple observations about the structure of the binary tree corresponding to the optimal prefix codes.

โ€ข These observations will be about some local property in the tree.

โ€ข Nevertheless, these observations will play a crucial role in the design of a binary tree with optimal prefix code for given ๐‘จ.

Please pay full attention on the next few slides.

13

More Observations

14

0 1

0 1 10

๐’‚1

0

๐’‚๐’Š

1

0

. . .

Deepest level

Can ๐’‚1be present at a smaller level than the

deepest node ? If not, how to prove it ?

More Observations

15

0 1

0 1 10

๐’‚1

10

0

. . .

๐’‚๐’ŠDeepest level

Swapping ๐’‚1 with ๐’‚๐‘–can not increase ABL.

More Observations

16

0 1

0 1 10

๐’‚๐Ÿ ๐’‚๐’Š

1

10

. . .

๐’‚๐Ÿ

0

Deepest level

Since the tree is full binary, ๐’‚๐Ÿ must have a sibling. What can we

say about it ?

It must be a leaf node. Otherwise ๐’‚๐Ÿ is not at

the deepest level.

What about ๐’‚๐Ÿ ?

Swapping ๐’‚2 with ๐’‚๐‘–can not increase ABL.

More Observations

Theorem 2: There exists an optimal prefix coding

in which ๐’‚๐Ÿand ๐’‚๐Ÿ appear as siblings 17

0 1

0 1 10

๐’‚๐Ÿ ๐’‚๐Ÿ

1

10

. . .

Deepest level

The important observation

Theorem 2: There exists an optimal prefix coding in which ๐’‚๐Ÿand ๐’‚๐Ÿ appear as siblings.

Important note: It is inaccurate to claim that โ€œIn every optimal prefix coding, ๐’‚๐Ÿand ๐’‚๐Ÿ appear as siblings in the labeled binary string.โ€ For example, if there are ๐’ alphabets with same frequencies and ๐’ is odd.

But algorithmic implication of Theorem 2 mentioned above is quite important:

โž”We just need to focus on that binary tree of optimal prefix coding in which ๐’‚๐Ÿand ๐’‚๐Ÿ appear as siblings.

This lemma is a powerful hint to the design of optimal prefix code.18

19

๐‘จ

๐‘จโ€ฒ

instance of size ๐’ of problem P

Greedystep

instance of size < ๐’ of problem P

The binary tree of the optimal prefix code

21

0 1

0 1 10

. . .

Deepest level๐’‚๐Ÿ ๐’‚๐Ÿ

1

10

The binary tree of the optimal prefix code

22

0 1

0 1 10

. . .

Deepest level ๐’‚โ€™

1

๐Ž๐๐“๐€๐๐‹ ๐‘จ = ๐Ž๐๐“๐€๐๐‹(๐‘จโ€ฒ) + ๐’‡ ๐’‚๐Ÿ + ๐’‡ ๐’‚๐Ÿ

Observation:

If this relation is true, we have an algorithm for optimal prefix codes. 23

๐‘จ

๐‘จโ€ฒ

๐’‚๐Ÿ, ๐’‚๐Ÿ, ๐’‚๐Ÿ‘ , โ€ฆ โ€ฆ ๐’‚๐’

Greedystep

๐’‚๐Ÿ‘ , โ€ฆ โ€ฆ ๐’‚๐’๐’‚โ€ฒ

๐’‡ ๐’‚โ€™ = ๐’‡ ๐’‚๐Ÿ + ๐’‡ ๐’‚๐Ÿ

?

Opt(๐‘จ)

Opt(๐‘จโ€ฒ)

Relation ?

Proof for

๐Ž๐๐“๐€๐๐‹ ๐‘จ = ๐Ž๐๐“๐€๐๐‹(๐‘จโ€ฒ) + ๐’‡ ๐’‚๐Ÿ + ๐’‡ ๐’‚๐Ÿ

Spend some time thinking about the proof

before moving ahead.

โ˜บ

24

How to prove ๐Ž๐๐“๐€๐๐‹ ๐‘จ = ๐Ž๐๐“๐€๐๐‹(๐‘จโ€ฒ) + ๐’‡ ๐’‚๐Ÿ + ๐’‡ ๐’‚๐Ÿ ?

Question: Can we derive a prefix coding for ๐‘จ from ๐Ž๐๐“(๐‘จโ€ฒ) ?

Question: Can we derive a prefix coding for ๐‘จโ€ฒ from ๐Ž๐๐“(๐‘จ) ?

25

A prefix coding for ๐‘จ from ๐Ž๐๐“ ๐‘จโ€ฒ

๐‘ปโ€ฒ: the binary tree corresponding to ๐Ž๐๐“๐€๐๐‹(๐‘จโ€ฒ)

26

. . .

๐’‚โ€™

1

0 1

0 1 10

๐‘ปโ€ฒ

A prefix coding for ๐‘จ from ๐Ž๐๐“(๐‘จโ€ฒ)

๐‘ปโ€ฒ: the binary tree corresponding to ๐Ž๐๐“๐€๐๐‹(๐‘จโ€ฒ)

This gives a prefix coding for ๐‘จ with ๐€๐๐‹ = ??

โž” ๐Ž๐๐“๐€๐๐‹ ๐‘จ โ‰ค ๐Ž๐๐“๐€๐๐‹(๐‘จโ€ฒ) + ๐’‡ ๐’‚๐Ÿ + ๐’‡ ๐’‚๐Ÿ

27

0 1

0 1 10

. . .

1

๐’‚โ€™

๐‘ปโ€ฒ

๐’‚๐Ÿ ๐’‚๐Ÿ

10

๐Ž๐๐“๐€๐๐‹(๐‘จโ€ฒ) + ๐’‡ ๐’‚๐Ÿ + ๐’‡ ๐’‚๐Ÿ

A prefix coding for ๐‘จโ€ฒ from ๐Ž๐๐“(๐‘จ)

๐‘ป: the binary tree corresponding to ๐Ž๐๐“๐€๐๐‹(๐‘จ)

28

. . .

Deepest level ๐’‚๐Ÿ ๐’‚๐Ÿ

1

10

0 1

0 1 10๐‘ป

Using Theorem 2

A prefix coding for ๐‘จโ€ฒ from ๐Ž๐๐“(๐‘จ)

๐‘ป: the binary tree corresponding to ๐Ž๐๐“๐€๐๐‹(๐‘จ)

This gives a prefix coding for ๐‘จโ€ฒ with ๐€๐๐‹ = ??

โž” ๐Ž๐๐“๐€๐๐‹ ๐‘จโ€ฒ โ‰ค ๐Ž๐๐“๐€๐๐‹(๐‘จ) โˆ’ ๐’‡ ๐’‚๐Ÿ โˆ’ ๐’‡ ๐’‚๐Ÿ29

. . .

Deepest level

1

๐’‚๐Ÿ ๐’‚๐Ÿ

10

0 1

0 1 10๐‘ป

Using Theorem 1

๐’‚โ€™

๐Ž๐๐“๐€๐๐‹ ๐‘จ โˆ’ ๐’‡ ๐’‚๐Ÿ โˆ’ ๐’‡ ๐’‚๐Ÿ

We proved

โ€ข ๐Ž๐๐“๐€๐๐‹ ๐‘จ โ‰ค ๐Ž๐๐“๐€๐๐‹ ๐‘จโ€ฒ + ๐’‡ ๐’‚๐Ÿ + ๐’‡ ๐’‚๐Ÿ

โ€ข ๐Ž๐๐“๐€๐๐‹ ๐‘จโ€ฒ โ‰ค ๐Ž๐๐“๐€๐๐‹(๐‘จ) โˆ’ ๐’‡ ๐’‚๐Ÿ โˆ’ ๐’‡ ๐’‚๐Ÿ

๐Ž๐๐“๐€๐๐‹ ๐‘จ โ‰ฅ ๐Ž๐๐“๐€๐๐‹ ๐‘จโ€ฒ + ๐’‡ ๐’‚๐Ÿ + ๐’‡ ๐’‚๐Ÿ

Using (1) and (2)

๐Ž๐๐“๐€๐๐‹ ๐‘จ = ๐Ž๐๐“๐€๐๐‹ ๐‘จโ€ฒ + ๐’‡ ๐’‚๐Ÿ + ๐’‡ ๐’‚๐Ÿ

30

(1)

(2)

The algorithm based on๐Ž๐๐“๐€๐๐‹ ๐‘จ = ๐Ž๐๐“๐€๐๐‹(๐‘จโ€ฒ) + ๐’‡ ๐’‚๐Ÿ + ๐’‡ ๐’‚๐Ÿ

OPT(๐‘จ)

{ If |๐‘จ|=2, return ;

else

{ Let ๐’‚๐Ÿ and ๐’‚2 be the two alphabets with least frequencies.

Remove ๐’‚๐Ÿ and ๐’‚๐Ÿ from ๐‘จ;

Create a new alphabet ๐’‚โ€ฒ;

๐’‡ ๐’‚โ€ฒ ๐’‡ ๐’‚๐Ÿ + ๐’‡ ๐’‚๐Ÿ ;

Insert ๐’‚โ€ฒ into ๐‘จ;

T OPT(๐‘จ);

Replace node in T by

return T ;

}

}31

๐’‚โ€™๐’‚๐Ÿ ๐’‚๐Ÿ

10

๐’‚๐Ÿ ๐’‚๐Ÿ

10

๐‘ถ(๐’๐Ÿ) time complexity

Homework:Achieve ๐‘ถ(๐’ ๐ฅ๐จ๐  ๐’)

time comp lexity

A generic way to prove that a greedy strategy works

P: a given optimization problem

1. Try to establish a relation between OPT(๐‘จ) and OPT(๐‘จโ€ฒ);

2. Try to prove the relation formally by

โ‘ deriving a solution of ๐‘จ from OPT(๐‘จโ€ฒ)

โ‘ deriving a solution of ๐‘จโ€ฒ from OPT(๐‘จ) 32

๐‘จ

๐‘จโ€ฒ

instance of size ๐’ of problem P

Greedystep

instance of size < ๐’ of problem P

Opt(๐‘จ)

Opt(๐‘จโ€ฒ)

use Theorem

Prove a suitable Theorem about Opt(๐‘จ) using this greedy step

If you succeed, you have an algorithm

from construction

EXAMPLE MINIMUM SPANNING TREE

33

instance ๐‘จ

34

170

33

3531

41 81

52

42

50

102

50 3060

70

54

57

49

49

80

78

6354

53

44

29

64

42

43

instance ๐‘จ

Lemma 1 : There is a MST with edge (u,v)

35

170

33

3531

41 81

52

42

50

102

50 3060

70

54

57

49

49

80

78

6354

53

44

29

64

42

43

uv

instance ๐‘จโ€ฒ

36

170

33

3531

41 81

52

42

50

102

50 3060

70

54

57

49

49

80

78

6354

5364

42

44

w

43

This is graph ๐‘ฎโ€ฒ

How to compute instance ๐‘จโ€ฒ

Let (u,v) be the least weight edge in ๐‘ฎ=(๐‘ฝ, ๐‘ฌ). Transform ๐‘ฎ into ๐‘ฎโ€ฒ as follows.

โ€ข Remove vertices u and v and add a new vertex w

โ€ข For each edge (u,x)ฯต ๐‘ฌ, add edge (w,x) in ๐‘ฎโ€ฒ.

โ€ข For each edge (v,x)ฯต ๐‘ฌ, add edge (w,x) in ๐‘ฎโ€ฒ.

โ€ข In case of multiple edges between w and x, keep only the lighter weight edge.

Theorem : W_MST(๐‘ฎ) = W_MST(๐‘ฎโ€ฒ) + w(u,v)

Proof: (by construction)

1. W_MST(๐‘ฎ) โ‰ค W_MST(๐‘ฎโ€ฒ) + w(u,v)

2. W_MST(๐‘ฎโ€ฒ) โ‰ค W_MST(๐‘ฎ) - w(u,v)

This gives an algorithm for MST with ๐‘ถ(๐’Ž๐’) time complexity.

Improve it to ๐‘ถ(๐’Ž ๐ฅ๐จ๐  ๐’) by using data structures.

37

Use Lemma 1

From construction