data structures' project

Upload: behappy-seehappy

Post on 24-May-2015

Page 1: Data structures' project

Page 2: Data structures' project

COMSATS Institute of Information & Technology, Islamabad

Subject : Data Structures

A presentation

by

Ayesha Arif

FA10-BSB-014

&

Ghazanfar Ali Baig

FA10-BSB-024

Instructor Name:

Ms. Nusrat Shaheen

Page 3: Data structures' project

Huffman’s Code

Page 4: Data structures' project

Background

• When we encode characters in computers, we typically store each one as a fixed 8-bit code based on the ASCII chart.

• But in most files, some characters appear more often than others.

• So wouldn't it make more sense to assign shorter codes for characters that appear more often and longer codes for characters that appear less often?

Page 5: Data structures' project

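Data Compression

• Lossless compression: less space is used to store the data, or less energy to transmit it.

– No information is lost.

– We can reconstruct the original text.

– Often we also want to be able to reconstruct the original text quickly.

• Lossy compression: some information is discarded, so the original cannot be reconstructed exactly.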

Page 6: Data structures' project

Data Compression

• Text representation:

• ASCII: 1 byte per character (fixed-length code)

– Example: “Marc”

Byte        1st        2nd        3rd        4th
Character   M          a          r          c
Decimal     77         97         114        99
Binary      01001101   01100001   01110010   01100011
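The byte values above can be checked with a short Python sketch. It assumes the sample text is the four-character string "Marc" (capital M, then lowercase a, r, c), since that is what the decimal values 77, 97, 114 and 99 correspond to.

```python
text = "Marc"  # assumed sample string matching the byte values above

# ord() returns the code point, which for these characters is the ASCII code.
codes = [ord(ch) for ch in text]

# Write each code as a fixed-length 8-bit binary string.
bits = [format(code, "08b") for code in codes]

print(codes)  # [77, 97, 114, 99]
print(bits)   # ['01001101', '01100001', '01110010', '01100011']

# With a fixed-length code every character costs 8 bits, however common it is.
total_bits = 8 * len(text)
print(total_bits)  # 32
```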

Page 7: Data structures' project

• In the English language, the letter ‘e’ is quite common, while ‘x’ is less common.

– Why should all characters be represented using the same number of bits?

Page 8: Data structures' project

Huffman

• In 1951, David Huffman found the “most efficient method of representing numbers, letters, and other symbols using binary code”.

• It is now a standard method used for data compression.

Page 9: Data structures' project

The Concept

• Huffman coding has the following properties:

• Codes for more probable characters are shorter than ones for less probable characters.

• Each code can be uniquely decoded.

Page 10: Data structures' project

The Concept

• To read the codes from a Huffman tree, start from the root and add a '0' every time you go left to a child, and a '1' every time you go right. So in this example, the code for the character 'b' is 01 and the code for 'd' is 110.

• As you can see, 'a' has a shorter code than 'd'. Notice that since all the characters are at the leaves of the tree (the ends), there is never a chance that one code will be the prefix of another one (e.g. 'a' is 00 and 'b' is 01).

• Hence, this unique prefix property ensures that each code can be uniquely decoded.
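The prefix property can be tested mechanically. The sketch below is one possible check, not part of the original slides; the function name `is_prefix_free` and both code tables are illustrative assumptions.

```python
from itertools import permutations


def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    return not any(a.startswith(b) for a, b in permutations(codes.values(), 2))


# Assumed example tables (hypothetical, for illustration only).
good = {"a": "00", "b": "01", "d": "110"}
bad = {"a": "0", "b": "01"}  # "0" is a prefix of "01", so decoding is ambiguous

print(is_prefix_free(good))  # True
print(is_prefix_free(bad))   # False
```

With the `bad` table, a decoder that has read a 0 cannot tell whether it has seen a complete 'a' or the start of a 'b'.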

Page 11: Data structures' project

Huffman Codes

• How to construct Huffman codes?

• Build a binary tree!

Page 12: Data structures' project

Binary Tree

• A binary tree is a tree where each internal node has at most 2 children.

Page 13: Data structures' project

Binary Tree

[Figure: a binary tree with vertices A to G, with labels marking the root node, the left node and the right node, sibling nodes, an internal vertex, and a leaf vertex.]
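A binary tree vertex can be sketched in Python as a small class with at most two children. The class name `Node`, the helper `count_leaves`, and the shape of the sample tree are assumptions for illustration; the sample is not claimed to match the tree drawn on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None   # left child, if any
    right: Optional["Node"] = None  # right child, if any

    def is_leaf(self) -> bool:
        # A leaf vertex has no children; an internal vertex has at least one.
        return self.left is None and self.right is None


def count_leaves(node: Optional[Node]) -> int:
    """Count the leaf vertices in the subtree rooted at node."""
    if node is None:
        return 0
    if node.is_leaf():
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


# Hypothetical tree: A is the root, B and C are siblings, D to G are leaves.
root = Node("A", Node("B", Node("D"), Node("E")), Node("C", Node("F"), Node("G")))
print(count_leaves(root))  # 4
```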

Page 14: Data structures' project

Algorithm

1. Take the characters and their frequencies, and sort this list by increasing frequency.

2. All the characters are vertices of the tree.

3. Take the first 2 vertices from the list and make them children of a vertex having the sum of their frequencies.

4. Insert the new vertex into the sorted list of vertices waiting to be put into the tree.

5. If there are at least 2 vertices in the list, go to step 3.

6. Read the Huffman code from the tree.
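The six steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the presenters' program: the names `Vertex`, `build_tree` and `read_codes` are hypothetical, the input is assumed to be non-empty, and the first vertex taken becomes the left child. The sample frequencies are the ones the slides use in their worked example.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vertex:
    freq: int
    symbol: Optional[str] = None      # None for internal vertices
    left: Optional["Vertex"] = None
    right: Optional["Vertex"] = None


def build_tree(freqs):
    # Steps 1-2: one vertex per character, sorted by increasing frequency.
    vertices = sorted((Vertex(f, s) for s, f in freqs.items()), key=lambda v: v.freq)
    # Step 5: keep going while at least 2 vertices remain.
    while len(vertices) >= 2:
        # Step 3: the first two vertices become children of a new vertex.
        first, second = vertices.pop(0), vertices.pop(0)
        parent = Vertex(first.freq + second.freq, None, first, second)
        # Step 4: insert the new vertex so the list stays sorted.
        keys = [v.freq for v in vertices]
        vertices.insert(bisect.bisect_right(keys, parent.freq), parent)
    return vertices[0]


def read_codes(vertex, prefix="", codes=None):
    # Step 6: append '0' for a left branch and '1' for a right branch.
    if codes is None:
        codes = {}
    if vertex.symbol is not None:
        codes[vertex.symbol] = prefix or "0"  # single-symbol input gets "0"
    else:
        read_codes(vertex.left, prefix + "0", codes)
        read_codes(vertex.right, prefix + "1", codes)
    return codes


tree = build_tree({"A": 3, "O": 5, "T": 7, "E": 10})
print(read_codes(tree))  # {'E': '0', 'T': '10', 'A': '110', 'O': '111'}
```

Popping from the front of a Python list is linear time; a priority queue such as `heapq` would be the usual choice for large alphabets, but the sorted list mirrors the steps as the slides state them.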

Page 15: Data structures' project

Huffman’s Code

1. Take the characters and their frequencies, and sort this list by increasing frequency:

A:3, O:5, T:7, E:10
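Step 1 can be sketched with `collections.Counter`. The sample text here is an assumption, built only so that its character counts match the frequencies on the slide.

```python
from collections import Counter

# Hypothetical text whose counts are A:3, O:5, T:7, E:10.
text = "A" * 3 + "O" * 5 + "T" * 7 + "E" * 10

# Count each character, then sort the (character, frequency) pairs by frequency.
frequencies = Counter(text)
sorted_list = sorted(frequencies.items(), key=lambda pair: pair[1])

print(sorted_list)  # [('A', 3), ('O', 5), ('T', 7), ('E', 10)]
```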

Page 16: Data structures' project

Huffman’s Code

2. All the characters are the vertices of the tree:

A:3 O:5 T:7 E:10

Page 17: Data structures' project

Huffman’s Code

3. Take the first two vertices from the list and make them children of a vertex having the sum of their frequencies.

[Figure: new vertex *:8 with left child A:3 and right child O:5.]

Page 18: Data structures' project

Huffman’s Code

4. Insert the new vertex into the sorted list of vertices waiting to be put into the tree.

• List of remaining vertices: T:7 E:10

• New list, with the new vertex inserted: T:7 *:8 E:10
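The sorted insertion in this step can be sketched with the standard `bisect` module. Representing each vertex as a `(frequency, label)` tuple is an assumption made for brevity; tuples compare by frequency first, so the list stays ordered by frequency.

```python
import bisect

# Remaining vertices after A:3 and O:5 have been merged.
remaining = [(7, "T"), (10, "E")]

# The merged vertex, labelled "*" as on the slides.
new_vertex = (3 + 5, "*")

# insort places the new vertex at its sorted position in the list.
bisect.insort(remaining, new_vertex)

print(remaining)  # [(7, 'T'), (8, '*'), (10, 'E')]
```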

Page 19: Data structures' project

Huffman’s Code

5. Take the first 2 vertices from the list and make them children of a vertex having the sum of their frequencies.

New vertex with frequency 7 + 8 = 15:

[Figure: vertex *:15 with left child T:7 and right child *:8; *:8 has children A:3 and O:5.]

Page 20: Data structures' project

Huffman’s Code

6. Insert the new vertex into the sorted list of vertices waiting to be put into the tree.

• List of remaining vertices: E:10

• New list, with the new vertex inserted: E:10 *:15

Page 21: Data structures' project

Huffman’s Code

7. Take the first two vertices from the list and make them children of a vertex having the sum of their frequencies.

New vertex with frequency 10 + 15 = 25:

[Figure: vertex *:25 with left child E:10 and right child *:15; *:15 has children T:7 and *:8; *:8 has children A:3 and O:5.]

Page 22: Data structures' project

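Huffman’s Code

• Left branch = 0

• Right branch = 1

• Huffman code:

E: 0
T: 10
A: 110
O: 111

[Figure: the finished tree, with each left edge labelled 0 and each right edge labelled 1. The root *:25 has children E:10 and *:15; *:15 has children T:7 and *:8; *:8 has children A:3 and O:5.]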
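The saving for this example can be worked out from the frequencies and code lengths. The sketch below assumes the comparison is against 8 bits per character, as in the ASCII discussion earlier; the variable names are illustrative.

```python
freqs = {"A": 3, "O": 5, "T": 7, "E": 10}
codes = {"E": "0", "T": "10", "A": "110", "O": "111"}

# Huffman cost: each character costs as many bits as its code is long.
huffman_bits = sum(freqs[s] * len(codes[s]) for s in freqs)

# Fixed-length cost: 8 bits for each of the 25 characters.
fixed_bits = 8 * sum(freqs.values())

# Percentage saved, using integer arithmetic.
saving_percent = 100 - 100 * huffman_bits // fixed_bits

print(huffman_bits)    # 48
print(fixed_bits)      # 200
print(saving_percent)  # 76
```

Even a fixed 2-bit code for these four characters would need 2 × 25 = 50 bits, so the Huffman code is still slightly shorter here.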

Page 23: Data structures' project

Advantage of Huffman Codes

• Reduces the size of data by 20%-90% in general.

• If no characters occur more frequently than others, there is no advantage over a fixed-length code.

• Encoding: given the characters and their frequencies, perform the algorithm and generate a code. Write the characters using the code.

• Decoding: given the Huffman tree, figure out what each character is (possible because of the prefix property).
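Encoding and decoding can be sketched as a round trip. This is an illustration under assumptions: the code table is the one from the worked example, and the decoder uses a reverse lookup table rather than walking the tree, which gives the same result because of the prefix property. The function names are hypothetical.

```python
codes = {"E": "0", "T": "10", "A": "110", "O": "111"}


def encode(text, codes):
    # Write each character using its code word.
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    # Reverse table: code word -> character.
    inverse = {code: symbol for symbol, code in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        # No code word is a prefix of another, so the first match is the right one.
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(decoded)


bits = encode("TEA", codes)
print(bits)                 # 100110
print(decode(bits, codes))  # TEA
```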

Page 24: Data structures' project

Applications

• Both the .mp3 and .jpg file formats use Huffman coding at one stage of the compression.

• An alternative method that achieves higher compression but is slower is patented by IBM, which makes Huffman codes attractive.

Page 25: Data structures' project

Program

Page 26: Data structures' project

References

• http://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html

• http://www.huffmancoding.com/david/algorithm.html

• http://www.cs.sfu.ca/CourseCentral/365/li/squeeze/Huffman.htm

Page 27: Data structures' project

Thank you