Data Structures Project
TRANSCRIPT
1
COMSATS Institute of Information & Technology, Islamabad
Subject : Data Structures
A presentation
by
Ayesha Arif
FA10-BSB-014
&
Ghazanfar Ali Baig
FA10-BSB-024
Instructor Name:
Ms. Nusrat Shaheen
Huffman’s Code
3
Background
• When we encode characters in computers, we assign each an 8-bit code based on an ASCII chart.
• But in most files, some characters appear more often than others.
• So wouldn't it make more sense to assign shorter codes for characters that appear more often and longer codes for characters that appear less often?
4
Data Compression
• Lossless compression:
– Less space is used to store the data, or energy to transmit it.
– No information is lost; we can reconstruct the original text.
– Often: be able to quickly reconstruct the original text.
• Lossy compression.
5
Data Compression
• Text representation:
• ASCII: 1 byte per character (fixed-length code)
– “M a r c”
1st Byte: 77 = 01001101
2nd Byte: 97 = 01100001
3rd Byte: 114 = 01110010
4th Byte: 99 = 01100011
6
• In the English language, the letter ‘e’ is quite common, while ‘x’ is less common. Why should all characters be represented using the same number of bits?
7
Huffman
• In 1951, David Huffman found the “most efficient method of representing numbers, letters, and other symbols using binary code”.
• It is now a standard method used for data compression.
8
The Concept
• Huffman coding has the following properties:
• Codes for more probable characters are shorter than ones for less probable characters.
• Each code can be uniquely decoded.
9
• To read the codes from a Huffman tree, start from the root and add a '0' every time you go left to a child and a '1' every time you go right. So in this example, the code for the character 'b' is 01 and the code for 'd' is 110.
• As you can see, 'a' has a shorter code than 'd'. Notice that since all the characters are at the leaves of the tree (the ends), there is never a chance that one code will be the prefix of another one (e.g., since 'a' is 00, no other code can begin with 00).
• Hence, this unique prefix property assures that each code can be uniquely decoded.
10
Huffman Codes
• How to construct Huffman codes?
• Build a binary tree!
11
Binary Tree
• A binary tree is a tree where each internal node has at most 2 children.
12
Example tree: A is the “Root Node”; B (the “Left Node”) and C (the “Right Node”) are sibling nodes and internal vertices; D, E, F, and G are leaf vertices (D and E under B, F and G under C).
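The binary tree above can be sketched as a minimal Python class (the class name and field names are illustrative, not from the slides):

```python
class Node:
    """A binary tree node: holds a value and at most two children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left      # left child, or None if absent
        self.right = right    # right child, or None if absent

# The example tree from the slide: A is the root, B and C are siblings,
# and the leaf vertices D, E, F, G have no children.
root = Node("A",
            Node("B", Node("D"), Node("E")),
            Node("C", Node("F"), Node("G")))
```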
13
Algorithm
1. Take the characters and their frequencies, and sort this list by increasing frequency.
2. All the characters are vertices of the tree.
3. Take the first two vertices from the list and make them children of a new vertex having the sum of their frequencies.
4. Insert the new vertex into the sorted list of vertices waiting to be put into the tree.
5. If there are at least 2 vertices in the list, go to step 3.
6. Read the Huffman code from the tree.
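The six steps above can be sketched in Python, with `heapq` standing in for the sorted list of waiting vertices (function and variable names here are my own, not from the slides):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build Huffman codes from a {character: frequency} dict."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Each heap entry: (frequency, insertion order, tree node).
    # A leaf node is a character; an internal node is a (left, right) pair.
    heap = [(f, next(tiebreak), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two lowest-frequency vertices...
        f2, _, right = heapq.heappop(heap)
        # ...become children of a new vertex with the summed frequency
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):        # leaf: record the accumulated code
            codes[node] = prefix or "0"  # lone character still needs one bit
        else:
            walk(node[0], prefix + "0")  # left branch = 0
            walk(node[1], prefix + "1")  # right branch = 1
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 3, "O": 5, "T": 7, "E": 10}))
# → {'E': '0', 'T': '10', 'A': '110', 'O': '111'}
```

Running it on the frequencies from the next slides reproduces the codes derived in the walkthrough.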
14
Huffman’s Code
1.Take the characters and their frequencies, and sort this list by increasing frequency
A: 3, O: 5, T: 7, E:10
15
2. All the characters are the vertices of the tree:
A:3  O:5  T:7  E:10
16
3. Take the first two vertices from the list and make them children of a vertex having the sum of their frequencies.

New vertex *:8, with children A:3 (left) and O:5 (right).
17
4. Insert the new vertex into the sorted list of vertices waiting to be put into the tree.
• List of remaining vertices: T:7, E:10
• New list, with the new vertex inserted: T:7, *:8, E:10
18
5. Take the first two vertices from the list and make them children of a vertex having the sum of their frequencies.

New vertex *:15 (frequency 7 + 8 = 15), with children T:7 (left) and *:8 (right); *:8 has children A:3 and O:5.
19
6. Insert the new vertex into the sorted list of vertices waiting to be put into the tree.
• List of remaining vertices: E:10
• New list, with the new vertex inserted: E:10, *:15
20
7. Take the first two vertices from the list and make them children of a vertex having the sum of their frequencies.

New vertex *:25 (frequency 10 + 15 = 25), with children E:10 (left) and *:15 (right); *:15 has children T:7 and *:8, and *:8 has children A:3 and O:5.
21
• Left branch = 0
• Right branch = 1
• Huffman code: E: 0, T: 10, A: 110, O: 111
22
Final tree: *:25 at the root, with children E:10 (left) and *:15 (right); *:15 has children T:7 (left) and *:8 (right); *:8 has children A:3 (left) and O:5 (right).
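A quick check in Python (variable names are illustrative) confirms that the codes from the tree are prefix-free and shows the space saved compared with 8-bit ASCII:

```python
codes = {"E": "0", "T": "10", "A": "110", "O": "111"}  # read off the tree
freqs = {"A": 3, "O": 5, "T": 7, "E": 10}

# Prefix property: no code is a prefix of any other code.
for a in codes.values():
    for b in codes.values():
        assert a == b or not b.startswith(a)

# Total encoded length: each character contributes frequency × code length.
huffman_bits = sum(freqs[ch] * len(codes[ch]) for ch in codes)
ascii_bits = 8 * sum(freqs.values())
print(huffman_bits, ascii_bits)  # 48 vs. 200: a 76% reduction
```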
Advantage of Huffman Codes
• Reduces the size of data by 20%-90% in general.
• If no characters occur more frequently than others, then there is no advantage over ASCII.
• Encoding: given the characters and their frequencies, perform the algorithm to generate a code, then write the characters using that code.
• Decoding: given the Huffman tree, figure out what each character is (possible because of the prefix property).
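Decoding as just described can be sketched like this (Python; the tuple-based tree and the names are my own simplification, where an internal node is a (left, right) pair and a leaf is a character):

```python
def huffman_decode(bits, tree):
    """Walk the tree from the root: '0' goes left, '1' goes right,
    emitting a character and restarting whenever a leaf is reached."""
    decoded = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):   # leaf: one character fully decoded
            decoded.append(node)
            node = tree             # restart at the root for the next code
    return "".join(decoded)

# The tree built in the walkthrough: E left of the root, then T, then A and O.
tree = ("E", ("T", ("A", "O")))
print(huffman_decode("100110", tree))  # T=10, E=0, A=110 → "TEA"
```

Because of the prefix property, no lookahead is ever needed: the moment a leaf is reached, exactly one character has been read.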
23
Applications
• Both the .mp3 and .jpg file formats use
Huffman coding at one stage of the compression.
• An alternative method that achieves higher compression but is slower was patented by IBM, making Huffman codes an attractive choice.
24
Program
25
References
• http://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html
• http://www.huffmancoding.com/david/algorithm.html
• http://www.cs.sfu.ca/CourseCentral/365/li/squeeze/Huffman.htm
26
27
Thank You