Data Structures Project
TRANSCRIPT
1
COMSATS Institute of Information & Technology, Islamabad
Subject : Data Structures
A presentation
by
Ayesha Arif
FA10-BSB-014
&
Ghazanfar Ali Baig
FA10-BSB-024
Instructor Name:
Ms. Nusrat Shaheen
Huffman’s Code
3
Background
• When we encode characters in computers, we assign each an 8-bit code based on an ASCII chart.
• But in most files, some characters appear more often than others.
• So wouldn't it make more sense to assign shorter codes for characters that appear more often and longer codes for characters that appear less often?
4
Data Compression
• Lossless compression:
– Less space is used to store the data, or energy to transmit it.
– No information is lost; we can reconstruct the original text.
– Often: be able to quickly reconstruct the original text.
• Lossy compression.
5
Data Compression
• Text representation:
• ASCII: 1 byte per character (fixed-length code)
– “M a r c”
1st Byte: 77 = 01001101
2nd Byte: 97 = 01100001
3rd Byte: 114 = 01110010
4th Byte: 99 = 01100011
6
• In the English language, the letter ‘e’ is quite common, while ‘x’ is less common. Why should all characters be represented using the same number of bits?
7
Huffman
• In 1951, David Huffman found the “most efficient method of representing numbers, letters, and other symbols using binary code”.
• It is now a standard method used for data compression.
8
The Concept
• Huffman coding has the following properties:
• Codes for more probable characters are shorter than ones for less probable characters.
• Each code can be uniquely decoded.
9
• To read the codes from a Huffman tree, start from the root and add a '0' every time you go left to a child and a '1' every time you go right. So in this example, the code for the character 'b' is 01 and the code for 'd' is 110.
• As you can see, 'a' has a shorter code than 'd'. Notice that since all the characters are at the leaves of the tree (the ends), there is never a chance that one code will be the prefix of another one (e.g., since 'a' is 00, no other code can begin with 00).
• Hence, this unique prefix property assures that each code can be uniquely decoded.
10
Huffman Codes
• How to construct Huffman codes?
• Build a binary tree!
11
Binary Tree
• A binary tree is a tree where each internal node has at most 2 children.
12
Example tree: A is the “Root Node”; B (the “Left Node”) and C (the “Right Node”) are sibling nodes and internal vertices; D, E, F, and G are leaf vertices (D and E under B, F and G under C).
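The binary tree above can be sketched as a minimal Python class (the class name and field names are illustrative, not from the slides):

```python
class Node:
    """A binary tree node: holds a value and at most two children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left      # left child, or None if absent
        self.right = right    # right child, or None if absent

# The example tree from the slide: A is the root, B and C are siblings,
# and the leaf vertices D, E, F, G have no children.
root = Node("A",
            Node("B", Node("D"), Node("E")),
            Node("C", Node("F"), Node("G")))
```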
13
Algorithm
1. Take the characters and their frequencies, and sort this list by increasing frequency.
2. All the characters are vertices of the tree.
3. Take the first two vertices from the list and make them children of a new vertex having the sum of their frequencies.
4. Insert the new vertex into the sorted list of vertices waiting to be put into the tree.
5. If there are at least 2 vertices in the list, go to step 3.
6. Read the Huffman code from the tree.
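The six steps above can be sketched in Python, with `heapq` standing in for the sorted list of waiting vertices (function and variable names here are my own, not from the slides):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build Huffman codes from a {character: frequency} dict."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Each heap entry: (frequency, insertion order, tree node).
    # A leaf node is a character; an internal node is a (left, right) pair.
    heap = [(f, next(tiebreak), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two lowest-frequency vertices...
        f2, _, right = heapq.heappop(heap)
        # ...become children of a new vertex with the summed frequency
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):        # leaf: record the accumulated code
            codes[node] = prefix or "0"  # lone character still needs one bit
        else:
            walk(node[0], prefix + "0")  # left branch = 0
            walk(node[1], prefix + "1")  # right branch = 1
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 3, "O": 5, "T": 7, "E": 10}))
# → {'E': '0', 'T': '10', 'A': '110', 'O': '111'}
```

Running it on the frequencies from the next slides reproduces the codes derived in the walkthrough.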
14
Huffman’s Code
1.Take the characters and their frequencies, and sort this list by increasing frequency
A: 3, O: 5, T: 7, E:10
15
2. All the characters are the vertices of the tree:
A:3  O:5  T:7  E:10
16
3. Take the first two vertices from the list and make them children of a vertex having the sum of their frequencies.

New vertex *:8, with children A:3 (left) and O:5 (right).
17
4. Insert the new vertex into the sorted list of vertices waiting to be put into the tree.
• List of remaining vertices: T:7, E:10
• New list, with the new vertex inserted: T:7, *:8, E:10
18
5. Take the first two vertices from the list and make them children of a vertex having the sum of their frequencies.

New vertex *:15 (frequency 7 + 8 = 15), with children T:7 (left) and *:8 (right); *:8 has children A:3 and O:5.
19
6. Insert the new vertex into the sorted list of vertices waiting to be put into the tree.
• List of remaining vertices: E:10
• New list, with the new vertex inserted: E:10, *:15
20
7. Take the first two vertices from the list and make them children of a vertex having the sum of their frequencies.

New vertex *:25 (frequency 10 + 15 = 25), with children E:10 (left) and *:15 (right); *:15 has children T:7 and *:8, and *:8 has children A:3 and O:5.
21
• Left branch = 0
• Right branch = 1
• Huffman code: E: 0, T: 10, A: 110, O: 111
22
Final tree: *:25 at the root, with children E:10 (left) and *:15 (right); *:15 has children T:7 (left) and *:8 (right); *:8 has children A:3 (left) and O:5 (right).
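A quick check in Python (variable names are illustrative) confirms that the codes from the tree are prefix-free and shows the space saved compared with 8-bit ASCII:

```python
codes = {"E": "0", "T": "10", "A": "110", "O": "111"}  # read off the tree
freqs = {"A": 3, "O": 5, "T": 7, "E": 10}

# Prefix property: no code is a prefix of any other code.
for a in codes.values():
    for b in codes.values():
        assert a == b or not b.startswith(a)

# Total encoded length: each character contributes frequency × code length.
huffman_bits = sum(freqs[ch] * len(codes[ch]) for ch in codes)
ascii_bits = 8 * sum(freqs.values())
print(huffman_bits, ascii_bits)  # 48 vs. 200: a 76% reduction
```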
Advantage of Huffman Codes
• Reduces the size of data by 20%-90% in general.
• If no characters occur more frequently than others, then there is no advantage over ASCII.
• Encoding: given the characters and their frequencies, perform the algorithm to generate a code, then write the characters using that code.
• Decoding: given the Huffman tree, figure out what each character is (possible because of the prefix property).
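Decoding as just described can be sketched like this (Python; the tuple-based tree and the names are my own simplification, where an internal node is a (left, right) pair and a leaf is a character):

```python
def huffman_decode(bits, tree):
    """Walk the tree from the root: '0' goes left, '1' goes right,
    emitting a character and restarting whenever a leaf is reached."""
    decoded = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):   # leaf: one character fully decoded
            decoded.append(node)
            node = tree             # restart at the root for the next code
    return "".join(decoded)

# The tree built in the walkthrough: E left of the root, then T, then A and O.
tree = ("E", ("T", ("A", "O")))
print(huffman_decode("100110", tree))  # T=10, E=0, A=110 → "TEA"
```

Because of the prefix property, no lookahead is ever needed: the moment a leaf is reached, exactly one character has been read.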
23
Applications
• Both the .mp3 and .jpg file formats use
Huffman coding at one stage of the compression.
• An alternative method that achieves higher compression but is slower was patented by IBM, making Huffman codes an attractive choice.
24
Program
25
References
• http://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html
• http://www.huffmancoding.com/david/algorithm.html
• http://www.cs.sfu.ca/CourseCentral/365/li/squeeze/Huffman.htm
26
27
Thank You