priority queues, trees, and huffman encoding cs 244 this presentation requires audio enabled brent...

80
Priority Queues, Trees, and Huffman Encoding CS 244 This presentation requires Audio Enabled Brent M. Dingle, Ph.D. Game Design and Development Program Department of Mathematics, Statistics, and Computer Science University of Wisconsin – Stout Content derived from: Data Structures Using C++ (D.S. Malik) and Data Structures and Algorithms in C++ (Goodrich, Tamassia, Mount)c

Upload: dominick-snow

Post on 14-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Object Oriented Programming and C++

Priority Queues, Trees, and Huffman EncodingCS 244This presentation requires Audio EnabledBrent M. Dingle, Ph.D.Game Design and Development ProgramDepartment of Mathematics, Statistics, and Computer ScienceUniversity of Wisconsin Stout

Content derived from: Data Structures Using C++ (D.S. Malik)and Data Structures and Algorithms in C++ (Goodrich, Tamassia, Mount)c1PreviouslyPriority Queuesnodes have keys and dataMin PQs and Max PQs

HeapsThought of as a binary tree (pile of stuff = heap)but implemented using a vector arrayTop Down Build takes O(n lg n) timeBottom Up Build takes O(n) time2 Major properties: Structure and OrderComplete Binary TreeMin Heaps and Max Heaps

PQ SortingImplemented using an unsorted list (selection sort)O(n2)Implemented using a sorted list (insertion sort)O(n2)Implemented using a Heap (heap sort)O(n * lg n)Priority Queue ADT A priority queue stores a collection of itemsAn item is a pair(key, element)Main methods of the Priority Queue ADTinsertItem(k, o)inserts an item with key k and element oremoveMin()removes the item with the smallest keyAdditional methodsminKey()returns, but does not remove, the smallest key of an itemminElement()returns, but does not remove, the element of an item with smallest keysize(), isEmpty()Applications:Standby flyersAuctionsWe will be using these two functions heavily in the very near futureMarker SlideGeneral Questions?

NextHuffman EncodingData CompressionPQs and TreesWhat can they be used for?

How about data compression?Putting Trees into PQs

PQs and TreesPutting Trees into PQsMixing Data Structures with Data Structures?No good can come from this. =)

Definition: File CompressionCompression: the process of encoding information in fewer bitsWasting space is bad, so compression is good

Compression is useful for many thingsstoring photosreducing email attachment sizereduction of other media files (mp3s, DVD, Blu-Ray)allows voice calls over low band-width connection cell phones, Skype, etc

Common compression programsWinZip, WinRAR, 7Zip, etcASCII EncodingAmerican Standard Code for Information InterchangeEncodes 128 characters7 bit encoding scheme2^7 = 128Implemented using 8 bitsfirst bit always 0

ObservationSome characters in the English alphabet occur more frequently than othersThe table below is based on Robert Lewand's Cryptological Mathematics

Huffman EncodingHuffman encoding: Uses variable lengths for different characters to take advantage of their relative frequenciesSome characters occur more often than othersIf those characters use < 8 bits each, the file will be smallerOther characters may need > 8 bitsbut thats ok they dont show up often

Huffmans AlgorithmThe idea: Create a Huffman Tree that will tell us a good binary representation for each characterLeft means 0Right means 1Example 'b is 10

More frequent characters will be higher in the tree (have a shorter binary value).

To build this tree, we must do a few steps firstCount occurrences of each unique character in the file to compress

Use a priority queue to order them from least to most frequent

Make a tree and use it

Huffman Compression OverviewStep 1Count characters (frequency of characters in the message)

Step 2Create a Priority Queue

Step 3Build a Huffman Tree

Step 4Traverse the Tree to find the Character to Binary Mapping

Step 5Use the mapping to encode the messageHuffman Compression OverviewCount the occurrences of each character in the fileSay: ' ' = 2, 'a' = 3, 'b' = 3, 'c' = 1, EOF = 1

Place character and counts in a priority queue

Use priority queue to create Huffman tree

Traverse tree to find (char binary) map' ' = 00, 'a' = 11, 'b' = 10, 'c' = 010, EOF = 011

For each char in file, convert to compressed binary version11 10 00 11 10 00 010 1 1 10 011

Step 1: Count CharactersExample message (input file) contents:ab ab cabcounts: { ' ' = 2, 'b'=3, 'a' =3, 'c' =1, EOF=1 }

File size currently = 10 bytes = 80 bitsfile ends with an invisible EOF characterbyte12345678910char'a''b'' ''a''b'' ''c''a''b'EOFASCII979832979832999798256binary011000010110001000100000011000010110001000100000011000110110000101100010N/AStep 2: Create a Priority QueueEach node of the PQ is a treeThe root of the tree is the keyThe other internal nodes hold subkeysThe leaves hold the character values

Insert each into the PQ using the PQs functioninsertItem(count, character)

The PQ should organize them into ascending orderSo the largest value is lowest priorityYou can think of this as an ordered listBut the PQ could be implemented in whatever way works bestcould be a minheap

.

These are all one node treesfront and back used to show priority orderingi.e. the order things would be removed from the PQ1Step 2: Create a Priority QueueEach node of the PQ is a treeThe root of the tree is the keyThe other internal nodes hold subkeysThe leaves hold the character values

Insert each into the PQ using the PQs functioninsertItem(count, character)

The PQ should organize them into ascending orderSo the smallest value is highest priorityWe will use an example with the PQ implemented as an ordered listBut the PQ could be implemented in whatever way works bestcould be a minheap, unordered list, or otherStep 2: PQ Creation, An IllustrationFrom step 1 we havecounts: { ' ' = 2, 'b'=3, 'a'=3, 'c'=1, EOF=1 }Make these into treesAdd the trees to a Priority QueueAssume PQ is implemented as a sorted listStep 2: PQ Creation, An IllustrationFrom step 1 we havecounts: { ' ' = 2, 'b'=3, 'a'=3, 'c'=1, EOF=1 }Make these into treesAdd the trees to a Priority QueueAssume PQ is implemented as a sorted list1EOF1c3a3b2' 'Step 2: PQ Creation, An IllustrationFrom step 1 we havecounts: { ' ' = 2, 'b'=3, 'a'=3, 'c'=1, EOF=1 }Make these into treesAdd the trees to a Priority QueueAssume PQ is implemented as a sorted list1EOF1c3a3b2' 'Recall: Each Insert is an O(n) operationThis is done n times. So this is O(n2)Step 3: Build the Huffman TreeAll nodes should be in the PQ

While PQ.size() > 1Remove the two highest priority (rarest) nodes Nodes removed first time through will be single node treesRemoval done using PQs removeMin() functionCombine the two nodes into a single nodeFirst time through this will create a tree of 3 nodesroot node = sum of the counts of the 2 selected charsSo the new node is a tree with root having key value = sum of keys of nodes removedleft subtree is the first removed noderight subtree is the second removed nodeExample next slideStep 3: Build the Huffman TreeAside: All nodes should be in the PQWhile PQ.size() > 1Remove the two highest priority (rarest) nodes Removal done using PQs removeMin() functionCombine the two nodes into a single nodeSo the new node is a tree with root has key value = sum of keys of nodes being combinedleft subtree is the first removed noderight subtree is the second removed nodeInsert the combined node back into the PQend While

Remove the one node from the PQThis is the Huffman TreeExample next slideStep 3a: Building Huffman Tree, Illus.Remove the two highest priority (rarest) nodes1EOF1c3b3a2' 'Step 3b: Building Huffman Tree, Illus.Combine the two nodes into a single node3b3a2' '1EOF1c2Step 3c: Building Huffman Tree, Illus.Insert the combined node back into the PQ3b3a2' '1EOF1c2Step 3d: Building Huffman Tree, Illus.PQ has 4 nodes still, so repeat3b3a2' '1EOF1c2Step 3a: Building Huffman Tree, Illus.Remove the two highest priority (rarest) nodes3b3a2' '1EOF1c2Step 3b: Building Huffman Tree, Illus.Combine the two nodes into a single node3b3a2' '1EOF1c24Step 3c: Building Huffman Tree, Illus.Insert the combined node back into the PQ3b3a2' '1EOF1c24Step 3d: Building Huffman Tree, Illus.3 nodes remain in PQ, repeat again3b3a2' '1EOF1c24Step 3a: Building Huffman Tree, Illus.Remove the two highest priority (rarest) nodes3b3a2' '1EOF1c24Step 3b: Building Huffman Tree, Illus.Combine the two nodes into a single node3b3a2' '1EOF1c246Step 3c: Building Huffman Tree, Illus.Insert the combined node back into the PQ2' '1EOF1c243b3a6Step 3d: Building Huffman Tree, Illus.2 nodes still in PQ, repeat one more time2' '1EOF1c243b3a6Step 3a: Building Huffman Tree, Illus.Remove the two highest priority (rarest) nodes2' '1EOF1c243b3a6Step 3b: Building Huffman Tree, Illus.Combine the two nodes into a single node2' '1EOF1c243b3a610Step 3c: Building Huffman Tree, Illus.Insert the combined node back into the PQ2' '1EOF1c243b3a610Step 3d: Building Huffman Tree, Illus.Only 1 node remains in the PQ, so while loop ends2' '1EOF1c243b3a610Step 3: Building Huffman Tree, Illus.Huffman tree is complete2' '1EOF1c243b3a610Huffman Compression OverviewCount the occurrences of each character in the fileSay: ' ' = 2, 'a' = 3, 'b' = 3, 'c' = 1, EOF = 1

Place character and counts in a priority queue

Use priority queue to create Huffman tree

Traverse tree to find (char binary) map' ' = 00, 'a' = 11, 'b' = 10, 'c' = 010, EOF = 011

For each char in file, convert to compressed binary version11 10 00 11 10 00 010 1 1 10 011

Huffman Compression OverviewStep 1Count characters (frequency of characters in the message)

Step 2Create a Priority Queue

Step 3Build a Huffman Tree

Step 4Traverse the Tree to find the Character to Binary Mapping

Step 5Use the mapping to encode the messageStep 4: Traverse Tree to Find the Character to Binary Mapping' ' ='c' =EOF ='b' ='a' =

2' '1EOF1c243b3a610RecallLeft is 0Right is 1Step 4: Traverse Tree to Find the Character to Binary Mapping' ' ='c' =EOF ='b' ='a' =

2' '1EOF1c243b3a610RecallLeft is 0Right is 10000Step 4: Traverse Tree to Find the Character to Binary Mapping' ' ='c' =EOF ='b' ='a' =

2' '1EOF1c243b3a610RecallLeft is 0Right is 101000010Step 4: Traverse Tree to Find the Character to Binary Mapping' ' ='c' =EOF ='b' ='a' =

2' '1EOF1c243b3a610RecallLeft is 0Right is 101001010011Step 4: Traverse Tree to Find the Character to Binary Mapping' ' ='c' =EOF ='b' ='a' =

2' '1EOF1c243b3a610RecallLeft is 0Right is 11000010011 10Step 4: Traverse Tree to Find the Character to Binary Mapping' ' ='c' =EOF ='b' ='a' =

2' '1EOF1c243b3a610RecallLeft is 0Right is 11100010011 10 11Huffman Compression OverviewCount the occurrences of each character in the fileSay: ' ' = 2, 'a' = 3, 'b' = 3, 'c' = 1, EOF = 1

Place character and counts in a priority queue

Use priority queue to create Huffman tree

Traverse tree to find (char binary) map' ' = 00, 'a' = 11, 'b' = 10, 'c' = 010, EOF = 011

For each char in file, convert to compressed binary version11 10 00 11 10 00 010 1 1 10 011

Huffman Compression OverviewStep 1Count characters (frequency of characters in the message)

Step 2Create a Priority Queue

Step 3Build a Huffman Tree

Step 4Traverse the Tree to find the Character to Binary Mapping

Step 5Use the mapping to encode the messageStep 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

2' '1EOF1c243b3a610Example message (input file) contents:ab ab cabfile ends with an invisible EOF characterSuggested Assignment: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF characterICA310_EncodeMessage

Save the encoded message as a text (.txt) file and upload to the appropriate D2L dropboxbefore class ends todayStep 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF character11Step 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF character1110Step 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF character111000Step 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF character11100011Step 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF character1110001110Step 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF character111000111000Step 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF character111000111000010Step 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF character11100011100001011Step 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF character1110001110000101110Step 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF character1110001110000101110011Step 5: Encode the Message' ' = 00'c' = 010EOF = 011'b' = 10'a' = 11

Example message (input file) contents:ab ab cabfile ends with an invisible EOF character1110001110000101110011Count the bits used = 22 bitsversus the 80previously neededFile is almost the sizelots of savingsDecompressionFrom the previous tree shown we now have the message characters encoded as:

Which compresses to bytes 3 like so:

How to decompress?Hint: Lookup table is not the best answer, what is the first symbol?... 1=? or is it 11? or 111? or 1110? orchar'a''b'' ''a''b'' ''c''a''b'EOFbinary1110001110000101110011byte123chara b ab c a b EOFbinary11 10 00 1110 00 010 11 10 011Decompression via TreeThe tree is known to the recipient of the messageSo use it

To identify symbols we willApply the Prefix PropertyNo encoding A is the prefix of another encoding BNever will have x011 and y011100110Decompression via TreeApply the Prefix PropertyNo encoding A is the prefix of another encoding BNever will have x011 and y011100110

the AlgorithmRead each bit one at a time from the inputIf the bit is 0 go left in the treeElse if the bit is 1 go rightIf you reach a leaf nodeoutput the character at that leafand go back to the tree rootDecompressing ExampleSay the encrypted message was:1011010001101011011note: this is NOT the same message as the encryption just done (but the tree the is same)Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root2' '1EOF1c243b3a610Class Activity: Decompressing ExampleSay the encrypted message was:1011010001101011011note: this is NOT the same message as the encryption just done (but the tree the is same)Pause for students to completeRead each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root2' '1EOF1c243b3a610Decompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root1Decompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root01bDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root1bDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root1b1aDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root0baDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root01baDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root010bacDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root0bacDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root0bac0Decompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree rootbac1Decompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree rootbac11aDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree rootbac0aDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree rootbac01aDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree rootbac01a0cDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree rootbac1bacacDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root11bacacaDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root0bacacaDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root0bacaca1Decompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree root0bacaca11EOFDecompressing ExampleSay the encrypted message was:10110100011010110112' '1EOF1c243b3a610Read each bit one at a time

If it is 0 go leftIf it is 1 go right

If you reach a leaf, output the character there and go back to the tree rootbacacaEOFAnd the file is now decodedThe End of This PartEnd