9 trees iv


Upload: shankar-bishnoi

Post on 15-Jul-2015


Category:

Engineering



Page 1: 9 trees  iv

TREES

Table of Contents:

•Heapsort

• B Tree

• Huffman’s Algorithm

Page 2: 9 trees  iv

Heap

• Suppose H is a complete binary tree with n elements. Then H is called a heap, or a maxheap, if each node N of H has the property that the value at N is greater than or equal to the value at each of the children of N.

• Analogously, a minheap is a heap such that the value at N is less than or equal to the value at each of the children of N.

Page 3: 9 trees  iv

[Figure: Example of Max Heap, a complete binary tree with 97 at the root and 88 and 95 as its children]

Page 4: 9 trees  iv

Inserting an Element in a Heap

Suppose H is a heap with N elements, and suppose an ITEM of information is given. We insert ITEM into the heap H as follows:

• First adjoin ITEM at the end of H, so that H is still a complete tree but not necessarily a heap.

• Then let ITEM rise to its appropriate place in H, so that H is finally a heap.

[A heap is implemented more efficiently with an array than with a linked list. With the root stored at location 1, the parent of the node at location PTR is at location ⌊PTR/2⌋.]

Page 5: 9 trees  iv

Build a Maxheap

Following are the elements:

44,30,50, 22,60,55,77,55

Page 6: 9 trees  iv

Algorithm: INSHEAP(TREE, N, ITEM)

A heap H with N elements is stored in the array TREE, and an ITEM of information is given. This procedure inserts ITEM as a new element of H. PTR gives the location of ITEM as it rises in the tree, and PAR denotes the location of the parent of ITEM.

1. [Add new node to H and initialize PTR]

Set N := N + 1 and PTR := N

2. [Find location to insert ITEM]

Repeat steps 3 to 6 while PTR > 1

3. Set PAR := ⌊PTR/2⌋ [Location of parent node]

4. If ITEM ≤ TREE[PAR], then:

Set TREE[PTR] := ITEM and Return

[End of If structure]

5. Set TREE[PTR] := TREE[PAR] [Moves node down]

6. Set PTR := PAR [Updates PTR]

[End of step 2 loop]

7. Set TREE[1] := ITEM

8. Return
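The steps of INSHEAP can be sketched in Python as follows. This is only an illustration, not part of the original slides: the names `insheap` and `build_maxheap` are made up here, the list is used 1-indexed with slot 0 left unused so that the parent of location PTR is PTR // 2, and the sample data are the elements from the "Build a Maxheap" slide.

```python
def insheap(tree, n, item):
    """Insert item into the maxheap stored in tree[1..n]; return the new size."""
    n += 1
    if len(tree) <= n:
        tree.append(None)          # make room for the new last node
    ptr = n
    while ptr > 1:
        par = ptr // 2             # location of the parent node
        if item <= tree[par]:
            break                  # item has reached its place
        tree[ptr] = tree[par]      # move the parent down
        ptr = par
    tree[ptr] = item
    return n


def build_maxheap(values):
    """Build a maxheap by inserting the values one at a time."""
    tree = [None]                  # slot 0 is an unused placeholder
    n = 0
    for value in values:
        n = insheap(tree, n, value)
    return tree[1:]


heap = build_maxheap([44, 30, 50, 22, 60, 55, 77, 55])
print(heap)  # [77, 55, 60, 50, 30, 44, 55, 22]
```

Breaking out of the loop and storing the item once afterwards covers both step 4 (the item stops below some parent) and step 7 (the item reaches the root).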

Page 7: 9 trees  iv

Deleting the Root node in a heap

Suppose H is a heap with N elements and suppose we want to delete the root R of H. This is accomplished as follows:

• Assign the root R to some variable ITEM

• Replace the deleted node R by the last node L of H, so that H is still a complete tree but not necessarily a heap.

• Let L sink to its appropriate place in H so that H is finally a heap.

Page 8: 9 trees  iv

[Figure: a maxheap with 11 nodes, listed level by level: 95 / 85 70 / 55 33 30 65 / 15 20 15 22]

DELETE 95

Page 9: 9 trees  iv

Algorithm: DELHEAP( TREE, N , ITEM )

A heap H with N elements is stored in the array TREE. This algorithm assigns the root TREE[1] of H to the variable ITEM and then reheaps the remaining elements. The variable LAST stores the value of the original last node of H. The pointers PTR, LEFT and RIGHT give the Location of LAST and its left and right children as LAST sinks into the tree.

Page 10: 9 trees  iv

1: Set ITEM := TREE[1] [Removes root of H]

2: Set LAST := TREE[N] and N := N - 1 [Removes last node of H]

3: Set PTR := 1, LEFT := 2 and RIGHT := 3

4: Repeat steps 5 to 7 while RIGHT ≤ N:

5: If LAST ≥ TREE[LEFT] and LAST ≥ TREE[RIGHT], then:

Set TREE[PTR] := LAST and Return

[End of If structure]

6: If TREE[RIGHT] ≤ TREE[LEFT], then:

Set TREE[PTR] := TREE[LEFT] and PTR := LEFT

Else:

Set TREE[PTR] := TREE[RIGHT] and PTR := RIGHT

[End of If structure]

7: Set LEFT := 2 * PTR and RIGHT := LEFT + 1

[End of step 4 loop]

8: If LEFT = N and LAST < TREE[LEFT], then:

Set TREE[PTR] := TREE[LEFT] and PTR := LEFT

[End of If structure]

9: Set TREE[PTR] := LAST

10: Return
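A possible Python rendering of DELHEAP is sketched below. It is an illustration only: the name `delheap` and the return convention (the deleted root together with the new size) are choices made for this sketch, and the list is again 1-indexed with slot 0 unused. The sample heap is the one from the "DELETE 95" slide.

```python
def delheap(tree, n):
    """Remove the root of the maxheap in tree[1..n]; return (root, new size)."""
    item = tree[1]                 # the root being deleted
    last = tree[n]                 # the last node, which must sink into place
    n -= 1
    ptr, left, right = 1, 2, 3
    while right <= n:
        if last >= tree[left] and last >= tree[right]:
            break                  # last fits at ptr
        if tree[right] <= tree[left]:
            tree[ptr] = tree[left]     # move the larger (left) child up
            ptr = left
        else:
            tree[ptr] = tree[right]    # move the larger (right) child up
            ptr = right
        left = 2 * ptr
        right = left + 1
    if left == n and last < tree[left]:
        tree[ptr] = tree[left]     # ptr has only a left child, and it is larger
        ptr = left
    tree[ptr] = last
    return item, n


tree = [None, 95, 85, 70, 55, 33, 30, 65, 15, 20, 15, 22]
root, size = delheap(tree, 11)
print(root, tree[1:size + 1])
# 95 [85, 55, 70, 22, 33, 30, 65, 15, 20, 15]
```

Slot `size + 1` of the list still holds a stale copy of the old last node; only `tree[1..size]` is the heap after the call.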

Page 11: 9 trees  iv

Application of Heap

HeapSort: One of the important applications of a heap is sorting an array using the heapsort method. Suppose an array A with N elements is to be sorted. The heapsort algorithm sorts the array in two phases:

• Phase A: Build a heap H out of the elements of A

• Phase B: Repeatedly delete the root element of H

Since the root of a maxheap contains the largest element of the heap, phase B deletes the elements in decreasing order. Similarly, with a minheap the root is the smallest element, so phase B deletes the elements in increasing order.

Page 12: 9 trees  iv

Algorithm: HEAPSORT(A, N)

An array A with N elements is given. This algorithm sorts the elements of the array.

• Step 1: [Build a heap H, calling the procedure INSHEAP]

Repeat for J = 1 to N-1:

Call INSHEAP(A, J, A[J+1])

[End of loop]

• Step 2: [Sort A by repeatedly deleting the root of H]

Repeat while N > 1:

(a) Call DELHEAP(A, N, ITEM)

(b) Set A[N + 1] := ITEM [Store the elements deleted from the heap]

[End of loop]

• Step 3: Exit
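The two phases can be put together in a short Python sketch. It is an illustration under stated assumptions rather than the slides' own code: the helper names are hypothetical, the working list is 1-indexed with slot 0 unused, and the helpers return the new heap size instead of updating N in place. Because each deleted maximum is stored just past the end of the shrinking heap, the array ends up in increasing order.

```python
def insheap(a, n, item):
    """INSHEAP: insert item into the maxheap a[1..n]; return the new size."""
    n += 1
    ptr = n
    while ptr > 1:
        par = ptr // 2
        if item <= a[par]:
            break
        a[ptr] = a[par]            # move the parent down
        ptr = par
    a[ptr] = item
    return n


def delheap(a, n):
    """DELHEAP: remove the root of the maxheap a[1..n]; return (root, new size)."""
    item, last = a[1], a[n]
    n -= 1
    ptr, left, right = 1, 2, 3
    while right <= n:
        if last >= a[left] and last >= a[right]:
            break
        child = left if a[right] <= a[left] else right
        a[ptr] = a[child]          # move the larger child up
        ptr = child
        left, right = 2 * ptr, 2 * ptr + 1
    if left == n and last < a[left]:
        a[ptr] = a[left]
        ptr = left
    a[ptr] = last
    return item, n


def heapsort(values):
    a = [None] + list(values)      # slot 0 is unused
    n = len(values)
    for j in range(1, n):          # Phase A: a[1..j] is a heap, insert a[j+1]
        insheap(a, j, a[j + 1])
    while n > 1:                   # Phase B: move the current maximum to the end
        item, n = delheap(a, n)
        a[n + 1] = item
    return a[1:]


print(heapsort([44, 30, 50, 22, 60, 55, 77, 55]))
# [22, 30, 44, 50, 55, 55, 60, 77]
```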

Page 13: 9 trees  iv

Complexity of HeapSort

• Phase 1 (build a heap H out of the n elements of A):

g(n) ≤ n log₂ n

• Phase 2 (repeatedly delete the root element of H):

h(n) ≤ n log₂ n

Therefore, f(n) = O(n log₂ n) (in the worst case)

• Better in the worst case than Bubblesort (O(n²)) and Quicksort (average O(n log₂ n), worst case O(n²))

Page 14: 9 trees  iv

• Problem: Create a Heap out of the following data:

jan feb mar apr may jun jul aug sept oct nov dec

Page 15: 9 trees  iv

B-Trees

Page 16: 9 trees  iv

Motivation for B-Trees

• Index structures for large datasets cannot be stored in main memory

• Storing them on disk requires a different approach to efficiency

• Assuming that a disk spins at 3600 RPM, one revolution occurs in 1/60 of a second, or 16.7ms

• Crudely speaking, one disk access takes about the same time as 200,000 instructions

Page 17: 9 trees  iv

Motivation (cont.)

• Assume that we use an AVL tree to store about 20 million records

• We end up with a very deep binary tree with lots of different disk accesses; log₂ 20,000,000 is about 24, so a search needs about 24 disk accesses, which takes about 0.2 seconds

• We know we can’t improve on the log n lower bound on search for a binary tree

• But, the solution is to use more branches and thus reduce the height of the tree!

– As branching increases, depth decreases

Page 18: 9 trees  iv

Definition of a B-tree

• A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which:

1. the number of keys in each non-leaf node is one less than the number of its children, and these keys partition the keys in the children in the fashion of a search tree

2. all leaves are on the same level

3. all non-leaf nodes except the root have at least ⌈m/2⌉ children

4. the root is either a leaf node, or it has from two to m children

5. a leaf node contains no more than m – 1 keys

• In these slides m is always odd, so that an over-full node splits evenly around its middle key

Page 19: 9 trees  iv

An example B-Tree

[Figure: a B-tree of order 5 containing 26 items, with the key 26 in the root]

Note that all the leaves are at the same level

Page 20: 9 trees  iv

Constructing a B-tree

• Suppose we start with an empty B-tree and keys arrive in the following order: 1 12 8 2 25 6 14 28 17 7 52 16 48 68 3 26 29 53 55 45

• We want to construct a B-tree of order 5

• The first four items go into the root: [1 2 8 12]

• To put the fifth item in the root would violate condition 5

• Therefore, when 25 arrives, pick the middle key to make a new root

Page 21: 9 trees  iv

Constructing a B-tree (contd.)

Root: [8]
Leaves: [1 2] [12 25]

6, 14, 28 get added to the leaf nodes:

Root: [8]
Leaves: [1 2 6] [12 14 25 28]

Page 22: 9 trees  iv

Constructing a B-tree (contd.)

Adding 17 to the right leaf node would over-fill it, so we take the middle key, promote it (to the root) and split the leaf:

Root: [8 17]
Leaves: [1 2 6] [12 14] [25 28]

7, 52, 16, 48 get added to the leaf nodes:

Root: [8 17]
Leaves: [1 2 6 7] [12 14 16] [25 28 48 52]

Page 23: 9 trees  iv

Constructing a B-tree (contd.)

Adding 68 causes us to split the rightmost leaf, promoting 48 to the root, and adding 3 causes us to split the leftmost leaf, promoting 3 to the root; 26, 29, 53, 55 then go into the leaves:

Root: [3 8 17 48]
Leaves: [1 2] [6 7] [12 14 16] [25 26 28 29] [52 53 55 68]

Adding 45 causes a split of [25 26 28 29], and promoting 28 to the root then causes the root to split.

Page 24: 9 trees  iv

Constructing a B-tree (contd.)

Root: [17]
Internal nodes: [3 8] [28 48]
Leaves: [1 2] [6 7] [12 14 16] [25 26] [29 45] [52 53 55 68]

Page 25: 9 trees  iv

Inserting into a B-Tree

• Attempt to insert the new key into a leaf

• If this would result in that leaf becoming too big, split the leaf into two, promoting the middle key to the leaf’s parent

• If this would result in the parent becoming too big, split the parent into two, promoting the middle key

• This strategy might have to be repeated all the way to the top

• If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher
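The insertion strategy above can be sketched in Python. This is a minimal illustration, not a full B-tree: the names `Node`, `insert` and `levels` are invented for the sketch, keys are assumed to be distinct, and a node is split as soon as it holds m keys, with the middle key promoted. The key sequence is the one whose figures show 6 being added after 25.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # empty list for a leaf


def _insert(node, key, m):
    """Insert key below node; return (middle key, new right node) if node split."""
    i = bisect.bisect_left(node.keys, key)
    if not node.children:                   # leaf: the key goes in here
        node.keys.insert(i, key)
    else:
        split = _insert(node.children[i], key, m)
        if split is not None:               # child split: take its middle key
            mid, right = split
            node.keys.insert(i, mid)
            node.children.insert(i + 1, right)
    if len(node.keys) < m:                  # at most m - 1 keys: not too big
        return None
    half = len(node.keys) // 2              # position of the middle key
    mid = node.keys[half]
    right = Node(node.keys[half + 1:], node.children[half + 1:])
    node.keys = node.keys[:half]
    node.children = node.children[:half + 1]
    return mid, right


def insert(root, key, m=5):
    """Insert key into the order-m B-tree at root; return the (maybe new) root."""
    split = _insert(root, key, m)
    if split is None:
        return root
    mid, right = split
    return Node([mid], [root, right])       # the tree grows one level higher


def levels(root):
    """List the keys of every node, level by level."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


root = Node()
for k in [1, 12, 8, 2, 25, 6, 14, 28, 17, 7, 52, 16, 48, 68, 3, 26, 29, 53, 55, 45]:
    root = insert(root, k)
for level in levels(root):
    print(level)
```

With these 20 keys and m = 5 the sketch ends with 17 alone in the root, [3 8] and [28 48] below it, and six leaves, as in the construction example.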

Page 26: 9 trees  iv

Exercise in Inserting into a B-Tree

• Insert the following keys to a 5-way B-tree:

3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56

Page 27: 9 trees  iv

Removal from a B-tree

• During insertion, the key always goes into a leaf. For deletion we also wish to remove from a leaf. There are four cases to consider:

CASE: 1 - If the key is already in a leaf node, and removing it doesn’t cause that leaf node to have too few keys, then simply remove the key to be deleted.

CASE: 2 - If the key is not in a leaf, then it is guaranteed (by the nature of a B-tree) that its predecessor or successor will be in a leaf. In this case we can delete the key and promote the predecessor or successor key to the deleted key’s position in the non-leaf node.

Page 28: 9 trees  iv

Removal from a B-tree (2)

• If (1) or (2) leads to a leaf node containing fewer than the minimum number of keys, then we have to look at the siblings immediately adjacent to the leaf in question:

CASE: 3 - If one of them has more than the minimum number of keys, then we can promote one of its keys to the parent and take the parent key into our lacking leaf.

CASE: 4 - If neither of them has more than the minimum number of keys, then the lacking leaf and one of its neighbours can be combined with their shared parent key (the opposite of promoting a key), and the new leaf will have the correct number of keys; if this step leaves the parent with too few keys, then we repeat the process up to the root itself, if required.

Page 29: 9 trees  iv

Type #1: Simple leaf deletion

Assuming a 5-way B-Tree, as before:

Root: [12 29 52]
Leaves: [2 7 9] [15 22] [31 43] [56 69 72]

Delete 2: since there are enough keys in the node, just delete it.

Page 30: 9 trees  iv

Type #2: Simple non-leaf deletion

Root: [12 29 52]
Leaves: [7 9] [15 22] [31 43] [56 69 72]

Delete 52: borrow the predecessor or (in this case) the successor, 56, which moves up into the root.

Next: delete 72.

Page 31: 9 trees  iv

Type #4: Too few keys in node and its siblings

Root: [12 29 56]
Leaves: [7 9] [15 22] [31 43] [69 72]

Delete 72: too few keys! Join back together: the leaves [31 43] and [69] are combined with their shared parent key 56.

Page 32: 9 trees  iv

Type #4: Too few keys in node and its siblings

Root: [12 29]
Leaves: [7 9] [15 22] [31 43 56 69]

Next: delete 22.

Page 33: 9 trees  iv

Type #3: Enough siblings

Root: [12 29]
Leaves: [7 9] [15 22] [31 43 56 69]

Delete 22: demote root key and promote leaf key.


Page 35: 9 trees  iv

Type #3: Enough siblings

Root: [12 31]
Leaves: [7 9] [15 29] [43 56 69]

Page 36: 9 trees  iv

Deletion Exercise

Delete 95,226

Given a B tree of Order 5

Page 37: 9 trees  iv

Result after deletion of 95,226

Delete 221

Page 38: 9 trees  iv

Result after deletion of 221

Delete 70

Page 39: 9 trees  iv

B Tree after Deletion of 70

Page 40: 9 trees  iv

Exercise to do

• Given 5-way B-tree created by these data (last exercise):

3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56

• Further Add the following keys:

– 2, 6,12

• Delete the following keys:

– 4, 5, 7, 3, 14

Page 41: 9 trees  iv

Comparing Trees

• Binary trees

– Can become unbalanced and lose their good time complexity (big O)

– AVL trees are strict binary trees that overcome the balance problem

– Heaps remain balanced but only prioritise (not order) the keys

• Multi-way trees

– B-Trees can be m-way: they can have any (odd) number of children

– One B-Tree, the 2-3 (or 3-way) B-Tree, approximates a permanently balanced binary tree, exchanging the AVL tree’s balancing operations for insertion and (more complex) deletion operations

Page 42: 9 trees  iv

Huffman Coding: An Application of Binary Trees and Priority Queues

Page 43: 9 trees  iv

Encoding and Compression of Data

• ASCII

• Variations on ASCII

– min number of bits needed

– cost of savings

– patterns

– modifications

Page 44: 9 trees  iv

Purpose of Huffman Coding

• Proposed by Dr. David A. Huffman in 1952

– “A Method for the Construction of Minimum Redundancy Codes”

• Applicable to many forms of data transmission

– Our example: text files

Page 45: 9 trees  iv

The Basic Algorithm

• Huffman coding is a form of statistical coding

• Not all characters occur with the same frequency!

• Yet all characters are allocated the same amount of space

– 1 char = 1 byte, be it e or x

Page 46: 9 trees  iv

The Basic Algorithm

• Any savings in tailoring codes to frequency of character?

• Code word lengths are no longer fixed like ASCII.

• Code word lengths vary and will be shorter for the more frequently used characters.

Page 47: 9 trees  iv

The (Real) Basic Algorithm

1. Scan text to be compressed and tally occurrence of all characters.

2. Sort or prioritize characters based on number of occurrences in text.

3. Build Huffman code tree based on prioritized list.

4. Perform a traversal of tree to determine all code words.

5. Scan text again and create new file using the Huffman codes.

Page 48: 9 trees  iv

Building a Tree: Scan the original text

• Consider the following short text:

Eerie eyes seen near lake.

• Count up the occurrences of all characters in the text

Page 49: 9 trees  iv

Building a Tree: Scan the original text

Eerie eyes seen near lake.

• What characters are present?

E e r i space y s n a l k .

Page 50: 9 trees  iv

Building a Tree: Scan the original text

Eerie eyes seen near lake.

• What is the frequency of each character in the text?
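The tally can be sketched in a few lines of Python. This is just an illustration using the standard library's `collections.Counter`; the variable names are arbitrary. Sorting by count with a stable sort keeps characters of equal frequency in order of first appearance.

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)                       # character -> number of occurrences

# Lowest frequency first; ties stay in order of first appearance.
queue = sorted(freq.items(), key=lambda pair: pair[1])
for ch, count in queue:
    print(repr(ch), count)
```

Upper-case E and lower-case e are counted separately, and the space and the full stop are characters like any other.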

Page 51: 9 trees  iv

Building a Tree: Prioritize characters

• Create binary tree nodes with character and frequency of each character

• Place nodes in a priority queue

– The lower the occurrence, the higher the priority in the queue

Page 52: 9 trees  iv

Building a Tree

• The queue after inserting all nodes

• Null Pointers are not shown

E 1, i 1, y 1, l 1, k 1, . 1, r 2, s 2, n 2, a 2, sp 4, e 8

Page 53: 9 trees  iv

Building a Tree

• While priority queue contains two or more nodes

– Create new node

– Dequeue node and make it left subtree

– Dequeue next node and make it right subtree

– Frequency of new node equals sum of frequency of left and right children

– Enqueue new node back into queue
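The loop above can be sketched with Python's `heapq` module as the priority queue. This is an illustration under assumptions that are not in the slides: a leaf is represented by its character and an internal node by a (left, right) tuple, the function name `build_tree` is made up, and ties between equal frequencies are broken by order of arrival (characters by first appearance in the text, new nodes after everything already queued).

```python
import heapq
from collections import Counter


def build_tree(text):
    """Return (total frequency, Huffman tree) for a non-empty text."""
    freq = Counter(text)
    # Queue entries are (frequency, arrival number, node).
    heap = [(count, i, ch) for i, (ch, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    arrival = len(heap)
    while len(heap) >= 2:
        f1, _, left = heapq.heappop(heap)    # lowest frequency: left subtree
        f2, _, right = heapq.heappop(heap)   # next lowest: right subtree
        heapq.heappush(heap, (f1 + f2, arrival, (left, right)))
        arrival += 1
    total, _, tree = heap[0]
    return total, tree


total, tree = build_tree("Eerie eyes seen near lake.")
print(total)
print(tree)
```

The arrival number only serves to make queue entries comparable when frequencies are equal; a different tie-breaking rule gives a different but equally good tree.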

Page 54: 9 trees  iv

Building a Tree

[Figures, pages 54 to 76: the priority queue redrawn after each pass of the loop. Each step below dequeues the two lowest-frequency nodes and enqueues their parent, labelled with the sum of their frequencies.]

• Start: E 1, i 1, y 1, l 1, k 1, . 1, r 2, s 2, n 2, a 2, sp 4, e 8

• E 1 + i 1 → 2

• y 1 + l 1 → 2

• k 1 + . 1 → 2

• r 2 + s 2 → 4

• n 2 + a 2 → 4

• (E i) 2 + (y l) 2 → 4

• (k .) 2 + sp 4 → 6

What is happening to the characters with a low number of occurrences?

• (r s) 4 + (n a) 4 → 8

• (E i y l) 4 + (k . sp) 6 → 10

• e 8 + (r s n a) 8 → 16

• (E i y l k . sp) 10 + (e r s n a) 16 → 26

After enqueueing this node there is only one node left in priority queue.

Page 77: 9 trees  iv

Building a Tree

Dequeue the single node left in the queue.

This tree contains the new code words for each character.

Frequency of root node should equal number of characters in text.

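[Figure: the finished Huffman tree. The root 26 has children 10 and 16. Node 10 has children 4, over (E, i) and (y, l), and 6, over (k, .) and sp. Node 16 has children e 8 and 8, over (r, s) and (n, a).]

Eerie eyes seen near lake. → 26 characters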

Page 78: 9 trees  iv

Encoding the File: Traverse Tree for Codes

• Perform a traversal of the tree to obtain the new code words

• Going left is a 0; going right is a 1

• A code word is complete only when a leaf node is reached

[Figure: the finished Huffman tree]
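The traversal can be sketched in Python as a short recursion. The nested-tuple form of the tree is an assumption of this sketch (a leaf is a character, an internal node is a (left, right) pair); `TREE` is the tree built for "Eerie eyes seen near lake." written out by hand, and `code_words` is an illustrative name.

```python
# Huffman tree for the example text; " " stands for the space character.
TREE = (
    ((("E", "i"), ("y", "l")), (("k", "."), " ")),
    ("e", (("r", "s"), ("n", "a"))),
)


def code_words(node, prefix=""):
    """Map each character to its code word: 0 for left, 1 for right."""
    if isinstance(node, str):              # leaf reached: code word complete
        return {node: prefix}
    left, right = node
    codes = code_words(left, prefix + "0")
    codes.update(code_words(right, prefix + "1"))
    return codes


codes = code_words(TREE)
for ch, code in codes.items():
    print(repr(ch), code)
```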

Page 79: 9 trees  iv

Encoding the File: Traverse Tree for Codes

Char    Code
E       0000
i       0001
y       0010
l       0011
k       0100
.       0101
space   011
e       10
r       1100
s       1101
n       1110
a       1111

[Figure: the finished Huffman tree]

Page 80: 9 trees  iv

Encoding the File

• Rescan text and encode file using new code words

Eerie eyes seen near lake.

Char Code: E 0000, i 0001, y 0010, l 0011, k 0100, . 0101, space 011, e 10, r 1100, s 1101, n 1110, a 1111

000010110000011001110001010110101111011010111001111101011111100011001111110100100101

• Why is there no need for a separator character?

Page 81: 9 trees  iv

Encoding the File: Results

• Have we made things any better?

• 84 bits to encode the text

• ASCII would take 8 * 26 = 208 bits

000010110000011001110001010110101111011010111001111101011111100011001111110100100101

If a modified fixed-length code is used, 4 bits per character are needed, since the text has only 12 distinct characters. Total bits 4 * 26 = 104. The savings are then not as great.
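The comparison can be checked with a small Python sketch that encodes the text with the code table and counts the bits directly. The dictionary `CODES` is the table from the slides typed in by hand, and `encode` is an illustrative name.

```python
CODES = {
    "E": "0000", "i": "0001", "y": "0010", "l": "0011",
    "k": "0100", ".": "0101", " ": "011", "e": "10",
    "r": "1100", "s": "1101", "n": "1110", "a": "1111",
}


def encode(text, codes):
    """Concatenate the code words of the characters of text."""
    return "".join(codes[ch] for ch in text)


text = "Eerie eyes seen near lake."
bits = encode(text, CODES)
print(bits)
print(len(bits), "bits with the Huffman code")
print(8 * len(text), "bits with 8-bit ASCII")
print(4 * len(text), "bits with a fixed 4-bit code")
```

No separator is needed between code words because no code word is a prefix of another.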

Page 82: 9 trees  iv

Decoding the File

• How does receiver know what the codes are?

• Tree constructed for each text file.

– Considers frequency for each file

– The tree has to be sent along with the data: a big hit on compression, especially for smaller files

• Tree predetermined

– based on statistical analysis of text files or file types

• Data transmission is bit based versus byte based

Page 83: 9 trees  iv

Decoding the File

• Once receiver has tree it scans incoming bit stream

• 0 ⇒ go left

• 1 ⇒ go right

[Figure: the finished Huffman tree]

Incoming bit stream: 10100011011110111101111110000110101
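Decoding can be sketched in Python by walking the tree one bit at a time. As before, the nested-tuple tree (a leaf is a character, an internal node is a (left, right) pair) and the name `decode` are assumptions of this sketch; `TREE` is the example tree written out by hand.

```python
TREE = (
    ((("E", "i"), ("y", "l")), (("k", "."), " ")),
    ("e", (("r", "s"), ("n", "a"))),
)


def decode(bits, tree):
    """Follow 0 = left, 1 = right; emit a character at each leaf."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):          # leaf: one character decoded
            out.append(node)
            node = tree                    # start again from the root
    if node is not tree:
        raise ValueError("bit stream ends in the middle of a code word")
    return "".join(out)


print(decode("10100011011110111101111110000110101", TREE))  # eel snarl.
```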

Page 84: 9 trees  iv

Summary

• Huffman coding is a technique used to compress files for transmission

• Uses statistical coding

– more frequently used symbols have shorter code words

• Works well for text and fax transmissions

• An application that uses several data structures

Page 85: 9 trees  iv

HUFFMAN’S Algorithm

Data item:  A   B   C   D   E   F   G   H
Weight:     22  5   11  19  2   11  25  5
