Heaps and basic data structures
David Kauchak
cs161
Summer 2009
Administrative
Homework 2 due date extended to Fri. 7/10 at 5pm
Midterm 7/20 in class. Closed book, etc. Review sessions SCPD students
Discussion board – thanks
Quicksort partitions – the good vs. the bad
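Quicksort average case: take 2
$T(n) = T(\frac{1}{2}n) + T(\frac{1}{2}n - 1) + \Theta(n)$
[Figure: recursion tree with cost cn at the root. A “bad” split peels off T(1); the “good” 50/50 split that follows leaves subproblems $T(\frac{1}{2}n - 1)$ and $T(\frac{1}{2}n)$.]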
We absorb the “bad” partition. In general, we can absorb any constant number of “bad” partitions
Quicksort partitions – the good vs. the bad
For Quicksort to “absorb” the cost of bad partitions, the proportion of bad to good partitions cannot grow as n grows
Why? If the numbers of good and bad partitions increase proportionately as n increases, there is still only a constant number of “bad” partitions to be absorbed by any given “good” partition
If, however, the proportion of “bad” partitions increases as n increases, we can no longer absorb their cost, since it depends on n
$T(n) = T(\frac{1}{2}n) + T(\frac{1}{2}n - 1) + \Theta(n)$
Decision-tree model
Full binary tree representing the comparisons between elements made by a sorting algorithm
Internal nodes contain the indices to be compared
Leaves contain a complete permutation of the input
Tracing a path from the root to a leaf gives the correct reordering/permutation for a given input
[Figure: a node 1:3 with branches ≤ and >, and leaves |1,3,2| and |2,1,3|. Leaf |1,3,2| reorders the input [3, 12, 7] to [3, 7, 12]; leaf |2,1,3| reorders [7, 3, 12] to [3, 7, 12].]
Comparison-based sorting
Sorted order is determined based only on comparisons between input elements:
  A[i] < A[j]
  A[i] > A[j]
  A[i] = A[j]
  A[i] ≤ A[j]
  A[i] ≥ A[j]
This is why most built-in sorting approaches only require you to define the comparison operator (e.g. compareTo in Java)
Can we do better than O(n log n)?
A decision tree model
1:2
  ≤ 2:3
      ≤ |1,2,3|
      > 1:3
          ≤ |1,3,2|
          > |3,1,2|
  > 1:3
      ≤ |2,1,3|
      > 2:3
          ≤ |2,3,1|
          > |3,2,1|
A decision tree model
Tracing the input [12, 7, 3]:
  1:2  Is 12 ≤ 7 or is 12 > 7?  12 > 7, take the > branch
  1:3  Is 12 ≤ 3 or is 12 > 3?  12 > 3, take the > branch
  2:3  Is 7 ≤ 3 or is 7 > 3?  7 > 3, take the > branch
  Leaf |3,2,1|: take elements 3, 2, 1 of the input, giving [3, 7, 12]
A decision tree model
Tracing the input [7, 12, 3]:
  1:2  7 ≤ 12, take the ≤ branch
  2:3  12 > 3, take the > branch
  1:3  7 > 3, take the > branch
  Leaf |3,1,2|: take elements 3, 1, 2 of the input, giving [3, 7, 12]
How many leaves are in a decision tree?
Leaves must have all possible permutations of the input
Input of size n: n! leaves
What if the decision tree model didn’t? Some input would exist that didn’t have a correct reordering
A lower bound
What is the worst-case number of comparisons for a tree?
The longest path in the tree, i.e. the height
A lower bound
What is the maximum number of leaves a binary tree of height h can have?
A complete binary tree has $2^h$ leaves
$2^h \ge n!$
$h \ge \log n!$ (log is monotonically increasing)
$h = \Omega(n \log n)$ (from hw1)
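The last step is cited from hw1 rather than shown. One standard way to see it (a sketch, not necessarily the argument used in the homework) is to keep only the largest half of the terms of $\log n!$:

```latex
\log(n!) = \sum_{i=1}^{n} \log i
        \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log i
        \;\ge\; \frac{n}{2} \log \frac{n}{2}
        \;=\; \Omega(n \log n)
```

There are at least n/2 terms in the second sum and each is at least $\log(n/2)$, which gives the bound.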
Can we do better than O(n log n) for sorting?
What if I told you the maximum value k that any number could take, and k = O(n)?
In some situations (like above) we can sort in Θ(n):
  counting sort
  radix sort
  bucket sort
Leverage additional knowledge about the data besides comparisons
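As an illustration of sorting without comparisons, here is a minimal counting sort sketch. It assumes the inputs are integers in the range 0..k; the function name and signature are illustrative and not taken from the lecture.

```python
def counting_sort(values, k):
    """Sort integers in the range 0..k without comparing elements to each other."""
    counts = [0] * (k + 1)
    for v in values:
        counts[v] += 1              # tally how often each value occurs
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)  # emit each value as often as it was seen
    return result


print(counting_sort([3, 12, 7, 3, 0], k=12))  # [0, 3, 3, 7, 12]
```

The two loops do Θ(n + k) work in total, which is Θ(n) when k = O(n).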
Why don’t we hear about these more?
Constants can be large, so running times may be larger for modest input sizes
Cache friendliness
Memory (Quicksort sorts in place)
Hardware considerations
Data Structures
What is a data structure? A way of storing data that facilitates particular operations
Dynamic set operations, for a set S:
  Search(S,k) – Does k exist in S?
  Insert(S,k) – Add k to S
  Delete(S,x) – Given a pointer/reference, x, to an element, delete it from S
  Min(S) – Return the smallest element of S
  Max(S) – Return the largest element of S
Array
Sequential locations in memory in linear order
Elements are accessed via index
Cost of operations:
  Search(S,k) – O(n)
  Insert(S,k) – Θ(n)
  InsertIndex(S,k) – Θ(1)
  Delete(S,x) – Θ(n)
  Min(S) – Θ(n)
  Max(S) – Θ(n)
Array
Uses? constant time access of particular indices
Linked list
Elements are arranged linearly. An element in the list points to the next element in the list
Cost of operations:
  Search(S,k) – O(n)
  Insert(S,k) – Θ(1)
  InsertIndex(S,k) – Θ(1)
  Delete(S,x) – O(n)
  Min(S) – Θ(n)
  Max(S) – Θ(n)
Linked list
Uses? constant time insertion at the cost of linear time access
Doubly linked list
Elements are arranged linearly. An element in the list points to the next element and the previous element in the list
What does the back link get us? Θ(1) deletion
Stack
LIFO
Picture the stack of plates at a buffet
Can implement with an array or a linked list
push(1), push(2), push(3) leave 3 on top, then 2, then 1
pop(), pop(), pop() return 3, 2, 1
Stack
Empty – check if stack is empty
  Array: check if “top” is at index 0
  Linked list: check if “top” pointer is null
  Runtime: Θ(1)
Stack
Pop – removes the top element from the list
  check if empty, if so, “underflow”
  Array: return element at “top” and decrement “top”
  Linked list: return and remove at front of linked list
  Runtime: Θ(1)
Push – add an element to the list
  Array: increment “top” and insert element. Must check for overflow!
  Linked list: insert element at front of linked list
  Runtime: Θ(1)
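The array version of these operations can be sketched as follows. The class name ArrayStack and the fixed capacity are assumptions made for the example; here “top” is kept as the index of the next free slot, so top == 0 means the stack is empty.

```python
class ArrayStack:
    """Fixed-capacity LIFO stack backed by a preallocated list (illustrative name)."""

    def __init__(self, capacity):
        self.items = [None] * capacity
        self.top = 0  # index of the next free slot; 0 means the stack is empty

    def empty(self):
        return self.top == 0

    def push(self, value):
        if self.top == len(self.items):
            raise OverflowError("stack overflow")
        self.items[self.top] = value
        self.top += 1

    def pop(self):
        if self.empty():
            raise IndexError("stack underflow")
        self.top -= 1
        return self.items[self.top]


s = ArrayStack(3)
for v in (1, 2, 3):
    s.push(v)
print(s.pop(), s.pop(), s.pop())  # 3 2 1
```

Each operation touches a single array slot and the “top” counter, which is where the Θ(1) running times come from.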
Stack
Array or linked list?
  Array: more memory efficient
  Linked list: don’t have to worry about “overflow”
Uses?
  runtime “stack”
  graph search algorithms (depth first search)
  syntactic parsing (e.g. compilers)
Queue
FIFO
Picture a line at the grocery store
Can implement with array or doubly linked list
Enqueue(1), Enqueue(2), Enqueue(3) leave 1 at the head and 3 at the tail
Dequeue(), Dequeue(), Dequeue() return 1, 2, 3
Queue
Operations
  Empty – Θ(1)
  Enqueue – add element to end of queue – Θ(1)
  Dequeue – remove element from the front of the queue – Θ(1)
Uses?
  scheduling
  graph traversal (breadth first search)
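One way to get Θ(1) Enqueue and Dequeue from an array is a circular buffer. The sketch below is only one possible layout; the class name ArrayQueue and the fixed capacity are assumptions for the example.

```python
class ArrayQueue:
    """Fixed-capacity FIFO queue stored in a circular buffer (illustrative name)."""

    def __init__(self, capacity):
        self.items = [None] * capacity
        self.head = 0  # index of the front element
        self.size = 0  # number of stored elements

    def empty(self):
        return self.size == 0

    def enqueue(self, value):
        if self.size == len(self.items):
            raise OverflowError("queue is full")
        tail = (self.head + self.size) % len(self.items)  # wrap around the array
        self.items[tail] = value
        self.size += 1

    def dequeue(self):
        if self.empty():
            raise IndexError("queue is empty")
        value = self.items[self.head]
        self.head = (self.head + 1) % len(self.items)
        self.size -= 1
        return value


q = ArrayQueue(3)
for v in (1, 2, 3):
    q.enqueue(v)
print(q.dequeue(), q.dequeue(), q.dequeue())  # 1 2 3
```

Wrapping the indices with the modulo operator avoids shifting elements, so neither operation depends on the number of stored items.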
Binary heap
A binary tree where the value of a parent is greater than or equal to the value of its children
Additional restriction: all levels of the tree are complete except the last
Max heap vs. min heap
Binary heap - operations
Maximum(S) – return the largest element in the set
ExtractMax(S) – return and remove the largest element in the set
Insert(S, val) – insert val into the set
IncreaseElement(S, x, val) – increase the value of element x to val
BuildHeap(A) – build a heap from an array of elements
Binary heap - pointers
                16
         14            10
      8      7      9      3
    2   4  1
parent ≥ child
complete tree
level does not indicate size
all nodes in a heap are themselves heaps
Binary heap - array
index:   1   2   3   4   5   6   7   8   9  10
value:  16  14  10   8   7   9   3   2   4   1
Left child of A[3]? 2*3 = 6, so A[6] = 9
Parent of A[8]? floor(8/2) = 4, so A[4] = 8
Identify the valid heaps
[Tree: a single node, 8]
[15, 12, 3, 11, 10, 2, 1, 7, 8]
[20, 18, 10, 17, 16, 15, 9, 14, 13]
[Tree: 16 at the root, 10 and 15 on the second level, 9 and 3 on the third level]
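The index arithmetic and the parent ≥ child property can be checked mechanically. The sketch below uses 1-based positions as on the slides (slot 0 of the Python list is padding); the helper names are illustrative, and only the two array candidates are checked because the tree candidates are not given as arrays.

```python
def parent(i):
    return i // 2


def left(i):
    return 2 * i


def right(i):
    return 2 * i + 1


def is_max_heap(values):
    """Return True if every element is <= its parent, using 1-based positions."""
    a = [None] + list(values)  # pad slot 0 so that positions match the slides
    n = len(values)
    return all(a[parent(i)] >= a[i] for i in range(2, n + 1))


print(left(3), parent(8))                                  # 6 4
print(is_max_heap([15, 12, 3, 11, 10, 2, 1, 7, 8]))        # True
print(is_max_heap([20, 18, 10, 17, 16, 15, 9, 14, 13]))    # False
```

The second array fails because A[3] = 10 is smaller than its left child A[6] = 15.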
Heapify
Assume left and right children are heaps, turn current set into a valid heap
find out which is largest: current, left or right
if a child is larger, swap and recurse
Heapify
Example: Heapify(A, 2), where 3 is smaller than its children 8 and 7
index:   1   2   3   4   5   6   7   8   9  10
value:  16   3  10   8   7   9   5   2   4   1
Swap 3 with its larger child, 8:
value:  16   8  10   3   7   9   5   2   4   1
Swap 3 with its larger child, 4:
value:  16   8  10   4   7   9   5   2   3   1
3 is now a leaf, so we are done
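The walk-through above can be reproduced with a short recursive sketch. It assumes a 1-based array (slot 0 unused) and passes the heap size n explicitly; the signature is an illustration, not the lecture's pseudocode.

```python
def heapify(a, i, n):
    """Sift a[i] down, assuming the subtrees rooted at its children are max-heaps.

    a is 1-based (a[0] is unused padding) and n is the number of heap elements.
    """
    largest = i
    l, r = 2 * i, 2 * i + 1
    if l <= n and a[l] > a[largest]:
        largest = l
    if r <= n and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]  # move the larger child up
        heapify(a, largest, n)               # continue in the affected subtree


a = [None, 16, 3, 10, 8, 7, 9, 5, 2, 4, 1]
heapify(a, 2, 10)
print(a[1:])  # [16, 8, 10, 4, 7, 9, 5, 2, 3, 1]
```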
Correctness of Heapify
Remember both the children are valid heaps
Three cases:
Case 1: A[i] (current node) is the largest
  parent is greater than both children
  both children are heaps
  current node is a valid heap
Case 2: left child is the largest
  When Heapify returns:
    Left child is a valid heap
    Right child is unchanged and therefore a valid heap
    Current node is larger than both children since we selected the largest node of current, left and right
    current node is a valid heap
Case 3: right child is largest
  similar to above
Running time of Heapify
What is the cost of each call to Heapify? Θ(1)
How many calls are made to Heapify? O(height of the tree)
What is the height of the tree? Complete binary tree, except for the last level
$2^h \le n$
$h \le \log_2 n$
O(log n)
Maximum
Return the largest element from the set
Return A[1]
16 14 10 8 7 9 3 2 4 1
1 2 3 4 5 6 7 8 9 10
ExtractMax
Return and remove the largest element in the set
Removing the root (16) leaves a hole at the top of the tree: what should fill it?
Move the last element (1) to the root
Call Heapify on the root
ExtractMax running time
Constant amount of work plus one call to Heapify – O(log n)
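A minimal sketch of ExtractMax under the same assumptions as the array slides (1-based positions, slot 0 unused). The heapify helper is repeated so the example runs on its own; names and signatures are illustrative.

```python
def heapify(a, i, n):
    """Sift a[i] down in a 1-based max-heap of size n."""
    largest = i
    l, r = 2 * i, 2 * i + 1
    if l <= n and a[l] > a[largest]:
        largest = l
    if r <= n and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, n)


def extract_max(a):
    """Remove and return the largest element of the 1-based max-heap a."""
    n = len(a) - 1
    if n < 1:
        raise IndexError("heap underflow")
    largest = a[1]
    a[1] = a[n]            # move the last element into the root
    a.pop()                # shrink the heap by one
    heapify(a, 1, n - 1)   # restore the heap property from the root down
    return largest


heap = [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
print(extract_max(heap))  # 16
print(heap[1:])           # [14, 8, 10, 4, 7, 9, 3, 2, 1]
```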
IncreaseElement
Increase the value of element x to val
Example: increase 4 to 15
index:   1   2   3   4   5   6   7   8   9  10
value:  16  14  10   8   7   9   3   2  15   1   (4 becomes 15)
value:  16  14  10  15   7   9   3   2   8   1   (swap 15 with its parent, 8)
value:  16  15  10  14   7   9   3   2   8   1   (swap 15 with its parent, 14)
15 ≤ 16, so stop
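Correctness of IncreaseElement
Why is it ok to swap values with parent? The parent was already ≥ everything in its subtree, and the increased value is larger than the parent, so after the swap the new value is ≥ both of its children
Stop when heap property is satisfied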
Running time of IncreaseElement
Follows a path from a node to the root
Worst case O(height of the tree)
O(log n)
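The swap-with-parent loop can be sketched as below, again assuming a 1-based array with slot 0 unused. Here x is taken to be the position of the element, which is an assumption about how the element is referenced.

```python
def increase_element(a, i, val):
    """Raise a[i] to val in a 1-based max-heap and restore the heap property."""
    if val < a[i]:
        raise ValueError("new value is smaller than the current value")
    a[i] = val
    # Swap with the parent until the parent is at least as large or we hit the root.
    while i > 1 and a[i // 2] < a[i]:
        a[i], a[i // 2] = a[i // 2], a[i]
        i //= 2


a = [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
increase_element(a, 9, 15)  # the element 4 sits at position 9
print(a[1:])  # [16, 15, 10, 14, 7, 9, 3, 2, 8, 1]
```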
Insert
Insert val into the set
Example: insert 6
Add 6 as the last leaf of the tree (the next free position in the array)
propagate value up
Running time of Insert
Constant amount of work plus one call to IncreaseElement – O(log n)
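Insert can be sketched as "append a placeholder, then call IncreaseElement". The use of negative infinity as the placeholder is an assumption of this sketch; the array is 1-based with slot 0 unused, and the helper is repeated so the example is self-contained.

```python
def increase_element(a, i, val):
    """Raise a[i] to val in a 1-based max-heap and restore the heap property."""
    a[i] = val
    while i > 1 and a[i // 2] < a[i]:
        a[i], a[i // 2] = a[i // 2], a[i]
        i //= 2


def insert(a, val):
    """Insert val into the 1-based max-heap a."""
    a.append(float("-inf"))               # new last leaf, smaller than any value
    increase_element(a, len(a) - 1, val)  # propagate the real value up


heap = [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
insert(heap, 6)
print(heap[1:])  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1, 6]
insert(heap, 15)
print(heap[1:])  # [16, 14, 15, 8, 7, 10, 3, 2, 4, 1, 6, 9]
```

In the first call 6 stays where it is because its parent 7 is larger; in the second call 15 moves up two levels.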
Building a heap
Can we build a heap using the functions we have so far?
  Maximum(S)
  ExtractMax(S)
  Insert(S, val)
  IncreaseElement(S, x, val)
Running time of BuildHeap1
n calls to Insert – O(n log n)
Can we get a better bound?
…
Building a heap: take 2
Start with n/2 “simple” heaps (the leaves)
call Heapify on elements floor(n/2), floor(n/2)-1, floor(n/2)-2, …, 1
all children have larger indices
building from the bottom up makes sure that all the children are heaps
index:           1   2   3   4   5   6   7   8   9  10
start:           4   1   3   2  16   9  10  14   8   7
Heapify(A, 5):   4   1   3   2  16   9  10  14   8   7   (16 ≥ 7, no change)
Heapify(A, 4):   4   1   3  14  16   9  10   2   8   7   (2 swaps with 14)
Heapify(A, 3):   4   1  10  14  16   9   3   2   8   7   (3 swaps with 10)
Heapify(A, 2):   4  16  10  14   7   9   3   2   8   1   (1 swaps with 16, then with 7)
Heapify(A, 1):  16  14  10   8   7   9   3   2   4   1   (4 swaps with 16, 14, then 8)
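The bottom-up construction traced above can be sketched as follows, assuming 1-based positions (slot 0 unused). The function name build_heap and the choice to return a new padded list are illustrative.

```python
def heapify(a, i, n):
    """Sift a[i] down in a 1-based max-heap of size n."""
    largest = i
    l, r = 2 * i, 2 * i + 1
    if l <= n and a[l] > a[largest]:
        largest = l
    if r <= n and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, n)


def build_heap(values):
    """Return a 1-based max-heap (slot 0 is padding) built bottom-up from values."""
    a = [None] + list(values)
    n = len(values)
    # Positions floor(n/2)+1..n are leaves, so start at floor(n/2) and work back to 1.
    for i in range(n // 2, 0, -1):
        heapify(a, i, n)
    return a


print(build_heap([4, 1, 3, 2, 16, 9, 10, 14, 8, 7])[1:])
# [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```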
Correctness of BuildHeap2
Invariant: elements A[i+1…n] are all heaps
Base case: i = floor(n/2). All elements i+1, i+2, …, n are “simple” heaps
Inductive case: We know i+1, i+2, …, n are all heaps, therefore the call to Heapify(A,i) generates a heap at node i
Termination?
Running time of BuildHeap2
n/2 calls to Heapify – O(n log n) Can we get a tighter bound?
Running time of BuildHeap2
all nodes at the same level will have the same cost
How many nodes are at level d? $2^d$
$T(n) = \sum_{d=0}^{\log n} 2^d \, O(d)$ ?
Nodes at height h
h=0: ≤ ceil(n/2) nodes
h=1: ≤ ceil(n/4) nodes
h=2: ≤ ceil(n/8) nodes
h: ≤ ceil(n/2^(h+1)) nodes
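Running time of BuildHeap2
$T(n) = \sum_{h=0}^{\log n} \frac{n}{2^{h+1}} \, O(h)$
$= O\left(n \sum_{h=0}^{\log n} h \cdot \frac{1}{2^{h+1}}\right)$
$= O\left(n \sum_{h=0}^{\log n} \frac{h}{2^h}\right)$
$= O\left(n \sum_{h=0}^{\infty} \frac{h}{2^h}\right)$
$= O(n)$
since $\sum_{h=0}^{\infty} \frac{h}{2^h} = \frac{1/2}{(1 - 1/2)^2} = 2$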
BuildHeap1 vs. BuildHeap2
Runtime
  BuildHeap1: O(n log n) (n calls to Insert)
  BuildHeap2: O(n) (only n/2 calls to Heapify)
Memory
  Both O(n)
  BuildHeap1 requires an additional array, i.e. 2n memory
Complexity/Ease of implementation
Heap uses
Heapsort
  Build a heap
  Call ExtractMax for all the elements
  O(n log n) running time
Priority queues
  scheduling tasks: jobs, processes, network traffic
  A* search algorithm
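The Heapsort recipe above can be sketched in a few lines. This version builds the heap bottom-up and then repeatedly moves the maximum to the end of the shrinking heap, which is one common way to carry out "call ExtractMax for all the elements" in place; the names and the 1-based layout are assumptions of the sketch.

```python
def heapify(a, i, n):
    """Sift a[i] down in a 1-based max-heap of size n."""
    largest = i
    l, r = 2 * i, 2 * i + 1
    if l <= n and a[l] > a[largest]:
        largest = l
    if r <= n and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, n)


def heapsort(values):
    """Return a sorted copy of values using a max-heap."""
    a = [None] + list(values)  # slot 0 is padding for 1-based positions
    n = len(values)
    for i in range(n // 2, 0, -1):      # build the heap bottom-up
        heapify(a, i, n)
    for end in range(n, 1, -1):
        a[1], a[end] = a[end], a[1]     # current maximum goes to its final slot
        heapify(a, 1, end - 1)          # restore the heap on the remaining prefix
    return a[1:]


print(heapsort([4, 1, 3, 2, 16, 9, 10, 14, 8, 7]))
# [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```

Each of the n extraction steps costs O(log n), which matches the O(n log n) running time stated above.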
Other heaps