Applications of Data Structures
Christopher Moh, 2005


Page 1: Applications of datastructures


Application of Data Structures

Page 2: Applications of datastructures


Overview

Priority Queue structures
Heaps
Application: Dijkstra's algorithm

Cumulative Sum Data Structures on Intervals

Augmenting data structures with extra info to solve questions

Page 3: Applications of datastructures


Priority Queue (PQ) Structures

Stores elements in a list, ordered by comparing a key field.
Often has other satellite data. For example, when sorting pixels by their R value, we consider R the key field and the G and B values satellite data.

Priority queues allow us to sort elements by their key field.

Page 4: Applications of datastructures


Common PQ operations
Create(): creates an empty priority queue
Find_Min(): returns the smallest element (by key field)
Insert(x): inserts element x (with predefined key field)
Delete(x): deletes position x from the queue
Change(x, k): changes the key field of position x to k

Page 5: Applications of datastructures


Optional PQ operations

Union(a, b): combines two PQs a and b
Search(k): returns the position of the element in the heap with key value k

Page 6: Applications of datastructures


Considerations when implementing a PQ in competition:
How complicated is it? Is the code likely to be buggy?
How fast does it need to be? Does a constant factor also come into the equation?
Do I need to store extra data to do a Search?
During the course of this presentation, we shall assume that there is auxiliary data which allows us to do a Search in O(1) time. The handling of this auxiliary structure will be assumed and not covered.

Page 7: Applications of datastructures


Linear Array

Unsorted array: Create, Insert, Change in O(1) time; Find_min, Delete in O(n) time
Sorted array: Create, Find_min in O(1) time; Insert, Delete, Change in O(n + log n) = O(n) time

Page 8: Applications of datastructures


Binary Heaps
Will be the most common structure implemented in a competition setting: efficient for most applications and easy to implement.
A (min-)heap is a structure where the value of a node is less than or equal to the values of all of its children.
A binary heap is a heap where each node has at most 2 children.

Page 9: Applications of datastructures


Array implementation
Consider a heap of size nheap stored in an array BHeap[1..nheap] (define BHeap[nheap+1 .. 2*nheap+1] to be INFINITY for practical reasons).
The children of BHeap[x] are BHeap[x*2] and BHeap[x*2+1].
The parent of BHeap[x] is BHeap[x/2].
This gives a nearly complete binary heap, so we can ensure that the number of levels in the heap is O(log n).
Some properties with respect to key values: BHeap[x] >= BHeap[x/2], BHeap[x] <= BHeap[x*2], BHeap[x] <= BHeap[x*2+1]; there is no fixed ordering between BHeap[x*2] and BHeap[x*2+1].
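A minimal C++ sketch of this 1-indexed array layout (index 0 is left unused so the index arithmetic stays exactly as above); the name BHeap follows the slides, everything else is illustrative:

```cpp
#include <vector>

// 1-indexed binary min-heap in a plain array, as described above.
std::vector<int> BHeap(1);   // BHeap[0] unused; keys live in BHeap[1..nheap]
int nheap = 0;

inline int parentOf(int x)     { return x / 2; }     // BHeap[x/2]
inline int leftChildOf(int x)  { return x * 2; }     // BHeap[x*2]
inline int rightChildOf(int x) { return x * 2 + 1; } // BHeap[x*2+1]
```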

Page 10: Applications of datastructures


PQ Operations on a BHeap
We define BTree(x) to be the binary tree rooted at BHeap[x].
We define Heapify(x) to be an operation that does the following:
Assume: BTree(x*2) and BTree(x*2+1) are binary heaps, but BTree(x) is not necessarily a binary heap.
Produce: BTree(x) is a binary heap.
Details of Heapify are in later slides; for now, we assume Heapify is O(log n).
For the rest of the presentation, we assume the variable n refers to nheap.

Page 11: Applications of datastructures


Operations on a BHeap
Create is trivial: O(1) time.
Find_min:
1. Return BHeap[1]
O(1) time.
Insert (element with key value x):
1. nheap++
2. BHeap[nheap] = x
3. T = nheap
4. While (T != 1 && BHeap[T] < BHeap[T/2])
   1. Swap(BHeap[T], BHeap[T/2])
   2. T = T / 2
O(log n) time, as the number of levels is O(log n).
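A hedged C++ sketch of this Insert on the 1-indexed BHeap array from the earlier sketch; the function name heapInsert is illustrative:

```cpp
#include <utility>
#include <vector>

// Insert a key into the 1-indexed min-heap BHeap[1..nheap] and
// "bubble" it up while it is smaller than its parent.  O(log n).
void heapInsert(std::vector<int>& BHeap, int& nheap, int key) {
    ++nheap;
    if ((int)BHeap.size() <= nheap) BHeap.resize(nheap + 1);
    BHeap[nheap] = key;
    for (int T = nheap; T != 1 && BHeap[T] < BHeap[T / 2]; T /= 2)
        std::swap(BHeap[T], BHeap[T / 2]);
}
```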

Page 12: Applications of datastructures


Operations on a BHeap
ChangeDown (position x, new key value k)
Assume: k < existing BHeap[x]
1. BHeap[x] = k
2. T = x
3. While (T != 1 && BHeap[T] < BHeap[T/2])
   1. Swap(BHeap[T], BHeap[T/2])
   2. T = T / 2
Complexity: O(log n). This procedure is known as "bubbling up" the heap.

Page 13: Applications of datastructures


Operations on a BHeap

ChangeUp (position x, new key value k)

Assume: k > existing BHeap[x]
1. BHeap[x] = k
2. Heapify(x)
O(log n), as the complexity of Heapify is O(log n).

Page 14: Applications of datastructures


Operations on a BHeap
Delete (position x on the heap)
1. BHeap[x] = BHeap[nheap]
2. nheap--
3. Heapify(x)
4. T = x
5. While (T != 1 && BHeap[T] < BHeap[T/2])
   1. Swap(BHeap[T], BHeap[T/2])
   2. T = T / 2
Complexity is O(log n).
Why must I do both Heapify and "bubble up"?

Page 15: Applications of datastructures


Operations on a BHeap
Heapify (position x on the heap)
1. T = min(BHeap[x], BHeap[x*2], BHeap[x*2+1])
2. If (T == BHeap[x]) return
3. K = position where BHeap[K] = T
4. Swap(BHeap[x], BHeap[K])
5. Heapify(K)
O(log n), as the maximum number of levels in the heap is O(log n) and Heapify goes through each level at most once.
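A possible iterative C++ version of Heapify for the same 1-indexed array; explicit bounds checks stand in for the INFINITY padding mentioned earlier:

```cpp
#include <utility>
#include <vector>

// Heapify(x): BTree(x*2) and BTree(x*2+1) are already min-heaps;
// sink BHeap[x] down until BTree(x) is a min-heap too.  O(log n).
void heapify(std::vector<int>& BHeap, int nheap, int x) {
    while (true) {
        int smallest = x, l = x * 2, r = x * 2 + 1;
        if (l <= nheap && BHeap[l] < BHeap[smallest]) smallest = l;
        if (r <= nheap && BHeap[r] < BHeap[smallest]) smallest = r;
        if (smallest == x) return;          // heap property already holds
        std::swap(BHeap[x], BHeap[smallest]);
        x = smallest;                       // continue one level down
    }
}
```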

Page 16: Applications of datastructures


BHeap Operations: Summary

Create, Find_min in O(1) time.
Change (both ChangeUp and ChangeDown), Insert, and Delete in O(log n) time.
How long do Union operations take? By repeated Insertion: O(n log n). By rebuilding with Heapify: O(n).

Page 17: Applications of datastructures


Corollary: Heapsort
We can convert an unsorted array into a heap using Heapify (why does this work?):
1. For (i = n/2; i >= 1; i--)
   1. Heapify(i)
We can then return a sorted list (list initially empty):
1. For (i = 1; i <= n; i++)
   1. Append the value of Find_min to the list
   2. Delete(1)
Complexity is O(n log n)
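A hedged C++ sketch of this heapsort, reusing the heapify function from the sketch above (the declaration below refers to that definition):

```cpp
#include <vector>

void heapify(std::vector<int>& BHeap, int nheap, int x);  // see the Heapify sketch above

// Heapsort as described: build the heap bottom-up, then repeatedly
// take Find_min and Delete(1).  Overall O(n log n).
std::vector<int> heapsort(std::vector<int> BHeap, int nheap) {
    for (int i = nheap / 2; i >= 1; --i)    // leaves are already heaps
        heapify(BHeap, nheap, i);
    std::vector<int> sorted;
    while (nheap >= 1) {
        sorted.push_back(BHeap[1]);         // Find_min
        BHeap[1] = BHeap[nheap--];          // Delete(1): move the last key to the root...
        heapify(BHeap, nheap, 1);           // ...and sink it
    }
    return sorted;                          // ascending order
}
```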

Page 18: Applications of datastructures


Binomial Trees
Define the binomial tree B(k) as follows:
B(0) is a single node.
B(n), n != 0, is formed by merging two B(n-1) trees in the following way: the root of the B(n) tree is the root of one of the B(n-1) trees, and the (new) leftmost child of this root is the root of the other B(n-1) tree.
Within the tree, the heap property holds, i.e. the key field of any node is less than or equal to the key fields of all its children (as in the min-heaps above).

Page 19: Applications of datastructures


Properties of Binomial Trees

The number of nodes in B(k) is exactly 2^k.
The height of B(k) is exactly k (it has k + 1 levels).
For any tree B(k):
The root of B(k) has exactly k children.
If we take the children of the root of B(k) from left to right, they form the roots of a B(k-1), B(k-2), …, B(0) tree, in that order.

Page 20: Applications of datastructures


Binomial Heaps
A binomial heap is a forest of binomial trees with the following properties:
All the binomial trees are of different sizes.
The binomial trees are ordered (from left to right) by increasing size.
Since the size of B(k) is 2^k, the binomial tree B(k) exists in a binomial heap of n nodes iff the bit representing 2^k is "1" in the binary representation of n.
For example: 13 (decimal) = 1101 (binary), so the binomial heap with 13 nodes consists of the binomial trees B(0), B(2), and B(3).

Page 21: Applications of datastructures


Binomial Heap Implementation
Each node stores the following data:
Key field
Pointers (set to NIL if non-existent) to:
  Parent
  Next sibling (siblings are ordered left to right; a sibling must have the same parent); for roots of binomial trees, next sibling points to the root of the next binomial tree
  Leftmost child
Number of children, in a field called degree
Any other data that might be useful for the program
The binomial heap is represented by a head pointer that points to the root of the smallest binomial tree (which is the leftmost binomial tree).

Page 22: Applications of datastructures


Operations on Binomial Trees
Link (h1, h2)
Links two binomial trees with roots h1 and h2 of the same order k to form a new binomial tree of order (k+1).
We assume h1->key < h2->key, which implies that h1 is the root of the new tree.
1. T = h1->leftchild
2. h1->leftchild = h2
3. h2->parent = h1
4. h2->next_sibling = T
O(1) time.
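A minimal C++ sketch of Link using a node struct with the fields listed two slides back; the struct and field names are illustrative, not from the slides:

```cpp
// Illustrative binomial-tree node with the fields described earlier.
struct BinomNode {
    int key = 0;
    int degree = 0;                   // number of children
    BinomNode *parent = nullptr, *leftChild = nullptr, *nextSibling = nullptr;
};

// Link two binomial trees of the same order k into one of order k+1.
// Assumes h1->key < h2->key, so h1 remains the root.  O(1).
BinomNode* link(BinomNode* h1, BinomNode* h2) {
    BinomNode* T = h1->leftChild;     // old leftmost child
    h1->leftChild = h2;               // h2 becomes the new leftmost child
    h2->parent = h1;
    h2->nextSibling = T;              // old leftmost child is now h2's sibling
    h1->degree += 1;
    return h1;
}
```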

Page 23: Applications of datastructures


Operations on Binomial Heaps
Create: create a new binomial heap with one node (key field set); set Parent, Leftmost child, and Next sibling to NIL. O(1) time.
Find_min:
1. X = head, min = INFINITY
2. While (X != NIL)
   1. If (X->key < min) min = X->key
   2. X = X->next_sibling
3. Return min
O(log n) time, as there are at most log n binomial trees (log n bits).

Page 24: Applications of datastructures


More Operations

Merge (h1, h2, L)
Given binomial heaps with head pointers h1 and h2, create a list L of all the binomial trees of h1 ∪ h2, arranged in ascending order of size.
For any order k, there may be zero, one, or two binomial trees of order k in this list.

Page 25: Applications of datastructures


More Operations
Merge (h1, h2, L)
Assume that NIL is a node of infinitely large order (so that when one heap runs out, the remaining trees are taken from the other).
1. L = empty
2. While (h1 != NIL || h2 != NIL)
   1. If (h1->degree < h2->degree)
      1. Append the (binomial) tree with root h1 to L
      2. h1 = h1->next_sibling
   2. Else
      1. Apply the above steps to h2 instead

Page 26: Applications of datastructures


More Operations

Union (h1, h2)
The fundamental operation involving binomial heaps.
Takes two binomial heaps with head pointers h1 and h2 and creates a new binomial heap of the union of h1 and h2.

Page 27: Applications of datastructures


More Operations
Union (h1, h2)
1. Start with an empty binomial heap
2. Merge (h1, h2, L)
3. Go through the list L by increasing order k until L is empty:
   1. If there is exactly one or exactly three (how can this happen?) binomial trees of order k in L, append one binomial tree of order k to the binomial heap and remove that tree from L
   2. If there are two trees of order k, remove both trees, use Link to form a tree of order (k+1), and prepend this tree to L
Union is O(log n).

Page 28: Applications of datastructures


More Operations
Inserting a new node with its key field set:
Create a new binomial heap with that one node, then Union (existing heap with head h, new heap). O(log n) time.
ChangeDown (node at position x, new value): decreasing the key value of a node.
Same idea as the binary heap: "bubble" up the binomial tree containing this node (exchange only key fields and satellite data! What's the complexity if you physically change the node?).
O(log n) time.

Page 29: Applications of datastructures


More Operations
Delete (node at position x): deleting position x from the heap.
1. ChangeDown(x, -INFINITY)
   Now x is at the root of its binomial tree. Suppose that binomial tree is of order k. Recall that the children of the root of the binomial tree, from right to left, are binomial trees of order 0, 1, 2, 3, 4, …, k-1.
2. Form a new binomial heap with the children of the root of this binomial tree as the roots in the new binomial heap
3. Remove the original binomial tree from the original binomial heap
4. Union (original heap, new heap)
O(log n) complexity.

Page 30: Applications of datastructures


More Operations

ChangeUp (node at position X, new value)

1. Delete (X)
2. Insert (new value)
O(log n) time.

Page 31: Applications of datastructures


Summary – Binomial Heaps
Create in O(1) time.
Union, Find_min, Delete, Insert, and Change operations take O(log n) time.
In general, because they are more complicated, in competition it is far more prudent (it saves time coding and debugging) to use a binary heap instead, unless there are MANY Union operations.

Page 32: Applications of datastructures


Application of Heaps: Dijkstra
The following describes how Dijkstra's algorithm can be coded with a binary heap.
Initializing phase:
1. Let n be the number of nodes
2. Create a heap of size n, with all key fields initialized to INFINITY
3. Change (s, 0), where s is the source node

Page 33: Applications of datastructures


Running of Dijkstra’s algorithm

1. While (heap is not empty)
   1. X = node corresponding to the Find_min value
   2. Delete (position of X in the heap, which is 1)
   3. For all nodes k that are adjacent to X
      1. If (cost[X] + distance[X][k] < cost[k])
         1. ChangeDown (position of k in the heap, cost[X] + distance[X][k])
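For comparison, a hedged C++ sketch of the same algorithm; instead of tracking each node's heap position for ChangeDown, it uses std::priority_queue with lazy deletion (a common competition substitute for decrease-key), and the names dist and adj are illustrative:

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Dijkstra with a heap, as on the slides, but with std::priority_queue and
// "lazy deletion" instead of an explicit ChangeDown on heap positions.
// adj[u] holds (v, w) edges; returns the shortest distances from s.
std::vector<long long> dijkstra(int n, int s,
        const std::vector<std::vector<std::pair<int, int>>>& adj) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(n, INF);
    using State = std::pair<long long, int>;                 // (distance, node)
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    dist[s] = 0;
    pq.push({0, s});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d != dist[u]) continue;            // stale entry: already improved
        for (auto [v, w] : adj[u])
            if (d + w < dist[v]) {             // relax edge (u, v)
                dist[v] = d + w;
                pq.push({dist[v], v});         // re-insert instead of ChangeDown
            }
    }
    return dist;
}
```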

Page 34: Applications of datastructures


Analysis of Running Time
At most n nodes are deleted: O(n log n).
Let m be the number of edges. Each edge is relaxed at most once: O(m log n).
Total running time: O((m + n) log n).
This is faster than using a basic array list unless the graph is very dense, in which case m is about O(n^2), which leads to a running time of O(n^2 log n).

Page 35: Applications of datastructures


Cumulative Sum on Intervals
Problem: We have a line that runs from x coordinate 1 to x coordinate N. At x coordinate X [X an integer between 1 and N], there is g(X) gold. Given an interval [a, b], how much gold is there between a and b?
How efficiently can this be done if we dynamically change the amount of gold and the interval [a, b] keeps changing?

Page 36: Applications of datastructures


Cumulative Sum Array
Let us define C(0) = 0 and C(x) = C(x-1) + g(x), where g(x) is the amount of gold at position x.
C(x) is then the total amount of gold from position 1 to position x.
The amount of gold in interval [a, b] is simply C(b) - C(a-1).
For any change in a or b, we can answer the new query in O(1) time.
However, if we change g(x), we will have to change C(x), C(x+1), C(x+2), …, C(N).
Any change in the gold results in an update that takes O(N) time.
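A small C++ illustration of this cumulative sum array (the function names are illustrative):

```cpp
#include <vector>

// Build C from a 1-indexed gold array g (g[0] unused): C[x] = C[x-1] + g[x].
std::vector<long long> buildC(const std::vector<long long>& g) {
    std::vector<long long> C(g.size(), 0);
    for (int x = 1; x < (int)g.size(); ++x)
        C[x] = C[x - 1] + g[x];
    return C;
}

// Gold in [a, b] is C[b] - C[a-1]: O(1) per query, but any change to
// some g(x) forces an O(N) rebuild of C(x), C(x+1), ..., C(N).
long long goldInInterval(const std::vector<long long>& C, int a, int b) {
    return C[b] - C[a - 1];
}
```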

Page 37: Applications of datastructures


Cumulative Sum Tree
We can use the binary representation of any number to come up with a cumulative sum tree.
For example, let us take 13 (decimal) = 1101 (binary).
The cumulative sum g(1) + g(2) + … + g(13) can be represented as the sum of:
g(1) + g(2) + … + g(8) [ 8 elements ]
g(9) + g(10) + … + g(12) [ 4 elements ]
g(13) [ 1 element ]
Notice that the number of elements in each case corresponds to a bit that is "1" in the binary representation of the number.

Page 38: Applications of datastructures


Cumulative Sum Tree

Another example: C(19). 19 (decimal) is 10011 (binary).
C(19) is the sum of the following:
g(1) + g(2) + … + g(16) [ 16 elements ]
g(17) + g(18) [ 2 elements ]
g(19) [ 1 element ]

Page 39: Applications of datastructures


Cumulative Sum Tree
Let us define C2(x) to be the sum g(x) + g(x-1) + … + g(p+1), where p is the number with the same binary representation as x except that the least significant set bit of x (the rightmost bit of x that is "1") is set to "0".
Examples of x and the corresponding p:
x = 6 [110], p = 4 [100]
x = 13 [1101], p = 12 [1100]
x = 16 [10000], p = 0 [00000]

Page 40: Applications of datastructures


Cumulative Sum Tree
If we want to find the cumulative sum C(x) = g(1) + g(2) + … + g(x), we can trace through the values of C2 using the binary representation of x.
Examples:
C(13) = C2(8) + C2(8+4) + C2(8+4+1)
C(16) = C2(16)
C(21) = C2(16) + C2(16+4) + C2(16+4+1)
C(99) = C2(64) + C2(64+32) + C2(64+32+2) + C2(64+32+2+1)
This allows us to find C(x) in O(log x) time.
Hence the amount of gold in interval [a, b] = C(b) - C(a-1) can be found in O(log N) time, so queries with new a and b can be answered in O(log N) time.
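This structure is commonly known as a Fenwick tree (binary indexed tree), and the C2 values can live in a plain array. A hedged C++ sketch of the prefix query, with illustrative names:

```cpp
#include <vector>

// c2[1..N] holds the C2 values; c2[x] covers the block ending at position x
// whose length is the least significant set bit of x.
// Returns C(x) = g(1) + ... + g(x) in O(log x) steps.
long long prefixSum(const std::vector<long long>& c2, int x) {
    long long sum = 0;
    for (; x > 0; x -= x & (-x))   // drop the least significant set bit
        sum += c2[x];
    return sum;
}
// Gold in [a, b] = prefixSum(c2, b) - prefixSum(c2, a - 1).
```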

Page 41: Applications of datastructures


Cumulative Sum Tree
What happens when we change g(x)?
If g(x) is changed, we only need to update those C2(y) whose ranges cover g(x).
We can go through all the necessary C2(y) in the following way:
1. While (x <= N)
   1. Update C2(x)
   2. Add the value of the least significant set bit of x to x
This runs in O(log N) time.
Hence updates to g can also be done in O(log N) time, which is a great improvement over the O(N) needed for an array.
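The corresponding point update, continuing the Fenwick-tree sketch above (here phrased as adding delta to g(x)):

```cpp
#include <vector>

// Add delta to g(x) and fix every C2 block that covers position x.
// Touches O(log N) entries of c2, exactly as described above.
void addToG(std::vector<long long>& c2, int N, int x, long long delta) {
    for (; x <= N; x += x & (-x))   // add the least significant set bit of x
        c2[x] += delta;
}
```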

Page 42: Applications of datastructures


Cumulative Sum Tree
Examples [binary representation in brackets]:
Change to g(5) [101]: update C2(5), C2(6), C2(8), C2(16), and all C2(power of 2 > 16)
Change to g(13) [1101]: update C2(13), C2(14), C2(16), and all C2(power of 2 > 16)
Change to g(35) [100011]: update C2(35), C2(36), C2(40), C2(48), C2(64), and all C2(power of 2 > 64)
We can implement a cumulative sum tree very simply: just use a linear array to store the values of C2.
Can we extend a cumulative sum tree to 2 or more dimensions? See IOI 2001 Day 1 Question 1.

Page 43: Applications of datastructures


Sum of Intervals Tree
Another way to solve the question is to use a "Sum of Intervals" binary tree.
Each node in the tree is represented by (L, R), and the value of (L, R) is g(L) + g(L+1) + … + g(R).
The root of the tree has L = 1 and R = N.
Every leaf has L = R.
Every non-leaf has children (L, [L+R]/2) [left child] and ([L+R]/2 + 1, R) [right child].
The number of nodes in the tree is O(2*N) [ why? ]
In an implementation, every node should have pointers to its children and its parent.

Page 44: Applications of datastructures


Sum of Intervals Tree
How to find C(x) = g(1) + g(2) + … + g(x)?
We trace from the root downwards:
1. L = 1, R = N, C = 0
2. While (L != R)
   1. M = (L + R) / 2
   2. If (M < x)  [x lies in the right half]
      1. C += value of the left child (L, M)
      2. Set (L, R) to the right child of the current node
   3. Else  [x lies in the left half]
      1. Set (L, R) to the left child of the current node
3. C += value at (L, R) [ = (x, x), since the loop ends with L = R = x ]
Time complexity: O(log N)
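A hedged C++ sketch of this descent; the slide suggests explicit parent/child pointers, but for brevity this sketch assumes the common implicit array layout where node i covers an interval and its children are nodes 2i and 2i+1 (tree[1] covers [1, N], and tree should be sized about 4*N):

```cpp
#include <vector>

// Implicit-array "sum of intervals" tree.
// Returns C(x) = g(1) + ... + g(x) by the descent described above.
long long prefixQuery(const std::vector<long long>& tree, int N, int x) {
    int L = 1, R = N, node = 1;
    long long C = 0;
    while (L != R) {
        int M = (L + R) / 2;
        if (M < x) {               // x lies in the right half
            C += tree[2 * node];   // take the whole left child (L, M)
            node = 2 * node + 1;   // descend into the right child (M+1, R)
            L = M + 1;
        } else {                   // x lies in the left half
            node = 2 * node;       // descend into the left child (L, M)
            R = M;
        }
    }
    return C + tree[node];         // the leaf (x, x)
}
```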

Page 45: Applications of datastructures


Sum of Intervals Tree
What happens when g(x) is changed?
Trace from (x, x) upwards to the root:
1. Let L = R = x
2. While (L, R) is not the root
   1. Update the value of (L, R)
   2. Set (L, R) to the parent of (L, R)
3. Update the root
Complexity of O(log N).
Hence all updates of the interval [a, b] and of g(x) can be done in O(log N) time.
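A matching sketch for the same implicit-array layout. The slide walks from the leaf up via parent pointers; this version descends recursively from the root and recomputes each interval sum on the way back, touching the same O(log N) nodes:

```cpp
#include <vector>

// Point update: set g(x) = newVal, then recompute each interval sum
// on the path back to the root.  O(log N).
void pointUpdate(std::vector<long long>& tree, int node, int L, int R,
                 int x, long long newVal) {
    if (L == R) { tree[node] = newVal; return; }        // the leaf (x, x)
    int M = (L + R) / 2;
    if (x <= M) pointUpdate(tree, 2 * node, L, M, x, newVal);
    else        pointUpdate(tree, 2 * node + 1, M + 1, R, x, newVal);
    tree[node] = tree[2 * node] + tree[2 * node + 1];   // parent = sum of children
}
// Typical call: pointUpdate(tree, 1, 1, N, x, newVal);
```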

Page 46: Applications of datastructures


Augmenting Data Structures
It is often useful to change a data structure in some way, by adding additional data to each node or by changing what each node represents.
This allows us to use the same data structure to solve other problems.
For example, we can use such "interval trees" to solve not just cumulative sum problems: a node for the interval (L, R) can store other properties of the elements in (L, R), as long as they can be combined from the values stored at its children.

Page 47: Applications of datastructures


Other data structures

Balanced (and unbalanced) binary trees: Red-Black trees, 2-3-4 trees, Splay trees
Suffix Trees
Fibonacci Heaps