cse 326: data structures part 10 advanced data structures

CSE 326: Data Structures, Part 10

Advanced Data Structures

Henry Kautz

Autumn Quarter 2002

Outline

• Multidimensional search trees
  – Range Queries
  – k-D Trees
  – Quad Trees
• Randomized Data Structures & Algorithms
  – Treaps
  – Primality testing
  – Local search for NP-complete problems

Multi-D Search ADT

• Dictionary operations
  – create
  – destroy
  – find
  – insert
  – delete
  – range queries
• Each item has k keys for a k-dimensional search tree
• Searches can be performed on one, some, or all of the keys, or on ranges of the keys

[Figure: a 2-D tree whose nodes hold the key pairs (5,2), (8,4), (2,5), (5,7), (8,2), (1,9), (4,4), (9,1), (3,6), (4,2)]

Applications of Multi-D Search

• Astronomy (simulation of galaxies) - 3 dimensions
• Protein folding in molecular biology - 3 dimensions
• Lossy data compression - 4 to 64 dimensions
• Image processing - 2 dimensions
• Graphics - 2 or 3 dimensions
• Animation - 3 to 4 dimensions
• Geographical databases - 2 or 3 dimensions
• Web searching - 200 or more dimensions

Range Query

A range query is a search in a dictionary in which the exact key may not be entirely specified.

Range queries are the primary interface with multi-D data structures.

Range Query Examples: Two Dimensions

• Search for items based on just one key

• Search for items based on ranges for all keys

• Search for items based on a function of several keys: e.g., a circular range query

Range Querying in 1-D

Find everything in the rectangle…

[Figure: points on the x axis]

Range Querying in 1-D with a BST

Find everything in the rectangle…

[Figure: points on the x axis]

1-D Range Querying in 2-D

[Figure: points in the x–y plane]

2-D Range Querying in 2-D

[Figure: points in the x–y plane]

k-D Trees

• Split on the next dimension at each succeeding level
• If building in batch, choose the median along the current dimension at each level
  – guarantees logarithmic height and a balanced tree
• In general, add as in a BST

k-D tree node: keys, value, dimension (the dimension that this node splits on), left, right
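The two construction strategies above can be sketched in Python as follows. This is an illustrative sketch, not course code: points are assumed to be tuples of comparable keys, and the names `KDNode`, `build_batch`, `insert`, and `height` are hypothetical. Ties along the split dimension go to the right subtree, matching the `find` pseudocode.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    keys: Point                        # the k keys of this item
    dim: int                           # the dimension this node splits on
    left: Optional["KDNode"] = None    # items with keys[dim] <  self.keys[dim]
    right: Optional["KDNode"] = None   # items with keys[dim] >= self.keys[dim]


def build_batch(points: Sequence[Point], dim: int = 0) -> Optional[KDNode]:
    """Batch build: split on the median along the current dimension."""
    if not points:
        return None
    k = len(points[0])
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2
    # Step left past equal keys so the left subtree is strictly smaller on dim.
    while mid > 0 and pts[mid - 1][dim] == pts[mid][dim]:
        mid -= 1
    node = KDNode(pts[mid], dim)
    node.left = build_batch(pts[:mid], (dim + 1) % k)
    node.right = build_batch(pts[mid + 1:], (dim + 1) % k)
    return node


def insert(root: Optional[KDNode], keys: Point, dim: int = 0) -> KDNode:
    """Incremental insert, as in a BST, cycling through dimensions by level."""
    if root is None:
        return KDNode(keys, dim)
    nxt = (root.dim + 1) % len(keys)
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, nxt)
    else:
        root.right = insert(root.right, keys, nxt)
    return root


def height(root: Optional[KDNode]) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))
```

Inserting the six points from the "k-D Trees Can Be Inefficient" slide one at a time yields a chain of height 6, while `build_batch` on the same points yields height 3.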

Find in a k-D Tree

find(<x1, x2, …, xk>, root) finds the node that has the given set of keys in it, or returns null if there is no such node.

Node find(keyVector keys, Node root) {
    if (root == NULL)
        return NULL;
    int dim = root.dimension;   // read only after the null check
    if (root.keys == keys)
        return root;
    else if (keys[dim] < root.keys[dim])
        return find(keys, root.left);
    else
        return find(keys, root.right);
}

runtime: O(depth)
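A runnable Python version of `find`, which checks for an empty subtree before reading its split dimension. The `KDNode` class and the `insert` helper are illustrative assumptions (not course code), included only so the example is self-contained; the test tree holds the key pairs from the Find Example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[int, ...]


@dataclass
class KDNode:
    keys: Point
    dim: int                           # the dimension this node splits on
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def insert(root: Optional[KDNode], keys: Point, dim: int = 0) -> KDNode:
    """Hypothetical helper: BST-style insert, cycling through dimensions."""
    if root is None:
        return KDNode(keys, dim)
    nxt = (root.dim + 1) % len(keys)
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, nxt)
    else:
        root.right = insert(root.right, keys, nxt)
    return root


def find(root: Optional[KDNode], keys: Point) -> Optional[KDNode]:
    """Return the node holding exactly `keys`, or None; O(depth) time."""
    node = root
    while node is not None:            # null check comes before reading node.dim
        if node.keys == keys:
            return node
        d = node.dim
        node = node.left if keys[d] < node.keys[d] else node.right
    return None
```

The loop replaces the tail recursion of the pseudocode; each step discards one subtree, so the cost is proportional to the depth reached.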

Find Example

find(<3,6>)
find(<0,10>)

[Figure: a 2-D tree whose nodes hold the key pairs (5,2), (8,4), (2,5), (5,7), (8,2), (1,9), (4,4), (9,1), (3,6), (4,2)]

Building a 2-D Tree (1/4 through 4/4)

[Figure, four animation steps: a 2-D tree built up over points in the x–y plane]

k-D Tree

[Figure: points a–m in the plane, the partition induced by a k-D tree, and the corresponding tree]

2-D Range Querying in 2-D Trees

Search every partition that intersects the rectangle. Check whether each node (including leaves) falls into the range.

[Figure: a query rectangle over a 2-D tree partition of the x–y plane]

Range Query in a 2-D Tree

runtime: O(N)

void print_range(int xlow, int xhigh, int ylow, int yhigh, Node root) {
    if (root == NULL) return;
    // Report this node if it lies inside the query rectangle
    if (xlow <= root.x && root.x <= xhigh &&
        ylow <= root.y && root.y <= yhigh)
        print(root);
    // Left subtree holds smaller values along root.dim: search it only
    // if the rectangle reaches down to root's split value
    if ((root.dim == "x" && xlow <= root.x) ||
        (root.dim == "y" && ylow <= root.y))
        print_range(xlow, xhigh, ylow, yhigh, root.left);
    // Right subtree holds values >= root's split value
    if ((root.dim == "x" && root.x <= xhigh) ||
        (root.dim == "y" && root.y <= yhigh))
        print_range(xlow, xhigh, ylow, yhigh, root.right);
}

Range Query in a k-D Tree

runtime: O(N)

void print_range(int low[MAXD], int high[MAXD], Node root) {
    if (root == NULL) return;
    bool inrange = true;
    for (int i = 0; i < MAXD; i++) {
        if (root.coord[i] < low[i]) inrange = false;
        if (high[i] < root.coord[i]) inrange = false;
    }
    if (inrange) print(root);
    int d = root.dim;
    if (low[d] <= root.coord[d])
        print_range(low, high, root.left);
    if (root.coord[d] <= high[d])
        print_range(low, high, root.right);
}
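The k-D range query above, transcribed into Python as a sketch that collects the matches in a list instead of printing them. The `KDNode` class and `insert` helper are assumptions made only to keep the example self-contained; as in the pseudocode, a subtree is skipped only when the query box lies entirely on the other side of the split value.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, ...]


@dataclass
class KDNode:
    keys: Point
    dim: int
    left: Optional["KDNode"] = None    # keys[dim] <  self.keys[dim]
    right: Optional["KDNode"] = None   # keys[dim] >= self.keys[dim]


def insert(root: Optional[KDNode], keys: Point, dim: int = 0) -> KDNode:
    """Hypothetical helper: BST-style insert, cycling through dimensions."""
    if root is None:
        return KDNode(keys, dim)
    nxt = (root.dim + 1) % len(keys)
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, nxt)
    else:
        root.right = insert(root.right, keys, nxt)
    return root


def range_query(root: Optional[KDNode], low: Sequence[int], high: Sequence[int],
                out: Optional[List[Point]] = None) -> List[Point]:
    """Collect every point p with low[i] <= p[i] <= high[i] for all i."""
    if out is None:
        out = []
    if root is None:
        return out
    if all(lo <= c <= hi for lo, c, hi in zip(low, root.keys, high)):
        out.append(root.keys)
    d = root.dim
    if low[d] <= root.keys[d]:         # box reaches the smaller side of the split
        range_query(root.left, low, high, out)
    if root.keys[d] <= high[d]:        # box reaches the larger-or-equal side
        range_query(root.right, low, high, out)
    return out
```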

Other Shapes for Range Querying

Search every partition that intersects the shape (circle). Check whether each node (including leaves) falls into the shape.

[Figure: a circular query region over a partition of the x–y plane]

k-D Trees Can Be Inefficient (but not when built in batch!)

insert(<5,0>)
insert(<6,9>)
insert(<9,3>)
insert(<6,5>)
insert(<7,7>)
insert(<8,6>)

[Figure: the resulting 2-D tree is a single chain: (5,0), (6,9), (9,3), (6,5), (7,7), (8,6)]

suck factor: O(n)

Quad Trees

• Split on all (two) dimensions at each level
• Split key space into equal size partitions (quadrants)
• Add a new node by adding to a leaf, and, if the leaf is already occupied, split until only one node per leaf

Quad tree node: Center (x, y), keys, value, and one child per quadrant

Quadrants:
0,1   1,1
0,0   1,0

Find in a Quad Tree

find(<x, y>, root) finds the node that has the given pair of keys in it, or returns the leaf (quadrant) where the point would be if there is no such node.

Node find(Key x, Key y, Node root) {
    if (root == NULL)
        return NULL;        // Empty tree
    if (root.isLeaf())
        return root;        // Key may not actually be here
    int quad = getQuadrant(x, y, root);
    return find(x, y, root.quadrants[quad]);
}

runtime: O(depth)

Compares against center; always makes the same choice on ties.
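A minimal Python sketch of the center-split quad tree described above (insert into a leaf, split while a leaf would hold two points; find descends to a leaf). Assumptions not in the slides: the tree covers a fixed square given by its center and half-width, points outside it are not handled, and ties at the center always go right/up. All names (`QuadNode`, `find`, `insert`, `depth`) are hypothetical.

```python
from typing import List, Optional, Tuple


class QuadNode:
    """One square region: a leaf holds at most one point; an internal node has 4 children."""

    def __init__(self, cx: float, cy: float, half: float):
        self.cx, self.cy, self.half = cx, cy, half   # center and half-width
        self.point: Optional[Tuple[float, float]] = None
        self.children: Optional[List["QuadNode"]] = None

    def is_leaf(self) -> bool:
        return self.children is None

    def quadrant(self, x: float, y: float) -> int:
        # Compare against the center; ties always go to the right/upper side.
        # Index = xbit + 2*ybit, i.e. quadrants (0,0), (1,0), (0,1), (1,1).
        return (1 if x >= self.cx else 0) + (2 if y >= self.cy else 0)

    def split(self) -> None:
        h = self.half / 2
        self.children = [QuadNode(self.cx + (h if i & 1 else -h),
                                  self.cy + (h if i & 2 else -h), h)
                         for i in range(4)]


def find(node: QuadNode, x: float, y: float) -> QuadNode:
    """Return the leaf whose square contains (x, y); the point may not be there."""
    while not node.is_leaf():
        node = node.children[node.quadrant(x, y)]
    return node


def insert(root: QuadNode, x: float, y: float) -> None:
    """Add to a leaf; if it is occupied, split until each leaf holds one point."""
    leaf = find(root, x, y)
    if leaf.point is None or leaf.point == (x, y):
        leaf.point = (x, y)
        return
    old, leaf.point = leaf.point, None
    leaf.split()
    insert(leaf, *old)       # re-insert the displaced point below the new split
    insert(leaf, x, y)       # may split again if both land in the same quadrant


def depth(node: QuadNode) -> int:
    return 0 if node.is_leaf() else 1 + max(depth(c) for c in node.children)
```

As on the slide, `find` returns a leaf even when the key is absent, so a caller compares `leaf.point` with the query.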

Find Example

[Figure: a quad tree over points a–g, shown as a partition of the plane and as a tree]

find(<10,2>) (i.e., c)
find(<5,6>) (i.e., d)

Building a Quad Tree (1/5 through 5/5)

[Figure, five animation steps: a quad tree built up over points in the x–y plane]

Quad Tree Example

[Figure: a quad tree over points a–g, shown as a partition of the plane and as a tree]

Quad Trees Can Suck

[Figure: two very close points a and b]

suck factor: O(log (1/minimum distance between nodes))
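To see where the O(log(1/minimum distance)) factor comes from, the sketch below counts how many nested quadrant splits it takes before two points fall into different quadrants; a center-split quad tree holding both points needs at least that many levels, no matter how few points it stores. It assumes the unit square with ties going right/up, and `split_levels` is a hypothetical helper.

```python
from typing import Tuple


def split_levels(p: Tuple[float, float], q: Tuple[float, float],
                 cx: float = 0.5, cy: float = 0.5, half: float = 0.5) -> int:
    """Count quadrant splits until p and q land in different quadrants.

    The square has center (cx, cy) and half-width `half` (default: unit square).
    Assumes p != q, both inside the square; ties go to the right/upper side.
    """
    levels = 1
    while True:
        qp = (p[0] >= cx, p[1] >= cy)
        qq = (q[0] >= cx, q[1] >= cy)
        if qp != qq:
            return levels
        half /= 2                      # descend into the shared quadrant
        cx += half if qp[0] else -half
        cy += half if qp[1] else -half
        levels += 1
```

For two points at distance 2^-10 next to a split line the answer is 10, and at distance 2^-20 it is 20: each halving of the distance costs one more level.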

2-D Range Querying in Quad Trees

[Figure: a query rectangle over a quad tree partition of the x–y plane]

2-D Range Query in a Quad Tree

void print_range(int xlow, int xhigh, int ylow, int yhigh, Node root) {
    if (root == NULL) return;
    // Report this node if it lies inside the query rectangle
    if (xlow <= root.x && root.x <= xhigh &&
        ylow <= root.y && root.y <= yhigh)
        print(root);
    // Visit each quadrant only if the rectangle can reach it
    if (xlow <= root.x && ylow <= root.y)
        print_range(xlow, xhigh, ylow, yhigh, root.lower_left);
    if (xlow <= root.x && root.y <= yhigh)
        print_range(xlow, xhigh, ylow, yhigh, root.upper_left);
    if (root.x <= xhigh && ylow <= root.y)
        print_range(xlow, xhigh, ylow, yhigh, root.lower_right);
    if (root.x <= xhigh && root.y <= yhigh)
        print_range(xlow, xhigh, ylow, yhigh, root.upper_right);
}

runtime: O(N)
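The code above splits each node at its own (x, y) rather than at a region center, i.e. it treats the structure as a point quad tree. The Python sketch below follows that reading. The `PQNode` class and the `insert` helper (ties go right/up) are assumptions added so the example runs; only `range_query` mirrors the slide.

```python
from typing import List, Optional, Tuple


class PQNode:
    """Point quad tree node: stores one point and splits the plane at it."""

    def __init__(self, x: int, y: int):
        self.x, self.y = x, y
        self.lower_left: Optional["PQNode"] = None
        self.upper_left: Optional["PQNode"] = None
        self.lower_right: Optional["PQNode"] = None
        self.upper_right: Optional["PQNode"] = None


def insert(root: Optional[PQNode], x: int, y: int) -> PQNode:
    """Hypothetical helper: descend by comparing with each node's point."""
    if root is None:
        return PQNode(x, y)
    left, lower = x < root.x, y < root.y
    if left and lower:
        root.lower_left = insert(root.lower_left, x, y)
    elif left:
        root.upper_left = insert(root.upper_left, x, y)
    elif lower:
        root.lower_right = insert(root.lower_right, x, y)
    else:
        root.upper_right = insert(root.upper_right, x, y)
    return root


def range_query(root: Optional[PQNode], xlow: int, xhigh: int, ylow: int, yhigh: int,
                out: Optional[List[Tuple[int, int]]] = None) -> List[Tuple[int, int]]:
    """Collect every stored point inside [xlow, xhigh] x [ylow, yhigh]."""
    if out is None:
        out = []
    if root is None:
        return out
    if xlow <= root.x <= xhigh and ylow <= root.y <= yhigh:
        out.append((root.x, root.y))
    # Visit a quadrant only if the query rectangle can reach it.
    if xlow <= root.x and ylow <= root.y:
        range_query(root.lower_left, xlow, xhigh, ylow, yhigh, out)
    if xlow <= root.x and root.y <= yhigh:
        range_query(root.upper_left, xlow, xhigh, ylow, yhigh, out)
    if root.x <= xhigh and ylow <= root.y:
        range_query(root.lower_right, xlow, xhigh, ylow, yhigh, out)
    if root.x <= xhigh and root.y <= yhigh:
        range_query(root.upper_right, xlow, xhigh, ylow, yhigh, out)
    return out
```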

Delete Example

[Figure: a quad tree over points a–g, shown as a partition of the plane and as a tree]

delete(<10,2>) (i.e., c)

• Find and delete the node.
• If its parent is left with just one child, delete the parent and move that child up.
• Propagate the collapse upward!

Nearest Neighbor Search

getNearestNeighbor(<1,4>)

[Figure: a quad tree over points a–g, shown as a partition of the plane and as a tree]

• Find a nearby node (do a find).
• Do a circular range query.
• As you get results, tighten the circle.
• Continue until no closer node is in the query circle.

Works on k-D Trees, too!
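Here is a sketch of that search on a k-D tree, in Python. It descends toward the query point first (the "find"), keeps the best distance seen so far as the radius of the circle, and only explores the far side of a split when the circle still crosses the split line. `KDNode`, `insert`, and `nearest` are hypothetical names, and distances are Euclidean via `math.dist` (Python 3.8+).

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    keys: Point
    dim: int
    left: Optional["KDNode"] = None    # keys[dim] <  self.keys[dim]
    right: Optional["KDNode"] = None   # keys[dim] >= self.keys[dim]


def insert(root: Optional[KDNode], keys: Point, dim: int = 0) -> KDNode:
    """Hypothetical helper: BST-style insert, cycling through dimensions."""
    if root is None:
        return KDNode(keys, dim)
    nxt = (root.dim + 1) % len(keys)
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, nxt)
    else:
        root.right = insert(root.right, keys, nxt)
    return root


def nearest(root: Optional[KDNode], q: Point) -> Optional[Point]:
    """Return a stored point closest to q (None for an empty tree)."""
    best: list = [None, math.inf]              # [best point, its distance]

    def visit(node: Optional[KDNode]) -> None:
        if node is None:
            return
        d = math.dist(node.keys, q)
        if d < best[1]:
            best[0], best[1] = node.keys, d    # tighten the circle
        diff = q[node.dim] - node.keys[node.dim]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)                            # go toward q first, like a find
        if abs(diff) <= best[1]:               # circle still crosses the split line
            visit(far)

    visit(root)
    return best[0]
```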

Quad Trees vs. k-D Trees

• k-D Trees
  – Density balanced trees
  – Number of nodes is O(n) where n is the number of points
  – Height of the tree is O(log n) with batch insertion
  – Supports insert, find, nearest neighbor, range queries
• Quad Trees
  – Number of nodes is O(n(1 + log(Δ/n))) where n is the number of points and Δ is the ratio of the width (or height) of the key space to the smallest distance between two points
  – Height of the tree is O(log n + log Δ)
  – Supports insert, delete, find, nearest neighbor, range queries

To Do

• Read (a little) about k-D trees in Weiss 12.6

CSE 326: Data Structures, Part 10, continued

Data Structures

Henry Kautz

Autumn Quarter 2002

Pick a Card

Warning! The Queen of Spades is a very unlucky card!

Randomized Data Structures

• We’ve seen many data structures with good average case performance on random inputs, but bad behavior on particular inputs
  – Binary Search Trees
• Instead of randomizing the input (since we cannot!), consider randomizing the data structure
  – No bad inputs, just unlucky random numbers
  – Expected case good behavior on any input

What’s the Difference?

• Deterministic with good average time
  – If your application happens to always use the “bad” case, you are in big trouble!
• Randomized with good expected time
  – Once in a while you will have an expensive operation, but no inputs can make this happen all the time
• Kind of like an insurance policy for your algorithm!

Treap Dictionary Data Structure

• Treaps have the binary search tree
  – binary tree property
  – search tree property
• Treaps also have the heap-order property!
  – randomly assigned priorities

[Figure: a treap with (priority, key) nodes (2,9), (6,7), (7,8), (4,18), (9,15), (10,30), (15,12); heap order on priorities in yellow, search tree order on keys in blue. Legend: priority, key]

48

Treap Insert• Choose a random priority• Insert as in normal BST• Rotate up until heap order is restored (maintaining

BST property while rotating)

67

insert(15)

78

29

1412

67

78

29

1412

915

67

78

29

915

1412
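A minimal Python sketch of treap insertion. It assumes, as the figures suggest, that smaller priorities sit nearer the root (a min-heap on priority); random priorities come from `rng.random()` unless one is passed in, which lets the test replay the slide's example. `TreapNode`, `rotate_left`, `rotate_right`, `insert`, `height`, and `keys_in_order` are illustrative names, not course code.

```python
import random
from typing import List, Optional


class TreapNode:
    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.left: Optional["TreapNode"] = None
        self.right: Optional["TreapNode"] = None


def rotate_right(node: TreapNode) -> TreapNode:
    """Lift node.left above node; in-order (BST) order is unchanged."""
    child = node.left
    node.left, child.right = child.right, node
    return child


def rotate_left(node: TreapNode) -> TreapNode:
    """Lift node.right above node; in-order (BST) order is unchanged."""
    child = node.right
    node.right, child.left = child.left, node
    return child


def insert(root: Optional[TreapNode], key, priority=None, rng=random) -> TreapNode:
    """BST insert, then rotate the new node up while its priority is smaller
    than its parent's (smaller priority = nearer the root)."""
    if priority is None:
        priority = rng.random()
    if root is None:
        return TreapNode(key, priority)
    if key < root.key:
        root.left = insert(root.left, key, priority, rng)
        if root.left.priority < root.priority:
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key, priority, rng)
        if root.right.priority < root.priority:
            root = rotate_left(root)
    return root


def height(node: Optional[TreapNode]) -> int:
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def keys_in_order(node: Optional[TreapNode]) -> List:
    if node is None:
        return []
    return keys_in_order(node.left) + [node.key] + keys_in_order(node.right)
```

Feeding keys 0..499 in sorted order (random priorities) still gives a shallow tree, whereas a plain BST would degenerate into a chain of height 500.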

Tree + Heap… Why Bother?

Insert data in sorted order into a treap; what shape tree comes out?

[Figure: insert(7), insert(8), insert(9), insert(12) with priorities 6, 7, 2, 15; the treap ends up rooted at 9 (priority 2) instead of forming a chain. Legend: priority, key]

Treap Delete

• Find the key
• Increase its priority to ∞
• Rotate it to the fringe
• Snip it off

[Figure: delete(9) on the treap with (priority, key) nodes (2,9), (6,7), (7,8), (9,15), (15,12): once 9's priority is ∞, rotations lift its lower-priority child above it, first 7, then 8]

Treap Delete, cont.

[Figure: further rotations lift 15 and then 12 above 9; once 9 is a leaf, snip! it off]
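The delete steps can be sketched in Python as below, assuming distinct keys and the same min-heap convention (smaller priority nearer the root). Setting the priority to infinity is what forces the node downward: at each step the child with the smaller priority is rotated above it, until the node is a leaf and can be snipped. The class and function names are hypothetical.

```python
import math
from typing import List, Optional


class TreapNode:
    def __init__(self, key, priority, left=None, right=None):
        self.key, self.priority = key, priority
        self.left, self.right = left, right


def rotate_right(node: TreapNode) -> TreapNode:
    """Lift node.left above node; in-order (BST) order is unchanged."""
    child = node.left
    node.left, child.right = child.right, node
    return child


def rotate_left(node: TreapNode) -> TreapNode:
    """Lift node.right above node; in-order (BST) order is unchanged."""
    child = node.right
    node.right, child.left = child.left, node
    return child


def delete(root: Optional[TreapNode], key) -> Optional[TreapNode]:
    """Find key, raise its priority to infinity, rotate it to the fringe, snip it.
    Deleting a missing key leaves the treap unchanged."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        root.priority = math.inf
        if root.left is None and root.right is None:
            return None                              # at the fringe: snip it off
        # Lift the child with the smaller priority so heap order is kept.
        if root.right is None or (root.left is not None
                                  and root.left.priority < root.right.priority):
            root = rotate_right(root)                # doomed node is now root.right
            root.right = delete(root.right, key)
        else:
            root = rotate_left(root)                 # doomed node is now root.left
            root.left = delete(root.left, key)
    return root


def keys_in_order(node: Optional[TreapNode]) -> List:
    if node is None:
        return []
    return keys_in_order(node.left) + [node.key] + keys_in_order(node.right)
```

On the treap from the slide, deleting 9 lifts 7, then 8, then 15, then 12, leaving 7 at the root.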

Treap Summary

• Implements Dictionary ADT
  – insert in expected O(log n) time
  – delete in expected O(log n) time
  – find in expected O(log n) time
  – but worst case O(n)
• Memory use
  – O(1) per node
  – about the cost of AVL trees
• Very simple to implement, little overhead – less than AVL trees

Other Randomized Data Structures & Algorithms

• Randomized skip list
  – cross between a linked list and a binary search tree
  – O(log n) expected time for finds, and then can simply follow links to do range queries
• Randomized QuickSort
  – just choose pivot position randomly
  – expected O(n log n) time for any input
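A short Python sketch of randomized QuickSort: the pivot position is drawn uniformly at random, so no fixed input ordering is consistently bad. One design choice here is ours rather than the slide's: a three-way partition (less / equal / greater), which keeps inputs with many repeated keys from degrading, plus recursion on the smaller side only, which bounds the stack depth.

```python
import random
from typing import List


def quicksort(a: List, rng=random) -> List:
    """Sort list `a` in place with randomized pivots; expected O(n log n) on any input."""

    def partition(lo: int, hi: int):
        pivot = a[rng.randint(lo, hi)]       # pivot position chosen at random
        lt, i, gt = lo, lo, hi
        # Invariant: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt+1:hi+1] > pivot
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        return lt, gt                        # a[lt..gt] all equal the pivot

    def sort(lo: int, hi: int) -> None:
        while lo < hi:
            lt, gt = partition(lo, hi)
            # Recurse on the smaller side and loop on the larger: O(log n) stack depth.
            if lt - lo < hi - gt:
                sort(lo, lt - 1)
                lo = gt + 1
            else:
                sort(gt + 1, hi)
                hi = lt - 1

    sort(0, len(a) - 1)
    return a
```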

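Randomized Primality Testing

• For a long time, no deterministic polynomial time algorithm for primality testing was known
  – but it does not appear to be NP-complete either – in between?
  – (The AKS algorithm, announced in 2002, showed that primality testing is in P, but randomized tests remain much faster in practice.)
• Best known practical algorithm:
  1. Guess a random number 0 < A < N
  2. If A^(N−1) mod N ≠ 1, then N is not prime
  3. Otherwise, N is probably prime: if N is composite and not a “Carmichael number”, a random A exposes it with probability at least 1/2. Carmichael numbers pass step 2 for every A coprime to N; a slightly more complex test (Miller–Rabin) rules out this case and exposes any odd composite with probability at least 3/4.
  4. Repeat to increase confidence in the answer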
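A Python sketch of both tests. `fermat_check` is steps 1 and 2 for a single base; `is_probably_prime` is the Miller–Rabin refinement, which also catches Carmichael numbers such as 561 = 3 · 11 · 17. Both rely on Python's built-in three-argument `pow` for modular exponentiation; the function names and the default of 20 trials are our own choices.

```python
import random


def fermat_check(n: int, a: int) -> bool:
    """Step 2 on the slide: True if a^(n-1) mod n == 1, i.e. n passes for base a."""
    return pow(a, n - 1, n) == 1


def is_probably_prime(n: int, trials: int = 20, rng=random) -> bool:
    """Miller-Rabin test: an odd composite n survives each random trial with
    probability at most 1/4, so a False answer is always correct."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    d, s = n - 1, 0
    while d % 2 == 0:                  # write n - 1 = d * 2^s with d odd
        d //= 2
        s += 1
    for _ in range(trials):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False               # a is a witness: n is certainly composite
    return True                        # no witness found: n is prime with high probability
```

With 20 trials, a composite slips through with probability at most 4^-20.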

Randomized Search Algorithms

• Finding a goal node in very, very large graphs using DFS, BFS, and even A* (using known heuristic functions) is often too slow
• Alternative: random walk through the graph

N-Queens Problem

• Place N queens on an N by N chessboard so that no two queens can attack each other
• Graph search formulation:
  – Each way of placing from 0 to N queens on the chessboard is a vertex
  – Edge between vertices that differ by adding or removing one queen
  – Start vertex: empty board
  – Goal vertex: any one with N non-attacking queens (there are many such goals)
• Demo
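A runnable stand-in for the demo, in Python. Note that this swaps in a different randomized local search, the min-conflicts heuristic, rather than the exact add/remove-a-queen walk described above: it keeps one queen per row, repeatedly picks a random attacked queen, and moves it to a column with the fewest conflicts (ties broken at random), restarting from a fresh random board after `max_steps`. Names and the step limit are assumptions; a solution must exist (N = 1 or N ≥ 4) or the search never stops.

```python
import random
from typing import List


def conflicts(cols: List[int], row: int, col: int) -> int:
    """How many queens in other rows attack square (row, col)."""
    return sum(1 for r, c in enumerate(cols)
               if r != row and (c == col or abs(c - col) == abs(r - row)))


def is_solution(cols: List[int]) -> bool:
    return all(conflicts(cols, r, c) == 0 for r, c in enumerate(cols))


def min_conflicts_queens(n: int, max_steps: int = 10_000, rng=random) -> List[int]:
    """Randomized local search for N-queens; cols[r] is the column of row r's queen."""
    while True:                                        # random restart
        cols = [rng.randrange(n) for _ in range(n)]
        for _ in range(max_steps):
            bad = [r for r in range(n) if conflicts(cols, r, cols[r]) > 0]
            if not bad:
                return cols
            row = rng.choice(bad)                      # a random attacked queen
            scores = [conflicts(cols, row, c) for c in range(n)]
            best = min(scores)
            cols[row] = rng.choice([c for c in range(n) if scores[c] == best])
```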

Random Walk – Complexity?

• Random walk – studied as an “absorbing Markov chain”; its relatives include “simulated annealing” and the “Metropolis algorithm” (Metropolis et al. 1953)
• Can often prove that if you run it long enough, it will reach a goal state – but it may take exponential time
• In some cases can prove that with high probability a goal is reached in polynomial time
  – e.g., 2-SAT, Papadimitriou 1997
• Widely used for real-world problems where actual complexity is unknown – scheduling, optimization
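For 2-SAT the random walk is short enough to write out. The Python sketch below starts from a random assignment and, while some clause is false, picks a false clause at random and flips one of its two variables at random; for a satisfiable formula with n variables the expected number of flips is O(n²). The clause encoding (+i for x_i, −i for ¬x_i), the function name, and the default cap of 100·n² flips are our assumptions; `None` means no satisfying assignment was found within the cap.

```python
import random
from typing import Dict, List, Optional, Tuple

Clause = Tuple[int, int]   # literal +i means x_i, -i means not x_i (variables 1..n)


def random_walk_2sat(n: int, clauses: List[Clause], rng=random,
                     max_flips: Optional[int] = None) -> Optional[Dict[int, bool]]:
    """Random walk for 2-SAT: while some clause is false, pick one at random
    and flip one of its two variables at random."""
    if max_flips is None:
        max_flips = 100 * n * n        # expected O(n^2) flips if satisfiable
    value = {v: rng.random() < 0.5 for v in range(1, n + 1)}

    def holds(lit: int) -> bool:
        return value[abs(lit)] == (lit > 0)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not (holds(c[0]) or holds(c[1]))]
        if not unsat:
            return value
        lit = rng.choice(rng.choice(unsat))
        value[abs(lit)] = not value[abs(lit)]
    return value if all(holds(a) or holds(b) for a, b in clauses) else None
```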

Traveling Salesman

Recall the Traveling Salesperson (TSP) Problem: Given a fully connected, weighted graph G = (V,E), is there a cycle that visits all vertices exactly once and has total cost ≤ K?
  – NP-complete: reduction from Hamiltonian circuit
• Occurs in many real-world transportation and design problems
• Randomized simulated annealing algorithm demo
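A small stand-in for the demo: simulated annealing for TSP on points in the plane, sketched in Python. The move set (2-opt, i.e. reversing a segment of the tour), the starting temperature, and the geometric cooling rate are our assumptions, not taken from the course demo. Worse tours are accepted with probability exp(−Δ/T), the Metropolis rule, and the best tour seen is returned.

```python
import math
import random
from typing import List, Sequence, Tuple

Pt = Tuple[float, float]


def tour_length(pts: Sequence[Pt], tour: List[int]) -> float:
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[i - 1]]) for i in range(len(tour)))


def anneal_tsp(pts: Sequence[Pt], steps: int = 20_000, t0: float = 1.0,
               cooling: float = 0.9995, rng=random) -> Tuple[List[int], float]:
    """Simulated annealing with 2-opt moves; returns (best tour, its length)."""
    n = len(pts)
    tour = list(range(n))
    rng.shuffle(tour)
    cur = tour_length(pts, tour)
    best, best_len = tour[:], cur
    t = t0
    for _ in range(steps):
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # reverse a segment
        cand_len = tour_length(pts, cand)
        delta = cand_len - cur
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            tour, cur = cand, cand_len
            if cur < best_len:
                best, best_len = tour[:], cur
        t *= cooling                  # temperature shrinks geometrically
    return best, best_len
```

Early on, the high temperature lets the search escape poor tours; as T shrinks it behaves like plain 2-opt improvement.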

Latin Squares

• Randomization can be combined with depth first search
• When a branch of the search terminates without finding a solution, the algorithm backs up to the last choice point: backtracking search
• Instead of choosing the branch to follow systematically, choose it randomly
  – If your random choices are unlucky, give up and start over again
• Demo
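A sketch of randomized backtracking with restarts, in Python, applied to building an n × n Latin square (each symbol exactly once per row and per column). Cells are filled in row-major order, candidate symbols are tried in random order, and once more than `cutoff` backtracks have happened the search gives up and restarts with fresh random choices. The function name and the cutoff value are assumptions for illustration.

```python
import random
from typing import List


def random_latin_square(n: int, cutoff: int = 1000, rng=random) -> List[List[int]]:
    """Randomized backtracking search with restarts for an n x n Latin square."""
    while True:                                   # each pass is one restart
        grid = [[-1] * n for _ in range(n)]
        backtracks = 0

        def fill(k: int) -> bool:
            nonlocal backtracks
            if k == n * n:
                return True
            r, c = divmod(k, n)
            used = {grid[r][j] for j in range(c)} | {grid[i][c] for i in range(r)}
            options = [s for s in range(n) if s not in used]
            rng.shuffle(options)                  # random choice of branch
            for s in options:
                grid[r][c] = s
                if fill(k + 1):
                    return True
                if backtracks > cutoff:           # unlucky: abandon this attempt
                    return False
            grid[r][c] = -1
            backtracks += 1
            return False

        if fill(0):
            return grid
```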

Final Review

(“We’ve covered way too much in this course… What do I really need to know?”)

Be Sure to Bring

• 1 page of notes

• A hand calculator

• Several #2 pencils

Final Review: What you need to know

• Basic Math
  – Logs, exponents, summation of series
  – Proof by induction
• Asymptotic Analysis
  – Big-oh, Theta and Omega
  – Know the definitions and how to show f(N) is big-O/Theta/Omega of (g(N))
  – How to estimate Running Time of code fragments
    • E.g. nested “for” loops
• Recurrence Relations
  – Deriving recurrence relation for run time of a recursive function
  – Solving recurrence relations by expansion to get run time

$\sum_{i=1}^{N} i = \frac{N(N+1)}{2}$

$\sum_{i=0}^{N} A^i = \frac{A^{N+1} - 1}{A - 1} \quad (A \neq 1)$

Final Review: What you need to know

• Lists, Stacks, Queues
  – Brush up on ADT operations – Insert/Delete, Push/Pop etc.
  – Array versus pointer implementations of each data structure
  – Amortized complexity of stretchy arrays
• Trees
  – Definitions/Terminology: root, parent, child, height, depth etc.
  – Relationship between depth and size of tree
    • Depth can be between O(log N) and O(N) for N nodes

Final Review: What you need to know

• Binary Search Trees
  – How to do Find, Insert, Delete
    • Bad worst case performance – could take up to O(N) time
  – AVL trees
    • Balance factor is +1, 0, -1
    • Know single and double rotations to keep tree balanced
    • All operations are O(log N) worst case time
  – Splay trees – good amortized performance
    • A single operation may take O(N) time but in a sequence of operations, average time per operation is O(log N)
    • Every Find, Insert, Delete causes accessed node to be moved to the root
    • Know how to zig-zig, zig-zag, etc. to “bubble” node to top

Final Review: What you need to know

• Priority Queues
  – Binary Heaps: Insert/DeleteMin, Percolate up/down
    • Array implementation
    • BuildHeap takes only O(N) time (used in heapsort)
  – Binomial Queues: Forest of binomial trees with heap order
    • Merge is fast – O(log N) time
    • Insert and DeleteMin based on Merge
• Hashing
  – Hash functions based on the mod function
  – Collision resolution strategies
    • Chaining, Linear and Quadratic probing, Double Hashing
  – Load factor of a hash table

Final Review: What you need to know

• Sorting Algorithms: Know run times and how they work
  – Elementary sorting algorithms and their run time
    • Selection sort
  – Heapsort – based on binary heaps (max-heaps)
    • BuildHeap and repeated DeleteMax’s
  – Mergesort – recursive divide-and-conquer, uses extra array
  – Quicksort – recursive divide-and-conquer, Partition in-place
    • fastest in practice, but O(N^2) worst case time
    • Pivot selection – median-of-three works best
  – Know which of these are stable and in-place
  – Lower bound on sorting, bucket sort, and radix sort

Final Review: What you need to know

• Disjoint Sets and Union-Find
  – Up-trees and their array-based implementation
  – Know how Union-by-size and Path compression work
  – No need to know run time analysis – just know the result:
    • Sequence of M operations with Union-by-size and P.C. is Θ(M α(M,N)) – just a little more than Θ(1) amortized time per op
• Graph Algorithms
  – Adjacency matrix versus adjacency list representation of graphs
  – Know how to Topological sort in O(|V| + |E|) time using a queue
  – Breadth First Search (BFS) for unweighted shortest path

Final Review: What you need to know

• Graph Algorithms (cont.)
  – Dijkstra’s shortest path algorithm
  – Depth First Search (DFS) and Iterated DFS
    • Use of memory compared to BFS
  – A* - relation of g(n) and h(n)
  – Minimum Spanning trees – Kruskal’s & Prim’s algorithms
  – Connected components using DFS or union/find
• NP-completeness
  – Euler versus Hamiltonian circuits
  – Definition of P, NP, NP-complete
  – How one problem can be “reduced” to another (e.g. input to HC can be transformed into input for TSP)

Final Review: What you need to know

• Multidimensional Search Trees
  – k-d Trees – find and range queries
    • Depth logarithmic in number of nodes
  – Quad trees – find and range queries
    • Depth logarithmic in inverse of minimal distance between nodes
    • But higher branching factor means shorter depth if points are well spread out (log base 4 instead of log base 2)
• Randomized Algorithms
  – expected time vs. average time vs. amortized time
  – Treaps, randomized Quicksort, primality testing