Data Structure & Algorithm 11 – Minimal Spanning Tree JJCAO Steal some from Prof. Yoram Moses & Princeton COS 226


Post on 25-Dec-2015


Data Structure & Algorithm

11 – Minimal Spanning Tree

JJCAO

Steal some from Prof. Yoram Moses & Princeton COS 226

2

Weighted Graphs

G = (V, E), wt: E → R

wt(G) = Σ_{e ∈ E} wt(e)
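As a minimal sketch, assuming wt(G) denotes the sum of the edge weights, a weighted graph can be held as a list of edge triples. The vertex names and weights below are hypothetical.

```python
# Hypothetical example graph: each edge is a (u, v, weight) triple.
edges = [("a", "b", 4.0), ("b", "c", 2.5), ("a", "c", 1.0)]

def graph_weight(edge_list):
    """Return wt(G), taken here to be the sum of wt(e) over all edges e."""
    return sum(w for _, _, w in edge_list)

total = graph_weight(edges)
```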

3

Sub-Graphs

Note: G' is not a spanning sub-graph of G

4

Minimum Spanning Tree

• A subgraph
• A tree
• Spans G
• Of minimal weight

5

MST Origin

Otakar Borůvka (1926).
• Electrical Power Company of Western Moravia in Brno.
• Most economical construction of electrical power network.
• Concrete engineering problem is now a cornerstone problem in combinatorial optimization.

6

MST describes arrangement of nuclei in the epithelium for cancer research

http://www.bccrc.ca/ci/ta01_archlevel.html

7

Normal Consistency

[Hoppe et al. 1992]
• Based on angles between unsigned normals
• May produce errors on close-by surface sheets

8

MST is a fundamental problem with diverse applications

• Network design.
  – telephone, electrical, hydraulic, TV cable, computer, road

• Approximation algorithms for NP-hard problems.
  – traveling salesperson problem, Steiner tree

• Indirect applications.
  – max bottleneck paths
  – LDPC codes for error correction
  – image registration with Renyi entropy
  – learning salient features for real-time face verification
  – reducing data storage in sequencing amino acids in a protein
  – model locality of particle interactions in turbulent fluid flows
  – autoconfig protocol for Ethernet bridging to avoid cycles in a network

• Cluster analysis.

9

Minimum Spanning Tree on the Surface of a Sphere (5000 Vertices)

10

Minimum Spanning Tree

Input: a connected, undirected graph G with a weight function wt on its edges

Goal: find a minimum-weight spanning tree for G

Fact: If all edge weights are distinct, the MST is unique

Brute force: Try all possible spanning trees
• problem 1: not so easy to implement
• problem 2: far too many of them

Ex: [Cayley, 1889]: V^{V-2} spanning trees on the complete graph on V vertices.
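To make the brute-force idea concrete, here is a small sketch that tries every set of V-1 edges, keeps those that form a spanning tree, and remembers the lightest. The function names and the K4 weights are assumptions for illustration. On the complete graph with 4 vertices it counts 4^{4-2} = 16 spanning trees, in line with Cayley's formula.

```python
from itertools import combinations

def total_weight(edge_subset):
    return sum(w for _, _, w in edge_subset)

def is_spanning_tree(n, edge_subset):
    """True if the n-1 given edges are acyclic (and therefore span all n vertices)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v, _ in edge_subset:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return True

def brute_force_mst(n, edges):
    """Try every (n-1)-edge subset; return (number of spanning trees, a lightest one)."""
    count, best = 0, None
    for subset in combinations(edges, n - 1):
        if is_spanning_tree(n, subset):
            count += 1
            if best is None or total_weight(subset) < total_weight(best):
                best = subset
    return count, best

# Complete graph on 4 vertices with hypothetical weights, edges as (u, v, weight).
k4 = [(0, 1, 1), (0, 2, 4), (0, 3, 3), (1, 2, 2), (1, 3, 5), (2, 3, 6)]
count, best = brute_force_mst(4, k4)
```

The number of subsets grows far too quickly for this to be usable beyond toy inputs, which is the point of "problem 2".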

11

Main algorithms of MST

1. Kruskal’s algorithm
2. Prim’s algorithm

Both O(E lg V) using ordinary binary heaps
Both greedy algorithms => Global solution

3. …

12

Two Greedy Algorithms

• Kruskal's algorithm. Consider edges in ascending order of cost. Add the next edge to T unless doing so would create a cycle.

• Prim's algorithm. Start with any vertex s and greedily grow a tree T from s. At each step, add the cheapest edge to T that has exactly one endpoint in T.

"Greed is good. Greed is right. Greed works. Greed clarifies, cuts through, and captures the essence of the evolutionary spirit."

- Gordon Gekko

13

Cycle Property
• Let T be a minimum spanning tree of a weighted graph G
• Let e be an edge of G that is not in T, and let C be the cycle formed by e with T
• For every edge f of C, weight(f) ≤ weight(e)

Proof:
• By contradiction
• If weight(f) > weight(e), we can get a spanning tree of smaller weight by replacing f with e
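The exchange step in the proof can be written out as a short calculation. The name T' for the modified tree is introduced here only for illustration. Swapping the tree edge f for the non-tree edge e keeps the graph connected and acyclic, because f lies on the only cycle of T plus e, so T' is again a spanning tree:

```latex
T' = \bigl(T \setminus \{f\}\bigr) \cup \{e\}, \qquad
\mathrm{weight}(T') = \mathrm{weight}(T) - \mathrm{weight}(f) + \mathrm{weight}(e)
< \mathrm{weight}(T) \quad \text{whenever } \mathrm{weight}(f) > \mathrm{weight}(e).
```

This contradicts the minimality of T, so no edge f of C can be heavier than e.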

14

Edges cross the cut

15

Cut (/Partition) Property
Lemma: Let G = (V, E) and X ⊂ V. If e is a lightest edge connecting X and V-X, then e appears in some MST of G.

Proof:
• Let T be an MST of G
• If T does not contain e, consider the cycle C formed by e with T and let f ≠ e be an edge of C across the partition
• By the cycle property, weight(f) ≤ weight(e)
• Since e is a lightest edge across the partition, weight(e) ≤ weight(f)
• Thus, weight(f) = weight(e)
• We obtain another MST by replacing f with e

locally optimal choice (of lightest edges) => globally optimal solution (MST)

16

Disjoint Set ADT

17

An application of disjoint-set data structures

18

Linked List Implementation

19

Union in Linked List Implementation

20

21

Worst-Case Example
• n: the number of MAKE-SET operations
• m: the total number of MAKE-SET, UNION, and FIND-SET operations
• we can easily construct a sequence of m operations on n objects that requires Θ(n^2) time

22

Weighted Union Heuristic

• Each set id includes the length of the list
• In Union - append shorter list at end of longer

Theorem: Performing m > n operations takes O(m + n lg n) time
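A rough sketch of the list-based scheme with the weighted-union heuristic follows. It uses Python lists in place of hand-built linked lists, and the class name `ListDisjointSet` is hypothetical. Each element points at the list that contains it, and the head of that list serves as the representative.

```python
class ListDisjointSet:
    """Each set is a Python list; rep[x] is the list object that contains x."""

    def __init__(self):
        self.rep = {}

    def make_set(self, x):
        self.rep[x] = [x]

    def find_set(self, x):
        return self.rep[x][0]  # the head of the list is the representative

    def union(self, x, y):
        a, b = self.rep[x], self.rep[y]
        if a is b:
            return
        if len(a) < len(b):
            a, b = b, a        # a is now the longer list
        for item in b:         # only members of the shorter list are re-pointed
            self.rep[item] = a
        a.extend(b)            # append the shorter list at the end of the longer
```

An element is re-pointed only when its list at least doubles in length, so each element moves at most lg n times, which is where the n lg n term in the theorem comes from.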

23

Simple Forest Implementation

Find-Set(x) - follow pointers from x up to root

Union(c,f) - make c a child of f and return f
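The two operations above might look like this in code. It is a minimal sketch in which the module-level `parent` dictionary and the function names are assumptions, and `union` expects both arguments to be roots.

```python
parent = {}

def make_set(x):
    parent[x] = x  # a root points to itself

def find_set(x):
    """Follow parent pointers from x up to the root."""
    while parent[x] != x:
        x = parent[x]
    return x

def union(c, f):
    """Make root c a child of root f and return f."""
    parent[c] = f
    return f
```

With no balancing rule, a sequence such as union(1, 2), union(2, 3), and so on builds a single long chain, so Find-Set can take time linear in the number of elements.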

24

Worst-Case Example

[Figure: nodes labeled n, 3, 2, 1]

25

Weighted Union Heuristic

• Each node includes a weight field
  weight = # elements in sub-tree rooted at node

• Find-Set(x) - as before, O(depth(x))

• Union(x,y) - always attach smaller tree below the root of larger tree, O(1)

26

Weighted Union

Theorem: Any k-node tree created using the weighted-union heuristic has height ≤ lg(k)

Proof: By induction on k

Find-Set Running Time: O(lg n)
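The weighted-union forest could be sketched as below. The class name `WeightedForest` and the `depth` helper are hypothetical, and unlike the O(1) Union on the slide, this `union` first looks up the roots so it can be called on arbitrary elements.

```python
class WeightedForest:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n  # weight = number of elements in the subtree

    def find_set(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return rx
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx              # rx is the root of the larger tree
        self.parent[ry] = rx             # attach the smaller tree below it
        self.size[rx] += self.size[ry]
        return rx

    def depth(self, x):
        """Number of pointers followed from x to its root."""
        d = 0
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d
```

Merging 16 singletons pairwise, then the pairs, and so on, is the pattern that makes the tree as tall as possible, and even then no node ends up deeper than lg 16 = 4.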

27

2nd heuristic: Path Compression
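Path compression is commonly described as: after finding the root, make every node on the search path point directly at it. A minimal two-pass sketch, assuming the forest is stored as a `parent` list where a root is its own parent:

```python
def find_set(parent, x):
    """Find the root of x, then point every node on the path directly at the root."""
    root = x
    while parent[root] != root:   # first pass: locate the root
        root = parent[root]
    while parent[x] != root:      # second pass: compress the path
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root
```

For example, on the chain 4 → 3 → 2 → 1 → 0 a single find_set(parent, 4) leaves every node attached directly to 0, so later finds on those nodes follow one pointer.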

28

The function lg* n

lg* n = the number of times we have to apply log2 repeatedly, starting from n, until the result is at most 1

lg* 2 = 1
lg* 2^2 = 2
lg* 2^16 = lg* 65536 = 4

=> lg* n ≤ 5 for all practical values of n (n ≤ 2^65536)
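The function on this slide is usually called the iterated logarithm and written lg* n. A small sketch, where the name `log_star` is an assumption, counts how many times log2 must be applied before the value drops to 1 or below:

```python
import math

def log_star(n):
    """Number of times log2 must be applied to n before the value is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

Under this definition log_star(65536) is 4, and even 2**65536, a number with nearly twenty thousand decimal digits, only reaches 5.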

29

Theorem (Tarjan): If S = a sequence of O(n) Unions and Find-Sets, the worst-case time for S with

– Weighted Unions, and
– Path Compressions

is O(n lg* n)

The average time is O(lg* n) per operation

in Linked List Implementation

30

Theorem (Tarjan): Let S = a sequence of O(n) Unions and Find-Sets. The worst-case time for S with

– Weighted Unions, and
– Path Compressions

is O(n α(n))

The average time is O(α(n)) per operation, α(n) < 5 in practice

31

Connected Components using Union-Find

Reminder:

• Every node v is connected to itself
• If u and v are in the same connected component, then v is connected to u and u is connected to v
• Connected components form a partition of the nodes and so are disjoint
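Because components partition the nodes, they map directly onto disjoint sets: start with every node in its own set and union the endpoints of each edge. The sketch below is one possible rendering, in which the function name and the path-halving shortcut inside `find` are choices made here, not taken from the slides.

```python
def connected_components(n, edges):
    """Return the components of an undirected graph on vertices 0..n-1 as sorted lists."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving, a simple form of compression
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                # u and v now share a component

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

Two nodes are in the same component exactly when `find` returns the same root for both.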

32

MST-Kruskal

Kruskal's algorithm for minimum spanning tree works by considering edges in order of increasing cost and adding to the tree each edge that connects two previously disjoint components.

Kruskal's algorithm on a graph of distances between 128 North American cities

The minimum spanning tree describes the cheapest network to connect all of a given set of vertices
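The description of Kruskal's algorithm translates fairly directly into code once a union-find structure is available. This is a sketch under stated assumptions: vertices are numbered 0..n-1, edges are (weight, u, v) triples, and the example weights are made up.

```python
def kruskal(n, edges):
    """Return the MST edges of a connected graph as a list of (weight, u, v)."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):          # edges in order of increasing cost
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                       # same component: the edge would close a cycle
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru                    # weighted union: smaller tree under larger
        size[ru] += size[rv]
        mst.append((w, u, v))
        if len(mst) == n - 1:
            break                          # a spanning tree has exactly n-1 edges
    return mst

# Hypothetical 4-vertex example.
example_edges = [(1, 0, 1), (4, 0, 2), (3, 0, 3), (2, 1, 2), (5, 1, 3), (6, 2, 3)]
example_mst = kruskal(4, example_edges)
```

Sorting dominates the cost at O(E lg E) = O(E lg V); the union-find operations add only a near-linear term.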

33

Example

34

35

MST-Kruskal

36

MST-Kruskal

37

MST-Kruskal

Running Time:

38

MST-Prim-Jarnik

39

Example

40

MST-Prim-Jarnik

41

MST-Prim

42

MST-Prim

43

MST-Prim
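Prim's rule, as stated earlier in the deck, is to grow a tree from a start vertex by repeatedly adding the cheapest edge with exactly one endpoint in the tree. The sketch below is the "lazy" variant built on Python's `heapq`: instead of Decrease_key it leaves stale edges in the heap and skips them when popped. The adjacency format and the example weights are assumptions.

```python
import heapq

def prim(adj, s=0):
    """adj: {u: [(weight, v), ...]} for a connected undirected graph.
    Returns the MST edges as (weight, u, v), in the order they are added."""
    in_tree = {s}
    heap = [(w, s, v) for w, v in adj[s]]
    heapq.heapify(heap)
    mst = []
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)      # cheapest candidate edge
        if v in in_tree:
            continue                       # both endpoints already in T: stale entry
        in_tree.add(v)
        mst.append((w, u, v))
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return mst

# Hypothetical 4-vertex example graph.
example_adj = {
    0: [(1, 1), (4, 2), (3, 3)],
    1: [(1, 0), (2, 2), (5, 3)],
    2: [(4, 0), (2, 1), (6, 3)],
    3: [(3, 0), (5, 1), (6, 2)],
}
example_tree = prim(example_adj)
```

The lazy variant holds up to E entries in the heap, giving O(E lg E) time, which is still O(E lg V).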

44

Decrease_key(v,x)

We use a min-Heap to hold the edges in G-T
How can we implement Decrease_key(v,x)?

Simple solution:
• Change value for v
• Follow strategy for Heap_insert from v upwards
• Cost: O(lg V)
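The simple solution needs one extra ingredient: the heap must know where v currently sits. One way to sketch this, with the class name `IndexedMinHeap` and its `pos` map being assumptions, is a binary heap that records each item's index and reuses the insert-time sift-up for Decrease_key.

```python
class IndexedMinHeap:
    """Binary min-heap of [key, vertex] entries that tracks each vertex's position."""

    def __init__(self):
        self.heap = []   # list of [key, vertex]
        self.pos = {}    # vertex -> index in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.heap[(i - 1) // 2][0] > self.heap[i][0]:
            parent = (i - 1) // 2
            self._swap(i, parent)
            i = parent

    def insert(self, v, key):
        self.heap.append([key, v])
        self.pos[v] = len(self.heap) - 1
        self._sift_up(self.pos[v])

    def decrease_key(self, v, x):
        i = self.pos[v]
        if x > self.heap[i][0]:
            raise ValueError("new key must not be larger than the current key")
        self.heap[i][0] = x   # change the value for v
        self._sift_up(i)      # then move upwards exactly as insert does
```

The sift-up walks at most the height of the heap, which gives the O(lg V) cost.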

45

MST-Prim

Running Time:

46

Does a linear-time MST algorithm exist?

47

Euclidean MST
Given N points in the plane, find MST connecting them, where the distances between point pairs are their Euclidean distances.

Brute force. Compute ~ N^2/2 distances and run Prim's algorithm.
Ingenuity. Exploit geometry and do it in ~ c N lg N.
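The brute-force route can be sketched with the array-based form of Prim's algorithm, which suits the complete graph on the points. The function name and the unit-square example are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math

def euclidean_mst(points):
    """Dense Prim on the complete graph over the points.
    Returns (total length, list of (i, j) index pairs)."""
    n = len(points)
    if n == 0:
        return 0.0, []
    in_tree = [False] * n
    best = [math.inf] * n    # cheapest known distance from the tree to each point
    link = [-1] * n          # tree vertex that realises best[v]
    best[0] = 0.0
    total, edges = 0.0, []
    for _ in range(n):
        # pick the point outside the tree that is closest to it
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        if link[u] >= 0:
            edges.append((link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return total, edges

square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # hypothetical input: unit square
length, tree = euclidean_mst(square)
```

Each pair of points is measured once, when the first of the two joins the tree, which accounts for the roughly N^2/2 distance computations.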

48

Scientific application: clustering
k-clustering. Divide a set of objects into k coherent groups.
Distance function. Numeric value specifying "closeness" of two objects.

Goal. Divide into clusters so that objects in different clusters are far apart.

Applications.
• Routing in mobile ad hoc networks.
• Document categorization for web search.
• Similarity searching in medical image databases.
• Skycat: cluster 10^9 sky objects into stars, quasars, galaxies.

outbreak of cholera deaths in London in 1850s (Nina Mishra)

49

Single-link clustering

k-clustering. Divide a set of objects into k coherent groups.
Distance function. Numeric value specifying "closeness" of two objects.

Goal. Divide into clusters so that objects in different clusters are far apart.

Single link. Distance between two clusters equals the distance between the two closest objects (one in each cluster).

Single-link clustering. Given an integer k, find a k-clustering that maximizes the distance between two closest clusters.

50

Single-link clustering algorithm

“Well-known” algorithm for single-link clustering:
• Form V clusters of one object each.
• Find the closest pair of objects such that each object is in a different cluster, and merge the two clusters.
• Repeat until there are exactly k clusters.

Observation. This is Kruskal's algorithm (stop when k connected components).

Alternate solution. Run Prim's algorithm and delete k-1 max weight edges.
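Following the observation that this is Kruskal's algorithm stopped early, a sketch might look like the following. It assumes objects numbered 0..n-1 and a list of (distance, u, v) triples, and the function name is hypothetical.

```python
def single_link_clusters(n, edges, k):
    """Kruskal's algorithm stopped when k components remain.
    edges: iterable of (distance, u, v). Returns the clusters as sorted lists."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n                         # start with n singleton clusters
    for d, u, v in sorted(edges):          # closest pairs first
        if components == k:
            break
        ru, rv = find(u), find(v)
        if ru != rv:                       # objects lie in different clusters: merge
            parent[ru] = rv
            components -= 1

    clusters = {}
    for x in range(n):
        clusters.setdefault(find(x), []).append(x)
    return sorted(clusters.values())
```

For six points on a line at positions 0, 1, 2, 10, 11, 20 and k = 3, the three merges at distance 1 happen first and the procedure stops with clusters {0, 1, 2}, {10, 11}, {20}.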

51

Dendrogram
Tree diagram that illustrates arrangement of clusters.

52

Dendrogram of cancers in humans
Tumors in similar tissues cluster together