minimum spanning trees and clustering by swee-ling tang april 20, 2010 04/20/20101

27
Minimum Spanning Trees and Clustering By Swee-Ling Tang April 20, 2010 04/20/2010 1

Upload: marilyn-jones

Post on 27-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Minimum Spanning Trees and Clustering

By Swee-Ling TangApril 20, 2010

04/20/2010 1

Spanning Tree

• A spanning tree of a graph is a tree and is a subgraph that contains all the vertices.

• A graph may have many spanning trees; for example, the complete graph on four vertices has sixteen spanning trees:

04/20/2010 2

Spanning Tree – cont.

04/20/2010 3

Minimum Spanning Trees

• Suppose that the edges of the graph have weights or lengths. The weight of a tree will be the sum of weights of its edges.

• Based on the example, we can see that different trees have different lengths.

• The question is: how to find the minimum length spanning tree?

04/20/2010 4

Minimum Spanning Trees

• The question can be solved by many different algorithms, here is three classical minimum-spanning tree algorithms :– Boruvka's Algorithm– Kruskal's Algorithm– Prim's Algorithm

04/20/2010 5

04/20/2010 6

Kruskal's Algorithm

• Joseph Bernard Kruskal, Jr• Kruskal Approach:

– Select the minimum weight edge that does not form a cycle

Kruskal's Algorithm:

sort the edges of G in increasing order by length

keep a subgraph S of G, initially emptyfor each edge e in sorted order if the endpoints of e are disconnected

in S add e to S return S

Kruskal's Algorithm - Example

04/20/2010 7

Kruskal's Algorithm - Example

04/20/2010 8

04/20/2010 9

Prim’s Algorithm

• Robert Clay Prim• Prim Approach:– Choose an arbitrary start node v– At any point in time, we have connected

component N containing v and other nodes V-N– Choose the minimum weight edge from N to V-NPrim's Algorithm:

let T be a single vertex xwhile (T has fewer than n vertices){

find the smallest edge connecting T to G-Tadd it to T

}

04/20/2010 10

Prim's Algorithm - Example

13

50

117

2

8 12

9

10

1440

31

6

20

04/20/2010 11

Prim's Algorithm - Example

13

50

117

2

8 12

9

10

1440

31

6

20

04/20/2010 12

Prim's Algorithm - Example

13

50

117

2

8 12

9

10

1440

31

6

20

04/20/2010 13

Prim's Algorithm - Example

13

50

117

2

8 12

9

10

1440

31

6

20

04/20/2010 14

Prim's Algorithm - Example

13

50

117

2

8 12

9

10

1440

31

6

20

04/20/2010 15

Prim's Algorithm - Example

13

50

117

2

8 12

9

10

1440

31

6

20

04/20/2010 16

Prim's Algorithm - Example

13

50

117

2

8 12

9

10

1440

31

6

20

04/20/2010 17

Prim's Algorithm - Example

13

50

117

2

8 12

9

10

1440

31

6

20

04/20/2010 18

Prim's Algorithm - Example

13

50

117

2

8 12

9

10

1440

31

6

20

04/20/2010 19

Boruvka's Algorithm

• Otakar Borůvka– Inventor of MST– Czech scientist– Introduced the problem– The original paper was written in Czech in

1926.– The purpose was to efficiently provide

electric coverage of Bohemia.

04/20/2010 20

Boruvka’s Algorithm

• Boruvka Approach:– Prim “in parallel”– Repeat the following procedure until the resulting

graph becomes a single node.• For each node u, mark its lightest incident edge. • Now, the marked edges form a forest F. Add the edges

of F into the set of edges to be reported.• Contract each maximal subtree of F into a single node.

04/20/2010 21

Boruvka’s Algorithm - Example

2.1

1.3

2.3

1.2

2.2

3.1

2.4

3

1

1.5

1.4

2.6

2.7

2.5

3.2

5

3.34

4.1

5.1

Usage of Minimum Spanning Trees

• Network design:– telephone, electrical, hydraulic, TV cable,

computer, road

• Approximation algorithms for NP-hard (non-deterministic polynomial-time hard) problems:– traveling salesperson problem

• Cluster Analysis

04/20/2010 22

Clustering

• Definition– Clustering is “the process of organizing

objects into groups whose members are similar in some way”.

– A cluster is therefore a collection of objects which are “similar” between them and are “dissimilar” to the objects belonging to other clusters.

– A data mining technique

04/20/2010 23

Why Clustering?

• Unsupervised learning process• Pattern detection• Simplifications• Useful in data concept

construction

04/20/2010 24

The use of Clustering

• Data Mining• Information Retrieval• Text Mining• Web Analysis• Marketing• Medical Diagnostic• Image Analysis• Bioinformatics

04/20/2010 25

Image Analysis – MST Clustering

04/20/2010 26

An image file before/after colorclustering using HEMST (Hierarchical EMST clustering algorithm) and SEMST (Standard EMST clustering algorithm).

References• Minimum Spanning Tree Based Clustering Algorithms -

http://www4.ncsu.edu/~zjorgen/ictai06.pdf• Minimal Spanning Tree based Fuzzy Clustering -

http://www.waset.org/journals/waset/v8/v8-2.pdf• Minimum Spanning Tree – Wikipedia -

http://en.wikipedia.org/wiki/Minimum_spanning_tree• Kruskal's algorithm - http://en.wikipedia.org/wiki/Kruskal

%27s_algorithm• Prim's Algorithm – Wikipedia -

http://en.wikipedia.org/wiki/Prim%27s_algorithm• Design and Analysis of Algorithms -

http://www.ics.uci.edu/~eppstein/161/960206.html

04/20/2010 27