Cluster Analysis


DESCRIPTION

AACIMP 2011 Summer School. Operational Research Stream. Lecture by Erik Kropat.

TRANSCRIPT

Page 1: Cluster Analysis

Cluster Analysis

Summer School

“Achievements and Applications of Contemporary Informatics,

Mathematics and Physics” (AACIMP 2011)

August 8-20, 2011, Kiev, Ukraine

Erik Kropat

University of the Bundeswehr Munich Institute for Theoretical Computer Science,

Mathematics and Operations Research

Neubiberg, Germany

Page 2: Cluster Analysis

The Knowledge Discovery Process

Page 3: Cluster Analysis

Figure: the knowledge discovery pipeline.

Raw Data → PRE-PROCESSING (standardizing, missing values / outliers) → Preprocessed Data → DATA MINING (patterns, clusters, correlations, automated classification, outlier / anomaly detection, association rule learning, …) → Patterns → PATTERN EVALUATION → Knowledge (strategic planning)

Page 4: Cluster Analysis

Clustering

Page 5: Cluster Analysis

Clustering

… is a tool for data analysis that solves classification problems.

Problem

Given n observations, split them into K similar groups.

Question

How can we define “similarity” ?

Page 6: Cluster Analysis

Similarity

A cluster is a set of entities which are alike,

and entities from different clusters are not alike.

Page 7: Cluster Analysis

Distance

A cluster is an aggregation of points such that

the distance between any two points in the cluster

is less than

the distance between any point in the cluster and any point not in it.

Page 8: Cluster Analysis

Density

Clusters may be described as

connected regions of a multidimensional space containing a relatively high density of points,

separated from other such regions by a region containing a relatively low density of points.

Page 9: Cluster Analysis

Min-Max Problem

Homogeneity: Objects within the same cluster should be similar to each other.

Separation: Objects in different clusters should be dissimilar from each other.

similarity ⇔ distance

Figure: distance between objects within a cluster vs. distance between clusters.

Page 10: Cluster Analysis

Types of Clustering

Clustering

• Hierarchical Clustering (agglomerative or divisive)

• Partitional Clustering

Page 11: Cluster Analysis

Similarity and Distance

Page 12: Cluster Analysis

Distance Measures

A metric on a set G is a function d: G x G → R+ that satisfies the following conditions:

(D1) d(x, y) = 0 ⇔ x = y (identity)
(D2) d(x, y) = d(y, x) ≥ 0 for all x, y ∈ G (symmetry & non-negativity)
(D3) d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ G (triangle inequality)

Figure: points x, y, z illustrating the triangle inequality.

Page 13: Cluster Analysis

Examples: Minkowski Distance

d_r(x, y) = ( Σ_{i=1..n} | x_i − y_i |^r )^{1/r} ,  r ∈ [1, ∞),  x, y ∈ R^n

o r = 1: Manhattan distance

o r = 2: Euclidean distance

Page 14: Cluster Analysis

Euclidean Distance

d_2(x, y) = ( Σ_{i=1..n} ( x_i − y_i )² )^{1/2} ,  x, y ∈ R^n

Example: x = (1, 1), y = (4, 3)

d_2(x, y) = √( (1 − 4)² + (1 − 3)² ) = √13

Page 15: Cluster Analysis

Manhattan Distance

d_1(x, y) = Σ_{i=1..n} | x_i − y_i | ,  x, y ∈ R^n

Example: x = (1, 1), y = (4, 3)

d_1(x, y) = | 1 − 4 | + | 1 − 3 | = 3 + 2 = 5

Page 16: Cluster Analysis

Maximum Distance

d_∞(x, y) = max_{1 ≤ i ≤ n} | x_i − y_i | ,  x, y ∈ R^n

Example: x = (1, 1), y = (4, 3)

d_∞(x, y) = max(3, 2) = 3
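As a quick illustration, a small Python sketch (the function names are illustrative, not from the slides) that evaluates these three metrics on the example points x = (1, 1) and y = (4, 3):

```python
def minkowski(x, y, r):
    """Minkowski distance d_r(x, y) = (sum |x_i - y_i|^r)^(1/r), r >= 1."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def manhattan(x, y):
    """Manhattan distance: Minkowski distance with r = 1."""
    return minkowski(x, y, 1)

def euclidean(x, y):
    """Euclidean distance: Minkowski distance with r = 2."""
    return minkowski(x, y, 2)

def maximum(x, y):
    """Maximum (Chebyshev) distance: max over i of |x_i - y_i|."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1, 1), (4, 3)
print(manhattan(x, y))   # 5
print(euclidean(x, y))   # sqrt(13) = 3.6055...
print(maximum(x, y))     # 3
```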

Page 17: Cluster Analysis

Similarity Measures

A similarity function on a set G is a function S: G x G → R that satisfies the following conditions:

(S1) S(x, y) ≥ 0 for all x, y ∈ G (non-negativity)
(S2) S(x, y) ≤ S(x, x) for all x, y ∈ G (auto-similarity)
(S3) S(x, y) = S(x, x) ⇔ x = y for all x, y ∈ G (identity)

The value of the similarity function is greater when two points are closer.

Page 18: Cluster Analysis

Similarity Measures

• There are many different definitions of similarity.
• Often the following condition is also required:

(S4) S(x, y) = S(y, x) for all x, y ∈ G (symmetry)
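One common construction, not from the slides, derives a similarity from a distance, for example the Gaussian similarity S(x, y) = exp(−d(x, y)² / (2σ²)), which satisfies (S1)–(S4). A small sketch under that assumption:

```python
import math

def euclidean(x, y):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def gaussian_similarity(x, y, sigma=1.0):
    """Similarity derived from a distance: larger when the points are closer.
    S >= 0, maximal value 1 is attained only for x = y, and S is symmetric."""
    d = euclidean(x, y)
    return math.exp(-d ** 2 / (2 * sigma ** 2))

print(gaussian_similarity((1, 1), (1, 1)))  # 1.0
print(gaussian_similarity((1, 1), (4, 3)))  # close to 0
```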

Page 19: Cluster Analysis

Hierarchical Clustering

Page 20: Cluster Analysis

Dendrogram

www.isa.uni-stuttgart.de/lehre/SAHBD

Gross national product of EU countries – agriculture (1993)

Figure: cluster dendrogram (Euclidean distance, complete linkage).

Page 21: Cluster Analysis

Hierarchical Clustering

Hierarchical clustering creates a hierarchy of clusters of the set G.

Agglomerative clustering: Clusters are successively merged together

Divisive clustering: Clusters are recursively split

Hierarchical Clustering: agglomerative or divisive

Page 22: Cluster Analysis

Agglomerative Clustering

Figure: agglomerative clustering of four objects e1, e2, e3, e4.

Step 0: {e1}, {e2}, {e3}, {e4} (4 clusters)
Step 1: {e1, e2}, {e3}, {e4} (3 clusters)
Step 2: {e1, e2, e3}, {e4} (2 clusters)
Step 3: {e1, e2, e3, e4} (1 cluster)

In each step, merge the two clusters with the smallest distance between them.

Page 23: Cluster Analysis

Divisive Clustering

Figure: divisive clustering of four objects e1, e2, e3, e4.

Step 0: {e1, e2, e3, e4} (1 cluster)
Step 1: {e1, e2}, {e3, e4} (2 clusters)
Step 2: {e1, e2}, {e3}, {e4} (3 clusters)
Step 3: {e1}, {e2}, {e3}, {e4} (4 clusters)

In each step, choose a cluster that optimally splits into two clusters according to a given criterion.

Page 24: Cluster Analysis

Agglomerative Clustering

Page 25: Cluster Analysis

INPUT

Given n objects G = { e1, ..., en } represented by p-dimensional feature vectors x1, ..., xn ∈ R^p

Object    Feature 1   Feature 2   Feature 3   ...   Feature p
x1 = ( x11   x12   x13   ...   x1p )
x2 = ( x21   x22   x23   ...   x2p )
⁞
xn = ( xn1   xn2   xn3   ...   xnp )

Page 26: Cluster Analysis

Example I

An online shop collects data from its customers. For each of the n customers there exists a p-dimensional feature vector.

Page 27: Cluster Analysis

Example II

In a clinical trial, laboratory values of a large number of patients are gathered. For each of the n patients there exists a p-dimensional feature vector.

Page 28: Cluster Analysis

Agglomerative Algorithms

• Begin with the disjoint clustering C1 = { {e1}, {e2}, ... , {en} }.
• Terminate when all objects are in one cluster Cn = { {e1, e2, ... , en} }.
• Iterate: find the most similar pair of clusters and merge them into a single cluster.

This yields a sequence of clusterings (Ci)_{i=1,...,n} of G with C_{i-1} ⊂ C_i for i = 2, ..., n.
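A naive Python sketch of this scheme, assuming the objects are hashable; the single-linkage criterion used here to pick "the most similar pair" anticipates the nearest-neighbor method defined on a later slide, and the function names are illustrative:

```python
def single_linkage(A, B, d):
    """Distance between two clusters = minimum pairwise distance (single linkage)."""
    return min(d(a, b) for a in A for b in B)

def agglomerative(objects, d, cluster_dist=single_linkage):
    """Start with singleton clusters and repeatedly merge the closest pair.
    Returns the sequence of clusterings C_1, ..., C_n."""
    clusters = [frozenset([o]) for o in objects]
    history = [list(clusters)]
    while len(clusters) > 1:
        # find the most similar (closest) pair of clusters
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]], d),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append(list(clusters))
    return history
```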

Page 29: Cluster Analysis

What is the distance between two clusters?


⇒ Various hierarchical clustering algorithms

Page 30: Cluster Analysis

Agglomerative Hierarchical Clustering

There exist many metrics to measure the distance between clusters. They lead to particular agglomerative clustering methods:

• Single-Linkage Clustering

• Complete-Linkage Clustering

• Average Linkage Clustering

• Centroid Method

• . . .

Page 31: Cluster Analysis

Single-Linkage Clustering

Nearest-Neighbor-Method

The distance between the clusters A and B is the minimum distance between the elements of each cluster:

d(A, B) = min { d(a, b) | a ∈ A, b ∈ B }

Page 32: Cluster Analysis

Single-Linkage Clustering

• Advantage: Can detect very long and even curved clusters.

Can be used to detect outliers.

• Drawback: chaining phenomenon

Clusters that are very distant from each other may be forced together due to single elements being close to each other.

Page 33: Cluster Analysis

Complete-Linkage Clustering

Furthest-Neighbor-Method

The distance between the clusters A and B is the maximum distance between the elements of each cluster:

d(A,B) = max { d(a,b) | a ∈ A, b ∈ B }


Page 34: Cluster Analysis

Complete-Linkage Clustering

• … tends to find compact clusters of approximately equal diameters.
• … avoids the chaining phenomenon.
• … cannot be used for outlier detection.

Page 35: Cluster Analysis

Average-Linkage Clustering

The distance between the clusters A and B is the mean distance between the elements of each cluster:

d(A, B) = ( 1 / (|A| ⋅ |B|) ) Σ_{a ∈ A, b ∈ B} d(a, b)

Page 36: Cluster Analysis

Centroid Method

The distance between the clusters A and B is the (squared) Euclidean distance of the cluster centroids.


Page 37: Cluster Analysis

Agglomerative Hierarchical Clustering

Figure: overview of the cluster distances d(A, B) used by single linkage, complete linkage, average linkage, and the centroid method.
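The four cluster distances from the preceding slides, written as small Python functions; a sketch in which `d` is an element-wise distance function and, for the centroid method, the clusters are given as lists of numeric feature vectors (names are illustrative):

```python
import numpy as np

def single_link(A, B, d):
    """Single linkage: minimum pairwise distance."""
    return min(d(a, b) for a in A for b in B)

def complete_link(A, B, d):
    """Complete linkage: maximum pairwise distance."""
    return max(d(a, b) for a in A for b in B)

def average_link(A, B, d):
    """Average linkage: mean of all pairwise distances."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

def centroid_dist(A, B):
    """Centroid method: squared Euclidean distance of the cluster centroids."""
    ca = np.mean(np.asarray(A, dtype=float), axis=0)
    cb = np.mean(np.asarray(B, dtype=float), axis=0)
    return float(np.sum((ca - cb) ** 2))
```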

Page 38: Cluster Analysis

Bioinformatics

Alizadeh et al., Nature 403 (2000): pp.503–511

Page 39: Cluster Analysis

Exercise

Figure: map with the four cities Paris, Berlin, Kiev, and Odessa.

Page 40: Cluster Analysis

Exercise

The following table shows the distances between 4 cities:

          Kiev   Odessa   Berlin   Paris
Kiev         0      440     1200    2000
Odessa     440        0     1400    2100
Berlin    1200     1400        0     900
Paris     2000     2100      900       0

Determine a hierarchical clustering with the single linkage method.

Page 41: Cluster Analysis

Solution - Single Linkage

          Kiev   Odessa   Berlin   Paris
Kiev         0      440     1200    2000
Odessa     440        0     1400    2100
Berlin    1200     1400        0     900
Paris     2000     2100      900       0

Step 0: Clustering

{Kiev}, {Odessa}, {Berlin}, {Paris}

Distances between clusters

Page 42: Cluster Analysis

Solution - Single Linkage

          Kiev   Odessa   Berlin   Paris
Kiev         0      440     1200    2000
Odessa     440        0     1400    2100
Berlin    1200     1400        0     900
Paris     2000     2100      900       0

Step 0: Clustering

{Kiev}, {Odessa}, {Berlin}, {Paris}

Distances between clusters minimal distance

⇒ Merge clusters { Kiev } and { Odessa } Distance value: 440

Page 43: Cluster Analysis

Solution - Single Linkage

               Kiev, Odessa   Berlin   Paris
Kiev, Odessa              0     1200    2000
Berlin                 1200        0     900
Paris                  2000      900       0

Step 1: Clustering

{Kiev, Odessa}, {Berlin}, {Paris}

Distances between clusters

Page 44: Cluster Analysis

Solution - Single Linkage

               Kiev, Odessa   Berlin   Paris
Kiev, Odessa              0     1200    2000
Berlin                 1200        0     900
Paris                  2000      900       0

Step 1: Clustering

{Kiev, Odessa}, {Berlin}, {Paris}

Distances between clusters minimal distance

⇒ Merge clusters { Berlin } and { Paris } Distance value: 900

Page 45: Cluster Analysis

Solution - Single Linkage

                Kiev, Odessa   Berlin, Paris
Kiev, Odessa               0            1200
Berlin, Paris           1200               0

Step 2: Clustering

{Kiev, Odessa}, {Berlin, Paris}

Distances between clusters minimal distance

⇒ Merge clusters { Kiev, Odessa } and { Berlin, Paris } Distance value: 1200

Page 46: Cluster Analysis

Solution - Single Linkage

Step 3: Clustering

{Kiev, Odessa, Berlin, Paris}

Page 47: Cluster Analysis

Solution - Single Linkage

Hierarchy

Figure: dendrogram for Kiev, Odessa, Berlin, Paris.

4 clusters: {Kiev}, {Odessa}, {Berlin}, {Paris} (distance 0)
3 clusters: {Kiev, Odessa}, {Berlin}, {Paris} (merged at distance 440)
2 clusters: {Kiev, Odessa}, {Berlin, Paris} (merged at distance 900)
1 cluster: {Kiev, Odessa, Berlin, Paris} (merged at distance 1200)
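If SciPy is available, this single-linkage hierarchy can be reproduced from the distance table; a small check in which the merge heights reported in the third column should be 440, 900, and 1200:

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage

cities = ["Kiev", "Odessa", "Berlin", "Paris"]
D = np.array([
    [   0,  440, 1200, 2000],
    [ 440,    0, 1400, 2100],
    [1200, 1400,    0,  900],
    [2000, 2100,  900,    0],
], dtype=float)

# linkage() expects a condensed distance matrix
Z = linkage(squareform(D), method="single")
print(Z)
# merge heights (3rd column): 440 (Kiev-Odessa), 900 (Berlin-Paris),
# 1200 (the two remaining clusters)
```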

Page 48: Cluster Analysis

Divisive Clustering

Page 49: Cluster Analysis

Divisive Algorithms

• Begin with one cluster C1 = { {e1, e2, ... , en} }.
• Terminate when all objects are in disjoint clusters Cn = { {e1}, {e2}, ... , {en} }.
• Iterate: choose a cluster Cf that optimally splits into two clusters Ci and Cj according to a given criterion.

This yields a sequence of clusterings (Ci)_{i=1,...,n} of G with C_i ⊃ C_{i+1} for i = 1, ..., n-1.

Page 50: Cluster Analysis

Partitional Clustering

Minimal Distance Methods

Page 51: Cluster Analysis

Partitional Clustering

• Aims to partition n observations into K clusters.
• The number of clusters and an initial partition are given.
• The initial partition is considered "not optimal" and is iteratively repartitioned.

The number of clusters is given!

Figure: initial partition and final partition for K = 2.

Page 52: Cluster Analysis

Partitional Clustering

Difference to hierarchical clustering

• The number of clusters is fixed.
• An object can change its cluster.

The initial partition is obtained by

• random assignment or
• the application of a hierarchical clustering algorithm in advance.

Estimation of the number of clusters

• specialized methods (e.g., Silhouette) or
• the application of a hierarchical clustering algorithm in advance.

Page 53: Cluster Analysis

Partitional Clustering - Methods

In this course we introduce the minimal distance methods

• K-Means and
• Fuzzy-c-Means.

Page 54: Cluster Analysis

K-Means

Page 55: Cluster Analysis

K-Means

Aims to partition n observations into K clusters in which each observation belongs to the cluster with the nearest mean.

Find K cluster centroids µ1, ..., µK that minimize the objective function

J = Σ_{i=1..K} Σ_{x ∈ C_i} dist(µ_i, x)²

Figure: set G partitioned into clusters C1, C2, C3.

Page 56: Cluster Analysis

K-Means

Aims to partition n observations into K clusters in which each observation belongs to the cluster with the nearest mean.

Find K cluster centroids µ1, ..., µK that minimize the objective function

J = Σ_{i=1..K} Σ_{x ∈ C_i} dist(µ_i, x)²

Figure: set G partitioned into clusters C1, C2, C3 with the cluster centroids marked.

Page 57: Cluster Analysis

K-Means - Minimal Distance Method


Given: n objects, K clusters

1. Determine initial partition.

2. Calculate cluster centroids.

3. For each object, calculate the distances to all cluster centroids.

4. If the distance to the centroid of another cluster is smaller than the distance to the actual cluster centroid, then assign the object to the other cluster.

repartition

5. If clusters are repartitioned: GOTO 2.

ELSE: STOP.
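A minimal NumPy sketch of this minimal distance method; the function name and the random choice of initial centroids are illustrative assumptions, not part of the slides:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Minimal K-Means sketch. X: (n, p) data array, K: number of clusters."""
    rng = np.random.default_rng(seed)
    # 1. initial partition: pick K distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # 2./3. distances of every object to all cluster centroids
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # 4. assign each object to the cluster with the nearest centroid
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # 5. no object changed its cluster -> stop
        labels = new_labels
        # recompute the centroid of every non-empty cluster
        for i in range(K):
            if np.any(labels == i):
                centroids[i] = X[labels == i].mean(axis=0)
    return labels, centroids

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]])
labels, centroids = kmeans(X, K=2)
print(labels, centroids)
```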

Page 58: Cluster Analysis

Example

Initial Partition


Final Partition

Page 59: Cluster Analysis

Exercise

Initial Partition


Final Partition

Page 60: Cluster Analysis

K-Means

• K-Means does not necessarily determine the globally optimal partition.

• The final partition obtained by K-Means depends on the initial partition.
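A common remedy, not covered in the slides, is to run K-Means several times with different random initializations and keep the partition with the smallest SSE; for example with scikit-learn, assuming it is installed:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]])

# n_init=10: run 10 random initializations and keep the best result (lowest SSE)
km = KMeans(n_clusters=2, init="random", n_init=10, random_state=0).fit(X)
print(km.labels_, km.inertia_)  # inertia_ is the within-cluster SSE
```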

Page 61: Cluster Analysis

Hard Clustering / Soft Clustering

Hard Clustering: each object is a member of exactly one cluster (e.g., K-Means).

Soft Clustering: each object has a fractional membership in all clusters (e.g., Fuzzy-c-Means).

Page 62: Cluster Analysis

Fuzzy-c-Means

Page 63: Cluster Analysis

Fuzzy Clustering vs. Hard Clustering

• When clusters are well separated, hard clustering (K-Means) makes sense.

• In many cases, clusters are not well separated. In hard clustering, borderline objects are assigned to a cluster in an arbitrary manner.

Page 64: Cluster Analysis

• Fuzzy set theory was introduced by Lotfi Zadeh in 1965.

• An object can belong to a set with a degree of membership

between 0 and 1.

• Classical set theory is a special case of fuzzy theory

that restricts membership values to be either 0 or 1.

Fuzzy Set Theory

Page 65: Cluster Analysis

• Is based on fuzzy logic and fuzzy set theory.

• Objects can belong to more than one cluster.

• Each object belongs to all clusters with some weight (degree of membership)

Fuzzy Clustering

Figure: membership degrees between 0 and 1 of an object in Cluster 1, Cluster 2, and Cluster 3.

Page 66: Cluster Analysis

Hard Clustering

• K-Means

− The number K of clusters is given.
− Each object is assigned to exactly one cluster.

Partition:

Cluster    e1   e2   e3   e4
C1          0    1    0    0
C2          1    0    0    0
C3          0    0    1    1

Figure: objects e1, ..., e4 assigned to the clusters C1, C2, C3.

Page 67: Cluster Analysis

Fuzzy Clustering

• Fuzzy-c-Means

− The number c of clusters is given.
− Each object has a fractional membership in all clusters.

Fuzzy partition:

Cluster    e1    e2    e3    e4
C1         0.8   0.2   0.1   0.0
C2         0.2   0.2   0.2   0.0
C3         0.0   0.6   0.7   1.0
Σ          1     1     1     1

There is no strict sub-division of clusters.

Page 68: Cluster Analysis

Fuzzy-c-Means

• Membership Matrix

The entry u_ik denotes the degree of membership of object k in cluster i.

U = ( u_ik ) ∈ [0, 1]^(c × n)

Object 1 Object 2 … Object n

Cluster 1 u11 u12 … u1n

Cluster 2 u21 u22 … u2n

Cluster c uc1 uc2 … ucn

Page 69: Cluster Analysis

Restrictions (Membership Matrix)

1. All weights for a given object ek must add up to 1:

Σ_{i=1..c} u_ik = 1   (k = 1, ..., n)

2. Each cluster contains, with non-zero weight, at least one object, but does not contain, with a weight of one, all the objects:

0 < Σ_{k=1..n} u_ik < n   (i = 1, ..., c)
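A small, purely illustrative helper that checks these two restrictions for a given membership matrix, here the example matrix from the Fuzzy-c-Means slide above:

```python
import numpy as np

def is_valid_fuzzy_partition(U, tol=1e-9):
    """Check the restrictions on a (c x n) membership matrix U:
    entries in [0, 1], every column sums to 1, every cluster weight in (0, n)."""
    U = np.asarray(U, dtype=float)
    c, n = U.shape
    in_range = np.all((U >= 0.0) & (U <= 1.0))
    columns_sum_to_one = np.allclose(U.sum(axis=0), 1.0, atol=tol)
    row_sums = U.sum(axis=1)
    clusters_nontrivial = np.all((row_sums > 0.0) & (row_sums < n))
    return bool(in_range and columns_sum_to_one and clusters_nontrivial)

U = [[0.8, 0.2, 0.1, 0.0],
     [0.2, 0.2, 0.2, 0.0],
     [0.0, 0.6, 0.7, 1.0]]
print(is_valid_fuzzy_partition(U))  # True
```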

Page 70: Cluster Analysis

Fuzzy-c-Means

• Vector of prototypes (cluster centroids): V = ( v1, ..., vc ), vi ∈ R^p

Remark

The cluster centroids and the membership matrix are initialized randomly. Afterwards they are iteratively optimized.

Page 71: Cluster Analysis

Fuzzy-c-Means

ALGORITHM

1. Select an initial fuzzy partition U = (u_ik), i.e., assign values to all u_ik.
2. Repeat
3.   Compute the centroid of each cluster using the fuzzy partition.
4.   Update the fuzzy partition U = (u_ik).
5. Until the centroids do not change.

Another stopping criterion: the change in the u_ik is below a given threshold.

Page 72: Cluster Analysis

Fuzzy-c-Means

• K-Means and Fuzzy-c-Means attempt to minimize the sum of squared errors (SSE).

• In K-Means:

SSE = Σ_{i=1..K} Σ_{x ∈ C_i} dist(v_i, x)²

• In Fuzzy-c-Means:

SSE = Σ_{i=1..c} Σ_{k=1..n} u_ik^m dist(v_i, x_k)²

m ∈ [1, ∞) is a parameter (fuzzifier) that determines the influence of the weights.

Figure: an object x_k with membership degrees u_1k, u_2k, u_3k in the clusters with centroids v_1, v_2, v_3.

Page 73: Cluster Analysis

Computing Cluster Centroids

• For each cluster i = 1, ..., c the centroid is defined by

v_i = ( Σ_{k=1..n} u_ik^m x_k ) / ( Σ_{k=1..n} u_ik^m ) ,  i = 1, ..., c   (V)

• This is an extension of the definition of centroids of K-Means.

• All points are considered, and the contribution of each point to the centroid is weighted by its membership degree.

Page 74: Cluster Analysis

Update of the Fuzzy Partition (Membership Matrix)

• Minimization of the SSE subject to the constraints leads to the following update formula:

u_ik = 1 / Σ_{s=1..c} ( dist(v_i, x_k) / dist(v_s, x_k) )^{2 / (m − 1)}   (U)

Page 75: Cluster Analysis

Fuzzy-c-Means

Initialization

Determine (randomly)

• the matrix U of membership grades and
• the matrix V of cluster centroids.

Iteration

Calculate updates of

• the matrix U of membership grades with (U) and
• the matrix V of cluster centroids with (V)

until the cluster centroids are stable or the maximum number of iterations is reached.
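Putting the pieces together, a minimal NumPy sketch of the Fuzzy-c-Means loop using the update rules (V) and (U); the function name, stopping tolerance, and random initialization are illustrative assumptions:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal Fuzzy-c-Means sketch. X: (n, p) data, c: number of clusters,
    m: fuzzifier. Returns membership matrix U (c x n) and centroids V (c x p)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # random initial fuzzy partition whose columns sum to 1
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        # (V): centroids as membership-weighted means
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # (U): update memberships from the distances to the centroids
        dist = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # shape (c, n)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        exponent = 2.0 / (m - 1.0)
        U_new = 1.0 / ((dist[:, None, :] / dist[None, :, :]) ** exponent).sum(axis=1)
        if np.linalg.norm(U_new - U) < tol:
            U = U_new
            break
        U = U_new
    return U, V

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]])
U, V = fuzzy_c_means(X, c=2)
print(np.round(U, 2))
```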

Page 76: Cluster Analysis

Fuzzy-c-means

• Fuzzy-c-Means depends on the Euclidean metric ⇒ spherical clusters.

• Other metrics can be applied to obtain different cluster shapes.

• Fuzzy covariance matrix (Gustafson/Kessel 1979) ⇒ ellipsoidal clusters.

Page 77: Cluster Analysis

Cluster Validity Indexes

Page 78: Cluster Analysis

Cluster Validity Indexes

Fuzzy-c-Means requires the number of clusters as input.

Question: How can we determine the "optimal" number of clusters?

Idea: Determine the cluster partition for a given number of clusters. Then, evaluate the cluster partition by a cluster validity index.

Method: For each possible number of clusters, calculate the cluster validity index. Then, determine the optimal number of clusters.

Note: Cluster validity indexes usually do not depend on the clustering algorithm.

Page 79: Cluster Analysis

Cluster Validity Indexes

• Partition Coefficient (Bezdek 1981)

PC(c) = (1/n) Σ_{i=1..c} Σ_{k=1..n} u_ik² ,  2 ≤ c ≤ n−1

• Optimal number of clusters c∗:

PC(c∗) = max_{2 ≤ c ≤ n−1} PC(c)

Page 80: Cluster Analysis

Cluster Validity Indexes

• Partition Entropy (Bezdek 1974)

PE(c) = −(1/n) Σ_{i=1..c} Σ_{k=1..n} u_ik log₂ u_ik ,  2 ≤ c ≤ n−1

• Optimal number of clusters c∗:

PE(c∗) = min_{2 ≤ c ≤ n−1} PE(c)

• Drawback of PC and PE: only the degrees of membership are considered; the geometry of the data set is neglected.
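Both indexes are straightforward to compute from the membership matrix; a small sketch with illustrative function names, evaluated here on the example partition from the Fuzzy-c-Means slide:

```python
import numpy as np

def partition_coefficient(U):
    """Partition Coefficient (Bezdek): PC = (1/n) * sum of squared memberships.
    Larger values indicate a crisper partition."""
    U = np.asarray(U, dtype=float)
    n = U.shape[1]
    return float((U ** 2).sum() / n)

def partition_entropy(U, eps=1e-12):
    """Partition Entropy (Bezdek): PE = -(1/n) * sum of u_ik * log2(u_ik).
    Smaller values indicate a crisper partition."""
    U = np.asarray(U, dtype=float)
    n = U.shape[1]
    return float(-(U * np.log2(U + eps)).sum() / n)

U = [[0.8, 0.2, 0.1, 0.0],
     [0.2, 0.2, 0.2, 0.0],
     [0.0, 0.6, 0.7, 1.0]]
print(partition_coefficient(U), partition_entropy(U))
```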

Page 81: Cluster Analysis

Cluster Validity Indexes

• Fukuyama-Sugeno Index (Fukuyama/Sugeno 1989)

FS(c) = Σ_{i=1..c} Σ_{k=1..n} u_ik^m dist(v_i, x_k)²  −  Σ_{i=1..c} Σ_{k=1..n} u_ik^m dist(v_i, v̄)² ,

where v̄ = (1/c) Σ_{i=1..c} v_i. The first term measures the compactness of the clusters, the second term their separation.

• Optimal number of clusters c∗:

FS(c∗) = min_{2 ≤ c ≤ n−1} FS(c)

Page 82: Cluster Analysis

Application

Page 83: Cluster Analysis

Data Mining and Decision Support Systems for Landslide Events (UniBw, Geoinformatics Group: W. Reinhardt, E. Nuhn)

• Measurements (pressure values, tension, deformation vectors)

• Simulations (finite-element model)

→ Spatial Data Mining / Early Warning Systems for Landslide Events

→ Fuzzy clustering approaches (feature weighting)

Page 84: Cluster Analysis

Problem: Uncertain data from measurements and simulations

Figure: data → hard clustering → partition.

Page 85: Cluster Analysis

Fuzzy Clustering

Figure: data → fuzzy clustering → fuzzy clusters / fuzzy partition.

Page 86: Cluster Analysis

Fuzzy Clustering

Page 87: Cluster Analysis

Feature Weighting

Nuhn/Kropat/Reinhardt/Pickl: Preparation of complex landslide simulation results with clustering approaches for decision support and early warning. Submitted to the Hawaii International Conference on System Sciences (HICSS 45), Grand Wailea, Maui, 2012.

Page 88: Cluster Analysis

Thank you very much!