Cluster analysis

Upload: ryu

Post on 05-Jan-2016

Page 1: Cluster analysis

Cluster analysis

Page 2: Cluster analysis

• Partition Methods

Divide data into disjoint clusters.

• Hierarchical Methods

Build a hierarchy of the observations and deduce the clusters from it.

Page 3: Cluster analysis

K-means

Page 4: Cluster analysis

Criteria

Page 5: Cluster analysis

Same criteria with multivariate data:
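
The formula from this slide did not survive extraction. As a reference point only, the usual k-means criterion is the within-cluster sum of squares, sketched below. The symbols G (number of clusters), C_g (the observations in cluster g) and p (number of variables) are assumptions of this sketch, not notation taken from the slides. With p = 1 it reduces to the univariate case.

```latex
\min_{C_1,\dots,C_G} \; SSW
  = \sum_{g=1}^{G} \sum_{i \in C_g} \sum_{j=1}^{p} \left( x_{ij} - \bar{x}_{jg} \right)^2
  = \sum_{g=1}^{G} \sum_{i \in C_g} \left\lVert \mathbf{x}_i - \bar{\mathbf{x}}_g \right\rVert^2
```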

Page 6: Cluster analysis

Justifying the criteria

• ANOVA: decomposition of the variance.

Univariate:

SST=SSW+SSB

Multivariate:

Minimizing the within-cluster variance is equivalent to maximizing the between-cluster variance (the differences between clusters), because the total SST is fixed for a given data set.
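
The univariate identity SST = SSW + SSB can be checked numerically. The following is a minimal sketch on made-up data; the function name and the sample values are illustrative assumptions, not part of the slides.

```python
def sum_of_squares_decomposition(values, labels):
    """Return (SST, SSW, SSB) for one variable split into labelled clusters."""
    grand_mean = sum(values) / len(values)
    sst = sum((x - grand_mean) ** 2 for x in values)

    # Group the observations by cluster label.
    groups = {}
    for x, g in zip(values, labels):
        groups.setdefault(g, []).append(x)

    ssw = 0.0  # variation of observations around their own cluster mean
    ssb = 0.0  # variation of cluster means around the grand mean
    for members in groups.values():
        cluster_mean = sum(members) / len(members)
        ssw += sum((x - cluster_mean) ** 2 for x in members)
        ssb += len(members) * (cluster_mean - grand_mean) ** 2
    return sst, ssw, ssb


# Made-up data: two well-separated groups of three observations.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
sst, ssw, ssb = sum_of_squares_decomposition(values, labels)
print(sst, ssw, ssb)
```

Since SST does not depend on the labels, any relabelling that lowers SSW must raise SSB by the same amount.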

Page 7: Cluster analysis

K-means algorithm
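
The slide's own description of the algorithm is not in the transcript. As an illustration, here is a minimal sketch of the standard iteration (often called Lloyd's algorithm): assign each point to its nearest center, recompute the centers as cluster means, and repeat until the assignment stops changing. The random initialization, the function names and the sample points are assumptions of this sketch.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Return (labels, centers) after iterating until assignments are stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k random observations
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda g: squared_distance(p, centers[g]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centers would not move
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:  # an empty cluster keeps its previous center
                centers[g] = mean_point(members)
    return labels, centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = k_means(points, k=2)
print(labels, centers)
```

The result depends on the starting centers in general, so in practice the procedure is usually run from several starts and the solution with the smallest within-cluster sum of squares is kept.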

Page 8: Cluster analysis

Number of clusters

Page 9: Cluster analysis

Consequences of standardization
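
One consequence can be shown on made-up data: when the variables are measured on very different scales, the one with the largest variance dominates the Euclidean distance, and standardizing can change which observations are closest. The data, units and function names below are illustrative assumptions.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Scale every column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    centers = [mean(c) for c in columns]
    scales = [pstdev(c) for c in columns]
    return [
        tuple((x - m) / s for x, m, s in zip(row, centers, scales))
        for row in rows
    ]


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical observations: (height in metres, weight in grams).
rows = [(1.60, 60000.0), (1.95, 60500.0), (1.61, 65000.0),
        (1.80, 90000.0), (1.75, 40000.0)]

# Which of observations 1 and 2 is closer to observation 0?
raw_closest = min((1, 2), key=lambda i: euclidean(rows[0], rows[i]))
z = standardize(rows)
std_closest = min((1, 2), key=lambda i: euclidean(z[0], z[i]))
print(raw_closest, std_closest)
```

On the raw data the weight column decides everything, so observation 1 looks closest; after standardization the large height difference counts as well and observation 2 becomes the nearer one.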

Page 10: Cluster analysis

Ruspini example

Page 15: Cluster analysis

Problems of k-means

• Very sensitive to outliers

• Euclidean distances are not appropriate for elliptical clusters

• It does not give the number of clusters.

Page 16: Cluster analysis

Hierarchical Algorithms

Page 17: Cluster analysis

Agglomerative algorithms
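
The slide body is missing from the transcript. A minimal sketch of the generic agglomerative scheme, under the usual assumptions: start with every observation as its own cluster, repeatedly merge the two closest clusters, and record the distance at each merge. The nearest-neighbour distance is used here only as a default; the names and data are illustrative.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_neighbour(a, b):
    """Distance between clusters as the smallest pairwise distance."""
    return min(euclidean(p, q) for p in a for q in b)


def agglomerate(points, linkage=nearest_neighbour, n_clusters=1):
    """Merge clusters pairwise until n_clusters remain; return (clusters, merges)."""
    clusters = [[p] for p in points]
    merges = []  # (cluster_a, cluster_b, distance) in merge order
    while len(clusters) > n_clusters:
        # Naive search over every pair of current clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
clusters, merges = agglomerate(points, n_clusters=2)
print(clusters)
print([d for _, _, d in merges])
```

The list of merge distances is exactly the information a dendrogram displays as the heights of its joins.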

Page 18: Cluster analysis

Nearest neighbour distance

Page 19: Cluster analysis

Farthest neighbour distance

Page 20: Cluster analysis

Average distance

Page 21: Cluster analysis

Centroid method distance

Page 22: Cluster analysis

Ward’s method distance
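
The formulas for these five distances are not in the transcript. The sketch below uses their standard textbook definitions for two clusters given as lists of points; the function names and the example clusters are assumptions of this sketch. Ward's distance is written as the increase in the within-cluster sum of squares caused by the merge.

```python
from itertools import product


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def centroid(cluster):
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))


def nearest_neighbour(a, b):
    """Single linkage: smallest distance between a point of a and a point of b."""
    return min(euclidean(p, q) for p, q in product(a, b))


def farthest_neighbour(a, b):
    """Complete linkage: largest distance between a point of a and a point of b."""
    return max(euclidean(p, q) for p, q in product(a, b))


def average_distance(a, b):
    """Mean of all pairwise distances between the two clusters."""
    return sum(euclidean(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid_distance(a, b):
    """Distance between the two cluster means."""
    return euclidean(centroid(a), centroid(b))


def ward_distance(a, b):
    """Increase in within-cluster sum of squares if a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * centroid_distance(a, b) ** 2


a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
for f in (nearest_neighbour, farthest_neighbour, average_distance,
          centroid_distance, ward_distance):
    print(f.__name__, f(a, b))
```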

Page 23: Cluster analysis

Dendrograms

Page 24: Cluster analysis

Example

Page 32: Cluster analysis

Problems of hierarchical clustering

• Slow when n is large: each merge step compares up to n(n-1)/2 pairs.

• Euclidean distances are not always appropriate

• If n is large, the dendrogram is difficult to interpret

Page 33: Cluster analysis

Clustering by variables

Page 35: Cluster analysis

Distances between quantitative variables
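
The slide's formula is not in the transcript. One common choice, assumed here for illustration, bases the distance on the correlation coefficient r, for example d = 1 - r², so that strongly correlated variables (positively or negatively) are close and uncorrelated ones are far apart. The function names and data are hypothetical.

```python
from statistics import mean


def correlation(x, y):
    """Pearson correlation coefficient between two variables."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def variable_distance(x, y):
    """Assumed distance between variables: 1 - r^2, between 0 and 1."""
    return 1.0 - correlation(x, y) ** 2


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]    # perfectly correlated with x
w = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with x
print(variable_distance(x, y), variable_distance(x, w))
```

With a distance matrix built this way, the variables can be clustered with the same hierarchical methods used for observations.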

Page 36: Cluster analysis

Distances between qualitative variables

Page 37: Cluster analysis

Similarity between attributes
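
The coefficients shown on the slide are not in the transcript. For binary (present/absent) attributes, two widely used similarity coefficients are simple matching and Jaccard; the sketch below assumes those two and uses made-up data and hypothetical function names.

```python
def matching_counts(u, v):
    """Count the four agreement/disagreement cases for two binary attributes."""
    a = sum(1 for x, y in zip(u, v) if x and y)          # present in both
    b = sum(1 for x, y in zip(u, v) if x and not y)      # present only in u
    c = sum(1 for x, y in zip(u, v) if not x and y)      # present only in v
    d = sum(1 for x, y in zip(u, v) if not x and not y)  # absent in both
    return a, b, c, d


def simple_matching(u, v):
    """Share of cases where the two attributes agree (joint absences count)."""
    a, b, c, d = matching_counts(u, v)
    return (a + d) / (a + b + c + d)


def jaccard(u, v):
    """Share of joint presences among cases where at least one is present.

    Undefined (division by zero) when neither attribute is ever present.
    """
    a, b, c, _ = matching_counts(u, v)
    return a / (a + b + c)


u = [1, 1, 0, 0, 1]
v = [1, 0, 0, 1, 1]
print(simple_matching(u, v), jaccard(u, v))
```

A similarity s in [0, 1] can be turned into a distance, for instance as 1 - s, before applying a hierarchical method.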
