cluster analysis

Post on 28-May-2015

TRANSCRIPT

1. Cluster Analysis

Cluster analysis is a technique for dividing cases (objects) into two or more groups, so that cases in the same group are similar to one another and differ from the cases in other groups (2544: 123).

Key terms: cluster analysis, cases or objects, similarity, dissimilarity or distance, variables, cluster.

2. Cluster Analysis (2552: 286)

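Six cases, A, B, C, D, E and F, are each measured on two variables.

Table 1 (2552: 286)

Case    Variable 1 (1,000)    Variable 2
A        5                    25
B        6                    26
C       15                    34
D       16                    35
E       25                    40
F       30                    39

With two variables and six cases (n = 6), the cases fall into 3 groups:

Group 1: A and B
Group 2: C and D (34-35)
Group 3: E and F (39-40)

3. Cluster Analysis

Examples of variables: (GPAX), (IQ).

Sources: (2540), (2548). Once the cases are grouped, each cluster is described by the profile of its cases.

Size of the data: 100 cases measured on 20 variables give 2,000 values (100 x 20). Reducing the 20 variables to 3 leaves 300 values (3 x 100).

4. Cluster Analysis

5. Cluster Analysis

5.1 Cluster Analysis (2540)

5.1.1 Cluster analysis groups cases, whereas factor analysis groups variables.

5.1.4 Distance between cases (Squared Euclidean distance)

Table 2 (source: 2540) gives two cases measured on two variables: 144 and 157 on the first variable, 43 and 48 on the second. Their squared Euclidean distance is

\[ (144 - 157)^2 + (43 - 48)^2 = 13^2 + 5^2 = 194 \]

When the variables are measured in different units, they are standardized to mean 0 and standard deviation 1 before the distance is computed. Table 3 (source: 2540) gives the standardized values of the same two cases: 0.38 and 0.81 on the first variable, -0.46 and -0.11 on the second.

5.2 Cluster Analysis techniques

Two techniques are covered:

5.2.1 Hierarchical Cluster Analysis
5.2.2 K-Means Cluster Analysis

Two-Step Cluster Analysis is a further technique alongside Hierarchical Cluster Analysis and K-Means Cluster Analysis.

5.2.1 Hierarchical Cluster Analysis

Hierarchical Cluster Analysis is used to group cases when the number of cases is not large. The threshold given is 200 cases; with 200 cases or more, K-Means Cluster is used.

5.3 Similarity Measure

Clusters are formed from the distance or similarity between every pair of cases. With n cases there are \( \binom{n}{2} \) pairs of cases; when variables are grouped instead, k variables give \( \binom{k}{2} \) pairs. The measures used are distance and similarity (for example, the Pearson correlation).

Hierarchical Cluster handles 3 types of data:

1. Interval scale or ratio scale data
2. Count data
3. Binary data (2 values, 0 and 1)

5.4 Methods for Combining Clusters

Hierarchical Cluster in SPSS uses Agglomerative Hierarchical Cluster Analysis, and reports each merge in the Agglomeration Schedule. At the start every case is its own cluster, so n cases form n clusters.

Step 1: The 2 closest cases are merged into one cluster.
Step 2: Either a 3rd case joins that cluster, or 2 other cases are merged into a new cluster.
Steps 3, 4, ...: Merging continues in the same way until all cases are in 1 cluster.

SPSS offers the following methods. In the definitions, d_ij is the distance between cluster i and cluster j, d_ik the distance between cluster i and cluster k, and d_jk the distance between cluster j and cluster k.

1. Between-groups linkage (average linkage between groups, UPGMA: Unweighted Pair-Group Method Using Arithmetic Averages). The distance between two clusters is the average of the distances between all pairs of cases, one case from each cluster.

Figure 2: Average linkage (2550: 217)

2. Within-groups linkage (average linkage within groups). Clusters are merged so that the average distance between all cases in the resulting cluster is as small as possible.

3. Nearest neighbor (single linkage). The distance between two clusters is the smallest distance between a case in one cluster and a case in the other.

Figure 3: Single linkage (2550: 218)

4. Furthest neighbor (complete linkage). The distance between two clusters is the largest distance between a case in one cluster and a case in the other.

Figure 4: Complete linkage (2550: 218)

5. Centroid clustering. The distance between two clusters is the distance between their centroids (means).

Figure 5: Centroid clustering (2550: 219)

6. Median clustering. Similar to centroid clustering, except that the two clusters being merged are weighted equally when the new centroid is computed, whereas centroid clustering weights them by cluster size.

7. Ward's method. Clusters are merged so that the sum of squared within-cluster distances increases as little as possible. The within-cluster distance is the squared Euclidean distance from each case to its cluster mean.

5.2.2 K-Means Cluster Analysis

1) K-Means Clustering divides the cases into k clusters. It works by iteration: in each round a case may move from cluster i to cluster j according to its distance from the cluster centroids.

2) Data for K-Means Clustering must be on an interval scale or a ratio scale. Binary data are analyzed with Hierarchical Cluster.

3) K-Means Clustering has 4 steps:

Step 1: Divide the cases into k initial clusters.
Step 2: Compute the centroid c_i of each cluster C_i.
Step 3: Compute the ESS (Error Sum of Squares), the sum of squared distances from each case x to the centroid of its cluster:

\[ ESS = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^2 \]

Step 4: Move each case to the cluster with the nearest centroid and return to step 2, until the ESS no longer decreases.

4) Differences between Hierarchical and K-Means (2548)

1. K-Means is used when the number of cases n is large (200 or more); Hierarchical is used when the number of cases is small.
2. K-Means requires the number of clusters to be set in advance, so several values such as 3, 4 or 5 are tried; Hierarchical does not.
3. Hierarchical can standardize the variables as part of the procedure; for K-Means the variables must be standardized beforehand.
4. K-Means uses the Euclidean distance; Hierarchical offers a choice of measures.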
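The six-case example and the linkage rules above can be checked with a short script. This is a minimal sketch in plain Python, not the SPSS procedure: the names `DATA`, `sq_euclidean`, `agglomerate` and `error_sum_of_squares` are illustrative assumptions, distances are computed on the raw (unstandardized) values of Table 1, and only the single, complete and average (between-groups) linkage rules are implemented.

```python
from itertools import combinations

# Cases A-F from Table 1; the two values are the two measured variables.
DATA = {
    "A": (5, 25), "B": (6, 26), "C": (15, 34),
    "D": (16, 35), "E": (25, 40), "F": (30, 39),
}

def sq_euclidean(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cluster_distance(c1, c2, data, linkage):
    """Distance between two clusters under the chosen linkage rule."""
    d = [sq_euclidean(data[a], data[b]) for a in c1 for b in c2]
    if linkage == "single":      # nearest neighbor
        return min(d)
    if linkage == "complete":    # furthest neighbor
        return max(d)
    if linkage == "average":     # between-groups average linkage
        return sum(d) / len(d)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(data, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in sorted(data)]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(
                clusters[ij[0]], clusters[ij[1]], data, linkage),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)

def error_sum_of_squares(clusters, data):
    """Sum of squared distances from each case to its cluster centroid."""
    total = 0.0
    for members in clusters:
        points = [data[name] for name in members]
        centroid = [sum(col) / len(points) for col in zip(*points)]
        total += sum(sq_euclidean(p, centroid) for p in points)
    return total

groups = agglomerate(DATA, 3, linkage="average")
print(groups)
print(error_sum_of_squares(groups, DATA))
```

Stopping at three clusters reproduces the grouping of the example (A with B, C with D, E with F). The same three groups result from `"single"` and `"complete"` linkage on this small data set, because the three pairs are far closer to each other than to any other case.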
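Cluster Analysis and Discriminant Analysis

(2550) Cluster Analysis groups cases when the group of each case is not known in advance, whereas Discriminant Analysis is used when the group of each case is already known.

Table 4: Cluster Analysis compared with Discriminant Analysis

Using SPSS to group cases: Hierarchical Cluster and K-Means

1. Using SPSS to group cases with Hierarchical Cluster

Step 1: Check the units of the variables.

a) When the variables are measured in different units, the variable with the larger values dominates the distance between cases. Table 5 gives two cases measured on two variables, the first in units of 10,000.

Table 5

                          Case 1    Case 2
Variable 1 (10,000)       45        60
Variable 2                 2         7

Table 6: Standardized values

                          Case 1    Case 2
Variable 1 (10,000)       .707      -.707
Variable 2                -.707     .707

The squared Euclidean distance from Table 5 is

\[ (45 - 60)^2 + (2 - 7)^2 = 225 + 25 = 250 \]

Of this distance of 250, the first variable accounts for (225 / 250) x 100 = 90% and the second for only 10%.

After standardizing to Z-scores (Table 6), the squared Euclidean distance is

\[ (.707 - (-.707))^2 + (-.707 - .707)^2 = 1.999 + 1.999 = 3.998 \]

so each variable accounts for 50% of the distance.

b) Standardized variables have mean 0 and standard deviation 1. There are two ways to standardize:

1. Request standardization inside the Hierarchical Cluster procedure.
2. Create Z-score variables first: Analyze > Descriptive Statistics > Descriptives, which opens the Descriptives dialog box (Figure 6).

Figure 6: Descriptives dialog box

In Figure 6, move the variables to be used for clustering into the Variable(s) box and select Save standardized values as variables. SPSS adds the Z-scores to the data file as new variables whose names begin with Z (Figure 7).

Figure 7: Z-score variables

Step 2: Choose Analyze > Classify > Hierarchical Cluster to open the dialog box in Figure 8.

Figure 8: Hierarchical Cluster dialog box

Figure 8 has 4 parts:

Part 1: Variable(s) box. The variables used to group the cases. They must be numeric variables.
Part 2: Label Cases by. A variable whose values label each case in the output instead of the case number, for example Province. The variable placed in the Label Cases by box may be a nominal or string variable.
Part 3: Cluster. Choose Cases to group cases, or Variables to group variables.
Part 4: Display. Choose Statistics and Plots.

From Figure 8, click Statistics to open Figure 9.

Figure 9: Hierarchical Cluster Analysis: Statistics

Figure 9 has 2 parts:

Part 1: Two statistics.
  Agglomeration schedule: shows the cases or clusters merged at each step.
  Proximity matrix: a matrix of the distance or similarity between each pair of cases.
Part 2: Cluster Membership. Shows the cluster to which each case belongs.
  None: cluster membership is not shown (the default).
  Single solution: shows membership for one specified number of clusters, for example 3.
  Range of solutions: shows membership for a range of numbers of clusters.

From Figure 8, click Plots to open Figure 10.

Figure 10: Hierarchical Cluster Analysis: Plots

Figure 10 has 3 parts:

Part 1: Dendrogram. Shows how the clusters are merged, with distances rescaled to the range 1 to 25.
Part 2: Icicle. There are 3 options for icicle plots.
  All clusters: the icicle plot covers every number of clusters.
  Specified range of clusters: the icicle plot covers the numbers of clusters given by Start, Stop and By. For example, Start 3, Stop 7 and By 2 produce the icicle plot for 3, 5 and 7 clusters.
  None: no icicle plot.
Part 3: Orientation. Vertical or Horizontal icicle plot.

From Figure 8, click Method to open Figure 11.

Figure 11: Hierarchical Cluster Analysis: Method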
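The effect of standardization in the two-case example (values 45 and 60 on the first variable, 2 and 7 on the second) can be reproduced in a few lines. This is a minimal sketch in plain Python, not SPSS output: the names `raw`, `z_scores` and `contributions` are illustrative assumptions, and the Z-scores use the sample standard deviation (divisor n - 1), which is the convention that yields values of magnitude .707 for two cases.

```python
from statistics import mean, stdev

# Two cases measured on two variables (values from the example).
raw = {"variable_1": [45, 60], "variable_2": [2, 7]}

def z_scores(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def contributions(table):
    """Share of the squared Euclidean distance due to each variable."""
    squared = {name: (v[0] - v[1]) ** 2 for name, v in table.items()}
    total = sum(squared.values())
    return {name: d / total for name, d in squared.items()}

standardized = {name: z_scores(v) for name, v in raw.items()}

print(contributions(raw))           # shares on the raw scale
print(contributions(standardized))  # shares after standardizing
```

On the raw scale the first variable supplies nine tenths of the squared distance; after standardizing, the two variables contribute equally, which is the reason for standardizing before clustering.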
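Figure 11 has 4 parts:

Part 1: Cluster Method. The method used to combine clusters:
  Between-groups linkage: average linkage between groups (UPGMA)
  Within-groups linkage: average linkage within groups
  Nearest neighbor: single linkage
  Furthest neighbor: complete linkage
  Centroid clustering
  Median clustering
  Ward's method

Part 2: Measure. There are 3 types, chosen according to the data:
  Interval: for interval or ratio scale data.
  Counts: for count data.
  Binary: for data with 2 values. SPSS builds a 2 x 2 table for each pair of cases, with cells a, b, c and d.

Part 3: Transform Values. Standardizes the values, by variable or by case, before the distances are computed. Standardize is available for Interval and Counts measures.
  None: no standardization.
  Z scores: standardizes to mean 0 and standard deviation 1.
  Range -1 to 1: standardizes to values between -1 and 1.
  Range 0 to 1: standardizes to values between 0 and 1.

Part 4: Transform Measures. Transforms the proximity values computed between cases:
  Absolute values
  Change sign
  Rescale to 0-1 range: rescales the proximities to lie between 0 and 1.

From Figure 8, click Save to open Figure 12.

Figure 12: Save

Figure 12 saves the Cluster Membership of each case as new variables:
  None
  Single solution
  Range of solutions: for example, to save membership for 2, 3, 4 and 5 clusters, enter from 2 through 5.

Example 1: Hierarchical Cluster Analysis

Example 1 uses Hierarchical Cluster to group cases, using only 20 cases.

Step 1: Open the data file cars in SPSS. Only the first 20 cases are used.

Step 2: Select cases 1 to 20 with Data > Select Cases (Figure 13).

Figure 13: Select Cases

In Figure 13, choose Based on time or case range, then enter the first case in the First case box and the last case in the Last case box.

Step 3: Run Hierarchical Cluster with Analyze > Classify > Hierarchical Cluster (Figure 14).

Figure 14: Hierarchical Cluster

In Figure 14, move the 5 variables into the Variable(s) box, choose Cases under Cluster, and choose Statistics and Plots under Display.

From Figure 14, click Statistics to open Figure 15.

Figure 15: Statistics

In Figure 15, choose Agglomeration schedule, Proximity matrix and Range of solutions.

From Figure 14, click Plots to open Figure 16.

Figure 16: Plots

In Figure 16, choose Dendrogram, and under Icicle choose All clusters.

From Figure 14, click Method to open Figure 17.

Figure 17: Method

In Figure 17, set Cluster Method to Between-groups linkage. Under Measure choose Interval, because the 5 variables are ratio scale, with Squared Euclidean distance. Under Transform Values choose Z scores, By variable.

From Figure 14, click Save to open Figure 18.

Figure 18: Save

In Figure 18, choose Range of solutions.

Output

Table 7: Case Processing Summary (a)

                          Cases
Valid               Missing             Total
N     Percent       N     Percent       N     Percent
14    70.0%         6     30.0%         20    100.0%

a. Squared Euclidean Distance used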
Table 7 shows that, of the 20 cases, 6 cases have missing values, leaving 14 valid cases, or 70% (14/20).

Table 8: Proximity Matrix

Table 8 gives the Squared Euclidean Distance between each pair of cases. For example, the distance between case 1 and case 9 is 28.593, and the distance between case 1 and case 3 is 1.024. The smaller the distance, the more similar the cases: case 1 and case 3 are close to each other on the 5 variables, whereas case 1 and case 9 differ far more than case 1 and case 3 do.

Squared Euclidean Distance

Case        1:Case 1   2:Case 2   3:Case 3   4:Case 4   5:Case 5   9:Case 9   ...   13:Case 19   14:Case 20
1:Case 1      .000      6.302      1.024      2.319      1.974     28.953    ...     11.307       25.208
2:Case 2     6.302       .000      5.360      1.800      4.071     11.079    ...      3.368       10.148
3:Case 3     1.024      5.360       .000      2.603       .797     23.971    ...      8.191       19.213
4:Case 4     2.319      1.800      2.603       .000      1.952     21.153    ...      7.471       16.350
5:Case 5     1.974      4.071       .797      1.952       .000     22.681    ...      5.848       18.117
6:Case 6    18.914      6.407     14.979     13.892     13.962      1.354    ...      4.28