Cluster Analysis Grouping Cases or Variables

Upload: christian-girling

Post on 01-Apr-2015


Cluster Analysis

Grouping Cases or Variables


Clustering Cases

• Goal is to cluster cases into groups based on shared characteristics.

• Start out with each case being a one-case cluster.

• The clusters are located in k-dimensional space, where k is the number of variables.

• Compute the squared Euclidean distance between each case and every other case.


Squared Euclidean Distance

• The sum across variables (from i = 1 to v) of the squared difference between the score on variable i for the one case (Xi) and the score on variable i for the other case (Yi):

$$\sum_{i=1}^{v} (X_i - Y_i)^2$$
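As a minimal sketch, the squared Euclidean distance between two cases can be computed as below. The function name `squared_euclidean` and the two example cases are made up for illustration; they are not taken from the faculty data.

```python
def squared_euclidean(x, y):
    """Sum across variables of the squared difference between two cases' scores."""
    if len(x) != len(y):
        raise ValueError("both cases need a score on every variable")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


# Two hypothetical cases scored on v = 3 variables.
case_a = [1.0, 2.0, 3.0]
case_b = [2.0, 4.0, 6.0]

# (1-2)^2 + (2-4)^2 + (3-6)^2 = 1 + 4 + 9
d = squared_euclidean(case_a, case_b)
print(d)
```

Because the differences are squared and summed without rescaling, a variable measured in large units contributes far more to the distance than one measured in small units.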


Agglomerate

• The two cases closest to each other are agglomerated into a cluster.

• The distances between entities (clusters and cases) are recomputed.

• The two entities closest to each other are agglomerated.

• This continues until all cases end up in one cluster.
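The agglomeration steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the procedure the statistical package runs: the slides do not say how the distance between two multi-case clusters is recomputed, so the sketch assumes average linkage (the mean squared Euclidean distance over all cross-cluster pairs of cases). The function names and the toy cases are hypothetical.

```python
from itertools import combinations


def squared_euclidean(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def average_linkage(cluster_a, cluster_b, cases):
    """ASSUMED linkage rule: mean squared distance over all cross-cluster pairs."""
    total = sum(squared_euclidean(cases[i], cases[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(cases):
    """Merge the two closest entities until one cluster remains.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(cases))]  # every case starts as its own cluster
    schedule = []
    while len(clusters) > 1:
        # Recompute distances between all current entities and pick the closest pair.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], cases))
        dist = average_linkage(clusters[a], clusters[b], cases)
        schedule.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return schedule


# Five made-up cases in 2-dimensional space (k = 2 variables).
toy_cases = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [20.0, 20.0]]
schedule = agglomerate(toy_cases)
for stage, (left, right, dist) in enumerate(schedule, start=1):
    print(stage, left, right, round(dist, 3))
```

With n cases there are always n - 1 stages, and the last stage joins everything into a single cluster.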


What is the Correct Solution?

• You may have theoretical reasons to expect a certain k cluster solution.

• Look at that solution and see if it matches your expectations.

• Alternatively, you may try to make sense out of solutions at two or more levels of the analysis.


Faculty Salaries

• Subjects were faculty in Psychology at ECU.

• Variables were rank, experience, number of publications, course load, and salary.

• Data are at ClusterAnonFaculty.sav

• Also see the statistical output


Analyze, Classify, Hierarchical Cluster


Statistics


Plots


Method


Save


Proximity Matrix

• We did not request this, but if we had it would display a measure of dissimilarity for each pair of entities.

• The pair of cases with the smallest squared Euclidean distance is clustered first.


Look at the Agglomeration Schedule. Cases 32 and 33 are clustered. They are very similar (distance = 0.000).

         Cluster Combined
Stage    Cluster 1    Cluster 2    Coefficients
1        32           33           .000


Agglomeration Schedule

         Cluster Combined                          Stage Cluster First Appears
Stage    Cluster 1    Cluster 2    Coefficients    Cluster 1    Cluster 2         Next Stage
1        32           33           .000            0            0                 9
2        41           42           .000            0            0                 6
3        43           44           .000            0            0                 6
4        37           38           .000            0            0                 5
5        37           39           .001            4            0                 7
6        41           43           .002            2            3                 27

Steps 2 Through 5


Stages 2-5

• The agglomeration schedule shows that in Stage 2 cases 41 and 42 are clustered.

• In Stage 3 cases 43 and 44 are clustered.

• In Stage 4 cases 37 and 38 are clustered.

• In Stage 5 case 39 is added to the cluster that contains cases 37 and 38.

• And so on.
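One way to check a reading of the agglomeration schedule is to replay it stage by stage. The sketch below uses the first six stages from the schedule and assumes that a merged cluster keeps the label shown in the Cluster 1 column, which is how Stages 5 and 6 of the schedule read. The function name `replay` is hypothetical.

```python
# First six stages of the agglomeration schedule: (stage, cluster 1, cluster 2).
stages = [
    (1, 32, 33),
    (2, 41, 42),
    (3, 43, 44),
    (4, 37, 38),
    (5, 37, 39),
    (6, 41, 43),
]


def replay(stages):
    """Return, for each stage, the sorted case numbers in the newly formed cluster."""
    members = {}  # cluster label -> case numbers currently in that cluster
    history = []
    for stage, first, second in stages:
        # A label not seen before is a single case that has not been merged yet.
        left = members.pop(first, [first])
        right = members.pop(second, [second])
        # ASSUMPTION: the merged cluster keeps the Cluster 1 label.
        members[first] = left + right
        history.append((stage, sorted(members[first])))
    return history


history = replay(stages)
for stage, cluster in history:
    print(f"Stage {stage}: new cluster holds cases {cluster}")
```

Replaying Stage 5 gives cases 37, 38, and 39 in one cluster, and Stage 6 joins the 41-42 and 43-44 pairs into a four-case cluster.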


Vertical Icicle, Two Clusters

• Look at the top of the display (next slide).

• You can see two clusters
  – On the left, Boris through Willy
  – On the right, Deanna through Sunila

• The 2 cluster solution was adjuncts versus full time faculty.


Vertical Icicle, Three Clusters

• Look at the second highest white bar in the icicle plot.

• Now there are three clusters
  – Adjuncts
  – Junior faculty (Deanna through Mickey)
  – Senior faculty (Lawrence through Roslyn)


Vertical Icicle, Four Clusters

• Look at the white bar furthest to the right.

• Now there are four clusters
  – Adjuncts
  – Junior faculty
  – The acting chair (Lawrence)
  – The rest of the senior faculty (Catalina through Roslyn)


The Dendrogram

• At the far right you can see the two cluster solution.

• The next step to the left shows the three cluster solution.

• The next step to the left shows the four cluster solution.

• And so on.

• Truncated and rotated dendrogram on next slide.


Compare Two Clusters

• The 2 cluster solution was adjuncts versus everybody else.

• Look at the t tests in the output.

• Adjuncts had lower rank, experience, number of publications, course load, and salary.


Compare Three Clusters

• Look at the ANOVAs and plots.

• The senior faculty had higher salary, experience, rank, and number of publications.

Compare Four Clusters

• The acting chair had a higher salary and number of publications.


I Could Not Help Myself

• With these data on hand, I could not resist predicting salary from the other variables.

• Salary was well correlated with Rank, FTEs, Publications, and Experience.

• In the multiple regression, only Rank and FTEs had significant unique effects.

• The residuals suggest who was being overpaid and who underpaid.


Split by Sex

• For men, the unique effect of number of publications was positive – more publications, higher salary.

• For women it was negative – more publications, lower salary.

• Curious.


Workaholism

• Aziz & Zickar (2005)

• Workaholics may be defined as those
  – High in work involvement,
  – High in drive to work, and
  – Low in work enjoyment.

• For each case, a score was obtained for each of these three dimensions.


The Three Cluster Solution

• Workaholics
  – High work involvement
  – High drive to work
  – Low work enjoyment

• Positively engaged workers
  – High work involvement
  – Medium drive to work
  – High work enjoyment


• Unengaged workers
  – Low work involvement
  – Low drive to work
  – Low work enjoyment

• Past research/theory indicated there should be six clusters, but the theorized six clusters were not obtained.


Clustering Variables

• FactBeer.sav

• The statistical output.

• Analyze, Classify, Hierarchical Cluster


Statistics


Plots


Method


Proximity Matrix

• The proximity matrix is simply the intercorrelation matrix.

• The two most correlated variables are Color and Aroma (r = .909); they are clustered on the first step.

• Stage 2: Size and Alcohol (r = .904) are clustered.

• Stage 3: Taste is added to the cluster that already contains Color and Aroma.


Also See Other Tables & Plots

• Stage 4: Cost is added to the cluster that already contains Size and Alcohol.

• Stage 5: The two clusters are combined
  – But they are not very similar (similarity coefficient = .038)
  – Now we have one cluster with six variables and one with one (Reputation)
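When variables rather than cases are clustered, the similarity between two variables is their correlation, so the first pair to be clustered is the pair with the largest r. A minimal sketch of that first step follows. The ratings are made up for illustration and are NOT the FactBeer.sav data; the helper `pearson_r` is a hypothetical name, and the sketch takes "most correlated" to mean the largest signed r.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up ratings from five hypothetical respondents (NOT the FactBeer.sav data).
ratings = {
    "color": [1, 2, 3, 4, 5],
    "aroma": [1, 2, 3, 5, 4],
    "size": [5, 3, 4, 1, 2],
}

# Proximity matrix for variables: one correlation per pair of variables.
similarity = {
    (a, b): pearson_r(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}

# The first stage clusters the two most similar (most highly correlated) variables.
first_pair = max(similarity, key=similarity.get)
print(first_pair, round(similarity[first_pair], 3))
```

Later stages proceed as with cases, except that the entities with the largest similarity, rather than the smallest distance, are joined at each step.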