chapter 9 -- cluster analysis

Chapter 9 Cluster Analysis: Overview and Applications

Upload: anuradharai

Post on 20-Jul-2016


Page 1: Chapter 9 -- Cluster Analysis

Chapter 9
Cluster Analysis: Overview and Applications

Page 2: Chapter 9 -- Cluster Analysis

Cluster Analysis Overview

• What is it?

• Why use it?

Page 3: Chapter 9 -- Cluster Analysis

Cluster Analysis

. . . groups objects (respondents, products, firms, variables, etc.) so that each object is similar to the other objects in the cluster and different from objects in all the other clusters.

Page 4: Chapter 9 -- Cluster Analysis

Two Variable Cluster Analysis

[Figure: scatter plot showing three clusters, labeled 1, 2 and 3. Horizontal axis: Frequency of Going to Fast Food Restaurants (Low to High). Vertical axis: Frequency of Eating Out (Low to High).]

Page 5: Chapter 9 -- Cluster Analysis

Cluster Analysis of Eating Out Statements

• I eat out as often as I can.

• I eat out at fast food restaurants at least once a week.

• I prefer restaurants that have quick service.

• Eating at home is better than eating out.

• I prefer to eat at restaurants that have a nice atmosphere.

• I prefer restaurants with the highest quality food.

Statements are rated on a 7-point Agree/Disagree Scale (1 to 7).

Objective: Identify groups that maximize the ratio of

between-groups variance (large) / within-groups variance (small)
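To make the objective concrete, here is a minimal Python sketch that computes the ratio of between-groups to within-groups variation (as sums of squares) for two candidate groupings of one statement. The ratings and group labels are made-up illustrative values, not data from the chapter, and the function name `variance_ratio` is hypothetical.

```python
from statistics import mean

def variance_ratio(scores, labels):
    """Between-groups sum of squares divided by within-groups sum of squares."""
    grand_mean = mean(scores)
    groups = {}
    for score, label in zip(scores, labels):
        groups.setdefault(label, []).append(score)
    # Between: how far each group mean lies from the grand mean, weighted by group size.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within: how far each score lies from its own group mean.
    within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    return between / within

# Hypothetical 7-point ratings of "I eat out as often as I can."
ratings = [1, 2, 2, 6, 7, 7]
good_split = [0, 0, 0, 1, 1, 1]  # low raters vs. high raters
poor_split = [0, 1, 0, 1, 0, 1]  # each group mixes low and high raters

print("good split:", round(variance_ratio(ratings, good_split), 3))
print("poor split:", round(variance_ratio(ratings, poor_split), 3))
```

The grouping that separates low raters from high raters yields a much larger ratio than the mixed grouping, which is the sense in which a cluster solution "maximizes" this quantity.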

Page 6: Chapter 9 -- Cluster Analysis

Three Cluster Diagram Showing Between-Cluster and Within-Cluster Variation

Between-Cluster Variation = Maximize
Within-Cluster Variation = Minimize

Page 7: Chapter 9 -- Cluster Analysis

Scatter Diagram for Cluster Observations

[Figure: scatter diagram. Horizontal axis: Frequency of going to fast food restaurants (Low to High). Vertical axis: Frequency of eating out (Low to High).]

Page 8: Chapter 9 -- Cluster Analysis

Scatter Diagram for Cluster Observations

[Figure: scatter diagram. Horizontal axis: Frequency of going to fast food restaurants (Low to High). Vertical axis: Frequency of eating out (Low to High).]

Page 9: Chapter 9 -- Cluster Analysis

Scatter Diagram for Cluster Observations

[Figure: scatter diagram. Horizontal axis: Frequency of going to fast food restaurants (Low to High). Vertical axis: Frequency of eating out (Low to High).]

Page 10: Chapter 9 -- Cluster Analysis

Scatter Diagram for Cluster Observations

[Figure: scatter diagram. Horizontal axis: Frequency of going to fast food restaurants (Low to High). Vertical axis: Frequency of eating out (Low to High).]

Page 11: Chapter 9 -- Cluster Analysis

Comparison of Score Profiles for Factor Analysis and Hierarchical Cluster Analysis

Respondent    Variable 1    Variable 2    Variable 3
A             7             6             7
B             6             7             6
C             4             3             4
D             3             4             3

[Figure: score profiles of Respondents A, B, C and D across the three variables. Vertical axis: Score (1 to 7).]
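One way to read this table is that a distance measure pairs respondents whose scores sit at similar levels, while a correlation measure (the basis of factor analysis) pairs respondents whose profiles have the same shape. The following Python sketch checks both measures on the four profiles above; the helper names are illustrative and not part of the chapter.

```python
from itertools import combinations
from math import sqrt

# Score profiles from the table (Variables 1 to 3).
profiles = {
    "A": (7, 6, 7),
    "B": (6, 7, 6),
    "C": (4, 3, 4),
    "D": (3, 4, 3),
}

def squared_euclidean(p, q):
    """Sum of squared differences between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def pearson(p, q):
    """Pearson correlation between two profiles."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    var_p = sum((a - mp) ** 2 for a in p)
    var_q = sum((b - mq) ** 2 for b in q)
    return cov / sqrt(var_p * var_q)

for r1, r2 in combinations(profiles, 2):
    d = squared_euclidean(profiles[r1], profiles[r2])
    r = pearson(profiles[r1], profiles[r2])
    print(f"{r1}-{r2}: squared distance = {d:2d}, correlation = {r:+.2f}")
```

By squared Euclidean distance the closest pairs are A-B and C-D (3 each), so a distance-based cluster analysis would group A with B and C with D. By correlation, A and C move together perfectly, as do B and D, so a correlation-based grouping pairs them instead.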

Page 12: Chapter 9 -- Cluster Analysis

What Can We Do With Cluster Analysis?

1. Determine if statistically different clusters exist.

2. Identify the meaning of the clusters.

3. Explain how the clusters can be used.

Page 13: Chapter 9 -- Cluster Analysis

Research Design Considerations in Using Cluster Analysis:

• Outliers.
• Similarity/Distance Measures.
• Standardizing the Data.
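The standardization point can be illustrated with a short Python sketch. It assumes two variables measured on very different scales (a 7-point rating and an income in dollars); the data are hypothetical and the helper names are illustrative. Without standardization, the variable with the larger numeric range dominates a distance measure.

```python
from statistics import mean, pstdev

def standardize(column):
    """Convert a list of values to z-scores (mean 0, standard deviation 1)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]

def first_variable_share(p, q):
    """Fraction of the squared Euclidean distance between p and q due to the first variable."""
    squared_diffs = [(a - b) ** 2 for a, b in zip(p, q)]
    return squared_diffs[0] / sum(squared_diffs)

# Hypothetical respondents: a 7-point rating and an annual income in dollars.
rating = [2, 6, 7, 3]
income = [30000, 32000, 90000, 88000]

raw = list(zip(rating, income))
scaled = list(zip(standardize(rating), standardize(income)))

# Compare the first two respondents: very different ratings, similar incomes.
raw_share = first_variable_share(raw[0], raw[1])
scaled_share = first_variable_share(scaled[0], scaled[1])
print(f"rating's share of the distance, raw data:          {raw_share:.6f}")
print(f"rating's share of the distance, standardized data: {scaled_share:.6f}")
```

In the raw data the rating difference contributes almost nothing to the distance because income is measured in much larger units; after standardization the rating difference drives the distance between these two respondents.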

Page 14: Chapter 9 -- Cluster Analysis

Cluster Analysis Assumptions:

• Representative Sample.

• Minimal Multicollinearity.

Page 15: Chapter 9 -- Cluster Analysis

Three Basic Questions:

1. How to measure similarity?

2. How to form clusters? (extraction method)

3. How many clusters?

Page 16: Chapter 9 -- Cluster Analysis

Answers to First Two Basic Questions:

1. How to measure similarity?
• Distance – squared Euclidean.

2. How to form clusters?
• Hierarchical – Ward's method.
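The two answers can be sketched together in plain Python. This is a minimal, unoptimized illustration of Ward's criterion (at each step, merge the two clusters whose union adds the least to the total within-cluster error sum of squares, measured with squared Euclidean distance). It is not SPSS's implementation, the function names are hypothetical, and the input is the four respondent profiles (A to D) from the score-profile comparison slide.

```python
def centroid(points):
    """Mean of each coordinate across the points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def ess(points):
    """Error sum of squares: squared Euclidean distance of each point to the centroid."""
    c = centroid(points)
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in points)

def ward(points):
    """Merge clusters by Ward's criterion; return (clusters remaining, total ESS) per step."""
    clusters = [[p] for p in points]
    schedule = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Increase in total ESS if clusters i and j were merged.
                increase = (ess(clusters[i] + clusters[j])
                            - ess(clusters[i]) - ess(clusters[j]))
                if best is None or increase < best[0]:
                    best = (increase, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        schedule.append((len(clusters), sum(ess(c) for c in clusters)))
    return schedule

# Respondents A, B, C and D on Variables 1 to 3.
data = [(7, 6, 7), (6, 7, 6), (4, 3, 4), (3, 4, 3)]
for n_clusters, coefficient in ward(data):
    print(n_clusters, "clusters, total ESS =", round(coefficient, 3))
```

For these four profiles the total error sum of squares is 1.5 with three clusters, 3.0 with two clusters, and 30.0 once everything is forced into a single cluster. The cumulative total plays the same role as the error coefficients in an agglomeration schedule: a sharp rise at a merge signals that dissimilar groups were combined.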

Page 17: Chapter 9 -- Cluster Analysis

Third Basic Question: How many clusters?

1. Run cluster; examine solutions for two, three, four, etc. clusters.

2. Select number of clusters based on "a priori" criteria, practical judgment, common sense, theoretical foundations, and statistical significance.

Page 18: Chapter 9 -- Cluster Analysis

Steps in Cluster Analysis:

1. Identify the variables to be clustered.

2. Determine if clusters exist. To do so, verify the clusters are statistically different and theoretically meaningful (a logical name can be assigned).

3. Make an initial decision on how many clusters to use.

4. Where possible, validate clusters using an external variable.

5. Describe the characteristics of the derived clusters using demographics, psychographics, etc.

Page 19: Chapter 9 -- Cluster Analysis

Description of Employee Survey Variables

Variable  Description  Variable Type

Work Environment Measures
X1   I am paid fairly for the work I do.  Metric
X2   I am doing the kind of work I want.  Metric
X3   My supervisor gives credit and praise for work well done.  Metric
X4   There is a lot of cooperation among the members of my work group.  Metric
X5   My job allows me to learn new skills.  Metric
X6   My supervisor recognizes my potential.  Metric
X7   My work gives me a sense of accomplishment.  Metric
X8   My immediate work group functions as a team.  Metric
X9   My pay reflects the effort I put into doing my work.  Metric
X10  My supervisor is friendly and helpful.  Metric
X11  The members of my work group have the skills and/or training to do their job well.  Metric
X12  The benefits I receive are reasonable.  Metric

Relationship Measures
X13  Loyalty – I have a sense of loyalty to Samouel's restaurant.  Metric
X14  Effort – I am willing to put in a great deal of effort beyond that expected to help Samouel's restaurant to be successful.  Metric
X15  Proud – I am proud to tell others that I work for Samouel's restaurant.  Metric

Classification Variables
X16  Intention to Search  Metric
X17  Length of Time an Employee  Nonmetric
X18  Work Type = Part-Time vs. Full-Time  Nonmetric
X19  Gender  Nonmetric
X20  Age  Nonmetric
X21  Performance  Metric

Page 20: Chapter 9 -- Cluster Analysis

Description of Customer Survey Variables

Variable  Description  Variable Type

Restaurant Perceptions
X1   Excellent Food Quality  Metric
X2   Attractive Interior  Metric
X3   Generous Portions  Metric
X4   Excellent Food Taste  Metric
X5   Good Value for the Money  Metric
X6   Friendly Employees  Metric
X7   Appears Clean & Neat  Metric
X8   Fun Place to Go  Metric
X9   Wide Variety of Menu Items  Metric
X10  Reasonable Prices  Metric
X11  Courteous Employees  Metric
X12  Competent Employees  Metric

Selection Factor Rankings
X13  Food Quality  Nonmetric
X14  Atmosphere  Nonmetric
X15  Prices  Nonmetric
X16  Employees  Nonmetric

Relationship Variables
X17  Satisfaction  Metric
X18  Likely to Return in Future  Metric
X19  Recommend to Friend  Metric
X20  Frequency of Patronage  Nonmetric
X21  Length of Time a Customer  Nonmetric

Classification Variables
X22  Gender  Nonmetric
X23  Age  Nonmetric
X24  Income  Nonmetric
X25  Competitor  Nonmetric
X26  Which AD Viewed (#1, 2 or 3)  Nonmetric
X27  AD Rating  Metric
X28  Respondents that Viewed Ads  Nonmetric

Page 21: Chapter 9 -- Cluster Analysis

Using SPSS to Identify Clusters:

For this example we are looking for subgroups among all the restaurant customers using the satisfaction variables. The SPSS click through sequence is: Analyze → Classify → Hierarchical Cluster. This will take you to a dialog box where you select and move variables X17, X18 and X19 into the "Variables" box. Now look at the other options below. We will use all the defaults shown on the dialog box as well as the defaults for the Statistics and Plots options below. Next click on the Method box and select Ward's under Cluster Method (it is the last one and you must scroll down). Squared Euclidean Distances is the default under Measure and we will use it. At this point we will not need the Save option, so click on "OK" to run the program.

When the program finishes, look for a table called Agglomeration Schedule. There are lots of numbers in it, but we only use the numbers in the Coefficients column (middle of table). At the bottom of the agglomeration schedule table find the numbers in the Coefficients column. The number at the bottom will be the largest. As you move up the column the numbers (error coefficients) get smaller. For example, the bottom number is 834.255 and the one right above it is 282.850.
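A common heuristic for reading these coefficients is to look at the relative jump between adjacent rows. The short Python sketch below does this for the two values quoted above. It assumes, as is usual for an agglomeration schedule, that the bottom row corresponds to merging everything into one cluster and the row above it to the two-cluster solution; the variable and function names are illustrative.

```python
def percent_increase(previous, current):
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100

two_cluster_coefficient = 282.850  # row just above the bottom of the Coefficients column
one_cluster_coefficient = 834.255  # bottom row of the Coefficients column

jump = percent_increase(two_cluster_coefficient, one_cluster_coefficient)
print(f"Moving from two clusters to one raises the error coefficient by {jump:.1f}%")
```

The coefficient nearly triples at the final merge, which suggests the two clusters being combined are quite different from each other. The same calculation can be repeated up the column (two vs. three clusters, three vs. four, and so on) to see where the increases level off.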

Page 22: Chapter 9 -- Cluster Analysis

Dialog Boxes for SPSS Cluster

Page 23: Chapter 9 -- Cluster Analysis

Error Coefficients for Cluster Solution

[Figure label: Error Coefficients]

Page 24: Chapter 9 -- Cluster Analysis

New Cluster Variables

New two-group and three-group variables.

Page 25: Chapter 9 -- Cluster Analysis

Cluster Analysis Learning Checkpoint

1. Why might we use cluster analysis?

2. What are the three major steps in cluster analysis?

3. How do you decide how many clusters to extract?

4. Why do we validate clusters?