Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses By Dean Abbott Abbott Analytics http://www.abbottanalytics.com Salford Systems Data Mining 2006 March 27-31 2006 San Diego, CA


Page 1: Survey Analysis: Data Mining versus Standard Statistical ...docs.salford-systems.com/DeanAbbott.pdf · Prect key attitudes that are consequents Satisfaction Recoento a Friend Intento

Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses

By Dean Abbott, Abbott Analytics

http://www.abbottanalytics.com

Salford Systems Data Mining 2006, March 27-31, 2006

San Diego, CA


2© Abbott Analytics, 2000-2006

Acknowledgements

Work done under contract with Seer Analytics
– Subcontractors: Tessar and Associates (now Mobile Foundry), Abbott Consulting (now Abbott Analytics)

Seer Analytics, LLC
518 North Tampa Street
Tampa, FL 33602
813-318-0111
http://www.seeranalytics.com
"we help you see what's there."

http://www.mobilefoundry.net/


About Abbott Analytics

Abbott Analytics
– Founded in 1999, based in San Diego, CA
– Dedicated to data mining consulting and training

Principal: Dean Abbott
– Applied Data Mining for 19+ years in Direct Marketing, CRM, Survey Analysis, Tax Compliance, Fraud Detection, Predictive Toxicology, Biological Risk Assessment

Course Instruction
– Public 2-day Data Mining Courses
– Conference Tutorials
– Customized Training and Knowledge Transfer
– Data mining methodology (CRISP-DM)
– Training services for software products, including CART, Clementine, Affinium Model, Insightful Miner


Talk Outline

Member survey
– Survey description
– Results using statistical modeling
– Lessons learned

Employee survey
– Survey description
– Results using decision trees (CART)
– Lessons learned


Problem Setup: Member Survey

Question:
– What are the characteristics of members who indicated the highest overall satisfaction with their Club?

Data:
– 32,811 records containing survey answers
– No demographic data except what was on survey (marital status, children, age, gender)

Approach:
– Create supervised learning models with target variable “overall_satisfaction = 1”


Data Preparation

Begin with 57 candidate inputs to model
– All survey questions are multiple choice
– Treated as categories, not numbers
– Typically 6 categories per question (answers 1-5, plus unknown)
– Unknown initially coded as “0”
– No text comment fields included as inputs to model

Create new column for target variable
– If overall_satisfaction = 1, variable value = 1; otherwise, variable value = 0

Data very clean with respect to missing data
– Only needed to recode # children fields
– Number missing: 11,006 children < 6; 10,701 children 6-12; 10,873 children 13-17; 4,936 children (overall)
– When missing, recoded values with “-1” to indicate missing
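The target definition and the children-field recode above can be sketched in a few lines of Python. This is an illustration, not the study's code: a survey record is assumed to be a dict in which a blank answer is `None` or absent, and the four `children_*` field names are hypothetical stand-ins for the real column names.

```python
# Sketch of the member-survey preparation steps; the children_* field names
# are hypothetical stand-ins for the survey's real column names.
MISSING_CODE = -1  # recode value for a missing "# children" answer

CHILD_FIELDS = (
    "children_under_6",
    "children_6_12",
    "children_13_17",
    "children_overall",
)


def prepare(record):
    """Return a copy of one survey record with a binary target and recoded child counts."""
    out = dict(record)
    # Target column: 1 for the top answer to overall satisfaction, 0 for everything else
    out["target"] = 1 if record.get("overall_satisfaction") == 1 else 0
    # Missing child counts get an explicit -1 code instead of being dropped
    for field in CHILD_FIELDS:
        if record.get(field) is None:
            out[field] = MISSING_CODE
    return out


rows = [
    {"overall_satisfaction": 1, "children_under_6": 2},
    {"overall_satisfaction": 3, "children_under_6": None},
]
prepared = [prepare(r) for r in rows]
print([p["target"] for p in prepared])            # [1, 0]
print([p["children_under_6"] for p in prepared])  # [2, -1]
```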


Member Survey Question Categories


Sampling

Begin with 32,811 responses
– Set aside about half for validation (not used during modeling): 16,379 records
– These records will be used to provide final summaries of the segments

16,433 records used in creating and scoring model
– 5,059 had overall satisfaction = 1 (30.8%)
– Model 1 splits data into training and testing data: 2/3 for training (creating model), 1/3 for testing (scoring and ranking models)
– Approximately 11,503 for training; 4,930 for testing
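The sampling scheme can be sketched in Python as shown below. It is an illustration under stated assumptions, not the tooling used in the study. Records are shuffled once with a fixed seed and split by simple random sampling; the slides don't say whether the split was stratified. Note that 11,503 / 16,433 is almost exactly 0.70, so the counts on the slide reflect a 70/30 split rather than 2/3 to 1/3. For that reason `train_frac` is left as a parameter.

```python
import random


def split_records(records, holdout_frac=0.5, train_frac=2 / 3, seed=0):
    """Shuffle once, set aside a validation holdout, then split the rest into train/test."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)

    # Holdout: never touched during modeling, used only for final segment summaries
    n_holdout = round(len(shuffled) * holdout_frac)
    holdout, modeling = shuffled[:n_holdout], shuffled[n_holdout:]

    # Remaining records: train builds the model, test scores and ranks models
    n_train = round(len(modeling) * train_frac)
    train, test = modeling[:n_train], modeling[n_train:]
    return train, test, holdout


# Survey responses stand in as integer ids here
train, test, holdout = split_records(range(32_811))
print(len(train), len(test), len(holdout))
```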


Relationship of Overall Satisfaction to Recommend to Friends

[Figure: cross tab of Overall satisfaction (x-axis, 0-4) against Recommend to Friend (y-axis, 0-5)]

• Of the 4912 / 16739 (30.2%) with Overall Satisfaction = 1
– 86% have Recommend to Friends = 1
• Of the 8708 / 16739 (54%) with Recommend to Friends = 1
– 49% have Overall Satisfaction = 1
• 4227 / 16739 (26.0%) have both overall satisfaction and recommend to friends equal to 1
– This is the biggest bin of the cross tab, followed by:
– Overall = 2 / recommend = 2 (24%; 3890 / 16739)
– Overall = 2 / recommend = 1 (22%; 3565 / 16739)
– No other bin greater than 5% of records
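The 86% and 49% figures are two different conditional percentages of the same cell (4,227 respondents), read once along its row and once along its column. Here is a short Python check using the counts from the slide:

```python
def conditional_pct(joint_count, group_count):
    """Percent of a group (one row or column of the cross tab) that falls in a given cell."""
    return 100.0 * joint_count / group_count


both = 4227   # overall satisfaction = 1 and recommend to friends = 1
sat_1 = 4912  # all respondents with overall satisfaction = 1
rec_1 = 8708  # all respondents with recommend to friends = 1

print(f"{conditional_pct(both, sat_1):.0f}%")  # 86%
print(f"{conditional_pct(both, rec_1):.0f}%")  # 49%
```

The asymmetry is expected. Many more respondents give the top answer to "recommend" than to "overall satisfaction", so the same cell is a much smaller share of the larger group.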


Objective and Data Challenges

Project Objective
– Interpret results of survey for large health club (not a predictive model)

Challenges
– Missing data (some questions either N/A or blank)
– Solution: Impute values that least affect the information communicated by the question (not a mean or median!)

– Answers (target variables) highly correlated with one another
– Multicollinearity makes interpretation of results problematic
– Must reduce dimensionality without losing interpretation of results
– Solution: Factor analysis

– Target variable
– Three questions pointed to the important actionable information (related to how satisfied members were)
– Solution: combine all three into a new “index of excellence”


Data Preprocessing Approach

Reduce input data (for understanding)
– Use factor analysis to identify groupings of variables that are interesting.
– Factors can be candidate inputs to models, but didn’t work as well on this data
– Selected as inputs the variables with the highest loadings, as representatives of each factor
– Also retained key questions in addition to the factor analysis representative questions
– The effect is to remove questions “too highly” correlated with one another, while maintaining relevant information for modeling.
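The "highest loading per factor" selection can be sketched in Python. The sketch assumes the factor analysis has already been run and its loadings are available as nested dicts. The loadings and the key-question ids below are hypothetical, not the study's values. Absolute loadings are compared, so a strongly negative loading also counts as representative.

```python
def representatives(loadings):
    """Pick, for each factor, the question with the largest absolute loading.

    loadings maps factor name -> {question id: loading on that factor}.
    """
    return {
        factor: max(q_loads, key=lambda q: abs(q_loads[q]))
        for factor, q_loads in loadings.items()
    }


# Hypothetical loadings, for illustration only (not the study's values)
loadings = {
    "factor1": {"Q2": 0.41, "Q5": 0.83, "Q9": 0.62},
    "factor2": {"Q13": 0.35, "Q17": -0.78, "Q20": 0.51},
}
key_questions = ["QK1", "QK2", "QK3"]  # hypothetical ids for the retained key questions

# Model inputs: the key questions plus one representative question per factor
inputs = key_questions + sorted(representatives(loadings).values())
print(inputs)  # ['QK1', 'QK2', 'QK3', 'Q17', 'Q5']
```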


Predictive Modeling Approach

[Flowchart: 60+ Survey Questions → Identify Key Questions (3 questions with high association with target) and Factor Analysis: 10 factors (10 factors, or variables that loaded highest on each factor) → Regression Model: Find Significant Variables (13 fields down to 7) → Variable ranks]


Factor Analysis: Making the Complex Simple

[Figure: bar chart of loadings for Factor1 through Factor10, plus "Top Question Loadings" bar charts of loading values for Factor 1 (Q2 through Q12) and Factor 2 (Q12 through Q23)]


Member Survey Factor Analysis Loadings


Reduce Variables using Regression

Already beginning with only 13 variables
– Question: how many of these are useful predictors?
– Decided to retain 5 factors for final model

[Figure: Regression Rankings of Questions/Factors. Bar chart of regression coefficient (y-axis, 0 to 0.6) for each question/factor (x-axis), in order: Q44, Q22, Q25, factor3.2, factor3.9, factor3.1, factor3.4, factor3.3, factor3.8, factor3.10, factor3.6, factor3.5, factor3.7]


Explaining Results Through Visualization

Customer was not interested in “techno” solutions

Customer was interested in what actions could be taken as a result of the data mining models
– Which characteristics are most correlated with best customers?
– What do they like and dislike about the club?
– Is it equipment? relationships? facility? staff?
– Show key contributors, how each club compared with other club locations, and whether the club is improving


Key: Explaining Results

Visualization shows key variables in survey associated with “excellence”, and performance metrics for each club
– How well did this club do?
– What is the change over last year’s result?
– Shows which attributes the club needs to improve to raise customer satisfaction.

[Figure: Drivers of Satisfaction, with panels for relationships, facility, equipment, Staff 1, Staff 2, goals, and value]


So What’s The Problem with That?

Regression and neural networks are “global” estimators
– They operate over the entire data space
– Regression descriptors (coefficients) represent average influence
– Neither technique provides explicit localized characteristics

Customer would like actionable analytics
– Clear characteristics of subgroups
– Different strategies for subgroups

Conclusion: In Round 2 (Employee Survey), use another approach


Employee Survey Analysis Problem Setup

Very similar to member survey
– 60+ questions
– Few demographics
– Attitudes about the job

How to handle questions
– They are ordinal, but CART® supports interval and nominal types
– Treat as categorical, but make sure values aren’t split up: if a split on a question groups non-adjacent values (e.g., 1, 2, 4 on one side), rebuild the question as an interval variable
– This didn’t happen, though; all worked out well
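The check described above can be automated. What follows is a minimal sketch that assumes a split is reported as the set of answer codes sent to one branch. That input format is hypothetical and is not CART® output. A split respects the answer order only when one branch holds exactly the lowest codes or exactly the highest codes.

```python
def is_threshold_split(left_codes, all_codes):
    """True if a categorical split on an ordinal question respects the answer order.

    The order is respected when one branch holds exactly the lowest k codes or
    the highest k codes, e.g. {1, 2} vs {3, 4, 5}. A branch such as {1, 2, 4}
    breaks the order and would prompt rebuilding the question as an interval variable.
    """
    ordered = sorted(set(all_codes))
    left = set(left_codes)
    k = len(left)
    if not 0 < k < len(ordered):  # an empty or all-codes branch is not a real split
        return False
    return left == set(ordered[:k]) or left == set(ordered[-k:])


scale = range(1, 6)  # answers coded 1-5
print(is_threshold_split({1, 2}, scale))     # True
print(is_threshold_split({1, 2, 4}, scale))  # False
```

Checking only whether a branch is contiguous would not be enough. For example, {2, 3} is contiguous, but the other branch {1, 4, 5} still breaks the order.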


Employee Survey Question Groupings


Employee Survey: Target Variable Definition

Predict key attitudes that are consequents
– Satisfaction
– Recommend to a Friend
– Intend to Work Next Year at Club
– Club is Good Place to Work

Exclude these from each other’s models
– They are highly correlated with each other
– Models that predict a target variable with these as inputs are not actionable

Key predictors: questions relating to
– Communications with management
– Quality of supervisors
– Training received
– Effectiveness of club
– Fairness of policies
– Perceived member attitudes


Employee Satisfaction (=1) Model: Data Information

File: modeling data with binarized dependents w missing.txt

Target Variable: Q1_1

Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65

Class | N Cases | Pct Cases
0 | 4,645 | 76.0%
1 | 1,470 | 24.0%


Employee Satisfaction Model: Performance

Node | Cases Tgt. Class | % of Node Tgt. Class | % Tgt. Class | Cum % Tgt. Class | Cum % Pop | % Pop | Cases in Node | Cum Lift | Lift
8 | 859 | 60.75 | 58.44 | 58.44 | 23.12 | 23.12 | 1,414 | 2.53 | 2.53
4 | 95 | 43.58 | 6.46 | 64.90 | 26.69 | 3.57 | 218 | 2.43 | 1.81
7 | 201 | 42.23 | 13.67 | 78.57 | 34.47 | 7.78 | 476 | 2.28 | 1.76
3 | 30 | 17.44 | 2.04 | 80.61 | 37.29 | 2.81 | 172 | 2.16 | 0.73
5 | 92 | 14.38 | 6.26 | 86.87 | 47.75 | 10.47 | 640 | 1.82 | 0.60
6 | 14 | 13.86 | 0.95 | 87.82 | 49.40 | 1.65 | 101 | 1.78 | 0.58
2 | 124 | 10.12 | 8.44 | 96.26 | 69.44 | 20.03 | 1,225 | 1.39 | 0.42
1 | 55 | 2.94 | 3.74 | 100.00 | 100.00 | 30.56 | 1,869 | 1.00 | 0.12

Class | N Cases | N Misclassified | Pct. Misclassified
0 | 4,645 | 953 | 20.52
1 | 1,470 | 315 | 21.43
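Lift in the table above is a node's target rate divided by the overall target rate (1,470 / 6,115, about 24%). Cumulative lift applies the same ratio to all nodes taken so far, with nodes ranked from the highest target rate downward. Here is a minimal Python sketch that recomputes both columns from the node counts in the table:

```python
def gains_table(nodes):
    """Per-node and cumulative lift for (node_id, target_cases, node_cases) tuples.

    Nodes are ranked by target rate, highest first, as in a gains table.
    """
    total_target = sum(t for _, t, _ in nodes)
    total_cases = sum(n for _, _, n in nodes)
    base_rate = total_target / total_cases  # share of all cases in the target class

    rows, cum_target, cum_cases = [], 0, 0
    for node_id, target, cases in sorted(nodes, key=lambda r: r[1] / r[2], reverse=True):
        cum_target += target
        cum_cases += cases
        rows.append({
            "node": node_id,
            "lift": (target / cases) / base_rate,
            "cum_lift": (cum_target / cum_cases) / base_rate,
        })
    return rows


# (terminal node, highly satisfied cases, cases in node) from the table above
employee_satisfaction = [
    (1, 55, 1869), (2, 124, 1225), (3, 30, 172), (4, 95, 218),
    (5, 92, 640), (6, 14, 101), (7, 201, 476), (8, 859, 1414),
]
for row in gains_table(employee_satisfaction):
    print(row["node"], f"{row['lift']:.2f}", f"{row['cum_lift']:.2f}")
```

By construction, the last cumulative lift is always 1.00, because at that point all records are included.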


Employee Satisfaction Model: Splits

• Q8: Feel Welcome
– Surrogates: Q27 (family friendly), Q28 (inclusive environment), Q18 (good working conditions)
– Q18: Good working conditions
  – Surrogate: Q17 (necessary support/materials to do job)
• Q3: Feeling of accomplishment
– Surrogates: Q6 (responsibilities good fit with interests/skills)
– Q7: Staff Competent
  – Surrogates: Q15 (supervisor lets know work is appreciated), Q33 (trust management to take interests into account), Q5 (good opportunities for professional growth)

[Figure: tree diagram with the root split on Q8, further splits on Q18, Q3 (twice), Q7, Q32, and Q36, and terminal nodes 1 through 8]


Employee Satisfaction: Q8 Split (root node)

Rank | Competitor | Split | Improvement
Winner | Q8 | 1 | 0.1174
1 | Q18 | 1 | 0.1169
2 | Q3 | 1 | 0.0998
3 | Q35 | 1 | 0.0957
4 | Q6 | 1 | 0.0951
5 | Q7 | 1,2 | 0.094

[Figure label: Strongly agree feel welcome]


Employee Satisfaction: Q18 Split (right side of root)

This is the best terminal node for satisfaction

[Figure label: Strongly agree feel welcome]

Rank | Competitor | Split | Improvement
Winner | Q18 | 1 | 0.0271
1 | Q3 | 1 | 0.0203
2 | Q35 | 1 | 0.0195
3 | Q6 | 1 | 0.0177
4 | Q14 | 1,5 | 0.0172
5 | Q13 | 1,5 | 0.0167


Employee Satisfaction Model: Key Variables

Variable | Score
Q18 | 100
Q8 | 81.02
Q14 | 72.03
Q27 | 55.11
Q26 | 50.53
Q28 | 50.12
Q5 | 17.66
Q3 | 14.14
Q17 | 14.05
Q11 | 13.15
Q7 | 11.89
Q13 | 11.56
Q6 | 11.27
Q33 | 11.03
Q16 | 9.6

Primary splitters only:
Variable | Score
Q8 | 100
Q18 | 23.11
Q3 | 17.46
Q7 | 14.68
Q36 | 2.88
Q32 | 2.68
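Both tables above are relative scores in which the top variable gets 100. CART-style tools commonly build such scores by summing each variable's split improvements and then rescaling. Whether surrogate splits are counted changes the ranking, which would be consistent with Q18 leading one list and Q8 the other. The Python sketch below illustrates that idea with made-up numbers. It is an assumption about the general approach, not a reproduction of CART®'s exact formula, and the `(variable, improvement, is_primary)` input format is hypothetical.

```python
from collections import defaultdict


def importance(splits, primary_only=False):
    """Sum split improvements per variable, then scale so the top variable scores 100.

    splits: (variable, improvement, is_primary) tuples, one per primary or
    surrogate split in the tree (a hypothetical input format).
    """
    totals = defaultdict(float)
    for var, improvement, is_primary in splits:
        if is_primary or not primary_only:
            totals[var] += improvement
    top = max(totals.values())
    return {var: round(100 * v / top, 2) for var, v in totals.items()}


# Hypothetical improvements, for illustration only
splits = [
    ("Q8", 0.12, True),    # primary split
    ("Q18", 0.03, True),   # primary split
    ("Q18", 0.10, False),  # Q18 acting as a surrogate
    ("Q27", 0.06, False),  # surrogate only
]
print(importance(splits))                     # {'Q8': 92.31, 'Q18': 100.0, 'Q27': 46.15}
print(importance(splits, primary_only=True))  # {'Q8': 100.0, 'Q18': 25.0}
```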



Employee Satisfaction Model: Key Rules

/*Rules for terminal node 8*/
Matches:
• 1,414 surveys (23.1%)
• 859 highly satisfied (60.8%)
• 58.4% of all highly satisfied

RULE: If (Q18 = 1 and Q8 = 1) Then Highly Satisfied
P(0) = 0.39; P(1) = 0.61; Lift 2.5

If strongly agree that there are good working conditions and strongly agree that feel welcome, then highly satisfied

/*Rules for terminal node 7*/
Matches:
• 476 surveys (7.8%)
• 201 highly satisfied (42.2%)
• 13.7% of all highly satisfied

RULE: If (Q8 = 1 and Q18 <> 1 and Q3 = 1 and Q32 = 1 or 2) Then Highly Satisfied
P(0) = 0.58; P(1) = 0.42; Lift 1.8

If strongly agree that feel welcome and strongly agree working at the club gives a feeling of personal accomplishment, and agree management will take interests into account, even if don’t strongly agree there are good working conditions, then highly satisfied

/*Rules for terminal node 4*/
Matches:
• 218 surveys (3.6%)
• 95 highly satisfied (43.6%)
• 6.5% of all highly satisfied

RULE: If (Q8 <> 1 and Q7 = 1 or 2 and Q3 = 1 and Q36 = 1 or 2) Then Highly Satisfied
P(0) = 0.56; P(1) = 0.44; Lift 1.8

If agree that I’ll be recognized for doing a good job, and strongly agree working at the club gives a feeling of personal accomplishment, and agree that I am paid fairly, even if don’t strongly agree feel welcome, then highly satisfied
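Each rule is a plain conjunction of conditions, so the three "highly satisfied" rules can be applied directly to flag new survey responses. The Python sketch below assumes that answers arrive as a dict keyed by question id, coded 1 = strongly agree and 2 = agree. Unlike the fitted tree, it does not use surrogate splits to handle missing answers. The lift helper reproduces the node 8 figure from the counts on this slide: 859 of 1,414 in the node versus 1,470 of 6,115 overall.

```python
def satisfied_rule_node(answers):
    """Return the terminal node (8, 7 or 4) whose 'highly satisfied' rule matches, else None.

    answers maps question ids to coded answers (1 = strongly agree, 2 = agree);
    every question used below is assumed to be answered.
    """
    q = answers
    if q["Q8"] == 1 and q["Q18"] == 1:
        return 8
    if q["Q8"] == 1 and q["Q18"] != 1 and q["Q3"] == 1 and q["Q32"] in (1, 2):
        return 7
    if q["Q8"] != 1 and q["Q7"] in (1, 2) and q["Q3"] == 1 and q["Q36"] in (1, 2):
        return 4
    return None


def rule_lift(node_target, node_cases, total_target, total_cases):
    """Lift = target rate inside the node divided by the overall target rate."""
    return (node_target / node_cases) / (total_target / total_cases)


survey = {"Q8": 1, "Q18": 2, "Q3": 1, "Q32": 2, "Q7": 1, "Q36": 3}
print(satisfied_rule_node(survey))                 # 7
print(round(rule_lift(859, 1414, 1470, 6115), 1))  # 2.5
```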


Employee Satisfaction Model: Unsatisfied Rules

/*Rules for terminal node 1*/
Matches:
• 1,869 surveys (30.6%)
• 55 highly satisfied (2.9%)
• 3.7% of all highly satisfied
• 39.0% of all not highly satisfied

RULE: If (Q8 <> 1 and Q7 <> 1 or 2) Then not highly satisfied
P(0) = 0.96; P(1) = 0.04; Lift 0.12

If don’t strongly agree that feel welcome and don’t agree that will be properly recognized for a good job, then not highly satisfied.

/*Rules for terminal node 5*/
Matches:
• 640 surveys (10.5%)
• 92 highly satisfied (14.4%)
• 6.3% of all highly satisfied
• 11.8% of all not highly satisfied

RULE: If (Q8 = 1 and Q18 <> 1 and Q3 <> 1) Then not highly satisfied
P(0) = 0.86; P(1) = 0.14; Lift 0.60

If don’t strongly agree that there are good working conditions and work doesn’t give a feeling of accomplishment, even though strongly agree that feel welcome, then not highly satisfied.

/*Rules for terminal node 2*/
Matches:
• 1,225 surveys (20.0%)
• 124 highly satisfied (10.1%)
• 8.4% of all highly satisfied
• 23.7% of all not highly satisfied

RULE: If (Q8 <> 1 and Q7 = 1 or 2 and Q3 <> 1) Then not highly satisfied
P(0) = 0.90; P(1) = 0.10; Lift 0.42

If don’t strongly agree that feel welcome and work doesn’t give a feeling of accomplishment, even though I agree that I will be properly recognized for a good job, then not highly satisfied.


Recommend to Friend (=1) Model: Data Information

File: modeling data with binarized dependents w missing.txt

Target Variable: Q44_1

Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q19, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65

Class | N Cases | Pct
0 | 3,958 | 64.7%
1 | 2,157 | 35.3%

This model includes Q19 (am treated with respect), and is the best model to report


Recommend to Friend Model Performance

Class | N Cases | N Misclassified | Pct. Misclassified
0 | 3,958 | 894 | 22.59
1 | 2,157 | 525 | 24.34

Node | Cases Tgt. Class | % of Node Tgt. Class | % Tgt. Class | Cum % Tgt. Class | Cum % Pop | % Pop | Cases in Node | Cum Lift | Lift
10 | 1,113 | 71.90 | 51.60 | 51.60 | 25.32 | 25.32 | 1,548 | 2.04 | 2.04
9 | 110 | 58.51 | 5.10 | 56.70 | 28.39 | 3.07 | 188 | 2.00 | 1.66
5 | 198 | 56.57 | 9.18 | 65.88 | 34.11 | 5.72 | 350 | 1.93 | 1.60
4 | 128 | 49.81 | 5.93 | 71.81 | 38.32 | 4.20 | 257 | 1.87 | 1.41
8 | 83 | 45.36 | 3.85 | 75.66 | 41.31 | 2.99 | 183 | 1.83 | 1.29
3 | 215 | 29.49 | 9.97 | 85.63 | 53.23 | 11.92 | 729 | 1.61 | 0.84
7 | 36 | 24.83 | 1.67 | 87.30 | 55.60 | 2.37 | 145 | 1.57 | 0.70
2 | 132 | 15.60 | 6.12 | 93.42 | 69.44 | 13.84 | 846 | 1.35 | 0.44
6 | 12 | 14.12 | 0.56 | 93.97 | 70.83 | 1.39 | 85 | 1.33 | 0.40
1 | 130 | 7.29 | 6.03 | 100.00 | 100.00 | 29.17 | 1,784 | 1.00 | 0.21
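The Pct. Misclassified column gives each class's error rate separately. This lets the two classes be compared even though the class split is 65/35. Here is a short Python check using the recommend-model counts. The overall error it also prints is an extra figure that does not appear on the slide.

```python
def per_class_error(n_cases, n_misclassified):
    """Percent of each class's cases that the tree assigns to the other class."""
    return {cls: 100 * n_misclassified[cls] / n for cls, n in n_cases.items()}


recommend = per_class_error({0: 3958, 1: 2157}, {0: 894, 1: 525})
overall = 100 * (894 + 525) / (3958 + 2157)

print({cls: round(pct, 2) for cls, pct in recommend.items()})  # {0: 22.59, 1: 24.34}
print(round(overall, 1))                                       # 23.2
```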


Recommend to Friend Model Splits

Q19: Treated with respect
– Surrogates: Q18 (good working conditions) and Q8 (feel welcome)
Q37: Compensation practice is fair
– Surrogates: Q36 (I am paid fairly)
Q45: How think members rate club
– Surrogates: Q47, Q46, Q60 (member-cleanliness, enough equip., check on progress)
Q33: Trust management to take interests into account
– Surrogates: Q32 (management keeps promises), Q34 (leaders remove roadblocks to inclusion)
Q5: Good opportunities for professional growth
– Surrogates: Q4 (responsibilities good fit with interests), Q7 (appropriately recognized)
Q8: Feel welcome
– Surrogates: Q7

[Figure: tree diagram with the root split on Q19, further splits on Q37, Q45 (twice), Q33, Q5, Q8, Q50, and Q35, and terminal nodes 1 through 10]


Recommend to Friend Model Key Variables

Variable | Score
Q8 | 100.0
Q19 | 99.1
Q18 | 97.4
Q15 | 64.5
Q16 | 63.1
Q14 | 61.3
Q33 | 39.6
Q35 | 33.8
Q32 | 24.7
Q34 | 23.9
Q31 | 23.9
Q9 | 21.5
Q7 | 15.4
Q45 | 14.8
Q37 | 12.9
Q5 | 10.0
Q36 | 9.7
Q4 | 4.3
Q38 | 4.0
Q22 | 1.6
Q50 | 1.4
Q26 | 1.0
Q48 | 0.8
Q47 | 0.7
Q28 | 0.6
Q46 | 0.6
Q11 | 0.3
Q51 | 0.3
Q60 | 0.1
Q49 | 0.0

Primary splitters only:
Variable | Score
Q19 | 100
Q33 | 32.23
Q45 | 14.94
Q37 | 12.99
Q5 | 8.98
Q8 | 3.03
Q35 | 1.67
Q50 | 1.34



Recommend to Friend Model: Key Rules

/* Rules for terminal node 10 */

Matches:
• 1,548 surveys (25.3%)
• 1,113 recommend (71.9%)
• 51.6% of all strong recommends

RULE: If (Q19 = 1 and Q37 in {1, 2}) then Recommend = 1

P(0) = 0.281; P(1) = 0.719; Lift = 2.0

If I strongly agree that supervisors treat me with respect, and agree that the compensation practice is fair, then I strongly agree that I will recommend the club to a friend.
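Each terminal-node rule is a predicate on a single response, and the Matches bullets and the P(1)/Lift line are counts over the respondents that satisfy it. A minimal sketch, assuming codes 1 = strongly agree and 2 = agree, and a binarized target named `recommend` (1 = strongly agree would recommend). The field names, helper names, and the eight rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Response = Dict[str, int]


@dataclass
class RuleStats:
    matches: int      # respondents satisfying the rule
    coverage: float   # share of all respondents matched
    p1: float         # P(1) inside the node
    recall: float     # share of all target-class respondents captured
    lift: float       # p1 divided by the overall target rate


def node10_rule(r: Response) -> bool:
    # Q19 = 1 (strongly agree, treated with respect) and Q37 in {1, 2} (agree, pay practice fair)
    return r["Q19"] == 1 and r["Q37"] in (1, 2)


def rule_stats(rows: List[Response], rule: Callable[[Response], bool],
               target: str = "recommend") -> RuleStats:
    matched = [r for r in rows if rule(r)]
    if not matched:
        raise ValueError("rule matches no respondents")
    total_pos = sum(r[target] for r in rows)
    node_pos = sum(r[target] for r in matched)
    p1 = node_pos / len(matched)
    base_rate = total_pos / len(rows)
    return RuleStats(len(matched), len(matched) / len(rows), p1,
                     node_pos / total_pos, p1 / base_rate)


# Hypothetical respondents, invented for illustration
RULE_ROWS = [
    {"Q19": 1, "Q37": 1, "recommend": 1},
    {"Q19": 1, "Q37": 2, "recommend": 1},
    {"Q19": 1, "Q37": 2, "recommend": 0},
    {"Q19": 1, "Q37": 4, "recommend": 0},
    {"Q19": 2, "Q37": 1, "recommend": 1},
    {"Q19": 2, "Q37": 3, "recommend": 0},
    {"Q19": 3, "Q37": 2, "recommend": 0},
    {"Q19": 4, "Q37": 5, "recommend": 0},
]

stats = rule_stats(RULE_ROWS, node10_rule)
```

Here 3 of 8 rows match, 2 of them recommend, so P(1) = 2/3 against a base rate of 3/8, a lift of about 1.78.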

/* Rules for terminal node 9 */

Matches:
• 188 surveys (3.1%)
• 110 recommend (58.5%)
• 5.1% of all strong recommends

RULE: If (Q19 = 1 and Q37 not in {1, 2} and Q45 = 1) then Recommend = 1

P(0) = 0.415; P(1) = 0.585; Lift = 1.7

If I strongly agree that supervisors treat me with respect and believe that members strongly agree they are highly satisfied, then even though I don’t agree that the compensation practice is fair, I strongly agree that I will recommend the club to a friend.

/* Rules for terminal node 5 */

Matches:
• 350 surveys (5.7%)
• 198 recommend (56.6%)
• 9.2% of all strong recommends

RULE: If (Q19 <> 1 and Q33 in {1, 2} and Q45 = 1) then Recommend = 1

P(0) = 0.434; P(1) = 0.566; Lift = 1.4

If I agree that management will take my interests into account and believe that members strongly agree they are highly satisfied, then even though I don’t strongly agree that supervisors treat me with respect, I strongly agree that I will recommend the club to a friend.


Recommend to Friend Model: Rules for Not Recommending

/* Rules for terminal node 1 */

Matches:
• 1,784 surveys (29.2%)
• 130 highly recommend (7.3%); 92.7% don’t highly recommend
• 6.0% of all highly recommend

RULE: If (Q19 <> 1 and Q33 not in {1, 2}) then Don’t Strongly Recommend

P(0) = 0.927; P(1) = 0.073

If I don’t strongly agree that supervisors treat me with respect, and don’t agree that management will take my interests into account, then I don’t strongly agree that I will recommend the club to a friend.

/* Rules for terminal node 2 */

Matches:
• 846 surveys (13.8%)
• 132 highly recommend (15.6%); 84.4% don’t highly recommend
• 6.1% of all highly recommend

RULE: If (Q19 <> 1 and Q33 in {1, 2} and Q45 <> 1 and Q5 not in {1, 2}) then Don’t Strongly Recommend

P(0) = 0.84; P(1) = 0.16

If I don’t strongly agree that supervisors treat me with respect, don’t strongly believe that members are highly satisfied, and don’t agree that there are good opportunities for professional growth, then even though I agree that management will take my interests into account, I don’t strongly agree that I will recommend the club to a friend.


Intend to Continue Working at Club (=1) Model: Data Information

File: modeling data with binarized dependents w missing.txt

Target variable: Q39_1

Predictor variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65

Class | N Cases | Pct
0 | 3,030 | 49.6%
1 | 3,085 | 50.4%
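The file name indicates the dependent variables were binarized before modeling, and the slide title marks the target class as =1. A minimal sketch of that step and of the class table, assuming Q39 is coded 1 = strongly agree, that Q39_1 flags exactly that top box, and that missing answers are kept as None and left out of the counts. The coding and both function names are assumptions, not taken from the project files.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def binarize_top_box(answer: Optional[int]) -> Optional[int]:
    """1 if the respondent chose code 1 (strongly agree), 0 for any other answer, None if missing."""
    if answer is None:
        return None
    return 1 if answer == 1 else 0


def class_table(labels: Iterable[Optional[int]]) -> Dict[int, Tuple[int, float]]:
    """Count and percentage (one decimal) per class, ignoring missing labels."""
    counts = Counter(label for label in labels if label is not None)
    n = sum(counts.values())
    return {cls: (counts[cls], round(100 * counts[cls] / n, 1)) for cls in sorted(counts)}
```

Feeding it 3,030 zeros and 3,085 ones reproduces the table above: {0: (3030, 49.6), 1: (3085, 50.4)}.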


Intend to Continue Working at Club: Model Performance

Class | N Cases | N Misclassified | Pct. Misclassified
0 | 3,030 | 868 | 28.65
1 | 3,085 | 849 | 27.52
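The Pct. Misclassified column is each class's misclassified count divided by its case count. A minimal sketch that reproduces it from the counts in the table, and also pools both classes into an overall error rate, a derived figure that the slide itself does not report:

```python
from typing import Dict, Tuple


def misclass_pct(n_cases: int, n_wrong: int) -> float:
    """Percentage of a class's cases that the tree assigns to the other class."""
    return round(100 * n_wrong / n_cases, 2)


# (cases, misclassified) per class, copied from the table above
COUNTS: Dict[int, Tuple[int, int]] = {0: (3030, 868), 1: (3085, 849)}

per_class = {cls: misclass_pct(n, wrong) for cls, (n, wrong) in COUNTS.items()}
overall = misclass_pct(sum(n for n, _ in COUNTS.values()),
                       sum(wrong for _, wrong in COUNTS.values()))
```

This gives 28.65 and 27.52 per class and about 28.08% overall; because the classes are nearly balanced, the pooled rate sits close to the mean of the two.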

Node | Cases Tgt. Class | % of Node Tgt. Class | % Tgt. Class | Cum % Tgt. Class | Cum % Pop | % Pop | Cases in Node | Cum Lift | Lift
10 | 1,099 | 80.81 | 35.62 | 35.62 | 22.24 | 22.24 | 1,360 | 1.60 | 1.60
9 | 486 | 69.63 | 15.75 | 51.38 | 33.66 | 11.42 | 698 | 1.53 | 1.38
5 | 349 | 67.38 | 11.31 | 62.69 | 42.13 | 8.47 | 518 | 1.49 | 1.34
8 | 100 | 65.36 | 3.24 | 65.93 | 44.63 | 2.50 | 153 | 1.48 | 1.30
4 | 202 | 53.87 | 6.55 | 72.48 | 50.76 | 6.13 | 375 | 1.43 | 1.07
7 | 75 | 43.86 | 2.43 | 74.91 | 53.56 | 2.80 | 171 | 1.40 | 0.87
2 | 224 | 35.33 | 7.26 | 82.17 | 63.93 | 10.37 | 634 | 1.29 | 0.70
3 | 43 | 33.59 | 1.39 | 83.57 | 66.02 | 2.09 | 128 | 1.27 | 0.67
6 | 65 | 30.23 | 2.11 | 85.67 | 69.53 | 3.52 | 215 | 1.23 | 0.60
1 | 442 | 23.73 | 14.33 | 100.00 | 100.00 | 30.47 | 1,863 | 1.00 | 0.47
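Every column of the gains table follows from two counts per node: target-class cases and total cases. A minimal sketch that recomputes the table from the node counts above, assuming (as the table's order suggests) that nodes are ranked by their within-node target rate; the function name is illustrative.

```python
from typing import Dict, List, Tuple

# (node, target-class cases, cases in node), copied from the table above
NODES: List[Tuple[int, int, int]] = [
    (10, 1099, 1360), (9, 486, 698), (5, 349, 518), (8, 100, 153), (4, 202, 375),
    (7, 75, 171), (2, 224, 634), (3, 43, 128), (6, 65, 215), (1, 442, 1863),
]


def gains_table(nodes: List[Tuple[int, int, int]]) -> List[Dict[str, float]]:
    total_tgt = sum(t for _, t, _ in nodes)
    total_n = sum(n for _, _, n in nodes)
    base_rate = total_tgt / total_n
    # Richest nodes first, so cumulative columns read as "top k nodes"
    ranked = sorted(nodes, key=lambda x: x[1] / x[2], reverse=True)
    rows, cum_t, cum_n = [], 0, 0
    for node, t, n in ranked:
        cum_t += t
        cum_n += n
        rows.append({
            "node": node,
            "pct_node_tgt": 100 * t / n,
            "pct_tgt": 100 * t / total_tgt,
            "cum_pct_tgt": 100 * cum_t / total_tgt,
            "cum_pct_pop": 100 * cum_n / total_n,
            "pct_pop": 100 * n / total_n,
            "cum_lift": (cum_t / total_tgt) / (cum_n / total_n),
            "lift": (t / n) / base_rate,
        })
    return rows


table = gains_table(NODES)
```

The counts sum to 3,085 target cases out of 6,115, matching the class table, and the last row's cumulative lift is 1.00 by construction.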


Intend to Continue Working at Club Model: Splitters

• Q8: Feel Welcome

– Surrogates: Q27 (family friendly place), Q28 (diverse environment), Q18 (good working conditions)

• Q69: Age

– Surrogates: Q66 (how long worked at Club), Q68 (education)

• Q18: Good Working Conditions

– Surrogate: Q17 (have necessary support and materials to do job)

• Q5: Good Opportunities for Professional Growth

– Surrogates: Q7 (will be recognized for good job), Q33 (management will take my interests into account)

• Q7: Will be Recognized for Good Job

– Surrogate: Q15 (work is appreciated)

[Figure: tree diagram with splits on Q8, Q69, Q18, Q5, Q7, Q6, Q66, and Q56; terminal nodes 1 to 10]


Intend to Continue Working at Club Model: Key Variables

All variables

Variable Score
Q8 100
Q18 84.13
Q27 63.23
Q11 57.03
Q28 50.45
Q26 48.54
Q7 43.43
Q5 37.23
Q33 32.81
Q31 23.56
Q69 22.21
Q4 21.86
Q9 18.79
Q3 13.82
Q13 9.98
Q14 9.46
Q16 8.12
Q15 6.03
Q66 5.26
Q17 3.99
Q56 2.15
Q6 2.03
Q23 1.63
Q68 1.23

Primary splitters only

Variable Score
Q8 100
Q5 37.07
Q69 17.48
Q7 11.24
Q18 10.7
Q66 5.19
Q56 2.15
Q6 2.03
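The Score columns are relative: the top variable is set to 100 and every other score is scaled against it. The "primary splitters only" list counts only splits where the variable was the primary splitter, while the full list also credits surrogate splits. A minimal sketch of that bookkeeping, assuming each split record carries the improvement credited to one variable; the records and improvement values are invented, and CART's exact crediting rules are not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (variable, improvement credited, role); hypothetical values in arbitrary units
SPLITS: List[Tuple[str, int, str]] = [
    ("Q8", 50, "primary"), ("Q27", 30, "surrogate"),
    ("Q5", 18, "primary"), ("Q7", 10, "surrogate"),
    ("Q69", 9, "primary"), ("Q66", 4, "surrogate"),
]


def relative_importance(splits: List[Tuple[str, int, str]],
                        primary_only: bool = False) -> Dict[str, float]:
    raw: Dict[str, float] = defaultdict(float)
    for var, improvement, role in splits:
        if primary_only and role != "primary":
            continue  # drop surrogate credit for the "primary splitters only" view
        raw[var] += improvement
    top = max(raw.values())
    # Scale so the most important variable scores 100; keep descending order
    return {var: round(100 * v / top, 2)
            for var, v in sorted(raw.items(), key=lambda kv: -kv[1])}
```

With these invented records the primary-only view keeps Q8, Q5, and Q69 (100, 36, 18), while the full view also ranks the surrogates Q27, Q7, and Q66, which is the same pattern as the two lists above.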



Intend to Continue Working at Club Model: Key Rules

/* Rules for terminal node 10 */

Matches:
• 1,360 surveys (22.2%)
• 1,099 intend to continue (80.8%)
• 35.6% of all who intend to continue

RULE: If (Q8 = 1 and Q69 >= 2.5) then Intend to continue

P(0) = 0.19; P(1) = 0.81; Lift = 1.6

If I strongly agree that I feel welcome and am 35 years old or older, then I strongly agree that I intend to continue working at the club.
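Thresholds such as Q69 >= 2.5 fall halfway between adjacent answer codes because a tree only needs to consider cut points between observed values of an ordinal variable. A minimal sketch of that search using Gini impurity (CART's usual default for classification), on hypothetical (age code, intend) pairs; the data and function names are invented for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a 0/1 label list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(xs: Sequence[int], ys: Sequence[int]) -> Optional[Tuple[float, float]]:
    """Best split x < t versus x >= t, returned as (t, weighted child impurity)."""
    values = sorted(set(xs))
    best: Optional[Tuple[float, float]] = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # midpoint between adjacent observed codes
        left: List[int] = [y for x, y in zip(xs, ys) if x < t]
        right: List[int] = [y for x, y in zip(xs, ys) if x >= t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or impurity < best[1]:
            best = (t, impurity)
    return best


# Hypothetical age codes and intend-to-continue flags
AGE_CODES = [1, 1, 2, 2, 3, 3, 4, 4]
INTEND = [0, 0, 0, 1, 1, 1, 1, 1]
```

On this toy data the candidates are 1.5, 2.5, and 3.5, and 2.5 wins with a weighted impurity of 0.1875.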

/* Rules for terminal node 9 */

Matches:
• 698 surveys (11.4%)
• 486 intend to continue (69.6%)
• 15.8% of all who intend to continue

RULE: If (Q8 = 1 and Q18 = 1 and Q69 <= 2.5) then Intend to continue

P(0) = 0.30; P(1) = 0.70; Lift = 1.4

If I strongly agree that I feel welcome, strongly agree that there are good working conditions, and am older than 35, then I strongly agree that I intend to continue working at the club.

/* Rules for terminal node 5 */

Matches:
• 518 surveys (8.5%)
• 349 intend to continue (67.4%)
• 11.3% of all who intend to continue

RULE: If (Q8 <> 1 and Q5 in {1, 2} and Q7 in {1, 2} and Q66 > 2.5) then Intend to continue

P(0) = 0.32; P(1) = 0.68; Lift = 1.3

If I agree that I will be recognized for doing a good job, agree that there are good opportunities for professional growth, and have worked at the club for more than 2 years, then even though I don’t strongly agree that I feel welcome, I strongly agree that I intend to continue working at the club.


Intend to Continue Working at Club Model: Rules for Don’t Strongly Intend to Continue

/* Rules for terminal node 1 */

Matches:
• 1,863 surveys (30.5%)
• 442 strongly intend to continue working (23.7%)
• 14.3% of all who strongly intend to continue working
• 46.9% of all who do not strongly intend to continue

RULE: If (Q8 <> 1 and Q5 not in {1, 2}) then not strongly intending to continue working at club

P(0) = 0.76; P(1) = 0.24; Lift = 0.47

If I don’t strongly agree that I feel welcome and don’t agree that there are good opportunities for professional growth, then I don’t strongly agree that I intend to continue working at the club.
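The bullets on this slide can be reproduced from two counts per node plus the class totals from the data information slide (3,085 in class 1 and 3,030 in class 0). A minimal sketch; the function name and dictionary keys are illustrative.

```python
from typing import Dict


def node_summary(n_node: int, pos_node: int,
                 n_pos_total: int = 3085, n_neg_total: int = 3030) -> Dict[str, float]:
    """Coverage, P(1), class shares, and lift for one terminal node."""
    n_total = n_pos_total + n_neg_total
    neg_node = n_node - pos_node
    p1 = pos_node / n_node
    return {
        "pct_surveys": 100 * n_node / n_total,          # share of all surveys in the node
        "p1": p1,                                       # P(1) inside the node
        "pct_of_all_pos": 100 * pos_node / n_pos_total, # share of class 1 captured
        "pct_of_all_neg": 100 * neg_node / n_neg_total, # share of class 0 captured
        "lift": p1 / (n_pos_total / n_total),
    }


node1 = node_summary(1863, 442)
node2 = node_summary(634, 224)
```

For node 1 this gives 30.5% of surveys, P(1) = 0.24, 14.3% of class 1, 46.9% of class 0, and a lift of 0.47, matching the slide.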

/* Rules for terminal node 2 */

Matches:
• 634 surveys (10.4%)
• 224 strongly intend to continue working (35.3%)
• 7.3% of all who strongly intend to continue working
• 13.5% of all who do not strongly intend to continue working

RULE: If (Q8 <> 1 and Q5 in {1, 2} and Q7 not in {1, 2}) then not strongly intending to continue working at club

P(0) = 0.65; P(1) = 0.35; Lift = 0.70

If I don’t strongly agree that I feel welcome and don’t agree that I will be recognized for doing a good job, then even though I agree that there are good opportunities for professional growth, I don’t strongly agree that I intend to continue working at the club.


Summary of Results

• Satisfaction Model
– Top two rules identify 65% of the most satisfied
– Top three rules identify 79% of the most satisfied

• Recommend to Friend
– Top three rules identify 66% of those most likely to recommend the club to a friend

• Intend to Keep Working at Club
– Top three rules identify 63% of those most likely to keep working
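Because terminal nodes do not overlap, the share of a target group identified by the top rules is simply the sum of their "% of all" bullets: 51.6 + 5.1 + 9.2 = 65.9% for recommending and 35.6 + 15.8 + 11.3 = 62.7% for intending to stay. A minimal sketch, assuming "top" means ranked by lift; the satisfaction model's rules are not in this section, so only the other two models are shown.

```python
from typing import List, Tuple


def top_k_capture(rules: List[Tuple[float, float]], k: int) -> float:
    """rules: (lift, % of all target-class respondents in the node).
    Terminal nodes are disjoint, so the shares of the k highest-lift rules add."""
    ranked = sorted(rules, key=lambda r: r[0], reverse=True)
    return sum(share for _, share in ranked[:k])


# (lift, % of all target) from the Key Rules slides
RECOMMEND = [(2.0, 51.6), (1.7, 5.1), (1.4, 9.2)]
INTEND = [(1.6, 35.6), (1.4, 15.8), (1.3, 11.3)]
```

Rounded, the three-rule totals are 66% and 63%, the figures on this slide.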


Summary of Results

• Satisfaction keys: make an environment where employees feel welcome and have a sense of purpose

• Recommend to a Friend keys: supervisors treat employees with respect, and either compensation is perceived as fair or employees perceive that members really like the club

• Will work at club in a year’s time:
– For those under 35: feel welcome (relationships)
– For those over 35 (or who have worked at the club a long time): feel welcome and good working conditions
– For those who don’t feel welcome: good opportunities for professional growth


Conclusions

• Trees can be used to provide concise summaries of behavioral tendencies from surveys
– Regression shows global, average attitudes
– Trees show specific, localized attitudes

• Two or three rules identify roughly two-thirds or more (63% to 79%) of the employees holding each attitude of interest

• Rules make sense and are easy to explain

• Rules are actionable
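The contrast between global and localized attitudes can be made concrete with a small interaction. A minimal sketch on hypothetical cell counts (not survey results): the pooled difference in recommend rate by Q37, which is what a one-predictor linear probability regression on Q37 would estimate, is compared with the same difference computed inside each Q19 branch, as a tree split would report it.

```python
from typing import Dict, List, Tuple

# Hypothetical counts: (Q19 strongly agree, Q37 agree) -> (surveys, recommend)
CELLS: Dict[Tuple[bool, bool], Tuple[int, int]] = {
    (True, True): (100, 72),
    (True, False): (50, 20),
    (False, True): (100, 15),
    (False, False): (150, 12),
}


def rate(cells: List[Tuple[int, int]]) -> float:
    """Recommend rate over a list of (surveys, recommend) cells."""
    return sum(k for _, k in cells) / sum(n for n, _ in cells)


def pooled_q37_effect(cells: Dict[Tuple[bool, bool], Tuple[int, int]]) -> float:
    """Rate difference by Q37 ignoring Q19: one global, averaged number."""
    agree = rate([v for (_, q37), v in cells.items() if q37])
    disagree = rate([v for (_, q37), v in cells.items() if not q37])
    return agree - disagree


def local_q37_effect(cells: Dict[Tuple[bool, bool], Tuple[int, int]], q19: bool) -> float:
    """Same difference inside one Q19 branch, the view a tree split gives."""
    return rate([cells[(q19, True)]]) - rate([cells[(q19, False)]])
```

With these invented counts the pooled effect is 0.275, while the branch effects are 0.32 when Q19 = 1 and only 0.07 otherwise; the single averaged number hides that fair compensation matters mainly for employees who already feel respected.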