
My Five Predictive Analytics Pet Peeves

Dean Abbott
Abbott Analytics, Inc.
Predictive Analytics World, San Francisco, CA (#pawcon)
April 16, 2013

Email: dean@abbottanalytics.com
Blog: http://abbottanalytics.blogspot.com
Twitter: @deanabb

© Abbott Analytics, Inc. 2001-2013

Topics

•  Why Pet Peeves?
•  A call for humility for Predictive Modelers
•  The Five Pet Peeves
1.  Machine Learning Skills > Domain Expertise
2.  Just Build the Most Accurate Model!
3.  Significance?…What do you mean by Significance?
4.  My Algorithm is better than Your Algorithm
5.  My classifier calls everything 0…time to resample!


Peeve 1: Which is Better, Machine Learning Expertise or Domain Expertise?

•  Question: who is more important in the process of building predictive models:
   •  The Data Scientist / Predictive Modeler / Data Miner?
   •  The Domain Expert / Business Stakeholder?


Photo from http://despair.com/superioritee.html

Which is Better: 2012 Strata Conference Debate?

From Strata Conference: http://radar.oreilly.com/2012/03/machine-learning-expertise-google-analytics.html


“I think you can get pretty far with some common sense, maybe Google-ing the basic information you need to know about a domain, and a lot of statistical intuition”

Formula for Success?


Conclusion: Frame the Problem First

•  Michael Driscoll: Moderator of the Strata Debate
•  “Could you currently prepare your data for a Kaggle competition? If so, then hire a machine learner. If not, hire a data scientist who has the domain expertise and the data hacking skills to get you there.”
   – http://medriscoll.com/post/18784448854/the-data-science-debate-domain-expertise-or-machine
•  But even this may not work, which brings me to the second pet peeve…


Peeve 2: Just Build Accurate Models

•  The Problems with Model Accuracy:

1.  There’s More to Success than “Accuracy”

2.  Which Accuracy?


The Winner is… Best Accuracy


http://www.netflixprize.com/leaderboard

Why Model Accuracy is Not Enough: Netflix Prize

http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html


Why Data Science is Not Enough: Netflix Prize

http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html

There’s more to a solution than accuracy—you have to be able to use it!


Peeve 3: The Best Model Wins

•  We select the “winning model”, but is there a significant difference in model performance?


KDD Cup 98 Results


Calculator from http://www.answersresearch.com/means.php

Example: Statistical Significance without Practical Significance

Measure                              Control      Campaign (based on model)
Number Mailed                        5,000,000    4,000,000
Response Rate                        1%           1.011%
Outside margin of error?                          yes
(i.e., statistically significant?)                yes
Expected Responders                  50,000       40,000
Actual Responders                    50,000       40,440
Difference                           0            440

Revenue Per Responder:    $100
Total Revenue Expected:   $4,000,000
Total Revenue Actual:     $4,044,000
Difference in Revenue:    $44,000

Significance based on z=2 (95.45% confidence)

•  Cost per contact: negligible (email)

•  Cost for analysts to build model: $80,000
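The arithmetic on this slide can be checked with a few lines of code. A minimal sketch, assuming "outside margin of error" means the campaign response rate falls outside a z=2 margin of error around the control rate (the slide does not show the exact test used); all numbers come from the slide.

```python
import math

def margin_of_error(p, n, z=2.0):
    """Half-width of the confidence interval for a proportion p from n trials."""
    return z * math.sqrt(p * (1 - p) / n)

p_control = 50_000 / 5_000_000        # 1.000% response on 5M mailed
p_campaign = 40_440 / 4_000_000       # 1.011% response on 4M mailed

# Statistical significance: is the campaign rate outside the control's
# z=2 margin of error?
significant = (p_campaign - p_control) > margin_of_error(p_control, 5_000_000)
print(significant)                    # True: statistically significant

# ...but practical significance is another matter:
extra_responders = 40_440 - 40_000    # 440 more than the 1% baseline predicts
extra_revenue = extra_responders * 100  # $100 revenue per responder
print(extra_revenue)                  # 44000 -- against an $80,000 modeling cost
```

The contrast is the slide's point: the lift clears the statistical bar easily, yet the incremental revenue does not cover the cost of building the model.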


Peeve 4: My Algorithm is Better than Your Algorithm


From 2011 Rexer Analytics Data Mining Survey

http://www.rexeranalytics.com/Data-Miner-Survey-Results-2011.html

Every Algorithm Has its Day


Elder, IV, J. F., and Lee, S. S. (1997), “Bundling Heterogeneous Classifiers with Advisor Perceptrons,” Technical Report, University of Idaho, October, 14.

Modeling Technique | Modeling Implementation | Participant Affiliation Location | Participant Affiliation Type | AUC-ROC (Trapezoidal Rule) | AUC-ROC Rank | Top Decile Response Rate | Top Decile Response Rate Rank

TreeNet + Logistic Regression | Salford Systems | Mainland China | Practitioner | 70.01% | 1 | 13.00% | 7
Probit Regression | SAS | USA | Practitioner | 69.99% | 2 | 13.13% | 6
MLP + n-Tuple Classifier | - | Brazil | Practitioner | 69.62% | 3 | 13.88% | 1
TreeNet | Salford Systems | USA | Practitioner | 69.61% | 4 | 13.25% | 4
TreeNet | Salford Systems | Mainland China | Practitioner | 69.42% | 5 | 13.50% | 2
Ridge Regression | Rank | Belgium | Practitioner | 69.28% | 6 | 12.88% | 9
2-Layer Linear Regression | - | USA | Practitioner | 69.14% | 7 | 12.88% | 9
Log. Regression + Decision Stump + AdaBoost + VFI | - | Mainland China | Academia | 69.10% | 8 | 13.25% | 4
Logistic Average of Single Decision Functions | - | Australia | Practitioner | 68.85% | 9 | 12.13% | 17
Logistic Regression | Weka | Singapore | Academia | 68.69% | 10 | 12.38% | 16
Logistic Regression | - | Mainland China | Practitioner | 68.58% | 11 | 12.88% | 9
Decision Tree + Neural Network + Log. Regression | - | Singapore | - | 68.54% | 12 | 13.00% | 7
Scorecard Linear Additive Model | Xeno | USA | Practitioner | 68.28% | 13 | 11.75% | 20
Random Forest | Weka | USA | - | 68.04% | 14 | 12.50% | 14
Expanding Regression Tree + RankBoost + Bagging | Weka | Mainland China | Academia | 68.02% | 15 | 12.50% | 14
Logistic Regression | SAS + Salford | India | Practitioner | 67.58% | 16 | 12.00% | 19
J48 + BayesNet | Weka | Mainland China | Academia | 67.56% | 17 | 11.63% | 21
Neural Network + General Additive Model | Tiberius | USA | Practitioner | 67.54% | 18 | 11.63% | 21
Decision Tree + Neural Network | - | Mainland China | Academia | 67.50% | 19 | 12.88% | 9
Decision Tree + Neural Network + Log. Regression | SAS | USA | Academia | 66.71% | 20 | 13.50% | 2

PAKDD Cup 2007 Results: Look at all them Algorithms!

•  18 different algorithms used in the top 20 solutions


http://lamda.nju.edu.cn/conf/pakdd07/dmc07/results.htm

Peeve 5: You Must Stratify Data to Balance the Target Class

•  For example, 93% non-responders (N), 7% responders (R)
•  What’s the Problem? (The justification for resampling)
   •  “Sample is biased toward responders”
   •  “Models will learn non-responders better”
   •  “Most algorithms will generate models that say ‘call everything a non-responder’ and get 93% correct classification!” (I used to say this too)
•  Most common solution:
   •  Stratify the sample to get 50%/50% (some will argue that one only needs 20-30% responders)
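The "most common solution" above can be sketched as a simple downsampling helper. This is illustrative only: the helper name and data layout are mine, not the slide's; only the 93%/7% class mix comes from the slide.

```python
import random

def balance_by_downsampling(labels, minority_label, seed=0):
    """Return row indices for a 50/50 sample: every minority-class row,
    plus an equal-sized random draw from the majority class."""
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    rng = random.Random(seed)
    return minority + rng.sample(majority, len(minority))

# 93% non-responders (N), 7% responders (R), as in the slide's example
labels = ["N"] * 9300 + ["R"] * 700
keep = balance_by_downsampling(labels, "R")
print(len(keep))   # 1400 rows: 700 R and 700 N
```

Note that this throws away most of the non-responders (8,600 of 9,300 here), which is exactly the cost the conclusion slide warns about.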


Neural Network Results on Same Data


Distribution of Target

NOTE: all models built using JMP 10, SAS Institute, Inc.

Sample Decision Tree Built on Imbalanced Population


Distribution of Target

But…the ROC Curve Looks Like This

[Figure: ROC curve for the predictions of the target variable — Sensitivity (y-axis, 0.00 to 1.00) vs. 1-Specificity (x-axis, 0.00 to 1.00)]

Why do we get a ROC Curve that looks OK, but the confusion matrix says “everything is N (No)”?

[Figure: JMP decision tree built on the imbalanced population — 5,388 rows at the root, with splits on AVG_DON, REC_DON_AMT, RFA_2, MAX_DON_DT, CARDPM12, MAX_DON_AMT, and CARDGIFT_LIFE; each node reports Count, G^2, and LogWorth]

So What Happened?

•  Note: no algorithm predicts decisions (N or R) directly: they all produce probabilities/likelihoods/confidences
•  Every data mining tool creates decisions (and, by extension, forms confusion matrices) by thresholding the predicted probability at 0.5 (i.e., assuming equal class likelihoods as the baseline)
•  When the imbalance is large, algorithms will not produce probabilities/likelihoods > 0.5…a score that large is far too unlikely for an algorithm to be “that sure”
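The fix implied here is to move the decision threshold off the 0.5 default toward the responder base rate (about 7%, i.e. the 0.071 used on a later slide). A small illustration with hypothetical model scores (the score values are made up, not from the slide):

```python
def classify(prob_r, threshold=0.5):
    """Turn a predicted probability of response into an N/R decision."""
    return "R" if prob_r >= threshold else "N"

# Hypothetical P(R) scores: on a 7%-responder population, even the best
# prospects rarely score above 0.5
scores = [0.02, 0.05, 0.09, 0.12, 0.30]

print([classify(p) for p in scores])                   # default 0.5: all 'N'
print([classify(p, threshold=0.071) for p in scores])  # prior-based threshold: top scores become 'R'
```

With the 0.5 default every record is called N; thresholding at the prior separates the higher-scoring records without resampling anything.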


What the Predictions Look Like


Confusion Matrices For the Decision Tree: Before and After

Decision Tree: Threshold at 0.5 (before)

Response_STR    N       R       Total
N               5,002   0       5,002
R               386     0       386
Total           5,388   0       5,388

Decision Tree: Threshold at 0.071 (after)

Response_STR    N       R       Total
N               2,798   2,204   5,002
R               45      341     386
Total           2,843   2,545   5,388

Conclusions

•  The Rant is Done!
•  The Five Pet Peeves
1.  Machine Learning Skills > Domain Expertise
    •  Be humble; we need both data scientists and domain experts!
2.  Just Build the Most Accurate Model!
    •  Select the model that addresses your metric
3.  Significance?…What do you mean by Significance?
    •  Don’t get hung up on “best” when many models will do well
    •  Learn from the differences in patterns found by these models
4.  My Algorithm is better than Your Algorithm
    •  Don’t stress about the algorithm; learn to use a few very well
5.  My classifier calls everything 0…time to resample!
    •  Don’t throw away 0s needlessly; only do it when there are enough of them that you won’t miss them.

