
My Five Predictive Analytics Pet Peeves

Dean Abbott
Abbott Analytics, Inc.
Predictive Analytics World, San Francisco, CA (#pawcon)
April 16, 2013

Email: dean@abbottanalytics.com
Blog: http://abbottanalytics.blogspot.com
Twitter: @deanabb

© Abbott Analytics, Inc. 2001-2013

Topics

•  Why Pet Peeves?
•  A call for humility for Predictive Modelers
•  The Five Pet Peeves
1.  Machine Learning Skills > Domain Expertise
2.  Just Build the Most Accurate Model!
3.  Significance?…What do you mean by Significance?
4.  My Algorithm is better than Your Algorithm
5.  My classifier calls everything 0…time to resample!


Peeve 1: Which is Better, Machine Learning Expertise or Domain Expertise?

•  Question: who is more important in the process of building predictive models:
   •  The Data Scientist / Predictive Modeler / Data Miner?
   •  The Domain Expert / Business Stakeholder?


Photo from http://despair.com/superioritee.html

Which is Better: 2012 Strata Conference Debate?

From Strata Conference: http://radar.oreilly.com/2012/03/machine-learning-expertise-google-analytics.html


“I think you can get pretty far with some common sense, maybe Google-ing the basic information you need to know about a domain, and a lot of statistical intuition”

Formula for Success?


Conclusion: Frame the Problem First

•  Michael Driscoll: Moderator of the Strata Debate
•  “Could you currently prepare your data for a Kaggle competition? If so, then hire a machine learner. If not, hire a data scientist who has the domain expertise and the data hacking skills to get you there.”
   – http://medriscoll.com/post/18784448854/the-data-science-debate-domain-expertise-or-machine
•  But even this may not work, which brings me to the second pet peeve…


Peeve 2: Just Build Accurate Models

•  The Problems with Model Accuracy:

1.  There’s More to Success than “Accuracy”

2.  Which Accuracy?


The Winner is… Best Accuracy


http://www.netflixprize.com/leaderboard

Why Model Accuracy is Not Enough: Netflix Prize

http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html


Why Data Science is Not Enough: Netflix Prize

http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html

There’s more to a solution than accuracy—you have to be able to use it!


Peeve 3: The Best Model Wins

•  We select the “winning model”, but is there a significant difference in model performance?


KDD Cup 98 Results


Calculator from http://www.answersresearch.com/means.php

Example: Statistical Significance without Practical Significance

Measure                              Control      Campaign (based on model)
Number Mailed                        5,000,000    4,000,000
Response Rate                        1%           1.011%
Outside margin of error?                          yes
(i.e., statistically significant?)                yes
Expected Responders                  50,000       40,000
Actual Responders                    50,000       40,440
Difference                           0            440

Revenue Per Responder:    $100
Total Revenue Expected:   $4,000,000
Total Revenue Actual:     $4,044,000
Difference in Revenue:    $44,000

Significance based on z=2 (95.45% confidence)

•  Cost per contact: negligible (email)

•  Cost for analysts to build model: $80,000
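The arithmetic on this slide can be checked with a few lines of code. A minimal sketch, assuming "outside margin of error" means the campaign response rate falls outside a z=2 margin of error around the control rate (the slide does not show the exact test used); all numbers come from the slide.

```python
import math

def margin_of_error(p, n, z=2.0):
    """Half-width of the confidence interval for a proportion p from n trials."""
    return z * math.sqrt(p * (1 - p) / n)

p_control = 50_000 / 5_000_000        # 1.000% response on 5M mailed
p_campaign = 40_440 / 4_000_000       # 1.011% response on 4M mailed

# Statistical significance: is the campaign rate outside the control's
# z=2 margin of error?
significant = (p_campaign - p_control) > margin_of_error(p_control, 5_000_000)
print(significant)                    # True: statistically significant

# ...but practical significance is another matter:
extra_responders = 40_440 - 40_000    # 440 more than the 1% baseline predicts
extra_revenue = extra_responders * 100  # $100 revenue per responder
print(extra_revenue)                  # 44000 -- against an $80,000 modeling cost
```

The contrast is the slide's point: the lift clears the statistical bar easily, yet the incremental revenue does not cover the cost of building the model.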


Peeve 4: My Algorithm is Better than Your Algorithm


From 2011 Rexer Analytics Data Mining Survey

http://www.rexeranalytics.com/Data-Miner-Survey-Results-2011.html

Every Algorithm Has its Day


Elder, IV, J. F., and Lee, S. S. (1997), “Bundling Heterogeneous Classifiers with Advisor Perceptrons,” Technical Report, University of Idaho, October, 14.

Modeling Technique | Modeling Implementation | Participant Affiliation Location | Participant Affiliation Type | AUC-ROC (Trapezoidal Rule) | AUC-ROC Rank | Top Decile Response Rate | Top Decile Response Rate Rank

TreeNet + Logistic Regression | Salford Systems | Mainland China | Practitioner | 70.01% | 1 | 13.00% | 7
Probit Regression | SAS | USA | Practitioner | 69.99% | 2 | 13.13% | 6
MLP + n-Tuple Classifier | - | Brazil | Practitioner | 69.62% | 3 | 13.88% | 1
TreeNet | Salford Systems | USA | Practitioner | 69.61% | 4 | 13.25% | 4
TreeNet | Salford Systems | Mainland China | Practitioner | 69.42% | 5 | 13.50% | 2
Ridge Regression | Rank | Belgium | Practitioner | 69.28% | 6 | 12.88% | 9
2-Layer Linear Regression | - | USA | Practitioner | 69.14% | 7 | 12.88% | 9
Log. Regression + Decision Stump + AdaBoost + VFI | - | Mainland China | Academia | 69.10% | 8 | 13.25% | 4
Logistic Average of Single Decision Functions | - | Australia | Practitioner | 68.85% | 9 | 12.13% | 17
Logistic Regression | Weka | Singapore | Academia | 68.69% | 10 | 12.38% | 16
Logistic Regression | - | Mainland China | Practitioner | 68.58% | 11 | 12.88% | 9
Decision Tree + Neural Network + Log. Regression | - | Singapore | - | 68.54% | 12 | 13.00% | 7
Scorecard Linear Additive Model | Xeno | USA | Practitioner | 68.28% | 13 | 11.75% | 20
Random Forest | Weka | USA | - | 68.04% | 14 | 12.50% | 14
Expanding Regression Tree + RankBoost + Bagging | Weka | Mainland China | Academia | 68.02% | 15 | 12.50% | 14
Logistic Regression | SAS + Salford | India | Practitioner | 67.58% | 16 | 12.00% | 19
J48 + BayesNet | Weka | Mainland China | Academia | 67.56% | 17 | 11.63% | 21
Neural Network + General Additive Model | Tiberius | USA | Practitioner | 67.54% | 18 | 11.63% | 21
Decision Tree + Neural Network | - | Mainland China | Academia | 67.50% | 19 | 12.88% | 9
Decision Tree + Neural Network + Log. Regression | SAS | USA | Academia | 66.71% | 20 | 13.50% | 2

PAKDD Cup 2007 Results: Look at all them Algorithms!

•  18 different algorithms used in the top 20 solutions


http://lamda.nju.edu.cn/conf/pakdd07/dmc07/results.htm

Peeve 5: You Must Stratify Data to Balance the Target Class

•  For example, 93% non-responders (N), 7% responders (R)
•  What’s the Problem? (The justification for resampling)
   •  “Sample is biased toward responders”
   •  “Models will learn non-responders better”
   •  “Most algorithms will generate models that say ‘call everything a non-responder’ and get 93% correct classification!” (I used to say this too)
•  Most common solution:
   •  Stratify the sample to get 50%/50% (some will argue that one only needs 20-30% responders)
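The "most common solution" above can be sketched as a simple downsampling helper. This is illustrative only: the helper name and data layout are mine, not the slide's; only the 93%/7% class mix comes from the slide.

```python
import random

def balance_by_downsampling(labels, minority_label, seed=0):
    """Return row indices for a 50/50 sample: every minority-class row,
    plus an equal-sized random draw from the majority class."""
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    rng = random.Random(seed)
    return minority + rng.sample(majority, len(minority))

# 93% non-responders (N), 7% responders (R), as in the slide's example
labels = ["N"] * 9300 + ["R"] * 700
keep = balance_by_downsampling(labels, "R")
print(len(keep))   # 1400 rows: 700 R and 700 N
```

Note that this throws away most of the non-responders (8,600 of 9,300 here), which is exactly the cost the conclusion slide warns about.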


Neural Network Results on Same Data


Distribution of Target

NOTE: all models built using JMP 10, SAS Institute, Inc.

Sample Decision Tree Built on Imbalanced Population


Distribution of Target

But…the ROC Curve Looks Like This

[Figure: ROC curve for the predictions of the target variable — Sensitivity (y-axis, 0.00 to 1.00) vs. 1-Specificity (x-axis, 0.00 to 1.00)]

Why do we get a ROC Curve that looks OK, but the confusion matrix says “everything is N (No)”?

[Figure: JMP decision tree built on the imbalanced population — 5,388 rows at the root, with splits on AVG_DON, REC_DON_AMT, RFA_2, MAX_DON_DT, CARDPM12, MAX_DON_AMT, and CARDGIFT_LIFE; each node reports Count, G^2, and LogWorth]

So What Happened?

•  Note: no algorithm predicts decisions (N or R) directly: they all produce probabilities/likelihoods/confidences
•  Every data mining tool creates decisions (and, by extension, forms confusion matrices) by thresholding the predicted probability at 0.5 (i.e., assuming equal class likelihoods as the baseline)
•  When the imbalance is large, algorithms will not produce probabilities/likelihoods > 0.5…a score that large is far too unlikely for an algorithm to be “that sure”
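The fix implied here is to move the decision threshold off the 0.5 default toward the responder base rate (about 7%, i.e. the 0.071 used on a later slide). A small illustration with hypothetical model scores (the score values are made up, not from the slide):

```python
def classify(prob_r, threshold=0.5):
    """Turn a predicted probability of response into an N/R decision."""
    return "R" if prob_r >= threshold else "N"

# Hypothetical P(R) scores: on a 7%-responder population, even the best
# prospects rarely score above 0.5
scores = [0.02, 0.05, 0.09, 0.12, 0.30]

print([classify(p) for p in scores])                   # default 0.5: all 'N'
print([classify(p, threshold=0.071) for p in scores])  # prior-based threshold: top scores become 'R'
```

With the 0.5 default every record is called N; thresholding at the prior separates the higher-scoring records without resampling anything.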


What the Predictions Look Like


Confusion Matrices For the Decision Tree: Before and After

Decision Tree: Threshold at 0.5 (before)

Response_STR    N       R       Total
N               5,002   0       5,002
R               386     0       386
Total           5,388   0       5,388

Decision Tree: Threshold at 0.071 (after)

Response_STR    N       R       Total
N               2,798   2,204   5,002
R               45      341     386
Total           2,843   2,545   5,388

Conclusions

•  The Rant is Done!
•  The Five Pet Peeves
1.  Machine Learning Skills > Domain Expertise
    •  Be humble; we need both data scientists and domain experts!
2.  Just Build the Most Accurate Model!
    •  Select the model that addresses your metric
3.  Significance?…What do you mean by Significance?
    •  Don’t get hung up on “best” when many models will do well
    •  Learn from the differences in patterns found by these models
4.  My Algorithm is better than Your Algorithm
    •  Don’t stress about the algorithm; learn to use a few very well
5.  My classifier calls everything 0…time to resample!
    •  Don’t throw away 0s needlessly; only do it when there are enough of them that you won’t miss them.

