1
Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling
4.2 Predictive Modeling Using Decision Trees
4.3 Predictive Modeling Using Logistic Regression
4.4 Churn Case Study
4.5 A Note about Model Management
4.6 Recommended Reading
2
Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling
4.2 Predictive Modeling Using Decision Trees
4.3 Predictive Modeling Using Logistic Regression
4.4 Churn Case Study
4.5 A Note about Model Management
4.6 Recommended Reading
3
Objectives
Explain the concepts of predictive modeling.
Illustrate the modeling essentials of a predictive model.
Explain the importance of data partitioning.
4
Catalog Case Study
Analysis Goal:
A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers most likely to purchase in the future.
Data set: CATALOG2010
Number of rows: 48,356
Number of columns: 98
Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales
Targets: RESPOND (binary)
ORDERSIZE (continuous)
5
Where You’ve Been, Where You’re Going…
With basic descriptive modeling techniques (RFM), you identified customers who might be profitable. Sophisticated predictive modeling techniques can produce risk scores for current customers, profitable prospects from outside the customer database, cross-sell and up-sell lists, and much more.
Scoring techniques based on predictive models can be implemented in real-time data collection systems, automating the process of fact-based decision making.
6
Descriptive Modeling Tells You about Now
Descriptive statistics inform you about your sample. This information is important for reacting to things that have happened in the past.
Past Behavior → Fact-Based Reports → Current State of the Customer
7
From Descriptive to Predictive Modeling
Predictive modeling techniques, paired with scoring and good model management, enable you to use your data about the past and the present to make good decisions for the future.
Past Behavior → Fact-Based Predictions
8
Predictive Modeling Terminology
The observations in a training data set are known as training cases.
The variables are called inputs and targets.
Training Data Set (inputs and a target)
9
Predictive Model
Predictive model: a concise representation of the input and target association
Training Data Set (inputs and a target)
10
Predictive Model
Predictions: output of the predictive model given a set of input measurements
11
Modeling Essentials
Determine type of prediction.
Select useful inputs.
Optimize complexity.
12
Select useful inputs.
Optimize complexity.
Modeling Essentials
Determine type of prediction.
13
Three Prediction Types
inputs → prediction: decisions, rankings, or estimates
14
Decision Predictions
A predictive model uses input measurements to make the best decision for each case.
(inputs → prediction: primary, secondary, secondary, primary, tertiary)
15
Ranking Predictions
A predictive model uses input measurements to optimally rank each case.
(inputs → prediction: 720, 520, 630, 470, 580)
16
Estimate Predictions
A predictive model uses input measurements to optimally estimate the target value.
(inputs → prediction: 0.65, 0.33, 0.75, 0.28, 0.54)
17
Idea Exchange
Think of two or three business problems that would require each of the three types of prediction.
What would require a decision? How would you obtain information to help you in making a decision based on a model score?
What would require a ranking? How would you use this ranking information?
What would require an estimate? Would you estimate a continuous quantity, a count, a proportion, or some other quantity?
18
Modeling Essentials – Predict Review
Determine type of prediction: decide, rank, and estimate.
Select useful inputs.
Optimize complexity.
19
Modeling Essentials
Determine type of prediction.
Select useful inputs.
Optimize complexity.
20
Input Reduction Strategies
Irrelevancy (illustrated with inputs x3 and x4)
Redundancy (illustrated with inputs x1 and x2)
21
Input Reduction – Redundancy
Input x2 has the same information as input x1.
Example: x1 is household income and x2 is home value.
22
Input Reduction – Irrelevancy
Predictions change with input x4 but much less with input x3.
Example: The target is response to a direct mail solicitation, x3 is religious affiliation, and x4 is response to previous solicitations.
23
Modeling Essentials – Select Review
Eradicateredundancies
and irrelevancies.
Decide, rank,and estimate.
Select useful inputs.
Determine type of prediction.
Optimize complexity.
24
Modeling Essentials
Determine type of prediction.
Select useful inputs.
Optimize complexity.
25
Data Partitioning
Partition available data into training and validation sets.
The model is fit on the training data set, and model performance is evaluated on the validation data set.
26
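The partitioning step can be sketched in a few lines of Python (in the course, the Data Partition node does this inside SAS Enterprise Miner; the 70/30 split fraction below is an assumption, not a value from the slides).

```python
import random

def partition(cases, train_frac=0.7, seed=42):
    """Randomly split cases into training and validation sets."""
    rng = random.Random(seed)
    shuffled = cases[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

cases = list(range(100))
train, valid = partition(cases)
print(len(train), len(valid))  # 70 30
```

Every case lands in exactly one of the two sets, so the validation assessment is computed on data the model never saw during fitting.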
Predictive Model Sequence
Create a sequence of models with increasing complexity (complexity 1 through 5).
27
Model Performance Assessment
Rate model performance using validation data.
28
Model Selection
Select the simplest model with the highest validation assessment.
29
4.01 Multiple Choice Poll
The best model is the
a. simplest model with the best performance on the training data.
b. simplest model with the best performance on the validation data.
c. most complex model with the best performance on the training data.
d. most complex model with the best performance on the validation data.
30
4.01 Multiple Choice Poll – Correct Answer
The best model is the
a. simplest model with the best performance on the training data.
b. simplest model with the best performance on the validation data. (correct)
c. most complex model with the best performance on the training data.
d. most complex model with the best performance on the validation data.
31
Modeling Essentials – Optimize Review
Determine type of prediction: decide, rank, and estimate.
Select useful inputs: eradicate redundancies and irrelevancies.
Optimize complexity: tune models with validation data.
32
Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling
4.2 Predictive Modeling Using Decision Trees
4.3 Predictive Modeling Using Logistic Regression
4.4 Churn Case Study
4.5 A Note about Model Management
4.6 Recommended Reading
33
Objectives
Explain the concept of decision trees.
Illustrate the modeling essentials of decision trees.
Construct a decision tree predictive model in SAS Enterprise Miner.
34
Modeling Essentials – Decision Trees
Determine type of prediction.
Select useful inputs.
Optimize complexity.
35
Simple Prediction Illustration
(scatter plot of the training data on inputs x1 and x2)
Predict dot color for each x1 and x2.
Training Data
36
Decision Tree Prediction Rules
(scatter plot of x1 and x2, partitioned by the tree's split rules)
Root node: split on x2 (< 0.63 vs. ≥ 0.63).
Interior nodes: split on x1 (< 0.52 vs. ≥ 0.52, and < 0.51 vs. ≥ 0.51).
Leaf nodes hold the predicted percentages: 55%, 60%, 70%.
37
Decision Tree Prediction Rules
(scatter plot of x1 and x2, partitioned by the tree's split rules)
Predict: drop a case down the tree, from the root node (x2 < 0.63 vs. ≥ 0.63) through the interior nodes (x1 < 0.52 vs. ≥ 0.52; x1 < 0.51 vs. ≥ 0.51) to a leaf node (55%, 60%, or 70%).
38
Decision Tree Prediction Rules
Predict: a case with x2 ≥ 0.63 and x1 < 0.51 reaches the 70% leaf.
Decision = primary outcome; Estimate = 0.70
39
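The prediction rules pictured above can be written as nested if/else logic. This Python sketch hard-codes the split points and leaf percentages as read from the illustration (0.63, 0.52, 0.51; 55%, 60%, 70%, 40%).

```python
def tree_estimate(x1, x2):
    """Decision tree from the illustration, written as nested rules.
    Leaf values are the primary-outcome proportions shown on the slides."""
    if x2 < 0.63:          # root node split on x2
        if x1 < 0.52:      # interior node split on x1
            return 0.55
        return 0.60
    if x1 < 0.51:          # interior node split on x1
        return 0.70        # the best leaf: decide primary, estimate 0.70
    return 0.40

print(tree_estimate(0.3, 0.8))  # 0.7
```

The same leaf value serves all three prediction types: an estimate directly, a ranking by sorting cases on it, and a decision by comparing it to a cutoff.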
Modeling Essentials – Decision Trees
Determine type of prediction: prediction rules.
Select useful inputs: split search.
Optimize complexity: pruning.
40
Decision Tree Split Search
Calculate the logworth of every candidate partition (left versus right) on input x1, using the classification matrix of each split.
41
Decision Tree Split Search
Select the partition with the maximum logworth: max logworth(x1) = 0.95, at x1 = 0.52.
42
Decision Tree Split Search
Repeat for input x2. (The best x1 split gives branches of 53%/47% and 42%/58%.)
43
Decision Tree Split Search
maxlogworth(x1)
0.95
left right
53%53% 42%42%
47%47% 58%58%
0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0
0.0
0.5
0.1
0.2
0.3
0.4
0.6
0.7
0.8
0.9
1.0
x1
x2
0.63
maxlogworth(x2)
4.92
bottom top
44
Decision Tree Split Search
Compare partition logworth ratings: max logworth(x2) = 4.92 versus max logworth(x1) = 0.95.
45
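Logworth is −log10 of the p-value of a chi-square test on a split's classification matrix, so larger values mean stronger splits. A minimal Python sketch for a binary target; the counts below are made up for illustration, not the data behind the 0.95 and 4.92 on the slides.

```python
import math

def logworth(table):
    """logworth = -log10(chi-square p-value) for a 2x2 split-by-target table.
    table = [[a, b], [c, d]] : rows = left/right branch, cols = target 0/1."""
    (a, b), (c, d) = table
    n = a + b + c + d
    # Pearson chi-square statistic, 1 degree of freedom for a 2x2 table
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # survival function of chi-square(1 df): P(X > x) = erfc(sqrt(x/2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return -math.log10(p)

# a strongly separating split scores a much higher logworth than a weak one
strong = logworth([[80, 20], [20, 80]])
weak = logworth([[52, 48], [48, 52]])
print(strong > weak)  # True
```

The split search evaluates this statistic for every candidate cut point on every input and keeps the overall maximum.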
Decision Tree Split Search
Create a partition rule from the best partition across all inputs: x2 < 0.63 versus x2 ≥ 0.63.
46
Decision Tree Split Search
Repeat the process in each subset created by the rule.
47
Decision Tree Split Search
Within one subset, the best x1 partition is at x1 = 0.52, with max logworth(x1) = 5.72.
48
Decision Tree Split Search
Repeat for x2 in the same subset: max logworth(x2) = −2.01. (The x1 split's branches are 61%/39% and 55%/45%.)
49
Decision Tree Split Search
Compare partition logworth ratings: max logworth(x1) = 5.72 versus max logworth(x2) = −2.01.
50
Decision Tree Split Search
The partition with the higher logworth wins: split this subset on x1 at 0.52.
51
Decision Tree Split Search
Create a second partition rule: x1 < 0.52 versus x1 ≥ 0.52, within x2 < 0.63.
52
Decision Tree Split Search
Repeat to form a maximal tree.
53
4.02 Poll
The maximal tree is usually the tree that you use to score new data.
Yes
No
54
4.02 Poll – Correct Answer
The maximal tree is usually the tree that you use to score new data.
Yes
No (correct)
55
Modeling Essentials – Decision Trees
Determine type of prediction: prediction rules.
Select useful inputs: split search.
Optimize complexity: pruning.
56
Predictive Model Sequence
Create a sequence of models with increasing complexity (complexity 1 through 6).
57
The Maximal Tree
A maximal tree is the most complex model in the sequence.
58
Pruning One Split
Each subtree’s predictive performance is rated on validation data.
61
Pruning One Split
The subtree with the highest validation assessment is selected.
62
Pruning Two Splits
Similarly, this is done for subsequent models.
63
Pruning Two Splits
Prune two splits from the maximal tree,…
64
Pruning Two Splits
…rate each subtree using validation assessment, and…
65
Pruning Two Splits
…select the subtree with the best assessment rating.
66
Subsequent Pruning
Continue pruning until all subtrees are considered.
67
Selecting the Best Tree
Compare validation assessment between tree complexities.
68
Validation Assessment
Choose the simplest model with the highest validation assessment.
69
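Choosing "the simplest model with the highest validation assessment" can be expressed directly. This sketch rates hypothetical pruned subtrees by (complexity, validation accuracy); the numbers are made up for illustration.

```python
def select_model(candidates):
    """candidates: list of (complexity, validation_assessment) pairs.
    Return the complexity of the simplest model whose validation
    assessment is highest."""
    best_score = max(score for _, score in candidates)
    # among models achieving the best score, pick the lowest complexity
    return min(c for c, score in candidates if score == best_score)

# number of splits in each pruned subtree vs. validation accuracy
subtrees = [(1, 0.61), (2, 0.70), (3, 0.74), (4, 0.74), (5, 0.72)]
print(select_model(subtrees))  # 3
```

Complexities 3 and 4 tie on validation accuracy, so the simpler subtree with 3 splits wins.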
Validation Assessment
What are appropriate validation assessment ratings?
70
Assessment Statistics
Ratings depend on…
target measurement (binary, continuous, and so on)
prediction type (decisions, rankings, estimates)
71
Binary Targets
The target records a primary outcome (1) or a secondary outcome (0) for each case.
72
Binary Target Predictions
Predictions for a binary target can take the form of decisions (primary/secondary), rankings (for example, 720, 520), or estimates (for example, 0.249).
73
Decision Optimization
For decision predictions, compare each decision (primary/secondary) against the actual target outcome.
74
Decision Optimization – Accuracy
Maximize accuracy: agreement between outcome and prediction (true positives and true negatives).
75
Decision Optimization – Misclassification
Minimize misclassification: disagreement between outcome and prediction (false positives and false negatives).
76
Ranking Optimization
For ranking predictions, compare the ordering of the scores (for example, 720 versus 520) against the target outcomes.
77
Ranking Optimization – Concordance
Maximize concordance: proper ordering of primary and secondary outcomes (target = 0 → low score, target = 1 → high score).
78
Ranking Optimization – Discordance
Minimize discordance: improper ordering of primary and secondary outcomes (target = 0 → high score, target = 1 → low score).
79
Estimate Optimization
For estimate predictions, compare each estimated target value (for example, 0.249) against the actual target outcome.
80
Estimate Optimization – Squared Error
Minimize squared error: the squared difference between target and prediction, (target − estimate)².
81
Complexity Optimization – Summary
decisions → accuracy / misclassification
rankings → concordance / discordance
estimates → squared error
82
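The three families of assessment statistics can be computed from validation targets and estimates. A Python sketch; the 0.5 decision cutoff and the toy data are assumptions for illustration.

```python
def assess(targets, estimates, cutoff=0.5):
    """Validation assessment for a binary target:
    accuracy (decisions), concordance (rankings),
    and average squared error (estimates)."""
    n = len(targets)
    decisions = [1 if p >= cutoff else 0 for p in estimates]
    accuracy = sum(d == t for d, t in zip(decisions, targets)) / n
    # concordance: fraction of (primary, secondary) pairs ranked correctly
    pos = [p for p, t in zip(estimates, targets) if t == 1]
    neg = [p for p, t in zip(estimates, targets) if t == 0]
    concordance = sum(p > q for p in pos for q in neg) / (len(pos) * len(neg))
    ase = sum((t - p) ** 2 for t, p in zip(targets, estimates)) / n
    return accuracy, concordance, ase

acc, conc, ase = assess([1, 0, 1, 1, 0], [0.9, 0.2, 0.6, 0.4, 0.7])
print(acc, round(conc, 3))  # 0.6 0.667
```

Which statistic matters depends on how the model will be used: accuracy for decisions, concordance for rankings, average squared error for estimates.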
4.03 Quiz
What are some target variables that you might encounter that would require optimizing on accuracy/misclassification? concordance/discordance? average squared error?
83
Statistical Graphs
ROC Curves
Gains and Lift Charts
84
Decision Matrix
Rows are the actual class (0, 1); columns are the predicted class (0, 1).
Actual negative, predicted negative → True Negative
Actual negative, predicted positive → False Positive
Actual positive, predicted negative → False Negative
Actual positive, predicted positive → True Positive
85
Sensitivity
Sensitivity = true positives / actual positives
86
Positive Predicted Value
Positive predicted value = true positives / predicted positives
87
Specificity
Specificity = true negatives / actual negatives
88
Negative Predicted Value
Negative predicted value = true negatives / predicted negatives
89
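The four rates defined on the decision-matrix slides come straight from the cell counts. A sketch with hypothetical counts:

```python
def confusion_rates(tp, fp, fn, tn):
    """Rates from the decision matrix: sensitivity, specificity,
    positive predicted value, negative predicted value."""
    sensitivity = tp / (tp + fn)   # true positives / actual positives
    specificity = tn / (tn + fp)   # true negatives / actual negatives
    ppv = tp / (tp + fp)           # true positives / predicted positives
    npv = tn / (tn + fn)           # true negatives / predicted negatives
    return sensitivity, specificity, ppv, npv

sens, spec, ppv, npv = confusion_rates(tp=40, fp=10, fn=20, tn=30)
print(round(sens, 3), spec)  # 0.667 0.75
```

Sensitivity and specificity condition on the actual class; the two predicted values condition on the predicted class.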
ROC Curve
90
Gains Chart
91
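A gains chart plots, for each depth of file, the fraction of all responders captured when cases are ranked by model score. One point of that chart can be computed as follows (toy data, assumed for illustration):

```python
def cumulative_gains(targets, scores, depth):
    """Fraction of all responders captured in the top `depth` of cases
    when ranked by model score -- one point on a gains chart."""
    ranked = sorted(zip(scores, targets), reverse=True)
    cut = int(len(ranked) * depth)
    captured = sum(t for _, t in ranked[:cut])
    return captured / sum(targets)

targets = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(round(cumulative_gains(targets, scores, 0.4), 3))  # 0.667
```

Here the top 40% of scored cases contains 2 of the 3 responders; lift is this fraction divided by the depth (0.667 / 0.4 ≈ 1.67).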
Catalog Case Study: Steps to Build a Decision Tree
1. Add the CATALOG2010 data source to the diagram.
2. Use the Data Partition node to split the data into training and validation data sets.
3. Use the Decision Tree node to select useful inputs.
4. Use the Model Comparison node to generate model assessment statistics and plots.
92
Constructing a Decision Tree Predictive Model
Catalog Case Study
Task: Construct a decision tree model.
93
Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling
4.2 Predictive Modeling Using Decision Trees
4.3 Predictive Modeling Using Logistic Regression
4.4 Churn Case Study
4.5 A Note about Model Management
4.6 Recommended Reading
94
Objectives
Explain the concepts of logistic regression.
Discuss modeling strategies for building a predictive model.
Fit a predictive logistic regression model in SAS Enterprise Miner.
95
Modeling Essentials – Regressions
Determine type of prediction.
Select useful inputs.
Optimize complexity.
97
Simple Linear Regression Model
Regression Best Fit Line
98
Linear Regression Prediction Formula
ŷ = β̂0 + β̂1·x1 + β̂2·x2
(prediction estimate = intercept estimate + parameter estimates × input measurements)
Choose the intercept and parameter estimates to minimize the squared error function over the training data:
Σi (yi − ŷi)²
99
Binary Target
Linear regression does not work, because whatever the form of the equation, the results are generally unbounded.
Instead, you work with the probability p that the event will occur rather than a direct classification.
100
Odds Instead of Probability
Consider the probability p of an event (such as a horse losing a race) occurring.
The probability of the event not occurring is 1 − p.
The odds of the event happening are p : (1 − p), although you more commonly express this with integers, such as a 19-to-1 long shot at the race track.
The ratio 19:1 means that the horse has one chance of winning for 19 chances of losing, so the probability of winning is 1/(19+1) = 5%.
odds = p / (1 − p)
101
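The odds and log-odds arithmetic from this slide carries over directly. A small Python sketch of the 19-to-1 long shot:

```python
import math

def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

def log_odds(p):
    return math.log(odds(p))

# a 19-to-1 long shot: probability of winning = 1/20 = 5%
p_win = 1 / 20
print(round(odds(p_win), 4))       # 0.0526 -- 1 chance of winning to 19 of losing
print(round(1 / odds(p_win)))      # 19
print(log_odds(0.5))               # 0.0 -- log odds is 0 at 50%
```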
Properties of Odds and Log Odds
Odds is not symmetric, varying from 0 to infinity. Odds is 1 when the probability is 50%.
Log odds is symmetric, going from minus infinity to plus infinity, like a line. Log odds is 0 when the probability is 50%. It is highly negative for low probabilities and highly positive for high probabilities.
102
Logistic Regression Prediction Formula
logit score: log( p̂ / (1 − p̂) ) = β̂0 + β̂1·x1 + β̂2·x2
103
Logit Link Function
The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞):
log( p̂ / (1 − p̂) ) = β̂0 + β̂1·x1 + β̂2·x2
104
Logit Link Function
To obtain prediction estimates, the logit equation is solved for p̂:
logit( p̂ ) = β̂0 + β̂1·x1 + β̂2·x2
p̂ = 1 / (1 + e^(−logit( p̂ )))
105
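The logit transform and its inverse are two one-line functions, and applying one after the other recovers the original probability. A Python sketch:

```python
import math

def logit(p):
    """Transform a probability (0, 1) to a logit score (-inf, +inf)."""
    return math.log(p / (1 - p))

def inv_logit(score):
    """Solve the logit equation for p-hat: p = 1 / (1 + exp(-logit))."""
    return 1 / (1 + math.exp(-score))

score = logit(0.8)
print(round(inv_logit(score), 6))  # 0.8 -- the two functions are inverses
print(inv_logit(0))                # 0.5 -- logit score 0 means probability 50%
```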
4.04 Poll
Linear regression on a binary target is a problem because predictions can range outside of 0 and 1.
Yes
No
106
4.04 Poll – Correct Answer
Linear regression on a binary target is a problem because predictions can range outside of 0 and 1.
Yes (correct)
No
107
Simple Prediction Illustration – Regressions
(scatter plot of x1 and x2)
Predict dot color for each x1 and x2.
Need intercept and parameter estimates: logit( p̂ ) = β̂0 + β̂1·x1 + β̂2·x2
108
Simple Prediction Illustration – Regressions
Find the parameter estimates by maximizing the log-likelihood function.
109
Simple Prediction Illustration – Regressions
Using the maximum likelihood estimates, the prediction formula assigns a logit score to each x1 and x2 (probability contours at 0.40, 0.50, 0.60, 0.70).
110
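Maximizing the log-likelihood can be sketched with plain gradient ascent on a one-input model, logit(p) = b0 + b1·x. This is a teaching sketch with made-up data; real tools such as the Regression node use far better optimizers.

```python
import math

def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit logit(p) = b0 + b1*x by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += y - p            # gradient of the log-likelihood w.r.t. b0
            g1 += (y - p) * x      # gradient w.r.t. b1
        b0 += lr * g0 / len(xs)
        b1 += lr * g1 / len(xs)
    return b0, b1

xs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b1 > 0)  # True: larger x raises the predicted probability
```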
Regressions: Beyond the Prediction Formula
Manage missing values.
Interpret the model.
Account for nonlinearities.
Handle extreme or unusual values.
Use nonnumeric inputs.
111
Regressions: Beyond the Prediction Formula
Manage missing values.
Interpret the model.
Account for nonlinearities.
Handle extreme or unusual values.
Use nonnumeric inputs.
112
Missing Values and Regression Modeling
Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.
113
Missing Values and Regression Modeling
Consequence: Missing values can significantly reduce your amount of training data for regression modeling!
114
Missing Values and the Prediction Formula
Predict: (x1, x2) = (0.3, ? )
Problem 2: Prediction formulas cannot score cases with missing values.
115
Missing Values and the Prediction Formula
Problem 2: Prediction formulas cannot score cases with missing values.
116
Missing Value Issues
Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.
Problem 2: Prediction formulas cannot score cases with missing values.
Manage missing values.
117
Missing Value Causes
Manage missing values.
Non-applicable measurement
No match on merge
Non-disclosed measurement
118
Missing Value Remedies
Manage missing values: impute a replacement, for example xi = f(x1, …, xp).
Non-applicable measurement
No match on merge
Non-disclosed measurement
119
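A common remedy is to impute a replacement value and keep a missing-value indicator as an extra input. A sketch; the mean-imputation choice and the `income` column are assumptions for illustration, not specifics from the case study.

```python
def impute(rows, col):
    """Replace missing values (None) in `col` with the mean of the
    observed values, and add an indicator flag for missingness."""
    observed = [r[col] for r in rows if r[col] is not None]
    mean = sum(observed) / len(observed)
    for r in rows:
        r[col + "_missing"] = 1 if r[col] is None else 0
        if r[col] is None:
            r[col] = mean
    return rows

rows = [{"income": 40}, {"income": None}, {"income": 60}]
impute(rows, "income")
print(rows[1]["income"], rows[1]["income_missing"])  # 50.0 1
```

Imputing keeps the case usable for both training and scoring, while the indicator lets the model learn whether missingness itself is predictive.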
4.05 Poll
Observations with missing values should always be deleted from scoring because a predicted value cannot be determined.
Yes
No
120
4.05 Poll – Correct Answer
Observations with missing values should always be deleted from scoring because a predicted value cannot be determined.
Yes
No (correct)
121
Modeling Essentials – Regressions
Determine type of prediction: prediction formula.
Select useful inputs: sequential selection.
Optimize complexity: best model from sequence.
122
Variable Redundancy
123
Variable Clustering
The ten inputs X1–X10 are grouped into clusters of correlated inputs, and one representative is kept from each cluster (here X1, X3, X4, X6, X8, X9, X10).
Inputs are selected by cluster representation, expert opinion, or target correlation.
124
Selection by 1 – R² Ratio
For input X2: R²(own cluster) = 0.90, R²(next closest cluster) = 0.01.
1 – R² ratio = (1 − R² own cluster) / (1 − R² next closest) = (1 − 0.90) / (1 − 0.01) = 0.101
125
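The 1 − R² ratio is a one-line computation, reproduced here for the X2 example from the slide:

```python
def one_minus_r2_ratio(r2_own, r2_next):
    """(1 - R^2 own cluster) / (1 - R^2 next closest cluster).
    Smaller is better: the input represents its own cluster well
    and is far from the next closest cluster."""
    return (1 - r2_own) / (1 - r2_next)

# X2: R^2 = 0.90 with its own cluster, 0.01 with the next closest
print(round(one_minus_r2_ratio(0.90, 0.01), 3))  # 0.101
```

Within each cluster, the input with the smallest ratio is the natural representative.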
Modeling Essentials – Regressions
Determine type of prediction: prediction formula.
Select useful inputs: sequential selection.
Optimize complexity: best model from sequence.
126
Sequential Selection – Forward
Start with no inputs. At each step, the candidate input with the smallest p-value enters the model, as long as that p-value falls below the entry cutoff. Stop when no remaining candidate meets the entry cutoff.

Sequential Selection – Backward
Start with all inputs. At each step, the input with the largest p-value leaves the model if that p-value exceeds the stay cutoff. Stop when every remaining input meets the stay cutoff.

Sequential Selection – Stepwise
Combine the two: inputs enter by the entry cutoff as in forward selection, but inputs already in the model are re-examined and removed if their p-values rise above the stay cutoff.
147
4.06 Poll
Different model selection methods often result in different candidate models. No one method is uniformly the best.
Yes
No
148
4.06 Poll – Correct Answer
Different model selection methods often result in different candidate models. No one method is uniformly the best.
Yes (correct)
No
149
Modeling Essentials – Regressions
Determine type of prediction: prediction formula.
Select useful inputs: variable clustering and selection.
Optimize complexity: best model from sequence.
150
Model Fit versus Complexity
(model fit statistic plotted against sequence steps 1–6, for training and validation data)
151
Select Model with Optimal Validation Fit
Evaluate each sequence step and choose the step with the best validation fit.
152
Beyond the Prediction Formula
Manage missing values.
Interpret the model.
Account for nonlinearities.
Handle extreme or unusual values.
Use nonnumeric inputs.
153
Interpretation
A unit change in input x2 produces a β̂2 change in logit(p), which corresponds to a 100·(exp(β̂2) − 1)% change in the odds.
154
Odds Ratio from a Logistic Regression Model
Estimated logistic regression model:
logit(p) = −0.7567 + 0.4373·(gender)
Estimated odds ratio (females to males):
odds ratio = e^(−0.7567 + 0.4373) / e^(−0.7567) = e^0.4373 = 1.55
An odds ratio of 1.55 means that females have 1.55 times the odds of having the outcome compared to males.
155
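The intercept cancels in the ratio, so the odds ratio for a binary input is just e raised to its coefficient. Reproducing the slide's numbers:

```python
import math

# estimated model from the slide: logit(p) = -0.7567 + 0.4373 * gender
b_gender = 0.4373

# e^(b0 + b1) / e^(b0) simplifies to e^(b1): the intercept cancels
odds_ratio = math.exp(b_gender)
print(round(odds_ratio, 2))  # 1.55
```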
Properties of the Odds Ratio
Odds ratio < 1: the group in the denominator has higher odds of the event.
Odds ratio > 1: the group in the numerator has higher odds of the event.
Odds ratio = 1: no association.
156
Beyond the Prediction Formula
Manage missing values.
Interpret the model.
Account for nonlinearities.
Handle extreme or unusual values.
Use nonnumeric inputs.
157
Extreme Distributions and Regressions
On the original input scale, a skewed input distribution produces high leverage points, and the standard regression fit can deviate badly from the true association.

Regularizing Input Transformations
Transforming the input to a more symmetric distribution (a regularized scale) tames the high leverage points. The regression fit on the regularized scale, mapped back to the original scale, gives a regularized estimate that tracks the true association.
163
Idea Exchange
What are examples of variables with unusual distributions that could produce problems in a regression model? Would you transform these variables? If so, what types of transformations would you entertain?
164
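A log transform is one common regularizing transformation for a strictly positive, right-skewed input. A sketch with made-up income values, one of them extreme:

```python
import math

# a right-skewed input with one extreme, high-leverage value (toy data)
incomes = [20_000, 35_000, 40_000, 55_000, 4_000_000]

log_incomes = [math.log(x) for x in incomes]

# the extreme case dominates the raw scale but not the log scale
print(max(incomes) / min(incomes))  # 200.0
print(round(max(log_incomes) / min(log_incomes), 2))
```

On the log scale, the extreme case no longer has the leverage to pull the fitted regression away from the bulk of the data.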
Beyond the Prediction Formula
Manage missing values.
Interpret the model.
Account for nonlinearities.
Handle extreme or unusual values.
Use nonnumeric inputs.
165
Nonnumeric Input Coding
A two-level variable (levels A, B) is coded with dummy variables DA, DB:
A → DA = 1, DB = 0
B → DA = 0, DB = 1
Coding redundancy: DB = 1 − DA, so one dummy variable is enough.
166
Nonnumeric Input Coding: Many Levels
A nine-level variable (A–I) is coded with dummy variables DA–DI: each level sets its own dummy to 1 and all others to 0.
167
Coding Redundancy: Many Levels
With nine levels, the nine dummy variables DA–DI are redundant: any one of them is determined by the other eight, so only eight are needed.
168
Coding Consolidation
Levels can be consolidated into groups before coding.
169
Coding Consolidation
The nine levels are consolidated into grouped dummy variables (for example, DABCD for levels A–D, DEF for E–F, DGH for G–H), reducing the number of coded inputs.
170
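Dummy coding with a reference level (the redundancy-free scheme above) can be sketched in a few lines; the last level serves as the reference and gets all zeros.

```python
def dummy_code(values, levels):
    """One dummy column per level except the last (the reference level),
    avoiding the coding redundancy shown on the slides."""
    return [[1 if v == lvl else 0 for lvl in levels[:-1]] for v in values]

levels = ["A", "B", "C"]
print(dummy_code(["A", "C", "B"], levels))  # [[1, 0], [0, 0], [0, 1]]
```

Consolidation amounts to mapping several raw levels to one grouped level before calling a coder like this.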
Beyond the Prediction Formula
Manage missing values.
Interpret the model.
Account for nonlinearities.
Handle extreme or unusual values.
Use nonnumeric inputs.
171
Standard Logistic Regression
log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2
(a linear decision boundary in the x1–x2 plane)
172
Polynomial Logistic Regression
log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2 + ŵ3·x1² + ŵ4·x2² + ŵ5·x1·x2
The quadratic terms allow a curved decision boundary (probability contours at 0.30 through 0.80).
173
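The polynomial model is the standard model applied to an expanded input list. This sketch builds the quadratic and interaction terms from the formula above:

```python
def quadratic_features(x1, x2):
    """Expand (x1, x2) with the quadratic and interaction terms used in
    the polynomial logistic regression model."""
    return [x1, x2, x1 ** 2, x2 ** 2, x1 * x2]

print(quadratic_features(2.0, 3.0))  # [2.0, 3.0, 4.0, 9.0, 6.0]
```

Fitting then proceeds exactly as before: the model stays linear in its parameters even though the decision boundary is curved in (x1, x2).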
Idea Exchange
What are some predictors that you can think of that would have a nonlinear relationship with a target? What do you think the functional form of the relationship is (for example, quadratic, exponential, …)?
174
Catalog Case Study
Analysis Goal:
A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers most likely to purchase in the future.
Data set: CATALOG2010
Number of rows: 48,356
Number of columns: 98
Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales
Targets: RESPOND (binary)
ORDERSIZE (continuous)
175
Fitting a Logistic Regression Model
Catalog Case Study
Task: Build a logistic regression model in SAS Enterprise Miner.
176
Catalog Case Study: Steps to Build a Logistic Regression Model
1. Add the CATALOG2010 data source to the diagram.
2. Use the Data Partition node to split the data into training and validation data sets.
3. Use the Variable Clustering node to select relatively independent inputs.
4. Use the Regression node to select relevant inputs.
5. Use the Model Comparison node to generate model assessment statistics and plots.
In the previous example, you performed steps 1 and 2.
177
Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling
4.2 Predictive Modeling Using Decision Trees
4.3 Predictive Modeling Using Logistic Regression
4.4 Churn Case Study
4.5 A Note about Model Management
4.6 Recommended Reading
178
Objectives
Formulate an objective for predicting churn in a telecommunications example.
Generate predictive models in SAS Enterprise Miner to predict churn.
Score a customer database to target who is most likely to churn.
179
Telecommunications Company
A mobile (prepaid and postpaid) and fixed service provider.
In recent years, a high percentage of high-revenue subscribers have churned.
The company wants to target subscribers with a high churn probability for its customer retention program.
180
Churn Score
A churn propensity score measures the propensity for an active customer to churn.
The score enables marketing managers to take proactive steps to retain targeted customers before churn occurs.
Churn scores are derived from analysis of the historical behavior of churned customers and existing customers who have not churned.
Possible Predictor Variables
Outstanding bill value
Outstanding balance period
Number of calls
Call duration (international, local, national calls)
Period as customer
Total dropped calls
Total failed calls
182
Model Implementation
Predictions might be added to a data source inside or outside of SAS Enterprise Miner.
183
Churn Case Study
1. Examine the CHURN_TELECOM data set and add it to a diagram.
2. Partition the data into training and validation data sets.
3. Perform missing value imputation.
4. Recode nominal variables to combine class levels.
5. Reduce redundancy with variable clustering.
6. Reduce irrelevant inputs with a decision tree and a logistic regression. Compare results and select the final model based on validation error.
7. Score a data set to generate the list of churn risk customers.
184
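The case study performs these steps with Enterprise Miner nodes. As a rough sketch of the core of the flow (partitioning, imputation, and model comparison, i.e., steps 2, 3, and 6) in scikit-learn, using synthetic data rather than the CHURN_TELECOM set, with illustrative settings throughout:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the churn data: 500 rows, 4 inputs, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)
X[rng.random(size=X.shape) < 0.05] = np.nan      # sprinkle in missing values

# Step 2: partition into training and validation sets.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 3: missing value imputation (fit on training data only).
imp = SimpleImputer(strategy="mean").fit(X_tr)
X_tr, X_va = imp.transform(X_tr), imp.transform(X_va)

# Step 6: fit a decision tree and a logistic regression, then select the
# final model by validation misclassification rate.
models = {"tree": DecisionTreeClassifier(max_depth=3, random_state=0),
          "logistic": LogisticRegression(max_iter=1000)}
errors = {name: 1 - m.fit(X_tr, y_tr).score(X_va, y_va)
          for name, m in models.items()}
best = min(errors, key=errors.get)
print(errors, "-> selected:", best)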
Analyzing Churn Data
Churn Case Study
Task: Analyze churn data.
185
Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling
4.2 Predictive Modeling Using Decision Trees
4.3 Predictive Modeling Using Logistic Regression
4.4 Churn Case Study
4.5 A Note about Model Management
4.6 Recommended Reading
186
Objectives
Discuss the movement of analytics from the “back office” to the executive level and the reasons for these changes.
Describe the three-way pull for model management.
Explain why models must be maintained and reassessed over time.
187
Model Management and Business Analytics
Model management is the assessment, deployment, and continued modification of models. This is a critical business process:
Demonstrate that the model is well developed.
Verify that the model is working well.
Perform outcomes analysis.
Model management requires a collaborative effort across the company: VP of the Decision Analysis and Support Group, Senior Modeling Analyst, Enterprise Architect, Internal Validation Compliance Analyst, Database Administrator.
188
Analytical Model Management Challenges
Proliferation of Data and Models
Largely Manual Processes
Moving to Production
Increased Regulation (Sarbanes-Oxley, Basel II)
Actionable Inferences
Integrating with Operational Systems
189
Three-Way Pull for Model Management
Business Value
Governance Process
Production Process
190
Three-Way Pull for Model Management
Business Value: deployment of the “best” models; consistent model development and validation; understanding of model strategy and lifetime value.
Production Process: efficient deployment of models in a timely manner; effective deployment to minimize operational risk.
Governance Process: audit trails for compliance purposes; justification for management and shareholders.
191
Changes in the Analytical Landscape
[Diagram]
Stakeholders (now): Analytical Modelers, Management, IT Ops, Data Integrators, Business, Governance
Operations: Customer Service, Retail, Logistics, Promotions
Target: Customers, Stockholders, Suppliers, Employees
192
Model Management
As models proliferate, you need:
To be more diligent, but…
There is no established process to handle model deployment into production.
Model deployment is inefficient.
More individuals and groups in the organization must be involved in the process.
To be more vigilant, but…
It is difficult to effectively manage existing models and track the model life cycle.
It is difficult to consistently provide appropriate internal and regulatory documentation.
193
Idea Exchange
How can you implement model management in your organization? Do you already have systems in place for continuous improvement and monitoring of models? For audit trails and compliance checks? Describe briefly how they operate.
194
Lessons Learned
Model management is a key part of good business analytics.
Models should be evaluated before, during, and after deployment.
New models replace old ones as dictated by the data over time.
195
Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling
4.2 Predictive Modeling Using Decision Trees
4.3 Predictive Modeling Using Logistic Regression
4.4 Churn Case Study
4.5 A Note about Model Management
4.6 Recommended Reading
196
Recommended Reading
Davenport, Thomas H., Jeanne G. Harris, and Robert Morison. 2010. Analytics at Work: Smarter Decisions, Better Results. Boston: Harvard Business Press. Chapters 7 and 8.
Chapters 7 and 8 focus on making analytics an integral part of a business. Systems, processes, and organizational culture must work together to move toward analytical leadership. The remaining three chapters of the book (9-11) are optional, self-study material.
197
Recommended Reading
May, Thornton. 2010. The New Know: Innovation Powered by Analytics. New York: Wiley. Chapter 1.
May’s book provides a counterpoint to the Davenport et al. book, from the perspective of the role of analysts in the organization and how organizations can make the best use of their analytical talent.
198
Recommended Reading
Morris, Michael. “Mining Student Data Could Save Lives.” The Chronicle of Higher Education. October 2, 2011. http://chronicle.com/article/Mining-Student-Data-Could-Save/129231/
This article discusses the mining of student data at colleges and universities to prevent large-scale acts of violence on campus. Mining of students’ data (including Internet usage and social networking data) would enhance the capacity of threat-assessment teams to protect the health and safety of students.