predictive analytics: demystifying current and emerging ......decision trees –finding interactions...
TRANSCRIPT
![Page 1: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/1.jpg)
May 18, 2017
Tom Kolde, FCAS, MAAA
Linda Brobeck, FCAS, MAAA
Predictive Analytics: Demystifying Current and Emerging Methodologies
![Page 2: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/2.jpg)
1
About the Presenters
• Tom Kolde, FCAS, MAAA• Consulting Actuary• Chicago, Illinois
• Linda Brobeck, FCAS, MAAA • Senior Consulting Actuary• San Francisco, CA
![Page 3: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/3.jpg)
2
Agenda
• Machine Learning Overview
• Deeper Dive
– GLM and GAM
– Decision Trees (Recursive Partitioning)
– Ensembles
– Networks
• Summary
![Page 4: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/4.jpg)
3
Machine Learning Overview
Machine Learning
Supervised
Predictive
Target Variable
Task Driven
Regression, Classification
Unsupervised
Descriptive
No Target Variable
Data Driven
Clustering, Pattern Discovery, Dimension Reduction
Reinforcement – Algorithm Learns to React
![Page 5: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/5.jpg)
4
Analytics with reinforcement
Predictive modeling – any technique - no reinforcement
Predictive modeling beyond GLM without reinforcement
Polling Question #1
How do you define Machine Learning?
Other
A
B
C
D
![Page 6: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/6.jpg)
5
Machine Learning Overview
Machine Learning
Regression Instance based
Regularization Decision Trees
BayesianDimensionality
Reduction
Association Rule Learning
Clustering
Artificial Neural Network
Deep Learning
![Page 7: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/7.jpg)
6
Machine Learning Overview
Deep Learning
Deep Boltzmann Machine Deep Belief Networks Convolutional Neural NetworkStacked Auto-Encoders
PerceptronBack-PropagationHopfield Network
Radial Basis Function Network
Dimensionality Reduction
Multidimensional Scaling Projection PursuitLinear, Mixture, Quadratic, Flexible Discriminant Analysis
Principal Component AnalysisPC Regression
Partial Least Squares Sammon Mapping
Instance based
k-Nearest NeighborLearning Vector Quantization
Self-Organizing Map Locally Weighted Learning
Regularization
Ridge RegressionLeast Absolute Shrinkage and Selection Operator (LASSO)Elastic NetLeast-Angle Regression
k-Means, k-MediansExpectation Maximization Hierarchical Clustering
Clustering
Artificial Neural Network
Ordinary Least SquaresLinear, Logistic, StepwiseMultivariate AdaptiveLocally Estimated Scatterplot Smoothed
Regression
Naïve, Gaussian Naïve, Multinomial Naïve
Bayesian Belief Network
Bayesian Network
Bayesian
A priori algorithmEclat algorithm
Association Rule Learning
Iterative Dichotomizer 3 (ID3)Chi-squared Automatic
Interaction Detection (CHAID)Decision Stump
Conditional Decision Trees
Decision Trees
![Page 8: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/8.jpg)
7
• Generalized Linear Model– 𝜂 = 𝑋𝛽 = 𝛽0 + 𝛽1 x1 + 𝛽2 x2 + … + 𝛽p xp
– 𝜇 = g-1(𝜂)
• Goodness-of-fit– Conceptually equivalent to sum of squares
in ordinary linear regression
– Deviance: 𝑖2wiø µ
𝑖
𝑦𝑖 𝑦𝑖 − 𝜃
𝑉(𝜃)d𝜃
• Key Assumptions– Link function g() – describes the functional relationship between the
parameters and the fitted value– Variance function V(µi) – describes the relationship between the mean
and variance for an observation– Scale Parameter ø - accounts for misspecification of the variance
function– Weights wi - the amount of weight given to an individual record
Generalized Linear Modeling (GLM)
![Page 9: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/9.jpg)
8
Distributions
![Page 10: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/10.jpg)
9
GLMs have become the primary tool
Advantages Transparent results Allows modeler flexibility in setting assumptions beyond
ordinary least squares Results are in a form that is readily implementable Allows for explicit adjustments for business considerations Methodology is applicable to pricing, underwriting, and
other insurance functions Widely accepted
Limitations Requires a greater number of assumptions than non-
parametric methods Requires significant domain expertise and modeler
judgment
![Page 11: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/11.jpg)
10
Parametric vs Non-parametric
Parametric The shape of the predictor function is defined by a few parameters
o Algorithms simplify the function to a known form
o Machine learning finds coefficients
NonparametricThe shapes of predictor functions are fully determined by the data
![Page 12: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/12.jpg)
11
• Generalized Additive Model
– 𝜂 = 𝛽0 + 𝑓1( x1) + 𝑓2(x2) + … + 𝑓p( xp)
– 𝜇 = g-1(𝜂)
• Goodness-of-fit
– Similar to GLMs
• Functions 𝑓𝑖 can be:
– Parametric with a specified form (i.e., a polynomial)
– Non-Parametric
– Each 𝑓𝑖 can be a different function
Generalized Additive Models (GAM)
Replaces estimation of linear form parameters with smooth linear or non-linear functions
![Page 13: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/13.jpg)
12
Generalized Additive Models (GAM)
Advantages
• Greater flexibility, particularly when the relationship is not linear
• Estimate these smooth relationships simultaneously and then predict by simply adding them up
• GAM scales well with the increasing dimensionality and it yields interpretable models
• Exploratory modeling can uncover hidden patterns in the data and inspire parsimonious parametric models
Limitations
• Subject to overfitting
• Requires specification of each 𝑓𝑖• Requires significant domain expertise and modeler judgment
![Page 14: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/14.jpg)
13
Generalized Linear Models (GLM)
Generalized Additive Models (GAM)
Decision Trees
Polling Question #2
Which modeling technique(s) does your company utilize?
Random Forests
A
B
C
D
Other Ensemble ModelsE
F Unsupervised Learning (Dimensionality Reduction and/or Clustering)
G Neural Networks
![Page 15: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/15.jpg)
14
Decision Trees - Methodology
Split data according to measures of similarity
If the Target Variable is: Categorical Classification TreeContinuous Regression Tree
Two Competing Objectives: PurityMeasure of VariationParsimony Desire for Simple
![Page 16: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/16.jpg)
15
Splitting Procedure
The domain space of explanatory variables X1,…Xn is split into two subsets where observed values in Xj belong to one of the subsets i.e. < s or >= s OR s1=male s2=female
Improvement Value
The dimensions j and s above are chosen to minimize the error in the prediction among all such binary (two-leveled) trees.
Stopping Criteria
– No stopping criterion
– Minimum leaf (node) size; Maximum levels or splits
– Let data determine the stopping criterion
Decision Trees – The Process
![Page 17: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/17.jpg)
16
Decision Trees – Measures for Splitting Criteria
Measures IndependenceSignificance
• Numeric purity
• p-values of Chi-square variance reduction
Measures DisorderEntropy
• Categorical
• Measures pureness of the level
• Measures Gain in Intrinsic InformationGain Ratio• Information Gain = Entropy (parent) – Weighted sum of Entropy (children)
• Penalizes large values/splits
Measures MisclassificationGini
• Max = 1 – (1 / # of classes)
• Minimum = 0 (all records belong to one class)
![Page 18: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/18.jpg)
17
Advantages Non-parametric Simple to understand / Easy to interpret Automatic variable and interaction selection Handles missing values and outliers
Limitations Over-fit concerns (unstable) Some relationships difficult to find
Decision Trees – Advantages/Limitations
![Page 19: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/19.jpg)
18
Decision Trees – Finding Interactions
• Two explanatory variables interact if they combine non-additively to affect the target
• Traditional regression requires an explicit interaction term identified upfront
• A Decision Tree of depth D can capture interactions of order up to D
Decision Tree automatically captures interactions
All Drivers
Age <=30
Male
Single Married
Female
Age > 30
Annual Mileage <10,000
Annual Mileage >10,000
![Page 20: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/20.jpg)
19
• Enhancing GLMs
Screening predictor variables
Analyzing residuals
Identifying transformations and/or interactions
• Portfolio diagnostics
• Checking/Quality Control
Applications of Decision Trees
![Page 21: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/21.jpg)
20
Ensembles
Combine many weak classifiers in order to strengthen the overall result
Bagging (Bootstrap Aggregating)
Many models each based on sample
Each model in the ensemble gets a vote
Boosting
Iterative models dependent on previous model
Stacked Generalization (Blending)
Use of diverse models in combination
![Page 22: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/22.jpg)
21
Random Forests
• Each tree is built using a random sample
– With replacement of the observations
– Without replacement of the explanatory variables
• The predicted target value is the mean predicted target value over the ensemble
• Perturbation (interjecting randomness) is implemented, at each node, by only searching for optimal splits among a randomly chosen subset of the explanatory variables
Random Sample 1
Random Sample 4
Random Sample 3
Random Sample 2
![Page 23: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/23.jpg)
22
• Gradient Boosting– Trees built sequentially, tree based on previously built because
the new tree is built on the residual of prior tree(s)
• Multiplicative Boosted Trees– Multiplicative residuals
– Multiplicative combining of trees
• AdaBoost– Adaptive boosting
– Iteratively changes weights of training
observations based on errors of
previous prediction
Boosting
![Page 24: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/24.jpg)
23
• Blending of many model types
• Use diverse models for the blending/stacking
• 2 stages
1. Fit base learners to data
2. Fit a combiner algorithm to the predictions of the base learners
Stacked Generalization
Learner 1
Linear Regression
Learner 2
Decision Tree
Learner 3
Neural Network
Combiner AlgorithmLogistic Regression
![Page 25: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/25.jpg)
24
Neural Networks
X1
X2
X3
X1
X2
X3i
X1i
X2i
H3
H1
H2
H3
H2
H3
H1
H2
H3
H1
H2Yi
H3i
H1i
H2i
Input layer Hidden layer Target layer
Requires limited assumptions regarding the relationship of explanatory variables.
• Target layer regression model on a series of derived input, called hidden units
• Hidden units are regressions on the original inputs• Regression parameters are adjusted iteratively to minimize the
squared residuals
![Page 26: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/26.jpg)
25
Advantages Assists discovery of local interactions Software packages reduce learning curve
Limitations Subject to overfitting Identifying underlying drivers of model is difficult
Neural Networks
![Page 27: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/27.jpg)
26
Transparency of Results
Unfamiliarity of methods
Time constraints associated with running multiple methodologies
Polling Question #3
What do you consider the significant obstacles to utilizing methodologies other than GLM?
Software limitations
A
B
C
D
OtherE
![Page 28: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/28.jpg)
27
• Selecting/Combining Techniques
– Depends on the application/objective
– No silver bullet
• Machine Learning Reinforcement vs. Domain Expertise
Summary
![Page 29: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/29.jpg)
28
Questions
![Page 30: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/30.jpg)
29
Join Us for the Next APEX Webinar
![Page 31: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/31.jpg)
30
• We’d like your feedback and suggestions
• Please complete our survey
• For copies of this APEX presentation
• Visit the Resource Knowledge Center at Pinnacleactuaries.com
Final notes
![Page 32: Predictive Analytics: Demystifying Current and Emerging ......Decision Trees –Finding Interactions • Two explanatory variables interact if they combine non-additively to affect](https://reader034.vdocuments.mx/reader034/viewer/2022050509/5f9a43c46415b02902205e1d/html5/thumbnails/32.jpg)
31Commitment Beyond Numbers
Thank You for Your Time and Attention
Tom Kolde
Linda Brobeck