May 18, 2017 Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA Predictive Analytics: Demystifying Current and Emerging Methodologies


May 18, 2017

Tom Kolde, FCAS, MAAA

Linda Brobeck, FCAS, MAAA

Predictive Analytics: Demystifying Current and Emerging Methodologies


About the Presenters

• Tom Kolde, FCAS, MAAA
– Consulting Actuary
– Chicago, Illinois

• Linda Brobeck, FCAS, MAAA
– Senior Consulting Actuary
– San Francisco, CA


Agenda

• Machine Learning Overview

• Deeper Dive

– GLM and GAM

– Decision Trees (Recursive Partitioning)

– Ensembles

– Networks

• Summary


Machine Learning Overview

Machine Learning

• Supervised
– Predictive
– Target Variable
– Task Driven
– Regression, Classification

• Unsupervised
– Descriptive
– No Target Variable
– Data Driven
– Clustering, Pattern Discovery, Dimension Reduction

• Reinforcement
– Algorithm Learns to React


Polling Question #1

How do you define Machine Learning?

A Analytics with reinforcement
B Predictive modeling – any technique – no reinforcement
C Predictive modeling beyond GLM without reinforcement
D Other


Machine Learning Overview

Machine Learning

• Regression
• Instance based
• Regularization
• Decision Trees
• Bayesian
• Dimensionality Reduction
• Association Rule Learning
• Clustering
• Artificial Neural Network
• Deep Learning


Machine Learning Overview

• Deep Learning: Deep Boltzmann Machine, Deep Belief Networks, Convolutional Neural Network, Stacked Auto-Encoders

• Artificial Neural Network: Perceptron, Back-Propagation, Hopfield Network, Radial Basis Function Network

• Dimensionality Reduction: Multidimensional Scaling, Projection Pursuit, Linear/Mixture/Quadratic/Flexible Discriminant Analysis, Principal Component Analysis, PC Regression, Partial Least Squares, Sammon Mapping

• Instance based: k-Nearest Neighbor, Learning Vector Quantization, Self-Organizing Map, Locally Weighted Learning

• Regularization: Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net, Least-Angle Regression

• Clustering: k-Means, k-Medians, Expectation Maximization, Hierarchical Clustering

• Regression: Ordinary Least Squares, Linear, Logistic, Stepwise, Multivariate Adaptive Regression Splines, Locally Estimated Scatterplot Smoothing

• Bayesian: Naïve Bayes, Gaussian Naïve Bayes, Multinomial Naïve Bayes, Bayesian Belief Network, Bayesian Network

• Association Rule Learning: Apriori algorithm, Eclat algorithm

• Decision Trees: Iterative Dichotomizer 3 (ID3), Chi-squared Automatic Interaction Detection (CHAID), Decision Stump, Conditional Decision Trees


Generalized Linear Modeling (GLM)

• Generalized Linear Model
– $\eta = X\beta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$
– $\mu = g^{-1}(\eta)$

• Goodness-of-fit
– Conceptually equivalent to sum of squares in ordinary linear regression
– Deviance: $D = \sum_i \frac{2 w_i}{\phi} \int_{\mu_i}^{y_i} \frac{y_i - \theta}{V(\theta)} \, d\theta$

• Key Assumptions
– Link function $g()$: describes the functional relationship between the parameters and the fitted value
– Variance function $V(\mu_i)$: describes the relationship between the mean and variance for an observation
– Scale parameter $\phi$: accounts for misspecification of the variance function
– Weights $w_i$: the amount of weight given to an individual record
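As a hedged illustration of the deviance, the sketch below assumes the usual form (a sum over records of 2·w_i/φ times the integral of (y_i − θ)/V(θ) from µ_i to y_i) and a Poisson variance function V(θ) = θ. Under that assumption the integral has the closed form y_i·ln(y_i/µ_i) − (y_i − µ_i). The function names, the sample counts, and the midpoint-rule check are illustrative and are not taken from the presentation.

```python
import math

def poisson_unit_deviance(y, mu):
    """Closed form of the integral of (y - t) / V(t) from mu to y, with V(t) = t."""
    log_term = y * math.log(y / mu) if y > 0 else 0.0  # y * log(y / mu) -> 0 as y -> 0
    return log_term - (y - mu)

def integral_midpoint(y, mu, steps=20000):
    """Midpoint-rule approximation of the same (signed) integral from mu to y."""
    h = (y - mu) / steps
    total = 0.0
    for k in range(steps):
        t = mu + (k + 0.5) * h
        total += (y - t) / t
    return total * h

def scaled_deviance(ys, mus, weights=None, phi=1.0):
    """Sum over records of 2 * w_i / phi * unit deviance."""
    weights = weights or [1.0] * len(ys)
    return sum(2.0 * w / phi * poisson_unit_deviance(y, m)
               for y, m, w in zip(ys, mus, weights))

# Hypothetical claim counts and fitted means, for illustration only.
observed = [0, 1, 3, 2]
fitted = [0.5, 1.2, 2.4, 2.0]
deviance = scaled_deviance(observed, fitted)
```

The deviance is zero when every fitted mean equals its observation and grows as the fit worsens, which is why it plays the role that the sum of squares plays in ordinary linear regression.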


Distributions


GLMs have become the primary tool

Advantages
• Transparent results
• Allows modeler flexibility in setting assumptions beyond ordinary least squares
• Results are in a form that is readily implementable
• Allows for explicit adjustments for business considerations
• Methodology is applicable to pricing, underwriting, and other insurance functions
• Widely accepted

Limitations
• Requires a greater number of assumptions than non-parametric methods
• Requires significant domain expertise and modeler judgment


Parametric vs Non-parametric

• Parametric: the shape of the predictor function is defined by a few parameters

o Algorithms simplify the function to a known form

o Machine learning finds coefficients

• Nonparametric: the shapes of predictor functions are fully determined by the data
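To make the contrast concrete, here is a minimal sketch that fits the same hypothetical data two ways: a straight line, which reduces the data to two coefficients, and a k-nearest-neighbor average, whose shape comes entirely from the observations. The data and the function names are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Parametric: least squares reduces the data to an intercept and a slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def knn_predict(xs, ys, x0, k=3):
    """Non-parametric: the prediction is the mean of the k nearest observed targets."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x * x for x in xs]  # a curved relationship that a straight line cannot follow

intercept, slope = fit_line(xs, ys)
line_at_3 = intercept + slope * 3.0   # the line's prediction at x = 3
knn_at_3 = knn_predict(xs, ys, 3.0)   # local average around x = 3
```

On this curved example the local average lands closer to the true value at x = 3 than the line does, at the cost of having no compact formula to report.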


Generalized Additive Models (GAM)

Replaces estimation of linear form parameters with smooth linear or non-linear functions

• Generalized Additive Model
– $\eta = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_p(x_p)$
– $\mu = g^{-1}(\eta)$

• Goodness-of-fit
– Similar to GLMs

• Functions $f_i$ can be:
– Parametric with a specified form (e.g., a polynomial)
– Non-parametric
– Each $f_i$ can be a different function
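One common way to estimate the component functions is backfitting: each function is re-estimated in turn from the residuals left by the others. The sketch below assumes an identity link, two explanatory variables, and a deliberately crude smoother (the mean partial residual at each distinct value). The smoother, the grid data, and all names are illustrative assumptions, not the presenters' method.

```python
from collections import defaultdict
import math

def group_mean_smoother(x, r):
    """Crude non-parametric f: the mean partial residual at each distinct x value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, ri in zip(x, r):
        sums[xi] += ri
        counts[xi] += 1
    return {v: sums[v] / counts[v] for v in sums}

def centered(f, x):
    """Shift f so it averages to zero over the data, leaving b0 as the overall mean."""
    shift = sum(f[v] for v in x) / len(x)
    return {v: val - shift for v, val in f.items()}

def backfit(x1, x2, y, n_iter=20):
    """Fit y = b0 + f1(x1) + f2(x2) by cycling over the two component functions."""
    b0 = sum(y) / len(y)
    f1 = {v: 0.0 for v in set(x1)}
    f2 = {v: 0.0 for v in set(x2)}
    for _ in range(n_iter):
        f1 = centered(group_mean_smoother(x1, [yi - b0 - f2[b] for yi, b in zip(y, x2)]), x1)
        f2 = centered(group_mean_smoother(x2, [yi - b0 - f1[a] for yi, a in zip(y, x1)]), x2)
    return b0, f1, f2

# Hypothetical data: quadratic in x1, sinusoidal in x2, on a full 5 x 5 grid.
grid = [(a, b) for a in range(5) for b in range(5)]
x1 = [a for a, _ in grid]
x2 = [b for _, b in grid]
y = [1.0 + a * a + math.sin(b) for a, b in grid]

b0, f1, f2 = backfit(x1, x2, y)
fitted = [b0 + f1[a] + f2[b] for a, b in grid]
```

Note that f1 and f2 take different shapes here without either shape being specified in advance; a production GAM would normally use splines or another penalized smoother in place of the group means.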


Generalized Additive Models (GAM)

Advantages

• Greater flexibility, particularly when the relationship is not linear

• Estimate these smooth relationships simultaneously and then predict by simply adding them up

• GAM scales well with increasing dimensionality and yields interpretable models

• Exploratory modeling can uncover hidden patterns in the data and inspire parsimonious parametric models

Limitations

• Subject to overfitting

• Requires specification of each $f_i$

• Requires significant domain expertise and modeler judgment


Polling Question #2

Which modeling technique(s) does your company utilize?

A Generalized Linear Models (GLM)
B Generalized Additive Models (GAM)
C Decision Trees
D Random Forests
E Other Ensemble Models
F Unsupervised Learning (Dimensionality Reduction and/or Clustering)
G Neural Networks


Decision Trees - Methodology

Split data according to measures of similarity

If the Target Variable is:
• Categorical: Classification Tree
• Continuous: Regression Tree

Two Competing Objectives:
• Purity: Measure of Variation
• Parsimony: Desire for Simple


Decision Trees – The Process

Splitting Procedure

The domain space of the explanatory variables X1, …, Xn is split into two subsets according to the observed value of a single variable Xj: for a numeric variable, Xj < s or Xj >= s; for a categorical variable, a grouping of levels such as s1 = male and s2 = female.

Improvement Value

The variable j and split point s above are chosen to minimize the error in the prediction among all such binary (two-leveled) trees.

Stopping Criteria

– No stopping criterion

– Minimum leaf (node) size; maximum levels or splits

– Let data determine the stopping criterion
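The split search can be sketched as an exhaustive loop over every variable j and candidate threshold s, keeping the pair with the smallest prediction error. This minimal version assumes numeric variables, a regression target scored by sum of squared errors, and a minimum leaf size as the stopping rule; the records and names are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, targets, min_leaf=1):
    """Search every variable j and threshold s; return (error, j, s) for the best split."""
    best = None
    for j in range(len(rows[0])):
        for s in sorted({row[j] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[j] < s]
            right = [t for row, t in zip(rows, targets) if row[j] >= s]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # stopping criterion: minimum leaf (node) size
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, s)
    return best

# Hypothetical records: (driver age, annual mileage in thousands) and a loss cost.
rows = [(22, 8), (25, 15), (28, 9), (35, 7), (45, 14), (52, 12)]
targets = [900.0, 950.0, 920.0, 400.0, 430.0, 410.0]
error, j, s = best_split(rows, targets)
```

Growing a full tree repeats this search inside each resulting subset until a stopping criterion is met.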


Decision Trees – Measures for Splitting Criteria

Significance: Measures Independence

• Numeric purity

• p-values of Chi-square variance reduction

Entropy: Measures Disorder

• Categorical

• Measures pureness of the level

Gain Ratio: Measures Gain in Intrinsic Information

• Information Gain = Entropy (parent) – Weighted sum of Entropy (children)

• Penalizes large values/splits

Gini: Measures Misclassification

• Max = 1 – (1 / # of classes)

• Minimum = 0 (all records belong to one class)
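The entropy, information gain, gain ratio, and Gini measures can be written in a few lines. The sketch below uses the standard textbook definitions (entropy in bits, Gini impurity as one minus the sum of squared class proportions); the labels and the candidate split are hypothetical.

```python
import math
from collections import Counter

def proportions(labels):
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]

def entropy(labels):
    """Shannon entropy in bits; 0 when every record is in one class."""
    return sum(-p * math.log2(p) for p in proportions(labels))

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in proportions(labels))

def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted sum of Entropy(children)."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

def gain_ratio(parent, children):
    """Information gain divided by the intrinsic information of the split itself."""
    n = len(parent)
    intrinsic = sum(-len(c) / n * math.log2(len(c) / n) for c in children if c)
    return information_gain(parent, children) / intrinsic

# Hypothetical node with 4 claims and 4 non-claims, and one candidate split of it.
parent = ["claim"] * 4 + ["no claim"] * 4
children = [["claim"] * 3 + ["no claim"], ["claim"] + ["no claim"] * 3]
gain = information_gain(parent, children)
ratio = gain_ratio(parent, children)
```

Dividing by the intrinsic information is what penalizes splits into many small branches: the more ways the data is carved up, the larger the denominator.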


Decision Trees – Advantages/Limitations

Advantages
• Non-parametric
• Simple to understand / Easy to interpret
• Automatic variable and interaction selection
• Handles missing values and outliers

Limitations
• Over-fit concerns (unstable)
• Some relationships difficult to find


Decision Trees – Finding Interactions

• Two explanatory variables interact if they combine non-additively to affect the target

• Traditional regression requires an explicit interaction term identified upfront

• A Decision Tree of depth D can capture interactions of order up to D

Decision Tree automatically captures interactions

[Figure: example tree]

All Drivers
– Age <= 30
  – Male
    – Single
    – Married
  – Female
– Age > 30
  – Annual Mileage < 10,000
  – Annual Mileage > 10,000
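A small numeric sketch of what "combine non-additively" means, using a hypothetical 2 x 2 table of mean loss costs in which a surcharge applies only to young males. A main-effects-only fit cannot reproduce the table, while a depth-2 tree (split on age, then on gender) has one leaf per cell and reproduces it exactly. All figures and names are invented for illustration.

```python
# Hypothetical mean loss costs by cell; the surcharge applies only to young males.
cell_means = {
    ("young", "male"): 1400.0,
    ("young", "female"): 900.0,
    ("older", "male"): 800.0,
    ("older", "female"): 800.0,
}
ages = ["young", "older"]
genders = ["male", "female"]

grand = sum(cell_means.values()) / 4
age_effect = {a: sum(cell_means[(a, g)] for g in genders) / 2 - grand for a in ages}
gender_effect = {g: sum(cell_means[(a, g)] for a in ages) / 2 - grand for g in genders}

# Best additive (main effects only) fit for a balanced 2 x 2 table.
additive_fit = {(a, g): grand + age_effect[a] + gender_effect[g]
                for a in ages for g in genders}
additive_error = max(abs(additive_fit[c] - cell_means[c]) for c in cell_means)

# A depth-2 tree has one leaf per (age, gender) cell, so its leaf means are the cell means.
tree_leaves = dict(cell_means)
tree_error = max(abs(tree_leaves[c] - cell_means[c]) for c in cell_means)

# A non-zero contrast means the two variables combine non-additively.
interaction = (cell_means[("young", "male")] - cell_means[("young", "female")]
               - cell_means[("older", "male")] + cell_means[("older", "female")])
```

In a regression, closing that gap requires adding the age-by-gender interaction term explicitly; the tree picks it up simply by splitting on one variable inside a branch of the other.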


Applications of Decision Trees

• Enhancing GLMs
– Screening predictor variables
– Analyzing residuals
– Identifying transformations and/or interactions

• Portfolio diagnostics

• Checking/Quality Control


Ensembles

Combine many weak classifiers in order to strengthen the overall result

Bagging (Bootstrap Aggregating)
– Many models, each based on a sample
– Each model in the ensemble gets a vote

Boosting
– Iterative models, each dependent on the previous model

Stacked Generalization (Blending)
– Use of diverse models in combination


Random Forests

• Each tree is built using a random sample

– With replacement of the observations

– Without replacement of the explanatory variables

• The predicted target value is the mean predicted target value over the ensemble

• Perturbation (interjecting randomness) is implemented, at each node, by only searching for optimal splits among a randomly chosen subset of the explanatory variables

[Figure: Random Sample 1, Random Sample 2, Random Sample 3, Random Sample 4]
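The sampling scheme can be sketched with one-split trees (stumps) standing in for full trees. Because a stump has a single node, drawing the variable subset once per tree is the same as drawing it at each node; a full random forest would redraw the subset at every node. The records, parameter defaults, and function names are assumptions for illustration.

```python
import random

def fit_stump(rows, targets, features):
    """Best single split, searched only over the given subset of variables."""
    best = None
    for j in features:
        for s in sorted({r[j] for r in rows})[1:]:  # skip the minimum so both sides are non-empty
            left = [t for r, t in zip(rows, targets) if r[j] < s]
            right = [t for r, t in zip(rows, targets) if r[j] >= s]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or err < best[0]:
                best = (err, j, s, lm, rm)
    if best is None:  # the sample had one distinct value for every candidate variable
        mean = sum(targets) / len(targets)
        return (None, None, mean, mean)
    return best[1:]

def predict_stump(stump, row):
    j, s, lm, rm = stump
    return lm if j is None or row[j] < s else rm

def fit_forest(rows, targets, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # observations: with replacement
        features = rng.sample(range(p), n_features)   # variables: without replacement
        forest.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    """Ensemble prediction: the mean predicted target over all trees."""
    return sum(predict_stump(t, row) for t in forest) / len(forest)

# Hypothetical records: (driver age, annual mileage in thousands) and a loss cost.
rows = [(22, 8), (25, 15), (28, 9), (35, 7), (45, 14), (52, 12)]
targets = [900.0, 950.0, 920.0, 400.0, 430.0, 410.0]
forest = fit_forest(rows, targets)
prediction = predict_forest(forest, (30, 10))
```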


Boosting

• Gradient Boosting
– Trees are built sequentially; each new tree depends on those built before it because it is fit to the residual of the prior tree(s)

• Multiplicative Boosted Trees
– Multiplicative residuals
– Multiplicative combining of trees

• AdaBoost
– Adaptive boosting
– Iteratively changes weights of training observations based on errors of previous prediction
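A minimal sketch of gradient boosting for squared error, where the quantity each new tree is fit to is simply the current residual. It assumes a single numeric variable, one-split trees, and a fixed learning rate (a shrinkage factor that the slide does not mention); the data and names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best one-split regression stump on a single numeric variable."""
    best = None
    for s in sorted(set(xs))[1:]:  # skip the minimum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x < s]
        right = [r for x, r in zip(xs, residuals) if x >= s]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    return best[1:]

def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Each round fits a stump to the residuals left by all previous rounds."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        s, lm, rm = fit_stump(xs, residuals)
        stumps.append((s, lm, rm))
        preds = [p + learning_rate * (lm if x < s else rm) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, history

def predict(base, stumps, x, learning_rate=0.3):
    """Additive combination: the base value plus every stump's shrunken contribution."""
    return base + learning_rate * sum(lm if x < s else rm for s, lm, rm in stumps)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1]
base, stumps, history = boost(xs, ys)
```

The training error in `history` never increases from one round to the next, which is also why boosting needs a stopping rule or validation data to avoid overfitting.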


Stacked Generalization

• Blending of many model types

• Use diverse models for the blending/stacking

• 2 stages

1. Fit base learners to data

2. Fit a combiner algorithm to the predictions of the base learners

[Figure: Learner 1 (Linear Regression), Learner 2 (Decision Tree), and Learner 3 (Neural Network) feed a Combiner Algorithm (Logistic Regression)]
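The two stages can be sketched with simplified stand-ins. This example assumes a regression target, two base learners (a linear regression and a one-split tree with a fixed, hypothetical threshold), and a least-squares combiner in place of the logistic regression shown in the figure; the neural network learner is omitted. The base learners are fit on one half of the data and the combiner on the other half, so the combiner sees predictions on records the base learners were not trained on.

```python
def fit_line(xs, ys):
    """Base learner 1: simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_one_split(xs, ys, s):
    """Base learner 2: a one-split tree at a fixed, hypothetical threshold s."""
    left = [y for x, y in zip(xs, ys) if x < s]
    right = [y for x, y in zip(xs, ys) if x >= s]
    lm, rm = sum(left) / len(left), sum(right) / len(right)
    return lambda x: lm if x < s else rm

def fit_combiner(p1, p2, ys):
    """Stage 2: least-squares weights (no intercept) on the base-learner predictions."""
    a11 = sum(u * u for u in p1)
    a12 = sum(u * v for u, v in zip(p1, p2))
    a22 = sum(v * v for v in p2)
    b1 = sum(u * y for u, y in zip(p1, ys))
    b2 = sum(v * y for v, y in zip(p2, ys))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

def sse(preds, actual):
    return sum((p - a) ** 2 for p, a in zip(preds, actual))

# Hypothetical data with both a trend and a jump.
xs = [float(i) for i in range(1, 13)]
ys = [1.0, 1.3, 1.1, 1.6, 1.4, 1.9, 4.2, 4.0, 4.6, 4.4, 4.9, 5.1]
train_x, train_y = xs[0::2], ys[0::2]  # stage 1: fit base learners
hold_x, hold_y = xs[1::2], ys[1::2]    # stage 2: fit the combiner

learners = [fit_line(train_x, train_y), fit_one_split(train_x, train_y, s=6.5)]
hold_preds = [[f(x) for x in hold_x] for f in learners]
w1, w2 = fit_combiner(hold_preds[0], hold_preds[1], hold_y)

def stacked(x):
    """Final prediction: the combiner applied to the base-learner predictions."""
    return w1 * learners[0](x) + w2 * learners[1](x)

stacked_preds = [stacked(x) for x in hold_x]
```

On the data used to fit it, the combiner can do no worse than either base learner alone, since putting all the weight on one learner is always an available choice.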


Neural Networks

[Figure: network diagram with an input layer (X1, X2, X3), a hidden layer (H1, H2, H3), and a target layer (Yi)]

Requires limited assumptions regarding the relationship of explanatory variables.

• Target layer: a regression model on a series of derived inputs, called hidden units

• Hidden units are regressions on the original inputs

• Regression parameters are adjusted iteratively to minimize the squared residuals
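A minimal sketch of the structure just described: three inputs, three hidden units that are each a regression on the inputs passed through tanh, a linear target layer, and parameters adjusted iteratively by gradient descent on the squared residual. The activation function, learning rate, initialization, and data are assumptions chosen for illustration.

```python
import math
import random

def forward(params, x):
    """Hidden units are tanh regressions on the inputs; the target is a regression on them."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return hidden, sum(w * h for w, h in zip(w_out, hidden)) + b_out

def mean_squared_error(params, data):
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, n_hidden=3, lr=0.05, epochs=500, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0
    for _ in range(epochs):
        for x, y in data:
            hidden, pred = forward((w_hidden, b_hidden, w_out, b_out), x)
            err = pred - y  # derivative of 0.5 * err^2 with respect to the prediction
            for k in range(n_hidden):
                # Back-propagate through tanh before the output weight is updated.
                grad_pre = err * w_out[k] * (1.0 - hidden[k] ** 2)
                w_out[k] -= lr * err * hidden[k]
                for i in range(n_in):
                    w_hidden[k][i] -= lr * grad_pre * x[i]
                b_hidden[k] -= lr * grad_pre
            b_out -= lr * err
    return w_hidden, b_hidden, w_out, b_out

# Hypothetical data: three binary inputs and a target with one interaction term.
data = [((a, b, c), 0.5 * a - 0.3 * b + 0.8 * a * c)
        for a in (0.0, 1.0) for b in (0.0, 1.0) for c in (0.0, 1.0)]

initial = train(data, epochs=0)  # same seed, no updates: the starting parameters
fitted = train(data)
```

Comparing `mean_squared_error(initial, data)` with `mean_squared_error(fitted, data)` shows the effect of the iterative adjustment; the fitted weights themselves are hard to read directly, which is the interpretability limitation of this model class.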


Neural Networks

Advantages
• Assists discovery of local interactions
• Software packages reduce learning curve

Limitations
• Subject to overfitting
• Identifying underlying drivers of model is difficult


Polling Question #3

What do you consider the significant obstacles to utilizing methodologies other than GLM?

A Transparency of Results
B Unfamiliarity of methods
C Time constraints associated with running multiple methodologies
D Software limitations
E Other


Summary

• Selecting/Combining Techniques

– Depends on the application/objective

– No silver bullet

• Machine Learning Reinforcement vs. Domain Expertise


Questions


Join Us for the Next APEX Webinar


Final notes

• We’d like your feedback and suggestions

• Please complete our survey

• For copies of this APEX presentation

– Visit the Resource Knowledge Center at Pinnacleactuaries.com

Commitment Beyond Numbers

Thank You for Your Time and Attention

Tom Kolde

[email protected]

Linda Brobeck

[email protected]