TRANSCRIPT
Nelson Henwood
Pricing elasticity and demand modelling using GBMs
Today’s Presentation
- (Gentle) Introduction to GBMs
- GBMs vs Logistic Regression
- Modelling Process Overview
- Some Results
- A Couple of Other Observations
- Conclusions
The ‘gentle’ introduction
GBM stands for Gradient Boosting Machine:

Gradient Descent (an optimisation algorithm for finding the local minimum of a function)
+
Boosting (a machine learning ensemble algorithm that combines weak learners into a single strong learner)
In English
- Fit an ensemble model using an iterative process.
- At each step, introduce a weak learner to compensate for the shortcomings of the existing model.
- In gradient boosting, the shortcomings are identified by negative gradients, also called pseudo-residuals (effectively residuals with a view on the error distribution). The shortcomings tell us how to improve the model.
- Effectively, we iteratively explain the model errors and use this to improve our prediction.
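To make the "pseudo-residuals are effectively residuals" point concrete: for squared-error loss, the negative gradient of the loss with respect to the current prediction is exactly the ordinary residual. A minimal numerical check (illustrative data only):

```python
import numpy as np

# For squared-error loss L = 0.5 * (y - F)^2, the negative gradient with
# respect to the current prediction F is exactly the residual y - F, which
# is why the pseudo-residuals reduce to ordinary residuals in this case.
y = np.array([3.0, 1.0, 4.0])   # observed outcomes (illustrative)
F = np.array([2.5, 1.5, 3.0])   # current model predictions

eps = 1e-6
loss = lambda f: 0.5 * (y - f) ** 2
# Central-difference estimate of the negative gradient at each point
num_neg_grad = -(loss(F + eps) - loss(F - eps)) / (2 * eps)

residuals = y - F
print(np.allclose(num_neg_grad, residuals))  # prints True
```

For other loss functions (e.g. Bernoulli deviance) the negative gradient takes a different form, which is what "residuals with a view on the error distribution" captures.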
GBM development timeline
- 1984: CART (Breiman, Friedman et al)
- 1997: AdaBoost (Freund & Schapire)
- 1999: “Greedy Function Approximation: A Gradient Boosting Machine” and “Stochastic Gradient Boosting” (Friedman)
- 2000: TreeNet 1 released
- 2007: Extensions to the GBM R package: quantile regression
- 2012: Further extensions to the GBM R package: multinomial, t-distribution, pairwise
- 2014: XGBoost released
Key model parameters
- Base learner
- Base learner complexity
- Shrinkage (learning rate)
- Subsampling (training fraction)
- Stopping criteria
- Loss function
Base learners
Decision trees are the most popular choice. Other possibilities:
- Linear models: ordinary linear regression, ridge-penalised linear regression, random effects
- Smooth models: P-splines, radial basis functions
- Other models: Markov random fields, wavelets, custom base-learner functions
- Mixed models
Base learner complexity
- Controls the underlying complexity of the base learners
- Trade-off between overfitting and the ability to capture the underlying complexity
- Maximum tree depth: typically 2 or 3, up to 8
- Minimum observations per node: rule of thumb is 2-5% of the data
Shrinkage or learning rate
- “Shrinks” the impact of each additional fitted base learner, reducing the size of the incremental steps
- If one of the boosting iterations turns out to be “erroneous”, its negative impact can easily be corrected in subsequent steps
- The smaller (closer to 0) the shrinkage parameter, the better the model generalises, but convergence takes longer
- It is better to improve a model by taking many small steps than by taking fewer larger steps
Subsampling (bagging)
- At each learning iteration, only a random part of the training data is used to fit the next base learner, introducing a bit of randomness into the fitting procedure
- The random part is called the “bag fraction”; a typical value is 50%
- Improves generalisation and computation time
Stopping criteria: How many boosting iterations?
- Random (hold-out) sample
- Cross-fold validation
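The hold-out stopping rule can be sketched with scikit-learn's off-the-shelf booster (not the tooling used in the talk): training stops once the score on an internal validation fraction stops improving for a given number of consecutive iterations. All parameter values below are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Simulated binary data standing in for a lapse/retain response
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

gbm = GradientBoostingClassifier(
    n_estimators=1000,        # upper bound on boosting iterations
    learning_rate=0.1,
    validation_fraction=0.2,  # 20% hold-out used only for the stopping rule
    n_iter_no_change=10,      # stop after 10 iterations without improvement
    random_state=0,
).fit(X, y)

# n_estimators_ is the number of trees actually fitted before stopping
print(gbm.n_estimators_)
```

Cross-fold validation gives a more stable estimate of the optimal iteration count at the cost of fitting the model several times.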
Logic sequence summary
1. Initial estimate: F_0 = constant
2. Calculate pseudo-residuals: r_m(x) = y − F_m(x)
3. Select a random sample (e.g. a random 50%)
4. Build a decision tree h_m(x) on the sample to approximate the residuals
5. Update the prediction: F_{m+1}(x) = F_m(x) + δ·h_m(x), where δ is the learning rate
Iterate until the fit deteriorates on hold-out data.
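The five steps above can be sketched from scratch for squared-error loss, using shallow scikit-learn trees as the weak learners. The parameter names (δ as `delta`, the 50% bag fraction) mirror the slide; the data and remaining choices are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Simulated regression data standing in for the real portfolio
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 500)

delta = 0.1                      # learning rate (shrinkage)
F = np.full(len(y), y.mean())    # step 1: initial estimate F_0 = constant

for m in range(100):
    r = y - F                                             # step 2: pseudo-residuals
    idx = rng.choice(len(y), len(y) // 2, replace=False)  # step 3: random 50%
    h = DecisionTreeRegressor(max_depth=3).fit(X[idx], r[idx])  # step 4: tree on residuals
    F = F + delta * h.predict(X)                          # step 5: F_{m+1} = F_m + delta * h_m

mse_const = np.mean((y - y.mean()) ** 2)   # error of the constant model
mse_boosted = np.mean((y - F) ** 2)        # error after boosting
print(mse_boosted < mse_const)             # boosting improves on the constant
```

A production loop would additionally track the loss on hold-out data at each iteration and stop when it deteriorates, as the slide notes.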
GBMs vs Logistic Regression
Traditional approaches vs machine learning

Aspect | Traditional approach | Machine learning
Modelling process | Iterative: manually propose model form, predictors etc.; gather feedback, augment hypothesis | Largely automated: skill and imagination of the modeller still relevant
Hypothesis testing | Statistical inference | Empirical validation: more intuitive
Assumptions | Response distribution; linear predictor is “correct” | On par; not required for inference
Predictive accuracy | Usually good, with high effort expended | Good with low effort
Efficiency | Low | High
Data volumes required | More suitable with lower volumes or fewer predictors | More suitable with high volumes and large numbers of predictors
Common objections

“It’s a black box!”
- As transparent as GLMs
- Key drivers and interactions: shapes (relativities), range of predictions, high/low segments

“It’s expensive!”
- More efficient and cheaper than GLMs (typically more predictive)
- Cost is ~20% of the GLM cost for retention modelling and ~30% for competitor deconstructions

“The software’s expensive!”
- R is free
- Online courses are free
- Many packages now offer R plug-ins
Common objections (continued)

“Prediction volatility (it’s not smooth)”
- For continuous variables, use a monotonic function to ensure a smooth response
- Group levels of discrete variables (as with a GLM)

“We lack internal knowledge!”
- Learn on the job
- Co-source a project with experts: get the models, scripts and knowledge transfer

“Can’t implement the results!”
- Retention models are typically once removed from the customer-facing pricing
- Models can be scored using R / Python and used with SAS processes
- PMML execution in Radar / Earnix
Modelling process

Section outline
Pre-modelling:
- Model segmentation
- Cancellations
- Feature engineering
- Other considerations
Modelling:
- Technical model tuning
- Variable selection and ‘actuarial’ model tuning
- Time trends and scoring
Pre-modelling: Segmenting the modelling
- Class of business: Motor / Home; Homeowners: combined vs stand-alone products
- ‘Decrement’ type: cancellations vs lapses; cancellations that are really lapses; coverage downgrades
- Payment frequency: annual vs monthly
- Shoppers vs non-shoppers: if only we could segment the modelling this way. How can we approximate it?
Pre-modelling: Considering cancellations

Monthly ‘chunks’ for the latest data
- We need a year to fully expose the policy to cancellation, but we don’t want to use “old” data in our model
- Cut exposure into monthly chunks and use a policy-month explanatory variable

Influenceable vs non-influenceable
- Key drivers and price elasticity differ depending on the cancellation reason
- How good are your cancellation codes? Split the modelling if possible

Pricing vs customer management models
- Information emerges through the policy year (e.g. a claim or endorsement) but can’t be used for pricing
- For “customer management” models we can update the predictors each policy month
Pre-modelling: Feature engineering

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. (Wikipedia)

- External data
- Price data
- Customer features
- “Behavioural” features
Pre-modelling: Some other considerations
- Peril-affected policies for homes
- Price change data
- How much data?
- Oversampling
Modelling: Tuning model parameters
1. Base learner complexity
2. Shrinkage
3. Bag fraction
4. Number of trees (stopping)

Criteria: best fit on validation data (ROC, deviance)
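One common way to run this tuning step is a cross-validated grid search over complexity, shrinkage and bag fraction, scoring candidates by ROC AUC on held-out folds. A hedged sketch with scikit-learn; the grid values are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Simulated binary response standing in for lapse / retain
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

param_grid = {
    "max_depth": [2, 4],           # base learner complexity
    "learning_rate": [0.05, 0.1],  # shrinkage
    "subsample": [0.5, 0.8],       # bag fraction
}
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=100, random_state=0),
    param_grid,
    scoring="roc_auc",  # criterion: best fit on validation data
    cv=3,
).fit(X, y)

print(search.best_params_)
```

In practice the number of trees is usually not in the grid but chosen by early stopping for each candidate, which keeps the search cheap.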
Modelling: ‘Actuarial’ model tuning
- Key drivers
- Shapes
- Highly correlated variables
- No ‘cheating’
- ‘Execution ability’
Modelling: Time trends and scoring
- Time parameters in the model (sometimes multiple)
- Examine trends and interactions
- Align with, or challenge, the budget / forward forecast
Examples + Results
Finity GBM dashboard: Motor annual lapse model
Variable importance
- Relative contribution to the predictive power of the model, based on the number of times the variable appears in splits and the resulting model improvement
- Sums to 100 across all variables used in the underlying trees
- Main effects are not separated from interaction effects

Variable | Influence
CU: Customer Variable 1 | 15.0
CU: Policy Tenure | 11.4
CU: Customer Variable 3 | 6.7
PR: Premium Change (%) | 5.6
BE: Payment Delay | 4.1
PR: Premium Rate | 4.0
PO: ABS Region | 3.6
PO: Vehicle Age | 3.5
CP: Competitor 1 CPI | 3.1
PO: Insured Age | 2.9
CP: Rank Insurer | 2.3
TI: Renewal Offer Month | 2.3
PO: Policy Variable 4 | 1.9
PO: Policy Variable 5 | 1.8
CU: Customer Variable 4 | 1.7
CP: Competitor 2 CPI | 1.5
PR: Premium Change (%) Prior Renewal | 1.4
BE: Behavioural 2 | 1.1
PR: Premium 3 | 1.1
CP: Competitor 3 CPI | 1.0
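A split-based importance table of this kind can be produced with most GBM implementations. A hedged sketch using scikit-learn, whose `feature_importances_` uses the same idea (how often and how profitably a variable is used in splits) and sums to 1, so multiplying by 100 gives the 0-100 scale shown; the data and feature names are simulated:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Simulated data with a few informative features
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                           random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

importance = 100 * model.feature_importances_
for i in np.argsort(importance)[::-1]:        # most influential first
    print(f"feature_{i}: {importance[i]:.1f}")

print(round(importance.sum(), 4))  # sums to 100 across all variables
```

As the slide notes, this measure mixes main effects and interactions; separating them requires a dedicated interaction-strength measure.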
From the table above:
- Total competitor-related influence = 9.3
- Total price-related influence (including competitors) = 21.4
Variable type importance
Cumulative gains curve and Gini
- Measured on validation data
- Order observations by model score, from highest to lowest prediction
- Plot the % of observations against the % of the target
- The Gini index is based on the area under this curve; ROC area = 78% in this example
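These validation metrics are straightforward to compute; a hedged sketch with simulated scores, reporting the ROC area alongside the Gini coefficient in its usual actuarial form Gini = 2·AUC − 1:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Simulated validation set: lapsed yes/no plus a noisy but informative score
rng = np.random.default_rng(0)
actual = rng.integers(0, 2, 1000)
score = actual * 0.8 + rng.normal(0, 0.5, 1000)

auc = roc_auc_score(actual, score)   # area under the ROC curve
gini = 2 * auc - 1                   # Gini coefficient derived from AUC
print(f"ROC area = {auc:.0%}, Gini = {gini:.2f}")
```

Both numbers should be computed on validation data, as the slide stresses; on training data they flatter the model.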
Decile chart
- Order observations by model score and create 10 equal-sized groups, from highest to lowest prediction
- Compare the actual and predicted lapse rate in each decile on validation data
- A big separation between the high and low deciles is desirable
- Actual and predicted should be close
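The decile chart construction can be sketched in a few lines of pandas: order by model score, cut into 10 equal-sized groups, then compare actual and predicted lapse rates per decile. The data below are simulated stand-ins for a validation set:

```python
import numpy as np
import pandas as pd

# Simulated validation data: predicted lapse probability and actual outcome
rng = np.random.default_rng(0)
pred = rng.uniform(0.05, 0.5, 2000)
actual = (rng.uniform(size=2000) < pred).astype(int)

df = pd.DataFrame({"pred": pred, "actual": actual})
# Rank first so ties never produce unequal decile sizes
df["decile"] = pd.qcut(df["pred"].rank(method="first"), 10, labels=False) + 1

chart = df.groupby("decile").agg(
    predicted_rate=("pred", "mean"),
    actual_rate=("actual", "mean"),
)
print(chart.round(3))

# Separation between the highest and lowest decile (the desirable "lift")
lift = chart.loc[10, "actual_rate"] - chart.loc[1, "actual_rate"]
```

Large `lift` indicates good discrimination; rows where `actual_rate` drifts from `predicted_rate` indicate calibration problems.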
(Chart: relative impact by X-variable)
Partial dependence
- Measures the impact on the predicted lapse / cancellation from a change in a single predictor
- The impact from all other predictors is held constant
Some key partial dependencies

A few more key partial dependencies
Competitive position partial dependencies
- Competitor ratio measured as competitor premium vs insurer premium
- The strength of the impact on retention varies considerably across competitors
Interaction strength
- Key interactions (to whatever depth desired) can also be identified via a strength measure
- Interactions with price and competitive position give us information about price elasticity

Variable 1 | Variable 2 | Strength
Primary Insured Age | Primary Driver Age | 0.49
Customer Variable 3 | Primary Insured Age | 0.24
Customer Variable 3 | Multi Product Holdings | 0.14
Customer Variable 1 | Premium Change (%) | 0.11
Customer Variable 3 | Insurer Competitive Rank | 0.11
Customer Variable 1 | Policy Tenure | 0.11
Premium Change (%) | Primary Driver Age | 0.10
Policy Tenure | Premium Change (%) | 0.08
Premium Change (%) | Multi Product Holdings | 0.08
Competitor 1 CPI | Technical Vehicle Risk | 0.08
Motor annual lapse elasticity example
(Charts: primary driver age; policy duration)

Motor monthly cancellation: Relative price sensitivity example (primary driver age)
(Chart: competitor cheaper vs competitor more expensive)

Distribution of price elasticity: Motor attrition

Motor attrition price elasticity: 5% increase (CPI = competitor vs client)

Motor attrition price elasticity: 5% decrease (CPI = competitor vs client)

Example segmentation: Renewal price increase elasticity
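Elasticity figures like those above can be derived from a fitted retention model by rescoring the portfolio under a shifted price. A hedged sketch, where the model, predictors and the 5% shift are simulated stand-ins for the talk's examples:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Simulated renewals: bigger increases and shorter tenure mean more lapses
rng = np.random.default_rng(0)
n = 3000
premium_change = rng.uniform(-0.1, 0.3, n)
tenure = rng.integers(1, 20, n)
lapsed = (premium_change * 4 - tenure * 0.05
          + rng.normal(0, 1, n) > 0).astype(int)

X = np.column_stack([premium_change, tenure])
model = GradientBoostingClassifier(random_state=0).fit(X, lapsed)

base_retention = 1 - model.predict_proba(X)[:, 1]

# Rescore every policy with the premium change shifted by +5%
X_up = X.copy()
X_up[:, 0] = X_up[:, 0] + 0.05
new_retention = 1 - model.predict_proba(X_up)[:, 1]

# Approximate elasticity per policy: % change in retention per % price change
elasticity = (new_retention - base_retention) / base_retention / 0.05
print(f"mean elasticity: {elasticity.mean():.2f}")
```

Segmenting `elasticity` by predictors such as driver age or competitive position gives the distribution and segmentation views shown in the slides.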
A couple of other things

Some other observations
- Start with a simple model: a simple linear relationship is harder for a GBM
- ‘Offsets’ can be included in the model statement
- Explore different or mixed base learners: ‘linear’ for continuous variables, trees for discrete?
Predictive Power
Conclusion

In summary
- Fast / efficient
- Good predictive power
- Not a black box
- Multiple execution options
Questions?
Distribution & use
This presentation has been prepared for the Finity
Consulting Pricing & Analytics Seminar, held on 18
October 2016. It is not intended, nor necessarily
suitable, for any other purpose.
Third parties should recognise that the furnishing of this
presentation is not a substitute for their own due
diligence and should place no reliance on this
presentation or the data contained herein which would
result in the creation of any duty or liability by Finity to
the third party.
Reliances & limitations
Finity wishes it to be understood that the information
presented at the Seminar is of a general nature and
does not constitute actuarial advice or investment
advice. While Finity has taken reasonable care in
compiling the information presented, Finity does not
warrant that the information provided is relevant to a
particular reader’s situation, specific objectives or
needs.
Finity does not have any responsibility to any attendee
at the conference or to any other party arising from the
content of this presentation. Before acting on any
information provided by Finity in this presentation,
readers should consider their own circumstances and
their need for advice on the subject – Finity would be
pleased to assist.