Data Mining: Neural Network Applications
by Louise Francis
CAS Annual Meeting, Nov 11, 2002
Francis Analytics and Actuarial Data Mining, Inc.
Objectives of Presentation
• Introduce insurance professionals to neural networks
• Show that neural networks are a lot like some conventional statistics
• Indicate where use of neural networks might be helpful
• Show practical examples of using neural networks
• Show how to interpret neural network models
A Common Actuarial Application: Loss Development
Rows: accident year. Columns: months of development.
Year 12 24 36 48 60 72 84 96 108 120 132 144 156 168 180
1980 267 1,975 4,587 7,375 10,661 15,232 17,888 18,541 18,937 19,130 19,189 19,209 19,234 19,234 19,246
1981 310 2,809 5,686 9,386 14,884 20,654 22,017 22,529 22,772 22,821 23,042 23,060 23,127 23,127 23,127
1982 370 2,744 7,281 13,287 19,773 23,888 25,174 25,819 26,049 26,180 26,268 26,364 26,371 26,379 26,397
1983 577 3,877 9,612 16,962 23,764 26,712 28,393 29,656 29,839 29,944 29,997 29,999 29,999 30,049 30,049
1984 509 4,518 12,067 21,218 27,194 29,617 30,854 31,240 31,598 31,889 32,002 31,947 31,965 31,986
1985 630 5,763 16,372 24,105 29,091 32,531 33,878 34,185 34,290 34,420 34,479 34,498 34,524
1986 1,078 8,066 17,518 26,091 31,807 33,883 34,820 35,482 35,607 35,937 35,957 35,962
1987 1,646 9,378 18,034 26,652 31,253 33,376 34,287 34,985 35,122 35,161 35,172
1988 1,754 11,256 20,624 27,857 31,360 33,331 34,061 34,227 34,317 34,378
1989 1,997 10,628 21,015 29,014 33,788 36,329 37,446 37,571 37,681
1990 2,164 11,538 21,549 29,167 34,440 36,528 36,950 37,099
1991 1,922 10,939 21,357 28,488 32,982 35,330 36,059
1992 1,962 13,053 27,869 38,560 44,461 45,988
1993 2,329 18,086 38,099 51,953 58,029
1994 3,343 24,806 52,054 66,203
1995 3,847 34,171 59,232
1996 6,090 33,392
1997 5,451
An Example of a Nonlinear Function
[Figure: Scatterplot of Cumulative Paid Losses. x-axis: Development Age (1 to 16); y-axis: Cumulative Paid Losses (0 to 60,000)]
Conventional Statistics: Regression
• One of the most common methods of fitting a function is linear regression
• Models a relationship between two variables by fitting a straight line through points
• Minimizes a squared deviation between an observed and fitted value
Neural Networks
• Also minimize squared deviation between fitted and actual values
• Can be viewed as a non-parametric, non-linear regression
The Feedforward Neural Network
[Figure: Three Layer Neural Network. Input Layer (Input Data), Hidden Layer (Process Data), Output Layer (Predicted Value)]
The Activation Function
• The sigmoid logistic function
$$f(Y) = \frac{1}{1 + e^{-Y}}$$
$$Y = w_0 + w_1 X_1 + w_2 X_2 + \ldots + w_n X_n$$
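As a minimal sketch of the activation function, the following Python evaluates the logistic function of a weighted sum of inputs. The function names and the example weights are illustrative assumptions, not values from the presentation.

```python
import math


def logistic(y):
    """Sigmoid logistic activation: f(Y) = 1 / (1 + exp(-Y))."""
    return 1.0 / (1.0 + math.exp(-y))


def linear_combination(weights, inputs):
    """Y = w0 + w1*X1 + ... + wn*Xn, where weights[0] is the intercept w0."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))


# Illustrative weights only (w0 = 0, w1 = 5): a larger |w1| gives a steeper curve.
for x in (-1.0, 0.0, 1.0):
    print(x, logistic(linear_combination([0.0, 5.0], [x])))
```

The output always lies between 0 and 1, which is why inputs and targets are usually rescaled before fitting.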
The Logistic Function
[Figure: Logistic Function for Various Values of w1 (w1 = -10, -5, -1, 1, 5, 10). x-axis: X (-1.2 to 0.8); y-axis: 0.0 to 1.0]
Simple Example: One Hidden Node
[Figure: Simple Neural Network, One Hidden Node. Input Layer (Input Data), Hidden Layer (Process Data), Output Layer (Predicted Value)]
Function if Network has One Hidden Node
$$h = f(X; w_0, w_1) = f(w_0 + w_1 X) = \frac{1}{1 + e^{-(w_0 + w_1 X)}}$$
$$f(f(X; w_0, w_1); w_2, w_3) = \frac{1}{1 + e^{-\left(w_2 + w_3 \frac{1}{1 + e^{-(w_0 + w_1 X)}}\right)}}$$
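The nested function for a network with one hidden node can be sketched as two logistic evaluations in sequence. This is only an illustration; the function name and the example weights are assumptions, not fitted values from the presentation.

```python
import math


def logistic(y):
    """f(Y) = 1 / (1 + exp(-Y))."""
    return 1.0 / (1.0 + math.exp(-y))


def one_hidden_node_network(x, w0, w1, w2, w3):
    """One input, one hidden node, one output node."""
    h = logistic(w0 + w1 * x)      # hidden node output
    return logistic(w2 + w3 * h)   # output node applies the logistic to the hidden output


# Hypothetical weights, chosen only to show the call
print(one_hidden_node_network(0.0, w0=0.0, w1=1.0, w2=0.0, w3=2.0))
```

Because the output is itself a logistic value, the target variable has to be scaled to the 0 to 1 range for a network of this form.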
Development Example: Incremental Payments Used for Fitting
[Figure: Scatterplot of Incremental Paid Losses. x-axis: Development Age (Years), 1 to 16; y-axis: Paid Losses (0 to 30,000)]
Two Methods for Fitting Development Curve
• Neural Networks
  • Simpler model using only development age for prediction
  • More complex model using development age and accident year
• GLM model
  • Example uses Poisson regression
  • Like OLS regression, but does not require normality
  • Fits some nonlinear relationships
  • See England and Verrall, PCAS 2001
The Chain Ladder Model
Cumulative paid:
$$D_{ij} = \sum_{k=1}^{j} C_{ik}$$
Age to age factor:
$$\lambda_{ij} = \frac{D_{i,j+1}}{D_{ij}}$$
Estimate of age to age factor using mean:
$$\lambda_j = \frac{\sum_{i=1}^{n} \lambda_{ij}}{n}$$
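The three chain ladder quantities can be sketched in a few lines of Python. The helper names are illustrative assumptions; the sample rows are the 12, 24 and 36 month cumulative values for accident years 1980 to 1983 from the loss development table.

```python
from itertools import accumulate


def cumulate(incremental_row):
    """D_ij: running sum of the incremental payments C_ik for one accident year."""
    return list(accumulate(incremental_row))


def age_to_age_factors(cumulative_row):
    """lambda_ij = D_i,j+1 / D_ij for one accident year."""
    return [later / earlier for earlier, later in zip(cumulative_row, cumulative_row[1:])]


def mean_factors(triangle):
    """Average lambda_ij over the accident years that have a factor at each age."""
    per_year = [age_to_age_factors(row) for row in triangle]
    n_ages = max(len(factors) for factors in per_year)
    means = []
    for j in range(n_ages):
        column = [factors[j] for factors in per_year if len(factors) > j]
        means.append(sum(column) / len(column))
    return means


# Cumulative paid at 12, 24, 36 months for accident years 1980 to 1983
sample = [
    [267, 1975, 4587],
    [310, 2809, 5686],
    [370, 2744, 7281],
    [577, 3877, 9612],
]
print(mean_factors(sample))
```

Rows of unequal length are allowed, so the same helper works on a triangle where later accident years have fewer observed ages.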
Common Approach: The Deterministic Chain Ladder
Estimate of paid at 24 months:
$$C_{24} = \lambda_{12} D_{12} - D_{12}$$
Estimate of Ultimate Paid:
$$D_{iu} = D_{ij} \prod_{k=j}^{u} \lambda_{ik}$$
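A small sketch of the deterministic projection: the next incremental payment is the factor times the latest cumulative amount minus that amount, and the ultimate is the latest cumulative amount times the product of the remaining factors. The function names and the factors in the example call are made up for illustration; only the 5,451 starting value (accident year 1997 at 12 months) comes from the triangle.

```python
def next_incremental(latest_cumulative, factor):
    """Incremental paid in the next period: lambda * D - D."""
    return factor * latest_cumulative - latest_cumulative


def project_ultimate(latest_cumulative, remaining_factors):
    """Multiply the latest cumulative paid by each remaining age-to-age factor."""
    ultimate = latest_cumulative
    for factor in remaining_factors:
        ultimate *= factor
    return ultimate


# Hypothetical factors, for illustration only
print(next_incremental(5451, 2.0))
print(project_ultimate(5451, [2.0, 1.5, 1.1]))
```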
GLM Model
A Stochastic Chain Ladder Model
Poisson Model:
$$E(C_{ij}) = m_{ij} = x_i y_j$$
$$\mathrm{Var}[C_{ij}] = \phi x_i y_j$$
$$\sum_{k=1}^{n} y_k = 1$$
Data often normalized by dividing by an exposure base
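One way to see what the multiplicative Poisson structure implies is to fit the row parameters x_i and column parameters y_j by alternately matching row totals and column totals over the observed cells, which is what the Poisson likelihood equations require. The sketch below is an assumption-laden illustration, not the fitting routine used in the presentation: the function name, the iteration count and the tiny triangle are hypothetical, and the dispersion parameter φ is not estimated (it does not change the fitted means).

```python
def fit_multiplicative_poisson(triangle, n_iter=200):
    """Fit E[C_ij] = x_i * y_j with sum(y) = 1 by alternating marginal-total updates.

    `triangle` holds incremental amounts by accident year; shorter rows mean
    the later development periods are not yet observed.
    """
    n_rows = len(triangle)
    n_cols = max(len(row) for row in triangle)
    # Pad rows with None so every row has n_cols entries
    cells = [list(row) + [None] * (n_cols - len(row)) for row in triangle]
    y = [1.0 / n_cols] * n_cols
    x = [0.0] * n_rows

    def update_x():
        # Row totals of fitted values must equal row totals of observed values
        for i in range(n_rows):
            observed = [j for j in range(n_cols) if cells[i][j] is not None]
            x[i] = sum(cells[i][j] for j in observed) / sum(y[j] for j in observed)

    for _ in range(n_iter):
        update_x()
        # Column totals of fitted values must equal column totals of observed values
        for j in range(n_cols):
            observed = [i for i in range(n_rows) if cells[i][j] is not None]
            y[j] = sum(cells[i][j] for i in observed) / sum(x[i] for i in observed)
        total = sum(y)
        y = [value / total for value in y]  # enforce sum(y) = 1
    update_x()
    return x, y


# Hypothetical two-year triangle built from x = (100, 200), y = (0.6, 0.4)
x_hat, y_hat = fit_multiplicative_poisson([[60, 40], [120]])
print(x_hat, y_hat)
```

With the constraint that the y_j sum to 1, x_i can be read as expected ultimate payments for accident year i and y_j as the proportion paid in development period j.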
Hidden Nodes for Paid Chain Ladder Example
[Figure: Output of Hidden Nodes. x-axis: Development Age (Years), 3 to 13; y-axis: 0.0 to 1.0; series: Hidden Node 1, Hidden Node 2]
[Figure: Fitted Value for 2 Nodes. x-axis: Development Age (Years), 3 to 13; y-axis: Neural Network Fitted Value (0 to 60)]
NN Chain Ladder Model with 3 Nodes
[Figure: 3 Node Neural Network Fitted. x-axis: Development Age (0.5 to 15.5); y-axis: Neural Network Fitted (0.00 to 80.00)]
Universal Function Approximator
• The feedforward neural network with one hidden layer is a universal function approximator
• Theoretically, with a sufficient number of nodes in the hidden layer, any continuous nonlinear function can be approximated
Neural Network Curve with Dev Age and Accident Year
[Figure: Neural network fitted values by development age, with points labeled by accident year (1980 to 1997). x-axis: Development Age (0.5 to 15.5); y-axis: Neural Fitted PPE (0 to 150)]
GLM Poisson Regression Curve
[Figure: Chain Ladder GLM Fitted Using Accident Year Symbol, with points labeled by accident year (80 to 97). x-axis: Development Age (Years), 0.5 to 15.5; y-axis: Chain Ladder GLM Fitted (0 to 200)]
How Many Hidden Nodes for Neural Network?
• Too few nodes: Don’t fit the curve very well
• Too many nodes: Overparameterization
  • May fit noise as well as pattern
How Do We Determine the Number of Hidden Nodes?
• Use methods that assess goodness of fit
  • Hold out part of the sample
  • Resampling
    • Bootstrapping
    • Jackknifing
  • Algebraic formula
    • Uses gradient and Hessian matrices
Hold Out Part of Sample
• Fit model on 1/2 to 2/3 of data
• Test fit of model on remaining data
• Need a large sample
Cross-Validation
• Hold out 1/n (say 1/10) of data
• Fit model to remaining data
• Test on portion of sample held out
• Do this n (say 10) times and average the results
• Used for moderate sample sizes
• Jackknifing similar to cross-validation
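The cross-validation steps above can be sketched as follows. To keep the example self-contained, a straight-line least squares fit stands in for the neural network; that substitution, the function names and the synthetic data are all assumptions made for illustration.

```python
import random


def k_fold_indices(n_obs, n_folds, seed=0):
    """Shuffle the observation indices and deal them into n_folds groups."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]


def cross_validated_mse(xs, ys, fit, predict, n_folds=10, seed=0):
    """Hold out each fold in turn, fit on the rest, and average the test errors."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), n_folds, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - predict(model, xs[i])) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (stand-in model)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return intercept + slope * x


# Synthetic, exactly linear data: the held-out error should be essentially zero
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
print(cross_validated_mse(xs, ys, fit_line, predict_line, n_folds=10))
```

To compare network sizes, the same routine would be run once per candidate number of hidden nodes and the size with the lowest averaged held-out error preferred.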
Bootstrapping
• Create many samples by drawing samples, with replacement, from the original data
• Fit the model to each of the samples
• Measure overall goodness of fit and create distribution of results
• Used for small and moderate sample sizes
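A minimal bootstrap sketch: draw samples with replacement, compute a statistic on each, and summarize the resulting distribution. Here the statistic is a simple mean standing in for "fit the model and measure goodness of fit"; the function names, the percentile interval and the sample data are assumptions for illustration.

```python
import random


def bootstrap_statistic(data, statistic, n_samples=1000, seed=0):
    """Evaluate `statistic` on n_samples resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_samples)
    ]


def percentile_interval(values, level=0.95):
    """Rough percentile interval from the sorted bootstrap results."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper


def sample_mean(sample):
    return sum(sample) / len(sample)


# Hypothetical data
results = bootstrap_statistic([1.0, 2.0, 3.0, 4.0, 5.0], sample_mean, n_samples=500)
print(percentile_interval(results))
```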
Jackknife of 95% CI for 2 and 5 Nodes
[Figure: Jackknife Result for 2 and 5 Nodes. x-axis: Dev. age (0.5 to 15.5); y-axis: Neural Fitted (0.00 to 6.00); series: 2Node95, 5Node95]
Another Complexity of Data: Interactions
[Figure: Fit of Paid Development by Accident Year. Four panels: AccYr 1980 to 1984, AccYr 1984 to 1989, AccYr 1989 to 1993, AccYr 1993 to 1997. x-axis: Development Age (3 to 13); y-axis: Neural Fitted PPE (80 to 180)]
Technical Predictors of Stock Price
A Complex Multivariate Example
Stock Prediction: Which Indicator is Best?
• Moving Averages
• Measures of Volatility
• Seasonal Indicators
  • The January effect
• Oscillators
The Data
• S&P 500 Index since 1930
  • Open
  • High
  • Low
  • Close
Moving Averages
• A very commonly used technical indicator
  • 1 week MA of returns
  • 2 week MA of returns
  • 1 month MA of returns
• These are trend following indicators
• A more complicated time series smoother based on running medians called T4253H
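As a rough sketch, a moving average of returns can be computed from a series of closing prices as below. The function names, the use of simple (not log) returns and the 5-trading-day week are assumptions; the T4253H smoother is not reproduced here.

```python
def simple_returns(prices):
    """Period-over-period return: P_t / P_(t-1) - 1."""
    return [later / earlier - 1.0 for earlier, later in zip(prices, prices[1:])]


def moving_average(values, window):
    """Trailing moving average; the first value uses the first `window` points."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical closing prices; a 1 week MA is taken here as 5 trading days
closes = [100.0, 101.0, 99.0, 102.0, 103.0, 101.0, 104.0]
print(moving_average(simple_returns(closes), window=5))
```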
Volatility Measures
• Finance literature suggests volatility of market changes over time
• More turbulent market -> higher volatility
• Measures
  • Standard deviation of returns
  • Range of returns
  • Moving averages of above
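The first two volatility measures can be sketched as rolling statistics over a window of returns. The function names, the window length and the sample returns are illustrative assumptions; the sample standard deviation (n - 1 divisor) is used, which requires a window of at least 2.

```python
import statistics


def rolling_std(returns, window):
    """Sample standard deviation of returns over a trailing window."""
    return [
        statistics.stdev(returns[i - window + 1 : i + 1])
        for i in range(window - 1, len(returns))
    ]


def rolling_range(returns, window):
    """Maximum minus minimum return over a trailing window."""
    return [
        max(returns[i - window + 1 : i + 1]) - min(returns[i - window + 1 : i + 1])
        for i in range(window - 1, len(returns))
    ]


# Hypothetical daily returns
daily = [0.01, -0.02, 0.015, 0.0, -0.005, 0.02]
print(rolling_std(daily, window=5))
print(rolling_range(daily, window=5))
```

A moving average of either series then gives the smoothed versions mentioned in the last bullet.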
Seasonal Effects
[Figure: Month Effect On Stock Returns. x-axis: MONTH (1 to 12); y-axis: 95% CI of 20 day return (-.02 to .02); N per month between 1,429 and 1,646]
Oscillators
• May indicate that market is overbought or oversold
• May indicate that a trend is nearing completion
• Some oscillators
  • Moving average differences
  • Stochastic
Stochastic and Relative Strength Index
• Stochastic based on observation that as prices increase closing prices tend to be closer to upper end of range
  • %K = (C – L5)/(H5 – L5)
    • C is closing price, L5 is 5 day low, H5 is 5 day high
  • %D = 3 day moving average of %K
• RS = (Average of x day’s up closes)/(Average of x day’s down closes)
  • RSI = 100 – 100/(1+RS)
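The oscillator formulas can be sketched as follows. Function names and sample prices are hypothetical. One reading is assumed for RS: the "up closes" and "down closes" are taken as the average upward and downward price changes over the last x days, with no additional smoothing; other conventions exist.

```python
def stochastic_k(closes, highs, lows, window=5):
    """%K = (C - L5) / (H5 - L5), using the low and high over the trailing window."""
    values = []
    for t in range(window - 1, len(closes)):
        low_n = min(lows[t - window + 1 : t + 1])
        high_n = max(highs[t - window + 1 : t + 1])
        # A flat window has no range; report None rather than divide by zero
        values.append((closes[t] - low_n) / (high_n - low_n) if high_n > low_n else None)
    return values


def stochastic_d(k_values, window=3):
    """%D = moving average of %K (assumes k_values contains no None entries)."""
    return [
        sum(k_values[t - window + 1 : t + 1]) / window
        for t in range(window - 1, len(k_values))
    ]


def rsi(closes, window):
    """RSI = 100 - 100 / (1 + RS) over the last `window` price changes."""
    recent = closes[-(window + 1):]
    changes = [later - earlier for earlier, later in zip(recent, recent[1:])]
    average_up = sum(c for c in changes if c > 0) / window
    average_down = sum(-c for c in changes if c < 0) / window
    if average_down == 0:
        return 100.0  # no down moves in the window
    return 100.0 - 100.0 / (1.0 + average_up / average_down)


# Hypothetical prices
closes = [1.0, 2.0, 3.0, 4.0, 5.0]
highs = [c + 0.5 for c in closes]
lows = [c - 0.5 for c in closes]
print(stochastic_k(closes, highs, lows, window=5))
print(rsi([10.0, 11.0, 10.0, 11.0, 10.0], window=4))
```

Values of %K near 1 mean the close sits near the top of its recent range, and RSI ranges from 0 to 100.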
Measuring Variable Importance
• Look at weights to hidden layer
• Compute sensitivities:
  • a measure of how much the predicted value’s error increases when the variables are excluded from the model one at a time
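A sensitivity calculation of this kind can be sketched as below. One assumption is made about what "excluded" means: each variable is neutralized in turn by fixing it at its mean, without refitting the model, and the increase in mean squared error is recorded. The function names, the toy model and the data are hypothetical.

```python
def mean_squared_error(predict, rows, targets):
    return sum((t - predict(r)) ** 2 for r, t in zip(rows, targets)) / len(rows)


def sensitivities(predict, rows, targets):
    """Increase in MSE when each input variable is fixed at its mean, one at a time."""
    baseline = mean_squared_error(predict, rows, targets)
    result = []
    for k in range(len(rows[0])):
        mean_k = sum(r[k] for r in rows) / len(rows)
        neutralized = [r[:k] + [mean_k] + r[k + 1:] for r in rows]
        result.append(mean_squared_error(predict, neutralized, targets) - baseline)
    return result


# Toy stand-in for a fitted network: depends on the first variable only
def toy_model(row):
    return 3.0 * row[0]


rows = [[0.0, 5.0], [1.0, 7.0], [2.0, 9.0]]
targets = [0.0, 3.0, 6.0]
print(sensitivities(toy_model, rows, targets))
```

Variables can then be ranked by the size of the error increase, with larger increases indicating more important predictors.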
Neural Network Result
• Variable Importance
  • Smoothed return
  • %K (from stochastic)
  • Smoothed %K
  • 2 Week %D
  • 1 Week range of returns
  • Smoothed standard deviation
  • Month
• R² was .13 or 13% of variance explained
Understanding Relationships Between Predictor and Target Variables: A Tree Example
[Figure: Regression tree. Splits: smooth.r < -0.00689442; smoothk < 0.0365138; smoothk < -3.25521e-005; marsi < -0.810819; ma.month < 0.0113787; smooth.r < 0.00444326; k < 0.646776. Leaf values: -0.004, -0.030, -0.100, -0.010, 0.003, 0.050, 0.010, 0.100]
What are the Relationships between the Variables?
[Figure: Neural Fitted versus Smoothed Return. x-axis: Smoothed Return (-0.04 to 0.04); y-axis: Neural Fitted (-0.10 to 0.15)]
Visualization Method for Understanding Neural Network Functions
• Method was published by Plate et al. in Neural Computation, 2000
  • Based on Generalized Additive Models
  • Detailed description by Francis in “Neural Networks Demystified”, 2001
• Hastie, Tibshirani and Friedman present a similar method
Visualization
• Method is essentially a way to approximate the derivatives of the neural network model
$$\Delta f(x_i \mid x_1, x_2 \ldots x_n) \approx f(x_{i+1} \mid x_1, x_2 \ldots x_n) - f(x_i \mid x_1, x_2 \ldots x_n)$$
Neural Network Result for Smoothed Return
[Figure: x-axis: Smoothed Return (-0.04 to 0.04); y-axis: Neural Fitted (-0.20 to 0.10)]
Neural Network Result for Oscillator
[Figure: Neural Network Result for %K. x-axis: %K (0.1 to 1.1); y-axis: Predicted (-0.02 to 0.02)]
Neural Network Result for Smoothed Return and Oscillator
Neural Network Result for Standard Deviation
[Figure: Neural Network Result for Standard Deviation. x-axis: Smoothed Standard Deviation (0.01 to 0.09); y-axis: Predicted (-0.10 to 0.00)]
Conclusions
• Neural networks are a lot like conventional statistics
• They address some problems of conventional statistics: nonlinear relationships, correlated variables and interactions
• Despite their black box aspect, we can now interpret them
• Find further information, including many references, at www.casact.org/aboutcas/mdiprize.htm
• Neural Networks for Statistical Modeling, M. Smith
• Reference for Kohonen Networks:
  • Visual Explorations in Finance, Guido Deboeck and Teuvo Kohonen (Eds), Springer, 1998