
Data Mining: Neural Network Applications

by Louise Francis

CAS Annual Meeting, Nov 11, 2002

Francis Analytics and Actuarial Data Mining, Inc.


Objectives of Presentation

• Introduce insurance professionals to neural networks

• Show that neural networks are a lot like some conventional statistics

• Indicate where use of neural networks might be helpful

• Show practical examples of using neural networks

• Show how to interpret neural network models

A Common Actuarial Application: Loss Development

Months of Development

Accident Year 12 24 36 48 60 72 84 96 108 120 132 144 156 168 180

1980 267 1,975 4,587 7,375 10,661 15,232 17,888 18,541 18,937 19,130 19,189 19,209 19,234 19,234 19,246

1981 310 2,809 5,686 9,386 14,884 20,654 22,017 22,529 22,772 22,821 23,042 23,060 23,127 23,127 23,127

1982 370 2,744 7,281 13,287 19,773 23,888 25,174 25,819 26,049 26,180 26,268 26,364 26,371 26,379 26,397

1983 577 3,877 9,612 16,962 23,764 26,712 28,393 29,656 29,839 29,944 29,997 29,999 29,999 30,049 30,049

1984 509 4,518 12,067 21,218 27,194 29,617 30,854 31,240 31,598 31,889 32,002 31,947 31,965 31,986

1985 630 5,763 16,372 24,105 29,091 32,531 33,878 34,185 34,290 34,420 34,479 34,498 34,524

1986 1,078 8,066 17,518 26,091 31,807 33,883 34,820 35,482 35,607 35,937 35,957 35,962

1987 1,646 9,378 18,034 26,652 31,253 33,376 34,287 34,985 35,122 35,161 35,172

1988 1,754 11,256 20,624 27,857 31,360 33,331 34,061 34,227 34,317 34,378

1989 1,997 10,628 21,015 29,014 33,788 36,329 37,446 37,571 37,681

1990 2,164 11,538 21,549 29,167 34,440 36,528 36,950 37,099

1991 1,922 10,939 21,357 28,488 32,982 35,330 36,059

1992 1,962 13,053 27,869 38,560 44,461 45,988

1993 2,329 18,086 38,099 51,953 58,029

1994 3,343 24,806 52,054 66,203

1995 3,847 34,171 59,232

1996 6,090 33,392

1997 5,451

An Example of a Nonlinear Function

[Figure: Scatterplot of Cumulative Paid Losses vs. Development Age]

Conventional Statistics: Regression

• One of the most common methods of fitting a function is linear regression

• Models a relationship between two variables by fitting a straight line through points

• Minimizes the sum of squared deviations between observed and fitted values
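To make the last bullet concrete, here is a minimal sketch of ordinary least squares for one predictor in plain Python. The function names and the sample data are illustrative assumptions, not taken from the presentation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x.

    Returns (a, b): the intercept and slope that minimize the sum of
    squared deviations between observed and fitted values.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sum_squared_error(xs, ys, intercept, slope):
    """Sum of squared deviations between observed and fitted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
print(f"intercept={a:.3f} slope={b:.3f} SSE={sum_squared_error(xs, ys, a, b):.4f}")
```

Any other line (for example, shifting the intercept) gives a larger sum of squared errors on the same data.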

Neural Networks

• Also minimizes the squared deviation between fitted and actual values

• Can be viewed as a non-parametric, non-linear regression

The Feedforward Neural Network

[Figure: Three Layer Neural Network: Input Layer (Input Data), Hidden Layer (Process Data), Output Layer (Predicted Value)]

The Activation Function

• The sigmoid logistic function

$$f(Y) = \frac{1}{1 + e^{-Y}}$$

$$Y = w_0 + w_1 X_1 + w_2 X_2 + \dots + w_n X_n$$
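A minimal Python sketch of the activation function and the weighted sum it is applied to. The convention that `weights[0]` holds the constant w0 is an assumption made for this example. The logistic is written in two branches so that very large |Y| does not overflow `math.exp`.

```python
import math


def logistic(y):
    """Sigmoid logistic activation f(Y) = 1 / (1 + e^(-Y)).

    Two branches so that very large |Y| does not overflow exp().
    """
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


def weighted_sum(weights, inputs):
    """Y = w0 + w1*X1 + ... + wn*Xn, with weights[0] playing the role of w0."""
    w0, *ws = weights
    if len(ws) != len(inputs):
        raise ValueError("expected one weight per input plus the constant w0")
    return w0 + sum(w * x for w, x in zip(ws, inputs))


def node_output(weights, inputs):
    """Output of a single node: the logistic of the weighted sum."""
    return logistic(weighted_sum(weights, inputs))


# Larger |w1| gives a steeper curve, as in the chart of f for various w1.
for w1 in (-10, -5, -1, 1, 5, 10):
    print(w1, round(node_output([0.0, w1], [0.3]), 4))
```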

The Logistic Function

[Figure: Logistic Function for Various Values of w1 (w1 = -10, -5, -1, 1, 5, 10); f vs. X, y-axis from 0.0 to 1.0]

Simple Example: One Hidden Node

[Figure: Simple Neural Network, One Hidden Node: Input Layer (Input Data), Hidden Layer (Process Data), Output Layer (Predicted Value)]

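Function if Network has One Hidden Node

$$h = f(X; w_0, w_1) = f(w_0 + w_1 X) = \frac{1}{1 + e^{-(w_0 + w_1 X)}}$$

$$f\big(f(X; w_0, w_1); w_2, w_3\big) = \frac{1}{1 + e^{-\left(w_2 + w_3 \frac{1}{1 + e^{-(w_0 + w_1 X)}}\right)}}$$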
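The one-hidden-node function above can be evaluated and fit in a few lines. In this minimal sketch, the weights are chosen by plain batch gradient descent on the squared error (backpropagation). That is one standard way to do the minimization, not necessarily what the author's software used. The formula applies the logistic function at the output node too, so the sketch assumes targets have been rescaled to lie between 0 and 1. The learning rate, epoch count, initial weights and data are all hypothetical.

```python
import math
import random


def logistic(y):
    # Numerically stable sigmoid f(Y) = 1 / (1 + e^(-Y)).
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


def one_node_net(x, w):
    """f(f(X; w0, w1); w2, w3): one input, one hidden node, one output."""
    w0, w1, w2, w3 = w
    h = logistic(w0 + w1 * x)      # hidden node output
    return logistic(w2 + w3 * h)   # output node


def sse(w, xs, ys):
    """Sum of squared deviations between actual and fitted values."""
    return sum((y - one_node_net(x, w)) ** 2 for x, y in zip(xs, ys))


def initial_weights(seed=0):
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(4)]


def fit(xs, ys, lr=0.5, epochs=2000, seed=0):
    """Batch gradient descent on the squared error."""
    w = initial_weights(seed)
    n = len(xs)
    for _ in range(epochs):
        w0, w1, w2, w3 = w
        grad = [0.0, 0.0, 0.0, 0.0]
        for x, y in zip(xs, ys):
            h = logistic(w0 + w1 * x)
            out = logistic(w2 + w3 * h)
            d_out = -2.0 * (y - out) * out * (1.0 - out)  # dE/d(w2 + w3*h)
            d_hid = d_out * w3 * h * (1.0 - h)            # dE/d(w0 + w1*x)
            grad[0] += d_hid
            grad[1] += d_hid * x
            grad[2] += d_out
            grad[3] += d_out * h
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


# Hypothetical development ages and payments, both rescaled to [0, 1].
xs = [i / 10 for i in range(11)]
ys = [0.2 + 0.6 * logistic(8 * (x - 0.4)) for x in xs]
w_fit = fit(xs, ys)
print("SSE before:", round(sse(initial_weights(0), xs, ys), 4),
      "after:", round(sse(w_fit, xs, ys), 4))
```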

Development Example: Incremental Payments Used for Fitting

[Figure: Scatterplot of Incremental Paid Losses; Paid Losses vs. Development Age (Years)]

Two Methods for Fitting Development Curve

• Neural Networks
  • Simpler model using only development age for prediction
  • More complex model using development age and accident year

• GLM model
  • Example uses Poisson regression
    • Like OLS regression, but does not require normality
    • Fits some nonlinear relationships
  • See England and Verrall, PCAS 2001

The Chain Ladder Model

Cumulative paid:

$$D_{ij} = \sum_{k=1}^{j} C_{ik}$$

Age to age factor:

$$\lambda_{ij} = \frac{D_{i,j+1}}{D_{ij}}$$

Estimate of age to age factor using mean:

$$\bar{\lambda}_j = \frac{\sum_{i=1}^{n} \lambda_{ij}}{n}$$
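As an illustration, the sketch below computes the age to age factors and their simple mean. To keep it short, it uses only a 4 x 4 corner of the triangle above: accident years 1994 to 1997 at 12 to 48 months. The number n in the mean is taken to be the number of accident years that have a factor at age j, so it shrinks as j grows.

```python
# Cumulative paid losses D_ij for accident years 1994 to 1997 at 12, 24, 36
# and 48 months, copied from the loss development triangle above. Only this
# small corner of the triangle is used, to keep the example short.
TRIANGLE = [
    [3343, 24806, 52054, 66203],  # 1994
    [3847, 34171, 59232],         # 1995
    [6090, 33392],                # 1996
    [5451],                       # 1997
]


def age_to_age_factors(tri):
    """lambda_ij = D_{i,j+1} / D_ij for every observed pair of ages."""
    return [[row[j + 1] / row[j] for j in range(len(row) - 1)] for row in tri]


def mean_factors(tri):
    """Simple average of lambda_ij over the accident years observed at age j."""
    lambdas = age_to_age_factors(tri)
    n_ages = max(len(row) for row in tri) - 1
    means = []
    for j in range(n_ages):
        column = [lam[j] for lam in lambdas if len(lam) > j]
        means.append(sum(column) / len(column))
    return means


for age, lam in zip((12, 24, 36), mean_factors(TRIANGLE)):
    print(f"{age}-{age + 12} months: {lam:.4f}")
```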

Common Approach: The Deterministic Chain Ladder

Estimate of paid at 24 months:

$$C_{i,24} = \bar{\lambda}_{12} D_{i,12} - D_{i,12}$$

Estimate of Ultimate Paid:

$$D_{iu} = D_{ij} \prod_{k=j}^{u-1} \bar{\lambda}_k$$
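The deterministic projection can be sketched as follows. It uses the same 1994 to 1997 corner of the triangle with simple mean factors. Because that corner stops at 48 months and no tail factor is assumed, "ultimate" here really means the 48-month value. The helper names are hypothetical.

```python
TRIANGLE = [
    [3343, 24806, 52054, 66203],  # 1994
    [3847, 34171, 59232],         # 1995
    [6090, 33392],                # 1996
    [5451],                       # 1997
]


def mean_factors(tri):
    """Simple mean of D_{i,j+1} / D_ij over accident years observed at j+1."""
    n_ages = max(len(row) for row in tri) - 1
    return [
        sum(row[j + 1] / row[j] for row in tri if len(row) > j + 1)
        / sum(1 for row in tri if len(row) > j + 1)
        for j in range(n_ages)
    ]


def next_incremental(row, factors):
    """Next incremental payment (lambda_j - 1) * D_ij, e.g. C_24 from D_12."""
    j = len(row) - 1
    if j >= len(factors):
        raise ValueError("row is already at the last development age")
    return (factors[j] - 1.0) * row[-1]


def project_to_last_age(tri, factors):
    """D_iu = D_ij * product of lambda_k for k = j .. u-1 (no tail factor)."""
    projected = []
    for row in tri:
        value = row[-1]
        for k in range(len(row) - 1, len(factors)):
            value *= factors[k]
        projected.append(value)
    return projected


f = mean_factors(TRIANGLE)
print("1997 paid at 24 months (incremental):", round(next_incremental(TRIANGLE[3], f)))
print("Projected to 48 months:", [round(v) for v in project_to_last_age(TRIANGLE, f)])
```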

GLM Model: A Stochastic Chain Ladder Model

Poisson Model:

$$E[C_{ij}] = m_{ij} = x_i y_j$$

$$\operatorname{Var}(C_{ij}) = \phi x_i y_j$$

$$\sum_{k=1}^{n} y_k = 1$$

Data often normalized by dividing by an exposure base
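Here is a minimal sketch of how the point estimates of this Poisson model could be obtained. It deliberately uses a different technique from a general GLM routine. For a model with one multiplicative parameter per accident year and per development age, the Poisson likelihood equations say that fitted and observed totals agree in every row and column of the triangle. The sketch solves those equations by iterative proportional fitting. The dispersion φ only scales the variance, so it does not affect these estimates. The data are the same 1994 to 1997 corner of the triangle, and the function names are hypothetical. A well-known property of this model, checked in the test, is that the fitted x_i match chain ladder projections made with volume-weighted factors. Those factors differ from the simple mean used earlier.

```python
# Cumulative paid D_ij for accident years 1994 to 1997 at 12 to 48 months,
# copied from the triangle above (a small corner, for brevity).
CUMULATIVE = [
    [3343, 24806, 52054, 66203],
    [3847, 34171, 59232],
    [6090, 33392],
    [5451],
]


def to_incremental(cum):
    """C_ij = D_ij - D_i,j-1, with C_i1 = D_i1."""
    return [[row[0]] + [row[j] - row[j - 1] for j in range(1, len(row))] for row in cum]


def fit_odp(inc, tol=1e-10, max_iter=10_000):
    """Point estimates x_i, y_j for E[C_ij] = x_i * y_j with sum(y) = 1.

    Solves the Poisson likelihood equations (fitted row and column totals
    equal observed ones) by alternately rescaling y and x.
    """
    n_rows, n_cols = len(inc), max(len(r) for r in inc)
    row_tot = [sum(r) for r in inc]
    col_tot = [sum(r[j] for r in inc if len(r) > j) for j in range(n_cols)]

    def exposed(j):  # sum of x_i over accident years observed at age j
        return sum(x[i] for i in range(n_rows) if len(inc[i]) > j)

    x = [float(t) for t in row_tot]
    y = [1.0 / n_cols] * n_cols
    for _ in range(max_iter):
        y = [col_tot[j] / exposed(j) for j in range(n_cols)]
        x = [row_tot[i] / sum(y[:len(inc[i])]) for i in range(n_rows)]
        worst = max(abs(y[j] * exposed(j) / col_tot[j] - 1.0) for j in range(n_cols))
        if worst < tol:
            break
    scale = sum(y)  # impose sum_k y_k = 1 without changing x_i * y_j
    return [xi * scale for xi in x], [yj / scale for yj in y]


x_hat, y_hat = fit_odp(to_incremental(CUMULATIVE))
print("x_i:", [round(v) for v in x_hat])
print("y_j:", [round(v, 4) for v in y_hat])
```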

Hidden Nodes for Paid Chain Ladder Example

[Figure: Output of Hidden Nodes (Hidden Node 1 and Hidden Node 2, between 0.0 and 1.0) vs. Development Age (Years)]

[Figure: Fitted Value for 2 Nodes; Neural Network Fitted Value vs. Development Age (Years)]

NN Chain Ladder Model with 3 Nodes

[Figure: 3 Node Neural Network Fitted; Neural Network Fitted value vs. Development Age]

Universal Function Approximator

• The feedforward neural network with one hidden layer is a universal function approximator

• Theoretically, with a sufficient number of nodes in the hidden layer, any continuous nonlinear function can be approximated to any desired accuracy over a bounded range of inputs

Neural Network Curve with Dev Age and Accident Year

[Figure: Neural Fitted PPE vs. Development Age (0.5 to 15.5), one point per accident year and age, labeled by accident year 1980 to 1997]

GLM Poisson Regression Curve

[Figure: Chain Ladder GLM Fitted Using Accident Year Symbol; Chain Ladder GLM Fitted vs. Development Age (Years), points labeled by accident year 80 to 97]

How Many Hidden Nodes for Neural Network?

• Too few nodes: the curve is not fit very well

• Too many nodes: overparameterization
  • May fit noise as well as pattern

How Do We Determine the Number of Hidden Nodes?

• Use methods that assess goodness of fit
  • Hold out part of the sample
  • Resampling
    • Bootstrapping
    • Jackknifing
  • Algebraic formula
    • Uses gradient and Hessian matrices

Hold Out Part of Sample

• Fit model on 1/2 to 2/3 of data

• Test fit of model on remaining data

• Need a large sample
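A minimal sketch of the hold-out approach, applied to choosing the number of hidden nodes. The `fit(train_records, n_nodes)` and `error(model, test_records)` hooks are hypothetical placeholders for whatever neural network software is used. The 2/3 default follows the "1/2 to 2/3" guideline above.

```python
import random


def holdout_split(records, train_fraction=2 / 3, seed=0):
    """Shuffle records and split them into a fitting set and a test set."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def choose_hidden_nodes(records, fit, error, candidates=(1, 2, 3, 4, 5),
                        train_fraction=2 / 3, seed=0):
    """Fit one model per candidate node count; keep the best on held-out data.

    fit(train_records, n_nodes) -> model and error(model, test_records) -> float
    are caller-supplied (hypothetical) hooks.
    """
    train, test = holdout_split(records, train_fraction, seed)
    scores = {k: error(fit(train, k), test) for k in candidates}
    return min(scores, key=scores.get), scores
```

The same split must be used for every candidate, so that differences in held-out error reflect the number of nodes rather than the luck of the draw.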

Cross-Validation

• Hold out 1/n (say 1/10) of data

• Fit model to remaining data

• Test on portion of sample held out

• Do this n (say 10) times and average the results

• Used for moderate sample sizes

• Jackknifing is similar to cross-validation
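The cross-validation steps above can be sketched like this. The slides' n is called `k` here, and `fit` and `error` are again hypothetical hooks. Setting `k` equal to the number of records gives leave-one-out cross-validation, which is close in spirit to the jackknife.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly partition record indices 0..n-1 into k roughly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(records, fit, error, k=10, seed=0):
    """Average held-out error over k fits, each leaving out one fold."""
    scores = []
    for fold in k_fold_indices(len(records), k, seed):
        held_out = set(fold)
        train = [r for i, r in enumerate(records) if i not in held_out]
        test = [records[i] for i in fold]
        scores.append(error(fit(train), test))
    return sum(scores) / len(scores)


# Toy usage: the "model" is just the mean of the training records.
mean_fit = lambda tr: sum(tr) / len(tr)
mse = lambda m, te: sum((v - m) ** 2 for v in te) / len(te)
print(cross_validate([float(v) for v in range(20)], mean_fit, mse, k=10))
```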

Bootstrapping

• Create many samples by drawing samples, with replacement, from the original data

• Fit the model to each of the samples

• Measure overall goodness of fit and create distribution of results

• Used for small and moderate sample sizes
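In this minimal sketch of the bootstrap, `statistic` stands for "fit the model to this resample and return a goodness-of-fit measure (or a fitted value)", and is a hypothetical hook. The percentile interval is one simple way to summarize the resulting distribution.

```python
import random


def bootstrap(records, statistic, n_boot=1000, seed=0):
    """Apply statistic() to n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(records)
    return [statistic(rng.choices(records, k=n)) for _ in range(n_boot)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval from the bootstrap distribution."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lo = ordered[int(round((1 - level) / 2 * last))]
    hi = ordered[int(round((1 + level) / 2 * last))]
    return lo, hi


# Toy usage: distribution of the sample mean of hypothetical data.
data = [float(v) for v in range(1, 11)]
means = bootstrap(data, lambda s: sum(s) / len(s), n_boot=500)
print(percentile_interval(means))
```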

Jackknife of 95% CI for 2 and 5 Nodes

[Figure: Jackknife Result for 2 and 5 Nodes; series Neural Fitted, 2Node95 and 5Node95 vs. Dev. age]

Another Complexity of Data: Interactions

[Figure: Fit of Paid Development by Accident Year; Neural Fitted PPE vs. Development Age in four panels: AccYr 1980 to 1984, 1984 to 1989, 1989 to 1993, 1993 to 1997]

Technical Predictors of Stock Price

A Complex Multivariate Example

Stock Prediction: Which Indicator is Best?

• Moving Averages

• Measures of Volatility

• Seasonal Indicators
  • The January effect

• Oscillators

The Data

• S&P 500 Index since 1930
  • Open
  • High
  • Low
  • Close

Moving Averages

• A very commonly used technical indicator
  • 1 week MA of returns
  • 2 week MA of returns
  • 1 month MA of returns

• These are trend following indicators

• A more complicated time series smoother based on running medians called T4253H
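This minimal sketch computes daily returns from closing prices and their trailing moving averages. The window lengths (5, 10 and 21 trading days for 1 week, 2 weeks and 1 month) are assumptions, and the price series is hypothetical. The T4253H running-median smoother is not reproduced here.

```python
def simple_returns(closes):
    """Daily returns r_t = C_t / C_{t-1} - 1 from closing prices."""
    return [c / p - 1.0 for p, c in zip(closes, closes[1:])]


def moving_average(values, window):
    """Trailing moving average; the first window-1 points have no value."""
    if window < 1:
        raise ValueError("window must be positive")
    return [sum(values[t - window + 1:t + 1]) / window
            for t in range(window - 1, len(values))]


# Assumed trading-day windows: 1 week ~ 5, 2 weeks ~ 10, 1 month ~ 21 days.
WINDOWS = {"1 week": 5, "2 week": 10, "1 month": 21}

closes = [100.0 + 0.5 * t for t in range(30)]  # hypothetical prices
rets = simple_returns(closes)
for name, w in WINDOWS.items():
    ma = moving_average(rets, w)
    print(name, len(ma), round(ma[-1], 5))
```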

Volatility Measures

• Finance literature suggests that market volatility changes over time

• More turbulent market -> higher volatility

• Measures
  • Standard deviation of returns
  • Range of returns
  • Moving averages of the above
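Here is a minimal sketch of the volatility measures listed above, computed over trailing windows of a return series. The window lengths are left to the caller, and the sample returns are hypothetical.

```python
import statistics


def rolling(values, window, fn):
    """Apply fn to each trailing window of the series."""
    return [fn(values[t - window + 1:t + 1]) for t in range(window - 1, len(values))]


def rolling_std(returns, window):
    """Sample standard deviation of returns over each window (window >= 2)."""
    if window < 2:
        raise ValueError("standard deviation needs a window of at least 2")
    return rolling(returns, window, statistics.stdev)


def rolling_range(returns, window):
    """Range (max - min) of returns over each window."""
    return rolling(returns, window, lambda w: max(w) - min(w))


def smoothed(values, window):
    """Moving average of a volatility series ("moving averages of the above")."""
    return rolling(values, window, lambda w: sum(w) / len(w))


rets = [0.01, -0.02, 0.03, 0.0, -0.01, 0.015]  # hypothetical daily returns
print(smoothed(rolling_std(rets, 3), 2))
```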

Seasonal Effects

[Figure: Month Effect On Stock Returns; 95% CI of 20 day return by MONTH (1 to 12), y-axis from -.02 to .02; N per month between 1,429 and 1,646]

Oscillators

• May indicate that market is overbought or oversold

• May indicate that a trend is nearing completion

• Some oscillators
  • Moving average differences
  • Stochastic

Stochastic and Relative Strength Index

• Stochastic based on the observation that as prices increase, closing prices tend to be closer to the upper end of the range
  • %K = (C - L5)/(H5 - L5)
  • C is the closing price, L5 is the 5 day low, H5 is the 5 day high
  • %D = 3 day moving average of %K

• RS = (Average of x days' up closes)/(Average of x days' down closes)
  • RSI = 100 - 100/(1 + RS)
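The oscillator formulas above translate directly into code. This is a minimal sketch with a few assumptions. The 5-day high and low are taken from the daily High and Low series. %K is kept as a fraction between 0 and 1 rather than a percentage. x defaults to 14 days, a common convention that the slide does not specify. RS uses simple averages of up and down moves over the window, not Wilder's exponential smoothing.

```python
def moving_average(values, window):
    return [sum(values[t - window + 1:t + 1]) / window
            for t in range(window - 1, len(values))]


def stochastic_k(highs, lows, closes, n=5):
    """%K = (C - L5) / (H5 - L5) over trailing n-day windows."""
    k = []
    for t in range(n - 1, len(closes)):
        high = max(highs[t - n + 1:t + 1])
        low = min(lows[t - n + 1:t + 1])
        # Undefined when the n-day range is zero; report NaN in that case.
        k.append((closes[t] - low) / (high - low) if high > low else float("nan"))
    return k


def stochastic_d(k_values, window=3):
    """%D = 3-day moving average of %K."""
    return moving_average(k_values, window)


def rsi(closes, x=14):
    """RSI = 100 - 100 / (1 + RS), RS = average up move / average down move."""
    changes = [c - p for p, c in zip(closes, closes[1:])]
    out = []
    for t in range(x - 1, len(changes)):
        window = changes[t - x + 1:t + 1]
        up = sum(c for c in window if c > 0) / x
        down = sum(-c for c in window if c < 0) / x
        out.append(100.0 if down == 0 else 100.0 - 100.0 / (1.0 + up / down))
    return out
```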

Measuring Variable Importance

• Look at weights to hidden layer

• Compute sensitivities:
  • a measure of how much the error of the predicted value increases when the variables are excluded from the model one at a time
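The slides do not say exactly how a variable is "excluded". This minimal sketch assumes a common shortcut: the variable is replaced by its sample mean in every row, and the model is not refit. The increase in mean squared error is then reported. `predict` is a hypothetical hook for a fitted network.

```python
def mse(predict, rows, targets):
    """Mean squared error of predict(row) against the targets."""
    return sum((y - predict(r)) ** 2 for r, y in zip(rows, targets)) / len(rows)


def sensitivities(predict, rows, targets):
    """Increase in MSE when each input is 'excluded' (held at its mean).

    Assumption: exclusion is approximated by replacing the variable with its
    sample mean in every row, without refitting the network.
    """
    base = mse(predict, rows, targets)
    n_vars = len(rows[0])
    result = []
    for j in range(n_vars):
        mean_j = sum(r[j] for r in rows) / len(rows)
        held = [list(r[:j]) + [mean_j] + list(r[j + 1:]) for r in rows]
        result.append(mse(predict, held, targets) - base)
    return result


# Toy usage: only the first input matters to this hypothetical model.
rows = [[float(x), 1.0] for x in range(5)]
targets = [2.0 * r[0] for r in rows]
print(sensitivities(lambda r: 2.0 * r[0], rows, targets))
```

Sorting the variables by these increases gives an importance ranking like the one on the next slide.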

Neural Network Result

• Variable Importance
  • Smoothed return
  • %K (from stochastic)
  • Smoothed %K
  • 2 Week %D
  • 1 Week range of returns
  • Smoothed standard deviation
  • Month

• R² was 0.13, or 13% of variance explained

Understanding Relationships Between Predictor and Target Variables: A Tree Example

[Figure: regression tree with splits smooth.r < -0.00689442, smoothk < 0.0365138, smoothk < -3.25521e-005, marsi < -0.810819, ma.month < 0.0113787, smooth.r < 0.00444326 and k < 0.646776; terminal node values range from -0.100 to 0.100]

What are the Relationships between the Variables?

[Figure: Neural Fitted value vs. Smoothed Return; x-axis from -0.04 to 0.04, y-axis from -0.10 to 0.15]

Visualization Method for Understanding Neural Network Functions

• Method was published by Plate et al. in Neural Computation, 2000

• Based on Generalized Additive Models

• Detailed description by Francis in "Neural Networks Demystified", 2001

• Hastie, Tibshirani and Friedman present a similar method

Visualization

• Method is essentially a way to approximate the derivatives of the neural network model

$$\Delta f(x_i \mid x_1, x_2, \dots, x_n) \approx f(x_{i+1} \mid x_1, x_2, \dots, x_n) - f(x_i \mid x_1, x_2, \dots, x_n)$$
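This minimal sketch shows the difference approximation on this slide. It assumes the "next" value of the variable of interest is the observed value moved by a small step `delta`, while the other inputs are held at their observed values. Dividing by `delta` gives an approximate partial derivative for each observation, and plotting these against the variable gives a picture of how the network responds to it. `predict` is a hypothetical hook for a fitted network.

```python
def finite_difference(predict, row, var, delta=1e-4):
    """f(x_i + delta | other inputs) - f(x_i | other inputs) for one observation."""
    bumped = list(row)
    bumped[var] += delta
    return predict(bumped) - predict(list(row))


def derivative_profile(predict, rows, var, delta=1e-4):
    """(x_var, approximate partial derivative) pairs, one per observation,
    with the other inputs held at their observed values."""
    return [(row[var], finite_difference(predict, row, var, delta) / delta) for row in rows]


# Toy usage with a known function standing in for a fitted network.
f = lambda r: r[0] ** 2 + 3.0 * r[1]
print(derivative_profile(f, [[2.0, 1.0], [-1.0, 5.0]], var=0))
```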

Neural Network Result for Smoothed Return

[Figure: Neural Fitted value vs. Smoothed Return; x-axis from -0.04 to 0.04, y-axis from -0.20 to 0.10]

Neural Network Result for Oscillator

[Figure: Neural Network Result for %K; Predicted value (-0.02 to 0.02) vs. %K (0.1 to 1.1)]

Neural Network Result for Smoothed Return and Oscillator

Neural Network Result for Standard Deviation

[Figure: Neural Network Result for Standard Deviation; Predicted value (-0.10 to 0.00) vs. Smoothed Standard Deviation (0.01 to 0.09)]

Conclusions

• Neural networks are a lot like conventional statistics

• They address some problems of conventional statistics: nonlinear relationships, correlated variables and interactions

• Despite their black box aspect, we can now interpret them

• Find further information, including many references, at www.casact.org/aboutcas/mdiprize.htm

• Neural Networks for Statistical Modeling, M. Smith

• Reference for Kohonen Networks:
  • Visual Explorations in Finance, Guido Deboeck and Teuvo Kohonen (Eds.), Springer, 1998
