ITAM-CONAC
STATISTICAL METHODS IN ACTUARIAL SCIENCE I
DR. JUAN JOSÉ FERNÁNDEZ DURÁN
POISSON AND GAMMA REGRESSION EXAMPLE
AUTOMOBILE INSURANCE CLAIMS IN SWEDEN
1) DATA DESCRIPTION
The data contain the number of claims, and the amounts paid for those claims, in Swedish automobile insurance in 1977. Each of the 2182 rows aggregates a group of insureds that share the same combination of Kilometres, Zone, Bonus and Make. The variables are:
1. Kilometres: average distance driven per year: 1 = less than 1000 km, 2 = 1000 to 15000 km, 3 = 15000 to 20000 km, 4 = 20000 to 25000 km, 5 = more than 25000 km (qualitative, ordinal)
2. Zone: geographic zone: 1 = Stockholm, Gothenburg, Malmö and their surroundings, 2 = other large cities and their surroundings, 3 = small towns in southern Sweden, 4 = rural areas in southern Sweden, 5 = small towns in northern Sweden, 6 = rural areas in northern Sweden, 7 = Gotland (province) (qualitative, nominal)
3. Bonus: number of years since the last claim, plus one (quantitative, discrete)
4. Make: car model, 9 categories (qualitative, nominal)
5. Insured: exposure, in policy-years (quantitative, continuous)
6. Claims: number of claims (quantitative, discrete)
7. Payment: total amount paid for the claims, in Swedish kronor (quantitative, continuous)
2) DATA (first six rows)
Row  Kilometres  Zone  Bonus  Make  Insured  Claims  Payment
  1           1     1      1     1   455.13     108   392491
  2           1     1      1     2    69.17      19    46221
  3           1     1      1     3    72.88      13    15694
  4           1     1      1     4  1292.39     124   422201
  5           1     1      1     5   191.01      40   119373
  6           1     1      1     6   477.66      57   170913
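The two responses modelled later can be read directly off these rows: the claim frequency is Claims/Insured (claims per policy-year) and the average severity is Payment/Claims (kronor per claim). A minimal Python sketch using only the six rows printed above; `frequency_and_severity` is a hypothetical helper for illustration.

```python
# The six rows shown above as (Insured, Claims, Payment); all have
# Kilometres = 1, Zone = 1, Bonus = 1 and Make = 1..6.
ROWS = [
    (455.13, 108, 392491),
    (69.17, 19, 46221),
    (72.88, 13, 15694),
    (1292.39, 124, 422201),
    (191.01, 40, 119373),
    (477.66, 57, 170913),
]


def frequency_and_severity(insured, claims, payment):
    """Return (claims per policy-year, average payment per claim in SEK)."""
    frequency = claims / insured
    # Severity is undefined for a cell with no claims.
    severity = payment / claims if claims > 0 else float("nan")
    return frequency, severity


for make, row in enumerate(ROWS, start=1):
    freq, sev = frequency_and_severity(*row)
    print(f"Make {make}: frequency = {freq:.4f}, severity = {sev:.0f} SEK")
```

Cells with zero claims have no defined severity, which is why the severity models use fewer rows than the frequency models.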
3) EXPLORATORY DATA ANALYSIS
Kilometres
1 2 3 4 5
439 441 441 434 427
Zone
1 2 3 4 5 6 7
315 315 315 315 313 315 294
Bonus
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.000 4.000 4.015 6.000 7.000
Make
1 2 3 4 5 6 7 8 9
245 245 242 238 244 244 242 237 245
Insured
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.01 21.61 81.53 1092.00 389.80 127700.00
Claims
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 1.00 5.00 51.87 21.00 3338.00
Payment
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 2989 27400 257000 112000 18250000
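Because every row is an aggregated cell, portfolio-level rates are ratios of totals rather than averages of row-level ratios. A rough Python sketch that recovers them from the rounded means printed above; since those means are rounded, the results are approximations only.

```python
# Means reported in the summaries above (rounded as printed) and the
# number of rows in the data set.
N_ROWS = 2182
MEAN_INSURED = 1092.00   # policy-years per row
MEAN_CLAIMS = 51.87      # claims per row
MEAN_PAYMENT = 257000    # SEK per row

# Approximate totals: n times the (rounded) mean.
total_insured = N_ROWS * MEAN_INSURED
total_claims = N_ROWS * MEAN_CLAIMS
total_payment = N_ROWS * MEAN_PAYMENT

overall_frequency = total_claims / total_insured   # claims per policy-year
overall_severity = total_payment / total_claims    # SEK per claim

print(f"approx. total exposure: {total_insured:,.0f} policy-years")
print(f"approx. claim frequency: {overall_frequency:.4f}")
print(f"approx. average severity: {overall_severity:,.0f} SEK")
```

The severity of roughly 4,950 kronor per claim is close to exp(8.51167), about 4,970, the baseline average payment per claim in the final gamma model.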
4) POISSON REGRESSION MODEL: FREQUENCY
OFFSET: log(Insured)
INITIAL MODEL:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.214998 0.026158 -46.448 < 2e-16 ***
Zone2 -0.346817 0.020823 -16.655 < 2e-16 ***
Zone3 -0.500711 0.021429 -23.366 < 2e-16 ***
Zone4 -0.728748 0.019300 -37.759 < 2e-16 ***
Zone5 -0.516962 0.033035 -15.649 < 2e-16 ***
Zone6 -0.707809 0.026944 -26.270 < 2e-16 ***
Zone7 -0.925981 0.093840 -9.868 < 2e-16 ***
Make2 -0.142938 0.060812 -2.350 0.018749 *
Make3 -0.295632 0.076515 -3.864 0.000112 ***
Make4 -1.092987 0.045508 -24.017 < 2e-16 ***
Make5 0.148346 0.046943 3.160 0.001577 **
Make6 -0.647975 0.039617 -16.356 < 2e-16 ***
Make7 -0.353962 0.063686 -5.558 2.73e-08 ***
Make8 -0.058195 0.097342 -0.598 0.549946
Make9 -0.375614 0.024274 -15.474 < 2e-16 ***
Bonus -0.259094 0.004783 -54.174 < 2e-16 ***
Zone2:Bonus 0.023908 0.003973 6.017 1.77e-09 ***
Zone3:Bonus 0.024206 0.004057 5.967 2.41e-09 ***
Zone4:Bonus 0.031791 0.003633 8.751 < 2e-16 ***
Zone5:Bonus 0.038044 0.006168 6.168 6.90e-10 ***
Zone6:Bonus 0.038753 0.005006 7.741 9.85e-15 ***
Zone7:Bonus 0.038342 0.017200 2.229 0.025806 *
Make2:Bonus 0.049883 0.010368 4.811 1.50e-06 ***
Make3:Bonus 0.023286 0.012659 1.840 0.065840 .
Make4:Bonus 0.064240 0.010080 6.373 1.85e-10 ***
Make5:Bonus -0.003098 0.008620 -0.359 0.719332
Make6:Bonus 0.053004 0.007371 7.190 6.46e-13 ***
Make7:Bonus 0.053923 0.010943 4.927 8.33e-07 ***
Make8:Bonus 0.022022 0.016206 1.359 0.174194
Make9:Bonus 0.053380 0.004361 12.241 < 2e-16 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 34070.6 on 2181 degrees of freedom
Residual deviance: 6528.3 on 2152 degrees of freedom
AIC: 14226
Number of Fisher Scoring iterations: 4
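With the offset log(Insured), the model states E[Claims] = Insured * exp(eta), where eta is the linear predictor, so the coefficients act on claims per policy-year. A Python sketch applying the initial Poisson model above to the six rows of section 2; only the coefficients those rows need (Zone 1, Bonus 1, Make 1 to 6) are transcribed, and `expected_claims` is a hypothetical helper, not the code used to fit the model. Kilometres does not appear because the model has no Kilometres term.

```python
import math

# Initial Poisson model: coefficients needed for Zone = 1 (baseline) and Make = 1..6.
INTERCEPT = -1.214998
BETA_BONUS = -0.259094
BETA_MAKE = {1: 0.0, 2: -0.142938, 3: -0.295632, 4: -1.092987,
             5: 0.148346, 6: -0.647975}
BETA_MAKE_BONUS = {1: 0.0, 2: 0.049883, 3: 0.023286, 4: 0.064240,
                   5: -0.003098, 6: 0.053004}


def expected_claims(insured, make, bonus):
    """Fitted E[Claims] for a Zone = 1 cell; log(Insured) enters with coefficient 1."""
    eta = (INTERCEPT + BETA_MAKE[make]
           + (BETA_BONUS + BETA_MAKE_BONUS[make]) * bonus)
    return math.exp(math.log(insured) + eta)


# (Make, Insured, observed Claims) for the six rows of section 2.
ROWS = [(1, 455.13, 108), (2, 69.17, 19), (3, 72.88, 13),
        (4, 1292.39, 124), (5, 191.01, 40), (6, 477.66, 57)]

for make, insured, observed in ROWS:
    fitted = expected_claims(insured, make, bonus=1)
    print(f"Make {make}: fitted {fitted:6.1f}, observed {observed}")
```

For the first row the fitted value is about 104 claims against 108 observed; doubling the exposure doubles the fitted count, which is exactly what the offset imposes.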
FINAL MODEL (Make5:Bonus and Make8:Bonus removed):
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.217342 0.023799 -51.151 < 2e-16 ***
dummy$Zone2 -0.346791 0.020823 -16.654 < 2e-16 ***
dummy$Zone3 -0.500734 0.021430 -23.366 < 2e-16 ***
dummy$Zone4 -0.728840 0.019300 -37.764 < 2e-16 ***
dummy$Zone5 -0.517111 0.033035 -15.653 < 2e-16 ***
dummy$Zone6 -0.708293 0.026943 -26.289 < 2e-16 ***
dummy$Zone7 -0.926013 0.093842 -9.868 < 2e-16 ***
dummy$Make2 -0.140523 0.059831 -2.349 0.018840 *
dummy$Make3 -0.293226 0.075737 -3.872 0.000108 ***
dummy$Make4 -1.090577 0.044189 -24.680 < 2e-16 ***
dummy$Make5 0.133406 0.020250 6.588 4.46e-11 ***
dummy$Make6 -0.645539 0.038094 -16.946 < 2e-16 ***
dummy$Make7 -0.351535 0.062749 -5.602 2.12e-08 ***
dummy$Make8 0.066609 0.031575 2.110 0.034897 *
dummy$Make9 -0.373195 0.021700 -17.198 < 2e-16 ***
Bonus -0.258649 0.004290 -60.288 < 2e-16 ***
I(dummy$Zone2 * Bonus) 0.023910 0.003973 6.018 1.77e-09 ***
I(dummy$Zone3 * Bonus) 0.024224 0.004057 5.972 2.35e-09 ***
I(dummy$Zone4 * Bonus) 0.031829 0.003633 8.762 < 2e-16 ***
I(dummy$Zone5 * Bonus) 0.038086 0.006168 6.175 6.61e-10 ***
I(dummy$Zone6 * Bonus) 0.038868 0.005006 7.765 8.17e-15 ***
I(dummy$Zone7 * Bonus) 0.038362 0.017201 2.230 0.025730 *
I(dummy$Make2 * Bonus) 0.049411 0.010148 4.869 1.12e-06 ***
I(dummy$Make3 * Bonus) 0.022816 0.012479 1.828 0.067493 .
I(dummy$Make4 * Bonus) 0.063770 0.009853 6.472 9.67e-11 ***
I(dummy$Make6 * Bonus) 0.052527 0.007058 7.443 9.87e-14 ***
I(dummy$Make7 * Bonus) 0.053449 0.010735 4.979 6.39e-07 ***
I(dummy$Make9 * Bonus) 0.052907 0.003807 13.898 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 34070.6 on 2181 degrees of freedom
Residual deviance: 6530.4 on 2154 degrees of freedom
AIC: 14224
Number of Fisher Scoring iterations: 4
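Because the Poisson dispersion is fixed at 1, the increase in residual deviance from the initial to the final model is a likelihood-ratio statistic, approximately chi-square with as many degrees of freedom as terms dropped. A stdlib-only Python sketch; the closed-form chi-square tail in `chi2_sf_even_df` (a hypothetical helper) is valid only for an even number of degrees of freedom, which holds here.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square(df), df a positive even integer.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Residual deviances and degrees of freedom of the two Poisson models above.
dev_initial, df_initial = 6528.3, 2152
dev_final, df_final = 6530.4, 2154

lr_stat = dev_final - dev_initial   # deviance difference = LR statistic
lr_df = df_final - df_initial       # number of dropped terms
p_value = chi2_sf_even_df(lr_stat, lr_df)
print(f"LR = {lr_stat:.1f} on {lr_df} df, p = {p_value:.3f}")
```

A p-value of about 0.35, together with the AIC falling from 14226 to 14224, supports dropping the two interactions.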
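PROBLEM: VERY POOR FIT (residual deviance 6530.4 on 2154 degrees of freedom, about three times the degrees of freedom, whereas a well-fitting Poisson model would give a ratio near 1).
POSSIBLE CAUSE: OVERDISPERSION (the variance of the counts exceeds their mean).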
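The poor-fit diagnosis can be checked numerically by comparing the residual deviance of the final Poisson model with its degrees of freedom. A rough Python sketch; the normal approximation to the chi-square distribution is a simplification chosen to avoid dependencies and is meant only as an order-of-magnitude check at this many degrees of freedom.

```python
import math

# Residual deviance and degrees of freedom of the final Poisson model above.
deviance, df = 6530.4, 2154

# Under an adequate Poisson model the residual deviance is roughly
# chi-square(df), so this ratio should be close to 1.
ratio = deviance / df

# Normal approximation to chi-square(df): mean df, variance 2 * df.
z = (deviance - df) / math.sqrt(2 * df)

print(f"deviance/df = {ratio:.2f}")
print(f"z = {z:.1f} standard deviations above the expected value")
```

A ratio near 3 suggests the counts vary roughly three times as much as the Poisson assumption allows.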
4B) NEGATIVE BINOMIAL REGRESSION MODEL: FREQUENCY
(OVERDISPERSION)
OFFSET: log(Insured)
INITIAL MODEL:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.293984 0.056722 -22.813 < 2e-16 ***
dummy$Zone2 -0.336044 0.059090 -5.687 1.29e-08 ***
dummy$Zone3 -0.461724 0.059494 -7.761 8.43e-15 ***
dummy$Zone4 -0.669928 0.056897 -11.774 < 2e-16 ***
dummy$Zone5 -0.565145 0.072162 -7.832 4.82e-15 ***
dummy$Zone6 -0.682021 0.064786 -10.527 < 2e-16 ***
dummy$Zone7 -0.874743 0.125036 -6.996 2.64e-12 ***
dummy$Make2 -0.132673 0.080773 -1.643 0.100478
dummy$Make3 -0.313009 0.095060 -3.293 0.000992 ***
dummy$Make4 -1.028398 0.076786 -13.393 < 2e-16 ***
dummy$Make5 0.172831 0.071460 2.419 0.015581 *
dummy$Make6 -0.612087 0.066626 -9.187 < 2e-16 ***
dummy$Make7 -0.288240 0.083581 -3.449 0.000563 ***
dummy$Make8 -0.031708 0.111370 -0.285 0.775867
dummy$Make9 -0.287987 0.054168 -5.317 1.06e-07 ***
Bonus -0.243607 0.011837 -20.580 < 2e-16 ***
I(dummy$Zone2 * Bonus) 0.026005 0.012227 2.127 0.033426 *
I(dummy$Zone3 * Bonus) 0.019706 0.012277 1.605 0.108482
I(dummy$Zone4 * Bonus) 0.028186 0.011776 2.393 0.016689 *
I(dummy$Zone5 * Bonus) 0.042402 0.014534 2.917 0.003529 **
I(dummy$Zone6 * Bonus) 0.033581 0.013177 2.548 0.010819 *
I(dummy$Zone7 * Bonus) 0.018074 0.024600 0.735 0.462519
I(dummy$Make2 * Bonus) 0.049001 0.015675 3.126 0.001771 **
I(dummy$Make3 * Bonus) 0.030717 0.017603 1.745 0.080982 .
I(dummy$Make4 * Bonus) 0.058438 0.017061 3.425 0.000614 ***
I(dummy$Make5 * Bonus) -0.005051 0.014590 -0.346 0.729177
I(dummy$Make6 * Bonus) 0.049961 0.013871 3.602 0.000316 ***
I(dummy$Make7 * Bonus) 0.042176 0.016297 2.588 0.009654 **
I(dummy$Make8 * Bonus) 0.015368 0.020242 0.759 0.447718
I(dummy$Make9 * Bonus) 0.054562 0.011470 4.757 1.96e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(25.4515) family taken to be 1)
Null deviance: 6173.6 on 2181 degrees of freedom
Residual deviance: 2145.9 on 2152 degrees of freedom
AIC: 10970
Number of Fisher Scoring iterations: 1
Theta: 25.45
Std. Err.: 1.88
2 x log-likelihood: -10907.93
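The negative binomial model adds a quadratic term to the variance. A Python sketch comparing the two variance functions at the estimated theta, assuming the usual NB2 parameterisation Var(Y) = mu + mu^2/theta (the one used by R's `MASS::glm.nb`, whose output format this section matches); the function names are illustrative.

```python
# Variance functions of the two count models for a cell with mean mu.
THETA = 25.4515  # theta of the initial negative binomial model above


def poisson_variance(mu):
    """Poisson: the variance equals the mean."""
    return mu


def negbin_variance(mu, theta=THETA):
    """NB2: Var(Y) = mu + mu^2 / theta."""
    return mu + mu ** 2 / theta


for mu in (1.0, 10.0, 100.0):
    print(f"mu = {mu:6.1f}: Poisson var = {poisson_variance(mu):7.1f}, "
          f"NB var = {negbin_variance(mu):7.1f}")
```

The extra term mu^2/theta is negligible for cells with few expected claims but dominates for cells with many, such as the large-exposure rows in the data.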
FINAL MODEL (all Zone:Bonus interactions, Make3:Bonus, Make5:Bonus, Make8:Bonus and Make8 removed):
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.410179 0.035799 -39.391 < 2e-16 ***
dummy$Zone2 -0.224255 0.026550 -8.447 < 2e-16 ***
dummy$Zone3 -0.376511 0.026694 -14.105 < 2e-16 ***
dummy$Zone4 -0.547450 0.025485 -21.481 < 2e-16 ***
dummy$Zone5 -0.378085 0.031798 -11.890 < 2e-16 ***
dummy$Zone6 -0.534189 0.028880 -18.497 < 2e-16 ***
dummy$Zone7 -0.799159 0.056316 -14.191 < 2e-16 ***
dummy$Make2 -0.105520 0.075557 -1.397 0.162543
dummy$Make3 -0.167209 0.036335 -4.602 4.19e-06 ***
dummy$Make4 -1.005102 0.071443 -14.069 < 2e-16 ***
dummy$Make5 0.140875 0.031164 4.520 6.17e-06 ***
dummy$Make6 -0.599903 0.060381 -9.935 < 2e-16 ***
dummy$Make7 -0.271347 0.078673 -3.449 0.000563 ***
dummy$Make9 -0.273990 0.045937 -5.964 2.45e-09 ***
Bonus -0.214524 0.005999 -35.760 < 2e-16 ***
I(dummy$Make2 * Bonus) 0.041209 0.014235 2.895 0.003792 **
I(dummy$Make4 * Bonus) 0.050973 0.015780 3.230 0.001237 **
I(dummy$Make6 * Bonus) 0.044847 0.012245 3.663 0.000250 ***
I(dummy$Make7 * Bonus) 0.036240 0.014936 2.426 0.015251 *
I(dummy$Make9 * Bonus) 0.049116 0.009361 5.247 1.55e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(24.9859) family taken to be 1)
Null deviance: 6119.9 on 2181 degrees of freedom
Residual deviance: 2151.0 on 2162 degrees of freedom
AIC: 10967
Number of Fisher Scoring iterations: 1
Theta: 24.99
Std. Err.: 1.84
2 x log-likelihood: -10924.67
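The final negative binomial model drops ten terms, which matches the change from 2152 to 2162 residual degrees of freedom. A Python sketch of the likelihood-ratio test built from the reported 2 x log-likelihood values; theta is re-estimated in each fit, so this is the usual approximate test for nested models, and the closed-form chi-square tail again applies only because the degrees of freedom are even.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square(df), df a positive even integer (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# "2 x log-likelihood" values and residual df of the two NB models above.
two_ll_initial, df_initial = -10907.93, 2152
two_ll_final, df_final = -10924.67, 2162

lr_stat = two_ll_initial - two_ll_final   # = 2 * (loglik_initial - loglik_final)
lr_df = df_final - df_initial             # ten terms dropped
p_value = chi2_sf_even_df(lr_stat, lr_df)
print(f"LR = {lr_stat:.2f} on {lr_df} df, p = {p_value:.3f}")
```

A p-value of about 0.08 means the simpler model is not rejected at the 5% level, consistent with its slightly lower AIC (10967 versus 10970).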
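EXAMPLES OF RATING FACTORS (final negative binomial model; each factor multiplies the expected number of claims per policy-year, and the Bonus factors hold as shown for makes with no Make:Bonus term, i.e. 1, 3, 5 and 8):
exp(beta_Bonus) = exp(-0.214524) = 0.8069254
exp(2*beta_Bonus) = exp(2*(-0.214524)) = 0.6511287
exp(3*beta_Bonus) = exp(3*(-0.214524)) = 0.5254123
exp(beta_Make5) = exp(0.140875) = 1.151281
exp(beta_Make4) = exp(-1.005102) = 0.3660073
For example, for an insured with Make = 7, Bonus = 3 and Zone = 7, the factor relative to the baseline cell (Zone = 1, Make = 1, Bonus = 0) is:
exp(3*beta_Make7:Bonus + 3*beta_Bonus + beta_Make7 + beta_Zone7) =
exp(0.036240*3 - 0.214524*3 - 0.271347 - 0.799159) = exp(-1.605358) = 0.2008176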
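These factors are exponentiated sums of coefficients from the final negative binomial model. A Python sketch that reproduces them for any Zone, Make and Bonus combination, assuming treatment coding with level 1 as the baseline (as the dummy$Zone2, dummy$Make2, ... names suggest) and treating every term absent from the final model as zero; `rating_factor` and `expected_frequency` are hypothetical names.

```python
import math

# Final negative binomial model: intercept and every retained coefficient.
# Terms not listed (Zone 1, Make 1, Make8, Make8:Bonus, all Zone:Bonus)
# are zero in that model.
INTERCEPT = -1.410179
COEF = {
    "Zone2": -0.224255, "Zone3": -0.376511, "Zone4": -0.547450,
    "Zone5": -0.378085, "Zone6": -0.534189, "Zone7": -0.799159,
    "Make2": -0.105520, "Make3": -0.167209, "Make4": -1.005102,
    "Make5": 0.140875, "Make6": -0.599903, "Make7": -0.271347,
    "Make9": -0.273990,
    "Bonus": -0.214524,
    "Make2:Bonus": 0.041209, "Make4:Bonus": 0.050973,
    "Make6:Bonus": 0.044847, "Make7:Bonus": 0.036240,
    "Make9:Bonus": 0.049116,
}


def rating_factor(zone, make, bonus):
    """Factor relative to Zone = 1, Make = 1, Bonus = 0 (intercept excluded)."""
    eta = (COEF.get(f"Zone{zone}", 0.0)
           + COEF.get(f"Make{make}", 0.0)
           + (COEF["Bonus"] + COEF.get(f"Make{make}:Bonus", 0.0)) * bonus)
    return math.exp(eta)


def expected_frequency(zone, make, bonus):
    """Expected claims per policy-year: exp(intercept) times the rating factor."""
    return math.exp(INTERCEPT) * rating_factor(zone, make, bonus)


print(f"{rating_factor(zone=7, make=7, bonus=3):.7f}")
print(f"{expected_frequency(zone=7, make=7, bonus=3):.4f}")
```

Multiplying by exp(-1.410179), the intercept, turns the factor 0.2008 into an expected frequency of roughly 0.049 claims per policy-year for that insured.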
5) GAMMA REGRESSION MODEL: SEVERITY
OFFSET: log(Claims) (fitted on the 1797 rows with Claims > 0, as the 1796 null-deviance degrees of freedom indicate)
INITIAL MODEL:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.3258777 0.1586479 52.480 <2e-16 ***
dummy$Zone2 -0.0822650 0.1525932 -0.539 0.5899
dummy$Zone3 0.1186210 0.1515616 0.783 0.4339
dummy$Zone4 0.1513074 0.1509229 1.003 0.3162
dummy$Zone5 0.1158077 0.1640002 0.706 0.4802
dummy$Zone6 0.1831158 0.1575154 1.163 0.2452
dummy$Zone7 -0.0919868 0.2148388 -0.428 0.6686
dummy$Make2 0.0348368 0.1812066 0.192 0.8476
dummy$Make3 0.3288051 0.1896622 1.734 0.0832 .
dummy$Make4 0.1817264 0.1864432 0.975 0.3298
dummy$Make5 0.1117441 0.1770404 0.631 0.5280
dummy$Make6 0.1101039 0.1774434 0.621 0.5350
dummy$Make7 -0.0781639 0.1834040 -0.426 0.6700
dummy$Make8 0.4232748 0.1909293 2.217 0.0268 *
dummy$Make9 0.0276298 0.1711544 0.161 0.8718
Bonus 0.0379201 0.0350455 1.082 0.2794
I(dummy$Zone2 * Bonus) 0.0026408 0.0337621 0.078 0.9377
I(dummy$Zone3 * Bonus) -0.0222333 0.0335572 -0.663 0.5077
I(dummy$Zone4 * Bonus) -0.0160044 0.0335013 -0.478 0.6329
I(dummy$Zone5 * Bonus) 0.0005123 0.0354181 0.014 0.9885
I(dummy$Zone6 * Bonus) -0.0065242 0.0344025 -0.190 0.8496
I(dummy$Zone7 * Bonus) 0.0292067 0.0444205 0.658 0.5109
I(dummy$Make2 * Bonus) -0.0192713 0.0394542 -0.488 0.6253
I(dummy$Make3 * Bonus) -0.0408060 0.0407611 -1.001 0.3169
I(dummy$Make4 * Bonus) -0.0562166 0.0412710 -1.362 0.1733
I(dummy$Make5 * Bonus) -0.0301734 0.0391372 -0.771 0.4408
I(dummy$Make6 * Bonus) -0.0216627 0.0391762 -0.553 0.5804
I(dummy$Make7 * Bonus) 0.0010806 0.0400084 0.027 0.9785
I(dummy$Make8 * Bonus) -0.0402547 0.0411310 -0.979 0.3279
I(dummy$Make9 * Bonus) -0.0175537 0.0379319 -0.463 0.6436
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Gamma family taken to be 0.6819326)
Null deviance: 899.37 on 1796 degrees of freedom
Residual deviance: 866.55 on 1767 degrees of freedom
AIC: 42373
Number of Fisher Scoring iterations: 7
FINAL MODEL (only Zone2, Zone6, Make3 and Make8 retained):
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.51167 0.02537 335.537 < 2e-16 ***
dummy$Zone2 -0.13064 0.05362 -2.436 0.01493 *
dummy$Zone6 0.09550 0.05613 1.701 0.08904 .
dummy$Make3 0.18883 0.06576 2.871 0.00414 **
dummy$Make8 0.28775 0.06658 4.322 1.63e-05 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Gamma family taken to be 0.684663)
Null deviance: 899.37 on 1796 degrees of freedom
Residual deviance: 874.51 on 1792 degrees of freedom
AIC: 42341
Number of Fisher Scoring iterations: 6
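With a log link and offset log(Claims), the final gamma model gives E[Payment] = Claims * exp(eta), so exp(eta) is the expected payment per claim. A Python sketch under that reading, also using the gamma GLM property Var(Y) = phi * mu^2, which makes the coefficient of variation sqrt(phi) for every cell; `expected_payment` is a hypothetical helper.

```python
import math

# Final gamma model: intercept, retained coefficients and estimated dispersion.
INTERCEPT = 8.51167
COEF = {"Zone2": -0.13064, "Zone6": 0.09550, "Make3": 0.18883, "Make8": 0.28775}
DISPERSION = 0.684663


def expected_payment(claims, zone, make):
    """E[Payment] = Claims * exp(eta); levels without a coefficient are baseline."""
    eta = INTERCEPT + COEF.get(f"Zone{zone}", 0.0) + COEF.get(f"Make{make}", 0.0)
    return claims * math.exp(eta)


# Gamma GLM: Var(Y) = phi * mu^2, so the coefficient of variation is sqrt(phi).
cv = math.sqrt(DISPERSION)

print(f"baseline payment per claim: {expected_payment(1, 1, 1):.0f} SEK")
print(f"Zone 2, Make 8: {expected_payment(1, 2, 8):.0f} SEK per claim")
print(f"coefficient of variation: {cv:.2f}")
```

With these estimates the baseline average payment is roughly 4,970 kronor per claim, and the coefficient of variation of about 0.83 is common to all cells.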
FINAL LOGNORMAL MODEL, CHOSEN BECAUSE THE GAMMA MODEL SHOWS PROBLEMS IN THE RESIDUAL ANALYSIS
OFFSET: ln(Claims) - ln(Insured)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.3074 0.1141 125.360 < 2e-16 ***
dummy$Zone3 0.2324 0.1086 2.139 0.032535 *
dummy$Zone4 0.9436 0.1069 8.825 < 2e-16 ***
dummy$Zone5 -0.7525 0.1181 -6.374 2.34e-10 ***
dummy$Zone7 -2.6751 0.1661 -16.106 < 2e-16 ***
dummy$Make2 -1.5387 0.1547 -9.946 < 2e-16 ***
dummy$Make3 -1.6495 0.1611 -10.241 < 2e-16 ***
dummy$Make4 -1.1665 0.1673 -6.974 4.31e-12 ***
dummy$Make5 -1.6179 0.1546 -10.466 < 2e-16 ***
dummy$Make6 -0.5635 0.1533 -3.676 0.000244 ***
dummy$Make7 -1.7932 0.1565 -11.460 < 2e-16 ***
dummy$Make8 -2.1962 0.1625 -13.515 < 2e-16 ***
dummy$Make9 2.0419 0.1482 13.781 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.604 on 1784 degrees of freedom
Multiple R-squared: 0.4786, Adjusted R-squared: 0.4751
F-statistic: 136.4 on 12 and 1784 DF, p-value: < 2.2e-16
RATING FACTORS FROM THE LOGNORMAL MODEL (exp of each coefficient):
dummy$Zone3 1.261646
dummy$Zone4 2.569209
dummy$Zone5 0.471191
dummy$Zone7 0.068902
dummy$Make2 0.214661
dummy$Make3 0.192155
dummy$Make4 0.311443
dummy$Make5 0.198311
dummy$Make6 0.569236
dummy$Make7 0.166418
dummy$Make8 0.111222
dummy$Make9 7.705244
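The factors above are exp(beta) for each coefficient. A Python sketch recomputing a few of them; because the printed coefficients are rounded to four decimals, the results can differ from the listed factors in the fifth significant digit. It also shows a standard lognormal property not stated in the source: if log(Y) is normal with constant standard deviation sigma, E[Y] = exp(mu + sigma^2/2), so the correction exp(sigma^2/2) is common to every cell and leaves the relativities unchanged.

```python
import math

# Selected coefficients of the final lognormal model (rounded as printed)
# and its residual standard error on the log scale.
COEF = {"Zone3": 0.2324, "Zone4": 0.9436, "Zone7": -2.6751, "Make9": 2.0419}
SIGMA = 1.604

# Rating factors: multiplicative effects on the median and, because sigma
# is the same for all cells, on the mean as well.
factors = {name: math.exp(beta) for name, beta in COEF.items()}

# Ratio of mean to median for a lognormal with log-scale sd SIGMA.
mean_correction = math.exp(SIGMA ** 2 / 2)

for name, factor in factors.items():
    print(f"{name}: {factor:.6f}")
print(f"mean / median = {mean_correction:.2f}")
```

The mean-to-median ratio of about 3.6 matters when converting fitted values to expected payments, but not when comparing cells through these factors.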