forecasting automobile insurance paid claim costs using econometric and arima models

International Journal of Forecasting 1 (1985) 203-215 North-Holland


FORECASTING AUTOMOBILE INSURANCE PAID CLAIM COSTS USING ECONOMETRIC AND ARIMA MODELS

J. David CUMMINS Wharton School, Philadelphia, PA 19104, USA

Gary L. GRIEPENTROG University of South Carolina, Columbia, SC 29208, USA

Automobile insurance companies in the United States currently utilize simple exponential trend models to forecast paid claim costs, an important variable in ratemaking. This paper tests the performance of econometric and ARIMA models, as well as the current insurance industry method, in forecasting two paid claim cost series. The experiments encompass eight forecast periods ranging from 1974 through early 1983. The results indicate that automobile insurers could significantly improve their forecasts of property damage liability claim costs by adopting econometric models. For bodily injury liability claim costs, the accuracy of the econometric and insurance industry methods is approximately the same, and both outperform the ARIMA models. Overall, a net gain in accuracy could be achieved by adopting econometric models.

Keywords: Automobile insurance, General insurance, Actuarial models, Insurance, Forecasting, ARIMA models.

1. Introduction

Automobile insurance rates in the United States are subject to approval by state regulatory authorities in about two-fifths of the states and are carefully scrutinized by regulators in about five additional states. 1 In the other states, ratemaking is competitive; but the same ratemaking techniques are used as in the regulated states. Regulatory controversy about rates usually centers on issues such as the fairness of risk classification schemes and the inclusion of investment income in ratemaking. However, imbedded in the ratemaking formulas are a number of relatively obscure but important factors that receive little regulatory attention or public scrutiny. One such factor is the severity trend factor, which involves a forecast of average paid claim costs. 2

Although the details are extremely complicated, in essence the ratemaking formula involves a test of the adequacy of the incurred loss ratio followed by a rate adjustment if the ratio is inappropriate. [See Webb et al. (1981).] The ratio is defined as follows:

$$l = L / P, \tag{1}$$

1 This is based on unpublished material from the Alliance of American Insurers (Chicago) and the authors’ interpretation of State Insurance Codes.

2 Another of these factors is the loss development factor, which is discussed in Lemaire (1982).

0169-2070/85/$3.30 © 1985, Elsevier Science Publishers B.V. (North-Holland)


where

$L$ = incurred losses, $P$ = premiums earned, and $l$ = the incurred loss ratio.

Incurred losses are an accrual accounting concept, representing losses attributable to coverage provided during a specified period, regardless of when payment is made. A rate increase is indicated if the incurred loss ratio exceeds the expected loss ratio, which is the difference between 1.0 and the proportion of premiums needed for expenses and profit. The rate change factor is given by the following formula:

$$K = \frac{l}{1 - e - \pi}, \tag{2}$$

where

e = the proportion of the premium needed for administrative and marketing expenses (called the expense ratio),

$\pi$ = the proportion of the premium set aside for profit, and $K$ = the rate change factor.

The factor K is then multiplied by the current average rate to give the new average rate.

The incurred loss component of the loss ratio is based on observed losses during a recent one-to-two year experience period. The loss experience is used as an estimate of losses that will occur in the future. Because incurred losses include estimated as well as paid claims and because revised rates based on the analysis will not go into effect immediately, several adjustments to incurred losses are made prior to computing the loss ratio. One important adjustment is loss trending, which is designed to adjust incurred losses for anticipated inflation in claim costs and changes in frequency. [See Cook (1970).] The trending usually runs from the mid-point of the experience period to the expected average claim date under policies subject to the new rates. This is usually a period of six to ten calendar quarters.
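The rate-change arithmetic of eqs. (1) and (2) can be illustrated with a minimal Python sketch. The function names and all numerical inputs below are hypothetical and chosen only for illustration; the sketch ignores the loss adjustments (such as trending) that are applied in practice before the loss ratio is computed.

```python
def incurred_loss_ratio(incurred_losses: float, earned_premiums: float) -> float:
    """Eq. (1): l = L / P."""
    return incurred_losses / earned_premiums


def rate_change_factor(loss_ratio: float, expense_ratio: float,
                       profit_ratio: float) -> float:
    """Eq. (2): K = l / (1 - e - pi).

    The denominator is the expected loss ratio, i.e. the share of premium
    left for losses after expenses and profit.
    """
    expected_loss_ratio = 1.0 - expense_ratio - profit_ratio
    if expected_loss_ratio <= 0.0:
        raise ValueError("expense and profit provisions must sum to less than 1")
    return loss_ratio / expected_loss_ratio


# Hypothetical inputs (not taken from the paper).
l = incurred_loss_ratio(incurred_losses=70.0, earned_premiums=100.0)
K = rate_change_factor(l, expense_ratio=0.30, profit_ratio=0.05)
current_average_rate = 500.0
new_average_rate = K * current_average_rate  # K > 1 indicates a rate increase
```

In this illustration the incurred loss ratio (0.70) exceeds the expected loss ratio (0.65), so K is greater than one and a rate increase is indicated.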

The trend adjustment is applied as follows:

$$l_a = l_0 (1 + \beta_s)^{\tau} (1 + \beta_f)^{\tau}, \tag{3}$$

where

$l_a$ = trended incurred loss ratio,

$l_0$ = observed incurred loss ratio during the experience period,

$\beta_s$ = the predicted quarterly rate of change in average paid-claim costs (severity),

$\beta_f$ = the predicted quarterly rate of change in frequency,

$\tau$ = the number of quarters for which losses are being trended,

$(1 + \beta_s)^{\tau}$ = the severity trend factor, and

$(1 + \beta_f)^{\tau}$ = the frequency trend factor.

The severity factor $\beta_s$ is always used, while the frequency adjustment $\beta_f$ is often omitted. When both factors are used, the severity factor tends to be dominant in order of magnitude. It is also more likely to be characterized by trends and other predictable patterns. Accordingly, this paper focuses on estimation of the severity factor.
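As a hedged illustration of the trend adjustment in eq. (3), the following Python sketch compounds the quarterly severity and frequency rates over the trending period. The function name and the numerical values are assumptions made for the example; the frequency rate defaults to zero to mirror the common practice of omitting that adjustment.

```python
def trended_loss_ratio(observed_ratio: float, severity_rate: float,
                       quarters: int, frequency_rate: float = 0.0) -> float:
    """Eq. (3): observed ratio times the severity and frequency trend factors."""
    severity_factor = (1.0 + severity_rate) ** quarters
    frequency_factor = (1.0 + frequency_rate) ** quarters
    return observed_ratio * severity_factor * frequency_factor


# Hypothetical example: 2 percent quarterly severity growth over eight quarters,
# frequency adjustment omitted.
trended = trended_loss_ratio(observed_ratio=0.65, severity_rate=0.02, quarters=8)
```

Because the rates are compounded over six to ten quarters, a small error in the quarterly severity rate is magnified in the trended loss ratio.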

Since the trend factor is applied directly to the observed incurred loss ratio, it can have a substantial absolute and relative impact on $l$ and hence on the new rates. For example, an error of 0.01 in the value of $(1 + \beta_s)^{\tau}$ in 1982 would have implied an error of over $200 million in private passenger automobile liability rates and a $150 million error in physical damage rates nationwide. In New York, the corresponding errors would be $18 million and $17.9 million, respectively. An error of this magnitude in the severity trend factor would represent 43 percent of the average underwriting profit rate in automobile liability insurance for the period 1971-1982.

The severity projection factor ($\beta_s$) is estimated using either a linear or an exponential least-squares time-trend equation. In recent years, the exponential approach has been prevalent. The estimation equation is

$$\log \bar{y}_t = \alpha + \beta_s t + u_t, \tag{4}$$

where

$\bar{y}_t$ = average paid claim costs for the year ending in quarter t, i.e., the dollar amount of claims paid divided by the number of claims paid in the year ending in quarter t,

$t$ = time, in quarters, $t = 1, 2, \ldots, 12$, and

$u_t$ = a random disturbance term.

The estimation period for the equation is the twelve quarters immediately preceding the first forecast period, and the estimation is conducted utilizing ordinary least squares (OLS). Moving average (year-ending) data (i.e., the $\bar{y}_t$) are used in order to smooth seasonal fluctuations. Smoothing occurs because every observation ($\bar{y}_t$, all $t$) includes a full year of data so that seasonal tendencies are averaged out.
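The industry procedure just described (year-ending averages, then an OLS fit of eq. (4) over twelve quarters) can be sketched in Python as follows. This is an illustrative sketch only: the function names and the synthetic series are assumptions, and the closed-form simple-regression formulas stand in for whatever software an insurer would actually use.

```python
import math


def year_ending_averages(paid_dollars, paid_counts):
    """Year-ending average paid claim cost for every quarter that has a full
    four quarters of history: dollars paid divided by claims paid."""
    averages = []
    for t in range(3, len(paid_dollars)):
        dollars = sum(paid_dollars[t - 3:t + 1])
        counts = sum(paid_counts[t - 3:t + 1])
        averages.append(dollars / counts)
    return averages


def ols_line(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def severity_trend(year_ending_costs):
    """Coefficient of t in eq. (4), fitted to the last twelve observations."""
    window = year_ending_costs[-12:]
    if len(window) != 12:
        raise ValueError("eq. (4) is fitted over exactly twelve quarters")
    quarters = list(range(1, 13))
    _, slope = ols_line(quarters, [math.log(c) for c in window])
    return slope


# Synthetic year-ending series growing 2 percent per quarter (hypothetical).
example_costs = [1000.0 * 1.02 ** q for q in range(12)]
beta_s = severity_trend(example_costs)
```

Note that the fitted slope is a rate of change in logarithms; for small quarterly rates it is close to, but not identical with, the proportional rate of change per quarter.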

In view of the extensive development of forecasting methods during the past few decades, the use of eq. (4) must be regarded as simplistic at best. A wide range of more advanced forecasting techniques is available which may yield more accurate results. The purpose of this article is to test the performance of two promising alternatives: econometric and ARIMA modeling. Econometric models were chosen for analysis because automobile insurance claims pay for goods and services whose prices are closely related to economic indices. For example, bodily injury liability claims pay for medical bills and lost wages while property damage claims cover the costs of automobiles, automobile parts, and repair-shop labor. Increases in these costs should be directly reflected in claim costs since the insurance mechanism merely passes through the costs of accidents to the insurance pool. ARIMA models also are a natural choice because they represent a more sophisticated version of the time-trending methods currently in use by the insurance industry.

A viable forecasting methodology for automobile insurance should be as simple as possible and easy to use. Simplicity is desirable because ratemaking methods must be explained to regulators and judges in the regulated states. Ease of application is important because large numbers of forecasts are carried out (e.g., for as many as five different automobile insurance sublines on a state-by-state basis) and because the statistical expertise readily available to most insurance companies is rather limited. Hence, the ideal model would have a straightforward intuitive interpretation and a small number of parameters which could be estimated rather mechanically using standard statistical packages. These criteria favor simple econometric models over ARIMA models with comparable forecast accuracy. On the other hand, ARIMA models have the advantage of not requiring forecast values of an independent variable, a potential source of error.

Previous forecasting research in other fields provides another a priori rationale for favoring simpler econometric models, i.e., simpler models typically outperform more complex models in forecasting experiments. [Armstrong (1978, 204-211).] Another relevant finding is that econometric models tend to outperform extrapolative models (such as time trend models) in long-term forecasts (forecasting horizon of three years or more) but not in short-term forecasts (one year or less). [Armstrong (1978, 363-382).] With a two-year forecast horizon, one would expect econometric forecasts to be better than extrapolative forecasts but perhaps not substantially so. However, as explained above, a slight improvement in the accuracy of automobile insurance trend factors would be well worth the effort.

Previous research on econometric forecasting of automobile insurance paid claim costs has been conducted by Cummins and Powell (1980). 3 Simple econometric models relating claim costs to the private-sector wage rate were shown to outperform eq. (4) over several forecast periods in the early and mid-1970’s. The present article extends the Cummins-Powell results by testing ARIMA as well as econometric models, using a consistent econometric specification for all periods tested (consistency is important in defending a model in regulatory proceedings), and incorporating more recent forecast periods. The data and the forecasting experiments are explained in more detail in section 3, following a discussion of equation specification.

2. Equation specification and estimation techniques

2.1. ARIMA models

The standard ARIMA specification is utilized, i.e.,

$$\phi(B)\,\Gamma(B^S)\,(1 - B)^d (1 - B^S)^D y_t = \delta + \theta(B)\,\Lambda(B^S)\,u_t, \tag{5}$$

where

$B$ = the backward shift operator, i.e., $B^j x_t = x_{t-j}$,

$\phi(B) = 1 - a_1 B - a_2 B^2 - \ldots - a_p B^p$,

$\theta(B) = b_0 + b_1 B + b_2 B^2 + \ldots + b_q B^q$,

$\Gamma(B^S) = 1 - \Gamma_1 B^S - \Gamma_2 B^{2S} - \ldots - \Gamma_P B^{SP}$,

$\Lambda(B^S) = 1 - \Lambda_1 B^S - \Lambda_2 B^{2S} - \ldots - \Lambda_Q B^{SQ}$,

$S$ = 1 + the number of periods between the occurrence of seasonal effects. E.g., if the seasonal effect occurs in the fourth quarter in a quarterly process, $S = 4$,

$\delta$ = a trend parameter,

$y_t$ = f(claim costs), and

$u_t$ = a random error term.

The model represented by eq. (5) is a general multiplicative seasonal model of the ARIMA class of order $(p, d, q) \times (P, D, Q)_S$. Tests also were conducted with transfer function noise models which allow the use of an explanatory variable other than time in an ARIMA context. These experiments were not successful, i.e., the models incorporating an independent variable performed no better in the forecast exercises than the best ARIMA models. Detailed discussions of ARIMA modeling can be found in Box and Jenkins (1976), Nelson (1973) and Newbold (1983). A previous insurance application is described in Miller and Hickman (1973).

3 Economic determinants of claim costs also have been considered by Masterson (1968). He constructed claim cost indices for several lines of insurance by taking weighted averages of governmental price and wage indices. These are general economic indices rather than claim costs. The weights were determined rather arbitrarily and no attempt was made to measure the degree to which the resulting indices correlate with claim cost trends.

Our approach follows the classical three-stage procedure of ARIMA modeling [see Newbold (1983, p. 26)]: (1) Models are selected for testing based on an analysis of the claim cost series, their logarithms, and lower-order differences of both of these series. The analysis involves graphical inspection of the series followed by an examination of sample autocorrelations and partial autocorrelations. (2) Parameters of the models are estimated using a non-linear least-squares routine. (The Statistical Analysis System (SAS) statistical package was used.) (3) Tests are conducted to determine whether the fitted model accurately represents the data. The tests utilized in this article involve an examination of the residuals from the estimated model. Specifically, tests of individual autocorrelations of the residuals recommended by Box and Pierce (1970) as well as the revised portmanteau test discussed in Ljung and Box (1978) are carried out. When more than one model passed the residual tests, the models with the best forecast performance are reported.
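The residual diagnostics in stage (3) can be sketched in Python as follows. The sketch computes sample autocorrelations together with the Box-Pierce and Ljung-Box portmanteau statistics in their standard textbook forms; it is not the SAS routine used in the study, and the function names and test series are assumptions made for illustration.

```python
def sample_autocorrelations(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def box_pierce_q(residuals, max_lag):
    """Box-Pierce statistic: n times the sum of squared autocorrelations."""
    n = len(residuals)
    return n * sum(r * r for r in sample_autocorrelations(residuals, max_lag))


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic: n(n + 2) times the sum of r_k^2 / (n - k)."""
    n = len(residuals)
    r = sample_autocorrelations(residuals, max_lag)
    return n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))


# Hypothetical, strongly autocorrelated "residuals" for illustration only.
alternating = [1.0, -1.0] * 10
r = sample_autocorrelations(alternating, 4)
q_bp = box_pierce_q(alternating, 4)
q_lb = ljung_box_q(alternating, 4)
```

Under the hypothesis of random residuals, either statistic is compared with a chi-square critical value whose degrees of freedom equal the number of lags less the number of estimated ARMA parameters; a large value, as for the alternating series above, leads to rejection of the fitted model.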

2.2. Econometric models

After experimentation, the following econometric model was chosen for comparison with eq. (4) and the ARIMA models:

$$\log y_t = \beta_0 + \beta_1 \log w_t + u_t, \tag{6}$$

where

$y_t$ = average paid claim costs in quarter t, $w_t$ = total non-farm private sector compensation per man-hour, and $u_t$ = disturbance.

Linear models and models including seasonal dummy variables also were tested. However, eq. (6) registered the best overall forecasting performance over the periods tested.
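A minimal Python sketch of fitting and forecasting with the log-linear specification of eq. (6) is given below. It uses plain OLS only and does not implement the autocorrelation correction applied in the study; the function names, the synthetic wage and claim cost series, and the predicted wage values are all hypothetical.

```python
import math


def fit_log_log(costs, wages):
    """OLS estimates (b0, b1) for log(cost) = b0 + b1 * log(wage) + u."""
    x = [math.log(w) for w in wages]
    y = [math.log(c) for c in costs]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def forecast_costs(b0, b1, predicted_wages):
    """Anti-log forecasts of claim costs from predicted wage rates."""
    return [math.exp(b0 + b1 * math.log(w)) for w in predicted_wages]


# Synthetic history: costs follow 300 * wage^1.2 exactly (hypothetical).
wages = [5.0 * 1.02 ** q for q in range(20)]
costs = [300.0 * w ** 1.2 for w in wages]
b0, b1 = fit_log_log(costs, wages)

# Hypothetical predicted wage rates for the forecast quarters.
forecasts = forecast_costs(b0, b1, predicted_wages=[8.0, 8.2])
```

The forecast step requires predicted values of the wage rate, which is the additional source of error that purely extrapolative models avoid.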

The equations initially were estimated using OLS. When autocorrelation was present in the residuals, a first-order autoregressive process was hypothesized, and the estimation was carried out using a maximum likelihood procedure suggested by Beach and MacKinnon (1978). Cummins and Powell (1980) also made an adjustment for heteroskedasticity when it was present. This adjustment was not used here because the equations corrected for autocorrelation alone tended to produce better forecasts in the Cummins-Powell experiments than those adjusted for both autocorrelation and heteroskedasticity.

Cummins and Powell tested a wide range of price and wage indices as predictors of claim costs, and other variables were tested as part of the present study. Among the variables tested in the two studies are implicit price deflators for gross national product, personal consumption expenditures, and autos and parts as well as the consumer price index (CPI) (all items), the medical care component of the CPI, and the service sector wage rate. The private sector wage rate is used here because it produced the best results in these experiments and because forecasts were more readily available for this variable than for some of the other indices such as the implicit price deflator for autos and parts.

2.3. Simpler time-trend models

For comparison with the more advanced models, a wide range of simpler time-trend models was tested. In addition to the insurance industry model [eq. (4)], linear, exponential, and quadratic equations were estimated, of the following general form:

$$z_t = \beta_0 + \beta_1 T + \beta_2 T^2 + \beta_3 D_1 + \beta_4 D_2 + \beta_5 D_3 + u_t, \tag{7}$$

where

$z_t$ = $y_t$ for the linear models and $\log y_t$ for the exponential models, and $T$ = time, in quarters.

In the exponential equations, $\beta_2$ was set equal to zero. Estimation was conducted utilizing generalized least-squares to allow for autocorrelation of the error term $u_t$.

3. The data and the experiments

The data series tested in the study consisted of (total limits) average paid claim costs for automobile bodily injury and property damage liability insurance compiled by the Insurance Services Office (ISO). (A total limits series includes large claims at full value in computing the averages.) The ISO is the statistical agent for several hundred property-liability insurance companies in the United States, including most of the major independent agency stock companies. The data utilized in the study are countrywide and include all companies reporting to the ISO. Data on bodily injury liability claims were available from the third quarter of 1965 (1965.3) to the first quarter of 1983 (1983.1). The property damage liability series was available from 1954.1 to 1983.1, but the analysis was restricted to the 1965.3-1983.1 data. 4

In testing the insurance-industry model [eq. (4)], average paid claim cost for quarter t is defined as the total dollar amount of claims paid in the year ending in quarter t divided by the number of claims paid during the same period. For the other models, average paid claim cost is defined as the total dollar amount of claims paid in each quarter divided by the number of claims paid in that quarter. The latter measure is preferred on a priori grounds because the use of year-ending data creates or exacerbates autocorrelation and because better adjustment techniques for seasonality are available.

As suggested above, the economic variable utilized in the econometric models is compensation per manhour in the private sector. This series is constructed by Wharton Econometric Forecasting Associates (WEFA), a major forecasting firm, using published and unpublished data from the U.S. Department of Labor, Bureau of Labor Statistics.

The procedure in the experiments was to estimate the forecasting equations over a given estimation period and then to use the equations to forecast claim costs over the eight quarters immediately following the end of the estimation period. An eight-quarter period was used because it is a typical trending period in ratemaking and because the WEFA forecasts of the wage rate cover eight quarters. For the econometric forecasts, actual data were used to estimate the equations, and predicted values of the wage rate were used to generate the forecasts. The predicted values are the estimates that actually were available from WEFA at the end of each estimation period.

For the insurance industry method [eq. (4)], a twelve-quarter estimation period is used. All other estimation periods begin in 1965.3 and end in the second quarter of the terminal year of the period. The starting point was chosen to maximize the number of observations in the bodily injury case. The same estimation periods were used for the property damage series to maintain consistency. The third quarter of each year was arbitrarily selected as the first quarter of the forecast period. This choice should have little or no impact on the results. The eight forecast periods begin in 1974.3, 1975.3, 1976.3, ..., 1981.3. (The last forecast period consists of seven rather than eight quarters due to data availability.)

4 Readers with an interest in using the data in their own research can obtain further information by contacting the authors.

As a notational convenience in discussing the forecasting experiments, let $y_t$ denote the value of average paid claim costs in the last quarter of the estimation period and $\hat{y}_{t+i}$ denote the predicted value in the ith quarter of the forecast period. For the insurance industry method, $\hat{y}_{t+i}$ is obtained as follows:

$$\hat{y}_{t+i} = y_t (1 + \hat{\beta}_s)^i, \tag{8}$$

where $\hat{\beta}_s$ is the coefficient of t in eq. (4), estimated using ordinary least squares. Note that $\hat{y}_{t+i}$ and $y_t$ are quarterly and not year-ending quarterly claim costs, even though year-ending data are used in estimating $\beta_s$. This approach parallels the application of the insurance industry method in a practical ratemaking situation and reflects the fact that the goal of ratemaking is to predict actual costs at the midpoint of the period to which the new rates will apply. Thus, the year-ending data are used as a means to an end (i.e., estimating $\beta_s$) and not as an end in themselves. 5 For the other forecasting methods, forecasts of $\hat{y}_{t+i}$ are developed using standard forecasting procedures for ordinary least squares, first-order autoregressive, and ARIMA models. 6

Two measures of forecast accuracy are reported, the root mean-square percentage error (RMSPE) and the total predicted change error (TPCE). The formulas for these statistics are

$$\mathrm{RMSPE} = \left[ \frac{1}{8} \sum_{i=1}^{8} \left( \frac{\hat{y}_{t+i} - y_{t+i}}{y_{t+i}} \right)^2 \right]^{1/2} \times 100, \tag{9}$$

$$\mathrm{TPCE} = \frac{\hat{y}_{t+8} - y_{t+8}}{y_{t+8}} \times 100, \tag{10}$$

where $y_{t+i}$ and $\hat{y}_{t+i}$, respectively, are the actual and forecasted values of claim costs during the ith quarter of the forecast period. All error statistics are based on claim costs regardless of the functional form of the forecasting equation. Thus, anti-logs of forecasts from logarithmic equations were used in eqs. (9) and (10). 7 The RMSPE is a standard measure for evaluating forecast accuracy. TPCE was chosen as the most relevant error measure for automobile insurance claim costs because it indicates the percentage error in projected average claim costs at the average claim date for policies subject to the revised rates. If all other factors in the ratemaking formula were precisely accurate, TPCE would indicate the percentage by which the revised rates are too high or too low. Mean absolute percentage errors and root mean square errors also were computed but are not reported because they support the same conclusions about model performance as the RMSPE and TPCE.
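The industry projection of eq. (8) and the two error measures can be sketched in Python as follows. The sketch assumes that the TPCE is evaluated at the last quarter of the forecast period and that errors are measured as forecast minus actual, so that a negative value indicates an under-prediction; the function names and the numerical series are hypothetical.

```python
import math


def industry_projection(last_cost, beta_s, horizon):
    """Eq. (8): compound the last observed quarterly cost at rate beta_s."""
    return [last_cost * (1.0 + beta_s) ** i for i in range(1, horizon + 1)]


def rmspe(actual, forecast):
    """Root mean-square percentage error over the forecast period."""
    n = len(actual)
    squared = sum(((f - a) / a) ** 2 for a, f in zip(actual, forecast))
    return 100.0 * math.sqrt(squared / n)


def tpce(actual, forecast):
    """Total predicted change error: percentage error in the final quarter."""
    return 100.0 * (forecast[-1] - actual[-1]) / actual[-1]


# Hypothetical case: costs actually grow 3 percent per quarter, while the
# fitted trend implies 2 percent per quarter.
actual = [100.0 * 1.03 ** i for i in range(1, 9)]
forecast = industry_projection(last_cost=100.0, beta_s=0.02, horizon=8)
period_rmspe = rmspe(actual, forecast)
period_tpce = tpce(actual, forecast)  # negative: claim costs are under-predicted
```

Because the projection error compounds quarter by quarter, the absolute TPCE in this illustration exceeds the RMSPE, which averages over the smaller errors in the early quarters.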

4. Empirical results

4.1. Bodily injury liability

5 The forecast errors for the insurance industry method also were computed using two other approaches: (1) predictions of year-ending values were compared with actual year-ending values in eqs. (9) and (10); and (2) $\hat{y}_{t+i}$ was computed as $(\hat{\bar{y}}_{t+i} / \bar{y}_t)\, y_t$. On the average, TPCE was higher for these methods than for the method given as eq. (8).

6 See Judge et al. (1982, pp. 457-462 and 695-698). The experiments were conducted as if $y_{t+i}$ were unknown during the forecast period, i.e., when the forecast functions for $\hat{y}_{t+i}$ called for values of $y_{t+i-j}$, $j \geq 1$, falling within the forecast period, forecasted rather than actual values of the variable were used.

7 In a logarithmic linear model, if the error terms are normally distributed so that the dependent variable $y_t$ is lognormally distributed, taking anti-logs of the forecasted values amounts to predicting the median rather than the mean.

Table 1
Forecast errors: Bodily injury liability claim costs (in percentages)

| 1st forecast period a | RMSPE: Time trend b | RMSPE: Econometric | RMSPE: ARIMA 1 | RMSPE: ARIMA 2 | TPCE: Time trend b | TPCE: Econometric | TPCE: ARIMA 1 | TPCE: ARIMA 2 | Annual inflation |
|---|---|---|---|---|---|---|---|---|---|
| 1974.3 | 4.2 | 3.9 | 4.6 | 4.2 | 0.7 | 5.5 | 0.8 | 0.3 | 15.5/6.6 |
| 1975.3 | 5.6 | 4.3 | 3.7 | 3.3 | 2.0 | -1.8 | -3.3 | -2.6 | 10.4/10.9 |
| 1976.3 | 4.8 | 4.1 | 3.1 | 2.8 | 3.7 | 3.2 | 0.8 | 2.0 | 6.6/10.4 |
| 1977.3 | 3.3 | 3.9 | 1.8 | 1.5 | -2.2 | -0.3 | -3.0 | -3.2 | 10.9/10.1 |
| 1978.3 | 5.0 | 4.0 | 4.8 | 4.5 | -10.3 | -1.4 | -9.5 | -8.4 | 10.4/14.8 |
| 1979.3 | 4.1 | 3.0 | 5.9 | 6.2 | -7.9 | -2.4 | -10.3 | -10.3 | 10.2/14.2 |
| 1980.3 | 2.8 | 2.4 | 2.7 | 3.0 | 0.7 | -0.4 | -4.0 | -4.2 | 14.8/10.9 |
| 1981.3 | 5.3 | 4.4 | 4.1 | 4.7 | 0.0 | -8.9 | -9.6 | -10.5 | 14.2/13.1 |
| Absolute average (8 periods) | 4.4 | 3.8 | 3.8 | 3.8 | 3.4 | 3.0 | 5.2 | 5.2 | |

a All forecast periods are eight quarters in length except for the period beginning in 1981.3, which is seven quarters in length.

b Exponential time trend based on year-ending values of the dependent variable, estimated using ordinary least squares over twelve quarters.

An examination of the bodily injury claim cost series, its logarithms, and several transformations of these two series led to the selection of the first difference of the logarithms of the series as the dependent variable for the ARIMA models in the bodily injury case. This series appeared stationary in a graphical analysis. An examination of the autocorrelation coefficients of this series suggested that an autoregressive model would be appropriate. After extensive experimentation, two models were selected for the forecasting experiments:

model 1: order $(p, d, q) = (3, 1, 0)$,

model 2: order $(p, d, q) = (5, 1, 0)$.

Model 1 had slightly better statistical properties in the earlier forecast periods, while model 2 was better during the later periods.

The hypothesis of randomness was not rejected by the portmanteau test for either model 1 or model 2 over any estimation period. Nearly all of the first twelve autocorrelation coefficients for the models are within one standard deviation of their mean. (Model 1 is shown in the Appendix (table A.1), while model 2 is available from the authors on request.)

The econometric models for bodily injury liability claim costs that yielded the best forecasting results were log-linear models estimated by ordinary least squares. The autocorrelation coefficients were insignificant over all estimation periods when the maximum likelihood procedure for autocorrelation was used. The best time trend results were obtained using the insurance industry model rather than any of the models based on eq. (7). The econometric models appear in table A.3.

The forecast results for the bodily injury case are shown in table 1. The table presents the RMSPE and TPCE for the two ARIMA models, the econometric model, and the best time-trend model [eq. (4)]. The last column of the table compares the annual rate of claim-cost inflation in the eight quarters prior to the forecast period with the rate in the forecast period. For example, the numbers 15.5/6.6 for the first forecast period indicate that the annual inflation rate was 15.5 percent during the eight quarters preceding the forecast period and 6.6 percent during the forecast period itself.

Focusing on the average RMSPE for the eight periods tested leads to the conclusion that there is little difference among the models in forecast accuracy. Examination of the TPCE suggests that the econometric and insurance industry models do better than the ARIMA models, but that the accuracy of the econometric and industry models is about equal. Wilcoxon signed rank tests [see Lehmann (1975, pp. 123-132)] reveal that the econometric model is superior to both ARIMA models at about the 10 percent significance level, while the insurance industry model is better than the ARIMA 1 model at the 10 percent level but not significantly better than the ARIMA 2 model at that level of significance. 8 However, the test revealed no significant difference in forecast accuracy between the econometric and insurance industry models. The Friedman statistic was used to test the null hypothesis that the four models have the same forecast accuracy against the alternative hypothesis that accuracy differs among the models [Lehmann (1975, pp. 262-266)]. The null hypothesis of comparable accuracy could not be rejected using this test.

Although the Friedman statistic suggests that there is no statistically significant difference among the four methods in terms of forecast accuracy, the Wilcoxon tests and visual inspection of the data suggest that the econometric and insurance industry methods are to be preferred over the ARIMA models. The econometric model might be given a slight edge over the insurance industry method because its TPCE is lower in six of the eight periods tested.

4.2. Property damage liability

The first difference of the logarithms of the property damage liability series appeared stationary on the basis of graphical analysis. Hence, the property damage ARIMA modelling was based on this transformation. Experimentation revealed that four-quarter or seasonal differencing would also be appropriate, leading to the analysis of the following series:

$$Y_t = (y_t - y_{t-1}) - (y_{t-4} - y_{t-5}), \tag{11}$$

where $y_t = \log(\text{claim costs})_t$. Two models were fitted to this series: a seasonal autoregressive model of order 1 and a seasonal moving average model of order 1. More precisely, the two property damage models are as follows (estimated models are shown in table A.2):

model 1: order $(p, d, q) \times (P, D, Q)_S = (0, 1, 0) \times (1, 1, 0)_4$,

model 2: order $(p, d, q) \times (P, D, Q)_S = (0, 1, 0) \times (0, 1, 1)_4$.
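The combined first and four-quarter differencing of the logarithms that underlies both property damage models can be sketched in Python as follows. The function name and the synthetic series are assumptions made for illustration; the sketch covers only the transformation and not the estimation of the seasonal autoregressive or moving average terms.

```python
import math


def seasonal_first_difference(costs, season=4):
    """First difference of the seasonal difference of log claim costs:
    (y_t - y_{t-1}) - (y_{t-season} - y_{t-season-1}), with y = log(cost)."""
    y = [math.log(c) for c in costs]
    return [
        (y[t] - y[t - 1]) - (y[t - season] - y[t - season - 1])
        for t in range(season + 1, len(y))
    ]


# Hypothetical quarterly series with steady growth and a fixed seasonal pattern.
seasonal_pattern = [1.00, 0.97, 1.02, 1.05]
costs = [100.0 * 1.03 ** t * seasonal_pattern[t % 4] for t in range(16)]
transformed = seasonal_first_difference(costs)
```

For a series with constant proportional growth and a stable multiplicative seasonal pattern, as in this illustration, the transformed series is zero apart from rounding error; each application of the transformation costs five observations at the start of the sample.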

For both model 1 and model 2, the hypothesis of random residuals is rejected by the portmanteau test only for one case: model 1 in the estimation period ending in 1977.2. The tests of the individual autocorrelations support the randomness hypothesis although the results are somewhat less satisfactory than in the bodily injury case. For model 1, one of the first twelve autocorrelation coefficients is more than two standard deviations from its mean for the estimation period ending in 1975.2, and two of the coefficients fall outside this confidence range for the estimation period ending in 1977.2. For model 2, one of the first twelve autocorrelations is outside of the two-standard-deviation confidence range for the estimation periods ending in 1975.2, 1976.2, and 1977.2. This is not sufficient reason to reject the models, but it indicates a less satisfactory fit than in the bodily injury case.

In the property-damage liability econometric models, the autocorrelation effects are much stronger than in the bodily injury models and are statistically significant in every estimation period. Thus, the econometric forecasts are based on log-linear models with a first-order autocorrelation adjustment. The insurance industry time-trend models again outperformed the time-trend models based on eq. (7).

8 Strictly speaking, neither the Wilcoxon signed rank test nor the Friedman test is appropriate for testing differences in forecast accuracy from tables 1 and 2. This is because the errors are not likely to be independent from one forecasting period to the next. Thus, the test results should be considered merely suggestive of comparative forecast accuracy.

Table 2
Forecast errors: Property damage liability claim costs (in percentages)

| 1st forecast period a | RMSPE: Time trend b | RMSPE: Econometric c | RMSPE: ARIMA 1 d | RMSPE: ARIMA 2 e | TPCE: Time trend b | TPCE: Econometric c | TPCE: ARIMA 1 d | TPCE: ARIMA 2 e | Annual inflation f |
|---|---|---|---|---|---|---|---|---|---|
| 1974.3 | 9.7 | 1.7 | 9.6 | 10.6 | -12.6 | 2.5 | -14.0 | -14.8 | 5.4/12.0 |
| 1975.3 | 3.7 | 1.6 | 1.8 | 2.2 | -5.4 | 0.0 | -1.3 | -3.8 | 8.6/9.7 |
| 1976.3 | 3.9 | 2.7 | 1.8 | 4.6 | -5.9 | -3.9 | -1.1 | -7.8 | 12.0/12.6 |
| 1977.3 | 6.0 | 6.1 | 6.2 | 7.2 | -8.1 | -8.7 | -8.8 | -10.8 | 9.7/15.3 |
| 1978.3 | 4.5 | 3.8 | 1.7 | 4.0 | -3.6 | -3.0 | 1.3 | -3.8 | 12.6/13.1 |
| 1979.3 | 3.1 | 1.3 | 7.3 | 3.7 | 3.7 | -0.4 | 9.3 | 3.7 | 15.3/11.3 |
| 1980.3 | 3.6 | 1.4 | 3.3 | 2.6 | 6.4 | 1.7 | 4.1 | 3.6 | 13.0/10.5 |
| 1981.3 | 3.7 | 1.6 | 3.9 | 4.9 | 6.8 | 1.5 | 7.4 | 7.9 | 11.3/7.6 |
| Absolute average (8 periods) | 4.8 | 2.5 | 4.4 | 5.0 | 6.6 | 2.7 | 5.9 | 7.0 | |

a All forecast periods are eight quarters in length except for the period beginning in 1981.3, which is seven quarters in length.

b Exponential time trend based on year-ending values of the dependent variable, estimated using ordinary least squares over twelve quarters.

c Log-linear econometric model with the private-sector wage rate as the independent variable, estimated using the maximum-likelihood procedure for first-order autocorrelation of the residuals.

d ARIMA model of order $(0, 1, 0) \times (1, 1, 0)_4$.

e ARIMA model of order $(0, 1, 0) \times (0, 1, 1)_4$.

f Number to the left of the slash is the annual inflation rate of claim costs during the eight quarters prior to the forecast period. The number after the slash is the annual inflation rate of claim costs during the forecast period. Both numbers are geometric means.

The forecast results for the property damage series are shown in table 2. In this case, the econometric model does substantially better than any of the time-trend models in terms of TPCE. This superiority is confirmed by statistical testing. In pairwise Wilcoxon signed rank tests, the econometric model outperforms the insurance industry and ARIMA 2 models at better than the five percent level and outperforms ARIMA 1 at about the ten percent level. In the Friedman test, the hypothesis of no difference among the four models is rejected at the five percent significance level. Thus, for property damage liability paid claim costs the econometric model clearly produces more accurate forecasts than the other methods.
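The Friedman comparison can be illustrated with the following Python sketch, which ranks the four models within each forecast period and computes the standard Friedman statistic. It rests on assumptions that the text does not confirm: the test is applied here to the absolute TPCE values as read from tables 1 and 2, tied values receive average ranks, and no tie correction is made. The function names are hypothetical.

```python
def average_ranks(values):
    """Ranks from 1 (smallest) upward, with tied values sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def friedman_statistic(rows):
    """Friedman statistic for n periods (rows) and k models (columns)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


# Absolute TPCE by period; columns: time trend, econometric, ARIMA 1, ARIMA 2.
bodily_injury = [
    [0.7, 5.5, 0.8, 0.3], [2.0, 1.8, 3.3, 2.6], [3.7, 3.2, 0.8, 2.0],
    [2.2, 0.3, 3.0, 3.2], [10.3, 1.4, 9.5, 8.4], [7.9, 2.4, 10.3, 10.3],
    [0.7, 0.4, 4.0, 4.2], [0.0, 8.9, 9.6, 10.5],
]
property_damage = [
    [12.6, 2.5, 14.0, 14.8], [5.4, 0.0, 1.3, 3.8], [5.9, 3.9, 1.1, 7.8],
    [8.1, 8.7, 8.8, 10.8], [3.6, 3.0, 1.3, 3.8], [3.7, 0.4, 9.3, 3.7],
    [6.4, 1.7, 4.1, 3.6], [6.8, 1.5, 7.4, 7.9],
]
friedman_bi = friedman_statistic(bodily_injury)    # about 4.6
friedman_pd = friedman_statistic(property_damage)  # about 10.5
```

With three degrees of freedom, the chi-square critical value at the five percent level is approximately 7.81. Under the assumptions stated above, the property damage statistic exceeds this value while the bodily injury statistic does not, which agrees with the conclusions reported for the two series.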

5. Summary and conclusions

The purpose of the research reported in this article was to test whether econometric and ARIMA models lead to better forecasts of automobile insurance paid claim costs than the method currently in use by the insurance industry. Two series were tested: private passenger automobile bodily injury and property damage liability paid claim costs. Eight forecast periods were analyzed, beginning in 1974.3, 1975.3, ..., and 1981.3. The estimation period in each case ended in the quarter preceding the first quarter of the forecast period.

The results indicate that the insurance industry and econometric methods produce better forecasts than ARIMA models for bodily injury liability paid claim costs. In the bodily injury case, there is no statistically significant difference in the forecasting performance of the econometric and insurance industry models, although the former has lower forecast errors in six of eight periods tested. For property damage liability claim costs, the econometric model clearly outperforms both the insurance industry and ARIMA models.

A general conclusion is that the insurance industry should give serious consideration to adopting econometric models for forecasting automobile insurance paid claim costs. These models produce a clear gain in forecast accuracy for property damage liability insurance. The average TPCE for the econometric models over the eight periods tested was 2.7 percent, compared with 6.6 percent for the industry model. Capturing even a portion of this potential gain in forecast accuracy would lead to a substantial quantitative difference in premium collections. For the bodily injury coverage, the results suggest that no clear gain in accuracy would result from adopting econometric methods. However, no accuracy would be lost; and the advantages of consistency, especially in a regulatory context, argue in favor of adopting econometric methods for both bodily injury and property damage liability insurance ratemaking.

Appendix

Table A.1
ARIMA equations: Bodily injury liability, $y_t = \log(\text{claim costs}_t) - \text{mean}[\log(\text{claim costs}_t)]$. a

Model 1: (3,1,0)
$w_t = y_t - y_{t-1}$; $w_t = \delta + \phi_1 w_{t-1} + \phi_2 w_{t-2} + \phi_3 w_{t-3} + u_t$

| Estimation period | $\delta$ | $\phi_1$ | $\phi_2$ | $\phi_3$ | Q* | $\chi^2_{.05}(K-4)$ |
|---|---|---|---|---|---|---|
| 1965.3-1974.2 | -0.0062 | -0.541 (-3.45) | -0.470 (-2.53) | -0.623 (-3.46) | 5.02 | 15.51 |
| 1965.3-1975.2 | -0.0021 | -0.546 (-3.80) | -0.404 (-2.59) | -0.532 (-3.69) | 7.37 | 15.51 |
| 1965.3-1976.2 | 0 | -0.546 (-3.98) | -0.427 (-2.76) | -0.545 (-3.79) | 6.36 | 15.51 |
| 1965.3-1977.2 | -0.0025 | -0.524 (-4.01) | -0.362 (-2.54) | -0.537 (-4.05) | 8.14 | 15.51 |
| 1965.3-1978.2 | -0.0015 | -0.528 (-4.27) | -0.352 (-2.58) | -0.540 (-4.32) | 9.40 | 15.51 |
| 1965.3-1979.2 | -0.0022 | -0.532 (-4.43) | -0.357 (-2.70) | -0.538 (-4.48) | 8.93 | 15.51 |
| 1965.3-1980.2 | -0.0020 | -0.513 (-4.39) | -0.319 (-2.48) | -0.520 (-4.43) | 8.50 | 15.51 |
| 1965.3-1981.2 | -0.0024 | -0.544 (-4.85) | -0.317 (-2.53) | -0.540 (-4.75) | 7.38 | 15.51 |

a Q* is the revised portmanteau test statistic. See Ljung and Box (1978). Q* is approximately chi-square distributed with K-4 degrees of freedom, where K is the number of autocorrelations used in computing Q*. All Q*s reported in the table are based on 12 autocorrelation coefficients. Numbers in parentheses next to coefficients are t-ratios.
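To show how an equation of the form in table A.1 generates forecasts, the Python sketch below iterates the (3,1,0) recursion forward with future disturbances set to their expected value of zero. The coefficients are those reported for the 1965.3-1981.2 estimation period; the $y_t$ history is hypothetical, and the function name is illustrative. Because $y_t$ is a demeaned logarithm, the sample mean would have to be added back and the result exponentiated to recover claim costs.

```python
def forecast_arima_310(y, delta, phi, steps):
    """Iterate w_t = delta + phi_1*w_{t-1} + phi_2*w_{t-2} + phi_3*w_{t-3},
    with w_t = y_t - y_{t-1}, and return the next `steps` values of y."""
    if len(y) < len(phi) + 1:
        raise ValueError("need at least len(phi) + 1 observations")
    w = [b - a for a, b in zip(y[:-1], y[1:])]  # first differences
    level = y[-1]
    forecasts = []
    for _ in range(steps):
        # Future disturbances u_t are replaced by their mean, zero.
        w_next = delta + sum(p * w[-i] for i, p in enumerate(phi, start=1))
        level += w_next
        w.append(w_next)
        forecasts.append(level)
    return forecasts


# Coefficients from the last row of table A.1; the history is hypothetical.
delta = -0.0024
phi = (-0.544, -0.317, -0.540)
y_history = [0.50, 0.53, 0.55, 0.60, 0.62]
y_forecast = forecast_arima_310(y_history, delta, phi, steps=8)
```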


Table A.2
ARIMA equations: Property damage liability, $y_t = \log(\text{claim costs}_t) - \text{mean}[\log(\text{claim costs}_t)]$. a

Model 1: $(0,1,0)\times(1,1,0)_4$
$w_t = (y_t - y_{t-1}) - (y_{t-4} - y_{t-5})$; $w_t = \delta - \lambda_1 w_{t-4} + a_t$

Model 2: $(0,1,0)\times(0,1,1)_4$
$w_t = (y_t - y_{t-1}) - (y_{t-4} - y_{t-5})$; $w_t = \delta + a_t + \lambda_1 a_{t-4}$

| Estimation period | Model 1 $\delta$ | Model 1 $\lambda_1$ | Model 1 Q* | Model 2 $\delta$ | Model 2 $\lambda_1$ | Model 2 Q* | $\chi^2_{.05}(K-2)$ |
|---|---|---|---|---|---|---|---|
| 1965.3-1974.2 | 0.0008 | -0.614 (-3.38) | 11.34 | 0.0003 | 0.727 (4.97) | 8.11 | 18.31 |
| 1965.3-1975.2 | -0.0027 | -0.768 (-4.32) | 17.05 | -0.0020 | 0.788 (5.93) | 15.15 | 18.31 |
| 1965.3-1976.2 | 0.0008 | -0.602 (-4.24) | 15.61 | -0.0003 | 0.772 (6.27) | 12.89 | 18.31 |
| 1965.3-1977.2 | -0.0003 | -0.584 (-4.33) | 20.25 | -0.0004 | 0.794 (7.63) | 15.64 | 18.31 |
| 1965.3-1978.2 | -0.0005 | -0.551 (-3.92) | 9.79 | -0.0008 | 0.799 (7.62) | 8.04 | 18.31 |
| 1965.3-1979.2 | 0.0001 | -0.505 (-4.04) | 10.85 | -0.0005 | 0.733 (6.39) | 8.63 | 18.31 |
| 1965.3-1980.2 | 0.0004 | -0.473 (-3.74) | 16.36 | 0.0003 | 0.763 (7.92) | 9.84 | 18.31 |
| 1965.3-1981.2 | -0.0002 | -0.418 (-3.32) | 11.31 | 0 | 0.800 (9.42) | 7.68 | 18.31 |

a Q* is the revised portmanteau test statistic. See Ljung and Box (1978). Q* is approximately chi-square distributed with K-2 degrees of freedom, where K is the number of autocorrelations included in computing Q*. All Q*s reported in the table are based on 12 autocorrelation coefficients. Numbers in parentheses next to coefficients are t-ratios.
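The Q* statistic of Ljung and Box (1978) has the standard form $Q^* = n(n+2)\sum_{k=1}^{K} r_k^2/(n-k)$, where $r_k$ is the lag-$k$ autocorrelation of the residuals and $n$ is their number. The Python sketch below computes it for an arbitrary residual series. Centering the residuals on their sample mean before computing $r_k$ is an assumption of this sketch, as are the function names; the critical values are the five percent points shown in tables A.1 and A.2.

```python
def ljung_box_q(residuals, n_lags):
    """Revised portmanteau statistic Q* over lags 1..n_lags."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]  # assumption: center on the sample mean
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, n_lags + 1):
        ck = sum(dev[t] * dev[t - k] for t in range(k, n))
        rk = ck / c0  # lag-k residual autocorrelation
        total += rk * rk / (n - k)
    return n * (n + 2) * total


def passes_at_5_percent(q_star, critical_value):
    """True when Q* is below the tabulated chi-square critical value."""
    return q_star < critical_value


# Five percent chi-square critical values as tabulated in the appendix.
CRITICAL_8_DF = 15.51   # K - 4 = 8 degrees of freedom (table A.1)
CRITICAL_10_DF = 18.31  # K - 2 = 10 degrees of freedom (table A.2)

# Small check on a perfectly alternating (hypothetical) residual series.
q_alternating = ljung_box_q([1, -1, 1, -1, 1, -1, 1, -1], n_lags=1)
```

Applied to the tabulated values, `passes_at_5_percent(20.25, CRITICAL_10_DF)` is false: the Model 1 Q* for 1965.3-1977.2 is the only entry in tables A.1 and A.2 that exceeds its critical value.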

Table A.3
Log-linear econometric equations for bodily injury and property damage liability. a

Bodily injury liability

| Estimation period | Constant | ln(wage) | $\bar{R}^2$ | DW |
|---|---|---|---|---|
| 1965.3-1974.2 | 5.325 (93.55) | 1.287 (32.06) | 0.97 | 1.59 |
| 1965.3-1975.2 | 5.357 (114.82) | 1.263 (39.43) | 0.98 | 1.66 |
| 1965.3-1976.2 | 5.355 (130.95) | 1.265 (46.31) | 0.98 | 1.70 |
| 1965.3-1977.2 | 5.372 (149.96) | 1.252 (53.74) | 0.98 | 1.67 |
| 1965.3-1978.2 | 5.388 (170.10) | 1.241 (61.84) | 0.99 | 1.70 |
| 1965.3-1979.2 | 5.400 (191.75) | 1.232 (70.92) | 0.99 | 1.69 |
| 1965.3-1980.2 | 5.390 (214.38) | 1.239 (81.99) | 0.99 | 1.69 |
| 1965.3-1981.2 | 5.391 (235.10) | 1.239 (92.30) | 0.99 | 1.79 |

Property damage liability

| Estimation period | Constant | ln(wage) | $\bar{R}^2$ | $\rho$ |
|---|---|---|---|---|
| 1965.3-1974.2 | 3.940 (28.17) | 1.133 (11.56) | 0.99 | 0.81 (7.94) |
| 1965.3-1975.2 | 3.934 (35.11) | 1.139 (14.91) | 0.99 | 0.80 (8.38) |
| 1965.3-1976.2 | 3.949 (39.88) | 1.128 (17.18) | 0.99 | 0.81 (9.02) |
| 1965.3-1977.2 | 3.946 (46.92) | 1.130 (20.80) | 0.99 | 0.79 (8.87) |
| 1965.3-1978.2 | 3.915 (53.39) | 1.154 (24.97) | 0.99 | 0.76 (8.44) |
| 1965.3-1979.2 | 3.864 (50.73) | 1.190 (25.48) | 0.99 | 0.79 (9.71) |
| 1965.3-1980.2 | 3.855 (54.32) | 1.196 (28.28) | 0.99 | 0.80 (10.56) |
| 1965.3-1981.2 | 3.844 (59.64) | 1.203 (32.17) | 0.99 | 0.80 (10.88) |

a The bodily injury liability claim cost equation was estimated using ordinary least squares, while the property damage liability equation was estimated using a method that corrects for first-order autocorrelation of the residuals.
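The DW column in table A.3 is the Durbin-Watson statistic, $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, which is approximately $2(1-\hat{\rho})$ for first-order autocorrelation $\hat{\rho}$ of the residuals. The Python sketch below computes both quantities. The function names and the residual series are illustrative and not taken from the estimation output.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: squared successive differences over squared residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals[:-1], residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def implied_rho(dw):
    """First-order autocorrelation implied by the approximation d = 2(1 - rho)."""
    return 1 - dw / 2


# Hypothetical residuals: a perfectly alternating series gives d = 3.0
# with four observations, a constant series gives d = 0.
d_alternating = durbin_watson([1, -1, 1, -1])
d_constant = durbin_watson([1, 1, 1, 1])

# DW values tabulated for bodily injury range from 1.59 to 1.79.
rho_low_dw = implied_rho(1.59)
rho_high_dw = implied_rho(1.79)
```

Under this approximation the bodily injury DW values correspond to residual autocorrelations of roughly 0.1 to 0.2, well below the $\rho$ estimates of about 0.8 reported for property damage.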


References

Armstrong, J. Scott, 1978, Long-range forecasting: From crystal ball to computer (Wiley, New York).

Beach, C.M. and J.G. MacKinnon, 1978, A maximum likelihood procedure for regression with autocorrelated errors, Econometrica 46, 51-58.

Box, G.E.P. and G.M. Jenkins, 1976, Time series analysis, forecasting, and control, revised ed. (Holden-Day, San Francisco, CA).

Box, G.E.P. and David A. Pierce, 1970, Distribution of residual autocorrelations in autoregressive-integrated moving average time series models, Journal of the American Statistical Association 64, 1509-1525.

Chatfield, C., 1975, The analysis of time series: Theory and practice (Wiley, New York).

Cook, C.P., 1970, Trend and loss development factors, Proceedings of the Casualty Actuarial Society 57, 1-15.

Cummins, J. David and David J. Nye, 1981, Inflation and property liability insurance, in: John D. Long, ed., Issues in insurance II, 2nd ed. (American Institute for Property-Liability Underwriters, Malvern, PA).

Cummins, J. David and Alwyn Powell, 1980, The performance of alternative models for forecasting automobile insurance paid claim costs, ASTIN Bulletin 11, 91-106.

Goldberger, A.S., 1962, Best linear unbiased prediction in the generalized linear regression model, Journal of the American Statistical Association 57, 369-375.

Judge, George, et al., 1982, Introduction to the theory and practice of econometrics (Wiley, New York).

Kmenta, J., 1971, Elements of econometrics (Macmillan, New York).

Lehmann, E.L., 1975, Nonparametrics: Statistical methods based on ranks (Holden-Day, San Francisco, CA).

Lemaire, Jean, 1984, Claims provisions in liability insurance, Journal of Forecasting 1, 303-318.

Ljung, G.M. and G.E.P. Box, 1978, On a measure of lack of fit in time series models, Biometrika 65, 297-303.

Masterson, Norton E., 1968, Economic factors in liability and property insurance claim costs: 1935-1967, Proceedings of the Casualty Actuarial Society 55, 61-89.

Miller, R.B. and J.C. Hickman, 1973, Time series analysis and forecasting, Transactions of the Society of Actuaries 25, 267-302.

Nelson, C.R., 1973, Applied time series analysis for managerial forecasting (Holden-Day, San Francisco, CA).

Newbold, Paul, 1983, ARIMA model building and the time series analysis approach to forecasting, Journal of Forecasting 2, 23-35.

Webb, Bernard, et al., 1981, Insurance company operations, 2nd ed. (American Institute for Property-Liability Underwriters, Malvern, PA).
