Journal of Econometrics 45 (1990) 385-395. North-Holland

MALLOWS' C_p CRITERION AND UNBIASEDNESS OF MODEL SELECTION*

Masahito KOBAYASHI

Kyoto University, Kyoto, Japan

Shinichi SAKATA

Doshisha University, Kyoto, Japan

Model selection based on Mallows' C_p criterion is biased in the sense that the probability of selecting each of two linear models is not equal to 0.5 even if they have equal mean squared prediction errors. The authors propose a bias correction of the Mallows' C_p rule by means of the second-order asymptotic distribution of the difference of the Mallows' C_p statistics. The authors also show that the proposed correction is necessary only if the difference of the Mallows' C_p statistics is smaller than one in absolute value.

1. Introduction

Researchers in econometrics often have to make a selection from several nonnested linear models. A useful rule for this purpose is to choose the model that has the smallest value of Mallows' (1973) C_p statistic, which unbiasedly estimates the mean squared prediction error (MSPE), namely the expected value of the squared distance from the mean of the dependent variable to its predicted value. The model with the smallest MSPE is preferable, not because it is considered correct, but because it is considered better. In Cox's (1961) test, one model is tested against the other, assuming that one of the alternative nonnested linear models is true. In our framework the dependent variable need not be generated by either of the alternative models.

Some authors have suggested tests for the hypothesis that alternative nonnested models are equally preferable from the viewpoint of the goodness of the models. Hotelling (1940) proposed such a test in the simplest case, where each model has only one regressor variable. His test was generalized to multivariate regression analysis by Chow (1980) and Efron (1984) independently. Efron (1984) obtained a confidence interval for the difference of MSPE using the bootstrap method and proposed a test for the hypothesis that the alternative models are equal in terms of MSPE. Chow (1980) used a slightly different measure of the goodness of models, namely the squared distance from the mean of the dependent variable to its projection onto the subspace spanned by the regressor variables, and obtained the first-order asymptotic distribution of his test statistic under the null hypothesis.

*This research was supported by Grant 62730015 from the Ministry of Education, Japan. The authors are grateful to an anonymous referee for his suggestion of the simpler proof for the bounds of the critical value.

0304-4076/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)

Amemiya (1980) also employed MSPE and developed his prediction criterion under some assumptions on the regressor variables and the coefficients. His criterion resembles Akaike's final prediction error (1969), which is a prototype of Akaike's information criterion (1974). This famous criterion, though similar to C_p in form and asymptotically equivalent, employs the Kullback-Leibler information as the distance between models instead of MSPE.

Takeuchi (1976) and Sawa and Takeuchi (1977) suggested that Mallows' C_p criterion is biased in the sense that it chooses the simpler model with greater probability when comparing two nested linear models equal in MSPE. Nagata (1987) examined the bias of the C_p criterion in comparing three nested models. However, no analytical study of the bias of the C_p criterion in comparing nonnested linear models can be found in the literature.

We here propose an asymptotically unbiased criterion of model selection by comparing the difference of the Mallows' C_p statistics with the median of its second-order asymptotic distribution. We employ the framework of Efron (1984), and our results are closely related to his in that a criterion of model selection can be transformed into a test for the equality of MSPE, which he considered, by changing the critical value. The essential difference is that we employ the second-order asymptotic distribution in obtaining the critical value, instead of the bootstrap method.

Another important finding is that the asymptotic median of the distribution of our statistic is bounded between −1 and 1, and hence that Mallows' C_p criterion and our bias-corrected criterion select the same model when the Mallows' C_p statistics differ by more than 1 in absolute value. Correction is therefore unnecessary when the difference of C_p is greater than 1 in absolute value.

In the next section the model is described. The main results are given in section 3, and their algebraic details in section 5. For illustrative purposes a Monte Carlo experiment is reported in section 4.

2. Model and statistic

Let two nonnested sets of regressor variables be denoted by X_A and X_B, from which we have to make a selection. We assume that X_A and X_B have no common regressor variables. This assumption can be formally stated as

L(X_A) ∩ L(X_B) = {0},

where the subspace spanned by the regressor variables of X_A is denoted by L(X_A) and the origin by 0. The results given in this article are the same even if the models have common variables, because the difference of C_p, and that of MSPE, on which our results are based, remain the same. It is also assumed that the true model is a mixture of X_A and X_B, that is, the dependent variable y is generated by the following linear model:

y = X_A β_A + X_B β_B + e = η + e,    e ~ N_T(0, σ²I),    (1)

where X_A is a T × K_A matrix and X_B is a T × K_B matrix. The sets of regressor variables and the corresponding linear models are denoted by the same symbols X_A and X_B. We also assume that the cross-product matrix of the regressor variables divided by the sample size, T, approaches a certain matrix as T → ∞.

Let the projection matrix onto L(X_A) be denoted by P_A. The predicted value of y by the least squares method on X_A is then expressed as P_A y. The MSPE for the model X_A is defined by

MSPE_A = E[(η − P_A y)'(η − P_A y)].    (2)

This is the expected value of the squared distance from the mean of the dependent variable to the predicted value of y using the model X_A. It can be easily shown that

MSPE_A = η'(I − P_A)η + K_A σ²,    (3)

where K_A is the number of regressor variables in the model X_A. Mallows' C_p statistic for X_A is defined by

C,>=y’(I-P,,)y/s2+2KA-T, (4

where s² is an unbiased estimator of σ². Then s²C_p is an unbiased estimator of the MSPE. In Mallows' C_p criterion the model with the smallest C_p is chosen, because it is supposed that this model is most likely to have the smallest MSPE.

We here use as the estimated variance s² the sum of squared residuals obtained by regressing y upon L([X_A : X_B]), divided by T − K, where K is the dimension of L([X_A : X_B]). Let the projection matrix onto L([X_A : X_B]) be denoted by P_AB. Then we have s² = y'(I − P_AB)y/(T − K), which is distributed independently of P_A y, P_B y, and P_AB y.
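In code, the statistic (4) can be computed directly from the two projection matrices; the following is a minimal sketch assuming NumPy (the function names and the use of a pseudoinverse are our illustrative choices, not part of the paper):

```python
import numpy as np

def projection(X):
    """Projection matrix onto the column space of X: P = X (X'X)^+ X'."""
    return X @ np.linalg.pinv(X)

def mallows_cp(y, X_model, X_full):
    """Mallows' C_p of eq. (4): y'(I - P_A)y / s^2 + 2 K_A - T,
    with s^2 = y'(I - P_AB)y / (T - K) taken from the encompassing
    regression on X_full = [X_A : X_B]."""
    T = len(y)
    I = np.eye(T)
    K = np.linalg.matrix_rank(X_full)
    s2 = y @ (I - projection(X_full)) @ y / (T - K)  # unbiased estimate of sigma^2
    K_A = np.linalg.matrix_rank(X_model)
    return y @ (I - projection(X_model)) @ y / s2 + 2 * K_A - T
```

As a sanity check, the encompassing model itself has C_p = (T − K) + 2K − T = K exactly.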

The R² criterion is equivalent to employing the first term of C_p, namely y'(I − P_A)y/s², which always decreases as a new variable is added to X_A. The addition of variables, however, increases the last term of C_p, 2K_A − T, which can be regarded as a check on the unnecessary inclusion of regressor variables due to an uncritical use of the R² criterion.

3. Main results

Let Mallows' C_p statistic for the model X_A be denoted by C_pA and that for X_B by C_pB. The following criterion of variable selection,

choose X_A if C_pA − C_pB < γ̂,

choose X_B if C_pA − C_pB > γ̂,    (5)

with the critical value

γ̂ = y'(P_A − P_B)³y / y'(P_A − P_B)²y,    (6)

is unbiased to order T^{-1/2}, that is, this criterion satisfies the inequalities

Pr(X_A is chosen | MSPE_A < MSPE_B) > 0.5 + o(T^{-1/2}),

Pr(X_B is chosen | MSPE_A > MSPE_B) > 0.5 + o(T^{-1/2}),    (7)

where x = o(T^{-1/2}) means that T^{1/2}x → 0 as T → ∞. These inequalities follow directly from the definition of unbiasedness, that two models with equal MSPE should be chosen with equal probability.

We also see that Mallows' C_p criterion chooses the simpler model with greater probability from two nested models equal in MSPE, because Mallows' C_p criterion compares the difference of C_p with 0 instead of γ̂, and the bias-corrected criterion has γ̂ = 1 when the model X_B is nested in the model X_A. This result was first found by Takeuchi (1976) using a numerical approximation.

We can also show that the asymptotic median γ̂ has an upper bound 1 and a lower bound −1, namely

−1 ≤ γ̂ ≤ 1.    (8)

This implies that Mallows' C_p criterion and the bias-corrected criterion choose the same model when C_pA and C_pB differ by more than 1 in absolute value. In this case we can compare the alternative models unbiasedly, without calculating the critical value γ̂, by referring only to Mallows' original C_p statistic. On the other hand, if the difference of the Mallows' C_p statistics is less than 1 in absolute value, the conclusion of the original C_p criterion may be reversed by the bias-corrected criterion. The use of the critical value γ̂ is recommended in this borderline case.
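The rule (5)-(6), together with the shortcut justified by (8), can be sketched as follows, assuming NumPy (function and variable names are illustrative, not the paper's):

```python
import numpy as np

def projection(X):
    """Projection matrix onto the column space of X."""
    return X @ np.linalg.pinv(X)

def select_model(cp_a, cp_b, y, XA, XB):
    """Bias-corrected selection between nonnested models X_A and X_B.
    Compares C_pA - C_pB with the estimated median
    gamma_hat = -y' Omega^3 y / y' Omega^2 y,  Omega = P_B - P_A,
    skipping the correction when |C_pA - C_pB| >= 1, by bound (8)."""
    diff = cp_a - cp_b
    if abs(diff) >= 1:
        # gamma_hat lies in [-1, 1], so the correction cannot reverse this choice.
        return "A" if diff < 0 else "B"
    Omega = projection(XB) - projection(XA)
    O2 = Omega @ Omega
    gamma_hat = -(y @ Omega @ O2 @ y) / (y @ O2 @ y)
    return "A" if diff < gamma_hat else "B"
```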


Table 1

Empirical probability of selecting the model X_B under the condition MSPE_A = MSPE_B. The regressor variables X_A1 and X_B1 were orthogonal to X_A2, and X_A1 and X_A2 were orthogonal to each other. The length of each regressor variable and the variance of the disturbance term were standardized to be 1. The first column gives sin α, where α is the angle between X_A1 and X_B1.

sin α    β_A1    β_A2    β_B1    Pr(C_pA − C_pB > 0)    Pr(C_pA − C_pB > γ̂)    Case
0.1       1.0     1.0     1.0    0.368                  0.527                  (A.1)
0.1      10.0     0.1     1.0    0.412                  0.519                  (A.2)
0.1       1.0     1.41   10.0    0.438                  0.502                  (A.3)
0.1      10.0     1.0    10.0    0.484                  0.504                  (A.4)
0.9       1.0     1.0     1.0    0.463                  0.523                  (B.1)
0.9       1.0     9.01   10.0    0.493                  0.495                  (B.2)
0.9      10.0     1.0    10.0    0.506                  0.506                  (B.3)

[Figure 1 (a histogram of the difference of C_p; vertical axis: frequency, 0 to 500; horizontal axis: difference of C_p, −10 to 30) omitted.]

Fig. 1. Distribution of C_pA − C_pB for case (A.1) in table 1.

4. Sampling experiment

We made a small sampling experiment in order to examine the actual properties of Mallows' C_p criterion and our unbiased criterion under the condition MSPE_A = MSPE_B; the results are summarized in table 1. Fig. 1 graphically illustrates the empirical distribution of C_pA − C_pB in a case where Mallows' C_p criterion was highly biased, and fig. 2 illustrates an almost unbiased case.

[Figure 2 (a histogram of the difference of C_p) omitted.]

Fig. 2. Distribution of C_pA − C_pB for case (B.3) in table 1.

The model X_A had two regressor variables orthogonal to each other, say X_A1 and X_A2, with regression coefficients β_A1 and β_A2, and the model X_B had a single regressor variable, say X_B1, with regression coefficient β_B1. The variable X_B1 is orthogonal to X_A2 and makes an angle α with X_A1. A small value of α then implies that X_B was nearly nested in X_A. The length of each regressor variable was standardized to be 1. The standard deviation of the disturbance term, σ, was set at 1 without loss of generality, because the asymptotic distribution of C_p depends only upon the ratio of the regression coefficients to σ. The residual degrees of freedom employed in estimating σ² were 17, and the sample size was 20. The number of replications was 10,000, and hence the standard error of the estimated probability was 0.005.

In table 1, the use of the estimated median γ̂ lessened the bias of Mallows' C_p criterion, though in some cases the gain was marginal. The merit of the bias correction is substantial when Mallows' C_p criterion was highly biased.
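The bias itself is easy to reproduce in the nested special case, where equality of MSPE in (3) pins down the coefficients exactly. The following is a minimal sketch of such an experiment assuming NumPy; the design (T = 20, 17 residual degrees of freedom, β_2 = σ = 1) is our own and is not one of the cases in table 1:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_rep = 20, 2000

# Orthonormal columns: X_B = {x1} is nested in X_A = {x1, x2};
# x3 pads the encompassing space so s^2 has T - 3 = 17 residual df.
Q, _ = np.linalg.qr(rng.standard_normal((T, 3)))
x1, x2, x3 = Q[:, 0], Q[:, 1], Q[:, 2]
P_A = np.outer(x1, x1) + np.outer(x2, x2)
P_B = np.outer(x1, x1)
P_full = P_A + np.outer(x3, x3)
I = np.eye(T)

# With unit-length x2, beta_2 = 1 and sigma = 1 give MSPE_A = MSPE_B by eq. (3).
eta = x1 + x2

chose_simpler = 0
for _ in range(n_rep):
    y = eta + rng.standard_normal(T)
    s2 = y @ (I - P_full) @ y / (T - 3)
    cp_a = y @ (I - P_A) @ y / s2 + 2 * 2 - T
    cp_b = y @ (I - P_B) @ y / s2 + 2 * 1 - T
    chose_simpler += (cp_a - cp_b > 0)   # C_pB smaller: the simpler model wins

print(chose_simpler / n_rep)  # noticeably above 0.5: the C_p rule is biased
```

An unbiased rule would pick each model half the time here; the uncorrected C_p rule picks the simpler model well over half the time.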

5. Algebraic details

We have only to show that γ̂ satisfies the condition

Pr(C_pA − C_pB < γ̂ | MSPE_A = MSPE_B) = 0.5 + o(T^{-1/2}),    (9)

in order to show that γ̂ gives the unbiased criterion defined in (7).


We first consider the distribution function of

S = s²(C_pA − C_pB) / [4σ² η'(P_B − P_A)²η]^{1/2},    (10)

under the condition MSPE_A = MSPE_B, noting that 4σ²η'(P_B − P_A)²η is the first-order asymptotic variance of the numerator s²(C_pA − C_pB). The distribution has to be given asymptotically up to the second order, namely up to order T^{-1/2}, because the last term (K_A − K_B)σ² in the hypothesis 0 = MSPE_A − MSPE_B = η'(P_B − P_A)η + (K_A − K_B)σ² is meaningless in the first-order asymptotic distribution of S.

First, express the numerator of S as

Y’QY + 2( K, - &I) s2 = 11’0~ + 2e’S)n + e’fie + 2( KA - K,)a2

+ 2( K, - K,)(s2 - a2),

where D denotes PR - PA. Note that under the hypothesis MSPE, = MSPE, the first term on the right-hand side of (11) is reduced to (Ks - KA)u2. The second term is of order T112, the third and the fourth term are of constant order, and the fifth term is of order Td112, which is negligible in our article. Then we have

S = S_0 + S_1 + o_p(T^{-1/2}),    (12)

where

S_0 = σ^{-1}G^{-1/2} e'Ωη,

S_1 = (2σ)^{-1}G^{-1/2}[e'Ωe + (K_A − K_B)σ²],

letting G denote η'Ω²η and o_p(T^{-1/2}) denote a term x such that xT^{1/2} stochastically converges to 0 as T → ∞. Note that S_0 is of constant order and S_1 is of order T^{-1/2}.

We now give the moment-generating function (MGF) of S asymptotically up to order T^{-1/2} and then invert it to obtain the probability density function (PDF) of S, using the one-to-one correspondence between the MGF and the PDF.

Let φ(θ) denote the MGF of S, namely E[exp(θS)]. Then we have

φ(θ) = E[exp(θS)] = E[exp(θS_0)(1 + θS_1)] + o(T^{-1/2}).    (13)


The first term of φ(θ), say φ_0(θ), is

φ_0(θ) = E[exp(θS_0)] = exp(θ²/2),    (14)

because S_0 follows the standardized normal distribution. The term of order T^{-1/2} in φ(θ), say φ_1(θ), is then expressed as

φ_1(θ) = E[exp(θS_0)θS_1]

    = (2σ)^{-1}G^{-1/2} E[exp(θS_0) θ{e'Ωe + (K_A − K_B)σ²}].    (15)

The first term of φ_1(θ), say φ_11(θ), is

φ_11(θ) = (2σ)^{-1}G^{-1/2} E[exp(θS_0) θ e'Ωe]

    = (σ/2)G^{-1/2} θ φ_0(θ)(2π)^{-T/2} ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} exp(−(z − μ)'(z − μ)/2) z'Ωz dz_1 ··· dz_T,    (16)

where

μ = θG^{-1/2}Ωη and z = σ^{-1}e.

Regarding z in this integrand as a vector of normal variables with mean vector μ and unit variance matrix, we have

φ_11(θ) = (σ/2)G^{-1/2} θ φ_0(θ)[trace(Ω) + μ'Ωμ]

    = (σ/2)G^{-1/2} θ φ_0(θ)(K_B − K_A + θ²G^{-1}η'Ω³η),    (17)

since we have

trace(Ω) = trace(P_B − P_A) = trace(P_B) − trace(P_A) = K_B − K_A,    μ'Ωμ = θ²G^{-1}η'Ω³η.

The second term of φ_1(θ), say φ_12(θ), is simply

φ_12(θ) = (σ/2)G^{-1/2} θ φ_0(θ)(K_A − K_B).    (18)

This term cancels the first term of φ_11(θ) in (17).


Thus we have obtained the following asymptotic MGF for S under the assumption MSPE_A = MSPE_B:

φ(θ) = exp(θ²/2)[1 + (σ/2)θ³G^{-3/2}η'Ω³η] + o(T^{-1/2}).    (19)

Note that θ³ in the MGF corresponds to (−d/dx)³ in the PDF, because we have

∫_{-∞}^{∞} exp(θx)(−d/dx)g(x) dx = −[exp(θx)g(x)]_{-∞}^{∞} + θ ∫_{-∞}^{∞} g(x)exp(θx) dx,

where g(x) denotes a PDF, by means of integration by parts, and the first term on the right-hand side disappears. Then, using the one-to-one correspondence between the MGF and the PDF, we have the following asymptotic PDF for S under the assumption MSPE_A = MSPE_B:

f(x) + (σ/2)G^{-3/2}η'Ω³η (−d/dx)³ f(x),    (20)

where f(x) is the PDF of the standardized normal distribution, which corresponds to the MGF φ_0(θ). Then the cumulative distribution function (CDF) of S is

Pr(S < x) = F(x) − (σ/2)G^{-3/2}η'Ω³η (−d/dx)² f(x) + o(T^{-1/2})

    = F(x) − (σ/2)G^{-3/2}η'Ω³η f(x)(x² − 1) + o(T^{-1/2}),    (21)

where F(x) is the CDF of the standardized normal distribution.

Define the critical value x_0 that gives an unbiased criterion of variable selection by

Pr(S < x_0 | MSPE_A = MSPE_B) = 0.5.

We then have

F(x_0) − (σ/2)G^{-3/2}η'Ω³η f(x_0)(x_0² − 1) + o(T^{-1/2}) = 0.5.

Expanding this CDF into a Taylor series about 0 and equating terms of order T^{-1/2}, we have

f(0)x_0 + (σ/2)G^{-3/2}η'Ω³η f(0) = 0 + o(T^{-1/2}),


since F(0) = 0.5. Then, dividing both sides by f(0), we have

x_0 = −(σ/2)G^{-3/2}η'Ω³η + o(T^{-1/2}).

In terms of the original statistics C_pA and C_pB we have

Pr(C_pA − C_pB < −(σ²/s²)η'Ω³η/η'Ω²η | MSPE_A = MSPE_B) = 0.5 + o(T^{-1/2}).

Noting that replacing σ²/s² by 1 leaves this equality unchanged up to the required order, we have

Pr(C_pA − C_pB < −η'Ω³η/η'Ω²η | MSPE_A = MSPE_B) = 0.5 + o(T^{-1/2}).    (23)

Thus we have obtained the critical value γ = −η'Ω³η/η'Ω²η for the bias-corrected criterion of variable selection. By definition, this critical value is the median of the second-order asymptotic distribution of C_pA − C_pB under the hypothesis MSPE_A = MSPE_B.

The estimator γ̂ = −y'Ω³y/y'Ω²y of γ also satisfies the condition of asymptotic unbiasedness

Pr(C_pA − C_pB < γ̂ | MSPE_A = MSPE_B) = 0.5 + o(T^{-1/2}),    (24)

because the difference between γ̂ and γ is only stochastically of order T^{-1/2}.

Thus we have obtained the critical value of the asymptotically unbiased criterion given in section 3.
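The step from (20) to (21) rests on the Gaussian identity (−d/dx)²f(x) = f(x)(x² − 1) for the standard normal density, which can be spot-checked numerically; this check is ours and assumes NumPy:

```python
import numpy as np

def f(x):
    """Standard normal PDF."""
    return np.exp(-x * x / 2.0) / np.sqrt(2.0 * np.pi)

x = np.linspace(-3.0, 3.0, 13)
h = 1e-5
# Central finite difference for the second derivative; (-d/dx)^2 = (d/dx)^2.
f2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2
assert np.allclose(f2, f(x) * (x**2 - 1.0), atol=1e-5)
```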

Next, we show that the critical value of our criterion has a lower bound −1 and an upper bound 1. Let the nonzero eigenvalues of Ω = P_B − P_A be denoted by λ_1 ≤ λ_2 ≤ ··· ≤ λ_K, where K = K_A + K_B is the rank of Ω. Note that −λ_1 and −λ_K are an upper bound and a lower bound on the estimated median, that is,

−λ_K ≤ γ̂ = −y'Ω³y / y'Ω²y ≤ −λ_1.

If we have

−1 ≤ λ_1 ≤ ··· ≤ λ_K ≤ 1,

then 1 and −1 are also bounds of γ̂. This inequality is


shown as follows: we have −1 ≤ λ_1 from

x'(P_B − P_A)x ≥ −x'P_A x ≥ −x'x for any x,

and λ_K ≤ 1 from

x'(P_B − P_A)x ≤ x'P_B x ≤ x'x for any x.

References

Akaike, H., 1969, Fitting autoregressive models for prediction, Annals of the Institute of Statistical Mathematics 21, 243-247.

Akaike, H., 1974, A new look at the statistical model identification, IEEE Transactions on Automatic Control 19, 716-723.

Amemiya, T., 1980, Selection of regressors, International Economic Review 21, 331-354.

Chow, G.C., 1980, The selection of variates for use in prediction: A generalization of Hotelling's solution, in: L.R. Klein, M. Nerlove, and S.C. Tsiang, eds., Quantitative econometrics and development (Academic Press, New York, NY) 105-114.

Cox, D.R., 1961, Tests of separate families of hypotheses, in: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, Vol. 1 (University of California Press, Berkeley, CA) 105-123.

Efron, B., 1984, Comparing non-nested linear models, Journal of the American Statistical Association 79, 791-803.

Hotelling, H., 1940, The selection of variates for use in prediction with some comments on the general problem of nuisance parameters, Annals of Mathematical Statistics 11, 271-283.

Mallows, C.L., 1973, Some comments on C_p, Technometrics 15, 661-675.

Nagata, Y., 1987, Unbiased variable selection procedure in regression model, Journal of the Japan Statistical Society 17, 175-184.

Sawa, T. and K. Takeuchi, 1977, Unbiased decision rule for the choice of regression models, Working paper 400 (University of Illinois, Urbana-Champaign, IL).

Takeuchi, K., 1976, On sampling distribution of criteria for selection of independent variables related with C_p-statistic, in: Proceedings of the 9th international biometric conference, Vol. 1 (Biometric Society, Washington, DC) 24-36.