

A robust and fully efficient regression estimator

Daniel Gervini

A. J. Carranza 1837, dto. 10

(1414) Capital Federal

República Argentina

1 Introduction

In this paper we address the problem of point estimation in the linear regression model. We are given a random sample $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i$ is a vector of $p$ explanatory variables and $y_i$ is the response variable. They are linked by the linear relationship

$$y_i = x_i'\theta + u_i \tag{1}$$

where $\theta \in \mathbb{R}^p$ is the parameter to be estimated. The error terms $\{u_i\}$ are i.i.d. unobservable random variables with distribution $F_0(\cdot/\sigma)$. Without loss of generality we will assume that the scale of $F_0(u)$ is 1. In particular, the variance of $F_0$ is 1 if it exists. The unknown scale $\sigma > 0$ plays the role of a nuisance parameter. The distribution of $|u_i|$ will be denoted $F_0^+(\cdot/\sigma)$. The set of explanatory variables is assumed to be deterministic, but if $\{x_i\}$ is a random sample stochastically independent of the errors $\{u_i\}$ the results in Section 4 still hold.

Throughout the paper we will assume that:

A1. $F_0$ is continuous, strictly increasing and symmetric about zero.

A2. If $X$ is the design matrix with rows $x_1', \ldots, x_n'$, then $C_n = (X'X)/n$ is positive-definite and the sequence of smallest eigenvalues of $C_n$ is bounded away from zero.

It is well known that in this model the least squares estimator (LSE) of $\theta$ is the maximum likelihood estimator when $F_0$ is the standard normal distribution $\Phi$, and then it attains the minimum asymptotic variance. However, the LSE is extremely sensitive to atypical data. A single observation placed far enough from the bulk of the data can carry the LSE arbitrarily far from $\theta$, no matter how big the sample size is. Since no data set in real life fits a theoretical model exactly, the lack of stability of the LSE is a serious problem in statistical applications. Several estimators that show some stability in variance and bias under small departures from model (1) have been proposed in the last thirty years. However, some loss in efficiency has been the price of obtaining this stability.
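The sensitivity of the LSE to a single observation can be illustrated numerically. The following is only a minimal sketch and is not taken from the paper: it assumes a simple regression with an intercept ($p = 2$), made-up noiseless data, and a hypothetical helper `least_squares_line`. It shifts one response further and further from the line and records the fitted slope.

```python
def least_squares_line(x, y):
    """Closed-form LSE for the line y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Illustrative data (an assumption): ten points exactly on y = 1 + 2x.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
clean = least_squares_line(x, y)

# Move only the last response away from the line by increasing amounts.
slopes = []
for shift in (1e2, 1e4, 1e6):
    y_bad = y[:]
    y_bad[-1] += shift
    slopes.append(least_squares_line(x, y_bad)[1])
```

In this sketch the fitted slope grows without bound as the single shifted response moves away, although the other nine points still lie exactly on the line.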

The least median of squares estimator (LMSE), proposed by Rousseeuw (1984), was the first equivariant regression estimator that attained the maximum breakdown point of 1/2. But this estimator has an important drawback: its rate of convergence is $n^{-1/3}$ (see Davies, 1990) and hence its relative efficiency with respect to the LSE is 0. In order to get a more efficient estimator, Rousseeuw and Leroy (1987) suggested computing a weighted LSE (WLSE), skipping those observations with LMSE absolute standardized residuals greater than some fixed cut-off value. However, He and Portnoy (1992) showed that the rate of convergence of this WLSE is still $n^{-1/3}$, even though the weighting step does improve on the asymptotic variance.

The first broad class of high breakdown regression estimators that are asymptotically normal at rate $n^{-1/2}$ was given by S-estimators, proposed by Rousseeuw and Yohai (1984). Unfortunately, S-estimators cannot achieve high breakdown point and high efficiency simultaneously. Regression estimators that reach a nearly optimal efficiency and maximum breakdown point at the same time are the MM-estimators (Yohai, 1987) and the $\tau$-estimators (Yohai and Zamar, 1988). Notice, however, that those estimators never achieve the maximum asymptotic efficiency.

In this paper I introduce an estimator that attains the maximum breakdown point and is fully efficient at the same time. It is a WLSE that resembles Rousseeuw and Leroy's proposal, but the cut-off value in the weighting step is calculated from the data instead of being fixed.

The new estimator is defined in Section 2. Sections 3 and 4 analyze its robust and asymptotic behavior, respectively. Finally, Section 5 focuses on the special case of the LMS as the initial estimator. The proofs of the results in this paper, together with some additional details, are given in Gervini (1998).


2 The REWLS estimator

Given initial robust estimators $T_{0n}$ of $\theta$ and $s_n$ of $\sigma$, let us consider the standardized residuals

$$r_i = \frac{y_i - x_i' T_{0n}}{s_n}. \tag{2}$$

If an observation $(x_i, y_i)$ has a "large" $|r_i|$, we can suspect that it is an outlier and then it should be downweighted or eliminated. If we assumed that $F_0 = \Phi$, it would seem reasonable to consider outliers those points with $|r_i| \ge 2$, say. So we could define the weights

$$w_i = \begin{cases} 1 & \text{if } |r_i| < 2, \\ 0 & \text{if } |r_i| \ge 2, \end{cases}$$

and the WLSE would be $T_{1n} = (X'WX)^{-1} X'WY$, where $W = \mathrm{diag}(w_1, \ldots, w_n)$ and $Y = (y_1, \ldots, y_n)'$.
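The one-step weighting just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's implementation: it treats the simple regression case with an intercept ($p = 2$), where the matrix formula reduces to weighted means and sums; the initial estimate and scale are supplied by hand (hypothetical values standing in for a robust fit such as the LMSE); and the function names and data are made up for the example.

```python
def hard_rejection_weights(x, y, intercept0, slope0, scale0, cutoff=2.0):
    """Weight 1 if the absolute standardized residual is below the cut-off, else 0."""
    weights = []
    for xi, yi in zip(x, y):
        r = (yi - (intercept0 + slope0 * xi)) / scale0  # standardized residual
        weights.append(1.0 if abs(r) < cutoff else 0.0)
    return weights


def weighted_least_squares_line(x, y, w):
    """Weighted LSE for y = a + b*x; equals (X'WX)^{-1} X'WY with X = [1, x]."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Illustrative data (an assumption): y = 1 + 2x plus small noise, one gross outlier.
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1, -0.1, 0.0]
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y[-1] += 50.0

# Hypothetical initial robust estimate (1, 2) and scale 1, chosen by hand.
weights = hard_rejection_weights(x, y, 1.0, 2.0, 1.0)
intercept1, slope1 = weighted_least_squares_line(x, y, weights)
```

With these made-up inputs the outlying point receives weight 0 and the weighted fit is computed from the remaining nine observations.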

Even though this weighting step improves the efficiency of the initial estimator, it is clear that $T_{1n}$ cannot achieve the maximum efficiency either. For even when the observations adjust perfectly to the target model there is a small probability that the standardized absolute residuals exceed any fixed cut-off value. Therefore, some good data will always be discarded in large samples. In particular, when $T_{0n}$ is the LMSE the corresponding WLSE remains consistent at rate $n^{-1/3}$ (see He and Portnoy, 1992). So this one-step estimator is still inefficient for large samples, even though in practice it can be much more efficient than the original estimator for small or moderate sample sizes.
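To put a number on the probability of discarding good data, the sketch below computes $P(|Z| \ge c)$ for a standard normal $Z$. It is a rough illustration only: it assumes $F_0 = \Phi$ and treats the standardized residuals as if they were the true errors, ignoring the estimation error in $T_{0n}$ and $s_n$. The function name is hypothetical.

```python
import math


def prob_abs_normal_exceeds(cutoff):
    """P(|Z| >= cutoff) for Z standard normal, via the complementary error function."""
    return math.erfc(cutoff / math.sqrt(2.0))


# Probability that a good observation exceeds the fixed cut-off 2.
p_discard = prob_abs_normal_exceeds(2.0)

# Expected number of good observations discarded for several sample sizes.
expected_discards = {n: n * p_discard for n in (100, 10_000, 1_000_000)}
```

Under these assumptions the cut-off 2 discards about 4.55% of good observations on average, and the expected number discarded grows linearly with $n$. A larger fixed cut-off lowers this proportion but never makes it zero.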

I propose a similar weighting scheme, where the cut-off value is computed from the set of residuals instead of being constant. Let us consider again the standardized residuals $r_i$ as in (2) and the empirical distribution function of their absolute values,

$$F_n^+(t) = \frac{1}{n} \sum_{i=1}^{n} I(|r_i| \le t).$$
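The empirical distribution function of the absolute standardized residuals is simple to compute. The sketch below is one possible way to do it, with a hypothetical function name and made-up residuals: it sorts the absolute values once and then counts, by binary search, how many are less than or equal to a given $t$.

```python
import bisect


def abs_residual_edf(residuals):
    """Return a function t -> proportion of |r_i| that are <= t."""
    abs_sorted = sorted(abs(r) for r in residuals)
    n = len(abs_sorted)

    def edf(t):
        # bisect_right counts the sorted absolute values that are <= t.
        return bisect.bisect_right(abs_sorted, t) / n

    return edf


# Illustrative standardized residuals (an assumption).
edf = abs_residual_edf([-1.0, 0.5, 2.0, -3.0])
values = [edf(0.0), edf(1.0), edf(2.5), edf(3.0)]  # [0.0, 0.5, 0.75, 1.0]
```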

When the data follows the central model (1), $F_n^+(t)$ converges uniformly to $F_0^+(t)$ as $n \to \infty$. Therefore, we could compare $F_n^+(t)$ with $F_0^+(t)$. If $F_n^+(t) < F_0^+(t)$, the sample proportion of the absolute residuals that exceed $t$ is bigger than the theoretical proportion, and hence there might be outliers in the sample. The proportion of outliers in the sample could then be measured by $\sup_{t \ge 0} \{\max\{F_0^+(t) - F_n^+(t), 0\}\}$. However, in practice the actual distribution of the errors is never known, so we have to use a hypothetical
