A robust and fully efficient regression estimator
Daniel Gervini
A. J. Carranza 1837, dep. 10
(1414) Capital Federal
República Argentina
1 Introduction
In this paper we address the problem of point estimation in the linear
regression model. We are given a random sample $(x_1, y_1), \ldots, (x_n, y_n)$,
where $x_i$ is a vector of $p$ explanatory variables and $y_i$ is the response
variable. They are linked by the linear relationship
$$y_i = x_i'\theta + u_i \tag{1}$$
where $\theta \in \mathbb{R}^p$ is the parameter to be estimated. The error
terms $\{u_i\}$ are i.i.d. unobservable random variables with distribution
$F_0(\cdot/\sigma)$. Without loss of generality we will assume that the scale
of $F_0$ is 1. In particular, the variance of $F_0$ is 1 if it exists. The
unknown scale $\sigma > 0$ plays the role of a nuisance parameter. The
distribution of $|u_i|$ will be denoted $F_0^+(\cdot/\sigma)$. The set of
explanatory variables is assumed to be deterministic, but if $\{x_i\}$ is a
random sample stochastically independent of the errors $\{u_i\}$ the results in
Section 4 still hold.
Throughout the paper we will assume that,
A1. $F_0$ is continuous, strictly increasing and symmetric about zero.
A2. If $X$ is the design matrix with rows $x_1', \ldots, x_n'$, then
$C_n = (X'X)/n$ is positive-definite and the sequence of smallest
eigenvalues of $C_n$ is bounded away from zero.
It is well known that in this model the least squares estimator (LSE) of
$\theta$ is the maximum likelihood estimator when $F_0$ is the standard normal
distribution $\Phi$, and then it attains the minimum asymptotic variance.
However, the LSE is extremely sensitive to atypical data. A single observation
placed far enough from the bulk of the data can carry the LSE arbitrarily far
from $\theta$, no matter how big the sample size is. Since no data set in real
life fits a theoretical model exactly, the lack of stability of the LSE is a
serious problem in statistical applications. Several estimators that show some
stability in variance and bias under small departures from model (1) have been
proposed in the last thirty years. However, some loss in efficiency has been
the price of obtaining this stability.
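The sensitivity of the LSE to a single observation can be illustrated with a
minimal sketch. It assumes a simple regression (intercept and slope) with
simulated standard normal errors; the sample size, the true parameter (1, 2),
the position of the outlier and the helper names are all illustrative
assumptions, not part of the paper.

```python
import random

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Simulated sample from model (1) with theta = (1, 2) and sigma = 1 (assumed).
rng = random.Random(0)
n = 1000
xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

clean_fit = least_squares_line(xs, ys)

def fit_with_outlier(magnitude):
    """Replace the last observation by the single point (10, magnitude)."""
    return least_squares_line(xs[:-1] + [10.0], ys[:-1] + [magnitude])

# The slope follows the outlier without bound, even with n = 1000.
slopes = [fit_with_outlier(m)[1] for m in (1e2, 1e4, 1e6)]
```

Only one of the 1000 observations is altered, yet the fitted slope grows
roughly in proportion to the magnitude of the outlying response.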
The least median of squares estimator (LMSE), proposed by Rousseeuw
(1984), was the first equivariant regression estimator that attained the
maximum breakdown point of 1/2. But this estimator has an important drawback:
its rate of convergence is $n^{-1/3}$ (see Davies, 1990) and hence its relative
efficiency with respect to the LSE is 0. In order to get a more efficient
estimator, Rousseeuw and Leroy (1987) suggested computing a weighted
LSE (WLSE), skipping those observations with LMSE absolute standardized
residuals greater than some fixed cut-off value. However, He and Portnoy
(1992) showed that the rate of convergence of this WLSE is still $n^{-1/3}$,
even though the weighting step does improve on the asymptotic variance.
The first broad class of high breakdown regression estimators that are
asymptotically normal at rate $n^{-1/2}$ was given by S-estimators, proposed by
Rousseeuw and Yohai (1984). Unfortunately, S-estimators cannot achieve
high breakdown point and high efficiency simultaneously. Regression
estimators that reach a nearly optimal efficiency and maximum breakdown
point at the same time are the MM-estimators (Yohai, 1987) and the
$\tau$-estimators (Yohai and Zamar, 1988). Notice, however, that those
estimators never achieve the maximum asymptotic efficiency.
In this paper I introduce an estimator that attains the maximum breakdown
point and is fully efficient at the same time. It is a WLSE that resembles
Rousseeuw and Leroy's proposal, but the cut-off value in the weighting step
is calculated from the data instead of being fixed.
The new estimator is defined in Section 2. Sections 3 and 4 analyze its
robust and asymptotic behavior, respectively. Finally, Section 5 focuses on
the special case of the LMS as the initial estimator. The proofs of the results
in this paper, together with some additional details, are given in Gervini
(1998).
2 The REWLS estimator
Given initial robust estimators $T_{0n}$ of $\theta$ and $s_n$ of $\sigma$, let
us consider the standardized residuals
$$r_i = \frac{y_i - x_i'T_{0n}}{s_n}. \tag{2}$$
If an observation $(x_i, y_i)$ has a "large" $|r_i|$, we can suspect that it is
an outlier and then it should be downweighted or eliminated. If we assumed
that $F_0 = \Phi$, it would seem reasonable to consider outliers those points
with $|r_i| \geq 2$, say. So we could define the weights
$$w_i = \begin{cases} 1 & \text{if } |r_i| < 2, \\ 0 & \text{if } |r_i| \geq 2, \end{cases}$$
and the WLSE would be $T_{1n} = (X'WX)^{-1}X'WY$, where
$W = \mathrm{diag}(w_1, \ldots, w_n)$ and $Y = (y_1, \ldots, y_n)'$.
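The one-step weighting scheme just described can be sketched as follows for a
simple regression. This is an illustration under stated assumptions, not the
paper's procedure: the initial estimator is a crude stand-in for the LMSE
(the best line through two data points in terms of median squared residual),
the scale is taken as the normalized median absolute residual, and all
function names and simulated data are hypothetical.

```python
import random
from statistics import median

def fit_line(xs, ys, ws=None):
    """Weighted least squares for y = a + b*x; returns (a, b).
    With design rows (1, x_i) this equals (X'WX)^{-1} X'WY."""
    if ws is None:
        ws = [1.0] * len(xs)
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def crude_initial_fit(xs, ys):
    """Stand-in for T_0n (assumption): among lines through two data points,
    pick the one with the smallest median squared residual."""
    best, best_crit = None, float("inf")
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                continue
            b = (ys[j] - ys[i]) / (xs[j] - xs[i])
            a = ys[i] - b * xs[i]
            crit = median((y - a - b * x) ** 2 for x, y in zip(xs, ys))
            if crit < best_crit:
                best, best_crit = (a, b), crit
    return best

def hard_rejection_wlse(xs, ys, cutoff=2.0):
    """One-step WLSE with weights w_i = 1 if |r_i| < cutoff, else 0."""
    a0, b0 = crude_initial_fit(xs, ys)
    resid = [y - a0 - b0 * x for x, y in zip(xs, ys)]
    # Assumed scale s_n: normalized median absolute residual.
    s_n = 1.4826 * median(abs(e) for e in resid)
    weights = [1.0 if abs(e / s_n) < cutoff else 0.0 for e in resid]
    return fit_line(xs, ys, weights), weights

# 50 points from y = 1 + 2x + u plus 10 gross outliers (simulated, assumed).
rng = random.Random(1)
xs = [rng.uniform(0.0, 10.0) for _ in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
xs += [rng.uniform(9.0, 10.0) for _ in range(10)]
ys += [-20.0 + rng.gauss(0.0, 1.0) for _ in range(10)]

ls_fit = fit_line(xs, ys)                       # plain LSE, pulled by outliers
wls_fit, weights = hard_rejection_wlse(xs, ys)  # hard-rejection WLSE
```

In this setting the plain LSE slope is dragged far from 2, while the weighted
fit discards the ten contaminated points and recovers a slope close to 2.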
Even though this weighting step improves the efficiency of the initial
estimator, it is clear that $T_{1n}$ cannot achieve the maximum efficiency
either. For even when the observations adjust perfectly to the target model
there is a small probability that the standardized absolute residuals exceed
any fixed cut-off value. Therefore, some good data will always be discarded in
large samples. In particular, when $T_{0n}$ is the LMSE the corresponding WLSE
remains consistent at rate $n^{-1/3}$ (see He and Portnoy, 1992). So this
one-step estimator is still inefficient for large samples, even though in
practice it can be much more efficient than the original estimator for small
or moderate sample sizes.
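To get a feeling for how much good data a fixed cut-off discards, the
following sketch computes the exceedance probability under the idealized
assumption that the standardized residuals are exactly standard normal (in
practice they depend on the estimates $T_{0n}$ and $s_n$). The function name
and the sample size of 10,000 are illustrative.

```python
from statistics import NormalDist

def discard_probability(cutoff):
    """P(|Z| >= cutoff) for a standard normal Z."""
    return 2.0 * (1.0 - NormalDist().cdf(cutoff))

p_discard = discard_probability(2.0)      # roughly 0.0455
expected_discards = 10_000 * p_discard    # roughly 455 good points in 10,000
```

The fraction of discarded good observations does not shrink as the sample
grows, which is why a fixed cut-off cannot give full efficiency.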
I propose a similar weighting scheme, where the cut-off value is computed
from the set of residuals instead of being constant. Let us consider again the
standardized residuals $r_i$ as in (2) and the empirical distribution function
of their absolute values,
$$F_n^+(t) = \frac{1}{n}\sum_{i=1}^{n} I(|r_i| \leq t).$$
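As a minimal sketch, the empirical distribution function of the absolute
standardized residuals can be computed as below. The function name and the
four residual values are hypothetical and serve only as an illustration.

```python
from bisect import bisect_right

def empirical_abs_cdf(residuals):
    """Return the function t -> (1/n) * #{i : |r_i| <= t}."""
    abs_sorted = sorted(abs(r) for r in residuals)
    n = len(abs_sorted)

    def cdf(t):
        # bisect_right counts how many sorted absolute values are <= t.
        return bisect_right(abs_sorted, t) / n

    return cdf

F_n_plus = empirical_abs_cdf([0.5, -1.0, 1.5, -3.0])
values = [F_n_plus(t) for t in (0.0, 1.0, 2.0, 3.0)]  # [0.0, 0.5, 0.75, 1.0]
```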
When the data follows the central model (1), $F_n^+(t)$ converges uniformly
to $F_0^+(t)$ as $n \to \infty$. Therefore, we could compare $F_n^+(t)$ with
$F_0^+(t)$. If $F_n^+(t) < F_0^+(t)$, the sample proportion of the absolute
residuals that exceed $t$ is bigger than the theoretical proportion, and hence
there might be outliers in the sample. The proportion of outliers in the sample
could then be measured by
$\sup_{t \geq 0} \{\max\{F_0^+(t) - F_n^+(t), 0\}\}$. However, in practice the
actual distribution of the errors is never known, so we have to use a
hypothetical