

Statistics & Probability Letters 16 (1993) 153-158 North-Holland

27 January 1993

Least squares and least absolute deviation procedures in approximately linear models

Thomas Mathew * University of Maryland Baltimore County, Baltimore, MD, USA

Kenneth Nordström ** University of Augsburg, Germany

Received May 1992

Abstract: The Approximately Linear Model, introduced by Sacks and Ylvisaker (1978, The Annals of Statistics), represents deviations from the ideal linear model y = Xβ + e by considering y = b + Xβ + e, where b is an unknown bias vector whose components are bounded in absolute value, i.e., |b_i| ≤ r_i, r_i being a known nonnegative number. We propose to estimate β by minimizing the maximum of a weighted sum of squared deviations, or the sum of absolute deviations, where the maximum is computed subject to |b_i| ≤ r_i. In the former case the criterion to be minimized turns out to be a linear combination of the least squares and least absolute deviation criteria for the ideal linear model. The estimate of β obtained by the latter approach (i.e., by minimizing the maximum of a weighted sum of absolute deviations) turns out to be independent of the assumed bounds r_i on b_i. This establishes another robustness property of the least absolute deviation criterion.

Keywords: Approximately linear model; least squares; least absolute deviation.

1. Introduction

The concept of an approximately linear model has been introduced by Sacks and Ylvisaker (1978) to deal with certain deviations from an ideal linear model. The model can be described as follows. If y is an n × 1 vector of observations, then the approximately linear model has the form

y = b + Xβ + e,  (1.1)

where X is a known n × m matrix of rank m (m < n), β is an m × 1 vector of unknown parameters, e is the random error vector satisfying E(e) = 0 and Cov(e) = σ²I, and b is an n × 1 bias vector whose ith component b_i satisfies

|b_i| ≤ r_i  (i = 1, 2, ..., n),  (1.2)

r_i being a known bound. In the above, σ² > 0 is an unknown parameter. If observations are subject to systematic errors, apart from random noise, it is reasonable to assume that the systematic errors are

Correspondence to: Thomas Mathew, Department of Mathematics and Statistics, University of Maryland Baltimore County, Baltimore, MD 21228, USA.

* Research supported by Grant AFOSR 89-0237 and by a Special Research Initiative Support award from the Designated Research Initiative Fund, University of Maryland Baltimore County.

** On leave from the University of Helsinki, Finland. Research supported by grants of the Deutsche Forschungsgemeinschaft and the Academy of Finland.

0167-7152/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved


Volume 16, Number 2 STATISTICS & PROBABILITY LETTERS 27 January 1993

bounded. Consequently, in the linear model context, (1.1) could be a reasonable way of modeling the data with b representing the systematic errors.

Sacks and Ylvisaker (1978) have provided several examples of situations where (1.1) will adequately describe possible departures from the ideal linear model

y = Xβ + e.  (1.3)

The model (1.1), with or without the explicit assumption (1.2), has appeared in the literature in connection with various applications. A deterministic version of (1.1) has been used in many engineering applications, where the vector b satisfying (1.2) is referred to as 'unknown but bounded uncertainty' (see the review article by Milanese, 1989). The model has been used by Agee, Ellingson and Turner (1985) to estimate the Cartesian coordinates of a vehicle trajectory, using radar measurements, and by Kelly (1990) in the context of aircraft position-fix algorithms.

The inference problem addressed by Sacks and Ylvisaker (1978) is the estimation of β, or a linear function of β, using a linear estimator that minimizes the maximum mean squared error (MSE). The MSE was maximized over b_i's satisfying (1.2). The minimax linear estimator computed by Sacks and Ylvisaker (1978) depends on the variance σ² and hence cannot be computed unless σ² is known or an estimator of σ² is available. (In a later paper, Sacks and Ylvisaker (1981) have addressed the problem of estimating σ² in the model (1.1).) Sacks and Ylvisaker (1978) also point out computational difficulties if one is interested in estimating the entire parameter vector β. It should be noted that the aim of the Sacks-Ylvisaker approach is not to obtain a model that best fits the data (according to some criterion), but to obtain a satisfactory estimator of the parameters (according to a minimax criterion). However, there are applications where the goal is to obtain a model that best fits the data; see Micchelli (1989) and the references therein. In either case, a minimax approach is perhaps a satisfactory way of guarding against the possible departures from the ideal model (1.3). The goal of the present paper is to investigate such an approach for fitting the model (1.1).

In Section 2, we consider the minimax least squares approach to the above problem. In this approach, we minimize the maximum of (y - b - Xβ)'W(y - b - Xβ), where the maximum is computed subject to (1.2), and the minimization is then done with respect to β. W is a known diagonal matrix representing prior weights and its choice is discussed in the next section. It turns out that the minimax least squares approach results in a criterion that is a linear combination of the least squares and the least absolute deviation criteria. Such a criterion has already been considered by Arthanari and Dodge (1981) and Dodge (1984). A quadratic programming approach to minimize such a criterion is outlined in Arthanari and Dodge (1981, p. 101). Thus, our minimax least squares approach does not present any computational difficulty. The estimator of β so obtained will be referred to as the minimax least squares estimator of β. Our approach, obviously, does not require a knowledge of σ².
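The combined least squares plus least absolute deviation criterion that emerges (made explicit as (2.7) in Section 2) can also be minimized with a general-purpose optimizer. The Python sketch below illustrates this on simulated data; the data, names and the use of scipy's Nelder-Mead routine are our own illustration, not the quadratic-programming formulation of Arthanari and Dodge (1981).

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data for illustration only; the design, sample size and
# bias bounds are our own choices, not taken from the paper.
rng = np.random.default_rng(2)
n, m = 30, 3
X = rng.normal(size=(n, m))
beta_true = np.array([2.0, -1.0, 0.5])
r = rng.uniform(0.0, 0.5, size=n)                 # bias bounds r_i
b = rng.uniform(-r, r)                            # systematic errors, |b_i| <= r_i
y = X @ beta_true + b + 0.1 * rng.normal(size=n)
w = np.full(n, 1.0 / n)                           # equal prior weights

def minimax_ls_objective(beta):
    # least squares term plus bound-weighted absolute deviation term;
    # the constant sum(w_i r_i^2) is dropped, it does not affect the minimizer
    u = y - X @ beta
    return np.sum(w * u**2) + 2.0 * np.sum(w * r * np.abs(u))

beta0 = np.linalg.lstsq(X, y, rcond=None)[0]      # ordinary LS as starting value
res = minimize(minimax_ls_objective, beta0, method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-10})
print("minimax least squares estimate:", res.x)
```

Nelder-Mead is used here only because the objective is nonsmooth at the hyperplanes y_i = x_i'β; a dedicated quadratic program would scale better.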

For a very special case of (1.1) with X = 1_n (the n × 1 vector of ones) and β = μ (a scalar), we carried out some simulations to compare the minimax least squares estimator of μ, say μ̂_LS, with the minimax linear estimator, say μ̂_SY, due to Sacks and Ylvisaker (1978). It turns out that unless the assumed value of σ² (used for computing μ̂_SY) is significantly different from the true value, μ̂_SY has a significant advantage over μ̂_LS, in terms of maximum MSE. In their paper, Sacks and Ylvisaker (1978) do point out that their minimax linear estimator is not significantly affected by moderate departures from the true value of σ². Our computations confirm this fact (the computations are not reported in this paper). However, we would like to point out that the aim of our minimax least squares approach is to fit the model (1.1), rather than obtain estimators having good properties.

The criterion of least absolute deviation (LAD) is well known in the literature in the context of robust estimation (see Huber, 1987; and Rao, 1988). The minimax LAD approach to our problem consists of minimizing

max_{|b_i| ≤ r_i} Σ_{i=1}^n w_i |y_i - b_i - x_i'β|,  (1.4)


where y_i and b_i are respectively the ith components of y and b, x_i' is the ith row of X and the w_i's are nonnegative weights (i = 1, 2, ..., n). It turns out that any value of β that minimizes Σ_{i=1}^n w_i |y_i - x_i'β| also minimizes (1.4). In other words, the minimax LAD estimator of β is independent of the bounds r_i on the bias vector b. This establishes another robustness property of the LAD criterion. Details appear in Section 3.

2. Minimax least squares estimation

Consider the model (1.1), where X is assumed to be of full column rank and b satisfies (1.2). We now discuss the estimation of β by minimizing

max_{|b_i| ≤ r_i} (y - b - Xβ)'W(y - b - Xβ),  (2.1)

where

W = diag(w_1, w_2, ..., w_n)  (2.2)

is a known diagonal matrix of weights w_i satisfying w_i > 0 and Σ_{i=1}^n w_i = 1. (The choice of W is discussed later in this section.) Writing

y - Xβ = u = (u_1, u_2, ..., u_n)',  (2.3)

we have

(y - b - Xβ)'W(y - b - Xβ) = (u - b)'W(u - b) = Σ_{i=1}^n w_i u_i² - 2 Σ_{i=1}^n w_i u_i b_i + Σ_{i=1}^n w_i b_i².  (2.4)

The maximum of the expression in (2.4), subject to |b_i| ≤ r_i, is attained at

b_i = -r_i sign(u_i),  (2.5)

where sign(u_i) = +1 or -1 depending on whether u_i is positive or negative. Noting that u_i sign(u_i) = |u_i|, (2.4) and (2.5) together give

max_{|b_i| ≤ r_i} (u - b)'W(u - b) = Σ_{i=1}^n w_i (|u_i| + r_i)².  (2.6)

Using the definition of u in (2.3), we finally arrive at the following.

max_{|b_i| ≤ r_i} (y - b - Xβ)'W(y - b - Xβ)
= Σ_{i=1}^n w_i (|y_i - x_i'β| + r_i)² = (y - Xβ)'W(y - Xβ) + 2 Σ_{i=1}^n w_i r_i |y_i - x_i'β| + Σ_{i=1}^n w_i r_i².  (2.7)

From (2.7) it is clear that the minimax least squares approach results in a criterion that is a linear combination of the least squares and least absolute deviation criteria. It is also clear that the β minimizing (2.7) exists uniquely, since (2.7) is a strictly convex function of β. The unique β which minimizes (2.7) will be referred to as the minimax least squares estimator of β.
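The closed form (2.7) is easy to check numerically: the quadratic in b is convex, so its maximum over the box |b_i| ≤ r_i is attained at a vertex b_i = ±r_i, and enumerating the vertices must reproduce the right-hand side. A minimal Python sketch on made-up data (all names are ours):

```python
import numpy as np
from itertools import product

# Made-up data; the sizes and names here are our own illustration.
rng = np.random.default_rng(0)
n, m = 6, 2
X = rng.normal(size=(n, m))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)
r = rng.uniform(0.1, 1.0, size=n)          # bounds r_i on |b_i|
w = (1 / r) / np.sum(1 / r)                # weights as in (2.17)

def closed_form(beta):
    # right-hand side of (2.7), before expanding the square
    u = y - X @ beta
    return np.sum(w * (np.abs(u) + r) ** 2)

def brute_force_max(beta):
    # (u - b)'W(u - b) is convex in b, so its maximum over the box
    # |b_i| <= r_i is attained at a vertex b_i = +/- r_i
    u = y - X @ beta
    return max(np.sum(w * (u - r * np.array(s)) ** 2)
               for s in product([-1.0, 1.0], repeat=n))

beta_try = np.array([0.5, -1.0])
assert np.isclose(closed_form(beta_try), brute_force_max(beta_try))
```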

The criterion (2.7) is introduced in Arthanari and Dodge (1981, p. 101), and the properties of the estimator obtained by minimizing (2.7) are discussed in Dodge (1984) and Dodge and Jurečková (1987, 1988), for the ideal linear model (1.3). In Arthanari and Dodge (1981), the minimization of (2.7) is


formulated as a quadratic program. The solution can be obtained somewhat explicitly in the special case when, in the ideal linear model, the y_i's have a common mean μ so that Xβ = μ1_n, 1_n being the n × 1 vector of ones. In this case (1.1) simplifies to

y = b + μ1_n + e,  (2.8)

where b is as in (1.1). In view of (2.7), the problem now reduces to estimating μ by minimizing

Σ_{i=1}^n w_i (|y_i - μ| + r_i)².  (2.9)

Assume without loss of generality that the y_i's satisfy

y_1 ≤ y_2 ≤ ... ≤ y_n.  (2.10)

It is clear that the μ minimizing (2.9) must satisfy y_1 ≤ μ ≤ y_n. For s satisfying 1 ≤ s < n, consider the values of μ satisfying

y_s ≤ μ ≤ y_{s+1}.  (2.11)

When (2.11) holds, we have

Σ_{i=1}^n w_i (|y_i - μ| + r_i)² = Σ_{i=1}^s w_i (μ - y_i + r_i)² + Σ_{i=s+1}^n w_i (y_i - μ + r_i)²,

which is minimized subject to (2.11) at μ̂_s, given by

μ̂_s = ȳ(w) + Σ_{i=s+1}^n w_i r_i - Σ_{i=1}^s w_i r_i,
  if y_s ≤ ȳ(w) + Σ_{i=s+1}^n w_i r_i - Σ_{i=1}^s w_i r_i ≤ y_{s+1},  (2.12)

μ̂_s = y_s,  if ȳ(w) + Σ_{i=s+1}^n w_i r_i - Σ_{i=1}^s w_i r_i < y_s,

μ̂_s = y_{s+1},  if ȳ(w) + Σ_{i=s+1}^n w_i r_i - Σ_{i=1}^s w_i r_i > y_{s+1},  (2.13)

where ȳ(w) = Σ_{i=1}^n w_i y_i. Let

R_s² = Σ_{i=1}^s w_i (μ̂_s - y_i + r_i)² + Σ_{i=s+1}^n w_i (y_i - μ̂_s + r_i)²,  (2.14)

and let s₀ be such that

R_{s₀}² = min_{1 ≤ s < n} R_s².  (2.15)

Then the minimax least squares estimator of μ, say μ̂_LS, is given by

μ̂_LS = μ̂_{s₀}.  (2.16)
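The search over s in (2.11)-(2.16) amounts to projecting the unconstrained minimizer of (2.9) onto each interval [y_s, y_{s+1}] and keeping the s with smallest R_s². The Python sketch below implements this scan on made-up data and checks it against a dense grid search; it is our own illustration under our reading of (2.12)-(2.13), not the authors' code.

```python
import numpy as np

def minimax_ls_mean(y, r, w):
    # Minimize sum_i w_i (|y_i - mu| + r_i)^2, cf. (2.9)-(2.16):
    # on each interval [y_s, y_{s+1}] the unconstrained minimizer is
    # ybar(w) + sum_{i>s} w_i r_i - sum_{i<=s} w_i r_i, projected onto the interval.
    order = np.argsort(y)
    y, r, w = y[order], r[order], w[order]
    n = len(y)
    ybar = np.sum(w * y)                               # weighted mean ybar(w)
    best_mu, best_obj = y[0], np.inf
    for s in range(1, n):                              # interval [y[s-1], y[s]]
        mu_star = ybar + np.sum(w[s:] * r[s:]) - np.sum(w[:s] * r[:s])
        mu_s = np.clip(mu_star, y[s - 1], y[s])
        obj = np.sum(w * (np.abs(y - mu_s) + r) ** 2)  # R_s^2
        if obj < best_obj:
            best_mu, best_obj = mu_s, obj
    return best_mu

# Check against a dense grid search on made-up data.
y = np.array([0.3, 1.1, 2.0, 2.2])
r = np.array([0.5, 0.2, 0.4, 0.3])
w = (1 / r) / np.sum(1 / r)                            # weights as in (2.17)
grid = np.linspace(y.min(), y.max(), 200001)
obj = ((np.abs(y[:, None] - grid[None, :]) + r[:, None]) ** 2 * w[:, None]).sum(axis=0)
assert abs(minimax_ls_mean(y, r, w) - grid[np.argmin(obj)]) < 1e-3
```

As written the scan recomputes the partial sums for each s and so is O(n²) after sorting; maintaining prefix sums of w_i r_i would make it linear.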

A problem of practical importance is the choice of the w_i's in (2.2). It appears reasonable to choose the w_i's in such a way that the larger an r_i, the smaller the corresponding w_i. In other words, observations y_i having large biases receive correspondingly small weights in (2.1). Indeed, the minimax linear estimator derived


in Sacks and Ylvisaker (1978) has this property. An obvious choice of the w_i's to satisfy this requirement is

w_i = (1/r_i) / Σ_{j=1}^n (1/r_j),  i = 1, 2, ..., n.  (2.17)

If the r_i's are all equal, we can choose w_i = 1/n for all i. We carried out some numerical computations to compare the maximum MSE's and squared biases of

μ̂_LS in (2.16) with the Sacks-Ylvisaker (1978) estimator, say μ̂_SY, of μ in the model (2.8) with n = 3, for a few selected values of r_1, r_2 and r_3. Our results confirmed the observation in Sacks and Ylvisaker (1978), namely that μ̂_SY is fairly robust against moderate departures from the value of σ² used in computing μ̂_SY. However, if the true value of σ² is quite different from the value used in computing μ̂_SY, the minimax least squares estimator μ̂_LS dominates μ̂_SY in terms of maximum MSE. A similar observation was also noted for the behavior of the maximum squared bias. These computations are not reported here.

3. Minimax least absolute deviation estimation

We shall now address the problem of estimating β by using the criterion

max_{|b_i| ≤ r_i} Σ_{i=1}^n w_i |y_i - b_i - x_i'β|.  (3.1)

We shall call the resulting estimator of β the minimax least absolute deviation estimator of β. Computations similar to those that yielded (2.7) now give

max_{|b_i| ≤ r_i} Σ_{i=1}^n w_i |y_i - b_i - x_i'β| = Σ_{i=1}^n w_i (|y_i - x_i'β| + r_i) = Σ_{i=1}^n w_i |y_i - x_i'β| + Σ_{i=1}^n w_i r_i.  (3.2)

From (3.2), it is clear that the estimator of β resulting from (3.1) is a value of β that minimizes Σ_{i=1}^n w_i |y_i - x_i'β|. In other words, the minimax LAD estimator of β is independent of the assumed bounds r_i on the b_i. Thus the LAD criterion is robust, in a minimax sense, against departures from the ideal linear model, when the departures are of the type (1.1). Of course, computation of β by minimizing (3.2) requires choosing the w_i's. Perhaps they can be chosen to be equal if no information is available on the magnitude of the biases in (1.1).
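The bound-independence claim can be verified directly: the inner maximum in (3.1) adds only the constant Σ_{i=1}^n w_i r_i, so the minimizing β is unaffected by the bounds. A small Python sketch on illustrative data (the data and names are ours):

```python
import numpy as np
from itertools import product

# Illustrative data; sizes and names are our own, not the paper's.
rng = np.random.default_rng(1)
n, m = 5, 2
X = rng.normal(size=(n, m))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
r = rng.uniform(0.1, 1.0, size=n)          # bounds r_i on |b_i|
w = np.full(n, 1.0 / n)                    # equal weights

def lad(beta):
    # weighted LAD criterion for the ideal model (1.3)
    return np.sum(w * np.abs(y - X @ beta))

def brute_minimax_lad(beta):
    # |u_i - b_i| is convex in b_i, so the maximum over |b_i| <= r_i
    # is attained at a vertex b_i = +/- r_i
    u = y - X @ beta
    return max(np.sum(w * np.abs(u - r * np.array(s)))
               for s in product([-1.0, 1.0], repeat=n))

beta_try = np.array([0.8, 0.4])
# identity (3.2): the inner maximum only adds the constant sum_i w_i r_i,
# hence the minimizing beta does not depend on the bounds r_i
assert np.isclose(brute_minimax_lad(beta_try), lad(beta_try) + np.sum(w * r))
```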

Acknowledgements

The work of the first author was completed while visiting the University of Augsburg, Germany.

References

Agee, W.S., A.C. Ellingson and R.H. Turner (1985), A variable selection model building technique for radar measurement bias estimation, in: Proc. of the 13th Conf. on the Design of Experiments (U.S. Army Research Office, North Carolina) pp. 413-454.

Arthanari, T.S. and Y. Dodge (1981), Mathematical Programming in Statistics (Wiley, New York).

Dodge, Y. (1984), Robust estimation of regression coefficients by minimizing a convex combination of least squares and least absolute deviations, Comput. Statist. Quart. 1, 139-153.

Dodge, Y. and J. Jurečková (1987), Adaptive combination of least squares and least absolute deviation estimators, in: Y. Dodge, ed., Statistical Data Analysis Based on the L1-norm and Related Methods (North-Holland, Amsterdam) pp. 275-284.

Dodge, Y. and J. Jurečková (1988), Adaptive combination of M-estimator and L1-estimator in the linear model, in: Y. Dodge, V.V. Fedorov and H.P. Wynn, eds., Optimal Design and Analysis of Experiments (North-Holland, Amsterdam) pp. 167-176.

Huber, P.J. (1987), The place of the L1 norm in robust estimation, in: Y. Dodge, ed., Statistical Data Analysis Based on the L1-norm and Related Methods (North-Holland, Amsterdam) pp. 23-33.

Kelly, R.J. (1990), Reducing geometric dilution of precision using ridge regression, IEEE Trans. Aerospace Electron. Syst. 26, 154-168.

Micchelli, C.A. (1989), Optimal sampling design for parameter estimation and p-widths under stochastic and deterministic noise, in: M. Milanese, R. Tempo and A. Vicino, eds., Robustness in Identification and Control (Plenum, New York) pp. 25-40.

Milanese, M. (1989), Estimation and prediction in the presence of unknown but bounded uncertainty: a survey, in: M. Milanese, R. Tempo and A. Vicino, eds., Robustness in Identification and Control (Plenum, New York) pp. 3-24.

Rao, C.R. (1988), Methodology based on the L1 norm in statistical inference, Sankhyā Ser. A 50, 289-313.

Sacks, J. and D. Ylvisaker (1978), Linear estimation for approximately linear models, Ann. Statist. 6, 1122-1137.

Sacks, J. and D. Ylvisaker (1981), Variance estimation for approximately linear models, Math. Operationsforsch. Statist. Ser. Statist. 12, 147-162.