characterization of the weights of least squares adaptive polynomials

SIAM J. APPL. MATH. Vol. 48, No. 2, April 1988

© 1988 Society for Industrial and Applied Mathematics

014

CHARACTERIZATION OF THE WEIGHTS OF LEAST SQUARES ADAPTIVE POLYNOMIALS*

ROBERT K. GOODRICH† AND RANJIT M. PASSI‡

Abstract. Passi and Morel have provided a concept of adaptive least squares polynomials for time series data using exponential weighting $w_k = a^{-k} w_0$, $0 < a < 1$. Given the least squares polynomial $p(t)$ obtained from data $Y_k$, $k \le 0$, they derived a computationally efficient recursion algorithm to update the polynomial coefficients when a new data value is received. They showed that the exponential weighting possesses the "update property," i.e., the polynomial is left unchanged by a new data value $Y_1$ if $Y_1 = p(1)$. It was this update property that made the recursive algorithm possible. In this paper we explore the weights $w$ which possess this update property for a more general class of functions that includes the polynomials as a special case. In this general case we give a necessary and sufficient condition for a weight function to satisfy the update property. In the case of polynomials it is shown that the exponential weights are the only weight functions of practical significance which satisfy the update property. An estimate of the rate of convergence of the algorithm of Passi and Morel is provided.

Key words. data adaptive, least squares, update property, exponential weights, interpolating functions, invariant subspace

AMS(MOS) subject classifications. 40, 46, 62, 65

1. Introduction. In a previous paper [10] the concept of least squares (LS) adaptive polynomials was utilized for a data editing application. As the term suggests, these polynomials adapt to the data as they come. Abraham and Ledolter [1] discuss several other adaptive type applications under the term Discounted Least Squares. We consider this problem in a more general setting than polynomials. Let $Y = (\dots, Y_{-2}, Y_{-1}, Y_0)$ be the data up to the present, $w = (\dots, w_{-2}, w_{-1}, w_0)$ be a set of weights, and $F$ a vector of functions $F'(k) = (f_0(k), \dots, f_n(k))$. Let $\mathcal{S}$ be the space spanned by the elements of $F$. An element $g$ of $\mathcal{S}$,

$$g(k) = g(b, k) := \sum_{j=0}^{n} b_j f_j(k), \tag{1}$$

is called the least squares (LS) fit to the data $Y$ if $b' = (b_0, \dots, b_n)$ is chosen to minimize the weighted sum of squares:

$$\Phi(b) = \sum_{k \le 0} w_k \Big[ Y_k - \sum_{j=0}^{n} b_j f_j(k) \Big]^2. \tag{2}$$
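As a concrete reading of (1) and (2), the following minimal Python sketch computes the LS coefficients $b$ from the weighted normal equations $\big(\sum_k w_k F(k) F(k)'\big)\, b = \sum_k w_k Y_k F(k)$. It assumes the infinite past is truncated to a finite window $k = -K, \dots, 0$; the helper names `solve` and `ls_fit` are hypothetical, and a small Gaussian elimination stands in for a linear algebra library.

```python
def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [[float(v) for v in row] + [float(y[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ls_fit(Y, w, F):
    """Minimize (2) over the finite window k = -K, ..., 0 (K = len(Y) - 1).

    Assumption: the infinite past is truncated to this window.
    Y[i] and w[i] belong to time k = i - K, so the last entries are Y_0 and w_0.
    F is a list of basis functions f_0, ..., f_n of the integer time k.
    Returns the coefficients b_0, ..., b_n of the LS fit g in (1).
    """
    K, m = len(Y) - 1, len(F)
    G = [[0.0] * m for _ in range(m)]  # Gram matrix  sum_k w_k F(k) F(k)'
    r = [0.0] * m                      # right side   sum_k w_k Y_k F(k)
    for i, (y, wk) in enumerate(zip(Y, w)):
        f = [fj(i - K) for fj in F]
        for p in range(m):
            r[p] += wk * y * f[p]
            for q in range(m):
                G[p][q] += wk * f[p] * f[q]
    return solve(G, r)


# Usage: linear basis p_0(k) = 1, p_1(k) = k, exponential weights a**(-k) with a = 0.5.
P1 = [lambda k: 1.0, lambda k: float(k)]
ks = range(-4, 1)
print(ls_fit([2.0 + 3.0 * k for k in ks], [0.5 ** (-k) for k in ks], P1))  # approximately [2.0, 3.0]
```

Because the data in the usage line lie exactly on $2 + 3k$, the fit recovers $b = (2, 3)$ for any positive weights; with noisy data the weights decide how strongly recent values count.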

An LS polynomial fit is obtained when $F = P$ with $p_j(k) = k^j$, $j = 0, \dots, n$. In our formulation we require that the pair $(w, F)$ have the following desirable properties:

The importance of the data values $Y_j$, relative to each other, stays fixed in time and is expressed in terms of the weights $w_j$, $j \le 0$. A new observation $Y_1$ at time $1$, being the most recent observation, receives the weight $w_0$; $Y_0$, the next observation in order, receives weight $w_{-1}$, and so on. We will denote the augmented data by $Y^1 = (Y, Y_1)$ and the shifted version of the weighting by $w^1$, i.e., $w^1_k = w_{k-1}$ for $k \le 1$.

* Received by the editors July 21, 1986; accepted for publication (in revised form) May 27, 1987.

† Department of Mathematics, University of Colorado, Boulder, Colorado 80309.

‡ National Center for Atmospheric Research, Boulder, Colorado 80307. The National Center for Atmospheric Research is sponsored by the National Science Foundation.


Downloaded 12/31/12 to 128.148.252.35. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ADAPTIVE LEAST SQUARES POLYNOMIALS 459

Further, the pair $(w, F)$ is required to satisfy the update property; i.e., if $g(k)$ is the LS fit obtained from the pair $(w, Y)$, and if $g(1) = Y_1$, then the LS fit obtained from the pair $(w^1, Y^1)$ is $g(k)$. This means that the LS fit should not change if the new data value agrees with the value $g(1)$ predicted by $g$. It turns out that it is this update property that makes the determination of the updated LS fit computationally efficient.

Let $w^a$ be the exponential weighting, i.e., $w_k = w_0 a^{-k}$, $0 < a < 1$, $k \le 0$. Passi and Morel [11] showed that the pair $(w^a, P)$ satisfies the update property, and for this case they gave the update formulas. We show that for $(w, P)$ with $w_k > 0$ for all $k \le 0$, the exponential weighting is the only choice for which $(w, P)$ satisfies the update property. When $w_k = 0$ for some $k$, the LS polynomials reduce to interpolating solutions. Thus, we provide a complete characterization of the weights for the LS adaptive polynomial case.
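The update property can be checked numerically. The Python sketch below fits the data, appends $Y_1 = g(1)$, refits with the shifted weights $w_{k-1}$, and reports how far the coefficients move. It assumes the past is truncated at $k = -K$, and the helper names (`poly_fit`, `update_gap`) are hypothetical. For exponential weights the shifted objective on such a window is exactly $a$ times the old one plus $w_0 [Y_1 - q(1)]^2$, so the coefficients should not move beyond rounding; for a non-exponential weight such as $w_k = 1/(1 + k^2)$ they generally do.

```python
def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [[float(v) for v in row] + [float(y[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def poly_fit(ks, Y, wfun, n):
    """Weighted LS polynomial fit of degree n: minimize sum_k wfun(k) * (Y_k - q(k))**2."""
    m = n + 1
    G = [[0.0] * m for _ in range(m)]
    r = [0.0] * m
    for k, y in zip(ks, Y):
        f, wk = [float(k) ** j for j in range(m)], wfun(k)
        for p in range(m):
            r[p] += wk * y * f[p]
            for q in range(m):
                G[p][q] += wk * f[p] * f[q]
    return solve(G, r)


def update_gap(Y, w, n):
    """Largest coefficient change when Y_1 = g(1) is appended and the weights shift.

    Y holds Y_{-K}, ..., Y_0 and w(k) is the weight for k <= 0; the past before
    k = -K is simply dropped (a truncation assumption).
    """
    ks = list(range(-(len(Y) - 1), 1))
    b = poly_fit(ks, Y, w, n)
    g1 = sum(b)  # g(1), because p_j(1) = 1 for every j
    b_new = poly_fit(ks + [1], list(Y) + [g1], lambda k: w(k - 1), n)  # weights w_{k-1}
    return max(abs(u - v) for u, v in zip(b, b_new))


Y = [0.3, -1.2, 0.8, 2.5, 1.1, 0.4, 1.9]
print(update_gap(Y, lambda k: 0.7 ** (-k), 2))                       # tiny: rounding error only
print(update_gap([0.0, 0.0, 1.0], lambda k: 1.0 / (1 + k * k), 0))  # about 0.0163
```

The second case is small enough to check by hand: the weighted mean moves from $1/1.7$ to $(0.5 + 1/1.7)/1.8$ once the weights are shifted.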

The paper is organized as follows. Section 2 provides the formulation of the problem. Our main result, in § 3, gives a necessary and sufficient condition for a pair $(w, F)$ to satisfy the update property. As a corollary we derive a complete characterization of weights $w$ for a pair $(w, P)$ to satisfy the update property. Section 4 deals with the case when $w_k = 0$ for some $k$. We show that the update property in this case reduces to interpolation. In § 5 we give a brief description of the recursive update algorithm and a result on its rate of convergence.

2. Problem formulation. Let $w$ be a set of weights and $Y$ be the data available from the past. We fit the data with a least squares fit $g$ as described in (1), where we choose the coefficients by minimizing (2). Given a function $h$, we will use $h$, $h(k)$ and $h = (\dots, h(-2), h(-1), h(0))$ without any distinction.

Next, suppose the data are updated, i.e., we are given $Y_1$. Using the shifted weights we pick an updated least squares fit $\tilde{g}(k)$ by minimizing the updated quadratic functional

$$\tilde{\Phi}(q) = \sum_{k \le 1} [Y_k - q(k)]^2 w_{k-1}, \tag{3}$$

where, as before, $q$ is in $\mathcal{S}$. We say the pair $(w, F)$ has the update property if whenever $Y_1 = g(1)$, then $\tilde{g} = g$.

That is, no update is required if our least squares estimate agrees with the new data value. We list our assumptions on the weights:

(a) $w_k > 0$ for $k \le 0$.
(b) $\sum_{k \le 0} w_k = 1$. This is the normalizing assumption.
(c) $\sum_{k \le 0} f_i^2(k)\, w_k < \infty$ for $i = 0, 1, \dots, n$. This assumption is needed to ensure that the functional $\Phi$ is defined on the subspace $\mathcal{S}$.
(d) There exists a $b \ge 0$ such that $w_{k-1} \le b\, w_k$ for $k \le 0$.

This last assumption requires some explanation. In order that $\Phi$ make sense, we need to assume our data $\{Y_k\}$ is in

$$l^2(w) = \Big\{ \{Y_k\} : \sum_{k \le 0} Y_k^2 w_k < \infty \Big\}. \tag{4}$$

Similarly, we require $\tilde{\Phi}$ of (3) to be defined for all data in $l^2(w)$, since we update our old data to compute $\tilde{g}$. In particular, we require that

$$\{Y_k\} \in l^2(w^1) = \Big\{ \{Y_k\} : \sum_{k \le 1} Y_k^2 w_{k-1} < \infty \Big\}, \tag{5}$$


460 R.K. GOODRICH AND, R. M. PASSI

i.e., the shift operator $T(\dots, Y_{-1}, Y_0) = (\dots, Y_{-1}, Y_0, 0)$ maps $l^2(w)$ into $l^2(w^1)$, which, by Lemma 1, will be shown to be equivalent to condition (d). This lemma follows from the theory of weighted shift operators [14]. However, for the sake of completeness, we give a direct proof.

LEMMA 1. $T$ maps $l^2(w)$ into $l^2(w^1)$ if and only if there exists a constant $b \ge 0$ such that $w_{k-1} \le b\, w_k$ for $k \le 0$.

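Proof. First, suppose there exists a $b \ge 0$ such that $w_{k-1} \le b\, w_k$ for $k \le 0$. If $\sum_{k \le 0} Y_k^2 w_k < \infty$, then $\sum_{k \le 1} Y_k^2 w_{k-1} \le Y_1^2 w_0 + b \sum_{k \le 0} Y_k^2 w_k < \infty$. Next suppose $T$ maps $l^2(w)$ into $l^2(w^1)$. Define $T_l(Y) = (\dots, 0, Y_{-l}, \dots, Y_0, 0)$ as a mapping of $l^2(w)$ into $l^2(w^1)$. Let $M_k = w_{k-1}/w_k$. Then

$$\|T_l(Y)\|^2 = \sum_{k=-l}^{0} Y_k^2 w_{k-1} = \sum_{k=-l}^{0} Y_k^2 M_k w_k \le \Big( \max_{-l \le k \le 0} M_k \Big) \|Y\|^2.$$

This shows that $T_l$ is a bounded operator and $\|T_l\| \le \big( \max_{-l \le k \le 0} M_k \big)^{1/2}$. Next we show the equality. There exists a $k_0$ such that $M_{k_0} = \max_{-l \le k \le 0} M_k$. Define $Y_k$ to be one if $k = k_0$ and zero otherwise. Then

$$\|T_l(Y)\| = \big( Y_{k_0}^2 w_{k_0 - 1} \big)^{1/2} = \big( Y_{k_0}^2 M_{k_0} w_{k_0} \big)^{1/2} = (M_{k_0})^{1/2} \|Y\|.$$

Thus,

$$\|T_l\| = \Big( \max_{-l \le k \le 0} M_k \Big)^{1/2}. \tag{6}$$

By assumption, $T(Y) = (\dots, Y_{-2}, Y_{-1}, Y_0, 0) \in l^2(w^1)$. But we see that $\{T_l(Y)\}$ converges to $T(Y)$. So, for fixed $Y$ in $l^2(w)$, $\{T_l(Y)\}$ is bounded. By the uniform boundedness principle [12, p. 70], $\sup_l \|T_l\|$ is finite. Let $b = \sup_l \|T_l\|^2 = \sup_k M_k$. Then $w_{k-1} \le b\, w_k$ for $k \le 0$.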

This completes the proof of the lemma.

We have shown that if $T$ maps $l^2(w)$ into $l^2(w^1)$, then $T$ is bounded. This is important for numerical reasons. If a vector $f$ is small in $l^2(w)$, we want $T(f)$ to be small, or else our LS estimates could change rapidly and our algorithm would be unstable.

In practice, one usually assumes that $w$ is monotone, i.e., $w_{k-1} \le w_k$, so that condition (d) is satisfied with $b = 1$. This monotonicity assumption serves to weight the recent data more heavily. We use the more general condition (d) for the sake of mathematical completeness.
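Lemma 1 and (6) identify $\|T_l\|^2$ with $\max_{-l \le k \le 0} M_k$. The Python sketch below evaluates weights only on a finite window $k = -K, \dots, 0$ (an assumption), and its function names are hypothetical. It computes $M_k = w_{k-1}/w_k$ for a normalized exponential weight, where every $M_k$ equals $a$, and for the monotone but non-exponential weight $w_k = 1/(1 + k^2)$, left unnormalized since scaling does not change $M_k$. For the latter the window maximum is $101/122$, attained at $k = -10$, while $\sup_k M_k = 1$ over the whole past, consistent with $b = 1$ for monotone weights.

```python
A = 0.6  # exponential rate a (hypothetical value)


def expo(k):
    """Normalized exponential weight (1 - a) a**(-k) with a = A; here M_k = A."""
    return (1 - A) * A ** (-k)


def poly(k):
    """Monotone, non-exponential weight 1 / (1 + k**2), left unnormalized."""
    return 1.0 / (1 + k * k)


def shift_ratios(w, K):
    """M_k = w_{k-1} / w_k for k = -K, ..., 0 (finite window assumption)."""
    return [w(k - 1) / w(k) for k in range(-K, 1)]


def norm_sq(Y, w, shifted=False):
    """||Y||^2 in l^2(w), or ||T(Y)||^2 in l^2(w^1) when shifted=True.

    Y holds Y_{-K}, ..., Y_0.  T appends a zero at k = 1, so ||T(Y)||^2 is
    sum_k Y_k**2 * w_{k-1} over k <= 0.
    """
    K = len(Y) - 1
    return sum(y * y * (w(k - 1) if shifted else w(k)) for y, k in zip(Y, range(-K, 1)))


print(max(shift_ratios(expo, 10)))  # a = 0.6 up to rounding
print(max(shift_ratios(poly, 10)))  # 101/122 = 0.8278..., attained at k = -10

# Equality in (6): the indicator of the maximizing k_0 = -10 attains the bound.
e = [1.0] + [0.0] * 10
print(norm_sq(e, poly, shifted=True) / norm_sq(e, poly))  # again 101/122
```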

We list our assumptions on the vector functions $F$:

(i) $f_0, \dots, f_n$ are linearly independent in $l^2(w)$, and $F(k)$ is defined for all integers $k$.
(ii) There exists an $(n + 1) \times (n + 1)$ nonsingular matrix $L$ such that $F(k + 1) = L F(k)$ for $k \le 0$.

Assumption (i) gives the uniqueness of the coefficients $b$ in (2), and (ii) makes the update computations efficient [1, p. 96]. Condition (ii) is another way of saying that $\mathcal{S}$ is invariant under translation. Note that $P$ as well as other classes of functions satisfy these conditions, as discussed in the above reference.
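For the polynomial basis, condition (ii) can be made explicit. By the binomial theorem $(k+1)^j = \sum_{i \le j} \binom{j}{i} k^i$, so $L_{ji} = \binom{j}{i}$, a lower-triangular matrix with unit diagonal and hence nonsingular. A short Python sketch (helper names are illustrative, not from the paper):

```python
from math import comb


def translation_matrix(n):
    """L with P(k + 1) = L P(k) for the basis P(k) = (1, k, ..., k**n).

    Row j holds the binomial coefficients C(j, i), i = 0..n (math.comb
    returns 0 for i > j, which gives the lower-triangular shape).
    """
    return [[comb(j, i) for i in range(n + 1)] for j in range(n + 1)]


def P(k, n):
    """Basis vector P(k) as exact integers."""
    return [k ** j for j in range(n + 1)]


def shift_coefficients(b):
    """Coefficients of g(k + 1) in the basis P(k): L' b, where g(k) = b' P(k)."""
    L = translation_matrix(len(b) - 1)
    return [sum(L[j][i] * b[j] for j in range(len(b))) for i in range(len(b))]


for row in translation_matrix(3):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 2, 1, 0]
# [1, 3, 3, 1]
print(shift_coefficients([0, 0, 1]))  # k**2 -> (k + 1)**2 = 1 + 2k + k**2: [1, 2, 1]
```

If $g(k) = b' P(k)$, then $g(k+1) = b' L P(k)$, so re-expressing a fit at a translated origin amounts to replacing $b$ by $L' b$.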

3. Main result. In this section we provide a characterization of pairs $(w, F)$ having the update property. Define a mapping $M$ of $l^2(w)$ by $M(Y)(k) = M_k Y_k$, where $M_k = w_{k-1}/w_k$. Let $(Y, Z)$ denote the inner product on $l^2(w)$ for $Y, Z \in l^2(w)$, and $(Y, Z)_1$ denote the inner product on $l^2(w^1)$ for $Y, Z \in l^2(w^1)$.



THEOREM 1. Let the weight $w$ satisfy conditions (a), (b), (c) and (d), and let $F$ satisfy conditions (i) and (ii). Then the pair $(w, F)$ satisfies the update property if and only if $\mathcal{S}$ is an invariant subspace under $M$.

The proof of the theorem will be aided by the following lemmas.

LEMMA 2. The pair $(w, F)$ satisfies the update property if and only if, whenever $h$ is in $l^2(w)$ and $(h, f_i) = 0$ for $i = 0, \dots, n$, then $(T(h), f_i)_1 = 0$ for $i = 0, \dots, n$.

Proof. If $h \in l^2(w)$ and $(h, f_i) = 0$ for $i = 0, \dots, n$, then $Q(h) = 0$, where $Q$ is the orthogonal projection of $l^2(w)$ onto $\mathcal{S}$. So the LS best fit of the data $h$ is $g = Q(h)$, where $g(k) = 0 = \sum_{j=0}^{n} b_j f_j(k)$ for $k \le 0$. Then (i) gives $b_0 = \dots = b_n = 0$. Let $Y_1 = g(1) = 0$, let $\tilde{Q}$ be the projection of $l^2(w^1)$ onto $\mathcal{S}$, $Y^1 = T(h)$ and $\tilde{g} = \tilde{Q}(Y^1)$. By the update property, $\tilde{Q}(T(h)) = g = 0$. But this implies that $T(h)$ is in the orthogonal complement of $\mathcal{S}$ in $l^2(w^1)$, or that $(T(h), f_i)_1 = 0$ for $i = 0, \dots, n$.

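To prove the converse, suppose $Q(Y) = g$ and $g(1) = Y_1$. We must show that $\tilde{Q}(Y^1) = g$, where $Y^1 = (\dots, Y_{-1}, Y_0, Y_1)$. Since $Y - g = (I - Q)(Y)$ and $Y - g$ is in the orthogonal complement of $\mathcal{S}$ in $l^2(w)$, we have $(Y - g, f_i) = 0$, $i = 0, \dots, n$. By hypothesis this implies that

$$0 = (T(Y - g), f_i)_1 = \sum_{k \le 0} (Y_k - g_k) f_i(k) w_{k-1} = \sum_{k \le 1} (Y_k - g_k) f_i(k) w_{k-1} = (Y^1 - g, f_i)_1$$

for $i = 0, \dots, n$, where the third equality uses $Y_1 - g(1) = 0$. This implies that $Y^1 - g$ is in the orthogonal complement of $\mathcal{S}$ in $l^2(w^1)$, and $\tilde{Q}(Y^1) = g$.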

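LEMMA 3 [13, p. 62]. Suppose $\Lambda_1, \dots, \Lambda_n$ and $\Lambda$ are linear functionals on a vector space $X$. Let

$$N = \{ x : \Lambda_1 x = \dots = \Lambda_n x = 0 \}.$$

If $\Lambda x = 0$ for $x \in N$, then there exist scalars $\alpha_1, \dots, \alpha_n$ such that

$$\Lambda = \alpha_1 \Lambda_1 + \dots + \alpha_n \Lambda_n. \tag{7}$$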

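Proof of Theorem 1. By Lemma 2, $(h, f_i) = 0$ for $i = 0, \dots, n$ implies $(T(h), f_i)_1 = 0$ for $i = 0, \dots, n$. Let $\Lambda_i$ be the bounded linear functional on $l^2(w)$ defined by $\Lambda_i(h) = (h, f_i)$, $i = 0, \dots, n$. Let $\tilde{\Lambda}_i$ be the linear functional defined on $l^2(w)$ by $\tilde{\Lambda}_i(h) = (T(h), f_i)_1$ for $i = 0, \dots, n$. Then $\tilde{\Lambda}_i$ is a bounded linear functional on $l^2(w)$.

Thus, we have shown that

$$\Lambda_i(h) = 0,\ i = 0, \dots, n \implies \tilde{\Lambda}_i(h) = 0,\ i = 0, \dots, n. \tag{8}$$

By Lemma 3, this implies that for each $j$ there exist constants $\lambda_i^j$, $i = 0, \dots, n$, such that $\tilde{\Lambda}_j(h) = \sum_{i=0}^{n} \lambda_i^j \Lambda_i(h)$ for all $h$ in $l^2(w)$. This implies that

$$\sum_{k \le 0} f_j(k) h_k w_{k-1} = \sum_{k \le 0} \Big( \sum_{i=0}^{n} \lambda_i^j f_i(k) \Big) h_k w_k. \tag{9}$$

Now let $h_k$ be one if $k = k_0$ and zero otherwise; then

$$f_j(k_0) w_{k_0 - 1} = \Big( \sum_{i=0}^{n} \lambda_i^j f_i(k_0) \Big) w_{k_0} \quad \text{for } j = 0, \dots, n. \tag{10}$$

This must be true for all $k_0$. Thus, for each $j = 0, \dots, n$, there exists a function $g_j$ in $\mathcal{S}$ such that for all $k \le 0$ we have $f_j(k) w_{k-1} = g_j(k) w_k$, or $M(f_j) = g_j \in \mathcal{S}$ for all $j$. Thus, $\mathcal{S}$ is an invariant subspace under $M$. Reversing the argument and using Lemma 2, we find the converse to be true.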

COROLLARY 1. If $w^a$ is an exponential weight and $F$ satisfies conditions (i), (ii) and (c), then $(w^a, F)$ always satisfies the update property.

Proof. In the case of an exponential weight, $w^a(k) = w_0 a^{-k}$ and $w^a(k-1)/w^a(k) = M_k = a$ for all $k \le 0$. Thus, $M$ is a scalar multiple of the identity operator and every subspace is invariant, including $\mathcal{S}$.

This corollary is a partial explanation for the popularity of the exponential weights.



COROLLARY 2. Let $P' = (p_0(k), \dots, p_n(k))$ be given by $p_j(k) = k^j$ for $j = 0, \dots, n$. Let $w$ satisfy conditions (a), (b), (c) and (d). Then $(w, P)$ satisfies the update property if and only if $w$ is an exponential weight.

Proof. Suppose the pair $(w, P)$ satisfies the update property. Then, by Theorem 1, the space $\mathcal{S}$ of all polynomials of degree $\le n$ must be invariant under $M$. This implies that $M_k p_0(k) = M_k$ is in $\mathcal{S}$, i.e., $M_k$ is a polynomial of some degree $m \le n$. But $M_k p_n(k)$ is a polynomial of degree $n + m$, which is in $\mathcal{S}$; hence $n + m \le n$, so $m = 0$ and $M_k = w(k-1)/w(k) = a$ is a constant function. By induction, $w_k = w_0 a^{-k}$ for $k \le 0$, where (a) forces $a > 0$ and the normalization (b) forces $a < 1$. Thus, $w$ is an exponential weight. The converse follows from Corollary 1.
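On a finite window, the invariance condition of Theorem 1 can be checked for the polynomial case without a linear solver: equally spaced samples agree with a polynomial of degree at most $n$ exactly when their $(n+1)$-th forward differences vanish. The Python sketch below (hypothetical helper names, exact arithmetic via `fractions.Fraction`) tests whether $M(p_j) \in \mathcal{S}$ for $j = 0, \dots, n$. A finite window can only refute invariance, so this illustrates Corollary 2 rather than proving it.

```python
from fractions import Fraction


def is_poly_of_degree(values, n):
    """True if equally spaced samples agree with a polynomial of degree <= n,
    i.e. every (n + 1)-th forward difference is zero."""
    d = list(values)
    for _ in range(n + 1):
        d = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    return all(x == 0 for x in d)


def invariant_under_M(w, n, K):
    """Check on k = -K..0 (finite window, K >= n + 1) whether M_k * k**j,
    j = 0..n, with M_k = w(k - 1) / w(k), all lie in the polynomials of degree <= n."""
    ks = range(-K, 1)
    M = [w(k - 1) / w(k) for k in ks]
    return all(is_poly_of_degree([m * k ** j for m, k in zip(M, ks)], n)
               for j in range(n + 1))


def expo(k):
    return Fraction(3, 5) ** (-k)  # exponential weight, M_k = 3/5


def other(k):
    return Fraction(1, 1 + k * k)  # monotone but not exponential


print(invariant_under_M(expo, 2, 8))   # True
print(invariant_under_M(other, 2, 8))  # False
```

For the second weight $M_k = (1 + k^2)/(1 + (k-1)^2)$ is a nonconstant rational function, so already $M(p_0)$ leaves the polynomials.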

Under certain mild conditions on $F$ (see Corollary 1), the exponential weights always work, i.e., $(w^a, F)$ satisfies the update property. One might therefore conjecture that if a pair $(w, F)$ satisfies the update property, then $w$ must be an exponential weight. We provide a counterexample to this conjecture.

Example. $F'(k) = (1, \sin(2\pi k/3), \cos(2\pi k/3), \sin(4\pi k/3), \cos(4\pi k/3))$. Let $M_k = 1/2 + (1/4) \sin(2\pi k/3)$. We can easily show that $1/4 \le M_k \le 3/4$ for all $k \le 0$, and $\mathcal{S}$ is invariant under $M$. Conditions (a), (b), (c), (d), and (i) and (ii) above are satisfied, where $w_{k-1} = M_k w_k$.

This example opens up the possibility of weights and functions satisfying the update property but allowing some freedom in the choice of $w$. To our knowledge, such analyses have not been performed in the literature. We intend to investigate this subject in a future paper.

4. Considerations when some of the weights are zero. Suppose $w_{k_0} = 0$ for some $k_0 \le 0$. By our discussion following Lemma 1, we require that $T$ map $l^2(w)$ into $l^2(w^1)$. But, in order that this mapping be well defined, it is necessary that $w_{k_0 - 1} = 0$, and by induction $w_k = 0$ for all $k \le k_0$. Then $\{k \mid w_k > 0\} = \{-l, -l + 1, \dots, 0\}$ for some positive integer $l$. In the case when some weight is zero we list our assumptions on the weights:

(a′) $w_k > 0$ for $-l \le k \le 0$, and $w_k = 0$ for $k < -l$;
(b′) $\sum_{k=-l}^{0} w_k = 1$.

We now show that the update property reduces to interpolation.

THEOREM 2. If $(w, F)$ satisfies the update property, (a′), (b′) and (i), (ii) above, then $n = l = \dim \mathcal{S} - 1$ and the LS fit to any $Y$ is the element of $\mathcal{S}$ that interpolates the data $(Y_{-l}, \dots, Y_{-1}, Y_0)$ on $(-l, \dots, -1, 0)$.

Proof. We specialize our proof of Theorem 1 to this case and find that $(w, F)$ satisfies the update property if and only if $\mathcal{S}$ is invariant under the operator $M$, where $M_k = w_{k-1}/w_k$, $k = 0, \dots, -l$. By (10), $M_k f_j(k) = \sum_{i=0}^{n} \lambda_i^j f_i(k)$ for $j = 0, \dots, n$, and the matrix of the operator $M$ with respect to the basis of functions $f_0, f_1, \dots, f_n$ is given by $M = (\lambda_i^j)$. If we assume that $M$ is nonsingular on $\mathcal{S}$, the matrix $(\lambda_i^j)$ is invertible. Again from (10), since $M_{-l} = w_{-l-1}/w_{-l} = 0$, we have $0 = \sum_{i=0}^{n} \lambda_i^j f_i(-l)$ for $j = 0, \dots, n$. The nonsingularity of $(\lambda_i^j)$ implies that $f_j(-l) = 0$ for $j = 0, \dots, n$. By (ii), this implies that the $f_j$ are zero everywhere, but this is ruled out by (i).

If $M$ is singular on $\mathcal{S}$, then $M(f) = 0$ for some $f \in \mathcal{S}$, $f \ne 0$. $M(f) = 0$ implies $f$ is of the form $(\dots, f_{-l-1}, f_{-l}, 0, \dots, 0)$, where $f_{-l} \ne 0$. By (ii), the translates $f(k + j)$, $j = 0, \dots, l$, are in $\mathcal{S}$ and are independent. This implies that $l + 1 \le n + 1$, or $l \le n$. But $n + 1 \le l + 1$ because of the independence of $f_0, \dots, f_n$ as elements of $l^2(w)$, which has dimension $l + 1$.

Since $n = l$, the LS fit, which makes $\Phi = 0$, is simply the function in $\mathcal{S}$ that interpolates $(Y_{-l}, \dots, Y_0)$ on $(-l, \dots, 0)$.

5. Updating and rate of convergence. In this section we show how the update property leads to a simple method for updating our LS estimates.



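If $(w, F)$ satisfies the update property, then

$$\begin{aligned} \tilde{Q}(Y^1) &= \tilde{Q}(\dots, Y_{-1}, Y_0, Y_1 - g(1) + g(1)) \\ &= \tilde{Q}(\dots, Y_{-1}, Y_0, g(1)) + \tilde{Q}(\dots, 0, 0, Y_1 - g(1)) \\ &= g + (Y_1 - g(1))\, \tilde{Q}(\dots, 0, 0, 1). \end{aligned} \tag{11}$$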

This shows that if a weight satisfies the update property, we can update our LS estimate in a very efficient manner. We need only perform the computation to find $\tilde{Q}(\dots, 0, 0, 1)$ once.

Once we use the update formula, we translate the origin using (ii) so as to be prepared for the introduction of the new data point. This has been carried out explicitly for the polynomial case by Passi and Morel [11] and Abraham and Ledolter [1].
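The update formula (11) and the translation step can be sketched for the polynomial case as follows. This is a minimal Python illustration under stated assumptions, not the algorithm of [11]: the weights are exponential with $w_0 = 1$ (the normalization (b) does not change the fit), the infinite past is truncated after `N` steps when forming the Gram matrices, and all function names are hypothetical. `recursive_fit` starts from the zero fit (the LS fit to an all-zero past), applies (11) once per new value, and then re-expresses the fit at the new origin using $F(k+1) = L F(k)$ with $L_{ji} = \binom{j}{i}$; `batch_fit` solves the weighted normal equations directly, so the two should agree up to rounding.

```python
from math import comb


def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [[float(v) for v in row] + [float(y[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def basis(k, n):
    """P(k) = (1, k, ..., k**n)."""
    return [float(k) ** j for j in range(n + 1)]


def gram(a, n, N, origin):
    """sum_{m=0}^{N} a**m P(origin - m) P(origin - m)'  (past truncated after N steps)."""
    G = [[0.0] * (n + 1) for _ in range(n + 1)]
    for m in range(N + 1):
        f, wm = basis(origin - m, n), a ** m
        for i in range(n + 1):
            for j in range(n + 1):
                G[i][j] += wm * f[i] * f[j]
    return G


def batch_fit(Y, a, n, N=400):
    """Direct LS fit, weights a**(-k) (w_0 = 1), Y[-1] at k = 0, zeros before Y[0]."""
    r = [0.0] * (n + 1)
    for m, y in enumerate(reversed(Y)):  # Y[-1 - m] sits at k = -m
        f = basis(-m, n)
        for i in range(n + 1):
            r[i] += a ** m * y * f[i]
    return solve(gram(a, n, N, 0), r)


def recursive_fit(Y, a, n, N=400):
    """Start from g = 0, apply (11) for each new value, then move the origin."""
    # c: coefficients of Q~(..., 0, 0, 1); the shifted weight at k = 1 - m is a**m,
    # and the right side sum_k w_{k-1} e_k P(k) reduces to w_0 P(1) = P(1).
    c = solve(gram(a, n, N, 1), basis(1, n))
    b = [0.0] * (n + 1)  # the LS fit to an all-zero past is g = 0
    for y in Y:
        innovation = y - sum(b)  # Y_1 - g(1), since p_j(1) = 1
        b = [bj + innovation * cj for bj, cj in zip(b, c)]
        # g(k + 1) = b' L P(k) with L[j][i] = C(j, i): new coefficients are L' b.
        b = [sum(comb(j, i) * b[j] for j in range(i, n + 1)) for i in range(n + 1)]
    return b


Y = [1.0, 2.0, 0.5, 3.0, 2.5, 4.0, 3.5]
print(batch_fit(Y, 0.8, 2))
print(recursive_fit(Y, 0.8, 2))  # should agree with the line above up to rounding
```

The design point is that `c` is computed once; each new value then costs one prediction, one scaled vector addition, and one multiplication by $L'$, independent of how much past data has been seen.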

We describe how to get started. We never know the data into the infinite past. If we start with the zero vector, then the LS fit is the zero polynomial. Next we use the update formula $k + 1$ times on the data $Y_0, Y_1, \dots, Y_k$ to find the LS fit $\hat{g}$ to the data $\hat{Y} = (\dots, 0, 0, Y_0, Y_1, \dots, Y_k)$. Let $g$ be the LS fit to $Y = (\dots, Y_{-2}, Y_{-1}, Y_0, \dots, Y_k)$. Since $Q$ is an orthogonal projection and condition (d) can be applied $k$ times,

$$\|g - \hat{g}\| = \|Q(Y) - Q(\hat{Y})\| \le \|Y - \hat{Y}\| \le b^{k/2} \|(\dots, Y_{-2}, Y_{-1}, 0)\|.$$

So the rate of convergence is $O(b^{k/2})$, where $b = \sup_k w_{k-1}/w_k$.

Since any two norms are equivalent on $\mathcal{S}$ [12, p. 37], we obtain the same rate of convergence on the coefficients. If we knew that the missing data were bounded by $B > 0$, then, by the normalization (b), an error estimate would be $\|g - \hat{g}\| \le b^{k/2} B$.
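The start-up bound can be illustrated in the simplest setting. For the degree-0 case $F = (1)$ the LS fit is the weighted mean, and with the normalization (b) the $l^2(w)$ norm of a constant $c$ is $|c|$, so $\|g - \hat{g}\|$ can be computed directly. The Python sketch below assumes normalized exponential weights $w_j = (1 - a) a^{-j}$, for which (d) holds with $b = a$, and a hypothetical bounded past truncated to a finite list; the function name is illustrative. The bound $b^{k/2} B$ is far from tight here, since the actual error decays like $a^{k}$.

```python
def start_up_error(past, k, a):
    """|g - g_hat| for the degree-0 fit (weighted mean), weights w_j = (1 - a) a**(-j).

    past lists the unknown values Y_{-1}, Y_{-2}, ... (most recent first; assumed
    finite here).  After the k + 1 known values Y_0, ..., Y_k, the value Y_{-i}
    sits at time -(k + i) relative to the origin at Y_k.  The known values
    cancel in g - g_hat, so they are not needed.
    """
    return abs(sum((1 - a) * a ** (k + i) * y for i, y in enumerate(past, start=1)))


a, B = 0.7, 1.0
past = [(-1.0) ** i for i in range(2000)]  # hypothetical unknown past, |Y_j| <= B
for k in (0, 5, 10, 20):
    print(k, start_up_error(past, k, a), a ** (k / 2) * B)  # error vs. bound b**(k/2) * B
```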

6. Concluding remarks. We have provided a complete characterization of the weights which possess the update property for adaptive polynomials. For classes of functions other than polynomials, we have shown the possibility of pairs of weights and functions satisfying the update property yet allowing weights that are not exponential.

The exponential weights have been used by Passi and Morel [10], [11] in their work on adaptive polynomials; our work provides mathematical support for their choice of these weights. Exponential weights have been used in the past by Otterman [9], Amin and Passi [2], and in other electrical engineering applications [6] to achieve adaptiveness in the algorithms. It is not, however, clear whether the exponential weighting is the only possible choice in all such applications.

We have provided (11) as an alternate form of the polynomial updating algorithm of Passi and Morel [11]. From both these forms we note that the updating depends on the quantity $Y_1 - g(1)$, which is the difference between the data value and its prediction. In analogous work this quantity is known as the "innovation" in the Kalman filter [4], [5]. A similar innovation term occurs in the Least Mean Squares (LMS) algorithm of Widrow and Hoff [15]. In all these algorithms the parameters are adjusted when a new data value is received. At each stage the algorithms learn from the innovation [3], [7] and adapt to the changing structure of the data. One should, however, note that the term innovation in our paper differs significantly from the traditional usage of the term.

Acknowledgments. We would like to thank Drs. Michael Pernice and Wesley Wilson of the National Center for Atmospheric Research and Allan Steinhardt of Massachusetts Institute of Technology, Lincoln Laboratories, for their comments on this paper. We would also like to thank a referee for pointing out the general setting found in [1].



REFERENCES

[1] B. ABRAHAM AND J. LEDOLTER, Statistical Methods for Forecasting, John Wiley, New York, 1983.
[2] M. G. AMIN AND R. M. PASSI, Temporal tracking of spectral variations, Signal Process., (1986).
[3] G. Z. DIDERRICH, The Kalman filter from the perspective of Goldberger-Theil estimators, American Statist., 39 (1985), pp. 193-198.
[4] R. E. KALMAN, A new approach to linear filtering and prediction problems, J. Basic Engineering, 82 (1960), pp. 35-45.
[5] R. E. KALMAN AND R. S. BUCY, New results in linear filtering and prediction theory, J. Basic Engineering, 83 (1961), pp. 95-108.
[6] H. LEV-ARI, T. KAILATH AND J. CIOFFI, Least squares adaptive lattice and transversal filters: A unified geometric theory, IEEE Trans. Inform. Theory, IT-30 (1984), pp. 222-236.
[7] R. J. MEINHOLD AND N. D. SINGPURWALLA, Understanding the Kalman filter, American Statist., 37 (1983), pp. 123-127.
[8] C. MOREL AND R. M. PASSI, Use of adaptive polynomials for real time data editing, J. Atmospheric Oceanic Tech., (1986), pp. 494-498.
[9] J. OTTERMAN, The properties and methods for computation of exponentially-mapped-past statistical variables, IRE Trans. Automat. Control, AC-5 (1960), pp. 11-17.
[10] R. M. PASSI AND C. MOREL, An adaptive data editing algorithm with an application to hydrostatic altitude integration, Ninth Conference on Probability and Statistics in Atmospheric Science, Virginia Beach, VA, 1985, pp. 231-236.
[11] R. M. PASSI AND C. MOREL, Least squares adaptive polynomials, Commun. Statist. A-Theory Methods, (1987).
[12] A. P. ROBERTSON AND W. J. ROBERTSON, Topological Vector Spaces, Cambridge University Press, Cambridge, 1964.
[13] W. RUDIN, Functional Analysis, McGraw-Hill, New York, 1973.
[14] A. L. SHIELDS, Topics in Operator Theory, Mathematical Surveys 13, C. Pearcy, ed., American Mathematical Society, Providence, RI, 1974.
[15] B. WIDROW AND M. E. HOFF, Adaptive switching circuits, IRE WESCON Conv. Rec., Part 4 (1960), pp. 70-73.

