
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-29, NO. 3, JUNE 1981 627

Recursive Least Squares Ladder Estimation Algorithms

DANIEL T. L. LEE, MEMBER, IEEE, MARTIN MORF, MEMBER, IEEE, AND BENJAMIN FRIEDLANDER, MEMBER, IEEE

Abstract- Recursive least squares ladder estimation algorithms have attracted much attention recently because of their excellent convergence behavior and fast parameter tracking capability, compared to gradient based algorithms. We present some recently developed square root normalized exact least squares ladder form algorithms that have fewer storage requirements and lower computational requirements than the unnormalized ones. A Hilbert space approach to the derivation of magnitude normalized signal and gain recursions is presented. The normalized forms are expected to have even better numerical properties than the unnormalized versions. Other normalized forms, such as joint process estimators (e.g., the "adaptive line enhancer") and ARMA (pole-zero) models, will also be presented. Applications of these algorithms to fast (or "zero") startup equalizers, adaptive noise and echo cancellers, non-Gaussian event detectors, and inverse models for control problems are also mentioned.

I. INTRODUCTION

PRESENTED in [1]-[4] are various ladder form realizations (also known as lattice structures) of a class of exact least squares estimation algorithms for autoregressive (AR) and autoregressive moving-average (ARMA) modeling, with applications to speech modeling and synthesis problems, joint process estimation applications (e.g., "fast startup" equalizers, and "noise cancelling and inversion") and adaptive control. The least squares ladder algorithms are recursive both in time and order, and numerically efficient, requiring only O(N) operations per time-update, where N is the order of the algorithm. Besides their modular structures, nice stability properties, robustness, insensitivity to roundoff noise, and successive orthogonalization and decoupling of the residuals of each order, all of which are by now well known for the ladder canonical forms (see, e.g., [27], [28], [32], [35]), the least squares ladder algorithms also exhibit excellent convergence behavior and have very fast parameter tracking capability, as demonstrated in [5], [21]. The fast tracking behavior is due to the "exact time-update" formalism in which the time-updated quantities are the exact solutions to a least squares problem at each time step. In other words, the algorithm computes at each time step the set of model parameters, e.g., the reflection coefficients of the ladder form, which minimizes the sum of the squared errors for the input data up to that time. Thus it gives an "exact least squares" solution to the recursive estimation problem and is very different from the conventional gradient search methods, e.g., the least mean-square (LMS) adaptive algorithm [34], which are approximate iterative procedures.

Manuscript received May 30, 1980; revised December 1, 1980. This work was supported in part by the Defense Advanced Research Projects Agency under Contract MDA903-80-C-0331, in part by the Air Force Office of Scientific Research, Air Force Systems Command, under Contract AF49620-79-C-0053, the National Science Foundation under Grant ENG-78-10003, and the Joint Services Electronics Program under Contract DAAG29-79-C-0047.
D. T. L. Lee was with the Information Systems Laboratory, Stanford University, Stanford, CA. He is now with the IBM San Jose Research Laboratory, San Jose, CA 95193.
M. Morf is with the Information Systems Laboratory, Department of Electrical Engineering, Stanford University, Stanford, CA 94305.
B. Friedlander is with Systems Control, Inc., Palo Alto, CA 94304.

The extremely rapid startup performance of the exact least squares algorithms and their improvements over the gradient type algorithms have recently been confirmed by other authors [6]-[8], [30], [31], [38] in the context of adaptive equalization. In particular, the computer simulation results given by [6], [7], [30], [31] on comparisons of the convergence rates of three different algorithms (the LMS gradient, the gradient ladder, and the least squares ladder) clearly demonstrated the superior performance of the least squares algorithms over the gradient algorithms.

We emphasize that the exact least squares update equations, see, e.g., [3], are so deceptively similar to gradient based algorithms, see, e.g., [29], that one can be misled into assuming that they give similar results. In particular, the number of operations, e.g., the number of multiplications, required by the two types of algorithms is very close. A comparison of the operation counts for the algorithms in [3] and [29], for instance, shows that the exact least squares algorithm requires one less operation than the gradient based one. In practice, however, the actual operation counts are highly dependent on both the hardware and the software implementation.

The only significant difference between the two types of algorithms turns out to be a data dependent variable in the least squares case that replaces the constant (normalized) step size in the gradient case. This variable, a function of the residual energy, has a Gaussian likelihood interpretation. For likely data samples, the step size is roughly constant, on the order of magnitude of the "optimal" step size of gradient based algorithms. For very unlikely samples, the gain can become very large, far outside the steady-state stability bounds of the gradient based algorithms, thereby drastically improving the tracking behavior while still preserving the stability of the least squares algorithms.



For low-order models, stationary Gaussian type signals and noises, and well-conditioned covariances or spectra, the gradient algorithms may perform similarly to the exact least squares ladder algorithms. If any of these assumptions is violated, however, the difference in performance can be very dramatic, see, e.g., [5], [30], [31]. This is especially true for man-made (i.e., non-Gaussian) signals, a fact that has resulted, for instance, in very high performance intrusion detection systems at Sandia Laboratories (private communication).

In this paper we present several interesting recent developments of least squares ladder algorithms, namely the square root normalized forms. These normalized ladder recursions have fewer storage requirements and much lower computational complexity than the unnormalized forms. In the simplest case the square root normalized ladder forms offer a factor of three reduction in complexity, i.e., from nine equations to three equations per ladder section, and the potential of fixed point implementation due to the fact that all internal variables are less than one in magnitude. We may note that the normalized ladder recursions can actually be expressed as a sequence of circular and hyperbolic rotations, and the requirement for efficient computation of these rotations can be satisfied via the use of the so-called COordinate Rotation DIgital Computer (CORDIC) or other bit-recursive algorithms, see, e.g., [16], [36], [37].

Some of the nice structural features of the normalized ladder forms, such as stability and excellent roundoff noise properties, were discussed in [9]. The improved performance of the square root normalized algorithms over the unnormalized ones has been illustrated in [10], in the context of multichannel spectral estimation.

The paper is organized as follows. In Section II, we present a geometrical formalism that gives a direct derivation of the least squares recursions, for both order- and time-update recursions, without using intermediate quantities such as forward and backward predictors. This formalism is based on a Hilbert space approach, where quantities such as inner products, orthogonal projections, and subspace decompositions are appropriately defined. The development appears to be similar to the approaches of [11] and [12], in the sense that inner products can be viewed as approximate expectations, but in fact they are quite different due to fundamental differences in the definitions of the underlying linear spaces. In our approach the linear spaces are spanned by the actual observed data, and direct time-update recursions are obtained by shifting the observed data. In the formalism of [11] and [12] time-invariant second-order statistical information (the correlation function) is used to define the underlying linear space; therefore, the recursions are restricted to stationary processes only, i.e., no direct time-update recursions of the models can be obtained. In the present approach we can track the process dynamics by introducing appropriate inner products, for instance one with exponential weights. Our results also naturally extend to the so-called α-stationary or finite-rank processes (nonstationary processes that are close, in a certain sense, to stationarity) that were introduced in [13]-[15]. The treatment of this nonstationary case is the subject of another paper [16] and therefore we do not present it here. We note that a different geometric approach to the (unnormalized) least squares recursions has recently been developed independently by Shensa [17].

Given the order- and time-update recursions for the unnormalized variables, and a suitably defined normalization for these variables, the normalized ladder recursions are obtained in Section III. In the normalized forms, all the variables are first normalized to unit variance; then an additional normalization by an angle variable that measures the closeness between the two linear subspaces enables us to reduce the complexity of the ladder algorithms significantly. The order- and time-update recursions for the signal variances no longer appear in the algorithm since they are embedded into the signals themselves. In fact, the final form of a normalized ladder linear predictor consists of only three recursions, the order updates of the two normalized signals (forward and backward innovations) and the time-update of the reflection coefficients, all less than one in magnitude, compared to the nine recursions required in the unnormalized algorithms. In addition, exponentially weighted normalized least squares forms, useful in tracking time-varying parameters, have exactly the same recursions, except for the zeroth-order or gain update, again a remarkable property.

In Section IV, we obtain normalized algorithms for the joint process estimator, with applications to adaptive equalizers, noise cancellers, and inverse systems for control applications. In Section V, we show how to obtain the ARMA ladder recursions for pole-zero modeling in normalized form. However, we will only give a brief overview there, while the full treatment is given in [18].

II. LEAST SQUARES LADDER RECURSIONS: GEOMETRICAL APPROACH

Least squares problems have such a long and rich history that we need not elaborate on it here. In this paper we shall use the basic fact that the solutions of such problems can be interpreted as projection operations on properly defined spaces. We will explain the projection framework by an example before giving the formal definitions and derivations.

Consider the basic problem of fitting an Nth-order autoregressive model to a data set {y_t, 0 ≤ t ≤ T} (consider the scalar case for the moment). The problem is to find a set of predictor coefficients {A_{N,i}, i = 1, ..., N} that will minimize the sum of the squared prediction errors

$$ \sum_{t=0}^{T} \left[ \epsilon_N(t) \right]^2 \qquad (1) $$

where

$$ \epsilon_N(t) \triangleq y_t - \hat{y}_{t \mid t-1, \ldots, t-N} = y_t - \sum_{i=1}^{N} A_{N,i}\, y_{t-i} . $$

Writing (1) in matrix form we have

$$ \epsilon_N = y - Y A_N \qquad (2) $$

where

$$ \epsilon'_N = [\epsilon_N(0), \cdots, \epsilon_N(T)], \quad y' = [y_0, \cdots, y_T], \quad A'_N = [A_{N,1}, \cdots, A_{N,N}] $$

(' denotes transpose) and Y spans the subspace of past observations, i.e., Y is the (T+1)-by-N matrix whose row for time t consists of the N past samples [y_{t-1}, y_{t-2}, ..., y_{t-N}]:

$$ Y = \begin{bmatrix} y_{-1} & y_{-2} & \cdots & y_{-N} \\ \vdots & \vdots & & \vdots \\ y_{T-1} & y_{T-2} & \cdots & y_{T-N} \end{bmatrix} . $$

The least squares solution to (2) is then given by the pseudo-inverse

$$ A_N = (Y'Y)^{-1} Y' y \qquad (3) $$

and the prediction errors of (2) can be expressed in the operator form

$$ \epsilon_N = (I - P_N)\, y \qquad (4a) $$

$$ P_N \triangleq Y (Y'Y)^{-1} Y' \qquad (4b) $$

where the matrix P_N can be interpreted as a projection operator. It projects vectors onto the subspace spanned by the columns of Y, i.e., of past observations. Its orthogonal complement P_N^⊥ ≜ (I − P_N) is also a projection operator, which projects vectors onto a subspace that is orthogonal to the subspace associated with P_N. In other words, the prediction error ε_N = P_N^⊥ y = y − ŷ is orthogonal to the predicted estimate ŷ = P_N y, i.e., the prediction of the next observation given the past observations Y (delayed versions of y). The corresponding interpretation in the stochastic case is quite clear.
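As a concrete illustration of (3)-(4), the following NumPy sketch (not from the paper; the data, model order, and pre-windowing convention are illustrative choices) fits an AR model by the pseudo-inverse and checks that the residual is orthogonal to the predicted estimate and to the columns of Y.

import numpy as np

rng = np.random.default_rng(0)
T, N = 200, 4                      # number of samples (t = 0..T) and model order, illustrative values
y = rng.standard_normal(T + 1)     # observed data y_0, ..., y_T

# Matrix of past observations Y, pre-windowed with zeros (y_t = 0 for t < 0).
ypad = np.concatenate([np.zeros(N), y])
Y = np.column_stack([ypad[N - i:N - i + T + 1] for i in range(1, N + 1)])

A_N = np.linalg.solve(Y.T @ Y, Y.T @ y)   # least squares predictor, eq. (3)
y_hat = Y @ A_N                           # predicted estimate  P_N y
eps_N = y - y_hat                         # prediction error   (I - P_N) y, eq. (4a)

# The residual is orthogonal to the predicted estimate (and to the columns of Y).
print(np.allclose(eps_N @ y_hat, 0.0), np.allclose(Y.T @ eps_N, 0.0))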

Since a subspace is represented by its projection operator, the orthogonal decomposition of subspaces can be represented in terms of an additive decomposition of the respective projection operators. By finding a decomposition of P_N such that P_N = P_{N-1} + Q_{N-1}, where Q_{N-1} is orthogonal to P_{N-1}, we can obtain order-update recursions for the prediction errors. The ladder recursion for updating the order N is, in fact, obtained directly from the projection operator representation of a successive orthogonalization of the subspace by a Gram-Schmidt procedure. The well-known Levinson algorithm is indeed an orthogonalization of the predictors {A_n, n = 0, 1, ..., N} with respect to the covariance function of a stationary process, see, e.g., [9].

The projection operator P_N is obviously a function of time T. When a new data point y_{T+1} is added to the observations, we would like to express P_{N,T+1} as the sum of P_{N,T} and a correction term due to y_{T+1}. Using the projection operator approach we obtain this time-update recursion by calculating the exact correction that is needed; interestingly, it turns out that the magnitude of this correction is proportional to the closeness (in terms of angle) between the new subspace, which includes the new input data, and the old subspace. We note that all other approximation schemes, such as the gradient algorithms, essentially assume that this angle is small and constant.

Our time-update result also solves a very important problem in the stochastic case: the problem of finding ladder form whitening filters for nonstationary processes, in particular, the class of so-called α-stationary or finite-rank processes [13]-[16], [21].

With this brief overview, we begin by defining the sample-product space and the inner product associated with it. We consider the full vector input/output case throughout this section, i.e., we assume that we are given multichannel data; a vector of observations corresponds, for instance, to the vector of outputs of all channels sampled at the same time instant. The specialization to scalars and the extension to complex inputs or even matrix valued inputs are straightforward and do not require any additional tools.

A. Projections and Order-Update Recursions

Sample Space

Suppose we have observed the time samples {y_t ∈ R^m, 0 ≤ t ≤ T} of an m-dimensional real-valued vector time series and we would like to characterize the linear space spanned by these data samples. One convenient characterization considers such a data sequence as a vector in the sample-product space defined in the following way.

Define the ket-vector (actually a (T+1)-by-m matrix or vector array) by

$$ |y\rangle_T = [y_0, y_1, \ldots, y_T]' . \qquad (5) $$

Then |y⟩_T lies in the sample-product space

$$ H_T = R^m \times R^m \times \cdots \times R^m = \prod_{0}^{T} R^m $$

i.e., H_T is the linear space spanned by the T+1 vector observations. This definition of |y⟩_T and H_T is a particularly convenient choice that leads to a natural inner product, to be defined shortly. However, one can work with a general Hilbert space H_T and suitably define elements |y⟩_T, and our formalism still carries through. In that case, one is not obliged to make any particular choice in the definition of the inner product. We shall henceforth consider the elements of H_T as ket-vector arrays with matrices as operators acting on H_T, and we do not insist that our operators are matrices.

Given |x⟩_T, |y⟩_T ∈ H_T, we define the natural matrix inner product (a scalar inner product in the single channel case) by

$$ \langle x \mid y \rangle_T = \sum_{t=0}^{T} x_t\, y'_t . \qquad (6) $$

Therefore, the adjoint or transpose of |x⟩_T is represented by a bra-vector array ⟨x|_T = [x_0, ..., x_T].
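As a quick illustration of (6) in the multichannel case (made-up dimensions, not from the paper), the matrix inner product is just a sum of outer products of the sample vectors:

import numpy as np

rng = np.random.default_rng(3)
T, m = 10, 2
x = rng.standard_normal((T + 1, m))                         # |x>_T as a (T+1)-by-m array
y = rng.standard_normal((T + 1, m))                         # |y>_T
inner = sum(np.outer(x[t], y[t]) for t in range(T + 1))     # <x|y>_T, an m-by-m matrix
print(np.allclose(inner, x.T @ y))                          # equivalently x' y in array form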


We denote a consistent family of norms on H_T by the symbol ‖·‖. A common norm is the (scalar) Frobenius norm defined by

$$ \| \, |x\rangle_T \|_F^2 = \sum_{t=0}^{T} x'_t\, x_t . $$

Other useful norms are defined via suitable matrix norms of ⟨x|y⟩_T. For instance, we can define

$$ \| \langle x \mid y \rangle_T \| = \mathrm{tr}\{ U' \langle x \mid y \rangle_T V \} = \sum_{i=1}^{m} \sigma_i $$

where ⟨x|y⟩_T = U Σ V' is the singular value decomposition, i.e., UU' = I_m = VV' and Σ = diag{σ_i}. We may also note that |x⟩_T does not have to lie in the same space as |y⟩_T as long as ⟨x|y⟩_T can be properly defined.

We note that the matrix inner product ⟨y|y⟩_T is an approximation to (a multiple of) the covariance E y_t y'_t.

Orthogonal Projection

Two vectors |x⟩_T, |y⟩_T ∈ H_T are orthogonal to each other (|x⟩_T ⊥ |y⟩_T) if

$$ \| \, |x\rangle_T \| \le \| \, |x\rangle_T + |y\rangle_T A \, \| \qquad (8) $$

for all m-by-m "scaling" matrices A (a true scaling in the single channel case). This is a more general notion of orthogonality in a linear space than the usual one in an inner product space, where |x⟩ ⊥ |y⟩ if and only if ⟨x|y⟩ = 0. We say that M ⊥ N for subspaces of H_T if |m⟩ ⊥ |n⟩ for all |m⟩ ∈ M and |n⟩ ∈ N.

The projection operator P_y that projects any vector |x⟩_T ∈ H_T on |y⟩_T is defined by

$$ P_y \triangleq |y\rangle_T \langle y \mid y \rangle_T^{-1} \langle y|_T \qquad (9) $$

so that P_y |x⟩_T is the orthogonal projection of |x⟩_T on |y⟩_T, and P_y^⊥ |x⟩_T = (I − P_y) |x⟩_T is the orthogonal complement of |x⟩_T with respect to |y⟩_T, i.e., P_y^⊥ |x⟩_T ⊥ |y⟩_T. The idempotent property of P_y and P_y^⊥, that P² = P, can be verified easily.

Here, we have to make the assumption that |y⟩_T is a full rank process so that the inverse ⟨y|y⟩_T^{-1} exists.¹ By a full rank process, we mean that there exists no linear relation between the components of the sample vector array, i.e., the rank of |y⟩_T must be m, the dimension of y_t.

Subspace of Past Observations

Let |z^{-1}y⟩_T be the one-step delayed (or right-shifted) observation vector defined by

$$ |z^{-1}y\rangle_T \triangleq [0, y_0, \ldots, y_{T-1}]' . \qquad (10) $$

In this definition, we have "pre-windowed" the data by zeros, i.e., y_t = 0, t < 0. Note that |z^{-n}y⟩_T ∈ H_T for 0 ≤ n ≤ T.

Let Y_{1,n,T} be the subspace of H_T spanned by

$$ \{ |z^{-1}y\rangle_T, \ldots, |z^{-n}y\rangle_T \}, \qquad n \le T $$

¹Actually, in the degenerate case pseudoinverses can be used instead, if the inner products are defined modulo orthogonal scaling.

and let P_{1,n,T} be the projection operator on Y_{1,n,T} defined by

$$ P_{1,n,T} \triangleq |Y_{1,n}\rangle_T \langle Y_{1,n} \mid Y_{1,n} \rangle_T^{-1} \langle Y_{1,n}|_T \qquad (11) $$

where

$$ |Y_{1,n}\rangle_T = [\, |z^{-1}y\rangle_T, \ldots, |z^{-n}y\rangle_T \,] \qquad (12) $$

$$ \langle Y_{1,n}|_T = |Y_{1,n}\rangle'_T . \qquad (13) $$

Then P^{⊥}_{1,n,T} = I − P_{1,n,T} is the projection operator on the orthogonal complement of Y_{1,n,T} in H_T, so that

$$ P_{1,n,T} |x\rangle_T = |x\rangle_T, \quad \text{if } |x\rangle_T \in Y_{1,n,T} \qquad (14) $$

and

$$ P^{\perp}_{1,n,T} |x\rangle_T \perp |z^{-i}y\rangle_T, \quad 1 \le i \le n, \quad \text{for } |x\rangle_T \in H_T . \qquad (15) $$

The invertibility of ⟨Y_{1,n}|Y_{1,n}⟩_T is straightforward to prove. Similar definitions for P_{0,n-1,T} are also obtained.

Coordinate Map

We define a coordinate map π: H_T → R^m by π(|y⟩_T) = y_T, i.e., π recovers or picks out the most recent time sample of |y⟩_T. Based on our sample space formulation, we can represent π by letting |π⟩_T be the Tth unit vector, i.e., |π⟩_T = [0, 0, ..., 0, 1]', so that

$$ \langle y \mid \pi \rangle_T = y_T, \qquad \langle \pi \mid y \rangle_T = y'_T . \qquad (16) $$

Similarly,

$$ \langle z^{-i}y \mid \pi \rangle_T = y_{T-i} $$

and

$$ \langle Y_{1,n} \mid \pi \rangle_T = [y_{T-1}, \ldots, y_{T-n}]' . $$

We define a coordinate projector by P_π ≜ |π⟩_T ⟨π|_T, such that

$$ P_\pi |y\rangle_T = [0, 0, \ldots, 0, y_T]' \qquad (17) $$

and

$$ P^{\perp}_\pi |y\rangle_T = (I - P_\pi) |y\rangle_T = [y_0, y_1, \ldots, y_{T-1}, 0]' . \qquad (18) $$

In the development of the time-update theorem, the coordinate map π plays a central role in expressing the correction terms in terms of observable variables, see also [21].

Forward and Backward Prediction Errors (Residuals)

The projection of |y⟩_T on the subspace Y_{1,n,T} gives the (deterministic) linear least squares estimate of |y⟩_T based on the past n observations. In other words, the nth-order forward prediction error vector is obtained by

$$ |\epsilon_n\rangle_T = |y\rangle_T - P_{1,n,T}|y\rangle_T = P^{\perp}_{1,n,T}|y\rangle_T \qquad (19) $$

where the components of |ε_n⟩_T lie in the subspace Y_{0,n,T} but are orthogonal to Y_{1,n,T}. The norm of the corresponding (unscaled) prediction error covariance, which is now minimized, is given by

$$ R^{\epsilon}_{n,T} = \langle \epsilon_n \mid \epsilon_n \rangle_T = \langle \epsilon_n \mid y \rangle_T = \langle y \mid \epsilon_n \rangle_T \qquad (20) $$

where we have made use of the idempotent property of projection operators

$$ P^{\perp}_{1,n,T}\, P^{\perp}_{1,n,T} = P^{\perp}_{1,n,T} . $$

To see how this projection approach is related to the prediction approach, we apply the coordinate map ⟨π| defined in (17) and get

$$ \langle \pi \mid \epsilon_n \rangle_T = \langle \pi \mid y \rangle_T - \langle \pi \mid Y_{1,n} \rangle_T \langle Y_{1,n} \mid Y_{1,n} \rangle_T^{-1} \langle Y_{1,n} \mid y \rangle_T \qquad (21) $$

i.e.,

$$ \epsilon'_{n,T} = y'_T - [y'_{T-1}, y'_{T-2}, \ldots, y'_{T-n}]\, A_{n,T} \qquad (22) $$

where

$$ A_{n,T} = \langle Y_{1,n} \mid Y_{1,n} \rangle_T^{-1} \langle Y_{1,n} \mid y \rangle_T \qquad (23) $$

is the linear least squares predictor.

Similarly, the nth-order backward prediction error vector is defined by

$$ |r_n\rangle_T = |z^{-n}y\rangle_T - P_{0,n-1,T}|z^{-n}y\rangle_T = P^{\perp}_{0,n-1,T}|z^{-n}y\rangle_T \qquad (24) $$

and its covariance by

$$ R^{r}_{n,T} = \langle r_n \mid r_n \rangle_T . \qquad (25) $$

A right-shifted (unit-delayed) |r_n⟩_T that will be useful in the ladder recursions is given by

$$ |z^{-1}r_n\rangle_T = P^{\perp}_{1,n,T}|z^{-n-1}y\rangle_T \qquad (26) $$

and its covariance by

$$ R^{r}_{n,T-1} = \langle z^{-1}r_n \mid z^{-1}r_n \rangle_T = \langle r_n \mid r_n \rangle_{T-1} . \qquad (27) $$

Decomposition of Subspaces

Since |ε_n⟩_T lies in the subspace Y_{0,n,T} but is orthogonal to Y_{1,n,T}, we can express Y_{0,n,T} as the direct sum of Y_{1,n,T} and |ε_n⟩_T, i.e.,

$$ Y_{0,n,T} = Y_{1,n,T} \oplus |\epsilon_n\rangle_T . \qquad (28) $$

It follows that the respective projection operators are decomposed in the following form:

$$ P_{0,n,T} = P_{1,n,T} + |\epsilon_n\rangle_T \langle \epsilon_n \mid \epsilon_n \rangle_T^{-1} \langle \epsilon_n |_T \qquad (29a) $$

and the orthogonal complements become

$$ P^{\perp}_{0,n,T} = P^{\perp}_{1,n,T} - |\epsilon_n\rangle_T \langle \epsilon_n \mid \epsilon_n \rangle_T^{-1} \langle \epsilon_n |_T . \qquad (29b) $$

Likewise we have

$$ P_{1,n+1,T} = P_{1,n,T} + |z^{-1}r_n\rangle_T \langle z^{-1}r_n \mid z^{-1}r_n \rangle_T^{-1} \langle z^{-1}r_n |_T \qquad (30a) $$

and

$$ P^{\perp}_{1,n+1,T} = P^{\perp}_{1,n,T} - |z^{-1}r_n\rangle_T \langle z^{-1}r_n \mid z^{-1}r_n \rangle_T^{-1} \langle z^{-1}r_n |_T . \qquad (30b) $$

These decompositions are the keys for obtaining the order-update recursions.

Ladder Recursions: Order-Updates

From (30), we can get order-update recursions for the forward prediction error vectors immediately by operating on |y⟩_T, i.e.,

$$ |\epsilon_{n+1}\rangle_T = |\epsilon_n\rangle_T - |z^{-1}r_n\rangle_T \langle z^{-1}r_n \mid z^{-1}r_n \rangle_T^{-1} \langle z^{-1}r_n \mid \epsilon_n \rangle_T . \qquad (31) $$

Projecting on the Tth component and taking transposes, we have the ladder variable order-update

$$ \epsilon_{n+1,T} = \epsilon_{n,T} - \Delta_{n+1,T}\, R^{r,-1}_{n,T-1}\, r_{n,T-1} \qquad (32) $$

where

$$ \Delta_{n+1,T} \triangleq \langle \epsilon_n \mid z^{-1}r_n \rangle_T = \langle y \mid P^{\perp}_{1,n,T} \mid z^{-n-1}y \rangle_T \qquad (33) $$

is the nth-order partial autocorrelation, or just partial correlation, i.e., it is the partial correlation between |y⟩_T and |z^{-n-1}y⟩_T "holding {|z^{-1}y⟩_T, ..., |z^{-n}y⟩_T} fixed."

Fig. 1. Geometric picture of forward and backward innovations.

From the geometric picture of Fig. 1, depicting the order-update of (33), it is clear that Δ_{n+1,T} is indeed the nth-order partial correlation, obtained by correlating |y⟩_T with the part of |z^{-n-1}y⟩_T that cannot be extracted from {|z^{-1}y⟩_T, ..., |z^{-n}y⟩_T}, i.e., the backward prediction residual |z^{-1}r_n⟩_T.

The order-update recursion for the forward prediction error covariance is given by

$$ \langle \epsilon_{n+1} \mid \epsilon_{n+1} \rangle_T = \langle \epsilon_n \mid \epsilon_n \rangle_T - \langle \epsilon_n \mid z^{-1}r_n \rangle_T \langle z^{-1}r_n \mid z^{-1}r_n \rangle_T^{-1} \langle z^{-1}r_n \mid \epsilon_n \rangle_T \qquad (34a) $$

or

$$ R^{\epsilon}_{n+1,T} = R^{\epsilon}_{n,T} - \Delta_{n+1,T}\, R^{r,-1}_{n,T-1}\, \Delta'_{n+1,T} . \qquad (34b) $$

Similarly, for the backward prediction error we use the projection operator decomposition of (29) and operate on |z^{-n-1}y⟩_T; we get

$$ |r_{n+1}\rangle_T = |z^{-1}r_n\rangle_T - |\epsilon_n\rangle_T \langle \epsilon_n \mid \epsilon_n \rangle_T^{-1} \langle \epsilon_n \mid z^{-1}r_n \rangle_T \qquad (35a) $$

or

$$ r_{n+1,T} = r_{n,T-1} - \Delta'_{n+1,T}\, R^{\epsilon,-1}_{n,T}\, \epsilon_{n,T} \qquad (35b) $$

with its covariance given by

$$ \langle r_{n+1} \mid r_{n+1} \rangle_T = \langle z^{-1}r_n \mid z^{-1}r_n \rangle_T - \langle z^{-1}r_n \mid \epsilon_n \rangle_T \langle \epsilon_n \mid \epsilon_n \rangle_T^{-1} \langle \epsilon_n \mid z^{-1}r_n \rangle_T \qquad (36a) $$

or

$$ R^{r}_{n+1,T} = R^{r}_{n,T-1} - \Delta'_{n+1,T}\, R^{\epsilon,-1}_{n,T}\, \Delta_{n+1,T} . \qquad (36b) $$

We can express the order-update recursions for the forward and backward prediction errors of (32) and (35) in the following form:

$$ \epsilon_{n+1,T} = \epsilon_{n,T} - K^{\epsilon}_{n+1,T}\, r_{n,T-1}, \qquad r_{n+1,T} = r_{n,T-1} - K^{r}_{n+1,T}\, \epsilon_{n,T} $$

giving the ladder structure for this recursion, as shown in Fig. 2, where

$$ K^{\epsilon}_{n+1,T} = \Delta_{n+1,T}\, R^{r,-1}_{n,T-1}, \qquad K^{r}_{n+1,T} = \Delta'_{n+1,T}\, R^{\epsilon,-1}_{n,T} $$

are the so-called reflection coefficients. This set of ladder recursions is identical to the multichannel extension of the Levinson algorithm [10] for the case when covariance information rather than data samples is given.
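To make the order recursions concrete, the following scalar NumPy sketch (illustrative only, not from the paper; pre-windowed data, a fixed data window, no exponential weighting) evaluates the delayed backward residual of (26) by brute-force projection at each order and then reads off the partial correlations (33) and reflection coefficients, propagating the forward residual via (31). It trades the O(N) recursive structure for clarity.

import numpy as np

def delayed(y, k):
    """|z^{-k} y>_T with pre-windowing: [0, ..., 0, y_0, ..., y_{T-k}]."""
    return np.concatenate([np.zeros(k), y[:len(y) - k]])

def exact_ls_ladder_orders(y, N):
    """Batch evaluation of eqs. (31)-(33) for scalar data (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    K_eps, K_r = [], []
    eps = y.copy()                                    # |eps_0>_T = |y>_T
    for n in range(N):
        # |z^{-1} r_n>_T = P^perp_{1,n,T} |z^{-(n+1)} y>_T, computed by direct projection
        if n == 0:
            r_del = delayed(y, 1)
        else:
            Y1n = np.column_stack([delayed(y, i) for i in range(1, n + 1)])
            target = delayed(y, n + 1)
            coef, *_ = np.linalg.lstsq(Y1n, target, rcond=None)
            r_del = target - Y1n @ coef
        delta = eps @ r_del                           # partial correlation Delta_{n+1,T}, eq. (33)
        k_eps = delta / (r_del @ r_del)               # K^eps_{n+1,T}
        k_r = delta / (eps @ eps)                     # K^r_{n+1,T}
        K_eps.append(k_eps); K_r.append(k_r)
        eps = eps - k_eps * r_del                     # eq. (31)/(32)
    return np.array(K_eps), np.array(K_r), eps

# illustrative usage
rng = np.random.default_rng(1)
K_eps, K_r, eps_N = exact_ls_ladder_orders(rng.standard_normal(500), 4)
print(K_eps, K_r, eps_N @ eps_N)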

B. Time-Update Recursions

In [1]-[3], we established the time-update recursions of the least squares ladder algorithms by showing that the time-updated quantities are the exact solutions to the normal equations at each time step. This exact method gives superior tracking behavior, because it eliminates any error that would be caused by approximate solutions, such as gradient-type methods. The techniques used in [1]-[3] are based on making use of certain "shift-invariance" properties of the normal equations associated with the problem. The exact time-updates for the linear predictors are then calculated by making use of such properties, and in turn the new prediction errors and partial correlations are computed.

In the geometric formulation, we work directly with the projection operators and their decompositions, and therefore the method provides an illuminating geometric picture to explain the exact time-update nature of the problem. Furthermore, the methods used are quite simple, and abstract operator theory is not needed. The only important concept involved is the notion of angles or gaps between two linear subspaces. We will show that the gains in the exact time-update recursions are directly adjusted by these angles or gaps, which explains the fast convergence behavior of the algorithm.

In this formalism, we can derive the time-update recursions using three approaches, none of which involves the normal equations or predictors.

(1) Geometric Method

In this method the projection operator is decomposed into one that projects on the past observations and another that generates the correction due to a new observation. The decomposition is based on a projection operator that is not orthogonal, but an "oblique" projection.

(2) Operator Derivative Method

In this approach a family of inner products is defined on H_T as T → ∞, and a time "differential" (or difference) operator of the following form is introduced:

$$ \nabla \langle x \mid y \rangle_T \triangleq (1 - z^{-1}) \langle x \mid y \rangle_T = \langle x \mid y \rangle_T - \langle x \mid y \rangle_{T-1} . \qquad (38) $$

This operator generates "differentiation" (or differencing) rules similar to the regular differentiation rules of functions, so that any operator can be updated in time by such an operator formalism.

(3) Gram-Schmidt Method

The third approach is to apply the Gram-Schmidt technique directly to the vectors in H_T. It can be viewed as an extension of certain square-root array methods [20].

Due to space limitations, we only present the first approach here, while the other approaches can be found in [21], [16].

Time-Update Recursions: Geometric Method

Recalling the definition of the coordinate map of (16), we introduce the new notations

$$ |y_\pi\rangle_T \triangleq P_\pi |y\rangle_T = [0, 0, \ldots, 0, y_T]' \qquad (39a) $$

$$ |y_-\rangle_T \triangleq P^{\perp}_\pi |y\rangle_T = [y_0, y_1, \ldots, y_{T-1}, 0]' . \qquad (39b) $$

Note that ⟨x_π|y_π⟩_T = ⟨x_π|y⟩_T = ⟨x|y_π⟩_T, and similarly for ⟨x_-|y_-⟩_T.

The objective here is to obtain an exact decomposition of the projection operator on |y⟩_T into one that projects on |y_-⟩_T, i.e., the previous or old projector, plus a correction or update that can be obtained solely from the current input data sample |y_π⟩_T, i.e., the observable components of our vectors in the sample-product space. Once such a decomposition of the projection operator is obtained, we can immediately derive from it the time-update recursions of variables such as the partial correlations, prediction errors, etc. Moreover, since our decomposition is an exact one, our recursive solutions are without any approximation.

Consider the simple case of expressing the projection of |x⟩_T onto a single vector |y⟩_T in terms of the projections on |y_-⟩_T and |y_π⟩_T, as illustrated in Fig. 3, where the vertical axis represents the axis spanned by |y_π⟩ and the horizontal axis represents its orthogonal complement, |y_-⟩_T. From now on, we drop the subscript T unless specified.

Fig. 3. Decomposition of projections.

One simple way to decompose the projection is by breaking |x⟩ into |x_π⟩ and |x_-⟩ and then projecting them onto |y⟩:

$$ P_y |x\rangle = P_y |x_-\rangle + P_y |x_\pi\rangle . \qquad (40a) $$

Writing the expression in full, we have

$$ |y\rangle \langle y \mid y \rangle^{-1} \langle y \mid x \rangle = |y\rangle \langle y \mid y \rangle^{-1} \langle y \mid x_- \rangle + |y\rangle \langle y \mid y \rangle^{-1} \langle y \mid x_\pi \rangle \qquad (40b) $$

$$ = |y\rangle \langle y \mid y \rangle^{-1} \langle y_- \mid x \rangle + |y\rangle \langle y \mid y \rangle^{-1} \langle y_\pi \mid x \rangle . \qquad (40c) $$

We can express the decomposition in another way by first breaking |y⟩ into |y_-⟩ and |y_π⟩ and then projecting |x⟩ onto them. Such a decomposition is facilitated by the definition of an oblique projection.

Oblique Projection

In our definition of projection operators so far, we have assumed implicitly that the projection is an orthogonal one, that is, P_y projects vectors onto |y⟩ in the direction orthogonal (or perpendicular) to |y⟩. Projections which are nonorthogonal (nonperpendicular) are called oblique. In particular, we define the oblique projection operator on |y⟩, denoted by P_{y-}, as one which projects vectors on |y⟩ in the direction orthogonal (perpendicular) not to |y⟩ but to |y_-⟩. Thus the oblique projection of |x⟩ onto |y⟩ is

$$ |\hat{x}\rangle = P_{y-} |x\rangle \triangleq |y\rangle \langle y_- \mid y \rangle^{-1} \langle y_- \mid x \rangle . \qquad (41) $$

It is straightforward to verify that the definition of P_{y-} satisfies the idempotent property of a projection operator, i.e., P_{y-} P_{y-} = P_{y-}, and so does its orthogonal complement P^{⊥}_{y-} ≜ I − P_{y-}, i.e., P^{⊥}_{y-} P^{⊥}_{y-} = P^{⊥}_{y-}. Also, one can readily verify that P^{⊥}_{y-} |x⟩ ⊥ |y_-⟩.

We call the result of this oblique projection the "predicted estimate" of |x⟩, i.e., |x̂⟩ is the estimate of |x⟩ by |y⟩ but based on the regression coefficient obtained at the previous time sample. The oblique projection is illustrated in Fig. 4.

Using P_{y-} and P^{⊥}_{y-}, we obtain an important decomposition of the projection operator, which leads to the main result, the exact time-update recursions. We first give an explanation of the decomposition using a simple geometric picture and then give the proof using projection operators.

The decomposition we seek is the following:

$$ P_y |x\rangle = P_{y-} |x\rangle + P_y P_\pi P^{\perp}_{y-} |x\rangle \qquad (42) $$

and is illustrated by the geometric picture of Fig. 3.

Fig. 4. Decomposition of projections by oblique projection.

In the figure, |x⟩ and |y⟩ are represented as vectors in R³, with |y⟩ decomposed into |y_π⟩, parallel to the z-axis, and |y_-⟩ lying in the horizontal plane. That is, the coordinate projection P_π extracts the vertical component of the vector in this geometric picture. The angle between |y_-⟩ and |y⟩, denoted by θ, is a measure of the closeness between the subspaces spanned by |y_-⟩ and |y⟩, respectively.

The first term on the right side of (42) (Fig. 4(a)) is the oblique projection of |x⟩ on |y⟩, based on the previous projection coefficient, i.e., on |y_-⟩. The second term is then the correction factor that makes up for the difference between the orthogonal and oblique projections. The presence of the P_π operator in the middle of the second term needs some explanation. Recall that P_π is the coordinate projection that extracts the current (observable) component of any vector in the sample-product space. It enables us to obtain the correction term in the decomposition directly from the current input data sample. In the geometric picture of Fig. 4, P^{⊥}_{y-}|x⟩ is represented by the vector b, P_π P^{⊥}_{y-}|x⟩ is c, which is now parallel to the z-axis and also in the plane spanned by |y⟩ and |y_-⟩. Projecting c on |y⟩, i.e., P_y P_π P^{⊥}_{y-}|x⟩, we have d; and together with a this gives us P_y|x⟩, the left-hand side (LHS) of (42).

We now give a proof of (42). The first term on the right side of (40c), call it a, can be rewritten as

$$ a = |y\rangle \langle y \mid y \rangle^{-1} \left[ \langle y_- \mid y \rangle \langle y_- \mid y \rangle^{-1} \right] \langle y_- \mid x \rangle $$
$$ \;\; = |y\rangle \langle y \mid y \rangle^{-1} \left[ \langle y \mid y \rangle - \langle y_\pi \mid y \rangle \right] \langle y_- \mid y \rangle^{-1} \langle y_- \mid x \rangle $$
$$ \;\; = |y\rangle \langle y_- \mid y \rangle^{-1} \langle y_- \mid x \rangle - |y\rangle \langle y \mid y \rangle^{-1} \langle y_\pi \mid y \rangle \langle y_- \mid y \rangle^{-1} \langle y_- \mid x \rangle $$
$$ \;\; = P_{y-}|x\rangle - P_y P_\pi P_{y-}|x\rangle $$

and adding the above to the remaining part of (40c) gives

$$ a + P_y P_\pi |x\rangle = P_{y-}|x\rangle + P_y P_\pi P^{\perp}_{y-}|x\rangle $$

which gives (42). Q.E.D.


It is straightforward to verify the idempotent property of the right-hand side (RHS) of the above decomposition. We go one step further to obtain the decomposition of P_y^⊥|x⟩ = |x⟩ − P_y|x⟩ using (42):

$$ P_y^{\perp}|x\rangle = |x\rangle - P_{y-}|x\rangle - P_y P_\pi P^{\perp}_{y-}|x\rangle $$
$$ \;\; = P^{\perp}_{y-}|x\rangle - P_y P_\pi P^{\perp}_{y-}|x\rangle $$
$$ \;\; = P^{\perp}_\pi P^{\perp}_{y-}|x\rangle + P^{\perp}_y P_\pi P^{\perp}_{y-}|x\rangle . \qquad (43) $$

The idempotent property of the RHS of (43) can also be readily verified. Thus, taking the inner product with itself and reintroducing the correct time indexes, we have

$$ \langle x \mid P_y^{\perp} \mid x \rangle_T = \langle x \mid P_y^{\perp} \mid x \rangle_{T-1} + \langle x \mid P_y^{\perp} \mid \pi \rangle_T \langle \pi \mid P^{\perp}_{y-} \mid x \rangle_T \qquad (44) $$

or, rewritten in the time difference notation of (38),

$$ \nabla \langle x \mid P_y^{\perp} \mid x \rangle_T = \langle x \mid P_y^{\perp} \mid \pi \rangle_T \langle \pi \mid P^{\perp}_{y-} \mid x \rangle_T . \qquad (45) $$

The result of (45) contains the main result of the exact least squares time-update recursions, where we have expressed the time-updates of the orthogonal projection operators in terms of the product of the most recent components of the projection results.

It remains to show that the most recent (observable) component of the oblique projection is actually proportional to that of the orthogonal projection. The most striking result is that the components are related to each other simply by a scale factor cos²θ that can be interpreted as a measure of closeness between the two subspaces. This measure can be directly updated from the prediction errors; hence we have a complete set of equations for time updating the desired estimates.

Angle Between Two Subspaces: Scalar Case

The notion of angles, or measures of separation and closeness between two subspaces [23], and the associated operator techniques for manipulating these angles, have been extremely useful in analyzing problems that involve any kind of geometric interpretation (e.g., [22]-[25]).

Let the angle between |y⟩ and |y_-⟩ (the horizontal axis) be denoted by θ, so that we have the geometric relations (we assume the scalar case for the moment)

$$ \| \, |y\rangle \| \sin\theta = \| \, |y_\pi\rangle \| \qquad (46) $$

$$ \| \, |y\rangle \| \cos\theta = \| \, |y_-\rangle \| . \qquad (47) $$

Thus θ or sin²θ is a measure of the gap between the subspaces spanned by |y⟩ and |y_-⟩, and also a measure of how much new information (innovation) is provided by the most recent component of the observations, i.e.,

sin²θ → 0 as ‖ |y_π⟩ ‖ → 0

and

sin²θ → 1 as ‖ |y_-⟩ ‖ → 0.

This notion of angle or gap between two linear subspaces can be generalized to higher dimensional subspaces, as well as to more "outputs," i.e., instead of a scalar π we might have, say, σ orthogonal coordinate maps π, or observable components, and an associated possibly indefinite measure.

In the n-dimensional case, consider the subspace Y_{1,n}; then we can define the angle θ_{1,n} between |Y_{1,n}⟩ and P^{⊥}_π|Y_{1,n}⟩ by

$$ \sin^2\theta_{1,n} = \langle \pi \mid P_{1,n} \mid \pi \rangle = \langle \pi \mid Y_{1,n} \rangle \langle Y_{1,n} \mid Y_{1,n} \rangle^{-1} \langle Y_{1,n} \mid \pi \rangle \qquad (48a) $$

or alternatively by

$$ \cos^2\theta_{1,n} = \langle \pi \mid P^{\perp}_{1,n} \mid \pi \rangle = 1 - \sin^2\theta_{1,n} . \qquad (48b) $$

The angles obey order-update recursions, which are obtained from the decompositions of the projection operators given by (29) and (30):

$$ \cos^2\theta_{1,n+1,T} = \cos^2\theta_{1,n,T} - r'_{n,T-1}\, R^{r,-1}_{n,T-1}\, r_{n,T-1} \qquad (49a) $$

$$ \cos^2\theta_{0,n,T} = \cos^2\theta_{1,n,T} - \epsilon'_{n,T}\, R^{\epsilon,-1}_{n,T}\, \epsilon_{n,T} . \qquad (49b) $$

The recursions show that the angles are actually updated and computed from the prediction errors. They are also important in the derivation of the normalized recursions. Note that we have also referred to the measure sin²θ_{1,n,T} using the symbol γ_{1,n,T} in [1]-[3].

With the angles defined, we want to derive the relation between the components of the oblique and orthogonal projections. Indeed we can show that [21]

$$ P_\pi P^{\perp}_y |x\rangle = P_\pi P^{\perp}_{y-} |x\rangle \cos^2\theta \qquad (50) $$

where

$$ \cos^2\theta = 1 - \langle \pi \mid y \rangle \langle y \mid y \rangle^{-1} \langle y \mid \pi \rangle . $$

See Fig. 3.
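A quick numeric check of (50) in the scalar case (random vectors, purely illustrative and not from the paper): the observable components of the orthogonal and oblique complements differ exactly by cos²θ.

import numpy as np

rng = np.random.default_rng(5)
y = rng.standard_normal(8)                     # |y>_T  (scalar channel, T+1 = 8 samples)
x = rng.standard_normal(8)                     # |x>_T
y_minus = y.copy(); y_minus[-1] = 0.0          # |y_->_T : observable component zeroed

cos2 = 1.0 - y[-1] ** 2 / (y @ y)              # cos^2(theta), as defined below eq. (50)
orth = x - y * (y @ x) / (y @ y)               # P_y^perp |x>
obli = x - y * (y_minus @ x) / (y_minus @ y)   # P_{y-}^perp |x>
print(np.allclose(orth[-1], obli[-1] * cos2))  # eq. (50)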

This is a very nice result because we can write the exact time-update recursions in a symmetrical form given by the following formula.

Exact Time-Update Formula I

Let Y_{1,n,T} be the linear subspace spanned by the observations {|z^{-1}y⟩_T, ..., |z^{-n}y⟩_T} and P_{1,n,T} its orthogonal projector. Then the exact time-update for the orthogonal complement P^{⊥}_{1,n,T} is given by

$$ \nabla \langle u \mid P^{\perp}_{1,n,T} \mid v \rangle_T \triangleq \langle u \mid P^{\perp}_{1,n,T} \mid v \rangle_T - \langle u \mid P^{\perp}_{1,n,T-1} \mid v \rangle_{T-1} = \langle u \mid P^{\perp}_{1,n,T} \mid \pi \rangle_T \langle \pi \mid P^{\perp}_{1,n,T} \mid v \rangle_T \sec^2\theta_{1,n,T} \qquad (51) $$

where

$$ \sec^2\theta_{1,n,T} = 1 / \langle \pi \mid P^{\perp}_{1,n,T} \mid \pi \rangle_T $$

for any |u⟩, |v⟩ ∈ H_T.

Partial Correlation and Prediction Error Covariance Time-Updates

We can immediately establish the time-update for the partial correlations of (33), so that

$$ \Delta_{n+1,T} = \Delta_{n+1,T-1} + \langle y \mid P^{\perp}_{1,n,T} \mid \pi \rangle_T \langle \pi \mid P^{\perp}_{1,n,T} \mid z^{-n-1}y \rangle_T \sec^2\theta_{1,n,T} $$
$$ \;\; = \Delta_{n+1,T-1} + \frac{\epsilon_{n,T}\, r'_{n,T-1}}{1 - \sin^2\theta_{1,n,T}} \qquad (52) $$

where

$$ \sin^2\theta_{1,n,T} = \langle \pi \mid Y_{1,n} \rangle_T \langle Y_{1,n} \mid Y_{1,n} \rangle_T^{-1} \langle Y_{1,n} \mid \pi \rangle_T \qquad (53) $$

and is equal to the likelihood variable γ_{n,T-1} as defined in the original exact time-update recursions in [1]-[3].

The time-updates for R^ε_{n,T} and R^r_{n,T} can be easily obtained as

$$ R^{\epsilon}_{n,T} = R^{\epsilon}_{n,T-1} + \epsilon_{n,T}\, \epsilon'_{n,T}\, \sec^2\theta_{1,n,T} \qquad (54a) $$

$$ R^{r}_{n,T} = R^{r}_{n,T-1} + r_{n,T}\, r'_{n,T}\, \sec^2\theta_{0,n-1,T} \qquad (54b) $$

where

$$ \sin^2\theta_{0,n-1,T} \triangleq \langle \pi \mid Y_{0,n-1} \rangle_T \langle Y_{0,n-1} \mid Y_{0,n-1} \rangle_T^{-1} \langle Y_{0,n-1} \mid \pi \rangle_T = \gamma_{n-1,T} . $$

Note that we have omitted the proof that sin²θ_{1,n} ≤ 1, a fact that can be easily verified from (53) by realizing that projection operators have eigenvalues less than one.

Remarks

1) In general, a measure of the difference between two subspaces can be obtained via suitable norms of trigonometric functions of the angle θ between them. In the exact time-update theorem, the correction term is inversely proportional to the cosine squared of this angle. Thus, the recursion can adjust its correction or gain according to how much new information (innovation) is provided by the new observations. Therefore, if there is any sizable change in the observed data, as indicated by large gaps or angles, the gains will be adapted very quickly to reflect the change. The geometric relationship thus explains the advantage of the exact updates over approximate methods, such as gradient-type techniques.

2) The trigonometric variable sin²θ_{1,n} also has an important interpretation as a likelihood variable. We have successfully used this likelihood variable as a detection test statistic in the context of pitch detection in speech processing [26].

3) Sliding Exponential Windows: A sliding exponential window that reduces the influence of past data exponentially backwards in time can be introduced into the definition of the inner products by simply inserting a diagonal weighting matrix of the form

$$ \langle x \mid y \rangle_T = \sum_{t=0}^{T} x_t\, \lambda^{T-t}\, y'_t, \qquad 0 < \lambda \le 1 . \qquad (55) $$

Under this new inner product, the order-update recursions retain the same form as in the unweighted case, and only a slight change is needed for the time-update recursions. If we define the time differential operator under a λ-weighted metric by

$$ \nabla_\lambda \triangleq 1 - \lambda z^{-1} \qquad (56) $$

then it can be proved that the time-update recursions for the projection operator have the same form as (52), and, for example, the time-update for Δ_{n+1} simply becomes

$$ \Delta_{n+1,T} = \lambda \Delta_{n+1,T-1} + \epsilon_{n,T}\, r'_{n,T-1}\, \sec^2\theta_{1,n,T} \qquad (57) $$

i.e., λ is a "forgetting factor" operating on the old estimate of Δ. Different weighting windows are possible, for instance a sliding or growing rectangular window as in the so-called covariance method, see, e.g., [5], [21], or more complicated windows.

We may note that the operator π, referred to as the coordinate map here, plays a central role in generalizing our algorithms to finite-rank or α-stationary processes, see, e.g., [21], [39], [40].
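The following scalar NumPy sketch shows one common way to arrange the exponentially weighted, pre-windowed unnormalized recursions of this section per time step and per stage (eqs. (32), (34b), (35b), (36b), (49a), (57)). It is illustrative, not taken verbatim from the paper: the forgetting factor, the small initialization constant delta0, and the roundoff guard on the angle variable are assumptions of this sketch.

import numpy as np

def rls_ladder(y, N, lam=0.98, delta0=1e-2):
    """Exponentially weighted exact least squares ladder (scalar, unnormalized), a sketch."""
    Delta = np.zeros(N)                  # partial correlations Delta_{n+1,T}
    r_prev = np.zeros(N + 1)             # backward residuals r_{n,T-1}
    Rr_prev = np.full(N + 1, delta0)     # backward energies R^r_{n,T-1}
    Re0 = delta0                         # zeroth-order energy R^eps_{0,T-1}
    K = np.zeros((len(y), N))            # forward reflection coefficients over time
    for T, yT in enumerate(y):
        e = yT                           # eps_{0,T} = y_T
        Re0 = lam * Re0 + yT * yT        # R^eps_{0,T} = R^r_{0,T}
        Re = Re0
        r_cur = np.zeros(N + 1); Rr_cur = np.zeros(N + 1)
        r_cur[0], Rr_cur[0] = yT, Re0
        gamma = 1.0                      # cos^2(theta_{1,0,T})
        for n in range(N):
            # time update of the partial correlation, eq. (57)
            Delta[n] = lam * Delta[n] + e * r_prev[n] / gamma
            # order updates of the residuals, eqs. (32) and (35b)
            e_next = e - (Delta[n] / Rr_prev[n]) * r_prev[n]
            r_cur[n + 1] = r_prev[n] - (Delta[n] / Re) * e
            # order updates of the energies, eqs. (34b) and (36b)
            Re_next = Re - Delta[n] ** 2 / Rr_prev[n]
            Rr_cur[n + 1] = Rr_prev[n] - Delta[n] ** 2 / Re
            # order update of the angle (likelihood) variable, eq. (49a), with a roundoff guard
            gamma = max(gamma - r_prev[n] ** 2 / Rr_prev[n], 1e-8)
            K[T, n] = Delta[n] / Rr_prev[n]      # K^eps_{n+1,T}
            e, Re = e_next, Re_next
        r_prev, Rr_prev = r_cur, Rr_cur
    return K

# illustrative usage on a synthetic AR(2) process
rng = np.random.default_rng(0)
x = np.zeros(3000)
for t in range(2, len(x)):
    x[t] = 1.2 * x[t - 1] - 0.5 * x[t - 2] + rng.standard_normal()
print(rls_ladder(x, N=3)[-1])    # reflection coefficients at the final time step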

III. NORMALIZED LADDER RECURSIONS

In this section we present the normalized ladder recursions using a normalization scheme that enjoys some advantages. Our major objective is to reduce the complexity of the ladder recursions and at the same time improve the numerical conditioning of the ladder variables. In particular, we show that the ladder recursions are reduced to a minimal set of only three recursions per order per time sample, and that the magnitudes of the forward and backward innovations and of the reflection coefficients are always bounded above by unity.

For expository reasons, the normalization scheme is presented as a two-step procedure. A direct one-step proof is easy to obtain with hindsight. The first normalization is a variance normalization, where our variables are normalized by their respective covariance matrices. This step essentially gets rid of the two order-update recursions for the forward and backward covariances. The second normalization, which can also be considered a renormalization, is an angle normalization, where we get rid of the order- and time-update recursions for the angle variables. The end result is that the number of recursions is reduced to a minimal set of three, those of {ν̄}, {η̄}, and ρ. Furthermore, the exponential weight factor does not appear at all except at the zeroth order, thus allowing extreme flexibility in an adaptive implementation. We emphasize that our aim is to reduce the complexity of the recursion, and we shall not discuss in any detail the improvements and implementations in terms of numerical conditioning.

We first establish some conventions about matrix square roots. Define the (nonsymmetric) square root, or strictly speaking the factor matrix R^{1/2}, of any positive-definite matrix R as the lower triangular matrix satisfying R = R^{1/2} R^{T/2}, (R^{T/2} = (R^{1/2})'), where R^{1/2} is made unique by defining the diagonal elements to be positive. For convenience define also R^{-1/2} ≜ (R^{1/2})^{-1} and R^{-T/2} ≜ (R^{-1/2})'.

First Normalization: Variance Normalization
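In the multichannel case the factor R^{1/2} above is just the Cholesky factor; a short NumPy illustration of the convention (the example matrix is made up):

import numpy as np
R = np.array([[4.0, 1.0], [1.0, 3.0]])       # any positive-definite covariance
R_half = np.linalg.cholesky(R)               # lower triangular R^{1/2}, positive diagonal
print(np.allclose(R_half @ R_half.T, R))     # R = R^{1/2} R^{T/2}
R_negT_half = np.linalg.inv(R_half).T        # R^{-T/2} = (R^{-1/2})'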

We normalize the forward and backward innovations by the square roots of their respective covariances, and call them the first or unit variance normalized innovations

$$ |\nu_n\rangle_T \triangleq |\epsilon_n\rangle_T \langle \epsilon_n \mid \epsilon_n \rangle_T^{-T/2} \qquad (58a) $$

$$ |\eta_n\rangle_T \triangleq |r_n\rangle_T \langle r_n \mid r_n \rangle_T^{-T/2} \qquad (58b) $$

where

$$ \langle \epsilon_n \mid \epsilon_n \rangle_T = \langle \epsilon_n \mid \epsilon_n \rangle_T^{1/2} \langle \epsilon_n \mid \epsilon_n \rangle_T^{T/2} \quad \text{and} \quad \langle r_n \mid r_n \rangle_T = \langle r_n \mid r_n \rangle_T^{1/2} \langle r_n \mid r_n \rangle_T^{T/2} . $$

Now we can define the normalized partial correlations by

$$ \rho_{n+1,T} \triangleq \langle \nu_n \mid z^{-1}\eta_n \rangle_T = \langle \epsilon_n \mid \epsilon_n \rangle_T^{-1/2}\, \Delta_{n+1,T}\, \langle z^{-1}r_n \mid z^{-1}r_n \rangle_T^{-T/2} . \qquad (59) $$

The ladder recursions for the variance normalized innovations are obtained as follows. From the order-update recursions we get

$$ |\nu_{n+1}\rangle_T \langle \epsilon_{n+1} \mid \epsilon_{n+1} \rangle_T^{T/2} \langle \epsilon_n \mid \epsilon_n \rangle_T^{-T/2} = |\nu_n\rangle_T - |z^{-1}\eta_n\rangle_T\, \rho'_{n+1,T} . \qquad (60) $$

Squaring (60) and taking square roots, we obtain

$$ \langle \epsilon_{n+1} \mid \epsilon_{n+1} \rangle_T^{T/2} \langle \epsilon_n \mid \epsilon_n \rangle_T^{-T/2} = \left[ I - \rho_{n+1,T}\, \rho'_{n+1,T} \right]^{T/2} . \qquad (61) $$

Now (60) can be written as

$$ |\nu_{n+1}\rangle_T = \left[\, |\nu_n\rangle_T - |z^{-1}\eta_n\rangle_T\, \rho'_{n+1,T} \,\right] \left[ I - \rho_{n+1,T}\, \rho'_{n+1,T} \right]^{-T/2} . \qquad (62) $$

Similarly, for the backward innovations, we have

$$ |\eta_{n+1}\rangle_T = \left[\, |z^{-1}\eta_n\rangle_T - |\nu_n\rangle_T\, \rho_{n+1,T} \,\right] \left[ I - \rho'_{n+1,T}\, \rho_{n+1,T} \right]^{-T/2} . \qquad (63) $$

By applying the coordinate map π to the vectors, we get the normalized order recursions written in terms of the observable variables as

$$ \begin{bmatrix} \nu_{n+1,T} \\ \eta_{n+1,T} \end{bmatrix} = \begin{bmatrix} U_{n+1,T} & 0 \\ 0 & V_{n+1,T} \end{bmatrix}^{-1/2} \begin{bmatrix} I & -\rho_{n+1,T} \\ -\rho'_{n+1,T} & I \end{bmatrix} \begin{bmatrix} \nu_{n,T} \\ \eta_{n,T-1} \end{bmatrix} \qquad (64) $$

where

$$ U_{n+1,T} = I - \rho_{n+1,T}\, \rho'_{n+1,T}, \qquad V_{n+1,T} = I - \rho'_{n+1,T}\, \rho_{n+1,T} . $$

Note that the normalized partial correlation ρ_{n+1,T} has singular values of magnitude less than one when {y_t} is a full rank process.

Second Normalization: Angle Normalization

Rewrite the time-update recursions with exponential weighting as

$$ \lambda \langle \epsilon_n \mid \epsilon_n \rangle_{T-1} = \langle \epsilon_n \mid \epsilon_n \rangle_T - \langle \epsilon_n \mid \pi \rangle_T \langle \pi \mid \epsilon_n \rangle_T \sec^2\theta_{1,n,T} . \qquad (65) $$

Define the second (or "information"-) normalized forward innovations by

$$ |\bar{\nu}_n\rangle_T \triangleq |\nu_n\rangle_T \sec\theta_{1,n,T} . \qquad (66) $$

The geometric meaning of this normalization can best be illustrated by the geometric picture of Fig. 4. Applying the coordinate map π to (66) gives us ν̄_{n,T} = ν_{n,T} sec θ_{1,n,T}, whose magnitude is equal to the projection of |ν_n⟩ on the plane spanned by |y⟩ and |y_-⟩, which is a more relaxed subspace than |π⟩. Therefore, the requirement that we must work with the natural observables, i.e., ν_{n,T} and η_{n,T}, is too restrictive, and by relaxing this restriction a little, which is what the second normalization is doing, we save one projection operation while the decomposition of the projection operator is still maintained. In other words, by propagating the second normalized variables, we still get the same answer for the reflection coefficients, but the time- and order-update recursions for the angle variables will not be needed.

We continue by obtaining the following relation (it will prove to be very useful later):

$$ \langle \epsilon_n \mid \epsilon_n \rangle_{T-1}^{T/2} \langle \epsilon_n \mid \epsilon_n \rangle_T^{-T/2} = \lambda^{-1/2} \left[ I - \langle \bar{\nu}_n \mid \pi \rangle_T \langle \pi \mid \bar{\nu}_n \rangle_T \right]^{T/2} . \qquad (67) $$

Similarly, define the second normalized backward innovations by

$$ |\bar{\eta}_n\rangle_T \triangleq |\eta_n\rangle_T \sec\theta_{0,n-1,T} \qquad (68a) $$

and

$$ |z^{-1}\bar{\eta}_n\rangle_T = |z^{-1}\eta_n\rangle_T \sec\theta_{1,n,T} \qquad (68b) $$

so that we have

$$ \langle z^{-1}r_n \mid z^{-1}r_n \rangle_{T-1}^{T/2} \langle z^{-1}r_n \mid z^{-1}r_n \rangle_T^{-T/2} = \lambda^{-1/2} \left[ I - \langle z^{-1}\bar{\eta}_n \mid \pi \rangle_T \langle \pi \mid z^{-1}\bar{\eta}_n \rangle_T \right]^{T/2} . \qquad (69) $$

Then, we can immediately derive an important time-update recursion for the normalized partial correlations. Normalize the quantities in (59) accordingly and include the exponential weighting; then

$$ \rho_{n+1,T} = \lambda\, \langle \epsilon_n \mid \epsilon_n \rangle_T^{-1/2} \langle \epsilon_n \mid \epsilon_n \rangle_{T-1}^{1/2}\; \rho_{n+1,T-1}\; \langle z^{-1}r_n \mid z^{-1}r_n \rangle_{T-1}^{T/2} \langle z^{-1}r_n \mid z^{-1}r_n \rangle_T^{-T/2} + \langle \bar{\nu}_n \mid \pi \rangle_T \langle \pi \mid z^{-1}\bar{\eta}_n \rangle_T . \qquad (70) $$

However, given the results of (67) and (69), (70) is reduced to the component form

$$ \rho_{n+1,T} = \left[ I - \bar{\nu}_{n,T}\, \bar{\nu}'_{n,T} \right]^{1/2} \rho_{n+1,T-1} \left[ I - \bar{\eta}_{n,T-1}\, \bar{\eta}'_{n,T-1} \right]^{T/2} + \bar{\nu}_{n,T}\, \bar{\eta}'_{n,T-1} . \qquad (71) $$

Note that the exponential weight λ does not appear in the above recursion, since we have combined the three time-update recursions for R^ε_{n,T}, R^r_{n,T}, and Δ_{n+1,T} into only one recursion for ρ_{n+1,T}, with λ being conveniently cancelled out! This is actually quite surprising, since from the unnormalized recursions we can see that λ does indeed have an influence on the ρ's. We must conclude from this that the magnitudes of ν̄ and η̄ carry the λ information.

Ladder Recursions

The ladder recursions for the second normalized innovations are obtained as follows. Using (62) and (66) we obtain

$$ |\bar{\nu}_{n+1}\rangle_T = \left[\, |\nu_n\rangle_T - |z^{-1}\eta_n\rangle_T\, \rho'_{n+1,T} \,\right] \left[ I - \rho_{n+1,T}\, \rho'_{n+1,T} \right]^{-T/2} \sec\theta_{1,n+1,T} . \qquad (72) $$

From the subspace decomposition given by

$$ P^{\perp}_{1,n+1,T} = P^{\perp}_{1,n,T} - |z^{-1}\eta_n\rangle_T \langle z^{-1}\eta_n|_T $$

we get

$$ \sec\theta_{1,n+1,T} = \left[ I - \bar{\eta}_{n,T-1}\, \bar{\eta}'_{n,T-1} \right]^{-1/2} \sec\theta_{1,n,T} . \qquad (73) $$

Therefore, (72) can be written in component form as

$$ \bar{\nu}_{n+1,T} = \left[ I - \rho_{n+1,T}\, \rho'_{n+1,T} \right]^{-1/2} \left[ \bar{\nu}_{n,T} - \rho_{n+1,T}\, \bar{\eta}_{n,T-1} \right] \left[ I - \bar{\eta}'_{n,T-1}\, \bar{\eta}_{n,T-1} \right]^{-1/2} . \qquad (74) $$

Similarly, we get the recursion for |η̄_n⟩:

$$ \bar{\eta}_{n+1,T} = \left[ I - \rho'_{n+1,T}\, \rho_{n+1,T} \right]^{-1/2} \left[ \bar{\eta}_{n,T-1} - \rho'_{n+1,T}\, \bar{\nu}_{n,T} \right] \left[ I - \bar{\nu}'_{n,T}\, \bar{\nu}_{n,T} \right]^{-1/2} . \qquad (75) $$

We have reduced the ladder recursions to just three equations per stage, i.e., (71), (74), and (75). It is plausible that this is the absolute minimum obtainable, i.e., one equation each for computing the partial outputs {ν̄}, the states {η̄}, and the parameters {ρ}.
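For reference, in the single channel case the three recursions specialize to the following scalar forms (not spelled out in the text above, but they follow directly from (71), (74), and (75) by treating all quantities as scalars):

$$ \bar{\nu}_{n+1,T} = \frac{\bar{\nu}_{n,T} - \rho_{n+1,T}\, \bar{\eta}_{n,T-1}}{\sqrt{(1 - \rho^2_{n+1,T})(1 - \bar{\eta}^2_{n,T-1})}}, \qquad \bar{\eta}_{n+1,T} = \frac{\bar{\eta}_{n,T-1} - \rho_{n+1,T}\, \bar{\nu}_{n,T}}{\sqrt{(1 - \rho^2_{n+1,T})(1 - \bar{\nu}^2_{n,T})}}, $$

$$ \rho_{n+1,T} = \rho_{n+1,T-1} \sqrt{(1 - \bar{\nu}^2_{n,T})(1 - \bar{\eta}^2_{n,T-1})} + \bar{\nu}_{n,T}\, \bar{\eta}_{n,T-1} . $$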

We can rewrite the ladder recursions in a very compact way. For notational convenience, and also because of important connections with rotations, we define the "complement" and the associated inverses and transposes of a matrix M as

$$ M^{c} \triangleq [I - MM']^{1/2}, \quad M^{-c} \triangleq [I - MM']^{-1/2}, \quad M^{cT} \triangleq [I - MM']^{T/2} $$
$$ M'^{c} \triangleq [I - M'M]^{1/2}, \quad M'^{-c} \triangleq [I - M'M]^{-1/2}, \quad M'^{cT} \triangleq [I - M'M]^{T/2} . $$

Then we can rewrite (71), (74), and (75) as

$$ \bar{\nu}_{n+1,T} = \rho^{-c}_{n+1,T} \left[ \bar{\nu}_{n,T} - \rho_{n+1,T}\, \bar{\eta}_{n,T-1} \right] \bar{\eta}'^{-c}_{n,T-1} \qquad (76a) $$

$$ \bar{\eta}_{n+1,T} = \rho'^{-c}_{n+1,T} \left[ \bar{\eta}_{n,T-1} - \rho'_{n+1,T}\, \bar{\nu}_{n,T} \right] \bar{\nu}'^{-c}_{n,T} \qquad (76b) $$

$$ \rho_{n+1,T} = \bar{\nu}^{c}_{n,T}\, \rho_{n+1,T-1}\, \bar{\eta}^{cT}_{n,T-1} + \bar{\nu}_{n,T}\, \bar{\eta}'_{n,T-1} . \qquad (76c) $$

If we rewrite the time-update recursion of (76c) in reverse we have

$$ \rho_{n+1,T-1} = \bar{\nu}^{-c}_{n,T} \left[ \rho_{n+1,T} - \bar{\nu}_{n,T}\, \bar{\eta}'_{n,T-1} \right] \bar{\eta}^{-cT}_{n,T-1}, \qquad (76d) $$

i.e., it has the same form as the order recursions of (76a) and (76b)!

Indeed, the update recursions (76a)-(76d) can be compactly expressed by a generator and its inverse of the form²

$$ G(A, B, C, D) = B^{-c} \left[ A - B D^{-1} C \right] C'^{-c} \qquad (77a) $$

$$ \bar{G}(A, B, C, D) = B^{c} A\, C^{cT} + B D C' \qquad (77b) $$

so that the three ladder recursions can be expressed as functions of this generator and its inverse by

$$ \bar{\nu}_{n+1,T} = G(\bar{\nu}_{n,T},\; \rho_{n+1,T},\; \bar{\eta}_{n,T-1},\; I) \qquad (78a) $$

$$ \bar{\eta}_{n+1,T} = G(\bar{\eta}_{n,T-1},\; \rho'_{n+1,T},\; \bar{\nu}_{n,T},\; I) \qquad (78b) $$

$$ \rho_{n+1,T} = \bar{G}(\rho_{n+1,T-1},\; \bar{\nu}_{n,T},\; \bar{\eta}_{n,T-1},\; I) . \qquad (78c) $$

The significance of this generator in terms of rotations in Hilbert space is discussed in [16], [35].

Initializations

The initial conditions are

$$ \bar{\nu}_{0,-1} = \bar{\eta}_{0,-1} = 0, \qquad \rho_{n,-1} = 0, \quad 1 \le n \le n_{\max} . $$

When y_T is observed, the input to the ladder is determined by the following procedure:

$$ (1) \quad \tilde{y}_T = R^{-1/2}_{T-1}\, y_T \qquad (79a) $$

$$ (2) \quad R^{1/2}_T = R^{1/2}_{T-1} \left[ \lambda I + \tilde{y}_T\, \tilde{y}'_T \right]^{1/2} \qquad (79b) $$

$$ (3) \quad \bar{\eta}_{0,T} = \bar{\nu}_{0,T} = R^{-1/2}_T\, y_T . \qquad (79c) $$

Note that a positive a priori value for R^{1/2} is needed at T = 0; however, the effect of this value is insignificant after a few time samples. Alternatively, set ν̄_{0,0} = 1 − ε, ε > 0, or define the result of possible divisions by zero as zero. Note also that the exponential weight λ appears only once, in (79b). This offers much flexibility in choosing its value at each time sample.
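A minimal scalar NumPy sketch of one way to run the normalized recursions (71), (74), (75) with the zeroth-order initialization (79). It is illustrative, not from the paper: the forgetting factor, the a priori value R0, the small floors guarding the square roots, and the synthetic AR(2) test signal are all assumptions of this sketch.

import numpy as np

def normalized_ls_ladder(y, N, lam=0.99, R0=1e-2):
    """Scalar square root normalized LS ladder: eqs. (71), (74), (75) with init (79)."""
    rho = np.zeros(N)              # reflection coefficients rho_{n+1,T}
    eta_prev = np.zeros(N + 1)     # eta_bar_{n,T-1} for n = 0..N
    R_half = np.sqrt(R0)           # a priori square root of the zeroth-order energy
    for yT in y:
        # zeroth-order (gain) update, eqs. (79a)-(79c)
        y_tld = yT / R_half
        R_half = R_half * np.sqrt(lam + y_tld * y_tld)
        nu = yT / R_half
        eta_cur = np.empty(N + 1)
        eta_cur[0] = nu
        for n in range(N):
            nu_c = np.sqrt(max(1.0 - nu * nu, 1e-12))                      # [1 - nu_bar^2]^{1/2}
            eta_c = np.sqrt(max(1.0 - eta_prev[n] * eta_prev[n], 1e-12))   # [1 - eta_bar^2]^{1/2}
            # time update of the normalized partial correlation, eq. (71)
            rho[n] = nu_c * rho[n] * eta_c + nu * eta_prev[n]
            rho_c = np.sqrt(max(1.0 - rho[n] * rho[n], 1e-12))
            # order updates, eqs. (74) and (75)
            nu_next = (nu - rho[n] * eta_prev[n]) / (rho_c * eta_c)
            eta_cur[n + 1] = (eta_prev[n] - rho[n] * nu) / (rho_c * nu_c)
            nu = nu_next
        eta_prev = eta_cur          # eta_bar_{n,T} becomes eta_bar_{n,T-1} at the next step
    return rho

# illustrative usage: the rho's settle near the partial correlations of a synthetic AR(2) process
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(2, len(x)):
    x[t] = 1.2 * x[t - 1] - 0.5 * x[t - 2] + rng.standard_normal()
print(normalized_ls_ladder(x, N=3))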

Remarks

The reflection coefficients in the information or magnitude normalized version of the exact least squares ladder forms are normalized to have magnitude (norm³) less than or equal to unity and thus provide a very convenient (numerical) stability check for the resulting prediction filters. The complexity of the normalized ladder recursions is much reduced, from nine recursions to only three recursions per order- and time-update. The requirement for efficient computation of square roots can be satisfied, for instance, via the use of the so-called CORDIC or other bit-recursive algorithms, see, e.g., [36], [37]. The exponential weighting factor appears only once, in the zeroth-order gain, and thus provides much flexibility in choosing its value, possibly at each time step, for better tracking capability.

Since the underlying geometric framework for the derivations of the square root normalized algorithms can be viewed as a natural extension of the Gram-Schmidt based square root array methods [16], and such square root methods are known to have superior numerical properties, it is expected that the square root normalized ladder algorithms have better numerical properties than the unnormalized versions. One example of such superior performance has been illustrated in [10], in the context of multichannel spectral estimation.

²σ typically plays the role of a signature matrix in the finite rank or α-stationary process case, see [21], [16], [39], [40].
³If m > 1, all the singular values {σ_i} of ρ obey 0 ≤ σ_i ≤ 1, see, e.g., [10].

IV. LADDER RECURSIONS FOR JOINT PROCESS ESTIMATION

In [3], [5], we have presented the technique of embedding two processes into a single multichannel process to obtain ladder recursions for applications such as adaptive equalization, noise cancelling, ARMA modeling of time series, and adaptive control. In this section, the normalized versions of the various ladder recursions are presented.

Joint Process Estimation

Given observations of two related processes $\{x_t, y_t,\ 0 \le t \le T\}$, the problem of finding the least squares estimator of $\{x_t\}$ based on $\{y_t\}$ can be formulated in a geometric framework similar to the one discussed in Section II. For simplicity, we assume that $x_t$ has the same dimension as $y_t$ (and lies in the same $\mathcal{H}_T$), since the extension to the arbitrary dimension (and different spaces) case is straightforward. The process $\{x_t\}$ can be considered as a vector $|x)_T \in \mathcal{H}_T$. Its orthogonal projection onto the subspace $Y_{0,n,T}$ is the $n$th-order least squares estimate of $|x)_T$ based on $\{|y)_T, \cdots, |z^{-n}y)_T\}$:

$$|\hat x_n)_T = |Y_{0,n})_T\,(Y_{0,n}|Y_{0,n})_T^{-1}\,(Y_{0,n}|x)_T = P_{0,n,T}\,|x)_T \tag{80}$$

and

$$|\varepsilon_n^x)_T = |x)_T - |\hat x_n)_T = P_{0,n,T}^{\perp}\,|x)_T \tag{81}$$

where $\{|\varepsilon_n^x)_T,\ 0 \le n \le N\}$ are the residuals that are propagated in the ladder.
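For concreteness, (80) and (81) can be checked numerically: the projection is simply the ordinary least squares fit of $x$ on $y$ and its delays. The sketch below is illustrative only; it ignores the exponential weighting and uses plain least squares over a finite record.

```python
# Numerical illustration of (80)-(81): the nth-order estimate of x is the
# orthogonal projection of x onto the span of y and its n delays, and the
# residual is what the ladder propagates.
import numpy as np

def joint_process_residual(x, y, n):
    """Return (x_hat, eps_x): projection of x onto [y, z^-1 y, ..., z^-n y]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T = len(x)
    # Columns are delayed copies of y (zeros before time 0).
    Y = np.column_stack([np.concatenate([np.zeros(k), y[:T - k]])
                         for k in range(n + 1)])
    coeffs, *_ = np.linalg.lstsq(Y, x, rcond=None)   # least squares fit
    x_hat = Y @ coeffs                               # cf. (80)
    return x_hat, x - x_hat                          # cf. (81)
```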

The order-update recursions, therefore, follow from the orthogonalization of $Y_{0,n,T}$ into $\{|r_i)_T,\ 0 \le i \le n\}$ and are given by

$$|\varepsilon_n^x)_T = |\varepsilon_{n-1}^x)_T - |r_n)_T\,(r_n|r_n)_T^{-1}\,(r_n|\varepsilon_{n-1}^x)_T. \tag{82}$$

The component form is obtained via the coordinate map $|T)$:

$$\varepsilon_{n,T}^x = \varepsilon_{n-1,T}^x - \Delta_{n,T}^{xT}\,R_{n,T}^{r\,-1}\,r_{n,T} \tag{83}$$

where

$$\Delta_{n,T}^x \triangleq (r_n|\varepsilon_{n-1}^x)_T \tag{84}$$

is the $n$th-order partial regression coefficient of $\{x_t\}$ on $\{y_t\}$. From (64), the error covariance order-update recursion is obtained as

$$R_{n,T}^{\varepsilon(x)} = R_{n-1,T}^{\varepsilon(x)} - \Delta_{n,T}^{xT}\,R_{n,T}^{r\,-1}\,\Delta_{n,T}^x. \tag{85}$$

The time-update recursions for $\Delta_{n,T}^x$ and $R_{n,T}^{\varepsilon(x)}$ immediately follow from (51) and are given in component form, with $\lambda$ as weighting factor, by

$$\Delta_{n,T}^x = \lambda\,\Delta_{n,T-1}^x + r_{n,T}\,\varepsilon_{n-1,T}^{xT}\,\sec^2\theta_{0,n-1,T} \tag{86}$$

$$R_{n-1,T}^{\varepsilon(x)} = \lambda\,R_{n-1,T-1}^{\varepsilon(x)} + \varepsilon_{n-1,T}^x\,\varepsilon_{n-1,T}^{xT}\,\sec^2\theta_{0,n-1,T} \tag{87}$$

where $\theta_{0,n-1,T}$ is defined by the trigonometric relation

$$\sin^2\theta_{0,n-1,T} \triangleq (y_T|Y_{0,n-1})_T\,(Y_{0,n-1}|Y_{0,n-1})_T^{-1}\,(Y_{0,n-1}|y_T)_T. \tag{88}$$

The initial conditions for (64) are given by

$$\Delta_{n,T}^x = 0, \quad \text{for } n > T \tag{89}$$

and

$$\varepsilon_{0,T}^x = x_T, \quad \text{for } T \ge 0. \tag{90}$$

Fig. 5. Joint process ladder estimation.

Thus the recursions (65), (66)-(68), together with the predictor recursions of Section II, give a ladder realization of our exact least squares estimation problem. The structure of this ladder form is shown in Fig. 5.
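A scalar sketch of one stage of these unnormalized joint-process recursions follows; it assumes the backward residual, its covariance, and the likelihood factor $\sec^2\theta_{0,n-1,T}$ are supplied by the predictor recursions of Section II, and all names are illustrative rather than the paper's.

```python
def joint_stage_unnormalized(eps_prev, r_n, R_r, Delta_old, R_eps_old, sec2, lam):
    """One order step n of the joint-process ladder, scalar case.

    eps_prev  : eps^x_{n-1,T}
    r_n, R_r  : backward residual r_{n,T} and its covariance R^r_{n,T}
    Delta_old : Delta^x_{n,T-1};  R_eps_old : R^{eps(x)}_{n-1,T-1}
    sec2      : sec^2(theta_{0,n-1,T}) from the predictor recursions
    """
    Delta = lam * Delta_old + r_n * eps_prev * sec2        # time update (86)
    R_eps = lam * R_eps_old + eps_prev ** 2 * sec2         # time update (87)
    eps_next = eps_prev - (Delta / R_r) * r_n              # order update (83)
    R_eps_next = R_eps - Delta ** 2 / R_r                  # order update (85)
    return eps_next, Delta, R_eps, R_eps_next
```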

Normalized Recursions

The normalized recursions for the estimator above are obtained using the same procedure as described in the last section. The residuals are first normalized with respect to their covariance, i.e.,

$$|\bar\varepsilon_n^x)_T \triangleq |\varepsilon_n^x)_T\,(\varepsilon_n^x|\varepsilon_n^x)_T^{-T/2} \tag{91}$$

then the normalized partial regression coefficients are defined by

$$\rho_{n,T}^x \triangleq (\eta_n|\bar\varepsilon_{n-1}^x)_T \tag{92}$$

where $|\eta_n)_T$ are the normalized backward innovations defined by (58). Next we apply the second normalization by $\sec\theta_{0,n,T}$ so that

$$|\bar\nu_n^x)_T = |\bar\varepsilon_n^x)_T\,\sec\theta_{0,n,T} \tag{93}$$

and obtain the normalized recursions in component form as

$$\bar\nu_{n,T}^x = [I - \rho_{n,T}^{xT}\rho_{n,T}^x]^{-T/2}\,[\bar\nu_{n-1,T}^x - \rho_{n,T}^{xT}\,\bar\eta_{n,T}]\,[I - \bar\eta_{n,T}^{T}\bar\eta_{n,T}]^{-1/2} \tag{94}$$

with a time-update on $\rho_{n,T}^x$ given by

$$\rho_{n,T}^x = [I - \bar\eta_{n,T}\bar\eta_{n,T}^{T}]^{1/2}\,\rho_{n,T-1}^x\,[I - \bar\nu_{n,T}^x\bar\nu_{n,T}^{xT}]^{T/2} + \bar\eta_{n,T}\,\bar\nu_{n,T}^{xT}. \tag{95}$$

Note that the weighting factor $\lambda$ is again cancelled out in all recursions and appears only in the initialization procedure, as described in the last section.

Remarks

1) In noise cancelling applications (see, e.g., [3]), $\{y_t\}$ is the noise estimation or reference signal, assumed to be


observed together with the primary observations containing the signal $\{x_t\}$ which we would like to estimate. In adaptive equalization applications $\{y_t\}$ is the channel output and $\{x_t\}$ is some known training sequence. In some applications, when $x_t$ is not readily available at time $T$, an alternate estimate that is based on predicted parameters in the estimator, the so-called (parameter) predicted estimate, is used. The predicted estimate is given by

$$|\hat x_{n,T-1})_T = |Y_{0,n})_T\,(Y_{0,n}|Y_{0,n})_{T-1}^{-1}\,(Y_{0,n}|x)_{T-1}, \tag{96}$$

an oblique projection as defined in (41), since

$$(Y_{0,n}|Y_{0,n})_{T-1} \ne (Y_{0,n}|Y_{0,n})_{T}.$$

Ladder recursions for both order- and time-update of oblique projections can also be obtained in quite a straightforward manner.

2) The ladder recursions for a fixed-lag smoothing filter are readily obtained by projecting $|z^{-k}x)_T$ onto $Y_{0,n,T}$ and then orthonormalizing $Y_{0,n,T}$ by $\{|\eta_i)_T,\ 0 \le i \le n\}$. In fact, the projection formalism is general enough that both order- and time-update recursions can be obtained for any arbitrary $k$-step prediction or smoothing filter for $|x)_T$ given $|y)_T$ (or vice versa for inverse filters).

V. ARMA LADDER RECURSIONS

The ARMA ladder recursions presented in [3] and [5] can be extended into the normalized forms via the projection formalism. Here we only present a basic framework for how to set up the ARMA problem. The full treatment is given in [18] and [21].

In the ARMA case, we expand our sample-product space to include the inputs $\{u_t,\ 0 \le t \le T\}$, i.e., $\{|y)_T, \cdots, |z^{-n}y)_T,\ |u)_T, \cdots, |z^{-n}u)_T\}$, denoted by

$$W_{0,n,T} = Y_{0,n,T} \oplus U_{0,n,T}.$$

Then the joint forward and backward innovations are defined by orthogonal projection operators on the subspaces of $W_{0,n,T}$. Time-updates are obtained either from geometric considerations or from the operator formalisms discussed in the previous sections.

The ARMA modeling problem can be viewed as an application of the two-process $\{x, y\}$ embedding type. In the known input case we can distinguish two cases: either $\{u_t\}$ is uncorrelated (a case assumed in [3], [5]) or $|u)_T$ is correlated. In either case a joint AR assumption on $\{u, y\}$ has to be made, i.e., in z-transform notation

$$a(z^{-1})\,y(z) = b_0\,u(z) + b_1(z^{-1})\,u(z)$$

if $\{u\}$ is uncorrelated (white), and

if $\{u\}$ is correlated. However, in this case $\{e, \epsilon\}$ has to be


uncorrelated, a commonly made but often tenuous assumption. More generally, $\{e, \epsilon\}$ could be modeled as a moving average process; this is the same as the assumption that $\{y, u\}$ is jointly ARMA, a general but more complicated case. We note here that $b(z^{-1})/\bar a(z^{-1})$ is the inverse model transfer function (input $= \{y\}$, output $= \{u\}$), useful in adaptive control and some communication problems; see, e.g., [33].

In the case where $|u)_T$ is not directly observable, it can be replaced by the normalized orthogonal projection of $|y)_T$ onto $W_{p,q,T-1}$. The resulting ladder structure contains a single feedback path, as shown in [3], or a distributed feedback, depending on whether $R_{q,T}$ is factored into upper-times-lower or lower-times-upper triangular factors. This second version appears to be preferable if the order of the underlying ARMA model is not known a priori, since the high order feedback coefficients can converge to zero without structural changes, hence adaptively adjusting to the proper order.

VI. CONCLUSIONS

We have presented a class of normalized least squares ladder estimation algorithms based on a geometrical formalism. The normalized ladder recursions have low storage requirements and very low computational complexity, and are well suited for fixed point implementation and integration (e.g., VLSI).

The parameter tracking capability of these normalized forms is even more flexible because the exponential weighting factor appears only in the zeroth-order update or power normalization stage.

Our new geometric formalism offers a powerful method for the derivations of the exact least squares recursions, and in particular, the nature of the exact time-update recursions was made clear. A first normalization enables the variables to have unit power, and a second normalization compensates for the effect of new information obtained at each time-update.

The geometrical formalism also extends to the development of ARMA least squares algorithms and the finite rank process or α-stationary case, as well as various applications such as noise cancelling, adaptive filtering, adaptive control, channel equalization, echo cancelling, etc.


APPENDIX I
SUMMARY OF THE NORMALIZED, EXPONENTIALLY WEIGHTED LADDER RECURSIONS FOR THE SCALAR CASE

Input Parameters
$n_{\max}$   maximum order of ladder form
$\lambda$    exponential weighting factor
$\sigma$     prior covariance
$y_T$        data sequence.

Variables
$R_T$            estimated covariance of $y$
$\rho_{n,T}$     reflection coefficients
$\bar\nu_{n,T}$  normalized forward innovations
$\bar\eta_{n,T}$ normalized backward innovations.


Initialization
$$R_0 = \sigma + y_0^2$$
$$\bar\nu_{0,0} = \bar\eta_{0,0} = y_0/\sqrt{R_0}, \qquad \rho_{p,0} = 0, \quad p = 1, \cdots, n_{\max}.$$

Main Loop
At each time step we have available $\bar\nu_{n,T}$, $\bar\eta_{n,T-1}$, $\rho_{n+1,T-1}$, $R_{T-1}$, and a new data point $y_T$.
(i) Set
$$R_T = \lambda R_{T-1} + y_T^2$$
$$\bar\nu_{0,T} = \bar\eta_{0,T} = y_T/\sqrt{R_T}.$$
(ii) For $n = 0$ to $\min\{T, n_{\max}\} - 1$, do the following:
$$\rho_{n+1,T} = \rho_{n+1,T-1}\,(1 - \bar\nu_{n,T}^2)^{1/2}(1 - \bar\eta_{n,T-1}^2)^{1/2} + \bar\nu_{n,T}\,\bar\eta_{n,T-1}$$

Remarks

(i) To speed up the estimation of the covariance function $R_T$, it may be useful to start with $\lambda = 1$, and then reduce $\lambda$ to its desired value.

(ii) An alternative startup procedure is the following: (a) start up using the unnormalized ladder recursions; (b) at $T = n_{\max}$ compute the normalized quantities and proceed with the normalized recursions.

(iii) It can be shown that if the prior $\sigma = 0$ the proper startup can be achieved by setting to zero the result of a possible division by zero, i.e., using pseudoinverses.
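The following Python sketch collects the Appendix I loop for the scalar case. The $\rho$ time-update is the one printed in the main loop above; the $\bar\nu$ and $\bar\eta$ order-updates are the scalar forms implied by (76a)-(76b) and are included here only as an assumption, since they are not reproduced in the printed loop. Class and variable names are illustrative.

```python
from math import sqrt

class NormalizedLadder:
    """Scalar, exponentially weighted, normalized ladder (sketch)."""

    def __init__(self, n_max, lam, sigma):
        self.n_max, self.lam = n_max, lam
        self.R = sigma                        # R_{T-1}, seeded with the prior
        self.rho = [0.0] * (n_max + 1)        # reflection coefficients
        self.eta_prev = [0.0] * (n_max + 1)   # eta_{n,T-1}
        self.T = 0

    def update(self, y):
        self.R = self.lam * self.R + y * y            # covariance update
        nu = eta = y / sqrt(self.R)                   # zeroth-order stage
        nu_list, eta_list = [nu], [eta]
        for n in range(min(self.T, self.n_max)):
            e_prev = self.eta_prev[n]
            r = (self.rho[n + 1]
                 * sqrt((1 - nu * nu) * (1 - e_prev * e_prev))
                 + nu * e_prev)                       # rho time update
            self.rho[n + 1] = r
            d1 = sqrt(max(1e-12, (1 - r * r) * (1 - e_prev * e_prev)))
            d2 = sqrt(max(1e-12, (1 - r * r) * (1 - nu * nu)))
            nu_next = (nu - r * e_prev) / d1          # assumed form of (76a)
            eta_next = (e_prev - r * nu) / d2         # assumed form of (76b)
            nu, eta = nu_next, eta_next
            nu_list.append(nu)
            eta_list.append(eta)
        self.eta_prev = eta_list + self.eta_prev[len(eta_list):]
        self.T += 1
        return nu_list, self.rho
```

A typical use is ladder = NormalizedLadder(n_max=4, lam=0.98, sigma=1.0), followed by ladder.update(y) for each new sample; the returned reflection coefficients can be monitored for magnitude at most unity as the stability check mentioned in the Remarks of Section III.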

APPENDIX II
SUMMARY OF THE NORMALIZED JOINT PROCESS LADDER ESTIMATOR

We present the following summary of one version of the normalized joint process ladder estimator. We consider the exponentially weighted scalar case only.

Input Parameters
$n_{\max}$  maximum order of ladder form
$\lambda$   exponential weighting factor ($0 < \lambda \le 1$)
$y_T$       noise reference sequence
$\sigma_y$  noise prior covariance
$x_T$       signal data sequence
$\sigma_x$  signal prior covariance.

Variables
$R_T^y$            estimated covariance of $y$
$R_T^x$            estimated covariance of $x$
$\rho_{n,T}$       reflection coefficients of $y$
$\bar\nu_{n,T}$    normalized forward innovations of $y$
$\bar\eta_{n,T}$   normalized backward innovations of $y$
$\rho_{n,T}^x$     regression coefficients of $x$ on $y$
$\hat x_{n,T}$     normalized estimate of $x$
$\bar\nu_{n,T}^x$  normalized residuals.

Initialization
Given $y_0$, $x_0$:
$$R_0^y = \sigma_y + y_0^2, \qquad R_0^x = \sigma_x + x_0^2$$
$$\bar\nu_{0,0} = \bar\eta_{0,0} = y_0/\sqrt{R_0^y}, \qquad \rho_{n,0} = \rho_{n,0}^x = 0, \quad n = 1, \cdots, n_{\max}.$$

Main Loop
At each time step we have available $\bar\nu_{n,T}$, $\bar\eta_{n,T-1}$, $\rho_{n+1,T-1}$, $\bar\nu_{n,T}^x$, $\hat x_{n,T}$, $\rho_{n+1,T-1}^x$, $R_{T-1}^y$, $R_{T-1}^x$, and new signal $x_T$ and reference $y_T$.
(i) Set
$$R_T^y = \lambda R_{T-1}^y + y_T^2$$
$$\bar\nu_{0,T} = \bar\eta_{0,T} = y_T/\sqrt{R_T^y}$$
$$R_T^x = \lambda R_{T-1}^x + x_T^2.$$
(ii) For $n = 0$ to $n = \min\{T, n_{\max}\} - 1$, do
$$\rho_{n+1,T} = \rho_{n+1,T-1}\,(1 - \bar\nu_{n,T}^2)^{1/2}(1 - \bar\eta_{n,T-1}^2)^{1/2} + \bar\nu_{n,T}\,\bar\eta_{n,T-1}$$
$$\hat x_{n+1,T} = x_T - \nu_{n+1,T}^x.$$
END
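A companion sketch of the $x$-channel recursions of Appendix II, scalar case, is given below. It assumes the normalized backward innovations of $y$ are produced by a predictor ladder such as the one sketched after Appendix I, and it uses the scalar forms consistent with (94)-(95) under the conventions assumed there; all names are illustrative.

```python
from math import sqrt

def joint_process_step(x_T, R_x_prev, lam, etas, rho_x):
    """One time step of the x-channel of the joint-process ladder (sketch).

    etas[n]  : normalized backward innovations of y at time T
    rho_x[n] : regression coefficients of x on y from time T-1 (updated
               in place)
    Returns (list of normalized x-residuals by order, updated R_x).
    """
    R_x = lam * R_x_prev + x_T * x_T
    nu_x = x_T / sqrt(R_x)                      # zeroth-order residual of x
    residuals = [nu_x]
    for n, eta in enumerate(etas):
        # time update of the regression coefficient (cf. (95), scalar form)
        rho_x[n] = (rho_x[n] * sqrt((1 - eta * eta) * (1 - nu_x * nu_x))
                    + eta * nu_x)
        # order update of the normalized residual (cf. (94), scalar form)
        denom = sqrt(max(1e-12, (1 - rho_x[n] ** 2) * (1 - eta * eta)))
        nu_x = (nu_x - rho_x[n] * eta) / denom
        residuals.append(nu_x)
    return residuals, R_x
```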

REFERENCES
[1] M. Morf, D. Lee, J. Nickolls, and A. Vieira, "A classification of algorithms for ARMA models and ladder realizations," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (Hartford, CT), pp. 13-19, May 1977. Also appeared in Modern Spectrum Analysis, D. G. Childers, Ed. New York: IEEE Press, 1978.
[2] M. Morf, A. Vieira, and D. Lee, "Ladder forms for identification and speech processing," in Proc. IEEE Conf. Decision and Control (New Orleans, LA), pp. 1074-1078, Dec. 1977.
[3] M. Morf and D. Lee, "Recursive least squares ladder forms for fast parameter tracking," in Proc. IEEE Conf. Decision and Control (San Diego, CA), pp. 1362-1367, Jan. 1979.
[4] M. Morf, "Ladder forms in estimation and system identification," in Proc. 11th Annual Asilomar Conf. Circuits, Systems and Computers (Pacific Grove, CA), pp. 424-429, Nov. 1977.
[5] M. Morf and D. T. L. Lee, "Fast algorithms for speech modeling," Tech. Rep. M303-1, Information Systems Lab., Stanford Univ., Stanford, CA, Dec. 1978.
[6] E. H. Satorius and J. Pack, "Application of least squares lattice algorithms to adaptive equalization," IEEE Trans. Commun., vol. COM-29, pp. 136-142, Feb. 1981.
[7] E. H. Satorius and M. J. Shensa, "On the application of recursive least squares methods to adaptive processing," Int. Workshop on Applications of Adaptive Control, Yale University, New Haven, CT, Aug. 23-25, 1979.
[8] D. D. Falconer and L. Ljung, "Application of fast Kalman estimation to adaptive equalization," IEEE Trans. Commun., vol. COM-26, pp. 1439-1446, Oct. 1978.
[9] A. H. Gray, Jr. and J. D. Markel, "A normalized digital filter structure," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, pp. 268-277, June 1975.
[10] M. Morf, A. Vieira, D. Lee, and T. Kailath, "Recursive multichannel maximum entropy spectral estimation," IEEE Trans. Geosci. Electron., vol. GE-16, pp. 85-94, Apr. 1978. Also appeared in Modern Spectrum Analysis, D. G. Childers, Ed. New York: IEEE Press, 1978.
[11] F. Itakura and S. Saito, "Digital filtering techniques for speech analysis and synthesis," in Proc. 7th Int. Cong. Acoust. (Budapest), Paper 25-C-1, pp. 261-264, 1971.
[12] J. D. Markel and A. H. Gray, Jr., "On autocorrelation equations as applied to speech analysis," IEEE Trans. Audio Electroacoust., vol. AU-21, pp. 69-79, Apr. 1973.
[13] T. Kailath, L. Ljung, and M. Morf, "Generalized Krein-Levinson equations for efficient calculation of Fredholm resolvents of nondisplacement kernels," in Topics in Functional Analysis, vol. 3, I. C. Gohberg and M. Kac, Eds. New York: Academic, 1978, pp. 169-184.
[14] B. Friedlander, T. Kailath, M. Morf, and L. Ljung, "Extended Levinson and Chandrasekhar equations for general discrete-time linear estimation problems," IEEE Trans. Automat. Contr., vol. AC-23, pp. 653-659, Aug. 1978.
[15] B. Friedlander, M. Morf, T. Kailath, and L. Ljung, "New inversion formulas for matrices classified in terms of their distance from Toeplitz matrices," Linear Algebra Appl., vol. 27, pp. 31-60, 1979.
[16] M. Morf, C. Muravchik, and D. T. Lee, "Hilbert space array methods for finite rank process estimation and ladder realizations for speech and adaptive signal processing," in Proc. 1981 IEEE Int. Conf. Acoust., Speech, Signal Processing (Atlanta, GA), Apr. 1981.
[17] M. J. Shensa, "Recursive least squares lattice algorithms - a geometrical approach," Naval Ocean Systems Center, San Diego, CA, Tech. Rep. 552, Dec. 1979.
[18] D. T. L. Lee, M. Morf, and B. Friedlander, "Recursive ladder algorithms for ARMA modeling," in Proc. 19th IEEE Conf. Decision and Control (Albuquerque, NM), pp. 1225-1231, Dec. 10-12, 1980; also IEEE Trans. Automat. Contr., to be published.
[19] G. W. Stewart, Introduction to Matrix Computations. New York: Academic, 1973.
[20] M. Morf and T. Kailath, "Square-root algorithms for least-squares estimation," IEEE Trans. Automat. Contr., vol. AC-20, pp. 487-497, Aug. 1975.
[21] D. T. L. Lee, "Canonical ladder form realizations and fast estimation algorithms," Ph.D. dissertation, Stanford Univ., Stanford, CA, Aug. 1980.
[22] T. Kato, Perturbation Theory for Linear Operators, 2nd ed. New York: Springer-Verlag, 1976.
[23] C. Davis, "Separation of two linear subspaces," Acta Sci. Math. Szeged, vol. 19, pp. 172-187, 1958.
[24] C. Davis and W. M. Kahan, "The rotation of eigenvectors by a perturbation III," SIAM J. Numer. Anal., vol. 7, no. 1, pp. 1-46, 1970.
[25] G. W. Stewart, "Error and perturbation bounds for subspaces associated with certain eigenvalue problems," SIAM Rev., vol. 15, no. 4, pp. 727-764, 1973.
[26] D. T. L. Lee and M. Morf, "A novel innovation based approach to pitch detection," in Proc. 1980 IEEE Int. Conf. Acoust., Speech, Signal Processing (Denver, CO), pp. 40-41, Apr. 9-11, 1980.
[27] A. H. Gray, Jr. and J. D. Markel, "Digital lattice and ladder filter synthesis," IEEE Trans. Audio Electroacoust., vol. AU-21, pp. 491-500, 1973.
[28] J. D. Markel and A. H. Gray, Jr., "Roundoff noise characteristics of a class of orthogonal polynomial structures," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, pp. 473-486, 1975.
[29] L. Griffiths and R. S. Medaugh, "Convergence properties of an adaptive noise cancelling lattice structure," in Proc. 1978 IEEE Conf. Decision and Control (San Diego, CA), pp. 1357-1361, Jan. 12, 1979.
[30] V. U. Reddy, B. Egardt, and T. Kailath, "Optimized lattice-form adaptive line enhancer for a sinusoidal signal in broad-band noise," IEEE Trans. Acoust., Speech, Signal Processing, to be published.
[31] W. S. Hodgkiss and J. A. Presley, "Adaptive tracking of multiple sinusoids whose power levels are widely separated," IEEE Trans. Acoust., Speech, Signal Processing, pp. 710-721, this issue.
[32] J. Makhoul, "A class of all-zero lattice digital filters: Properties and applications," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-26, pp. 304-314, Aug. 1978.
[33] M. J. Shensa, "A least-squares lattice decision feedback equalizer," in Proc. 1980 Int. Communications Conf. (Seattle, WA), Oct. 1980.
[34] L. J. Griffiths, "A continuously adaptive filter implemented as a lattice structure," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (Hartford, CT), pp. 683-686, Apr. 1977.
[35] M. Morf and D. T. L. Lee, "State-space structures of ladder canonical forms," in Proc. 19th IEEE Conf. Decision and Control (Albuquerque, NM), pp. 1221-1224, Dec. 1980.
[36] J. S. Walther, "A unified algorithm for elementary functions," in Proc. 1971 Spring Joint Computer Conf., pp. 379-385, 1971.
[37] H. M. Ahmed, M. Morf, D. T. L. Lee, and P.-H. Ang, "A VLSI speech analysis chip set based on square-root normalized ladder forms," in Proc. 1981 IEEE Int. Conf. Acoust., Speech, Signal Processing (Atlanta, GA), Apr. 1981. Also, H. M. Ahmed, P.-H. Ang, and M. Morf, "A VLSI speech analysis chip set utilising coordinate rotation arithmetic," in Proc. 1981 Int. Conf. Circuits and Systems, Apr. 1981.
[38] E. H. Satorius and M. J. Shensa, "Recursive lattice filters - a brief overview," in Proc. 19th IEEE Conf. Decision and Control (Albuquerque, NM), pp. 955-959.
[39] J.-M. Delosme and M. Morf, "A tree classification of algorithms for Toeplitz and related equations including generalized Levinson and doubling type algorithms," in Proc. 19th IEEE Conf. Decision and Control (Albuquerque, NM), pp. 42-46, Dec. 1980.
[40] J.-M. Delosme and M. Morf, "Mixed and minimal representations for Toeplitz and related systems," in Proc. 14th Annual Asilomar Conf. Circuits, Systems and Computers (Monterey, CA), Nov. 17-19, 1980.

Daniel T. L. Lee was born in Hong Kong in 1952. He received the B.S. degree in electrical engineering from Cornell University, Ithaca, NY, in 1973, and the M.S. and Ph.D. degrees, both in electrical engineering, from Stanford University, Stanford, CA, in 1975 and 1980, respectively.

From 1973 to 1976 he was a Research Assistant at the Stanford Radioscience Laboratory, where he worked on propagation of ultra-low-frequency (ULF) waves in the magnetosphere and ionosphere. From 1977 to 1978 he was a Research Assistant, and in 1979 a Research Affiliate, at the Stanford Information Systems Laboratory, working on fast estimation algorithms, speech modeling, and compression. During the summer of 1978 he was at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, where he worked on speech recognition. Since November 1979, he has been with the IBM San Jose Research Laboratory, San Jose, CA. His current research interests include estimation and identification, information theory, signal processing algorithms, speech and image processing, algorithms and architectures for efficient computations, and VLSI systems.

Dr. Lee is a member of Tau Beta Pi, Eta Kappa Nu, ACM, AMS, SIAM, and American Geophysical Union.

Martin Morf (S'70-M'74) was born in Winterthur, Switzerland, on June 16, 1944. He received the Engineer's degree in electrical engineering from the Federal Institute of Technology, Zurich, Switzerland, in 1968, and the M.S.E.E. and Ph.D. degrees in 1970 and 1974, respectively, from Stanford University, Stanford, CA.

In 1968 he was a Research Scientist at the Technical Physics Laboratory, Federal Institute of Technology, Zurich. In 1974 he joined the Stanford University faculty. In September 1980 he was promoted to Associate Professor in the Department of Electrical Engineering, Stanford University. His research interests are in the general field of signal processing, with emphasis on the structure of multivariable and multidimensional systems, estimation, identification, control, and efficient computation algorithms and architectures for VLSI. He is interested in the application of computers to speech processing, large scale systems, image processing and reconstruction, and biomedical applications.

Dr. Morf is a member of Sigma Xi, ACM, and SIAM.

Benjamin Friedlander (S'74-M'76) was born on February 24, 1947. He received the B.Sc. and M.Sc. degrees in electrical engineering from the Technion, Israel Institute of Technology, in 1968 and 1972, respectively, and the Ph.D. degree in electrical engineering and the M.S. degree in statistics from Stanford University, Stanford, CA, in 1976.

From 1968 to 1972 he served in the Israel Defense Force as an electronic engineer. Since September 1976, he has been a senior research engineer at Systems Control, Inc., Palo Alto, CA. His current research interests include adaptive signal processing, system identification, adaptive control, and the development of estimation and control algorithms for large scale systems.

Dr. Friedlander is a member of Sigma Xi,