
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 43, NO. 3, MARCH 1995

Analysis of LMS-Newton Adaptive Filtering Algorithms with Variable Convergence Factor

Paulo S. R. Diniz, Senior Member, IEEE, Marcello L. R. de Campos, Student Member, IEEE, and Andreas Antoniou, Fellow, IEEE

Abstract—An analysis of two LMS-Newton adaptive filtering algorithms with variable convergence factor is presented. The relations of these algorithms with the conventional recursive least-squares algorithm are first addressed. Their performance in stationary and nonstationary environments is then studied, and closed-form formulas for the excess mean-square error (MSE) are derived. The paper deals, in addition, with the effects of roundoff errors for the case of fixed-point arithmetic. Specifically, closed-form formulas for the excess MSE caused by quantization are obtained. The paper concludes with experimental results that demonstrate the validity of the analysis presented.

I. INTRODUCTION

Adaptive filters are used extensively in communications applications for channel equalization, echo cancellation, and system identification [1], [2]. The most widely used among the gradient-based adaptation algorithms is the least-mean-square (LMS) algorithm. Unfortunately, however, the convergence of this algorithm is dependent on the statistics of the input signal. The recursive least-squares (RLS) and the LMS-Newton (LMSN) algorithms are powerful alternatives when the spread of the eigenvalues of the input-signal correlation matrix is large, despite their higher computational complexity.

The appropriate choice of the convergence factor in the LMS and LMSN algorithms and of the forgetting factor in the RLS algorithm is key to assuring good performance of the adaptive filter. These choices are environment dependent, and optimal fixed values for these factors are difficult to determine, especially in nonstationary environments. In order to address this problem, several algorithms have been proposed in the past in which a variable convergence factor is utilized [3]-[14]. The algorithm in [14] has some attractive properties, such as fast convergence even when the statistics of the input signal are unknown. This algorithm has been successfully applied to adaptive subband filtering, and an improvement in convergence speed over the fixed-step-size LMSN algorithm has been observed.

Manuscript received December 12, 1992; revised May 25, 1994. This work was supported by CAPES/Ministry of Education of Brazil, the Natural Sciences and Engineering Research Council of Canada, and Micronet, Networks of Centres of Excellence program. The associate editor coordinating the review of this paper and approving it for publication was Dr. Fuyun Ling.

P. S. R. Diniz is with the Prog. de Engenharia Elétrica e Depto. de Eletrônica, COPPE/EE/Federal University of Rio de Janeiro, Rio de Janeiro, Brazil.

M. L. R. de Campos and A. Antoniou are with the Department of Electrical and Computer Engineering, University of Victoria, Victoria, Canada V8W 3P6.

IEEE Log Number 9408221.

This paper provides a detailed performance analysis of the LMSN algorithm proposed in [14]. In Section II, the LMSN algorithm is reviewed and its relation with the RLS algorithm is clarified. Section III describes the use of a variable convergence factor in the LMSN algorithm and proposes a simplified algorithm. The algorithm in [14] as well as the simplified version proposed here are analyzed in stationary and nonstationary environments in Section IV, and closed-form formulas for the mean-square error (MSE) are obtained. In Section V, a closed-form formula for the excess MSE caused by finite-wordlength implementation for the case of fixed-point arithmetic is derived. Section VI presents experimental results that demonstrate the accuracy of the analysis presented.

II. THE LMS-NEWTON ALGORITHM

The general LMSN algorithm is an approximate implementation of Newton's method for minimizing a function of several variables, which employs computationally efficient estimates of the autocorrelation matrix of the input signal and of the gradient vector of the objective function. The objective of the LMSN algorithm is to generate iteratively the filter coefficient vector that minimizes the MSE defined by

$$J(n) \triangleq E\left[|e(n)|^2\right] \qquad (1)$$

where

$$e(n) \triangleq d(n) - w^H(n-1)u(n) \qquad (2)$$

and where E[·] denotes the expected value of [·], d(n) is the desired signal, u(n) is the input-signal vector, and w(n−1) is the coefficient vector obtained at iteration n−1,

and the superscript H denotes the complex conjugate transpose. For such an algorithm, the coefficient vector is recursively updated in the direction given by the negative of the estimated gradient premultiplied by the inverse of the estimated input-signal autocorrelation matrix as [1]

$$w(n) = w(n-1) + 2\mu\,\hat{R}^{-1}(n)u(n)e^*(n) \qquad (3)$$

where μ is a convergence factor, e*(n) is the complex conjugate of the a priori output error defined in (2), and R̂(n) is an estimate of the autocorrelation matrix of the input-signal vector, namely, R = E[u(n)u^H(n)]. R will be referred to as the input-signal autocorrelation matrix hereafter. Matrix R̂(n)


can be calculated using the Robbins-Monro procedure [15], which solves the equation

$$E[u(n)u^H(n) - \hat{R}] = 0. \qquad (4)$$

The solution of (4) is given by

$$\hat{R}(n) = \hat{R}(n-1) + \alpha(n)\left[u(n)u^H(n) - \hat{R}(n-1)\right] \qquad (5)$$

where α(n) is also referred to as a convergence factor. This procedure assures a good approximation to R, which is positive definite, provided that the input signal is at least weakly persistently exciting of order M, i.e., its spectrum is nonzero at M or more points in the frequency range −ω_s/2 to ω_s/2, where ω_s is the sampling rate [16]; M is the number of coefficients. In addition, the inverse matrix R̂^{-1}(n) can be easily calculated using the matrix inversion lemma [17], which yields

$$\hat{R}^{-1}(n) = \frac{1}{1-\alpha(n)}\left\{\hat{R}^{-1}(n-1) - \frac{\hat{R}^{-1}(n-1)u(n)u^H(n)\hat{R}^{-1}(n-1)}{\dfrac{1-\alpha(n)}{\alpha(n)} + u^H(n)\hat{R}^{-1}(n-1)u(n)}\right\}. \qquad (6)$$
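As an illustration of the recursion defined by (2), (3), (5), and (6), one LMSN iteration can be sketched in a few lines of NumPy. This is a minimal sketch of the technique, not the authors' implementation; the function and variable names (lmsn_iteration, R_inv, and so on) are our own.

    import numpy as np

    def lmsn_iteration(w, R_inv, u, d, mu, alpha):
        # A priori output error, (2): e(n) = d(n) - w^H(n-1) u(n)
        e = d - np.vdot(w, u)
        # Update the estimate of R^-1 via the matrix inversion lemma, (6),
        # instead of forming R(n) from (5) and inverting it explicitly.
        t = R_inv @ u
        tau = np.real(np.vdot(u, t))
        R_inv = (R_inv - np.outer(t, t.conj()) / ((1 - alpha) / alpha + tau)) / (1 - alpha)
        # Coefficient update, (3): w(n) = w(n-1) + 2 mu R^-1(n) u(n) e*(n)
        w = w + 2 * mu * (R_inv @ u) * np.conj(e)
        return w, R_inv, e

For a fixed α satisfying (7), repeated calls to this function reproduce the exponentially weighted behavior relating the LMSN and RLS algorithms discussed below.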

If α(n) = α for all n such that

$$\alpha = 1 - \lambda = 2\mu \qquad (7)$$

it can be shown that the LMSN algorithm minimizes a weighted sum of a posteriori output errors, ξ(n), defined by

$$\xi(n) \triangleq \sum_{i=1}^{n} \lambda^{n-i}\,\left|d(i) - w^H(n)u(i)\right|^2. \qquad (8)$$

Note that this is the objective function of the RLS algorithm [17]. Therefore, the LMSN algorithm can be regarded either as a stochastic algorithm that uses noisy estimates for the input autocorrelation matrix and the gradient vector or as a deterministic least-squares algorithm when (7) is satisfied. In this case, (3) and (6) represent the solution of the deterministic counterpart of the normal equation [17] given by

$$\hat{R}(n)w(n) = \hat{p}(n) \qquad (9)$$

where R̂(n) and p̂(n) are, respectively, time-average approximations of the input autocorrelation matrix and of the cross-correlation vector between the input-signal vector and the desired signal d(n), defined by p ≜ E[u(n)d*(n)].

The coefficient vector at instant n is an estimate of the Wiener solution of the normal equation, w_o = R^{-1}p. Even in the presence of additive noise, this estimate is unbiased since its mean value tends to the optimal solution. However, an excess MSE, denoted as J_ex(n)_g, is present at the output of the system after the mean of the coefficients has converged. Assuming that the estimate of the input autocorrelation matrix is accurate¹, we obtain [17], [18]

$$J_{ex}(n)_g = \frac{(1-\lambda)M}{1+\lambda}\,J_{\min} \qquad (10)$$

¹This assumption was found to work well in most simulated situations. However, a more accurate formula can be derived for nearly uncorrelated input signals using information based on fourth-order statistics [18].

where M is the total number of coefficients in the adaptive filter and

$$J_{\min} \triangleq E\left[|e_o(n)|^2\right] = E\left[\left|d(n) - w_o^H u(n)\right|^2\right] \qquad (11)$$

is the minimum MSE.

In a nonstationary environment, the overall excess MSE has an additional component due to time variations in the statistics of the signals involved. In order to evaluate the excess MSE due to lag, denoted by J_ex(n)_l, the desired signal is modeled as a first- or second-order Markov process with a time constant greater than that of the algorithm [18]-[21]. For a first-order Markov model, the excess MSE due to lag is given by [18]

$$J_{ex}(n)_l = \frac{\sigma_\nu^2\,\mathrm{tr}[R]}{2(1-\lambda)} \qquad (12)$$

where σ_ν² is the variance of the zero-mean white noise in the model for the variable coefficients and tr[·] denotes the trace of matrix [·]. For a stationary zero-mean input-signal vector defined as

$$u(n) \triangleq [u(n)\ \ u(n-1)\ \cdots\ u(n-M+1)]^T \qquad (13)$$

tr[R] is equal to Mσ_u², and the total excess MSE is given by

$$J_{ex}(n) = \frac{(1-\lambda)M}{1+\lambda}\,J_{\min} + \frac{\sigma_\nu^2 M\sigma_u^2}{2(1-\lambda)} \qquad (14)$$

where σ_u² is the variance of the input signal. Note that the above analysis is based on the mean-square value of the a priori error defined in (1), which is calculated using the a priori coefficient vector w(n−1). However, at each iteration, a new and more accurate estimate of the coefficient vector is available that can be used to generate a better estimate of the reference signal at the filter output, if enough processing time is provided.

III. VARIABLE CONVERGENCE-FACTOR LMSN ALGORITHMS

A number of problems are associated with the use of fixed convergence factors, as follows:

1) A good choice of convergence factor that results in a high speed of convergence as well as low output MSE is difficult to make and depends on a good knowledge of the environment characteristics.

2) In nonstationary environments, the choice of convergence factor is difficult since J_ex(n)_l can be high if the tracking performance is poor, or J_ex(n)_g can be high if fast convergence is attempted.

3) Algorithms with fixed convergence factors suffer from high sensitivity with respect to their parameters.

4) In certain applications, different convergence factors are optimal under different circumstances; for example, in subband filtering a different convergence factor is required for each subband.

Many of the above problems can be eliminated through the use of a variable convergence factor that is adjusted in every iteration according to a certain optimality criterion.

DINIZ et al.: ANALYSIS OF LMS-NEWTON ADAITIVE FILTERING ALGORITHMS WITH VARIABLE CONVERGENCE FACTOR 619

A. Algorithm I

The use of a variable convergence factor in the LMSN algorithm as proposed in [14] will now be examined. The updating formula in (3) can be modified to incorporate a time-varying convergence factor

$$\mu(n) = b\,\alpha(n) \qquad (15)$$

where b is a constant and α(n) is the convergence factor used to update R̂^{-1}(n). The choice of α(n) must take into account all the available information. One such choice that leads to good results can be deduced by minimizing the square of the a posteriori instantaneous output error defined as

$$\varepsilon(n) \triangleq d(n) - w^H(n)u(n). \qquad (16)$$

After some manipulation, the required solution is obtained as

$$\alpha(n) = \frac{1}{1 + (2b-1)\tau(n)} \qquad (17)$$

where

$$\tau(n) \triangleq u^H(n)\hat{R}^{-1}(n-1)u(n). \qquad (18)$$

Since matrix R̂^{-1}(n−1) is positive definite, at least when infinite-precision arithmetic is used, and τ(n) is a quadratic function of the input-signal vector, the variable convergence factor given by (17) is positive and less than one, provided that the parameter b is chosen to be greater than 0.5.

Note that if b ≠ 0.5 in (15), the algorithm does not minimize (8) and, therefore, does not solve the deterministic counterpart of the normal equation, described by (9).

It can be shown from (17) that

$$\frac{1-\alpha(n)}{\alpha(n)} = (2b-1)\,u^H(n)\hat{R}^{-1}(n-1)u(n). \qquad (19)$$

Therefore, the equations describing Algorithm I can be deduced as follows:

$$t(n) \triangleq \hat{R}^{-1}(n-1)u(n) \qquad (20)$$

$$\tau(n) \triangleq u^H(n)t(n) \qquad (21)$$

$$w(n) = w(n-1) + \frac{t(n)e^*(n)}{\tau(n)} \qquad (22)$$

$$\hat{R}^{-1}(n) = \frac{1+(2b-1)\tau(n)}{(2b-1)\tau(n)}\left[\hat{R}^{-1}(n-1) - \frac{t(n)t^H(n)}{2b\,\tau(n)}\right]. \qquad (23)$$
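The complete Algorithm I iteration, (20)-(23) together with the a priori error (2), then takes the following form in NumPy. This is an illustrative sketch under the reconstruction above, not the authors' code; b > 0.5 is the only free parameter, and q is the reduction factor introduced later in (29).

    import numpy as np

    def algorithm_I_iteration(w, R_inv, u, d, b, q=1.0):
        t = R_inv @ u                              # (20)
        tau = np.real(np.vdot(u, t))               # (21)
        e = d - np.vdot(w, u)                      # a priori error, (2)
        w = w + q * t * np.conj(e) / tau           # (22), with reduction factor q as in (29)
        c = (1 + (2 * b - 1) * tau) / ((2 * b - 1) * tau)
        R_inv = c * (R_inv - np.outer(t, t.conj()) / (2 * b * tau))   # (23)
        return w, R_inv, e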

B. Memory Interpretations

We will now examine the mode by which variable convergence factors in general, and the one examined here in particular, affect matrix R̂(n). When the convergence factor used in the evaluation of matrix R̂(n) is constant over all iterations, the estimate of the input autocorrelation matrix is an exponentially weighted time average of the outer products u(i)u^H(i). However, for a variable convergence factor α(n), matrix R̂(n) represents a different weighted average, as in (5).

In this case, the relation between α(n) and the weights of the past samples, λ(n), can be established as

$$\lambda(n) = 1 - \alpha(n) = \frac{(2b-1)\tau(n)}{1 + (2b-1)\tau(n)} \qquad (24)$$

and (5) can be rewritten as

$$\hat{R}(n) = \prod_{i=1}^{n}\lambda(i)\,\hat{R}(0) + \sum_{i=1}^{n}\ \prod_{j=i+1}^{n}\lambda(j)\,[1-\lambda(i)]\,u(i)u^H(i). \qquad (25)$$

Note that λ(n) is always positive and less than unity. The above equation also allows a rough analysis of the influence of the constant b on the memory of the algorithm that estimates R̂(n). For example, b → 0.5 results in u^H(n)R̂^{-1}(n)u(n) → 1. When b is large, the algorithm has long memory, and R̂(n) tends to become invariant for large n and stationary input signals. On the other hand, as b → 0.5, we have α(n) → 1 for all n. Therefore, the weights λ(n) are all equal to zero. In this case, the algorithm has no memory to evaluate R̂(n), and there is also perfect correspondence between the LMSN and RLS algorithms. In fact, the algorithm attempts to minimize a weighted sum of past a posteriori errors according to the least-squares principle and also the instantaneous a posteriori error due to the variable convergence factor. The result is a solution that does not take into account the past a posteriori errors; as a consequence, matrix R̂(n) carries only the information contained in the last input-signal vector, u(n). The time-dependent characteristic of the forgetting factor λ(n) used in the updating of R̂^{-1}(n) introduces large variations in this matrix, which can be undesirable in cases where matrix R has a high eigenvalue spread, since R̂^{-1}(n) becomes ill-conditioned.
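The convex-combination property implicit in (25) is easy to verify numerically: for any sequence of factors α(i) ∈ (0, 1), the weight assigned to R̂(0) plus the weights assigned to the outer products u(i)u^H(i) sum to one. A small sketch (our own illustration, with an arbitrary α sequence):

    import numpy as np

    def memory_weights(alpha_seq):
        lam = 1.0 - np.asarray(alpha_seq, dtype=float)   # lambda(i) = 1 - alpha(i), as in (24)
        n = len(lam)
        w0 = np.prod(lam)                                # weight of R_hat(0)
        w = np.array([np.prod(lam[i + 1:]) * (1 - lam[i]) for i in range(n)])
        return w0, w

    w0, w = memory_weights([0.5, 0.2, 0.9, 0.1])
    print(w0 + w.sum())   # -> 1.0: the estimate is a convex combination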

C. Algorithm II

An alternative to the algorithm proposed in [14] can be derived by noting that the development of the variable convergence factor based on the minimization of (16) does not necessarily require (15) to be satisfied. Consequently, a variable convergence factor μ(n) can be chosen to minimize the instantaneous a posteriori error while a fixed α(n), i.e., α(n) = α, is used in the recursion that determines R̂^{-1}(n). The resulting convergence factor is given by

$$\mu(n) = \frac{1}{2\tau(n)} \qquad (26)$$

and yields zero a posteriori error regardless of how matrix R̂^{-1}(n) is estimated. Note that this is also the case for the convergence factors proposed in [7] for the LMS algorithm and in [14] for the LMSN algorithm for any value of b.

The updating equations for Algorithm II comprise (20)-(22) together with the following two equations:

$$k(n) = \frac{t(n)}{\dfrac{1-\alpha}{\alpha} + \tau(n)} \qquad (27)$$

$$\hat{R}^{-1}(n) = \frac{1}{1-\alpha}\left[\hat{R}^{-1}(n-1) - k(n)t^H(n)\right]. \qquad (28)$$
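A sketch of one Algorithm II iteration, consistent with the reconstruction of (26)-(28) above (again illustrative only; the gain vector k and its name are our own):

    import numpy as np

    def algorithm_II_iteration(w, R_inv, u, d, alpha, q=1.0):
        t = R_inv @ u                              # (20)
        tau = np.real(np.vdot(u, t))               # (21)
        e = d - np.vdot(w, u)                      # a priori error, (2)
        w = w + q * t * np.conj(e) / tau           # (22) / (29)
        k = t / ((1 - alpha) / alpha + tau)        # (27)
        R_inv = (R_inv - np.outer(k, t.conj())) / (1 - alpha)   # (28)
        return w, R_inv, e

Note that the coefficient update is identical to that of Algorithm I; the two algorithms differ only in how R̂^{-1}(n) is propagated.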


D. Comparison of Algorithms I and II

Algorithms I and II are, in fact, different versions of the orthogonalized projection algorithm [16], [22] that employ different methods for the estimation of matrix R^{-1}. Algorithm II uses an exponentially weighted average to compute R̂(n), while Algorithm I uses variable weights. The advantages of the one over the other depend heavily on the application. Algorithm II provides better control over matrix R̂(n) since a constant convergence factor is used, while Algorithm I is more attractive when little knowledge about the input signal is available. There is only a single free parameter in each algorithm, b for Algorithm I and α for Algorithm II. These parameters do not influence directly the speed of convergence and the excess MSE. Their main role is to control the amount of memory that is used in the evaluation of R̂^{-1}(n) or, in other words, how accurately R̂^{-1}(n) approximates R^{-1}. Algorithm I presents an extra cost in terms of computational complexity of 2 multiplications and 1 division when compared with the conventional RLS algorithm, whereas Algorithm II has the same complexity.

In many applications, it may be desirable to control the misadjustment at the expense of convergence speed. This can be achieved in both algorithms by introducing a reduction factor q in the coefficient-updating equation, as will become clear after the analysis to be carried out in the next section. This parameter can be used to perform fine tuning of the algorithms. Equation (22) becomes

$$w(n) = w(n-1) + q\,\frac{t(n)e^*(n)}{\tau(n)} \qquad (29)$$

where q is the reduction factor and t(n) and τ(n) are computed as before.

IV. ANALYSIS OF ALGORITHMS I AND II

In this section, expressions for the MSE in Algorithms I and II are derived for the cases of stationary and nonstationary environments.

Owing to the presence of the normalization factor τ(n), the computation of E[1/τ(n)] becomes necessary during the analysis. The approximation

$$E\left[\frac{1}{\tau(n)}\right] \approx \frac{1}{E[\tau(n)]} = \frac{1}{M} \qquad (30)$$

will be made, which, for ergodic processes, corresponds to interchanging the harmonic and arithmetic averages of an infinitely long sequence {τ(n)}, n ≥ 0. For white Gaussian noise signals and M > 20, the approximation introduces an error of less than 10%, which is reduced further as M is increased above 20. However, for many practical input signals this approximation may not be valid. Similar approximations were used by Bottomley and Alexander [23] and by Samson and Reddy [24] for slowly varying denominators.
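The quality of (30) for white Gaussian inputs can be checked by Monte Carlo. With R̂^{-1} ≈ R^{-1} = I/σ_u² and a unit-variance input, τ(n) ≈ u^H(n)u(n) is chi-square distributed with mean M, so E[1/τ(n)] ≈ 1/(M−2) for real signals, which is within 10% of 1/M for M > 20. A sketch under these assumptions (ours, for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    M = 25
    tau = np.array([u @ u for u in rng.standard_normal((100000, M))])
    # Harmonic versus arithmetic average: about 1/(M-2) versus 1/M
    print(np.mean(1.0 / tau), 1.0 / np.mean(tau))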

A. Excess MSE in Stationary Environment

On assuming that R̂(n−1) and w(n−1) are independent of u(n) [2], [19], the mean value of the coefficient vector, for both algorithms, is obtained from (29) as

$$E[w(n)] = \left(1 - \frac{q}{M}\right)E[w(n-1)] + \frac{q}{M}\,w_o. \qquad (31)$$

The speed of convergence of the algorithm can be roughly measured using (31). For an initially relaxed system, we can expect convergence of all coefficients to within 10% of the relative error between E[w(n)] and w_o after n' iterations, where

$$n' = -\frac{1}{\log_{10}\left(1 - \dfrac{q}{M}\right)}. \qquad (32)$$

For the LMSN algorithm with fixed convergence factor μ,

$$n' = -\frac{1}{\log_{10}(1 - 2\mu)} \qquad (33)$$

iterations are required.
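For example, under the reconstructed expressions above, q = 1 and M = 50 in (32) give n' ≈ 114 iterations, and a fixed convergence factor 2μ = 1/50 in (33) gives the same figure; a one-line check (our own):

    import numpy as np

    q, M = 1.0, 50
    n_conv = -1.0 / np.log10(1.0 - q / M)   # (32): iterations to within 10% relative error
    print(round(n_conv))                    # -> 114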

Defining v(n) ≜ w(n) − w_o as the coefficient error vector, we have

$$v(n) = \left[I - q\,\frac{\hat{R}^{-1}(n-1)u(n)u^H(n)}{u^H(n)\hat{R}^{-1}(n-1)u(n)}\right]v(n-1) + q\,\frac{\hat{R}^{-1}(n-1)u(n)}{u^H(n)\hat{R}^{-1}(n-1)u(n)}\,e_o^*(n) \qquad (34)$$

where e_o*(n) = d*(n) − u^H(n)w_o is considered a zero-mean white sequence and I is the identity matrix. The error in the coefficients, v(n), results in an excess MSE that can be expressed as [1]

$$J_{ex}(n+1)_g = E[v^H(n)Rv(n)] = \mathrm{tr}[RK(n)] \qquad (35)$$

where K(n) is the covariance matrix of v(n).

From (35), the difference equation describing the excess MSE is obtained as

$$\begin{aligned} J_{ex}(n+1)_g = {}& \mathrm{tr}\{RE[v(n-1)v^H(n-1)]\} \\ & - q\,\mathrm{tr}\left\{RE\left[\frac{\hat{R}^{-1}(n-1)u(n)u^H(n)v(n-1)v^H(n-1)}{u^H(n)\hat{R}^{-1}(n-1)u(n)}\right]\right\} \\ & - q\,\mathrm{tr}\left\{RE\left[\frac{v(n-1)v^H(n-1)u(n)u^H(n)\hat{R}^{-1}(n-1)}{u^H(n)\hat{R}^{-1}(n-1)u(n)}\right]\right\} \\ & + q^2\,\mathrm{tr}\left\{RE\left[\frac{\hat{R}^{-1}(n-1)u(n)u^H(n)v(n-1)v^H(n-1)u(n)u^H(n)\hat{R}^{-1}(n-1)}{[u^H(n)\hat{R}^{-1}(n-1)u(n)]^2}\right]\right\} \\ & + q^2\,\mathrm{tr}\left\{RE\left[\frac{\hat{R}^{-1}(n-1)u(n)u^H(n)\hat{R}^{-1}(n-1)}{[u^H(n)\hat{R}^{-1}(n-1)u(n)]^2}\right]\right\}J_{\min}. \end{aligned} \qquad (36)$$

Evaluating each term in the above equation separately and assuming that the model R̂(n−1) = R + ΔR(n−1) holds [17], it can easily be shown, by interchanging the trace and expected-value operators, that

$$q^2\,\mathrm{tr}\left\{RE\left[\frac{\hat{R}^{-1}(n-1)u(n)u^H(n)v(n-1)v^H(n-1)u(n)u^H(n)\hat{R}^{-1}(n-1)}{[u^H(n)\hat{R}^{-1}(n-1)u(n)]^2}\right]\right\} \approx q^2 E\left\{\frac{\mathrm{tr}[u(n)u^H(n)v(n-1)v^H(n-1)]}{u^H(n)\hat{R}^{-1}(n-1)u(n)}\right\} \qquad (37)$$

where the second-order error terms in R̂R̂^{-1}(n−1) were discarded, independence between v(n−1) and u(n) was assumed [2], [19], and, for large M, τ(n) was considered independent of each element of u(n)u^H(n) taken separately. The fourth term can be similarly simplified to yield

$$q^2\,\mathrm{tr}\left\{RE\left[\frac{\hat{R}^{-1}(n-1)u(n)u^H(n)\hat{R}^{-1}(n-1)}{[u^H(n)\hat{R}^{-1}(n-1)u(n)]^2}\right]\right\}J_{\min} = q^2 E\left[\frac{1}{u^H(n)\hat{R}^{-1}(n-1)u(n)}\right]J_{\min}. \qquad (38)$$

Using (37) and (38), we can rewrite (36) as

$$J_{ex}(n+1)_g = \left(1 - \frac{2q}{M} + \frac{q^2}{M}\right)J_{ex}(n)_g + \frac{q^2}{M}\,J_{\min}. \qquad (39)$$

After convergence, we obtain

$$J_{ex}(n)_g = \frac{q}{2-q}\,J_{\min}. \qquad (40)$$

This solution is applicable to both Algorithms I and II since it is independent of the method used to estimate the input-signal autocorrelation matrix. In fact, it is applicable to orthogonalized projection algorithms in general, as long as the assumptions made in the above analysis are valid.

B. Excess MSE in Nonstationary Environment

In order to study the behavior of Algorithms I and II in a nonstationary environment, the first-order Markov model

$$w_o(n+1) = w_o(n) + \nu(n+1) \qquad (41)$$

is assumed for the optimal coefficient-vector evolution [18], where the elements of ν(n+1) are zero-mean white Gaussian noise samples with variance σ_ν². In this case, the coefficient error vector v'(n) ≜ w(n) − w_o(n+1) is of the form

$$v'(n) = \left[I - q\,\frac{\hat{R}^{-1}(n-1)u(n)u^H(n)}{u^H(n)\hat{R}^{-1}(n-1)u(n)}\right]v'(n-1) + q\,\frac{\hat{R}^{-1}(n-1)u(n)}{u^H(n)\hat{R}^{-1}(n-1)u(n)}\,e_o^*(n) - \nu(n+1) \qquad (44)$$

where the second term represents an excess MSE due to gradient noise (see (34)) and is also present in stationary-environment operation, and K'(n−1) denotes the covariance matrix of v'(n−1). An analysis similar to that carried out previously for the excess MSE in stationary environments reveals that the excess MSE due to lag [18] can be estimated as

$$J_{ex}(n)_l = \frac{\sigma_\nu^2\sigma_u^2 M^2}{2q - q^2}. \qquad (45)$$

Therefore, the total excess MSE is given by

$$J_{ex}(n) = \frac{q}{2-q}\,J_{\min} + \frac{\sigma_\nu^2\sigma_u^2 M^2}{2q - q^2}. \qquad (46)$$
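The steady-state prediction (46) can be checked with a short simulation in which Algorithm II identifies a Markov-varying FIR plant. The sketch below is our own illustration: all parameter values (M = 10, α = 0.05, the variances, the run length) are arbitrary choices, not the paper's experimental setup.

    import numpy as np

    rng = np.random.default_rng(1)
    M, q, alpha = 10, 1.0, 0.05
    sig_u2, J_min, sig_v2 = 1.0, 1e-3, 1e-8     # input, noise-floor, and lag-model variances
    w_o = rng.standard_normal(M)                # time-varying optimal coefficients
    w, R_inv, x = np.zeros(M), np.eye(M), np.zeros(M)
    err2 = []
    for n in range(20000):
        x = np.roll(x, 1); x[0] = rng.standard_normal() * np.sqrt(sig_u2)
        d = w_o @ x + rng.standard_normal() * np.sqrt(J_min)
        t = R_inv @ x
        tau = x @ t
        e = d - w @ x
        w = w + q * t * e / tau                               # (22)/(29)
        k = t / ((1 - alpha) / alpha + tau)
        R_inv = (R_inv - np.outer(k, t)) / (1 - alpha)        # (27)-(28)
        w_o = w_o + rng.standard_normal(M) * np.sqrt(sig_v2)  # first-order Markov model, (41)
        if n > 5000:
            err2.append(e * e)
    J_ex_measured = np.mean(err2) - J_min
    J_ex_predicted = q * J_min / (2 - q) + sig_v2 * sig_u2 * M**2 / (2 * q - q * q)   # (46)
    print(J_ex_measured, J_ex_predicted)   # both close to 1.0e-3 for these values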


C. Minimization of MSE

The excess MSE has a term proportional to the parameter q and a term that is inversely proportional to it. Therefore, an optimal value q_o that minimizes the total MSE can be deduced as

$$q_o = \frac{aM\sqrt{a^2M^2 + 4} - a^2M^2}{2} \qquad (47)$$

where a² = σ_ν²σ_u²/J_min. Applying the same procedure to the conventional LMSN algorithm or the RLS algorithm with 2μ = 1 − λ yields [18]

$$\mu_o = \frac{a}{2+a}. \qquad (48)$$

In practice, both algorithms must present similar performance if q/(2M) (the average value of the variable convergence factor) in the normalized algorithm is made equal to the value of μ used in the conventional algorithm. It should be noted that the optimal value of μ, given by (48), may sometimes be high. In such cases, the approximations used to obtain (48) do not hold, and the true value of the excess MSE would be much larger than expected.
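A quick evaluation of (47) and (48), using our reconstruction of (47); note that q_o/(2M) and μ_o agree to first order in a, as the preceding paragraph requires:

    import numpy as np

    def optimal_factors(sig_v2, sig_u2, J_min, M):
        a = np.sqrt(sig_v2 * sig_u2 / J_min)          # a^2 = sigma_v^2 sigma_u^2 / J_min
        q_o = (a * M * np.sqrt(a * a * M * M + 4) - a * a * M * M) / 2   # (47)
        mu_o = a / (2 + a)                            # (48)
        return q_o, mu_o

    q_o, mu_o = optimal_factors(1e-8, 1.0, 1e-3, 10)
    print(q_o / (2 * 10), mu_o)   # -> approximately 1.56e-3 and 1.58e-3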

V. FINITE-PRECISION EFFECTS

In this section, the effect of roundoff errors is examined for the case of fixed-point arithmetic using an approach similar to that in [23]. The following assumptions are made:

1) Numbers are represented using two's-complement format.
2) Additions and subtractions do not introduce quantization errors.
3) Each internal operation is performed with enough bits in the integer part such that no overflow occurs.
4) Quantized versions of all external signals and constants are available.
5) Multiplication and division operations introduce white noise given by

$$\varepsilon(ab) \triangleq (ab)_Q - ab \qquad \text{(49a)}$$

and

$$\varepsilon(a/b) \triangleq (a/b)_Q - a/b \qquad \text{(49b)}$$

respectively.

The quantization error in each variable is defined as the difference between its quantized and nominal values, i.e.,

$$\eta_a \triangleq a_Q - a. \qquad (50)$$

A. Quantization Errors

Algorithms I and II in finite precision are obtained by quantizing each multiplication and division in the updating equations: the quantities e(n)_Q, t(n)_Q, τ'(n)_Q = τ(n)_Q/N, and s(n)_Q are computed in turn, and the listing ends with the coefficient update

$$w(n)_Q = w(n-1)_Q + s(n)_Q. \qquad (57)$$

The constant N is a power-of-2 scaling factor that must be introduced to ensure that the values of τ'(n)_Q lie within the range of the other variables of the algorithm. It can be verified from (21) that the mean value of τ(n) is M.

Matrix R̂^{-1}(n)_Q is updated either as the quantized counterpart of (23),

$$\hat{R}^{-1}(n)_Q = \left[\frac{1+(2b-1)\tau(n)_Q}{(2b-1)\tau(n)_Q}\left(\hat{R}^{-1}(n-1)_Q - \frac{[t(n)_Q t^H(n)_Q]_Q}{[2b\,\tau(n)_Q]_Q}\right)\right]_Q \qquad (58)$$

for Algorithm I [14], or as the quantized counterpart of (27) and (28),

$$\hat{R}^{-1}(n)_Q = \left[\frac{1}{1-\alpha}\left(\hat{R}^{-1}(n-1)_Q - \frac{[\alpha\,t(n)_Q t^H(n)_Q]_Q}{[1-\alpha+\alpha\,\tau(n)_Q]_Q}\right)\right]_Q \qquad (59)$$

for Algorithm II. Since e(n) = e(n)_Q − η_e(n), it follows that

$$e(n) = d(n) - w^H(n-1)u(n) = d(n) - [w^H(n-1)_Q u(n)]_Q - \eta_e(n). \qquad (60)$$

Defining ε₁(n) as the error introduced when evaluating the inner product [w^H(n−1)_Q u(n)] according to (49a), namely,

$$\varepsilon_1(n) \triangleq \varepsilon[w^H(n-1)_Q u(n)] = [w^H(n-1)_Q u(n)]_Q - w^H(n-1)_Q u(n) \qquad (61)$$

we have

$$d(n) - w^H(n-1)u(n) = d(n) - w^H(n-1)_Q u(n) - \eta_e(n) - \varepsilon_1(n) \qquad (62)$$

which yields

$$\eta_e(n) = -\eta_w^H(n-1)u(n) - \varepsilon_1(n). \qquad (63)$$

An analogous procedure leads to similar equations for the errors in t(n) and τ'(n), i.e.,

$$\eta_t(n) = \eta_{\hat{R}^{-1}}(n-1)u(n) + \varepsilon_2(n) \qquad (64)$$

and a corresponding relation for η_{τ'}(n). If the inner products are quantized after the additions, then ε₁(n), ε₃(n), and the elements of ε₂(n) can be modeled as zero-mean white Gaussian noise sources with variance equal to 2^{−2b}/12 [25]. The errors in τ'(n), τ(n), and s(n) can be modeled in the same way, which introduces the additional noise sources ε₄(n), ε₅(n), ε₆(n), and ε₇(n) that appear below.

From (57), the quantization error in the coefficient vector can be expressed in terms of the recursive relation

$$\eta_w(n) = \eta_w(n-1) + \eta_s(n). \qquad (74)$$

The quantization errors in the updating of matrix R̂^{-1}(n), after neglecting all second-order errors, can be represented by relations of the form (75) for Algorithm I and (76) for Algorithm II; both add terms of the type −ε₅(n)R̂^{-1}(n) + ε₇(n) to the infinite-precision recursion.
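Under assumption 5), each product is rounded to b fractional bits, so a single multiplication behaves like the nominal product plus uniform noise of variance 2^{−2b}/12. A helper that mimics such a quantizer (our own illustration; the paper's simulations additionally used 3 integer bits plus a sign bit):

    import numpy as np

    def quantize(x, frac_bits=9, int_bits=3):
        # Round to frac_bits fractional bits and saturate to the
        # two's-complement range implied by int_bits plus a sign bit.
        step = 2.0 ** (-frac_bits)
        lo, hi = -2.0 ** int_bits, 2.0 ** int_bits - step
        return np.clip(np.round(np.asarray(x) / step) * step, lo, hi)

    # eta_a = a_Q - a, as in (50); its variance is close to step^2 / 12
    a = np.random.default_rng(2).uniform(-1, 1, 100000)
    eta = quantize(a) - a
    print(np.var(eta), (2.0 ** -9) ** 2 / 12)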

By considering the effects of quantization errors on the error signal, the excess MSE due to finite-precision arithmetic can be calculated as

$$J_{ex}(n)_Q \triangleq E[|\eta_e(n)|^2] = E\{\mathrm{tr}[u(n)u^H(n)\eta_w(n-1)\eta_w^H(n-1)]\} + \sigma_{\varepsilon_1}^2 = \mathrm{tr}[RK''(n-1)] + \sigma_{\varepsilon_1}^2 \qquad (85)$$

where η_e(n) is defined in (63) and

$$K''(n) \triangleq E[\eta_w(n)\eta_w^H(n)]. \qquad (86)$$

In (85), it was assumed that ε₁(n) is a zero-mean white Gaussian noise with variance equal to σ_{ε1}².

The difference equation that gives η_w(n) can be expressed as a function of the instantaneous quantization errors, infinite-precision quantities, and η_w(n−1), as given in (87).


Considering that the error sources in (87) are uncorrelated with each other and that the output error after convergence, e(n), can be modeled as zero-mean white noise, (85)-(87) yield a difference equation, (88), for K''(n). This equation can be solved by neglecting the second-order terms in R̂R̂^{-1}(n−1) and assuming independence between τ(n) and each element of u(n)u^H(n) taken separately. If we also assume that

$$q < 1 \qquad (89)$$

and use the approximation in (30), then

$$J_{ex}(n)_Q = \frac{M^2\sigma_{\varepsilon_6}^2\sigma_u^2 + (M+q^2)\sigma_{\varepsilon_1}^2 + q^2M^2\sigma_{\varepsilon_4}^2/N^2 + q^2M^2\sigma_{\varepsilon_5}^2\sigma_u^2/N^2}{q(2-q)}. \qquad (90)$$

If, in addition, σ_{ε1}² = σ_{ε4}² = σ_{ε5}² = σ_{ε6}² = σ_ε², then

$$J_{ex}(n)_Q = \sigma_\varepsilon^2\,\frac{M^2\sigma_u^2 + M + q^2 + q^2M^2/N^2 + q^2M^2\sigma_u^2/N^2}{q(2-q)}. \qquad (91)$$

Unfortunately, for both methods of evaluating the quantized version of R̂^{-1}(n), namely R̂^{-1}(n)_Q, either with variable or constant convergence factor, positive definiteness is not guaranteed, as can be shown by an analysis similar to that in [23]. This may result in divergence of the coefficients, and a stabilization scheme must be incorporated, for example, the one proposed in [23]. This strategy is applicable since η_R has no significant influence on J_ex(n)_Q, as can be seen in (90). Another solution for preventing divergence of the algorithm relies on an implementation based on the QR decomposition. This technique was successfully applied in [26] to the variable-convergence-factor LMSN algorithm discussed here and was found to yield very good results in finite precision.

-"a.E 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 h

Fig. 1. Modeling a 22nd-order nonstationary plant (σ_u² = 0 dB, J_min = −80 dB, and σ_ν² = −60 dB). RLS algorithm: solid lines; best performance of the LMSN algorithm: dashed line; worst performance of the LMSN algorithm: dotted line.

VI. SIMULATION RESULTS

In order to verify the usefulness of the algorithms examined and the accuracy of the formulas presented in the preceding sections, some simulations have been performed. In all of them the adaptive filter was used to identify an unknown system modeled as a moving average process, with additional noise present. Both the additive noise and the input signal to the unknown system were pseudo-random sequences with normal distribution and zero mean.

The first and second experiments compare the mean-squared error of the variable-convergence-factor LMSN and the RLS algorithms. The additive noise variance in the first experiment was −80 dB, and the input-signal variance was 0 dB. Fig. 1 shows the ensemble average of the MSE obtained in 100 simulations. Different values of the forgetting factor are used for the RLS algorithm. The unknown system was a 22nd-order nonstationary system with coefficients modeled as a first-order Markov process with a relaxation time constant of 1.0. The variance of the white noise used as input sequence to the first-order Markov process was −60 dB. The horizontal dashed and dotted lines represent the best and the worst performances obtained, respectively, with Algorithm I for q = 1 and b ranging from 0.6 to 100.0. Fig. 2 shows the results obtained when a 64th-order nonstationary system was to be identified. The additive-noise and input-signal variances were −30 dB. The variance of the sequence used as input for the first-order Markov process was −10 dB. It is easy to note from these examples how badly the conventional RLS algorithm performs when the value of the forgetting factor is not close to its optimal value, and how difficult it is to find an optimal value when the characteristics of the environment are unknown. On the other hand, when a variable convergence factor was used without any tuning, very good results were obtained, especially for the high-order filter.

The results of other experiments are presented in tables, where the theoretical values are compared with those obtained


TABLE I
J_ex IN NONSTATIONARY ENVIRONMENT IN DECIBELS

         Misadjustment          Lag                   Total
  q      Eq. (40)  Simul.       Eq. (45)  Simul.      Eq. (46)  Simul.
  0.5    -34.8     -34.4        -27.6     -28.1       -26.9     -27.2
  0.6    -33.7     -33.4        -28.1     -28.7       -27.1     -27.4
  0.7    -32.7     -32.3        -28.5     -29.5       -27.1     -27.6
  0.8    -31.8     -31.4        -28.7     -29.2       -27.0     -27.2
  0.9    -30.9     -30.6        -28.8     -29.8       -26.7     -27.1
  1.0    -30.0     -29.7        -28.9     -29.8       -26.4     -26.8
  1.1    -29.1     -28.8        -28.8     -30.0       -26.0     -26.4


by averaging the results of an ensemble consisting of 100 experiments. The performance parameter shown is the excess MSE, i.e., the difference between the MSE and the variance of the additive noise.

In the third experiment, an adaptive filter based on Algorithm II was used to identify a 35th-order nonstationary system with a relaxation time constant of 0.999. The system's time-varying coefficients were modeled as a first-order Markov process, and a white-noise sequence of variance −30 dB was used as input. The variance of the input signal and of the additional noise were −30 dB. The measured MSE is compared with that predicted by the derived formulas for several values of parameter q. Parameter α was chosen such that convergence of matrix R̂^{-1}(n) could be achieved in time. Table I illustrates the results obtained.
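The theoretical columns of Table I follow directly from (40), (45), and (46) with M = 36 and σ_ν² = σ_u² = J_min = −30 dB, our reading of the parameters of this experiment:

    import numpy as np

    M, sig_v2, sig_u2, J_min = 36, 1e-3, 1e-3, 1e-3
    for q in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1):
        J_g = q * J_min / (2 - q)                          # (40)
        J_l = sig_v2 * sig_u2 * M**2 / (2 * q - q * q)     # (45)
        print(q, [round(10 * np.log10(J), 1) for J in (J_g, J_l, J_g + J_l)])
    # q = 0.5 gives -34.8, -27.6, and -26.9 dB, matching the first row of Table I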

Table II shows the results obtained for a fixed-point implementation for different values of wordlength and q = 1. The unknown system in this experiment was time invariant and of order 20. As can be noted, the simulation results agree with the theoretical results as long as a sufficient wordlength is provided to prevent the coefficients from freezing. Table III

TABLE II
J_ex DUE TO FINITE WORDLENGTH FOR q = 1 IN DECIBELS

  Number of Bits    Eq. (91)    Simulations
  6                 -20.1       -21.9
  7                 -26.2       -27.1
  8                 -32.2       -32.8
  9                 -38.2       -38.6
  12                -56.3       -56.5
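Similarly, the theoretical column of Table II follows from (91) with M = 21 (a 20th-order filter), q = 1, σ_u² = 0 dB, N = 8, and σ_ε² = 2^{−2b}/12 for a wordlength of b bits, our reading of the experiment parameters:

    import numpy as np

    def J_ex_Q_dB(bits, M=21, q=1.0, sig_u2=1.0, N=8):
        sig_e2 = 2.0 ** (-2 * bits) / 12.0                               # quantization-noise variance
        J = sig_e2 * (M * M * sig_u2 + M + q * q
                      + q * q * M * M / N**2
                      + q * q * M * M * sig_u2 / N**2) / (q * (2 - q))   # (91)
        return 10 * np.log10(J)

    print([round(J_ex_Q_dB(b), 1) for b in (6, 7, 8, 9, 12)])
    # -> [-20.1, -26.2, -32.2, -38.2, -56.3], cf. Table II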

TABLE III
J_ex DUE TO FINITE WORDLENGTH FOR 9 BITS IN DECIBELS

  q      Eq. (91)    Simulations
  0.5    -37.0       -37.4
  0.6    -37.5       -38.4
  0.7    -37.9       -38.6
  0.8    -38.1       -38.7
  0.9    -38.2       -38.3
  1.0    -38.2       -38.0
  1.1    -38.1       -38.6
  1.2    -38.0       -36.1
  1.3    -37.7       -38.3
  1.4    -37.3       -38.0
  1.5    -36.8       -37.4

shows the results obtained for different values of parameter q with a constant wordlength of 9 bits. In order to avoid overflow and to simplify the internal scaling, the input-signal variance was made 0 dB, and 3 bits were used to represent the integer part in internal registers in all the finite-precision simulations. The variance of the additional noise used in the simulations shown in Tables II and III is −20 dB. Using this scheme, only the register storing τ(n) had to be treated separately; its content was shifted 3 bits to the right, i.e., N = 8. Tables I and III also show very little sensitivity of the algorithms with respect to parameter q.

Matrix R̂^{-1}(n)_Q was evaluated in double-precision floating-point arithmetic and then quantized. This approach prevents divergence. We implicitly assumed that some strategy will be used in practice to guarantee the stability of R̂^{-1}(n)_Q, as discussed in the previous section.


VII. CONCLUSION

Two LMSN adaptation algorithms that incorporate a variable convergence factor have been analyzed. The performance of the algorithms in stationary and nonstationary environments has been investigated, and the relations between the LMSN and RLS algorithms have been established. The analysis indicates that the two algorithms can perform in most situations as well as the RLS algorithm without requiring as much knowledge of the signal statistics. The algorithms proved themselves especially suited for applications in nonstationary environments, where algorithms with fixed convergence factors can very easily fail. Several simulations in different applications support these conclusions.

A roundoff-error analysis of the algorithms for the case of two’s-complement fixed-point implementation has then been undertaken. In this analysis, the interaction among different roundoff errors and their accumulation have been considered. A closed-form formula for the excess MSE due to quantization has been derived which shows that the roundoff errors in the evaluation of the input autocorrelation matrix do not have significant influence on the excess MSE.

The results obtained are in terms of closed-form formulas, which yield mean-square-error estimates that agree well with those obtained by simulation.

REFERENCES

[1] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[2] J. E. Mazo, "On the independence theory of equalizer convergence," Bell Syst. Tech. J., vol. 58, pp. 962-993, May-June 1979.
[3] T. R. Fortescue, L. S. Kershenbaum, and B. E. Ydstie, "Implementation of self-tuning regulators with variable forgetting factors," Automatica, vol. 17, pp. 831-835, 1981.
[4] B. Toplis and S. Pasupathy, "Tracking improvements in fast RLS algorithms using a variable forgetting factor," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 206-227, Feb. 1988.
[5] J. B. Evans and B. Liu, "Variable step size methods for the LMS adaptive algorithm," in Proc. Int. Symp. Circuits Syst., Philadelphia, PA, 1987, pp. 422-425.
[6] J. B. Evans, "A new variable step size method suitable for efficient VLSI implementation," in Proc. Int. Conf. Acoust., Speech, Signal Processing, Toronto, ON, Canada, 1991, pp. 2105-2108.
[7] F. F. Yassa, "Optimality in the choice of the convergence factor for gradient-based adaptive algorithms," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 48-59, Jan. 1987.
[8] T. J. Shan and T. Kailath, "Adaptive algorithms with an automatic gain control feature," IEEE Trans. Circuits Syst., vol. 35, pp. 122-127, Jan. 1988.
[9] S. Karni and G. Zeng, "A new convergence factor for adaptive filters," IEEE Trans. Circuits Syst., vol. 36, pp. 1011-1012, July 1989.
[10] S. Karni and G. Zeng, "Comments on 'Adaptive algorithms with an automatic gain control feature,'" IEEE Trans. Circuits Syst., vol. 37, pp. 974-975, July 1990.
[11] T. J. Shan and T. Kailath, "Comments on 'Comments on "Adaptive algorithms with an automatic gain control feature,"'" IEEE Trans. Circuits Syst., vol. 37, p. 975, July 1990.
[12] V. J. Mathews and Z. Xie, "Stochastic gradient adaptive filters with gradient adaptive step sizes," in Proc. Int. Conf. Acoust., Speech, Signal Processing, Albuquerque, NM, 1990, pp. 1385-1388.
[13] P. S. R. Diniz and L. W. P. Biscainho, "Optimal convergence factor for the LMS/Newton algorithm," in Proc. Int. Conf. Acoust., Speech, Signal Processing, Toronto, ON, Canada, 1991.
[14] P. S. R. Diniz and L. W. P. Biscainho, "Optimal variable step size for the LMS/Newton algorithm with application to subband adaptive filtering," IEEE Trans. Signal Processing, vol. 40, pp. 2825-2829, Nov. 1992.
[15] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification. London: MIT Press, 1983.
[16] G. C. Goodwin and K. S. Sin, Adaptive Filtering Prediction and Control. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[17] S. Haykin, Adaptive Filter Theory, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 1992.
[18] E. Eleftheriou and D. D. Falconer, "Tracking properties and steady-state performance of RLS adaptive filter algorithms," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 1097-1110, Oct. 1986.
[19] B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, Jr., "Stationary and nonstationary learning characteristics of the LMS adaptive filter," Proc. IEEE, vol. 64, pp. 1151-1162, Aug. 1976.
[20] B. Widrow and E. Walach, "On the statistical efficiency of the LMS algorithm with nonstationary inputs," IEEE Trans. Inform. Theory, vol. IT-30, pp. 211-221, Mar. 1984.
[21] S. McLaughlin, B. Mulgrew, and C. F. N. Cowan, "Performance bounds for exponentially windowed RLS algorithm in a nonstationary environment," in Proc. IMA Conf. Math. Signal Processing, Warwick, England, Dec. 1988, pp. 449-464.
[22] D. F. Marshall and W. K. Jenkins, "A fast quasi-Newton adaptive filtering algorithm," IEEE Trans. Signal Processing, vol. 40, pp. 1652-1662, July 1992.
[23] G. E. Bottomley and S. T. Alexander, "A novel approach for stabilizing recursive least squares filters," IEEE Trans. Signal Processing, vol. 39, pp. 1770-1779, Aug. 1991.
[24] C. G. Samson and V. U. Reddy, "Fixed point error analysis of the normalized ladder algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 1177-1191, Oct. 1983.
[25] A. Antoniou, Digital Filters: Analysis, Design, and Applications, 2nd ed. New York: McGraw-Hill, 1993.
[26] M. L. R. de Campos, M. G. Siqueira, A. Antoniou, and A. N. Willson, Jr., "A QR-decomposition LMS-Newton adaptive filtering algorithm with variable convergence factor," in Proc. IEEE Pacific Rim Conf. Commun., Comput., Signal Processing, Victoria, BC, Canada, 1993, pp. 350-353.

Paulo S. R. Diniz (S'80-M'84-SM'92) received the B.Sc. degree from the Universidade Federal do Rio de Janeiro (UFRJ) in 1979, the M.Sc. degree from COPPE/UFRJ in 1981, and the Ph.D. degree from Concordia University, Montreal, PQ, Canada, in 1984, all in electrical engineering.

Since 1979 he has been with the Department of Electronic Engineering, UFRJ. He has also been with the Program of Electrical Engineering, COPPE/UFRJ, since 1984, where he is presently Professor and Chairman. From January 1991 to July 1992 he was a visiting Research Associate in the Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada. His teaching and research interests are in analog and digital signal processing, electronic circuits, and adaptive signal processing.

Marcello L. R. de Campos (S'89) received the B.Sc. degree (cum laude) from the Universidade Federal do Rio de Janeiro (UFRJ) in 1990 and the M.Sc. degree from COPPE/UFRJ in 1991, both in electrical engineering. Since 1992 he has been working toward the Ph.D. degree at the University of Victoria, Victoria, BC, Canada.

His research interests are in analog and digital filtering.


Andreas Antoniou (F‘82) received the B.Sc. (Eng.) and Ph.D. degrees in electrical engineering from London University in 1963 and 1966, respectively.

From 1966 to 1969, he was Senior Scientific Officer at the Post Office Research Department, London, and from 1969 to 1970, he was a member of the Scientific Staff at the R&D Laboratories of Northern Electric Company Ltd., Ottawa, Canada. From 1970 to 1983, he served in the Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada, as Professor from June 1973 and

as Chairman from December 1977. He served as founding Chairman of the Department of Electrical and Computer Engineering, University of Victoria, Victoria, Canada, from July 1, 1983 to June 30, 1990 and is now Professor in the same department. His teaching and research interests are in the areas of electronics, network synthesis, digital system design, active and digital filters, and digital signal processing. He has published extensively in these areas. He is the author of Digital Filters: Analysis, Design, and Applications (New York: McGraw-Hill, 1993) and the co-author, with W.-S. Lu, of Two-Dimensional Digital Filters (New York: Marcel Dekker, 1992). Dr. Antoniou is a member of the Association of Professional Engineers and

Geoscientists of BC and a Fellow of the Institution of Electrical Engineers. He was elected Fellow of the Institute of Electrical and Electronics Engineers for contributions to active and digital filters and to electrical engineering education. He served as Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS from June 1983 to May 1985 and as Editor from June 1985 to May 1987. One of his papers on gyrator circuits was awarded the Ambrose Fleming Premium by the Institution of Electrical Engineers, UK.
