Topic 8-Mean Square Estimation-Wiener and Kalman Filtering
TRANSCRIPT
-
Topic 8-Mean Square Estimation: Wiener and Kalman Filtering
Papoulis, Chapter 13 (2 weeks)
Minimum Mean-Square Estimation (MMSE)
Optimum Linear MSE estimation ---the orthogonality principle
Wiener Filtering
Kalman Filtering
Adaptive Filtering (not in Papoulis)
-
Review: Parameter Estimation (Predicting the Value of Y) [Review from Topic 4]
Suppose Y is a RV with known PMF or PDF.
Problem: Predict (or estimate) what value of Y will be observed on the next trial.
Questions:
What value should we predict?
What is a good prediction? We need to specify some criterion that determines what is a good/reasonable estimate.
Note that for continuous random variables, it doesn't make sense to predict the exact value of Y, since any particular value occurs with zero probability.
A common estimate is the mean-square estimate.
-
The Mean Square Error (MSE) Estimate
We will let \hat{Y} denote the mean-square estimate of the observable random variable Y, with E[Y] = \eta. The mean-squared error (MSE) is defined as

e = E[(Y - \hat{Y})^2]   (8-1)

We proceed by completing the square:

E[(Y - \hat{Y})^2] = E[(Y - \eta + \eta - \hat{Y})^2]
= E[(Y - \eta)^2 + 2(Y - \eta)(\eta - \hat{Y}) + (\eta - \hat{Y})^2]
= var(Y) + 2(\eta - \hat{Y})\,E[Y - \eta] + (\eta - \hat{Y})^2
= var(Y) + (\eta - \hat{Y})^2 > var(Y) \quad \text{if } \hat{Y} \neq \eta   (8-2)

Clearly the MSE is minimized when

\hat{Y} = \eta   (8-3)

\hat{Y} = \eta is called the minimum- (or least-) mean-square error (MMSE or LMSE) estimate.
The minimum mean-square error is var(Y).
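As a quick numerical check of (8-2) and (8-3), here is a minimal MATLAB sketch (the distribution and the grid of candidate constants are illustrative choices, not from the slides):

% Monte Carlo check that the constant c = E[Y] minimizes E[(Y - c)^2]
N = 1e6;
Y = 3 + 2*randn(N,1);              % example RV: eta = 3, var(Y) = 4
c = linspace(1,5,101);             % candidate constant estimates
mse = zeros(size(c));
for i = 1:length(c)
    mse(i) = mean((Y - c(i)).^2);  % empirical MSE for each candidate
end
[emin,i] = min(mse);
fprintf('best c = %.3f (mean(Y) = %.3f)\n', c(i), mean(Y));
fprintf('min MSE = %.3f (var(Y) = %.3f)\n', emin, var(Y));

The minimizing constant lands on the sample mean, and the minimum MSE matches var(Y), as (8-3) predicts.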
-
The MSE of a RV Based Upon Observing Another RV
Let X and Y denote random variables with known joint distribution.
Suppose that we observe the value of X (that is, X is the observed signal/data). How can we find the MMSE estimate of Y, denoted by \hat{Y}, that is a function of the observed data X?
Can the MMSE estimate \hat{Y}, which is a function of X, do better than ignoring X and estimating the value of Y as \hat{Y} = \eta_Y = E[Y]? Yes! Denoting the MMSE estimate \hat{Y} by c(X), the MSE is given by

e = E_{XY}\{[Y - c(X)]^2\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [y - c(x)]^2\, f_{X,Y}(x,y)\,dx\,dy
  = \int_{-\infty}^{\infty} f_X(x) \int_{-\infty}^{\infty} [y - c(x)]^2\, f_{Y|X}(y|x)\,dy\,dx   (8-4)

Note that the above integrals are positive, so that e will be minimized if the inner integral is a minimum for all values of x.
Note that for a fixed value of x, c(x) is a variable [not a function].
-
The MSE of a RV Based Upon Observing Another RV-2
Since for a fixed value of x, c(x) is a variable [not a function], we can minimize the MSE by setting the derivative of the inner integral, with respect to c, to zero:

\frac{d}{dc}\int_{-\infty}^{\infty} [y - c(x)]^2\, f_{Y|X}(y|x)\,dy = -2\int_{-\infty}^{\infty} (y - c)\, f_{Y|X}(y|x)\,dy = 0   (8-5)

Solving for c after noting that

\int_{-\infty}^{\infty} c(x)\, f_{Y|X}(y|x)\,dy = c(x)\int_{-\infty}^{\infty} f_{Y|X}(y|x)\,dy = c(x), \quad \text{where the integral is one,}

gives

\hat{Y} = c(X) = \int_{-\infty}^{\infty} y\, f_{Y|X}(y|x)\,dy = E[Y\,|\,X]   (8-6)

Thus the MMSE estimate \hat{Y} is the conditional mean of Y given the observation (or data) X.
The MMSE estimate is, in general, a nonlinear function of X.
-
MMSE Example
[Figure: the random point (X, Y) uniform on the upper unit semicircle; at X = x the conditional range of Y has height (1 - x^2)^{1/2}]
Let the random point (X, Y) be uniformly distributed on a semicircle.
The joint PDF has value 2/\pi on the semicircle.
The conditional PDF of Y given that X = x is a uniform density on [0, (1 - x^2)^{1/2}].
So \hat{Y} = E[Y | X = x] = (1/2)(1 - x^2)^{1/2}, and this estimate achieves the least possible MSE of var(Y | X = x) = (1 - x^2)/12.
Intuitively reasonable since:
If |x| is nearly 1, the MSE is small (since the range of Y is small).
If |x| is nearly 0, the MSE is large (since the range of Y is large).
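A minimal MATLAB sketch of this example (the sample size, slice point, and slice width are arbitrary choices): rejection-sample points uniform on the upper half-disk and compare the empirical conditional mean with (1/2)(1 - x^2)^{1/2}.

% Uniform points on the unit upper half-disk via rejection sampling
N = 2e6;
X = 2*rand(N,1) - 1;  Y = rand(N,1);   % uniform on the box [-1,1] x [0,1]
keep = (X.^2 + Y.^2) <= 1;             % keep points inside the semicircle
X = X(keep);  Y = Y(keep);
x0 = 0.5;  sel = abs(X - x0) < 0.01;   % thin slice near X = x0
fprintf('empirical E[Y | X ~ %.1f] = %.4f\n', x0, mean(Y(sel)));
fprintf('theory (1/2)sqrt(1 - x0^2) = %.4f\n', 0.5*sqrt(1 - x0^2));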
-
The Regression Curve of Y on X
[Figure: the semicircle with the half-ellipse \hat{y} = (1/2)(1 - x^2)^{1/2} plotted as the lower curve]
\hat{Y} = E[Y | X = x] as a function of x is a curve called the regression curve of Y on X (plotted as the lower curve above).
The graph of (1/2)(1 - x^2)^{1/2} is a half-ellipse.
Given the X value, the MMSE estimate of Y can be read off from the regression curve.
-
Example: As an example, suppose Y = X^3 is the unknown. Then the best MMSE estimator is given by

\hat{Y} = E\{Y | X\} = E\{X^3 | X\} = X^3   (8-7)

Clearly, in this case Y = X^3 is the best estimator for Y. Thus the best estimator can be nonlinear.
-
Example: Let

f_{X,Y}(x, y) = k\,y, \quad 0 < x < y < 1,

where k > 0 is a suitable normalization constant. To determine the best estimate for Y in terms of X, we need the conditional density, which works out to

f_{Y|X}(y\,|\,x) = \frac{2y}{1 - x^2}, \quad x < y < 1.

So, the best MMSE estimator is given by
-
\hat{Y}(x) = E\{Y\,|\,X = x\} = \int_x^1 y\, f_{Y|X}(y\,|\,x)\,dy = \int_x^1 \frac{2y^2}{1 - x^2}\,dy = \frac{2}{3}\,\frac{1 - x^3}{1 - x^2} = \frac{2}{3}\,\frac{x^2 + x + 1}{x + 1}

so that

\hat{Y} = E\{Y\,|\,X\} = \frac{2}{3}\,\frac{X^2 + X + 1}{X + 1}   (8-8)

Once again the best estimator is nonlinear. In general the best estimator is difficult to evaluate, and hence next we will examine the special subclass of best linear estimators.
-
Linear MMSE Estimation I
Suppose that we wish to estimate Y using a linear function of the observation X.
The linear MMSE estimate of Y is \hat{Y} = aX + b, where a and b are chosen to minimize the mean-square error E[(Y - aX - b)^2].
Let Z = Y - \hat{Y} = Y - aX - b be the error; then we will show that the minimum occurs when

a = \rho_{XY}\,\frac{\sigma_Y}{\sigma_X}, \quad b = \eta_Y - a\,\eta_X   (8-9)

and the minimum MSE (with a linear estimate) is

e_{min} = \sigma_Y^2\,(1 - \rho_{XY}^2)   (8-10)
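A hedged MATLAB sketch of (8-9) and (8-10) (the joint model for X and Y below is invented purely for illustration): compute a and b from sample moments and compare the achieved MSE with \sigma_Y^2 (1 - \rho_{XY}^2).

N = 1e6;
X = randn(N,1);
Y = 2*X + 1 + 0.5*randn(N,1);     % example linear-plus-noise model
C = cov(X,Y);                     % 2x2 sample covariance matrix
a = C(1,2)/C(1,1);                % a = rho_XY*sigma_Y/sigma_X = cov(X,Y)/var(X)
b = mean(Y) - a*mean(X);          % b = eta_Y - a*eta_X   (8-9)
e = mean((Y - a*X - b).^2);       % achieved mean-square error
rho2 = C(1,2)^2/(C(1,1)*C(2,2));  % squared correlation coefficient
fprintf('e = %.4f, sigma_Y^2*(1 - rho^2) = %.4f\n', e, C(2,2)*(1 - rho2));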
-
Optimum Linear MSE Estimate: Proof
Suppose that a is fixed; then the problem is to estimate the quantity Y - aX by the constant b.
But we know from the previous example [see (8-3)] that, under those circumstances,

b = E[Y - aX] = \eta_Y - a\,\eta_X   (8-11)

With b determined as above, the mean-square error becomes

E[(Y - aX - b)^2] = E\{[(Y - \eta_Y) - a(X - \eta_X)]^2\} = \sigma_Y^2 - 2a\,\rho_{XY}\,\sigma_X\,\sigma_Y + a^2\,\sigma_X^2   (8-12)

Minimization of (8-12) is accomplished by simply differentiating this expression with respect to a, giving

a = \rho_{XY}\,\sigma_Y/\sigma_X   (8-13)

Substituting these values of a and b into the MSE gives

e_{min} = \sigma_Y^2\,(1 - \rho_{XY}^2)   (8-14)
-
Linear MMSE Estimation: The Orthogonality Principle
As before, let Z = Y - aX - b be the estimation error; then the MSE is

e = E[(Y - aX - b)^2] = E[Z^2]   (8-15)

Setting the derivative of the MSE with respect to a to zero gives

\frac{\partial e}{\partial a} = E[2Z(-X)] = -2\,E[(Y - \hat{Y})X] = 0   (8-16)

This says that the estimation error Z = Y - \hat{Y} is orthogonal to (that is, uncorrelated with) the received data X.
This is referred to as the orthogonality principle of linear estimation.
The estimation error is uncorrelated with the observed data X; intuitively, the estimate has extracted all the correlated information from the data.
-
Orthogonality Condition: A Geometric View (from D. Snider text, Section 6.3)
Recall the properties of the dot product in 3-dimensional vector analysis:

\vec{v}\cdot\vec{u} = |\vec{v}|\,|\vec{u}|\cos\theta, \quad \vec{v}\cdot\vec{v} = |\vec{v}|^2, \quad \vec{u}\cdot\vec{u} = |\vec{u}|^2   (8-16b)

We can use the dot product to express the orthogonal projection of one vector onto another, as in the figure below.
[Figure: the projection \vec{v}_{proj} of a vector \vec{v} onto a vector \vec{u}]
-
The length of \vec{v}_{proj} is |\vec{v}|\cos\theta; its direction is that of the unit vector \vec{u}/|\vec{u}|; thus

\vec{v}_{proj} = |\vec{v}|\cos\theta\,\frac{\vec{u}}{|\vec{u}|} = \frac{\vec{v}\cdot\vec{u}}{\vec{u}\cdot\vec{u}}\,\vec{u}   (8-16c)

Now compare these identities with the expressions for the second moments for zero-mean random variables:

E\{XY\} = \sigma_X\,\rho\,\sigma_Y, \quad E\{X^2\} = \sigma_X^2, \quad E\{Y^2\} = \sigma_Y^2   (8-16d)

The dot products are perfectly analogous to the second moments if we regard \sigma_X and \sigma_Y as the "lengths" of X and Y, and the correlation coefficient \rho as the cosine of the "angle between X and Y." After all, \rho lies between -1 and +1, just like the cosine. In this vein, we say two random variables X and Y are "orthogonal" if E\{XY\} = 0 (so the angle is 90 degrees). Note that this nomenclature is only consistent with these analogies when the variables have mean zero.
The vector analogy is useful in remembering the least-mean-squared-error formula. Furthermore, note that \vec{v} - \vec{v}_{proj} is orthogonal to \vec{u} in the figure. By analogy, it is reasonable that the prediction error is orthogonal (in the statistical sense) to the LMSE predictor.
-
Gaussian MMSE = Linear MMSE [From Topic 4]
In general, the linear MMSE estimate has a higher MSE than the (usually nonlinear) MMSE estimate E[Y|X].
If X and Y are jointly Gaussian RVs, it can be shown that the conditional PDF of Y given X = x is a Gaussian PDF with mean

\eta_Y + (\rho\,\sigma_Y/\sigma_X)(x - \eta_X)   (8-17)

and variance

\sigma_Y^2\,(1 - \rho^2)   (8-18)

Hence

E[Y\,|\,X = x] = \eta_Y + (\rho\,\sigma_Y/\sigma_X)(x - \eta_X)   (8-19)

which is the same as the linear MMSE.
For jointly Gaussian RVs, MMSE estimate = linear MMSE estimate.
Another special property of the Gaussian RV.
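A small MATLAB check of this fact (the correlation value and the conditioning slice are illustrative): for zero-mean, unit-variance jointly Gaussian X and Y, (8-19) reduces to E[Y | X = x] = \rho x, and the binned conditional mean should track it.

N = 2e6;  rho = 0.8;
X = randn(N,1);
Y = rho*X + sqrt(1 - rho^2)*randn(N,1);  % jointly Gaussian, correlation rho
x0 = 1.0;  sel = abs(X - x0) < 0.02;     % condition on X near x0
fprintf('empirical E[Y | X ~ %.1f] = %.3f\n', x0, mean(Y(sel)));
fprintf('linear rule rho*x0 = %.3f\n', rho*x0);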
-
Minimum Mean-Square Error (MMSE) Linear Estimate of a Random Process
From Topic 4, we know that the optimum mean-square (generally nonlinear) estimate of a random process S(t) is the conditional mean:

\hat{S}(t) = E[S(t)\,|\,X(\alpha),\ a \le \alpha \le b], \quad a \le t \le b   (8-20)

A linear estimate takes the form

\hat{S}(t) = \int_a^b h(\alpha)\,X(\alpha)\,d\alpha, \quad a \le t \le b   (8-21)

The objective is to find h(\alpha) so as to minimize the MS error

E\{[S(t) - \hat{S}(t)]^2\} = E\{[S(t) - \int_a^b h(\alpha)\,X(\alpha)\,d\alpha]^2\}   (8-22)
-
Minimum Mean-Square Error (MMSE) Linear Estimate-2
From the orthogonality condition, the mean-square error will be a minimum if the observed data is orthogonal to the estimation error over the observation interval:

E_{S,X}\{[S(t) - \int_a^b h(\alpha)\,X(\alpha)\,d\alpha]\,X(\beta)\} = 0, \quad a \le \beta \le b   (8-23)

so that the optimal estimator h(\alpha) can be found as the solution of the integral equation

R_{SX}(t, \beta) = \int_a^b h(\alpha)\,R_{XX}(\alpha, \beta)\,d\alpha, \quad a \le \beta \le b   (8-24)

In general the above equation can only be solved numerically.
-
Examples of Linear MMSE (assume all random processes are WSS)
Prediction: We want to estimate the future value S(t + \lambda) based on the present value S(t). The optimum linear estimate is given by

\hat{S}(t + \lambda) = E[S(t + \lambda)\,|\,S(t)] = a\,S(t)   (8-25a)

The optimum linear estimate satisfies the orthogonality condition

E\{[S(t + \lambda) - a\,S(t)]\,S(t)\} = 0   (8-25b)

and we can solve for a as

a = R_S(\lambda)/R_S(0)   (8-26)
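For a concrete illustration (a discrete-time stand-in, not from the slides), take a first-order autoregressive process, whose autocorrelation is R_S[m] = \rho^{|m|}; the one-step predictor coefficient is then a = R_S(1)/R_S(0) = \rho, and the prediction MSE should be R_S(0) - a\,R_S(1) = 1 - \rho^2.

rho = 0.9;  N = 1e5;
S = filter(1, [1 -rho], sqrt(1 - rho^2)*randn(N,1));  % unit-variance AR(1)
a = rho;                          % a = R_S(1)/R_S(0) for this model
e = S(2:end) - a*S(1:end-1);      % one-step prediction error
fprintf('prediction MSE = %.4f, theory 1 - rho^2 = %.4f\n', mean(e.^2), 1 - rho^2);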
-
Examples of Linear MMSE
Filtering: We want to estimate the present value of S(t) based on the present value of another process X(t). The optimum linear estimate is given by

\hat{S}(t) = E[S(t)\,|\,X(t)] = a\,X(t)   (8-27)

The optimum linear estimate satisfies the orthogonality condition

E\{[S(t) - a\,X(t)]\,X(t)\} = 0   (8-28)

and we can solve for a as

a = R_{SX}(0)/R_{XX}(0)   (8-29a)

Applying (8-14), we see that the minimum MSE (MMSE) is

e_{min} = \sigma_S^2\,(1 - \rho_{SX}^2) = R_{SS}(0) - a\,R_{SX}(0)   (8-29b)
-
Examples of Linear MMSE
Interpolation: We want to estimate the value of a process S(t) at a point t + \lambda in the interval (t, t + T), based on 2N+1 samples S(t + kT) that are within the time interval (see Fig. 13-1 in the text).
The optimum linear (interpolation) estimate is

\hat{S}(t + \lambda) = \sum_{k=-N}^{N} a_k\,S(t + kT), \quad 0 \le \lambda \le T   (8-30)

The optimum linear estimate satisfies the orthogonality condition

E\{[S(t + \lambda) - \sum_{k=-N}^{N} a_k\,S(t + kT)]\,S(t + nT)\} = 0, \quad |n| \le N, \quad 0 \le \lambda \le T   (8-31)
-
From which it follows that

\sum_{k=-N}^{N} a_k\,R_S(kT - nT) = R_S(\lambda - nT), \quad -N \le n \le N, \quad 0 \le \lambda \le T   (8-32)

This is a system of 2N+1 linear equations that can be solved to yield the 2N+1 unknowns a_k, as in the sketch below.
-
Examples of Linear MMSE
Smoothing: We want to estimate the present value of S(t) based on the values of another process X(t), which is the sum of the signal S(t) and a noise signal \nu(t):

X(t) = S(t) + \nu(t)

The optimal estimate can be written as the conditional mean

\hat{S}(t) = E[S(t)\,|\,X(\xi),\ -\infty < \xi < \infty]

and the linear estimate is

\hat{S}(t) = \int_{-\infty}^{\infty} h(\alpha)\,X(t - \alpha)\,d\alpha

Note that the estimate \hat{S}(t) is the output of a linear filter with impulse response h(\alpha) and with input X(t).
The orthogonality condition gives

E\{[S(t) - \hat{S}(t)]\,X(t - \tau)\} = 0, \quad -\infty < \tau < \infty
-
The previous equation is equivalent to

E\{[S(t) - \int_{-\infty}^{\infty} h(\alpha)\,X(t - \alpha)\,d\alpha]\,X(t - \tau)\} = 0, \quad -\infty < \tau < \infty   (8-37)

which becomes

R_{SX}(\tau) = \int_{-\infty}^{\infty} h(\alpha)\,R_{XX}(\tau - \alpha)\,d\alpha \quad \text{for all } \tau   (8-38)

To determine h(t) we need to solve the above integral equation, which is easy to do since it is a convolution of h(\tau) with R_{XX}(\tau) that holds for all values of \tau.
Taking Fourier transforms of both sides, we obtain S_{SX}(\omega) = H(\omega)\,S_{XX}(\omega), or

H(\omega) = \frac{S_{SX}(\omega)}{S_{XX}(\omega)}   (8-39)

which is known as the non-causal Wiener filter.
Why is this a non-causal solution?
-
[Figure: block diagram showing that the estimate \hat{S}(t) is the output of a linear filter H(\omega) with input X(t)]
With the signal and noise independent,

S_{SX}(\omega) = S_{SS}(\omega) \quad \text{and} \quad S_{XX}(\omega) = S_{SS}(\omega) + S_{\nu\nu}(\omega)   (8-40)

and (8-39) simplifies to

H(\omega) = \frac{S_{SS}(\omega)}{S_{SS}(\omega) + S_{\nu\nu}(\omega)}   (8-41)

Is this an intuitively reasonable solution? What happens when the noise gets very small?
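A minimal sketch of (8-41) with made-up spectra (both choices are illustrative, not from the slides): H(\omega) approaches 1 where the signal spectrum dominates and 0 where the noise dominates, and as the noise vanishes H approaches 1 across the band.

w = linspace(-10,10,1001);     % frequency grid (rad/s)
Sss = 4 ./ (1 + w.^2);         % example lowpass signal spectrum
Snn = 0.25*ones(size(w));      % example flat (white) noise spectrum
H = Sss ./ (Sss + Snn);        % non-causal Wiener filter (8-41)
plot(w, H), xlabel('\omega'), ylabel('H(\omega)')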
-
If the spectra S_{SS}(\omega) and S_{\nu\nu}(\omega) do not overlap, then H(\omega) = 1 in the band of the signal and H(\omega) = 0 in the band of the noise, and the MMSE is zero.
This can be seen by extending (8-14):

e_{min} = \sigma_S^2\,(1 - \rho_{SX}^2) = \frac{1}{2\pi}\int_{-\infty}^{\infty} [S_{SS}(\omega) - H^*(\omega)\,S_{SX}(\omega)]\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{S_{SS}(\omega)\,S_{\nu\nu}(\omega)}{S_{SS}(\omega) + S_{\nu\nu}(\omega)}\,d\omega   (8-42)
-
Nonlinear Orthogonality Rule
Interestingly, a general form of the orthogonality principle also holds in the case of nonlinear estimators.
Nonlinear Orthogonality Rule: Let h(X) represent any functional form of the data X, and E\{Y\,|\,X\} the best estimator for Y given X. With e = Y - E\{Y\,|\,X\}, we shall show that

E\{e\,h(X)\} = 0   (8-43)

implying that the error e = Y - E\{Y\,|\,X\} is orthogonal to h(X).
This follows since

E\{e\,h(X)\} = E_X\{(Y - E_{Y|X}[Y\,|\,X])\,h(X)\}
= E_X\{Y\,h(X)\} - E_X\{E_{Y|X}[Y\,|\,X]\,h(X)\}
= E_X\{Y\,h(X)\} - E_X\{E_{Y|X}[Y\,h(X)\,|\,X]\}
= E\{Y\,h(X)\} - E\{Y\,h(X)\} = 0   (8-44)
-
Discrete Time Processes
The non-causal estimate \hat{S}[n] of a discrete-time process in terms of the observed data

X[n] = S[n] + \nu[n]   (8-45)

is

\hat{S}[n] = \sum_{k=-\infty}^{\infty} h[k]\,X[n - k]   (8-46)

which is the output of a linear time-invariant, non-causal system with input X[n] and impulse response h[n]. By the orthogonality principle we have

E\{(S[n] - \sum_{k=-\infty}^{\infty} h[k]\,X[n - k])\,X[n - m]\} = 0, \quad \text{for all } m   (8-47)

so that

R_{SX}[m] = \sum_{k=-\infty}^{\infty} h[k]\,R_{XX}[m - k], \quad \text{for all } m   (8-48)

Taking the z-transform of both sides of (8-48) gives

H(z) = \frac{S_{SX}(z)}{S_{XX}(z)}   (8-49)
-
The Wiener-Hopf equations cannot be directly solved with z-transforms, since the two sides are not equal for every value of m.
There is a (mathematically) fairly complicated spectral theory, described in the text, that factors the transfer function of the impulse response h[n] into causal and anti-causal sequences. We will not discuss this approach.
Instead we will consider the more practical case of a predictor that uses a finite number of past samples.
The solutions involve (straightforward) matrix inversion operations.
-
Causal Prediction Using L Past Samples
Consider the estimation of a process S[n] in terms of its past L samples S[n-k], k >= 1:

\hat{S}[n] = E[S[n]\,|\,S[n - k],\ 1 \le k \le L] = \sum_{k=1}^{L} h[k]\,S[n - k]   (8-52)

The objective is to find the L filter constants h[k] so as to minimize the MSE. From the orthogonality principle, the error S[n] - \hat{S}[n] must be orthogonal to the data S[n - m], giving

E\{(S[n] - \sum_{k=1}^{L} h[k]\,S[n - k])\,S[n - m]\} = 0, \quad 1 \le m \le L   (8-53)

which gives the Wiener-Hopf (discrete) equation

R_S[m] = \sum_{k=1}^{L} h[k]\,R_S[m - k], \quad 1 \le m \le L   (8-54)

Equation (8-54) is a system of L equations expressing the unknowns h[k] in terms of the autocorrelation R_S[m].
-
By rewriting (8-54) as

\sum_{k=1}^{L} R_S[m - k]\,h[k] = R_S[m], \quad 1 \le m \le L   [8-55]

we recognize that [8-55] can be written as a matrix-vector equation

R\,h = r   [8-56]

where the matrix R is an L x L Toeplitz matrix with km-th element equal to R_{m-k}, h is an L x 1 column vector with k-th element equal to h[k], and r is an L x 1 column vector with k-th element equal to R_k.
A Toeplitz matrix is a matrix in which each descending diagonal from left to right is constant. For example, if the Toeplitz matrix A has ij-th element A_{i,j}, then A_{i+1,j+1} = A_{i,j}.
There are many computationally efficient algorithms for inverting a Toeplitz matrix.
-
It is easy to see that the matrix R is Toeplitz by displaying the elements of the matrix-vector equation:

\begin{bmatrix} R_0 & R_1 & R_2 & \cdots & R_{L-1} \\ R_1 & R_0 & R_1 & \cdots & R_{L-2} \\ R_2 & R_1 & R_0 & \cdots & R_{L-3} \\ \vdots & & & & \vdots \\ R_{L-1} & R_{L-2} & R_{L-3} & \cdots & R_0 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ \vdots \\ h_L \end{bmatrix} = \begin{bmatrix} R_1 \\ R_2 \\ R_3 \\ \vdots \\ R_L \end{bmatrix}   [8-57]

which can be solved by standard matrix inversion techniques to give

h = R^{-1}\,r   [8-58]
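A minimal MATLAB sketch of [8-57]/[8-58] (the autocorrelation sequence R_m = 0.8^m is an assumed model, not from the slides): build the Toeplitz system with the built-in toeplitz function and solve for h with the backslash operator.

L = 4;
R = 0.8.^(0:L);            % assumed autocorrelation values R_0 ... R_L
Rmat = toeplitz(R(1:L));   % L x L Toeplitz matrix of R_{m-k}
r = R(2:L+1)';             % right-hand side [R_1 ... R_L]'
h = Rmat \ r;              % h = R^{-1} r   [8-58]
disp(h')                   % for this model only h(1) = 0.8 is nonzero

The result makes sense: for an exponential (first-order Markov) correlation, the most recent sample carries all the usable information, so the remaining taps come out (numerically) zero.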
-
Least Mean Square Filtering: Generic Problem
[Figure: linear filter H(z) with input x(n) and filter output y(n); the output is subtracted from the desired response d(n) to form the estimation error e(n)]
The basic concept behind Wiener filter theory is to minimize the difference between the filter output, y(n), and some desired output, d(n). Noise could be present in the filter output. This minimization either performs a matrix inversion such as in [8-58] to find the Wiener filter when the model is known, or, when the model has some unknown parameters, uses the least mean square (LMS) approach, which adaptively adjusts the filter coefficients to reduce the square of the difference between the desired and actual waveform after filtering. As before, we will assume that H(z) is a feedforward finite impulse response (FIR) filter with coefficients h(k) = h_k, k = 1, 2, ..., L.
The system is described by the following equation:

e(n) = d(n) - y(n) = d(n) - \sum_{k=1}^{L} h(k)\,x(n - k)   [8-59]
-
Least Mean Square Filtering --- System Identification
[Figure: the linear filter H(z) placed in parallel with the unknown system, both driven by the input x(n); the difference between the unknown system's output and the estimated output y(n) forms the estimation error e(n)]
The LMS approach has a number of other applications in addition to standard filtering, including system identification, interference canceling, and inverse modeling or de-convolution. For system identification, the filter is placed in parallel with the unknown system and the parameters can be adapted (i.e., changed) to minimize the estimation error.
The desired output is the output of the unknown system, and the filter coefficients are either (1) computed (Wiener) or (2) adjusted (LMS adapted) so that the filter output best matches that of the unknown system in the MMSE sense.
-
Wiener Filter ~ MATLAB Implementation
If the system parameters are known, the Wiener-Hopf equation can be solved using MATLAB's matrix inversion operator (\) as shown in the following example.
The MATLAB toeplitz function is useful in setting up the correlation matrix. The function call is:

Rxx = toeplitz(rxx);

where rxx is the input row vector. This constructs a symmetrical matrix from a single row vector and can be used to generate the correlation matrix in the Wiener-Hopf equation from the autocorrelation function rxx.
-
Example (continued): The solution uses the routine wiener_hopf to calculate the optimum filter coefficients.
This program computes the correlation matrix from the autocorrelation function and the toeplitz routine, and also computes the cross-correlation function.

function b = wiener_hopf(x,y,maxlags)
% Function to compute the optimum FIR filter using the Wiener-Hopf equations
% Inputs:  x = input
%          y = desired signal
%          maxlags = filter length
% Outputs: b = FIR filter coefficients
%
rxx = xcorr(x,maxlags,'coeff');   % Compute the autocorrelation vector
rxx = rxx(maxlags+1:end)';        % Use only positive half of symm. vector
rxy = xcorr(x,y,maxlags);         % Compute the crosscorrelation vector
rxy = rxy(maxlags+1:end)';        % Use only positive half
%
rxx_matrix = toeplitz(rxx);       % Construct correlation matrix
b = rxx_matrix\rxy;               % Calculate FIR coefficients using matrix inversion
-
Example: Results
[Figure, three panels: "SNR -8 db; 10 Hz sine" (noisy input vs. time), "After Optimal Filtering" (filtered waveform vs. time), and "Optimal Filter Frequency Plot" (filter magnitude vs. frequency in Hz)]
The original data (upper plot) are considerably less noisy after filtering (middle plot).
The filter computed by the Wiener-Hopf algorithm has the shape of a bandpass filter with a peak frequency at the signal frequency of 10 Hz.
-
LMS Adaptive Filters
[Figure: adaptive filter H(z) with input x(n) and response y(n); the error e(n) = d(n) - y(n) is fed back to adapt the filter]
Classical filters (FIR and IIR) and optimal Wiener filters have fixed frequency characteristics and cannot respond to changes that might occur during the course of the signal. Adaptive filters can modify their properties based on selected features of the signal being analyzed, or can work when the frequency or statistical characteristics are not known a priori.
The LMS algorithm consists of two basic processes:
Filtering process: calculate the output of the FIR filter by convolving the input and taps; calculate the estimation error by comparing the output to the desired signal.
Adaptation process: adjust the tap weights based on the estimation error.
A typical adaptive filter paradigm is shown above, where the arrow denotes a quantity that is being adapted. Typically the filter is an FIR filter (which is what we will assume) with impulse response h(k).
-
Stochastic Gradient Approach
Most commonly used adaptive filtering algorithm.
Define the cost function as the mean-squared difference between the filter output and the desired response (MSE).
If the parameters are known, use the method of steepest descent to invert the Wiener matrix:
Move toward the minimum on the error surface (the MSE has a single minimum).
Requires the gradient of the error surface to be known.
When the parameters are not known, the most popular adaptation algorithm is the LMS algorithm:
Derived from steepest descent.
Does not require the gradient to be known: it is estimated at every iteration.

[update value of tap-weight vector] = [old value of tap-weight vector] + [learning-rate parameter] x [tap-input vector] x [error signal]

[Figure: the mean-square error (MSE) as a (convex) function of the tap weights h1, h2, with the estimated gradient pointing down the error surface toward the minimum]
-
Least Mean Squared (LMS) Approach to Adaptive Filtering
If the MSE, e, were available, then the algorithm would use the MSE to compute the optimum filter coefficients. But in most practical situations the MSE is unknown or changing, while the instantaneous error e_n is often available.
Note that the MSE is defined as E[(e_n)^2], and, ideally, we are interested in the gradient, or derivative, of the MSE with respect to the adjustable parameters h(k).
Taking the derivative of the MSE with respect to h(k) gives

\frac{\partial E[(e_n)^2]}{\partial h(k)} = E\left[\frac{\partial (e_n)^2}{\partial h(k)}\right]   [8-60]

where we have used the property that the differentiation and expectation operations are interchangeable.
So, since we don't have access to the average MSE, we will drop the E operation and use the fact that

\frac{\partial (e_n)^2}{\partial h(k)}   [8-61]

is an unbiased estimate of the gradient [8-60] to approximate the gradient.
-
Least Mean Squared (LMS) Approach to Adaptive Filtering
The LMS algorithm uses the estimated gradient [8-61] to adjust the filter parameters.
The LMS algorithm adjusts the filter coefficients so that the sum of the squared errors, which approximates (estimates) the MSE, converges toward this minimum. The LMS algorithm uses a recursive gradient method known as the steepest-descent method for finding the filter coefficients that produce the minimum sum of squared errors. A modified steepest-descent algorithm updates the adjustable parameters to move in the direction of the negative gradient. The symbol h_n(k) denotes the impulse response coefficient h(k) at the n-th iteration of the LMS algorithm.
Filter coefficients are modified using an estimate of the negative gradient of the error function with respect to a given h_n(k). This estimate is given by the partial derivative of the (instantaneous) squared error e_n with respect to the coefficients h_n(k); using the chain rule for differentiation and [8-59], we have

\frac{\partial e_n^2}{\partial h_n(k)} = 2e(n)\,\frac{\partial [d(n) - y(n)]}{\partial h_n(k)} = -2e(n)\,x(n - k)   [8-62]
-
LMS Algorithm (continued)
Using this estimated gradient to construct an error signal, the LMS algorithm updates the filter parameters in the direction of the negative gradient; if the filter parameter h(k) at the n-th iteration of the LMS algorithm is denoted by h_n(k), the LMS algorithm computes h_{n+1}(k) as

h_{n+1}(k) = h_n(k) + \Delta\,e(n)\,x(n - k), \quad k = 1, 2, ..., N; \quad n = 1, 2, ...   [8-63]

where \Delta is a constant learning-rate parameter that controls the rate of descent and convergence to the filter coefficients.
Equation [8-63] can be written as a vector iterative equation:

h_{n+1} = h_n + \Delta\,e(n)\,x_n, \quad n = 1, 2, ...   [8-64]

where x_n is an N x 1 column vector whose m-th entry is x(n - m).
-
Example 2: Applying the LMS algorithm to a system identification task. The unknown system will be an all-zero linear process with a digital transfer function of

H(z) = 0.5 + 0.75z^{-1} + 1.2z^{-2}

Confirm the match by plotting the magnitude of the transfer function for both the unknown and matching systems.

b_unknown = [.5 .75 1.2];      % Define unknown process
xn = randn(1,N);
xd = conv(b_unknown,xn);       % Generate unknown system output
xd = xd(3:N+2);                % Truncate extra points (symmetrically)
%
% Apply Wiener filter
b = wiener_hopf(xn,xd,L);      % Compute matching filter coefficients
b = b/N;                       % Scale filter coefficients
..... Calculate and plot frequency characteristics .....
-
Example Results
[Figure, two panels of |H(z)| vs. frequency (Hz): "Unknown Process" and "Matching Process"]
Original coefficients: 0.5, 0.75, 1.2
Identified coefficients: 0.44, 0.67, 1.1
The identified transfer function and coefficients closely matched those of the unknown system.
In this example, the unknown system is an all-zero system, so the match by an FIR filter was quite close. A system containing both poles and zeros would be more difficult to match.
-
Adaptive Noise Cancellation
[Figure: the signal channel carries x(n) + N(n); the reference channel carries a noise-correlated signal into the adaptive filter, whose output N*(n) is subtracted to give the error/output e(n) = x(n) + N(n) - N*(n)]
Adaptive noise cancellation requires a reference signal that contains components of the noise, but not the signal. The reference channel carries a signal N'(n) that is correlated with the noise N(n), but not with the signal of interest, x(n). The adaptive filter will produce an output N*(n) that minimizes the overall output. Since the adaptive filter has no access to the signal x(n), it can only reduce the overall output by minimizing the noise in this output.
-
Adaptive Line Enhancement (ALE)
[Figure: the input B(n) + Nb(n) passes through a decorrelation delay D into an adaptive FIR filter; the error e(n) = B(n) + Nb(n) - Nb*(n) is the broadband output (interference suppression), while the filter output Nb*(n) is the narrowband output (adaptive line enhancement)]
A reference signal is not necessary to separate narrowband from broadband signals. In adaptive line enhancement, broadband and narrowband signals are separated by a delay: only narrowband signals will be correlated with delayed versions of themselves. The error signal contains both broadband and narrowband signals, but the filter can reduce only the narrowband signals. Hence the adaptive filter output contains the filtered narrowband signal. The decorrelation delay must be chosen with care.
-
Example 3: Given the same sinusoidal signal in noise as used in Example 1, design an adaptive filter to remove the noise. Just as in Example 1, assume that you have a copy of the desired signal.

% Same initial lines as in Example 8-1 .....
% xn is the input signal containing noise
% x is the desired signal (as in Ex. 8-1, a noise-free version of the signal)
%
% Calculate convergence parameter
PX = (1/(N+1))* sum(xn.^2);   % Calculate approx. power in xn
delta = a * (1/(10*L*PX));    % Calculate delta
%
[b,y] = lms(xn,x,delta,L);    % Apply LMS algorithm (see below)
%
% Plotting identical to Example 8-1 .....

The adaptive filter coefficients are determined by the LMS algorithm.
-
LMS Algorithm

function [b,y,e] = lms(x,d,delta,L)
% Simple function to adjust filter coefficients using the LMS algorithm
% Adjusts filter coefficients, b, to provide the best match between
% the input, x(n), and a desired waveform, d(n)
% Both waveforms must be the same length
% Uses a standard FIR filter
%
M = length(x);
b = zeros(1,L); y = zeros(1,M);   % Initialize outputs
e = zeros(1,M);                   % Initialize error (added for completeness)
for n = L:M
   x1 = x(n:-1:n-L+1);            % Select input segment for convolution
   y(n) = b * x1';                % Convolve (multiply) weights with input
   e(n) = d(n) - y(n);            % Calculate error
   b = b + delta*e(n)*x1;         % Adjust weights
end

The LMS algorithm is implemented in the function lms. The input is x, the desired signal is d, delta is the convergence factor, and L is the filter length.
-
Example: Results
[Figure, three panels: "SNR -8 db; 10 Hz sine" (x(t) vs. time), "After Adaptive Filtering" (y(t) vs. time), and "Adaptive Filter Frequency Plot" (|H(f)| vs. frequency in Hz)]
Application of an adaptive filter using the LMS recursive algorithm to data containing a single sinusoid (10 Hz) in noise (SNR = -8 db). The filter requires the first 0.4 to 0.5 seconds to adapt (400-500 points), and the frequency characteristics after adaptation are those of a bandpass filter with a peak frequency of 10 Hz.
-
Adaptive Line Enhancement (ALE)
In the next example an ALE filter is constructed using the LMS algorithm. The desired waveform is just the signal delayed. The best delay was found empirically to be 5 samples.

delay = 5;                           % Decorrelation delay
a = .075;                            % Convergence gain
%
% Generate data: two sequential sinusoids, 10 & 20 Hz in noise (SNR = -6)
x = [sig_noise(10,-6,N/2) sig_noise(20,-6,N/2)];
..... Plot original signal .....
%
PX = (1/(N+1))* sum(x.^2);           % Calculate waveform power for delta
delta = (1/(10*L*PX)) * a;           % Use 10% of the max. range of delta
%
xd = [x(delay:N) zeros(1,delay-1)];  % Delay signal to decorrelate noise
[b,y] = lms(xd,x,delta,L);           % Apply LMS algorithm
..... Plot filtered signal .....
-
Example 4: Results. Unlike a fixed Wiener filter, an adaptive filter can track changes in a waveform, as shown in this example where two sequential sinusoids having different frequencies (10 & 20 Hz) are adaptively filtered.
[Figure, two panels vs. time: "10 & 20 Hz SNR -6 db" (x(t)) and "After Adaptive Filtering" (y(t))]
-
Example 5: Adaptive Noise Cancellation (ANC). The LMS algorithm is used with a reference signal to cancel a narrowband interference signal.
[Figure, three panels vs. time: "Original Signal" (x(t)), "Signal + interference" (x(t)+n(t)), and "After Adaptive Noise Cancellation" (y(t))]
In this application, approximately 1000 samples (2.0 sec) are required for the filter to adapt correctly.
-
Phase Sensitive Detection
Phase sensitive detection, also known as synchronous or coherent detection, is a technique for demodulating amplitude-modulated (AM) signals that is also very effective in reducing noise.
From a frequency-domain point of view, the effect of amplitude modulation is to shift the signal frequencies to another portion of the spectrum, on either side of the modulating, or carrier, frequency.
Amplitude modulation can be very effective in reducing noise because it can shift signal frequencies to spectral regions where noise is minimal.
The application of a narrowband filter centered about the new frequency range (i.e., the carrier frequency) can then be used to remove the noise.
A phase sensitive detector functions as a narrowband filter that tracks the carrier frequency. The bandwidth can be quite small.
-
Phase Sensitive Detection (continued)
[Figure: magnitude response vs. frequency, showing a bandpass characteristic of bandwidth BW centered at the carrier frequency fc]
Frequency characteristics of a phase sensitive detector. The frequency response of the lowpass filter is effectively reflected about the carrier frequency, producing a bandpass filter that tracks the carrier frequency. By making the cutoff frequency small, the bandwidth of the virtual bandpass filter can be made very narrow.
-
Example 6: Using phase sensitive detection to demodulate a signal amplitude-modulated with a 5 Hz sawtooth wave. The AM signal is buried in -10 db noise. The filter is chosen as a second-order Butterworth lowpass filter with a cutoff frequency set for best noise rejection while still providing reasonable fidelity to the sawtooth waveform.

wn = .02;                            % Lowpass filter cutoff frequency
[b,a] = butter(2,wn);                % Design lowpass filter
%
% Phase sensitive detection
ishift = fix(.125 * fs/fc);          % Shift carrier by 1/4 period
vc = [vc(ishift:N) vc(1:ishift-1)];  % using periodic shift
v1 = vc .* vm;                       % Multiplier
vout = filter(b,a,v1);               % Apply lowpass filter
-
Kalman Filter
The Kalman filter is a recursive (iterative) data-processing algorithm in the time domain that solves the same problems as the Wiener filter. The Kalman filter can also be made adaptive, but we will not cover this topic (I do in my Digital Communications course).
Generates the optimal estimate of desired quantities given the set of measurements (estimation, prediction, interpolation, smoothing).
Optimal filtering for linear systems and white Gaussian errors: the Kalman filter is the best estimate based on all previous measurements.
Recursive/iterative: does not need to store all previous measurements and reprocess all data at each time step.
The Kalman algorithmic approach can be viewed as two steps: (1) prediction and then (2) correction.
-
Kalman Algorithm System Model
[Figure: block diagram of the black-box system model. The input and the system-model noise drive the system dynamics, producing the system state; the output device, corrupted by measurement noise, produces the observed output, which the Kalman filter/estimator processes to give the optimal estimate of the system state]
-
Kalman Filter Overview---Discrete Time
The system state process y_k that is to be estimated is modeled by the following difference equations:

y_k = A\,y_{k-1} + B\,u_k + w_{k-1}   [8-65a]
z_k = H\,y_k + v_k   [8-66a]

where the system state process is denoted by y_k, with filter parameters A and B and output filter H that are known. The model noise is w_k. The process z_k is the observable system output (filtered signal + noise) and the process u_k is the system input. The model noise w_k has covariance Q, the measurement noise v_k has covariance R, and P denotes the prediction-error covariance matrix.
The Kalman filter algorithm is a two-step process: prediction and correction.
1. Prediction: \hat{y}^-_k is an estimate based on measurements at previous time steps that follows the above system dynamics:

\hat{y}^-_k = A\,\hat{y}_{k-1} + B\,u_k   [8-67a]
P^-_k = A\,P_{k-1}\,A^T + Q   [8-67b]

2. Correction: \hat{y}_k has additional information, the measurement at time k:

\hat{y}_k = \hat{y}^-_k + K_k\,(z_k - H\,\hat{y}^-_k)   [8-68a]
P_k = (I - K_k H)\,P^-_k, \quad \text{where } K_k = P^-_k H^T (H P^-_k H^T + R)^{-1}   [8-68b]
-
Blending Factor
If we are sure about the measurements:
The measurement-error covariance R of the output noise decreases to zero.
The Kalman gain K_k increases, weighting the residual more heavily than the prediction.
If we are sure about the prediction:
The prediction-error covariance P^-_k decreases to zero.
The Kalman gain K_k decreases, weighting the prediction more heavily than the residual.
-
Kalman Filter Summary
Prediction (time update):
(1) Project the state ahead: \hat{y}^-_k = A\,\hat{y}_{k-1} + B\,u_k
(2) Project the error covariance ahead: P^-_k = A\,P_{k-1}\,A^T + Q
Correction (measurement update):
(1) Compute the Kalman gain: K_k = P^-_k H^T (H P^-_k H^T + R)^{-1}
(2) Update the estimate with measurement z_k: \hat{y}_k = \hat{y}^-_k + K_k\,(z_k - H\,\hat{y}^-_k)
(3) Update the error covariance: P_k = (I - K_k H)\,P^-_k
-
Example: Constant System Model
Prediction: the system model has no input and no model noise:

\hat{y}^-_k = \hat{y}_{k-1}
P^-_k = P_{k-1}

Correction (H = 1):

K_k = P^-_k\,(P^-_k + R)^{-1}
\hat{y}_k = \hat{y}^-_k + K_k\,(z_k - \hat{y}^-_k)
P_k = (I - K_k)\,P^-_k
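A minimal MATLAB sketch of this constant model (the true value, R, the initial conditions, and the run length are illustrative choices, in the spirit of the plots that follow):

ytrue = -0.377;                   % assumed constant to be estimated
Nk = 100;  R = 0.1;               % measurement-noise variance (assumed)
z = ytrue + sqrt(R)*randn(1,Nk);  % noisy measurements z_k
yhat = 0;  P = 1;                 % initial estimate and error covariance
for k = 1:Nk
    yminus = yhat;  Pminus = P;   % prediction: constant model carries over
    K = Pminus/(Pminus + R);      % Kalman gain
    yhat = yminus + K*(z(k) - yminus);  % correction with measurement z_k
    P = (1 - K)*Pminus;           % update error covariance
end
fprintf('final estimate %.4f (true %.4f), P = %.4g\n', yhat, ytrue, P);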
-
Example: Constant System Model
[Figure: the estimate \hat{y}_k vs. iteration, converging toward the constant value over 100 iterations]
-
Example: Constant Model
[Figure: "Convergence of Error Covariance - Pk": P_k vs. iteration, decreasing from 1 toward zero over 100 iterations]
-
Example: Constant Model
[Figure: the estimate \hat{y}_k vs. iteration for a larger R, converging more slowly]
A larger value of R, the measurement-error covariance (indicating poorer quality of measurements), makes the filter slower to believe the measurements, giving slower convergence.
-
Comparing the Least-Squares (Kalman) and the Least Mean Square Error (Wiener) Approaches
The least mean square [LMS] criterion is statistical:
The error criterion is not an explicit function of the data, but depends only on statistics.
The least-squares (Kalman) error criterion is an explicit function of the signal samples.
To track variations in the channel, the weighting factor w in (8-70a) de-emphasizes old data.
-
Least-Squares/Kalman Algorithm: For an FIR Filter

LSE_N = \sum_{n=0}^{N} w^{N-n}\,e_n^2 = \sum_{n=0}^{N} w^{N-n}\,(z_n - d_n)^2 = \sum_{n=0}^{N} w^{N-n}\,(r_n'\,c_n - d_n)^2   (8-70a)

Solving the above gives:

c_n = A_n^{-1}\,y_n   (8-70b)

where

A_n = \sum_{k=0}^{n} w^{n-k}\,r_k\,r_k' + \delta I \quad [\delta \text{ is the noise power}]   (8-70c)
    = w\,A_{n-1} + r_n\,r_n'   (8-70d)

and

y_n = \sum_{k=0}^{n} w^{n-k}\,r_k\,d_k = w\,y_{n-1} + r_n\,d_n   (8-70e)

Challenge: given c_n, how do we find c_{n+1}?
-
Least-Squares/Kalman Algorithm: For a Tapped Delay Line Equalizer---continued
The key result that we use to derive the Kalman algorithm is the matrix inversion lemma, which determines A_n^{-1} from A_{n-1}^{-1}:

A_n^{-1} = w^{-1}\left[ A_{n-1}^{-1} - \frac{A_{n-1}^{-1}\,r_n\,r_n'\,A_{n-1}^{-1}}{w + r_n'\,A_{n-1}^{-1}\,r_n} \right]   (8-71a)

Letting

D_n = A_n^{-1}   (8-71b)
k_n = \frac{1}{w + \mu_n}\,D_{n-1}\,r_n \quad [\text{the Kalman gain}]   (8-71c)
\mu_n = r_n'\,D_{n-1}\,r_n   (8-71d)

it can be shown that

c_{n+1} = c_n + k_n\,e_n   (8-71e)
D_n = w^{-1}\,[D_{n-1} - k_n\,r_n'\,D_{n-1}]   (8-71f)

This algorithm takes "big" steps in the direction of the Kalman gain to iteratively realize the optimum tap setting at each time instant [based upon the received samples up to n]. The algorithm is effectively using the Gram-Schmidt orthogonalization technique to realize c_opt from the successive input vectors {r_n}.
The Kalman algorithm converges in ~N iterations!
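A hedged MATLAB sketch of the recursions (8-71b)-(8-71f) applied to an FIR identification problem (the unknown taps, forgetting factor w, noise level, and the initialization D_0 = I/\delta are illustrative assumptions):

L = 3;  N = 500;  w = 0.99;  deltaI = 0.01;
c_true = [0.5; 0.75; 1.2];           % unknown taps (used only to generate data)
x = randn(N,1);
c = zeros(L,1);  D = eye(L)/deltaI;  % D_n = A_n^{-1}, initialized large
for n = L:N
    r = x(n:-1:n-L+1);               % tap-input vector r_n
    d = c_true'*r + 0.01*randn;      % desired output (noisy)
    e = d - c'*r;                    % a priori error e_n
    mu = r'*D*r;                     % (8-71d)
    k = (D*r)/(w + mu);              % Kalman gain (8-71c)
    c = c + k*e;                     % tap update (8-71e)
    D = (D - k*(r'*D))/w;            % (8-71f)
end
disp(c')                             % should be close to [0.5 0.75 1.2]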