

ELEG-636: Statistical Signal Processing

Gonzalo R. Arce

Department of Electrical and Computer Engineering, University of Delaware

Spring 2010


Course Objectives & Structure

Objective: Given a discrete time sequence {x(n)}, develop
- Statistical and spectral signal representations
- Filtering, prediction, and system identification algorithms
- Optimization methods that are statistical and adaptive

Course Structure:
- Weekly lectures [notes: www.ece.udel.edu/~arce]
- Periodic homework (theory & Matlab implementations) [15%]
- Midterm & Final examinations [85%]

Textbook: Haykin, Adaptive Filter Theory.


Broad applications in communications, imaging, and sensors. Emerging applications in:
- Brain-imaging techniques
- Brain-machine interfaces
- Implantable devices

Neurofeedback presents real-time physiological signals from MRIs in a visual or auditory form to provide information about brain activity. These signals are used to train the patient to alter neural activity in a desired direction. Traditionally, feedback using EEGs or other mechanisms has not focused on the brain because the resolution is not good enough.


Adaptive Optimization and Filtering Methods

Motivation: Adaptive optimization and filtering methods are appropriate, advantageous, or necessary when:
- Signal statistics are not known a priori and must be “learned” from observed or representative samples
- Signal statistics evolve over time
- Time or computational restrictions dictate that simple, if repetitive, operations be employed rather than solving more complex, closed-form expressions

To be considered are the following algorithms:
- Steepest Descent (SD) – deterministic
- Least Mean Squares (LMS) – stochastic
- Recursive Least Squares (RLS) – deterministic


Adaptive Optimization (Steepest Descent, LMS, RLS Algorithms) Steepest Descent

Definition (Steepest Descent (SD)): Steepest descent, also known as gradient descent, is an iterative technique for finding a local minimum of a function.
Approach: Given an arbitrary starting point, the current location (value) is moved in steps proportional to the negative of the gradient at the current point.

- SD is an old, deterministic method that is the basis for stochastic gradient-based methods
- SD is a feedback approach to finding a local minimum of an error performance surface
- The error surface must be known a priori
- In the MSE case, SD converges to the optimal solution, w0 = R⁻¹p, without inverting a matrix
- Question: Why, in the MSE case, does this converge to the global minimum rather than a local minimum?


Example: Consider a well-structured cost function with a single minimum. The optimization proceeds as follows:

[Figure: contour plot showing the evolution of the optimization]


Example: Consider a gradient ascent example in which there are multiple minima/maxima.

[Figures: surface plot showing the multiple minima and maxima; contour plot illustrating that the final result depends on the starting value]


To derive the approach, consider the FIR case:

- {x(n)} – the WSS input samples
- {d(n)} – the WSS desired output
- {d̂(n)} – the estimate of the desired signal, given by

d̂(n) = wH(n)x(n)

where
x(n) = [x(n), x(n − 1), · · · , x(n − M + 1)]T   [observation vector]
w(n) = [w0(n), w1(n), · · · , wM−1(n)]T   [time-indexed filter coefficients]


Then, similarly to previously considered cases,

e(n) = d(n) − d̂(n) = d(n) − wH(n)x(n)

and the MSE at time n is

J(n) = E{|e(n)|²} = σ²d − wH(n)p − pHw(n) + wH(n)Rw(n)

where
σ²d – variance of the desired signal
p – cross-correlation between x(n) and d(n)
R – correlation matrix of x(n)

Note: The weight vector and cost function are time indexed (functions of time).


When w(n) is set to the (optimal) Wiener solution,

w(n) = w0 = R⁻¹p

and

J(n) = Jmin = σ²d − pHw0

Use the method of steepest descent to iteratively find w0. The optimal result is achieved since the cost function is a second-order polynomial with a single unique minimum.


Example: Let M = 2. The MSE is a bowl-shaped surface, which is a function of the 2-D weight vector w(n). A contour of the MSE is given as follows:

[Figures: surface plot and contour plot of J(w) over (w1, w2), with the negative gradient −∇J(w) indicated]

Imagine dropping a marble at any point on the bowl-shaped surface. The ball will reach the minimum point by going through the path of steepest descent.


Observation: Set the direction of the filter update as −∇J(n). Resulting update:

w(n + 1) = w(n) + (1/2)µ[−∇J(n)]

or, since ∇J(n) = −2p + 2Rw(n),

w(n + 1) = w(n) + µ[p − Rw(n)],   n = 0, 1, 2, · · ·

where w(0) = 0 (or another appropriate value) and µ is the step size.
Observation: SD uses feedback, which makes it possible for the system to be unstable.

Bounds on the step size guaranteeing stability can be determined with respect to the eigenvalues of R (Widrow, 1970).
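As a concrete illustration, the sketch below runs this SD recursion in NumPy under the assumption that R and p are known exactly; the 2×2 values are illustrative (they match the system-identification example that appears later in these notes) and are not part of the slides.

```python
import numpy as np

# Minimal steepest-descent sketch: w(n+1) = w(n) + mu*(p - R w(n)),
# assuming the true R and p are available (illustrative 2x2 values).
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])      # input correlation matrix
p = np.array([0.8, 0.5])        # cross-correlation vector
mu = 0.3                        # step size; stability requires 0 < mu < 2/lambda_max

w = np.zeros(2)                 # w(0) = 0
for n in range(200):
    w = w + mu * (p - R @ w)    # SD update

print(w)                        # approaches the Wiener solution
print(np.linalg.solve(R, p))    # R^{-1} p for comparison
```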


Adaptive Optimization (Steepest Descent, LMS, RLS Algorithms) Convergence Analysis

Convergence Analysis
Define the error vector for the tap weights as

c(n) = w(n) − w0

Then using p = Rw0 in the update,

w(n + 1) = w(n) + µ[p − Rw(n)]
         = w(n) + µ[Rw0 − Rw(n)]
         = w(n) − µRc(n)

and subtracting w0 from both sides,

w(n + 1) − w0 = w(n) − w0 − µRc(n)
⇒ c(n + 1) = c(n) − µRc(n) = [I − µR]c(n)


Using the Unitary Similarity Transform

R = QΛQH

we have

c(n + 1) = [I − µR]c(n) = [I − µQΛQH]c(n)

⇒ QHc(n + 1) = [QH − µQHQΛQH]c(n) = [I − µΛ]QHc(n)   (*)

Define the transformed coefficients as

v(n) = QHc(n) = QH(w(n) − w0)

Then (*) becomes

v(n + 1) = [I − µΛ]v(n)


Consider the initial condition of v(n)

v(0) = QH(w(0) − w0) = −QHw0   [if w(0) = 0]

Consider the kth term (mode) in

v(n + 1) = [I − µΛ]v(n)

Note that [I − µΛ] is diagonal, so all modes are updated independently. The update for the kth term can be written as

vk(n + 1) = (1 − µλk)vk(n),   k = 1, 2, · · · , M

or, using the recursion,

vk(n) = (1 − µλk)^n vk(0)


Observation: Convergence to the optimal solution requires

lim_{n→∞} w(n) = w0
⇒ lim_{n→∞} c(n) = lim_{n→∞} [w(n) − w0] = 0
⇒ lim_{n→∞} v(n) = lim_{n→∞} QHc(n) = 0
⇒ lim_{n→∞} vk(n) = 0,   k = 1, 2, · · · , M   (*)

Result: According to the recursion

vk(n) = (1 − µλk)^n vk(0)

the limit in (*) holds if and only if

|1 − µλk| < 1 for all k

Thus, since the eigenvalues are nonnegative, 0 < µλmax < 2, or

0 < µ < 2/λmax


Observation: The kth mode has geometric decay,

vk(n) = (1 − µλk)^n vk(0)

The rate of decay is characterized by the time it takes to decay to e⁻¹ of the initial value. Let τk denote this time for the kth mode:

vk(τk) = (1 − µλk)^{τk} vk(0) = e⁻¹ vk(0)

⇒ e⁻¹ = (1 − µλk)^{τk}

⇒ τk = −1/ln(1 − µλk) ≈ 1/(µλk)   for µ ≪ 1

Result: The overall rate of decay is bounded by

−1/ln(1 − µλmax) ≤ τ ≤ −1/ln(1 − µλmin)


Example: Consider the typical behavior of a single mode.


Error Analysis
Recall that

J(n) = Jmin + (w(n) − w0)HR(w(n) − w0)
     = Jmin + (w(n) − w0)HQΛQH(w(n) − w0)
     = Jmin + v(n)HΛv(n)
     = Jmin + Σ_{k=1}^{M} λk |vk(n)|²   [substitute vk(n) = (1 − µλk)^n vk(0)]
     = Jmin + Σ_{k=1}^{M} λk (1 − µλk)^{2n} |vk(0)|²

Result: If 0 < µ < 2/λmax, then

lim_{n→∞} J(n) = Jmin


Adaptive Optimization (Steepest Descent, LMS, RLS Algorithms) Example: Predictor

Example: Consider a two-tap predictor for real-valued input.

[Figure: two-tap predictor block diagram]

Analyze the effects of the following cases:
- Varying the eigenvalue spread χ(R) = λmax/λmin while keeping µ fixed
- Varying µ while keeping the eigenvalue spread χ(R) fixed


SD loci plots (with J(n) contours shown) as a function of [v1(n), v2(n)] for step-size µ = 0.3:

- Eigenvalue spread χ(R) = 1.22: small eigenvalue spread ⇒ modes converge at a similar rate
- Eigenvalue spread χ(R) = 3: moderate eigenvalue spread ⇒ modes converge at moderately similar rates


SD loci plots (with J(n) contours shown) as a function of [v1(n), v2(n)] for step-size µ = 0.3:

- Eigenvalue spread χ(R) = 10: large eigenvalue spread ⇒ modes converge at different rates
- Eigenvalue spread χ(R) = 100: very large eigenvalue spread ⇒ modes converge at very different rates; principal-direction convergence is fastest


SD loci plots (with J(n) contours shown) as a function of [w1(n), w2(n)] for step-size µ = 0.3:

- Eigenvalue spread χ(R) = 1.22: small eigenvalue spread ⇒ modes converge at a similar rate
- Eigenvalue spread χ(R) = 3: moderate eigenvalue spread ⇒ modes converge at moderately similar rates


SD loci plots (with J(n) contours shown) as a function of [w1(n), w2(n)] for step-size µ = 0.3:

- Eigenvalue spread χ(R) = 10: large eigenvalue spread ⇒ modes converge at different rates
- Eigenvalue spread χ(R) = 100: very large eigenvalue spread ⇒ modes converge at very different rates; principal-direction convergence is fastest


Learning curves of the steepest-descent algorithm with step-size parameter µ = 0.3 and varying eigenvalue spread.


SD loci plots (with J(n) contours shown) as a function of [v1(n), v2(n)] with χ(R) = 10 and varying step-sizes:

- Step-size µ = 0.3: over-damped ⇒ slow convergence
- Step-size µ = 1: under-damped ⇒ fast (erratic) convergence


SD loci plots (with J(n) contours shown) as a function of [w1(n), w2(n)] with χ(R) = 10 and varying step-sizes:

- Step-size µ = 0.3: over-damped ⇒ slow convergence
- Step-size µ = 1: under-damped ⇒ fast (erratic) convergence


Example: Consider a system identification problem.

[Figure: adaptive filter w(n) in parallel with an unknown system; input {x(n)}, filter output d̂(n), desired response d(n), error e(n)]

Suppose M = 2 and

Rx = [ 1    0.8
       0.8  1   ]

p = [ 0.8
      0.5 ]


From the eigen analysis we have

λ1 = 1.8,  λ2 = 0.2  ⇒  µ < 2/1.8

also

q1 = (1/√2) [ 1        q2 = (1/√2) [  1
              1 ]                    −1 ]

and

Q = (1/√2) [ 1   1
             1  −1 ]

Also,

w0 = R⁻¹p = [  1.11
              −0.389 ]


Thus

v(n) = QH[w(n) − w0]

Noting that

v(0) = −QHw0 = −(1/√2) [ 1   1 ] [  1.11  ]   [ 0.51 ]
                       [ 1  −1 ] [ −0.389 ] = [ 1.06 ]

and

v1(n) = (1 − µ(1.8))^n 0.51
v2(n) = (1 − µ(0.2))^n 1.06
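The quoted eigenvalues, Wiener solution, and transformed initial condition can be checked numerically; the sketch below is illustrative only (NumPy may order eigenvalues and pick eigenvector signs differently than the slides).

```python
import numpy as np

# Check of the eigen analysis for R = [[1, 0.8], [0.8, 1]], p = [0.8, 0.5].
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])
p = np.array([0.8, 0.5])

lam, Q = np.linalg.eigh(R)    # eigenvalues [0.2, 1.8]; columns are eigenvectors
w0 = np.linalg.solve(R, p)    # Wiener solution, approx. [1.111, -0.389]
v0 = -Q.T @ w0                # v(0) = -Q^H w0 for w(0) = 0 (signs depend on eigenvector convention)
print(lam, w0, v0)
```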


SD convergence properties for two µ values:

- Step-size µ = 0.5: over-damped ⇒ slow convergence
- Step-size µ = 1: under-damped ⇒ fast (erratic) convergence


Adaptive Optimization (Steepest Descent, LMS, RLS Algorithms) Least Mean Squares (LMS)

Least Mean Squares (LMS)

Definition (Least Mean Squares (LMS) Algorithm)
Motivation: The error performance surface used by the SD method is not always known a priori.
Solution: Use estimated values. We will use the following instantaneous estimates:

R̂(n) = x(n)xH(n)
p̂(n) = x(n)d*(n)

Result: The estimates are random variables, and thus this leads to a stochastic optimization.

Historical Note: Invented in 1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff.


Recall the SD update

w(n + 1) = w(n) + (1/2)µ[−∇J(n)]

where the gradient of the error surface at w(n) was shown to be

∇J(n) = −2p + 2Rw(n)

Using the instantaneous estimates,

∇̂J(n) = −2x(n)d*(n) + 2x(n)xH(n)w(n)
       = −2x(n)[d*(n) − xH(n)w(n)]
       = −2x(n)[d*(n) − d̂*(n)]
       = −2x(n)e*(n)

where e*(n) is the complex conjugate of the estimation error.


Utilizing ∇̂J(n) = −2x(n)e*(n) in the update,

w(n + 1) = w(n) + (1/2)µ[−∇̂J(n)]
         = w(n) + µx(n)e*(n)   [LMS Update]

- The LMS algorithm belongs to the family of stochastic gradient algorithms
- The update is extremely simple
- Although the instantaneous estimates may have large variance, the LMS algorithm is recursive and effectively averages these estimates
- The simplicity and good performance of the LMS algorithm make it the benchmark against which other optimization algorithms are judged
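A minimal LMS routine consistent with this update is sketched below for real-valued data (for complex data the error would be conjugated as above); the function name and interface are illustrative, not from the slides.

```python
import numpy as np

def lms(x, d, M, mu):
    """Illustrative LMS adaptive FIR filter: returns final weights and error sequence."""
    N = len(x)
    w = np.zeros(M)
    e = np.zeros(N)
    for n in range(M - 1, N):
        xn = x[n - M + 1:n + 1][::-1]   # observation vector [x(n), ..., x(n-M+1)]
        e[n] = d[n] - w @ xn            # estimation error
        w = w + mu * xn * e[n]          # LMS update
    return w, e
```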


Convergence Analysis

Independence Theorem
The following conditions hold:

1. The vectors x(1), x(2), · · · , x(n) are statistically independent
2. x(n) is independent of d(1), d(2), · · · , d(n − 1)
3. d(n) is statistically dependent on x(n), but is independent of d(1), d(2), · · · , d(n − 1)
4. x(n) and d(n) are mutually Gaussian

- The independence theorem is invoked in the LMS algorithm analysis
- The independence theorem is justified in some cases, e.g., beamforming, where we receive independent vector observations
- In other cases it is not well justified, but it allows the analysis to proceed (i.e., when all else fails, invoke simplifying assumptions)


We will invoke the independence theorem to show that w(n) converges to the optimal solution in the mean,

lim_{n→∞} E{w(n)} = w0

To prove this, evaluate the update:

w(n + 1) = w(n) + µx(n)e*(n)
⇒ w(n + 1) − w0 = w(n) − w0 + µx(n)e*(n)
⇒ c(n + 1) = c(n) + µx(n)(d*(n) − xH(n)w(n))
            = c(n) + µx(n)d*(n) − µx(n)xH(n)[w(n) − w0 + w0]
            = c(n) + µx(n)d*(n) − µx(n)xH(n)c(n) − µx(n)xH(n)w0
            = [I − µx(n)xH(n)]c(n) + µx(n)[d*(n) − xH(n)w0]
            = [I − µx(n)xH(n)]c(n) + µx(n)e*0(n)


Take the expectation of the update, noting that w(n) is based on past inputs and desired values

⇒ w(n), and consequently c(n), are independent of x(n)   (Independence Theorem)

Thus

c(n + 1) = [I − µx(n)xH(n)]c(n) + µx(n)e*0(n)
⇒ E{c(n + 1)} = (I − µR)E{c(n)} + µE{x(n)e*0(n)}   [second term = 0, why? Hint: E{x(n)e*0(n)} = p − Rw0 = 0 by the orthogonality principle]
              = (I − µR)E{c(n)}

Using arguments similar to the SD case, we have

lim_{n→∞} E{c(n)} = 0   if 0 < µ < 2/λmax

or equivalently

lim_{n→∞} E{w(n)} = w0   if 0 < µ < 2/λmax


Noting that Σ_{i=1}^{M} λi = trace[R]

⇒ λmax ≤ trace[R] = M r(0) = M σ²x

Thus a more conservative bound (and one easier to determine) is

0 < µ < 2/(M σ²x)

Convergence in the mean,

lim_{n→∞} E{w(n)} = w0

is a weak condition that says nothing about the variance, which may even grow. A stronger condition is convergence in the mean square, which says

lim_{n→∞} E{|c(n)|²} = constant


Proving convergence in the mean square is equivalent to showing that

lim_{n→∞} J(n) = lim_{n→∞} E{|e(n)|²} = constant

To evaluate the limit, write e(n) as

e(n) = d(n) − d̂(n) = d(n) − wH(n)x(n)
     = d(n) − wH0 x(n) − [wH(n) − wH0]x(n)
     = e0(n) − cH(n)x(n)

Thus

J(n) = E{|e(n)|²} = E{(e0(n) − cH(n)x(n))(e*0(n) − xH(n)c(n))}
     = Jmin + E{cH(n)x(n)xH(n)c(n)}   [cross terms → 0, why?]
     = Jmin + Jex(n)


Since Jex(n) is a scalar

Jex(n) = E{cH(n)x(n)xH(n)c(n)}
       = E{trace[cH(n)x(n)xH(n)c(n)]}
       = E{trace[x(n)xH(n)c(n)cH(n)]}
       = trace[E{x(n)xH(n)c(n)cH(n)}]

Invoking the independence theorem,

Jex(n) = trace[E{x(n)xH(n)}E{c(n)cH(n)}] = trace[RK(n)]

where K(n) = E{c(n)cH(n)}


Thus

J(n) = Jmin + Jex(n) = Jmin + trace[RK(n)]

Recall QHRQ = Λ, or R = QΛQH. Set

S(n) ≜ QHK(n)Q

where S(n) need not be diagonal. Then

K(n) = QQHK(n)QQH   [since Q⁻¹ = QH]
     = QS(n)QH


Utilizing R = QΛQH and K(n) = QS(n)QH in the excess error expression,

Jex(n) = trace[RK(n)] = trace[QΛQHQS(n)QH]
       = trace[QΛS(n)QH]
       = trace[QHQΛS(n)]
       = trace[ΛS(n)]

Since Λ is diagonal,

Jex(n) = trace[ΛS(n)] = Σ_{i=1}^{M} λi si(n)

where s1(n), s2(n), · · · , sM(n) are the diagonal elements of S(n).


The previously derived recursion

E{c(n + 1)} = (I − µR)E{c(n)}

can be modified to yield a recursion on S(n),

S(n + 1) = (I − µΛ)S(n)(I − µΛ) + µ²JminΛ

which for the diagonal elements is

si(n + 1) = (1 − µλi)²si(n) + µ²Jminλi,   i = 1, 2, · · · , M

Suppose Jex(n) converges; then si(n + 1) = si(n)

⇒ si(n) = (1 − µλi)²si(n) + µ²Jminλi

⇒ si(n) = µ²Jminλi / [1 − (1 − µλi)²] = µ²Jminλi / (2µλi − µ²λi²) = µJmin / (2 − µλi),   i = 1, 2, · · · , M


Consider again

Jex(n) = trace[ΛS(n)] = Σ_{i=1}^{M} λi si(n)

Taking the limit and utilizing si(n) = µJmin/(2 − µλi),

lim_{n→∞} Jex(n) = Jmin Σ_{i=1}^{M} µλi/(2 − µλi)

The LMS misadjustment is defined as

MA = lim_{n→∞} Jex(n) / Jmin = Σ_{i=1}^{M} µλi/(2 − µλi)

Note: A misadjustment of 10% or less is generally considered acceptable.
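As a quick numeric illustration (not from the slides), the misadjustment for the eigenvalues λ1 = 1.8, λ2 = 0.2 used in the earlier 2×2 example, with µ = 0.05, is about 5%:

```python
import numpy as np

# MA = sum_i mu*lambda_i / (2 - mu*lambda_i); values are illustrative.
lam = np.array([1.8, 0.2])
mu = 0.05
MA = np.sum(mu * lam / (2 - mu * lam))
print(MA)    # approx. 0.052, i.e. about 5% excess MSE relative to Jmin
```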


Adaptive Optimization (Steepest Descent, LMS, RLS Algorithms) Example: First-Order Predictor

Example

[Figure: one-tap adaptive predictor block diagram]

This is a one-tap predictor,

x̂(n) = w(n)x(n − 1)

Take the underlying process to be a real, order-one AR process,

x(n) = −a x(n − 1) + v(n)

The weight update is

w(n + 1) = w(n) + µx(n − 1)e(n)   [LMS update for observation x(n − 1)]
         = w(n) + µx(n − 1)[x(n) − w(n)x(n − 1)]


Since

x(n) = −a x(n − 1) + v(n)   [AR model]

and

x̂(n) = w(n)x(n − 1)   [one-tap predictor]   ⇒ w0 = −a

Note that

E{x(n − 1)e0(n)} = E{x(n − 1)v(n)} = 0

proves the optimality. Set µ = 0.05 and consider two cases:

  a        σ²x
 −0.99     0.93627
  0.99     0.995
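A simulation sketch of this one-tap predictor is given below; a and µ follow the example, while the driving-noise variance is an assumed value chosen so that σ²x is close to the tabulated figure.

```python
import numpy as np

# One-tap LMS predictor for the AR(1) process x(n) = -a x(n-1) + v(n).
rng = np.random.default_rng(0)
a, mu, N = -0.99, 0.05, 5000
sigma_v = np.sqrt((1 - a**2) * 0.95)     # assumed, giving sigma_x^2 of roughly 0.95

x = np.zeros(N)
for n in range(1, N):
    x[n] = -a * x[n - 1] + sigma_v * rng.standard_normal()

w = 0.0
for n in range(1, N):
    e = x[n] - w * x[n - 1]              # prediction error
    w = w + mu * x[n - 1] * e            # LMS update
print(w)                                  # fluctuates around w0 = -a = 0.99
```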


Figure: Transient behavior of the adaptive first-order predictor weight w(n) for µ = 0.05.


Figure: Transient behavior of the adaptive first-order predictor squared error for µ = 0.05.


Figure: Mean-squared error learning curves for an adaptive first-order predictor with varying step-size parameter µ.


Consider the expected trajectory of w(n). Recall

w(n + 1) = w(n) + µx(n − 1)e(n)
         = w(n) + µx(n − 1)[x(n) − w(n)x(n − 1)]
         = [1 − µx(n − 1)x(n − 1)]w(n) + µx(n − 1)x(n)

In this example, x(n) = −a x(n − 1) + v(n). Substituting in:

w(n + 1) = [1 − µx(n − 1)x(n − 1)]w(n) + µx(n − 1)[−a x(n − 1) + v(n)]
         = [1 − µx(n − 1)x(n − 1)]w(n) − µa x(n − 1)x(n − 1) + µx(n − 1)v(n)

Taking the expectation and invoking the independence theorem,

E{w(n + 1)} = (1 − µσ²x)E{w(n)} − µσ²x a


Figure: Comparison of experimental results with theory, based on w(n).


Next, derive a theoretical expression for J(n). Note that the initial value of J(n) is

J(0) = E{(x(0) − w(0)x(−1))²} = E{(x(0))²} = σ²x

and the final value is

J(∞) = Jmin + Jex
     = E{(x(n) − w0 x(n − 1))²} + Jex
     = E{(v(n))²} + Jex
     = σ²v + Jmin µλ1/(2 − µλ1)

Note λ1 = σ²x. Thus,

J(∞) = σ²v + σ²v [µσ²x/(2 − µσ²x)]
     = σ²v [1 + µσ²x/(2 − µσ²x)]


And if µ is small,

J(∞) = σ²v [1 + µσ²x/(2 − µσ²x)] ≈ σ²v (1 + µσ²x/2)

Putting all the components together:

J(n) = [σ²x − σ²v(1 + (µ/2)σ²x)] (1 − µσ²x)^{2n} + σ²v(1 + (µ/2)σ²x)

where the bracketed term equals J(0) − J(∞), the factor (1 − µσ²x)^{2n} → 0, and the final term is J(∞).

Also, the time constant is

τ = −1/[2 ln(1 − µλ1)] = −1/[2 ln(1 − µσ²x)] ≈ 1/(2µσ²x)


Figure: Comparison of experimental results with theory for the adaptive predictor, based on the mean-square error for µ = 0.001.


Adaptive Optimization (Steepest Descent, LMS, RLS Algorithms) Example: Adaptive Equalization

Example (Adaptive Equalization). Objective: Pass a known signal through an unknown channel to invert the effects the channel and noise have on the signal.


The signal is a Bernoulli sequence,

xn = +1 with probability 1/2, −1 with probability 1/2

The additive noise is ∼ N(0, 0.001). The channel has a raised cosine response,

hn = (1/2)[1 + cos(2π(n − 2)/W)],   n = 1, 2, 3
     0,                             otherwise

- W controls the eigenvalue spread χ(R)
- hn is symmetric about n = 2 and thus introduces a delay of 2
- We will use an M = 11 tap filter, which is symmetric about n = 5 ⇒ introduces a delay of 5
- Thus an overall delay of Δ = 5 + 2 = 7 is added to the system


Channel response and Filter response

Figure: (a) Impulse response of the channel; (b) impulse response of the optimum transversal equalizer.


Consider three W values.

Note the step size is bounded by the W = 3.5 case,

µ ≤ 2/(M r(0)) = 2/(11 × 1.3022) = 0.14

Choose µ = 0.075 in all cases.
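A sketch of this experiment is shown below. Only µ = 0.075, M = 11, the delay Δ = 7, and the noise variance come from the slides; the particular value W = 3.1, the random seed, and the code structure are assumptions.

```python
import numpy as np

# Adaptive equalization sketch: Bernoulli data through a raised-cosine channel plus noise,
# equalized by an 11-tap LMS filter trained against the symbol delayed by 7.
rng = np.random.default_rng(1)
W, mu, M, delay, N = 3.1, 0.075, 11, 7, 3000

h = np.zeros(4)
k = np.arange(1, 4)
h[1:] = 0.5 * (1 + np.cos(2 * np.pi * (k - 2) / W))   # h(n), n = 1, 2, 3

x = rng.choice([-1.0, 1.0], size=N)                   # Bernoulli +/-1 sequence
u = np.convolve(x, h)[:N] + np.sqrt(0.001) * rng.standard_normal(N)

w = np.zeros(M)
for n in range(M, N):
    un = u[n - M + 1:n + 1][::-1]                     # equalizer input vector
    e = x[n - delay] - w @ un                         # error against delayed training symbol
    w = w + mu * un * e                               # LMS update
print(w)                                              # approximates the optimum transversal equalizer
```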


Figure: Learning curves of the LMS algorithm for an adaptive equalizer with number of taps M = 11, step-size parameter µ = 0.075, and varying eigenvalue spread χ(R).


Ensemble-average impulse response of the adaptive equalizer (after 1000 iterations) for each of four different eigenvalue spreads.


Figure: Learning curves of the LMS algorithm for an adaptive equalizer with the number of taps M = 11, fixed eigenvalue spread, and varying step-size parameter µ.


Adaptive Optimization (Steepest Descent, LMS, RLS Algorithms) Example: LMS Directionality

Example: Directionality of the LMS algorithm

- The speed of convergence of the LMS algorithm is faster in certain directions in the weight space
- If the convergence is in the appropriate direction, the convergence can be accelerated by an increased eigenvalue spread

To investigate this phenomenon, consider the deterministic signal

x(n) = A1 cos(ω1 n) + A2 cos(ω2 n)

Even though it is deterministic, a correlation matrix can be determined:

R = (1/2) [ A1² + A2²                     A1² cos(ω1) + A2² cos(ω2)
            A1² cos(ω1) + A2² cos(ω2)     A1² + A2²                 ]


Determining the eigenvalues and eigenvectors yields

λ1 = (1/2)A1²(1 + cos(ω1)) + (1/2)A2²(1 + cos(ω2))
λ2 = (1/2)A1²(1 − cos(ω1)) + (1/2)A2²(1 − cos(ω2))

and

q1 = [ 1        q2 = [ −1
       1 ]              1 ]

Case 1: A1 = 1, A2 = 0.5, ω1 = 1.2, ω2 = 0.1

xa(n) = cos(1.2n) + 0.5 cos(0.1n)   and   χ(R) = 2.9

Case 2: A1 = 1, A2 = 0.5, ω1 = 0.6, ω2 = 0.23

xb(n) = cos(0.6n) + 0.5 cos(0.23n)   and   χ(R) = 12.9
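The quoted spreads can be verified from the 2×2 correlation matrix above; this small check is illustrative and not part of the slides.

```python
import numpy as np

def spread(A1, A2, w1, w2):
    """Eigenvalue spread of R for x(n) = A1 cos(w1 n) + A2 cos(w2 n)."""
    r0 = 0.5 * (A1**2 + A2**2)
    r1 = 0.5 * (A1**2 * np.cos(w1) + A2**2 * np.cos(w2))
    lam = np.linalg.eigvalsh(np.array([[r0, r1], [r1, r0]]))
    return lam.max() / lam.min()

print(spread(1.0, 0.5, 1.2, 0.1))     # about 2.9  (input xa)
print(spread(1.0, 0.5, 0.6, 0.23))    # about 12.8 (input xb; slides quote 12.9)
```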


Since p is undefined, set p = λi qi. Then, since p = Rw0, we see (two cases):

p = λ1q1 ⇒ Rw0 = λ1q1 ⇒ w0 = q1 = [1, 1]T
p = λ2q2 ⇒ Rw0 = λ2q2 ⇒ w0 = q2 = [−1, 1]T

Utilize 200 iterations of the algorithm.

Consider the minimum eigenfilter first, w0 = q2 = [−1, 1]T.
Consider the maximum eigenfilter second, w0 = q1 = [1, 1]T.


Convergence of the LMS algorithm, for a deterministic sinusoidal process, along the “slow” eigenvector (i.e., minimum eigenfilter).

[Figures: for input xa(n) (χ(R) = 2.9) and for input xb(n) (χ(R) = 12.9)]


Convergence of the LMS algorithm, for a deterministic sinusoidal process, along the “fast” eigenvector (i.e., maximum eigenfilter).

[Figures: for input xa(n) (χ(R) = 2.9) and for input xb(n) (χ(R) = 12.9)]


Adaptive Optimization (Steepest Descent, LMS, RLS Algorithms) Normalized LMS Algorithm

Observation: The LMS correction is proportional to µx(n)e*(n),

w(n + 1) = w(n) + µx(n)e*(n)

- If x(n) is large, the LMS update suffers from gradient noise amplification
- The normalized LMS algorithm seeks to avoid gradient noise amplification
- The step size is made time varying, µ(n), and optimized to minimize the next-step error

w(n + 1) = w(n) + (1/2)µ(n)[−∇J(n)]
         = w(n) + µ(n)[p − Rw(n)]

Choose µ(n) such that w(n + 1) produces the minimum MSE,

J(n + 1) = E{|e(n + 1)|²}


Let ∇(n) ≜ ∇J(n) and note

e(n + 1) = d(n + 1) − wH(n + 1)x(n + 1)

Objective: Choose µ(n) such that it minimizes J(n + 1). The optimal step size, µ0(n), will be a function of R and ∇(n). ⇒ Use instantaneous estimates of these values.

To determine µ0(n), expand J(n + 1):

J(n + 1) = E{e(n + 1)e*(n + 1)}
         = E{(d(n + 1) − wH(n + 1)x(n + 1))(d*(n + 1) − xH(n + 1)w(n + 1))}
         = σ²d − wH(n + 1)p − pHw(n + 1) + wH(n + 1)Rw(n + 1)


Now use the fact that w(n + 1) = w(n) − (1/2)µ(n)∇(n):

J(n + 1) = σ²d − [w(n) − (1/2)µ(n)∇(n)]Hp − pH[w(n) − (1/2)µ(n)∇(n)]
           + [w(n) − (1/2)µ(n)∇(n)]HR[w(n) − (1/2)µ(n)∇(n)]

where the last (quadratic) term expands as

wH(n)Rw(n) − (1/2)µ(n)wH(n)R∇(n) − (1/2)µ(n)∇H(n)Rw(n) + (1/4)µ²(n)∇H(n)R∇(n)


J(n + 1) = σ²d − [w(n) − (1/2)µ(n)∇(n)]Hp − pH[w(n) − (1/2)µ(n)∇(n)]
           + wH(n)Rw(n) − (1/2)µ(n)wH(n)R∇(n)
           − (1/2)µ(n)∇H(n)Rw(n) + (1/4)µ²(n)∇H(n)R∇(n)

Differentiating with respect to µ(n),

∂J(n + 1)/∂µ(n) = (1/2)∇H(n)p + (1/2)pH∇(n) − (1/2)wH(n)R∇(n)
                  − (1/2)∇H(n)Rw(n) + (1/2)µ(n)∇H(n)R∇(n)   (*)


Setting (*) equal to 0,

µ0(n)∇H(n)R∇(n) = wH(n)R∇(n) − pH∇(n) + ∇H(n)Rw(n) − ∇H(n)p

⇒ µ0(n) = [wH(n)R∇(n) − pH∇(n) + ∇H(n)Rw(n) − ∇H(n)p] / [∇H(n)R∇(n)]
        = {[wH(n)R − pH]∇(n) + ∇H(n)[Rw(n) − p]} / [∇H(n)R∇(n)]
        = {[Rw(n) − p]H∇(n) + ∇H(n)[Rw(n) − p]} / [∇H(n)R∇(n)]
        = {(1/2)∇H(n)∇(n) + (1/2)∇H(n)∇(n)} / [∇H(n)R∇(n)]
        = ∇H(n)∇(n) / [∇H(n)R∇(n)]


Using the instantaneous estimates

R̂ = x(n)xH(n)   and   p̂ = x(n)d*(n)

⇒ ∇̂(n) = 2[R̂w(n) − p̂]
        = 2[x(n)xH(n)w(n) − x(n)d*(n)]
        = 2[x(n)(d̂*(n) − d*(n))]
        = −2x(n)e*(n)

Thus

µ0(n) = ∇H(n)∇(n) / [∇H(n)R∇(n)]
      = 4xH(n)e(n)x(n)e*(n) / [2xH(n)e(n) · x(n)xH(n) · 2x(n)e*(n)]
      = |e(n)|² xH(n)x(n) / [|e(n)|² (xH(n)x(n))²]
      = 1 / (xH(n)x(n)) = 1 / ||x(n)||²


Result: The NLMS update is

w(n + 1) = w(n) + [µ/||x(n)||²] x(n)e*(n)

where µ(n) = µ/||x(n)||². Here µ is introduced to scale the update. To avoid problems when ||x(n)||² ≈ 0, we add an offset:

w(n + 1) = w(n) + [µ/(a + ||x(n)||²)] x(n)e*(n)

where a > 0
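A minimal NLMS sketch with the offset a is given below for real-valued data; the function name and default values are illustrative.

```python
import numpy as np

def nlms(x, d, M, mu=1.0, a=1e-6):
    """Illustrative normalized LMS filter; stable for 0 < mu < 2."""
    N = len(x)
    w = np.zeros(M)
    e = np.zeros(N)
    for n in range(M - 1, N):
        xn = x[n - M + 1:n + 1][::-1]
        e[n] = d[n] - w @ xn
        w = w + (mu / (a + xn @ xn)) * xn * e[n]   # normalized update
    return w, e
```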


Adaptive Optimization (Steepest Descent, LMS, RLS Algorithms) NLMS Convergence

Objective: Analyze the NLMS convergence

w(n + 1) = w(n) + [µ/||x(n)||²] x(n)e*(n)

Substituting e(n) = d(n) − wH(n)x(n),

w(n + 1) = w(n) + [µ/||x(n)||²] x(n)[d*(n) − xH(n)w(n)]
         = [I − µ x(n)xH(n)/||x(n)||²] w(n) + µ x(n)d*(n)/||x(n)||²


Objective: Compare the NLMS and LMS algorithms.

NLMS:

w(n + 1) = [I − µ x(n)xH(n)/||x(n)||²] w(n) + µ x(n)d*(n)/||x(n)||²

LMS:

w(n + 1) = [I − µx(n)xH(n)]w(n) + µx(n)d*(n)

By observation, we see the following corresponding terms:

  LMS           NLMS
  µ             µ
  x(n)xH(n)     x(n)xH(n)/||x(n)||²
  x(n)d*(n)     x(n)d*(n)/||x(n)||²


LMS case:

0 < µ < 2/trace[E{x(n)xH(n)}] = 2/trace[R]

guarantees stability. By analogy,

0 < µ < 2/trace[E{x(n)xH(n)/||x(n)||²}]

guarantees stability of the NLMS.


To analyze the bound, make the following approximation

E{x(n)xH(n)/||x(n)||²} ≈ E{x(n)xH(n)}/E{||x(n)||²}

Then

trace[E{x(n)xH(n)/||x(n)||²}] = trace[E{x(n)xH(n)}]/E{||x(n)||²}
                              = E{trace[x(n)xH(n)]}/E{||x(n)||²}
                              = E{trace[xH(n)x(n)]}/E{||x(n)||²}
                              = E{||x(n)||²}/E{||x(n)||²} = 1


Thus

0 < µ < 2/trace[E{x(n)xH(n)/||x(n)||²}] = 2

Final Result: The NLMS update

w(n + 1) = w(n) + [µ/||x(n)||²] x(n)e*(n)

will converge if 0 < µ < 2.
Note:
- The NLMS has a simpler convergence criterion than the LMS
- The NLMS generally converges faster than the LMS algorithm


ELEG 636 - Statistical Signal Processing

Gonzalo R. Arce

Variants of the LMS algorithm

Department of Electrical and Computer Engineering, University of Delaware

Newark, DE 19716. Fall 2013


Standard LMS Algorithm

FIR filters:

y(n) = w0(n)u(n) + w1(n)u(n − 1) + ... + wM−1(n)u(n − M + 1)
     = Σ_{k=0}^{M−1} wk(n)u(n − k) = w(n)Tu(n),   n = 0, 1, . . . , ∞

Error between the filter output y(n) and a desired signal d(n):

e(n) = d(n) − y(n) = d(n) − w(n)Tu(n).

Update the filter parameters according to

w(n + 1) = w(n) + µu(n)e(n).


1. Normalized LMS Algorithm

Modify at time n the parameter vector from w(n) to w(n + 1), where w(n + 1) will result from

d(n) = Σ_{i=0}^{M−1} wi(n + 1)u(n − i).

In order to add an extra degree of freedom to the adaptation strategy, one constant, µ, controlling the step size will be introduced:

wj(n + 1) = wj(n) + µ · [1/Σ_{i=0}^{M−1}(u(n − i))²] e(n)u(n − j) = wj(n) + [µ/||u(n)||²] e(n)u(n − j).


To overcome possible numerical difficulties when ||u(n)|| is close to zero, a constant a > 0 is used:

wj(n + 1) = wj(n) + [µ/(a + ||u(n)||²)] e(n)u(n − j)

This is the update used in the Normalized LMS algorithm.

The Normalized LMS algorithm converges if

0 < µ < 2


Comparison of LMS and NLMS


- The LMS was run with three different step-sizes: µ = [0.075; 0.025; 0.0075].
- The NLMS was run with three different step-sizes: µ = [1.0; 0.5; 0.1].
- The larger the step-size, the faster the convergence. The smaller the step-size, the better the steady-state square error.
- LMS with µ = 0.0075 and NLMS with µ = 0.1 achieved similar average steady-state square error. However, NLMS was faster.
- LMS with µ = 0.075 and NLMS with µ = 1.0 had similar convergence speed. However, NLMS achieved a lower steady-state average square error.
- Conclusion: NLMS offers better trade-offs than LMS.
- The computational complexity of NLMS is slightly higher than that of LMS.


2. LMS Algorithm with Time-Variable Adaptation Step

Heuristics: combine the benefits of two different situations:

- The convergence time constant is small for large µ.
- The mean-square error in steady state is low for small µ.

The initial adaptation step µ is kept large, then it is monotonically reduced, µ(n) = 1/(n + c).

Disadvantage for non-stationary data: the algorithm will not react to changes in the optimum solution for large values of n.

Variable Step algorithm:

w(n + 1) = w(n) + M(n)u(n)e(n)

where

M(n) = [ µ0(n)   0       · · ·   0
         0       µ1(n)   · · ·   0
         0       0       · · ·   0
         0       0       · · ·   µM−1(n) ]

Each filter parameter wi(n) is updated using an independent adaptation step µi(n).
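A sketch of this update is given below; for simplicity all per-coefficient steps µi(n) share the same constant c, which (like the function interface) is an assumption.

```python
import numpy as np

def variable_step_lms(u, d, M, c=20.0):
    """Illustrative variable-step LMS: w(n+1) = w(n) + M(n) u(n) e(n), mu_i(n) = 1/(n + c)."""
    N = len(u)
    w = np.zeros(M)
    for n in range(M - 1, N):
        un = u[n - M + 1:n + 1][::-1]
        e = d[n] - w @ un
        mu_n = np.full(M, 1.0 / (n + c))   # diagonal of M(n); entries could differ per coefficient
        w = w + mu_n * un * e
    return w
```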


[Figure: comparison of LMS (µ = 0.01) and variable step-size LMS with µ(n) = 1/(n + c), c = [10; 20; 50]]


3. Sign algorithms

In high-speed communication, time is critical; thus, faster adaptation processes are needed.

sgn(a) =  1,  a > 0
          0,  a = 0
         −1,  a < 0

The Sign algorithm (other names: pilot LMS, or Sign Error):

w(n + 1) = w(n) + µu(n) sgn(e(n)).

The Clipped LMS (or Signed Regressor):

w(n + 1) = w(n) + µ sgn(u(n))e(n).

The Zero-forcing LMS (or Sign-Sign):

w(n + 1) = w(n) + µ sgn(u(n)) sgn(e(n)).

The Sign algorithm can be derived as an LMS algorithm for minimizing the mean absolute error (MAE) criterion

J(w) = E[|e(n)|] = E[|d(n) − wTu(n)|].
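One update step of each variant is sketched below for real-valued data; the function name and string labels are illustrative.

```python
import numpy as np

def sign_lms_step(w, un, dn, mu, variant="sign-error"):
    """Illustrative single update for the sign-algorithm family."""
    e = dn - w @ un
    if variant == "sign-error":         # Sign algorithm (pilot LMS)
        return w + mu * un * np.sign(e)
    if variant == "signed-regressor":   # Clipped LMS
        return w + mu * np.sign(un) * e
    if variant == "sign-sign":          # Zero-forcing LMS
        return w + mu * np.sign(un) * np.sign(e)
    raise ValueError(variant)
```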


Properties of sign algorithms

- Fast computation: if µ is constrained to the form µ = 2^m, only shifting and addition operations are required.
- Drawback: the update mechanism is degraded, compared to the LMS algorithm, by the crude quantization of the gradient estimates.
  - The steady-state error will increase
  - The convergence rate decreases
- The fastest of them, Sign-Sign, is used in the CCITT ADPCM standard for the 32,000 bps system.


Comparison of LMS and Sign LMS

The Sign LMS algorithm should be operated at smaller step-sizes to get behavior similar to the standard LMS algorithm.


4. Linear smoothing of LMS gradient estimates

Lowpass filtering the noisy gradient: rename the noisy gradient g(n) = ∇w J,

g(n) = ∇w J = −2u(n)e(n)
gi(n) = −2e(n)u(n − i).

Passing the signals gi(n) through low-pass filters will prevent large fluctuations of direction during the adaptation process,

bi(n) = LPF(gi(n)).

The updating process will use the filtered noisy gradient,

w(n + 1) = w(n) − µb(n).

The following versions are well known:

Averaged LMS algorithm: LPF is the filter with impulse response h(m) = 1/N, m = 0, 1, . . . , N − 1.

w(n + 1) = w(n) + (µ/N) Σ_{j=n−N+1}^{n} e(j)u(j).


Momentum LMS algorithm: LPF is a first-order IIR filter, h(0) = 1 − γ, h(1) = γh(0), h(2) = γ²h(0), . . . ; then

bi(n) = LPF(gi(n)) = γbi(n − 1) + (1 − γ)gi(n)
b(n) = γb(n − 1) + (1 − γ)g(n)

The resulting algorithm can be written as a second-order recursion:

w(n + 1) = w(n) − µb(n)
γw(n) = γw(n − 1) − γµb(n − 1)
w(n + 1) − γw(n) = w(n) − γw(n − 1) − µb(n) + γµb(n − 1)


w(n + 1) − γw(n) = w(n) − γw(n − 1) − µb(n) + γµb(n − 1)
w(n + 1) = w(n) + γ(w(n) − w(n − 1)) − µ(b(n) − γb(n − 1))
w(n + 1) = w(n) + γ(w(n) − w(n − 1)) − µ(1 − γ)g(n)
w(n + 1) = w(n) + γ(w(n) − w(n − 1)) + 2µ(1 − γ)e(n)u(n)

w(n + 1) − w(n) = γ(w(n) − w(n − 1)) + µ(1 − γ)e(n)u(n)   [the factor of 2 absorbed into µ]

Drawback: The convergence rate may decrease.
Advantage: The momentum term keeps the algorithm active even in regions close to the minimum.
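A sketch of the momentum recursion in its final form is given below; the µ and γ defaults are arbitrary illustrative values.

```python
import numpy as np

def momentum_lms(u, d, M, mu=0.05, gamma=0.9):
    """Illustrative momentum LMS: w(n+1) - w(n) = gamma*(w(n) - w(n-1)) + mu*(1-gamma)*e(n)*u(n)."""
    N = len(u)
    w = np.zeros(M)
    w_prev = np.zeros(M)
    for n in range(M - 1, N):
        un = u[n - M + 1:n + 1][::-1]
        e = d[n] - w @ un
        w_next = w + gamma * (w - w_prev) + mu * (1 - gamma) * e * un
        w_prev, w = w, w_next
    return w
```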


5. Nonlinear smoothing of LMS gradient estimates

Impulsive interference in either d(n) or u(n) drastically degrades LMS performance. Smooth the noisy gradient components using a nonlinear filter.

The Median LMS Algorithm: the adaptation equation can be implemented as

wi(n + 1) = wi(n) − µ · med(e(n)u(n − i), e(n − 1)u(n − 1 − i), . . . , e(n − N)u(n − N − i))

- The smoothing effect in an impulsive noise environment is very strong.
- If the environment is not impulsive, the performance of Median LMS is comparable with that of LMS.
- The convergence rate is slower than in LMS.
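A sketch of a Median LMS filter consistent with the update above is given below; it uses the common "+ µ · median(e·u)" sign convention, and the window length and step size are illustrative.

```python
import numpy as np

def median_lms(u, d, M, mu=0.01, Nmed=5):
    """Illustrative Median LMS: each gradient component is the median of its last Nmed+1 values."""
    L = len(u)
    w = np.zeros(M)
    hist = np.zeros((Nmed + 1, M))             # window of e(n-k) * u_vec(n-k)
    for n in range(M - 1, L):
        un = u[n - M + 1:n + 1][::-1]
        e = d[n] - w @ un
        hist = np.roll(hist, 1, axis=0)
        hist[0] = e * un
        w = w + mu * np.median(hist, axis=0)   # coordinate-wise median smoothing
    return w
```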
