TRANSCRIPT
Unit Review & Exam Preparation
M513 – Advanced DSP Techniques
M513 – Main Topics Covered in the 2011/2012 Academic Year
1. Review of DSP Basics
2. Random Signal Processing
3. Optimal and Adaptive Filters
4. Spectrum (PSD) Estimation Techniques
(Exam questions will mainly come from Parts 2 and 3, but good knowledge of Part 1 is needed!)
Part 1 – Review of DSP Basics
DSP = Digital Signal Processing = Signal Analysis + Signal Processing
… performed in the discrete-time domain
• Fourier Transform Family
• More general transform (z-transform)
• LTI Systems and Convolution
• Guide to LTI Systems
Signal Analysis
• To analyse signals given in the time domain we can use the appropriate member of the Fourier Transform family
Fourier Transforms - Summary
FourierTransform
Discrete-TimeFourier
Transform
FourierSeries
DiscreteFourier
Transform
time continuous discrete
continuous
discrete
aperiodic
periodic
aperiodic periodic frequency
j tX x t e dt
1
2j tx t X e d
0jk tpx t X k e dt
0
2
2
1T
jk tp
T
X k x t e dtT
1
2j nx n X e d
j nX x n e
21
0
1 N j nkN
k
x n X k eN
21
0
N j nkN
n
X k x n e
Fourier Transforms
• The following analogies can be seen:
Periodic in time ↔ Discrete in Frequency
Aperiodic in time ↔ Continuous in Frequency
Continuous in Time ↔ Aperiodic in Frequency
Discrete in Time ↔ Periodic in Frequency
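These dualities can be checked numerically. A NumPy sketch (example length-8 cosine; all names here are chosen for illustration) verifies the DFT definition above and the "discrete in time ↔ periodic in frequency" property:

```python
import numpy as np

# A length-N DT sequence: its DFT X(k) is also length N, and X(k)
# repeats with period N (discrete in one domain <-> periodic in the other).
N = 8
n = np.arange(N)
x = np.cos(2 * np.pi * n / N)          # one full period of a cosine

X = np.fft.fft(x)                      # DFT via the FFT

# Direct evaluation of X(k) = sum_n x(n) e^{-j 2 pi n k / N}
k = 2
X_k = np.sum(x * np.exp(-2j * np.pi * n * k / N))
assert np.isclose(X[k], X_k)

# Periodicity in frequency: X(k) = X(k + N) by the DFT definition
X_kN = np.sum(x * np.exp(-2j * np.pi * n * (k + N) / N))
assert np.isclose(X_k, X_kN)
```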
More general transforms
• Two more transforms are introduced in order to generalise the Fourier transforms for both continuous- and discrete-time domain signals
• To understand their regions of operation it is important to recognise that both the CTFT and the DTFT operate on only one limited part of the whole complex plane (plane of complex values)
• The CTFT operates on the frequency axis, i.e. the line σ = 0 of the complex plane s = σ + jΩ (i.e. s = jΩ).
• The DTFT operates on the frequency circle, i.e. the curve r = 1 of the complex plane z = re^{jω} (i.e. z = e^{jω}).
From Laplace to Z-Transform
• Evaluate the Laplace transform of the sampled signal x_s(t):
  X_s(s) = Σ_n x(nT) e^{−snT}
• Substitute z = e^{sT}:
  X_s(s) = Σ_n x(nT) z^{−n} = X(z)
From Laplace to Z-Transform
• Consider again the substitution we made on the previous slide, z = e^{sT}. With s = σ + jΩ:
  z = e^{sT} = e^{(σ+jΩ)T} = e^{σT} e^{jΩT} = e^{σT} (cos ΩT + j sin ΩT)
  |z| = e^{σT} |cos ΩT + j sin ΩT| = e^{σT}
• For σ < 0, e^{σT} < 1, i.e. for σ < 0, |z| < 1
• i.e. the left half of the s-plane (σ < 0) maps into the interior of the unit circle in the z-plane, |z| < 1
jΩ Axis in s to z Mapping
(Figure: the jΩ axis of the s-plane maps onto the unit circle |z| = 1 of the z-plane; the segment from −Ω_s/2 to +Ω_s/2 of the jΩ axis, where Ω_s = 2π/T, wraps once around the unit circle.)
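The mapping can be checked numerically. A sketch (example values of T and s chosen here for illustration): a point in the left half of the s-plane must land inside the unit circle, and a point on the jΩ axis must land exactly on it.

```python
import numpy as np

# Numerical check of the z = e^{sT} mapping.
T = 1e-3                                  # example sampling period
s_left = -50.0 + 1j * 2 * np.pi * 100     # sigma = -50 < 0 (left half-plane)
s_axis = 0.0 + 1j * 2 * np.pi * 100       # sigma = 0 (on the j-Omega axis)

z_left = np.exp(s_left * T)
z_axis = np.exp(s_axis * T)

assert abs(z_left) < 1.0                  # interior of the unit circle
assert np.isclose(abs(z_axis), 1.0)       # on the unit circle
```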
Signal Processing
• Delay … signal
• Scale … signal
• Add … two or more samples (from the same or different signals)
Signal Filtering ↔ Convolution
Convolution
• Gives the system input – system output relationship for LTI type systems (both DT and CT).
  x(t) → System → y(t)
  x(n) → System → y(n)
Impulse Response of the System
• Let h(n) be the response of the system to the δ(n) impulse type input (i.e. the Impulse Response of the System)
• We note this as δ(n) → h(n)
  δ(n) → LTI System → h(n)
Time-invariance
• For an LTI system, if δ(n) → h(n) then δ(n−k) → h(n−k)
(this is the so-called time-invariance property of the system)
  δ(n−k) → LTI System → h(n−k)
Linearity
• Linearity implies the following system behavior:
  x(n) → LTI System → y(n)
  a·x(n) → LTI System → a·y(n)
  x1(n) + x2(n) → LTI System → y1(n) + y2(n)
Linearity and Time-Invariance
• We can now combine time-invariance and linearity:
  Σ_k δ(n−k) → LTI System → Σ_k h(n−k)
  Σ_k x(k)δ(n−k) → LTI System → Σ_k x(k)h(n−k)
Convolution Sum
• I.e., if δ(n) → h(n) then δ(n−k) → h(n−k)
• and:
  y(n) = Σ_k x(k) h(n−k)
• i.e. the system output is the sum of lots of delayed impulse responses (i.e. responses to the individual, scaled impulse signals which make up the whole DT input signal)
• This sum is called the CONVOLUTION SUM
• Sometimes we use * to denote the convolution operation, i.e.
  y(n) = Σ_k x(k) h(n−k) = x(n) * h(n)
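The convolution sum can be written out directly and compared with a library implementation. A NumPy sketch (example sequences chosen here):

```python
import numpy as np

# Convolution sum y(n) = sum_k x(k) h(n-k), computed directly and
# compared with numpy's built-in convolution.
x = np.array([1.0, 2.0, 0.5])
h = np.array([1.0, -1.0, 0.25])

N = len(x) + len(h) - 1
y = np.zeros(N)
for n in range(N):
    for k in range(len(x)):
        if 0 <= n - k < len(h):
            y[n] += x[k] * h[n - k]   # sum of scaled, delayed impulse responses

assert np.allclose(y, np.convolve(x, h))
```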
Convolution Sum for CT
• Similarly for continuous-time signals and systems (but a little bit more complicated):
  x(t) = ∫ x(τ) δ(t−τ) dτ
• The above expression basically describes the analogue (CT) input signal x(t) as an integral (i.e. sum) of an infinite number of time-shifted and scaled impulse functions.
Important fact about convolution
Convolution in t domain ↔ Multiplication in f domain
but we also have
Multiplication in t domain ↔ Convolution in f domain
Discrete LTI Systems in Two Domains
  x(n) → h(n) → y(n)
  X(z) → H(z) → Y(z)
h(n) – impulse response, H(z) – transfer function of a DT LTI system
Summary
DT:
• H(z) is the z-Transform of the System Impulse Response – the System Transfer Function.
• H(ω) is the Discrete-Time Fourier Transform of the System Impulse Response – the System Frequency Response.
CT:
• H(s) is the Laplace Transform of the System Impulse Response – the System Transfer Function.
• H(Ω) is the Fourier Transform of the System Impulse Response – the System Frequency Response.
Guide to Discrete LTI Systems
(Diagram: the descriptions of a discrete LTI system and the moves between them)
• Impulse Response h(n) ↔ Transfer Function H(z): ZT / IZT
• Transfer Function H(z) → Frequency Response H(ω): substitute z = e^{jω}
• Impulse Response h(n) ↔ Frequency Response H(ω): DTFT / IDTFT
• Difference Equation ↔ the other descriptions: including some mathematical manipulations
Guide to Continuous LTI Systems
(Diagram: the descriptions of a continuous LTI system and the moves between them)
• Impulse Response h(t) ↔ Transfer Function H(s): LT / ILT
• Transfer Function H(s) → Frequency Response H(ω): substitute s = jω
• Impulse Response h(t) ↔ Frequency Response H(ω): FT / IFT
• Differential Equation ↔ the other descriptions: including some mathematical manipulations
Example(s) - Use guide to LTI systems to move between the various descriptions of the system.
E.g. Calculate the transfer function and frequency response for the IIR filter given by the following difference equation:
  y(n) = 0.7y(n−1) − 0.3y(n−2) + 6x(n−1)
Taking the z-transform:
  Y(z) = 0.7z^{−1}Y(z) − 0.3z^{−2}Y(z) + 6z^{−1}X(z)
  (1 − 0.7z^{−1} + 0.3z^{−2}) Y(z) = 6z^{−1} X(z)
  G(z) = Y(z)/X(z) = 6z^{−1} / (1 − 0.7z^{−1} + 0.3z^{−2}) = 6z / (z² − 0.7z + 0.3)
Substituting z = e^{jω} gives the frequency response:
  G(e^{jω}) = 6e^{−jω} / (1 − 0.7e^{−jω} + 0.3e^{−j2ω})
Example(s) - Use guide to LTI systems to move between the various descriptions of the system.
Having obtained the frequency response, the system response to any frequency F₁ can easily be calculated; e.g. for ¼ of the sampling frequency F_s we have:
  ω₁ = 2πf₁ = 2π F₁/F_s = 2π (F_s/4)/F_s = π/2
  G(F₁) = G(e^{jω₁}) = G(e^{jπ/2}) = 6e^{−jπ/2} / (1 − 0.7e^{−jπ/2} + 0.3e^{−jπ}) = −6j / (0.7 + 0.7j)
(note, in general this is a complex number so both phase and amplitude/gain can be calculated)
Example(s) - Use guide to LTI systems to move between the various descriptions of the system.
The opposite problem can also be easily solved; e.g. for the IIR filter with transfer function:
  H(z) = (z² + 0.2z + 0.08) / (z² − 0.5)
find the corresponding difference equation to implement the system.
Dividing numerator and denominator by z²:
  Y(z)/X(z) = (z² + 0.2z + 0.08) / (z² − 0.5) = (1 + 0.2z^{−1} + 0.08z^{−2}) / (1 − 0.5z^{−2})
  (1 − 0.5z^{−2}) Y(z) = (1 + 0.2z^{−1} + 0.08z^{−2}) X(z)
  y(n) − 0.5y(n−2) = x(n) + 0.2x(n−1) + 0.08x(n−2)
  y(n) = x(n) + 0.2x(n−1) + 0.08x(n−2) + 0.5y(n−2)
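The difference equation and the transfer function describe the same system, which can be confirmed numerically. A sketch assuming SciPy is available (`lfilter` takes the transfer-function coefficients; the input here is an arbitrary example):

```python
import numpy as np
from scipy.signal import lfilter

# H(z) = (z^2 + 0.2 z + 0.08) / (z^2 - 0.5) in negative-power form:
# b = [1, 0.2, 0.08], a = [1, 0, -0.5]. The difference equation
# y(n) = x(n) + 0.2 x(n-1) + 0.08 x(n-2) + 0.5 y(n-2)
# must produce the same output as lfilter with those coefficients.
b = [1.0, 0.2, 0.08]
a = [1.0, 0.0, -0.5]

rng = np.random.default_rng(0)
x = rng.standard_normal(50)

y_ref = lfilter(b, a, x)

y = np.zeros_like(x)
for n in range(len(x)):
    y[n] = x[n]
    if n >= 1:
        y[n] += 0.2 * x[n - 1]
    if n >= 2:
        y[n] += 0.08 * x[n - 2] + 0.5 * y[n - 2]

assert np.allclose(y, y_ref)
```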
Part 2 – Random Signals
Random signals – unpredictable type of signals (… well, more or less).
• Moments of random signals – m_x, r_xx(m)
• Autocorrelation ↔ PSD
• Filtering Random Signals – Spectral Factorisation Equations (in three domains)
• Deconvolution and Inverse Filtering
• Minimum and Non-minimum phase systems/filters
Signal Classification
• Deterministic signals
  – can be characterised by a mathematical equation
• Random (Nondeterministic, Stochastic) signals
  – cannot be characterised by a mathematical equation
  – usually characterised by their statistics
Random (Nondeterministic, Stochastic) signals can be further classified as:
• Stationary signals – if their statistics do not change with time
• Nonstationary signals – if their statistics change with time
Signal Classification
• Wide-Sense Stationary (random) signals – random signals with constant signal statistics up to the 2nd order
• Ergodic (random) signals – random signals whose statistics can be measured by Time Averaging rather than Ensemble Averaging (i.e. the expectation of an ergodic signal is its time average)
• For simplicity, we study Wide-Sense Stationary (WSS) and Ergodic signals
1st Order Signal Statistics
• The Mean Value m_x of the signal x(n) is its 1st order statistic:
  m_x = E[x(n)] = lim_{N→∞} 1/(2N+1) Σ_{n=−N}^{N} x(n) = ∫ x p(x) dx
• E[x(n)] is the expected value of x(n) (E – expectation operator). The middle expression is single-waveform (time) averaging, valid because the signal is ergodic (no need for ensemble averages); the right-hand integral is the more general ensemble form.
• If m_x is constant over time, we are talking about a stationary signal x(n).
• The Autocovariance of the signal is a 2nd order signal statistic
• It is calculated according to:
  c_xx(k, l) = E[(x(k) − m_x)(x(l) − m_x)*]
where * denotes a complex conjugate in the case of complex signals
• For equal lags, i.e. k = l, the autocovariance reduces to the variance:
  c_xx(k, k) = σ_x²
2nd Order Statistics
• The Variance σ_x² of the signal is:
  σ_x² = E[(x(n) − m_x)²] = E[x²(n)] − m_x²
• The variance can be considered as some kind of measure of signal dispersion around its mean.
Analogies to Electrical signals
• Two zero-mean signals, with different variances (figure omitted)
• m_x – mean – DC component of an electrical signal
• m_x² – mean squared – DC power
• E[x²(n)] – mean square – total average power
• σ_x² – variance – AC power
• σ_x – standard deviation – rms value
Autocorrelation
• This is also a 2nd order statistic of the signal and is very similar (in some cases identical) to the autocovariance
• The autocorrelation of the signal is basically an averaged product of the signal and its shifted version:
  r_xx(k, l) = E[x(k) x(l)] = lim_{N→∞} 1/(2N+1) Σ_{k=−N}^{N} x(k) x(k+m)
where m = l − k.
• Autocorrelation is a measure of signal predictability – a correlated signal is one that has redundancy (it is compressible, e.g. speech, audio or video signals)
Autocorrelation
• For m = l − k we sometimes use the notation r_xx(m) or even r_x(m) instead of r_xx(k, l), i.e.
  r_x(m) = r_xx(m) = E[x(k) x(l)] = E[x(k) x(k+m)] = lim_{N→∞} 1/(2N+1) Σ_{k=−N}^{N} x(k) x(k+m)
Autocorrelation and Autocovariance
• For zero-mean, stationary signals those two quantities are identical:
  c_xx(m) = c_xx(k, l) = E[(x(k) − m_x)(x(l) − m_x)] = E[x(k) x(l)] = E[x(k) x(k+m)] = r_xx(m)
where m_x = 0
• Also notice that the variance of the zero-mean signal then corresponds to the zero-lag autocorrelation:
  c_xx(k, k) = c_xx(0) = r_xx(0) = σ_x²
Autocorrelation
• Two important autocorrelation properties:
  r_xx(k) = r_xx(−k)   (even symmetry)
  r_xx(0) ≥ |r_xx(k)|
• r_xx(0) is essentially the signal power so it must be at least as large as any other autocorrelation value of that signal (another way of looking at this property of the autocorrelation is to realise that a sample is best correlated with itself).
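Both properties can be confirmed on an estimated autocorrelation. A NumPy sketch (example signal: a short moving average of white noise; the biased time-average estimator is one common choice):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(2000)
x = np.convolve(w, [1.0, 0.5], mode="same")   # mildly correlated signal

N = len(x)

def r_xx(m):
    # biased time-average estimate of the autocorrelation at lag m
    if m >= 0:
        return np.dot(x[:N - m], x[m:]) / N
    return np.dot(x[-m:], x[:N + m]) / N

assert np.isclose(r_xx(3), r_xx(-3))                        # even symmetry
assert all(r_xx(0) >= abs(r_xx(m)) for m in range(1, 20))   # maximum at lag 0
```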
Example (Tutorial 1, Problems 2, 3)
Random phase family of sinusoids:
  x(n) = A cos(ω_k n + θ)
A and ω_k are fixed constants and θ is a uniformly distributed random variable (i.e. equally likely to have any value in the interval −π to π).
Prove the stationarity of this process, i.e.
a) Find the mean and variance values (should be constant)
b) Find the autocorrelation function (ACF) (should only depend on the time lag value m, otherwise constant)
Example (Tutorial 1, Problems 2, 3)
  m_x = E[x] = ∫_{−π}^{π} A cos(ω_k n + θ) p(θ) dθ,  with p(θ) = 1/2π
     = (A/2π) ∫_{−π}^{π} [cos(ω_k n) cos θ − sin(ω_k n) sin θ] dθ
     = (A cos(ω_k n)/2π) ∫_{−π}^{π} cos θ dθ − (A sin(ω_k n)/2π) ∫_{−π}^{π} sin θ dθ
     = 0 − 0 = 0 = const.
Example (Tutorial 1, Problems 2, 3)
  σ_x² = E[x²] − (E[x])² = E[x²]
     = ∫_{−π}^{π} A² cos²(ω_k n + θ) p(θ) dθ
     = (A²/2π) ∫_{−π}^{π} ½ [1 + cos(2ω_k n + 2θ)] dθ
     = (A²/4π) ∫_{−π}^{π} dθ + (A²/4π) ∫_{−π}^{π} cos(2ω_k n + 2θ) dθ
     = (A²/4π)(2π) + 0
     = A²/2 = const.
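These two results can be checked by Monte Carlo simulation over the random phase. A sketch (example values for A, ω_k and n chosen here; averaging over many draws of θ approximates the ensemble expectation):

```python
import numpy as np

# For x(n) = A cos(w_k n + theta), theta uniform on (-pi, pi]:
# the ensemble mean is 0 and the variance is A^2 / 2, for any n.
rng = np.random.default_rng(2)
A, w_k, n = 2.0, 0.3, 5
theta = rng.uniform(-np.pi, np.pi, size=200_000)
x = A * np.cos(w_k * n + theta)

assert abs(x.mean()) < 0.02               # E[x(n)] = 0
assert abs(x.var() - A**2 / 2) < 0.02     # variance = A^2/2 = 2
```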
Example (Tutorial 1, Problems 2, 3)
Random phase family of sinusoids:
  x(n) = A cos(ω_k n + θ)
A and ω_k are fixed constants and θ is a uniformly distributed random variable (i.e. equally likely to have any value in the interval −π to π).
Discuss two approaches to calculate the ACF for this process. We can use the definition directly:
  r_x(m) = r_xx(m) = r_xx(n, n+m) = E[x(n) x(n+m)] = A² E[cos(ω_k n + θ) cos(ω_k (n+m) + θ)]
or we can go via the PSD:
  x(n) —DTFT→ X,  |X|² → R_xx —IDTFT→ r_xx(m)
Power Spectral Density (PSD)
• The Power Spectral Density is the Discrete-Time Fourier Transform of the autocorrelation function:
  R_xx(e^{jω}) = DTFT{r_xx(m)} = Σ_m r_xx(m) e^{−jωm}
• The PSD contains power info (i.e. the distribution of signal power across the frequency spectrum) but has no phase information (the PSD is “phase blind”).
• The PSD is always real and non-negative.
White Noise
• A white noise signal has a perfectly flat power spectrum (equal to the variance of the signal σ_w²):
  R_ww(e^{jω}) = σ_w²
• The autocorrelation of white noise is a unit impulse with amplitude σ_w²:
  r_ww(m) = σ_w² δ(m)
i.e. white noise is a perfectly uncorrelated signal (not realisable in practice; we usually use pseudorandom noise with a PSD almost flat over a finite frequency range)
• R_ww(e^{jω}) and r_ww(m) form a DTFT/IDTFT pair.
Filtering Random Signals
• The filter scales the mean value of the input signal.
• In the time domain the scaling value is the sum of the impulse response:
  m_y = m_x Σ_k h(k)
• In the frequency domain the scaling value is the frequency response of the filter at ω = 0:
  m_y = m_x H(e^{j0})
Filtering Random Signals
• Cross-correlation between the filter input and output signals:
  r_yx(k) = r_yx(n+k, n) = E[y(n+k) x(n)] = … = Σ_l h(l) r_xx(k−l)
or
  r_yx(k) = r_xx(k) * h(k)
Filtering Random Signals
• Autocorrelation of the filter output:
  r_yy(k) = r_yy(n+k, n) = E[y(n+k) y(n)] = … = Σ_l h(n−l) r_yx(n+k−l)
• Using m = n − l:
  r_yy(k) = Σ_m h(m) r_yx(m+k) = r_yx(k) * h(−k)
(h*(−k) in the complex case)
Filtering Random Signals
• The autocorrelation of the filter output therefore depends only on k, the difference between the indices n+k and n, i.e.:
  r_yy(k) = r_yx(k) * h(−k)
• Combining with:
  r_yx(k) = r_xx(k) * h(k)
we have:
  r_yy(k) = r_xx(k) * h(k) * h(−k)
Spectral Factorisation Equations
  r_y(k) = r_x(k) * h(k) * h*(−k)
(block view: r_x → h(k) → r_yx → h*(−m) → r_y)
Taking the DTFT of the above equation:
  R_y(e^{jω}) = H(e^{jω}) H*(e^{jω}) R_x(e^{jω}) = |H(e^{jω})|² R_x(e^{jω})
Taking the ZT of the above equation:
  R_y(z) = H(z) H*(1/z*) R_x(z)
Filtering Random Signals
• In terms of the z-transform:
  R_y(z) = H(z) H*(1/z*) R_x(z)
• If h(n) is real, H*(z*) = H(z), so:
  R_y(z) = H(z) H(1/z) R_x(z)
• This is a special case of spectral factorisation.
Example – Tutorial 2, Problem 2
A zero-mean white noise signal x(n) is applied to an FIR filter with impulse response sequence {0.5, 0, 0.75}. Derive an expression for the PSD of the signal at the output of the filter.
For white noise, r_xx(m) = σ_x² δ(m), so R_xx(z) = σ_x². With H(z) = 0.5 + 0.75z^{−2}:
  R_yy(z) = H(z) H(z^{−1}) R_xx(z)
      = (0.5 + 0.75z^{−2})(0.5 + 0.75z²) σ_x²
      = (0.25 + 0.375z² + 0.375z^{−2} + 0.5625) σ_x²
      = (0.375z² + 0.8125 + 0.375z^{−2}) σ_x²
i.e. the only nonzero output autocorrelation values are:
  r_yy(−2) = 0.375σ_x²;  r_yy(0) = 0.8125σ_x²;  r_yy(2) = 0.375σ_x²
and the output power is r_yy(0) = σ_y² = 0.8125σ_x².
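The factorisation result can be verified by simulation: filter a long white noise record and estimate the output autocorrelation (a NumPy sketch; record length and seed are arbitrary choices):

```python
import numpy as np

# White noise through h = {0.5, 0, 0.75} should give output autocorrelation
# r_yy(0) = 0.8125*var_x, r_yy(+-2) = 0.375*var_x, (near) zero elsewhere.
rng = np.random.default_rng(3)
var_x = 1.0
x = rng.standard_normal(500_000) * np.sqrt(var_x)
y = np.convolve(x, [0.5, 0.0, 0.75], mode="full")[: len(x)]

N = len(y)

def r_yy(m):
    return np.dot(y[: N - m], y[m:]) / N   # biased estimate at lag m >= 0

assert abs(r_yy(0) - 0.8125 * var_x) < 0.01
assert abs(r_yy(2) - 0.375 * var_x) < 0.01
assert abs(r_yy(1)) < 0.01
```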
Example - What about IIR type filter and coloured noise signal? (not in Tutorial handouts)
An IIR filter described by the following difference equation:
  y(n) = 0.4y(n−1) + x(n) + 2.5x(n−1)
is used to process a WSS signal with PSD:
  R_x(e^{jω}) = 1 / (1 − 0.5cos ω)
Find the PSD of the filter output.
Solution – the PSD of the output is required, so the spectral factorisation equation in the ω domain can be used:
  R_y(e^{jω}) = H(e^{jω}) H*(e^{jω}) R_x(e^{jω})
Example - What about IIR type filter and coloured noise signal? (not in Tutorial handouts)
First calculate the transfer function of the filter, then find the frequency response:
  y(n) = 0.4y(n−1) + x(n) + 2.5x(n−1)
  Y(z) = 0.4z^{−1}Y(z) + X(z) + 2.5z^{−1}X(z)
  (1 − 0.4z^{−1}) Y(z) = (1 + 2.5z^{−1}) X(z)
  H(z) = Y(z)/X(z) = (1 + 2.5z^{−1}) / (1 − 0.4z^{−1})
  H(e^{jω}) = (1 + 2.5e^{−jω}) / (1 − 0.4e^{−jω})
… then apply spectral factorisation:
  R_y(e^{jω}) = H(e^{jω}) H*(e^{jω}) R_x(e^{jω}) = [|1 + 2.5e^{−jω}|² / |1 − 0.4e^{−jω}|²] · 1/(1 − 0.5cos ω)
Inverse Filters
• Consider the deconvolution problem as represented below:
  x(n) → H(z) → y(n) → H^{−1}(z) → Ax(n−d)
• Our task is to design a filter which will reconstruct or reconstitute the original input x(n) from the observed output y(n).
• The reconstruction may have an arbitrary gain A and a delay of d, hence the definition of the required inverse filter is:
  H^{−1}(z) H(z) = Az^{−d}
Inverse Filters
• The inverse system is said to ‘equalise’ the amplitude and phase response of H(z), or to ‘deconvolve’ the output y(n) in order to reconstruct the input x(n).
  x(n) → H(z) → y(n) → H^{−1}(z) → Ax(n−d)
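A sketch of the idea, assuming SciPy's `lfilter` is available (example system chosen here: a first-order FIR filter whose zero lies inside the unit circle, so the exact inverse is a stable IIR filter with A = 1, d = 0):

```python
import numpy as np
from scipy.signal import lfilter

# H(z) = 1 - 0.5 z^-1 has its zero at z = 0.5 (inside the unit circle),
# so H^-1(z) = 1 / (1 - 0.5 z^-1) is stable and reconstructs x(n) exactly.
rng = np.random.default_rng(4)
x = rng.standard_normal(100)

y = lfilter([1.0, -0.5], [1.0], x)        # forward filter H(z)
x_rec = lfilter([1.0], [1.0, -0.5], y)    # inverse filter H^-1(z)

assert np.allclose(x_rec, x)
```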
Inverse Filters - Problem
• If H(z) is a non-minimum phase system, its zeros outside the unit circle become poles outside the unit circle in H^{−1}(z), and the inverse filter is unstable!
  x(n) → H(z) → y(n) → H^{−1}(z) → Ax(n−d)
Noise Whitening
• With inverse filtering, x(n) does not have to be a white random sequence, but the inverse filter H₀^{−1}(z) has to reproduce the same sequence x(n).
• For noise whitening, the input x(n) has to be a white random sequence, as is the output u(n) of H₀^{−1}(z), but the sequences x(n) and u(n) are not the same.
Inverse filtering:  x(n) → H₀(z) → y(n) → H₀^{−1}(z) → x(n)
Noise whitening:  x(n) → H₁(z) → y(n) → H₀^{−1}(z) → u(n)
Deconvolution using autocorrelation
• Consider the filtering process again:
  x(n) → h(k) → y(n),  r_yx(k) = r_x(k) * h(k)
• The matrix equation to be solved in order to estimate the impulse response h(k) before attempting the deconvolution is given on the next slide:
Deconvolution using autocorrelation
• using matrix notation:
• The aim is to obtain coefficients b(0), b(1), …, b(L) which can be done by inverting the matrix Rxx.
• This matrix is known as autocorrelation matrix and can be used as important 2nd order characterisation of random signal.
  [r_yx(0)]   [r_xx(0)   r_xx(1)    r_xx(2)    …  r_xx(L)  ] [b(0)]
  [r_yx(1)] = [r_xx(1)   r_xx(0)    r_xx(1)    …  r_xx(L−1)] [b(1)]
  [  ⋮    ]   [  ⋮          ⋮          ⋮       ⋱     ⋮     ] [ ⋮  ]
  [r_yx(L)]   [r_xx(L)   r_xx(L−1)  r_xx(L−2)  …  r_xx(0)  ] [b(L)]
i.e.  r_yx = R_xx b
Deconvolution using autocorrelation
• The solution of the equation from the previous slide:
  r_yx = R_xx b
is obviously given by:
  b = R_xx^{−1} r_yx
• Important to note is the structure of the autocorrelation matrix R_xx.
Toeplitz matrices
• If A_{i,j} is the element of the matrix in the i-th row and j-th column then A_{i,j} = a_{i−j}:
  A = [a₀       a₁       a₂       …  a_{n−1}]
      [a₁       a₀       a₁       …  a_{n−2}]
      [a₂       a₁       a₀       …  a_{n−3}]
      [ ⋮        ⋮        ⋮       ⋱    ⋮    ]
      [a_{n−1}  a_{n−2}  a_{n−3}  …  a₀     ]
(for a symmetric Toeplitz matrix, as shown, a_{i−j} = a_{|i−j|})
• Another important DSP operation – convolution – also has a strong relation to a Toeplitz-type matrix, called the convolution matrix.
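Such a matrix can be built directly from autocorrelation values. A sketch assuming SciPy is available (`scipy.linalg.toeplitz`; the r values are example numbers):

```python
import numpy as np
from scipy.linalg import toeplitz

# Autocorrelation matrix from r_xx(0), r_xx(1), r_xx(2):
# symmetric Toeplitz with R[i, j] = r(|i - j|).
r = np.array([1.0, 0.6, 0.2])
R = toeplitz(r)

assert R[0, 0] == R[1, 1] == R[2, 2] == 1.0   # constant diagonal
assert R[0, 1] == R[1, 0] == 0.6              # symmetry
assert np.all(np.linalg.eigvalsh(R) > 0)      # a valid ACF gives R pos. definite
```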
Convolution matrix
• To form the convolution matrix we need to represent the convolution operation in vector form.
• For example, the output of an FIR filter of length N can be written as:
  y(n) = Σ_{k=0}^{N−1} h(k) x(n−k) = h^T x = x^T h
where x is the vector of input samples to the filter and h is the vector of filter coefficients (the impulse response in the case of an FIR filter)
• The above equation represents the case where x(n) = 0 for n < 0.
Decomposition of Autocorrelation Matrix
• The Toeplitz structure of the autocorrelation matrix is used in the diagonalisation of the autocorrelation matrix – this is the important process of decomposition of the autocorrelation matrix in the form:
  R = Q Λ Q^{−1} = Q Λ Q^T
where: Λ – diagonal matrix containing the eigenvalues of R
  Q – modal matrix containing the eigenvectors of R associated with the eigenvalues in Λ
White noise
• What would be the form of the autocorrelation matrix for the case of a white noise signal?
• Assuming an ideal white noise sequence, i.e. a perfectly uncorrelated signal, its autocorrelation is a unit impulse with amplitude σ_w²:
  r_ww(m) = σ_w² δ(m)
• The autocorrelation matrix is in this case diagonal (all non-diagonal elements are zero): R_ww = σ_w² I.
More on minimum and non-minimum phase systems
• Non-minimum phase systems (sometimes also called mixed-phase systems) are systems with some of their zeros inside the unit circle and the remaining zeros outside the unit circle.
• If all of its zeros are outside the unit circle, a non-minimum phase system is called a maximum phase system.
• The minimum, non-minimum and maximum phase systems can also be recognised by their phase characteristics.
• The phase characteristic of a minimum phase system has a zero net phase change between the frequencies ω = 0 and ω = π, while a non-minimum phase system has a non-zero net phase change between those frequencies.
More on minimum and non-minimum phase systems
• A maximum phase system has the maximum phase change between the frequencies ω = 0 and ω = π amongst all possible systems with the same amplitude response.
Example (Tutorial 2, Problem 4)
A zero-mean stationary white noise x(n) is applied to a filter with transfer function:
  H(z) = (z − 0.5)(z − 3) / z²
Find all filters that can produce the same PSD as the above filter. Are those filters minimum or maximum phase filters?
Using the spectral factorisation equation:
  S_yy(z) = H(z) H(z^{−1}) S_xx(z) = σ_x² · [(z − 0.5)(z − 3)/z²] · [(z^{−1} − 0.5)(z^{−1} − 3)/z^{−2}]
The same PSD would be obtained using the filter H₀(z):
  S_yy(z) = σ_x² H₀(z) H₀(z^{−1}),  with H₀(z) = (z − 0.5)(3z − 1) / z²
H₀(z) has two zeros, 0.5 and 1/3 = 0.333…, both inside the unit circle, i.e. this is a minimum phase filter.
Similarly,
  H₁(z) = (0.5z − 1)(z − 3) / z²
has zeros at z = 2 and z = 3, both outside the unit circle, i.e. this is a maximum phase filter.
Part 3 – Optimal and Adaptive Digital Filters
Best filters for the task in hand:
• Wiener Filter and Equation
• Finding the minimum of the cost function (MSE) = MMSE
• Steepest Descent algorithm
• LMS and RLS algorithms
• Optimal and Adaptive Filter Configurations (i.e. applications)
Optimal and Adaptive Filters
• Optimal filters are the “best” filters for the particular task. We use knowledge about the signals to design these filters and apply them to the task.
• Adaptive filters change their coefficients to improve their performance on the given task. They are not fixed and can therefore change their (statistical) properties over time.
• Adaptive filters may not be optimal, but they are constantly striving to become optimal.
Optimal (Wiener) Filter Design
• System Identification problem:
  x(n) → h(n) = ? → y(n)
• We want to estimate the impulse response h(n) of the “unknown” discrete-time system.
• We can use the equation for the cross-correlation between the filter input and output to obtain the estimate for h(n):
1. convolution form:
  r_yx(k) = r_x(k) * h(k)
2. matrix form:
  [r_yx(0)]     [r_xx(0)    r_xx(1)    …  r_xx(L−1)] [ĥ(0)]
  [r_yx(1)]   = [r_xx(1)    r_xx(0)    …  r_xx(L−2)] [ĥ(1)]
  [  ⋮     ]    [  ⋮           ⋮       ⋱     ⋮     ] [ ⋮  ]
  [r_yx(L−1)]   [r_xx(L−1)  r_xx(L−2)  …  r_xx(0) ] [ĥ(L−1)]
3. matrix form in short notation:
  R_yx = R_xx ĥ
Optimal (Wiener) Filter Design
3. matrix form in short notation:
  R_yx = R_xx ĥ
where:
  R_yx – cross-correlation vector (between input and output signals)
  R_xx – autocorrelation matrix (of the input signal)
  ĥ – estimated impulse response vector
From the above equation we can easily obtain the vector ĥ:
  ĥ = R_xx^{−1} R_yx
Optimal (Wiener) Filter Design
• The equation
  ĥ = R_xx^{−1} R_yx
is also known as the Wiener-Hopf equation.
• Using this equation we have actually estimated (or designed) a filter with an impulse response close (or equal) to the impulse response of the unknown system.
• This type of optimal filter is also known as the Wiener filter.
Optimal (Wiener) Filter Design
• We can approach the problem of designing the Wiener filter estimate of the unknown system in a slightly different way. Consider the block diagram below:
  x(n) → h(n) → d(n);  x(n) → ĥ(n) → y(n);  e(n) = d(n) − y(n)
• A good estimate of the unknown filter impulse response h(n) can be obtained if the difference/error signal between the two outputs (real and estimated system) is minimal (ideally zero).
Optimal (Wiener) Filter Design
• We use the following notation:
  d(n) – output of the unknown system (desired signal)
  y(n) – output of the system estimate
  x(n) – input signal (same for both systems)
  e(n) – error signal, e(n) = d(n) − y(n)
• For e(n) → 0, we expect to achieve a good estimate of the unknown system, i.e.:
  ĥ(n) → h(n)
Optimal (Wiener) Filter Design
• Wiener filter design is actually a much more general problem
• the desired signal d(n) does not have to be the output of an unknown system
  x(n) → ĥ(n) → y(n);  e(n) = d(n) − y(n)
Optimal (Wiener) Filter Design
• Another Wiener filter estimation example:
(Block diagram: the desired signal d(n) passes through a Signal Distorting System and noise w(n) is added, giving x(n); x(n) feeds the Optimal Filter h(n), producing y(n); the error is e(n) = d(n) − y(n).)
  w(n) – noise signal
Task: Design (determine) h(n) in order to minimise the error e(n)!
Optimal (Wiener) Filter Design
• Rather than minimising the current value of the error signal e(n), we can choose a more effective approach – minimise the expected value of the squared error, the mean square error (MSE) function.
• The function to be minimised (cost function) is therefore the MSE function defined as:
  J = E[e²(n)]
Mathematical Analysis
Filter output:
  y(n) = Σ_{k=0}^{N−1} h(k) x(n−k)
Error signal:
  e(n) = d(n) − y(n)
MSE (cost) function:
  J = E[e²(n)] = E[(d(n) − y(n))²] = E[(d(n) − Σ_{k=0}^{N−1} h(k) x(n−k))²]
We can try to minimise this expression directly or switch to matrix/vector notation.
Mathematical Analysis
Using vector notation:
  x(n) = [x(n), x(n−1), …, x(n−N+1)]^T
  h = [h(0), h(1), …, h(N−1)]^T
  y(n) = Σ_{k=0}^{N−1} h(k) x(n−k) = h^T x(n) = x^T(n) h
  J = E[e²(n)] = E[(d(n) − y(n))²]
Mathematical Analysis
  E[e²(n)] = E[(d(n) − y(n))²]
       = E[(d(n) − h^T x(n))²]
       = E[d²(n) − 2d(n) h^T x(n) + h^T x(n) x^T(n) h]
       = E[d²(n)] − 2h^T E[d(n) x(n)] + h^T E[x(n) x^T(n)] h
       = P_d − 2h^T R_dx + h^T R_xx h
where:
  R_xx = E[x(n) x^T(n)] – autocorrelation matrix
  R_dx = E[d(n) x(n)] – cross-correlation vector
  P_d = E[d²(n)] – scalar
Mathematical Analysis
• To find the minimum error, take the derivative with respect to the coefficients h(k) and set it equal to zero:
  ∂E[e²(n)]/∂h = −2R_dx + 2R_xx h_opt = 0
• Solving for h:
  h_opt = R_xx^{−1} R_dx
Wiener-Hopf Equation … again
• The Wiener-Hopf equation therefore determines the set of optimal filter coefficients in the mean-square sense.
Example (Tutorial 3, Problems 1 and 2)
Derive the Wiener-Hopf equation for the Wiener FIR filter working as a noise canceller.
A detailed derivation of the Wiener-Hopf equation is shown in Tutorial 3; for the ANC application we can start from the error signal (primary input d(n) + v₁(n), reference noise v₂(n) filtered by the N-coefficient filter w):
  e(n) = d(n) + v₁(n) − Σ_{k=0}^{N−1} w(k) v₂(n−k)
Example – Tutorial 3, Problems 1 and 2
Derivation of the Wiener-Hopf equation for the FIR noise canceller.
Minimise the MSE:
  J = E[e²(n)] = E[(d(n) + v₁(n) − Σ_{k=0}^{N−1} w(k) v₂(n−k))²]
Setting the derivative with respect to each coefficient to zero:
  ∂E[e²(n)]/∂w(k) = 2E[e(n) ∂e(n)/∂w(k)] = −2E[e(n) v₂(n−k)] = 0
Expanding:
  E[(d(n) + v₁(n)) v₂(n−k)] = Σ_{l=0}^{N−1} w(l) E[v₂(n−l) v₂(n−k)]
  r_{dv₂}(k) + r_{v₁v₂}(k) = Σ_{l=0}^{N−1} w(l) r_{v₂v₂}(l−k)
and since the desired signal d(n) is uncorrelated with the reference noise, r_{dv₂}(k) = 0, giving:
  r_{v₁v₂}(k) = Σ_{l=0}^{N−1} w(l) r_{v₂v₂}(l−k)
Example – Tutorial 3, Problems 1 and 2
Derivation of the Wiener-Hopf equation for the FIR noise canceller.
  r_{v₁v₂}(k) = Σ_{l=0}^{N−1} w(l) r_{v₂v₂}(l−k),  k = 0, 1, …, N−1
or in matrix/vector form:
  r_{v₁v₂} = R_{v₂} w
  w_opt = R_{v₂}^{−1} r_{v₁v₂}
MSE Surface
• E[e²(n)] represents the expected value of the squared filter error e(n), i.e. the mean-square error (MSE).
• For an N-coefficient filter this is an N-dimensional surface, with the Wiener-Hopf solution positioned at the bottom of this surface (i.e. this is the minimum error point).
• We can plot it for the case of a 2-coefficient filter (more than that is impossible to draw).
(Figure: mean square error surface example for a 2-weight Wiener filter; the MMSE – the Wiener optimum – sits at the bottom of the bowl.)
MMSE
• Once the coefficients of the Wiener filter (i.e. the coordinates of the MMSE point) are known, the actual MMSE value is easy to calculate – we need to evaluate J for h = h_opt:
  J_min = J|_{h=h_opt} = P_d − 2h_opt^T R_dx + h_opt^T R_xx h_opt
     = P_d − 2(R_xx^{−1}R_dx)^T R_dx + (R_xx^{−1}R_dx)^T R_xx (R_xx^{−1}R_dx)
     = P_d − 2R_dx^T R_xx^{−1} R_dx + R_dx^T R_xx^{−1} R_xx R_xx^{−1} R_dx
     = P_d − 2R_dx^T R_xx^{−1} R_dx + R_dx^T R_xx^{−1} R_dx
     = P_d − R_dx^T R_xx^{−1} R_dx
     = P_d − R_dx^T h_opt
(using the symmetry of R_xx, so that (R_xx^{−1})^T = R_xx^{−1})
Example (Tutorial 3, Problems 3 and 4)
• Alternative derivation of the MMSE equation is shown in Tutorial 3, Problem 3
• Use of both Wiener-Hopf and MMSE equations is demonstrated in Tutorial 3, Problem 4
A two-coefficient Wiener filter is used to filter the noisy signal x(n) = d(n) + v(n), where v(n) is zero-mean, unit-variance noise uncorrelated with the desired signal d(n). Find r_dx, the optimal solution (Wiener-Hopf) w_opt, and the MMSE J_min, assuming:
  r_d(m) = 0.6^{|m|},  r_v(m) = δ(m)
Example (Tutorial 3, Problems 3 and 4)
  r_dx(m) = E[d(n) x(n+m)] = E[d(n) d(n+m)] + E[d(n) v(n+m)] = r_d(m) + 0 = r_d(m)
  r_dx = [r_d(0), r_d(1)]^T = [1, 0.6]^T
Example (Tutorial 3, Problems 3 and 4)
  r_x(m) = E[x(n) x(n+m)] = E[(d(n) + v(n))(d(n+m) + v(n+m))]
      = E[d(n)d(n+m)] + E[d(n)v(n+m)] + E[v(n)d(n+m)] + E[v(n)v(n+m)]
      = r_d(m) + r_v(m)
  R_xx = R_d + R_v = [1 0.6; 0.6 1] + [1 0; 0 1] = [2 0.6; 0.6 2]
  w_opt = R_xx^{−1} r_dx = [2 0.6; 0.6 2]^{−1} [1; 0.6] = [0.549 −0.165; −0.165 0.549] [1; 0.6] = [0.451; 0.165]
Example (Tutorial 3, Problems 3 and 4)
  J_min = E[d²(n)] − r_dx^T w_opt = r_d(0) − r_dx^T w_opt
     = 1 − [1 0.6] [0.451; 0.165] = 1 − (0.451 + 0.099) = 0.45
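The worked numbers above can be verified with a few lines of NumPy (the matrices are exactly those of the example):

```python
import numpy as np

# Two-coefficient Wiener example: r_d(m) = 0.6^|m|, r_v(m) = delta(m),
# d and v uncorrelated, x = d + v.
R_d = np.array([[1.0, 0.6], [0.6, 1.0]])
R_v = np.eye(2)
R_x = R_d + R_v                     # autocorrelation matrix of x
r_dx = np.array([1.0, 0.6])        # r_dx(m) = r_d(m)

w_opt = np.linalg.solve(R_x, r_dx)  # Wiener-Hopf solution
J_min = R_d[0, 0] - r_dx @ w_opt    # MMSE = r_d(0) - r_dx^T w_opt

assert np.allclose(w_opt, [0.451, 0.165], atol=1e-3)
assert abs(J_min - 0.45) < 1e-2
```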
MMSE
• Another very important observation can be made after rearranging the basic equation for the error signal:
  e(n) = d(n) − y(n) = d(n) − h^T x(n)
Multiplying by x(n):
  x(n) e(n) = x(n) d(n) − x(n) x^T(n) h
and taking the expectation:
  E[x(n) e(n)] = E[x(n) d(n)] − E[x(n) x^T(n)] h = R_dx − R_xx h
For h = h_opt = R_xx^{−1} R_dx this is zero, i.e. at the optimum the error is orthogonal to the input data.
The Steepest Descent Algorithm
• The Steepest Descent method iteratively estimates the solution of the Wiener-Hopf equation using a method called gradient descent.
• This minimisation method finds a minimum by estimating the gradient of the MSE surface and taking a step in the opposite direction to the gradient.
• The basic equation in gradient descent is:
  h_{n+1} = h_n − μ ∇_h E[e²(n)]
where μ is the step size parameter and ∇_h E[e²(n)] is the gradient vector; the update makes h_{n+1}(k) approach h_opt.
The Steepest Descent Algorithm
• Notice that the expression for the gradient has already been obtained in the process of calculating the Wiener filter coefficients, i.e.:
  ∇_h E[e²(n)] = −2R_dx + 2R_xx h
so:
  h_{n+1} = h_n + 2μ(R_dx − R_xx h_n)
• This is a significant improvement in our search for more efficient solutions – the coefficients are now determined iteratively and no inverse of the autocorrelation matrix is needed.
The Steepest Descent Algorithm
• We still need to estimate the autocorrelation matrix R_xx and the cross-correlation vector R_dx (for every iteration step!)
• Further simplification of the algorithm can be achieved by using the instantaneous estimates of R_xx and R_dx:
  R_xx = E[x(n) x^T(n)] ≈ x(n) x^T(n)
  R_dx = E[d(n) x(n)] ≈ d(n) x(n)
  h_{n+1} = h_n + 2μ(R_dx − R_xx h_n)
The LMS (Least Mean Squares) Algorithm for Adaptive Filtering
Substituting the instantaneous estimates into the steepest descent update:
  h_{n+1} = h_n + 2μ(R_dx − R_xx h_n)
      = h_n + 2μ(d(n) x(n) − x(n) x^T(n) h_n)
      = h_n + 2μ x(n)(d(n) − x^T(n) h_n)
      = h_n + 2μ e(n) x(n)
Example – Tutorial 4, Problem 3
A 4-coefficient LMS-based FIR adaptive filter works in the system identification configuration, trying to identify a system with transfer function:
  H(z) = (1.25 + 0.35z^{−1}) / (1 − 0.5z^{−1})
Write the equations for the signals d(n) and e(n) and the update equation for each adaptive filter coefficient, i.e. w₁(n) … w₄(n).
Working from the system transfer function (the worked solution in this transcript uses the coefficients below):
  G(z) = D(z)/X(z) = (1.5 + 0.3z^{−1}) / (1 − 0.5z^{−1} − 0.25z^{−2})
  (1 − 0.5z^{−1} − 0.25z^{−2}) D(z) = (1.5 + 0.3z^{−1}) X(z)
  D(z) = (1.5 + 0.3z^{−1}) X(z) + (0.5z^{−1} + 0.25z^{−2}) D(z)
  d(n) = 1.5x(n) + 0.3x(n−1) + 0.5d(n−1) + 0.25d(n−2)
  y(n) = w(0)x(n) + w(1)x(n−1) + w(2)x(n−2) + w(3)x(n−3)
  e(n) = d(n) − y(n)
Weights update equations:
  w(i) ← w(i) + 2μ e(n) x(n−i),  i = 0, 1, 2, 3
Applications
• Before looking into the details of a Matlab implementation of the LMS update algorithm, some practical applications of adaptive filters are considered first.
• These are:
  – System Identification
  – Inverse System Estimation
  – Adaptive Noise Cancellation
  – Linear Prediction
Applications: System Identification
Identifying the response of the unknown system.
Definitions of signals:
  x(n) – input applied to the unknown system and the adaptive filter
  y(n) – filter output
  d(n) – system (desired) output
  e(n) – estimation error
(Block diagram: x(n) drives both the Unknown System, producing d(n), and the adaptive Digital Filter h(n), producing y(n); the Adaptive Algorithm adjusts h(n) using e(n) = d(n) − y(n).)
Applications: Inverse Estimation
Estimating the inverse of the system.
Definitions of signals:
  x(n) – input applied to the system
  y(n) – filter output
  d(n) – desired output
  e(n) – estimation error
(Block diagram: x(n) passes through the System and then the adaptive Digital Filter h(n), producing y(n); a Delay applied to x(n) provides d(n).)
The delay block ensures the causality of the estimated inverse.
Applications: Noise Cancellation
Removing background noise from useful signals.
Definitions of signals:
  x(n) – noise (the so-called reference signal)
  y(n) – noise estimate
  d(n) – signal + noise, d(n) = s(n) + n(n)
  e(n) – signal estimate
(Block diagram: the Noise source provides the reference x(n) to the adaptive Digital Filter h(n); the Signal source plus noise forms d(n); e(n) = d(n) − y(n).)
Applications: Linear Predictor
Estimating the future samples of the signal.
Definitions of signals:
  x(n) – signal to be predicted
  y(n) – filter output (signal prediction)
  d(n) – desired output
  e(n) – estimation error
(Block diagram: the AR Process output x(n) is delayed before entering the adaptive Digital Filter h(n); the undelayed x(n) serves as d(n).)
Applications: Linear Predictor
• Assuming that the signal x(n):
  • is periodic
  • is steady or varies slowly over time
• the adaptive filter can be used to predict future values of the desired signal based on past values.
• When x(n) is periodic and the filter is long enough to remember previous values, this structure, with the delay in the input signal path, can perform the prediction.
• This configuration can also be used to remove a periodic signal from stochastic noise signals.
Example
• Have a look into Tutorials 3 and 4 for examples of each discussed configuration.
Adaptive LMS Algorithm Implementation
• LMS algorithm can easily be implemented in software. Main steps of this algorithm are:
1. Read in the next sample, x(n), and perform
the filtering operation with the current version of the coefficients.
2. Take the computed output and compare it with the expected output, i.e. calculate the error.
3. Update the coefficients (obtain the next set of coefficients) using the following computation.
• This algorithm is performed in a loop so that with each new sample, a new coefficient vector, hn+1(k) is created.
• In this way, the filter coefficients change and adapt.
$$y(n) = \sum_{k=0}^{N-1} h_n(k)\, x(n-k)$$

$$e(n) = d(n) - y(n)$$

$$h_{n+1}(k) = h_n(k) + \mu\, e(n)\, x(n-k)$$
Adaptive LMS Algorithm Implementation
• Before the LMS algorithm “kicks in” we also need to initialise filter coefficients; the safest option is to initialise them all to zero.
$$y(n) = \sum_{k=0}^{N-1} h_n(k)\, x(n-k)$$

$$e(n) = d(n) - y(n)$$

$$h_{n+1}(k) = h_n(k) + \mu\, e(n)\, x(n-k)$$

$$h_0(k) = 0 \quad \text{for } k = 0 \ldots N-1$$
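The three steps above can be sketched in code (a minimal sketch; the step size mu must still be chosen small enough for stability, and the system coefficients in the usage below are illustrative):

```python
import numpy as np

def lms_filter(x, d, N, mu):
    """Run the LMS algorithm over the whole record.

    x  -- input signal x(n);  d -- desired signal d(n)
    N  -- number of coefficients;  mu -- step size (chosen for stability)
    Returns (y, e, h): filter output, error signal and final coefficients.
    """
    h = np.zeros(N)                   # initialise all coefficients to zero
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(len(x)):
        # step 1: filter the newest samples with the current coefficients
        xvec = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(N)])
        y[n] = h @ xvec
        # step 2: compare with the expected output, i.e. calculate the error
        e[n] = d[n] - y[n]
        # step 3: update the coefficients
        h = h + mu * e[n] * xvec
    return y, e, h
```

Used for system identification with d(n) produced by a known FIR system, the coefficient vector h converges to the system's impulse response.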
Other applications of adaptive filters
• PSD estimation of observed signal
• Foetal ECG monitoring – cancelling of maternal ECG
• Removal of mains interference in medical signals
• Radar signal processing
  – Background noise removal
  – RX-TX crosstalk reduction
  – Adaptive jammer suppression
• Separation of the speech from the background noise
• Echo cancellation for speaker phones
• Beamforming
Part 4
PSD Estimation and Signal Modelling Techniques
Part 4 – PSD Estimation
There are several ways to find the PSD of a signal:
• Nonparametric techniques (periodogram and correlogram)
• Parametric Techniques (AR, MA and ARMA models)
• Yule-Walker Equations and Signal Predictors
Approaches to PSD estimation
• Classical, Non-parametric Techniques
  – based on Fourier Transform
  – robust, require no previous knowledge about the data
  – assume zero data values outside the data window – results are distorted, resolution can be low
  – not suitable for short data records
• Modern, Parametric Techniques – include a priori model information concerning the spectrum to be estimated.
• Modern, Non-parametric methods – use singular value decomposition (SVD) of the signal to separate correlated and uncorrelated signal components for an easier analysis
Non-parametric PSD estimation techniques (DFT/FFT based)
• PSD is estimated directly from the signal itself, with no previous knowledge about the signal
• Periodogram – based on the following formula:

$$\hat R_{xx}(e^{j\omega}) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\, e^{-j\omega n}\right|^2 = \frac{1}{N}\left|X(e^{j\omega})\right|^2$$

• Correlogram – based on the following formula:

$$\hat R_{xx}(e^{j\omega}) = \sum_{m=-(N-1)}^{N-1} \hat r_{xx}(m)\, e^{-j\omega m}$$

where $\hat r_{xx}(m)$ is an estimate of the autocorrelation of signal x(n).
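Both estimators are easy to compute numerically; the sketch below (function names are my own) evaluates them on a common (2N−1)-point frequency grid so that they can be compared directly:

```python
import numpy as np

def periodogram(x, nfft):
    """(1/N)|X(e^{jw})|^2 evaluated on an nfft-point frequency grid."""
    N = len(x)
    return np.abs(np.fft.fft(x, nfft)) ** 2 / N

def correlogram(x):
    """DTFT of the biased autocorrelation estimate, on a (2N-1)-point grid."""
    N = len(x)
    r = np.correlate(x, x, mode="full") / N   # biased r_xx(m), m = -(N-1)..N-1
    # circularly shift so that lag 0 comes first, then take the FFT
    return np.real(np.fft.fft(np.roll(r, -(N - 1))))
```

Evaluated on the same grid (nfft = 2N−1), the two estimates agree to machine precision, since the periodogram is the DTFT of the biased autocorrelation estimate.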
Periodogram and Correlogram
• note that since
results obtained with those two estimators should coincide
• variations on the basic periodogram approach are usually used in practice
$$\left|X(e^{j\omega})\right|^2 = X(e^{j\omega})\, X^{*}(e^{j\omega})$$

$$\mathrm{DTFT}\{x(k)\} \cdot \mathrm{DTFT}\{x(-k)\} = \mathrm{DTFT}\{x(k) * x(-k)\} = \mathrm{DTFT}\{r_{xx}(m)\}$$
(note - not a strict mathematical derivation)
Blackman-Tukey method
• Since the correlation function at its extreme lag values is not reliable (fewer data points enter the computation), it is recommended to use lag values of about 30%–40% of the total length of the data
• Blackman-Tukey is a windowed correlogram given by:

$$\hat R_{BT}(e^{j\omega}) = \sum_{m=-(L-1)}^{L-1} w(m)\, \hat r_{xx}(m)\, e^{-j\omega m}$$

where w(m) is the window, with zero values for |m| > L−1; also, L ≪ N.
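A sketch of the Blackman-Tukey estimator (the triangular lag window and the grid size are assumed choices for illustration):

```python
import numpy as np

def blackman_tukey(x, L, nfft=512):
    """Windowed correlogram: only lags |m| <= L-1 are kept, with L << N."""
    N = len(x)
    r = np.correlate(x, x, mode="full") / N      # biased r_xx(m)
    lags = np.arange(-(L - 1), L)                # m = -(L-1) .. L-1
    r_l = r[N - 1 + lags]                        # truncated autocorrelation
    w = np.bartlett(2 * L - 1)                   # triangular lag window (assumed choice)
    omega = 2 * np.pi * np.arange(nfft) / nfft
    # DTFT of the windowed, truncated autocorrelation on an nfft-point grid
    return np.real(np.exp(-1j * np.outer(omega, lags)) @ (w * r_l))
```

With the triangular lag window the estimate stays non-negative, and its average over frequency equals the sample power r̂_xx(0).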
Bartlett Method
• This is an improved periodogram method (note that previously discussed, Blackman-Tukey is a correlogram method)
• Bartlett’s method reduces the fluctuation of the periodogram by splitting up the available data of N observations into K=N/L subsections of L observations each.
• Spectral densities of produced K periodograms are then averaged.
Bartlett Method
[Diagram: the data record is split into K non-overlapping segments of L samples each; periodograms 1…K are computed, summed and divided by K to form the PSD estimate.]
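The segmentation-and-averaging step shown above is short in code (a sketch; the function name is my own):

```python
import numpy as np

def bartlett_psd(x, L):
    """Split x into K = N // L non-overlapping length-L segments and
    average their periodograms."""
    K = len(x) // L
    segs = x[:K * L].reshape(K, L)               # K segments of L samples each
    periodograms = np.abs(np.fft.fft(segs, axis=1)) ** 2 / L
    return periodograms.mean(axis=0)             # average the K periodograms
```

Averaging trades frequency resolution (each segment is only L samples long) for a K-fold reduction of the periodogram variance.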
Welch Method
• Welch proposed further modification to Bartlett method and introduced overlapped and windowed data segments defined as:
where: w(n) - window of length M
D - offset distance
K - number of sections that the sequence x(n) is divided into
$$x_i(n) = x(iD + n)\, w(n), \qquad 0 \le n \le M-1,\;\; 0 \le i \le K-1$$
Welch Method
• the i-th periodogram is

$$\hat R_i(e^{j\omega}) = \frac{1}{M}\left|\sum_{n=0}^{M-1} x_i(n)\, e^{-j\omega n}\right|^2$$

• the averaged periodogram is

$$\hat R(e^{j\omega}) = \frac{1}{K}\sum_{i=0}^{K-1} \hat R_i(e^{j\omega})$$
Welch Method
[Diagram: overlapping segments, each offset by D samples, are taken from the data; periodograms 1…K are computed, summed and divided by K to form the PSD estimate.]
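The overlapped, windowed averaging can be sketched as follows (the Hann window and the power normalisation are assumed choices, made so that the estimate level stays comparable to the signal power):

```python
import numpy as np

def welch_psd(x, M, D):
    """Average the periodograms of windowed segments x_i(n) = x(iD + n) w(n).

    M -- window/segment length, D -- offset between segment starts
    (D = M // 2 gives the usual 50% overlap).
    """
    w = np.hanning(M)                            # window choice is an assumption
    U = np.mean(w ** 2)                          # window power, for normalisation
    K = (len(x) - M) // D + 1                    # number of segments
    P = np.zeros(M)
    for i in range(K):
        xi = x[i * D:i * D + M] * w              # overlapped, windowed segment
        P += np.abs(np.fft.fft(xi)) ** 2 / (M * U)
    return P / K                                 # averaged periodogram
```

Overlapping (D < M) packs more segments into the same record, giving a further variance reduction relative to Bartlett's non-overlapping segmentation.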
Modified Welch Method
• Data segments taken from the data record get progressively longer, thus introducing a better frequency resolution.
• Due to the averaging procedure the periodogram variance decreases and smoother periodograms are obtained.
Modified (Symmetric) Welch Method
[Diagram: progressively longer segments are taken symmetrically from the data record; periodograms 1…K are computed, summed and divided by K to form the PSD estimate.]
Modified (Asymmetric) Welch Method
[Diagram: progressively longer segments are taken from one end of the data record; periodograms 1…K are computed, summed and divided by K to form the PSD estimate.]
Comparison of nonparametric PSD estimators
• We use quality factor Q to evaluate different nonparametric methods
• This is the ratio of the square of the mean of the power spectral density estimate to its variance:

$$Q = \frac{\left(E\left[\hat R_{xx}(e^{j\omega})\right]\right)^2}{\operatorname{var}\left[\hat R_{xx}(e^{j\omega})\right]}$$
Comparison of nonparametric PSD estimators
Method          Conditions                  Q           Comments
Periodogram     N→∞                         1           Inconsistent, independent of N
Bartlett        N,L→∞                       1.11·N·Δf   Quality improves with data length
Welch           N,L→∞, 50% overlap          1.39·N·Δf   Quality improves with data length
Blackman-Tukey  N,L→∞, triangular window    2.34·N·Δf   Quality improves with data length

• Δf is the 3 dB main lobe width of the associated window
Parametric PSD estimation techniques (model based)
• use a priori model information about the spectrum to be estimated
Steps for parametric spectrum estimation
1. Select a suitable model for the procedure. This step may be based on:
- a priori knowledge of the physical mechanism that generates the random process.
- trial and error, by testing various parametric models.
(if the wrong model is selected, the results can be worse than those obtained with non-parametric PSD estimation methods)
2. Estimate the (p,q) order of the model (from the collected data and/or from a priori information).
3. Use collected data to estimate model parameters, coefficients.
Stochastic signal modelling
Deterministic signal modelling
Possible Models
• Moving Average (MA) – b coefficients only (i.e. FIR); also known as the All-Zero Model
  [Diagram: x(n) → b(n) → y(n)]

• Autoregressive (AR) – a coefficients only; also known as the All-Pole Model
  [Diagram: x(n) → Σ → y(n), with feedback through a(n)]

• Most General Model: Autoregressive Moving Average (ARMA) – a and b coefficients (i.e. IIR); also known as the Pole-Zero Model
  [Diagram: x(n) → b(n) → Σ → y(n), with feedback through a(n)]
Model Equations
ARMA:
$$\sum_{k=0}^{p} a_k\, y(n-k) = \sum_{k=0}^{q} b_k\, x(n-k)$$

MA:
$$y(n) = \sum_{k=0}^{q} b_k\, x(n-k)$$

AR:
$$\sum_{k=0}^{p} a_k\, y(n-k) = x(n)$$
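These difference equations translate directly into code. A sketch with illustrative coefficient values (only the MA and AR cases are shown; ARMA simply combines both):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)            # white-noise input x(n)

b = np.array([1.0, 0.5, 0.25])           # illustrative MA coefficients b_k
a = np.array([1.0, -0.6])                # illustrative AR coefficients a_k (a_0 = 1)

# MA: y(n) = sum_{k=0}^{q} b_k x(n-k)  -- an FIR convolution
y_ma = np.convolve(x, b)[:len(x)]

# AR: sum_{k=0}^{p} a_k y(n-k) = x(n)  ==>  y(n) = x(n) - sum_{k=1}^{p} a_k y(n-k)
y_ar = np.zeros(len(x))
for n in range(len(x)):
    y_ar[n] = x[n] - sum(a[k] * y_ar[n - k]
                         for k in range(1, len(a)) if n - k >= 0)
```

The MA output is a finite convolution of the input, while the AR output is produced by a feedback recursion, matching the all-zero and all-pole structures above.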
Model Equations in z-domain (i.e. Model Transfer Functions)
$$\sum_{k=0}^{p} a_k\, y(n-k) = \sum_{k=0}^{q} b_k\, x(n-k) \qquad \text{(ARMA)}$$

apply z-transform:

$$Y(z)\sum_{k=0}^{p} a_k z^{-k} = X(z)\sum_{k=0}^{q} b_k z^{-k}$$

$$H(z) = \frac{Y(z)}{X(z)} = \frac{\displaystyle\sum_{k=0}^{q} b_k z^{-k}}{\displaystyle\sum_{k=0}^{p} a_k z^{-k}}$$
Model Equations in z-domain
MA:
$$H(z) = \sum_{k=0}^{q} b_k z^{-k}$$

AR (for a₀ = 1):
$$H(z) = \frac{1}{1 + \displaystyle\sum_{k=1}^{p} a_k z^{-k}}$$

ARMA (for a₀ = 1):
$$H(z) = \frac{\displaystyle\sum_{k=0}^{q} b_k z^{-k}}{1 + \displaystyle\sum_{k=1}^{p} a_k z^{-k}}$$
Model Equations in ω-domain
MA:
$$H(e^{j\omega}) = \sum_{k=0}^{q} b_k e^{-j\omega k}$$

AR (for a₀ = 1):
$$H(e^{j\omega}) = \frac{1}{1 + \displaystyle\sum_{k=1}^{p} a_k e^{-j\omega k}}$$

ARMA (for a₀ = 1):
$$H(e^{j\omega}) = \frac{\displaystyle\sum_{k=0}^{q} b_k e^{-j\omega k}}{1 + \displaystyle\sum_{k=1}^{p} a_k e^{-j\omega k}}$$
So how do we get the signal PSD from the estimated model?
• If the white noise signal w(n) is the input to our model (i.e. x(n)=w(n)) the output signal y(n) is a WSS (wide sense stationary) signal with PSD given as:
$$R_{yy}(z) = H(z)\, H(z^{-1})\, R_{ww}(z)$$

or

$$R_{yy}(e^{j\omega}) = H(e^{j\omega})\, H^{*}(e^{j\omega})\, R_{ww}(e^{j\omega})$$
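A sketch of evaluating this relation numerically for an ARMA model driven by white noise of variance σ_w² (the function name and the coefficients in the usage are illustrative):

```python
import numpy as np

def arma_psd(b, a, sigma_w2=1.0, nfft=512):
    """R_yy(e^{jw}) = sigma_w^2 |B(e^{jw})|^2 / |A(e^{jw})|^2 on an nfft grid."""
    omega = 2 * np.pi * np.arange(nfft) / nfft
    kmax = max(len(b), len(a))
    E = np.exp(-1j * np.outer(omega, np.arange(kmax)))   # e^{-j w k}
    B = E[:, :len(b)] @ b                                # sum_k b_k e^{-j w k}
    A = E[:, :len(a)] @ a                                # sum_k a_k e^{-j w k}
    return sigma_w2 * np.abs(B) ** 2 / np.abs(A) ** 2
```

For b = [1], a = [1] the model passes the white noise unchanged and the PSD is flat at σ_w²; an AR(1) model with a = [1, −0.5] peaks at ω = 0 with value 1/|1 − 0.5|² = 4.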
PSD for ARMA modelled signal
• since R_ww(e^{jω}) = σ_w² for white noise:

$$R_{yy}(e^{j\omega}) = H(e^{j\omega})\, H^{*}(e^{j\omega})\, R_{ww}(e^{j\omega}) = \sigma_w^2\, \frac{\left|\sum_{k=0}^{q} b_k e^{-j\omega k}\right|^2}{\left|\sum_{k=0}^{p} a_k e^{-j\omega k}\right|^2}$$

• using vector notation:

$$R_{yy}(e^{j\omega}) = \sigma_w^2\, \frac{\mathbf{e}_q^H \mathbf{b}\, \mathbf{b}^H \mathbf{e}_q}{\mathbf{e}_p^H \mathbf{a}\, \mathbf{a}^H \mathbf{e}_p}$$
PSD for ARMA modelled signal
• where H denotes Hermitian transpose (transpose + complex conjugate) and:

$$R_{yy}(e^{j\omega}) = \sigma_w^2\, \frac{\mathbf{e}_q^H \mathbf{b}\, \mathbf{b}^H \mathbf{e}_q}{\mathbf{e}_p^H \mathbf{a}\, \mathbf{a}^H \mathbf{e}_p}$$

$$\mathbf{e}_q = \begin{bmatrix} 1 \\ e^{j\omega} \\ e^{j2\omega} \\ \vdots \\ e^{jq\omega} \end{bmatrix}, \quad \mathbf{e}_p = \begin{bmatrix} 1 \\ e^{j\omega} \\ e^{j2\omega} \\ \vdots \\ e^{jp\omega} \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \\ b_q \end{bmatrix}, \quad \mathbf{a} = \begin{bmatrix} 1 \\ a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}$$
PSD for AR & MA modelled signals
• Similarly, for AR modelled signals we have:

$$R_{yy}(e^{j\omega}) = \frac{\sigma_w^2}{\left|1 + \sum_{k=1}^{p} a_k e^{-j\omega k}\right|^2} = \frac{\sigma_w^2}{\mathbf{e}_p^H \mathbf{a}\, \mathbf{a}^H \mathbf{e}_p}$$

• and for MA models:

$$R_{yy}(e^{j\omega}) = \sigma_w^2 \left|\sum_{k=0}^{q} b_k e^{-j\omega k}\right|^2 = \sigma_w^2\, \mathbf{e}_q^H \mathbf{b}\, \mathbf{b}^H \mathbf{e}_q$$
Statistical Signal Modelling: Yule-Walker Equations
• In statistical signal modelling, the problem of determining the model coefficients boils down to solving a set of nonlinear equations called the Yule-Walker equations.
• The next couple of slides show how these equations are obtained, starting from the general expression for the ARMA model.
• Assuming a₀ = 1 we have:

$$\sum_{k=0}^{p} a_k\, y(n-k) = \sum_{k=0}^{q} b_k\, x(n-k) \;\;\Rightarrow\;\; y(n) = -\sum_{k=1}^{p} a_k\, y(n-k) + \sum_{k=0}^{q} b_k\, x(n-k)$$
Yule-Walker Equations
• Multiplying both sides of the ARMA equation with y(n-i) and taking the expectation we have:
$$E\{y(n)\, y(n-i)\} = -\sum_{k=1}^{p} a_k\, E\{y(n-k)\, y(n-i)\} + \sum_{k=0}^{q} b_k\, E\{x(n-k)\, y(n-i)\}$$

i.e.

$$r_y(i) = -\sum_{k=1}^{p} a_k\, r_y(i-k) + \sum_{k=0}^{q} b_k\, E\{x(n-k)\, y(n-i)\}$$

• Since both x(n) and y(n) are jointly wide-sense stationary processes, we can rewrite the remaining expectation term using the following reasoning:
Yule-Walker Equations
• i.e.

$$E\{x(n-k)\, y(n-i)\} = E\left\{x(n-k) \sum_{m} x(m)\, h(n-i-m)\right\} = \sum_{m} E\{x(n-k)\, x(m)\}\, h(n-i-m) = \sigma_x^2\, h(k-i)$$

since $E\{x(n-k)\, x(m)\} = \sigma_x^2\, \delta(m-(n-k))$, so that:

$$r_y(i) = -\sum_{k=1}^{p} a_k\, r_y(i-k) + \sigma_x^2 \sum_{k=0}^{q} b_k\, h(k-i)$$
Yule-Walker Equations
• Further, for causal h(n) (h(k−i) = 0 for k < i) we obtain:

$$r_y(i) = -\sum_{k=1}^{p} a_k\, r_y(i-k) + \sigma_x^2 \sum_{k=i}^{q} b_k\, h(k-i)$$

• Introducing

$$c_i = \sum_{k=i}^{q} b_k\, h(k-i) = \sum_{k=0}^{q-i} b_{k+i}\, h(k)$$

we have the standard form of the Yule-Walker equations:

$$r_y(i) + \sum_{k=1}^{p} a_k\, r_y(i-k) = \begin{cases} \sigma_x^2\, c_i & \text{for } 0 \le i \le q \\ 0 & \text{for } i > q \end{cases}$$
Yule-Walker Equations
• in matrix form:

$$\begin{bmatrix}
r_y(0) & r_y(-1) & \cdots & r_y(-p) \\
r_y(1) & r_y(0) & \cdots & r_y(1-p) \\
\vdots & \vdots & & \vdots \\
r_y(q) & r_y(q-1) & \cdots & r_y(q-p) \\
r_y(q+1) & r_y(q) & \cdots & r_y(q+1-p) \\
\vdots & \vdots & & \vdots \\
r_y(q+p) & r_y(q+p-1) & \cdots & r_y(q)
\end{bmatrix}
\begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix}
= \sigma_x^2
\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_q \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
Yule Walker Equations for AR model
$$\sum_{k=0}^{p} a_k\, y(n-k) = x(n)$$

• for the AR model (q = 0, b₀ = 1, h(0) = 1) the Yule-Walker equations reduce to:

$$r_y(i) + \sum_{k=1}^{p} a_k\, r_y(i-k) = \sigma_x^2\, \delta(i)$$
• in matrix form:
$$\begin{bmatrix}
r_y(0) & r_y(1) & \cdots & r_y(p) \\
r_y(1) & r_y(0) & \cdots & r_y(p-1) \\
\vdots & \vdots & & \vdots \\
r_y(p) & r_y(p-1) & \cdots & r_y(0)
\end{bmatrix}
\begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix}
= \begin{bmatrix} \sigma_x^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

(using r_y(−m) = r_y(m) for a real signal)
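In practice the AR Yule-Walker equations are solved for a₁…a_p from estimated autocorrelations. A minimal sketch (the function name is my own, and the AR(2) coefficients in the usage are illustrative):

```python
import numpy as np

def yule_walker_ar(y, p):
    """Estimate AR(p) coefficients a_1..a_p and sigma_x^2 from data y.

    Uses biased autocorrelation estimates and solves
    r_y(i) + sum_{k=1}^{p} a_k r_y(i-k) = 0  for i = 1..p,
    then obtains sigma_x^2 from the i = 0 equation.
    """
    N = len(y)
    r = np.array([y[:N - m] @ y[m:] / N for m in range(p + 1)])
    R = np.array([[r[abs(i - k)] for k in range(1, p + 1)]
                  for i in range(1, p + 1)])     # Toeplitz autocorrelation matrix
    a = np.linalg.solve(R, -r[1:])               # equations for i = 1..p
    sigma2 = r[0] + a @ r[1:]                    # equation for i = 0
    return a, sigma2
```

Applied to data generated by a stable AR(2) recursion, the estimated coefficients and driving-noise variance converge to the true model values as the record length grows.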