speech processing

Speech Processing Homomorphic Signal Processing

Upload: naomi-ramirez

Post on 31-Dec-2015



TRANSCRIPT

Page 1: Speech Processing

Speech Processing

Homomorphic Signal Processing

Page 2: Speech Processing

April 19, 2023 Veton Këpuska 2

Outline

Principles of Homomorphic Signal Processing

Details of Homomorphic Processing

Variants of Homomorphic Processing

Investigation of Homomorphic systems to speech analysis and synthesis

Page 3: Speech Processing


Principles of Homomorphic Processing

Superposition Property of Linear Systems:

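[Figure: block diagrams of a linear system L. Inputs x1[n] and x2[n] are scaled by a1 and a2 and summed before L; equivalently, L(x1[n]) and L(x2[n]) are scaled by a1 and a2 and summed after L.]

$$L(x_1[n] + x_2[n]) = L(x_1[n]) + L(x_2[n])$$

$$L(a\,x[n]) = a\,L(x[n])$$

$$L(a_1 x_1[n] + a_2 x_2[n]) = a_1 L(x_1[n]) + a_2 L(x_2[n])$$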

Page 4: Speech Processing


Principles of Homomorphic Processing

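Example 6.1: If signals fall in non-overlapping frequency bands, then they are separable.

x[n] = x1[n] + x2[n]

$X_1(\omega) = \mathcal{F}\{x_1[n]\}$, with $X_1(\omega)$ confined to $\omega \in [0, \pi/2]$,

$X_2(\omega) = \mathcal{F}\{x_2[n]\}$, with $X_2(\omega)$ confined to $\omega \in [\pi/2, \pi]$.

A linear filter h[n] distributes over the sum:

y[n] = h[n] * (x1[n]+x2[n]) = h[n] * x1[n] + h[n] * x2[n]

With the frequency response

$$H(\omega) = \begin{cases} 0, & \omega \in [0, \pi/2] \\ 1, & \omega \in [\pi/2, \pi] \end{cases}$$

the output is y[n] = h[n] * x2[n] = x2[n].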
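As a rough numerical illustration of Example 6.1, the sketch below separates two sinusoids that occupy different bands with an ideal high-pass response applied on the DFT grid. The signal frequencies, the length `N`, and the use of circular (FFT-based) filtering are assumptions chosen for the illustration, not part of the slides.

```python
import numpy as np

N = 256
n = np.arange(N)
# Hypothetical test signals: x1 lies below pi/2, x2 lies above pi/2.
x1 = np.cos(2 * np.pi * 16 / N * n)   # omega = pi/8
x2 = np.cos(2 * np.pi * 96 / N * n)   # omega = 3*pi/4
x = x1 + x2

# Ideal high-pass response sampled on the DFT grid: 0 below pi/2, 1 from pi/2 to pi.
omega = 2 * np.pi * np.fft.fftfreq(N)            # frequencies in [-pi, pi)
H = (np.abs(omega) >= np.pi / 2).astype(float)

# Circular filtering; exact here because both tones sit on DFT bins.
y = np.real(np.fft.ifft(H * np.fft.fft(x)))
max_err = np.max(np.abs(y - x2))
```

Here `max_err` measures how closely the filter output matches x2[n].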

Page 5: Speech Processing


Principles of Homomorphic Processing

Generalized Superposition: a concept that supports separation of nonlinearly combined signals. It leads to the notion of Generalized Linear Filtering.

Properties:

H(x1[n] □ x2[n]) = H(x1[n]) ○ H(x2[n])

H(c : x[n]) = c ◈ H(x[n])

Systems that satisfy those two properties are referred to as homomorphic systems and are said to satisfy a generalized principle of superposition.

[Figure: homomorphic system H(·) with input x[n] (input rules □ and :) and output y[n] (output rules ○ and ◈).]

Page 6: Speech Processing


Principles of Homomorphic Processing

Importance of homomorphic systems for speech processing lies in their capability of transforming nonlinearly combined signals to additively combined signals so that linear filtering can be performed on them.

Homomorphic systems can be expressed as a cascade of three homomorphic sub-systems depicted in the figure below – referred to as the canonic representation:

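[Figure: canonic representation of H as a cascade of three systems. I: characteristic system D□, mapping x[n] (rules □, :) to x̂[n] (rules +, ·). II: linear system L, mapping x̂[n] to ŷ[n] (rules +, ·). III: inverse system D○⁻¹, mapping ŷ[n] (rules +, ·) to y[n] (rules ○, ◈).]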

Page 7: Speech Processing


Canonic Representation of a Homomorphic System

i. The Characteristic System D□: transforms □ into “+” (x[n] → x̂[n]).

ii. The Linear System L: transforms “+” into “+” (x̂[n] → ŷ[n]).

iii. The Inverse System D○⁻¹: transforms “+” into ○ (ŷ[n] → y[n]).

Page 8: Speech Processing


Homomorphic Systems

Let the goal be removal of undesired component of the signal (e.g., noise):

Type of combination rule | System | Operation
Signal & additive noise | Linear system | Linear filtering
Signal & multiplicative noise | Multiplicative system | Multiplicative filtering
Signal & convolutional noise | Convolutional system | Convolutional filtering

Page 9: Speech Processing


Multiplicative Homomorphic Systems

Consider Homomorphic Multiplicative System depicted below:

Use D□ to convert MULT into ADD.

Use D○ to convert ADD into MULT.

Which rule (operation) transforms MULT into ADD?

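[Figure: multiplicative homomorphic system M[·] with input x[n] and output y[n], both combined by ●, and its canonic form. I: D● (● → +), x[n] → x̂[n]. II: linear system L (+ → +), x̂[n] → ŷ[n]. III: D●⁻¹ (+ → ●), ŷ[n] → y[n].]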

Page 10: Speech Processing


Multiplicative Homomorphic Systems

If x[n]=x1[n]●x2[n], and x1[n]>0 & x2[n]>0 for all n

Then log(x1[n]●x2[n])=log(x1[n])+log(x2[n])

However, x[n] may not be always positive. Generalization to complex signals:

$x[n] = |x[n]|\,e^{j\arg(x[n])}$

which requires definition of complex log operator.

Page 11: Speech Processing


Multiplicative Homomorphic Systems

An implementation of multiplicative Homomorphic System:

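[Figure: I: complex log (● → +), x[n] → x̂[n]. II: linear system (+ → +), x̂[n] → ŷ[n]. III: complex exp (+ → ●), ŷ[n] → y[n].]

Definition: Complex log:

$$\log(x[n]) = \log|x[n]| + j\arg(x[n])$$

Complex exp (inverse operation):

$$e^{\log(x[n])} = e^{\log|x[n]|}\,e^{j\arg(x[n])}$$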
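The three stages of the multiplicative system can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the factors `x1` and `x2` are hypothetical and are given small phases so that the sum of their phases stays inside the principal interval, and the "linear system" is simply a scaling by 0.5.

```python
import numpy as np

def complex_log(x):
    """log|x| + j*arg(x), using the principal value of the phase."""
    return np.log(np.abs(x)) + 1j * np.angle(x)

rng = np.random.default_rng(0)
# Hypothetical factors with small phases so arg(x1) + arg(x2) stays inside (-pi, pi].
x1 = rng.uniform(0.5, 2.0, 8) * np.exp(1j * rng.uniform(-0.5, 0.5, 8))
x2 = rng.uniform(0.5, 2.0, 8) * np.exp(1j * rng.uniform(-0.5, 0.5, 8))
x = x1 * x2

x_hat = complex_log(x)                                   # stage I: product -> sum
additive = np.allclose(x_hat, complex_log(x1) + complex_log(x2))
y_hat = 0.5 * x_hat                                      # stage II: a trivially linear system
y = np.exp(y_hat)                                        # stage III: sum -> product
```

With this choice of linear system, `y` is a square root of `x`, so `y**2` reproduces the input.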

Page 12: Speech Processing


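Homomorphic Systems for Convolution

Consider the Homomorphic System for Convolution depicted below:

Use D□ to convert “*” into ADD.

Use D○ to convert ADD into “*”.

How to transform “*” into ADD?

[Figure: convolutional homomorphic system C[·] with input x[n] and output y[n], both combined by *, and its canonic form. I: D* (* → +), x[n] → x̂[n]. II: linear system L (+ → +), x̂[n] → ŷ[n]. III: D*⁻¹ (+ → *), ŷ[n] → y[n].]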

Page 13: Speech Processing


Homomorphic Systems for Convolution

Let x[n]=x1[n]*x2[n]

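I. Characteristic system D*:

x[n] → Z[·] (* → ●) → X(z) → log[·] (● → +) → X̂(z) → Z⁻¹[·] (+ → +) → x̂[n]

The input x[n] is a time sequence; the output x̂[n] is a “time” sequence.

III. Inverse system D*⁻¹ (inverse operation):

ŷ[n] → Z[·] (+ → +) → Ŷ(z) → exp[·] (+ → ●) → Y(z) → Z⁻¹[·] (● → *) → y[n]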

Page 14: Speech Processing


Homomorphic Systems for Convolution

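For x[n]=x1[n]*x2[n]:

1. X(z) = X1(z) X2(z)

2. log(X(z)) = log(X1(z) X2(z)) = log(X1(z)) + log(X2(z))

The logarithm here is the complex logarithm. This operation requires special handling because:

- The real logarithm requires X(z) > 0, which does not hold in general.
- For complex X(z) the phase is not uniquely defined (i.e., it is defined only up to a multiple of 2π).
- X(z) has to be defined on the unit circle (e.g., the z-transform of a stable sequence).

In practice operate on the unit circle, $z = e^{j\omega}$ (Fourier transform):

$$\hat{X}(e^{j\omega}) = \log X(e^{j\omega}) = \log|X(e^{j\omega})| + j\arg X(e^{j\omega})$$

$$\hat{x}[n] = \mathcal{F}^{-1}\{\hat{X}(e^{j\omega})\}$$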

Page 15: Speech Processing


Homomorphic Systems for Convolution

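Two cases are possible in computing $\hat{x}[n]$:

1. Complex Cepstrum (CC):

$$\hat{x}[n] = \mathcal{F}^{-1}\{\log|X(e^{j\omega})| + j\arg X(e^{j\omega})\}$$

2. Real Cepstrum (RC):

$$c[n] = \mathcal{F}^{-1}\{\log|X(e^{j\omega})|\}$$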

Page 16: Speech Processing


Homomorphic Systems for Convolution

Example 6.3 Consider a sequence x[n] consisting of a system impulse response h[n] convolved with an impulse train p[n]:

$$p[n] = \sum_{k} a_k\,\delta[n - kP], \qquad x[n] = h[n] * p[n]$$

[Figure: p[n] → h[·] → x[n].]

The goal is to estimate h[n]. First form the canonical representation for convolution:

$$\hat{x}[n] = D_*(x[n]) = D_*(h[n]) + D_*(p[n]) = \hat{h}[n] + \hat{p}[n]$$

If D* is such that $\hat{p}[n]$ remains a train of pulses and $\hat{h}[n]$ falls between the impulses, then separation is possible.

Page 17: Speech Processing


Example 6.3 (cont.)

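Let L denote such an operation (i.e., a rectangular window that would separate $\hat{p}[n]$ from $\hat{h}[n]$):

$$\hat{y}[n] = L(\hat{x}[n]) = L(\hat{h}[n] + \hat{p}[n]) = L(\hat{h}[n]) + \underbrace{L(\hat{p}[n])}_{0} = \hat{h}[n]$$

$$h[n] = D_*^{-1}(\hat{y}[n])$$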

Page 18: Speech Processing


Example 6.4

a, b real and positive: ⇒ log(ab) = log(a) + log(b)

a, b real but b < 0: ⇒ $\log(ab) = \log(a|b|e^{jk\pi}) = \log(a) + \log(|b|) + jk\pi$, k = 1, 3, 5, …, so log(ab) is ambiguous.

This example indicates that special consideration must be made in defining the logarithm operator for complex X(z) in order to make the logarithm of the product the sum of logarithms.
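A small numerical check makes the ambiguity concrete. The sketch below uses NumPy's principal-value complex logarithm with two hypothetical negative real numbers; the specific values are assumptions for the illustration. With principal values, the log of the product and the sum of the logs differ by exactly j2π.

```python
import numpy as np

a, b = -2.0, -3.0                                   # hypothetical real values, both negative
log_of_product = np.log(complex(a * b))             # log(6) + j*0 (principal value)
sum_of_logs = np.log(complex(a)) + np.log(complex(b))   # log(2) + j*pi + log(3) + j*pi
gap = sum_of_logs - log_of_product                  # the 2*pi phase ambiguity
```

The real parts agree (log 6 in both cases); only the imaginary part, the phase, is off by a multiple of 2π.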

Page 19: Speech Processing


Homomorphic Systems for Convolution-Complex Logarithm

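Suppose that X(z) is evaluated on the unit circle ($z = e^{j\omega}$).

Let x[n] = x1[n] * x2[n] ⇒ $X(\omega) = X_1(\omega) X_2(\omega)$. Consider then the complex log of $X(\omega)$:

$$\log[X(\omega)] = \log\left(|X(\omega)|\,e^{j\angle X(\omega)}\right) = \log|X(\omega)| + j\angle X(\omega)$$

Considering that $X(\omega) = X_1(\omega) X_2(\omega)$, then:

$$\log[X(\omega)] = \log\left(|X_1(\omega)|e^{j\angle X_1(\omega)}\,|X_2(\omega)|e^{j\angle X_2(\omega)}\right) = \log|X_1(\omega)| + \log|X_2(\omega)| + j\left(\angle X_1(\omega) + \angle X_2(\omega)\right)$$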

Page 20: Speech Processing


Homomorphic Systems for Convolution-Complex Logarithm

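In the previous expression the following was assumed:

$$\log|X(\omega)| = \log\left(|X_1(\omega)|\,|X_2(\omega)|\right) = \log|X_1(\omega)| + \log|X_2(\omega)|$$

This expression holds if $|X_1(\omega)| > 0$ and $|X_2(\omega)| > 0$.

Also:

$$\angle X(\omega) = \angle\left(X_1(\omega) X_2(\omega)\right) = \angle X_1(\omega) + \angle X_2(\omega)$$

This expression generally does not hold due to the ambiguity in the definition of phase:

$$\angle X(\omega) = PV[\angle X(\omega)] + 2\pi k$$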

Page 21: Speech Processing


Homomorphic Systems for Convolution-Complex Logarithm

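Note that:

- PV denotes the principal value of the phase, which falls in the interval [−π, π].
- An arbitrary multiple of 2π can be added to the principal phase value.
- Thus the additive property generally does not hold.

How to impose uniqueness?

1. Force continuity of phase:

Select k such that $\angle X(\omega) = PV[\angle X(\omega)] + 2\pi k$ is a continuous function. Figure 6.5 (next slide).

2. Phase derivative approach:

$$\angle X(\omega) = \int_0^{\omega} \angle\dot{X}(\beta)\,d\beta, \quad \text{where } \angle\dot{X}(\omega) = \frac{d}{d\omega}\angle X(\omega)$$

It can be shown that:

$$\angle\dot{X}(\omega) = \frac{X_r(\omega)\dot{X}_i(\omega) - X_i(\omega)\dot{X}_r(\omega)}{|X(\omega)|^2}$$

where $X_r$ and $X_i$ are the real and imaginary parts of $X(\omega)$ and the dot denotes differentiation with respect to ω.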

Page 22: Speech Processing


Fourier Transform Phase Continuity

Page 23: Speech Processing


Homomorphic Systems for Convolution

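Relationship of the complex cepstrum $\hat{x}[n]$ to the real cepstrum c[n]. If x[n] is real, then:

- $|X(\omega)|$ is real and even, and thus $\log|X(\omega)|$ is real and even.
- $\angle X(\omega)$ is odd, and hence

$$\hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi}\log[X(\omega)]\,e^{j\omega n}\,d\omega$$

is real. $\hat{x}[n]$ is referred to as the complex cepstrum.
- The even component of the complex cepstrum, c[n], is referred to as the real cepstrum:

$$c[n] = \frac{\hat{x}[n] + \hat{x}[-n]}{2}$$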
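The even-part relationship can be checked numerically. The sketch below is a minimal illustration, assuming a short hypothetical minimum-phase sequence (1 + 0.5 z⁻¹) whose phase never wraps, so the principal value can be used directly; the DFT length `N` is also an assumption.

```python
import numpy as np

N = 1024
x = np.zeros(N)
x[0], x[1] = 1.0, 0.5                     # hypothetical sequence: X(z) = 1 + 0.5 z^-1

X = np.fft.fft(x)
# Complex cepstrum: inverse DFT of log|X| + j*arg(X) (no wrapping occurs for this X).
x_hat = np.real(np.fft.ifft(np.log(np.abs(X)) + 1j * np.angle(X)))
# Real cepstrum: inverse DFT of log|X| only.
c = np.real(np.fft.ifft(np.log(np.abs(X))))

x_hat_reversed = np.roll(x_hat[::-1], 1)  # x_hat[-n], indices taken modulo N
even_part = 0.5 * (x_hat + x_hat_reversed)
```

For this sequence the first complex-cepstrum values follow the series of log(1 + 0.5 z⁻¹), i.e. 0.5 at n = 1 and −0.125 at n = 2.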

Page 24: Speech Processing


Complex Cepstrum of Speech-Like Sequences

Sequences with Rational z-Transform: the general form of this class of sequences is given below:

$$X(z) = A z^{-r}\,\frac{\prod_{k=1}^{M_i}(1 - a_k z^{-1})\,\prod_{k=1}^{M_o}(1 - b_k z)}{\prod_{k=1}^{N_i}(1 - c_k z^{-1})\,\prod_{k=1}^{N_o}(1 - d_k z)}$$

- $M_i$, $N_i$ are the numbers of zeros and poles inside the unit circle.
- $M_o$, $N_o$ are the numbers of zeros and poles outside the unit circle.
- $|a_k|, |b_k|, |c_k|, |d_k|$ are all < 1 ⇒ there are no singularities on the unit circle.
- A > 0.

Page 25: Speech Processing


Complex Cepstrum of Speech-Like Sequences

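Applying the complex logarithm (with the delay term $z^{-r}$ set aside, i.e., r = 0) gives:

$$\hat{X}(z) = \log[X(z)] = \log A + \sum_{k=1}^{M_i}\log(1 - a_k z^{-1}) + \sum_{k=1}^{M_o}\log(1 - b_k z) - \sum_{k=1}^{N_i}\log(1 - c_k z^{-1}) - \sum_{k=1}^{N_o}\log(1 - d_k z)$$

$\hat{X}(z)$ is the z-transform of the sequence $\hat{x}[n]$.

We want the inverse z-transform $\hat{x}[n] = Z^{-1}\{\hat{X}(z)\}$ to be absolutely summable ⇒ the ROC of $\hat{X}(z)$ must include the unit circle, |z| = 1.

This condition is equivalent to having all constituent elements of $\hat{X}(z)$ have ROCs that include the unit circle, |z| = 1.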

Page 26: Speech Processing


Complex Cepstrum of Speech-Like Sequences

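In order to obtain the ROC for expressions of the form $\log(1 - \alpha z^{-1})$ and $\log(1 - \beta z)$, they are expressed as power series expansions:

$$\log(1 - \alpha z^{-1}) = -\sum_{n=1}^{\infty}\frac{\alpha^n}{n}\,z^{-n}, \qquad |z| > |\alpha|$$

$$\log(1 - \beta z) = -\sum_{n=1}^{\infty}\frac{\beta^n}{n}\,z^{n}, \qquad |z| < |\beta|^{-1}$$

[Figure: z-plane ROC for log(1 − αz⁻¹), the region outside the circle of radius |α|, and z-plane ROC for log(1 − βz), the region inside the circle of radius 1/|β|. Both regions contain the unit circle.]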
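The two expansions can be checked numerically at a point inside both regions of convergence. The values of `alpha`, `beta`, and the test point `z` below are assumptions chosen for the illustration, and the infinite sums are truncated at 199 terms.

```python
import numpy as np

alpha, beta = 0.6, 0.6                  # hypothetical coefficients with magnitude < 1
z = 1.2 * np.exp(1j * 0.7)              # test point with |alpha| < |z| < 1/|beta|
n = np.arange(1, 200)                   # truncated summation index

# log(1 - alpha z^-1) = -sum alpha^n z^-n / n, valid for |z| > |alpha|
series_a = -np.sum(alpha ** n / n * (1 / z) ** n)
exact_a = np.log(1 - alpha / z)

# log(1 - beta z) = -sum beta^n z^n / n, valid for |z| < 1/|beta|
series_b = -np.sum(beta ** n / n * z ** n)
exact_b = np.log(1 - beta * z)
```

Moving `z` outside the stated region (for example |z| > 1/|β| for the second series) makes the truncated sum diverge, which is a quick way to see why the ROC matters.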

Page 27: Speech Processing


Complex Cepstrum of Speech-Like Sequences

The ROC of $\hat{X}(z)$ is therefore given by an annulus defined by the poles and zeros of X(z) closest to the unit circle.

[Figure: z-plane, ROC for a typical rational X(z): an annulus containing the unit circle.]

Page 28: Speech Processing


Complex Cepstrum of Speech-Like Sequences

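The complex cepstrum $\hat{x}[n]$ associated with a rational X(z) can therefore be expressed as:

$$\hat{x}[n] = \log|A|\,\delta[n] - \sum_{k=1}^{M_i}\frac{a_k^{n}}{n}\,u[n-1] + \sum_{k=1}^{N_i}\frac{c_k^{n}}{n}\,u[n-1] + \sum_{k=1}^{M_o}\frac{b_k^{-n}}{n}\,u[-n-1] - \sum_{k=1}^{N_o}\frac{d_k^{-n}}{n}\,u[-n-1]$$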

$$X(z) = A\,\frac{(1 - b z^{-1})(1 - b^{*} z^{-1})}{(1 - a z^{-1})(1 - a^{*} z^{-1})}$$

Page 29: Speech Processing


Example A.

Determine the complex cepstrum of the minimum-phase sequence:

$x[n] = a^n u[n], \quad |a| < 1$

Solution:

1. Determine the z-transform of x[n]:

$$X(z) = \sum_{n=0}^{\infty} a^n z^{-n} = \sum_{n=0}^{\infty}(a z^{-1})^n = \frac{1}{1 - a z^{-1}}, \qquad |z| > |a|$$

Page 30: Speech Processing


Example A. (cont.)

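Compute $\hat{X}(z)$:

$$\hat{X}(z) = \log[X(z)] = \log\left[\frac{1}{1 - a z^{-1}}\right] = -\log[1 - a z^{-1}] = \sum_{n=1}^{\infty}\frac{a^n}{n}\,z^{-n}$$

The complex cepstrum values are simply the coefficients of the terms $z^{-n}$ above, that is:

$$\hat{x}[n] = \frac{a^n}{n}\,u[n-1]$$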
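The closed form of Example A can be compared against a DFT-based computation. This is a sketch under assumptions: `a = 0.5`, a DFT length of 1024, and truncation of the infinite sequence to N samples (the discarded tail is negligible for these values). Because this X(z) is minimum phase with bounded phase, the principal value of the phase is used without unwrapping.

```python
import numpy as np

a, N = 0.5, 1024                        # hypothetical decay factor and DFT length
n = np.arange(N)
x = a ** n                              # a^n u[n], truncated to N samples

X = np.fft.fft(x)
x_hat = np.real(np.fft.ifft(np.log(np.abs(X)) + 1j * np.angle(X)))

m = np.arange(1, 20)
expected = a ** m / m                   # analytic result a^n / n for n >= 1
```

The value at n = 0 is log A = 0 here, and the values at negative quefrencies (the upper half of the DFT buffer) are essentially zero, as expected for a minimum-phase sequence.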

Page 31: Speech Processing


Example B.

Determine the complex cepstrum of the maximum-phase sequence:

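$x[n] = \delta[n] + b\,\delta[n+1], \quad |b| < 1$

Solution:

1. Determine the z-transform of x[n]:

$$X(z) = \sum_{n=-\infty}^{\infty}\left(\delta[n] + b\,\delta[n+1]\right)z^{-n} = 1 + bz = bz\,(1 + b^{-1}z^{-1})$$

The zero at $z = -1/b$ lies outside the unit circle, so the sequence is maximum phase.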

Page 32: Speech Processing


Example B. (cont.)

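Compute $\hat{X}(z)$:

$$\hat{X}(z) = \log[X(z)] = \log[1 + bz] = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\,b^n z^n$$

The complex cepstrum values are simply the coefficients of the terms above, read as coefficients of $z^{-n}$ for negative n, that is:

$$\hat{x}[n] = \frac{(-1)^{n}\,b^{-n}}{n}\,u[-n-1]$$

For example, $\hat{x}[-1] = b$, the coefficient of z in the series.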

Page 33: Speech Processing


Example C.

Determine the complex cepstrum of the sequence:

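$x[n] = \delta[n] + \alpha\,\delta[n - N_p], \quad |\alpha| < 1$

Discrete convolution of any sequence x1[n] with this sequence produces an echo of that sequence, scaled by α:

$x_1[n] * (\delta[n] + \alpha\,\delta[n - N_p]) = x_1[n] + \alpha\,x_1[n - N_p]$

Solution:

1. Determine the z-transform of x[n]:

$$X(z) = \sum_{n=0}^{\infty}\left(\delta[n] + \alpha\,\delta[n - N_p]\right)z^{-n} = 1 + \alpha z^{-N_p}, \qquad |\alpha| < 1$$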

Page 34: Speech Processing


Example C. (cont.)

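Compute $\hat{X}(z)$:

$$\hat{X}(z) = \log[X(z)] = \log[1 + \alpha z^{-N_p}] = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\,\alpha^n z^{-nN_p}$$

The complex cepstrum values are simply the coefficients of the terms $z^{-nN_p}$ above, that is:

$$\hat{x}[n] = \sum_{k=1}^{\infty}\frac{(-1)^{k+1}\alpha^k}{k}\,\delta[n - kN_p]$$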
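The echo cepstrum of Example C can be verified numerically. The sketch below assumes `alpha = 0.5`, an echo delay `Np = 50`, and a 4096-point DFT; these are illustrative choices. The phase of 1 + αz⁻ᴺᵖ stays well inside (−π, π) on the unit circle, so no unwrapping is needed.

```python
import numpy as np

alpha, Np, N = 0.5, 50, 4096            # hypothetical echo gain, echo delay, DFT length
x = np.zeros(N)
x[0], x[Np] = 1.0, alpha                # delta[n] + alpha * delta[n - Np]

X = np.fft.fft(x)
x_hat = np.real(np.fft.ifft(np.log(np.abs(X)) + 1j * np.angle(X)))

k = np.arange(1, 6)
expected = (-1.0) ** (k + 1) * alpha ** k / k   # analytic impulse weights at n = k * Np
peaks = x_hat[k * Np]
```

The cepstrum is zero between the impulses, which is what makes an echo easy to detect and remove by liftering.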

Page 35: Speech Processing


Example D.

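Determine the complex cepstrum of the convolution of the sequences of the previous examples (A-C):

$$x[n] = (a^n u[n]) * (\delta[n] + b\,\delta[n+1]) * (\delta[n] + \alpha\,\delta[n - N_p])$$

Solution:

$$x[n] = \left(a^n u[n] + b\,a^{n+1}u[n+1]\right) * \left(\delta[n] + \alpha\,\delta[n - N_p]\right)$$

$$x[n] = a^n u[n] + b\,a^{n+1}u[n+1] + \alpha\,a^{n-N_p}u[n - N_p] + \alpha b\,a^{n-N_p+1}u[n - N_p + 1]$$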

Page 36: Speech Processing


Example D.

The complex cepstrum of x[n] is the sum of the complex cepstra of the three sequences:

$$\hat{x}[n] = \frac{a^n}{n}\,u[n-1] + \frac{(-1)^{n}\,b^{-n}}{n}\,u[-n-1] + \sum_{k=1}^{\infty}\frac{(-1)^{k+1}\alpha^k}{k}\,\delta[n - kN_p]$$

Page 37: Speech Processing


Example D. (cont.)

Page 38: Speech Processing


Example 6.5

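Let:

$$X(z) = \frac{z^{-r}(1 - a z^{-1})(1 - b z)}{1 - c z^{-1}}$$

where a, b, c are real and < 1. The ROC of X(z) includes the unit circle so that x[n] is stable. A delay $z^{-r}$ corresponds to a shift in the sequence. Thus the complex cepstrum is given by:

$$\hat{x}[n] = -\frac{a^n}{n}\,u[n-1] + \frac{c^n}{n}\,u[n-1] + \frac{b^{-n}}{n}\,u[-n-1] + Z^{-1}\{\log[z^{-r}]\}$$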

Page 39: Speech Processing


Example 6.5 (cont.)

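The inverse z-transform of the shift term, evaluated on the unit circle where $\log[z^{-r}] = -j\omega r$ for $|\omega| < \pi$, is given by:

$$Z^{-1}\{\log[z^{-r}]\} = \begin{cases} -\dfrac{r\cos(\pi n)}{n}, & n \neq 0 \\ 0, & n = 0 \end{cases}$$

The contribution of the $z^{-r}$ term is significant. On the unit circle, $z^{-r} = e^{-j\omega r} = 1\angle{-\omega r}$ contributes a linear ramp to the phase and thus, for a large shift r, dominates the phase representation and gives a large discontinuity at π and −π.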

Page 40: Speech Processing


Complex Cepstrum of Speech-Like Sequences

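Relation of the complex cepstrum and the real cepstrum for x[n] with a rational z-transform that is minimum phase:

The complex cepstrum of a minimum-phase sequence with a rational z-transform is right-sided. With the real cepstrum

$$c[n] = \frac{\hat{x}[n] + \hat{x}[-n]}{2}$$

the complex cepstrum is recovered as

$$\hat{x}[n] = l[n]\,c[n], \qquad l[n] = \begin{cases} 1, & n = 0 \\ 2, & n > 0 \\ 0, & n < 0 \end{cases}$$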
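The relation x̂[n] = l[n] c[n] means a minimum-phase sequence can be rebuilt from its spectral magnitude alone. The sketch below illustrates this with a hypothetical sequence whose zeros are at 0.5 and 0.4; the DFT length and the handling of the index N/2 (weight 1, a common convention for even-length DFTs that is not stated on the slide) are assumptions.

```python
import numpy as np

N = 1024
x = np.zeros(N)
x[:3] = [1.0, -0.9, 0.2]                 # (1 - 0.5 z^-1)(1 - 0.4 z^-1): minimum phase

# Real cepstrum from the magnitude spectrum only.
c = np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(x)))))

# l[n]: 1 at n = 0, 2 for n > 0, 0 for n < 0 (negative n sit in the upper half of the buffer).
l = np.zeros(N)
l[0] = 1.0
l[1:N // 2] = 2.0
l[N // 2] = 1.0                          # assumed convention for the shared midpoint

x_hat = l * c                            # complex cepstrum of the minimum-phase sequence
x_rec = np.real(np.fft.ifft(np.exp(np.fft.fft(x_hat))))
```

`x_rec` matches `x` because the phase implied by `l * c` is exactly the minimum-phase phase of X.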

Page 41: Speech Processing


Impulse Train Convolved with Rational z-Transform Sequences

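A second class of sequences of interest in the speech context is the train of uniformly spaced unit samples with varying weights, and its interaction with the system:

[Figure: p[n] → h[n] → x[n], with x[n] = h[n] * p[n].]

$$p[n] = \sum_{r=0}^{Q}\alpha_r\,\delta[n - rN] \;\overset{Z}{\longleftrightarrow}\; P(z) = \sum_{r=0}^{Q}\alpha_r z^{-rN}$$

$$P(z) = \sum_{r=0}^{Q}\alpha_r z^{-rN} = \sum_{r=0}^{Q}\alpha_r (z^{N})^{-r} = \prod_{r=0}^{Q-1}\left(1 - a_r (z^{N})^{-1}\right)$$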

Page 42: Speech Processing


Impulse Train Convolved with Rational z-Transform Sequences

If p[n] is minimum phase and $|a_r (z^N)^{-1}| < 1$, i.e., the zeros are inside the unit circle, log[P(z)] can be expressed as:

$$\log[P(z)] = \sum_{r=0}^{Q-1}\log\left(1 - a_r z^{-N}\right) = -\sum_{r=0}^{Q-1}\sum_{k=1}^{\infty}\frac{a_r^{k}}{k}\,z^{-kN}$$

Thus $\hat{p}[n] = Z^{-1}\{\log[P(z)]\}$ is an infinite right-sided sequence of impulses spaced N samples apart.

Note that in general for non-minimum phase sequences the complex cepstrum is two-sided with uniformly spaced impulses.

Page 43: Speech Processing


Example 6.6

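Consider a sequence x[n] = h[n] * p[n], where the z-transform of h[n] is given by:

$$H(z) = A\,\frac{(1 - b z^{-1})(1 - b^{*} z^{-1})}{(1 - a z^{-1})(1 - a^{*} z^{-1})}$$

a, a* and b, b* are complex-conjugate pairs.

[Figure: z-plane showing the locations of a, a*, b, b* relative to the unit circle.]

Consider p[n] to be a train of periodic pulses; then:

$$p[n] = \sum_{k=0}^{\infty}\beta^{k}\,\delta[n - kP], \qquad x[n] = h[n] * p[n]$$

[Figure: p[n] → h[n] → x[n].]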

Page 44: Speech Processing


Example 6.6 (cont)

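If β ∈ ℝ and |β| < 1, then p[n] is a train of exponentially decaying impulses:

[Figure: p[n] versus n, impulses with decaying weights starting at 1.]

The z-transform of p[n] is given by:

$$P(z) = \sum_{k=0}^{\infty}\beta^{k} z^{-kP} = \frac{1}{1 - \beta z^{-P}}$$

Then, as derived earlier:

$$\hat{x}[n] = \hat{h}[n] + \hat{p}[n]$$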

Page 45: Speech Processing


Example 6.6 (cont)

[Figure: panels labeled h[n] and p[n].]

Page 46: Speech Processing


Homomorphic Filtering

In the cepstral domain:

- Pseudo-time → quefrency.
- Low quefrency → slowly varying components.
- High quefrency → fast varying components.

Removal of unwanted components (i.e., filtering) can be attempted in the cepstral domain (on the signal $\hat{x}[n]$), in which case filtering is referred to as liftering.

When the complex cepstrum of h[n] resides in a quefrency interval less than a pitch period, the two components can be separated from each other.

Page 47: Speech Processing


Homomorphic Filtering

If log[X(ω)] is viewed as a “time signal” consisting of low-frequency and high-frequency contributions, these can be separated with a high-pass/low-pass “filter”.

One implementation of a low-pass “filter”:

[Figure: x[n] = h[n] * p[n] → D* (* → +) → x̂[n] → lifter l[n] (+ → +) → ŷ[n] → D*⁻¹ (+ → *) → y[n].]

Page 48: Speech Processing


Homomorphic Filtering

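Alternate view of the “liftering” operation:

- The filtering operation L(ω) is applied in the log-spectral domain.
- The time and frequency domains are interchanged by viewing the frequency-domain signal log[X(ω)] as a time signal to be filtered.
- ⇒ The “cepstrum” can be thought of as the spectrum of log[X(ω)].
- The time axis of $\hat{x}[n]$ is referred to as “quefrency”.
- The filter l[n] is referred to as the “lifter”.

[Figure: x[n] = h[n]*p[n] → F → log → X̂(ω) → F⁻¹ → x̂[n] → l[n] → ŷ[n] → F → Ŷ(ω) → exp → F⁻¹ → y[n]. The elements F⁻¹, l[n], F between X̂(ω) and Ŷ(ω) are enclosed in dotted lines and are equivalent to L(ω).]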

Page 49: Speech Processing


Homomorphic Filtering

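The three elements within the dotted lines of the previous figure can be replaced by L(ω), which can be viewed as a smoothing function:

$$\hat{Y}(\omega) = L(\omega) \circledast \log[X(\omega)]$$

where ⊛ denotes periodic convolution in frequency (including the 1/2π scale factor), since multiplication by l[n] in quefrency corresponds to convolution in frequency.

[Figure: x[n] = h[n]*p[n] → F → log → X̂(ω) → L(ω) → Ŷ(ω) → exp → F⁻¹ → y[n].]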

Page 50: Speech Processing


Practical Implementation Issues

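Use the FFT and IFFT for the Fourier transformations.

X(ω) is computed by:

$$X(k) = \sum_{n=0}^{N-1} x[n]\,e^{-j\frac{2\pi}{N}kn}$$

log[X(ω)] is computed as:

$$\hat{X}(k) = \log[X(k)] = \log|X(k)| + j\angle X(k)$$

And for $\hat{x}[n]$ use:

$$\hat{x}_N[n] = \frac{1}{N}\sum_{k=0}^{N-1}\hat{X}(k)\,e^{j\frac{2\pi}{N}kn}$$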
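The three DFT steps above map directly onto FFT calls. The following is a minimal sketch, not a production implementation: the function names are hypothetical, `np.unwrap` stands in for the phase unwrapper, and removal of any linear-phase (delay) term is deliberately omitted, so it is only suitable for inputs without a net delay such as the minimum-phase test sequence used here.

```python
import numpy as np

def complex_cepstrum(x, nfft):
    """N-point discrete complex cepstrum (sketch; no linear-phase removal)."""
    X = np.fft.fft(x, nfft)                       # X(k)
    phase = np.unwrap(np.angle(X))                # continuous version of the phase
    X_hat = np.log(np.abs(X)) + 1j * phase        # log|X(k)| + j*angle(X(k))
    return np.real(np.fft.ifft(X_hat))            # x_hat_N[n]

def inverse_complex_cepstrum(x_hat):
    """Inverse characteristic system: DFT, exp, inverse DFT."""
    return np.real(np.fft.ifft(np.exp(np.fft.fft(x_hat))))

x = np.array([1.0, -0.9, 0.2])                    # hypothetical input, zeros at 0.5 and 0.4
x_hat = complex_cepstrum(x, 1024)
x_back = inverse_complex_cepstrum(x_hat)
```

For this input the analytic cepstrum is −(0.5ⁿ + 0.4ⁿ)/n for n ≥ 1, so `x_hat[1]` is close to −0.9 and `x_hat[2]` is close to −0.205.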

Page 51: Speech Processing


Practical Implementation Issues

1. The cepstrum $\hat{x}[n]$ is infinitely long, thus $\hat{x}_N[n]$ is an aliased version of $\hat{x}[n]$. That is:

$$\hat{x}_N[n] = \sum_{r=-\infty}^{\infty}\hat{x}[n + rN]$$

Thus it is necessary to use as large an N as possible.

2. The phase component j∠X(k) must be properly unwrapped to ensure phase continuity:

$$\angle X(k) = PV[\angle X(k)] + 2\pi r[k]$$

The goal is to determine r[k] so that ∠X(k) is continuous.

Page 52: Speech Processing


Modulo 2π Phase Unwrapper

The goal is to determine r[k] so that ∠X(k) is continuous.

[Figure: phase representation in the discrete complex spectrum. The principal value PV[∠X(ω)] and its samples PV[∠X(k)], spaced 2π/N apart, lie between −π and π.]

Page 53: Speech Processing


Modulo 2π Phase Unwrapper Algorithm (ε is a tolerance):

If PV[∠X(k)] − PV[∠X(k−1)] > 2π − ε
    r[k] = r[k−1] − 1    # Subtract 2π
Else if PV[∠X(k)] − PV[∠X(k−1)] < −(2π − ε)
    r[k] = r[k−1] + 1    # Add 2π
Else
    r[k] = r[k−1]        # Do not change
End

Note: Even with a fine frequency grid of spacing 2π/N (determined by N), adjacent PV samples may differ by a large amount that is not caused by wrapping (the case of poles/zeros close together), so a genuine phase change can be mistaken for a 2π jump.
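The recursion for r[k] can be written as a short function. This is a sketch of the rule as stated on the slide; the function name and the default tolerance `eps = pi` (which makes the jump threshold equal to π) are assumptions, and the test phase is a hypothetical linear ramp.

```python
import numpy as np

def unwrap_mod_2pi(pv, eps=np.pi):
    """Modulo-2*pi unwrapper: track the integer r[k] from jumps between adjacent PV samples."""
    r = np.zeros(len(pv), dtype=int)
    for k in range(1, len(pv)):
        jump = pv[k] - pv[k - 1]
        if jump > 2 * np.pi - eps:
            r[k] = r[k - 1] - 1          # subtract 2*pi
        elif jump < -(2 * np.pi - eps):
            r[k] = r[k - 1] + 1          # add 2*pi
        else:
            r[k] = r[k - 1]              # do not change
    return pv + 2 * np.pi * r

# Hypothetical true phase: a slow linear ramp that wraps several times.
true_phase = -0.05 * np.arange(500)
pv = np.angle(np.exp(1j * true_phase))   # principal values in (-pi, pi]
unwrapped = unwrap_mod_2pi(pv)
```

With this tolerance the result coincides with `np.unwrap(pv)`; a smaller `eps` makes the test stricter about what counts as a wrap.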

Page 54: Speech Processing


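Phase Derivative-Based Phase Unwrapper

The phase derivative is uniquely defined by:

$$\angle\dot{X}(\omega) = \frac{X_r(\omega)\dot{X}_i(\omega) - X_i(\omega)\dot{X}_r(\omega)}{|X(\omega)|^2}$$

Then:

$$\angle X(\omega) = \int_0^{\omega}\angle\dot{X}(\beta)\,d\beta$$

However, since only X(k) is available, ∠X(ω) must be estimated from discrete values.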

Page 55: Speech Processing


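Phase Derivative-Based Phase Unwrapper

Re-state the problem:

$$\angle X(\omega_k) = PV[\angle X(\omega_k)] + 2\pi q(\omega_k)$$

where $q(\omega_k)$ is an integer-valued function.

Assuming that the phase has been correctly unwrapped up to $\omega_{k-1}$, with the value $\theta(\omega_{k-1})$, then:

$$\theta(\omega_k) = \theta(\omega_{k-1}) + \int_{\omega_{k-1}}^{\omega_k}\angle\dot{X}(\omega)\,d\omega$$

An approximation (trapezoidal rule):

$$\hat{\theta}(\omega_k) = \theta(\omega_{k-1}) + \frac{\omega_k - \omega_{k-1}}{2}\left[\angle\dot{X}(\omega_{k-1}) + \angle\dot{X}(\omega_k)\right]$$

Select the value of $q(\omega_k)$ such that E[k] is minimized over $q(\omega_k)$:

$$E[k] = \left|PV[\angle X(\omega_k)] + 2\pi q(\omega_k) - \hat{\theta}(\omega_k)\right|$$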

Page 56: Speech Processing


Example

Page 57: Speech Processing


Short-Time Homomorphic Analysis of Periodic Sequences

Page 58: Speech Processing


Short-Time Homomorphic Analysis of Periodic Sequences

Recall the source-system model of speech production:

[Figure: p[n] → h[n] → x[n] = h[n] * p[n].]

For voiced speech p[n] is quasi-periodic:

$$p[n] = \sum_{k=0}^{\infty}\beta^{k}\,\delta[n - kP]$$

For unvoiced speech p[n] is noise-like.

In practice a periodic waveform is windowed by a finite-length sequence w[n]:

s[n] = w[n]x[n] = w[n](p[n]*h[n])

Approximation to s[n]:

$$\tilde{x}[n] = (w[n]\,p[n]) * h[n]$$

Page 59: Speech Processing


Short-Time Homomorphic Analysis of Periodic Sequences

If w[n] is smooth relative to h[n], that is, P is large enough so that the h[n−kP] do not substantially overlap, then:

$$\tilde{x}[n] = (w[n]\,p[n]) * h[n] \approx s[n]$$

Then, the cepstrum of s[n] is:

$$\hat{s}[n] \approx \hat{p}[n] + \hat{h}[n]$$

where $\hat{p}[n]$ is the complex cepstrum of w[n]p[n].

It can be shown that:

$$\hat{s}[n] = \hat{p}[n] + D[n]\sum_{k=-\infty}^{\infty}\hat{h}[n - kP]$$

where D[n] is a weighting function depending on w[n].

Page 60: Speech Processing


Cepstral Domain (Quefrency) Perspective

Under what conditions can we perform deconvolution?

Let x[n], a voiced speech signal, be produced by an infinite train of periodic impulses:

$$p[n] = \sum_{k=-\infty}^{\infty}\delta[n - kP], \qquad x[n] = p[n] * h[n]$$

Thus the only samples in X(ω) and log[X(ω)] are defined at multiples of the fundamental frequency $\omega_o = 2\pi/P$, i.e., $\omega_k = (2\pi/P)k$:

$X(\omega_k) = P(\omega_k)\,H(\omega_k)$

$\log[X(\omega_k)] = \log[P(\omega_k)] + \log[H(\omega_k)]$

Page 61: Speech Processing


Cepstral Domain (Quefrency) Perspective

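In the cepstral domain, $\hat{h}[n]$ appears as a set of replicas at every kP:

$$\sum_{k}\hat{h}[n - kP]$$

Thus, aliasing is an issue and needs to be handled properly. That is: can this aliasing be prevented or at least minimized?

Consider:

s[n]=w[n]x[n]=w[n](p[n]*h[n])

$$S(\omega) = \frac{1}{2\pi}\left[P(\omega)H(\omega)\right] \circledast W(\omega)$$

$$S(\omega) = \frac{1}{P}\sum_{k}H(k\omega_o)\,W(\omega - k\omega_o)$$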

Page 62: Speech Processing


Cepstral Domain (Quefrency) Perspective

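Let’s rewrite s[n] as:

s[n] = (p[n]w[n]) * g[n], where g[n] ≈ h[n].

Then:

$$S(\omega) = \left[\frac{1}{P}\sum_{k}W(\omega - k\omega_o)\right]G(\omega)$$

Taking the log of the two expressions for S(ω) and solving for log[G(ω)], the following is obtained:

$$\log[G(\omega)] = \log\left[\sum_{k}H(k\omega_o)\,W(\omega - k\omega_o)\right] - \log\left[\sum_{k}W(\omega - k\omega_o)\right] \qquad (1)$$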

Page 63: Speech Processing


Cepstral Domain (Quefrency) Perspective

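To simplify, assume W(ω) consists of only one main lobe, a rectangular window in frequency. That is:

$$W(\omega) = \begin{cases} 1, & |\omega| \le \omega_o/2 \\ 0, & \text{otherwise} \end{cases}$$

with $\omega_o = 2\pi/P$.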

Page 64: Speech Processing


Cepstral Domain (Quefrency) Perspective

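Thus the second log term becomes zero:

$$\log[G(\omega)] = \log\left[\sum_{k}H(k\omega_o)\,W(\omega - k\omega_o)\right] - \underbrace{\log\left[\sum_{k}W(\omega - k\omega_o)\right]}_{0}$$

Because the shifted windows do not overlap:

$$\log[G(\omega)] = \log\left[\sum_{k}H(k\omega_o)\,W(\omega - k\omega_o)\right] = \sum_{k}\log[H(k\omega_o)]\,W(\omega - k\omega_o) = W(\omega) * \sum_{k}\log[H(k\omega_o)]\,\delta(\omega - k\omega_o) \qquad (2)$$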

Page 65: Speech Processing


Cepstral Domain (Quefrency) Perspective

From (1) and (2) we can write:

$$\hat{s}[n] = \hat{p}[n] + \hat{g}[n]$$

where $\hat{p}[n]$ is the complex cepstrum of p[n]w[n], and

$$\hat{g}[n] = \mathcal{F}^{-1}\{\log[G(\omega)]\} = w[n]\sum_{k}\hat{h}[n - kP]$$

where $\hat{h}[n]$ is the complex cepstrum of h[n] and w[n] is the inverse Fourier transform of the rectangular function W(ω).

The result is illustrated in Figure 6.15.

Page 66: Speech Processing


Figure 6.15.

Quefrency

Page 67: Speech Processing


Cepstral Domain (Quefrency) Perspective

The last equation, $\hat{g}[n] = w[n]\sum_{k}\hat{h}[n - kP]$, is a special case of $\hat{s}[n] = \hat{p}[n] + D[n]\sum_{k}\hat{h}[n - kP]$ with D[n] = w[n].

As with the purely convolutional model, $\tilde{x}[n] = (w[n]\,p[n]) * h[n]$, the contributions of the windowed pulse train and the impulse response are additively combined so that deconvolution is possible.

Now the impulse response contribution is repeated at the pitch period rate. This aliasing is dependent upon pitch, and is different from the aliasing due to an insufficient DFT length (see Section 6.4.4).

Page 68: Speech Processing


Cepstral Domain (Quefrency) Perspective

Conditions under which s[n] ≈ (w[n]p[n]) * h[n]:

1. w[n], the time-domain window, should be long enough so that D[n] is smooth over |n| < P, i.e., over the extent of $\hat{h}[n]$.

2. w[n] should be short enough to reduce the contribution of the replicas of $\hat{h}[n]$. In practice w[n] is a Hamming window 2-3 pitch periods long.

3. w[n] should be centered at the time origin, n = 0, aligned with h[n].

Page 69: Speech Processing


Cepstral Domain (Quefrency) Perspective

Under those conditions, for a low-time lifter (filter in the cepstral domain) l[n] of extent |n| < P/2:

$$\tilde{x}[n] = (w[n]\,p[n]) * h[n]$$

That is, the complex cepstrum is close to that derived from the conventional model.

Note that with high-pitched speakers there is a stronger presence of $\hat{p}[n]$ close to the origin (as noted earlier), as well as more aliasing of the replicas of $\hat{h}[n]$.

Page 70: Speech Processing


Frequency Domain Perspective

Let x[n] = p[n] * h[n], where:

$$p[n] = \sum_{k=-\infty}^{\infty}\delta[n - kP]$$

Then $X(\omega_k) = P(\omega_k)\,H(\omega_k)$, where $X(\omega_k)$ represents a line spectrum at $\omega_k = (2\pi/P)k$.

The question arises: under what conditions on the window would the output

$$\tilde{x}[n] = (w[n]\,p[n]) * h[n]$$

be close to the actual

s[n]=w[n]x[n]=w[n](p[n]*h[n])?

Page 71: Speech Processing


Frequency Domain Perspective

Define an error measure E(ω) that would reflect degradation in the frequency domain:

$$E(\omega) = \frac{S(\omega)}{\tilde{X}(\omega)}$$

Want to minimize:

$$D = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left|\log E(\omega)\right|^{2}d\omega$$

It was found empirically that for a Hamming window this spectral distance measure is minimized for a window length in the range of roughly 2-3 pitch periods.

An implication of this result is that the length of the analysis window should be adapted to the pitch period to make the windowed waveform as close as possible (in the sense described above) to the desired convolutional model.

Page 72: Speech Processing


Short-Time Speech Analysis

Page 73: Speech Processing


Short-Time Speech Analysis

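Complex Cepstrum of Voiced Speech

Recall:

$H(z) = A\,G(z)\,V(z)\,R_L(z)$

where A is the gain, G(z) the glottal model, V(z) the vocal tract model, and $R_L(z)$ the lip radiation model.

The output speech then is:

$$x[n] = p[n] * h[n] = p[n] * \underbrace{\left(A\,g[n] * v[n] * r_L[n]\right)}_{h[n]}$$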

Page 74: Speech Processing


Complex Cepstrum of Voiced Speech

General form for a stable V(z):

$$V(z) = \frac{\prod_{k=1}^{M_i}(1 - a_k z^{-1})\,\prod_{k=1}^{M_o}(1 - b_k z)}{\prod_{k=1}^{N_i}(1 - c_k z^{-1})}$$

- Zeros inside and outside the unit circle.
- Poles inside the unit circle.

The goal is to separate h[n] from p[n]. Let s[n]=w[n](p[n]*h[n]) be approximately equal to

$$\tilde{x}[n] = (w[n]\,p[n]) * h[n]$$

Page 75: Speech Processing


Complex Cepstrum of Voiced Speech

Recall that $\tilde{x}[n] \approx s[n]$ if the window is 2-3 pitch periods long and its center is aligned with h[n].

Using a DFT of order N, the following denotes the discrete complex cepstrum:

$$\hat{s}_N[n] \approx \hat{p}_N[n] + \hat{h}_N[n]$$

For a typical speaker the duration of the short-time window lies in the range of 20 ms-40 ms.

Assuming that:

- Source and system components lie roughly in separate quefrency regions.
- There is negligible aliasing of the replicas of $\hat{h}[n]$.
- Most of $\hat{h}[n]$ occurs within P/2 of the origin.
- The distortion function D[n] is smooth in the same range, |n| < P/2, and thus makes the other higher-order replicas negligible for |n| > P/2.

Then, applying a cepstral lifter function:

Page 76: Speech Processing


Complex Cepstrum of Voiced Speech

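Low-quefrency lifter:

$$l[n] = \begin{cases} 1, & |n| < P/2 \\ 0, & \text{elsewhere} \end{cases}$$

to separate $\hat{h}[n]$ from $\hat{p}[n]$.

Similarly, a high-quefrency lifter can be used to produce the input pulse train (pitch estimation):

$$l[n] = \begin{cases} 0, & |n| < P/2 \\ 1, & \text{elsewhere} \end{cases}$$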
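The two lifters can be tried on a synthetic signal. The sketch below is an idealized illustration, not the windowed short-time analysis of the slides: it assumes a hypothetical minimum-phase two-pole impulse response `h`, an exponentially decaying pulse train `p` (as in Example 6.6) with period `P = 80`, no analysis window, and a 4096-point DFT. Under these assumptions x̂[n] = ĥ[n] + p̂[n] holds essentially exactly, so the low lifter recovers h[n] and the largest high-quefrency peak gives the pitch period.

```python
import numpy as np

N, P, beta = 4096, 80, 0.5              # DFT size, pitch period, pulse decay (all assumed)
r, theta = 0.8, np.pi / 4               # pole radius and angle of a hypothetical resonance

n = np.arange(N)
h = r ** n * np.sin((n + 1) * theta) / np.sin(theta)   # minimum-phase two-pole response
p = np.zeros(N)
p[::P] = beta ** np.arange(len(p[::P]))                # p[n] = sum_k beta^k delta[n - kP]
x = np.convolve(h, p)[:N]

X = np.fft.fft(x)
x_hat = np.real(np.fft.ifft(np.log(np.abs(X)) + 1j * np.unwrap(np.angle(X))))

q = np.minimum(n, N - n)                # |n| on the circular quefrency axis
low = (q < P / 2).astype(float)         # low-quefrency lifter
high = 1.0 - low                        # high-quefrency lifter

# Low lifter followed by the inverse characteristic system: estimate of h[n].
h_est = np.real(np.fft.ifft(np.exp(np.fft.fft(low * x_hat))))
# Pitch estimate: location of the largest high-quefrency cepstral peak.
pitch = P // 2 + np.argmax((high * x_hat)[P // 2 : N // 2])
```

With real, windowed speech the separation is only approximate, for the reasons given in the conditions on w[n] above.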

Page 77: Speech Processing


Example 6.11

Voiced female speech with a pitch period of 5 ms.

Sampling rate fs = 10 kHz.

Hamming window of 15 ms.

A 1024-point FFT/IFFT is used to obtain the discrete complex cepstrum.

Center the window on h[n] (more about that later).

Page 78: Speech Processing


Example 6.11

Page 79: Speech Processing


Example 6.11

[Figure: panels labeled Maximum Phase and Minimum Phase.]

Page 80: Speech Processing


Complex Cepstrum of Unvoiced Speech

Recall the transfer function model for the unvoiced speech:

H(z) = AV(z)R(z)

In contrast to the voiced case, there is no glottal volume velocity contribution.

Resulting speech waveform in the time domain, where u[n] is white noise:

x[n]=u[n]*h[n]=u[n]*v[n]*r[n]

Resulting signal after applying the short-time analysis window:

s[n]=w[n](u[n]*h[n])

Page 81: Speech Processing


Complex Cepstrum of Unvoiced Speech

Similarly to the arguments applied for voiced speech, if:

- the duration of the analysis window w[n] is selected so that the formants of the unvoiced speech power spectral density are not significantly broadened, and
- w[n] is sufficiently smooth so as to be nearly constant over h[n],

then the following can be assumed:

s[n]≈(w[n]u[n])*h[n]

Defining the windowed white noise as q[n] = u[n]w[n], and computing the discrete complex cepstrum with an N-point DFT:

Page 82: Speech Processing


Complex Cepstrum of Unvoiced Speech

$$\hat{s}_N[n] \approx \hat{q}_N[n] + \hat{h}_N[n]$$

$\hat{q}_N[n]$, the discrete complex cepstrum of the noise source, covers all quefrencies, and thus separation is not possible.

Phase unwrapping of noisy signals is very unreliable.

The real cepstrum is adequate for unvoiced speech (phase information is not important for this case), resulting in minimum-phase versions of h[n].

The deconvolved excitation may contain interesting fine source structure for some classes of sounds, e.g., voiced fricatives.

Page 83: Speech Processing


Analysis/Synthesis Structure

Page 84: Speech Processing


Analysis/Synthesis Structure

In speech analysis underlying parameters of the speech model are estimated

In speech synthesis stage the waveform is reconstructed from the model parameters.

Liftering of low-quefrency region of the cepstrum ⇒ provides an estimate of the system impulse response

Liftering of high-quefrency region of the cepstrum ⇒ provides an estimate of source excitation signal.

Inverting the estimate of the source signal with the homomorphic system $D_*^{-1}$ yields the excitation function.

Convolution of the two resulting component estimates yields the original short-time segment exactly.

Page 85: Speech Processing


Analysis/Synthesis Structure

With an overlap-add reconstruction from the short-time segments, the entire waveform is recovered.

The homomorphic system performs the transformation with no information reduction.

This process is analogous to reconstructing the waveform, in linear prediction analysis/synthesis, from the convolution of the all-pole filter and the output of its inverse filter.

In speech coding and speech modification applications a more efficient representation is desired.

Complex or real cepstrum provides an approach to such a representation because pitch and voicing can be estimated from the peak (or lack of peak) in the high-quefrency region of the cepstrum.

Page 86: Speech Processing


Zero and Minimum-Phase Synthesis

Assuming that we have a succinct and accurate characterization of the speech production source (as with linear prediction-based analysis/synthesis), we are able to synthesize an estimate of the speech waveform.

This synthesis can be performed based on any one of several possible phase functions:

- Zero-phase
- Minimum-phase
- Maximum-phase
- Mixed-phase

Page 87: Speech Processing


Zero and Minimum-Phase Synthesis

General framework for homomorphic analysis/synthesis:

[Figure: homomorphic analysis/synthesis framework with a 1024-point real cepstrum, an analysis window of 10-20 ms, and a lifter cutoff at P/2.]

Page 88: Speech Processing


Mixed-Phase Synthesis

Example 6.13

Page 89: Speech Processing


Contrasting Linear Prediction and Homomorphic Filtering

Homomorphic filtering is viewed as an alternative to linear prediction.

Linear Prediction | Homomorphic Filtering
Parametric | Non-parametric
Sharp, smooth resonances | Wider, spurious resonances
All-pole representation | Poles and zeros can be represented
Minimum-phase response estimate only | Minimum-phase as well as mixed-phase if the complex cepstrum is used
Synthesized speech “crisper” but more “mechanical” | Synthesized speech more “natural” but “muffled”

Page 90: Speech Processing


Contrasting Linear Prediction and Homomorphic Filtering

Similar problems with both methods:

Linear Prediction | Homomorphic Filtering
Increased speech distortion with increasing pitch | Aliasing of the vocal tract impulse response at the pitch period repetition rate
Linear prediction windowing results in the prediction of nonzero values of the waveform from zeros outside the window | Windowing a periodic waveform distorts the convolutional model
The number of poles is required | The length of the low-quefrency lifter must be chosen

Best window and order selection is often a function of the pitch of the speaker.

Page 91: Speech Processing


Homomorphic Prediction

A number of speech analysis methods rely on combining homomorphic filtering with linear prediction and are referred to collectively as homomorphic prediction.

Two primary advantages of combining the methods:

1. By reducing the effects of waveform periodicity, an all-pole estimate suffers less from the effect of high-pitch aliasing.

2. By removing ambiguity in waveform alignment, zero estimation can be performed without the requirement of pitch-synchronous analysis.

Page 92: Speech Processing


Homomorphic Prediction

Waveform periodicity: Recall that for a waveform consisting of the convolution of a short-time impulse train and an impulse response,

$x[n] = p[n] * h[n]$

the autocorrelation function is given by the convolution of the autocorrelation function of the impulse response and that of the impulse train:

$r_x[\tau] = r_h[\tau] * r_p[\tau]$

Thus, as the spacing between impulses (the pitch period) decreases, the autocorrelation function of the impulse response suffers from increasing distortion.
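The relation between the three autocorrelation functions can be checked numerically. The sketch below uses a hypothetical decaying impulse response and a four-pulse train; the helper names and the choice of 10 low lags are illustrative assumptions.

```python
import numpy as np

def autocorr(x):
    """Deterministic autocorrelation for lags -(len(x)-1) .. (len(x)-1)."""
    return np.correlate(x, x, mode="full")

def impulse_train(period, count=4):
    """Short-time train of `count` unit impulses spaced `period` samples apart."""
    p = np.zeros(period * (count - 1) + 1)
    p[::period] = 1.0
    return p

h = 0.9 ** np.arange(30)  # hypothetical impulse response, 30 samples long

def low_lag_distortion(period, lags=10):
    """Max deviation of r_x (scaled by r_p[0]) from r_h over the first `lags` lags."""
    p = impulse_train(period)
    x = np.convolve(p, h)
    r_x, r_h, r_p = autocorr(x), autocorr(h), autocorr(p)
    # r_x equals the convolution of r_h and r_p (same length, same centering).
    identity_holds = np.allclose(r_x, np.convolve(r_h, r_p))
    zero_x, zero_h, zero_p = len(x) - 1, len(h) - 1, len(p) - 1
    scaled = r_x[zero_x:zero_x + lags] / r_p[zero_p]
    reference = r_h[zero_h:zero_h + lags]
    return identity_holds, np.max(np.abs(scaled - reference)) / reference[0]

ok_long, err_long = low_lag_distortion(period=40)    # pitch period longer than h[n]
ok_short, err_short = low_lag_distortion(period=10)  # pitch period shorter than h[n]
print(err_long, err_short)
```

With the long period, the replicas of $r_h[\tau]$ do not reach the low lags, so the scaled $r_x[\tau]$ matches $r_h[\tau]$ there. With the short period the replicas overlap and the low lags are visibly distorted.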

Page 93: Speech Processing


Homomorphic Prediction

Thus, if the spectral magnitude of h[n] can be estimated accurately, then linear prediction analysis can be performed with an estimate of $r_h[\tau]$ that is free of the waveform periodicity. This leads to the following idea:

1. Use homomorphic filtering to deconvolve x[n] and obtain an estimate of h[n] by low-pass liftering the real or complex cepstrum of x[n].

2. Apply the autocorrelation method of linear prediction analysis to the resulting impulse response estimate to obtain the model parameters.
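The two steps above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the deck's code: the test signal is a two-impulse excitation driving a hypothetical second-order all-pole filter, the predictor convention is A(z) = 1 - Σ a_k z^-k, and the parameters are chosen so that the pitch period is long relative to the decay of the cepstrum of h[n].

```python
import numpy as np

def all_pole_response(a, length):
    """Impulse response of H(z) = 1 / (1 - sum_k a[k-1] z^-k)."""
    h = np.zeros(length)
    h[0] = 1.0
    for n in range(1, length):
        for k in range(1, min(n, len(a)) + 1):
            h[n] += a[k - 1] * h[n - k]
    return h

def lpc_autocorrelation(x, p):
    """Autocorrelation method: solve the p-by-p Toeplitz normal equations."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

def homomorphic_prediction(x, p, cutoff, nfft=4096):
    # Step 1: low-pass lifter the real cepstrum and rebuild a minimum-phase h[n].
    cep = np.fft.ifft(np.log(np.abs(np.fft.fft(x, nfft)))).real
    lifter = np.zeros(nfft)
    lifter[0] = 1.0
    lifter[1:cutoff] = 2.0
    h_est = np.fft.ifft(np.exp(np.fft.fft(cep * lifter))).real
    # Step 2: autocorrelation-method linear prediction on the estimate.
    return lpc_autocorrelation(h_est, p)

# Hypothetical example: poles at 0.6 * exp(+/- j*pi/3), pitch period of 25 samples.
radius, angle = 0.6, np.pi / 3
a_true = np.array([2 * radius * np.cos(angle), -radius ** 2])
h = all_pole_response(a_true, 200)
excitation = np.zeros(26)
excitation[0], excitation[25] = 1.0, 0.5
x = np.convolve(excitation, h)

a_est = homomorphic_prediction(x, p=2, cutoff=20)
print(a_true, a_est)
```

The lifter cutoff (20) is kept below the pitch period (25) so that the cepstral peaks of the excitation are excluded. For higher pitch or slower-decaying responses the lifter also truncates part of the cepstrum of h[n], and the estimate degrades accordingly.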

Page 94: Speech Processing


Example 6.14

Suppose h[n] is a minimum-phase all-pole sequence of order p. Consider a waveform x[n] constructed by convolving h[n] with a sequence p[n], where:

$p[n] = \delta[n] + \beta\,\delta[n-N]$, with $\beta < 1$

The complex cepstrum of x[n] is given by:

$\hat{x}[n] = \hat{p}[n] + \hat{h}[n]$

where $\hat{p}[n]$ and $\hat{h}[n]$ are the complex cepstra of p[n] and h[n], respectively.

The autocorrelation function is given by:

$r_x[\tau] = (1+\beta^2)\,r_h[\tau] + \beta\,r_h[\tau-N] + \beta\,r_h[\tau+N]$

so $r_x[\tau]$ is $r_h[\tau]$ distorted by its neighboring terms centered at $\tau = +N$ and $\tau = -N$.
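A short derivation may help connect the two results of this example. It is a sketch that writes the weight of the second impulse as $\beta$ (a labeling assumption) and takes $|\beta| < 1$ so that the logarithm series converges.

```latex
\begin{align*}
P(z) &= 1 + \beta z^{-N}
  \;\Rightarrow\;
  \log P(z) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}\beta^{k}}{k}\, z^{-kN} \\
\hat{p}[n] &= \sum_{k=1}^{\infty} \frac{(-1)^{k+1}\beta^{k}}{k}\, \delta[n-kN]
  \qquad \text{(nonzero only at } n = N, 2N, \dots) \\
r_p[\tau] &= (1+\beta^{2})\,\delta[\tau] + \beta\,\delta[\tau-N] + \beta\,\delta[\tau+N] \\
r_x[\tau] &= r_h[\tau] * r_p[\tau]
  = (1+\beta^{2})\,r_h[\tau] + \beta\,r_h[\tau-N] + \beta\,r_h[\tau+N]
\end{align*}
```

The cepstrum of the excitation is confined to multiples of N, so it leaves the quefrencies below N untouched. The autocorrelation replicas, in contrast, are copies of $r_h[\tau]$ that extend back toward $\tau = 0$.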

Page 95: Speech Processing


Homomorphic Prediction

Important points of the previous example:

The first p coefficients of the real cepstrum of x[n] are undistorted, provided p < N and a long-enough DFT length is used in the computation.

The first p coefficients of the autocorrelation function $r_x[\tau]$ of the waveform are distorted by aliasing of autocorrelation replicas (regardless of the DFT length).

⇒ A cepstral low-pass lifter of duration less than p extracts a smoothed and non-aliased version of the spectrum.

⇒ Linear prediction coefficients can alternatively be obtained exactly through the recursive relation between the real cepstrum and the predictor coefficients of the all-pole model when h[n] is all-pole (Exercise 6.13).
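The recursive relation can be tried out on the signal of Example 6.14. The sketch below assumes the predictor convention A(z) = 1 - Σ a_k z^-k and a unit-gain minimum-phase all-pole h[n], for which the complex cepstrum satisfies ĥ[n] = a_n + Σ_{k=1}^{n-1} (k/n) ĥ[k] a_{n-k} for 1 ≤ n ≤ p, with ĥ[n] equal to twice the real cepstrum for n > 0. The function names and test parameters are hypothetical.

```python
import numpy as np

def all_pole_response(a, length):
    """Impulse response of H(z) = 1 / (1 - sum_k a[k-1] z^-k)."""
    h = np.zeros(length)
    h[0] = 1.0
    for n in range(1, length):
        for k in range(1, min(n, len(a)) + 1):
            h[n] += a[k - 1] * h[n - k]
    return h

def real_cepstrum(x, nfft=4096):
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x, nfft)))).real

def cepstrum_to_lpc(c, p):
    """Predictor coefficients a_1..a_p from real cepstrum values c[0..p]."""
    h_hat = 2.0 * np.asarray(c[:p + 1])   # complex cepstrum for n > 0 (minimum phase)
    a = np.zeros(p + 1)                    # a[0] is unused
    for n in range(1, p + 1):
        acc = sum((k / n) * h_hat[k] * a[n - k] for k in range(1, n))
        a[n] = h_hat[n] - acc
    return a[1:]

def lpc_autocorrelation(x, p):
    """Autocorrelation method applied directly to the waveform."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

# Hypothetical case: poles at 0.9 * exp(+/- j*pi/4), N = 12, second impulse weight 0.5.
radius, angle = 0.9, np.pi / 4
a_true = np.array([2 * radius * np.cos(angle), -radius ** 2])
h = all_pole_response(a_true, 600)
excitation = np.zeros(13)
excitation[0], excitation[12] = 1.0, 0.5
x = np.convolve(excitation, h)

a_from_cepstrum = cepstrum_to_lpc(real_cepstrum(x), p=2)
a_from_waveform = lpc_autocorrelation(x, p=2)
err_cepstrum = np.max(np.abs(a_from_cepstrum - a_true))
err_waveform = np.max(np.abs(a_from_waveform - a_true))
print(err_cepstrum, err_waveform)
```

In this setup the cepstral route recovers the predictor coefficients to within numerical precision, while linear prediction applied directly to x[n] is biased by the autocorrelation replicas at ±N.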

Page 96: Speech Processing


Homomorphic Prediction

Zero estimation:

Consider a transfer function of poles and zeros of the form:

$H(z) = \dfrac{N(z)}{D(z)}$

Also consider a sequence x[n] = h[n] * p[n], where p[n] is a periodic impulse train.

Suppose that:

An estimate of h[n] is obtained through homomorphic filtering of x[n].

The number of poles and zeros is known.

The linear-phase component $z^{-r}$ has been removed.

Then the poles of h[n] can be estimated using the covariance method of linear prediction.

Other methods (e.g., Shanks' method, described in Chapter 5) can be used to estimate the zeros.

Page 97: Speech Processing


Homomorphic Prediction

Page 98: Speech Processing


Summary

The focus of this chapter was on the use of homomorphic filtering with application to deconvolution: the separation of a source from a system.

The presented methodology is general and can be applied to more than the deconvolution of the vocal tract from the glottal source.

Example applications:

Control of the dynamic range of multiplicatively combined signals (Exercise 6.19).

Recovery of speech from degraded recordings. Old acoustic recordings suffer from convolutional distortion imparted by an acoustic horn that can be approximated by a linear resonant filter. See Exercise 6.20 for details.

In image processing, homomorphic filtering can be used for contrast enhancement (see Oppenheim and Schafer, “Digital Signal Processing”, p. 487, Prentice Hall, 1975).

Page 99: Speech Processing


Summary

Homomorphic processing is applied in the phase vocoder and sinewave analysis/synthesis.

It has also been found useful in speech coding (Chapter 12) and speaker recognition (Chapter 14).

It is also the basis for the mel-cepstrum: the Fourier transform of a constant-Q filtered log-spectrum.

It is hypothesized that the mel-cepstrum approximates signal processing in the early stages of human auditory perception.

Homomorphic filtering applied along the temporal trajectories of the mel-cepstral coefficients can be used to remove convolutional channel distortions even when the cepstrum of these distortions overlaps the cepstrum of speech (Chapter 13): cepstral mean subtraction and RASTA processing.
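Cepstral mean subtraction, the simpler of the two techniques named above, can be sketched in a few lines. The example assumes a time-invariant channel, which appears as a constant offset added to every frame's cepstrum; the random "clean" features, the array shapes, and the function name are hypothetical stand-ins for real mel-cepstral data.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Remove the per-coefficient mean along time.

    cepstra: array of shape (num_frames, num_coefficients).
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))   # stand-in for mel-cepstral trajectories
channel = rng.normal(size=13)        # fixed convolutional channel, additive in the cepstrum
observed = clean + channel           # the same offset is added to every frame

normalized_clean = cepstral_mean_subtraction(clean)
normalized_observed = cepstral_mean_subtraction(observed)
```

After normalization, the observed and clean trajectories coincide: the constant channel term is removed even though it overlaps the speech cepstrum in quefrency. The long-term mean of the speech itself is removed along with it.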