changes in ppc[1]

8/6/2019 Changes in Ppc[1]


Use of spectral autocorrelation in spectral envelope linear prediction for speech recognition

By

B.LALITHA

(08691D3807)

Under the guidance of

Prof. M. B. MANJUNATHA, M.Tech (PhD)


Introduction

Outline of the project

1. Introduction to Speech

i) Speech Production

ii) Speech Recognition

2. Implementation

3. Spectral envelope LPC analysis

4. Speech recognition using Dynamic Time Warping (DTW)

5. Result


1. INTRODUCTION TO SPEECH

i) Speech Production:

Speech can be characterized as a signal carrying message information.

The purpose of speech is communication between humans.

Speech is an acoustic waveform that conveys information from a speaker to a listener.


Human Speech Communication

[Figure: Human -> Speech signal -> Human Ear]



Speech Production Mechanism



Flow of air from lungs

Vibrating vocal cords

Speech production cavities

Lips

Sound wave

Vowels (a, e, i), fricatives (f, s, z) and plosives (p, t, k)


Classification of Speech Signals

Voiced Sounds:

Voiced sounds are produced when the vocal cords vibrate. The vibration produces quasi-periodic pulses of air which excite the vocal tract.

Examples are / u /, / d /, / w /, / i /, and / e /.

Unvoiced or Fricative Sounds:

Unvoiced sounds are produced by forming a constriction at some point in the vocal tract and forcing air through the constriction at a high velocity to produce turbulence.

Examples: / ʃ / ("sh") is a fricative; / f / and / s / are also fricatives.


ii) Speech Recognition

The three basic steps in Automatic Speech Recognition (ASR) are:

1. Parameter Estimation

2. Parameter Comparison

3. Decision Making


IMPLEMENTATION

[Block diagram] Input speech signal -> Pre-emphasis -> Hamming windowing -> Linear prediction (predictive filter, LPC coefficients) -> Spectral autocorrelation -> Dynamic time warping against the reference words (extract total stored vectors) -> Vector with minimum value -> Recognized word


Pre-Emphasis

Boosting the energy in the high frequencies.

The spectrum for voiced segments has more energy at lower frequencies than at higher frequencies.

This is called spectral tilt. Spectral tilt is caused by the nature of the glottal pulse.

Boosting high-frequency energy gives more information to the acoustic model.

Improves phone recognition performance.


Example of pre-emphasis

[Figure: before and after pre-emphasis. Spectral slice from the vowel [aa]]


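Pre-emphasizing

% First-order pre-emphasis filter: y(n) = x(n) - (15/16) x(n-1)
b = [1 -15/16];
x = filter(b, 1, x);
Len = length(x);   % length before resampling

% Resample to 8 kHz. resample applies an anti-aliasing filter, unlike
% plain decimation with x = x(1:Fs/8000:Len);
x = resample(x, 8000, Fs);
Fs = 8000;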
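As a rough check of what this filter does, the sketch below evaluates the magnitude response of b = [1 -15/16] at three frequencies and filters a constant signal. The variable names w, H, x_dc and y_dc are illustrative and not part of the original script.

```matlab
% Pre-emphasis filter from the slide: y(n) = x(n) - (15/16) x(n-1)
b = [1 -15/16];

% Magnitude response |H(e^{jw})| = |1 - (15/16) e^{-jw}| at DC, a quarter
% of the sampling rate, and the Nyquist frequency
w = [0 pi/2 pi];
H = abs(b(1) + b(2) * exp(-1j * w));

% A constant (0 Hz) input is attenuated to 1/16 after the first sample
x_dc = ones(5, 1);
y_dc = filter(b, 1, x_dc);

disp(H);      % gain grows from 1/16 at DC to 31/16 at Nyquist
disp(y_dc');
```

The gain rises monotonically with frequency, which is the high-frequency boost that counteracts spectral tilt.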



Pre-emphasized and Resampled


Windowing

Speech is a non-stationary signal, so it is analyzed in short segments over which its properties are roughly constant.

A window is non-zero inside some region and zero elsewhere.

The speech extracted from each window is called a frame.


[Figure: the windowing process, showing the frame shift and frame size]


Applying the window

% n = frame size, m = frame shift, nbFrame = number of frames
for i = 1:n
    for j = 1:nbFrame
        M(i, j) = speech(((j - 1) * m) + i);
    end
end
h = hamming(n);
M2 = diag(h) * M;
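The same framing can be sketched without loops. The toy signal, the frame size n = 8, the shift m = 4 and the formula for nbFrame are assumptions chosen for illustration. The window is written out explicitly with the standard symmetric Hamming formula so that the sketch does not need hamming() from a toolbox.

```matlab
% Toy signal and frame parameters (illustrative values, not from the slides)
speech = (1:20)';
n = 8;                                        % frame size
m = 4;                                        % frame shift
nbFrame = floor((length(speech) - n) / m) + 1; % assumed frame count

% Index matrix: column j holds the sample indices of frame j
idx = repmat((1:n)', 1, nbFrame) + repmat((0:nbFrame-1) * m, n, 1);
M = speech(idx);

% Symmetric Hamming window: w(k) = 0.54 - 0.46 cos(2 pi k / (n - 1))
h = 0.54 - 0.46 * cos(2 * pi * (0:n-1)' / (n - 1));

% Scale every frame (column) by the window, equivalent to diag(h) * M
M2 = M .* repmat(h, 1, nbFrame);
```

Each column of M2 is one windowed frame, so later stages can loop over columns as the LPC code does.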


Common window shapes

Rectangular window

Hamming window


Linear prediction

Linear Predictive Coding (LPC) provides a low-dimension representation of the speech signal at one frame:

representation of the spectral envelope, not harmonics

analytically tractable method

some ability to identify formants

LPC models the speech signal at time point n as an approximate linear combination of the previous p samples:

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p)$$

where $a_1, a_2, \dots, a_p$ are constant for each frame of speech.


If the error over a segment of speech is defined as

$$E_n = \sum_{m=M_1}^{M_2} e_n^2(m) = \sum_{m=M_1}^{M_2} \left( s_n(m) - \sum_{k=1}^{p} a_k s_n(m-k) \right)^2$$

where $s_n$ is the signal starting at time n, then we can find $a_k$ by setting $\partial E_n / \partial a_k = 0$ for $k = 1, 2, \dots, p$, obtaining p equations in p unknowns:

$$\sum_{k=1}^{p} a_k \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m-k) = \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m), \quad 1 \le i \le p$$

The error is at a minimum (not a maximum) when the derivative is zero, because as any $a_k$ changes away from its optimum value, the error will increase.


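Features: LPC

$$E_n = \sum_{m=M_1}^{M_2} \left( s_n(m) - \sum_{k=1}^{p} a_k s_n(m-k) \right)^2$$

$$E_n = \sum_{m=M_1}^{M_2} \left( s_n^2(m) - 2 s_n(m) \sum_{k=1}^{p} a_k s_n(m-k) + \sum_{k=1}^{p} a_k s_n(m-k) \sum_{r=1}^{p} a_r s_n(m-r) \right)$$

$$\begin{aligned}
E_n = \sum_{m=M_1}^{M_2} \Big( s_n^2(m) &- 2 s_n(m)\, a_1 s_n(m-1) + a_1 s_n(m-1) \sum_{r=1}^{p} a_r s_n(m-r) \\
&- 2 s_n(m)\, a_2 s_n(m-2) + a_2 s_n(m-2) \sum_{r=1}^{p} a_r s_n(m-r) \\
&- \dots \\
&- 2 s_n(m)\, a_p s_n(m-p) + a_p s_n(m-p) \sum_{r=1}^{p} a_r s_n(m-r) \Big)
\end{aligned}$$

$$\begin{aligned}
0 = \frac{\partial E_n}{\partial a_1} = \frac{\partial}{\partial a_1} \sum_{m=M_1}^{M_2} \Big( s_n^2(m) &- 2 s_n(m)\, a_1 s_n(m-1) + a_1^2 s_n^2(m-1) + a_1 s_n(m-1)\, a_2 s_n(m-2) + \dots + a_1 s_n(m-1)\, a_p s_n(m-p) \\
&- 2 s_n(m)\, a_2 s_n(m-2) + a_2 s_n(m-2)\, a_1 s_n(m-1) + a_2^2 s_n^2(m-2) + \dots + a_2 s_n(m-2)\, a_p s_n(m-p) + \dots \Big)
\end{aligned}$$

$$\begin{aligned}
0 = \sum_{m=M_1}^{M_2} \Big( &- 2 s_n(m) s_n(m-1) + 2 a_1 s_n^2(m-1) + a_2 s_n(m-1) s_n(m-2) + \dots + a_p s_n(m-1) s_n(m-p) \\
&+ a_2 s_n(m-2) s_n(m-1) + a_3 s_n(m-3) s_n(m-1) + \dots + a_p s_n(m-p) s_n(m-1) \Big)
\end{aligned}$$

$$0 = \sum_{m=M_1}^{M_2} \Big( - 2 s_n(m) s_n(m-1) + 2 a_1 s_n(m-1) s_n(m-1) + 2 a_2 s_n(m-1) s_n(m-2) + \dots + 2 a_p s_n(m-1) s_n(m-p) \Big)$$

Repeat the above equations for $a_2, a_3, \dots, a_p$.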


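$$\sum_{m=M_1}^{M_2} \left( -2 s_n(m) s_n(m-i) + 2 \sum_{k=1}^{p} a_k s_n(m-i) s_n(m-k) \right) = 0, \quad 1 \le i \le p$$

$$-2 \sum_{m=M_1}^{M_2} s_n(m) s_n(m-i) + 2 \sum_{m=M_1}^{M_2} \sum_{k=1}^{p} a_k s_n(m-i) s_n(m-k) = 0, \quad 1 \le i \le p$$

$$\sum_{k=1}^{p} a_k \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m-k) = \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m), \quad 1 \le i \le p$$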

LPC Autocorrelation Method

Autocorrelation: measure of periodicity in signal

$$\phi_n(i,k) = \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m-k)$$

so we can re-write the normal equations as

$$\sum_{k=1}^{p} a_k\, \phi_n(i,k) = \phi_n(i,0), \quad 1 \le i \le p$$

We can solve for $a_k$ using several methods. The most common method in speech processing is the autocorrelation method:

Force the signal to be zero outside of the interval $0 \le m \le N-1$:

$$\hat{s}_n(m) = s_n(m)\, w(m)$$

where $w(m)$ is a finite-length window (e.g. Hamming) of length N that is zero when m is less than 0 or greater than N-1, and $\hat{s}_n$ is the windowed signal. As a result,


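$$E_n = \sum_{m=0}^{N+p-1} e_n^2(m)$$

Because the signal is set to zero outside the window, the correlation term becomes:

$$\phi_n(i,k) = \sum_{m=0}^{N+p-1} \hat{s}_n(m-i)\, \hat{s}_n(m-k), \quad 1 \le i \le p, \; 0 \le k \le p$$

and this can be expressed as

$$\phi_n(i,k) = \sum_{m=0}^{N-1-(i-k)} \hat{s}_n(m)\, \hat{s}_n(m+(i-k)), \quad 1 \le i \le p, \; 0 \le k \le p$$

and this is identical to the autocorrelation function for $|i-k|$, because the autocorrelation function is symmetric, $R_n(-k) = R_n(k)$:

$$\phi_n(i,k) = R_n(|i-k|)$$

$$R_n(k) = \sum_{m=0}^{N-1-k} \hat{s}_n(m)\, \hat{s}_n(m+k)$$

so the set of equations for $a_k$ can be written in terms of the autocorrelation function:

$$\sum_{k=1}^{p} a_k\, R_n(|i-k|) = R_n(i), \quad 1 \le i \le p$$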


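In matrix form, these equations look like this:

$$\begin{bmatrix}
R_n(0) & R_n(1) & R_n(2) & \cdots & R_n(p-1) \\
R_n(1) & R_n(0) & R_n(1) & \cdots & R_n(p-2) \\
R_n(2) & R_n(1) & R_n(0) & \cdots & R_n(p-3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R_n(p-1) & R_n(p-2) & R_n(p-3) & \cdots & R_n(0)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix}
=
\begin{bmatrix} R_n(1) \\ R_n(2) \\ R_n(3) \\ \vdots \\ R_n(p) \end{bmatrix}$$

There is a recursive algorithm to solve this: Durbin's solution.

LPC Durbin's Solution

Solve a Toeplitz (symmetric, diagonal elements equal) matrix for values of $\alpha$: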


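$$\sum_{k=1}^{p} a_k\, R_n(|i-k|) = R_n(i), \quad 1 \le i \le p$$

$$E^{(0)} = R_n(0)$$

$$k_i = \frac{R_n(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R_n(i-j)}{E^{(i-1)}}, \quad 1 \le i \le p$$

$$\alpha_i^{(i)} = k_i$$

$$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$

$$E^{(i)} = (1 - k_i^2)\, E^{(i-1)}$$

$$a_j = \alpha_j^{(p)}$$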
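The recursion can be sketched in a few lines and checked against a direct solve of the Toeplitz system. The test signal, the order p = 4 and the variable names (alpha, k_i, prev, a_direct) are assumptions made for illustration; the autocorrelation values are computed from a short synthetic frame rather than real speech.

```matlab
% Synthetic frame (assumption): 64 samples of a decaying resonance
x = filter(1, [1 -1.2 0.8], [1; zeros(63, 1)]);
p = 4;
N = length(x);

% Autocorrelation R(k), k = 0..p, stored at index k+1
R = zeros(p + 1, 1);
for k = 0:p
    R(k + 1) = sum(x(1:N-k) .* x(1+k:N));
end

% Durbin recursion
E = R(1);                         % E^(0) = R(0)
alpha = zeros(p, 1);
for i = 1:p
    acc = R(i + 1);
    for j = 1:i-1
        acc = acc - alpha(j) * R(i - j + 1);
    end
    k_i = acc / E;                % reflection coefficient k_i
    prev = alpha;                 % alpha^(i-1)
    alpha(i) = k_i;
    for j = 1:i-1
        alpha(j) = prev(j) - k_i * prev(i - j);
    end
    E = (1 - k_i^2) * E;          % E^(i)
end

% Reference: solve the same system directly
a_direct = toeplitz(R(1:p)) \ R(2:p+1);
disp([alpha a_direct]);
```

The recursion needs on the order of p^2 operations, whereas a general linear solve needs on the order of p^3, which is why it is the usual choice for per-frame LPC analysis.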

We can compute the spectral envelope magnitude from the LPC parameters by evaluating the transfer function $S(z)$ for $z = e^{j\omega}$:

$$S(e^{j\omega}) = \frac{G}{A(e^{j\omega})} = \frac{G}{1 - \sum_{k=1}^{p} a_k e^{-j\omega k}}$$

Finding the frequency envelope using the LPC method

for col = 1:nbFrame
    % compute the autocorrelation function up to lag Or (the LPC order):
    rx = zeros(Or+1, 1);
    speech1 = M2(:,col)' + 0.000001;   % small offset avoids an all-zero frame
    for i = 1:Or+1
        rx(i) = speech1(1:n-i+1) * speech1(i:n)';
    end
    % prepare the Or-by-Or Toeplitz autocorrelation matrix:
    covmatrix = zeros(Or, Or);
    for i = 1:Or
        covmatrix(i,i:Or) = rx(1:Or-i+1)';
        covmatrix(i:Or,i) = rx(1:Or-i+1);
    end
    % solve "normal equations" for prediction coeffs
    Acoeffs = - covmatrix \ rx(2:Or+1);
    Alp = [1, Acoeffs'];   % LP polynomial A(z)
    % envelope in dB: log10, not the natural logarithm log
    dbenvlp(:,col) = 20*log10(abs(freqz(1, Alp, n*2)'));
end


Dynamic Time Warping (DTW)

The SELP analysis is evaluated using dynamic time warping.

T = {t1, t2, ..., ti, ..., tn}, R = {r1, r2, ..., ri, ..., rm}

T = test signal or unknown signal

R = reference signal or known signal

A matrix of m x n is created, where D(x, y) is the local distance between reference frame x and test frame y, and C(x, y) is the accumulated cost:

C(x, y) = MIN [C(x + 1, y), C(x + 1, y + 1), C(x, y + 1)] + D(x, y)
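The recurrence can be sketched as follows, filling the m x n matrix from the bottom-right corner back to C(1, 1) and picking the reference with the smallest accumulated cost. The scalar toy sequences, the absolute-difference local distance, the Inf padding and the names refs, cost and recognized are all assumptions for illustration; in the project the frames would be feature vectors derived from the LPC analysis.

```matlab
% Toy test sequence and two toy references (illustrative values)
T = [1 2 3 4 3 2];
refs = {[1 1 2 3 4 3 2], [4 3 2 1 1 2 4]};

cost = zeros(1, numel(refs));
for r = 1:numel(refs)
    R = refs{r};
    m = length(R);
    n = length(T);

    % Local distance D(x, y) between reference frame x and test frame y
    D = abs(repmat(R(:), 1, n) - repmat(T(:)', m, 1));

    % Accumulated cost, padded with an Inf border so x+1 and y+1 are valid
    C = inf(m + 1, n + 1);
    for x = m:-1:1
        for y = n:-1:1
            if x == m && y == n
                best = 0;   % end point of the warping path
            else
                best = min([C(x+1, y), C(x+1, y+1), C(x, y+1)]);
            end
            C(x, y) = best + D(x, y);
        end
    end
    cost(r) = C(1, 1);      % total cost of the best path from (1,1)
end

% The recognized word is the reference with the minimum cost
[~, recognized] = min(cost);
disp(cost);
disp(recognized);
```

The first reference is the test sequence with its first sample repeated, so it can be warped onto the test sequence at zero cost and is selected.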

