changes in ppc[1]
TRANSCRIPT
-
8/6/2019 Changes in Ppc[1]
1/31
Use of Spectral Autocorrelation in Spectral Envelope Linear Prediction for Speech Recognition
By
B.LALITHA
(08691D3807)
Under the guidance of
Prof. M. B. MANJUNATHA, M.Tech (PhD)
-
Introduction
Outline of the project
1. Introduction to Speech
i) Speech Production
ii) Speech Recognition
2. Implementation
3. Spectral Envelope LPC Analysis
4. Speech Recognition using Dynamic Time Warping (DTW)
5. Result
-
1. INTRODUCTION TO SPEECH
i) Speech Production:
Speech can be characterized as a signal carrying message information.
The purpose of speech is communication between humans.
Speech is an acoustic waveform that conveys information from a speaker to a listener.
-
Human Speech Communication
Figure: human (speaker) → speech signal → human ear (listener)
-
Speech Production Mechanism
-
Speech Production Mechanism
Flow of air from lungs
Vibrating vocal cords
Speech production cavities
Lips
Sound wave
Vowels (a, e, i), fricatives (f, s, z) and plosives (p, t, k)
-
Classification of Speech Signals
Voiced Sounds:
Voiced sounds are produced when the vocal cords vibrate. The vibration produces quasi-periodic pulses of air which excite the vocal tract.
Examples include / u /, / d /, / w /, / i / and / e /.
Unvoiced or Fricative Sounds:
Unvoiced sounds are produced by forming a constriction at some point in the vocal tract and forcing air through the constriction at a high velocity to produce turbulence.
Examples: / sh / is a fricative; / f / and / s / are also fricatives.
-
ii) Speech Recognition
The three basic steps in Automatic Speech Recognition (ASR) are:
1. Parameter Estimation
2. Parameter Comparison
3. Decision Making
-
IMPLEMENTATION
Block diagram: Input speech signal → Pre-emphasis → Hamming windowing → Linear prediction (predictive filter, LPC coefficients) → Spectral autocorrelation → Dynamic time warping against the extracted stored vectors of each reference word → Vector with minimum value → Recognized word
-
Pre-Emphasis
Boosting the energy in the high frequencies.
The spectrum for voiced segments has more energy at lower frequencies than at higher frequencies.
This spectral tilt is caused by the nature of the glottal pulse.
Boosting high-frequency energy gives more information to the acoustic model.
Improves phone recognition performance.
-
Example of pre-emphasis
Before and after pre-emphasis: spectral slice from the vowel [aa]
-
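Pre-emphasis
% First-order pre-emphasis filter: y(n) = x(n) - (15/16) x(n-1)
b = [1 -15/16];
x = filter(b, 1, x);
Len = length(x);
% Resample to 8 kHz (decimation by 4)
% Simple alternative without anti-aliasing: x = x(1:Fs/8000:Len);
x = resample(x, 8000, Fs);
Fs = 8000;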
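For readers who want to check the arithmetic outside MATLAB, here is a minimal pure-Python sketch of the same pre-emphasis filter plus the simple decimation from the commented-out line. The function names are hypothetical. Note that decimate skips the anti-aliasing low-pass filter that MATLAB's resample applies, so it is only safe when the input has little energy above the new Nyquist frequency (4 kHz here).

```python
def pre_emphasize(x, alpha=15/16):
    """First-order pre-emphasis y(n) = x(n) - alpha*x(n-1), with x(-1) = 0,
    mirroring MATLAB's filter([1 -alpha], 1, x)."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def decimate(x, fs, target_fs=8000):
    """Keep every (fs/target_fs)-th sample, like x(1:Fs/8000:Len) in MATLAB.
    No anti-aliasing filter is applied (unlike MATLAB's resample)."""
    step = fs / target_fs
    if step < 1 or step != int(step):
        raise ValueError("fs must be an integer multiple of target_fs")
    return x[::int(step)]
```

For example, pre_emphasize([1.0, 1.0, 1.0]) returns [1.0, 0.0625, 0.0625]: a constant (DC) input is almost removed after the first sample, which is the intended high-pass effect.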
-
Pre-emphasized and Resampled
-
Windowing
Speech is a non-stationary signal, so it is analysed in short segments over which it can be treated as approximately stationary.
A window is non-zero inside some region and zero elsewhere.
The speech extracted from each window is called a frame.
-
Windowing
The windowing process, showing the frame shift and frame size
-
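Applying the window
% Split the signal into nbFrame frames of n samples, shifted by m samples (one frame per column)
for i = 1:n
    for j = 1:nbFrame
        M(i, j) = speech(((j - 1) * m) + i);
    end
end
h = hamming(n);
M2 = diag(h) * M;   % weight every frame (column) by the Hamming window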
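A minimal pure-Python sketch of the framing and Hamming-weighting steps, assuming nbFrame = 1 + floor((length - n) / m), since the slides do not show how nbFrame is computed. Frames are returned as rows here, whereas the MATLAB code stores them as columns of M. The function names are hypothetical.

```python
import math


def frame_signal(speech, n, m):
    """Split speech into overlapping frames of n samples with a shift of m.
    Assumes nbFrame = 1 + (len(speech) - n) // m (not stated in the slides)."""
    if len(speech) < n:
        return []
    nb_frame = 1 + (len(speech) - n) // m
    return [list(speech[j * m : j * m + n]) for j in range(nb_frame)]


def hamming(n):
    """Symmetric Hamming window 0.54 - 0.46*cos(2*pi*k/(n-1)), like MATLAB's hamming(n)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def apply_window(frames):
    """Multiply every frame sample-by-sample by the window (the diag(h) * M step)."""
    if not frames:
        return []
    h = hamming(len(frames[0]))
    return [[w * s for w, s in zip(h, frame)] for frame in frames]
```

Multiplying by diag(h) in MATLAB and the element-wise product here are the same operation; the diagonal-matrix form is just a compact way to scale every row of M by one window coefficient.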
-
Common window shapes
Rectangular window
Hamming window
-
Common window shapes
-
Linear prediction
Linear Predictive Coding (LPC) provides a low-dimensional representation of the speech signal in one frame:
a representation of the spectral envelope, not the harmonics
an analytically tractable method
some ability to identify formants
LPC models the speech signal at time point n as an approximate linear combination of the previous p samples:
$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$$
where a_1, a_2, ..., a_p are constant for each frame of speech.
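To make the predictor concrete, a minimal Python sketch computes the predicted sample and the residual e(n) = s(n) - ŝ(n) for one frame. It treats samples before the start of the frame as zero, which is an assumption: the slides do not say how the frame edge is handled. The function names are hypothetical.

```python
def predict(s, a, n):
    """Predicted value a1*s(n-1) + a2*s(n-2) + ... + ap*s(n-p).
    Samples before index 0 are treated as zero (assumption)."""
    return sum(a_k * s[n - k] for k, a_k in enumerate(a, start=1) if n - k >= 0)


def prediction_error(s, a):
    """Residual e(n) = s(n) - predicted s(n) for every sample of the frame."""
    return [s[n] - predict(s, a, n) for n in range(len(s))]
```

For a frame that decays by one half per sample, such as [1.0, 0.5, 0.25, 0.125], the single coefficient a = [0.5] predicts every sample after the first exactly, so the residual is [1.0, 0.0, 0.0, 0.0]. Squaring and summing this residual gives the error that the next slide minimizes.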
-
If the error over a segment of speech is defined as
$$E_n = \sum_{m=M_1}^{M_2} e_n^2(m) = \sum_{m=M_1}^{M_2} \Big( s_n(m) - \sum_{k=1}^{p} a_k\, s_n(m-k) \Big)^2$$
where s_n is the signal starting at time n, then we can find a_k by setting $\partial E_n / \partial a_k = 0$ for k = 1, 2, ..., p, obtaining p equations in p unknowns:
$$\sum_{k=1}^{p} a_k \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m-k) = \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m), \quad 1 \le i \le p$$
The error is at a minimum (not a maximum) when the derivative is zero, because as any a_k moves away from its optimum value, the error increases.
-
Features: LPC
Expanding the squared error:
$$E_n = \sum_{m=M_1}^{M_2} \Big( s_n(m) - \sum_{k=1}^{p} a_k\, s_n(m-k) \Big)^2 = \sum_{m=M_1}^{M_2} \Big[ s_n^2(m) - 2\sum_{k=1}^{p} a_k\, s_n(m)\, s_n(m-k) + \sum_{k=1}^{p}\sum_{r=1}^{p} a_k a_r\, s_n(m-k)\, s_n(m-r) \Big]$$
Differentiating with respect to a_1 and setting the result to zero:
$$\frac{\partial E_n}{\partial a_1} = \sum_{m=M_1}^{M_2} \Big[ -2 s_n(m)\, s_n(m-1) + 2a_1 s_n(m-1)\, s_n(m-1) + 2a_2 s_n(m-1)\, s_n(m-2) + \cdots + 2a_p s_n(m-1)\, s_n(m-p) \Big] = 0$$
Repeat the above for a_2, a_3, ..., a_p.
-
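Generalizing to any a_i:
$$\sum_{m=M_1}^{M_2} \Big[ -2 s_n(m)\, s_n(m-i) + 2\sum_{k=1}^{p} a_k\, s_n(m-i)\, s_n(m-k) \Big] = 0, \quad 1 \le i \le p$$
$$-2\sum_{m=M_1}^{M_2} s_n(m)\, s_n(m-i) + 2\sum_{k=1}^{p} a_k \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m-k) = 0, \quad 1 \le i \le p$$
$$\sum_{k=1}^{p} a_k \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m-k) = \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m), \quad 1 \le i \le p$$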
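The p equations above can be solved directly as a linear system; this direct solution is usually called the covariance method, in contrast to the autocorrelation method on the following slides. A minimal Python sketch, assuming the summation limits M_1 = p and M_2 = N - 1 so that s_n(m - i) never indexes before the start of the frame (the slides leave M_1 and M_2 general). It uses plain Gaussian elimination with partial pivoting; the function name is hypothetical.

```python
def covariance_lpc(s, p):
    """Solve sum_k a_k * phi(i, k) = phi(i, 0) for 1 <= i <= p, where
    phi(i, k) = sum_{m=p}^{N-1} s(m-i) * s(m-k)  (assumed limits M1 = p, M2 = N-1)."""
    N = len(s)

    def phi(i, k):
        return sum(s[m - i] * s[m - k] for m in range(p, N))

    # Augmented matrix [Phi | rhs], rows i = 1..p, columns k = 1..p.
    A = [[phi(i, k) for k in range(1, p + 1)] + [phi(i, 0)] for i in range(1, p + 1)]

    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if A[col][col] == 0:
            raise ValueError("singular normal equations")
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]

    # Back substitution gives a_1 .. a_p.
    a = [0.0] * p
    for i in range(p - 1, -1, -1):
        a[i] = (A[i][p] - sum(A[i][j] * a[j] for j in range(i + 1, p))) / A[i][i]
    return a
```

If the frame is generated exactly by a p-th order recursion, the residual is zero over the summation range and the sketch recovers the generating coefficients; for example, a frame built with s(m) = 1.5 s(m-1) - 0.7 s(m-2) gives a ≈ [1.5, -0.7] for p = 2.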
-
LPC Autocorrelation Method
Autocorrelation: a measure of periodicity in a signal.
Define
$$\phi_n(i,k) = \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m-k)$$
Then we can rewrite the equations as
$$\sum_{k=1}^{p} a_k\, \phi_n(i,k) = \phi_n(i,0), \quad 1 \le i \le p$$
We can solve for a_k using several methods. The most common method in speech processing is the autocorrelation method.
Force the signal to be zero outside the interval 0 ≤ m ≤ N-1:
$$\hat{s}_n(m) = s_n(m)\, w(m)$$
where w(m) is a finite-length window (e.g. Hamming) of length N that is zero for m < 0 and m > N-1, and ŝ_n(m) is the windowed signal (below, s_n(m) denotes this windowed signal). As a result,
-
$$E_n = \sum_{m=0}^{N-1+p} e_n^2(m)$$
Because the signal is set to zero outside the window, the function φ_n(i,k) becomes
$$\phi_n(i,k) = \sum_{m=0}^{N-1+p} s_n(m-i)\, s_n(m-k), \quad 1 \le i \le p,\; 0 \le k \le p$$
and this can be expressed as
$$\phi_n(i,k) = \sum_{m=0}^{N-1-(i-k)} s_n(m)\, s_n(m+i-k), \quad 1 \le i \le p,\; 0 \le k \le p$$
This is identical to the autocorrelation function evaluated at |i-k|, because the autocorrelation function is symmetric, R_n(-k) = R_n(k):
$$R_n(k) = \sum_{m=0}^{N-1-k} s_n(m)\, s_n(m+k), \qquad \phi_n(i,k) = R_n(|i-k|)$$
So, combining these results, the set of equations for a_k becomes:
$$\sum_{k=1}^{p} a_k\, R_n(|i-k|) = R_n(i), \quad 1 \le i \le p$$
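A minimal Python sketch of R_n(k) and of the resulting system, assuming frame is one already-windowed frame given as a list of floats. It shows that the matrix only needs the p + 1 values R_n(0), ..., R_n(p). The function names are hypothetical.

```python
def autocorrelation(frame, p):
    """R_n(k) = sum_{m=0}^{N-1-k} s_n(m) * s_n(m+k) for k = 0..p.
    Lags k >= N give an empty sum, i.e. 0."""
    N = len(frame)
    return [sum(frame[m] * frame[m + k] for m in range(N - k)) for k in range(p + 1)]


def normal_equation_matrix(R, p):
    """Return the p-by-p matrix with entry (i, k) = R_n(|i-k|)
    and the right-hand side [R_n(1), ..., R_n(p)]."""
    matrix = [[R[abs(i - k)] for k in range(p)] for i in range(p)]
    return matrix, R[1:p + 1]
```

For the frame [1, 2, 3] and p = 2, R = [14, 8, 3], so the system is [[14, 8], [8, 14]] a = [8, 3]: symmetric, with equal entries along each diagonal.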
-
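In matrix form, the set of equations $\sum_{k=1}^{p} a_k R_n(|i-k|) = R_n(i)$ looks like this:
$$\begin{bmatrix}
R_n(0) & R_n(1) & R_n(2) & \cdots & R_n(p-1) \\
R_n(1) & R_n(0) & R_n(1) & \cdots & R_n(p-2) \\
R_n(2) & R_n(1) & R_n(0) & \cdots & R_n(p-3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R_n(p-1) & R_n(p-2) & R_n(p-3) & \cdots & R_n(0)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix}
=
\begin{bmatrix} R_n(1) \\ R_n(2) \\ R_n(3) \\ \vdots \\ R_n(p) \end{bmatrix}$$
There is a recursive algorithm to solve this: Durbin's solution.
LPC Durbin's Solution
Solve a Toeplitz (symmetric, with equal elements along each diagonal) matrix for the values of α: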
-
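$$\sum_{k=1}^{p} a_k\, R_n(|i-k|) = R_n(i), \quad 1 \le i \le p$$
$$E^{(0)} = R_n(0)$$
$$k_i = \frac{R_n(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R_n(i-j)}{E^{(i-1)}}, \quad 1 \le i \le p$$
$$\alpha_i^{(i)} = k_i$$
$$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$
$$E^{(i)} = (1 - k_i^2)\, E^{(i-1)}$$
$$a_j = \alpha_j^{(p)}, \quad 1 \le j \le p$$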
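A minimal Python sketch of Durbin's recursion as written above, taking R = [R_n(0), ..., R_n(p)]. Besides a_1, ..., a_p it returns the final error E^(p) and the k_i values, which are often called reflection (PARCOR) coefficients. The function name is hypothetical.

```python
def levinson_durbin(R, p):
    """Durbin's recursion for sum_k a_k R(|i-k|) = R(i), 1 <= i <= p.
    Returns (a_1..a_p, final error E^(p), [k_1..k_p])."""
    E = R[0]                      # E^(0) = R(0)
    alpha = [0.0] * (p + 1)       # alpha[j] holds alpha_j^(i); index 0 unused
    ks = []
    for i in range(1, p + 1):
        if E == 0:
            raise ValueError("zero prediction error; R(0) must be positive")
        # k_i = (R(i) - sum_{j=1}^{i-1} alpha_j^(i-1) R(i-j)) / E^(i-1)
        k = (R[i] - sum(alpha[j] * R[i - j] for j in range(1, i))) / E
        new = alpha[:]
        new[i] = k                                 # alpha_i^(i) = k_i
        for j in range(1, i):
            new[j] = alpha[j] - k * alpha[i - j]   # update lower-order terms
        alpha = new
        E = (1 - k * k) * E                        # E^(i) = (1 - k_i^2) E^(i-1)
        ks.append(k)
    return alpha[1:], E, ks
```

As a check, R = [14, 8, 3] with p = 2 gives a = [2/3, -1/6] and E^(2) = 55/6, the same a as solving [[14, 8], [8, 14]] a = [8, 3] directly, but in O(p²) operations instead of a general O(p³) solve.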
We can compute the spectral envelope magnitude from the LPC parameters by evaluating the transfer function S(z) at $z = e^{j\omega}$:
$$S(e^{j\omega}) = \frac{G}{A(e^{j\omega})} = \frac{G}{1 - \sum_{k=1}^{p} a_k e^{-j\omega k}}$$
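A minimal Python sketch of this evaluation, returning the envelope in dB at n_points frequencies evenly spaced over [0, π), which is the same grid MATLAB's freqz(b, a, n) uses. The gain G defaults to 1, matching the freqz(1, Alp, ...) call in the MATLAB code; the function name is hypothetical.

```python
import cmath
import math


def lpc_envelope_db(a, G=1.0, n_points=256):
    """20*log10|S(e^{jw})| with S = G / (1 - sum_k a_k e^{-jwk}),
    sampled at w = pi*idx/n_points for idx = 0..n_points-1."""
    env = []
    for idx in range(n_points):
        w = math.pi * idx / n_points
        A = 1 - sum(a_k * cmath.exp(-1j * w * k) for k, a_k in enumerate(a, start=1))
        env.append(20 * math.log10(abs(G / A)))
    return env
```

With a = [0.5], the envelope at ω = 0 is 20·log10(2) ≈ 6.02 dB, since A(1) = 0.5; the single pole at z = 0.5 gives a low-pass shape.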
-
Finding the frequency envelope using the LPC method
for col = 1:nbFrame
    % compute the autocorrelation function up to lag Or:
    rx = zeros(1, Or+1)';
    speech1 = M2(:, col)' + 0.000001;   % tiny offset so a silent frame does not give an all-zero matrix
    for i = 1:Or+1
        rx(i) = rx(i) + speech1(1:n-i+1) * speech1(i:n)';
    end
    % prepare the Or-by-Or Toeplitz autocorrelation matrix:
    covmatrix = zeros(Or, Or);
    for i = 1:Or
        covmatrix(i, i:Or) = rx(1:Or-i+1)';
        covmatrix(i:Or, i) = rx(1:Or-i+1);
    end
    % solve the "normal equations" for the prediction coefficients
    Acoeffs = -covmatrix \ rx(2:Or+1);
    Alp = [1, Acoeffs'];   % LP polynomial A(z)
    % envelope in dB (base-10 logarithm)
    dbenvlp(:, col) = 20*log10(abs(freqz(1, Alp, n*2)'));
end
-
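Dynamic Time Warping (DTW)
The SELP (spectral envelope linear prediction) analysis is evaluated using dynamic time warping.
T = {t1, t2, ..., ti, ..., tn}, R = {r1, r2, ..., ri, ..., rm}
T = test signal (unknown signal)
R = reference signal (known signal)
An m × n cost matrix is created and filled using
C(x, y) = min[C(x + 1, y), C(x + 1, y + 1), C(x, y + 1)] + D(x, y)
where D(x, y) is the local distance between the test and reference frames at grid point (x, y).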
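A minimal Python sketch of this recurrence and of the final decision step (pick the reference word whose template gives the minimum accumulated cost). It assumes each signal is a list of feature vectors (for example, per-frame envelope values) and uses the Euclidean distance math.dist as the local distance D, which the slides do not specify. Rows index test frames and columns index reference frames, and the table is filled backwards from the last cell to C(0, 0), exactly as the recurrence is written. The function names are hypothetical.

```python
import math


def dtw_cost(test, ref, dist=math.dist):
    """Accumulated DTW cost using
    C(x, y) = min[C(x+1, y), C(x+1, y+1), C(x, y+1)] + D(x, y),
    filled from the last cell back to C(0, 0)."""
    n, m = len(test), len(ref)
    INF = float("inf")
    # An extra row/column of infinities acts as the boundary past the end.
    C = [[INF] * (m + 1) for _ in range(n + 1)]
    for x in range(n - 1, -1, -1):
        for y in range(m - 1, -1, -1):
            d = dist(test[x], ref[y])
            if x == n - 1 and y == m - 1:
                C[x][y] = d  # end point: no successor cells
            else:
                C[x][y] = min(C[x + 1][y], C[x + 1][y + 1], C[x][y + 1]) + d
    return C[0][0]


def recognize(test, references):
    """Return the reference word whose stored template has the minimum DTW cost."""
    return min(references, key=lambda word: dtw_cost(test, references[word]))
```

Because a reference frame may be repeated along the warping path, the test sequence [[0], [1], [2]] matches the slower reference [[0], [1], [1], [2]] with a total cost of 0, which is what makes DTW tolerant of differences in speaking rate.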
-
A matrix of m x n is created
-
Dynamic Time Warping (DTW)
-
Dynamic Time Warping (DTW)