Speaker Recognition using Gaussian Mixture Model
DESCRIPTION
This presentation slide deck contains an introduction to the Gaussian mixture model and its application in speaker identification.
TRANSCRIPT
![Page 1: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/1.jpg)
Gaussian Mixture Models (GMM)
04/10/2023
Saurab Dulal
IOE, Pulchowk Campus
![Page 2: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/2.jpg)
Introduction to GMM
• Gaussian: "A Gaussian has a characteristic symmetric 'bell curve' shape that quickly falls off towards 0 (practically)."
• Mixture model: "A mixture model is a probabilistic model which assumes the underlying data belong to a mixture distribution."
![Page 3: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/3.jpg)
Introduction to GMM
• Mathematical description of GMM:

p(x) = w1 p1(x) + w2 p2(x) + w3 p3(x) + … + wn pn(x)

where
p(x) = the mixture density,
w1, w2, …, wn = the mixture weights (mixture coefficients),
pi(x) = the component density functions.

Fig: image showing a best-fit Gaussian curve.
![Page 4: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/4.jpg)
Introduction to GMM
"The most common mixture distribution is the Gaussian (Normal) density function, in which each of the mixture components is a Gaussian distribution, each with its own mean and variance parameters."

p(x) = w1 N(x | µ1, ∑1) + w2 N(x | µ2, ∑2) + … + wn N(x | µn, ∑n)

where the µi's are the means and the ∑i's are the covariance matrices of the individual component probability density functions.

Fig: five Gaussian components G1, …, G5 with weights w1, …, w5.
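The weighted sum above can be evaluated directly. A minimal numpy sketch for the 1-D case follows; the component weights, means, and variances are illustrative values, not taken from the slides:

```python
import numpy as np

def gmm_pdf(x, weights, means, variances):
    """Evaluate p(x) = sum_i w_i N(x | mu_i, sigma_i^2) for 1-D Gaussians."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for w, mu, var in zip(weights, means, variances):
        # Gaussian density N(x | mu, var), weighted by its mixture coefficient
        total += w * np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return total

# Illustrative two-component mixture
weights = [0.4, 0.6]
means = [0.0, 3.0]
variances = [1.0, 0.5]
xs = np.linspace(-5, 10, 1000)
p = gmm_pdf(xs, weights, means, variances)
# Because the weights sum to 1, the density integrates to ~1
print(p.sum() * (xs[1] - xs[0]))
```

Since each component integrates to 1, the mixture is a valid density exactly when the weights are non-negative and sum to 1.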
![Page 5: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/5.jpg)
Fig: Component 1 and Component 2 densities p(x) (left) and the resulting Mixture Model p(x) over x (right).
![Page 6: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/6.jpg)
Fig: Component 1 and Component 2 densities p(x) (left) and the resulting Mixture Model p(x) over x (right).
![Page 7: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/7.jpg)
Fig: Component Models p(x) (left) and the resulting Mixture Model p(x) over x (right).
![Page 8: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/8.jpg)
GMM for Speaker Recognition
Motivation:
• The interpretation that Gaussian components represent some general speaker-dependent spectral shapes
• The capability of Gaussian mixtures to model arbitrary densities
![Page 9: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/9.jpg)
Description of SR using GMM
• Speech Analysis
• Model Description
• Model Interpretation
• Maximum Likelihood Parameter Estimation
• Speaker Identification
![Page 10: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/10.jpg)
Speech Analysis
• Linear predictive coding (LPC)
• Mel-scale filter bank (to reduce noise)

Analysis ends with the generation of cepstral coefficients x1′, x2′, x3′, …, xn′.

A cepstrum is the result of taking the inverse Fourier transform (IFT) of the logarithm of the estimated spectrum of a signal. (In mel-cepstral analysis, a cosine transform is used in place of the IFT.)
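The cepstrum definition above translates into a few lines of numpy; the 100 Hz sine and the 8 kHz sampling rate below are illustrative choices, not from the slides:

```python
import numpy as np

def real_cepstrum(signal):
    """Cepstrum: inverse FFT of the log magnitude spectrum of the signal."""
    spectrum = np.fft.fft(signal)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small epsilon avoids log(0)
    return np.fft.ifft(log_mag).real

# Illustrative signal: a 100 Hz sine sampled at 8 kHz for 50 ms
fs = 8000
t = np.arange(0, 0.05, 1.0 / fs)
x = np.sin(2 * np.pi * 100 * t)
c = real_cepstrum(x)
print(c[:5])
```

In a full MFCC front end, the log magnitudes would first be pooled through the mel filter bank and then passed through a discrete cosine transform rather than the plain inverse FFT shown here.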
![Page 11: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/11.jpg)
2000/05/03
Model Description
Gaussian Mixture Density:

p(x|λ) = Σ_{i=1}^{M} p_i b_i(x)

where x is a D-dimensional random vector, the b_i(x), i = 1, …, M, are the component densities, and the p_i are the mixture weights. Each component density is a D-variate Gaussian:

b_i(x) = ( 1 / ( (2π)^(D/2) |∑_i|^(1/2) ) ) exp( −(1/2) (x − µ_i)′ ∑_i^(−1) (x − µ_i) )

with mean vector µ_i and covariance matrix ∑_i. The speaker model is the parameter set

λ = { p_i, µ_i, ∑_i },  i = 1, …, M.

The covariance matrix can be nodal, grand, or global; a nodal, diagonal covariance is used here.
![Page 12: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/12.jpg)
Choice of Covariance Matrix
• Nodal covariance: one covariance matrix per Gaussian component
• Grand covariance: one covariance matrix for all Gaussian components of a speaker model
• Global covariance: a single covariance matrix shared by all speaker models
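These choices map roughly onto scikit-learn's GaussianMixture covariance_type parameter: "diag" or "full" gives one covariance per component (nodal), while "tied" shares a single covariance across all components (grand); there is no direct option for a covariance shared across separate models (global). A sketch on synthetic 2-D data (the data and cluster positions are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Three illustrative 2-D clusters standing in for acoustic feature classes
data = np.vstack([
    rng.normal([0, 0], 0.5, (150, 2)),
    rng.normal([4, 0], 0.5, (150, 2)),
    rng.normal([2, 3], 0.5, (150, 2)),
])

# Nodal, diagonal: one per-feature variance vector for each component
nodal = GaussianMixture(n_components=3, covariance_type="diag",
                        random_state=0).fit(data)
# Grand: one full covariance matrix shared by all components
grand = GaussianMixture(n_components=3, covariance_type="tied",
                        random_state=0).fit(data)

print(nodal.covariances_.shape)  # (3, 2): one diagonal per component
print(grand.covariances_.shape)  # (2, 2): a single shared matrix
```

The "diag" setting matches the nodal, diagonal choice the slides say is used for speaker modeling.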
![Page 13: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/13.jpg)
Model Interpretation
• Intuitive notion: acoustic classes (vowels, nasals, fricatives) reflect some general speaker-dependent vocal tract configurations that are useful for characterizing speaker identity
• GMMs have the ability to form smooth approximations to arbitrarily shaped densities
• They capture not only a smooth approximation but also the multimodal nature of the densities
![Page 14: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/14.jpg)
ML Parameter Estimation (Maximum Likelihood)
Steps:
1. Begin with an initial model λ.
2. Estimate a new model λ̄ such that p(X|λ̄) ≥ p(X|λ).
3. Repeat step 2 until a convergence threshold is reached.
![Page 15: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/15.jpg)
On each iteration, the parameters are re-estimated as follows:

(Mixture weights)  p̄_i = (1/T) Σ_{t=1}^{T} p(i | x_t, λ)

(Means)  µ̄_i = ( Σ_{t=1}^{T} p(i | x_t, λ) x_t ) / ( Σ_{t=1}^{T} p(i | x_t, λ) )

(Variances)  σ̄_i² = ( Σ_{t=1}^{T} p(i | x_t, λ) x_t² ) / ( Σ_{t=1}^{T} p(i | x_t, λ) ) − µ̄_i²

where the a posteriori probability of component i is the component density over the mixture density:

p(i | x_t, λ) = p_i b_i(x_t) / Σ_{k=1}^{M} p_k b_k(x_t)

and σ_i², x_t, and µ_i refer to arbitrary elements of the vectors σ_i²′, x_t′, and µ_i′.
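The re-estimation formulas above translate almost line for line into numpy. This sketch runs EM on 1-D data; the two-component data and the initialization are illustrative, not from the slides:

```python
import numpy as np

def em_step(x, p, mu, var):
    """One EM iteration of the re-estimation formulas (1-D, diagonal case)."""
    # Component densities b_i(x_t), shape (T, M)
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    num = p * dens                                  # p_i * b_i(x_t)
    post = num / num.sum(axis=1, keepdims=True)     # a posteriori p(i | x_t, lambda)
    T = len(x)
    p_new = post.sum(axis=0) / T                                        # mixture weights
    mu_new = (post * x[:, None]).sum(axis=0) / post.sum(axis=0)         # means
    var_new = ((post * x[:, None] ** 2).sum(axis=0)
               / post.sum(axis=0)) - mu_new ** 2                        # variances
    return p_new, mu_new, var_new

rng = np.random.default_rng(1)
# Illustrative bimodal data: true means -2 and 4
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(4, 0.7, 300)])
p, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    p, mu, var = em_step(x, p, mu, var)
print(np.sort(mu))  # the estimated means approach the true values -2 and 4
```

Each step is guaranteed not to decrease p(X|λ), which is the monotone improvement property stated in step 2 above.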
![Page 16: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/16.jpg)
Fig: ANEMIA PATIENTS AND CONTROLS (Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume).
![Page 17: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/17.jpg)
Fig: EM ITERATION 1 (Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume).
![Page 18: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/18.jpg)
Fig: EM ITERATION 3 (Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume).
![Page 19: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/19.jpg)
Fig: EM ITERATION 5 (Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume).
![Page 20: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/20.jpg)
Fig: EM ITERATION 10 (Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume).
![Page 21: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/21.jpg)
Fig: EM ITERATION 15 (Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume).
![Page 22: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/22.jpg)
Fig: EM ITERATION 25 (Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume).
![Page 23: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/23.jpg)
Fig: LOG-LIKELIHOOD AS A FUNCTION OF EM ITERATIONS (log-likelihood vs. EM iteration, 0 to 25).
![Page 24: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/24.jpg)
Fig: ANEMIA DATA WITH LABELS (Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume, with the Anemia Group and Control Group labeled).
![Page 25: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/25.jpg)
Speaker Identification
A group of speakers S = {1, 2, …, S} is represented by GMMs λ1, λ2, …, λS. The objective is to find the speaker model with the maximum a posteriori probability for a given observation sequence X = {x1, …, xT}:

Ŝ = argmax_{1≤k≤S} Pr(λk | X) = argmax_{1≤k≤S} p(X | λk) Pr(λk) / p(X)

Assuming equally likely speakers, and noting that p(X) is the same for all speaker models, this reduces to

Ŝ = argmax_{1≤k≤S} p(X | λk)

Taking the logarithm and assuming independence between observations:

Ŝ = argmax_{1≤k≤S} Σ_{t=1}^{T} log p(x_t | λk)

in which p(x_t | λk) = Σ_{i=1}^{M} p_i b_i(x_t).
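The decision rule above (sum the per-frame log-likelihoods under each speaker model, then take the argmax) can be sketched with scikit-learn GMMs standing in for hand-trained models. The "speakers" here are synthetic Gaussian blobs playing the role of cepstral feature vectors; the names, cluster positions, and model sizes are all illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Illustrative training features for two "speakers" (stand-ins for cepstra)
train = {
    "alice": rng.normal(loc=0.0, scale=1.0, size=(500, 4)),
    "bob": rng.normal(loc=3.0, scale=1.0, size=(500, 4)),
}

# One GMM per speaker, with nodal diagonal covariances as in the slides
models = {
    name: GaussianMixture(n_components=4, covariance_type="diag",
                          random_state=0).fit(feats)
    for name, feats in train.items()
}

def identify(x_test, models):
    """Return the speaker whose model maximizes sum_t log p(x_t | lambda_k)."""
    scores = {name: m.score_samples(x_test).sum() for name, m in models.items()}
    return max(scores, key=scores.get)

# A test utterance drawn from the same distribution as "bob"
test_utterance = rng.normal(loc=3.0, scale=1.0, size=(100, 4))
print(identify(test_utterance, models))
```

score_samples returns the per-frame log-likelihood log p(x_t | λk), so summing it implements the final equation above directly.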
![Page 26: Speaker Recognition using Gaussian Mixture Model](https://reader033.vdocuments.mx/reader033/viewer/2022061206/54825961b47959d80c8b4751/html5/thumbnails/26.jpg)
References
• D. A. Reynolds and R. C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models", IEEE Trans. on Speech and Audio Processing, vol. 3, no. 1, pp. 72–83, January 1995.
• http://en.wikipedia.org/wiki/Probability_density_function
• http://crsouza.blogspot.com/2010/10/gaussian-mixture-models-and-expectation.html
• https://www.ll.mit.edu/mission/communications/ist/publications/0802_Reynolds_Biometrics-GMM.pdf
• http://statweb.stanford.edu/~tibs/stat315a/LECTURES/em.pdf
• http://eprints.pascal-network.org/archive/00008291/01/SoftAssignReconstr_ICIP2011.pdf
• http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html