CS 224S / LINGUIST 285: Spoken Language Processing
Dan Jurafsky, Stanford University
Spring 2014
Lecture 15: Speaker Recognition
Lots of slides thanks to Douglas Reynolds
Why speaker recognition?
Access Control: physical facilities; websites, computer networks
Transaction Authentication: telephone banking; remote credit card purchase
Law Enforcement: forensics; surveillance
Speech Data Mining: meeting summarization; lecture transcription
slide text from Douglas Reynolds
Three Speaker Recognition Tasks
slide from Douglas Reynolds
Two kinds of speaker verification
Text-dependent: users have to say something specific; easier for the system
Text-independent: users can say whatever they want; more flexible but harder
Two phases to speaker detection
slide from Douglas Reynolds
Detection: Likelihood Ratio
Two-class hypothesis test:
H0: X is not from the hypothesized speaker
H1: X is from the hypothesized speaker
Choose the more likely hypothesis via a likelihood ratio test:
Λ(X) = p(X|H1) / p(X|H0); accept H1 if Λ(X) ≥ θ (a decision threshold), else accept H0
slide from Douglas Reynolds
Speaker ID: Log-Likelihood Ratio Score
LLR = Λ = log p(X|H1) − log p(X|H0)
Need two models:
a hypothesized speaker model for H1
an alternative (background) model for H0
slide from Douglas Reynolds
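As a concrete illustration, the LLR score can be computed by training two generative models and subtracting average per-frame log-likelihoods. A minimal sketch using scikit-learn's GaussianMixture on synthetic "MFCC" frames; the data, dimensions, and mixture sizes here are all made-up toy values, not from the slides:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy "MFCC" frames: target-speaker data, background (impostor) data,
# and a test utterance actually spoken by the target.
speaker_frames = rng.normal(loc=1.0, size=(500, 13))
background_frames = rng.normal(loc=0.0, size=(2000, 13))
test_frames = rng.normal(loc=1.0, size=(200, 13))

# Hypothesized-speaker model (H1) and background model (H0).
speaker_gmm = GaussianMixture(n_components=4, random_state=0).fit(speaker_frames)
ubm = GaussianMixture(n_components=8, random_state=0).fit(background_frames)

# score() returns the average per-frame log-likelihood, so this is
# the LLR = log p(X|H1) - log p(X|H0), averaged over frames.
llr = speaker_gmm.score(test_frames) - ubm.score(test_frames)
# A positive LLR favors accepting the hypothesized speaker.
```

In practice the LLR is compared against a tuned threshold rather than zero.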
How do we get H0?
Pool speech from several speakers and train a single model: a universal background model (UBM)
Can train one UBM and use it as the H0 model for all speakers
Should be trained on speech representing the expected impostor speech
Same type of speech as speaker enrollment (modality, language, channel)
Slide adapted from Chu, Bimbot, Bonastre, Fredouille, Gravier, Magrin-Chagnolleau, Meignier, Merlin, Ortega-Garcia, Petrovska-Delacretaz, Reynolds
How to compute P(H|X)?
Gaussian Mixture Models (GMM): the traditional best model for text-independent speaker recognition
Support Vector Machines (SVM): a more recent use of a discriminative model
Form of GMM/HMM depends on application
slide from Douglas Reynolds
GMMs for speaker recognition
A Gaussian mixture model (GMM) represents the feature distribution as a weighted sum of multiple Gaussian densities
Each Gaussian component i has a mean μi, a covariance Σi, and a weight wi
Nicolas Malyska, Sanjeev Mohindra, Karen Lauro, Douglas Reynolds, and Jeremy Kepner
[Figure: model density p(x|λ) over a two-dimensional feature space (Dim 1, Dim 2)]
Recognition Systems: Gaussian Mixture Models
Nicolas Malyska, Sanjeev Mohindra, Karen Lauro, Douglas Reynolds, and Jeremy Kepner
The parameters λ = {wi, μi, Σi} define the mixture density p(x|λ) = Σi wi N(x; μi, Σi)
[Figure: GMM parameters and density over a two-dimensional feature space (Dim 1, Dim 2)]
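The mixture density is just a weighted sum of Gaussian pdfs, and can be evaluated directly. A minimal sketch with hypothetical 2-D parameters (the weights, means, and covariances below are invented for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-D GMM parameters: weights w_i, means mu_i, covariances Sigma_i.
weights = np.array([0.5, 0.3, 0.2])
means = np.array([[0.0, 0.0], [3.0, 1.0], [-2.0, 2.0]])
covs = [np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 2.0])]

def gmm_density(x):
    """p(x|lambda) = sum_i w_i * N(x; mu_i, Sigma_i)."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

p = gmm_density([0.0, 0.0])   # density near the first component's mean
```

Real systems evaluate this in the log domain for numerical stability.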
Recognition Systems: Gaussian Mixture Models
Nicolas Malyska, Sanjeev Mohindra, Karen Lauro, Douglas Reynolds, and Jeremy Kepner
[Figure: the model components and parameters that make up the density p(x|λ)]
GMM training
During training, the system learns from the data it will use to make decisions
A set of features is collected from a speaker (or language or dialect)
Nicolas Malyska, Sanjeev Mohindra, Karen Lauro, Douglas Reynolds, and Jeremy Kepner
[Figure: training features x1, x2, … in a two-dimensional space, and the resulting model density p(x|λ)]
Recognition Systems for Language, Dialect, Speaker ID
Nicolas Malyska, Sanjeev Mohindra, Karen Lauro, Douglas Reynolds, and Jeremy Kepner
[Figure: one target model per class (Model 1, Model 2, Model 3), each with its own parameters and density p(x|λC), over a two-dimensional feature space]
In LID, DID, and SID, we train a target model λC for each language, dialect, or speaker C
Recognition Systems: Universal Background Model
Nicolas Malyska, Sanjeev Mohindra, Karen Lauro, Douglas Reynolds, and Jeremy Kepner
We also train a universal background model representing all speech, with density p(x|λUBM)
[Figure: the UBM over a two-dimensional feature space]
Recognition Systems: Hypothesis Test
Given a set of test observations, we perform a hypothesis test to determine whether a certain class produced it
Nicolas Malyska, Sanjeev Mohindra, Karen Lauro, Douglas Reynolds, and Jeremy Kepner
H0: Xtest is from the hypothesized class
H1: Xtest is not from the hypothesized class
Xtest = {x1, x2, …, xK}
[Figure: the test observations in a two-dimensional feature space]
Recognition Systems: Hypothesis Test
Given a set of test observations, we perform a hypothesis test to determine whether a certain class produced it
Nicolas Malyska, Sanjeev Mohindra, Karen Lauro, Douglas Reynolds, and Jeremy Kepner
H0: Xtest is from the hypothesized class
H1: Xtest is not from the hypothesized class
Xtest = {x1, x2, …, xK}
H0 or H1? Compare the likelihood p(Xtest|λ1) under the target model against p(Xtest|λUBM) under the background model
[Figure: the test observations scored against the target model and the background model]
Recognition Systems: Hypothesis Test
Given a set of test observations, we perform a hypothesis test to determine whether a certain class produced it
Nicolas Malyska, Sanjeev Mohindra, Karen Lauro, Douglas Reynolds, and Jeremy Kepner
Xtest = {x1, x2, …, xK}
Compare p(Xtest|λ1): "Dan?" against p(Xtest|λUBM): "UBM (not Dan)?"
[Figure: the test observations scored against Dan's model and the UBM]
More details on GMMs
Instead of training the speaker model on only speaker data, adapt the UBM to that speaker
Takes advantage of all the data
MAP adaptation: the new mean of each Gaussian is a weighted mix of the UBM mean and the speaker data
Weight the speaker data more if we have more of it:
μ̂i = αi Ei(x) + (1 − αi) μi,  with αi = ni / (ni + 16)
Gaussian mixture models
Features are standard MFCCs
Can use more dimensions (20 + deltas)
UBM background model: 512–2048 mixture components
Speaker's GMM: 64–256 mixture components
Often combined with other classifiers in a mixture of experts
SVM
Train a one-versus-all discriminative classifier
Various kernels
Combine with GMM
Other features
Prosody
Phone sequences
Language model features
Doddington (2001)
Word bigrams can be very informative about speaker identity
Evaluation Metric
Trial: are a pair of audio samples spoken by the same person?
Two types of errors:
False reject (miss): incorrectly reject a true trial (Type I error)
False accept: incorrectly accept a false trial (Type II error)
Performance is a trade-off between these two errors
Controlled by adjusting the decision threshold
slide from Douglas Reynolds
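The threshold-controlled trade-off can be illustrated by sweeping a decision threshold over toy score distributions. The scores below are synthetic stand-ins for system LLRs, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical LLR scores: true trials score higher than impostor trials.
target_scores = rng.normal(loc=2.0, scale=1.0, size=1000)
impostor_scores = rng.normal(loc=-2.0, scale=1.0, size=1000)

def error_rates(threshold):
    miss = np.mean(target_scores < threshold)    # false reject rate
    fa = np.mean(impostor_scores >= threshold)   # false accept rate
    return miss, fa

# Raising the threshold trades false accepts for misses.
low_miss, low_fa = error_rates(-1.0)    # lenient: few misses, many false accepts
high_miss, high_fa = error_rates(1.0)   # strict: many misses, few false accepts
```

Sweeping the threshold over all values traces out the ROC/DET curve described on the next slides.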
ROC and DET curves
slide from Douglas Reynolds
P(false reject) vs. P(false accept) shows system performance
DET curve
slide from Douglas Reynolds
Application operating point depends on relative costs of the two errors
Evaluation tasks
slide from Douglas Reynolds
Performance numbers depend on evaluation conditions
Rough historical trends in performance
slide from Douglas Reynolds
Milestones in the NIST SRE Program
1992 – DARPA: limited speaker ID evaluation
1996 – First SRE in the current series
2000 – AHUMADA Spanish data, first non-English speech
2001 – Cellular data
2001 – ASR transcripts provided
2002 – FBI “forensic” database
2005 – Multiple languages with bilingual speakers
2005 – Room mic recordings, cross-channel trials
2008 – Interview data
2010 – New decision cost function: lower FA-rate region
2010 – High and low vocal effort, aging
2011 – Broad range of conditions, including noise and reverb
From Alvin Martin’s 2012 talk on the NIST SR Evaluations
Metrics
Equal Error Rate
Easy to understand
Not the operating point of interest
FA rate at a fixed miss rate (e.g. 10%)
May be viewed as the cost of listening to false alarms
Decision Cost Function
From Alvin Martin’s 2012 talk on the NIST SR Evaluations
Decision Cost Function CDet
Weighted sum of miss and false alarm error probabilities:
CDet = CMiss × PMiss|Target × PTarget + CFalseAlarm × PFalseAlarm|NonTarget × (1 − PTarget)
Parameters are the relative costs of detection errors, CMiss and CFalseAlarm, and the a priori probability of the specified target speaker, PTarget:
’96–’08: CMiss = 10, CFalseAlarm = 1, PTarget = 0.01
2010: CMiss = 1, CFalseAlarm = 1, PTarget = 0.001
From Alvin Martin’s 2012 talk on the NIST SR Evaluations
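The cost function above is a one-liner; this sketch evaluates it under the two parameter settings from the table. The example miss and false-alarm rates are invented to show how the same system scores very differently under the two settings:

```python
def c_det(p_miss, p_fa, c_miss, c_fa, p_target):
    """NIST detection cost: weighted sum of miss and false-alarm probabilities."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

# Hypothetical operating point: 10% miss, 2% false alarm.
cost_old = c_det(0.10, 0.02, c_miss=10, c_fa=1, p_target=0.01)    # '96-'08 setting
cost_2010 = c_det(0.10, 0.02, c_miss=1, c_fa=1, p_target=0.001)   # 2010 setting
```

Under the 2010 parameters the false-alarm term dominates almost entirely, which is why that evaluation pushed systems toward the low-FA region of the DET curve.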
Accuracies
From Alvin Martin’s 2012 talk on the NIST SR Evaluations
How good are humans?
Survey of 2000 voice IDs made by trained FBI employees, who:
select similarly pronounced words
use spectrograms (comparing formants, pitch, timing)
listen back and forth
Evaluated based on "interviews and other evidence in the investigation" and legal conclusions:
No decision: 65.2% (1304)
Non-match: 18.8% (378); false rejects = 0.53% (2)
Match: 15.9% (318); false accepts = 0.31% (1)
Bruce E. Koenig. 1986. Spectrographic voice identification: A forensic survey. J. Acoust. Soc. Am, 79(6)
Speaker diarization
Conversational telephone speech: 2 speakers
Broadcast news: many speakers, although often in dialogue (interviews) or in sequence (broadcast segments)
Meeting recordings: many speakers, lots of overlap and disfluencies
Tranter and Reynolds 2006
Speaker diarization
Tranter and Reynolds 2006
Step 1: Speech Activity Detection
Meetings or broadcast:
Use supervised GMMs with two models, speech/non-speech (or extra models for music, etc.)
Then do Viterbi segmentation, possibly with minimum-length constraints or smoothing rules
Telephone:
Simple energy/spectrum speech activity detection
State of the art:
Broadcast: 1% miss, 1–2% false alarm
Meeting: 2% miss, 2–3% false alarm
Tranter and Reynolds 2006
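For the telephone case, a minimal energy-based detector might look like the following. The frame length, threshold, and signal are illustrative assumptions, not values from the slide:

```python
import numpy as np

def energy_sad(signal, frame_len=400, threshold_db=-30.0):
    """Label each frame speech/non-speech by log energy relative to the peak frame.

    A minimal stand-in for the simple energy detector used on telephone speech.
    """
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy = np.sum(frames**2, axis=1) + 1e-12
    log_e = 10.0 * np.log10(energy / energy.max())
    return log_e > threshold_db            # True = speech

# Toy signal: silence, then a loud "speech" burst, then silence.
rng = np.random.default_rng(0)
sig = np.concatenate([0.001 * rng.normal(size=4000),
                      1.0 * rng.normal(size=4000),
                      0.001 * rng.normal(size=4000)])
labels = energy_sad(sig)
```

Real detectors add smoothing (hangover) so that brief pauses inside an utterance are not chopped out.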
Step 2: Change Detection
1. Look at adjacent windows of data
2. Calculate the distance between them
3. Decide whether the windows come from the same source
Two common methods:
Likelihood ratio test: within a window, test whether the data is better modeled by one distribution or two. If two, insert a change point and start a new window there; if one, expand the window and check again
KL distance: represent each window by a Gaussian, compare neighboring windows with the KL distance, find peaks in the distance function, and threshold
Tranter and Reynolds 2006
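The second method can be sketched for a 1-D feature stream: fit a Gaussian to each adjacent window, compute the symmetric KL distance, and look for a peak. The window size and data below are illustrative assumptions:

```python
import numpy as np

def symmetric_kl(a, b):
    """Symmetric KL divergence between 1-D Gaussians fit to windows a and b."""
    m1, v1 = a.mean(), a.var() + 1e-9
    m2, v2 = b.mean(), b.var() + 1e-9
    return 0.5 * ((v1 / v2 + v2 / v1 - 2.0)
                  + (m1 - m2)**2 * (1.0 / v1 + 1.0 / v2))

def change_curve(x, win=100):
    """Distance between adjacent windows at every candidate change point."""
    return np.array([symmetric_kl(x[t - win:t], x[t:t + win])
                     for t in range(win, len(x) - win)])

# Toy 1-D "feature" stream with a speaker change (mean shift) halfway through.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
d = change_curve(x)
peak = np.argmax(d) + 100   # offset back to stream coordinates
```

Real systems work on multi-dimensional cepstral vectors and threshold the peaks rather than taking a single argmax.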
Step 3: Gender Classification
Supervised GMMs
If doing broadcast news, also do bandwidth classification (studio wideband speech versus narrowband telephone speech)
Tranter and Reynolds 2006
Step 4: Clustering
Hierarchical agglomerative clustering:
1. Initialize leaf clusters of the tree with speech segments
2. Compute pair-wise distances between each cluster
3. Merge the closest clusters
4. Update distances of the remaining clusters to the new cluster
5. Iterate steps 2–4 until the stopping criterion is met
Tranter and Reynolds 2006
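The five steps above can be sketched directly, using Euclidean distance between cluster mean vectors and a target cluster count as the stopping criterion. Both choices are simplifying assumptions; real diarization systems typically use BIC or GLR distances and a threshold-based stop:

```python
import numpy as np

def agglomerate(segments, n_clusters=2):
    """Hierarchical agglomerative clustering over speech segments.

    Each cluster is (list of segment indices, stacked feature frames).
    Repeatedly merges the closest pair until n_clusters remain.
    """
    clusters = [([i], seg) for i, seg in enumerate(segments)]
    while len(clusters) > n_clusters:
        # Pair-wise distances between cluster mean vectors.
        means = [c[1].mean(axis=0) for c in clusters]
        best, best_d = None, np.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(means[i] - means[j])
                if d < best_d:
                    best, best_d = (i, j), d
        i, j = best
        merged = (clusters[i][0] + clusters[j][0],
                  np.vstack([clusters[i][1], clusters[j][1]]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c[0]) for c in clusters]

# Toy segments from two "speakers" with different feature means.
rng = np.random.default_rng(0)
segs = [rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (40, 2)),
        rng.normal(0, 1, (60, 2)), rng.normal(5, 1, (30, 2))]
result = agglomerate(segs)
```

Segments 0 and 2 should end up in one cluster and 1 and 3 in the other.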
Step 5: Resegmentation
Use the final clusters and non-speech models to resegment the data via Viterbi decoding
Goal:
refine the original segmentation
fix short segments that may have been removed
Tranter and Reynolds 2006
TDOA features
For meetings with multiple microphones
Time-Delay-of-Arrival (TDOA) features:
correlate signals from the mics and figure out the time shift
used to sync up multiple microphones, and as a feature for speaker localization
assumes the speaker doesn't move, so they stay near the same microphone
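A minimal sketch of the correlation step: estimate the delay between two microphone signals from the peak of their cross-correlation. The signals here are synthetic; real systems use the generalized cross-correlation (GCC-PHAT) variant for robustness to reverberation:

```python
import numpy as np

def tdoa(sig_a, sig_b):
    """Estimate the sample delay of sig_b relative to sig_a by cross-correlation."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    return np.argmax(corr) - (len(sig_a) - 1)

# Toy case: microphone B hears the same source 25 samples later.
rng = np.random.default_rng(0)
source = rng.normal(size=1000)
mic_a = source
mic_b = np.concatenate([np.zeros(25), source[:-25]])
delay = tdoa(mic_a, mic_b)
```

The estimated delay (here 25 samples) can be used both to align the channels and as a per-segment localization feature for clustering.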
Evaluation
Systems give start/stop times of speech segments with speaker labels
Non-scoring "collar" of 250 ms on either side of each boundary
DER (Diarization Error Rate) sums:
missed speech (% of speech in the ground truth but not in the hypothesis)
false alarm speech (% of speech in the hypothesis but not in the ground truth)
speaker error (% of speech assigned to the wrong speaker)
Recent mean DER for Multiple Distant Mics (MDM): 8–10%
Recent mean DER for a Single Distant Mic (SDM): 12–18%
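A frame-level sketch of the DER computation. Real scoring tools also apply the 250 ms collar and an optimal mapping between reference and hypothesis speaker labels, both omitted here; the example labels are invented:

```python
import numpy as np

def der(ref, hyp):
    """Frame-level diarization error rate.

    ref/hyp: per-frame labels, 0 = non-speech, 1..N = speaker id.
    DER = (missed + false-alarm + wrong-speaker frames) / reference speech frames.
    """
    ref, hyp = np.asarray(ref), np.asarray(hyp)
    miss = np.sum((ref != 0) & (hyp == 0))
    fa = np.sum((ref == 0) & (hyp != 0))
    spk_err = np.sum((ref != 0) & (hyp != 0) & (ref != hyp))
    return (miss + fa + spk_err) / np.sum(ref != 0)

# Toy example: one missed frame, one false alarm, one wrong-speaker frame,
# out of 6 reference speech frames.
ref = [0, 1, 1, 1, 2, 2, 2, 0, 0, 0]
hyp = [0, 1, 1, 2, 2, 2, 0, 0, 0, 1]
rate = der(ref, hyp)
```

Note that DER can exceed 100%, since false alarms are counted against reference speech time.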
Summary: Speaker Recognition Tasks
slide from Douglas Reynolds