speech processing lab uchechukwu ofoegbu, dissertation defense 10/24/2015 1 model formation and...

Speech Processing Lab Uchechukwu Ofoegbu, Dissertation Defense

04/20/2311

Model Formation and Classification Techniques for Conversation-based Speaker Discrimination

Uchechukwu O. Ofoegbu

Advisor: Robert Yantorno, Ph.D

Committee Members: Brian Butz, Ph.D.; Dennis Silage, Ph.D.; Iyad Obeid, Ph.D.


Acknowledgement

• The audience, for being a part of this
• ECE faculty and staff, for your great support
• Members and friends of the Speech Lab, for your valuable contributions
• The Air Force Research Labs, for financially supporting most of this research work
• My committee members, for your time and commitment to my research
• Dr. Y, the best advisor one could hope for
• My family, for being there


Presentation Outline

• Introduction
– Challenges of Conversational Data
– General Applications of Research
– Novelty of Research
• Evaluation Databases
– HTIMIT
– SWITCHBOARD
– New Conversations Database
• Modeling Speakers
– Traditional Speaker Modeling
– Proposed Method
– Features Used
– Distance Used
• Application Systems
– Unsupervised Speaker Indexing
– Speaker Count
– Generalized Speaker Indexing
• Fusion of Distance Measures
– “Optimized T” Distance
– Decision-Based Combination
– Weighted Decision-Based Combination
• Summary
• Further Research


Introduction


Challenges of Conversational Data

• No a priori information available from participating speakers
– Training is impossible
• No a priori knowledge of change points
• Speakers alternate very rapidly
– Limited amounts of data for single-speaker representations
• Distortion
– Channel noise, co-channel data


Proposed Solutions

1. Selective creation of data models

2. Distance-based model comparison

3. Development of application-specific systems


Novelty of this Research

1. Selective creation of data models

2. Distance-based model comparison

3. Development of application-specific systems


Applications

• Monitoring criminal conversations
• Forensics
• Automated customer services
• Storage/search/retrieval of audio data
• Military activities
• Conference calls


Databases

• Standard Speaker Discrimination Databases
– HTIMIT
– Switchboard
• Temple Conversations Database (TCD)


Modeling Speakers


Traditional Speaker Modeling

• Examples
– Gaussian Mixture Models
– Hidden Markov Models
– Neural Networks
– Prosody-Based Models
• Disadvantages
– Require large amounts of data
– Sometimes require a training procedure
– Relatively complex


Conversational Data Modeling

• Current Method
– Equal segmentation of data
– Indiscriminate use of data
• Problems
– Change points unknown
– Not all speech is useful
– Poor performance


Proposed Speaker Modeling

[Diagram: the speech stream is labeled as a sequence of S, V and U units (for example S V U V U V U … U V S). Only the V (voiced) units are kept and grouped into Segment 1 … Segment M. Each segment goes through feature computation, then mean and covariance matrix computation, producing Model 1 … Model M.]
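The model formation step described on this slide (features per segment, then a mean vector and covariance matrix per segment) can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the dissertation: it assumes NumPy, assumes the voiced-only feature vectors are already available as rows of an array, and uses a fixed, hypothetical `frames_per_segment` in place of the slides' phoneme-based grouping.

```python
import numpy as np

def form_models(features, frames_per_segment):
    """Split voiced-frame features into consecutive segments and model each one.

    features: array of shape (num_frames, num_coeffs), voiced frames only.
    frames_per_segment: hypothetical fixed segment length (an assumption;
        the slides group by number of voiced phonemes instead).
    Returns a list of (mean_vector, covariance_matrix) pairs, one per segment.
    """
    features = np.asarray(features, dtype=float)
    models = []
    last_start = len(features) - frames_per_segment
    for start in range(0, last_start + 1, frames_per_segment):
        segment = features[start:start + frames_per_segment]
        mean = segment.mean(axis=0)            # mean feature vector
        cov = np.cov(segment, rowvar=False)    # covariance across coefficients
        models.append((mean, cov))
    return models
```

Each returned pair plays the role of one of Model 1 … Model M; incomplete trailing segments are simply dropped in this sketch.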


Proposed Speaker Modeling

• Why voiced only?
– Same speech class compared
– Contains the most information
• What’s the appropriate number of phonemes?
– Large enough to sufficiently represent speakers
– Small enough to avoid speaker overlap


Features Considered

• Linear Predictive Cepstral Coefficients
– Model the vocal tract
• Mel-Scale Frequency Cepstral Coefficients
– Model the human auditory system


Distance Measurements

• Same speaker distances
• Different speaker distances


Distances Used

• Mahalanobis Distance
• Hotelling’s T-Square Statistics
• Kullback-Leibler Distance
• Bhattacharyya Distance
• Levene’s Test


Analysis of Cepstral Features

• Mahalanobis Distance

[Figure: Mahalanobis Distance Comparisons. Two panels, LPCC-based distances and MFCC-based distances, each plotting probability of occurrence against distance for intra-speaker and inter-speaker comparisons.]


Best Number of Phonemes?

[Figure: Speaker Differentiation with Respect to Data Size. Mahalanobis distance versus number of phonemes for same-speaker and different-speaker comparisons. Features used: LPCC.]


Application Systems


Unsupervised Speaker Indexing

• The Restrained-Relative Minimum Distance (RRMD) Approach

$$
\begin{bmatrix}
0 & D_{1,2} & D_{1,3} & \cdots \\
D_{2,1} & 0 & D_{2,3} & \cdots \\
D_{3,1} & D_{3,2} & 0 & \cdots \\
\vdots & & & \ddots
\end{bmatrix}
$$

[Diagram: reference models are identified from the matrix of pairwise model distances.]


Unsupervised Speaker Indexing

• The Restrained-Relative Minimum Distance (RRMD) Approach

[Flowchart: observe the distance from a model to Reference 1 and to Reference 2 and take the minimum distance. Same speaker? Apply the restraining condition: if passed, same speaker. If failed, apply the relative distance condition: if passed, same speaker; if failed, unusable data.]


RRMD Approach

• Restraining Condition
– Distance Likelihood Ratio

$$\mathrm{DLR} = \frac{f(x \mid \mu_1, \sigma_1)}{f(x \mid \mu_2, \sigma_2)}$$

DLR > 1 ⇒ Same Speaker
DLR < 1 ⇒ Check Relative Distance Condition
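As a rough illustration of the restraining condition, the distance likelihood ratio can be computed as the ratio of two densities evaluated at the observed distance. The sketch below is only an assumption-laden example: it takes both densities to be Gaussian, treats the numerator as the same-speaker distance distribution and the denominator as the different-speaker one, and uses made-up parameter values. The function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution (assumed form of f)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def distance_likelihood_ratio(x, same_params, diff_params):
    """Ratio of same-speaker to different-speaker likelihood of distance x.

    same_params, diff_params: (mean, std) tuples, assumed to be estimated
    beforehand from labeled same-speaker and different-speaker distances.
    """
    return gaussian_pdf(x, *same_params) / gaussian_pdf(x, *diff_params)
```

With illustrative parameters `(1.0, 0.5)` and `(3.0, 0.5)`, a small observed distance gives a ratio above 1 (same speaker), and a large one gives a ratio below 1, which would hand the decision over to the relative distance condition.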


• Relative Distance Condition
– Relative Distance: Drel = dmax – dmin
– Drel > threshold ⇒ Same Speaker

[Figure: dmin and dmax marked on the distance axis.]
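The two RRMD conditions can be combined into a single decision, sketched below. This is a hedged reading of the slides rather than the author's code: it assumes `d_min` and `d_max` are a model's distances to the nearer and farther reference model, that the DLR has already been computed for the minimum distance, and that `threshold` is a tuning parameter. The function name and return labels are hypothetical.

```python
def rrmd_decision(dlr, d_min, d_max, threshold):
    """Decide whether a model belongs to its nearest reference speaker.

    dlr: distance likelihood ratio for the minimum distance.
    d_min, d_max: distances to the nearer and farther reference (assumed).
    threshold: relative-distance threshold (a tuning parameter).
    """
    if dlr > 1.0:
        # Restraining condition passed.
        return "same speaker"
    if (d_max - d_min) > threshold:
        # Relative distance condition passed: clearly closer to one reference.
        return "same speaker"
    # Both conditions failed: the model is not assigned.
    return "unusable"
```

Raising the threshold sends more models to the "unusable" outcome, which is presumably the trade-off the indexing-error and undecided-error curves in the results describe.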


Experiments and Results

• Experiments
– HTIMIT used for obtaining likelihood ratio parameters
• 1000 same speaker and 1000 different speaker utterances computed
– 100 conversations from Switchboard database used for evaluation


Indexing Results - Mahalanobis

[Figures, one panel each for LPCC and MFCC: percent error versus threshold for the Mahalanobis distance, showing indexing error and undecided error, with the equal-error point marked. Panel titles: "Classification Error with Respect to Relative Distance Threshold" and "Classification Error with Respect to Mean-Difference Threshold".]

Indexing Results – T-Square

[Figures, one panel each for LPCC and MFCC: percent error versus threshold for the T-Square statistic, showing indexing error and undecided error. Panel titles: "Classification Error with Respect to Mean-Difference Threshold" and "Classification Error with Respect to Relative Distance Threshold".]


Indexing Results - Bhattacharyya

[Figures, one panel each for LPCC and MFCC: percent error versus threshold for the Bhattacharyya distance, showing indexing error and undecided error, with the equal-error point marked in one panel. Both panels are titled "Classification Error with Respect to Mean-Difference Threshold".]


Indexing Results - Summary

• Mahalanobis distance yielded best results
• LPCCs outperformed MFCCs


Speaker Count System

• The Residual Ratio Algorithm (RRA)
• Process is repeated K-1 times for counting up to K speakers

[Flowchart: a reference model is selected randomly, followed by DLR-based model comparison; if too little data is removed, another model is selected as reference; the DLR-based model comparison is then repeated in the following rounds.]


Speaker Count

[Figure: ARR Probability Distributions. Probability versus ARR, shown cumulatively for 1 and 2 speakers, then 1 to 3 speakers, then 1 to 4 speakers.]

Added Residual Ratio (ARR):

• Is the sum of the residual ratios in all elimination stages
• Should be higher for a greater number of speakers


Experiments and Results

• Experiments
– 4000 conversations generated from HTIMIT
– All 40 conversations from new database used


Speaker Count Results - HTIMIT

[Figures, one panel each for LPCC and MFCC: RRA Speaker Count Accuracy. Percent correct for the accuracy methods "1 or 2+", "1, 2 or 3+" and "1, 2, 3 or 4", comparing the Bhattacharyya, T-Square and Mahalanobis distances.]


Speaker Count Results - HTIMIT

[Figures, one panel each for LPCC and MFCC: Speaker Count Accuracy. Percent correct for the accuracy methods "1 or 2+", "1, 2 or 3+" and "1, 2, 3 or 4", comparing Mahalanobis, OTD, DBC and WDBC.]


Speaker Count Results – TCD

[Figures, one panel each for LPCC and MFCC: RRA Speaker Count Accuracy. Percent correct for the accuracy methods "1 or 2+", "1, 2 or 3+" and "1, 2, 3 or 4", comparing the Bhattacharyya, T-Square and Mahalanobis distances.]


Speaker Count Results – TCD

[Figures, one panel each for LPCC and MFCC: Speaker Count Accuracy. Percent correct for the accuracy methods "1 or 2+", "1, 2 or 3+" and "1, 2, 3 or 4", comparing T-Square, OTD, DBC and WDBC.]


Cross Evaluation

• HTIMIT – LPCCs with the WDBC
• TCD – MFCCs with the T-Square

[Figure: Speaker Count Accuracy - Cross Evaluation. Percent correct for the accuracy methods "1 or 2+", "1, 2 or 3+" and "1, 2, 3 or 4", for HTIMIT and TCD.]


Speaker Counting-Indexing

• The Residual Ratio speaker count algorithm is applied
• Test models are associated with their matching reference models
• Each unmatched model is assigned to the reference from which it has the minimum distance
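The last step above, assigning each unmatched model to its nearest reference, amounts to an arg-min over distances. A minimal sketch, assuming the distances from every unmatched model to every reference have already been computed; the dictionary layout and the function name are illustrative assumptions only.

```python
def assign_unmatched(distances_to_references):
    """Map each unmatched model to the index of its closest reference.

    distances_to_references: dict of model_id -> list of distances,
        one distance per reference model (hypothetical layout).
    """
    assignment = {}
    for model_id, distances in distances_to_references.items():
        # Index of the smallest distance, i.e. the nearest reference.
        nearest = min(range(len(distances)), key=distances.__getitem__)
        assignment[model_id] = nearest
    return assignment
```

For example, a model with distances `[2.0, 0.5, 1.0]` to three references would be assigned to reference index 1.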


Speaker Counting/Indexing Results

[Figure: Speaker Indexing Accuracy. Percent accuracy for Bhattacharyya, T-Square, Mahalanobis, OTD, DBC and WDBC. Solid bars: HTIMIT; patterned bars: TCD.]




Correlation Analysis

[Figure: Draftsman’s Display of Distances (LPCC). Pairwise scatter plots of the Levene, Bhattacharyya, KL, T-Squared and Mahalanobis distances for intra-speaker and inter-speaker comparisons.]


“Best Distance”

• Optimal Criteria for Fusion of Distances
– Maximize inter-speaker variation
– Minimize intra-speaker variation
– Maximize T-test value between inter-class distance distributions
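One way to quantify the third criterion is a two-sample t statistic between the intra-speaker and inter-speaker distance samples. The sketch below uses the unequal-variance (Welch) form, which is an assumption; the slides do not say which variant of the t-test was used, and the function name is hypothetical.

```python
import math
import statistics

def t_value(intra, inter):
    """Two-sample t statistic between intra- and inter-speaker distances.

    intra, inter: sequences of distances (at least two values each).
    Uses the unequal-variance (Welch) form, an assumption for illustration.
    """
    mean_intra = statistics.mean(intra)
    mean_inter = statistics.mean(inter)
    var_intra = statistics.variance(intra)   # sample variance
    var_inter = statistics.variance(inter)
    spread = math.sqrt(var_intra / len(intra) + var_inter / len(inter))
    return abs(mean_inter - mean_intra) / spread
```

A larger value indicates better separation between the two distance distributions, so a distance measure, or a combination of measures, with a higher value would be preferred under this criterion.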


Decision Level Fusion

D1 => match
D2 => no match

Match = ¾
No Match = ¼

Final Decision = Match
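The decision-level fusion example on this slide is a majority vote over the per-distance decisions. A minimal sketch, assuming each distance measure has already produced a boolean match or no-match decision; the function name is hypothetical.

```python
def decision_level_fusion(decisions):
    """Majority vote over per-distance match decisions.

    decisions: list of booleans, True meaning that distance says "match".
    Returns (final_decision, fraction_of_match_votes).
    """
    match_fraction = sum(decisions) / len(decisions)
    return match_fraction > 0.5, match_fraction
```

With three "match" votes out of four, the match fraction is ¾ and the final decision is a match, as in the slide's example. Ties count as no match in this sketch, which is a design choice rather than something the slides specify.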


Weighted Decision Level Fusion

$$w_i = \frac{T_i}{\sum_i T_i}$$

Ti = T-value corresponding to each distance
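In the weighted variant, each distance's vote counts in proportion to its T-value. The sketch below normalizes the T-values into weights that sum to one and compares the total "match" weight against 0.5; the 0.5 cut-off and the function name are assumptions made for illustration.

```python
def weighted_decision_fusion(decisions, t_values):
    """Weighted vote where each distance's weight is its share of the T-values.

    decisions: list of booleans, True meaning that distance says "match".
    t_values: positive T-value for each distance, in the same order.
    Returns (final_decision, total_match_weight).
    """
    total = sum(t_values)
    weights = [t / total for t in t_values]
    match_weight = sum(w for w, d in zip(weights, decisions) if d)
    # Assumed rule: declare a match when the match weight exceeds one half.
    return match_weight > 0.5, match_weight
```

A single distance with a high T-value can therefore outvote two weaker ones: with T-values 6, 2 and 2, a lone "match" from the first distance carries a weight of 0.6.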


Summary


Research Goal

• To differentiate between speakers in a conversation
– To determine the number of speakers present
– To determine who is speaking when
• To overcome the following challenges
– No a priori information
– Limited data size
– No knowledge of change points
– Co-channel speech


Summary of Accomplishments

• Novel model formation technique
• Three novel approaches for conversation-based speaker differentiation
• Distance combination techniques to enhance performance


Observations

• Mahalanobis Distance, LPCCs optimal for standard databases
• T-Square Distance, MFCCs optimal for new database
• Best fusion technique: weighted voting combination


Conclusion

• The developed system yields about 6% EER, whereas state-of-the-art speaker indexing systems yield about a 10% error rate.
• Methods for discrimination between speakers (speaker count or indexing) in CONVERSATIONS with more than two speakers have been introduced.


Further Research


Further Research

• Investigation of prosodic speaker discrimination features
• Improving model formation technique by determining speaker change-points a priori
• Exploring the use of individual phonemes to form models


Further Research, cont’d

• Investigating the use of unvoiced speech, cautiously, in the formation of models
• Speech enhancement techniques to handle distorted data
• Implementation of other fusion techniques such as KL measure of divergence


Publications

• U. Ofoegbu, A. Iyer, R. Yantorno and S. Wenndt, “Unsupervised Indexing of Noisy Conversations with Short Speaker Utterances”, IEEE Aerospace Conference, March 2007.
• U. Ofoegbu, A. Iyer, R. Yantorno, “Detection of a Third Speaker in Telephone Conversations”, ICSLP, INTERSPEECH 2006.
• U. Ofoegbu, A. Iyer, R. Yantorno, “A Simple Approach to Unsupervised Speaker Indexing”, IEEE ISPACS, 2006.
• U. Ofoegbu, A. Iyer, R. Yantorno, “A Speaker Count System for Telephone Conversations”, IEEE ISPACS, 2006.





ACKNOWLEDGEMENT

To the greatest teacher in the world, and the one who has made the most impact in my life, Dr. Robert E. Yantorno.

To my best friend and the love of my life, Dr. Jude C. Abanulo

To Dr. Brian Butz, Dr. John Helferty, Dr. Saroj Biswas and Dr. Henry Sendaula

To my dissertation committee members, Dr. Iyad Obeid and Dr. Dennis Silage, and to Dr. Rena Krakow.

To my friend, Ananth Iyer

To Abdoul Fall, Joe Fitschgrund, Angela Linse and Ralph Oyini; and to the members of the Speech Processing Lab and the faculty of the electrical engineering department

To engineering administrators, Tamika Butler, Carol Dahlberg, Yvette Gibson and Cheryl Sharp, and to Louise, day time janitress for the engineering building

To the Temple students who volunteered as participants in the New Conversations Database

To Temple

To the Air Force Research Labs at Rome – Financial supporters of most of the research

To my parents, Ugo & Joseph Ofoegbu; my siblings, Amaka & Humphrey Onyendi, Nene, Obinna and Chibuzor Ofoegbu; and my grandmother, Cordelia Osuji

To God

Thank you.






Brett Smolenski, Ph.D.


Cepstral Analysis

Frequency Analysis of Speech

• STFT of Speech = Excitation Component × Vocal Tract Component
– Excitation component: fast varying harmonics
– Vocal tract component: slowly varying formants
• Log of STFT = Log of Excitation + Log of Vocal Tract Component
• IDFT of Log of STFT: the vocal tract and excitation contributions appear as separate components


Cepstral Features

• Linear Predictive Cepstral Coefficients
– Obtained recursively from LPC coefficients

Let LPC vector = [a0 a1 a2 … ap] and
LPCC vector = [c0 c1 c2 … cp … cn-1]

$$c_0 = \ln E^2$$

$$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p$$

$$c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad p < m < n$$
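The recursive conversion from LPC coefficients to cepstral coefficients can be sketched as follows. This is the standard textbook recursion, written under stated assumptions rather than taken from the dissertation: it assumes the all-pole convention H(z) = G / (1 − Σ a_k z^−k), takes only a_1 … a_p as input, and accepts the zeroth coefficient `c0` directly instead of committing to a particular gain definition. The function name is hypothetical.

```python
def lpc_to_lpcc(a, c0, n):
    """Convert LPC coefficients to n cepstral coefficients by recursion.

    a: list [a_1, ..., a_p] of predictor coefficients (assumed convention:
       H(z) = G / (1 - sum_k a_k z^-k)).
    c0: zeroth cepstral coefficient, supplied by the caller.
    n: number of cepstral coefficients to return (n >= 1).
    """
    p = len(a)
    c = [0.0] * n
    c[0] = c0
    for m in range(1, n):
        # The direct a_m term exists only while m <= p.
        acc = a[m - 1] if m <= p else 0.0
        # k runs over the terms where a_{m-k} is defined (m - k <= p).
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c
```

As a sanity check, a single-pole model with a_1 = 0.5 should give c_m = 0.5^m / m, which this recursion reproduces.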


Conversational Data Modeling

• Current Method
– Equal segmentation of data
– Indiscriminate use of data
• Problems
– Change points unknown
– Not all speech is useful


• Intra-speaker and inter-speaker distance lengths are always equal, therefore:

$$T = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}$$

For a combination of distances with weight vector a:

$$T(a) = \frac{a^T(\mu_1 - \mu_2)}{\sqrt{a^T P a}}, \qquad a_1 = k\,P^{-1}(\mu_1 - \mu_2)$$

P = sum of the covariance matrices of the two classes.

Q = the square of the distance between the mean vectors of the two classes.

λ1 = maximum eigenvalue obtained by solving the generalized eigenvalue problem:

$$Q a_1 = \lambda_1 P a_1$$
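The generalized eigenvalue problem mentioned on this slide can be solved numerically as sketched below. Several things here are assumptions rather than facts from the slides: NumPy is used, Q is taken to be the outer product of the mean-difference vector with itself, P is the sum of the two class covariance matrices and is assumed invertible, and the function name is hypothetical.

```python
import numpy as np

def optimized_projection(mean1, cov1, mean2, cov2):
    """Largest generalized eigenvalue and eigenvector of Q a = lambda P a.

    mean1, mean2: class mean vectors (e.g. intra- and inter-speaker
        means of a vector of several distance measures).
    cov1, cov2: class covariance matrices.
    """
    diff = np.asarray(mean1, dtype=float) - np.asarray(mean2, dtype=float)
    P = np.asarray(cov1, dtype=float) + np.asarray(cov2, dtype=float)
    Q = np.outer(diff, diff)  # assumed form of the squared mean distance
    # Q a = lambda P a  is equivalent to  (P^-1 Q) a = lambda a.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(P, Q))
    top = int(np.argmax(eigvals.real))
    return float(eigvals.real[top]), eigvecs[:, top].real
```

Because Q has rank one under this assumption, the leading eigenvector is proportional to P⁻¹ applied to the mean difference, which is the familiar Fisher discriminant direction.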


[Figure: Distance Measure 2 plotted against Distance Measure 1.]


• Relative Distance Condition

[Figure: Distribution of T-Square Statistics for N = 5. Probability versus T-Square statistic for intra-speaker and inter-speaker comparisons, with D rel marked.]


Modeling Analysis

[Figure: Distribution of Mahalanobis Distance - Utterance Based. Probability versus distance value for same-speaker and different-speaker comparisons. A second panel plots probability of occurrence versus distance, with legend entries SSDU and DSDU.]

N = 20 – 4 seconds of voiced speech


Modeling Analysis

[Figure: Distributions of Mahalanobis Distance - Segment Based. Probability versus distance value for same-speaker and different-speaker comparisons. A second panel plots probability of occurrence versus distance, with legend entries SSDU and DSDU.]

N = 5 – 1 second of voiced speech


Distance Measures

• Mahalanobis Distance
– Measures the separation between the means of both classes
• Hotelling’s T-Square Statistics
– Measures the separation between the means of both classes and takes into consideration the data lengths
• Kullback-Leibler Distance
– Measures the separation between the distributions of both classes
• Bhattacharyya Distance
– Derived from measuring the classification error between both classes
• Levene’s Test
– Measures absolute deviation from the center of the class distribution
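The first two measures in this list can be illustrated for a pair of mean-and-covariance models. The sketch below uses common textbook forms and is not the dissertation's code: it assumes NumPy, uses the average of the two covariance matrices for the Mahalanobis distance (an assumption), and uses the pooled covariance with the two segment lengths for Hotelling's T-square. The function names are hypothetical.

```python
import numpy as np

def mahalanobis_distance(mean1, cov1, mean2, cov2):
    """Separation between two model means, scaled by the averaged covariance."""
    diff = np.asarray(mean1, dtype=float) - np.asarray(mean2, dtype=float)
    avg_cov = (np.asarray(cov1, dtype=float) + np.asarray(cov2, dtype=float)) / 2.0
    return float(np.sqrt(diff @ np.linalg.solve(avg_cov, diff)))

def hotelling_t_square(mean1, cov1, n1, mean2, cov2, n2):
    """Two-sample Hotelling's T-square; n1 and n2 are the data lengths."""
    diff = np.asarray(mean1, dtype=float) - np.asarray(mean2, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov2 = np.asarray(cov2, dtype=float)
    pooled = ((n1 - 1) * cov1 + (n2 - 1) * cov2) / (n1 + n2 - 2)
    scale = (n1 * n2) / (n1 + n2)
    return float(scale * (diff @ np.linalg.solve(pooled, diff)))
```

The only structural difference between the two is the `scale` factor and the pooling weights, which is how the T-square statistic takes the data lengths into consideration.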


Speaker Recognition

[Diagram: Reference Speech → Feature Extraction → Model Building; Test Speech → Feature Extraction → Comparison → Recognition Decision → System Output.]

• Speaker Identification
– Who is this speaker?
• Speaker Verification
– Is he who he claims to be?


Speaker Segmentation

• Broadcast News/Conference Data
• Conversational Data

[Figures: example waveforms for each data type, amplitude versus time in seconds.]


Procedural Set-up

• 384-speaker database used
• Average utterance length = 5 seconds

Intra-speaker distance computations

[Diagram: utterances from Speaker A → randomly select 2 utterances → window data → compute feature for Utterance 1 and Utterance 2 → compute distance.]

Inter-speaker distance computations

[Diagram: randomly select an utterance from Speaker A and an utterance from Speaker B → compute feature for each utterance → compute distance.]


Best ‘N’ Estimation

• 245 conversations from SWITCHBOARD used
• Results shown for T-Square distance

[Figure: Average Indexing Accuracy with Respect to Number of Voiced Phonemes per Model. Accuracy versus N = number of voiced phonemes, for N from 2 to 20.]

N = 5


RRA Examples – 2 Speakers

[Figure: How the Residual Ratio Algorithm Works for Two-Speaker Conversations. Three panels (initial speech, round 2 residual, round 3 residual) plotted against time in seconds, with the Speaker 1 and Speaker 2 portions labeled.]



RRA Examples – 3 Speakers

[Figure: How the Residual Ratio Algorithm Works for Three-Speaker Conversations. Three panels (initial speech, round 2 residual, round 3 residual) plotted against time in seconds, with the Speaker 1, Speaker 2 and Speaker 3 portions labeled.]



Comparison

[Figure: residual ratio after the 2nd round of RRA, two-speaker residual compared with three-speaker residual.]



Effects of Fusion

LPCCs

[Figure: intra-speaker and inter-speaker probability distributions for each distance (distance feature: LPCC), with the T-test value of each:]

Tmax: 44.3369
Mahalanobis: 44.0652
T-Squared: 35.2111
KL: 22.7672
Bhattacharyya: 33.7449
Levene: 13.4432



LPCCs

[Figure: Increase in Inter-Class Separation as Number of Distances is Increased (features: LPCC). T-test value, from about 44.05 to 44.35, versus number of distances (1 to 5); points labeled Mahalanobis, Levene, Bhattacharyya, KL and T-Squared.]



MFCCs

[Figure: intra-speaker and inter-speaker probability distributions for each distance (distance feature: MFCC), with the T-test value of each:]

Top panel: 39.4524
Mahalanobis: 38.2733
T-Squared: 32.5542
KL: 11.0738
Bhattacharyya: 23.4276
Levene: 16.8735



MFCCs

[Figure: Increase in Inter-Class Separation as Number of Distances is Increased (features: MFCC). T-test value, from about 38.2 to 39.6, versus number of distances (1 to 5); points labeled Mahalanobis, Levene, KL, T-Squared and Bhattacharyya.]



Best Feature Size

[Figure: Analyses of the Effects of Increasing the Size of the Feature Set (Voiced - Segment Based - LPCC). T-statistics versus number of coefficients (3 to 30) for the Mahalanobis, T-Square, KL, Bhattacharyya and Levene distances.]


Best Feature Size

[Figure: Analyses of the Effects of Increasing the Size of the Feature Set (Voiced - Segment Based - MFCC). T-statistics versus number of coefficients (3 to 30) for the Mahalanobis, T-Square, KL, Bhattacharyya and Levene distances.]


Correlation Analysis

[Figure: Draftsman’s Display of Distances (MFCC). Pairwise scatter plots of the Levene, Bhattacharyya, KL, T-Squared and Mahalanobis distances for intra-speaker and inter-speaker comparisons.]

