
Page 1: Speech Processing Lab Uchechukwu Ofoegbu, Dissertation Defense 10/24/2015 1 Model Formation and Classification Techniques For Conversation-based Speaker


Model Formation and Classification Techniques For Conversation-based Speaker Discrimination

Uchechukwu O. Ofoegbu

Advisor: Robert Yantorno, Ph.D.

Committee Members: Brian Butz, Ph.D.; Dennis Silage, Ph.D.; Iyad Obeid, Ph.D.


Acknowledgement

• The audience, for being a part of this
• ECE faculty and staff, for your great support
• Members and Friends of the Speech Lab, for your valuable contributions
• The Air Force Research Labs, for financially supporting most of this research work
• My committee members, for your time and commitment to my research
• Dr Y, the best advisor one could hope for
• My family, for being there


Presentation Outline

• Introduction
  – Challenges of Conversational Data
  – General Applications of Research
  – Novelty of Research
• Evaluation Databases
  – HTIMIT
  – SWITCHBOARD
  – New Conversations Database
• Modeling Speakers
  – Traditional Speaker Modeling
  – Proposed Method
  – Features Used
  – Distance Used
• Application Systems
  – Unsupervised Speaker Indexing
  – Speaker Count
  – Generalized Speaker Indexing
• Fusion of Distance Measures
  – “Optimized T Distance”
  – Decision-Based Combination
  – Weighted Decision-Based Combination
• Summary
• Further Research


Introduction


Challenges of Conversational Data

• No a priori information available from participating speakers
  – Training is impossible
• No a priori knowledge of change points
• Speakers alternate very rapidly
  – Limited amounts of data for single speaker representations
• Distortion
  – Channel noise, co-channel data


Proposed Solutions

1. Selective creation of data models

2. Distance-Based Model Comparison

3. Development of application-specific system


Novelty of this Research

1. Selective creation of data models

2. Distance-Based Model Comparison

3. Development of application-specific system


Applications

• Monitoring criminal conversations

• Forensics

• Automated Customer Services

• Storage/Search/Retrieval of Audio Data

• Military Activities

• Conference calls


Databases

• Standard Speaker Discrimination Databases
  – HTIMIT
  – Switchboard

• Temple Conversations Database (TCD)


Modeling Speakers


Traditional Speaker Modeling

• Examples
  – Gaussian Mixture Models
  – Hidden Markov Models
  – Neural Networks
  – Prosody-Based Models
• Disadvantages
  – Require large amounts of data
  – Sometimes require a training procedure
  – Relatively complex


Conversational Data Modeling

• Current Method
  – Equal segmentation of data
  – Indiscriminate use of data
• Problems
  – Change points unknown
  – Not all speech is useful
  – Poor performance


Proposed Speaker Modeling

[Figure: block diagram. The speech is labelled frame by frame as S, V, or U; only the V (voiced) frames are kept and grouped into Segment 1 … Segment M. Each segment passes through feature computation, then mean and covariance matrix computation, yielding Model 1 … Model M.]


Proposed Speaker Modeling

• Why voiced only?
  – Same speech class compared
  – Contains the most information
• What’s the appropriate number of phonemes?
  – Large enough to sufficiently represent speakers
  – Small enough to avoid speaker overlap
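The model-formation steps described on these slides (voiced-only segments, feature computation, then mean and covariance matrix computation) could be sketched roughly as below. This is only an illustration under stated assumptions: the features are taken to be already extracted, one vector per voiced frame, and the names `form_model`, `form_models`, and `frames_per_model` are hypothetical. In particular, `frames_per_model` is a simple stand-in for the phoneme-based segment size discussed above, not the method used in the dissertation.

```python
def form_model(frames):
    """Return (mean vector, covariance matrix) for one segment.

    frames: list of equal-length feature vectors, one per voiced frame.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    # Unbiased sample covariance (divide by n - 1).
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / (n - 1)
            for j in range(dim)]
           for i in range(dim)]
    return mean, cov


def form_models(voiced_frames, frames_per_model):
    """Split consecutive voiced frames into fixed-size segments, one model each.

    frames_per_model is a hypothetical stand-in for the segment size;
    leftover frames that do not fill a whole segment are dropped.
    """
    models = []
    last_start = len(voiced_frames) - frames_per_model
    for start in range(0, last_start + 1, frames_per_model):
        models.append(form_model(voiced_frames[start:start + frames_per_model]))
    return models
```

Each model is just a (mean, covariance) pair, which is why no training stage is needed before two models are compared.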


Features Considered

• Linear Predictive Cepstral Coefficients

– Model the vocal tract

• Mel-Scale Frequency Cepstral Coefficients

– Model the human auditory system


Distance Measurements

• Same speaker distances
• Different speaker distances


Distances Used

• Mahalanobis Distance
• Hotelling’s T-Square Statistics
• Kullback-Leibler Distance
• Bhattacharyya Distance
• Levene’s Test
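The slides name the distances but do not give their formulas. As a hedged illustration, the textbook two-sample forms of the first two can be computed from a pair of (mean, covariance) models as below. The pooled-covariance form and all function names are assumptions for this sketch, and the dissertation may use different variants.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def pooled_cov(cov1, n1, cov2, n2):
    """Pooled covariance of two samples with n1 and n2 frames."""
    dim = len(cov1)
    return [[((n1 - 1) * cov1[i][j] + (n2 - 1) * cov2[i][j]) / (n1 + n2 - 2)
             for j in range(dim)]
            for i in range(dim)]


def mahalanobis(mean1, cov1, n1, mean2, cov2, n2):
    """Mahalanobis distance between two models, using the pooled covariance."""
    d = [x - y for x, y in zip(mean1, mean2)]
    z = solve(pooled_cov(cov1, n1, cov2, n2), d)  # z = S^-1 d
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))


def hotelling_t2(mean1, cov1, n1, mean2, cov2, n2):
    """Two-sample Hotelling T-square statistic."""
    dist = mahalanobis(mean1, cov1, n1, mean2, cov2, n2)
    return (n1 * n2) / (n1 + n2) * dist ** 2
```

Both measures share the same quadratic form; T-square additionally scales it by the amount of data behind each model.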


Analysis of Cepstral Features

• Mahalanobis Distance

[Figure: Mahalanobis Distance Comparisons. Two panels, LPCC-based Distances and MFCC-based Distances, each plotting probability of occurrence against distance (0 to 4) for intra-speaker and inter-speaker distances.]


Best Number of Phonemes?

Features Used - LPCC

[Figure: Speaker Differentiation with Respect to Data Size. Mahalanobis distance (0 to 4.5) against number of phonemes (0 to 25; the axis is also printed as "Number of segments"), with separate curves for same-speaker and different-speaker pairs.]


Application Systems


Unsupervised Speaker Indexing

• The Restrained-Relative Minimum Distance (RRMD) Approach

\[
\begin{bmatrix}
0 & D_{1,2} & D_{1,3} & \cdots \\
D_{2,1} & 0 & D_{2,3} & \cdots \\
D_{3,1} & D_{3,2} & 0 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\]

[Figure: the pairwise model distance matrix above, with the REFERENCE MODELS highlighted.]


Unsupervised Speaker Indexing

• The Restrained-Relative Minimum Distance (RRMD) Approach

[Figure: flowchart. The distances from a model to Reference 1 and Reference 2 are observed and the minimum distance is taken. Restraining condition: passed => same speaker; failed => relative distance condition. Relative distance condition: passed => same speaker; failed => unusable data.]


RRMD Approach

• Restraining Condition
  – Distance Likelihood Ratio

\[
\mathrm{DLR} = \frac{f(x \mid \mu_1, \sigma_1)}{f(x \mid \mu_2, \sigma_2)}
\]

  – DLR > 1 => Same Speaker
  – DLR < 1 => Check Relative Distance Condition
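A small sketch of how a distance likelihood ratio of this kind could be evaluated is given below. It assumes, purely for illustration, that both densities are univariate Gaussians fitted to same-speaker and different-speaker distances, and that the numerator is the same-speaker density (consistent with DLR > 1 meaning "same speaker"). The function names and parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density (assumed form of f)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def distance_likelihood_ratio(x, same, diff):
    """DLR for an observed distance x.

    same, diff: (mu, sigma) of the same-speaker and different-speaker
    distance distributions, e.g. estimated on a development database.
    """
    return gaussian_pdf(x, *same) / gaussian_pdf(x, *diff)


# Hypothetical parameters: small distances are typical of the same speaker.
same_params = (1.0, 0.5)
diff_params = (3.0, 0.5)
print(distance_likelihood_ratio(1.2, same_params, diff_params) > 1)  # near the same-speaker mean
```

With equal spreads the ratio crosses 1 midway between the two means, so the restraining condition behaves like a threshold on the observed distance.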


RRMD Approach

• Relative Distance Condition
  – Relative Distance: $D_{rel} = d_{max} - d_{min}$
  – $D_{rel}$ > threshold => Same Speaker

[Figure: a model's distances $d_{min}$ and $d_{max}$ to Reference 1 and Reference 2.]
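Read together, the restraining condition and the relative distance condition suggest a two-stage decision per model. The sketch below is one possible reading of the slides, not the author's implementation: `rrmd_decide` and the `dlr_of` callable (mapping a distance to its DLR) are hypothetical names, and returning `None` stands for "unusable data".

```python
def rrmd_decide(d_ref1, d_ref2, dlr_of, rel_threshold):
    """Return 1 or 2 (the matched reference) or None for unusable data.

    d_ref1, d_ref2: distances from the test model to the two references.
    dlr_of: callable giving the distance likelihood ratio of a distance.
    rel_threshold: threshold on D_rel = d_max - d_min.
    """
    d_min, d_max = sorted((d_ref1, d_ref2))
    nearest = 1 if d_ref1 <= d_ref2 else 2
    if dlr_of(d_min) > 1:
        # Restraining condition passed: same speaker as the nearest reference.
        return nearest
    if d_max - d_min > rel_threshold:
        # Relative distance condition passed: clearly closer to one reference.
        return nearest
    return None  # neither condition passed: leave the model undecided
```

Raising `rel_threshold` trades indexing errors for more undecided models, which is the trade-off the threshold plots in the results explore.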


Experiments and Results

• Experiments

– HTIMIT used for obtaining likelihood ratio parameters

• 1000 same speaker and 1000 different speaker utterances computed

– 100 conversations from Switchboard database used for evaluation


Indexing Results - Mahalanobis

[Figure: two panels (LPCC, MFCC) for the Mahalanobis distance, titled "Classification Error with Respect to Relative Distance Threshold" and "Classification Error with Respect to Mean-Difference Threshold". Each plots percent error against threshold (up to 0.5) for indexing error and undecided error; the equal-error point is marked.]


Indexing Results – T-Square

[Figure: two panels (LPCC, MFCC) for the T-Square statistic, titled "Classification Error with Respect to Mean-Difference Threshold" and "Classification Error with Respect to Relative Distance Threshold". Each plots percent error against threshold (0 to 200) for indexing error and undecided error.]


Indexing Results - Bhattacharyya

[Figure: two panels (LPCC, MFCC) for the Bhattacharyya distance, both titled "Classification Error with Respect to Mean-Difference Threshold". Each plots percent error against threshold for indexing error and undecided error; the equal-error point is marked in one panel.]


Indexing Results - Summary

• Mahalanobis distance yielded best results

• LPCCs outperformed MFCCs


Speaker Count System

• The Residual Ratio Algorithm (RRA)
• Process is repeated K-1 times for counting up to K speakers

[Figure: flowchart. At each stage a reference model is selected randomly and followed by DLR-based model comparison. If too little data is removed, another reference model is selected.]


Speaker Count

[Figure: ARR Probability Distributions. Probability (0.01 to 0.11) against ARR (0 to 2), shown in three panels: 1 and 2 speakers; 1, 2 and 3 speakers; 1, 2, 3 and 4 speakers.]

Added Residual Ratio (ARR):

• Is the sum of the residual ratios in all elimination stages
• Should be higher for greater number of speakers


Experiments and Results

• Experiments

– 4000 conversations generated from HTIMIT

– All 40 conversations from new database used


Speaker Count Results - HTIMIT

[Figure: RRA Speaker Count Accuracy, LPCC and MFCC panels. Percent correct (0 to 100) for the accuracy methods "1 or 2+", "1, 2 or 3+", and "1, 2, 3 or 4", comparing the Bhattacharyya, T-Square, and Mahalanobis distances.]


Speaker Count Results - HTIMIT

[Figure: Speaker Count Accuracy, LPCC and MFCC panels. Percent correct for the accuracy methods "1 or 2+", "1, 2 or 3+", and "1, 2, 3 or 4", comparing Mahalanobis, OTD, DBC, and WDBC.]


Speaker Count Results – TCD

[Figure: RRA Speaker Count Accuracy, LPCC and MFCC panels. Percent correct (0 to 100) for the accuracy methods "1 or 2+", "1, 2 or 3+", and "1, 2, 3 or 4", comparing the Bhattacharyya, T-Square, and Mahalanobis distances.]


Speaker Count Results – TCD

[Figure: Speaker Count Accuracy, LPCC and MFCC panels. Percent correct for the accuracy methods "1 or 2+", "1, 2 or 3+", and "1, 2, 3 or 4", comparing T-Square, OTD, DBC, and WDBC.]


Cross Evaluation

• HTIMIT – LPCCs with the WDBC
• TCD – MFCCs with the T-Square

[Figure: Speaker Count Accuracy - Cross Evaluation. Percent correct (20 to 100) for the accuracy methods "1 or 2+", "1, 2 or 3+", and "1, 2, 3 or 4", comparing HTIMIT and TCD.]


Speaker Counting-Indexing

• The Residual Ratio speaker count algorithm is applied

• Test models are associated with their matching reference models

• Each unmatched model is assigned to the reference from which it has the minimum distance.
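The last step, assigning each unmatched model to its nearest reference, is a plain arg-min over distances. A minimal sketch, assuming the distances to every reference have already been computed (the name `assign_unmatched` and the dictionary layout are hypothetical):

```python
def assign_unmatched(distances):
    """Map each unmatched model to the index of its nearest reference.

    distances: {model_id: [distance to reference 0, reference 1, ...]}
    """
    return {
        model_id: min(range(len(dists)), key=dists.__getitem__)
        for model_id, dists in distances.items()
    }


# Illustrative values only.
print(assign_unmatched({"a": [0.9, 0.2, 0.5], "b": [0.1, 0.4, 0.3]}))
```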


Speaker Counting/Indexing Results

[Figure: Speaker Indexing Accuracy. Percent accuracy (0 to 100) for the Bhattacharyya, T-Square, Mahalanobis, OTD, DBC, and WDBC methods. Solid bars: HTIMIT; patterned bars: TCD.]


Fusion of Distance Measures


Correlation Analysis

[Figure: Draftsman’s Display - LPCC. Pairwise scatter plots of the Levene, Bhattacharyya, KL, T-Squared, and Mahalanobis distances, with intra-speaker and inter-speaker points distinguished.]


“Best Distance”

• Optimal Criteria for Fusion of Distances

– Maximize inter-speaker variation

– Minimize intra-speaker variation

– Maximize T-test value between inter-class distance distributions
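To make the third criterion concrete, the sketch below computes a two-sample t statistic between a set of same-speaker (intra) distances and a set of different-speaker (inter) distances. The slides do not say which form of the t-test was used; Welch's unequal-variance form is assumed here, and the name `t_value` and the sample values are illustrative only.

```python
import math
from statistics import mean, variance


def t_value(intra, inter):
    """Welch's two-sample t statistic between two lists of distances."""
    std_err = math.sqrt(variance(intra) / len(intra)
                        + variance(inter) / len(inter))
    return (mean(inter) - mean(intra)) / std_err


# Tighter, better-separated distributions give a larger t-value.
loose = t_value([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
tight = t_value([1.9, 2.0, 2.1], [4.9, 5.0, 5.1])
print(loose < tight)
```

A distance measure (or combination of measures) with a larger t-value separates the intra-speaker and inter-speaker distributions more cleanly.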


Decision Level Fusion

D1 => match

D2 => no match

D3 => match

D4 => match

Match = ¾

No Match = ¼

Final Decision = Match
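The example above is a simple majority vote over the per-distance decisions. A minimal sketch follows; the function name is hypothetical, and treating a tie as "no match" is an assumption, since the slide does not cover that case.

```python
def decision_level_fusion(decisions):
    """Majority vote over per-distance decisions.

    decisions: list of booleans, True meaning "match".
    Returns (final decision, fraction of distances voting "match").
    A tie counts as no match (assumption).
    """
    match_fraction = sum(decisions) / len(decisions)
    return match_fraction > 0.5, match_fraction


# D1 => match, D2 => no match, D3 => match, D4 => match
print(decision_level_fusion([True, False, True, True]))  # (True, 0.75)
```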


Weighted Decision Level Fusion

\[
w_i = \frac{T_i}{\sum_i T_i}
\]

$T_i$ = T-value corresponding to each distance
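One way to read the weighted scheme is that each distance's vote counts in proportion to its T-value, normalized so the weights sum to one. The sketch below follows that reading; the function name, the 0.5 decision threshold, and the example T-values are assumptions rather than details taken from the slides.

```python
def weighted_decision_fusion(decisions, t_values):
    """Weighted vote: each decision counts in proportion to its T-value.

    decisions: list of booleans, True meaning "match".
    t_values: positive T-value for each distance, in the same order.
    Returns (final decision, list of normalized weights).
    """
    total = sum(t_values)
    weights = [t / total for t in t_values]
    match_weight = sum(w for w, d in zip(weights, decisions) if d)
    return match_weight > 0.5, weights


# Three weak distances vote "match", one strong distance votes "no match".
decision, weights = weighted_decision_fusion([True, False, True, True],
                                             [1.0, 5.0, 1.0, 1.0])
print(decision, weights)
```

With equal T-values this reduces to the unweighted majority vote; with unequal ones a single well-separating distance can outvote several weaker ones.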


Summary


Research Goal

• To differentiate between speakers in a conversation
  – To determine the number of speakers present
  – To determine who is speaking when
• To overcome the following challenges
  – No a priori information
  – Limited data size
  – No knowledge of change points
  – Co-channel speech


Summary of Accomplishments

• Novel model formation technique

• Three novel approaches for conversation-based speaker differentiation

• Distance combination techniques to enhance performance

Page 48:

Observations

• Mahalanobis distance with LPCCs: optimal for standard databases

• T-Square distance with MFCCs: optimal for new database

• Most efficient fusion technique: weighted voting combination

Page 49:

Conclusion

• The developed system yields about 6% EER, whereas state-of-the-art speaker indexing systems yield about a 10% error rate.

• Methods for discrimination between speakers (speaker count or indexing) in CONVERSATIONS with more than two speakers have been introduced.

Page 50:

Further Research

Page 51:

Further Research

• Investigation of prosodic speaker discrimination features

• Improving model formation technique by determining speaker change-points a priori

• Exploring the use of individual phonemes to form models

Page 52:

Further Research, cont’d

• Investigating the use of unvoiced speech, cautiously, in the formation of models

• Speech enhancement techniques to handle distorted data

• Implementation of other fusion techniques such as KL measure of divergence

Page 53:

Publications

• U. Ofoegbu, A. Iyer, R. Yantorno and S. Wenndt, “Unsupervised Indexing of Noisy Conversations with Short Speaker Utterances”, IEEE Aerospace Conference, March 2007.

• U. Ofoegbu, A. Iyer, R. Yantorno, “Detection of a Third Speaker in Telephone Conversations”, ICSLP, INTERSPEECH 2006.

• U. Ofoegbu, A. Iyer, R. Yantorno, “A Simple Approach to Unsupervised Speaker Indexing”, IEEE ISPACS, 2006.

• U. Ofoegbu, A. Iyer, R. Yantorno, “A Speaker Count System for Telephone Conversations”, IEEE ISPACS, 2006.

Page 54:

ACKNOWLEDGEMENT

To the greatest teacher in the world, and the one who has made the most impact in my life, Dr. Robert E. Yantorno.

To my best friend and the love of my life, Dr. Jude C. Abanulo

To Dr. Brian Butz, Dr. John Helferty, Dr. Saroj Biswas and Dr. Henry Sendaula

To my dissertation committee members, Dr. Iyad Obeid and Dr. Dennis Silage, and to Dr. Rena Krakow.

To my friend, Ananth Iyer

To Abdoul Fall, Joe Fitschgrund, Angela Linse and Ralph Oyini; and to the members of the Speech Processing Lab and the faculty of the electrical engineering department

To engineering administrators, Tamika Butler, Carol Dahlberg, Yvette Gibson and Cheryl Sharp, and to Louise, day time janitress for the engineering building

To the Temple students who volunteered as participants in the New Conversations Database

To Temple

To the Air Force Research Labs at Rome – Financial supporters of most of the research

To my parents, Ugo & Joseph Ofoegbu; my siblings, Amaka & Humphrey Onyendi, Nene, Obinna and Chibuzor Ofoegbu; and my grandmother, Cordelia Osuji

To God

Thank you.

Page 55:

Advisor: Robert Yantorno, Ph.D.

Committee Members:
Brian Butz, Ph.D.
Dennis Silage, Ph.D.
Iyad Obeid, Ph.D.
Brett Smolenski, Ph.D.

Page 56:

Cepstral Analysis

Frequency Analysis of Speech

STFT of Speech = Excitation Component × Vocal Tract Component
(excitation: fast varying harmonics; vocal tract: slowly varying formants)

Log of STFT = Log of Excitation + Log of Vocal Tract Component

IDFT of Log of STFT = Excitation + Vocal Tract
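The chain on this slide (spectrum, then log, then inverse DFT) can be sketched with NumPy as follows. This is an illustrative sketch only: the function names, the FFT size of 512, the small floor added before the logarithm, and the quefrency cutoff used to split the result are assumptions, not values taken from the presentation.

```python
import numpy as np

def real_cepstrum(frame, n_fft=512):
    """Real cepstrum of one speech frame: IDFT of the log magnitude DFT."""
    spectrum = np.fft.rfft(frame, n=n_fft)
    # A small floor avoids log(0) in empty bins (illustrative choice).
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.irfft(log_magnitude, n=n_fft)

def split_cepstrum(cepstrum, cutoff):
    """Split into low quefrencies (vocal tract) and high quefrencies (excitation)."""
    half = len(cepstrum) // 2
    return cepstrum[:cutoff], cepstrum[cutoff:half]

# The product in the spectral domain becomes a sum in the cepstral domain.
x = np.array([1.0, 0.5])      # stand-in for the excitation
h = np.array([1.0, -0.3])     # stand-in for the vocal tract response
y = np.convolve(x, h)         # "speech" = excitation convolved with vocal tract
print(np.allclose(real_cepstrum(y), real_cepstrum(x) + real_cepstrum(h), atol=1e-8))
```

Because the two components add in the cepstral domain and occupy different quefrency ranges, a simple index cutoff is enough to separate them in this sketch.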

Page 57:

Cepstral Features

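• Linear Predictive Cepstral Coefficients

– Obtained recursively from LPC coefficients

Let LPC vector = $[a_0\ a_1\ a_2\ \dots\ a_p]$ and LPCC vector = $[c_0\ c_1\ c_2\ \dots\ c_p\ \dots\ c_{n-1}]$

$$c_0 = \ln E^2$$

$$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p$$

$$c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad p < m < n$$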
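The recursive LPC-to-cepstrum conversion named on this slide can be sketched as below. This is a sketch under stated assumptions: it uses the common convention in which the all-pole model is gain / (1 - sum_k a_k z^-k), so the sign of the coefficients may differ from the convention used in the dissertation, and the function name `lpc_to_lpcc` and its parameters are hypothetical.

```python
import math

def lpc_to_lpcc(a, n_ceps, gain=1.0):
    """Convert LPC coefficients a[1..p] into cepstral coefficients c[0..n_ceps-1].

    Assumes H(z) = gain / (1 - sum_k a_k z^-k); pass `a` without the a_0 term.
    """
    p = len(a)
    c = [0.0] * n_ceps
    c[0] = math.log(gain ** 2)
    for m in range(1, n_ceps):
        # The direct term a_m exists only while m <= p.
        acc = a[m - 1] if m <= p else 0.0
        # Recursive term: sum of (k/m) * c_k * a_{m-k}, skipping a_j with j > p.
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c

# One-pole check: for a = [0.5] the cepstrum is 0.5**m / m.
print(lpc_to_lpcc([0.5], 4))
```

The one-pole case is a convenient sanity check because the logarithm of 1 / (1 - a z^-1) has the known series with coefficients a^m / m.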

Page 58:

Conversational Data Modeling

• Current Method
  – Equal segmentation of data
  – Indiscriminate use of data

• Problems
  – Change points unknown
  – Not all speech is useful

Page 59:

“Best Distance”

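• Intra-speaker and inter-speaker distance lengths are always equal, therefore:

$$T \propto \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}$$

For a linear combination of distances with weight vector $a$:

$$T(a) \propto \frac{a^T(\mu_1 - \mu_2)}{\sqrt{a^T P a}}, \qquad a_1 = k\,P^{-1}(\mu_1 - \mu_2)$$

P = sum of the covariance matrices of the two classes.

λ1 = maximum eigenvalue obtained by solving the generalized eigenvalue problem:

$$Q\,a_1 = \lambda_1 P\,a_1$$

Q = the square of the distance between the mean vectors of the two classes.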
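The definitions of P, Q and λ1 on this slide match the usual linear-discriminant setup, in which the best weight vector is proportional to P^-1 times the difference of the class means. The sketch below works under that reading, which is an assumption; the function name `best_combination`, the unit-norm choice for the constant k, and the synthetic data are hypothetical.

```python
import numpy as np

def best_combination(intra, inter):
    """Weights `a` that best separate two sets of distance vectors.

    intra, inter: arrays of shape (n_samples, n_distances), n_distances >= 2.
    """
    intra = np.asarray(intra, dtype=float)
    inter = np.asarray(inter, dtype=float)
    mean_diff = inter.mean(axis=0) - intra.mean(axis=0)
    # P: sum of the covariance matrices of the two classes.
    P = np.cov(intra, rowvar=False) + np.cov(inter, rowvar=False)
    # Closed form a = k * P^-1 (mean difference); k is fixed by unit-norm scaling.
    a = np.linalg.solve(P, mean_diff)
    a /= np.linalg.norm(a)
    # Separation obtained when both classes are projected onto a.
    separation = float(a @ mean_diff) ** 2 / float(a @ P @ a)
    return a, separation

rng = np.random.default_rng(0)
intra = rng.normal(0.0, 1.0, size=(200, 3))   # synthetic same-speaker distances
inter = rng.normal(1.0, 1.0, size=(200, 3))   # synthetic different-speaker distances
a, separation = best_combination(intra, inter)
print(a, separation)
```

Under this reading, the returned separation equals the largest eigenvalue of the generalized problem with Q taken as the outer product of the mean difference with itself, which gives a direct numerical check.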

Page 60:

“Best Distance”

[Figure: plot with axes Distance Measure 1 and Distance Measure 2]

Page 61:

RRMD Approach

• Relative Distance Condition

[Figure: Distribution of T-Square Statistics for N = 5; x-axis: T-Square Statistics; y-axis: Probability; curves: Intra Speaker, Inter Speaker; D rel marked]

Page 62:

Modeling Analysis

[Figure: Distribution of Mahalanobis Distance - Utterance Based; x-axis: Distance Value; y-axis: Probability; curves: Same Speaker, Different Speaker]

N = 20 – 4 seconds of voiced speech

[Figure: Mahalanobis Distance Comparisons; x-axis: Distance; y-axis: Probability of Occurrence; curves: SSDU, DSDU]

Page 63:

Modeling Analysis

[Figure: Distributions of Mahalanobis Distance - Segment Based; x-axis: Distance Value; y-axis: Probability; curves: Same Speaker, Different Speaker]

N = 5 – 1 second of voiced speech

[Figure: Mahalanobis Distance Comparisons; x-axis: Distance; y-axis: Probability of Occurrence; curves: SSDU, DSDU]

Page 64:

Distance Measures

• Mahalanobis Distance
  – Measures the separation between the means of both classes

• Hotelling’s T-Square Statistics
  – Measures the separation between the means of both classes and takes into consideration the data lengths

• Kullback-Leibler Distance
  – Measures the separation between the distribution of both classes

• Bhattacharyya Distance
  – Derived from measuring the classification error between both classes

• Levene’s Test
  – Measures absolute deviation from the center of the class distribution
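The relationship between the first two measures in this list can be sketched numerically. The code below uses the standard two-sample definitions with a pooled covariance matrix, which is an assumption about the exact variant used in the presentation; the function names and the random feature matrices are hypothetical.

```python
import numpy as np

def mahalanobis_sq(x, y):
    """Squared Mahalanobis distance between the means of two feature sets."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled covariance of both classes (standard two-sample form).
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    return float(diff @ np.linalg.solve(pooled, diff))

def hotelling_t2(x, y):
    """Hotelling's T-square: the same separation scaled by the data lengths."""
    n1, n2 = len(x), len(y)
    return (n1 * n2 / (n1 + n2)) * mahalanobis_sq(x, y)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(50, 4))   # stand-in for features of segment 1
y = rng.normal(0.5, 1.0, size=(30, 4))   # stand-in for features of segment 2
print(mahalanobis_sq(x, y), hotelling_t2(x, y))
```

The scaling factor n1 * n2 / (n1 + n2) is where the data lengths enter, so the same mean separation counts for more when it is supported by more frames.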

Page 65:

Speaker Recognition

[Diagram: Reference Speech → Feature Extraction → Model Building; Test Speech → Feature Extraction → Comparison → Recognition Decision → System Output]

• Speaker Identification
  – Who is this speaker?

• Speaker Verification
  – Is he who he claims to be?

Page 66:

Speaker Segmentation

• Broadcast News/Conference Data

• Conversational Data

[Figure: two speech waveforms; x-axis: Time (seconds); y-axis: Amplitude]

Page 67:

Procedural Set-up

• 384-speaker database used
• Average utterance length = 5 seconds

Intra-speaker distance computations

[Diagram: Utterances from Speaker A → Randomly Select 2 Utterances → Utterance 1, Utterance 2 → Window Data → Compute Feature → Compute Distance]

Inter-speaker distance computations

[Diagram: Speaker A, Speaker B → Randomly Select Utterance (one per speaker) → Window Data → Compute Feature → Compute Distance]
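The pair-selection step of the two procedures on this slide can be sketched as follows. This is a minimal sketch: the function name `sample_pairs`, the dictionary layout, and the utterance identifiers are hypothetical, and the windowing, feature and distance stages are left out.

```python
import random

def sample_pairs(utterances_by_speaker, n_pairs, seed=0):
    """Draw utterance pairs for intra- and inter-speaker distance computations.

    utterances_by_speaker: dict mapping a speaker id to a list of at least
    two utterances (hypothetical layout).
    """
    rng = random.Random(seed)
    speakers = sorted(utterances_by_speaker)
    intra, inter = [], []
    for _ in range(n_pairs):
        # Intra-speaker: two different utterances from one randomly chosen speaker.
        spk = rng.choice(speakers)
        intra.append(tuple(rng.sample(utterances_by_speaker[spk], 2)))
        # Inter-speaker: one utterance from each of two different speakers.
        spk_a, spk_b = rng.sample(speakers, 2)
        inter.append((rng.choice(utterances_by_speaker[spk_a]),
                      rng.choice(utterances_by_speaker[spk_b])))
    return intra, inter

data = {"A": ["A1", "A2", "A3"], "B": ["B1", "B2"], "C": ["C1", "C2"]}
intra_pairs, inter_pairs = sample_pairs(data, n_pairs=5)
print(intra_pairs)
print(inter_pairs)
```

Each pair would then pass through windowing, feature computation and distance computation, giving one intra-speaker or inter-speaker distance value per pair.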

Page 68:

Best ‘N’ Estimation

• 245 conversations from SWITCHBOARD used
• Results shown for T-Square distance

[Figure: Average Indexing Accuracy With Respect to Number of Voiced Phonemes Per Model; x-axis: N = Number of Voiced Phonemes; y-axis: Accuracy; N = 5 marked]

Page 69:

RRA Examples – 2 Speakers

[Figure: How the Residual Ratio Algorithm Works for Two-Speaker Conversations; panels: Initial Speech, Round 2 Residual, Round 3 Residual; x-axis: Time (Seconds); segments labeled Speaker 1 and Speaker 2]

Page 70:

RRA Examples – 3 Speakers

[Figure: How the Residual Ratio Algorithm Works for Three-Speaker Conversations; panels: Initial Speech, Round 2 Residual, Round 3 Residual; x-axis: Time (Seconds); segments labeled Speaker 1, Speaker 2 and Speaker 3]

Page 71:

Comparison

[Figure: Residual Ratio after 2nd round of RRA; panels: TWO-SPEAKER RESIDUAL and THREE-SPEAKER RESIDUAL; Speaker 2 labeled in each panel]

Page 72:

Effects of Fusion

LPCCs

[Figure: intra- and inter-speaker probability distributions for each distance, Distance Feature - LPCC; panel labels:]

Tmax - 44.3369
Mahalanobis - 44.0652
TSquared - 35.2111
KL - 22.7672
Bhattacharyya - 33.7449
Levene - 13.4432
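The number attached to each distance on this slide appears to be a T-test value measuring how far apart the intra-speaker and inter-speaker distributions are. A minimal sketch of one such statistic follows; the unequal-variance two-sample form used here is an assumption, since the slide does not state the exact formula, and the function name `t_value` and the sample values are hypothetical.

```python
import numpy as np

def t_value(intra, inter):
    """Two-sample t statistic between intra- and inter-speaker distances."""
    intra = np.asarray(intra, dtype=float)
    inter = np.asarray(inter, dtype=float)
    numerator = inter.mean() - intra.mean()
    # Assumption: unpooled variances, each divided by its own sample size.
    denominator = np.sqrt(intra.var(ddof=1) / intra.size
                          + inter.var(ddof=1) / inter.size)
    return float(numerator / denominator)

# Hypothetical distance values for illustration only.
print(t_value([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

A larger value means the two distributions overlap less, which is why a distance or fused distance with a higher T-value is preferred.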

Page 73:

Effects of Fusion

LPCCs

[Figure: Increase in Inter-Class Separation as Number of Distances is Increased, Features - LPCC; x-axis: Number of Distances; y-axis: T-Test Value; points labeled Mahalanobis, Levene, Bhattacharyya, KL, TSquared]

Page 74:

Effects of Fusion

MFCCs

[Figure: intra- and inter-speaker probability distributions for each distance, Distance Feature - MFCC; panel labels:]

39.4524
Mahalanobis - 38.2733
TSquared - 32.5542
KL - 11.0738
Bhattacharyya - 23.4276
Levene - 16.8735

Page 75:

Effects of Fusion

MFCCs

[Figure: Increase in Inter-Class Separation as Number of Distances is Increased, Features - MFCC; x-axis: Number of Distances; y-axis: T-Test Value; points labeled Mahalanobis, Levene, KL, TSquared, Bhattacharyya]

Page 76:

Best Feature Size

[Figure: Analyses of the Effects of Increasing the Size of the Feature Set (Voiced - Segment Based - LPCC); x-axis: Number of Coefficients; y-axis: T-Statistics; curves: Mahalanobis, T-Square, KL, Bhattacharyya, Levene]

Page 77:

Best Feature Size

[Figure: Analyses of the Effects of Increasing the Size of the Feature Set (Voiced - Segment Based - MFCC); x-axis: Number of Coefficients; y-axis: T-Statistics; curves: Mahalanobis, T-Square, KL, Bhattacharyya, Levene]

Page 78:

Correlation Analysis

[Figure: Draftsman's Display of Distances (MFCC); pairwise plots of the Mahalanobis, TSquared, KL, Bhattacharyya and Levene distances; classes: Intra, Inter]

Draftsman’s Display - MFCC

Page 79:

Advisor: Robert Yantorno, Ph.D.

Committee Members:
Brian Butz, Ph.D.
Dennis Silage, Ph.D.
Iyad Obeid, Ph.D.
Brett Smolenski, Ph.D.