Producing Emotional Speech Thanks to Gabriel Schubiner

Upload: adela-hudson

Post on 22-Dec-2015


Page 1: Producing Emotional Speech Thanks to Gabriel Schubiner

Producing Emotional

Speech

Thanks to Gabriel Schubiner

Page 2: Producing Emotional Speech Thanks to Gabriel Schubiner

Papers

Generation of Affect in Synthesized Speech

Corpus-based approach to synthesis

Expressive visual speech using talking head

Demos

Affect Editor Quiz/Demo

Synface Demo

Page 3: Producing Emotional Speech Thanks to Gabriel Schubiner

Affect in Speech

Goals

Addition of Emotion to Synthetic speech

Acoustic Model

Typology of parameters of emotional speech

Quantification

Addresses problem of expressiveness

What benefit is gained from expressive speech?

Page 4: Producing Emotional Speech Thanks to Gabriel Schubiner

Emotion Theory/Assumptions

Emotion -> Nervous System -> Speech Output

Binary distinction

Parasympathetic vs Sympathetic

based on physical changes

universal emotions

Page 5: Producing Emotional Speech Thanks to Gabriel Schubiner

Approaches to Affect

Generative

Emotion -> Physical -> Acoustic

Descriptive

Observed acoustic params imposed

Page 6: Producing Emotional Speech Thanks to Gabriel Schubiner

Descriptive Framework

4 Parameter groups

Pitch

Timing

Voice Quality

Articulation

Assumption of independence

How could this affect design and results?

Page 7: Producing Emotional Speech Thanks to Gabriel Schubiner

Pitch

Accent Shape

Average Pitch

Contour Slope

Final Lowering

Pitch Range

Reference Line

Timing

Exaggeration (not used)

Fluent Pauses

Hesitation Pauses

Speech Rate

Stress Frequency

Stressed Stressable

Page 8: Producing Emotional Speech Thanks to Gabriel Schubiner

Voice Quality

Breathiness

Brilliance

Loudness

Pause Discontinuity

Pitch Discontinuity

Tremor

Laryngealization

Articulation

Precision

Page 9: Producing Emotional Speech Thanks to Gabriel Schubiner

Implementation

Each parameter has scale

Each scale is independent

from other parameters

between positive and negative

Page 10: Producing Emotional Speech Thanks to Gabriel Schubiner

Implementation

Settings grouped into preset conditions for each emotion

based on prior studies
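
The parameter scales and per-emotion presets described on the two Implementation slides can be sketched roughly as below. This is only an illustration: the scale bounds of -10 to 10, the assignment of parameters to groups, and every preset value are assumptions made for the example, not figures from the paper or the prior studies.

```python
# Assumed bounds: the slides only say each scale runs between negative and
# positive, so -10..10 with 0 as neutral is an illustrative choice.
SCALE_MIN, SCALE_MAX = -10, 10

# Parameter names from the four groups on the slides ("Exaggeration" is
# omitted because it is marked as not used). The grouping is an assumption.
PARAMETERS = {
    "pitch": ["accent_shape", "average_pitch", "contour_slope",
              "final_lowering", "pitch_range", "reference_line"],
    "timing": ["fluent_pauses", "hesitation_pauses", "speech_rate",
               "stress_frequency", "stressed_stressable"],
    "voice_quality": ["breathiness", "brilliance", "loudness",
                      "pause_discontinuity", "pitch_discontinuity",
                      "tremor", "laryngealization"],
    "articulation": ["precision"],
}
ALL_NAMES = [name for group in PARAMETERS.values() for name in group]


def make_setting(**overrides):
    """Return a full setting; parameters not mentioned stay neutral (0)."""
    setting = dict.fromkeys(ALL_NAMES, 0)
    for name, value in overrides.items():
        if name not in setting:
            raise KeyError(f"unknown parameter: {name}")
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{name}={value} is outside the scale")
        # Each scale is set independently of the others.
        setting[name] = value
    return setting


# Hypothetical preset values, for illustration only.
PRESETS = {
    "sadness": make_setting(speech_rate=-8, average_pitch=-4, pitch_range=-5),
    "anger": make_setting(speech_rate=6, loudness=8, precision=5),
}
```

Because every parameter is stored and validated on its own, the sketch also makes the independence assumption visible: nothing stops a preset from combining values that a real voice could not produce together.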

Page 11: Producing Emotional Speech Thanks to Gabriel Schubiner

Program Flow: Input

Emotion -> parameter representation

Utterance -> clauses

Agent, Action, Object, Locative

Clause and lexeme annotations

Finds all possible locations of affect and chooses whether or not to use

Page 12: Producing Emotional Speech Thanks to Gabriel Schubiner

Program Flow

Utterance -> Tree structure -> linear phonology

“compiled” for specific synthesizer with software to simulate affects not available in hardware

Page 13: Producing Emotional Speech Thanks to Gabriel Schubiner
Page 14: Producing Emotional Speech Thanks to Gabriel Schubiner

Perception

30 Utterances

5 sentences * 6 affects

Forced choice of one of six affects

magnitude and comments
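
The design above (5 sentences crossed with 6 affects, scored by forced choice) can be sketched as follows. The sentence IDs and the `recognition_rates` helper are hypothetical names introduced for the example; the affect labels are the six that appear in the quiz.

```python
from collections import Counter
from itertools import product

AFFECTS = ["anger", "disgust", "fear", "gladness", "sadness", "surprise"]
SENTENCES = ["s1", "s2", "s3", "s4", "s5"]  # placeholder sentence IDs

# Every sentence is synthesized once per affect: 5 * 6 = 30 utterances.
STIMULI = list(product(SENTENCES, AFFECTS))

# With a forced choice among six affects, guessing is right 1 time in 6.
CHANCE = 1 / len(AFFECTS)


def recognition_rates(trials):
    """trials: iterable of (intended_affect, chosen_affect) pairs.

    Returns the fraction of correct choices per intended affect.
    """
    shown = Counter()
    correct = Counter()
    for intended, chosen in trials:
        shown[intended] += 1
        if chosen == intended:
            correct[intended] += 1
    return {affect: correct[affect] / shown[affect] for affect in shown}
```

With six alternatives, chance is about 16.7%, which is the baseline any reported recognition rate should be read against.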

Page 15: Producing Emotional Speech Thanks to Gabriel Schubiner

Elicitation Sentences

Intro

I’m almost finished

I’m going to the city

I saw your name in the paper X

I thought you really meant it

Look at that picture

Page 16: Producing Emotional Speech Thanks to Gabriel Schubiner

Pop Quiz!!!

Page 17: Producing Emotional Speech Thanks to Gabriel Schubiner

Pop Quiz Solutions

I’m almost finished

Disgust : Surprise : Sadness : Gladness : Anger : Fear

I’m going to the city

Surprise : Gladness : Anger : Disgust : Sadness : Fear

I thought you really meant it

Anger : Disgust : Gladness : Sadness : Fear : Surprise

Look at that picture

Anger : Fear : Disgust : Sadness : Gladness : Surprise

Page 18: Producing Emotional Speech Thanks to Gabriel Schubiner

Results

approx. 50% recognition rate

91% sadness

Page 19: Producing Emotional Speech Thanks to Gabriel Schubiner
Page 20: Producing Emotional Speech Thanks to Gabriel Schubiner

Conclusions

Effective?

Thoughts?

Page 21: Producing Emotional Speech Thanks to Gabriel Schubiner

Corpus-based Approach to

Expressive Speech Synthesis

Page 22: Producing Emotional Speech Thanks to Gabriel Schubiner

Corpus

Collect utterances in each emotion

emotion-dependent semantics

One speaker

Good news, Bad news, Question

Page 23: Producing Emotional Speech Thanks to Gabriel Schubiner

Model: Feature Vector

Features

Lexical stress

Phrase-level stress

Distance from beginning of phrase

Distance from end of phrase

POS

Phrase-type

End of syllable pitch

Page 24: Producing Emotional Speech Thanks to Gabriel Schubiner

Model: Classification

Predicts F0

5 syllable window

Uses feature vector to predict observation vector

observation vector: log(p), Δp

p = end of syllable pitch

Decision Tree
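
As a rough sketch of the data the F0 model works with, the code below builds the observation vector (log(p), Δp) per syllable and a 5-syllable context window. It does not implement the decision tree itself. Two points are assumptions: Δp is taken to be the difference from the previous syllable's end pitch (0 for the first syllable), and the window is centred on the current syllable with `None` padding at phrase edges.

```python
import math


def observation_vectors(end_pitches_hz):
    """Return one (log(p), delta_p) pair per syllable.

    p is the end-of-syllable pitch in Hz. delta_p is assumed to be the
    change from the previous syllable's p, and 0.0 for the first syllable.
    """
    observations = []
    previous = None
    for p in end_pitches_hz:
        delta = 0.0 if previous is None else p - previous
        observations.append((math.log(p), delta))
        previous = p
    return observations


def context_windows(features, size=5, pad=None):
    """Return a window of `size` items centred on each position.

    Positions that fall outside the sequence are filled with `pad`.
    """
    half = size // 2
    padded = [pad] * half + list(features) + [pad] * half
    return [tuple(padded[i:i + size]) for i in range(len(features))]
```

Each window of per-syllable feature vectors would then be the input from which a classifier predicts the matching observation vector.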

Page 25: Producing Emotional Speech Thanks to Gabriel Schubiner

Model: Target Duration

Similar to predicting F0

build tree with goal of providing Gaussian at leaves

Use mean of class as target duration

discretization
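
The idea of a Gaussian at each leaf, with the leaf mean used as the target duration, can be sketched as below. The tree is assumed to have been built already; `leaf_id` stands for whichever leaf a segment falls into, and both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def fit_leaf_gaussians(samples):
    """samples: iterable of (leaf_id, duration_seconds) pairs.

    Returns {leaf_id: (mean, standard_deviation)} for each leaf.
    A leaf with a single sample gets a standard deviation of 0.0.
    """
    by_leaf = defaultdict(list)
    for leaf_id, duration in samples:
        by_leaf[leaf_id].append(duration)
    return {
        leaf_id: (mean(ds), stdev(ds) if len(ds) > 1 else 0.0)
        for leaf_id, ds in by_leaf.items()
    }


def target_duration(leaf_gaussians, leaf_id):
    """Use the mean of the leaf's Gaussian as the target duration."""
    return leaf_gaussians[leaf_id][0]
```

Using the mean discards the spread within a leaf, which is one place the discretization noted above shows up: every segment that reaches the same leaf receives the same target.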

Page 26: Producing Emotional Speech Thanks to Gabriel Schubiner

Models

Uses acoustic analogue of n-grams

captures sense of context

compared to describing full emotion as sequence

compare to Affect Editor

Uses only F0 and length (compare Affect Editor)

Includes information about which utterance the features are derived from

intentional bias, justified?

Page 27: Producing Emotional Speech Thanks to Gabriel Schubiner

Model: Synthesis

Data tagged with original expression and emotion

expression-cost matrix

noted trade-off:

emotional intensity vs. smoothness

Paralinguistic events
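
One way an expression-cost matrix could enter unit selection is sketched below. The matrix entries, the `join_cost` field, the `weight` parameter, and the unit IDs are all hypothetical. The slides only state that data is tagged with its original expression, that an expression-cost matrix is used, and that there is a trade-off between emotional intensity and smoothness.

```python
# Hypothetical costs of using a unit recorded in one expression (second key)
# when another expression (first key) is requested. Matching pairs cost 0.
EXPRESSION_COST = {
    ("good_news", "neutral"): 0.5,
    ("good_news", "bad_news"): 1.0,
    ("bad_news", "neutral"): 0.5,
    ("bad_news", "good_news"): 1.0,
}
DEFAULT_MISMATCH_COST = 1.0


def expression_cost(requested, recorded):
    """Look up the penalty for an expression mismatch."""
    if requested == recorded:
        return 0.0
    return EXPRESSION_COST.get((requested, recorded), DEFAULT_MISMATCH_COST)


def select_unit(candidates, requested, weight=1.0):
    """Pick the candidate with the lowest combined cost.

    A larger `weight` favours units in the requested expression even when
    they join less smoothly; a smaller one favours smooth joins.
    """
    def total(unit):
        mismatch = expression_cost(requested, unit["expression"])
        return unit["join_cost"] + weight * mismatch
    return min(candidates, key=total)


candidates = [
    {"id": "u1", "expression": "neutral", "join_cost": 0.1},
    {"id": "u2", "expression": "good_news", "join_cost": 0.4},
]
expressive = select_unit(candidates, "good_news", weight=1.0)
smooth = select_unit(candidates, "good_news", weight=0.2)
```

In this toy setup the higher weight selects `u2` (expressive but a rougher join) and the lower weight selects `u1`, which is the intensity versus smoothness trade-off in miniature.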

Page 28: Producing Emotional Speech Thanks to Gabriel Schubiner

SSML

Compare to Cahn’s typology

Abstraction layers

Page 29: Producing Emotional Speech Thanks to Gabriel Schubiner

Perception Experiment

Distinguish same utterance spoken with neutral and affected prosody

Semantic content problematic?

Page 30: Producing Emotional Speech Thanks to Gabriel Schubiner

Results

Binary decision

Reasonable gain over baseline?

Page 31: Producing Emotional Speech Thanks to Gabriel Schubiner

Conclusion

Major contributions?

Paths forward?

Page 32: Producing Emotional Speech Thanks to Gabriel Schubiner

Synthesis of Expressive Visual Speech on a

Talking Head

Page 33: Producing Emotional Speech Thanks to Gabriel Schubiner

Not these Talking Heads...

Page 34: Producing Emotional Speech Thanks to Gabriel Schubiner

Synthesis Background

Manipulation of video images

Virtual model with deformation parameters

Synchronized with time-aligned transcription

Articulatory Control Model

Cohen & Massaro (1993)

Page 35: Producing Emotional Speech Thanks to Gabriel Schubiner

Data

Single actor

Given specific emotion as instruction

6 emotions + neutral

Page 36: Producing Emotional Speech Thanks to Gabriel Schubiner

Facial Animation Parameters

Face independent

FAP Matrix * scaling factor + position0

Weighted deformations of distance between vertices and feature point
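
A minimal sketch of the two ideas on this slide follows: a feature point is moved by FAP values times a face-specific scaling factor added to its rest position, and nearby vertices follow with a weight based on their distance to the feature point. The linear falloff and the `radius` parameter are assumptions chosen for illustration; the slide does not say which weighting function is used.

```python
import math


def feature_point_position(fap_values, scale, position0):
    """FAP values * scaling factor + rest position, per coordinate."""
    return tuple(v * scale + p0 for v, p0 in zip(fap_values, position0))


def vertex_position(vertex0, feature0, feature_new, radius):
    """Move a vertex by a distance-weighted share of the feature point's shift.

    The weight falls linearly from 1 at the feature point to 0 at `radius`
    (an assumed falloff, not taken from the source).
    """
    distance = math.dist(vertex0, feature0)
    weight = max(0.0, 1.0 - distance / radius)
    return tuple(
        v + weight * (new - old)
        for v, old, new in zip(vertex0, feature0, feature_new)
    )
```

Keeping the scaling factor separate from the FAP values is what lets the same parameter track drive faces of different proportions, which is the sense of "face independent" above.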

Page 37: Producing Emotional Speech Thanks to Gabriel Schubiner

Modeling

Phonetic segments assigned target parameter vector

temporal blending over dominance functions

Principal components
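
Temporal blending with dominance functions, in the style of the Cohen & Massaro (1993) articulatory control model cited earlier, can be sketched as below. Each segment's target is weighted by a dominance function that decays exponentially with distance from the segment's centre, and the output is the dominance-weighted average of the targets. The default values for `alpha`, `theta`, and `c` are arbitrary illustrative choices, and a single scalar parameter is blended for simplicity.

```python
import math


def dominance(t, center, alpha=1.0, theta=10.0, c=1.0):
    """Negative-exponential dominance of a segment at time t.

    alpha: peak magnitude, theta: decay rate, c: exponent on the time
    distance. All three defaults are illustrative, not fitted values.
    """
    return alpha * math.exp(-theta * abs(t - center) ** c)


def blend(t, segments):
    """Dominance-weighted average of segment targets at time t.

    segments: list of (center_time, target_value) pairs.
    """
    weights = [dominance(t, center) for center, _ in segments]
    total = sum(weights)
    return sum(w * target for w, (_, target) in zip(weights, segments)) / total


segments = [(0.0, 0.0), (1.0, 1.0)]  # two segments with targets 0 and 1
midpoint_value = blend(0.5, segments)
```

Halfway between two equally dominant segments the result is the average of their targets, while close to a segment's centre its own target dominates. This gives coarticulation-like smoothing without explicit transition rules.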

Page 38: Producing Emotional Speech Thanks to Gabriel Schubiner

ML

Separate models for each emotion

6:1 training:testing ratio

models -> PC traj -> FAP traj * emotion param matrix

Page 39: Producing Emotional Speech Thanks to Gabriel Schubiner

Results

More extreme emotions easier to perceive

73% sad, 60% angry, 40% sad

Page 40: Producing Emotional Speech Thanks to Gabriel Schubiner

Synface Demo

Page 41: Producing Emotional Speech Thanks to Gabriel Schubiner

Discussion

Changes in approach from Cahn to Eide

Production compared to Detection