multimodal emotion recognition colin grubb advisor: nick webb

Multimodal Emotion Recognition

Colin Grubb

Advisor: Nick Webb

MOTIVATION

PREVIOUS RESEARCH

o Multimodal fusion
    o Research looking at audio, visual, and gesture information
    o Feature Level vs. Decision Level

RESEARCH QUESTION

o To what extent can we improve emotion recognition by using classification methods on audio and visual data?

DECISION LEVEL ANALYSIS

o Set of rules vs. training a classifier
    o Rule set is too basic

o Will use classifier to learn outputs of unimodal systems

https://www.informatik.uni-augsburg.de/en/chairs/hcm/projects/emovoice/

AUDIO SYSTEM

o EmoVoice (EMV)
    o Real Time Audio Analysis
    o Five emotional states w/ probabilities
    o Published accuracy: 47.67%

EMOVOICE CONFIDENCE LEVELS

(Negative Active) Angry

(Negative Passive) Sad

(NEutral) Neutral

(Positive Active) Happy

(Positive Passive) Content

negativeActive <0.40, 0.20, 0.10, 0.15, 0.15>
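The example output above can be read as a winning class followed by one confidence value per EmoVoice state. A minimal sketch of picking the winning class from such a vector, assuming the five values are ordered the same way the states are listed on this slide (that ordering, and the names `EMV_CLASSES`, `EMV_LABELS`, and `top_class`, are illustrative assumptions, not part of EmoVoice):

```python
# Assumed order of the confidence vector: the order the slide lists the states in.
EMV_CLASSES = ["NA", "NP", "NE", "PA", "PP"]
EMV_LABELS = {"NA": "Angry", "NP": "Sad", "NE": "Neutral",
              "PA": "Happy", "PP": "Content"}

def top_class(confidences):
    """Return (abbreviation, label, probability) for the highest confidence."""
    if len(confidences) != len(EMV_CLASSES):
        raise ValueError("expected one confidence per EmoVoice class")
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    abbr = EMV_CLASSES[best]
    return abbr, EMV_LABELS[abbr], confidences[best]

# The vector shown on the slide.
example = [0.40, 0.20, 0.10, 0.15, 0.15]
print(top_class(example))  # ('NA', 'Angry', 0.4)
```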

VISUAL SYSTEM

o Software created by Prof. Shane Cotter
    o Uses still images
    o Published accuracy: 93.4%

SYSTEM LAYOUT

Audio: "I’m in a good mood!" -> EmoVoice -> Emotion: Happy

Visual: Images -> Video Software -> Emotion: Happy

Both outputs -> Classifier -> Output: Happy
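One way the two unimodal outputs could be handed to a classifier is as a single row per sample in Weka's ARFF format. The sketch below is only an illustration under assumptions: the attribute names (`emv_NA`, `emv_class`, `visual_class`, `emotion`), the choice to include the five EmoVoice confidences, and the sample values are hypothetical and not taken from the project's actual dataset.

```python
# Hypothetical attribute layout for a decision-level fusion dataset.
EMV_CLASSES = ["NA", "NP", "NE", "PA", "PP"]
EMV_LABELS = ["Angry", "Sad", "Neutral", "Happy", "Content"]  # EmoVoice states
EMOTIONS = ["Happy", "Angry", "Neutral", "Sad"]               # final classes

def to_arff(instances, relation="multimodal_emotion"):
    """Render (confidences, emv_label, visual_label, true_label) tuples as ARFF text."""
    lines = [f"@relation {relation}", ""]
    for name in EMV_CLASSES:
        lines.append(f"@attribute emv_{name} numeric")
    lines.append("@attribute emv_class {" + ",".join(EMV_LABELS) + "}")
    lines.append("@attribute visual_class {" + ",".join(EMOTIONS) + "}")
    lines.append("@attribute emotion {" + ",".join(EMOTIONS) + "}")
    lines += ["", "@data"]
    for conf, emv_label, visual_label, truth in instances:
        values = [f"{c:.2f}" for c in conf] + [emv_label, visual_label, truth]
        lines.append(",".join(values))
    return "\n".join(lines)

# Made-up sample mirroring the diagram: both systems say Happy.
sample = [([0.10, 0.10, 0.15, 0.45, 0.20], "Happy", "Happy", "Happy")]
print(to_arff(sample))
```

The last attribute is the true emotion, so a classifier trained on such rows learns how to weigh the two systems' outputs against each other.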

DATA GATHERING

o 8 subjects
    o Five male, three female

o Audio Data
    o Read sample sentences

o Visual Data
    o Gather facial expressions from regular and long distance (6 ft.)

EXPERIMENTS

o Weka Data Mining Software
    o Used J48 Classifier

o C4.5 algorithm: decision tree
    o Each branch represents decision made at that node

[Figure: example decision tree. Node 1 branches to nodes 2 and 3, which lead to Output 1, Output 2, Output 3, and Output 4.]

http://www.cs.waikato.ac.nz/ml/weka/
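C4.5 chooses which attribute to test at a node using the gain ratio (information gain divided by split information). The sketch below computes that criterion for nominal attributes only. It is not Weka's J48 implementation, which also handles numeric thresholds and pruning, and the toy rows and attribute names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, divided by split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: the visual label tracks the true emotion, EmoVoice is biased.
rows = [
    {"visual_class": "Happy", "emv_class": "Angry", "emotion": "Happy"},
    {"visual_class": "Happy", "emv_class": "Happy", "emotion": "Happy"},
    {"visual_class": "Sad",   "emv_class": "Angry", "emotion": "Sad"},
    {"visual_class": "Sad",   "emv_class": "Angry", "emotion": "Sad"},
]
best = max(["visual_class", "emv_class"], key=lambda a: gain_ratio(rows, a, "emotion"))
print(best)  # visual_class
```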

EMOTION CLASSES

o Final dataset classifies between
    o Happy
    o Angry
    o Neutral
    o Sad

o Audio performance: 38.43%
o Visual performance: 77.43%

INITIAL PERFORMANCE

o Ran combined dataset against J48 classifier
o Multimodal data initially ineffective
o Needed a way to improve dataset

Accuracy (%)

Experiment          Multimodal Data   EmoVoice Only   Visual Only
Regular Distance    76.64             38.43 *         77.43
Long Distance       65.60             38.43 *         67.01

IMPROVING ACCURACY

o How can we use the two individual systems to complement each other?
o Two pieces of information:
    o What does the visual system do poorly on?
    o What kind of biases does EmoVoice have?

MANUAL BIAS

o Visual System
    o Performs poorly at Neutral
    o Some inaccuracy for all emotions tested

o EmoVoice
    o Bias towards negative voice
    o Very strong bias towards active voice

EMOVOICE – MODIFICATION RULES

o Happy: For all happy training instances, if PP + PA > NA & NE & NP, change EMV Class to Happy
o Sad: If NP is 2nd to NA and within 0.05, change EMV Class to Sad
o Neutral:
    o If NE tied with another confidence level, change EMV Class to Neutral
    o If all probabilities within 0.05 of each other, change EMV Class to Neutral
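The rules above can be sketched as a single function. Several details are assumptions, since the slide does not spell them out: "PP + PA > NA & NE & NP" is read as the sum exceeding each of the three values, "NE tied with another confidence level" is read as NE equalling any other value (a stricter reading, tied for the highest value, is also possible), the rules are tried in the order listed with the first match winning, and the names `apply_rules`, `TOLERANCE`, and `EPS` are illustrative.

```python
TOLERANCE = 0.05  # "within 0.05" from the rules
EPS = 1e-9        # guards float comparisons; not part of the rules

def apply_rules(conf, emv_class, is_happy_training_instance=False):
    """conf maps 'NA', 'NP', 'NE', 'PA', 'PP' to probabilities.

    Returns the EMV class after applying the modification rules
    (assumed order: Happy, Sad, Neutral; first matching rule wins).
    """
    na, np_, ne, pa, pp = (conf[k] for k in ("NA", "NP", "NE", "PA", "PP"))

    # Happy: only for happy training instances, PP + PA beats each negative/neutral value.
    if is_happy_training_instance and all(pp + pa > x for x in (na, ne, np_)):
        return "Happy"

    # Sad: NA is highest, NP is second, and the gap is within 0.05.
    if (na >= max(np_, ne, pa, pp) and np_ >= max(ne, pa, pp)
            and na - np_ <= TOLERANCE + EPS):
        return "Sad"

    # Neutral (a): NE equals another confidence level.
    if any(abs(ne - x) <= EPS for x in (na, np_, pa, pp)):
        return "Neutral"
    # Neutral (b): all probabilities lie within 0.05 of each other.
    if max(conf.values()) - min(conf.values()) <= TOLERANCE + EPS:
        return "Neutral"

    return emv_class  # no rule fired; keep EmoVoice's own class

# Made-up vector where NP closely trails NA.
print(apply_rules({"NA": 0.35, "NP": 0.32, "NE": 0.11, "PA": 0.13, "PP": 0.09}, "Angry"))
```

Because the Sad and Neutral conditions can both hold for nearly uniform vectors, the order in which the rules are tried changes the result in those cases.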

RESULTS

Accuracy (%)

Experiment                                    Multimodal Data   EmoVoice Only   Visual Only
Regular Distance                              76.64             38.43 *         77.43
Long Distance                                 65.60             38.43 *         67.01

Post Manual Bias
Regular Distance                              82.47             58.17 *         77.43 *
Long Distance                                 70.09             58.17 *         67.36
Regular Distance, Confidence Levels Removed   81.08             60.04 *         77.43 *
Long Distance, Confidence Levels Removed      73.98             60.04 *         67.36 *
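To make the size of the change easier to read, the differences between the first two result rows and the next two can be computed directly. The accuracies are copied from the table. Treating the first pair as the baseline and the second pair as the post-manual-bias runs is an assumption about how the table is grouped, and the names `baseline`, `post_bias`, and `gains` are illustrative.

```python
# (multimodal, EmoVoice only, visual only) accuracies in percent, copied from the table.
baseline = {"regular": (76.64, 38.43, 77.43), "long": (65.60, 38.43, 67.01)}
post_bias = {"regular": (82.47, 58.17, 77.43), "long": (70.09, 58.17, 67.36)}

def gains(before, after):
    """Percentage-point change per distance and per system, rounded to 2 decimals."""
    return {
        distance: tuple(round(a - b, 2) for b, a in zip(before[distance], after[distance]))
        for distance in before
    }

for distance, (multi, audio, visual) in gains(baseline, post_bias).items():
    print(f"{distance}: multimodal {multi:+.2f}, EmoVoice {audio:+.2f}, visual {visual:+.2f}")
```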

FUTURE WORK

o Spring Practicum
    o Refine rules
    o Automation
    o Online Classifier
    o Mount on robot; cause apocalypse

THANK YOU FOR LISTENING.

o Questions? Comments?
