Multimodal Emotion Recognition
Colin Grubb
Advisor: Nick Webb
MOTIVATION
PREVIOUS RESEARCH
o Multimodal fusion
o Research looking at audio, visual, and gesture information
o Feature-level vs. decision-level fusion
RESEARCH QUESTION
o To what extent can we improve emotion recognition by using classification methods on audio and visual data?
DECISION LEVEL ANALYSIS
o Set of rules vs. training a classifier
o Rule set is too basic
o Will use a classifier to learn from the outputs of the unimodal systems
https://www.informatik.uni-augsburg.de/en/chairs/hcm/projects/emovoice/
AUDIO SYSTEM
o EmoVoice (EMV)
o Real-time audio analysis
o Five emotional states with probabilities
o Published accuracy: 47.67%
EMOVOICE CONFIDENCE LEVELS
(Negative Active) Angry
(Negative Passive) Sad
(Neutral) Neutral
(Positive Active) Happy
(Positive Passive) Content
negativeActive <0.40, 0.20, 0.10, 0.15, 0.15>
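The example vector above pairs the winning state with one probability per state. A minimal Python sketch of reading it, assuming the five values follow the order in which the states are listed on the slide (NA, NP, NE, PA, PP); that order and the helper names `parse_confidences` and `top_state` are assumptions, not EmoVoice's documented output format:

```python
from typing import Dict, List, Tuple

# Assumed order of EmoVoice's five confidence values, matching the slide listing.
STATES: List[Tuple[str, str]] = [
    ("negativeActive", "Angry"),
    ("negativePassive", "Sad"),
    ("neutral", "Neutral"),
    ("positiveActive", "Happy"),
    ("positivePassive", "Content"),
]


def parse_confidences(values: List[float]) -> Dict[str, float]:
    """Map a five-element confidence vector to named states (hypothetical helper)."""
    if len(values) != len(STATES):
        raise ValueError(f"expected {len(STATES)} values, got {len(values)}")
    return {state: p for (state, _), p in zip(STATES, values)}


def top_state(confidences: Dict[str, float]) -> Tuple[str, str]:
    """Return the most confident state and its everyday emotion label."""
    state = max(confidences, key=confidences.get)
    return state, dict(STATES)[state]


conf = parse_confidences([0.40, 0.20, 0.10, 0.15, 0.15])
print(top_state(conf))  # ('negativeActive', 'Angry')
```

The winning state is simply the largest of the five probabilities; the remaining values are what the bias rules later in the talk inspect.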
VISUAL SYSTEM
o Software created by Prof. Shane Cotter
o Uses still images
o Published accuracy: 93.4%
SYSTEM LAYOUT
Speech ("I’m in a good mood!") -> EmoVoice -> Emotion: Happy
Images -> Video Software -> Emotion: Happy
Both unimodal outputs -> Classifier -> Output: Happy
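In this layout the classifier sees only the unimodal outputs, not raw audio or pixels. A minimal sketch of how fused training instances could be written in Weka's ARFF format; the attribute names, the choice of features (EmoVoice class, its five confidences, the visual class), and the `include_confidences` switch (mirroring the "Confidence Levels Removed" experiments) are all assumptions for illustration:

```python
from typing import List, Sequence

# Hypothetical attribute layout for the fused (decision-level) dataset.
EMOTIONS = ["Happy", "Angry", "Neutral", "Sad"]               # final four classes
EMV_CLASSES = ["Angry", "Sad", "Neutral", "Happy", "Content"]  # EmoVoice's five states
CONF_NAMES = ["NA", "NP", "NE", "PA", "PP"]                    # assumed confidence order


def arff_header(include_confidences: bool = True) -> List[str]:
    """Build the ARFF header; the class attribute goes last, Weka's default."""
    lines = ["@relation multimodal_emotion", ""]
    lines.append("@attribute emv_class {" + ",".join(EMV_CLASSES) + "}")
    if include_confidences:
        lines += [f"@attribute emv_{name} numeric" for name in CONF_NAMES]
    lines.append("@attribute visual_class {" + ",".join(EMOTIONS) + "}")
    lines.append("@attribute emotion {" + ",".join(EMOTIONS) + "}")
    lines += ["", "@data"]
    return lines


def arff_row(emv_class: str, confidences: Sequence[float], visual_class: str,
             label: str, include_confidences: bool = True) -> str:
    """Encode one instance: unimodal outputs as features, true emotion as class."""
    if include_confidences and len(confidences) != len(CONF_NAMES):
        raise ValueError("expected five EmoVoice confidences")
    fields = [emv_class]
    if include_confidences:
        fields += [f"{p:.2f}" for p in confidences]
    fields += [visual_class, label]
    return ",".join(fields)


# Illustrative instance only: the slide's example vector, with assumed labels.
rows = [arff_row("Angry", [0.40, 0.20, 0.10, 0.15, 0.15], "Angry", "Angry")]
print("\n".join(arff_header() + rows))
```

Keeping the unimodal systems as black boxes is what makes this decision-level fusion: the classifier learns how far to trust each system's vote.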
DATA GATHERING
o 8 subjects
  o Five male, three female
o Audio data
  o Read sample sentences
o Visual data
  o Facial expressions captured at regular and long distance (6 ft.)
EXPERIMENTS
o Weka Data Mining Software
o Used J48 classifier
  o C4.5 algorithm: decision tree
  o Each node tests an attribute; each branch represents one outcome of that test
[Figure: example decision tree with root node 1, child nodes 2 and 3, and four leaves, Output 1 to Output 4]
http://www.cs.waikato.ac.nz/ml/weka/
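J48 is Weka's implementation of C4.5, which ranks candidate splits by gain ratio: the information gain of a split divided by the entropy of the split itself. A minimal sketch for nominal attributes only; numeric attributes such as the EmoVoice confidences are split on thresholds, and C4.5's pruning is also omitted here. The function names are hypothetical:

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a sequence of labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5-style gain ratio for splitting `labels` on a nominal attribute."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy left after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # penalises attributes with many distinct values
    return info_gain / split_info if split_info > 0 else 0.0


# Toy labels for illustration: how well does the visual output split the truth?
visual = ["Happy", "Happy", "Sad", "Neutral"]
truth = ["Happy", "Happy", "Sad", "Sad"]
print(gain_ratio(visual, truth))
```

At each node the tree picks the attribute with the best score, then repeats on each branch until the leaves are (nearly) pure.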
EMOTION CLASSES
o Final dataset classifies between
  o Happy
  o Angry
  o Neutral
  o Sad
o Audio performance: 38.43%
o Visual performance: 77.43%
INITIAL PERFORMANCE
o Ran combined dataset against J48 classifier
o Multimodal data initially no better than the visual system alone (76.64% vs. 77.43% at regular distance)
o Needed a way to improve the dataset
Experiment (accuracy, %) | Multimodal Data | EmoVoice Only | Visual Only
Regular Distance | 76.64 | 38.43 * | 77.43
Long Distance | 65.60 | 38.43 * | 67.01
IMPROVING ACCURACY
o How can we use the two individual systems to complement each other?
o Two pieces of information:
  o What does the visual system do poorly on?
  o What kind of biases does EmoVoice have?
MANUAL BIAS
o Visual system
  o Performs poorly on Neutral
  o Some inaccuracy for all emotions tested
o EmoVoice
  o Bias towards negative voice
  o Very strong bias towards active voice
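Biases like these show up when per-class recall and the distribution of predicted labels are compared. A minimal sketch with hypothetical function names; treating Angry and Happy as the "active" states follows the EmoVoice confidence slide, and the labels below are toy values, not data from the study:

```python
from collections import Counter
from typing import Dict, Sequence

ACTIVE = {"Angry", "Happy"}  # EmoVoice's negative-active and positive-active states


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Share of each true class that the system labelled correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def prediction_share(y_pred: Sequence[str]) -> Dict[str, float]:
    """How often each label is predicted; a lopsided share hints at a bias."""
    counts = Counter(y_pred)
    return {c: counts[c] / len(y_pred) for c in counts}


def active_share(y_pred: Sequence[str]) -> float:
    """Fraction of predictions that fall in an active (high-arousal) state."""
    return sum(p in ACTIVE for p in y_pred) / len(y_pred)


# Toy labels for illustration only.
y_true = ["Neutral", "Neutral", "Happy", "Sad"]
y_pred = ["Happy", "Neutral", "Happy", "Sad"]
print(per_class_recall(y_true, y_pred), prediction_share(y_pred), active_share(y_pred))
```

Low recall on one class points at a weakness (such as Neutral for the visual system), while a prediction share far above the true class share points at a bias (such as EmoVoice's pull towards active voice).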
EMOVOICE – MODIFICATION RULES
o Happy: For all happy training instances, if PP + PA > NA, NE, and NP, change the EMV class to Happy
o Sad: If NP is second to NA and within 0.05 of it, change the EMV class to Sad
o Neutral:
  o If NE is tied with another confidence level, change the EMV class to Neutral
  o If all probabilities are within 0.05 of each other, change the EMV class to Neutral
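The modification rules can be sketched as a single relabelling function. This is a minimal sketch under several assumptions: the rules are tried in the order listed and the first match wins; "PP + PA > NA & NE & NP" is read as PP + PA exceeding each of the three; "NE tied with another confidence level" is read as NE sharing the highest confidence; and the name `apply_bias_rules` is hypothetical. Because the Happy rule consults the true label, it only applies to training instances:

```python
import math
from typing import Dict

WITHIN = 0.05  # the "within 0.05" threshold used by the Sad and Neutral rules
EPS = 1e-9     # float slack, so that e.g. 0.40 - 0.35 still counts as within 0.05


def apply_bias_rules(emv_class: str, c: Dict[str, float], true_label: str) -> str:
    """Return the (possibly relabelled) EmoVoice class for one training instance.

    `c` maps the confidence names NA, NP, NE, PA, PP to EmoVoice's probabilities.
    Rules are tried in the order the slide lists them; first match wins (assumed).
    """
    # Happy: only for instances whose true label is Happy.
    positive = c["PP"] + c["PA"]
    if true_label == "Happy" and all(positive > c[k] for k in ("NA", "NE", "NP")):
        return "Happy"
    # Sad: NA ranks first, NP second, and NP trails NA by at most 0.05.
    ranked = sorted(c, key=c.get, reverse=True)
    if ranked[:2] == ["NA", "NP"] and c["NA"] - c["NP"] <= WITHIN + EPS:
        return "Sad"
    # Neutral (a): NE shares the highest confidence with another state (assumed reading).
    top = max(c.values())
    tied_at_top = [k for k, v in c.items() if math.isclose(v, top)]
    if "NE" in tied_at_top and len(tied_at_top) > 1:
        return "Neutral"
    # Neutral (b): all five confidences lie within 0.05 of each other.
    if top - min(c.values()) <= WITHIN + EPS:
        return "Neutral"
    return emv_class


print(apply_bias_rules("Angry", {"NA": 0.40, "NP": 0.35, "NE": 0.10, "PA": 0.10, "PP": 0.05}, "Sad"))
```

The call above prints `Sad`: NA wins by only 0.05 over NP, so the instance is moved away from EmoVoice's negative-active bias.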
RESULTS
Experiment (accuracy, %) | Multimodal Data | EmoVoice Only | Visual Only
Regular Distance | 76.64 | 38.43 * | 77.43
Long Distance | 65.60 | 38.43 * | 67.01
Regular Distance, post manual bias | 82.47 | 58.17 * | 77.43 *
Long Distance, post manual bias | 70.09 | 58.17 * | 67.36
Regular Distance, post manual bias, confidence levels removed | 81.08 | 60.04 * | 77.43 *
Long Distance, post manual bias, confidence levels removed | 73.98 | 60.04 * | 67.36 *
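The effect of the bias rules is easiest to see as the percentage-point gap between the fused result and the better of the two single systems. A short sketch; the accuracies are copied from the results table, while the row names and the function `gain_over_best_single` are only for illustration:

```python
from typing import Dict, Tuple

# Accuracies (%) copied from the results table: (multimodal, EmoVoice only, visual only).
RESULTS: Dict[str, Tuple[float, float, float]] = {
    "Regular, initial": (76.64, 38.43, 77.43),
    "Long, initial": (65.60, 38.43, 67.01),
    "Regular, post manual bias": (82.47, 58.17, 77.43),
    "Long, post manual bias": (70.09, 58.17, 67.36),
    "Regular, confidences removed": (81.08, 60.04, 77.43),
    "Long, confidences removed": (73.98, 60.04, 67.36),
}


def gain_over_best_single(row: Tuple[float, float, float]) -> float:
    """Percentage-point difference between the fused result and the better single system."""
    multimodal, audio, visual = row
    return round(multimodal - max(audio, visual), 2)


for name, row in RESULTS.items():
    print(f"{name}: {gain_over_best_single(row):+.2f} points")
```

For example, the initial regular-distance multimodal run sits 0.79 points below visual only, while after the manual bias rules it sits 5.04 points above it.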
FUTURE WORK
o Spring Practicum
  o Refine rules
  o Automation
  o Online classifier
  o Mount on robot; cause apocalypse
THANK YOU FOR LISTENING.
o Questions? Comments?