
Senior Project – Computer Science – 2013

Multimodal Emotion Recognition

Colin Grubb
Advisor – Prof. Nick Webb

System Layout (figure)

Introduction

Multimodal fusion is a technique in which two or more inputs are combined to improve classification accuracy on a particular problem. In this study, we aimed to improve the classification accuracy of existing systems via fusion. We took two existing pieces of software, one audio and one visual, and combined them using decision-level fusion. We conducted experiments to see how we could make the two individual systems complement each other in order to achieve the highest possible accuracy.
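
As a rough illustration of decision-level fusion (not the project's actual code), the sketch below combines per-emotion confidence scores from a hypothetical audio classifier and a hypothetical visual classifier by weighted voting. The emotion names match the four classes used in this project; the scores, weights, and function names are made up.

```python
# Minimal sketch of decision-level fusion: each unimodal classifier returns
# per-emotion confidence scores, and the fused prediction is the emotion with
# the highest combined (weighted) score. All values and weights are hypothetical.

EMOTIONS = ["Angry", "Happy", "Neutral", "Sad"]

def fuse_decisions(audio_conf, visual_conf, audio_weight=0.4, visual_weight=0.6):
    """Combine per-class confidences from the audio and visual classifiers."""
    combined = {
        emotion: audio_weight * audio_conf.get(emotion, 0.0)
                 + visual_weight * visual_conf.get(emotion, 0.0)
        for emotion in EMOTIONS
    }
    return max(combined, key=combined.get), combined

if __name__ == "__main__":
    # Hypothetical outputs from the audio (EmoVoice-like) and visual (PCA-like) systems.
    audio_conf = {"Angry": 0.40, "Happy": 0.15, "Neutral": 0.25, "Sad": 0.20}
    visual_conf = {"Angry": 0.10, "Happy": 0.55, "Neutral": 0.25, "Sad": 0.10}
    label, scores = fuse_decisions(audio_conf, visual_conf)
    print(label)  # Happy, since the weighted visual evidence dominates
```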

Acknowledgements

Prof. Nick Webb
Prof. Shane Cotter
Prof. Aaron Cass
Thomas Yanuklis

Conclusion and Future Work

We were able to achieve higher classification accuracy by combining audio and visual data and then applying manual rules to handle the emotions for which the individual systems' classification accuracy was weak. Future work will include automating the individual system components, building an online classifier that returns output in real time, and refining the manual rules used to counteract bias. There is also potential for the system to be mounted on a robot currently residing in the department.


Emotion Software

• Audio Software: EmoVoice (EMV)
  • Open source, real time
  • Naïve Bayes classifier
  • Accuracy: 38.43%

• Visual Software: Principal Component Analysis (PCA); a rough sketch follows this list
  • Created by Professor Shane Cotter
  • Works on still images of faces
  • Accuracy: 77.4%
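
The visual system's internals are not described on this poster; as a rough sketch of how PCA is commonly applied to still face images (the "eigenfaces" approach), the example below projects flattened faces onto the top principal components and labels a new face by nearest neighbour in that reduced space. The image size, the random data, and the nearest-neighbour step are assumptions for illustration, not details of Prof. Cotter's software.

```python
import numpy as np

# Rough sketch of PCA ("eigenfaces") on still face images: flatten each image,
# subtract the mean face, keep the top principal components, and classify a new
# face by nearest neighbour in the reduced space. The data is random and the
# 32x32 image size is an assumption.

rng = np.random.default_rng(0)
n_train, img_size, n_components = 40, 32 * 32, 10

X = rng.random((n_train, img_size))         # training faces, one flattened image per row
y = rng.integers(0, 4, size=n_train)        # labels: 0=Angry, 1=Happy, 2=Neutral, 3=Sad

mean_face = X.mean(axis=0)
Xc = X - mean_face
# Principal components are the top right singular vectors of the centred data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:n_components]              # shape (n_components, img_size)
train_proj = Xc @ components.T              # training faces in eigenface space

def classify(face):
    """Project a flattened face and return the label of its nearest training face."""
    proj = (face - mean_face) @ components.T
    dists = np.linalg.norm(train_proj - proj, axis=1)
    return int(y[np.argmin(dists)])

print(classify(rng.random(img_size)))       # predicted emotion index for a random "face"
```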

Gathering Data

• Four emotional states: Angry, Happy, Neutral, Sad
• List of sentences read to EmoVoice
• Normal visual data and long-range visual data (6 ft.)
• Datasets constructed using outputs from the unimodal systems (a format sketch follows this list)
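
One plausible way to store such fused data is Weka's ARFF format, since the classifier used in the experiments below (J48) is Weka's implementation of C4.5. The sketch writes a hypothetical fused dataset with EmoVoice confidences, the PCA label, and the true class per row; the attribute names, file name, and example rows are assumptions for illustration only.

```python
# Hypothetical sketch of writing a fused dataset as a Weka ARFF file.
# Attribute names, file name, and rows are made up, not the project's real data.

EMOTIONS = ["Angry", "Happy", "Neutral", "Sad"]

header = ["@relation emotion_fusion"]
header += [f"@attribute emv_{e.lower()} numeric" for e in EMOTIONS]  # EmoVoice confidences
header.append("@attribute pca_label {" + ",".join(EMOTIONS) + "}")   # PCA prediction
header.append("@attribute class {" + ",".join(EMOTIONS) + "}")       # true emotion
header.append("@data")

# Each sample: (EmoVoice confidences, PCA label, true label) -- made-up values.
samples = [
    ((0.40, 0.15, 0.25, 0.20), "Happy", "Happy"),
    ((0.55, 0.10, 0.20, 0.15), "Angry", "Angry"),
]

with open("fusion.arff", "w") as f:
    f.write("\n".join(header) + "\n")
    for conf, pca_label, true_label in samples:
        f.write(",".join(f"{c:.2f}" for c in conf) + f",{pca_label},{true_label}\n")
```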

Experimentation

• EmoVoice data modified to complement PCA weaknesses and combat negative/active voice bias
• J48 decision tree (C4.5) used as classifier (a stand-in sketch follows this list)
• Four experiments run:
  • Regular Distance
  • Long Distance
  • Regular Distance – No Confidence Levels
  • Long Distance – No Confidence Levels
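
The poster names Weka's J48 (an implementation of C4.5) as the classifier. As a loose stand-in only, the sketch below trains scikit-learn's DecisionTreeClassifier (CART with an entropy criterion, not C4.5) on hypothetical fused feature vectors of EmoVoice confidences plus a PCA label, and reports 10-fold cross-validation accuracy; the feature layout and random data are assumptions.

```python
# Stand-in for the J48 (C4.5) experiments: scikit-learn's DecisionTreeClassifier
# (CART, not C4.5) trained on fused rows of EmoVoice confidences plus the PCA label.
# The feature layout and the random data are for illustration only; a real setup
# would one-hot encode the nominal PCA label rather than feed it as a number.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200
emv_conf = rng.random((n, 4))                        # EmoVoice confidences (4 emotions)
emv_conf /= emv_conf.sum(axis=1, keepdims=True)      # normalise to sum to 1
pca_label = rng.integers(0, 4, size=(n, 1))          # PCA-predicted emotion index
X = np.hstack([emv_conf, pca_label])                 # fused feature vectors
y = rng.integers(0, 4, size=n)                       # true emotion labels

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
scores = cross_val_score(clf, X, y, cv=10)           # 10-fold cross-validation
print(f"mean accuracy: {scores.mean():.3f}")
```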

Results

Classification accuracy (%):

Experiment                                  Multimodal Data   EmoVoice Only   PCA Only
Regular Distance                            82.47             58.17 *         77.43 *
Long Distance                               70.09             58.17 *         67.36
Regular Distance – No Confidence Levels     81.08             60.04 *         77.43 *
Long Distance – No Confidence Levels        73.98             60.04 *         67.36 *

(Results marked * were statistically significant at p = 0.05)


Manual Rules

• Created rules to modify EmoVoice output (a sketch follows this list), based on:
  • EmoVoice bias towards negative and active voice
  • PCA weaknesses
• Rules classified by the training instance's class attribute:
  • Happy: If the EMV confidence levels for content and happy voice outweighed all other confidence levels, change the instance to Happy
  • Neutral: If all confidence levels were within 0.05 of each other, or if the neutral confidence was tied for first, change the instance to Neutral
  • Sad: If the sad confidence was second to angry and within 0.05 of it, change the instance to Sad
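
The three rules above are concrete enough to sketch directly. The function below is a hypothetical rendering of them over a dictionary of EmoVoice confidence scores; the class names, the order in which the rules are checked, and the exact reading of each condition ("outweighed", "second to angry") are assumptions, not the project's actual code.

```python
# Hypothetical rendering of the manual rules: given EmoVoice confidence scores
# and EmoVoice's predicted label, return a possibly corrected label. Class names
# and rule interpretations are assumptions based on the rule descriptions above.

def apply_manual_rules(conf, emv_label):
    ranked = sorted(conf, key=conf.get, reverse=True)
    top = conf[ranked[0]]

    # Happy rule (one reading): combined content + happy confidence exceeds
    # every other class's confidence.
    others = [v for k, v in conf.items() if k not in ("content", "happy")]
    if conf.get("content", 0.0) + conf.get("happy", 0.0) > max(others):
        return "Happy"

    # Neutral rule: all confidences within 0.05 of each other,
    # or neutral tied for the top confidence.
    if top - min(conf.values()) <= 0.05 or conf.get("neutral", 0.0) == top:
        return "Neutral"

    # Sad rule: sad ranked second behind angry and within 0.05 of it.
    if ranked[0] == "angry" and ranked[1] == "sad" and top - conf["sad"] <= 0.05:
        return "Sad"

    return emv_label

print(apply_manual_rules(
    {"angry": 0.34, "sad": 0.31, "neutral": 0.20, "happy": 0.10, "content": 0.05},
    "Angry"))  # -> Sad (sad is second to angry and within 0.05 of it)
```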