System Layout
Senior Project – Computer Science – 2013
Multimodal Emotion Recognition
Colin Grubb
Advisor: Prof. Nick Webb

Introduction
Multimodal fusion is a technique in which two or more input modalities are combined to improve classification accuracy on a particular problem. In this study, we aimed to improve the classification accuracy of existing systems via fusion. We took two existing pieces of software, one audio and one visual, and combined them using decision-level fusion. We conducted experiments to see how the two individual systems could be made to complement each other in order to achieve the highest possible accuracy.
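As a concrete illustration, decision-level fusion can be sketched as a weighted combination of each classifier's per-emotion confidence scores. The function, weights, and scores below are illustrative assumptions, not the project's actual code or values.

```python
# Minimal sketch of decision-level fusion: each unimodal classifier
# returns a per-emotion confidence mapping, and the fused decision is
# the emotion with the highest weighted sum of confidences.

EMOTIONS = ["angry", "happy", "neutral", "sad"]

def fuse_decisions(audio_conf, visual_conf, audio_weight=0.4, visual_weight=0.6):
    """Combine two confidence dicts (emotion -> score) at decision level."""
    combined = {
        e: audio_weight * audio_conf.get(e, 0.0) + visual_weight * visual_conf.get(e, 0.0)
        for e in EMOTIONS
    }
    return max(combined, key=combined.get)

# Example: audio leans angry, visual leans happy; the weights decide.
audio = {"angry": 0.5, "happy": 0.2, "neutral": 0.2, "sad": 0.1}
visual = {"angry": 0.1, "happy": 0.7, "neutral": 0.1, "sad": 0.1}
print(fuse_decisions(audio, visual))  # prints "happy"
```

Weighting the more accurate modality (here the visual system) more heavily is one simple way to let the stronger classifier dominate when the two disagree.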
Acknowledgements
Prof. Nick Webb, Prof. Shane Cotter, Prof. Aaron Cass, Thomas Yanuklis
Conclusion and Future Work
We achieved higher classification accuracy by combining audio and visual data and then applying manual bias rules to handle emotions that the individual systems classified weakly. Future work includes automating the individual system components, building an online classifier that returns output in real time, and refining the manual rules used to counteract bias. There is also potential to mount the system on a robot currently residing in the department.
Results
Emotion Software
• Audio software: EmoVoice (EMV)
  • Open source, real time
  • Naïve Bayes classifier
  • Accuracy: 38.43%
• Visual software: Principal Component Analysis (PCA)
  • Created by Prof. Shane Cotter
  • Works on still images of faces
  • Accuracy: 77.4%
Gathering Data
Experimentation
• EmoVoice data modified to complement PCA weaknesses and combat negative/active voice bias
• J48 decision tree (C4.5) used as classifier
• Four experiments run:
  • Regular Distance
  • Long Distance
  • Regular Distance – No Confidence Levels
  • Long Distance – No Confidence Levels
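The project used Weka's J48 (an implementation of C4.5). As a rough stand-in, the sketch below trains scikit-learn's DecisionTreeClassifier (which implements CART, not C4.5) on fused feature vectors; the feature layout and all data values are invented for illustration.

```python
# Hypothetical fused dataset: each row holds the four EmoVoice
# confidences (angry, happy, neutral, sad) followed by the PCA label
# encoded as an integer 0-3; the class label is the true emotion.
from sklearn.tree import DecisionTreeClassifier

X = [
    [0.6, 0.1, 0.2, 0.1, 0],
    [0.1, 0.7, 0.1, 0.1, 1],
    [0.2, 0.2, 0.4, 0.2, 2],
    [0.1, 0.1, 0.2, 0.6, 3],
]
y = [0, 1, 2, 3]  # 0=angry, 1=happy, 2=neutral, 3=sad

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[0.6, 0.1, 0.2, 0.1, 0]]))  # prints [0] (angry)
```

In the real experiments the tree was trained on the outputs of the two unimodal systems, with and without the EmoVoice confidence levels as features, which corresponds to dropping the four confidence columns from rows like those above.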
Experiment                                Multimodal Data   EmoVoice Only   PCA Only
Regular Distance                          82.47             58.17 *         77.43 *
Long Distance                             70.09             58.17 *         67.36
Regular Distance – No Confidence Levels   81.08             60.04 *         77.43 *
Long Distance – No Confidence Levels      73.98             60.04 *         67.36 *
(Results were statistically significant at p = 0.05)
• Four emotional states: Angry, Happy, Neutral, Sad
• List of sentences read aloud to EmoVoice
• Normal-range visual data and long-range visual data (6 ft.)
• Datasets constructed from the outputs of the unimodal systems
Manual Rules
• Created rules to modify EmoVoice output based on:
  • EmoVoice bias toward negative and active voice
  • PCA weaknesses
• Rules were keyed to the training instance's class attribute:
  • Happy: If the EMV confidence levels of content and happy voice outweighed all other confidence levels, change the instance to Happy
  • Neutral: If all confidence levels were within 0.05 of each other, or if neutral confidence was tied for first, change the instance to Neutral
  • Sad: If sad confidence was second to angry and within 0.05 of it, change the instance to Sad
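The three rules above can be sketched as a post-processing pass over an EmoVoice confidence mapping. The key names (including the "content" voice class), the reading of "outweighed" as a summed comparison, and the rule order are all assumptions; the poster gives only the rule text.

```python
# Hedged sketch of the manual bias-correction rules: given an EmoVoice
# confidence dict (label -> score) and the predicted label, return a
# possibly overridden emotion label.

def apply_manual_rules(conf, predicted):
    scores = sorted(conf.values(), reverse=True)
    top = scores[0]
    # Happy: content + happy confidence outweighs all other confidences combined.
    others = sum(v for k, v in conf.items() if k not in ("content", "happy"))
    if conf["content"] + conf["happy"] > others:
        return "happy"
    # Neutral: all confidences within 0.05 of each other, or neutral tied for first.
    if top - scores[-1] <= 0.05 or conf["neutral"] == top:
        return "neutral"
    # Sad: sad confidence is second to angry and within 0.05 of it.
    if predicted == "angry" and conf["sad"] == scores[1] and conf["angry"] - conf["sad"] <= 0.05:
        return "sad"
    return predicted

emv = {"angry": 0.3, "content": 0.25, "happy": 0.3, "neutral": 0.1, "sad": 0.05}
print(apply_manual_rules(emv, "angry"))  # prints "happy"
```

Ordering the rules from most to least specific matters here: a near-tie between angry and sad should only fire after the happy and neutral overrides have been ruled out.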