TRANSCRIPT
22/02/2016
Multimodal Information Fusion
Ling Guan
Ryerson Multimedia Laboratory & Centre for Interactive Multimedia Information Mining
Ryerson University, Toronto, Ontario
[email protected]
http://www.rml.ryerson.ca/
Acknowledgement
The presenter would like to thank his former and current students, P. Muneesawang, Y. Wang, R. Zhang, A. Bulzacki, M.T. Ibrahim, N. Joshi, Z. Xie, L. Gao and N. El Madany, for their contributions to this research.
Slides on fusion fundamentals provided by Prof. S-Y Kung of Princeton University are greatly appreciated.
This research is supported by the Canada Research Chair (CRC) Program, the Canada Foundation for Innovation (CFI), the Ontario Research Fund (ORF), and Ryerson University.
Ryerson University – Multimedia Research LaboratoryL. Guan, P. Muneesawang, Y. Wang, R. Zhang, Y. Tie, A. Bulzacki, M.T. Ibrahim, N. Joshi, Z.Xie, L. Gao, N. El Madany 2
Major Publications
Y. Wang, L. Guan and A.N. Venetsanopoulos, “Kernel based fusion with application to audiovisual emotion recognition,” IEEE Trans. on Multimedia, vol. 14, no. 3, pp. 597-607, Jun 2012.
P. Muneesawang, T. Amin and L. Guan, “A new learning algorithm for the fusion of adaptive audio-visual features for the retrieval and classification of movie clips,” Journal of Signal Processing Systems for Signal, Image and Video Technology, vol. 59, no. 2, pp. 177-188, May 2010.
Y. Wang and L. Guan, “Combining speech and facial expression for recognition of human emotional state,” IEEE Transactions on Multimedia, vol. 10, no. 5, pp. 936-946, August 2008.
N. Joshi and L. Guan, “ASR with the combination of recognizers under non-stationary noise conditions,” Journal of Signal Processing Systems, vol. 58, no. 3, pp. 359-370, Mar 2010.
L. Guan, P. Muneesawang, Y. Wang, R. Zhang, Y. Tie, A. Bulzacki and M.T. Ibrahim, “Multimedia Multimodal Technologies,” Proc. IEEE Workshop on Multimedia Signal Processing and Novel Parallel Computing (in conjunction with ICME 2009), pp. 1600-1603, NYC, USA, Jul 2009 (overview paper).
R. Zhang and L. Guan, “Multimodal image retrieval via Bayesian information fusion,” Proc. IEEE Int. Conf. on Multimedia and Expo, pp. 830-833, NYC, USA, Jun/Jul 2009.
Z. Xie and L. Guan, “Multimodal information fusion of audio emotion recognition based on kernel entropy component analysis,” Int. J. of Semantic Computing, vol. 7, no. 1, pp. 25-42, Aug 2013.
Why Multimedia Multimodal Methodology? (revisit)
Multimedia is a domain of multiple facets, e.g., audio, visual, text, graphics, etc.
A central aspect of multimedia processing is the coherent integration of media from different sources or modalities.
It is easy to define each facet individually, but difficult to treat them as a combined identity.
Humans are natural and generic multimedia processing machines
Can we teach computers/machines to do the same (via fusion technologies)?
Potential Applications
Human–Computer Interaction, Learning Environments, Consumer Relations, Entertainment, Digital Home, Domestic Helper, Security/Surveillance, Educational Software, Computer Animation, Call Centers
Source of Fusion for Classification
[Diagram: two inputs, Data/Feature #1 and Data/Feature #2, may be fused at the data/feature level, at the interaction level, or at the score (decision) level.]
Feature (Data) Level Fusion
Direct Data (Feature) Level Fusion
Prior knowledge can be incorporated into the fusion models by modifying
Interaction Level Fusion
[Diagram: an HMM per modality (e.g., a face model) coupled into a fused HMM.]
Decision (Score) Level Fusion
Modular Networks (Decision Level)
[Diagram: modular network. The network input feeds sub-networks E1, E2, ..., Er; their outputs y1, y2, ..., yj feed a decision module that outputs the recognized emotion.]
• Hierarchical structure
• Each sub-network Er is an expert system
• The decision module classifies the input vector as a particular class when Ynet = arg max_j yj
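A minimal sketch of the decision rule Ynet = arg max_j yj, assuming for illustration that the experts' class scores are simply averaged (the actual combination inside the network may differ):

```python
import numpy as np

classes = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
expert_outputs = np.array([
    [0.10, 0.20, 0.10, 0.40, 0.10, 0.10],   # sub-network E1
    [0.20, 0.10, 0.10, 0.50, 0.05, 0.05],   # sub-network E2
])
y = expert_outputs.mean(axis=0)              # combined module outputs y_j
y_net = classes[int(np.argmax(y))]           # Ynet = arg max_j y_j
```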
A Different interpretation
Score Fusion
1a. Score Fusion (without supervision)
• Linear score fusion (confidence/prior knowledge)
• Nonlinear score fusion (ROC-based)
1b. Score Fusion (via supervision)
• Linear score fusion (adaptive supervision)
• Nonlinear score fusion (adaptive supervision)
Score Fusion Architecture (Audio-Visual)
• The lower layer contains local experts, each of which produces a local score based on a single modality.
• The upper layer combines the scores.
The scores are obtained independently and then combined.
Linear Fusion
[Figure: decision regions in the (Score 1, Score 2) plane, with a non-uniformly weighted linear boundary separating Accept from Reject.]
Linear SVM (supervised) fusion is an appealing alternative.
The most prevalent unsupervised approaches estimate the confidence weights from prior knowledge or training data.
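A minimal sketch of non-uniformly weighted linear score fusion; the weight values and the threshold below are invented for illustration:

```python
import numpy as np

def linear_fusion(scores, weights, threshold=0.5):
    """Weighted linear combination of per-modality scores, thresholded to a decision."""
    fused = float(np.dot(scores, weights))
    return fused, ("accept" if fused >= threshold else "reject")

weights = np.array([0.7, 0.3])       # e.g., trust the audio score more than the visual
fused, decision = linear_fusion(np.array([0.9, 0.4]), weights)
# fused = 0.7*0.9 + 0.3*0.4 = 0.75 -> "accept"
```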
Nonlinear Adaptive Fusion (via supervision) (Kernel, SVM)
[Figure: a nonlinear decision boundary in the (Score 1, Score 2) plane learned by a kernel SVM.]
Data/Feature Fusion (early work)
Simple and straightforward (good); suffers from the curse of dimensionality (bad); raises normalization issues.
Case study: bimodal human emotion recognition (also with a score fusion flavor)
Y. Wang and L. Guan, “Recognizing human emotional state from audiovisual signals,” IEEE Transactions on Multimedia, vol. 10, no. 5, pp. 936-946, August 2008.
Major indicators of emotion
• Speech
• Facial expression
• Body language: highly dependent on personality, gender, age, etc.
• Semantic meaning: two sentences can have the same lexical meaning but carry different emotional information
• …
Objective
To develop a generic, language- and cultural-background-independent system for recognizing human emotional state from audiovisual signals
Audio feature extraction
[Pipeline: Input speech → Preprocessing (noise reduction by wavelet thresholding; leading and trailing edge elimination) → Windowing (Hamming window, 512 points, 50% overlap) → Prosodic, MFCC, and Formant feature extraction → Audio feature set.]
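The windowing stage can be sketched in NumPy. The frame length (512 points), overlap (50%), and Hamming window follow the slide; the helper name `frame_signal` and the synthetic test signal are illustrative assumptions:

```python
import numpy as np

def frame_signal(x, frame_len=512, overlap=0.5):
    """Split a 1-D signal into overlapping frames, each multiplied by a Hamming window."""
    hop = int(frame_len * (1.0 - overlap))              # 256 samples at 50% overlap
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([x[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])

# Example: one second of a synthetic 440 Hz tone sampled at 8 kHz
x = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000.0)
frames = frame_signal(x)      # shape (n_frames, 512); these frames would feed
                              # the prosodic/MFCC/formant feature extractors
```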
Visual feature extraction
[Pipeline: Input image sequence → Key frame extraction (at maximum speech amplitude) → Face detection → Gabor filter bank → Feature mapping → Visual features.]
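As a rough illustration of the Gabor filter bank stage, the sketch below builds real-valued Gabor kernels at several orientations and scales. All parameter values (kernel size, wavelengths, sigma) and the feature computation are invented for illustration and do not come from the paper:

```python
import numpy as np

def gabor_kernel(ksize, theta, wavelength, sigma, gamma=0.5):
    """Real part of a 2-D Gabor kernel at orientation theta."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength)
    return envelope * carrier

# Bank of 4 orientations x 2 wavelengths applied to a toy 32x32 face patch
patch = np.random.default_rng(0).random((32, 32))
bank = [gabor_kernel(15, t * np.pi / 4, wl, sigma=4.0)
        for t in range(4) for wl in (4.0, 8.0)]
# One crude feature per kernel: magnitude of the response at the patch centre
features = [abs(float(np.sum(patch[8:23, 8:23] * k))) for k in bank]
```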
The recognition system - with Decision Fusion
[Diagram: Input speech → preprocessing → windowing → prosodic, MFCC, and formant feature extraction (audio). Input video → key frame extraction → face detection → Gabor wavelet (visual). Feature selection feeds one classifier per emotion class (AN, DI, FE, HA, SA, SU); a decision module outputs the recognized emotion.]
Modular Networks
[Diagram: the network input feeds sub-networks E1, E2, ..., Er; their outputs y1, y2, ..., yj feed a decision module that outputs the recognized emotion.]
• Hierarchical structure
• Each sub-network Er is an expert system
• The decision module classifies the input vector as a particular class when Ynet = arg max_j yj
Experimental results
Accuracy (%):

Method            Anger   Disgust  Fear    Happiness  Sadness  Surprise  Overall
Audio             88.46   61.90    59.09   48.00      57.14    80.00     66.43
Visual            34.62   47.62    68.18   52.00      47.62    48.00     49.29
Audiovisual       76.92   76.19    77.27   56.00      61.90    72.00     70.00
Stepwise          80.77   61.90    77.27   80.00      76.19    76.00     75.71
Multi-classifier  88.46   80.95    77.27   80.00      80.95    84.00     82.14
Experiments were performed on 500 video samples from 8 subjects speaking 6 languages. Six emotion labels: Anger, Disgust, Fear, Happiness, Sadness, and Surprise. 360 samples (from six subjects) were used for training and the remaining 140 (from the other two subjects) for testing; there is no overlap between training and testing subjects.
Interaction Fusion
In general, not straightforward: fusion takes place dynamically on intermediate results.
Case studies: 1. Speech recognition 2. Image retrieval with audio information
Speech Recognition: Fusing information obtained from different processing methods
1. N. Joshi and L. Guan, “ASR with the combination of recognizers under non-stationary noise conditions,” Journal of Signal Processing Systems, vol. 58, no. 3, pp. 359-370, Mar 2010.
2. N. Joshi and L. Guan, “Combination of recognizers and fusion of features approach to missing data ASR under non-stationary noise conditions,” Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. IV, pp. 1041-1044, Honolulu, USA, April 2007.
The System Diagram
Interaction Level Fusion
Two separate HMM-based models: one on spectral features with missing data (MD), and one on MFCC features. The fused HMM model is used for the interaction-level fusion.
Speech Recognition
Ryerson University – Multimedia Research LaboratoryLing Guan, Paisarn Muneesawang, Yongjin Wang, Rui Zhang, Yun Tie, Adrian Bulzacki, Muhammad Talal Ibrahim
• CMN - Cepstral Mean Normalization
TABLE I. Recognition results (%) with test corpus + factory noise

Method                              SNR 18 dB   SNR 6 dB
Conventional MFCC                   83.7        64.7
Conventional MFCC, CMN              66.0        60.3
Conventional spectral features, MD  76.5        67.4
Proposed MFCC + MD                  84.5        67.5
Proposed MFCC, CMN + MD             88.6        73.5
Image Retrieval with Audio Cues
R. Zhang and L. Guan, “A collaborative Bayesian image retrieval framework,” Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Proc., pp. 1593-1596, Taipei, Taiwan, Apr 2009.
General Idea
[Diagram: a query image has visual information Q_V and audio information Q_A. The audio information drives semantic class weighting and weight propagation over the database; the visual information drives nearest-neighbor retrieval; Bayesian fusion combines the two for image matching.]
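A toy sketch of the Bayesian fusion idea above, assuming the audio query yields a prior over semantic classes and visual similarity plays the likelihood role. The class assignments, prior values, and similarity scores are invented for illustration:

```python
import numpy as np

# Toy Bayesian fusion for retrieval: audio induces a class prior,
# visual similarity acts as the likelihood, and their product ranks images.
rng = np.random.default_rng(1)
n_images, n_classes = 10, 3
image_class = rng.integers(0, n_classes, n_images)   # semantic class of each image
audio_prior = np.array([0.7, 0.2, 0.1])              # p(class | Q_A) from the audio query
visual_sim = rng.random(n_images)                    # visual similarity to the query

posterior = visual_sim * audio_prior[image_class]    # unnormalized p(image | Q)
posterior /= posterior.sum()
ranking = np.argsort(-posterior)                     # best-matching images first
```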
Experimental Setup
• Database: 4400 images collected from Flickr, featuring 44 kinds of animals
• Visual feature selection
• Audio feature selection: MFCC with frame length equal to 256
Experimental Results
Precision as a function of the number of retrieval iterations. Observations: 1. The major improvement is obtained within the first iterations. 2. Information fusion improves the performance, and audio relevance feedback further improves it.
Interface for Subjective Evaluation
Interface for Subjective Evaluation (2)
Not visually similar
Score Level Fusion
Can be straightforward or involve more analysis. Rigid, due to the limited information remaining at this level.
Case study: 1. Video retrieval based on audiovisual cues
Video Retrieval by Audiovisual Cues
1. P. Muneesawang, T. Amin and L. Guan, “A new learning algorithm for the fusion of adaptive audio-visual features for the retrieval and classification of movie clips,” Journal of Signal Processing Systems, vol. 59, no. 2, pp. 177-188, May 2010.
2. T. Amin, M. Zeytinoglu and L. Guan, “Application of Laplacian mixture model for image and video retrieval,” IEEE Transactions on Multimedia, vol. 9, no. 7, pp. 1416-1429, November 2007.
3. P. Muneesawang and L. Guan, “Adaptive video indexing and automatic/semi-automatic relevance feedback,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 8, pp. 1032-1046, August 2005.
2. Video Retrieval
The scores are obtained independently and then combined.
[Diagram: audio features X^(A) feed a classifier for the audio channel, producing an audio score; visual features X^(V) feed a classifier for the video channel, producing a visual score; an SVM combines the two into a fused score.]
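The slide fuses the two scores with an SVM. As a dependency-free stand-in, the sketch below learns a linear decision on (audio score, visual score) pairs with a perceptron; it illustrates the same supervised score-fusion idea but does not reproduce the paper's actual classifier, and all score values are invented:

```python
import numpy as np

def train_perceptron(X, y, epochs=100):
    """Learn a linear decision w.x + b on the 2-D score vectors."""
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:      # misclassified: update
                w += yi * xi
                b += yi
    return w, b

X = np.array([[0.9, 0.8], [0.8, 0.9], [0.7, 0.9],    # relevant clips   (+1)
              [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]])   # irrelevant clips (-1)
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_perceptron(X, y)
fused = np.sign(X @ w + b)      # fused accept/reject decisions
```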
SVM Fusion
Visual Feature Representation
Visual: Adaptive Video Indexing (AVI) using visual templates; TFxIDF model; cosine distance for similarity matching.

TFxIDF weight of visual template j:

    f_v[j] = (fr[j] / max_j{fr[j]}) · log(N / n[j])

where fr[j] is the frequency of template j in the clip, N is the total number of clips, and n[j] is the number of clips containing template j. Each frame feature vector h_i is assigned to its nearest template (and its second nearest):

    c_i^(1) = arg min_{j ∈ {0,1,...,N_T−1}} || h_i − ĥ_j ||^2
    c_i^(2) = arg min_{j ∈ {0,1,...,N_T−1} \ {c_i^(1)}} || h_i − ĥ_j ||^2
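The TFxIDF weighting and cosine matching can be sketched directly; the template counts and database size below are toy values:

```python
import numpy as np

def tfidf(fr, n, N):
    """TFxIDF weight per visual template: (fr / max fr) * log(N / n)."""
    return (fr / fr.max()) * np.log(N / n)

def cosine(a, b):
    """Cosine similarity between two weighted template vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

N = 100                                    # clips in the database
n = np.array([50.0, 10.0, 80.0, 5.0])      # clips containing each of 4 templates
clip1 = tfidf(np.array([3.0, 1.0, 0.0, 2.0]), n, N)   # template counts in clip 1
clip2 = tfidf(np.array([2.0, 1.0, 1.0, 2.0]), n, N)   # template counts in clip 2
sim = cosine(clip1, clip2)                 # similarity score in (0, 1]
```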
Audio Feature Representation
Laplacian Mixture Model (LMM) of the wavelet coefficients of the audio signal.
Audio feature vector built from the model parameters (using an EM estimator):

    p(w) = p_1 L(w | b_1) + p_2 L(w | b_2),  p_1 + p_2 = 1

    f_a = [m_0, b_0, p_{1,i}, b_{1,i}, b_{2,i}],  i = 1, 2, ..., L−1

where p_{1,i}, b_{1,i}, b_{2,i} are the model parameters obtained from the i-th high-frequency subband.
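A compact EM sketch for a two-component, zero-mean Laplacian mixture, in the spirit of the LMM above; the data here are synthetic Laplacian samples rather than actual wavelet coefficients, and the initial parameter values are arbitrary:

```python
import numpy as np

def laplace_pdf(w, b):
    """Zero-mean Laplacian density with scale b."""
    return np.exp(-np.abs(w) / b) / (2.0 * b)

def em_lmm(w, iters=50):
    """Fit p(w) = p1 L(w|b1) + p2 L(w|b2) by EM."""
    p = np.array([0.5, 0.5])        # mixture weights p1, p2
    b = np.array([0.5, 5.0])        # scale parameters b1, b2
    for _ in range(iters):
        # E-step: responsibility of each component for each coefficient
        lik = np.stack([p[k] * laplace_pdf(w, b[k]) for k in range(2)])
        r = lik / lik.sum(axis=0)
        # M-step: weights and scales (weighted mean absolute deviation)
        p = r.mean(axis=1)
        b = (r * np.abs(w)).sum(axis=1) / r.sum(axis=1)
    return p, b

rng = np.random.default_rng(0)
w = np.concatenate([rng.laplace(0, 0.3, 2000),   # many small coefficients
                    rng.laplace(0, 3.0, 500)])   # a few large ones
p, b = em_lmm(w)
```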
LMM vs GMM (2 COMPONENTS)
Video Classification Result
Recognition rate obtained by the SVM-based fusion model, using a video database of 6,000 clips and five semantic concepts.
Multimodal Human Authentication with Signature, Iris and Fingerprint
Signature Recognition
Fingerprint Image Enhancement System Overview
[Diagram: a filter-bank based matcher and a minutiae-based matcher each produce a match decision; fusion of the two decisions yields the final decision.]
Iris Segmentation System Overview
[Diagram: two feature extraction streams produce iris codes; feature-level fusion combines the codes; matching yields the final decision.]
Fingerprint/Signature/Iris Fusion
[Diagram: each modality passes through feature extraction and matching to produce a match score; score fusion of the three match scores yields the final decision.]
Robot Applications
• Body tracking
• Camera tracking
• Movement recognition
• Face tracking
• Emotion recognition
• Hand gesture recognition
• Stereo vision / depth calculation
Robot Application: Domestic Helper - via emotion/intention recognition
1. Target people group: elderly and disabled people at homes or in community houses
2. Capable of simple gestures and body language
3. Capable of simple, and sometimes incomplete, verbal communication
Robot Application: Domestic Helper - via emotion/intention recognition
1. Help the elderly and the disabled with their daily life.
2. Entertain the people they look after.
3. Call the nurse or emergency when in need.
[Diagram: modules for behavior analysis, emotion analysis, and head tracking.]
Multimodal Fusion for Human Intention Recognition
[Diagram: audio cues and visual cues feed two analysis streams - analyze human emotions, and analyze human actions (hand gesture, body movement); their outputs are fused into the possible human intention.]
Challenges
Will fusion help in the problem at hand? What is the best fusion model for the problem at hand?
◦ Data/feature level,
◦ Interaction level,
◦ Score/decision level,
◦ Or multilevel.
New data analysis and mining tools need to be developed to address these issues, or existing tools may be revisited.
Challenge 1: Does fusion help?
[Diagram: Sensor 1 (audio) and Sensor 2 (visual) feed a fusion block; recognition by audio alone, by video alone, and by fusion are compared.]
Challenge 2: Fusion at which level?
[Diagram: two inputs, Data/Feature #1 and Data/Feature #2, may be fused at the data/feature level, at the interaction level, or at the score (decision) level.]
Information Entropy
Entropy is a measure of the uncertainty associated with a random variable. Uncertainty is useful information.

    Entropy: H(X) = −Σ_x p(x) log p(x)
    Conditional entropy: H(X|Y) = −Σ_{x,y} p(x, y) log p(x|y)
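Both quantities can be computed directly from a joint distribution; the 2x2 joint below is a toy example:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a distribution given as an array of probabilities."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])       # toy joint distribution p(x, y)
px = pxy.sum(axis=1)               # marginal p(x)
H_x = entropy(px)                  # H(X) = 1 bit here
H_xy = entropy(pxy.ravel())        # joint entropy H(X, Y)
H_y_given_x = H_xy - H_x           # chain rule: H(Y|X) = H(X,Y) - H(X)
```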
Multimodal source type
• Conflict: H(x), H(y) < H(x,y) - the joint uncertainty is larger (negative relevance)
• Redundancy: H(x,y) < H(x), H(y) - the joint uncertainty is smaller (positive relevance)
• Complementary: H(x) ≈ H(x,y) ≈ H(y) - the joint uncertainty is unchanged (weak relevance)
Multimodal fusion method
• A feature may conflict with other features. If the conflict is beyond a threshold, eliminate the conflicting feature to keep the total uncertainty at an acceptable level.
• Redundancy fusion ( A ∩ B ), according to the ranking of uncertainty values.
• Complementary fusion ( A ∪ B ).
Mapping from entropy to weight
Weights are the inverse of entropies: high entropy -> low confidence -> low weight; a corrupted feature -> maximum entropy -> zero weight; ∑w = 1.
H = 0 gives w = 1; H = Hmax gives w = 0; Hi = Hj gives wi = wj = 0.5.
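One concrete mapping consistent with all three rules above is w_i proportional to (Hmax − H_i), normalized to sum to 1; this specific formula is an assumption for illustration, not necessarily the exact mapping used in the paper:

```python
import numpy as np

def entropy_weights(H, H_max):
    """Map per-modality entropies to fusion weights: w_i ∝ (H_max - H_i), Σw = 1."""
    conf = H_max - np.asarray(H, dtype=float)   # residual certainty per modality
    return conf / conf.sum()

w = entropy_weights([0.0, 1.0], H_max=1.0)      # H=0 -> w=1, H=Hmax -> w=0
w_eq = entropy_weights([0.4, 0.4], H_max=1.0)   # equal entropies -> equal weights
```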
Entropy vs. Correlation in Fusion
Developed an entropy-based method, Kernel Entropy Component Analysis (KECA). Compared with correlation-based methods: KPCA (kernel principal component analysis) and KCCA (kernel canonical correlation analysis).
Z. Xie and L. Guan, “Multimodal information fusion of audio emotion recognition based on kernel entropy component analysis,” Int. J. of Semantic Computing, vol. 7, no. 1, pp. 25-42, Aug 2013.
• Experimental results
[Figure: recognition results of KECA compared with KPCA and KCCA on the RML and eNTERFACE databases.]
Summary
• Fusion: the coherent integration of multimedia multimodal information.
• It is a natural process for human beings, but not straightforward for machines.
• It may be carried out at different information levels, but how do we choose the right model?
• Several case studies were used to demonstrate the power of information fusion.
• Multiple challenges remain to be addressed.