Competence Center Information Retrieval & Machine Learning
TUB-IRML at MediaEval 2014 Violent Scenes Detection Task: Violence Modeling through Feature Space Partitioning
Esra Acar, Sahin Albayrak
Outline
16 October 2014
► The Violence Detection Method
Video Representation
Violence Detection Model
► Results & Discussion
► Conclusions & Future Work
The Violence Detection Method
► The two main components of our method are:
(1) the representation of video segments, and
(2) the learning of a violence model.
Video Representation (1)
The generation process of sparse coding based audio and visual representations for video segments.
Video Representation (2)
The generation of audio and visual dictionaries with sparse coding.
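To make the sparse coding step more concrete, here is a minimal sketch of how a video segment could be turned into a mid-level representation once a dictionary is available. It is an illustration under assumptions, not the exact pipeline of the slides: the dictionary is taken as given (unit-norm atoms), matching pursuit stands in for whatever sparse solver was actually used, and max-pooling over the segment is an assumed aggregation step. The names `matching_pursuit` and `segment_representation` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(x, dictionary, n_nonzero=2):
    """Greedy sparse code of x over unit-norm atoms (assumed solver)."""
    residual = list(x)
    code = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with what is still unexplained.
        corrs = [dot(residual, atom) for atom in dictionary]
        j = max(range(len(corrs)), key=lambda i: abs(corrs[i]))
        if corrs[j] == 0.0:
            break  # nothing left to explain
        code[j] += corrs[j]
        residual = [r - corrs[j] * a for r, a in zip(residual, dictionary[j])]
    return code


def segment_representation(descriptors, dictionary, n_nonzero=2):
    """Encode every low-level descriptor of a segment (e.g. one MFCC frame
    each) and max-pool the absolute coefficients (assumed pooling)."""
    codes = [matching_pursuit(d, dictionary, n_nonzero) for d in descriptors]
    return [max(abs(c[j]) for c in codes) for j in range(len(dictionary))]


# Toy dictionary: the three standard basis vectors of R^3.
toy_dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_frames = [[3.0, 0.0, 1.0], [0.0, -2.0, 0.0]]
print(segment_representation(toy_frames, toy_dictionary, n_nonzero=1))
```

The resulting vector has one entry per dictionary atom, so segments of different lengths map to representations of the same size.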
Video Representation (3)
► In addition to the mid-level audio and visual representations,
we use low-level features which are:
Motion-related descriptors – Violent Flow (ViF) which is a
descriptor proposed for real-time detection of violent crowd
behaviors, and
Static content representations – Affect-related color
descriptors such as statistics on saturation, brightness and
hue in the HSL color space, and colorfulness.
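The affect-related color descriptors can be sketched as follows. This is a minimal illustration, assuming that "statistics" means mean and standard deviation and that colorfulness is measured with the Hasler and Süsstrunk opponent-color formula; the slides do not specify either choice. The function name `color_features` and the input format (a list of 8-bit RGB tuples) are hypothetical.

```python
import colorsys
import math
from statistics import mean, pstdev


def color_features(pixels):
    """pixels: list of (r, g, b) tuples with components in 0..255."""
    hues, lights, sats, rg, yb = [], [], [], [], []
    for r, g, b in pixels:
        # colorsys returns hue, lightness, saturation, each in [0, 1].
        h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
        hues.append(h)
        lights.append(l)
        sats.append(s)
        # Opponent color channels used by the assumed colorfulness metric.
        rg.append(r - g)
        yb.append(0.5 * (r + g) - b)
    colorfulness = (math.hypot(pstdev(rg), pstdev(yb))
                    + 0.3 * math.hypot(mean(rg), mean(yb)))
    return {
        # Note: hue is circular, so a plain mean is only a rough summary.
        "hue_mean": mean(hues),
        "hue_std": pstdev(hues),
        "saturation_mean": mean(sats),
        "saturation_std": pstdev(sats),
        "brightness_mean": mean(lights),
        "brightness_std": pstdev(lights),
        "colorfulness": colorfulness,
    }


# A frame made of pure red and pure blue pixels is highly saturated.
print(color_features([(255, 0, 0), (0, 0, 255)]))
```

In practice the pixel list would come from the decoded frames of a video segment, and the per-frame values would be aggregated over the segment.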
Violence Detection Model
► Violence is a concept that can be expressed audio-visually in
diverse ways.
► We learn multiple models for the violence concept instead of a
unique model.
Partition the feature space by clustering video segments in
the training dataset, and
learn a different model for each violence sub-concept.
► We perform a classifier selection to solve the classifier
combination issue.
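The idea of partitioning the feature space and then selecting one classifier per test segment can be sketched as below. This is a simplified illustration under stated assumptions: k-means is used for the clustering, a nearest-class-centroid rule stands in for the real per-cluster classifier (the slides only mention an SVM for the unique-model baseline), and classifier selection is done by picking the cluster whose center is closest to the test segment. The class `PartitionedModel` and its methods are hypothetical names.

```python
import random


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def nearest(centers, p):
    return min(range(len(centers)), key=lambda i: sqdist(centers[i], p))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations; empty clusters keep their old center."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers


class PartitionedModel:
    """One simple classifier per cluster (violence sub-concept)."""

    def fit(self, X, y, k):
        self.centers = kmeans(X, k)
        assign = [nearest(self.centers, x) for x in X]
        self.models = []
        for c in range(k):
            pos = [x for x, lab, a in zip(X, y, assign) if a == c and lab == 1]
            neg = [x for x, lab, a in zip(X, y, assign) if a == c and lab == 0]
            # Stand-in classifier: store the class centroids of this cluster.
            self.models.append((centroid(pos) if pos else None,
                                centroid(neg) if neg else None))
        return self

    def predict(self, x):
        # Classifier selection (assumed rule): use the nearest cluster's model.
        pos, neg = self.models[nearest(self.centers, x)]
        if pos is None:
            return 0
        if neg is None:
            return 1
        return 1 if sqdist(x, pos) < sqdist(x, neg) else 0


# Two well-separated sub-concepts, each with violent (1) and other (0) segments.
X = [[0, 1], [0, 1.2], [0, -1], [0, -1.2],
     [9, 10], [8.8, 10], [11, 10], [11.2, 10]]
y = [1, 1, 0, 0, 1, 1, 0, 0]
model = PartitionedModel().fit(X, y, k=2)
print(model.predict([0, 1.1]), model.predict([11, 10]))
```

Because only one classifier is consulted per test segment, no score fusion across the sub-concept models is needed, which is one way to read "classifier selection" here.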
Results & Discussion
Method                   MAP2014    MAP@100    MAP2014        MAP@100
                         (Movies)   (Movies)   (Web videos)   (Web videos)
Run1                     0.169      0.368      0.517          0.582
Run2                     0.139      0.284      0.371          0.478
Run3                     0.080      0.208      0.477          0.495
Run4                     0.172      0.409      0.489          0.586
Run5                     0.170      0.406      0.479          0.567
SVM-based unique model   0.093      0.302      -              -

Run1: MFCC-based mid-level audio representations
Run2: HoG- and HoF-based mid-level features and ViF
Run3: Affect-related color features
Run4: Audio and visual features (except color)
Run5: All audio-visual representations are linearly fused at the decision level

The MAP2014 and MAP@100 of our method with different representations
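For readers unfamiliar with the metric, the following sketch shows one common way to compute average precision over the top 100 ranked segments and to average it over videos. It is an assumption-laden illustration: the normalization (dividing by the number of relevant segments found in the top k) is one of several variants, and the official evaluation tool of the task may differ. MAP2014 is a task-specific metric and is not reproduced here. The function names are hypothetical.

```python
def average_precision_at_k(ranked_labels, k=100):
    """ranked_labels: 1 for violent, 0 otherwise, sorted by decreasing
    detector confidence. Only the first k entries are considered."""
    hits = 0
    precision_sum = 0.0
    for rank, is_violent in enumerate(ranked_labels[:k], start=1):
        if is_violent:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumed normalization: number of violent segments within the top k.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_video_rankings, k=100):
    """Average the per-video AP@k values."""
    scores = [average_precision_at_k(r, k) for r in per_video_rankings]
    return sum(scores) / len(scores)


# Violent segments ranked first and third out of four.
print(average_precision_at_k([1, 0, 1, 0], k=4))
```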
Conclusions & Future Work
► The mid-level audio representation based on MFCC and
sparse coding
provides promising performance in terms of MAP2014
and MAP@100 metrics, and
also outperforms our visual representations.
► As future work, we need to
extend/improve our visual representation set, and
further investigate the feature space partitioning concept.
Thanks!

Esra Acar, M.Sc.
Researcher

DAI-Labor
Technische Universität Berlin
Fakultät IV – Elektrotechnik & Informatik
Sekretariat TEL 14
Ernst-Reuter-Platz 7
10587 Berlin, Germany

Phone: +49 (0) 30 / 314 – 74
Fax: +49 (0) 30 / 314 – 74 003
www.dai-labor.de