
Competence Center Information Retrieval & Machine Learning

TUB-IRML at MediaEval 2014 Violent Scenes Detection Task: Violence Modeling through Feature Space Partitioning

Esra Acar, Sahin Albayrak


Outline

16 October 2014

► The Violence Detection Method

Video Representation

Violence Detection Model

► Results & Discussion

► Conclusions & Future Work


The Violence Detection Method


► The two main components of our method are:

(1) the representation of video segments, and

(2) the learning of a violence model.


Video Representation (1)


The generation process of sparse coding based audio and visual representations for video segments.
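To make the idea of a sparse coding based segment representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a dictionary of unit-norm atoms is already available, encodes each local descriptor (for example an MFCC vector) with greedy matching pursuit, and pools the codes of a segment by taking the maximum absolute coefficient per atom. The solver, the pooling rule, and all function names are assumptions made for illustration; the slides do not specify them.

```python
def matching_pursuit(x, dictionary, n_nonzero=2):
    """Greedy sparse code of x over unit-norm atoms (assumed solver)."""
    residual = list(x)
    code = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        # Correlation of the current residual with every atom.
        corr = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(corr)), key=lambda j: abs(corr[j]))
        if abs(corr[best]) < 1e-12:
            break  # nothing left to explain
        code[best] += corr[best]
        residual = [r - corr[best] * a for r, a in zip(residual, dictionary[best])]
    return code


def max_pool(codes):
    """Pool per-descriptor codes into one vector (assumed pooling rule)."""
    return [max(abs(c[j]) for c in codes) for j in range(len(codes[0]))]


def represent_segment(descriptors, dictionary, n_nonzero=2):
    """Mid-level representation of one video segment."""
    codes = [matching_pursuit(d, dictionary, n_nonzero) for d in descriptors]
    return max_pool(codes)


# Toy usage with a hypothetical 3-atom dictionary and two descriptors.
toy_dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_descriptors = [[2.0, 0.0, 0.0], [0.0, -3.0, 1.0]]
segment_vector = represent_segment(toy_descriptors, toy_dictionary, n_nonzero=1)
```

The resulting vector has one entry per dictionary atom, so segments with different numbers of local descriptors map to representations of the same length.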



Video Representation (2)

The generation of audio and visual dictionaries with sparse coding.



Video Representation (3)

► In addition to the mid-level audio and visual representations, we use the following low-level features:

Motion-related descriptors – Violent Flow (ViF), a descriptor proposed for real-time detection of violent crowd behaviors, and

Static content representations – affect-related color descriptors, such as statistics on saturation, brightness and hue in the HSL color space, and colorfulness.
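As an illustration of the affect-related color descriptors, the sketch below computes simple statistics in HSL space for a list of RGB pixels. It is a sketch under assumptions: the exact statistics are not given on the slide, so mean and standard deviation are used; the HSL lightness channel stands in for "brightness"; hue is averaged on the unit circle; and colorfulness follows the Hasler and Süsstrunk opponent-channel measure, which may differ from the measure the authors used. The function name and the output keys are hypothetical.

```python
import colorsys
import math
from statistics import mean, pstdev


def color_descriptor(pixels):
    """pixels: list of (r, g, b) tuples with channels in [0, 255]."""
    hls = [colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0) for r, g, b in pixels]
    hue = [h for h, _, _ in hls]
    light = [l for _, l, _ in hls]
    sat = [s for _, _, s in hls]

    # Hue is an angle in [0, 1), so average it on the unit circle.
    sin_h = mean([math.sin(2 * math.pi * h) for h in hue])
    cos_h = mean([math.cos(2 * math.pi * h) for h in hue])
    hue_mean = (math.atan2(sin_h, cos_h) / (2 * math.pi)) % 1.0

    # Colorfulness on opponent channels (assumed: Hasler and Suesstrunk measure).
    rg = [r - g for r, g, _ in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    colorfulness = (math.hypot(pstdev(rg), pstdev(yb))
                    + 0.3 * math.hypot(mean(rg), mean(yb)))

    return {
        "hue_mean": hue_mean,
        "saturation_mean": mean(sat),
        "saturation_std": pstdev(sat),
        "lightness_mean": mean(light),
        "lightness_std": pstdev(light),
        "colorfulness": colorfulness,
    }


# Toy usage: a uniformly gray frame has zero saturation and zero colorfulness.
gray = color_descriptor([(128, 128, 128)] * 4)
```

In practice the pixel list would come from the decoded key frames of a segment, and the per-frame descriptors would be aggregated over the segment.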


Violence Detection Model


► Violence is a concept that can be expressed audio-visually in diverse ways.

► We learn multiple models for the violence concept instead of a single model:

Partition the feature space by clustering the video segments in the training dataset, and

Learn a different model for each violence sub-concept (cluster).

► We perform classifier selection to solve the classifier combination issue.
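The partition-then-select idea can be sketched as follows. This is an illustrative sketch, not the authors' system: k-means is assumed as the clustering step, a simple class-mean distance scorer stands in for the per-cluster classifier (the results slide suggests the authors compared against an SVM-based single model), and classifier selection is assumed to mean "use the model of the nearest cluster center". All names are hypothetical.

```python
import math
import random


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def nearest(p, centers):
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means; assumed partitioning method."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster received no points.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers


class MeanDifferenceScorer:
    """Stand-in per-cluster model: positive score means closer to violent mean."""

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.pos = centroid(pos) if pos else None
        self.neg = centroid(neg) if neg else None
        return self

    def score(self, x):
        if self.pos is None:
            return -1.0  # cluster saw no violent examples
        if self.neg is None:
            return 1.0   # cluster saw only violent examples
        return math.dist(x, self.neg) - math.dist(x, self.pos)


def train_partitioned(X, y, k):
    """Cluster the training segments, then fit one model per cluster."""
    centers = kmeans(X, k)
    assign = [nearest(x, centers) for x in X]
    models = []
    for j in range(k):
        Xj = [x for x, a in zip(X, assign) if a == j]
        yj = [label for label, a in zip(y, assign) if a == j]
        models.append(MeanDifferenceScorer().fit(Xj, yj))
    return centers, models


def predict_score(x, centers, models):
    """Classifier selection: only the nearest cluster's model scores x."""
    return models[nearest(x, centers)].score(x)


# Toy usage: two sub-concepts, each with violent (1) and non-violent (0) segments.
X = [[0, 1], [0, 1.2], [0, -1], [0, -1.2], [11, 10], [11.2, 10], [9, 10], [8.8, 10]]
y = [1, 1, 0, 0, 1, 1, 0, 0]
centers, models = train_partitioned(X, y, k=2)
```

Selecting a single model per test segment sidesteps the question of how to weight the outputs of several sub-concept models, which is one reading of the "classifier combination issue" named above.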


Results & Discussion


The MAP2014 and MAP@100 of our method with different representations

Method                   MAP2014 – Movies   MAP@100 – Movies   MAP2014 – Web videos   MAP@100 – Web videos
Run1                     0.169              0.368              0.517                  0.582
Run2                     0.139              0.284              0.371                  0.478
Run3                     0.080              0.208              0.477                  0.495
Run4                     0.172              0.409              0.489                  0.586
Run5                     0.170              0.406              0.479                  0.567
SVM-based unique model   0.093              0.302              -                      -

Run1: MFCC-based mid-level audio representations
Run2: HoG- and HoF-based mid-level features and ViF
Run3: Affect-related color features
Run4: Audio and visual features (except color)
Run5: All audio-visual representations are linearly fused at the decision level
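For readers unfamiliar with the MAP@100 metric, here is a minimal sketch of average precision with a rank cutoff, averaged over several ranked lists. It is one common variant and an assumption on our part: precision is accumulated at each relevant rank within the top k and normalized by the number of relevant items found there. The official task evaluation may normalize differently, and the task-specific MAP2014 metric is not reproduced here. Function names are hypothetical.

```python
def average_precision_at_k(ranked_labels, k=100):
    """ranked_labels: 0/1 relevance flags sorted by decreasing score."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_labels[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings, k=100):
    """Mean of AP@k over several ranked lists (e.g. one per movie)."""
    return sum(average_precision_at_k(r, k) for r in rankings) / len(rankings)


# Toy usage: relevant segments at ranks 1 and 3 give (1/1 + 2/3) / 2.
ap = average_precision_at_k([1, 0, 1, 0])
```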


Conclusions & Future Work


► The mid-level audio representation based on MFCC and sparse coding

provides promising performance in terms of the MAP2014 and MAP@100 metrics, and

also outperforms our visual representations.

► As future work, we need to

extend/improve our visual representation set, and

further investigate the feature space partitioning concept.

Thanks!

Esra Acar, M.Sc.
Researcher
[email protected]

DAI-Labor
Technische Universität Berlin
Fakultät IV – Elektrotechnik & Informatik
Sekretariat TEL 14
Ernst-Reuter-Platz 7
10587 Berlin, Germany

Phone: +49 (0) 30 / 314 – 74
Fax: +49 (0) 30 / 314 – 74 003
www.dai-labor.de
