Post on 17-Aug-2015

Competence Center Information Retrieval & Machine Learning

TUB-IRML at MediaEval 2014 Violent Scenes Detection Task: Violence Modeling through Feature Space Partitioning

Esra Acar, Sahin Albayrak

Outline

16 October 2014

► The Violence Detection Method

Video Representation

Violence Detection Model

► Results & Discussion

► Conclusions & Future Work

The Violence Detection Method

► The two main components of our method are:

(1) the representation of video segments, and

(2) the learning of a violence model.

Video Representation (1)

Figure: The generation process of sparse-coding-based audio and visual representations for video segments.

Video Representation (2)

Figure: The generation of audio and visual dictionaries with sparse coding.
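As a rough illustration of how a sparse-coding-based segment representation can be computed once a dictionary exists, the sketch below encodes each low-level descriptor with greedy matching pursuit and max-pools the absolute codes over the segment. The slides do not state the solver, the sparsity level or the pooling scheme, so matching pursuit, `n_nonzero`, max pooling, the toy dictionary and all function names here are assumptions made for illustration only.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def matching_pursuit(x, dictionary, n_nonzero):
    """Greedily approximate x using at most n_nonzero steps over unit-norm atoms."""
    residual = list(x)
    code = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        # Correlation of every atom with the current residual.
        corrs = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda j: abs(corrs[j]))
        if abs(corrs[best]) < 1e-12:
            break  # residual is already (numerically) zero
        code[best] += corrs[best]
        residual = [r - corrs[best] * a for r, a in zip(residual, dictionary[best])]
    return code

def segment_representation(descriptors, dictionary, n_nonzero=2):
    """Max-pool the absolute sparse codes of all descriptors in one segment."""
    codes = [matching_pursuit(d, dictionary, n_nonzero) for d in descriptors]
    return [max(abs(c[j]) for c in codes) for j in range(len(dictionary))]

# Toy dictionary with three 2-D atoms and two descriptors (hypothetical values).
dictionary = [normalize(a) for a in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
descriptors = [[2.0, 0.0], [0.0, 3.0]]
rep = segment_representation(descriptors, dictionary, n_nonzero=1)
```

In a real setting the descriptors would be, for example, MFCC vectors or HoG/HoF descriptors, and the dictionary would be learned from training data rather than written by hand.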

Video Representation (3)

► In addition to the mid-level audio and visual representations, we use the following low-level features:

  Motion-related descriptors – Violent Flow (ViF), a descriptor proposed for real-time detection of violent crowd behaviors, and

  Static content representations – affect-related color descriptors such as statistics on saturation, brightness and hue in the HSL color space, and colorfulness.
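The color statistics can be sketched for a single frame as follows. This is a minimal example under stated assumptions: the slides do not say which statistics or which colorfulness measure are used, so the mean and standard deviation, the circular mean for hue, the opponent-channel colorfulness measure of Hasler and Süsstrunk, and the name `color_descriptor` are all choices made here for illustration. Python's `colorsys` module returns HSL values in the order hue, lightness, saturation.

```python
import colorsys
import math
from statistics import mean, pstdev

def color_descriptor(pixels):
    """pixels: iterable of (r, g, b) tuples with channels scaled to [0, 1]."""
    pixels = list(pixels)
    hls = [colorsys.rgb_to_hls(r, g, b) for r, g, b in pixels]
    hues = [h for h, _, _ in hls]
    lightness = [l for _, l, _ in hls]
    saturation = [s for _, _, s in hls]

    # Hue is an angle, so it is averaged on the unit circle.
    sin_h = mean(math.sin(2 * math.pi * h) for h in hues)
    cos_h = mean(math.cos(2 * math.pi * h) for h in hues)
    mean_hue = (math.atan2(sin_h, cos_h) / (2 * math.pi)) % 1.0

    # Colorfulness from opponent channels on a 0..255 scale (assumed measure).
    rg = [255 * (r - g) for r, g, b in pixels]
    yb = [255 * (0.5 * (r + g) - b) for r, g, b in pixels]
    colorfulness = (math.hypot(pstdev(rg), pstdev(yb))
                    + 0.3 * math.hypot(mean(rg), mean(yb)))

    return {
        "mean_hue": mean_hue,
        "mean_saturation": mean(saturation),
        "std_saturation": pstdev(saturation),
        "mean_brightness": mean(lightness),
        "std_brightness": pstdev(lightness),
        "colorfulness": colorfulness,
    }

gray = color_descriptor([(0.5, 0.5, 0.5)] * 4)
red = color_descriptor([(1.0, 0.0, 0.0)] * 4)
```

A uniformly gray frame yields zero saturation and zero colorfulness, while a uniformly red frame yields full saturation and a clearly positive colorfulness.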

Violence Detection Model

► Violence is a concept that can be expressed audio-visually in diverse manners.

► We learn multiple models for the violence concept instead of a single (unique) model:

  we partition the feature space by clustering the video segments in the training dataset, and

  we learn a different model for each violence sub-concept.

► We perform classifier selection to solve the classifier combination issue.
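The idea of partitioning the feature space and then selecting one classifier per test segment can be sketched as below. This is only an illustration: the slides name neither the clustering algorithm nor the per-cluster classifier nor the selection rule, so k-means, the nearest-class-mean classifier used as a stand-in, selection by nearest cluster center, and all function names and toy data are assumptions.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def nearest(p, centers):
    return min(range(len(centers)), key=lambda j: dist(p, centers[j]))

def kmeans(points, k, n_iter=20):
    """Plain Lloyd iterations, seeded with the first k points for determinism."""
    centers = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster happens to become empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers

def train(points, labels, k):
    """Cluster the training segments, then fit one model per cluster."""
    centers = kmeans(points, k)
    models = []
    for j in range(k):
        members = [(p, y) for p, y in zip(points, labels) if nearest(p, centers) == j]
        # Stand-in classifier: one mean vector per class inside this cluster.
        class_means = {}
        for y in {lbl for _, lbl in members}:
            class_means[y] = centroid([p for p, lbl in members if lbl == y])
        models.append(class_means)
    return centers, models

def predict(p, centers, models):
    """Classifier selection: use only the model of the nearest non-empty cluster."""
    usable = [j for j in range(len(centers)) if models[j]]
    j = min(usable, key=lambda idx: dist(p, centers[idx]))
    return min(models[j], key=lambda y: dist(p, models[j][y]))

# Toy data: two well-separated groups with different violent (1) / non-violent (0) layouts.
points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
labels = [0, 0, 1, 1, 1, 1, 0, 0]
centers, models = train(points, labels, k=2)
predictions = [predict(p, centers, models)
               for p in ([0.9, 0.4], [0.1, 0.2], [10.1, 10.2], [10.9, 10.6])]
```

In the toy data the class layout is mirrored between the two groups, so a single linear decision boundary cannot separate the classes, while one simple model per partition can.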

Results & Discussion

Method                  MAP2014 (Movies)  MAP@100 (Movies)  MAP2014 (Web videos)  MAP@100 (Web videos)
Run1                    0.169             0.368             0.517                 0.582
Run2                    0.139             0.284             0.371                 0.478
Run3                    0.080             0.208             0.477                 0.495
Run4                    0.172             0.409             0.489                 0.586
Run5                    0.170             0.406             0.479                 0.567
SVM-based unique model  0.093             0.302             -                     -

Run1: MFCC-based mid-level audio representations
Run2: HoG- and HoF-based mid-level features and ViF
Run3: Affect-related color features
Run4: Audio and visual features (except color)
Run5: All audio-visual representations are linearly fused at the decision level

The MAP2014 and MAP@100 of our method with different representations
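For readers unfamiliar with MAP@100, the sketch below shows one common way to compute average precision over the top k ranked segments and to average it over videos. It is not the official evaluation tool of the task: the normalization by the number of violent segments found in the top k is an assumption (other variants normalize differently), the function names are hypothetical, and MAP2014 is not covered here.

```python
def average_precision_at_k(ranked_labels, k=100):
    """ranked_labels: 1 (violent) or 0, sorted by decreasing detector score."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    # Assumed normalization: number of violent segments within the top k.
    return precision_sum / hits if hits else 0.0

def mean_average_precision_at_k(rankings, k=100):
    """Average AP@k over several ranked lists, e.g. one list per movie."""
    return sum(average_precision_at_k(r, k) for r in rankings) / len(rankings)

ap_a = average_precision_at_k([1, 0, 1, 0], k=4)
ap_b = average_precision_at_k([0, 1], k=4)
map_ab = mean_average_precision_at_k([[1, 0, 1, 0], [0, 1]], k=4)
```

A ranking that places violent segments earlier receives a higher score, which is why `ap_a` exceeds `ap_b` in the toy lists.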

Conclusions & Future Work

► The mid-level audio representation based on MFCC and sparse coding

  provides promising performance in terms of the MAP2014 and MAP@100 metrics, and

  also outperforms our visual representations.

► As future work, we need to

  extend and improve our set of visual representations, and

  further investigate the feature space partitioning concept.

www.dai-labor.de

Phone: +49 (0) 30 / 314 – 74
Fax: +49 (0) 30 / 314 – 74 003

DAI-Labor
Technische Universität Berlin
Fakultät IV – Elektrotechnik & Informatik
Sekretariat TEL 14
Ernst-Reuter-Platz 7
10587 Berlin, Germany

Esra Acar

Researcher

M.Sc.

[email protected]

Thanks!

013
