MediaEval 2016 - EUMSSI team at the MediaEval Person Discovery Challenge

EUMSSI team at the MediaEval Person Discovery Challenge 2016. Nam Le, Jean-Marc Odobez, Sylvain Meignier. {nle, odobez}@idiap.ch, sylvain.meignier@univ-lemans.fr


Post on 09-Jan-2017

TRANSCRIPT

EUMSSI team at the MediaEval Person Discovery Challenge 2016

Nam Le, Jean-Marc Odobez, Sylvain Meignier

{nle, odobez}@idiap.ch

sylvain.meignier@univ-lemans.fr

Overview

07/12/2016

Olivier Truchot

Marisol Turaine

Video OCR and NER

Pipeline: Original image → Text region detection → Text extraction → Text recognition → Hypothesis merging

• Multiple image segmentations of the same region → all results are compared and aggregated over time → several hypotheses → high recall

• NER based on MITIE with heuristics.
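The aggregation over time described in the first bullet can be sketched as a simple vote. This is only an illustration under stated assumptions, not the team's implementation: the `(region_id, frame_index, text)` tuple format, the `merge_hypotheses` name, the `min_votes` threshold, and the case-insensitive normalization are all hypothetical.

```python
from collections import Counter, defaultdict


def merge_hypotheses(readings, min_votes=2):
    """Aggregate OCR readings of the same text region over time.

    readings: iterable of (region_id, frame_index, text) tuples, one per
    segmentation/recognition result (hypothetical format).
    Returns {region_id: [(text, votes), ...]} sorted by votes. Every string
    seen at least `min_votes` times is kept, so several hypotheses survive.
    """
    votes = defaultdict(Counter)
    for region_id, _frame, text in readings:
        # Collapse whitespace and ignore case before voting (assumption).
        normalized = " ".join(text.split()).upper()
        if normalized:
            votes[region_id][normalized] += 1

    merged = {}
    for region_id, counter in votes.items():
        kept = [(t, n) for t, n in counter.most_common() if n >= min_votes]
        # Fall back to the single best reading if nothing reaches the threshold.
        merged[region_id] = kept or counter.most_common(1)
    return merged


# Made-up readings of one region across five frames.
readings = [
    ("r1", 10, "Olivier Truchot"),
    ("r1", 11, "Olivier  Truchot"),
    ("r1", 12, "Olivier Truchat"),
    ("r1", 13, "OLIVIER TRUCHOT"),
    ("r1", 14, "Olivier Truchat"),
]
print(merge_hypotheses(readings))
```

Keeping more than one string per region trades precision for recall; a later step such as NER can then filter the surviving hypotheses.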

Face diarization

Pipeline: shots → Face detection (DPM) → Face tracking (CRF-multi-target) → Face clustering (hierarchical clustering)
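The hierarchical clustering step that groups face tracks can be illustrated with a small agglomerative sketch. The slide does not say which linkage, distance, or stopping rule is used, so average linkage, Euclidean distance, the `threshold` parameter, and the toy 2-D features below are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_tracks(features, threshold):
    """Average-linkage agglomerative clustering of face tracks.

    features: list of per-track feature vectors (hypothetical descriptors).
    threshold: merging stops once the closest pair of clusters is farther
    apart than this value.
    Returns a list of clusters, each a sorted list of track indices.
    """
    clusters = [[i] for i in range(len(features))]

    def linkage(c1, c2):
        dists = [euclidean(features[i], features[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Five made-up tracks: three near the origin, two near (5, 5).
tracks = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.2)]
print(cluster_tracks(tracks, threshold=1.0))  # [[0, 1, 4], [2, 3]]
```

Each resulting cluster is meant to correspond to one person, which is the unit that later receives a name.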

Talking face detection

Pipeline: Face track → 9 directions of optical flows → PCA ⇒ $x_t$ → LSTM → Mean pooling → Classifier

[Figure: an LSTM unrolled over the per-frame features $x_t$; its hidden states $h_t$ feed the mean pooling and the classifier.]

DW dataset for talking face & dubbing: http://bit.ly/dw-dubbing
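The last two stages of this pipeline, mean pooling followed by a classifier, can be sketched as below. The LSTM itself is left out; the hidden states are simply given as input. The slide only says "Classifier", so the logistic form, the `weights` and `bias` values, and the function names are hypothetical.

```python
import math


def mean_pool(hidden_states):
    """Average a sequence of per-frame hidden vectors into one vector."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(h[d] for h in hidden_states) / n for d in range(dim)]


def talking_score(hidden_states, weights, bias):
    """Logistic score in (0, 1) computed from the pooled representation.

    weights and bias stand in for a trained classifier (hypothetical values).
    """
    pooled = mean_pool(hidden_states)
    z = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Three time steps of a 2-dimensional hidden state (made-up numbers).
h = [[0.2, -0.1], [0.4, 0.1], [0.6, 0.3]]
print(mean_pool(h))
print(talking_score(h, weights=[1.0, 2.0], bias=-0.5))
```

Pooling over time yields one fixed-size vector per face track, so tracks of different lengths can share the same classifier.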

Speaker diarization

• LIUM diarization tool: www-lium.univ-lemans.fr/en/content/liumspkdiarization

• Input: a video

• Output: homogeneous segments

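Result ranking

• Direct naming: maximize co-occurrences between clusters and named entities.

  − Face naming: name $N_i^{\text{face}}$ and talking score $t(N_i^{\text{face}})$

  − Speaker naming: name $N_j^{\text{spk}}$ and equal score 1.0

• For one shot $s$: $Q_s = \emptyset$

• Names on which face naming agrees with speaker naming rank highest:

  − If $\exists N_j^{\text{spk}}$ such that $N_i^{\text{face}} = N_j^{\text{spk}}$: $Q_s \leftarrow (N_i^{\text{face}},\ 2.0 + t(N_i^{\text{face}}))$

• Otherwise, face naming has higher rank:

  − If $\nexists N_j^{\text{spk}}$ such that $N_i^{\text{face}} = N_j^{\text{spk}}$: $Q_s \leftarrow (N_i^{\text{face}},\ 1.0 + t(N_i^{\text{face}}))$

  − If $\nexists N_i^{\text{face}}$ such that $N_i^{\text{face}} = N_j^{\text{spk}}$: $Q_s \leftarrow (N_j^{\text{spk}},\ 1.0)$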
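The scoring rules above can be sketched for a single shot as follows. The function name, the dictionary and set inputs, and the name "Jane Doe" are hypothetical, and the talking scores are made up; the sketch also assumes talking scores lie in [0, 1], so that the three score bands do not overlap.

```python
def rank_names_for_shot(face_names, speaker_names):
    """Score candidate names for one shot.

    face_names: {name: talking_score} from face naming (scores assumed in [0, 1]).
    speaker_names: set of names from speaker naming (each carries score 1.0).
    Returns [(name, score), ...] sorted by decreasing score.
    """
    q = {}
    for name, t in face_names.items():
        if name in speaker_names:
            q[name] = 2.0 + t  # face naming and speaker naming agree
        else:
            q[name] = 1.0 + t  # supported by face naming only
    for name in speaker_names:
        if name not in face_names:
            q[name] = 1.0      # supported by speaker naming only
    # Ties are broken alphabetically to keep the output deterministic.
    return sorted(q.items(), key=lambda item: (-item[1], item[0]))


faces = {"Olivier Truchot": 0.8, "Marisol Turaine": 0.3}
speakers = {"Olivier Truchot", "Jane Doe"}
for name, score in rank_names_for_shot(faces, speakers):
    print(f"{score:.1f}  {name}")
```

With these inputs the agreed name comes first, the face-only name second, and the speaker-only name last, which mirrors the three rules.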

Result ranking

[Figure: four shots, Shot 1 to Shot 4, and a query.]

Results for the query: 2 – 4 – 1 – 3
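Answering a query then amounts to sorting shots by the score the queried name received in each shot. The sketch below assumes a hypothetical `{shot_id: {name: score}}` structure holding the per-shot scores; the numbers are made up so that the order matches the example on this slide.

```python
def rank_shots(shot_scores, query):
    """Return the ids of shots that contain `query`, best score first.

    shot_scores: {shot_id: {name: score}} (hypothetical structure).
    """
    hits = [
        (scores[query], shot)
        for shot, scores in shot_scores.items()
        if query in scores
    ]
    # Highest score first; ties fall back to the smaller shot id.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [shot for _score, shot in hits]


# Made-up scores for four shots.
shot_scores = {
    1: {"Olivier Truchot": 1.4},
    2: {"Olivier Truchot": 2.9},
    3: {"Olivier Truchot": 1.0, "Marisol Turaine": 1.6},
    4: {"Olivier Truchot": 2.2},
}
print(rank_shots(shot_scores, "Olivier Truchot"))  # [2, 4, 1, 3]
```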

Submissions

            MAP@1   MAP@10   MAP@100
Sub. (1)     30.3     22.0      21.0
Sub. (2)     58.6     42.9      42.0
Sub. (3)     64.2     53.1      52.1
Sub. (4)     68.3     56.2      54.7
Sub. (5)     79.2     65.2      63.4

Sub. (1): Face diarization + Baseline OCR-NER → Face naming

Sub. (2): Face diarization + Our OCR-NER → Face naming

Sub. (3): Face diarization + Our OCR-NER → Talking face naming

Sub. (4): (Face diarization + OCR-NER → Talking face naming) + (Speaker diarization + OCR-NER → Speaker naming)

Sub. (5): Sub. (4) + Sub. (1) + Baseline 2
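For readers unfamiliar with the MAP@K columns reported on this slide, the sketch below shows one common way to compute mean average precision at a cutoff K. The normalization by min(K, number of relevant items) is an assumption, and the official evaluation of the challenge may define the metric differently; the queries and relevance sets are made up.

```python
def average_precision_at_k(ranked, relevant, k):
    """AP@K for one query.

    ranked: list of returned items, best first.
    relevant: set of ground-truth items for the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    denom = min(k, len(relevant))         # assumed normalization
    return precision_sum / denom if denom else 0.0


def mean_average_precision_at_k(runs, k):
    """runs: list of (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


runs = [
    ([2, 4, 1, 3], {2, 1}),  # relevant shots at ranks 1 and 3
    ([5, 6, 7], {7}),        # single relevant shot at rank 3
]
for k in (1, 10):
    print(k, round(mean_average_precision_at_k(runs, k), 4))
```

A small K rewards getting the very first result right, which is one reason MAP@1 can differ noticeably from MAP@10 and MAP@100 for the same run.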


The End