UPMC at MediaEval 2016: Retrieving Diverse Social Images Task

page 1/9

UPMC at MediaEval2016 Retrieving Diverse Social Images Task

Sabrina Tollari

Université Pierre et Marie Curie (UPMC) — Paris 6, UMR CNRS LIP6

MediaEval 2016 Workshop, October 21st, 2016 — Hilversum, Netherlands



page 2/9

Framework

Strategy: first re-rank to improve relevance, then cluster using agglomerative hierarchical clustering (AHC)

Worked example (seven images, shown by their Step 1 rank):

Baseline: 1 1 1 1 1 1 1
Step 1 (re-rank baseline to improve relevance): 1 2 3 4 5 6 7
Step 2 (cluster results using hierarchical clustering): 6 5 4 2 1 7 3
Step 3 (sort the images and the clusters using image ranks from Step 1): 1 2 3 7 4 5 6
Step 4 (re-rank the results, alternating images from the clusters): 1 3 4 5 2 7 6
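One way to read Step 4 is a round-robin over the sorted clusters. A minimal sketch of that interpretation (the function name and the example cluster layout are illustrative, not taken from the paper, and the slide's exact output sequence may come from a slightly different alternation rule):

```python
def alternate_clusters(clusters):
    """Round-robin over clusters (already sorted by best image rank):
    take the first image of each cluster, then the second of each, etc."""
    out = []
    depth = 0
    total = sum(len(c) for c in clusters)
    while len(out) < total:
        for cluster in clusters:
            if depth < len(cluster):
                out.append(cluster[depth])
        depth += 1
    return out

# Illustrative clusters (hypothetical, not the slide's exact example):
print(alternate_clusters([[1, 2], [3, 7], [4, 5, 6]]))  # [1, 3, 4, 2, 7, 5, 6]
```

The alternation pushes images from different clusters to the top of the list, which is what trades a little precision for diversity in the first 20 results.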


page 3/9

Framework

Strategy: first re-rank to improve relevance, then cluster usingagglomerative hierarchical clustering (AHC)

Is this strategy a good idea?

Does the relevance of the baseline affect the final results?

[Figure: F1@20, P@20 and CR@20 versus the number of clusters (0 to 300) for VSM(ttu)+AHCCompl(cred); curves: baseline, Only Step 1 (VSM(ttu)), AHC without Step 1, AHC with Step 1]

VSM(ttu): Vector Space Model using title (t), tags (t), username (u)
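The Step 1 re-ranking relies on a vector-space similarity over the textual fields (title, tags, username). A minimal sketch of such a similarity, using plain term counts and cosine (the system's actual term weighting is not specified here, so this is an assumption):

```python
import math
from collections import Counter

def cosine_sim(text_a, text_b):
    """Cosine similarity between two bags of words (raw term counts).
    A stand-in for a VSM similarity over concatenated title/tags/username."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_sim("eiffel tower paris", "paris tower at night"))
```

Re-ranking then amounts to sorting the baseline results by their similarity to the query text.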


page 4/9

Influence of the number of documents

Is it worth taking time to cluster online 300 results in order to improve the F1@20 of the first 20 documents?

[Figure: F1@20 versus the number of clusters (0 to 300) for VSM(ttu)+AHCCompl(cred); curves: baseline, VSM(ttu), AHC nbDocs=150, AHC nbDocs=300. Companion panels show P@20 and CR@20.]

Similar best F1@20 value

But the peak of the curve is wider with 300 documents

⇒ More chance of finding the best number of clusters for the testset

Most of the time, around 50 clusters gives the best F1@20 for 300 documents
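The clustering step uses agglomerative hierarchical clustering with complete linkage (AHCCompl). A naive O(n³) sketch, assuming only a pairwise distance function (a real system over 300 documents would rather use an optimised library such as scipy.cluster.hierarchy):

```python
def ahc_complete(items, dist, n_clusters):
    """Agglomerative hierarchical clustering, complete linkage:
    repeatedly merge the two clusters whose farthest members are closest,
    until n_clusters remain. Returns lists of item indices."""
    clusters = [[i] for i in range(len(items))]

    def linkage(a, b):
        # Complete linkage = maximum pairwise distance between clusters.
        return max(dist(items[i], items[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        _, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] += clusters.pop(j)
    return clusters

# Two well-separated groups on a line -> two clusters
print(ahc_complete([0.0, 1.0, 10.0, 11.0], lambda a, b: abs(a - b), 2))
```

The number of clusters requested here is exactly the "nb clusters" axis swept in the plots above.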


page 5/9

Best results using only one feature

Which feature is the best with our method (on devset)?

[Figure: F1@20 (devset) versus the number of clusters (0 to 140) for VSM(ttu)+AHCCompl(feature); curves: baseline, VSM(ttu), cred, username, text only (tdtu), visual only (ScalCol), visual only (cnn_ad), random. Companion panels show P@20 and CR@20 (devset).]

On devset, we test several features:
- We use the credibility descriptors (cred) as a vector input for AHC
⇒ cred gives the best results among the single-feature runs (not confirmed on testset)

What is the meaning of this feature for diversity?
- One vector per user, not per document
- cred is better than grouping documents by username (on devset)
- cred and username are better than text only (on devset)
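Since cred is one vector per user rather than per document, each image can simply inherit its uploader's credibility vector before clustering, so images by the same user become identical points for AHC. A sketch of this mapping (the data layout and field names are hypothetical):

```python
def cred_features(docs, user_cred):
    """Map each document to its uploader's credibility vector.
    docs: list of (doc_id, username) pairs; user_cred: username -> vector.
    Documents by the same user get identical vectors, so AHC tends to
    group them together, which is one plausible source of diversity."""
    return [user_cred[user] for _, user in docs]

docs = [("img1", "alice"), ("img2", "bob"), ("img3", "alice")]
cred = {"alice": [0.9, 0.2], "bob": [0.4, 0.7]}
print(cred_features(docs, cred))  # [[0.9, 0.2], [0.4, 0.7], [0.9, 0.2]]
```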


page 6/9

Fusion of similarities

How to use different features to improve diversity ?

⇒ Several possible ways; we choose to fuse similarities

Let sim(x, y) be a similarity between documents x and y

Linear fusion: f1, f2 two features, τ ∈ [0, 1]

  sim_Linear(f1,f2,τ)(x, y) = τ · sim_f1(x, y) + (1 − τ) · sim_f2(x, y)

Weighted-max fusion:

  sim_WMax(f1,w1,...,fn,wn)(x, y) = max_{i ∈ {1,...,n}} w_i · sim_fi(x, y)

with n the number of features and w_i the weight of feature f_i, such that Σ_{i=1}^{n} w_i = 1
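The two fusion operators translate directly into code. A minimal sketch, assuming each similarity function takes a pair of documents and returns a score (the toy similarities and weights below are illustrative only):

```python
def linear_fusion(sim1, sim2, tau):
    """sim_Linear(f1,f2,tau)(x, y) = tau * sim_f1(x, y) + (1 - tau) * sim_f2(x, y)"""
    return lambda x, y: tau * sim1(x, y) + (1 - tau) * sim2(x, y)

def wmax_fusion(sims_and_weights):
    """sim_WMax(x, y) = max_i w_i * sim_fi(x, y), with the weights summing to 1."""
    assert abs(sum(w for _, w in sims_and_weights) - 1.0) < 1e-9
    return lambda x, y: max(w * sim(x, y) for sim, w in sims_and_weights)

# Toy similarities on plain values (illustrative, not the paper's features)
text_sim = lambda x, y: 1.0 if x == y else 0.0
visual_sim = lambda x, y: 0.5
fused = wmax_fusion([(text_sim, 0.03), (visual_sim, 0.97)])
print(fused(1, 2))  # max(0.03 * 0.0, 0.97 * 0.5) = 0.485
```

Either fused similarity can then be fed to the AHC step in place of a single-feature similarity.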


page 7/9

Best fusion results

[Figure: F1@20 (devset) versus the number of clusters (0 to 120) for VSM(ttu)+AHCCompl(feature); left panel: baseline, VSM(ttu), Linear(tdtu,ScalCol,0.02), tdtu, ScalCol; right panel adds cred and WMax(tdtu,0.014,ScalCol,0.97,cred,0.016). Companion panels show P@20 and CR@20 (devset).]

On devset, we ran many experiments to optimise the weights of the fusions

The linear fusion of text and visual similarities gave much better results than text only (tdtu) or visual only (ScalCol)

But the linear fusion gave lower results than cred

Finally, the best WMax fusion gave slightly better results than cred


page 8/9

Run results

Run       Step 1   Steps 2-4 (AHC features)   devset F1@20   testset F1@20
baseline  -        -                          0.467 (ref.)   -
run 1     No       visual                     0.498 (+7%)    0.430
run 2     Yes      text                       0.569 (+22%)   0.552
run 3     Yes      Linear(text,visual)        0.582 (+25%)   0.553
run 4     Yes      cred                       0.585 (+25%)   0.543
run 5     Yes      WMax(text,visual,cred)     0.588 (+26%)   0.544

Number of queries: devset 70, testset 64

On testset:

The text feature gives better results than the cred feature

The best result is obtained using the linear fusion of visual and textual feature similarities, not WMax

The F1@20 scores for runs 2 to 5 are very close (≈ 0.55)

⇒ Difficult to draw reliable conclusions


page 9/9

Conclusion and discussion

On this benchmark and with our framework

Is it worth taking time to cluster 300 results?
- To improve F1@20? No.
- To ensure a good F1@20? Yes.

On devset:
- The credibility descriptors gave very good results
  → Why? What is the meaning of these descriptors for diversity?
  → Results not so good on testset
- The WMax operator gave the best results
  → Not confirmed on testset; maybe an overfitting problem
- The linear fusion between text and visual gave good results
  → Confirmed on testset

Thank you for your attention
