UPMC at MediaEval 2016: Retrieving Diverse Social Images Task

page 1/9

UPMC at MediaEval2016 Retrieving Diverse Social Images Task

Sabrina Tollari

Université Pierre et Marie Curie (UPMC) — Paris 6, UMR CNRS LIP6

MediaEval 2016 Workshop, October 21st, 2016 — Hilversum, Netherlands



page 2/9

Framework

Strategy: first re-rank to improve relevance, then cluster using agglomerative hierarchical clustering (AHC)

Worked example (seven images, shown by their Step 1 rank):

Baseline: 1 1 1 1 1 1 1
Step 1 (re-rank baseline to improve relevance): 1 2 3 4 5 6 7
Step 2 (cluster results using hierarchical clustering): 6 5 4 2 1 7 3
Step 3 (sort the images and the clusters using image ranks from Step 1): 1 2 3 7 4 5 6
Step 4 (re-rank the results, alternating images from the clusters): 1 3 4 5 2 7 6
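One way to read Step 4 is a round-robin over the sorted clusters. A minimal sketch of that interpretation (the function name and the example cluster layout are illustrative, not taken from the paper, and the slide's exact output sequence may come from a slightly different alternation rule):

```python
def alternate_clusters(clusters):
    """Round-robin over clusters (already sorted by best image rank):
    take the first image of each cluster, then the second of each, etc."""
    out = []
    depth = 0
    total = sum(len(c) for c in clusters)
    while len(out) < total:
        for cluster in clusters:
            if depth < len(cluster):
                out.append(cluster[depth])
        depth += 1
    return out

# Illustrative clusters (hypothetical, not the slide's exact example):
print(alternate_clusters([[1, 2], [3, 7], [4, 5, 6]]))  # [1, 3, 4, 2, 7, 5, 6]
```

The alternation pushes images from different clusters to the top of the list, which is what trades a little precision for diversity in the first 20 results.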


page 3/9

Framework

Strategy: first re-rank to improve relevance, then cluster usingagglomerative hierarchical clustering (AHC)

Is this strategy a good idea?

Does the relevance of the baseline affect the final results?

[Figure: F1@20, P@20 and CR@20 versus the number of clusters (0 to 300) for VSM(ttu)+AHCCompl(cred); curves: baseline, Only Step 1 (VSM(ttu)), AHC without Step 1, AHC with Step 1]

VSM(ttu): Vector Space Model using title (t), tags (t), username (u)
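The Step 1 re-ranking relies on a vector-space similarity over the textual fields (title, tags, username). A minimal sketch of such a similarity, using plain term counts and cosine (the system's actual term weighting is not specified here, so this is an assumption):

```python
import math
from collections import Counter

def cosine_sim(text_a, text_b):
    """Cosine similarity between two bags of words (raw term counts).
    A stand-in for a VSM similarity over concatenated title/tags/username."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_sim("eiffel tower paris", "paris tower at night"))
```

Re-ranking then amounts to sorting the baseline results by their similarity to the query text.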


page 4/9

Influence of the number of documents

Is it worth taking time to cluster online 300 results in order to improve the F1@20 of the first 20 documents?

[Figure: F1@20 versus the number of clusters (0 to 300) for VSM(ttu)+AHCCompl(cred); curves: baseline, VSM(ttu), AHC nbDocs=150, AHC nbDocs=300. Companion panels show P@20 and CR@20.]

Similar best F1@20 value

But the peak of the curve is wider with 300 documents

⇒ More chance of finding the best number of clusters for the testset

Most of the time, around 50 clusters gives the best F1@20 for 300 documents
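The clustering step uses agglomerative hierarchical clustering with complete linkage (AHCCompl). A naive O(n³) sketch, assuming only a pairwise distance function (a real system over 300 documents would rather use an optimised library such as scipy.cluster.hierarchy):

```python
def ahc_complete(items, dist, n_clusters):
    """Agglomerative hierarchical clustering, complete linkage:
    repeatedly merge the two clusters whose farthest members are closest,
    until n_clusters remain. Returns lists of item indices."""
    clusters = [[i] for i in range(len(items))]

    def linkage(a, b):
        # Complete linkage = maximum pairwise distance between clusters.
        return max(dist(items[i], items[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        _, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] += clusters.pop(j)
    return clusters

# Two well-separated groups on a line -> two clusters
print(ahc_complete([0.0, 1.0, 10.0, 11.0], lambda a, b: abs(a - b), 2))
```

The number of clusters requested here is exactly the "nb clusters" axis swept in the plots above.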


page 5/9

Best results using only one feature

Which feature is the best with our method (on devset)?

[Figure: F1@20 (devset) versus the number of clusters (0 to 140) for VSM(ttu)+AHCCompl(feature); curves: baseline, VSM(ttu), cred, username, text only (tdtu), visual only (ScalCol), visual only (cnn_ad), random. Companion panels show P@20 and CR@20 (devset).]

On devset, we test several features:
- We use the credibility descriptors (cred) as a vector input for AHC
⇒ cred gives the best results among the single-feature runs (not confirmed on testset)

What is the meaning of this feature for diversity?
- One vector per user, not per document
- cred is better than grouping documents by username (on devset)
- cred and username are better than text only (on devset)
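Since cred is one vector per user rather than per document, each image can simply inherit its uploader's credibility vector before clustering, so images by the same user become identical points for AHC. A sketch of this mapping (the data layout and field names are hypothetical):

```python
def cred_features(docs, user_cred):
    """Map each document to its uploader's credibility vector.
    docs: list of (doc_id, username) pairs; user_cred: username -> vector.
    Documents by the same user get identical vectors, so AHC tends to
    group them together, which is one plausible source of diversity."""
    return [user_cred[user] for _, user in docs]

docs = [("img1", "alice"), ("img2", "bob"), ("img3", "alice")]
cred = {"alice": [0.9, 0.2], "bob": [0.4, 0.7]}
print(cred_features(docs, cred))  # [[0.9, 0.2], [0.4, 0.7], [0.9, 0.2]]
```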


page 6/9

Fusion of similarities

How to use different features to improve diversity ?

⇒ Several possible ways; we choose to fuse similarities

Let sim(x, y) be a similarity between documents x and y

Linear fusion: f1, f2 two features, τ ∈ [0, 1]

  sim_Linear(f1,f2,τ)(x, y) = τ · sim_f1(x, y) + (1 − τ) · sim_f2(x, y)

Weighted-max fusion:

  sim_WMax(f1,w1,...,fn,wn)(x, y) = max_{i ∈ {1,...,n}} w_i · sim_fi(x, y)

with n the number of features and w_i the weight of feature f_i, such that Σ_{i=1}^{n} w_i = 1
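The two fusion operators translate directly into code. A minimal sketch, assuming each similarity function takes a pair of documents and returns a score (the toy similarities and weights below are illustrative only):

```python
def linear_fusion(sim1, sim2, tau):
    """sim_Linear(f1,f2,tau)(x, y) = tau * sim_f1(x, y) + (1 - tau) * sim_f2(x, y)"""
    return lambda x, y: tau * sim1(x, y) + (1 - tau) * sim2(x, y)

def wmax_fusion(sims_and_weights):
    """sim_WMax(x, y) = max_i w_i * sim_fi(x, y), with the weights summing to 1."""
    assert abs(sum(w for _, w in sims_and_weights) - 1.0) < 1e-9
    return lambda x, y: max(w * sim(x, y) for sim, w in sims_and_weights)

# Toy similarities on plain values (illustrative, not the paper's features)
text_sim = lambda x, y: 1.0 if x == y else 0.0
visual_sim = lambda x, y: 0.5
fused = wmax_fusion([(text_sim, 0.03), (visual_sim, 0.97)])
print(fused(1, 2))  # max(0.03 * 0.0, 0.97 * 0.5) = 0.485
```

Either fused similarity can then be fed to the AHC step in place of a single-feature similarity.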


page 7/9

Best fusion results

[Figure: F1@20 (devset) versus the number of clusters (0 to 120) for VSM(ttu)+AHCCompl(feature); left panel: baseline, VSM(ttu), Linear(tdtu,ScalCol,0.02), tdtu, ScalCol; right panel adds cred and WMax(tdtu,0.014,ScalCol,0.97,cred,0.016). Companion panels show P@20 and CR@20 (devset).]

On devset, we ran many experiments to optimise the weights of the fusions

The linear fusion of text and visual similarities gave much better results than text only (tdtu) or visual only (ScalCol)

But the linear fusion gave lower results than cred

Finally, the best WMax fusion gave slightly better results than cred


page 8/9

Run results

Run       Step 1   Steps 2-4 (AHC features)   devset F1@20   testset F1@20
baseline  -        -                          0.467 (ref.)   -
run 1     No       visual                     0.498 (+7%)    0.430
run 2     Yes      text                       0.569 (+22%)   0.552
run 3     Yes      Linear(text,visual)        0.582 (+25%)   0.553
run 4     Yes      cred                       0.585 (+25%)   0.543
run 5     Yes      WMax(text,visual,cred)     0.588 (+26%)   0.544

Number of queries: devset 70, testset 64

On testset:

The text feature gives better results than the cred feature

The best result is obtained using the linear fusion of visual and textual feature similarities, not WMax

The F1@20 scores for runs 2 to 5 are very close (≈ 0.55)

⇒ Difficult to draw reliable conclusions


page 9/9

Conclusion and discussion

On this benchmark and with our framework

Is it worth taking time to cluster 300 results?
- To improve F1@20? No.
- To ensure a good F1@20? Yes.

On devset:
- The credibility descriptors gave very good results
  → Why? What is the meaning of these descriptors for diversity?
  → Results not so good on testset
- The WMax operator gave the best results
  → Not confirmed on testset; maybe an overfitting problem
- The linear fusion between text and visual gave good results
  → Confirmed on testset

Thank you for your attention
