Sparse Kernel Learning for Image Annotation Sean Moran and Victor Lavrenko Institute of Language, Cognition and Computation School of Informatics University of Edinburgh ICMR’14 Glasgow, April 2014

Upload: sean-moran

Post on 26-May-2015


Page 1: Sparse Kernel Learning for Image Annotation

Sparse Kernel Learning for Image Annotation

Sean Moran and Victor Lavrenko

Institute of Language, Cognition and Computation
School of Informatics

University of Edinburgh

ICMR’14 Glasgow, April 2014

Page 2: Sparse Kernel Learning for Image Annotation

Sparse Kernel Learning for Image Annotation

Overview

SKL-CRM

Evaluation

Conclusion

Page 4: Sparse Kernel Learning for Image Annotation

Assigning words to pictures

[Figure: annotation pipeline. A training dataset of images with keywords ("Tiger, Grass, Whiskers"; "City, Castle, Smoke"; "Tiger, Tree, Leaves"; "Eagle, Sky") goes through feature extraction (GIST, SIFT, LAB, HAAR), giving multiple feature vectors (X1 to X6) per image. The annotation model then produces a ranked list of words for a testing image.]

Ranked list of words:

P(Tiger | test image) = 0.15
P(Grass | test image) = 0.12
P(Whiskers | test image) = 0.12
P(Leaves | test image) = 0.10
P(Tree | test image) = 0.10
P(Sky | test image) = 0.08
P(Waterfall | test image) = 0.05
P(City | test image) = 0.03
P(Castle | test image) = 0.03
P(Eagle | test image) = 0.02
P(Smoke | test image) = 0.01

Top 5 words as annotation: Tiger, Grass, Tree, Leaves, Whiskers

This talk: How best to combine features?

Page 5: Sparse Kernel Learning for Image Annotation

Previous work

- Topic models: latent Dirichlet allocation (LDA) [Barnard et al. ’03], Machine Translation [Duygulu et al. ’02]

- Mixture models: Continuous Relevance Model (CRM) [Lavrenko et al. ’03], Multiple Bernoulli Relevance Model (MBRM) [Feng ’04]

- Discriminative models: Support Vector Machine (SVM) [Verma and Jawahar ’13], Passive Aggressive Classifier [Grangier ’08]

- Local learning models: Joint Equal Contribution (JEC) [Makadia ’08], Tag Propagation (Tagprop) [Guillaumin et al. ’09], Two-pass KNN (2PKNN) [Verma et al. ’12]

Page 6: Sparse Kernel Learning for Image Annotation

Combining different feature types

- Previous work: linear combination of feature distances in a weighted summation with “default” kernels:

[Figure: kernels GG(x; p) plotted against x: Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15)]

- Standard kernel assignment: Gaussian for GIST, Laplacian for colour features, χ² for SIFT

Page 7: Sparse Kernel Learning for Image Annotation

Data-adaptive visual kernels

- Our contribution: permit the visual kernels themselves to adapt to the data:

[Figure: kernels GG(x; p) plotted against x: Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15); dataset: Corel 5K]

- Hypothesis: Optimal kernels for GIST, SIFT etc. dependent on the image dataset itself

Page 8: Sparse Kernel Learning for Image Annotation

Data-adaptive visual kernels

- Our contribution: permit the visual kernels themselves to adapt to the data:

[Figure: kernels GG(x; p) plotted against x: Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15); dataset: IAPR TC12]

- Hypothesis: Optimal kernels for GIST, SIFT etc. dependent on the image dataset itself

Page 9: Sparse Kernel Learning for Image Annotation

Sparse Kernel Continuous Relevance Model (SKL-CRM)

Page 10: Sparse Kernel Learning for Image Annotation

Continuous Relevance Model (CRM)

- CRM estimates the joint distribution of image features (f) and words (w) [Lavrenko et al. 2003]:

$$P(\mathbf{w}, \mathbf{f}) = \sum_{J \in T} P(J) \prod_{j=1}^{N} P(w_j \mid J) \prod_{i=1}^{M} P(\vec{f}_i \mid J)$$

  - $P(J)$: uniform prior for training image $J$
  - $P(\vec{f}_i \mid J)$: Gaussian non-parametric kernel density estimate
  - $P(w_j \mid J)$: smoothed multinomial over words

- Estimate marginal probability distribution over individual tags:

$$P(w \mid \mathbf{f}) = \frac{P(w, \mathbf{f})}{\sum_{w'} P(w', \mathbf{f})}$$

- Top e.g. 5 words with highest $P(w \mid \mathbf{f})$ used as annotation
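The computation on this slide can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the function name `crm_word_posterior`, the array layout, the isotropic Gaussian density, the particular smoothing scheme for P(w | J) and the default values of `beta` and `mu` are all assumptions made for the example.

```python
import numpy as np

def crm_word_posterior(test_feats, train_feats, train_words, beta=1.0, mu=1.0):
    """Return P(w | f) over the vocabulary for one test image.

    test_feats:  (M, D) array, the M feature vectors of the test image
    train_feats: (T, R, D) array, R feature vectors for each of T training images
    train_words: (T, V) array of word counts per training image
    beta, mu:    kernel bandwidth and smoothing strength (illustrative defaults)
    """
    test_feats = np.asarray(test_feats, dtype=float)
    train_feats = np.asarray(train_feats, dtype=float)
    train_words = np.asarray(train_words, dtype=float)
    T, R, D = train_feats.shape

    # Squared distance between test vector i and vector r of image J: shape (T, M, R)
    diff = test_feats[None, :, None, :] - train_feats[:, None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)

    # log P(f_i | J): Gaussian kernel density estimate over the R vectors of J
    log_kernel = -sq_dist / (2 * beta ** 2) - 0.5 * D * np.log(2 * np.pi * beta ** 2)
    log_p_fi = np.logaddexp.reduce(log_kernel, axis=2) - np.log(R)  # (T, M)
    log_p_f = log_p_fi.sum(axis=1)                                  # (T,)

    # P(w | J): multinomial smoothed towards collection frequencies (assumed scheme)
    background = train_words.sum(axis=0) / train_words.sum()
    p_w = (train_words + mu * background) / (train_words.sum(axis=1, keepdims=True) + mu)

    # P(J) is uniform, so it cancels on normalisation; subtract the max for stability
    weights = np.exp(log_p_f - log_p_f.max())
    joint = weights @ p_w                                           # (V,)
    return joint / joint.sum()

# Toy data: two training images, vocabulary ["tiger", "sky"]
train_feats = [[[0.0, 0.0], [0.1, 0.0]], [[10.0, 10.0], [10.1, 10.0]]]
train_words = [[1, 0], [0, 1]]
posterior = crm_word_posterior([[0.05, 0.0]], train_feats, train_words)
top_words = np.argsort(-posterior)[:5]  # indices of the (up to) 5 most probable words
```

Working in log space avoids underflow when the product over the M feature vectors becomes very small.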

Page 11: Sparse Kernel Learning for Image Annotation

Sparse Kernel Learning CRM (SKL-CRM)

- Introduce binary kernel-feature alignment matrix $\Psi_{u,v}$:

$$P(I \mid J) = \prod_{i=1}^{M} \sum_{j=1}^{R} \exp\left\{ -\frac{1}{\beta} \sum_{u,v} \Psi_{u,v}\, k^{v}(\vec{f}_i^{\,u}, \vec{f}_j^{\,u}) \right\}$$

- $k^{v}(\vec{f}_i^{\,u}, \vec{f}_j^{\,u})$: $v$-th kernel function on the $u$-th feature type

- $\beta$: kernel bandwidth parameter

- Goal: learn $\Psi_{u,v}$ by directly maximising annotation F1 score on a held-out validation dataset
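As a rough illustration of how the binary matrix Ψ switches kernel-feature pairs on and off inside P(I | J), the sketch below evaluates the log of that expression. The function names, the list-of-arrays layout and the two example distance functions are assumptions for this example, not part of the original slides.

```python
import numpy as np

def l1_distance(a, b):
    """Example kernel term: sum of absolute differences."""
    return float(np.abs(a - b).sum())

def squared_l2_distance(a, b):
    """Example kernel term: sum of squared differences."""
    return float(((a - b) ** 2).sum())

def skl_log_likelihood(test_img, train_img, psi, kernels, beta=1.0):
    """log P(I | J) for one test image I and one training image J.

    test_img:  list over U feature types of (M, D_u) arrays
    train_img: list over U feature types of (R, D_u) arrays
    psi:       (U, V) binary array; psi[u, v] = 1 aligns kernel v with feature u
    kernels:   list of V functions k(a, b) -> float
    """
    M = test_img[0].shape[0]
    R = train_img[0].shape[0]
    active = list(zip(*np.nonzero(psi)))  # the (u, v) pairs switched on by psi
    total = 0.0
    for i in range(M):
        exponents = np.zeros(R)
        for j in range(R):
            for u, v in active:
                exponents[j] -= kernels[v](test_img[u][i], train_img[u][j]) / beta
        # log of the inner sum over j, computed stably
        total += np.logaddexp.reduce(exponents)
    return total
```

Only the pairs with Ψ = 1 contribute to the exponent, so a sparse Ψ means most kernel evaluations are skipped entirely.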

Page 12: Sparse Kernel Learning for Image Annotation

Generalised Gaussian Kernel

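- Shape factor $p$: traces out an infinite family of kernels

$$P(\vec{f}_i \mid \vec{f}_j) = \frac{p^{1-1/p}}{2\beta\,\Gamma(1/p)} \exp\left[ -\frac{1}{p} \frac{|\vec{f}_i - \vec{f}_j|^{p}}{\beta^{p}} \right]$$

  - $\Gamma$: Gamma function
  - $\beta$: kernel bandwidth parameter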
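A small sketch of the generalised Gaussian density for a scalar difference x = f_i - f_j may help to see how the shape factor works. Treating the difference as a scalar and the function name `generalised_gaussian` are assumptions of this example. With p = 2 the formula reduces to a Gaussian with standard deviation β, and with p = 1 to a Laplacian with scale β.

```python
import math

def generalised_gaussian(x, p, beta=1.0):
    """Generalised Gaussian density GG(x; p) with bandwidth beta."""
    norm = p ** (1.0 - 1.0 / p) / (2.0 * beta * math.gamma(1.0 / p))
    return norm * math.exp(-(abs(x) ** p) / (p * beta ** p))

# p = 2: Gaussian, p = 1: Laplacian, large p (e.g. 15): close to uniform on [-beta, beta]
gaussian_value = generalised_gaussian(0.3, p=2)
laplacian_value = generalised_gaussian(0.3, p=1)
flat_value = generalised_gaussian(0.3, p=15)
```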

Page 13: Sparse Kernel Learning for Image Annotation

Generalised Gaussian Kernel

[Figure: GG(x; p) plotted against x for p = 2 (Gaussian)]

Page 14: Sparse Kernel Learning for Image Annotation

Generalised Gaussian Kernel

[Figure: GG(x; p) plotted against x for p = 1 (Laplacian)]

Page 15: Sparse Kernel Learning for Image Annotation

Generalised Gaussian Kernel

[Figure: GG(x; p) plotted against x for p = 15 (approximately uniform)]

Page 16: Sparse Kernel Learning for Image Annotation

Multinomial Kernel

- Multinomial kernel optimised for count-based features:

$$P(\vec{f}_i \mid \vec{f}_j) = \frac{\left(\sum_d f_{i,d}\right)!}{\prod_d (f_{i,d}!)} \prod_d (p_{j,d})^{f_{i,d}}$$

  - $f_{i,d}$: count for bin $d$ in the unlabelled image $i$
  - $f_{j,d}$: count for bin $d$ in the training image $j$

- Jelinek-Mercer smoothing used to estimate $p_{j,d}$:

$$p_{j,d} = \lambda \frac{f_{j,d}}{\sum_d f_{j,d}} + (1-\lambda) \frac{\sum_j f_{j,d}}{\sum_{j,d} f_{j,d}}$$

- We also consider standard χ² and Hellinger kernels
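The two formulas on this slide can be sketched as below. The function names are hypothetical, the default `lam=0.99` is only an illustrative value, and the kernel is returned in log form (using `lgamma` for the factorials) to avoid overflow on large histogram counts.

```python
import numpy as np
from math import lgamma

def jm_smooth(train_counts, lam=0.99):
    """Jelinek-Mercer estimate p[j, d] for every training image j and bin d."""
    train_counts = np.asarray(train_counts, dtype=float)
    per_image = train_counts / train_counts.sum(axis=1, keepdims=True)
    collection = train_counts.sum(axis=0) / train_counts.sum()
    return lam * per_image + (1.0 - lam) * collection

def log_multinomial_kernel(test_counts, p_j):
    """log P(f_i | f_j) for the counts of test image i and smoothed p_j."""
    test_counts = np.asarray(test_counts, dtype=float)
    p_j = np.asarray(p_j, dtype=float)
    # log of the multinomial coefficient (sum_d f_id)! / prod_d f_id!
    log_coef = lgamma(test_counts.sum() + 1.0) - sum(lgamma(c + 1.0) for c in test_counts)
    # bins with a zero test count contribute nothing to the product
    mask = test_counts > 0
    return log_coef + float((test_counts[mask] * np.log(p_j[mask])).sum())

# Toy example: two training histograms with two bins each
p = jm_smooth([[2, 0], [0, 2]], lam=0.5)
log_k = log_multinomial_kernel([1, 1], p[0])
```

The smoothing matters because an unsmoothed zero in p_{j,d} would drive the whole product to zero whenever the test image has any count in that bin.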

Page 17: Sparse Kernel Learning for Image Annotation

Greedy kernel-feature alignment

Iteration 0: F1 = 0.0

Kernel-feature alignment matrix Ψ (rows: kernels, columns: feature types):

            GIST  SIFT  LAB  HAAR
Laplacian    0     0     0    0
Gaussian     0     0     0    0
Uniform      0     0     0    0

[Figure: feature vectors (X1 to X6) of the testing image and training image for GIST, SIFT, LAB and HAAR; candidate kernels GG(x; p): Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15)]

Page 18: Sparse Kernel Learning for Image Annotation

Greedy kernel-feature alignment

Iteration 1: F1 = 0.25

Kernel-feature alignment matrix Ψ (rows: kernels, columns: feature types):

            GIST  SIFT  LAB  HAAR
Laplacian    0     0     0    0
Gaussian     1     0     0    0
Uniform      0     0     0    0

[Figure: feature vectors (X1 to X6) of the testing image and training image for GIST, SIFT, LAB and HAAR; candidate kernels GG(x; p): Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15)]

Page 19: Sparse Kernel Learning for Image Annotation

Greedy kernel-feature alignment

Iteration 2: F1 = 0.34

Kernel-feature alignment matrix Ψ (rows: kernels, columns: feature types):

            GIST  SIFT  LAB  HAAR
Laplacian    0     0     0    0
Gaussian     1     0     0    0
Uniform      0     0     0    1

[Figure: feature vectors (X1 to X6) of the testing image and training image for GIST, SIFT, LAB and HAAR; candidate kernels GG(x; p): Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15)]

Page 20: Sparse Kernel Learning for Image Annotation

Greedy kernel-feature alignment

Iteration 3: F1 = 0.38

Kernel-feature alignment matrix Ψ (rows: kernels, columns: feature types):

            GIST  SIFT  LAB  HAAR
Laplacian    0     0     0    0
Gaussian     1     1     0    0
Uniform      0     0     0    1

[Figure: feature vectors (X1 to X6) of the testing image and training image for GIST, SIFT, LAB and HAAR; candidate kernels GG(x; p): Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15)]

Page 21: Sparse Kernel Learning for Image Annotation

Greedy kernel-feature alignment

Iteration 4: F1 = 0.42

Kernel-feature alignment matrix Ψ (rows: kernels, columns: feature types):

            GIST  SIFT  LAB  HAAR
Laplacian    0     0     1    0
Gaussian     1     1     0    0
Uniform      0     0     0    1

[Figure: feature vectors (X1 to X6) of the testing image and training image for GIST, SIFT, LAB and HAAR; candidate kernels GG(x; p): Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15)]
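The iterations above can be read as a greedy forward selection over entries of Ψ. The sketch below is one plausible way to write it, under stated assumptions: `validation_f1` is a hypothetical callback that scores a candidate Ψ on held-out data, each feature type receives at most one kernel, and the search stops when no single addition improves F1. Ψ is stored here as features by kernels, the transpose of the layout drawn on the slides.

```python
import numpy as np

def greedy_alignment(n_features, n_kernels, validation_f1):
    """Greedily switch on the (feature, kernel) pair that most improves F1.

    validation_f1: hypothetical callback, psi (n_features, n_kernels) -> float
    Returns the learned binary matrix and its validation F1.
    """
    psi = np.zeros((n_features, n_kernels), dtype=int)
    best_f1 = 0.0
    while True:
        best_move = None
        for u in range(n_features):
            if psi[u].any():          # assumed: at most one kernel per feature type
                continue
            for v in range(n_kernels):
                psi[u, v] = 1         # tentatively add the pair
                f1 = validation_f1(psi)
                psi[u, v] = 0         # undo before trying the next candidate
                if f1 > best_f1:
                    best_f1, best_move = f1, (u, v)
        if best_move is None:         # no addition helps: stop
            break
        psi[best_move] = 1
    return psi, best_f1
```

Each round costs at most (features × kernels) validation runs, and features that never improve F1 are simply left without a kernel, which is where the sparsity comes from.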

Page 22: Sparse Kernel Learning for Image Annotation

Evaluation

Page 23: Sparse Kernel Learning for Image Annotation

Datasets/Features

- Standard evaluation datasets:
  - Corel 5K: 5,000 images (landscapes, cities), 260 keywords
  - IAPR TC12: 19,627 images (tourism, sports), 291 keywords
  - ESP Game: 20,768 images (drawings, graphs), 268 keywords

- Standard “Tagprop” feature set [Guillaumin et al. ’09]:
  - Bag-of-words histograms: SIFT [Lowe ’04] and Hue [van de Weijer & Schmid ’06]
  - Global colour histograms: RGB, HSV, LAB
  - Global GIST descriptor [Oliva & Torralba ’01]
  - Descriptors, except GIST, also computed in a 3x1 spatial arrangement [Lazebnik et al. ’06]

Page 24: Sparse Kernel Learning for Image Annotation

Evaluation Metrics

- Standard evaluation metrics [Guillaumin et al. ’09]:
  - Mean per-word Recall (R)
  - Mean per-word Precision (P)
  - F1 measure
  - Number of words with recall > 0 (N+)

- Fixed annotation length of 5 keywords
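These four metrics could be computed along the following lines. The function name and the set-based input format are assumptions of this sketch, as are two conventions the slide does not spell out: a word that is never predicted gets precision 0, and F1 is taken as the harmonic mean of the mean precision and mean recall.

```python
def annotation_metrics(predicted, truth, vocabulary):
    """Mean per-word precision, recall, F1 and N+.

    predicted, truth: lists (one entry per image) of sets of words
    vocabulary:       list of all words to average over
    """
    precisions, recalls, n_plus = [], [], 0
    for word in vocabulary:
        true_pos = sum(1 for p, t in zip(predicted, truth) if word in p and word in t)
        n_predicted = sum(1 for p in predicted if word in p)
        n_relevant = sum(1 for t in truth if word in t)
        precisions.append(true_pos / n_predicted if n_predicted else 0.0)
        recalls.append(true_pos / n_relevant if n_relevant else 0.0)
        n_plus += int(true_pos > 0)   # word recalled at least once
    mean_p = sum(precisions) / len(vocabulary)
    mean_r = sum(recalls) / len(vocabulary)
    f1 = 2 * mean_p * mean_r / (mean_p + mean_r) if mean_p + mean_r else 0.0
    return mean_p, mean_r, f1, n_plus

# Toy example with two images and a four-word vocabulary
predicted = [{"tiger", "grass"}, {"sky"}]
truth = [{"tiger"}, {"sky", "eagle"}]
metrics = annotation_metrics(predicted, truth, ["tiger", "grass", "sky", "eagle"])
```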

Page 25: Sparse Kernel Learning for Image Annotation

F1 score of CRM model variants

[Figure: bar chart of F1 (axis 0.00 to 0.45) on Corel 5K, IAPR TC12 and ESP Game for three models: CRM, CRM 15 and SKL-CRM]

Page 26: Sparse Kernel Learning for Image Annotation

F1 score of CRM model variants

[Figure: bar chart of F1 (axis 0.00 to 0.45) on Corel 5K, IAPR TC12 and ESP Game for three models: CRM, CRM 15 and SKL-CRM. Callout: CRM = original CRM with Duygulu et al. features]

Page 27: Sparse Kernel Learning for Image Annotation

F1 score of CRM model variants

[Figure: bar chart of F1 (axis 0.00 to 0.45) on Corel 5K, IAPR TC12 and ESP Game for three models: CRM, CRM 15 and SKL-CRM. Callouts: CRM = original CRM with Duygulu et al. features; CRM 15 = original CRM with 15 Tagprop features (+71%)]

Page 28: Sparse Kernel Learning for Image Annotation

F1 score of CRM model variants

[Figure: bar chart of F1 (axis 0.00 to 0.45) on Corel 5K, IAPR TC12 and ESP Game for three models: CRM, CRM 15 and SKL-CRM. Callouts: CRM = original CRM with Duygulu et al. features; CRM 15 = original CRM with 15 Tagprop features (+71%); SKL-CRM = SKL-CRM with 15 Tagprop features (+45%)]

Page 29: Sparse Kernel Learning for Image Annotation

F1 score of SKL-CRM on Corel 5K

[Figure: F1 (axis 0.31 to 0.45) against feature type on Corel 5K. X-axis tick labels: HSV_V3H1, DS, HS_V3H1, HSV, HS, HH_V3H1, GIST, LAB_V3H1, RGB_V3H1, RGB, DH_V3H1, DH, HH, LAB, DS_V3H1. Series: SKL-CRM (Valid F1), SKL-CRM (Test F1), Tagprop (Test F1)]

Page 34: Sparse Kernel Learning for Image Annotation

Optimal kernel-feature alignments on Corel 5K

- Optimal alignments¹:
  - HSV: Multinomial (λ = 0.99)
  - HSV_V3H1: Generalised Gaussian (p = 0.9)
  - Harris Hue (HH_V3H1): Generalised Gaussian (p = 0.1) ≈ Dirac spike!
  - Harris SIFT (HS): Gaussian
  - HS_V3H1: Generalised Gaussian (p = 0.7)
  - DenseSift (DS): Laplacian

- Our data-driven kernels are more effective than standard kernels

- No alignment agrees with the literature default assignment, i.e. Gaussian for GIST, Laplacian for colour histogram, χ² for SIFT

¹ V3H1 denotes descriptors computed in a spatial arrangement

Page 35: Sparse Kernel Learning for Image Annotation

SKL-CRM Results vs. Literature (Precision & Recall)

[Figure: bar chart of mean per-word recall (R) and precision (P) (axis 0.20 to 0.50) on Corel 5K and IAPR TC12 for MBRM, JEC, Tagprop, GS and SKL-CRM]

Page 36: Sparse Kernel Learning for Image Annotation

SKL-CRM Results vs. Literature (N+)

[Figure: bar chart of N+ (axis 0 to 300) on Corel 5K and IAPR TC12 for MBRM, JEC, Tagprop, GS and SKL-CRM]

Page 37: Sparse Kernel Learning for Image Annotation

Conclusion

Page 38: Sparse Kernel Learning for Image Annotation

Conclusions and Future Work

- Proposed a sparse kernel model for image annotation

- Key experimental findings:
  - Default kernel-feature alignment suboptimal
  - Data-adaptive kernels are superior to standard kernels
  - Sparse set of features just as effective as much larger set
  - Greedy forward selection as effective as gradient ascent

- Future work: superposition of kernels per feature type

Page 39: Sparse Kernel Learning for Image Annotation

Thank you for your attention

Sean Moran
