semi-supervised concept detection by learning the structure of similarity graphs
DESCRIPTION
Our presentation at MMM 2013, Huangshan, China.

TRANSCRIPT
19th International Conference on Multimedia Modeling, Huangshan, China, Jan 7-9, 2013
Semi-supervised concept detection by learning the structure of similarity graphs
Symeon Papadopoulos1, Christos Sagonas1, Ioannis Kompatsiaris1, Athena Vakali2
1 Centre for Research and Technology Hellas, Information Technologies Institute
2 Aristotle University of Thessaloniki, Informatics Department
mklab.iti.gr #2
IMAGE TAGS → CONCEPTS
• chocolate, cake, chocolate ganache, buttercream, shamsd → food
• N/A → female, indoor, people, portrait
• nature, landscape, water, reflection, mirror, flickrelite, abigfave → clouds, lake, sky, water
SOURCE: MIR-Flickr
mklab.iti.gr #3
Overview
• Problem formulation
• Related work
• Graph Structure Features Approach
• Evaluation
  – Synthetic datasets
  – MIR-Flickr
• Conclusions
mklab.iti.gr #4
Concept detection
ML perspective
• Given an image, produce a set of relevant concepts.

IR perspective
• Given an image collection and a concept of interest, rank all images in order of relevance.
• Transductive learning setting

mklab.iti.gr #6
Semi-supervised learning

Given:
• C: the set of target concepts
• X_A = {(x_i, y_i)}: the annotated set, where x_i is the D-dimensional feature vector of image i and y_i is the concept indicator vector (labels) of image i
• X_U: the set of unknown items

Goal: predict the concepts associated with the items of X_U by processing X_A and X_U together.
mklab.iti.gr #7
Related work
• Neighborhood similarity (Wang et al., 2009)
  – Uses image similarity graphs in combination with graph-based SSL (Zhu, 2005; Zhou et al., 2004)
  – Not incremental
• Sparse similarity graph by convex optimization (Tang et al., 2009)
  – Applicable to online settings
  – Computationally intensive training step
• Hashing-based graph construction (Chen et al., 2010)
  – Uses KL-divergence multi-label propagation, but relies on an iterative computational scheme
  – Difficult to apply in incremental settings
• Social dimensions (Tang & Liu, 2011)
  – Uses Laplacian Eigenmaps (LEs) for networked classification problems (i.e. when the network between nodes is explicit)
  – Not incremental, not applied to multimedia
mklab.iti.gr #9
Graph Structure Features (GSF)
mklab.iti.gr #11
Graph construction

Given the image similarity graph G = (V, E), where V is the set of nodes (images) and n = |V| its cardinality, construction options:
• full weighted graph
• kNN graph: connect each image to its k most similar images
• εNN graph: connect pairs of images whose similarity exceeds a threshold ε
mklab.iti.gr #12
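The kNN option above can be sketched as follows; a minimal NumPy illustration assuming a Gaussian-kernel similarity (the actual similarity measure and its parameters in the paper may differ):

```python
import numpy as np

def knn_similarity_graph(X, k=5, sigma=1.0):
    """Symmetric kNN adjacency matrix from an n x D feature matrix X.
    Assumption: Gaussian-kernel similarity; the paper's measure may differ."""
    # pairwise squared Euclidean distances, then similarity in (0, 1]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)                 # no self-loops
    W = np.zeros_like(S)
    for i in range(len(S)):
        nbrs = np.argsort(S[i])[-k:]         # indices of the k most similar images
        W[i, nbrs] = S[i, nbrs]
    return np.maximum(W, W.T)                # symmetrize: keep edge if either side picked it

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))
W = knn_similarity_graph(X, k=3)
```

An εNN graph would instead keep all entries of S above a threshold ε.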
Eigenvector/value computation

Normalized graph Laplacian: L = I - D^{-1/2} W D^{-1/2}, where D is the (diagonal) degree matrix and W the adjacency matrix.
(Typical form of the graph Laplacian: L = D - W.)
The graph structure features* are the d eigenvectors corresponding to the d smallest non-zero eigenvalues, obtained by solving L v = λ v.
*aka Laplacian Eigenmaps
mklab.iti.gr #13
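A sketch of the eigenvector computation via a dense eigendecomposition of the normalized Laplacian (the paper may use a sparse/iterative solver; the tolerance for discarding "zero" eigenvalues is an assumption, and the graph is assumed to have no isolated nodes):

```python
import numpy as np

def graph_structure_features(W, d=2):
    """Graph structure features (Laplacian Eigenmaps) from adjacency W:
    eigenvectors of L = I - D^{-1/2} W D^{-1/2} for the d smallest
    non-zero eigenvalues. Assumes no isolated nodes (positive degrees)."""
    deg = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    keep = vals > 1e-9                   # drop near-zero eigenvalues (one per component)
    return vecs[:, keep][:, :d]          # n x d embedding

# toy example: path graph on 6 nodes
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0
F = graph_structure_features(W, d=2)
```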
Graph structure feature learning
• Each media item is represented by a vector of graph structure features.
• At this point, any supervised learning method could be used. [Note that the whole framework is still SSL, since unlabeled items are used during graph construction.]
• SVM is selected:
  – good performance in several problems
  – good implementations available (LibSVM, LIBLINEAR)
  – real-valued output (IR perspective: rank images by concept)
mklab.iti.gr #14
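To illustrate the learning step without external dependencies, here is a toy hinge-loss linear SVM trained on GSF vectors; it is a minimal stand-in for LibSVM/LIBLINEAR, not the implementation used in the work:

```python
import numpy as np

def train_linear_svm(F, y, lr=0.1, lam=0.01, epochs=200):
    """Toy linear SVM via subgradient descent on the regularized hinge loss.
    F: n x d matrix of graph structure features; y: labels in {-1, +1}."""
    w = np.zeros(F.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (F @ w + b)
        mask = margins < 1                # items violating the margin
        w -= lr * (lam * w - (y[mask, None] * F[mask]).sum(0) / len(y))
        b -= lr * (-y[mask].sum() / len(y))
    return w, b

def score(F, w, b):
    """Real-valued output: rank images by concept relevance (IR perspective)."""
    return F @ w + b

# toy separable data in GSF space
F = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(F, y)
```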
Intuition

[Figure: a small image similarity graph whose nodes are annotated with the corresponding components of the 2nd eigenvector of the graph Laplacian. Images labeled "coast, person" receive positive values (0.24 to 0.31), while images labeled only "coast" receive negative values (down to -0.47): the eigenvector separates the two groups.]
mklab.iti.gr #15
Incremental learning setting (1)

• The transductive learning setting is often impractical: for each new set of unlabeled items one must
  1. recompute the image similarity matrix
  2. recompute the graph structure features (LEs)
  3. use the SVM to obtain prediction scores
• Step 2 is computationally expensive.
• We devise two incremental schemes:
  – Linear Projection (LP): approximate the features of a new item from the graph structure features of N_k, the set of its k most similar images
  – Submanifold Analysis (SA) [cf. next slide]
mklab.iti.gr #16
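The LP idea can be sketched as a similarity-weighted average of the neighbors' graph structure features; the exact weighting used in the paper may differ:

```python
import numpy as np

def linear_projection(sims, F_nbrs):
    """Approximate graph structure features of a new item from its k most
    similar images. sims: (k,) similarities to the neighbors; F_nbrs: k x d
    matrix of the neighbors' features. Assumed weighting: normalized
    similarity-weighted average (the paper's formula may differ)."""
    w = sims / sims.sum()
    return w @ F_nbrs
```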
Incremental learning setting (2)
• Submanifold Analysis [Jia et al., 2009]
  – Construct a (k+1)x(k+1) similarity matrix WS between the new item and its k most similar images from the annotated set
  – Construct the corresponding sub-degree (diagonal) and sub-Laplacian matrices
  – Compute the eigenvalues and the d eigenvectors corresponding to non-zero eigenvalues [the computation is lightweight since k << n]
  – Minimize the reconstruction error between the sub-eigenvectors and the existing graph structure features
  – Reconstruct the approximate eigenvectors for the new item
mklab.iti.gr #17
Fusion of multiple features
• Feature fusion (F-FEAT)
• Similarity graph fusion (F-SIM)
• Graph structure feature fusion (F-GSF)
• Result fusion (F-RES)
mklab.iti.gr #18
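Result fusion (F-RES) can be sketched as a late-fusion average of per-feature prediction scores; the averaging rule and the optional weights are illustrative assumptions, not necessarily the paper's exact combiner:

```python
import numpy as np

def result_fusion(score_lists, weights=None):
    """Late fusion of concept scores produced from different features.
    score_lists: m x n (m features, n images); returns fused scores per image.
    Assumed rule: (weighted) average; the paper may use a different combiner."""
    S = np.asarray(score_lists, dtype=float)
    w = np.ones(len(S)) / len(S) if weights is None else np.asarray(weights)
    return w @ S
```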
Synthetic data - experiments
• Use of four 2D distributions with a limited number of samples (thousands) to test many settings
• Performance aspects:
  – Parameters of the approach: number of features (CD), graph construction technique (kNN, εNN) and their parameters (k, ε)
  – Learning setting (training size, data noise, number of classes)
  – Inductive learning (LP vs SA)
  – Fusion method

Datasets: TWO MOONS, LINES, CIRCLES, GAUSSIANS
mklab.iti.gr #20
Role of the number of GSF (CD)

[Plots over the TWO MOONS, LINES, CIRCLES and GAUSSIANS datasets at several noise levels.]
Higher CD leads to better mAP; higher noise calls for higher CD.
mklab.iti.gr #21
Role of graph construction technique

[Plots for kNN and εNN graphs.]
kNN is better and less sensitive than εNN.
mklab.iti.gr #22
Role of noise (σ)

[Plots over the TWO MOONS, LINES, CIRCLES and GAUSSIANS datasets, against competing methods.]
In most cases GSF is equal to or better than the expensive SVM-RBF.
mklab.iti.gr #23
Role of training samples (α%)

[Plots over the TWO MOONS, LINES, CIRCLES and GAUSSIANS datasets.]
In most cases few training samples (2-5%) are sufficient for high accuracy.
mklab.iti.gr #24
Number of classes (K)
[Plots over the LINES and CIRCLES datasets.]
Accuracy remains sufficiently good as the number of classes grows (much better than linear SVM, a bit worse than SVM-RBF).
mklab.iti.gr #25
Scalability wrt. number of features
[Plot contrasting a constant cost wrt. dimensionality with a linearly increasing cost wrt. dimensionality.]
mklab.iti.gr #26
Comparison between fusion methods
[Plots over the LINES and CIRCLES datasets.]
Even when one feature performs badly, result fusion and GSF fusion still do better than the best single feature.
mklab.iti.gr #27
Incremental schemes

[Plots over the TWO MOONS, LINES, CIRCLES and GAUSSIANS datasets.]
SA is much better and less sensitive than LP.
mklab.iti.gr #28
Experimental setting
• MIR-Flickr
  – 25,000 images + tags
  – 38 concepts (24, plus 14 with two interpretations [strict/relaxed])
• Benchmark methods
  – Semantic Spaces (SESPA) [Hare & Lewis, 2010]
  – Multiple Kernel Learning (MKL) [Guillaumin et al., 2010]
mklab.iti.gr #30
GSF vs SESPA
• GSF-F1, F2, F3: single-feature GSF
• GSF-D1, D2: result fusion using LIBLINEAR (1) and RBF (2)
• GSF-C: graph structure feature fusion
mklab.iti.gr #31
GSF vs MKL
[Plots comparing GSF and MKL on VISUAL and TAG features; the comparison is possible thanks to GSF's scalable behavior wrt. the number of features.]
GSF better in: baby, bird, car, dog, river, sea.
MKL better in: baby, bird, river, sea.
mklab.iti.gr #32
Example results
mklab.iti.gr #33
Evaluation: adding unlabeled samples (1)
GIST
~6% relative increase in mAP
mklab.iti.gr #34
Evaluation: adding unlabeled samples (2)
DenseSiftV3H1
~12% relative increase in mAP
mklab.iti.gr #35
Evaluation: adding unlabeled samples (3)
TagRaw50
~4% relative increase in mAP
mklab.iti.gr #36
Conclusions
• A concept detection approach based on the structure of image similarity graphs
  – Transductive learning setting
  – Two variants for online learning
• Thorough experimental analysis
  – Behavior under a variety of settings/parameters
  – Equivalent or better behavior compared to state-of-the-art approaches
• Fast: SA with k=5 takes 38.4 ms per image (not incl. feature extraction)
• Future work: further analysis of computational characteristics and application to larger-scale datasets (NUS-WIDE, ImageNet)
mklab.iti.gr #39
References (1)
• Graph-based semi-supervised learning
  Zhu, X.: Semi-supervised learning with graphs. PhD Thesis, Carnegie Mellon University, 0-542-19059-1 (2005)
  Zhou, D., Bousquet, O., Navin Lal, T., Weston, J., Schoelkopf, B.: Learning with Local and Global Consistency. Advances in NIPS 16, MIT Press (2004), 321-328
• Related approaches
  Wang, M., Hua, X.-S., Tang, J., Hong, R.: Beyond distance measurement: constructing neighborhood similarity for video annotation. TMM 11 (3) (2009), 465-476
  Tang, J. et al.: Inferring semantic concepts from community contributed images and noisy tags. ACM Multimedia (2009), 223-232
  Chen, X. et al.: Efficient large scale image annotation by probabilistic collaborative multi-label propagation. ACM Multimedia (2010), 35-44
  Tang, L., Liu, H.: Leveraging social media networks for classification. Data Mining and Knowledge Discovery 23 (3) (2011), 447-478
mklab.iti.gr #40
References (2)
• Relational classification
  Macskassy, S.A., Provost, F.: Classification in Networked Data: A Toolkit and a Univariate Case Study. Journal of Machine Learning Research 8 (2007), 935-983
• Laplacian Eigenmaps
  Belkin, M., Niyogi, P.: Laplacian Eigenmaps for dimensionality reduction and data representation. Neural Computation 15 (6), MIT Press (2003), 1373-1396
  Jia, P., Yin, J., Huang, X., Hu, D.: Incremental Laplacian eigenmaps by preserving adjacent information between data points. PR Letters 30 (16) (2009), 1457-1463
mklab.iti.gr #41
References (3)
• Tools
  Leyffer, S., Mahajan, A.: Nonlinear Constrained Optimization: Methods and Software. Preprint ANL/MCS-P1729-0310 (2010)
  Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: LIBLINEAR: A Library for Large Linear Classification. Journal of ML Research 9 (2008), 1871-1874
  Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2 (3) (2011), 27:1-27:27
• Dataset
  Huiskes, M.J., Lew, M.S.: The MIR Flickr Retrieval Evaluation. Proceedings of ACM Intern. Conf. on Multimedia Information Retrieval (2008)
• Competing methods
  Hare, J.S., Lewis, P.H.: Automatically annotating the MIR Flickr dataset. ACM ICMR (2010), 547-556
  Guillaumin, M., Verbeek, J., Schmid, C.: Multimodal semi-supervised learning for image classification. Proceedings of IEEE CVPR Conference (2010), 902-909
mklab.iti.gr #42