semi-supervised concept detection by learning the structure of similarity graphs
DESCRIPTION
Our presentation at MMM 2013, Huangshan, China.

TRANSCRIPT
19th International Conference on Multimedia Modeling, Huangshan, China, Jan 7-9, 2013
Semi-supervised concept detection by learning the structure of similarity graphs
Symeon Papadopoulos1, Christos Sagonas1, Ioannis Kompatsiaris1, Athena Vakali2
1 Centre for Research and Technology Hellas, Information Technologies Institute
2 Aristotle University of Thessaloniki, Informatics Department
mklab.iti.gr #2
IMAGE TAGS → CONCEPTS
• chocolate, cake, chocolate ganache, buttercream, shamsd → food
• N/A → female, indoor, people, portrait
• nature, landscape, water, reflection, mirror, flickrelite, abigfave → clouds, lake, sky, water
SOURCE: MIR-Flickr
mklab.iti.gr #3
Overview
• Problem formulation
• Related work
• Graph Structure Features Approach
• Evaluation
  – Synthetic datasets
  – MIR-Flickr
• Conclusions
mklab.iti.gr #4
Concept detection
ML perspective
• Given an image, produce a set of relevant concepts.

IR perspective
• Given an image collection and a concept of interest, rank all images in order of relevance.
• Transductive learning setting

mklab.iti.gr #6
Semi-supervised learning

Given:
• C: the set of target concepts
• X_A = {(x_i, y_i)}: the annotated set, where x_i is the D-dimensional feature vector of image i and y_i is the concept indicator vector (labels) of image i
• X_U: the set of unknown items

Goal: predict the concepts associated with the items of X_U by processing X_A and X_U together.
mklab.iti.gr #7
Related work
• Neighborhood similarity (Wang et al., 2009)
  – Uses image similarity graphs in combination with graph-based SSL (Zhu, 2005; Zhou et al., 2004)
  – Not incremental
• Sparse similarity graph by convex optimization (Tang et al., 2009)
  – Applicable to online settings
  – Computationally intensive training step
• Hashing-based graph construction (Chen et al., 2010)
  – Uses KL-divergence multi-label propagation, but relies on an iterative computational scheme
  – Difficult to apply in incremental settings
• Social dimensions (Tang & Liu, 2011)
  – Uses Laplacian Eigenmaps (LEs) for networked classification problems (i.e. when the network between nodes is explicit)
  – Not incremental, not applied to multimedia
mklab.iti.gr #9
Graph Structure Features (GSF)
mklab.iti.gr #11
Graph construction

Given the image similarity graph G = (V, E), where V is the set of nodes (images) and n = |V| its cardinality, construction options:
• full weighted graph
• kNN graph: connect each image to its k most similar images
• εNN graph: connect pairs of images whose similarity exceeds a threshold ε
mklab.iti.gr #12
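The kNN option above can be sketched as follows; a minimal NumPy illustration assuming a Gaussian-kernel similarity (the actual similarity measure and its parameters in the paper may differ):

```python
import numpy as np

def knn_similarity_graph(X, k=5, sigma=1.0):
    """Symmetric kNN adjacency matrix from an n x D feature matrix X.
    Assumption: Gaussian-kernel similarity; the paper's measure may differ."""
    # pairwise squared Euclidean distances, then similarity in (0, 1]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)                 # no self-loops
    W = np.zeros_like(S)
    for i in range(len(S)):
        nbrs = np.argsort(S[i])[-k:]         # indices of the k most similar images
        W[i, nbrs] = S[i, nbrs]
    return np.maximum(W, W.T)                # symmetrize: keep edge if either side picked it

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))
W = knn_similarity_graph(X, k=3)
```

An εNN graph would instead keep all entries of S above a threshold ε.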
Eigenvector/value computation

Normalized graph Laplacian: L = I - D^{-1/2} W D^{-1/2}, where D is the (diagonal) degree matrix and W the adjacency matrix.
(Typical form of the graph Laplacian: L = D - W.)
The graph structure features* are the d eigenvectors corresponding to the d smallest non-zero eigenvalues, obtained by solving L v = λ v.
*aka Laplacian Eigenmaps
mklab.iti.gr #13
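A sketch of the eigenvector computation via a dense eigendecomposition of the normalized Laplacian (the paper may use a sparse/iterative solver; the tolerance for discarding "zero" eigenvalues is an assumption, and the graph is assumed to have no isolated nodes):

```python
import numpy as np

def graph_structure_features(W, d=2):
    """Graph structure features (Laplacian Eigenmaps) from adjacency W:
    eigenvectors of L = I - D^{-1/2} W D^{-1/2} for the d smallest
    non-zero eigenvalues. Assumes no isolated nodes (positive degrees)."""
    deg = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    keep = vals > 1e-9                   # drop near-zero eigenvalues (one per component)
    return vecs[:, keep][:, :d]          # n x d embedding

# toy example: path graph on 6 nodes
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0
F = graph_structure_features(W, d=2)
```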
Graph structure feature learning
• Each media item is represented by a vector of graph structure features.
• At this point, any supervised learning method could be used. [Note that the whole framework is still SSL, since unlabeled items are used during graph construction.]
• SVM is selected:
  – good performance in several problems
  – good implementations available (LibSVM, LIBLINEAR)
  – real-valued output (IR perspective: rank images by concept)
mklab.iti.gr #14
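To illustrate the learning step without external dependencies, here is a toy hinge-loss linear SVM trained on GSF vectors; it is a minimal stand-in for LibSVM/LIBLINEAR, not the implementation used in the work:

```python
import numpy as np

def train_linear_svm(F, y, lr=0.1, lam=0.01, epochs=200):
    """Toy linear SVM via subgradient descent on the regularized hinge loss.
    F: n x d matrix of graph structure features; y: labels in {-1, +1}."""
    w = np.zeros(F.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (F @ w + b)
        mask = margins < 1                # items violating the margin
        w -= lr * (lam * w - (y[mask, None] * F[mask]).sum(0) / len(y))
        b -= lr * (-y[mask].sum() / len(y))
    return w, b

def score(F, w, b):
    """Real-valued output: rank images by concept relevance (IR perspective)."""
    return F @ w + b

# toy separable data in GSF space
F = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(F, y)
```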
Intuition

[Figure: a small image similarity graph whose nodes are annotated with the corresponding components of the 2nd eigenvector of the graph Laplacian. Images labeled "coast, person" receive positive values (0.24 to 0.31), while images labeled only "coast" receive negative values (down to -0.47): the eigenvector separates the two groups.]
mklab.iti.gr #15
Incremental learning setting (1)

• The transductive learning setting is often impractical: for each new set of unlabeled items one must
  1. recompute the image similarity matrix
  2. recompute the graph structure features (LEs)
  3. use the SVM to obtain prediction scores
• Step 2 is computationally expensive.
• We devise two incremental schemes:
  – Linear Projection (LP): approximate the features of a new item from the graph structure features of N_k, the set of its k most similar images
  – Submanifold Analysis (SA) [cf. next slide]
mklab.iti.gr #16
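The LP idea can be sketched as a similarity-weighted average of the neighbors' graph structure features; the exact weighting used in the paper may differ:

```python
import numpy as np

def linear_projection(sims, F_nbrs):
    """Approximate graph structure features of a new item from its k most
    similar images. sims: (k,) similarities to the neighbors; F_nbrs: k x d
    matrix of the neighbors' features. Assumed weighting: normalized
    similarity-weighted average (the paper's formula may differ)."""
    w = sims / sims.sum()
    return w @ F_nbrs
```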
Incremental learning setting (2)
• Submanifold Analysis [Jia et al., 2009]
  – Construct a (k+1)x(k+1) similarity matrix WS between the new item and its k most similar images from the annotated set
  – Construct the corresponding sub-degree (diagonal) and sub-Laplacian matrices
  – Compute the eigenvalues and the d eigenvectors corresponding to non-zero eigenvalues [the computation is lightweight since k << n]
  – Minimize the reconstruction error between the sub-eigenvectors and the existing graph structure features
  – Reconstruct the approximate eigenvectors for the new item
mklab.iti.gr #17
Fusion of multiple features
• Feature fusion (F-FEAT)
• Similarity graph fusion (F-SIM)
• Graph structure feature fusion (F-GSF)
• Result fusion (F-RES)
mklab.iti.gr #18
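Result fusion (F-RES) can be sketched as a late-fusion average of per-feature prediction scores; the averaging rule and the optional weights are illustrative assumptions, not necessarily the paper's exact combiner:

```python
import numpy as np

def result_fusion(score_lists, weights=None):
    """Late fusion of concept scores produced from different features.
    score_lists: m x n (m features, n images); returns fused scores per image.
    Assumed rule: (weighted) average; the paper may use a different combiner."""
    S = np.asarray(score_lists, dtype=float)
    w = np.ones(len(S)) / len(S) if weights is None else np.asarray(weights)
    return w @ S
```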
Synthetic data - experiments
• Use of four 2D distributions with a limited number of samples (thousands) to test many settings
• Performance aspects:
  – Parameters of the approach: number of features (CD), graph construction technique (kNN, εNN) and their parameters (k, ε)
  – Learning setting (training size, data noise, number of classes)
  – Inductive learning (LP vs SA)
  – Fusion method

Datasets: TWO MOONS, LINES, CIRCLES, GAUSSIANS
mklab.iti.gr #20
Role of the number of GSF (CD)

[Plots over the TWO MOONS, LINES, CIRCLES and GAUSSIANS datasets at several noise levels.]
Higher CD leads to better mAP; higher noise calls for higher CD.
mklab.iti.gr #21
Role of graph construction technique

[Plots for kNN and εNN graphs.]
kNN is better and less sensitive than εNN.
mklab.iti.gr #22
Role of noise (σ)

[Plots over the TWO MOONS, LINES, CIRCLES and GAUSSIANS datasets, against competing methods.]
In most cases GSF is equal to or better than the expensive SVM-RBF.
mklab.iti.gr #23
Role of training samples (α%)

[Plots over the TWO MOONS, LINES, CIRCLES and GAUSSIANS datasets.]
In most cases few training samples (2-5%) are sufficient for high accuracy.
mklab.iti.gr #24
Number of classes (K)
[Plots over the LINES and CIRCLES datasets.]
Accuracy remains sufficiently good as the number of classes grows (much better than linear SVM, a bit worse than SVM-RBF).
mklab.iti.gr #25
Scalability wrt. number of features
[Plot contrasting a constant cost wrt. dimensionality with a linearly increasing cost wrt. dimensionality.]
mklab.iti.gr #26
Comparison between fusion methods
[Plots over the LINES and CIRCLES datasets.]
Even when one feature performs badly, result fusion and GSF fusion still do better than the best single feature.
mklab.iti.gr #27
Incremental schemes

[Plots over the TWO MOONS, LINES, CIRCLES and GAUSSIANS datasets.]
SA is much better and less sensitive than LP.
mklab.iti.gr #28
Experimental setting
• MIR-Flickr
  – 25,000 images + tags
  – 38 concepts (24, plus 14 with two interpretations [strict/relaxed])
• Benchmark methods
  – Semantic Spaces (SESPA) [Hare & Lewis, 2010]
  – Multiple Kernel Learning (MKL) [Guillaumin et al., 2010]
mklab.iti.gr #30
GSF vs SESPA
• GSF-F1, F2, F3: single-feature GSF
• GSF-D1, D2: result fusion using LIBLINEAR (1) and RBF (2)
• GSF-C: graph structure feature fusion
mklab.iti.gr #31
GSF vs MKL
[Plots comparing GSF and MKL on VISUAL and TAG features; the comparison is possible thanks to GSF's scalable behavior wrt. the number of features.]
GSF better in: baby, bird, car, dog, river, sea.
MKL better in: baby, bird, river, sea.
mklab.iti.gr #32
Example results
mklab.iti.gr #33
Evaluation: adding unlabeled samples (1)
GIST
~6% relative increase in mAP
mklab.iti.gr #34
Evaluation: adding unlabeled samples (2)
DenseSiftV3H1
~12% relative increase in mAP
mklab.iti.gr #35
Evaluation: adding unlabeled samples (3)
TagRaw50
~4% relative increase in mAP
mklab.iti.gr #36
Conclusions
• A concept detection approach based on the structure of image similarity graphs
  – Transductive learning setting
  – Two variants for online learning
• Thorough experimental analysis
  – Behavior under a variety of settings/parameters
  – Equivalent or better behavior compared to state-of-the-art approaches
• Fast: SA with k=5 takes 38.4 ms per image (not incl. feature extraction)
• Future work: further analysis of computational characteristics and application to larger-scale datasets (NUS-WIDE, ImageNet)
mklab.iti.gr #39
References (1)
• Graph-based semi-supervised learning
  Zhu, X.: Semi-supervised learning with graphs. PhD Thesis, Carnegie Mellon University, 0-542-19059-1 (2005)
  Zhou, D., Bousquet, O., Navin Lal, T., Weston, J., Schoelkopf, B.: Learning with Local and Global Consistency. Advances in NIPS 16, MIT Press (2004), 321-328
• Related approaches
  Wang, M., Hua, X.-S., Tang, J., Hong, R.: Beyond distance measurement: constructing neighborhood similarity for video annotation. TMM 11 (3) (2009), 465-476
  Tang, J. et al.: Inferring semantic concepts from community contributed images and noisy tags. ACM Multimedia (2009), 223-232
  Chen, X. et al.: Efficient large scale image annotation by probabilistic collaborative multi-label propagation. ACM Multimedia (2010), 35-44
  Tang, L., Liu, H.: Leveraging social media networks for classification. Data Mining and Knowledge Discovery 23 (3) (2011), 447-478
mklab.iti.gr #40
References (2)
• Relational classification
  Macskassy, S.A., Provost, F.: Classification in Networked Data: A Toolkit and a Univariate Case Study. Journal of Machine Learning Research 8 (2007), 935-983
• Laplacian Eigenmaps
  Belkin, M., Niyogi, P.: Laplacian Eigenmaps for dimensionality reduction and data representation. Neural Computation 15 (6), MIT Press (2003), 1373-1396
  Jia, P., Yin, J., Huang, X., Hu, D.: Incremental Laplacian eigenmaps by preserving adjacent information between data points. PR Letters 30 (16) (2009), 1457-1463
mklab.iti.gr #41
References (3)
• Tools
  Leyffer, S., Mahajan, A.: Nonlinear Constrained Optimization: Methods and Software. Preprint ANL/MCS-P1729-0310 (2010)
  Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: LIBLINEAR: A Library for Large Linear Classification. Journal of ML Research 9 (2008), 1871-1874
  Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2 (3) (2011), 27:1-27:27
• Dataset
  Huiskes, M.J., Lew, M.S.: The MIR Flickr Retrieval Evaluation. Proceedings of ACM Intern. Conf. on Multimedia Information Retrieval (2008)
• Competing methods
  Hare, J.S., Lewis, P.H.: Automatically annotating the MIR Flickr dataset. ACM ICMR (2010), 547-556
  Guillaumin, M., Verbeek, J., Schmid, C.: Multimodal semi-supervised learning for image classification. Proceedings of IEEE CVPR Conference (2010), 902-909
mklab.iti.gr #42