simultaneous image classification and annotation

19
Simultaneous Image Classification and Annotation Chong Wang, David Blei, Li Fei-Fei Computer Science Department Princeton University Published in CVPR 2009 Presented by Eric Wang 7-3-09 (Final Version

Upload: gudrun

Post on 25-Feb-2016

55 views

Category:

Documents


2 download

DESCRIPTION

(Final Version). Simultaneous Image Classification and Annotation. Chong Wang, David Blei, Li Fei-Fei Computer Science Department Princeton University Published in CVPR 2009 Presented by Eric Wang 7-3-09. Outline. Introduction Review of sLDA and Corr -LDA Model description - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Simultaneous Image Classification and Annotation

Simultaneous Image Classification and Annotation

Chong Wang, David Blei, Li Fei-FeiComputer Science Department

Princeton University

Published in CVPR 2009

Presented by Eric Wang7-3-09

(Final Version)

Page 2: Simultaneous Image Classification and Annotation

Outline• Introduction

• Review of sLDA and Corr-LDA

• Model description

• Model Inference and Parameter Estimation

• Empirical Results

• Conclusion

Page 3: Simultaneous Image Classification and Annotation

Introduction• Images Classification refers to assigning a class label to each

image which globally describes the image.

• Image Annotation refers to assigning words which describe individual regions of the image.

• The images considered in this paper are both classified with a class label and annotated with free text.

• This paper will combine the basic framework of Corr-LDA, and a highly modified version of Supervised LDA (sLDA) to yield a model which simultaneously classifies images and annotates the individual regions.

Page 4: Simultaneous Image Classification and Annotation

Review of sLDA• For each document

• In this model, the response variable y is a continuous random variable.

• are treated as unknown constants to be estimated, rather than as random variables.Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.

Page 5: Simultaneous Image Classification and Annotation

Review of sLDA• An application of sLDA considered by Blei et. al was

regressing a corpus of textual movie reviews to number of stars given.

Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.

Page 6: Simultaneous Image Classification and Annotation

Review of Corr-LDA• For each document

• Corr-LDA is a simple extension of LDA for images to annotate image regions.

Source: D. M. Blei and M. I. Jordan. Modeling annotated data. In SIGIR, 2003.

Page 7: Simultaneous Image Classification and Annotation

Annotated sLDA Model

• This step is identical to LDA, the topic proportions are drawn once per document.

• In this paper, is not optimized.

Page 8: Simultaneous Image Classification and Annotation

Annotated sLDA Model

• A region is characterized by one of 240 codewords (quantized from 128 dimensional SIFT features).

• Regions are found by segmenting images using the N-cuts algorithm.

• parameterizes a particular multinomial distribution (topic) over the quantized codewords.

Page 9: Simultaneous Image Classification and Annotation

Annotated sLDA Model

• The class label c is completely determined by the topic indicators z_{1:N} using a modified sLDA framework.

• The total number of classes is known a priori and the class indicators are treated separately from the annotations. This is a simpler approach than the one taken by L.J. Li, R. Socher and L. Fei-Fei in that there is no “switch” variable which determines whether a word is an annotation or label.

• The softmax function is well studied and is also known as “multinomial logistic regression”

Page 10: Simultaneous Image Classification and Annotation

Annotated sLDA Model

• The annotations are assigned to specific regions in the same manner as in Corr-LDA.

• This will, for example, encourage words such as “blue” and “white” to be associated with regions (and thus, codewords) which capture sky.

• Though not explicitly shown in the graphical model, and have symmetric Dirichlet priors.

Page 11: Simultaneous Image Classification and Annotation

Inference of Latent Variables

These updates are identical to those used in Corr-LDA.

• Let parameterize a multinomial over the K topics

• Let parameterize a Dirichlet over topic distributions.

• Let parameterize a multinomial over image regions

• These updates are local to each document (thus the omission of d).

n

m

Page 12: Simultaneous Image Classification and Annotation

Inference of Latent Variables

• This equation updates the posterior distribution over topics.

• Note that this update depends on both class label c and the annotation information .mw

Page 13: Simultaneous Image Classification and Annotation

Inference of Latent Variables

Parameter Estimation

has no closed form solution and is optimized via conjugate gradient

Updates of codebook word f in codebook topic i (proportional to a constant).

Page 14: Simultaneous Image Classification and Annotation

Inference of Latent Variables

Parameter Estimation

has no closed form solution and is optimized via conjugate gradient

Updates annotation word w in annotation topic i (proportional to a constant).

Page 15: Simultaneous Image Classification and Annotation

Empirical Results• LableMe dataset

– 8 classes: “highway,” “inside city,” “tall building,” “street,” “forest,” “coast,” “mountain,” and “open country.”

– 200 256x256 training images per class.• UIUC dataset

– 8 types of sports: “badminton,” “bocce,” “croquet,” “polo,” “rockclimbing,” “rowing,” “sailing” and “snowboarding.”

– 1792 256x256 training images.• 240 codeword dictionary.• Annotations which appeared less than 3 times

were removed.

Page 16: Simultaneous Image Classification and Annotation

Empirical Results: Classification

• The black line represents of the performance of Bosch et. al 2006, which employs a non-annotated LDA on the image regions and a KNN to classify the images.

• The blue line is the performance of Fei-Fei and Perona 2005, which uses unannotated labeled images

• The models presented in this paper are much more resistant to overfitting than the models of Bosch et. al and Fei-Fei and Perona .

Page 17: Simultaneous Image Classification and Annotation

Emperical Results: Classification

• Confusion matrices comparing the performance of multi-class sLDA with annotations and multi-class sLDA using 100 topic models

• Annotations seem to improve performance slightly, although, as the last slide shows, the main benefit is more consistent performance as a function of the number of topics.

Page 18: Simultaneous Image Classification and Annotation

Empirical Results: Annotation• The F-Measure is used as a score.• Results are given over all numbers of topics considered

above.

• LabelMe: • 38.2% (corr-LDA) • 38.7% (multi-class sLDA with annotations)

• UIUC-Sport:• 34.7% (corr-LDA) • 35.0% (multiclass sLDA with annotations).

Page 19: Simultaneous Image Classification and Annotation

Conclusion• Combining image annotation with classification provides state

of the art image classification performance.

• However, the addition of the classification framework provides only a small improvement to the annotation performance.

• The authors’ primary contribution is showing that image classification and annotation are related and can be conducted simultaneously in the same framework.

• Inference was done in a Variational EM framework.