Simultaneous Image Classification and Annotation

Chong Wang, David Blei, Li Fei-Fei

Computer Science Department

Princeton University

Published in CVPR 2009

Presented by Eric Wang

7-3-09

(Final Version)

Outline

• Introduction

• Review of sLDA and Corr-LDA

• Model description

• Model Inference and Parameter Estimation

• Empirical Results

• Conclusion

Introduction

• Image classification refers to assigning each image a class label that describes the image globally.

• Image annotation refers to assigning words that describe individual regions of the image.

• The images considered in this paper are both classified with a class label and annotated with free text.

• This paper combines the basic framework of Corr-LDA with a highly modified version of supervised LDA (sLDA) to yield a model that simultaneously classifies images and annotates their individual regions.

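Review of sLDA

• For each document: draw topic proportions θ ~ Dir(α); for each word n, draw a topic assignment z_n ~ Mult(θ) and a word w_n ~ Mult(β_{z_n}); then draw the response y ~ N(η^T z̄, σ²), where z̄ is the vector of empirical topic frequencies in the document.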

• In this model, the response variable y is a continuous random variable.

• α, β_{1:K}, η, and σ² are treated as unknown constants to be estimated, rather than as random variables.

Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.
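The sLDA generative process can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard process from Blei and McAuliffe (θ ~ Dir(α), z_n ~ Mult(θ), w_n ~ Mult(β_{z_n}), y ~ N(η^T z̄, σ²)); the function name and the toy parameter values are hypothetical.

```python
import random

def sample_slda_document(alpha, beta, eta, sigma, n_words, rng):
    """Draw one (words, topic assignments, response) triple from sLDA."""
    k = len(alpha)
    # theta ~ Dirichlet(alpha), obtained by normalizing Gamma draws.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    theta = [x / total for x in g]
    # z_n ~ Mult(theta), then w_n ~ Mult(beta[z_n]).
    z = rng.choices(range(k), weights=theta, k=n_words)
    words = [rng.choices(range(len(beta[t])), weights=beta[t])[0] for t in z]
    # z_bar holds the empirical topic frequencies of the document.
    z_bar = [z.count(t) / n_words for t in range(k)]
    # y ~ Normal(eta . z_bar, sigma^2): a continuous response.
    mean = sum(e * zb for e, zb in zip(eta, z_bar))
    y = rng.gauss(mean, sigma)
    return words, z, y

# Toy example: 2 topics over a 3-word vocabulary (illustrative values only).
toy_beta = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]
words, z, y = sample_slda_document([1.0, 1.0], toy_beta, [-1.0, 1.0], 0.1, 20,
                                   random.Random(0))
```

Because the response depends on the document only through z̄, a document dominated by the second topic tends to receive a response near +1 in this toy setup.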

Review of sLDA

• An application of sLDA considered by Blei and McAuliffe was predicting the number of stars given to a movie from the text of its review.

Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.

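Review of Corr-LDA

• For each document: draw topic proportions θ ~ Dir(α); for each image region n, draw a topic assignment z_n ~ Mult(θ) and a region description r_n conditioned on z_n; for each annotation word m, draw a region index y_m ~ Unif(1, ..., N) and a word w_m ~ Mult(β_{z_{y_m}}).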

• Corr-LDA is a simple extension of LDA to annotated images: each annotation word is generated from the topic of one image region.

Source: D. M. Blei and M. I. Jordan. Modeling annotated data. In SIGIR, 2003.

Annotated sLDA Model

• This step is identical to LDA: the topic proportions are drawn once per document.

• In this paper, the Dirichlet parameter α is not optimized.


• A region is characterized by one of 240 codewords (quantized from 128-dimensional SIFT features).

• Regions are found by segmenting images using the N-cuts algorithm.

• Each π_k parameterizes a particular multinomial distribution (topic) over the quantized codewords.
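As a rough illustration of how a region descriptor becomes a codeword index, the sketch below assigns a descriptor to its nearest codebook entry by squared Euclidean distance. The nearest-neighbor rule, the function name, and the tiny 2-dimensional codebook are assumptions for illustration; the slides only state that 128-dimensional SIFT features are quantized into 240 codewords.

```python
def quantize(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)),
               key=lambda i: sq_dist(descriptor, codebook[i]))

# Toy codebook with 3 entries in 2 dimensions (the real one would be 240 x 128).
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
region_codeword = quantize([0.9, 1.2], toy_codebook)
```

After this step each region is a single discrete symbol, which is what lets a multinomial topic model be applied to image regions.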


• The class label c depends only on the topic indicators z_{1:N}: it is drawn from a softmax over their empirical frequencies z̄, using a modified sLDA framework.

• The total number of classes is known a priori and the class indicators are treated separately from the annotations. This is a simpler approach than the one taken by L.J. Li, R. Socher and L. Fei-Fei in that there is no “switch” variable which determines whether a word is an annotation or label.

• Classification with the softmax function is well studied and is also known as multinomial logistic regression.
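The softmax class distribution can be written out concretely. The sketch below assumes the form p(c | z_{1:N}) ∝ exp(η_c^T z̄), where z̄ is the vector of empirical topic frequencies and η_c is a coefficient vector for class c; the function name and the toy coefficients are hypothetical.

```python
import math

def class_probabilities(z, eta, n_topics):
    """Softmax over classes of the scores eta[c] . z_bar."""
    n = len(z)
    # Empirical topic frequencies of the image's regions.
    z_bar = [z.count(t) / n for t in range(n_topics)]
    scores = [sum(e * zb for e, zb in zip(eta_c, z_bar)) for eta_c in eta]
    # Subtract the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three regions, two topics, two classes (illustrative values only).
probs = class_probabilities([0, 0, 1], [[1.0, 0.0], [0.0, 1.0]], 2)
```

Since two of the three regions use topic 0 and class 0 weights topic 0, class 0 receives the larger probability here.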


• The annotations are assigned to specific regions in the same manner as in Corr-LDA.

• This will, for example, encourage words such as “blue” and “white” to be associated with regions (and thus, codewords) which capture sky.

• Though not explicitly shown in the graphical model, π and β have symmetric Dirichlet priors.
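Putting the pieces of the model description together, the full generative process for one image can be sketched as below. The sketch assumes the process described in the CVPR 2009 paper: θ ~ Dir(α); for each region, z_n ~ Mult(θ) and codeword r_n ~ Mult(π_{z_n}); the class c from a softmax over z̄; and for each annotation, a region index y_m drawn uniformly and a word w_m ~ Mult(β_{z_{y_m}}). The function name and all toy values are hypothetical.

```python
import math
import random

def sample_annotated_image(alpha, pi, beta, eta, n_regions, n_words, rng):
    """Draw region codewords, a class label, and annotations for one image."""
    k = len(alpha)
    # theta ~ Dirichlet(alpha), drawn once per image.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    theta = [x / total for x in g]
    # Each region gets a topic and then a codeword from that topic.
    z = rng.choices(range(k), weights=theta, k=n_regions)
    regions = [rng.choices(range(len(pi[t])), weights=pi[t])[0] for t in z]
    # Class label: softmax over eta[c] . z_bar.
    z_bar = [z.count(t) / n_regions for t in range(k)]
    scores = [sum(e * zb for e, zb in zip(eta_c, z_bar)) for eta_c in eta]
    top = max(scores)
    label = rng.choices(range(len(eta)),
                        weights=[math.exp(s - top) for s in scores])[0]
    # Each annotation picks a region uniformly and uses that region's topic.
    y = [rng.randrange(n_regions) for _ in range(n_words)]
    words = [rng.choices(range(len(beta[z[i]])), weights=beta[z[i]])[0]
             for i in y]
    return z, regions, label, y, words

toy_pi = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]   # topics over 3 codewords
toy_beta = [[1.0, 0.0], [0.0, 1.0]]           # topics over 2 annotation words
toy_eta = [[2.0, 0.0], [0.0, 2.0]]            # one coefficient vector per class
sample = sample_annotated_image([1.0, 1.0], toy_pi, toy_beta, toy_eta, 6, 4,
                                random.Random(1))
```

The class label and the annotations share the same region topics z, which is how the two tasks inform each other in this model.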

Inference of Latent Variables

These updates are identical to those used in Corr-LDA.

• Let φ_n parameterize a multinomial over the K topics.

• Let γ parameterize a Dirichlet over topic distributions.

• Let λ_m parameterize a multinomial over image regions.

• These updates are local to each document (thus the omission of d).
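For reference, the two Corr-LDA-style updates can be sketched for a single document. The sketch assumes the usual forms γ_i = α_i + Σ_n φ_{ni} for the Dirichlet parameter and λ_{mn} ∝ exp(Σ_i φ_{ni} log β_{i,w_m}) for the distribution over regions of annotation word w_m, where φ_n is the variational distribution over topics for region n. The function names are hypothetical, and all entries of β are assumed to be strictly positive.

```python
import math

def update_gamma(alpha, phi):
    """gamma_i = alpha_i + sum over regions n of phi[n][i]."""
    return [a + sum(row[i] for row in phi) for i, a in enumerate(alpha)]

def update_lambda(phi, beta, word):
    """Distribution over regions for one annotation word.

    lambda_n is proportional to exp(sum_i phi[n][i] * log beta[i][word]).
    """
    logs = [sum(p * math.log(beta[i][word]) for i, p in enumerate(row))
            for row in phi]
    top = max(logs)  # stabilize the exponentials
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Two regions, two topics; region 0 is fully assigned to topic 0.
toy_phi = [[1.0, 0.0], [0.0, 1.0]]
toy_beta = [[0.8, 0.2], [0.3, 0.7]]
gamma = update_gamma([0.5, 0.5], toy_phi)
lam = update_lambda(toy_phi, toy_beta, word=0)
```

In this toy case word 0 is more probable under topic 0, so λ puts more mass on region 0.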



• This equation updates φ_n, the variational posterior distribution over topics for region n.

• Note that this update depends on both the class label c and the annotation information w_m.


Parameter Estimation

η has no closed-form solution and is optimized via conjugate gradient.

π_{if}: update for codebook word f in codebook topic i (up to a normalizing constant).


Parameter Estimation

η has no closed-form solution and is optimized via conjugate gradient.

β_{iw}: update for annotation word w in annotation topic i (up to a normalizing constant).
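The topic updates amount to accumulating expected counts and normalizing. The sketch below assumes the codeword-topic update π_{if} ∝ Σ_d Σ_n 1[r_{dn} = f] φ_{dni}, where φ_{dn} is the variational topic distribution of region n in image d, and it ignores any smoothing from the Dirichlet priors. The annotation-topic update would follow the same pattern, with each word's weight given by Σ_n φ_{ni} λ_{mn}. The function name and data layout are hypothetical.

```python
def update_codeword_topics(docs, n_topics, n_codewords):
    """Estimate pi[i][f] from expected counts.

    docs is a list of (regions, phi) pairs, where regions[n] is the codeword
    of region n and phi[n][i] is its variational topic probability.
    """
    counts = [[0.0] * n_codewords for _ in range(n_topics)]
    for regions, phi in docs:
        for codeword, row in zip(regions, phi):
            for i in range(n_topics):
                counts[i][codeword] += row[i]
    pi = []
    for row in counts:
        total = sum(row)
        # Fall back to uniform if a topic received no mass (an assumption).
        pi.append([c / total for c in row] if total > 0
                  else [1.0 / n_codewords] * n_codewords)
    return pi

# One image with two regions, each fully assigned to a different topic.
toy_docs = [([0, 1], [[1.0, 0.0], [0.0, 1.0]])]
toy_pi = update_codeword_topics(toy_docs, n_topics=2, n_codewords=2)
```

The class coefficients η have no such closed form, which is why the slides resort to conjugate gradient for them.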

Empirical Results

• LabelMe dataset

– 8 classes: “highway,” “inside city,” “tall building,” “street,” “forest,” “coast,” “mountain,” and “open country.”

– 200 256x256 training images per class.

• UIUC-Sport dataset

– 8 types of sports: “badminton,” “bocce,” “croquet,” “polo,” “rockclimbing,” “rowing,” “sailing” and “snowboarding.”

– 1792 256x256 training images.

• 240 codeword dictionary.

• Annotations which appeared fewer than 3 times were removed.
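The annotation filtering step is simple to express in code. The sketch below assumes the threshold applies to the total number of occurrences across the whole collection (the slides do not say whether it counts occurrences or images); the function name and the toy data are hypothetical.

```python
from collections import Counter

def drop_rare_annotations(annotations, min_count=3):
    """Remove annotation words that occur fewer than min_count times overall."""
    counts = Counter(word for doc in annotations for word in doc)
    return [[w for w in doc if counts[w] >= min_count] for doc in annotations]

toy_annotations = [["sky", "tree"], ["sky", "car"], ["sky", "tree", "tree"]]
filtered = drop_rare_annotations(toy_annotations)
```

Here "car" occurs once and is dropped, while "sky" and "tree" each occur three times and are kept.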

Empirical Results: Classification

• The black line represents the performance of Bosch et al. 2006, which employs a non-annotated LDA on the image regions and a KNN to classify the images.

• The blue line is the performance of Fei-Fei and Perona 2005, which uses unannotated labeled images.

• The models presented in this paper are much more resistant to overfitting than the models of Bosch et al. and Fei-Fei and Perona.

Empirical Results: Classification

• Confusion matrices compare the performance of multi-class sLDA with annotations and multi-class sLDA, both using 100-topic models.

• Annotations seem to improve performance slightly, although, as the previous slide shows, the main benefit is more consistent performance as a function of the number of topics.
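For readers unfamiliar with confusion matrices, the sketch below shows how one is tallied from true and predicted class indices, along with the overall accuracy. It is a generic illustration with hypothetical function names and toy labels, not the evaluation code behind the reported results.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Entry [t][p] counts images of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def accuracy(matrix):
    """Fraction of images on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

toy_matrix = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
toy_accuracy = accuracy(toy_matrix)
```

Off-diagonal entries show which pairs of classes the model confuses, which a single accuracy number hides.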

Empirical Results: Annotation

• The F-measure is used as a score.

• Results are given over all numbers of topics considered above.

• LabelMe:

– 38.2% (Corr-LDA)

– 38.7% (multi-class sLDA with annotations)

• UIUC-Sport:

– 34.7% (Corr-LDA)

– 35.0% (multi-class sLDA with annotations)
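The F-measure is the harmonic mean of precision and recall. The sketch below computes it for one image by comparing a set of predicted annotation words against the reference annotations; how predictions are selected and how scores are averaged over images are not stated in the slides, so this per-image set comparison and the function name are assumptions.

```python
def f_measure(predicted, reference):
    """Harmonic mean of precision and recall between two word collections."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)

toy_score = f_measure(["sky", "tree"], ["sky", "sea"])
```

In the toy case one of two predicted words is correct and one of two reference words is recovered, so precision and recall are both 0.5.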

Conclusion

• Combining image annotation with classification provides state-of-the-art image classification performance.

• However, the addition of the classification framework provides only a small improvement to the annotation performance.

• The authors’ primary contribution is showing that image classification and annotation are related and can be conducted simultaneously in the same framework.

• Inference was done in a Variational EM framework.
