Exploiting Ontologies for Automatic Image Annotation

M. Srikanth, J. Varner, M. Bowden, D. Moldovan

Language Computer Corporation

www.languagecomputer.com

Richardson, Texas

Contents

• Motivation
• Automatic Image Annotation Problem
• Ontologies for Defining Visual Vocabularies
• Hierarchical Models for Image Annotation
• Related Work
• Experiments & Results
• Conclusion and Future Work

Motivation: Multimedia Question Answering

• The majority of Q/A efforts focus on textual corpora and text processing
• Large amounts of information are held within multimedia sources: images, audio, video
• Extend the power of Q/A into the realm of multimedia
• Exploit the commonality and union of text and multimedia information

Multimedia Question Answering

• Some ways in which multimedia can be used in Q/A
– Multimedia (video clip/image) as the answer
– Multimedia and lexical information combined to provide enhanced understanding for answering questions

Caption: Ronaldo seals Brazil's place in the last eight with a shot through Geert de Vlieger's legs late on to eliminate Belgium

Question: What color jersey did Brazil wear in the World Cup?

Approach

• Feature extraction
– High- and low-level features
• Object recognition
– Auto annotation of images
• Object semantics extraction
– Locative/temporal/etc.
• Build knowledge representation from image/video
• Merge with audio/text knowledge representation
– Lexical information from ASR and VOCR
• Provide multimedia Q/A using multimedia ontologies

Automatic Image Annotation

• Task of automatically assigning words to an image that describe the contents of the image
• Most models exploit the correlation between images and words
• Exploit the correlation between the annotation words themselves to
1. Define visual vocabularies
2. Develop hierarchical models for automatic image annotation
• Use ontological information about annotation words to improve image annotation

Prior Work: Translation Models

• Models for translating the visual representation of a concept into its textual representation (Duygulu et al., 2002)
• Based on the Brown model for machine translation (Brown et al., 1993)
– Image features translate to annotation words
– K-means is used to cluster image features to generate blobs
• Dependencies between blobs and words are not explicitly captured
• Use ontology to drive the definition of blobs
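A minimal sketch of how blob-to-word translation probabilities could be estimated with EM, in the spirit of IBM Model 1 (Brown et al., 1993). The toy corpus, the blob names, the function name `train_translation_table`, and the uniform initialization are illustrative assumptions, not the exact setup of Duygulu et al.

```python
from collections import defaultdict

def train_translation_table(corpus, iterations=20):
    """Estimate t(word | blob) with IBM Model 1-style EM.

    corpus: list of (blobs, words) pairs, one pair per annotated image.
    """
    words = {w for _, ws in corpus for w in ws}
    # Uniform initialization over the annotation vocabulary.
    t = defaultdict(lambda: 1.0 / len(words))
    for _ in range(iterations):
        count = defaultdict(float)   # expected (word, blob) co-occurrences
        total = defaultdict(float)   # expected counts per blob
        for blobs, ws in corpus:
            for w in ws:
                # E-step: spread each word over the blobs of its own image.
                norm = sum(t[(w, b)] for b in blobs)
                for b in blobs:
                    frac = t[(w, b)] / norm
                    count[(w, b)] += frac
                    total[b] += frac
        # M-step: renormalize so that sum_w t(w | b) = 1 for every blob.
        t = defaultdict(float, {(w, b): c / total[b] for (w, b), c in count.items()})
    return t

# Hypothetical toy corpus: each image is a bag of blobs plus its annotation words.
corpus = [
    (["b_stripes", "b_grass"], ["tiger", "grass"]),
    (["b_grass", "b_sky"], ["grass", "sky"]),
    (["b_stripes", "b_sky"], ["tiger", "sky"]),
]
table = train_translation_table(corpus)
```

On this toy corpus the table concentrates `t(tiger | b_stripes)` because that pair is the only consistent explanation across images. Note that nothing in the estimation ties blobs to word semantics beforehand, which is the gap the ontology-driven blob definition targets.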

Prior Work: HACM Model

• Hierarchical Aspect Cluster Model (T. Hofmann, 1998)
• Induces a hierarchical structure from co-occurrence of image features
– Topology is externally defined
– Depth of the induced hierarchy is user selected
– Levels define the generality of the concept expressed in regions and words
• The hierarchies defined in ontologies have well-defined semantics
• Image feature hierarchy induced from a text ontology

Prior Work: Classification Approaches

• Estimate P(w|I) to classify an image I (represented by image features) into one of the classes (annotation word w)
• Generative Models
• Flat classification: learn one classifier per annotation word
– SVM classifier (Cusano et al., 2004)
• Discriminative Models
– Jeon and Manmatha (2004) showed improvements over translation using Maximum Entropy models
– Unigram (blob, word) and bigram (horizontal blob pairs, word) features
• Explore hierarchical classification using ontology

Image Representation using Visual Vocabulary

Pipeline (figure): Image → Image Segmentation → Feature Extraction → Image Representation

• Image Segmentation
1. Image regions corresponding to objects in the image
2. Grid-based image segmentation
• Feature Extraction
– Extract image features from image regions: color, shape, texture
• Image Representation
1. Real-valued feature vectors
2. Visual vocabulary derived by clustering feature vectors; cluster centers (blobs) define the vocabulary
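A minimal sketch of the visual-vocabulary step, assuming plain Lloyd's K-means over region feature vectors and nearest-center assignment. The 2-D toy features and the function names are hypothetical; the region features on the slides are higher-dimensional.

```python
import random

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(centers, x):
    """Index of the cluster center (blob) closest to feature vector x."""
    return min(range(len(centers)), key=lambda k: squared_distance(centers[k], x))

def kmeans(features, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns the k cluster centers (blobs)."""
    rng = random.Random(seed)
    centers = rng.sample(features, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in features:
            clusters[nearest(centers, x)].append(x)
        # Recompute each center as the mean of its members; an empty cluster keeps its center.
        centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    return centers

# Toy 2-D "region features": two well-separated groups.
regions = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]]
blobs = kmeans(regions, k=2)
# An image is then represented by the blob index of each of its regions.
image_as_blobs = [nearest(blobs, r) for r in regions]
```

After clustering, every region is replaced by a discrete blob index, so an image becomes a short sequence of "visual words" that the annotation models can treat like text tokens.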

Visual vocabulary from Ontologies

• Image regions from images are organized in the hierarchy based on the image annotation
• Image attributes of child nodes are related to the parent node’s image attributes
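One possible reading of this organization, sketched under stated assumptions: every region is attached to each annotation word of its image and to all IS-A ancestors of that word, and the mean feature vector per concept serves as an initial blob. The `parent` map, the function names, and the use of a plain mean are hypothetical choices for illustration; the slides do not spell out the procedure.

```python
def ancestors(word, parent):
    """Yield word and all of its IS-A ancestors up to the root."""
    while word is not None:
        yield word
        word = parent.get(word)

def ontology_seeds(annotated_regions, parent):
    """One initial blob per concept: the mean feature vector of all regions
    whose image annotation is that concept or one of its descendants."""
    members = {}
    for features, words in annotated_regions:
        concepts = {c for w in words for c in ancestors(w, parent)}
        for c in concepts:
            members.setdefault(c, []).append(features)
    return {
        c: [sum(col) / len(vecs) for col in zip(*vecs)]
        for c, vecs in members.items()
    }

# Hypothetical toy hierarchy and region features.
parent = {"tiger": "cat", "lion": "cat", "cat": "animal", "animal": None}
annotated_regions = [
    ([1.0, 0.0], ["tiger"]),
    ([0.0, 1.0], ["lion"]),
]
seeds = ontology_seeds(annotated_regions, parent)
```

In this toy case the seed for `cat` lies between the `tiger` and `lion` seeds, which is one concrete sense in which a child node's image attributes relate to its parent's.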

Using Ontologies in Translation Models for Automatic Image Annotation

1. Ontology-induced visual vocabulary
– Annotation word hierarchy used in selecting the initial set of blobs for K-means clustering
2. Ontology-weighted K-means clustering
– Weight the cluster membership of image regions in the estimation of cluster centers (blobs)

\[ c^{*} = \sum_{r \in R} wt(r, c)\, f(r) \]

\[ wt(r, c) = \frac{1}{n(c)\,|W_r|} \sum_{w \in W_r} n(w, c) \]

n(w,c) – number of image regions in cluster c associated with word w
n(c) – number of image regions in cluster c
f(r) – feature vector for region r
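A minimal sketch of the weighted center update, assuming the sum runs over the regions currently assigned to cluster c and that each region carries the annotation words of its image (the slide defines neither set). Dividing by the sum of the weights, so that the center stays a weighted mean, is also an assumption of this sketch.

```python
def region_weight(region_words, cluster_word_lists):
    """wt(r, c): average over the words w of region r of n(w, c) / n(c)."""
    n_c = len(cluster_word_lists)
    total = sum(
        sum(1 for words in cluster_word_lists if w in words)  # n(w, c)
        for w in region_words
    )
    return total / (n_c * len(region_words))

def weighted_center(cluster):
    """cluster: list of (feature_vector, words) pairs assigned to one blob."""
    word_lists = [words for _, words in cluster]
    weights = [region_weight(words, word_lists) for _, words in cluster]
    norm = sum(weights)  # normalization is an assumption of this sketch
    dim = len(cluster[0][0])
    return [
        sum(wt * f[d] for wt, (f, _) in zip(weights, cluster)) / norm
        for d in range(dim)
    ]

# Hypothetical cluster: two "tiger" regions and one outlier annotated "sky".
cluster = [
    ([1.0, 1.0], ["tiger", "grass"]),
    ([1.2, 0.8], ["tiger"]),
    ([4.0, 4.0], ["sky"]),   # its word is rare in this cluster
]
center = weighted_center(cluster)
```

Regions whose words dominate the cluster pull the center harder than regions whose words are rare in it, so the center moves toward the two "tiger" regions compared with the unweighted mean.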

Image Annotation by Hierarchical Classification

• Based on hierarchical approach to text classification (McCallum et al., 1998)
– Statistical, back-off model induced by the hierarchy derived from the annotation word ontology
– Given an image I with blob sequence \{b_1, b_2, \ldots, b_m\}, the probability of word w is given by

\[ P(w \mid I, \theta) = \frac{P(w \mid \theta)\, P(I \mid w, \theta)}{P(I \mid \theta)} = \frac{P(w \mid \theta) \prod_{l=1}^{m} P(b_l \mid w, \theta)}{\sum_{r=1}^{|W|} P(w_r \mid \theta) \prod_{l=1}^{m} P(b_l \mid w_r, \theta)} \]

– Assuming a Bernoulli model for annotations, the blob likelihood given a word is estimated as

\[ P(b \mid w, \theta) = \frac{1 + \sum_{i=1}^{|T|} N(b, J_i)\, P(w \mid J_i)}{|V| + \sum_{s=1}^{|V|} \sum_{i=1}^{|T|} N(b_s, J_i)\, P(w \mid J_i)} \]

V – Visual vocabulary
T – Training set of annotated images
W – Set of annotation words
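A minimal sketch of the posterior and the smoothed blob likelihood on this slide. It assumes that N(b, J_i) counts the occurrences of blob b in training image J_i, that P(w | J_i) is 1 when w annotates J_i and 0 otherwise, and that the prior P(w) is the relative frequency of w among training annotations; the slide does not state these, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import prod

def train(training_set, vocabulary):
    """training_set: list of (blobs, words); returns (prior, likelihood)."""
    words = sorted({w for _, ws in training_set for w in ws})
    counts = {w: Counter() for w in words}   # sum_i N(b, J_i) P(w | J_i)
    images = Counter()                       # number of images annotated with w
    for blobs, ws in training_set:
        for w in ws:                         # P(w | J_i) taken as 1 for annotated words
            counts[w].update(blobs)
            images[w] += 1
    total = sum(images.values())
    prior = {w: images[w] / total for w in words}
    # Add-one smoothing over the visual vocabulary, as in the estimate above.
    likelihood = {
        w: {b: (1 + counts[w][b]) / (len(vocabulary) + sum(counts[w].values()))
            for b in vocabulary}
        for w in words
    }
    return prior, likelihood

def annotate(blobs, prior, likelihood):
    """Posterior P(w | I) for an image given as a blob sequence."""
    joint = {w: prior[w] * prod(likelihood[w][b] for b in blobs) for w in prior}
    evidence = sum(joint.values())           # P(I), the normalizer
    return {w: p / evidence for w, p in joint.items()}

vocabulary = ["b0", "b1", "b2"]
training_set = [
    (["b0", "b0", "b1"], ["tiger"]),
    (["b2", "b2", "b1"], ["sky"]),
]
prior, likelihood = train(training_set, vocabulary)
posterior = annotate(["b0", "b1"], prior, likelihood)
```

For the test image `["b0", "b1"]` the posterior favours `tiger`, because `b0` was seen only with that word. The add-one term keeps unseen (blob, word) pairs from zeroing out the product.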

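Image Annotation using Hierarchical Classification (contd.)

• The IS-A hierarchy among annotation words is used to estimate the blob-likelihood probability

\[ P(\text{'b'} \mid \text{'tiger'}) = f\big(P_{mle}(\text{'b'} \mid \text{'tiger'}),\ P_{mle}(\text{'b'} \mid \text{'cat'}),\ P_{mle}(\text{'b'} \mid \text{'feline'}),\ P_{mle}(\text{'b'} \mid \text{'animal'}),\ P(b)\big) \]

\[ P(b \mid w, \theta) = \sum_{v \in \{w\} \cup \mathrm{ancestors}(w)} \lambda(v, w)\, P_{mle}(b \mid v) \]

IS-A hierarchy (figure): tiger → cat → feline → animal → … → ROOT; other nodes shown: cougar, leopard, lion, lynx

• Feature weights learned using the EM algorithm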
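A minimal sketch of estimating a blob likelihood as a weighted mixture of maximum-likelihood estimates along a word's IS-A path, with the mixture weights fitted by EM on held-out blobs. This follows the shrinkage idea from hierarchical text classification (McCallum et al., 1998); the toy estimates, the held-out blobs, and the function names are assumptions, and the slides do not detail the exact EM procedure.

```python
def path_to_root(word, parent):
    """The word followed by its IS-A ancestors, ending at the root."""
    path = []
    while word is not None:
        path.append(word)
        word = parent.get(word)
    return path

def learn_weights(heldout_blobs, estimates, iterations=25):
    """EM for the mixture weights over one word's ancestor path.

    estimates: list of dicts P_mle(b | v), one per node v on the path.
    heldout_blobs: blobs from held-out images annotated with the word.
    """
    k = len(estimates)
    lambdas = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: how responsible is each node for each held-out blob?
        beta = [0.0] * k
        for b in heldout_blobs:
            parts = [lam * est.get(b, 0.0) for lam, est in zip(lambdas, estimates)]
            norm = sum(parts)
            if norm > 0:
                for j, p in enumerate(parts):
                    beta[j] += p / norm
        # M-step: renormalize the responsibilities into weights.
        total = sum(beta)
        lambdas = [x / total for x in beta]
    return lambdas

def shrunk(b, lambdas, estimates):
    """P(b | w) as the weighted mixture of the path estimates."""
    return sum(lam * est.get(b, 0.0) for lam, est in zip(lambdas, estimates))

parent = {"tiger": "cat", "cat": "feline", "feline": "animal", "animal": None}
mle = {  # hypothetical maximum-likelihood estimates P_mle(b | v)
    "tiger": {"stripes": 1.0},
    "cat": {"stripes": 0.5, "fur": 0.5},
    "feline": {"stripes": 0.3, "fur": 0.5, "grass": 0.2},
    "animal": {"stripes": 0.1, "fur": 0.3, "grass": 0.3, "water": 0.3},
}
path = path_to_root("tiger", parent)
estimates = [mle[v] for v in path]
lambdas = learn_weights(["stripes", "stripes", "fur", "grass"], estimates)
p_fur = shrunk("fur", lambdas, estimates)
```

The sparse `tiger` estimate assigns zero probability to `fur`, while the mixture borrows mass from `cat`, `feline`, and `animal`, so `p_fur` comes out positive. That back-off behaviour is the point of using the hierarchy.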

Experiments

• Corel Data Set
– Annotated images using pre-processed data from (Duygulu et al., 2002)
– 4500 images annotated using 374 words
– 4000 for training; 500 for testing
• Image Representation
– Image segmentation using N-cuts (Duygulu et al., 2002)
– 36 different image features represent each image region
• Ontology: WordNet
– Hierarchy with 714 unique concepts was induced from 374 annotation words
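A minimal sketch of how a concept hierarchy can be induced from a list of annotation words by following hypernym links to the root. The `hypernym` table below is a hypothetical stand-in for WordNet lookups, and it assumes a single parent per concept, whereas real WordNet entries can have several senses and several hypernyms.

```python
def induce_hierarchy(annotation_words, hypernym):
    """Return {concept: parent} for every annotation word and every
    ancestor reached by following hypernym links to the root."""
    parent = {}
    for word in annotation_words:
        node = word
        while node not in parent:
            parent[node] = hypernym.get(node)   # None marks the root
            if parent[node] is None:
                break
            node = parent[node]
    return parent

# Hypothetical hypernym table standing in for WordNet lookups.
hypernym = {
    "tiger": "cat", "lion": "cat", "cat": "feline",
    "feline": "animal", "horse": "animal", "animal": "entity",
}
hierarchy = induce_hierarchy(["tiger", "lion", "horse"], hypernym)
```

Three annotation words yield seven concepts here, because the intermediate nodes on each path are added as well. The same effect explains why the induced hierarchy has more concepts than there are annotation words.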

Image Annotation Evaluation

• Annotation systems predict P(w|I)
– A cut-off or threshold is required to assign annotations
– Unnormalized: take top 5 words
– Normalized: take top m words, where m is the number of annotations for I
• Metrics
– Number of words with positive recall
– Mean per-word precision and recall, over all words in the dictionary or over a selected set of words
• Selected sets of words
– Retrieved: words retrieved using the method
– Common: words predicted by all annotation systems
– Union: all words predicted by at least one annotation system
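A minimal sketch of the per-word metrics under a fixed top-n cut-off. The function name, the toy predictions, and the convention that a word never retrieved gets precision 0 are assumptions for illustration.

```python
def per_word_scores(predicted, truth, top_n=5):
    """predicted: list of {word: score} per image; truth: list of word sets.
    Returns {word: (precision, recall)} under the fixed top-n cut-off."""
    retrieved, relevant, correct = {}, {}, {}
    for scores, gold in zip(predicted, truth):
        chosen = sorted(scores, key=scores.get, reverse=True)[:top_n]
        for w in chosen:
            retrieved[w] = retrieved.get(w, 0) + 1
            if w in gold:
                correct[w] = correct.get(w, 0) + 1
        for w in gold:
            relevant[w] = relevant.get(w, 0) + 1
    result = {}
    for w in set(retrieved) | set(relevant):
        hits = correct.get(w, 0)
        precision = hits / retrieved[w] if w in retrieved else 0.0
        recall = hits / relevant[w] if w in relevant else 0.0
        result[w] = (precision, recall)
    return result

# Hypothetical predictions for two test images and their true annotations.
predicted = [
    {"tiger": 0.6, "grass": 0.3, "sky": 0.1},
    {"tiger": 0.5, "sky": 0.4, "grass": 0.1},
]
truth = [{"tiger", "grass"}, {"sky", "water"}]
scores = per_word_scores(predicted, truth, top_n=2)
positive_recall = sum(1 for _, r in scores.values() if r > 0)
mean_precision = sum(p for p, _ in scores.values()) / len(scores)
mean_recall = sum(r for _, r in scores.values()) / len(scores)
```

Averaging over a different word set (for example only the words every system predicts) changes `mean_precision` and `mean_recall`, so the word set matters when comparing systems.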

Results: Translation Models and Ontologies

Features  Description                                                           Precision  Recall  Predicted  Positive Recall
KM-500    Baseline K-means clustering                                           0.2204     0.2412  28         27
WKM-500   Weighted K-means clustering                                           0.2042     0.2524  27         26
ONT-714   Using 714 clusters with one cluster per word in the induced ontology  0.2634     0.2724  36         35
ONT-500   Reducing ONT-714 to 500 clusters by combining “close clusters”        0.2482     0.2499  33         32

Precision/Recall numbers are averages over the “pooled” set of 42 words

Observations
• Using ontologies increases the number of words predicted with positive recall
• Hierarchy-based initial clusters attach better semantics to clusters
• Results for ontology-induced clusters are based on ‘One blob per concept’

Results: Classification Approaches and Ontologies

Comparing flat classification versus hierarchical classification for image annotation

Features            Precision  Recall  # Ret.  # Pos. Recall
Flat + KMeans-500   0.1627     0.2766  152     86
Hier + KMeans-500   0.1805     0.3174  146     93

Precision/Recall numbers correspond to using the KM-500 visual vocabulary

Observations
• Improved Precision (10%) and Recall (14%) values
• Increase in number of annotations with positive recall
• Hierarchy derived from annotation ontology results in improved performance

Results: Hierarchical Classification with Ontology-induced Visual Vocabularies

• Hierarchical approach improves precision/recall values on different visual vocabularies
• ONT-714 has improved positive recall numbers
• Ontologies defined on text annotations provide a good framework for developing hierarchical models for image features

Measures         KM-500  WKM-500  ONT-714  ONT-500

Baseline – Flat Classification Method
Precision        0.1627  0.1867   0.1647   0.1643
Recall           0.2766  0.2831   0.2724   0.2697
Predicted        152     153      150      141
Positive Recall  86      90       84       80

Hierarchical Classification Method
Precision        0.1805  0.1882   0.1723   0.1754
Recall           0.3174  0.3135   0.2926   0.2903
Predicted        146     140      150      137
Positive Recall  93      91       91       81

Results: Comparing Translation and Classification Approaches

Measures         KM-500  WKM-500  ONT-714  ONT-500
# Common Words   27      26       35       32

Translation Method
Precision        0.3270  0.3134   0.3040   0.3124
Recall           0.3720  0.4043   0.3244   0.3253

Flat Classification Method
Precision        0.3243  0.3157   0.2924   0.3000
Recall           0.5666  0.5649   0.5591   0.5632

Hierarchical Classification Method
Precision        0.3223  0.3104   0.3018   0.3068
Recall           0.5652  0.5362   0.5453   0.5605

• Comparison based on common annotation words predicted by different models
• Significant improvement in recall using classification approaches

Ontologies in Automatic Image Annotation

• Experimental results
– Ontology in translation model: 19.5% increase in average precision, 13% increase in average recall
– Ontology in classification: 10% increase in average precision, 14% increase in average recall
• Using word hierarchies improves annotation results when they are used as a source for selecting initial blobs and as a framework for hierarchical classification
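A short worked check of the percentage gains as relative improvements over the baseline. Which table rows the summary figures refer to is an assumption: KM-500 versus ONT-714 for the translation model, and flat versus hierarchical classification with the KM-500 vocabulary.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline

# Translation model: KM-500 baseline vs. ONT-714 (precision, recall).
translation = (relative_gain(0.2204, 0.2634), relative_gain(0.2412, 0.2724))
# Classification with KM-500 blobs: flat vs. hierarchical (precision, recall).
classification = (relative_gain(0.1627, 0.1805), relative_gain(0.2766, 0.3174))
print([round(x, 1) for x in translation + classification])
```

Under this pairing the translation gains come out at about 19.5% and 12.9%, matching the quoted 19.5% and 13%. The classification gains come out just under 11% and 15%, which agrees with the quoted 10% and 14% if those were truncated rather than rounded.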

Summary and Future Work

• Proposed methods for using ontologies in automatic image annotation
– Translation models: defining the visual vocabulary
– Hierarchical classification models: providing the hierarchy for models defined on image features
• Explore the use of ontologies in other approaches to automatic image annotation
– Discriminative models
• Exploit the dependence between annotation words in automatic image annotation
– Correlation between annotation words of an image can be exploited

Summary and Future Work (Contd.)

• Utilize hierarchical organization of concepts and language models on image blobs to develop multi-modal ontologies
• Use multi-modal ontologies in Q/A

Multimedia Ontology: Example Node

Figure: Transportation WordNet hierarchy with multimedia data

Thank You.
