Exploiting Ontologies for Automatic Image Annotation

Exploiting Ontologies for Automatic Image Annotation. M. Srikanth, J. Varner, M. Bowden, D. Moldovan, Language Computer Corporation (www.languagecomputer.com), Richardson, Texas

Author: forest

Posted on 21-Jan-2016








  • Exploiting Ontologies for Automatic Image Annotation
    M. Srikanth, J. Varner, M. Bowden, D. Moldovan

    Language Computer Corporation, www.languagecomputer.com, Richardson, Texas

  • Contents
    Motivation
    Automatic Image Annotation Problem
    Ontologies for Defining Visual Vocabularies
    Hierarchical Models for Image Annotation
    Related Work
    Experiments & Results
    Conclusion and Future Work

  • Motivation: Multimedia Question Answering
    The majority of efforts in Q/A focus on textual corpora and processing.
    Large amounts of information are held within multimedia sources (images, audio, video).
    Extend the power of Q/A into the realm of multimedia.
    Exploit the commonality and union of text and multimedia information.

  • Multimedia Question Answering
    Some ways in which multimedia can be used in Q/A:
    Multimedia (a video clip or image) as the answer.
    Multimedia and lexical information combined, providing enhanced understanding for answering questions.

  • Approach
    Feature extraction: high- and low-level features.
    Object recognition: automatic annotation of images.
    Object semantics extraction: locative, temporal, etc.
    Build a knowledge representation from image/video.
    Merge it with the audio/text knowledge representation, using lexical information from ASR and VOCR.
    Provide multimedia Q/A using multimedia ontologies.

  • Automatic Image Annotation
    The task of automatically assigning words to an image that describe its contents.
    Most models exploit the correlation between images and words.
    This work also exploits the correlation between the annotation words themselves to:
    define visual vocabularies;
    develop hierarchical models for automatic image annotation;
    use ontological information about annotation words to improve image annotation.

  • Prior Work: Translation Models
    Models for translating the visual representation of a concept into a textual representation (Duygulu et al., 2002).
    Based on the Brown model for machine translation (Brown et al., 1993): image features translate to annotation words.
    K-means is used to cluster image features to generate blobs.

    Dependencies between blobs and words are not explicitly captured.
    Use the ontology to drive the definition of blobs.
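The translation idea can be illustrated with a lexicon t(w | b) giving the probability of annotation word w for blob b. Below is a minimal sketch of estimating it with IBM Model 1-style EM, assuming each training image is a pair (blob tokens, annotation words) and that any word may align to any blob of its image. The function name and toy corpus are hypothetical, and this is a generic Model 1 estimator rather than the exact setup of Duygulu et al. (2002).

```python
from collections import defaultdict


def train_translation_table(images, iterations=20):
    """Estimate t[b][w] = P(word w | blob b) with IBM Model 1-style EM.

    images: list of (blobs, words) pairs; each element is a list of tokens.
    """
    words_vocab = {w for _, words in images for w in words}
    # Start from a uniform translation table over the word vocabulary.
    t = defaultdict(lambda: defaultdict(lambda: 1.0 / len(words_vocab)))
    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(float))
        for blobs, words in images:
            for w in words:
                # E-step: share word w among this image's blobs in
                # proportion to the current translation probabilities.
                norm = sum(t[b][w] for b in blobs)
                for b in blobs:
                    counts[b][w] += t[b][w] / norm
        # M-step: renormalize the expected counts for each blob.
        t = defaultdict(lambda: defaultdict(float))
        for b, row in counts.items():
            total = sum(row.values())
            for w, c in row.items():
                t[b][w] = c / total
    return {b: dict(row) for b, row in t.items()}


# Hypothetical toy corpus: blob tokens paired with image annotations.
TOY_IMAGES = [
    (["sky_blob", "grass_blob"], ["sky", "grass"]),
    (["sky_blob", "water_blob"], ["sky", "water"]),
    (["grass_blob", "tiger_blob"], ["grass", "tiger"]),
]

if __name__ == "__main__":
    table = train_translation_table(TOY_IMAGES)
    for blob, row in table.items():
        print(blob, max(row, key=row.get))
```

On the toy corpus, `sky_blob` ends up most strongly associated with `sky`, because that pair co-occurs in two images while the competing words co-occur with it only once.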

  • Prior Work: HACM Model
    Hierarchical Aspect Cluster Model (T. Hofmann, 1998).
    Induces a hierarchical structure from the co-occurrence of image features.
    The topology is externally defined, and the depth of the induced hierarchy is user-selected.
    Levels define the generality of the concepts expressed in regions and words.
    By contrast, the hierarchies defined in ontologies have well-defined semantics.
    Here, the image-feature hierarchy is induced from a text ontology.

  • Prior Work: Classification Approaches
    Estimate P(w|I) to classify an image I (represented by image features) into one of the classes (annotation words w).
    Generative models.
    Flat classification: learn one classifier per annotation word, e.g., an SVM classifier (Cusano et al., 2004).
    Discriminative models: Jeon and Manmatha (2004) showed improvements over translation models using maximum entropy models, with unigram (blob, word) and bigram (horizontal blob pair, word) features.
    Here, hierarchical classification using the ontology is explored.

  • Image Representation using a Visual Vocabulary
    Image segmentation: image regions corresponding to objects in the image; grid-based image segmentation.
    Feature extraction: extract image features (color, shape, texture) from the image regions.
    Image representation: real-valued feature vectors. A visual vocabulary is derived by clustering the feature vectors, and the cluster centers (blobs) define the vocabulary.
    [Figure: processing pipeline with stages Image, Image Segmentation, Feature Extraction, Image Representation]
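Once the cluster centers are fixed, each region's feature vector is replaced by its nearest blob, so an image becomes a sequence of discrete tokens. A minimal sketch, assuming Euclidean distance and Python 3.8+ (for `math.dist`); the function name is hypothetical.

```python
import math


def quantize_regions(features, centers):
    """Map each region feature vector to the index of its nearest blob.

    features: list of feature vectors, one per image region.
    centers:  list of cluster centers (the blobs of the visual vocabulary).
    """
    return [
        min(range(len(centers)), key=lambda k: math.dist(f, centers[k]))
        for f in features
    ]


if __name__ == "__main__":
    blobs = [[0.0, 0.0], [10.0, 10.0]]
    regions = [[1.0, 0.5], [9.0, 11.0], [0.2, -0.1]]
    print(quantize_regions(regions, blobs))  # [0, 1, 0]
```

A brute-force search over all centers is simple and adequate for vocabularies of a few hundred blobs; a spatial index would only matter at much larger scales.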

  • Visual Vocabulary from Ontologies
    Image regions are organized in the hierarchy based on the image annotations.
    The image attributes of child nodes are related to the image attributes of their parent nodes.
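One simple way to realize this parent-child relation is to give every concept node an initial blob equal to the mean of all regions annotated with that concept or any of its descendants. The sketch below assumes a child-to-parent IS-A map and region features grouped by annotation word; both data structures and the pooling rule are illustrative assumptions, not the authors' definition.

```python
def concept_centroids(parent, regions_by_word):
    """Mean feature vector per concept, pooling regions of all descendant words.

    parent: dict mapping each node to its IS-A parent (the root maps to None).
    regions_by_word: dict mapping annotation word -> list of feature vectors.
    """
    pooled = {}
    for word, vectors in regions_by_word.items():
        node = word
        # Walk up the IS-A chain so every ancestor also receives these regions.
        while node is not None:
            pooled.setdefault(node, []).extend(vectors)
            node = parent.get(node)
    # Column-wise mean of the pooled vectors for each concept.
    return {
        node: [sum(col) / len(vecs) for col in zip(*vecs)]
        for node, vecs in pooled.items()
    }


if __name__ == "__main__":
    isa = {"tiger": "cat", "lion": "cat", "cat": "feline", "feline": None}
    regions = {"tiger": [[1.0, 0.0], [3.0, 0.0]], "lion": [[0.0, 4.0]]}
    print(concept_centroids(isa, regions)["cat"])
```

These per-concept means could then seed K-means, giving one initial blob per node of the induced hierarchy.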

  • Using Ontologies in Translation Models for Automatic Image Annotation
    Ontology-induced visual vocabulary: the annotation-word hierarchy is used to select the initial set of blobs for K-means clustering.
    Ontology-weighted K-means clustering: weight the cluster membership of image regions when estimating the cluster centers (blobs). Notation:
    n(w,c): number of image regions in cluster c associated with word w;
    n(c): number of image regions in cluster c;
    f(r): feature vector for region r.
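The slide defines n(w,c), n(c), and f(r), but the weighting formula itself is not preserved here. The sketch below therefore rests on an assumption: each region r carries a single annotation word w, and its contribution to the center of cluster c is weighted by n(w,c)/n(c), so regions whose word dominates the cluster pull the center harder. Function and variable names are hypothetical.

```python
import math
from collections import Counter


def weighted_kmeans_step(regions, centers):
    """One K-means iteration with an ASSUMED ontology weight n(w,c)/n(c).

    regions: list of (feature_vector, word) pairs, one word per region.
    centers: list of current cluster centers.
    Returns the updated centers.
    """
    # Assignment step: nearest center by Euclidean distance.
    assign = [min(range(len(centers)), key=lambda k: math.dist(f, centers[k]))
              for f, _ in regions]
    n_c = Counter(assign)                                          # n(c)
    n_wc = Counter((w, c) for (_, w), c in zip(regions, assign))   # n(w, c)
    new_centers = []
    for c, old in enumerate(centers):
        members = [(f, w) for (f, w), a in zip(regions, assign) if a == c]
        if not members:
            new_centers.append(list(old))  # keep an empty cluster in place
            continue
        # Assumed weight: share of cluster c's regions that carry word w.
        weights = [n_wc[(w, c)] / n_c[c] for _, w in members]
        total = sum(weights)
        new_centers.append([
            sum(wt * f[i] for wt, (f, _) in zip(weights, members)) / total
            for i in range(len(old))
        ])
    return new_centers


if __name__ == "__main__":
    regs = [([0, 0], "sky"), ([0, 0], "sky"), ([3, 0], "grass"), ([10, 10], "sea")]
    print(weighted_kmeans_step(regs, [[0, 0], [10, 10]]))
```

In this toy call, cluster 0 holds two "sky" regions and one "grass" region, so the weighted center moves to x = 0.6 instead of the unweighted mean x = 1.0.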

  • Image Annotation by Hierarchical Classification
    Based on the hierarchical approach to text classification of McCallum et al. (1998).
    A statistical back-off model induced by the hierarchy derived from the annotation-word ontology.
    Given an image I with its blob sequence, the probability of word w is given by

    Assuming a Bernoulli model for annotations, the blob likelihood given a word is estimated as
    Notation: V, the visual vocabulary; T, the training set of annotated images; W, the set of annotation words.
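The slide's equations are not preserved in this transcript. For orientation only, a standard naive Bayes formulation with a multivariate Bernoulli event model, written in the notation above, would look as follows. It assumes B_b(J) = 1 if blob b occurs in image J and 0 otherwise, and T_w denotes the training images annotated with w; this is an assumed form with Laplace smoothing, not necessarily the authors' exact equations.

```latex
\[
P(w \mid I) \propto P(w) \prod_{b \in V}
  \Big[ B_b(I)\, P(b \mid w) + \big(1 - B_b(I)\big)\big(1 - P(b \mid w)\big) \Big],
  \qquad P(w) = \frac{|T_w|}{|T|}
\]
\[
P(b \mid w) = \frac{1 + \sum_{J \in T_w} B_b(J)}{2 + |T_w|},
  \qquad T_w = \{ J \in T : w \text{ annotates } J \}, \; w \in W
\]
```

The add-one numerator and add-two denominator keep every blob likelihood strictly between 0 and 1, so a single unseen blob cannot zero out a word's score.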

  • Image Annotation using Hierarchical Classification (contd.)
    The IS-A hierarchy among annotation words is used to estimate the blob-likelihood probability.
    [Figure: example IS-A hierarchy with nodes ROOT, animal, feline, cat, tiger, cougar, leopard, lion, lynx]
    Feature weights are learned using the EM algorithm.
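In the shrinkage approach of McCallum et al. (1998), a word's blob likelihood is smoothed by mixing it with the estimates of its ancestors up to ROOT, with mixing weights fit by EM on held-out data. The minimal sketch below assumes that the slide's EM-learned weights play this role, that per-node blob distributions are already estimated, and that each word has a single IS-A path. The node names echo the slide's example; the blob names, probabilities, and functions are hypothetical.

```python
def ancestors(node, parent):
    """Return [node, parent, grandparent, ..., root] along the IS-A chain."""
    path = []
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path


def shrunk_likelihood(blob, word, parent, node_dists, lambdas):
    """P(blob | word) as a lambda-weighted mixture over the word's ancestors."""
    path = ancestors(word, parent)
    return sum(lam * node_dists[n].get(blob, 0.0) for lam, n in zip(lambdas, path))


def fit_lambdas(held_out_blobs, word, parent, node_dists, iterations=50):
    """Fit mixture weights by EM on held-out blobs of images annotated with word."""
    path = ancestors(word, parent)
    lambdas = [1.0 / len(path)] * len(path)
    for _ in range(iterations):
        resp = [0.0] * len(path)
        for b in held_out_blobs:
            # E-step: responsibility of each ancestor for generating blob b.
            terms = [lam * node_dists[n].get(b, 0.0) for lam, n in zip(lambdas, path)]
            total = sum(terms)
            if total > 0:
                for j, term in enumerate(terms):
                    resp[j] += term / total
        # M-step: the new weights are the normalized responsibilities.
        s = sum(resp)
        if s == 0:
            break
        lambdas = [r / s for r in resp]
    return lambdas


# Hypothetical per-node blob distributions along one IS-A chain.
PARENT = {"tiger": "cat", "cat": "feline", "feline": "animal", "animal": "ROOT"}
NODE_DISTS = {
    "tiger": {"stripes": 0.8, "grass": 0.2},
    "cat": {"stripes": 0.3, "fur": 0.4, "grass": 0.3},
    "feline": {"fur": 0.5, "grass": 0.5},
    "animal": {"fur": 0.3, "grass": 0.3, "water": 0.4},
    "ROOT": {"stripes": 0.25, "fur": 0.25, "grass": 0.25, "water": 0.25},
}

if __name__ == "__main__":
    lams = fit_lambdas(["stripes", "stripes", "fur", "grass"], "tiger", PARENT, NODE_DISTS)
    print([round(x, 3) for x in lams])
    print(shrunk_likelihood("fur", "tiger", PARENT, NODE_DISTS, lams))
```

The benefit shows up for "fur": the tiger node alone assigns it zero probability, but the smoothed estimate borrows mass from cat, feline, and animal.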

  • Experiments
    Corel data set: images annotated using the pre-processed data from Duygulu et al. (2002).
    4,500 images annotated using 374 words; 4,000 for training and 500 for testing.

    Image representation: image segmentation using normalized cuts (N-cuts) (Duygulu et al., 2002); 36 different image features represent each image region.

    Ontology: WordNet. A hierarchy with 714 unique concepts was induced from the 374 annotation words.
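A hierarchy of this kind can be induced by taking the union of the IS-A (hypernym) paths from each annotation word up to the root; the concept count is then the number of distinct nodes on those paths. The sketch below uses a small hand-written IS-A map as a stand-in for WordNet. It assumes a single parent per node (WordNet nouns can have several hypernym paths, which NLTK exposes through a synset's `hypernym_paths()` method) and does not address word-sense selection; the function name is hypothetical.

```python
def induce_hierarchy(words, isa):
    """Union of IS-A paths from each annotation word up to the root.

    words: annotation words; isa: dict child -> parent (root maps to None).
    Returns (concepts, edges), where edges are (child, parent) pairs.
    """
    concepts, edges = set(), set()
    for w in words:
        node = w
        # Stop early once we reach a node whose ancestors are already added.
        while node is not None and node not in concepts:
            concepts.add(node)
            parent = isa.get(node)
            if parent is not None:
                edges.add((node, parent))
            node = parent
    return concepts, edges


# Hypothetical IS-A fragment standing in for WordNet.
TOY_ISA = {"tiger": "cat", "lion": "cat", "cat": "feline",
           "feline": "animal", "horse": "animal", "animal": None}

if __name__ == "__main__":
    concepts, edges = induce_hierarchy(["tiger", "lion", "horse"], TOY_ISA)
    print(len(concepts), len(edges))  # 6 5
```

Three annotation words yield six concepts here because the shared ancestors (cat, feline, animal) enter the hierarchy once each.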

  • Image Annotation Evaluation
    Annotation systems predict P(w|I), so a cut-off or threshold is required to assign annotations.
    Unnormalized: take the top 5 words.
    Normalized: take the top m words, where m is the number of annotations for I.

    Metrics: the number of words with positive recall, and the mean per-word precision and recall, computed over either
    all words in the dictionary, or
    a selected set of words: Retrieved (words retrieved using the method), Common (words predicted by all annotation systems), or Union (all words predicted by at least one annotation system).
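The cut-off rules and per-word metrics above can be sketched as follows. This is an illustration under stated assumptions: annotations are sets of words per image, and a word that is never predicted is assigned precision 0 (evaluations differ on this convention). Function names are hypothetical.

```python
def assign_annotations(scores, m=None, top_k=5):
    """Choose annotation words from scores (dict word -> P(w | I)).

    Unnormalized: keep the top_k highest-scoring words (5 by default).
    Normalized:   keep the top m words, m = number of true annotations of I.
    """
    k = top_k if m is None else m
    return sorted(scores, key=scores.get, reverse=True)[:k]


def per_word_metrics(predicted, truth, words=None):
    """Mean per-word precision and recall, plus # words with positive recall.

    predicted, truth: dict image_id -> set of words.
    words: word set to average over (e.g. Retrieved, Common, or Union);
           defaults to every word occurring in the ground truth.
    """
    if words is None:
        words = set().union(*truth.values())
    precisions, recalls, positive = [], [], 0
    for w in words:
        pred_imgs = {i for i, ws in predicted.items() if w in ws}
        true_imgs = {i for i, ws in truth.items() if w in ws}
        correct = len(pred_imgs & true_imgs)
        # Assumed convention: a word that is never predicted gets precision 0.
        precisions.append(correct / len(pred_imgs) if pred_imgs else 0.0)
        recalls.append(correct / len(true_imgs) if true_imgs else 0.0)
        if correct > 0:
            positive += 1
    n = len(precisions)
    return sum(precisions) / n, sum(recalls) / n, positive


if __name__ == "__main__":
    truth = {1: {"sky", "sea"}, 2: {"sky", "tree"}, 3: {"tree"}}
    pred = {1: {"sky", "tree"}, 2: {"sky", "sea"}, 3: {"tree"}}
    print(per_word_metrics(pred, truth))
```

Passing a restricted `words` set is how the Retrieved, Common, and Union variants would be computed from the same predictions.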

  • Results: Translation Models and Ontologies
    Precision/recall numbers are averages over a pooled set of 42 words.
    Observations: using ontologies increases the number of words predicted with positive recall; hierarchy-based initial clusters attach better semantics to the clusters; results for ontology-induced clusters are based on one blob per concept.

    | Visual vocabulary | Description | Precision | Recall | # Words predicted | # Words with positive recall |
    | KM-500 | Baseline K-means clustering | 0.2204 | 0.2412 | 28 | 27 |
    | WKM-500 | Weighted K-means clustering | 0.2042 | 0.2524 | 27 | 26 |
    | ONT-714 | Using 714 clusters with one cluster per word in the induced ontology | 0.2634 | 0.2724 | 36 | 35 |
    | ONT-500 | Reducing ONT-714 to 500 clusters by combining close clusters | 0.2482 | 0.2499 | 33 | 32 |

  • Results: Classification Approaches and Ontologies
    Comparing flat versus hierarchical classification for image annotation.
    Precision/recall numbers correspond to the KM-500 visual vocabulary.
    Observations: improved precision (about 10% relative) and recall (about 14% relative); an increase in the number of annotations with positive recall; the hierarchy derived from the annotation ontology results in improved performance.

    | Method | Precision | Recall | # Words retrieved | # Words with positive recall |
    | Flat + KMeans-500 | 0.1627 | 0.2766 | 152 | 86 |
    | Hier + KMeans-500 | 0.1805 | 0.3174 | 146 | 93 |

  • Results: Hierarchical Classification with Ontology-induced Visual Vocabularies
    The hierarchical approach improves precision and recall on all four visual vocabularies.
    ONT-714 has improved positive-recall numbers.
    Ontologies defined on text annotations provide a good framework for developing hierarchical models for image features.

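    | Method | Measure | KM-500 | WKM-500 | ONT-714 | ONT-500 |
    | Flat (baseline) | Precision | 0.1627 | 0.1867 | 0.1647 | 0.1643 |
    | Flat (baseline) | Recall | 0.2766 | 0.2831 | 0.2724 | 0.2697 |
    | Flat (baseline) | # Words predicted | 152 | 153 | 150 | 141 |
    | Flat (baseline) | # Words with positive recall | 86 | 90 | 84 | 80 |
    | Hierarchical | Precision | 0.1805 | 0.1882 | 0.1723 | 0.1754 |
    | Hierarchical | Recall | 0.3174 | 0.3135 | 0.2926 | 0.2903 |
    | Hierarchical | # Words predicted | 146 | 140 | 150 | 137 |
    | Hierarchical | # Words with positive recall | 93 | 91 | 91 | 81 |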

  • Results: Comparing Translation and Classification Approaches

    Comparison based on the common annotation words predicted by the different models.
    Classification approaches substantially improve recall (about 0.54 to 0.57, versus 0.32 to 0.40 for the translation method).

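    Number of common words: KM-500, 27; WKM-500, 26; ONT-714, 35; ONT-500, 32.
    | Method | Measure | KM-500 | WKM-500 | ONT-714 | ONT-500 |
    | Translation | Precision | 0.3270 | 0.3134 | 0.3040 | 0.3124 |
    | Translation | Recall | 0.3720 | 0.4043 | 0.3244 | 0.3253 |
    | Flat classification | Precision | 0.3243 | 0.3157 | 0.2924 | 0.3000 |
    | Flat classification | Recall | 0.5666 | 0.5649 | 0.5591 | 0.5632 |
    | Hierarchical classification | Precision | 0.3223 | 0.3104 | 0.3018 | 0.3068 |
    | Hierarchical classification | Recall | 0.5652 | 0.5362 | 0.5453 | 0.5605 |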

  • Ontologies in Automatic Image Annotation
    Experimental results:
    Ontology in the translation model: 19.5% increase in average precision and 13% increase in average recall.
    Ontology in classification: 10% increase in average precision and 14% increase in average recall.
    Using word hierarchies improves annotation results, both as a source for selecting initial blobs and as a framework for hierarchical classification.
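The reported gains are relative improvements, (new - old) / old. A quick check against the result tables, assuming the translation-model figures compare ONT-714 with the KM-500 baseline and the classification figures compare hierarchical with flat classification on KM-500 (the pairing for the translation case is an assumption; the slides do not state it explicitly):

```python
def relative_gain(old, new):
    """Relative improvement of new over old, in percent."""
    return 100.0 * (new - old) / old


# (old, new) pairs taken from the result tables (assumed pairings).
COMPARISONS = {
    "translation precision (KM-500 -> ONT-714)": (0.2204, 0.2634),
    "translation recall (KM-500 -> ONT-714)": (0.2412, 0.2724),
    "classification precision (flat -> hier, KM-500)": (0.1627, 0.1805),
    "classification recall (flat -> hier, KM-500)": (0.2766, 0.3174),
}

if __name__ == "__main__":
    for name, (old, new) in COMPARISONS.items():
        print(f"{name}: {relative_gain(old, new):.1f}%")
```

This gives 19.5% and 12.9% for the translation model and 10.9% and 14.8% for classification, so the summary's 13%, 10%, and 14% appear to be these values rounded or truncated.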

  • Summary and Future Work
    Proposed methods for using ontologies in automatic image annotation:
    Translation models: defining the visual vocabulary.
    Hierarchical classification models: providing the hierarchy for models defined on image features.
    Future work: explore the use of ontologies in other approaches to automatic image annotation, such as discriminative models.

    Exploit the dependence between annotation words in automatic image annotation: the correlation between the annotation words of an image can be exploited.

  • Summary and Future Work (contd.)
    Utilize the hierarchical organization of concepts and language models on image blobs to develop multi-modal ontologies.
    Use multi-modal ontologies in Q/A.

  • Multimedia Ontology: Example Node
    [Figure: Transportation node of the WordNet hierarchy with attached multimedia data]

  • Thank You.

    Annotation words are organized in a hierarchy; the hierarchy is induced by IS-A relations in WordNet.