Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers by Abhinav Gupta & Larry S. Davis presented by Arvie Frydenlund


Page 1:

Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers

by Abhinav Gupta & Larry S. Davis
presented by Arvie Frydenlund

Page 2:

Paper information

- ECCV 2008
- Slides at http://www.cs.cmu.edu/~abhinavg/
- http://www.cs.cmu.edu/%7Eabhinavg/eccv2008.ppt

Page 3:

Objectives of the paper

Task:
- Automatic annotation of image regions with labels

Methods:
- Two models are learned
- Training model
  - Learns classifiers for nouns and relationships at the same time
  - Learns priors on possible relationships for pairs of nouns
- Inference model, given the above classifiers and priors

Issues:
- Dataset is weakly labeled
- Not all labels are used all the time in the dataset

Page 4:

Weakly labeled data

President Obama debates Mitt Romney, while the audience sits in the background. (while the audience sits behind the debaters)

Page 5:

Co-occurrence Ambiguities

Only have images of cars that include a street

A man beside a car on the street in front of a fence.

Page 6:

Noun relationships

[Figure: the same image under two candidate labelings of its two regions: Car/Street and Street/Car]

- On(Car, Street)
- P(red labeling) > P(blue labeling)

Page 7:

Prepositions and comparative adjectives

Most common prepositions:

- above, across, after, against, along, at, behind, below, beneath, beside, between, beyond, by, down, during, in, inside, into, near, off, on, onto, out, outside, over

- since, till, after, before, from, past, to, around, through, throughout

- for, except, about, like, of

Comparative adjectives:

- larger, smaller, taller, heavier, faster


Page 8:

Relationships Actually Used

Used 19 in total:

- above, behind, beside, more textured, brighter, in, greener, larger, left, near, far, from, on-top-of, more blue, right, similar, smaller, taller, shorter

Page 9:

Images and regions

- Each image is pre-segmented and (weakly) annotated with a set of nouns and relationships between the nouns
- Regions are represented by a feature vector based on:
  - Appearance (RGB, intensity)
  - Shape (convexity, moments)
- Models for nouns are based on features of the regions
- Relationship models are based on differential features:
  - Difference of average intensity
  - Difference of location
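A minimal sketch of this feature split (hypothetical feature names and values; the paper's actual vectors also include shape moments, convexity, and more): noun models see per-region features, while relationship models see differences between two regions' features.

```python
def region_features(pixel_intensities, centroid):
    """Summarize one segmented region by average intensity and location."""
    avg = sum(pixel_intensities) / len(pixel_intensities)
    return {"intensity": avg, "x": centroid[0], "y": centroid[1]}

def differential_features(fj, fk):
    """Pairwise features I_jk fed to the relationship classifier C_R."""
    return {
        "d_intensity": fj["intensity"] - fk["intensity"],
        "d_x": fj["x"] - fk["x"],
        "d_y": fj["y"] - fk["y"],  # sign encodes above/below in image coordinates
    }

car = region_features([90, 110, 100], (50, 30))    # brighter region, higher up
street = region_features([40, 60, 50], (50, 80))   # darker region, lower down
print(differential_features(car, street))          # d_y < 0: car is above street
```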


Page 10:

Chicken and egg

- Learning models for the nouns and relationships requires assigning labels
- Assigning labels requires some model for nouns and relationships
- Solution is to use EM:
  - E-step: compute noun assignments to labels given the old parameters
  - M-step: compute new parameters given the E-step assignments
- Classifiers are initialized by a previous automatic-annotation method, i.e. Duygulu et al., Object recognition as machine translation, ECCV (2002)
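The chicken-and-egg loop can be sketched as a toy EM alternation. Everything here is hypothetical and drastically simplified: regions are single numbers, each noun model is one prototype value, and "fitting" is just averaging; the paper's actual likelihood model is far richer.

```python
import random

def em(regions, nouns, iters=10, seed=0):
    rng = random.Random(seed)
    # random initialization stands in for bootstrapping from Duygulu et al.
    means = {n: rng.uniform(0, 1) for n in nouns}
    assign = {}
    for _ in range(iters):
        # E-step: assign each region to the noun whose model fits it best
        assign = {r: min(nouns, key=lambda n: abs(r - means[n]))
                  for r in regions}
        # M-step: refit each noun model from the regions assigned to it
        for n in nouns:
            rs = [r for r, a in assign.items() if a == n]
            if rs:
                means[n] = sum(rs) / len(rs)
    return means, assign

means, assign = em([0.1, 0.15, 0.9, 0.95], ["sky", "grass"])
print(assign)   # the two low regions share one noun, the two high ones the other
```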

Page 11:

Generative training model

- CA and CR are classifiers (models) for the noun assignments and relationships
- Ij and Ik are the region features for regions j and k; Ijk are the differential features
- ns and np are two nouns
- r is a relationship
- L(θ) is the likelihood, with θ = (CA, CR)

Fig. 2 from A. Gupta & L.S. Davis

Page 12:

Training

- Too expensive to evaluate L(θ) directly
- Use EM to estimate L(θ), with assignments as hidden values
- Assume predicates are independent given the image and assignment
  - Obviously wrong, since most predicates preclude others
  - Can't be 'on top of' and 'beside' at the same time
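The independence assumption amounts to a plain product over predicates (hypothetical numbers below): each predicate contributes its own probability even when, as noted, the predicates should constrain each other.

```python
def caption_likelihood(predicates, predicate_prob):
    """P(all predicates | image, assignment) under the independence assumption."""
    p = 1.0
    for pred in predicates:
        p *= predicate_prob[pred]
    return p

probs = {
    ("on-top-of", "car", "street"): 0.9,
    ("beside", "car", "street"): 0.3,   # should be near-impossible given the first
}
print(caption_likelihood(list(probs), probs))   # simply 0.9 * 0.3
```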

Page 13:

Training: relationships modelled

- CA, the noun model, is implemented as a nearest-neighbour-based likelihood model
- CR, the relationship model, is implemented as a decision-stump-based likelihood model
- Most relationships are modelled correctly
- A few were not:
  - in: 'not captured by colour, shape, and location'(?)
  - on-top-of
  - taller, due to a poor segmentation algorithm
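A decision stump is just one learned threshold on one feature. As a toy illustration (hypothetical data: vertical offsets d_y labeled for 'above'; the paper's CR wraps the stump in a likelihood model):

```python
def fit_stump(values, labels):
    """Return the threshold t maximizing accuracy of (v <= t) == label."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        acc = sum((v <= t) == l for v, l in zip(values, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# d_y for region pairs, and whether 'above' held in the annotation
d_y = [-50, -30, 20, 40]
above = [True, True, False, False]
print(fit_stump(d_y, above))   # (-30, 1.0): 'above' iff d_y <= -30
```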


Page 14:

Inference model

- Given trained CA and CR from the above model
- Find P(n1, n2, ... | I1, I2, ..., CA, CR)
- Each region is represented by a noun node
- Edges between nodes are weighted by the likelihood obtained from the differential features

Fig. 3 from A. Gupta & L.S. Davis
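A rough illustration of this inference step (hypothetical scores, and brute-force search rather than whatever optimizer the paper applies to the graph of Fig. 3): unary scores play the role of CA(noun | region), pairwise scores play the role of the CR/prior edge weights.

```python
import itertools

def best_labeling(regions, nouns, unary, pairwise):
    best, best_score = None, float("-inf")
    for labels in itertools.product(nouns, repeat=len(regions)):
        if len(set(labels)) < len(labels):   # one-to-one noun/region assumption
            continue
        # node scores: how well each noun model fits its region
        score = sum(unary[(r, n)] for r, n in zip(regions, labels))
        # edge scores: how consistent the differential features are with
        # the prior relationship between the two chosen nouns
        score += sum(pairwise.get(pair, 0.0)
                     for pair in itertools.permutations(zip(regions, labels), 2))
        if score > best_score:
            best, best_score = labels, score
    return best
```

With appearance scores alone tied (the co-occurrence ambiguity from earlier slides), the On(Car, Street) edge weight breaks the tie in favour of the labeling that puts the car on the street.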

Page 15:

Experimental setup

- Corel5K dataset
- 850 training images, tagged with nouns and manually labeled relationships
- Vocabulary size: 173 nouns, 19 relationships
- Same segmentation and feature vectors as Duygulu et al., Object recognition as machine translation, ECCV (2002)
- Training model test set: 150 images (from the training set)
- Inference model test set: 100 images (given that those images have the same vocabulary)


Page 16:

Training model evaluation

- Two metrics are used:
  - Range semantics: counts the number of correctly labeled words, treating each label with the same weight
  - Frequency counts: counts the number of correctly labeled regions, which weights more frequent words higher
- Compared against simple IBM Model 1 (a machine-translation model, 1993) and the Duygulu et al. MT model
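Reading those two bullets literally (the paper's exact definitions may differ), the metrics can be sketched as: range semantics averages per-word accuracy, so rare words count as much as common ones, while frequency counts is plain per-region accuracy, dominated by frequent words.

```python
def range_semantics(pred, gold, vocab):
    """Average per-word accuracy: each vocabulary word weighted equally."""
    accs = []
    for w in vocab:
        idx = [i for i, g in enumerate(gold) if g == w]
        if idx:
            accs.append(sum(pred[i] == w for i in idx) / len(idx))
    return sum(accs) / len(accs)

def frequency_counts(pred, gold):
    """Per-region accuracy: frequent words dominate."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

gold = ["sky"] * 9 + ["tiger"]
pred = ["sky"] * 10          # a labeler that only ever says 'sky'
print(range_semantics(pred, gold, ["sky", "tiger"]))   # 0.5
print(frequency_counts(pred, gold))                    # 0.9
```

The gap between 0.5 and 0.9 on the same predictions shows why the slides report both.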


Page 17:

Inference model evaluation

- Annotating unseen images
- Doesn't use the Corel annotations, due to missing labels
- 24% and 17% reduction in missed labels
- 63% and 59% reduction in false labels


Page 18:

Inference model examples

Duygulu et al.'s results are on the top; the paper's results are on the bottom

Page 19:

Inference model Precision-Recall

Duygulu et al. is [1]

Page 20:

Novelties and limitations

Achievements:
- Novel use of prepositions and comparative adjectives for automatic annotation
- Uses previous annotation models for bootstrapping
- Good results

Limitations:
- Only uses two-argument predicates, which results in 'greener' (rather than plain 'green')
- Can't handle the pink-flower example
- Assumes a one-to-one relationship between nouns and image segments

Page 21:

Questions?

- One of the motivations was the co-occurrence problem. Wouldn't a simpler model with better training data solve this problem?
- Image caption generation to annotation stack?
- Model simplification: assuming independence of predicates?
- Scale with the vocabulary and the number of relationships used? 'Bluer' and 'greener' work for outdoor scenes