
Matching Words and Pictures

H. Dunlop


• Improvements in translation from visual to semantic representations lead to improvements in image access

• Numerous applications:

– Auto-annotation

– Region naming (aka object recognition)

– Browsing

– Searching

– Auto-illustration

From visual to semantic representations

Multimedia Translation

• Data:

Words are associated with images, but correspondences are unknown

sun sea sky

Statistical Machine Translation

• Assuming an unknown one-to-one correspondence between words, estimate a joint probability distribution linking words in the two languages

• Missing-data problem: the solution is Expectation Maximization (EM), which alternates two steps (a worked form of the likelihood follows below):

– Given the translation probabilities, estimate the correspondences

– Given the correspondences, estimate the translation probabilities
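As a sketch of this formulation (the notation here follows IBM Model 1-style alignment models and is assumed, not copied from the slides): for a word string $w = (w_1, \dots, w_M)$ with hidden alignments $a_j$ to source tokens $b = (b_1, \dots, b_L)$, the likelihood marginalizes over the unknown correspondences:

$$p(w \mid b) = \prod_{j=1}^{M} \sum_{i=1}^{L} p(a_j = i)\, t(w_j \mid b_i)$$

EM alternates between computing the expected alignments under the current translation table $t$ (E-step) and re-estimating $t$ from those expected alignments (M-step).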

Overview

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, by

Pinar Duygulu, Kobus Barnard, Nando de Freitas, David Forsyth

Input Representation

• Segment with Normalized Cuts:

• Only use regions larger than a threshold (typically 5-10 per image)

• Form vector representation of each region

• Cluster regions with k-means to form blob tokens (sketched below)

Example: the caption "sun sky waves sea" supplies the word tokens; the segmented regions supply the blob tokens
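A minimal sketch of the blob-token step, assuming the region feature vectors have already been extracted and using scikit-learn's KMeans (the 500-cluster size comes from the experiments slide later; the file name is hypothetical):

    import numpy as np
    from sklearn.cluster import KMeans

    # One row per segmented region, pooled across all training images.
    region_features = np.load("region_features.npy")  # hypothetical input

    # Quantize regions into a fixed "visual vocabulary" of blob tokens.
    kmeans = KMeans(n_clusters=500, random_state=0).fit(region_features)
    blob_tokens = kmeans.labels_  # blob-token id for each training region
    # kmeans.predict(new_features) maps an unseen region to its nearest token.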

Input Representation

• Represent each region with a feature vector (a simplified sketch follows this list)

– Size: portion of the image covered by the region

– Position: coordinates of center of mass

– Color: avg. and std. dev. of (R,G,B), (L,a,b) and (r=R/(R+G+B),g=G/(R+G+B))

– Texture: avg. and variance of 16 filter responses

– Shape: area / perimeter², moment of inertia, region area / area of convex hull
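A sketch of how such a feature vector might be computed for one region, given a binary mask; only the size, position, and RGB color statistics are shown, and all names are illustrative:

    import numpy as np

    def region_feature_vector(image, mask):
        """Partial feature vector for one region.
        image: (H, W, 3) RGB array; mask: (H, W) boolean region mask."""
        h, w = mask.shape
        ys, xs = np.nonzero(mask)
        size = mask.sum() / (h * w)            # portion of the image covered
        cy, cx = ys.mean() / h, xs.mean() / w  # center of mass, normalized
        pixels = image[mask].astype(float)
        color_avg = pixels.mean(axis=0)        # avg (R, G, B)
        color_std = pixels.std(axis=0)         # std. dev. of (R, G, B)
        return np.concatenate([[size, cy, cx], color_avg, color_std])

The full vector would also include the Lab and chromaticity color statistics, the 16 filter responses, and the shape terms listed above.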

Assignments

• Each word is predicted with some probability by each blob

Expectation Maximization

• Select word with highest probability to assign to each blob

• Maximize the likelihood of the observed words given the blobs:

$$p(w \mid b) = \prod_{n=1}^{N} \prod_{j=1}^{M_n} \sum_{i=1}^{L_n} p(a_{nj} = i)\, t(w = w_{nj} \mid b = b_{ni})$$

where:

– $N$ = # of images; $M_n$ = # of words in image $n$; $L_n$ = # of blobs in image $n$

– $p(a_{nj} = i)$ = probability that blob $b_{ni}$ translates to word $w_{nj}$

– $t(w_{nj} \mid b_{ni})$ = probability of obtaining word $w_{nj}$ given an instance of blob $b_{ni}$

Expectation Maximization

• Initialize to blob-word co-occurrences:

• Iterate:

– Given the translation probabilities, estimate the correspondences

– Given the correspondences, estimate the translation probabilities (sketched below)
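A compact sketch of this loop, assuming words and blobs are integer token ids and each training image is a (word_ids, blob_ids) pair; smoothing and convergence checks are omitted, and all names are illustrative:

    import numpy as np

    def em_translation(images, n_words, n_blobs, n_iters=20):
        """Estimate translation probabilities t[word, blob] with EM."""
        # Initialize t from blob-word co-occurrence counts.
        t = np.full((n_words, n_blobs), 1e-9)
        for words, blobs in images:
            for w in words:
                np.add.at(t, (w, np.asarray(blobs)), 1.0)
        t /= t.sum(axis=0, keepdims=True)

        for _ in range(n_iters):
            counts = np.full_like(t, 1e-9)
            for words, blobs in images:
                blobs = np.asarray(blobs)
                for w in words:
                    # E-step: correspondence posterior over this image's blobs.
                    p = t[w, blobs]
                    p = p / p.sum()
                    np.add.at(counts, (w, blobs), p)  # expected counts
            # M-step: re-estimate t from the expected correspondence counts.
            t = counts / counts.sum(axis=0, keepdims=True)
        return t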

Word Prediction

• On a new image (see the sketch after this list):

– Segment

– For each region:

• Extract features

• Find the corresponding blob token using nearest neighbor

• Use the word posterior probabilities to predict words
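A sketch of this prediction step, reusing the fitted kmeans quantizer and the EM table t from the earlier sketches (top_k and all names are illustrative):

    import numpy as np

    def predict_region_words(region_feature_vecs, kmeans, t, id_to_word, top_k=3):
        """Predict candidate words for each region of a new image."""
        # Nearest-neighbor lookup of the blob token for each region.
        blobs = kmeans.predict(np.asarray(region_feature_vecs))
        predictions = []
        for b in blobs:
            posterior = t[:, b]  # p(word | blob b)
            best = np.argsort(posterior)[::-1][:top_k]
            predictions.append([(id_to_word[w], float(posterior[w])) for w in best])
        return predictions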

Refusing to Predict

• Require: p(word|blob) > threshold

– i.e., assign a null word to any blob whose best predicted word falls below the threshold (sketched below)

• This prunes the vocabulary, so a new lexicon is fit to the reduced vocabulary
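A minimal sketch of the refusal rule; the threshold value is a tuning choice, not taken from the slides:

    import numpy as np

    NULL_WORD = "<null>"  # illustrative stand-in for the null word

    def predict_or_refuse(posterior, id_to_word, threshold=0.2):
        """Best word for a blob, or the null word if confidence is too low."""
        w = int(np.argmax(posterior))
        return id_to_word[w] if posterior[w] > threshold else NULL_WORD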

Indistinguishable Words

• Visually indistinguishable:

– cat and tiger, train and locomotive

• Indistinguishable with our features:

– eagle and jet

• Entangled correspondence:

– polar – bear

– mare/foals – horse

• Solution: cluster similar words (sketched after this list)

– Obtain similarity matrix

– Compare words with symmetrised KL divergence

– Apply N-Cuts on matrix to get clusters

– Replace word with its cluster label
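A sketch of the word-clustering step, using scikit-learn's SpectralClustering as a stand-in for N-Cuts; the row normalization assumes a uniform prior over blobs, and n_clusters is a tuning choice:

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def cluster_words(t, n_clusters=50, eps=1e-12):
        """Cluster words whose distributions over blobs are similar.
        t: t[word, blob] translation table from EM."""
        p = t / t.sum(axis=1, keepdims=True)   # rough p(blob | word) per row
        logp = np.log(p + eps)
        n = p.shape[0]
        d = np.zeros((n, n))
        for i in range(n):
            # Symmetrised KL: sum_k (p_i - p_j)(log p_i - log p_j).
            d[i] = ((p[i] - p) * (logp[i] - logp)).sum(axis=1)
        sim = np.exp(-d)                       # turn divergences into similarities
        labels = SpectralClustering(n_clusters=n_clusters,
                                    affinity="precomputed").fit_predict(sim)
        return labels                          # cluster id for each word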

Experiments

• Train with 4500 Corel images

– 4-5 words for each image

– 371 words in vocabulary

– 5-10 regions per image

– 500 blobs

• Test on 500 images

Auto-Annotation

• Determine most likely word for each blob

• If the probability of the word is greater than some threshold, use it in the annotation (sketched below)
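A short sketch combining the prediction and thresholding steps into an image-level annotation, reusing names from the earlier sketches:

    import numpy as np

    def annotate_image(region_feature_vecs, kmeans, t, id_to_word, threshold=0.2):
        """Annotation = union of the confident per-blob word predictions."""
        blobs = kmeans.predict(np.asarray(region_feature_vecs))
        words = set()
        for b in blobs:
            posterior = t[:, b]
            w = int(np.argmax(posterior))
            if posterior[w] > threshold:
                words.add(id_to_word[w])
        return words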

Measuring Performance

• Do we predict the right words?

Region Naming / Correspondence

Successful Results


Unsuccessful Results

Merging Regions

Statistical Solutions

• pLSA

• LDA