Learning Semantics with Less Supervision
Agenda
• Beyond Fixed Keypoints
• Beyond Keypoints
• Open Discussion
Part Discovery from Partial Correspondence
[Subhransu Maji and Gregory Shakhnarovich, CVPR 2013]
Keypoints in diverse categories
Where are the keypoints? Can you name them?
Does the name of a keypoint matter?
Maji and Shakhnarovich HCOMP’12
We can mark correspondences without naming parts
Annotation interface on MTurk
Example landmarks are provided:
Example annotations
Annotators mark 5 landmark pairs on average
Are the landmarks consistent across annotators?
Yes
Semantic part discovery
Given a window in the first image, we can find the corresponding window in the second image:
propagate correspondences in the “semantic graph”
Semantic part discovery
Discover parts using breadth-first traversal
Iter 0 Iter 1 Iter 2
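The breadth-first traversal over the semantic graph can be sketched as follows. The graph representation (window ids mapped to the windows annotators linked them to) is a hypothetical simplification for illustration; the actual method operates on image windows and their annotated correspondences.

```python
from collections import deque

def discover_part(seed, correspondences):
    """Breadth-first traversal of the 'semantic graph': starting from a
    seed window, follow annotator-marked correspondences to collect
    matching windows in other images.

    `correspondences` maps a window id to the window ids annotators
    linked it to (hypothetical representation)."""
    part, frontier = {seed}, deque([seed])
    while frontier:
        window = frontier.popleft()
        for match in correspondences.get(window, []):
            if match not in part:
                part.add(match)
                frontier.append(match)  # expand one more hop (Iter 1, Iter 2, ...)
    return part

# Toy graph: window 'a' in image 1 corresponds to 'b' in image 2, etc.
graph = {'a': ['b'], 'b': ['c'], 'c': [], 'd': ['e']}
part = discover_part('a', graph)  # collects a, b, c; never reaches d, e
```

Each BFS iteration corresponds to one expansion step ("Iter 0, Iter 1, Iter 2" above), which is why errors can accumulate with depth and the appearance model below is needed.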
The semantic graph alone is not good enough
Trained using latent LDA (latent variables: scale, translation, membership)
Graph only Graph + appearance
Semantic part discovery
Graph only Graph + Appearance
Examples of learned parts
Part-based representation
[Figure: query image and the part’s other activations on the training set]
Detecting church buildings: individual parts
better seeds
graph mining
Detecting church buildings: collection of parts
• Detection is challenging due to structural variability
• Latent LDA parts + voting: AP = 39.9%; DPM: AP = 34.7%
Label Transfer
Ask users to label parts where it makes sense:
-> arch
-> tower
-> window
Transfer labels on test images:
Agenda
• Beyond Fixed Keypoints
• Beyond Keypoints
• Open Discussion
Unsupervised Discovery of Mid-Level Discriminative Patches
Saurabh Singh, Abhinav Gupta and Alexei Efros, ECCV12
Can we get nice parts without supervision?
• Idea 0: K-means clustering in HOG space
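Idea 0 can be sketched with a plain k-means loop over patch descriptors. This is a minimal numpy implementation assuming the HOG descriptors have already been extracted; the random data below is a stand-in for real features.

```python
import numpy as np

def kmeans(feats, k, iters=10, seed=0):
    """Plain k-means, standing in for Idea 0: cluster the HOG
    descriptors of randomly sampled image patches. `feats` is (n, d)."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)].copy()
    for _ in range(iters):
        # assign each patch to its nearest center (squared Euclidean)
        dists = ((feats[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # recompute each center as the mean of its assigned patches
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(0)
    return labels, centers

# 200 fake 36-dimensional "HOG" descriptors (placeholder data)
X = np.random.default_rng(1).normal(size=(200, 36))
labels, centers = kmeans(X, k=5)
```

Euclidean distance in HOG space is a poor perceptual metric, which is exactly why this baseline produces noisy clusters and motivates the discriminative variants below.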
Still not good enough
• The SVM memorizes bad examples and still scores them highly
• However, the space of bad examples is much more diverse
• So we can avoid overfitting if we train on a training subset but look for patches on a validation subset
Why does K-means on HOG fail?
• Chicken & egg problem:
– If we know that a set of patches are visually similar, we can easily learn a distance metric for them
– If we know the distance metric, we can easily find other members
Idea 1: Discriminative Clustering
• Start with K-means
• Train a discriminative classifier for the distance function, using all other classes as negative examples
• Re-assign patches to the clusters whose classifiers give the highest scores
• Repeat
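The alternation in Idea 1 can be sketched as below. As a hedge: the paper trains per-cluster SVMs, while this sketch substitutes a one-vs-rest least-squares linear scorer so the example stays self-contained; the random features are placeholders.

```python
import numpy as np

def discriminative_clustering(X, k, iters=5, seed=0):
    """Idea 1 sketch: alternate between (a) training a linear scorer per
    cluster, with all other clusters as negatives, and (b) re-assigning
    each patch to the cluster whose scorer fires highest.
    A least-squares classifier stands in for the paper's SVM."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, len(X))          # init (k-means in the paper)
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append a bias column
    for _ in range(iters):
        # one-vs-rest +/-1 targets: column j is +1 for cluster j's members
        Y = (labels[:, None] == np.arange(k)) * 2.0 - 1
        W = np.linalg.lstsq(Xb, Y, rcond=None)[0]  # train k linear scorers
        labels = (Xb @ W).argmax(1)                # re-assign to top scorer
    return labels

X = np.random.default_rng(1).normal(size=(100, 8))
labels = discriminative_clustering(X, k=3)
```

Because training and re-assignment use the same data, the scorers can memorize their own members, which is the overfitting problem Idea 3 addresses with a train/validation split.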
Idea 2: Discriminative Clustering+
• Start with K-means or kNN
• Train a discriminative classifier for the distance function, using detection
• Detect the patches and assign them to the top-k clusters
• Repeat
Can we get good parts without supervision?
• What makes a good part?
– Must occur frequently in one class (representative)
– Must not occur frequently in all classes (discriminative)
Discriminative Clustering+
Discriminative Clustering+
Idea 3: Discriminative Clustering++
• Split the discovery dataset into two equal parts (training and validation)
• Train on the training subset
• Run the trained classifiers on the validation set to collect examples
• Exchange the training and validation sets
• Repeat
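The train/validation swap of Idea 3 can be sketched as follows. As before, a least-squares scorer stands in for the per-cluster SVM, and the data split is simplified to two fixed halves; this is an illustrative sketch, not the paper's exact procedure.

```python
import numpy as np

def cross_val_rounds(X, k, rounds=4, seed=0):
    """Idea 3 sketch: in each round, train per-cluster linear scorers on
    one half of the discovery data, re-label patches only on the other
    (held-out) half, then swap the roles of the halves."""
    rng = np.random.default_rng(seed)
    halves = np.array_split(rng.permutation(len(X)), 2)
    labels = rng.integers(0, k, len(X))
    Xb = np.hstack([X, np.ones((len(X), 1))])    # features + bias column
    for r in range(rounds):
        train, val = halves[r % 2], halves[(r + 1) % 2]
        Y = (labels[train, None] == np.arange(k)) * 2.0 - 1
        W = np.linalg.lstsq(Xb[train], Y, rcond=None)[0]
        # collect members only on the held-out half, so a scorer is
        # never judged on patches it could have memorized
        labels[val] = (Xb[val] @ W).argmax(1)
    return labels

X = np.random.default_rng(2).normal(size=(60, 5))
labels = cross_val_rounds(X, k=3)
```

The key design choice is that scoring always happens on data the classifier never trained on, directly countering the memorization failure mode described earlier.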
Discriminative Clustering++
Doublets: Discover second-order relationships
• Start with high-scoring patches
• Find spatial correlations to other (weaker) patches
• Rank the potential doublets on the validation set
Doublets
AP on MIT Indoor-67 scene recognition dataset
Blocks that shout: Distinctive Parts for Scene Classification
Juneja, Vedaldi, Jawahar and Zisserman, CVPR13
bookstore
buffet
computer room
closet
Three steps
• Seeding (proposing initial parts)
• Expansion (learning part detectors)
• Selection (identifying good parts)
Step 1: Seeding
• Segment the image
• Find proposal regions based on “objectness”
• Compute HOG features for each region
Step 2: Expansion
• Train an Exemplar SVM for each seed region [Malisiewicz et al]
• Apply it on validation set to collect more examples
• Retrain and repeat
Step 3: Selection
• Good parts should occur frequently in a small number of classes but infrequently in the rest
• Collect top 5 parts from each validation image, sort occurrences of each part by score and keep the top r
• Compute the entropy for each part over the class distribution. Retain lowest-entropy parts
• Filter out any parts too similar to others (based on cosine similarity of their SVM weights)
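The entropy-rank criterion in Step 3 can be sketched directly. The occurrence counts below are toy numbers for illustration; in the method they come from part detections on validation images.

```python
import numpy as np

def entropy_rank(part_class_counts):
    """Step 3 sketch: rank parts by the entropy of their occurrence
    distribution over scene classes. Low entropy means the part fires
    in few classes, i.e. it is discriminative.
    `part_class_counts` is a (num_parts, num_classes) count matrix."""
    p = part_class_counts / part_class_counts.sum(1, keepdims=True)
    # 0 * log 0 is treated as 0 via the where-guard inside the log
    ent = -(p * np.log2(np.where(p > 0, p, 1))).sum(1)
    return np.argsort(ent)  # lowest-entropy (best) parts first

counts = np.array([[40, 1, 1],    # fires almost only in class 0: best
                   [10, 10, 10],  # fires uniformly everywhere: worst
                   [20, 15, 2]])  # in between
order = entropy_rank(counts)      # part 0 first, part 1 last
```

The redundancy filter mentioned above would then walk this ranked list and drop any part whose SVM weight vector has high cosine similarity with an already-kept part.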
Features and learning
• Features: Explored Dense RootSIFT, BoW, LLS, Improved Fisher Vectors
• Non-linear SVM (sqrt kernel)
Results on MIT Indoor-67
                 Singh et al.                     Juneja et al.
Seeding          K-means on HOG                   Exemplar SVM
Feature space    HOG                              IFV
SVM              Linear                           Non-linear
Selection        Purity & discriminativeness      Entropy rank
                 (penalizes parts that perform    (allows parts that work
                 well for multiple clusters)      for multiple clusters)
AP on MIT 67     49.4                             61.1
Learning Collections of Parts for Object Recognition
[Endres, Shih, Jiaa and Hoiem, CVPR13]
Overview of the method
• Seeding: random samples, including the full bounding box and sub-window boxes
• Expansion: Exemplar SVM, fast training (using LDA)
• Selection:
  • Greedy method: pick parts so that each training example is explained by some part
  • Appearance consistency: include parts that have high SVM scores
  • Spatial consistency: prefer parts that come from the same location within the bounding box
• Training and detection:
  • Boosting over Category Independent Object Proposals [Endres & Hoiem]
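The greedy selection step, which picks parts until every training example is explained, is essentially greedy set cover. A minimal sketch, where `part_detections` (a hypothetical representation) maps each part to the set of training-example indices it fires on:

```python
def greedy_cover(part_detections, n_examples):
    """Greedy selection sketch: repeatedly pick the part whose
    detections explain the most still-uncovered training examples,
    until every example is covered or no part adds coverage."""
    uncovered = set(range(n_examples))
    chosen = []
    while uncovered:
        # part with the largest marginal gain on uncovered examples
        best = max(part_detections,
                   key=lambda p: len(part_detections[p] & uncovered))
        gain = part_detections[best] & uncovered
        if not gain:
            break  # remaining examples cannot be explained by any part
        chosen.append(best)
        uncovered -= gain
    return chosen

# Toy detections: part 'A' fires on examples 0-2, etc.
parts = {'A': {0, 1, 2}, 'B': {2, 3}, 'C': {3, 4}, 'D': {4}}
picked = greedy_cover(parts, 5)  # 'A' first (covers 3), then 'C'
```

The actual method additionally weighs appearance and spatial consistency when choosing among parts; this sketch isolates only the coverage objective.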
Results on PASCAL 2010 detection
Averages of patches on the top 15 detections on the validation set for a set of parts
Agenda
• Beyond Fixed Keypoints
• Beyond Keypoints
• Open Discussion
Gender Recognition on Labeled Faces in the Wild

Method                     Gender AP
Kumar et al, ICCV 2009     95.52
Frontal Face poselet       96.43
Poselets + Deep Learning   99.54

Much easier dataset – no occlusion, high resolution, centered frontal faces
Male or female?
[Zhang et al, arXiv:1311.5591]
Poselets vs DPMs vs Discriminative Patches
                      DPMs                   Poselets                     Discriminative Patches
Approach              Parametric             Non-parametric               Non-parametric
Speed                 Faster (fewer types)   Slower                       Slower (many types)
Redundancy            Little                 A lot (improves accuracy)    A lot
Spatial model         Sophisticated          Primitive (threshold)        Primitive
Supervision           Needs 2 keypoints      Needs more keypoints (10+)   No supervision
Multi-scale signal?   Two scale levels       Yes, multiple scales         Yes
Jointly trained       Yes                    No                           No
Attached semantics    Primitive              Sophisticated                Medium
Supervision in parts
[Figure: methods (SIFT, ISM, Discriminative Patches, DPMs, Poselets) placed on a spectrum from unsupervised to strongly supervised]
Questions for open discussion
• What is the future for mid-level parts?
• More supervision vs. less supervision?
• Should low-level parts be hard-coded or jointly trained?
• Parametric vs. non-parametric approaches?
• Parts with/without associated semantics?