Learning Semantics with Less Supervision
Agenda
• Beyond Fixed Keypoints
• Beyond Keypoints
• Open Discussion
Part Discovery from Partial Correspondence
[Subhransu Maji and Gregory Shakhnarovich, CVPR 2013]
Keypoints in diverse categories
Where are the keypoints? Can you name them?
Does the name of a keypoint matter?
Maji and Shakhnarovich HCOMP’12
We can mark correspondences without naming parts
Annotation interface on MTurk
Example landmarks are provided:
Example annotations
Annotators mark 5 landmark pairs on average
Are the landmarks consistent across annotators?
Yes
Semantic part discovery
Given a window in the first image, we can find the corresponding window in the second image:
propagate correspondences in the “semantic graph”
Semantic part discovery
Discover parts using breadth-first traversal
Iter 0 Iter 1 Iter 2
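The breadth-first traversal over the semantic graph can be sketched as follows. The graph representation (window ids mapped to the windows annotators linked them to) is a hypothetical simplification for illustration; the actual method operates on image windows and their annotated correspondences.

```python
from collections import deque

def discover_part(seed, correspondences):
    """Breadth-first traversal of the 'semantic graph': starting from a
    seed window, follow annotator-marked correspondences to collect
    matching windows in other images.

    `correspondences` maps a window id to the window ids annotators
    linked it to (hypothetical representation)."""
    part, frontier = {seed}, deque([seed])
    while frontier:
        window = frontier.popleft()
        for match in correspondences.get(window, []):
            if match not in part:
                part.add(match)
                frontier.append(match)  # expand one more hop (Iter 1, Iter 2, ...)
    return part

# Toy graph: window 'a' in image 1 corresponds to 'b' in image 2, etc.
graph = {'a': ['b'], 'b': ['c'], 'c': [], 'd': ['e']}
part = discover_part('a', graph)  # collects a, b, c; never reaches d, e
```

Each BFS iteration corresponds to one expansion step ("Iter 0, Iter 1, Iter 2" above), which is why errors can accumulate with depth and the appearance model below is needed.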
The semantic graph alone is not good enough
Trained using latent LDA (latent variables: scale, translation, membership)
Graph only Graph + appearance
Semantic part discovery
Graph only Graph + Appearance
Examples of learned parts
Part-based representation
[Figure: query image and the part’s other activations on the training set]
Detecting church buildings: individual parts
better seeds
graph mining
Detecting church buildings: collection of parts
• Detection is challenging due to structural variability
• Latent LDA parts + voting: AP = 39.9%; DPM: AP = 34.7%
Label Transfer
Ask users to label parts where it makes sense:
-> arch
-> tower
-> window
Transfer labels on test images:
Agenda
• Beyond Fixed Keypoints
• Beyond Keypoints
• Open Discussion
Unsupervised Discovery of Mid-Level Discriminative Patches
Saurabh Singh, Abhinav Gupta and Alexei Efros, ECCV12
Can we get nice parts without supervision?
• Idea 0: K-means clustering in HOG space
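Idea 0 can be sketched with a plain k-means loop over patch descriptors. This is a minimal numpy implementation assuming the HOG descriptors have already been extracted; the random data below is a stand-in for real features.

```python
import numpy as np

def kmeans(feats, k, iters=10, seed=0):
    """Plain k-means, standing in for Idea 0: cluster the HOG
    descriptors of randomly sampled image patches. `feats` is (n, d)."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)].copy()
    for _ in range(iters):
        # assign each patch to its nearest center (squared Euclidean)
        dists = ((feats[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # recompute each center as the mean of its assigned patches
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(0)
    return labels, centers

# 200 fake 36-dimensional "HOG" descriptors (placeholder data)
X = np.random.default_rng(1).normal(size=(200, 36))
labels, centers = kmeans(X, k=5)
```

Euclidean distance in HOG space is a poor perceptual metric, which is exactly why this baseline produces noisy clusters and motivates the discriminative variants below.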
Still not good enough
• The SVM memorizes bad examples and still scores them highly
• However, the space of bad examples is much more diverse
• So we can avoid overfitting if we train on a training subset but look for patches on a validation subset
Why does K-means on HOG fail?
• Chicken & egg problem:
– If we know that a set of patches are visually similar, we can easily learn a distance metric for them
– If we know the distance metric, we can easily find other members
Idea 1: Discriminative Clustering
• Start with K-means
• Train a discriminative classifier for the distance function, using all other classes as negative examples
• Re-assign patches to the clusters whose classifiers give the highest scores
• Repeat
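The alternation in Idea 1 can be sketched as below. As a hedge: the paper trains per-cluster SVMs, while this sketch substitutes a one-vs-rest least-squares linear scorer so the example stays self-contained; the random features are placeholders.

```python
import numpy as np

def discriminative_clustering(X, k, iters=5, seed=0):
    """Idea 1 sketch: alternate between (a) training a linear scorer per
    cluster, with all other clusters as negatives, and (b) re-assigning
    each patch to the cluster whose scorer fires highest.
    A least-squares classifier stands in for the paper's SVM."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, len(X))          # init (k-means in the paper)
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append a bias column
    for _ in range(iters):
        # one-vs-rest +/-1 targets: column j is +1 for cluster j's members
        Y = (labels[:, None] == np.arange(k)) * 2.0 - 1
        W = np.linalg.lstsq(Xb, Y, rcond=None)[0]  # train k linear scorers
        labels = (Xb @ W).argmax(1)                # re-assign to top scorer
    return labels

X = np.random.default_rng(1).normal(size=(100, 8))
labels = discriminative_clustering(X, k=3)
```

Because training and re-assignment use the same data, the scorers can memorize their own members, which is the overfitting problem Idea 3 addresses with a train/validation split.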
Idea 2: Discriminative Clustering+
• Start with K-means or kNN
• Train a discriminative classifier for the distance function, using detection
• Detect the patches and assign them to the top-k clusters
• Repeat
Can we get good parts without supervision?
• What makes a good part?
– Must occur frequently in one class (representative)
– Must not occur frequently in all classes (discriminative)
Discriminative Clustering+
Discriminative Clustering+
Idea 3: Discriminative Clustering++
• Split the discovery dataset into two equal parts (training and validation)
• Train on the training subset
• Run the trained classifiers on the validation set to collect examples
• Exchange the training and validation sets
• Repeat
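The train/validation swap of Idea 3 can be sketched as follows. As before, a least-squares scorer stands in for the per-cluster SVM, and the data split is simplified to two fixed halves; this is an illustrative sketch, not the paper's exact procedure.

```python
import numpy as np

def cross_val_rounds(X, k, rounds=4, seed=0):
    """Idea 3 sketch: in each round, train per-cluster linear scorers on
    one half of the discovery data, re-label patches only on the other
    (held-out) half, then swap the roles of the halves."""
    rng = np.random.default_rng(seed)
    halves = np.array_split(rng.permutation(len(X)), 2)
    labels = rng.integers(0, k, len(X))
    Xb = np.hstack([X, np.ones((len(X), 1))])    # features + bias column
    for r in range(rounds):
        train, val = halves[r % 2], halves[(r + 1) % 2]
        Y = (labels[train, None] == np.arange(k)) * 2.0 - 1
        W = np.linalg.lstsq(Xb[train], Y, rcond=None)[0]
        # collect members only on the held-out half, so a scorer is
        # never judged on patches it could have memorized
        labels[val] = (Xb[val] @ W).argmax(1)
    return labels

X = np.random.default_rng(2).normal(size=(60, 5))
labels = cross_val_rounds(X, k=3)
```

The key design choice is that scoring always happens on data the classifier never trained on, directly countering the memorization failure mode described earlier.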
Discriminative Clustering++
Doublets: Discover second-order relationships
• Start with high-scoring patches
• Find spatial correlations to other (weaker) patches
• Rank the potential doublets on the validation set
Doublets
AP on MIT Indoor-67 scene recognition dataset
Blocks that shout: Distinctive Parts for Scene Classification
Juneja, Vedaldi, Jawahar and Zisserman, CVPR13
bookstore
buffet
computer room
closet
Three steps
• Seeding (proposing initial parts)
• Expansion (learning part detectors)
• Selection (identifying good parts)
Step 1: Seeding
• Segment the image
• Find proposal regions based on “objectness”
• Compute HOG features for each region
Step 2: Expansion
• Train an Exemplar SVM for each seed region [Malisiewicz et al]
• Apply it on validation set to collect more examples
• Retrain and repeat
Step 3: Selection
• Good parts should occur frequently in a small number of classes but infrequently in the rest
• Collect top 5 parts from each validation image, sort occurrences of each part by score and keep the top r
• Compute the entropy for each part over the class distribution. Retain lowest-entropy parts
• Filter out any parts too similar to others (based on cosine similarity of their SVM weights)
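The entropy-rank criterion in Step 3 can be sketched directly. The occurrence counts below are toy numbers for illustration; in the method they come from part detections on validation images.

```python
import numpy as np

def entropy_rank(part_class_counts):
    """Step 3 sketch: rank parts by the entropy of their occurrence
    distribution over scene classes. Low entropy means the part fires
    in few classes, i.e. it is discriminative.
    `part_class_counts` is a (num_parts, num_classes) count matrix."""
    p = part_class_counts / part_class_counts.sum(1, keepdims=True)
    # 0 * log 0 is treated as 0 via the where-guard inside the log
    ent = -(p * np.log2(np.where(p > 0, p, 1))).sum(1)
    return np.argsort(ent)  # lowest-entropy (best) parts first

counts = np.array([[40, 1, 1],    # fires almost only in class 0: best
                   [10, 10, 10],  # fires uniformly everywhere: worst
                   [20, 15, 2]])  # in between
order = entropy_rank(counts)      # part 0 first, part 1 last
```

The redundancy filter mentioned above would then walk this ranked list and drop any part whose SVM weight vector has high cosine similarity with an already-kept part.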
Features and learning
• Features: Explored Dense RootSIFT, BoW, LLS, Improved Fisher Vectors
• Non-linear SVM (sqrt kernel)
Results on MIT Indoor-67
                 Singh et al.                     Juneja et al.
Seeding          K-means on HOG                   Exemplar SVM
Feature space    HOG                              IFV
SVM              Linear                           Non-linear
Selection        Purity & discriminativeness      Entropy rank
                 (penalizes parts that perform    (allows parts that work
                 well for multiple clusters)      for multiple clusters)
AP on MIT 67     49.4                             61.1
Learning Collections of Parts for Object Recognition
[Endres, Shih, Jiaa and Hoiem, CVPR13]
Overview of the method
• Seeding: random samples, including the full bounding box and sub-window boxes
• Expansion: Exemplar SVM, fast training (using LDA)
• Selection:
  • Greedy method: pick parts so that each training example is explained by some part
  • Appearance consistency: include parts that have high SVM scores
  • Spatial consistency: prefer parts that come from the same location within the bounding box
• Training and detection:
  • Boosting over Category Independent Object Proposals [Endres & Hoiem]
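The greedy selection step, which picks parts until every training example is explained, is essentially greedy set cover. A minimal sketch, where `part_detections` (a hypothetical representation) maps each part to the set of training-example indices it fires on:

```python
def greedy_cover(part_detections, n_examples):
    """Greedy selection sketch: repeatedly pick the part whose
    detections explain the most still-uncovered training examples,
    until every example is covered or no part adds coverage."""
    uncovered = set(range(n_examples))
    chosen = []
    while uncovered:
        # part with the largest marginal gain on uncovered examples
        best = max(part_detections,
                   key=lambda p: len(part_detections[p] & uncovered))
        gain = part_detections[best] & uncovered
        if not gain:
            break  # remaining examples cannot be explained by any part
        chosen.append(best)
        uncovered -= gain
    return chosen

# Toy detections: part 'A' fires on examples 0-2, etc.
parts = {'A': {0, 1, 2}, 'B': {2, 3}, 'C': {3, 4}, 'D': {4}}
picked = greedy_cover(parts, 5)  # 'A' first (covers 3), then 'C'
```

The actual method additionally weighs appearance and spatial consistency when choosing among parts; this sketch isolates only the coverage objective.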
Results on PASCAL 2010 detection
Averages of patches on the top 15 detections on the validation set for a set of parts
Agenda
• Beyond Fixed Keypoints
• Beyond Keypoints
• Open Discussion
Gender Recognition on Labeled Faces in the Wild

Method                     Gender AP
Kumar et al, ICCV 2009     95.52
Frontal Face poselet       96.43
Poselets + Deep Learning   99.54

Much easier dataset – no occlusion, high resolution, centered frontal faces
Male or female?
[Zhang et al, arXiv:1311.5591]
Poselets vs DPMs vs Discriminative Patches
                      DPMs                   Poselets                     Discriminative Patches
Approach              Parametric             Non-parametric               Non-parametric
Speed                 Faster (fewer types)   Slower                       Slower (many types)
Redundancy            Little                 A lot (improves accuracy)    A lot
Spatial model         Sophisticated          Primitive (threshold)        Primitive
Supervision           Needs 2 keypoints      Needs more keypoints (10+)   No supervision
Multi-scale signal?   Two scale levels       Yes, multiple scales         Yes
Jointly trained       Yes                    No                           No
Attached semantics    Primitive              Sophisticated                Medium
Supervision in parts
[Figure: methods (SIFT, ISM, Discriminative Patches, DPMs, Poselets) placed on a spectrum from unsupervised to strongly supervised]
Questions for open discussion
• What is the future for mid-level parts?
• More supervision vs. less supervision?
• Should low-level parts be hard-coded or jointly trained?
• Parametric vs. non-parametric approaches?
• Parts with/without associated semantics?