Learning Spatial Context:
Can stuff help us find things?
Geremy Heitz, Daphne Koller
April 14, 2008, DAGS
Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, but has no specific or distinctive spatial extent or shape.
Thing (n): An object with a specific size and shape.
Outline
Sliding window object detection
What is context?
The Things and Stuff (TAS) model
Results
Object Detection
Task: Find all the cars in this image; return a “bounding box” for each.
Evaluation:
Maximize true positives
Minimize false positives
Precision-Recall tradeoff
Sliding Window Detection
Consider every bounding box:
All shifts
All scales
Possibly all rotations
Each box gets a score: D(x, y, s, θ)
Detections: local peaks in D()
Pros:
Covers the entire image
Flexible to allow a variety of D()’s
Cons:
Brute force – can be slow
Only considers features in the box
[Figure: two example windows with scores D = 1.5 and D = -0.3]
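The sliding-window loop above is simple enough to sketch directly. A minimal version (function names are illustrative; a real detector would add the scale and rotation loops, which are omitted here):

```python
import numpy as np

def sliding_window_scores(image, detector, box_w, box_h, stride=4):
    """Score every (x, y) placement of a box_w x box_h window.

    `detector` is any function mapping an image patch to a real score D.
    """
    H, W = image.shape[:2]
    scores = np.full(((H - box_h) // stride + 1, (W - box_w) // stride + 1), -np.inf)
    for i, y in enumerate(range(0, H - box_h + 1, stride)):
        for j, x in enumerate(range(0, W - box_w + 1, stride)):
            scores[i, j] = detector(image[y:y + box_h, x:x + box_w])
    return scores

def local_peaks(scores, threshold=0.0):
    """Detections = positions whose score beats the threshold and all 8 neighbours.

    Ties on flat plateaus can yield several adjacent peaks; real systems add
    non-maximum suppression on top of this.
    """
    peaks = []
    H, W = scores.shape
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            s = scores[i, j]
            if s > threshold and s >= scores[i - 1:i + 2, j - 1:j + 2].max():
                peaks.append((i, j, s))
    return peaks
```

The quadratic loop is exactly the "brute force – can be slow" con from the slide; the integral-image trick on the next slide is one way to make each window evaluation cheap.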
Features: Haar wavelets
Haar filters and integral image (Viola and Jones, ICCV 2001)
The average intensity in the block is computed with four sums, independently of the block size. BOOSTING!
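The integral-image trick can be shown in a few lines: after one cumulative-sum pass over the image, any block sum costs exactly four lookups, regardless of block size. A sketch (function names are illustrative):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; padded with a zero row/column so the
    four-lookup formula needs no boundary cases."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def block_sum(ii, y, x, h, w):
    """Sum of the h x w block with top-left corner (y, x): four lookups, O(1)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
```

Dividing `block_sum` by `h * w` gives the average intensity mentioned on the slide; Haar filter responses are differences of such block sums.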
Features: Edge fragments
Weak detector = match of edge chain(s) from a training image to the edge map of a test image
Opelt, Pinz, Zisserman, ECCV 2006
BOOSTING!
Histograms of oriented gradients
• Dalal & Triggs, CVPR 2005
• SIFT, D. Lowe, ICCV 1999
SVM!
Sliding Window Results
PASCAL Visual Object Classes Challenge
Cows 2006
score(A, B) = |A ∩ B| / |A ∪ B|
True Pos: B s.t. score(A, B) > 0.5 for some A
False Pos: B s.t. score(A, B) < 0.5 for all A
False Neg: A s.t. score(A, B) < 0.5 for all B
[Figure: ground-truth box A and detection box B]
[Figure: precision vs. recall rate, My Detector vs. INRIA-Douze]
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
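The overlap score and the precision/recall definitions above translate directly to code. A sketch for axis-aligned boxes given as (x1, y1, x2, y2) (helper names are illustrative):

```python
def iou(a, b):
    """score(A, B) = |A ∩ B| / |A ∪ B| for boxes (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)
```

A detection counts as a true positive exactly when `iou` with some ground-truth box exceeds 0.5, per the PASCAL criterion on the previous slide.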
Satellite Detection Example
Quantitative Evaluation
[Figure: recall rate vs. false positives per image]
Why does this suck?
True Positives in Context
False Positives in Context
False Positives out of Context
Context!
What is Context?
Scene-Thing: gist → car “likely”, keyboard “unlikely” (Torralba et al., 2005)
Stuff-Stuff: (Gould et al., 2008)
Thing-Thing: (Rabinovich et al., 2007)
What is Context?
Stuff-Thing: Based on intuitive “relationships”
Trees = no cars
Houses = cars nearby
Road = cars here
Things
Candidate detections: bounding box + score
Boolean R.V. Ti
Ti = 1: candidate is a positive detection
Thing-only model:
[Graphical model: image window Wi → Ti]
P(Ti = 1 | Wi) = 1 / (1 + exp(-D(Wi)))
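The thing-only model is just a logistic squashing of the raw detector score. For instance, the example scores D = 1.5 and D = -0.3 from the sliding-window slide map to probabilities above and below one half:

```python
import math

def p_thing(d_score):
    """P(T_i = 1 | W_i) = 1 / (1 + exp(-D(W_i))): logistic squashing of the
    raw sliding-window detector score."""
    return 1.0 / (1.0 + math.exp(-d_score))
```

This keeps the detector as a black box: any D() produces a calibrated-looking probability for the candidate, which the rest of the TAS model can then revise using context.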
Stuff
Coherent image regions
Coarse “superpixels”
Feature vector Fj in R^n
Cluster label Sj
Stuff-only model: Naïve Bayes
[Graphical model: Sj → Fj]
P(Sj, Fj) = P(Sj) P(Fj | Sj)
Fj | (Sj = s) ~ N(μs, Σs)
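The naive-Bayes stuff model scores a region’s feature vector under each Gaussian cluster; normalizing gives a posterior over cluster labels. A minimal sketch, assuming full covariance matrices (names are illustrative):

```python
import numpy as np

def stuff_posterior(f, priors, means, covs):
    """P(S_j = s | F_j = f) ∝ P(S_j = s) · N(f; mu_s, Sigma_s)."""
    logps = []
    for pi, mu, cov in zip(priors, means, covs):
        d = f - mu
        _, logdet = np.linalg.slogdet(cov)
        # log prior + log Gaussian density (up to the shared constant)
        logps.append(np.log(pi) - 0.5 * (logdet + d @ np.linalg.solve(cov, d)
                                         + len(f) * np.log(2 * np.pi)))
    logps = np.array(logps)
    p = np.exp(logps - logps.max())  # subtract max for numerical stability
    return p / p.sum()
```

The arg-max of this posterior is the cluster label a region would get from the stuff-only model; in TAS the label is instead kept as a latent variable.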
Relationships
Descriptive relations: “Near”, “Above”, “In front of”, etc.
Choose a set R
Rij: relation between detection i and region j
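One way such a relation Rij could be computed from a detection’s bounding box and a region’s centroid; the relation vocabulary and the distance threshold here are illustrative choices, not the paper’s exact set R:

```python
def relation(box, region_centroid, far_thresh=50.0):
    """Map a (detection box, region centroid) pair to a discrete relation R_ij.

    box is (x1, y1, x2, y2); vocabulary {"In", "Far", "Left", "Right",
    "Above", "Below"} and far_thresh are hypothetical examples.
    """
    x1, y1, x2, y2 = box
    cx, cy = region_centroid
    if x1 <= cx <= x2 and y1 <= cy <= y2:
        return "In"
    bx, by = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    dx, dy = cx - bx, cy - by
    if (dx * dx + dy * dy) ** 0.5 > far_thresh:
        return "Far"
    if abs(dx) >= abs(dy):
        return "Left" if dx < 0 else "Right"
    return "Above" if dy < 0 else "Below"
```

Because the relations are computed deterministically from geometry, they are observed variables in the model; only the region labels Sj remain hidden.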
Relationship model
[Figure: a detection Ti related to labeled regions S72 = Trees, S4 = Houses, S10 = Road via relations Rij]
The TAS Model
[Plate diagram: Wi → Ti; Sj → Fj; Rij depends on Ti and Sj; plate of N candidate windows, plate of J image regions]
Wi: Window
Ti: Object Presence
Sj: Region Label
Fj: Region Features
Rij: Relationship
Unrolled Model
[Figure: candidate windows T1, T2, T3 and image regions S1–S5, linked by relations R11 = “Left”, R13 = “In”, R21 = “Above”, R31 = “Left”, R33 = “In”]
Learning
Everything observed except the Sj’s
Expectation-Maximization
Mostly discrete variables
Like Mixture-of-Gaussians
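The mixture-of-Gaussians analogy can be made concrete with a toy EM loop: a two-component 1-D mixture with unit variances and equal priors, where only the means are learned. This is a stand-in for learning with the hidden Sj labels, not the full TAS learning procedure:

```python
import math

def em_1d_gmm(xs, mus, iters=50):
    """EM for a toy two-component 1-D Gaussian mixture (unit variances,
    equal priors); only the means are re-estimated."""
    for _ in range(iters):
        # E-step: soft responsibility of component 0 for each point
        resp = []
        for x in xs:
            p0 = math.exp(-0.5 * (x - mus[0]) ** 2)
            p1 = math.exp(-0.5 * (x - mus[1]) ** 2)
            resp.append(p0 / (p0 + p1))
        # M-step: re-estimate each mean from its responsibility-weighted points
        w0 = sum(resp)
        w1 = len(xs) - w0
        mus = [sum(r * x for r, x in zip(resp, xs)) / w0,
               sum((1 - r) * x for r, x in zip(resp, xs)) / w1]
    return mus
```

In TAS the E-step computes posteriors over the discrete Sj’s, and the M-step re-estimates the cluster Gaussians, the priors, and the relationship tables; all of those updates are closed-form, just like the mean updates here.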
An ode to directed models:
Oh directed probabilistic models
You are so beautiful and palatable
Because unlike your undirected friends
Your parameters are so very interpretable
- Unknown Russian Mathematician
(Translated by Geremy Heitz)
Learned Satellite Clusters
Inference
Goal: the posterior over the Ti’s given the observed windows, features, and relationships
Gibbs Sampling: easy to sample the Ti’s given the Sj’s, and vice versa
Could do distributional particles
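The alternating scheme can be sketched as a generic two-block Gibbs sampler. The two conditional distributions are passed in as functions here, standing in for the model-specific TAS conditionals:

```python
import random

def gibbs(T, S, p_t_given_s, p_s_given_t, iters=100, rng=random):
    """Alternate sampling each T_i from P(T_i | S, ...) and each S_j from
    P(S_j | T, ...).

    p_t_given_s(i, S) -> probability that T_i = 1;
    p_s_given_t(j, T) -> list of probabilities over S_j's values.
    Both conditionals are hypothetical, model-specific inputs.
    """
    for _ in range(iters):
        for i in range(len(T)):
            T[i] = 1 if rng.random() < p_t_given_s(i, S) else 0
        for j in range(len(S)):
            probs = p_s_given_t(j, T)
            u, acc = rng.random(), 0.0
            for k, pk in enumerate(probs):
                acc += pk
                if u < acc:
                    S[j] = k
                    break
    return T, S
```

Averaging the sampled Ti values over iterations (after burn-in) estimates each candidate’s posterior probability of being a true detection.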
Results - Satellite
Prior: Detector Only
Posterior: TAS Model
Region Labels
Results - Satellite
[Figure: recall rate vs. false positives per image, Base Detector vs. TAS Model]
PASCAL VOC Challenge
2005 Challenge: 2232 images split into {train, val, test}; Cars, Bikes, People, and Motorbikes
2006: 5304 images split into {train, test}; 12 classes, we use Cows and Sheep
Results reported for the challenge with state-of-the-art approaches
Caveat: They didn’t get to see the test set before the challenge, but I did!
Results – PASCAL
Cows
Results – PASCAL
Bicycles
Cluster #3
Results – PASCAL
Good examples
Discover “true positives”
Remove “false positives”
Results – VOC 2005
[Figure: precision vs. recall rate for Motorbike, Car, Bicycle, and People; TAS Model vs. Base Detectors vs. INRIA-Dalal]
Results – VOC 2006
[Figure: precision vs. recall rate for Cow and Sheep; TAS Model vs. Base Detectors vs. INRIA-Douze]
Conclusions
Detectors can benefit from context
The TAS model captures an important type of context
We can improve any sliding-window detector using TAS
The TAS model can be interpreted and matches our intuitions
Geremy is smart
Detections in Context
Task: Identify all cars in the satellite image
Idea: The surrounding context adds info to the local window detector
[Figure: local window detector + stuff labels (Houses, Road) = detections in context]
Equations
P(Ti = 1 | Wi) = 1 / (1 + exp(-D(Wi)))
P(Sj, Fj) = P(Sj) P(Fj | Sj)
Fj | (Sj = s) ~ N(μs, Σs)