c280, computer vision - peopletrevor/cs280notes/16context.pdf · c280, computer vision prof. trevor...
TRANSCRIPT
![Page 1: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/1.jpg)
C280 Computer VisionC280, Computer Vision
Prof. Trevor [email protected]
Lecture 16: Recognition in Context
![Page 2: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/2.jpg)
Last Lecture• Naïve‐Bayes Nearest Neighbor (Irani)
• ISM (Liebe)• ISM (Liebe)
• Constellation Models (Fergus)
• Transformed LDA Models (Sudderth)
• 3‐D view models (Saravese)
![Page 3: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/3.jpg)
This week• Two last topics in recognition:
– ContextContext
– Articulation
![Page 4: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/4.jpg)
Today: Three papers on computational models of context:
• A. Torralba, K. P. Murphy, and W. T. Freeman, "Contextual models for object detection using boosted random fields," in Advances in Neural Information Processing Systems 17 (NIPS), 2005.
• D. Hoiem, A. A. Efros, and M. Hebert, "Putting objects in perspective," in Computer Vision and P R i i 2006Pattern Recognition, 2006
• G. Heitz and D. Koller, "Learning spatial context: Using stuff to find things," in ECCV 2008, pp. 30‐43.
![Page 5: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/5.jpg)
Why is detection hard?
y
x
y
10 000 patches/object/image
1,000,000 images/dayPlus, we want to do this for ~ 1000 objects
10,000 patches/object/image
time
![Page 6: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/6.jpg)
Is local information enough?
Slide credit: A. Torralba
![Page 7: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/7.jpg)
With hundreds of categories
roadtablechairkeyboard tablecarkeyboard
road
If we have 1000 categories (detectors), and each detector produces 1 fa every 10 images, we will have 100 false alarms per image… pretty much garbage…
Slide credit: A. Torralba
![Page 8: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/8.jpg)
Is local information even enough?
Slide credit: A. Torralba
![Page 9: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/9.jpg)
Is local information even enough?
Information Contextual features
Local features
DistanceSlide credit: A. Torralba
![Page 10: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/10.jpg)
The system does not care about the scene, but we do…
We know there is a keyboard present in this scene even if we cannot see it clearly.,
We know there is no keyboard present in this scene
… even if there is one indeed.Slide credit: A. Torralba
![Page 11: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/11.jpg)
The multiple personalities of a blob
Slide credit: A. Torralba
![Page 12: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/12.jpg)
The multiple personalities of a blob
Slide credit: A. Torralba
![Page 13: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/13.jpg)
Slide credit: A. Torralba
![Page 14: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/14.jpg)
Slide credit: A. Torralba
![Page 15: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/15.jpg)
Slide credit: A. Torralba
![Page 16: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/16.jpg)
Look-Alikes by Joan Steiner
Slide credit: A. Torralba
![Page 17: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/17.jpg)
Look-Alikes by Joan Steiner
Slide credit: A. Torralba
![Page 18: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/18.jpg)
Look-Alikes by Joan Steiner
Slide credit: A. Torralba
![Page 19: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/19.jpg)
The context challenge
How far can you go withoutHow far can you go without using an object detector?
Slide credit: A. Torralba
![Page 20: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/20.jpg)
What are the hidden objects?
12
12
Slide credit: A. Torralba
![Page 21: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/21.jpg)
What are the hidden objects?
Chance ~ 1/30000 Slide credit: A. Torralba
![Page 22: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/22.jpg)
The importance of context
• Cognitive psychology– Palmer 1975 – Biederman 1981– …
• Computer vision– Noton and Stark (1971)– Hanson and Riseman (1978)– Barrow & Tenenbaum (1978) – Ohta, kanade, Skai (1978)– Haralick (1983)– Strat and Fischler (1991)( )– Bobick and Pinhanez (1995)– Campbell et al (1997)
Slide credit: A. Torralba
![Page 23: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/23.jpg)
Multiclass object detection andMulticlass object detection and context modeling
Antonio TorralbaAntonio Torralba
In collaboration with Kevin P. Murphy and William T. Freemanp y
![Page 24: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/24.jpg)
Object representations
Inside the object(intrinsic features)
PartsGlobal
Object size
PixelsPartsGlobal appearance
Pixels
A l & R th (02) M h dd P tl d (97) T k P tl d (91) Vid l N t Ull (03)Agarwal & Roth, (02), Moghaddam, Pentland (97), Turk, Pentland (91),Vidal-Naquet, Ullman, (03)
Heisele, et al, (01), Agarwal & Roth, (02), Kremp, Geman, Amit (02), Dorko, Schmid, (03)Fergus, Perona, Zisserman (03), Fei Fei, Fergus, Perona, (03), Schneiderman, Kanade (00), Lowe (99)Etc.
![Page 25: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/25.jpg)
Object representationsInside the object(intrinsic features)
Outside the object(contextual features)
Object size
PartsGlobal appearance
Local contextGlobal context Pixels
Kruppa & Shiele, (03), Fink & Perona (03)A l & R th (02) M h dd P tl d (97) T k P tl d (91) Vid l N t Ull (03)
pp , ( ), ( )
Carbonetto, Freitas, Barnard (03), Kumar, Hebert, (03)
He, Zemel, Carreira-Perpinan (04), Moore, Essa, Monson, Hayes (99)
Strat & Fischler (91), Murphy, Torralba & Freeman (03)
Agarwal & Roth, (02), Moghaddam, Pentland (97), Turk, Pentland (91),Vidal-Naquet, Ullman, (03)
Heisele, et al, (01), Agarwal & Roth, (02), Kremp, Geman, Amit (02), Dorko, Schmid, (03)Fergus, Perona, Zisserman (03), Fei Fei, Fergus, Perona, (03), Schneiderman, Kanade (00), Lowe (99)Etc.
![Page 26: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/26.jpg)
Previous work on context
• Strat & Fischler (91) Context defined using hand-written rules about relationships between objectsg p j
![Page 27: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/27.jpg)
Previous work on context
• Fink & Perona (03)U t t f b ti f th bj t t iUse output of boosting from other objects at previous
iterations as input into boosting for this iteration
![Page 28: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/28.jpg)
Previous work on context
• Murphy, Torralba & Freeman (03)U l b l t t t di t bj t b t th i d liUse global context to predict objects but there is no modeling
of spatial relationships between objects.
S
E1 E2
S
Op1,c1 OpN,c1Op1,c2 OpN,c2
Class 1 Class 2 Keyboards
vp1,c1 vpN,c1. . . vp1,c2 vpN,c2. . .
c2maxVc1
maxV
X1 X2vg
![Page 29: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/29.jpg)
Previous work on context
• Carbonetto, de Freitas & Barnard (04)• Enforce spatial consistency between labels using MRF• Enforce spatial consistency between labels using MRF
![Page 30: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/30.jpg)
Graphical models for image labelinglabeling
Densely connected graphs with low informative connectionsNearest neighbor grid
Want to model long-range correlations between labels
![Page 31: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/31.jpg)
Previous work on context
• He, Zemel & Carreira-Perpinan (04)U l i bl i d l di l iUse latent variables to induce long distance correlations
between labels in a Conditional Random Field (CRF)
![Page 32: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/32.jpg)
Outline of this talk
• Use global image features (as well as local features) in boosting to help object ) g p jdetection
• Learn structure of dense CRF (with longLearn structure of dense CRF (with long range connections) using boosting, to exploit spatial correlationsexploit spatial correlations
![Page 33: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/33.jpg)
Image database
• ~2500 hand labeled images with t tisegmentations
• ~30 objects and stuff
• Indoor and outdoor
• Sets of images are separated by locations and camera (digital/webcam)
• No graduate students or low-income-t d t l l it d f l b listudent-class exploited for labeling.
![Page 34: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/34.jpg)
Which objects are important?
Average gpercentage ofpixels occupiedby each object.
![Page 35: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/35.jpg)
Object representation
• Discrete/bounded/rigidScreen, car, pedestrian, bottle, …
E t d d/ b d d/d f bl• Extended/unbounded/deformableBuilding, sky, road, shelves, desk, …
We will use region labeling as a representation.
![Page 36: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/36.jpg)
Learning local features(i t i i bj t f t )(intrinsic object features)
s21 s2
n…
s31 s3
n…
road
building
s11 s1
n…
1 n
car
VLnVL
1 Pixels
We maximize the probability of the true labels using Boosting.
![Page 37: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/37.jpg)
Object local features(Borenstein & Ullman, ECCV 02)
* *x
Convolve with oriented filter
Threshold Convolve withsegmentation
fragment
Normalized correlation with an object patch
Patches from5x5 to 30x30 pixels.
![Page 38: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/38.jpg)
Results with local features
![Page 39: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/39.jpg)
Results with local features
Screen
![Page 40: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/40.jpg)
Results with local features
CCar
![Page 41: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/41.jpg)
Global context: location primingH f ith t bj t d t t ?
Context features that represent the
How far can we go without object detectors?
s2 s2…s3
1 s3n
…p
scene instead of other objects.
The global features can provide:
s11 s1
n…
s21 s2
n…
• Object presence
• Location priming
S l i iVG
• Scale priming
![Page 42: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/42.jpg)
Object global featuresFirst we create a dictionary of scene features and object locations:
Featuremap
Associatedscreen location
First we create a dictionary of scene features and object locations:
*
...Only the vertical position of the object is well constrained by the global features
![Page 43: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/43.jpg)
Object global featuresHow to compute the global features
Downsample(10x10)
* < , >
(10x10)
Downsample(8x8, 16x16, 32x32)( )
![Page 44: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/44.jpg)
Car detection with global featuresFeatures selected by boosting:
Car
Boosting round
…round
![Page 45: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/45.jpg)
Combining global and local
s2 s2…s3
1 s3n
…
s11 s1
n…
s 1 s2n
VLnVL
1 VG
ROC for same total number of features (100 boosting rounds):
car building road screen keyboard mouse deskGlobal and localOnly local
![Page 46: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/46.jpg)
Clustering of objects with local and global feature sharingand global feature sharing
Clustering with global and local featuresClustering with local features Clustering with global and local featuresClustering with local features
Objects are similar if they share local features and they appear in the same contexts.
![Page 47: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/47.jpg)
Outline of this talk
• Use global image features (as well as local features) in boosting to help object ) g p jdetection
• Learn structure of dense CRF (with longLearn structure of dense CRF (with long range connections) using boosting, to exploit spatial correlationsexploit spatial correlations
![Page 48: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/48.jpg)
Adding correlations between objects
s2 s2…s3
1 s3n
…
s11 s1
n…
s 1 s2n
VLnVL
1 VG
We need to learn
• The structure of the graph
• The pairwise potentials
![Page 49: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/49.jpg)
Learning in CRFs• Parameters
– Lafferty, McCallum, Pereira (ICML 2001)• Find global optimum using gradient methods plus exact inference
(forwards-backwards) in a chain– Kumar & Herbert, NIPS 2003
U d lik lih d i 2D CRF• Use pseudo-likelihood in 2D CRF– Carbonetto, de Freitas & Barnard (04)
• Use approximate inference (loopy BP) and pseudo-likelihood on 2D MRFMRF
• Structure– He, Zemel & Carreira-Perpinan (CVPR 04)
• Use contrastive divergence• Use contrastive divergence– Torralba, Murphy, Freeman (NIPS 04)
• Use boosting
![Page 50: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/50.jpg)
Sequentially learning the structureIteration
Final outputFinal output
![Page 51: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/51.jpg)
Sequentially learning the structureAt each iteration of boosting
•We pick a weak learner applied to the image(local or global features)
•We pick a weak learner applied to a subset of the label-beliefs at the previous iteration. These subsets are chosen from a dictionary of labeled graph fragments from the training setof labeled graph fragments from the training set.
![Page 52: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/52.jpg)
Car detection
![Page 53: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/53.jpg)
Car detection
From intrinsic features
A car out of context is less of a carFrom contextual features
![Page 54: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/54.jpg)
Screen/keyboard/mouse
![Page 55: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/55.jpg)
CascadeViola & Jones (2001)
Set to zero the beliefs of nodes with low probability of containing theSet to zero the beliefs of nodes with low probability of containing the target.
Perform message passing only on undecided nodesPerform message passing only on undecided nodes
The detection of the screen reduces the search spacethe search space for the mouse detector.
![Page 56: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/56.jpg)
Cascade
![Page 57: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/57.jpg)
Cascade
Local
C t tContext
![Page 58: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/58.jpg)
![Page 59: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/59.jpg)
Putting Objects in PerspectivePutting Objects in Perspective
Derek HoiemDerek Hoiem
Alexei A. Efros
Martial Hebert
Carnegie Mellon University
Robotics Institute
![Page 60: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/60.jpg)
Understanding an ImageUnderstanding an Image
![Page 61: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/61.jpg)
Today: Local and IndependentToday: Local and Independent
![Page 62: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/62.jpg)
What the Detector SeesWhat the Detector Sees
![Page 63: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/63.jpg)
Local Object DetectionLocal Object DetectionTrue
Detection
False DetectionsDetections
T
MissedMissed
True Detections
Local Detector: [Dalal-Triggs 2005]
![Page 64: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/64.jpg)
Work in Context Work in Context
• Image understanding in the 70’sGuzman (SEE) 1968Hansen & Riseman (VISIONS) 1978Barrow & Tenenbaum 1978
Brooks (ACRONYM) 1979Marr 1982Ohta & Kanade 1973Barrow & Tenenbaum 1978
Yakimovsky & Feldman 1973Ohta & Kanade 1973
• Recent work in 2D contextKumar & Hebert 2005 He, Zemel, Cerreira-Perpiñán 2004
Torralba, Murphy, Freeman 2004Fink & Perona 2003
pCarbonetto, Freitas, Banard 2004 Winn & Shotton 2006
![Page 65: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/65.jpg)
Real Relationships are 3DReal Relationships are 3D
CloseNot CloseClose
![Page 66: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/66.jpg)
Recent Work in 3DRecent Work in 3D
[Han & Zu 2003][Han & Zu 2003]
[Oli & T lb 2001][Oliva & Torralba 2001]
[Torralba, Murphy & Freeman 2003] [Han & Zu 2005]
![Page 67: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/67.jpg)
Objects and ScenesObjects and Scenes
• Biederman’s Relations among Objects in a Well-Formed Scene (1981):
Hock, Romanski, Galie, & Williams 1978
Scene (1981):
– Support
– Size
– Position
– Interposition– Size – Interposition
– Likelihood of Appearance
![Page 68: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/68.jpg)
Contribution of this PaperContribution of this Paper
• Biederman’s Relations among Objects in a Well-Formed Scene (1981):
Hock, Romanski, Galie, & Williams 1978
Scene (1981):
– Support
– Size
– Position
– Interposition– Size – Interposition
– Likelihood of Appearance
![Page 69: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/69.jpg)
Object SupportObject Support
![Page 70: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/70.jpg)
Surface EstimationSurface EstimationImage Support Vertical Sky
V Left V Center V Right V Porous V SolidV-Left V-Center V-Right V-Porous V-Solid
ObjectS f ?
[Hoiem, Efros, Hebert ICCV 2005]
Software available online
Surface?
Support?
![Page 71: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/71.jpg)
Object Size in the ImageObject Size in the Image
Image World
![Page 72: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/72.jpg)
Object Size ↔ Camera Viewpoint Object Size ↔ Camera Viewpoint
Input Image Loose Viewpoint PriorInput Image Loose Viewpoint Prior
![Page 73: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/73.jpg)
Object Size ↔ Camera Viewpoint Object Size ↔ Camera Viewpoint
Input Image Loose Viewpoint PriorInput Image Loose Viewpoint Prior
![Page 74: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/74.jpg)
Object Size ↔ Camera Viewpoint Object Size ↔ Camera Viewpoint
Object Position/Sizes ViewpointObject Position/Sizes Viewpoint
![Page 75: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/75.jpg)
Object Size ↔ Camera Viewpoint Object Size ↔ Camera Viewpoint
Object Position/Sizes ViewpointObject Position/Sizes Viewpoint
![Page 76: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/76.jpg)
Object Size ↔ Camera Viewpoint Object Size ↔ Camera Viewpoint
Object Position/Sizes ViewpointObject Position/Sizes Viewpoint
![Page 77: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/77.jpg)
Object Size ↔ Camera Viewpoint Object Size ↔ Camera Viewpoint
Object Position/Sizes ViewpointObject Position/Sizes Viewpoint
![Page 78: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/78.jpg)
What does surface and viewpoint say about objects?What does surface and viewpoint say about objects?say about objects?say about objects?
Image P(surfaces) P(viewpoint)
P(object) P(object | surfaces) P(object | viewpoint)
![Page 79: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/79.jpg)
What does surface and viewpoint say about objects?What does surface and viewpoint say about objects?say about objects?say about objects?
Image P(surfaces) P(viewpoint)
P(object | surfaces, viewpoint)P(object)
![Page 80: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/80.jpg)
Scene Parts Are All InterconnectedScene Parts Are All Interconnected
Objects
3D SurfacesCamera Viewpoint
![Page 81: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/81.jpg)
Input to Our AlgorithmInput to Our AlgorithmSurface Estimates Viewpoint PriorObject Detection
Local Car DetectorLocal Car Detector
Surfaces: [Hoiem-Efros-Hebert 2005]
Local Ped Detector
Local Detector: [Dalal-Triggs 2005]
![Page 82: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/82.jpg)
Scene Parts Are All InterconnectedScene Parts Are All Interconnected
Objects
3D SurfacesViewpoint
![Page 83: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/83.jpg)
Our Approximate ModelOur Approximate Model
Objects
3D SurfacesViewpoint
![Page 84: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/84.jpg)
Inference over Tree Easy with BP Inference over Tree Easy with BP Viewpoint
θ
Local Object Evidence
Local Object EvidenceObjects
o1 on...
…Local Surface
EvidenceLocal Surface
EvidenceLocal Surfaces
s1 sn
…
![Page 85: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/85.jpg)
Viewpoint estimationViewpoint estimation
Viewpoint Prior Viewpoint Final
elih
ood
kelih
ood
HorizonHeight Height Horizon
LikeLik
![Page 86: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/86.jpg)
Object detectionObject detectionCar: TP / FP
Ped: TP / FPInitial (Local) Final (Global)
Car Detection
Ped: TP / FP
4 TP / 2 FP 4 TP / 1 FP
Car Detection
4 TP / 2 FP 4 TP / 1 FP
Ped Detection
3 TP / 2 FPLocal Detector: [Dalal-Triggs 2005]
4 TP / 0 FP
![Page 87: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/87.jpg)
Experiments on LabelMe DatasetExperiments on LabelMe Dataset
• Testing with LabelMe dataset: 422 images– 923 Cars at least 14 pixels tall
– 720 Peds at least 36 pixels tall720 Peds at least 36 pixels tall
![Page 88: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/88.jpg)
Each piece of evidence improves performanceEach piece of evidence improves performanceperformanceperformance
Car Detection Pedestrian Detection
Local Detector from [Murphy-Torralba-Freeman 2003]
![Page 89: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/89.jpg)
Can be used with any detector that outputs confidencesCan be used with any detector that outputs confidencesoutputs confidencesoutputs confidences
Car Detection Pedestrian Detection
Local Detector: [Dalal-Triggs 2005] (SVM-based)
![Page 90: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/90.jpg)
Accurate Horizon EstimationAccurate Horizon Estimation
[Murphy-Torralba-Freeman 2003]
[Dalal-Triggs 2005]Horizon Prior
Median Error: 8.5% 4.5% 3.0%
] gg ]
90%90% Bound:
![Page 91: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/91.jpg)
Qualitative ResultsQualitative Results
Car: TP / FP Ped: TP / FP
Initial: 2 TP / 3 FP Final: 7 TP / 4 FP
Local Detector from [Murphy-Torralba-Freeman 2003]
![Page 92: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/92.jpg)
Qualitative ResultsQualitative Results
Car: TP / FP Ped: TP / FP
Initial: 1 TP / 14 FP Final: 3 TP / 5 FP
Local Detector from [Murphy-Torralba-Freeman 2003]
![Page 93: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/93.jpg)
Qualitative ResultsQualitative Results
Car: TP / FP Ped: TP / FP
Initial: 1 TP / 23 FP Final: 0 TP / 10 FP
Local Detector from [Murphy-Torralba-Freeman 2003]
![Page 94: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/94.jpg)
Qualitative ResultsQualitative Results
Car: TP / FP Ped: TP / FP
Initial: 0 TP / 6 FP Final: 4 TP / 3 FP
Local Detector from [Murphy-Torralba-Freeman 2003]
![Page 95: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/95.jpg)
Summary & Future WorkSummary & Future Work
ters Ped
P d
me Ped
Car
R i i 3DReasoning in 3D:• Object to object
S
meters
• Scene label• Object segmentation
![Page 96: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/96.jpg)
ConclusionConclusion
• Image understanding is a 3D problemMust be solved jointly– Must be solved jointly
• This paper is a small stepMuch remains to be done– Much remains to be done
![Page 97: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/97.jpg)
![Page 98: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/98.jpg)
Learning Spatial Context:Learning Spatial Context:Using stuff to find things
Geremy HeitzD h K llDaphne Koller
Stanford University
October 13, 2008ECCV 2008ECCV 2008
![Page 99: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/99.jpg)
Things vs. StuffFrom: Forsyth et al. Finding pictures of objects in large collections of images. Object Representation in Computer Vision, 1996.g
Stuff (n): Material defined by aThing (n): An object with a
Co pute s o , 996
Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, but has no specific or distinctive spatial
Thing (n): An object with a specific size and shape.
p pextent or shape.
![Page 100: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/100.jpg)
Finding Thingsg g
Context is key!Context is key!
![Page 101: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/101.jpg)
OutlineOutline
What is Context? What is Context? The Things and Stuff (TAS) model Results
![Page 102: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/102.jpg)
Satellite Detection Examplep
D(W) = 0.8
D(W) = 0.8
![Page 103: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/103.jpg)
Error AnalysisyTypically…
True Positives areIN CONTEXT
F l P i i
IN CONTEXT
False Positives areOUT OF CONTEXT
We need to look outside the bounding box!
![Page 104: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/104.jpg)
Types of Contextyp
Scene-Thing:gist car “likely”
k b d “ lik l ”g
St ff St ff
keyboard “unlikely”
Thing Thing:
[ Torralba et al., LNCS 2005 ]
[ Gould et al., [ Rabinovich et Stuff-Stuff: Thing-Thing:[ Gou d et a ,
IJCV 2008 ][ Rabinovich et al., ICCV 2007 ]
![Page 105: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/105.jpg)
Types of Context
Stuff-Thing:g Based on spatial
relationships
Intuition:
Trees = no cars
“Cars drive on roads”
“Cows graze on grass”Cows graze on grass
“Boats sail on water”Goal: UnsupervisedGoal: Unsupervised
![Page 106: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/106.jpg)
OutlineOutline
What is Context? What is Context? The Things and Stuff (TAS) model Results
![Page 107: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/107.jpg)
Thingsg Detection “candidates”
L d h h ld “ d ” Low detector threshold -> “over-detect” Each candidate has a detector score
![Page 108: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/108.jpg)
Thingsg
Candidate detections Candidate detections Image Window Wi + Score
Boolean R.V. Ti Boolean R.V. Ti Ti = 1: Candidate is a
positive detection
Thing model ImageWindow
Wi
1)( WTP i Ti
))(exp(1)(
WDi
![Page 109: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/109.jpg)
Stuff
Coherent image regions Coherent image regions Coarse “superpixels” Feature vector Fj in Rn
j
Cluster label Sj in {1…C}
Stuff model Naïve Bayes SSj jjjjj SFPSPFSP )(),(
Fj ssjj sSF ,~
![Page 110: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/110.jpg)
Relationships
Descriptive Relations Descriptive Relations “Near”, “Above”,
“In front of”, etc. S72 = Trees
T1
Choose set R = {r1…rK}
Rijk=1: Detection i and
S72 Trees
jregion j have relation k
Relationship model
Ti Sj R1 10 in=1Rijk
1,10,in
![Page 111: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/111.jpg)
The TAS ModelThe TAS Model
Image Wi: WindowWindow
Wi
i
Ti: Object Presence
RijkTi SjSj: Region Label
F R i F tN K
Fj
Fj: Region Features
Rijk: Relationship
N
J
K
ijk pJ
S Al AlSupervisedin Training Set
AlwaysObserved
AlwaysHidden
![Page 112: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/112.jpg)
Unrolled ModelUnrolled Model
R1 1 left = 1
T1
S1
SR2,1,above = 0
R1,1,left 1
1 S2
S3T2
R3,1,left = 1 S3
S4
2
T3R1,3,near = 0
4
S5
3
R3,3,in = 1Candidate ImageWindows
gRegions
![Page 113: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/113.jpg)
Learning the Parametersg
Assume we know R Sj is hidden
ImageWindow
Wi Everything else observed
Expectation-Maximization“Conte t al cl ste ing” RT S
i
“Contextual clustering” Parameters are readily
interpretable
RijkTi SjN K
interpretable FjJ
Supervisedin Training Set
AlwaysObserved
AlwaysHidden
![Page 114: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/114.jpg)
Learned Satellite ClustersLearned Satellite Clusters
![Page 115: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/115.jpg)
Which Relationships to Use?p
Rijk = spatial relationship Rijk spatial relationship between candidate i and region j
Rij1 = candidate in regionRij2 = candidate closer than 2 bounding boxes (BBs) to regionRij3 = candidate closer than 4 BBs to regionRij3 = candidate closer than 4 BBs to regionRij4 = candidate farther than 8 BBs from regionRij5 = candidate 2BBs left of regionRij6 did t 2BB i ht f i
How do we avoid overfitting?Rij6 = candidate 2BBs right of regionRij7 = candidate 2BBs below regionRij8 = candidate more than 2 and less than 4 BBs from region
o do e a o d o e g
…RijK = candidate near region boundary
![Page 116: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/116.jpg)
Learning the Relationshipsg p
Intuition “Detached” Rijk = inactive
relationshipSt t l EM it t
Rij1 Rij2 RijK
Structural EM iterates: Learn parameters Decide which edge to toggle T S Decide which edge to toggle
Evaluate with l(T|F,W,R) Requires inference
Ti Sj
Requires inference Better results than using
standard E[l(T,S,F,W,R)]
Fj
![Page 117: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/117.jpg)
InferenceInference
Goal: Goal:
Block Gibbs Sampling Easy to sample Ti’s given Sj’sEasy to sample Ti s given Sj s
and vice versa
![Page 118: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/118.jpg)
OutlineOutline
What is Context? What is Context? The Things and Stuff (TAS) model Results
![Page 119: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/119.jpg)
Base Detector - HOG
[ Dalal & Triggs, CVPR, 2006 ] HOG Detector:HOG Detector:
Feature Vector X SVM Classifier
![Page 120: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/120.jpg)
Results - SatelliteResults Satellite
Prior:Detector Only
Posterior:Detections
Posterior:Region Labels
![Page 121: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/121.jpg)
Results - SatelliteResults Satellite
0.8
1e 10% improvement in recall at 40 fppi
0 4
0.6
call
Rat
e ~10% improvement in recall at 40 fppi
0.2
0.4
Rec
Base DetectorTAS Model
40 80 120 1600
False Positives Per Image
Base Detector
False Positives Per Image
![Page 122: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/122.jpg)
PASCAL VOC ChallengePASCAL VOC Challenge
2005 Challenge 2005 Challenge 2232 images split into {train, val, test} Cars, Bikes, People, and Motorbikes, , p ,
2006 Challenge 5304 images plit into {train, test} 12 classes, we use Cows and Sheep
![Page 123: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/123.jpg)
Base Detector Error Analysisy
Cows
![Page 124: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/124.jpg)
Discovered Context - Bicyclesy
Bicycles
Cl ster #3Cluster #3
![Page 125: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/125.jpg)
TAS Results – Bicyclesy
Examplesp
Discover “true positives”
Remove “false positives”BIKE?
??
? ?
![Page 126: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/126.jpg)
Results – VOC 2005Results VOC 2005
![Page 127: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/127.jpg)
Results – VOC 2006Results VOC 2006
![Page 128: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/128.jpg)
ConclusionsConclusions
Detectors can benefit from context Detectors can benefit from context The TAS model captures
an important type of contextan important type of context We can improve any sliding window
detector using TASg The TAS model can be interpreted and
matches our intuitions We can learn which relationships to use
![Page 129: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/129.jpg)
Today: Three papers on computational models of context:
• A. Torralba, K. P. Murphy, and W. T. Freeman, "Contextual models for object detection using boosted random fields," in Advances in Neural Information Processing Systems 17 (NIPS), 2005.
• D. Hoiem, A. A. Efros, and M. Hebert, "Putting objects in perspective," in Computer Vision and P R i i 2006Pattern Recognition, 2006
• G. Heitz and D. Koller, "Learning spatial context: Using stuff to find things," in ECCV 2008, pp. 30‐43.
![Page 130: C280, Computer Vision - Peopletrevor/CS280Notes/16Context.pdf · C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 16: Recognition in Context](https://reader033.vdocuments.mx/reader033/viewer/2022051320/5a78f9707f8b9a5a148e84a2/html5/thumbnails/130.jpg)
Who needs context anyway?We can recognize objects even out of contextWe can recognize objects even out of context
BanksySlide credit: A. Torralba