![Page 1: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/1.jpg)
Grammars in computer vision
Presented by: Thomas Kollar
Slides courtesy of Song-Chun Zhu
![Page 2: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/2.jpg)
Context in computer vision
Object size
Pixels
Parts
Global appearance
Local context
Global context
Inside the object (intrinsic features)
Outside the object (contextual features)
Kruppa & Schiele (03), Fink & Perona (03)
Carbonetto, Freitas, Barnard (03), Kumar, Hebert (03)
He, Zemel, Carreira-Perpinan (04), Moore, Essa, Monson, Hayes (99)
Strat & Fischler (91), Torralba (03), Murphy, Torralba & Freeman (03)
Agarwal & Roth (02), Moghaddam, Pentland (97), Turk, Pentland (91), Vidal-Naquet, Ullman (03)
Heisele, et al. (01), Agarwal & Roth (02), Kremp, Geman, Amit (02), Dorko, Schmid (03)
Fergus, Perona, Zisserman (03), Fei Fei, Fergus, Perona (03), Schneiderman, Kanade (00), Lowe (99)
Etc.
![Page 3: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/3.jpg)
Why grammars?
Guzman (SEE), 1968
Noton and Stark, 1971
Hansen & Riseman (VISIONS), 1978
Barrow & Tenenbaum, 1978
Brooks (ACRONYM), 1979
Marr, 1982
Ohta & Kanade, 1978
Yakimovsky & Feldman, 1973
[Ohta & Kanade 1978]
![Page 4: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/4.jpg)
Why grammars?
![Page 5: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/5.jpg)
Why grammars?
![Page 6: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/6.jpg)
Which papers?
F. Han and S.C. Zhu, Bottom-up/Top-down Image Parsing with Attribute Grammar, 2005.
Zijian Xu; A hierarchical compositional model for representation and sketching of high-resolution human images, PhD Thesis 2007.
Song-Chun Zhu and David Mumford; A stochastic grammar of images, 2007.
L. Lin, S. Peng, J. Porway, S.C. Zhu, and Y. Wang, An empirical study of object category recognition: sequential testing with generalized samples, 2007.
![Page 7: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/7.jpg)
Datasets
![Page 8: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/8.jpg)
Large-scale image labeling
![Page 9: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/9.jpg)
Our Goal:
![Page 10: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/10.jpg)
Three projects using and-or graphs
1. Modeling an environment with rectangles.
2. Creating sketches
![Page 11: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/11.jpg)
Commonalities
Use context-sensitive grammars
  Called And-Or graphs in these papers
Provide top-down and bottom-up influence
Most are generative all the way to the pixel level
Configuration matters
  E.g. they don’t assume independence given the parent
  These can take the form of an MRF
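To make the And-Or graph idea concrete, here is a minimal sketch. The class names `Terminal`, `AndNode`, and `OrNode`, the function `configurations`, and the toy "person" grammar are all hypothetical and not taken from the papers. An And node composes all of its children, while an Or node chooses exactly one alternative. The sketch leaves out the relations between siblings, which is the part that makes the grammars in these papers context sensitive.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Union


@dataclass
class Terminal:
    name: str


@dataclass
class AndNode:
    name: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class OrNode:
    name: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Terminal, AndNode, OrNode]


def configurations(node: Node) -> List[List[str]]:
    """Return every terminal configuration the sub-graph rooted at `node` can generate."""
    if isinstance(node, Terminal):
        return [[node.name]]
    if isinstance(node, OrNode):
        # An Or node picks exactly one of its alternatives.
        return [cfg for child in node.children for cfg in configurations(child)]
    # An And node needs all of its children: combine one choice per child.
    child_cfgs = [configurations(child) for child in node.children]
    return [sum(combo, []) for combo in product(*child_cfgs)]


# Hypothetical toy grammar: a person is a head plus one of two upper-body styles.
upper = OrNode("upper_body", [Terminal("t_shirt"), Terminal("jacket")])
person = AndNode("person", [Terminal("head"), upper])
print(configurations(person))
```

Each Or node multiplies the number of configurations, which is how a small graph can cover a large within-category variation.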
![Page 12: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/12.jpg)
Challenges
Objects have large within-category variations
Scenes have variation
![Page 13: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/13.jpg)
Challenges
Describing people has variation
![Page 14: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/14.jpg)
Grammar definition
![Page 15: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/15.jpg)
And-or graphs
![Page 16: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/16.jpg)
Modeling with rectangles
![Page 17: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/17.jpg)
Modeling with rectangles
![Page 18: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/18.jpg)
Six production rules
![Page 19: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/19.jpg)
Two examples
![Page 20: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/20.jpg)
Three phases
1. Bottom-up detection: Compute edge segments and a number of vanishing points. The edge segments are grouped into line sets by vanishing point, and rectangle hypotheses are found using RANSAC, generating a number of bottom-up rectangle proposals.
2. Initialize the terminal nodes greedily: Pick the most promising hypotheses, those with the heaviest weight, where the weight is the increase in posterior probability.
3. Incorporate top-down influence: Each step of the algorithm picks the most promising proposal among the 5 candidate rules by increase in posterior probability. When a new non-terminal node is accepted: (1) insert the node and create a new proposal, (2) reweight the proposals, (3) pass attributes between the node and its parent.
![Page 21: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/21.jpg)
Probability Models
$$G^* = \arg\max_G \; p(I \mid C(G), C_{free}) \, p(G) \, p(C_{free})$$
• p(C_free) follows the primal sketch model.
• p(G) is the probability of the parse tree.
• p(I | C(G), C_free) is the reconstruction likelihood.
![Page 22: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/22.jpg)
Probability Models
$$p(G) = \prod_{A \in G^{N}} p(l(A)) \, p(n(A) \mid l(A)) \, p(X(A) \mid l(A), n(A)) \prod_{B \in child(A)} p(X(B) \mid X(A))$$
• p(l) is the probability of a rule, e.g. p(l(A) = "cube") = q.
• p(n | l) is the probability of the number of components given the type of rule, e.g. p(n(A) = 3 | l(A) = "cube") = 1.
• p(X | l, n) is the probability of the geometry of A, e.g. each square should look reasonable.
• p(X(B) | X(A)) ensures regularities between the geometries, e.g. that aligned rectangles have almost the same shape, or, for the line rule, that everything lines up.
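The four factors listed on this slide can be evaluated in log space on a toy parse tree. This is only a sketch under loud assumptions: the tables `RULE_PRIOR` and `COUNT_GIVEN_RULE`, the `Node` class, the scalar geometry, and the Gaussian form chosen for p(X(B) | X(A)) are all hypothetical stand-ins, not the models used in the paper.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical rule prior p(l) and component-count model p(n | l).
RULE_PRIOR: Dict[str, float] = {"line": 0.6, "cube": 0.4}
COUNT_GIVEN_RULE: Dict[str, Dict[int, float]] = {
    "line": {2: 0.5, 3: 0.5},
    "cube": {3: 1.0},  # mirrors p(n = 3 | l = "cube") = 1 from the slide
}


@dataclass
class Node:
    geometry: float                  # X(.), reduced to one scalar for illustration
    rule: str = ""                   # l(A); unused for terminal nodes
    log_geometry_prob: float = 0.0   # stand-in for log p(X(A) | l(A), n(A))
    children: List["Node"] = field(default_factory=list)


def log_child_compat(x_child: float, x_parent: float, sigma: float = 0.1) -> float:
    """Assumed Gaussian log-density standing in for log p(X(B) | X(A))."""
    z = (x_child - x_parent) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_p_parse(node: Node) -> float:
    """Sum the log factors over every non-terminal node of the parse tree."""
    if not node.children:
        return 0.0  # terminal nodes contribute no rule terms of their own
    n = len(node.children)
    logp = math.log(RULE_PRIOR[node.rule])             # p(l(A))
    logp += math.log(COUNT_GIVEN_RULE[node.rule][n])   # p(n(A) | l(A))
    logp += node.log_geometry_prob                     # p(X(A) | l(A), n(A))
    for child in node.children:
        logp += log_child_compat(child.geometry, node.geometry)  # p(X(B) | X(A))
        logp += log_p_parse(child)
    return logp


aligned = Node(1.0, "cube", children=[Node(1.0), Node(1.0), Node(1.0)])
skewed = Node(1.0, "cube", children=[Node(1.0), Node(1.3), Node(0.8)])
print(log_p_parse(aligned) > log_p_parse(skewed))
```

Under these assumed models, the tree whose children agree with the parent geometry scores higher, which is the regularity the last factor is meant to reward.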
![Page 23: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/23.jpg)
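Probability Models
• Primal sketch model
$$p(I \mid C) = \frac{1}{Z} \exp\left\{ -\sum_{k=1}^{N} \sum_{(x,y) \in \Lambda_{sk,k}} \frac{1}{2\sigma^2} \big( I(x,y) - B_k(x,y) \big)^2 - \sum_{m=1}^{M} \big\langle \beta_m, h(I_{\Lambda_{nsk,m}}) \big\rangle \right\}$$
$$I(x,y) = B_k(x,y) + n(x,y), \quad (x,y) \in \Lambda_{sk,k}$$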
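As a small worked example of a reconstruction likelihood of this kind, the sketch below assumes that, on the sketchable pixels, the image is a primitive B_k plus Gaussian noise, so the energy is a sum of squared residuals scaled by 1 / (2 sigma^2). The function name `sketch_energy`, the dictionary representation of a primitive, and the value of `sigma` are hypothetical, and the texture term for non-sketchable regions is omitted entirely.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (x, y)


def sketch_energy(
    image: List[List[float]],
    primitives: List[Dict[Pixel, float]],
    sigma: float = 1.0,
) -> float:
    """Sum over primitives k and pixels (x, y) in their support of
    (I(x, y) - B_k(x, y))^2 / (2 * sigma^2).

    Each primitive is a dict mapping a pixel to the intensity it predicts there.
    The unnormalized log-likelihood of the sketchable part is minus this energy.
    """
    energy = 0.0
    for support in primitives:
        for (x, y), predicted in support.items():
            residual = image[y][x] - predicted
            energy += residual * residual / (2.0 * sigma * sigma)
    return energy


# Toy 2x2 image and a single hypothetical primitive covering the top row.
image = [[0.0, 1.0],
         [1.0, 0.0]]
edge = {(0, 0): 0.0, (1, 0): 0.5}
print(sketch_energy(image, [edge]))
```

A primitive that predicts the image exactly has zero energy, so better-fitting primitives give a higher likelihood.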
![Page 24: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/24.jpg)
Inference: bottom-up detection of rectangles
• RANSAC is run to propose a number of rectangles using vanishing points
![Page 25: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/25.jpg)
Inference: initialize terminal nodes
• Input: candidate set of rectangles from previous phase
• Output: a set of non-terminal nodes representing rectangles
• While (not done):
  • re-compute weights
  • Greedily select the rectangle with the highest weight
  • Create a new non-terminal node in the grammar
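The greedy loop on this slide can be sketched as follows. Everything concrete here is an assumption made for illustration: the axis-aligned `Rect` tuples, the bottom-up scores, and the toy weight in `posterior_gain`, which stands in for the increase in posterior probability by penalizing overlap with rectangles that were already accepted.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), hypothetical stand-in


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def posterior_gain(rect: Rect, score: float, accepted: Sequence[Rect]) -> float:
    """Toy weight: bottom-up score minus a penalty for area already explained."""
    return score - sum(overlap_area(rect, other) for other in accepted)


def greedy_init(candidates: List[Tuple[Rect, float]]) -> List[Rect]:
    remaining = list(candidates)
    accepted: List[Rect] = []
    while remaining:
        # Re-compute weights: every accepted rectangle changes the gains.
        weights = [posterior_gain(r, s, accepted) for r, s in remaining]
        best = max(range(len(remaining)), key=weights.__getitem__)
        if weights[best] <= 0.0:
            break  # no candidate increases the (toy) posterior any more
        rect, _ = remaining.pop(best)
        accepted.append(rect)  # stands in for creating a new node in the grammar
    return accepted


candidates = [
    ((0, 0, 2, 2), 3.0),
    ((1, 1, 3, 3), 2.5),
    ((1, 0, 3, 2), 1.5),
    ((5, 5, 6, 6), 1.0),
]
print(greedy_init(candidates))
```

Re-computing the weights inside the loop is what lets a candidate that looked promising at the start be rejected once its area has been explained by earlier picks.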
![Page 26: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/26.jpg)
Inference: initialize terminal nodes
• Input: non-terminal rectangles from previous step
• Output: a parse graph
• While (not done):
  • re-compute weights
  • Greedily select the highest weight candidate rule
  • Add rule to parse graph along with any top-down predictions.
• Weights are computed similarly to before.
![Page 27: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/27.jpg)
Example of top-down/bottom-up inference
![Page 28: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/28.jpg)
Results
![Page 29: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/29.jpg)
Results
![Page 30: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/30.jpg)
Results
![Page 31: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/31.jpg)
Results
![Page 32: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/32.jpg)
ROC curve
![Page 33: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/33.jpg)
Generating sketches
Additional semantics
![Page 34: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/34.jpg)
Challenges
Geometric deformations: clothes are very flexible
Photometric variabilities: large variety of colors, shading and texture
Topological configurations: combinatorial number of clothes designs
![Page 35: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/35.jpg)
Decomposing a sketch
![Page 36: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/36.jpg)
And-Or graph
“In a computing and recognition phase, we first activate some sub-templates in a bottom-up step. For example, we can detect the face and skin color to locate the coarse position of some components, which help to predict the positions of other components by context.”
![Page 37: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/37.jpg)
Sketch sub-parts
![Page 38: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/38.jpg)
Example grammar
![Page 39: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/39.jpg)
Sub-templates
![Page 40: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/40.jpg)
Probability model
![Page 41: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/41.jpg)
Overview of the algorithm
![Page 42: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/42.jpg)
Sketch results
![Page 43: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/43.jpg)
Sketch results
![Page 44: Grammars in computer vision Presented by: Thomas Kollar Slides courtesy of Song-Chun Zhu](https://reader038.vdocuments.mx/reader038/viewer/2022102809/5697bfb81a28abf838c9f69e/html5/thumbnails/44.jpg)
Conclusions
A grammar-based model was presented for generating sketches.
Markov random fields at lowest level.
Top-down/bottom-up inference performed.