grammars in computer vision presented by: thomas kollar slides courtesy of song-chun zhu

Grammars in computer vision

Presented by: Thomas Kollar

Slides courtesy of Song-Chun Zhu

Parts

Global appearance

Local context

Global context

Object size

Inside the object (intrinsic features)

Outside the object (contextual features)

Pixels

Kruppa & Schiele (03), Fink & Perona (03)

Carbonetto, Freitas, Barnard (03), Kumar, Hebert, (03)

He, Zemel, Carreira-Perpinan (04), Moore, Essa, Monson, Hayes (99)

Strat & Fischler (91), Torralba (03), Murphy, Torralba & Freeman (03)

Agarwal & Roth (02), Moghaddam, Pentland (97), Turk, Pentland (91), Vidal-Naquet, Ullman (03)

Heisele, et al, (01), Agarwal & Roth, (02), Kremp, Geman, Amit (02), Dorko, Schmid, (03)

Fergus, Perona, Zisserman (03), Fei-Fei, Fergus, Perona (03), Schneiderman, Kanade (00), Lowe (99), etc.

Context in computer vision

Why grammars?

Guzman (SEE), 1968

Noton and Stark, 1971

Hansen & Riseman (VISIONS), 1978

Barrow & Tenenbaum, 1978

Brooks (ACRONYM), 1979

Marr, 1982

Ohta & Kanade, 1978

Yakimovsky & Feldman, 1973

[Ohta & Kanade 1978]

Why grammars?

Which papers?

F. Han and S.C. Zhu, Bottom-up/Top-down Image Parsing with Attribute Grammar, 2005.

Zijian Xu, A hierarchical compositional model for representation and sketching of high-resolution human images, PhD thesis, 2007.

Song-Chun Zhu and David Mumford, A stochastic grammar of images, 2007.

L. Lin, S. Peng, J. Porway, S.C. Zhu, and Y. Wang, An empirical study of object category recognition: sequential testing with generalized samples, 2007.

Datasets

Large-scale image labeling

Our goal:

Three projects using and-or graphs

1. Modeling an environment with rectangles.

2. Creating sketches

Commonalities

Use context-sensitive grammars, called And-Or graphs in these papers

Provides top-down and bottom-up influence

Most are generative all the way to the pixel level

Configuration matters: e.g. they don’t assume that sibling nodes are independent given the parent

These relations can take the form of an MRF
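As a rough illustration of the And-Or idea, the sketch below builds a toy graph in which And-nodes expand into all of their children and Or-nodes pick exactly one alternative, then samples one parse. The node names, branch probabilities, and the `Node` class are hypothetical and not taken from the papers, and the sibling relations (the MRF part) are deliberately left out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str = "terminal"  # "and", "or", or "terminal"
    children: list = field(default_factory=list)
    probs: list = field(default_factory=list)  # branch probabilities (Or-nodes only)


def sample_parse(node, rng):
    """Resolve every Or-node and return a nested (name, [subtrees]) parse."""
    if node.kind == "terminal":
        return (node.name, [])
    if node.kind == "and":
        # An And-node is a composition: every child is part of the parse.
        return (node.name, [sample_parse(c, rng) for c in node.children])
    # An Or-node is a switch: exactly one alternative is chosen.
    choice = rng.choices(node.children, weights=node.probs, k=1)[0]
    return (node.name, [sample_parse(choice, rng)])


def leaves(parse):
    """Collect the terminal names of a sampled parse, left to right."""
    name, subtrees = parse
    if not subtrees:
        return [name]
    return [leaf for sub in subtrees for leaf in leaves(sub)]


# Toy grammar (hypothetical): a scene is two objects, and each object is
# a line of rectangles, a cube made of three faces, or a single rectangle.
rect = Node("rectangle")
line = Node("line", "and", [rect, rect, rect])
cube = Node("cube", "and", [rect, rect, rect])
obj = Node("object", "or", [line, cube, rect], [0.3, 0.3, 0.4])
scene = Node("scene", "and", [obj, obj])

parse = sample_parse(scene, random.Random(0))
print(leaves(parse))
```

Every sampled parse bottoms out in rectangles, but the number of rectangles depends on which Or-branches were taken, which is how one small graph covers many configurations.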

Challenges

Objects have large within-category variations

Scenes have variation


Describing people has variation

Grammar definition

And-or graphs

Modeling with rectangles

Modeling with rectangles

Six production rules

Two examples

Three phases

1. Bottom-up detection: Compute edge segments and a number of vanishing points. The segments are grouped into line sets by vanishing point, and rectangle hypotheses are found using RANSAC, generating a number of rectangles as bottom-up proposals.

2. Initialize the terminal nodes greedily: Pick the most promising hypotheses, i.e. those with the heaviest weight, measured by the increase in posterior probability.

3. Incorporate top-down influence: Each step of the algorithm picks the most promising proposal among the 5 candidate rules by increase in posterior probability.

When a new non-terminal node is accepted: (1) insert it and create new proposals, (2) reweight the proposals, (3) pass attributes between the node and its parent.

Probability models

$$G^* = \arg\max_G \; p(I \mid C(G), C_{free}) \, p(G) \, p(C_{free})$$

• p(C_free) follows the primal sketch model.

• p(G) is the probability of the parse tree

• p(I | G) is the reconstruction likelihood


$$p(G) = \prod_{A \in V_N(G)} p(l(A)) \, p(n(A) \mid l(A)) \, p(X(A) \mid l(A), n(A)) \prod_{B \in child(A)} p(X(B) \mid X(A))$$

• p(l) is the probability of a rule

• p(n | l) is the probability of the number of components given the type of rule.

• p(X | l, n) is the probability of the geometry of A.

• p(X(B) | X(A)) ensures regularities between the geometries (e.g. that aligned rectangles have almost the same shape).
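To make the four factors concrete, here is a minimal sketch that multiplies them over the non-terminal nodes of a toy parse tree, working in log space. Everything numeric is an assumption for illustration: the rule and count tables are made up (apart from a cube always having three components), geometry is reduced to a single scalar `size`, p(X | l, n) is a flat placeholder, and p(X(B) | X(A)) is taken to be a Gaussian that prefers children matching the parent's size.

```python
import math
from dataclasses import dataclass, field

# Hypothetical tables: p(l) and p(n | l).
RULE_PROB = {"line": 0.3, "nest": 0.5, "cube": 0.2}
COUNT_PROB = {"line": {2: 0.5, 3: 0.5}, "nest": {2: 1.0}, "cube": {3: 1.0}}
SIGMA = 1.0  # assumed spread of the child/parent geometry term


@dataclass
class Node:
    rule: str    # production rule l(A); "rect" marks a terminal
    size: float  # scalar stand-in for the geometry X(A)
    children: list = field(default_factory=list)


def log_gauss(x, mu, sigma):
    """Log density of a 1-D Gaussian."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def log_p(node):
    """Log prior of the parse tree rooted at `node`."""
    if not node.children:  # terminals contribute no prior term here
        return 0.0
    n = len(node.children)
    p_count = COUNT_PROB[node.rule].get(n, 0.0)
    if p_count == 0.0:  # this rule cannot have n components
        return -math.inf
    total = math.log(RULE_PROB[node.rule]) + math.log(p_count)
    # p(X | l, n): flat placeholder, so it adds log 1 = 0.
    for child in node.children:
        total += log_gauss(child.size, node.size, SIGMA)  # p(X(B) | X(A))
        total += log_p(child)
    return total


aligned = Node("cube", 2.0, [Node("rect", 2.0) for _ in range(3)])
skewed = Node("cube", 2.0, [Node("rect", 2.0), Node("rect", 2.0), Node("rect", 5.0)])
print(log_p(aligned), log_p(skewed))
```

The tree whose faces agree with the parent geometry scores higher than the one with a mismatched face, which is the regularity the last factor is meant to enforce.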

$$p(n(A) = 3 \mid l(A) = \text{"cube"}) = 1$$

$$p(l(A) = \text{"cube"}) = q$$

e.g. each square should look reasonable

e.g. for the line rule, enforce that everything lines up


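$$p(I \mid C) = \frac{1}{Z} \exp\left\{ -\sum_{k=1}^{N} \sum_{(x,y) \in \Lambda_{sk,k}} \frac{1}{2\sigma^2} \big( I(x,y) - B_k(x,y) \big)^2 - \sum_{m=1}^{M} \langle \beta_m, h(I_{\Lambda_{nsk,m}}) \rangle \right\}$$

• Primal sketch model:

$$I(x,y) = B_k(x - x_k, y - y_k; \theta_k) + n(x,y), \quad (x,y) \in \Lambda_{sk,k}$$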

Inference: bottom-up detection of rectangles

• RANSAC is run to propose a number of rectangles using vanishing points
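The slides do not spell out the RANSAC step, so the following is only a sketch of one ingredient it relies on: estimating a single vanishing point from line segments by consensus. Two segments are sampled, their supporting lines are intersected, and the segments whose lines pass near that point are counted as inliers. The function names, the tolerance, and the toy data are assumptions; turning line sets into rectangle hypotheses is not shown, and vanishing points at infinity (parallel image lines) are simply skipped.

```python
import math
import random


def line_through(p, q):
    """Homogeneous line (a, b, c) through two points, scaled so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b, c = y1 - y2, x2 - x1, x1 * y2 - x2 * y1
    norm = math.hypot(a, b)
    return (a / norm, b / norm, c / norm)


def intersect(l1, l2):
    """Intersection point of two lines, or None if they are (near) parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-9:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)


def ransac_vanishing_point(segments, iters=200, tol=2.0, rng=None):
    """Return (point, inlier_indices) with the largest consensus found."""
    rng = rng or random.Random(0)
    lines = [line_through(p, q) for p, q in segments]
    best_point, best_inliers = None, []
    for _ in range(iters):
        i, j = rng.sample(range(len(lines)), 2)
        vp = intersect(lines[i], lines[j])
        if vp is None:
            continue
        # Point-to-line distance is |ax + by + c| because lines are normalized.
        inliers = [k for k, (a, b, c) in enumerate(lines)
                   if abs(a * vp[0] + b * vp[1] + c) <= tol]
        if len(inliers) > len(best_inliers):
            best_point, best_inliers = vp, inliers
    return best_point, best_inliers


# Toy data: five segments aimed at (100, 50) plus two unrelated segments.
target = (100.0, 50.0)
segments = []
for angle in (0.1, 0.5, 1.0, 2.0, 2.5):
    dx, dy = math.cos(angle), math.sin(angle)
    segments.append(((target[0] + 30 * dx, target[1] + 30 * dy),
                     (target[0] + 60 * dx, target[1] + 60 * dy)))
segments += [((0.0, 0.0), (10.0, 1.0)), ((0.0, 90.0), (50.0, 95.0))]

point, inliers = ransac_vanishing_point(segments)
print(point, inliers)
```

Running the same routine again on the segments left over after removing the inliers would give a second vanishing point, and pairs of lines from two different sets are the natural raw material for rectangle hypotheses.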

Inference: initialize terminal nodes

• Input: candidate set of rectangles from previous phase

• Output: a set of non-terminal nodes representing rectangles

• While (not done):
  • Re-compute weights
  • Greedily select the rectangle with the highest weight
  • Create a new non-terminal node in the grammar
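The loop above can be sketched as follows. The weight used here is a stand-in: each candidate carries a made-up `gain`, and overlapping an already accepted rectangle is penalized by the shared area. In the paper the weight is the increase in posterior probability, which this toy score only imitates. Rectangles are axis-aligned `(x0, y0, x1, y1)` tuples purely for simplicity.

```python
def overlap_area(a, b):
    """Area shared by two axis-aligned rectangles (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def greedy_init(candidates, penalty=1.0):
    """Greedily accept rectangles; candidates is a list of (rect, gain)."""
    remaining = list(candidates)
    accepted = []
    while remaining:
        # Re-compute every weight given what has been accepted so far.
        weights = [
            gain - penalty * sum(overlap_area(rect, a) for a in accepted)
            for rect, gain in remaining
        ]
        best = max(range(len(remaining)), key=weights.__getitem__)
        if weights[best] <= 0:  # no candidate improves the score: done
            break
        accepted.append(remaining.pop(best)[0])
    return accepted


candidates = [
    ((0, 0, 4, 4), 10.0),
    ((1, 1, 5, 5), 9.0),  # overlaps the first rectangle heavily
    ((6, 0, 8, 2), 3.0),
]
print(greedy_init(candidates))
```

The second candidate starts out almost as strong as the first, but once the first is accepted its re-computed weight drops to zero, so the weaker non-overlapping rectangle is chosen instead. Re-weighting after every acceptance is what lets earlier decisions influence later ones.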

Inference: incorporate top-down influence

• Input: non-terminal rectangles from previous step

• Output: a parse graph

• While (not done):
  • Re-compute weights
  • Greedily select the highest-weight candidate rule
  • Add the rule to the parse graph along with any top-down predictions

• Weights are computed similarly to before.

Example of top-down/bottom-up inference

Results

ROC curve

Generating sketches

Additional semantics


Geometric deformations: clothes are very flexible

Photometric variabilities: large variety of colors, shading and texture

Topological configurations: combinatorial number of clothes designs

Decomposing a sketch

And-Or graph

“In a computing and recognition phase, we first activate some sub-templates in a bottom-up step. For example, we can detect the face and skin color to locate the coarse position of some components, which help to predict the positions of other components by context.”

Sketch sub-parts

Example grammar

Sub-templates

Probability model

Overview of the algorithm

Sketch results

Conclusions

A grammar-based model was presented for generating sketches.

Markov random fields are used at the lowest level.

Top-down/bottom-up inference is performed.
