Source: cs.brown.edu/people/pfelzens/talks/scene.pdf
Scene Grammars, Factor Graphs, and Belief Propagation
Pedro Felzenszwalb
Brown University
Joint work with Jeroen Chua
Probabilistic Scene Grammars
General-purpose framework for image understanding and machine perception.
• What are the objects in the scene, and how are they related?
• Objects have parts that are (recursively) objects.
• Relationships provide context for recognition.
• Relationships are captured by compositional rules.
Example: Image Restoration
Clean image x; measured image y = x + n, where n is noise.
Ambiguous problem.
Impossible to restore a pixel by itself.
Context is of key importance.
Requires modeling relationships between pixels.
Example: Object Recognition
Context is of key importance. Requires modeling relationships between objects.
Vision as Bayesian Inference
The goal is to recover information about the world from an image.
• Hidden variables X (the world/scene).
• Observations Y (the image).
• Consider the posterior distribution and Bayes' rule:

p(X | Y) = p(Y | X) p(X) / p(Y)
• The approach involves an imaging model p(Y | X)
• and a prior distribution p(X).
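As a concrete sanity check, the decomposition above can be computed numerically for a single binary hidden variable. All probabilities below are made up for illustration; nothing here comes from the talk.

```python
# Numeric illustration of Bayes' rule p(X|Y) = p(Y|X) p(X) / p(Y)
# for one binary hidden variable. The probabilities are made up.

p_x = {0: 0.9, 1: 0.1}          # prior p(X): e.g. "object absent/present"
p_y1_given_x = {0: 0.2, 1: 0.8} # imaging model p(Y=1 | X)

# Evidence p(Y=1) by marginalizing over X.
p_y1 = sum(p_y1_given_x[x] * p_x[x] for x in (0, 1))

# Posterior p(X=1 | Y=1).
posterior = p_y1_given_x[1] * p_x[1] / p_y1
print(round(posterior, 4))      # → 0.3077
```

Even weak evidence (a likelihood ratio of 4) shifts a 10% prior to roughly 31%, which is the effect belief propagation will accumulate across many bricks later in the talk.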
Modeling scenes
p(X )
Scenes are complex high-dimensional structures.
The number of possible scenes is very large (infinite), yet scenes have regularities.
• Faces have eyes.
• Boundaries of objects are piecewise smooth.
The set of regular scenes forms a “Language”.
Scenes can be generated using stochastic grammars.
The Framework
• Representation: Probabilistic scene grammar.
• Transformation: Grammar model to factor graph.
• Inference: Loopy belief propagation.
• Learning: Maximum likelihood (EM).
Scene Grammar
Scenes are structures generated by a stochastic grammar.
Scenes are composed of objects of several types.
Objects are composed of parts that are (recursively) objects.
Parts tend to be in certain relative locations.
The parts that make up an object can vary.
Scene Grammar
• Finite set of symbols (object types) Σ.
• Finite pose space Ω for each symbol.
• Finite set of productions R.
A0 → A1, ..., AK, with each Ai ∈ Σ
• Rule selection probabilities p(r).
• Conditional pose distributions pi(ωi | ω0) associated with each rule.
• Self-rooting probabilities εA.
Generating a scene
Set of building blocks, or bricks, (A, ω) ∈ Σ × Ω.
Brick (A, ω) is on if the scene has an object of type A in pose ω.
Stochastic process:
• Initially all bricks are off.
• Independently turn each brick (A, ω) on with probability εA.
• The first time a brick is turned on, expand it.
Expanding (A, ω):
• Select a rule A → A1, ..., AK.
• Select K poses (ω1, ..., ωK) conditional on ω.
• Turn on bricks (A1, ω1), ..., (AK, ωK).
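The stochastic process above can be sketched for a toy grammar (A → B, B terminal, four poses). The dict encoding, the probabilities, and the "child keeps the parent's pose" choice are our own illustrative assumptions, not part of the talk.

```python
# Sketch of the scene-generation process: root bricks independently,
# then expand each brick the first time it turns on.
import random

POSES = range(4)
EPS = {"A": 0.2, "B": 0.05}                  # self-rooting probabilities
RULES = {"A": [(1.0, ["B"])],                # A -> B
         "B": [(1.0, [])]}                   # B -> (terminal)

def sample_scene(rng):
    on, frontier = set(), []

    def turn_on(brick):
        if brick not in on:                  # expand only on first turn-on
            on.add(brick)
            frontier.append(brick)

    # Independently turn each brick (A, w) on with probability eps_A.
    for sym in EPS:
        for pose in POSES:
            if rng.random() < EPS[sym]:
                turn_on((sym, pose))

    # Expand: select a rule, then turn on the child bricks.
    while frontier:
        sym, pose = frontier.pop()
        probs = [p for p, _ in RULES[sym]]
        _, children = rng.choices(RULES[sym], weights=probs, k=1)[0]
        for child_sym in children:
            turn_on((child_sym, pose))       # toy conditional pose: same pose
    return on

scene = sample_scene(random.Random(0))
print(sorted(scene))
```

Note the "expand only on first turn-on" guard: two parents can share a child brick, which is what lets the model reuse parts across objects.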
A grammar for scenes with faces
• Symbols Σ = {FACE, EYE, NOSE, MOUTH}.
• Pose space Ω = (x, y, size).
• Rules:
(1) FACE → EYE, EYE, NOSE, MOUTH
(2) EYE →
(3) NOSE →
(4) MOUTH →
• Conditional pose distributions for (1) specify typical locationsof face parts within a face.
• Each symbol has a small self-rooting probability.
Random scenes with face model
A grammar for images with curves
• Symbols Σ = {C, P}.
• Pose of C specifies position and orientation.
• Pose of P specifies position.
• Rules:
(1) C(x, y, θ) → P(x, y)
(2) C(x, y, θ) → P(x, y), C(x + ∆xθ, y + ∆yθ, θ)
(3) C(x, y, θ) → C(x, y, θ + 1)
(4) C(x, y, θ) → C(x, y, θ − 1)
(5) P →
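The curve rules read as a random walk that emits pixels: place a pixel and stop (rule 1), place a pixel and step forward (rule 2), or turn without moving (rules 3 and 4). A toy sampler of one such curve; the stopping/turning probabilities and the 8-direction step table are our own choices, not from the talk.

```python
# Trace one curve by repeatedly applying the curve-grammar rules.
import math
import random

N_DIRS = 8   # discretized orientations theta in {0, ..., 7}

def step(theta):
    # (dx_theta, dy_theta): one king-move step in direction theta.
    ang = 2 * math.pi * theta / N_DIRS
    return round(math.cos(ang)), round(math.sin(ang))

def sample_curve(x, y, theta, rng, max_len=50):
    pixels = []
    for _ in range(max_len):
        r = rng.random()
        if r < 0.05:                        # rule 1: place pixel, stop
            pixels.append((x, y))
            break
        elif r < 0.75:                      # rule 2: place pixel, continue
            pixels.append((x, y))
            dx, dy = step(theta)
            x, y = x + dx, y + dy
        elif r < 0.875:                     # rule 3: turn left
            theta = (theta + 1) % N_DIRS
        else:                               # rule 4: turn right
            theta = (theta - 1) % N_DIRS
    return pixels

curve = sample_curve(0, 0, 0, random.Random(1))
print(len(curve), curve[:3])
```

Because turns do not move the head, consecutive emitted pixels are always 8-connected, which is what makes the sampled images look like piecewise-smooth contours.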
Random images
Computation
Grammar defines a distribution over scenes.
A key problem is computing conditional probabilities.
What is the probability that there is a nose near location (20, 32), given that there is an eye at location (15, 29)?
What is the probability that each pixel in the clean image is on, given the noisy observations?
Factor Graphs
A factor graph represents a factored distribution.
p(X1,X2,X3,X4) = f1(X1,X2)f2(X2,X3,X4)f3(X3,X4)
Variable nodes (circles)
Factor nodes (squares)
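For a factored distribution this small, marginals can be computed by brute-force enumeration, which is a useful baseline before introducing belief propagation. The factor tables below are made-up toy values over binary variables.

```python
# Brute-force marginals for p(X1,X2,X3,X4) ∝ f1(X1,X2) f2(X2,X3,X4) f3(X3,X4).
from itertools import product

f1 = lambda x1, x2: [[1.0, 2.0], [0.5, 1.5]][x1][x2]   # toy pairwise table
f2 = lambda x2, x3, x4: 2.0 if x2 == x3 == x4 else 1.0  # agreement factor
f3 = lambda x3, x4: 3.0 if x3 != x4 else 1.0            # disagreement factor

def joint(x):
    x1, x2, x3, x4 = x
    return f1(x1, x2) * f2(x2, x3, x4) * f3(x3, x4)

Z = sum(joint(x) for x in product((0, 1), repeat=4))    # normalizer
marg_x2 = sum(joint(x) for x in product((0, 1), repeat=4) if x[1] == 1) / Z
print(round(marg_x2, 4))
```

Enumeration costs 2^n joint evaluations; sum-product message passing exploits the factorization to avoid that blow-up, which is the point of the next slides.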
Factor Graph Representation for Scenes
“Gadget” represents a brick
Binary random variables
• X brick on/off
• Ri rule selection
• Ci child selection
Factors
• f1 Leaky-or
• f2 Selection
• f3 Selection
• fD Data model
Σ = {A, B}. Ω = {1, 2}.
A(x) → B(y)
B(x) →
[Factor graph: one gadget per brick A(1), A(2), B(1), B(2). Each gadget has a brick variable X with a leaky-or factor ΨL, a rule-selection variable R with a selection factor ΨS, and child-selection variables C with selection factors ΨS.]
Loopy belief propagation
Inference by message passing.
µf→v(xv) = ∑_{x_{N(f)\v}} Ψ(x_{N(f)}) ∏_{u ∈ N(f)\v} µu→f(xu)
In general, message computation is exponential in the degree of a factor.
For our factors, message computation is linear in the degree.
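The factor-to-variable update can be implemented directly by enumerating all assignments of the factor's neighbors. The toy OR factor and uniform incoming messages below are our own choices; this direct form is exponential in the factor's degree, which is exactly why the linear-time forms for leaky-or and selection factors matter.

```python
# Direct (exponential-in-degree) factor-to-variable message for
# binary variables, following the sum-product formula above.
from itertools import product

def factor_to_var_message(psi, incoming, v):
    """psi: callable over the full assignment tuple of the factor.
    incoming: one message dict mu_{u->f} per neighbor u.
    v: index of the target variable among the factor's neighbors."""
    msg = {0: 0.0, 1: 0.0}
    n = len(incoming)
    for assignment in product((0, 1), repeat=n):
        contrib = psi(assignment)
        for u, xu in enumerate(assignment):
            if u != v:                      # product over u in N(f)\{v}
                contrib *= incoming[u][xu]
        msg[assignment[v]] += contrib
    return msg

# Toy OR factor over three variables: psi = 1 iff x0 = OR(x1, x2).
psi_or = lambda a: 1.0 if a[0] == (a[1] | a[2]) else 0.0
mu = [{0: 0.5, 1: 0.5}] * 3                 # uniform incoming messages
print(factor_to_var_message(psi_or, mu, 0))  # → {0: 0.25, 1: 0.75}
```

With uniform inputs, three of the four (x1, x2) configurations activate the OR, giving the 0.25/0.75 split; a linear-time leaky-or message would compute the same numbers without enumerating assignments.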
Conditional inference with LBP
Σ = {FACE, EYE, NOSE, MOUTH}
FACE → EYE, EYE, NOSE, MOUTH
Marginal probabilities conditional on one eye.
Face Eye Nose Mouth
Marginal probabilities conditional on two eyes.
Face Eye Nose Mouth
Conditional inference with LBP
• Evidence for an object provides context for other objects.
• LBP combines “bottom-up” and “top-down” influence.
• LBP captures chains of contextual evidence.
• LBP naturally combines multiple contextual cues.
Face Eye Nose Mouth
Conditional inference with LBP
Contour completion with curve grammar.
Face detection
p(X | Y) ∝ p(Y | X) p(X)
p(Y | X) defined by templates for each symbol.
Defines local evidence for each brick in the factor graph.
Belief Propagation combines “weak” local evidence from all bricks.
Face detection results
Ground Truth · HOG Filters · Face Grammar
Scenes with several faces
HOG filters
Grammar
Curve detection
p(X) defined by a grammar for curves.
p(Y | X) defined by noisy observations at each pixel.
X Y
Curve detection dataset
Ground-truth: human-drawn object boundaries from BSDS.
Curve detection results
Summary
• Perceptual inference requires models of the world.
• Stochastic grammars can model complex scenes.
• Part-based representation.
• Scenes and objects have variable structure.
• Belief propagation provides a general computational engine.
• The same framework applies to many different tasks.
Future work
• Data model p(Y | X).
• Grammar learning.
• Scaling inference to larger grammars.
PERSON → FACE, ARMS, LOWER
FACE → EYES, NOSE, MOUTH
FACE → HAT, EYES, NOSE, MOUTH
EYES → EYE, EYE
EYES → SUNGLASSES
HAT → BASEBALL
HAT → SOMBRERO
LOWER → SHOE, SHOE, LEGS
LEGS → PANTS
LEGS → SKIRT