1
Lecture 15
Bayesian Networks in Computer Vision
Gary Bradski
Sebastian Thrun
http://robots.stanford.edu/cs223b/index.html
2
What is a Bayesian Network?
Nodes represent (random) variables. A conditional probability distribution quantifies the effects of the parents on each node. The graph is directed and acyclic.
It's a factored joint distribution and/or causal diagram: the causal links encode dependencies. Here the factors are P(W), P(C|W), P(A|W), P(F|C), P(R|C,A).
A joint distribution, here p(W,C,A,F,R), is everything we can know about the problem, but it grows exponentially: here 2^5 - 1 = 31 parameters. Factoring the distribution in a Bayes net decreases the number of parameters, here from 31 to 11 (note that probabilities sum to one, which decreases the number of parameters to be specified).
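As a concrete check of those numbers, here is a minimal Python sketch (mine, not from the lecture) that reproduces the 31-vs-11 parameter count for this five-binary-variable network:

```python
# Parameter counting for the example network W -> {C, A}, C -> F, {C, A} -> R,
# all variables binary. A full joint over n binary variables needs 2^n - 1
# free parameters; a CPT for a binary node needs one free parameter per
# configuration of its parents.

def full_joint_params(n_vars):
    return 2 ** n_vars - 1

def cpt_params(n_parents):
    return 2 ** n_parents

# Parents of each node in the example network.
parents = {"W": [], "C": ["W"], "A": ["W"], "F": ["C"], "R": ["C", "A"]}

factored = sum(cpt_params(len(p)) for p in parents.values())
print(full_joint_params(5), factored)  # -> 31 11
```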
3
Causality and Bayesian Nets
[Figure: a circuit with Mains, Transformer, Diodes, Capacitor, Ammeter, and Battery; some nodes marked Observed, others Un-Observed.]
One can also think of Bayesian networks as a "circuit diagram" of probability models.
• The links indicate causal effect, not direction of information flow.
• Just as we can predict the effects of changes on the circuit diagram, we can predict the consequences of "operating" on our probability model diagram.
4
Inference
• Once we have a model, we need to make it consistent by "diffusing" the distributions around until they all agree with one another.
• The central algorithm for this: Belief Propagation
5
Specifically:
Belief Propagation
Messages are passed along the links:
• Going down an arrow, sum out the parent (the "causal" message).
• Going up an arrow, apply Bayes' law (the "diagnostic" message).
Bayes' law: P(A|B) = P(B|A) P(A) / P(B)
* some figures from: Peter Lucas BN lecture course
6
Belief Propagation
Bayes' law: P(A|B) = P(B|A) P(A) / P(B)
Diagnostic message, passed against the arrow (from child Vj to parent Vi):
λ_Vj(Vi) = Σ_Vj P(Vj | Vi) λ(Vj)
Causal message, passed with the arrow (from parent Vi to child Vj):
π(Vj) = Σ_Vi P(Vj | Vi) π(Vi)
* some figures from: Peter Lucas BN lecture course
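A minimal numeric sketch of these two message types on a single link Vi -> Vj (my own illustration with made-up numbers, not the lecture's code):

```python
# Pearl-style messages on a tiny chain Vi -> Vj with binary states.
# P[j_state][i_state] below is a hypothetical CPT P(Vj | Vi).
import numpy as np

P = np.array([[0.9, 0.2],    # P(Vj=0 | Vi=0), P(Vj=0 | Vi=1)
              [0.1, 0.8]])   # P(Vj=1 | Vi=0), P(Vj=1 | Vi=1)

prior_i = np.array([0.6, 0.4])   # pi(Vi): causal support from above
lam_j   = np.array([0.0, 1.0])   # lambda(Vj): evidence Vj = 1 observed

# Causal message (with the arrow): pi(Vj) = sum_Vi P(Vj|Vi) pi(Vi)
pi_msg = P @ prior_i

# Diagnostic message (against the arrow): lambda_Vj(Vi) = sum_Vj P(Vj|Vi) lambda(Vj)
lam_msg = P.T @ lam_j

# Posterior at Vi given the evidence at Vj (Bayes' law, then normalize):
belief_i = prior_i * lam_msg
belief_i /= belief_i.sum()
print(pi_msg, lam_msg, belief_i)
```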
7
Inference in general graphs
• Belief propagation is only guaranteed to be correct for trees.
• A general graph should be converted to a junction tree, by clustering nodes.
• Computational complexity is exponential in the size of the resulting clusters (NP-hard).
8
Junction Tree
Algorithm for turning a Bayesian network with loops into a junction tree:
1. "Moralize" the graph by connecting parents.
2. Drop the arrows.
3. Triangulate (connect nodes if a loop of length > 3 exists).
4. Put in intersection variables.
* Lauritzen 96
[Figure: the example graph over X1-X6 after each step (1)-(3), and the resulting junction tree. Image from Sam Roweis]
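A rough Python sketch of steps 1-3 above (moralize, drop arrows, triangulate), using networkx for graph bookkeeping and a hypothetical edge list standing in for the X1-X6 example:

```python
# Moralization and triangulation sketch (my own, assuming made-up edges).
import itertools
import networkx as nx

# Hypothetical directed edges for an X1..X6 example.
dag = nx.DiGraph([("X1", "X2"), ("X1", "X3"), ("X2", "X4"),
                  ("X3", "X5"), ("X4", "X6"), ("X5", "X6")])

# Steps 1-2: marry all co-parents of each node, then drop arrow directions.
moral = dag.to_undirected()
for node in dag:
    for u, v in itertools.combinations(dag.predecessors(node), 2):
        moral.add_edge(u, v)

# Step 3: greedy triangulation by node elimination -- when a node is
# eliminated, its remaining neighbors are connected into a clique.
tri = moral.copy()
work = moral.copy()
for node in sorted(work, key=work.degree):  # simple low-degree-first order
    for u, v in itertools.combinations(list(work.neighbors(node)), 2):
        tri.add_edge(u, v)
        work.add_edge(u, v)
    work.remove_node(node)

print(sorted(tri.edges()))
```

Step 4 (inserting the intersection/separator variables) is then performed on the cliques of the triangulated graph.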
9
Global message passing: Two-pass
• Select one clique as the root.
• Two-pass message passing: first collect evidence (messages flow toward the root), then distribute evidence (messages flow away from the root).
[Figure: the Collect and Distribute passes. Figure from P. Green]
10
Junction Tree Inference
Image from Cecil Huang
11
Global message passing: Parallel, distributed version
[Figure: four-node example X1-X4 at Stage 1 and Stage 2.]
• All nodes can send messages out simultaneously, once each has received the messages from all of its parents.
• Parallel processing (topology-level parallelism).
12
Details
Junction Tree Algorithm
13
Junction Tree Properties
An undirected graph whose vertices (clusters) are sets of variables, with three properties:
1. Singly connected property (only one path between any two clusters).
2. Potential property (all variables are represented).
3. Running intersection property (if a variable is in two nodes, then all nodes on the path between them also contain that variable).
p(a,b,c,d,e) = (1/Z) ψ(a,b,c) ψ(c,d) ψ(c,e)
[Figure: the graph over a, b, c, d, e; its moralized, triangulated version; and the junction tree with cliques {a,b,c}, {c,d}, {c,e} joined through separators {c}.]
Collect and Distribute passes are necessary for inference.
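To make the factorization above concrete, a small numeric sketch (hypothetical random potentials, not from the lecture) for binary variables:

```python
# Brute-force check of p(a,b,c,d,e) = (1/Z) psi(a,b,c) psi(c,d) psi(c,e).
import itertools
import numpy as np

rng = np.random.default_rng(0)
psi_abc = rng.random((2, 2, 2))   # psi(a, b, c)
psi_cd  = rng.random((2, 2))      # psi(c, d)
psi_ce  = rng.random((2, 2))      # psi(c, e)

def unnorm(a, b, c, d, e):
    return psi_abc[a, b, c] * psi_cd[c, d] * psi_ce[c, e]

Z = sum(unnorm(*s) for s in itertools.product((0, 1), repeat=5))
p = lambda *s: unnorm(*s) / Z
# Sanity check: the normalized joint sums to one (up to rounding).
print(sum(p(*s) for s in itertools.product((0, 1), repeat=5)))
```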
14
Junction Tree 1
Image from Sam Roweis
15
Junction Tree 2
Image from Sam Roweis
16
Message Passing in Junction Tree
• Potential
– Ω_U, the space of U (a subset of the set of all nodes/vertices V), is the Cartesian product of the state sets of the nodes of U.
– A discrete potential on U is a mapping from Ω_U to the non-negative real numbers.
– Each clique and separator in the junction tree has a potential (actually the marginalized joint distribution on the nodes in the clique/separator).
• Propagation/message passing between two adjacent cliques C1, C2 (S0 is their separator):
– Marginalize C1's potential to get a new potential for S0: φ*_S0 = Σ_{C1\S0} φ_C1
– Update C2's potential: φ*_C2 = φ_C2 · φ*_S0 / φ_S0
– Update S0's potential to its new potential φ*_S0.
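A small numpy sketch of one such update between two cliques (my own illustration; the potentials are made up):

```python
# One junction-tree message: cliques C1 = {a, c} and C2 = {c, b} with
# separator S0 = {c}, all variables binary.
import numpy as np

phi_C1 = np.array([[0.3, 0.7], [0.6, 0.4]])   # phi_C1[a, c]
phi_C2 = np.array([[0.5, 0.5], [0.2, 0.8]])   # phi_C2[c, b]
phi_S0 = np.ones(2)                           # phi_S0[c], initialized to 1

# 1. Marginalize C1's potential onto the separator: phi*_S0 = sum over C1\S0.
phi_S0_new = phi_C1.sum(axis=0)               # sum out a, keep c

# 2. Update C2 by the ratio of new to old separator potential
#    (broadcast over b): phi*_C2 = phi_C2 * phi*_S0 / phi_S0
phi_C2 = phi_C2 * (phi_S0_new / phi_S0)[:, None]

# 3. The separator keeps its new potential.
phi_S0 = phi_S0_new
print(phi_S0, phi_C2)
```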
17
Message Passing in General
• BayesNet forms a tree:
– Pearl's algorithm is message passing, first out and then back in from a given node.
• Not a tree (has loops):
– Turn loops into cliques until the net is a tree, then use Pearl's algorithm.
• Cliques turn out to be too big:
– Exact computation is exponential in the size of the largest cliques.
– Use approximation algorithms (many).
18
Towards Decisions
19
From Bayes' Net to Decision/Influence Network
Start out with a causal Bayesian network; in this case, possible causes of leaf loss in an apple tree. We want to know what to do about this.
We duplicate the network because we are going to add an intervention: treating sickness.
The intervention will cost us, but might help with our utility: making a profit when we harvest.
Given the cost, we can now infer the optimal treat/no-treat policy.
20
Influence Example
Replicate the cold net and add decision and cost/utility nodes.
• No fever means a cold is less likely => treat
• No fever, no runny nose => healthy, don't treat
• No fever, runny nose => allergy => treat
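The policy computation itself is just expected-utility maximization: weight each outcome's utility by its posterior probability and pick the decision with the larger sum. A toy Python sketch with invented numbers (not the lecture's actual apple/cold model):

```python
# Expected utility of treat vs. no-treat under a hypothetical posterior.
p_sick = 0.3   # e.g., posterior P(sick | no fever, runny nose), made up

# utility[(decision, sick)]: treatment costs something; untreated sickness
# hurts the harvest profit. All numbers are illustrative only.
utility = {("treat", True): 80, ("treat", False): 90,
           ("no-treat", True): 30, ("no-treat", False): 100}

def expected_utility(decision):
    return (p_sick * utility[(decision, True)]
            + (1 - p_sick) * utility[(decision, False)])

best = max(["treat", "no-treat"], key=expected_utility)
print({d: expected_utility(d) for d in ["treat", "no-treat"]}, "->", best)
```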
21
General
22
Probabilistic graphical models
Graphical models are a subclass of probabilistic models, and come in two flavors:
• Directed graphical models (Bayesian belief nets): alarm network, state-space models, HMMs, Naïve Bayes classifier, PCA/ICA
• Undirected graphical models (Markov nets): Markov random field, Boltzmann machine, Ising model, max-ent model, log-linear models
23
Graphical Models Taxonomy
24
Typical forms for the Conditional Probability Distributions (CPDs) at graph nodes
• For discrete-state nodes:
– Tabular (CPT)
– Decision tree
– Deterministic CPD
– SoftMax (logistic/sigmoid)
– Noisy-OR (see the sketch after this list)
– MLP
– SVM?
• For continuous-state nodes:
– Gaussian
– Mixture of Gaussians
– Linear Gaussian
– Conditional Gaussian
– Regression tree
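As one concrete instance from the discrete list, here is a short sketch of a noisy-OR CPD (my own example; the inhibition probabilities are made up):

```python
# Noisy-OR CPD for a binary child with binary parents: each active parent i
# independently fails to turn the child on with probability q[i]; the "leak"
# term covers causes outside the model.

def noisy_or(parent_states, q, leak=0.99):
    """Return P(child = 1 | parents)."""
    fail = leak  # probability the child stays off with no modeled cause active
    for on, qi in zip(parent_states, q):
        if on:
            fail *= qi
    return 1.0 - fail

q = [0.2, 0.5]  # inhibition probabilities for two hypothetical parents
for states in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(states, round(noisy_or(states, q), 3))
```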
25
We can't always compute exact inference; we then use approximate inference.
Avi's categorization of approximate inference algorithms:
• Approximate computation on an exact model:
– Search methods: beam search, A* search
– Sampling methods: importance sampling, MCMC
– Loopy propagation
– Expectation Propagation
– Variational methods: mean field
• Exact computation on an approximate model:
– Mini-buckets
– Boyen-Koller method for DBNs
– Projection
26
Software
Libraries
27
Bayesian Net Software (Append A)

Name | Authors | Src | API | Exec | Free | Inference | Comments
Bassist | U. Helsinki | C++ | Y | U | 0 | MH | Generates C++ for MCMC.
BayesiaLab | Bayesia Ltd | N | N | - | $ | jtree | Supervised and unsupervised learning, clustering, analysis toolbox, adaptive questionnaires, dynamic models
BNT | Murphy (U.C. Berkeley) | Matlab/C | Y | WUM | 0 | Many | Also handles dynamic models, like HMMs and Kalman filters.
BNJ | Hsu (Kansas) | Java | - | - | 0 | jtree, IS | -
BUGS | MRC/Imperial College | N | N | WU | 0 | Gibbs | -
Deal | Bottcher et al | R | - | - | 0 | None | Structure learning.
GDAGsim | Wilkinson (U. Newcastle) | C | Y | WUM | 0 | Exact | Bayesian analysis of large linear Gaussian directed models.
Genie | U. Pittsburgh | N | WU | WU | 0 | Jtree | -
GMRFsim | Rue (U. Trondheim) | C | Y | WUM | 0 | MCMC | Bayesian analysis of large linear Gaussian undirected models.
GMTk | Bilmes (UW), Zweig (IBM) | N | Y | U | 0 | Jtree | Designed for speech recognition.
Grappa | Green (Bristol) | R | - | - | 0 | Jtree | -
Hugin Expert | Hugin | N | Y | W | $ | Jtree | -
Hydra | Warnes (U. Wash.) | Java | - | - | 0 | MCMC | -
Java Bayes | Cozman (CMU) | Java | Y | WUM | 0 | Varelim, jtree | -
MIM | HyperGraph Software | N | N | W | $ | Jtree | Up to 52 variables.
MSBNx | Microsoft | N | Y | W | 0 | Jtree | -
Netica | Norsys | N | WUM | W | $ | jtree | -
PMT | Pavlovic (BU) | Matlab/C | - | - | 0 | special purpose | -
PNL | Eruhimov (Intel) | C++ | - | - | 0 | Many | A C++ version of BNT; will be released 12/03.
Pulcinella | IRIDIA | Lisp | Y | WUM | 0 | ? | Uses valuation systems for non-probabilistic calculi.
RISO | Dodier (U. Colorado) | Java | Y | WUM | 0 | Polytree | Distributed implementation.
Tetrad | CMU | N | N | WU | 0 | None | -
UnBBayes | ? | Java | - | - | 0 | jtree | K2 for struct learning
Vibes | Winn & Bishop (U. Cambridge) | Java | Y | WU | 0 | Variational | Not yet available.
WinMine | Microsoft | N | N | W | 0 | None | Learns BN or dependency net structure.
XBAIES 2.0 | Cowell (City U.) | N | N | W | 0 | Jtree | -
28
Compare All BayesNet Software
Append A
29
Compare All BayesNet Software
Append A
30
Compare All BayesNet Software
Append A
31
Compare All BayesNet Software: KEY
Append A
32
BN Researchers
MAJOR RESEARCHERS
Microsoft: http://www.research.microsoft.com/research/dtg/ Heckerman & Chickering are big there, currently pushing uses of Dependency Networks
Prof. Russell (Berkeley): http://http.cs.berkeley.edu/~russell/ Wants a more expressive probabilistic language. Currently pushing the Center for Intelligent Systems at Berkeley, http://www.eecs.berkeley.edu/CIS, which brings together a wide range of luminaries
Prof. Jordan (Berkeley): http://www.cs.berkeley.edu/~jordan/ Writing a book; data retrieval, structure learning, clustering, variational methods, all.
Yair Weiss (Berkeley => Hebrew U): http://www.cs.berkeley.edu/~yweiss/ Computationally tractable approximation. Vision; now at Hebrew U.
Prof. Koller (Stanford): http://robotics.stanford.edu/~koller/courses.html Writing a book; probabilistic relational models (PRMs), more expressive languages, all.
Prof. Frey (Waterloo): http://www.cs.toronto.edu/~frey/ Vision models, machine learning reformulations
Prof. Pearl (UCLA): http://bayes.cs.ucla.edu/jp_home.html Founder. Causality theory
Bill Freeman (MIT, was MERL; learning, vision): http://www.ai.mit.edu/people/wtf/ Low-level vision, learning theory; now at MIT
Peter Spirtes (CMU, Tetrad project): http://hss.cmu.edu/HTML/departments/philosophy/people/directory/Peter_Spirtes.html
Kevin Murphy (MIT, BN Toolkit): http://www.ai.mit.edu/~murphyk/ Toolboxes (BNT), computational speedups, tutorials
Jonathan Yedidia (MERL): http://www.merl.com/people/yedidia/ Learning theory
Pietro Perona (CalTech): http://www.erc.caltech.edu/ Vision. The Center for NeuroMorphic Information, http://www.erc.caltech.edu/, brings together machine learning, BN, vision, design, etc.
Ron Parr (Duke University): http://www.cs.duke.edu/~parr/ Game theory, reinforcement learning, multi-agent systems
Nir Friedman (Hebrew U): http://www.cs.huji.ac.il/~nirf/ Computational biology, efficient inference
Avi Pfeffer (Harvard): http://www.eecs.harvard.edu/~avi/ Richer probabilistic expressibility, intelligent systems
Zoubin Ghahramani (Gatsby Institute, London): http://www.gatsby.ucl.ac.uk/~zoubin Variational Bayes
Finn Jensen (Hugin, Denmark): http://www.cs.auc.dk/~fvj Classical (expert-system style) BNs
Uffe Kjaerulff (Hugin, Denmark): http://www.cs.auc.dk/~uk Ditto
Eric Horvitz (Microsoft): http://research.microsoft.com/~horvitz/ Decision making, user interfaces
Tommi Jaakkola (MIT): http://www.ai.mit.edu/people/tommi/tommi.html Theory, structure learning from bio data
Ross Shachter (Stanford): http://www.stanford.edu/dept/MSandE/faculty/shachter/ Influence diagrams
David Spiegelhalter (Univ. College London): http://www.mrc-bsu.cam.ac.uk/BSUsite/AboutUs/People/davids.shtml Bayesian and medical BNs
Steffen Lauritzen (Europe): http://www.math.auc.dk/~steffen/ Statistical theory
Phil Dawid (Univ. College London): http://www.ucl.ac.uk/~ucak06d/ Statistical theory
Kathy Laskey (George Mason): http://www.ucl.ac.uk/~ucak06d/ Object-oriented BNs, military applications
Jeff Bilmes (U. Washington): http://www.ee.washington.edu/faculty/bilmes/ DBNs for speech
Hagai Attias (Microsoft): http://research.microsoft.com/users/hagaia/ Variational and sampling methods for (acoustic) signal processing
Worldwide list of Bayesians (not just networks): http://bayes.stat.washington.edu/bayes_people.html
CONFERENCES
UAI: http://robotics.stanford.edu/~uai01/
NIPS: http://www.cs.cmu.edu/Groups/NIPS/
Append C
33
PNL vs. Other Graphical Models Libraries

Name | Author | Src | Cost .edu | Cost .com | GUI | Un/dir | Utility | DBN | Gauss | Inference | Param Learning | Struct Learning
PNL | Intel | C++ | 0 | 0 or $ | - | U,D* | | + | + | Jtree, BP, Gibbs | + | +
BNT | Murphy | Matlab | 0 | 0 | - | D | + | + | + | Jtree, BP, Gibbs, varelim | + | +
GMTk | Bilmes | C++ | 0 | 0 | - | D | - | + | - | Jtree | + | +
Hugin | Hugin | - | $ | $ | + | D | + | - | + | Jtree | + | -
BUGS | MRC | - | 0 | ∞ | + | D | - | - | + | Gibbs | + | -
Genie | U. Pitt. | - | 0 | ∞ | + | D | + | - | - | Jtree | - | -
MSBN | Microsoft | - | 0 | $ | + | D | + | - | - | Jtree | - | -
WinMine | Microsoft | - | 0 | $ | + | U,D | - | - | - | - | + | +
JavaBayes | Cozman | Java | 0 | ∞ | + | D | - | - | - | Varelim | - | -

Present library: the Intel library (PNL) is much more comprehensive.
Append C
34
Examples of Use
Applications
35
Face Modeling and Recognition Using Bayesian Networks
Gang Song*, Tao Wang, Yimin Zhang, Wei Hu, Guangyou Xu*, Gary Bradski
System:
• Face feature finder (separate)
• Learn a Gabor filter "jet" at each point
• Add a pose switching variable
36
Face Modeling and Recognition Using Bayesian Networks
Gang Song*, Tao Wang, Yimin Zhang, Wei Hu, Guangyou Xu*, Gary Bradski
Results:
• BNPFR – Bayes net with pose
• BNFR – Bayes net without pose
• EHMM – Embedded HMM
• EGM – Gabor jets
37
The Segmentation Problem
Searching over all possible joint configurations J is computationally impractical. Therefore, segmentation takes place in two stages. First, we segment the head and torso, and determine the position of the neck. Then, we jointly segment the upper arms, forearms and hands, and determine the position of the remaining joints.
Step I: (Q*_HT, J*_HT) = argmax_{Q_HT, J_HT} P(Q_HT, J_HT | O)
Step II: (Q*_A, J*_A) = argmax_{Q_A, J_A} P(Q_A, J_A | O, Q*_HT, J*_HT)
where Q_A, Q_HT are state assignments for the arm and head & torso regions, and J_A, J_HT are joints for the arm and head & torso components.
38
Upper Body Model
[Figure: graphical model with Components C (Head H, Torso T, Left/Right Upper Arms Ul/Ur, Left/Right Forearms Fl/Fr, Left/Right Hands Hl/Hr), Joints J (Neck N, Left/Right Shoulders Sl/Sr, Left/Right Elbows El/Er, Left/Right Wrists Wl/Wr), Anthropological Measurements A (Head Size Shd, Torso Size St, Upper Arm Size Sa, Forearm Size Sf, Hand Size Sh), and Observations O_ij. The joint distribution factors into per-region terms such as P(O_ij | q_ij, J, A) and P(q_ij | J, A), times priors on the joints and measurements.]
39
Body Tracking Results
40
Audio-Visual Continuous Speech Recognition: The Overall System
[Block diagram: the audio-video signal splits into an audio path (Acoustic Features, MFCC) and a video path (Face Detection, Mouth Detection, Mouth Tracking, Visual Features); both feed the AV Model for training (Train) and recognition (Reco).]
41
Speaker-Independent AVCSR
A coupled HMM for audio-visual speech recognition:
• Audio observations of size 13, modeled with 3 states, 32 mixtures/state, diagonal covariance matrices (39 English phonemes).
• Visual observations of size 13, modeled with 3 states, 12 mixtures/state, diagonal covariance matrices (13 English visemes).
42
AVCSR Experimental Results
• WER obtained on the XM2VTS database: 300 speakers, 10-digit enumeration sentences.
• The system improves the recognition rate over acoustic-only speech recognition by more than 55% at 0 dB SNR!
43
Bill Freeman (MIT AI Lab) created a simple model of early visual processing:
MRFs for Hyper-Resolution
He presented blurred images and trained on the sharp originals, then tested on new images.
[Figure panels: Input, Cubic Spline, Bayesian Net, Actual.]
44
MRFs for Shape from Shading
The illumination, which changes with each frame, is factored from the reflectance, which stays the same. This model is then used to insert graphics with proper lighting.
[Figure: frames over time; illumination vs. reflectance factorization; graphics inserted with proper lighting.]
45
Blei, Jordan, Malik
46
Blei, Jordan, Malik
47
Blei, Jordan, Malik
48
Blei, Jordan, Malik
49
Example of learned models (from Frey)
50
Example of learned models (from Frey)