1
Lecture 15
Bayesian Networks in Computer Vision
Gary Bradski
Sebastian Thrun
http://robots.stanford.edu/cs223b/index.html
2
What is a Bayesian Network?
Nodes represent (random) variables. A conditional probability distribution quantifies the effects of the parents on each node. The graph is directed and acyclic.
It's a factored joint distribution and/or causal diagram: the causal links encode dependencies. Here the factors are P(W), P(C|W), P(A|W), P(F|C), P(R|C,A).
A joint distribution, here p(W,C,A,F,R), is everything we can know about the problem, but it grows exponentially: here 2^5 - 1 = 31 parameters. Factoring the distribution in a Bayes net decreases the number of parameters, here from 31 to 11 (note that probabilities sum to one, which decreases the number of parameters to be specified).
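As a concrete check of those numbers, here is a minimal Python sketch (mine, not from the lecture) that reproduces the 31-vs-11 parameter count for this five-binary-variable network:

```python
# Parameter counting for the example network W -> {C, A}, C -> F, {C, A} -> R,
# all variables binary. A full joint over n binary variables needs 2^n - 1
# free parameters; a CPT for a binary node needs one free parameter per
# configuration of its parents.

def full_joint_params(n_vars):
    return 2 ** n_vars - 1

def cpt_params(n_parents):
    return 2 ** n_parents

# Parents of each node in the example network.
parents = {"W": [], "C": ["W"], "A": ["W"], "F": ["C"], "R": ["C", "A"]}

factored = sum(cpt_params(len(p)) for p in parents.values())
print(full_joint_params(5), factored)  # -> 31 11
```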
3
Causality and Bayesian Nets
[Figure: a circuit with Mains, Transformer, Diodes, Capacitor, Ammeter, and Battery; some nodes marked Observed, others Un-Observed.]
One can also think of Bayesian networks as a "circuit diagram" of probability models.
• The links indicate causal effect, not direction of information flow.
• Just as we can predict the effects of changes on the circuit diagram, we can predict the consequences of "operating" on our probability model diagram.
4
Inference
• Once we have a model, we need to make it consistent by "diffusing" the distributions around until they all agree with one another.
• The central algorithm for this: Belief Propagation
5
Specifically:
Belief Propagation
Messages are passed along the links:
• Going down an arrow, sum out the parent (the "causal" message).
• Going up an arrow, apply Bayes' law (the "diagnostic" message).
Bayes' law: P(A|B) = P(B|A) P(A) / P(B)
* some figures from: Peter Lucas BN lecture course
6
Belief Propagation
Bayes' law: P(A|B) = P(B|A) P(A) / P(B)
Diagnostic message, passed against the arrow (from child Vj to parent Vi):
λ_Vj(Vi) = Σ_Vj P(Vj | Vi) λ(Vj)
Causal message, passed with the arrow (from parent Vi to child Vj):
π(Vj) = Σ_Vi P(Vj | Vi) π(Vi)
* some figures from: Peter Lucas BN lecture course
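A minimal numeric sketch of these two message types on a single link Vi -> Vj (my own illustration with made-up numbers, not the lecture's code):

```python
# Pearl-style messages on a tiny chain Vi -> Vj with binary states.
# P[j_state][i_state] below is a hypothetical CPT P(Vj | Vi).
import numpy as np

P = np.array([[0.9, 0.2],    # P(Vj=0 | Vi=0), P(Vj=0 | Vi=1)
              [0.1, 0.8]])   # P(Vj=1 | Vi=0), P(Vj=1 | Vi=1)

prior_i = np.array([0.6, 0.4])   # pi(Vi): causal support from above
lam_j   = np.array([0.0, 1.0])   # lambda(Vj): evidence Vj = 1 observed

# Causal message (with the arrow): pi(Vj) = sum_Vi P(Vj|Vi) pi(Vi)
pi_msg = P @ prior_i

# Diagnostic message (against the arrow): lambda_Vj(Vi) = sum_Vj P(Vj|Vi) lambda(Vj)
lam_msg = P.T @ lam_j

# Posterior at Vi given the evidence at Vj (Bayes' law, then normalize):
belief_i = prior_i * lam_msg
belief_i /= belief_i.sum()
print(pi_msg, lam_msg, belief_i)
```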
7
Inference in general graphs
• Belief propagation is only guaranteed to be correct for trees.
• A general graph should be converted to a junction tree, by clustering nodes.
• Computational complexity is exponential in the size of the resulting clusters (NP-hard).
8
Junction Tree
Algorithm for turning a Bayesian network with loops into a junction tree:
1. "Moralize" the graph by connecting parents.
2. Drop the arrows.
3. Triangulate (connect nodes if a loop of length > 3 exists).
4. Put in intersection variables.
* Lauritzen 96
[Figure: the example graph over X1-X6 after each step (1)-(3), and the resulting junction tree. Image from Sam Roweis]
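A rough Python sketch of steps 1-3 above (moralize, drop arrows, triangulate), using networkx for graph bookkeeping and a hypothetical edge list standing in for the X1-X6 example:

```python
# Moralization and triangulation sketch (my own, assuming made-up edges).
import itertools
import networkx as nx

# Hypothetical directed edges for an X1..X6 example.
dag = nx.DiGraph([("X1", "X2"), ("X1", "X3"), ("X2", "X4"),
                  ("X3", "X5"), ("X4", "X6"), ("X5", "X6")])

# Steps 1-2: marry all co-parents of each node, then drop arrow directions.
moral = dag.to_undirected()
for node in dag:
    for u, v in itertools.combinations(dag.predecessors(node), 2):
        moral.add_edge(u, v)

# Step 3: greedy triangulation by node elimination -- when a node is
# eliminated, its remaining neighbors are connected into a clique.
tri = moral.copy()
work = moral.copy()
for node in sorted(work, key=work.degree):  # simple low-degree-first order
    for u, v in itertools.combinations(list(work.neighbors(node)), 2):
        tri.add_edge(u, v)
        work.add_edge(u, v)
    work.remove_node(node)

print(sorted(tri.edges()))
```

Step 4 (inserting the intersection/separator variables) is then performed on the cliques of the triangulated graph.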
9
Global message passing: Two-pass
• Select one clique as the root.
• Two-pass message passing: first collect evidence (messages flow toward the root), then distribute evidence (messages flow away from the root).
[Figure: the Collect and Distribute passes. Figure from P. Green]
10
Junction Tree Inference
Image from Cecil Huang
11
Global message passing: Parallel, distributed version
[Figure: four-node example X1-X4 at Stage 1 and Stage 2.]
• All nodes can send messages out simultaneously, once each has received the messages from all of its parents.
• Parallel processing (topology-level parallelism).
12
Details
Junction Tree Algorithm
13
Junction Tree Properties
An undirected graph whose vertices (clusters) are sets of variables, with three properties:
1. Singly connected property (only one path between any two clusters).
2. Potential property (all variables are represented).
3. Running intersection property (if a variable is in two nodes, then all nodes on the path between them also contain that variable).
p(a,b,c,d,e) = (1/Z) ψ(a,b,c) ψ(c,d) ψ(c,e)
[Figure: the graph over a, b, c, d, e; its moralized, triangulated version; and the junction tree with cliques {a,b,c}, {c,d}, {c,e} joined through separators {c}.]
Collect and Distribute passes are necessary for inference.
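To make the factorization above concrete, a small numeric sketch (hypothetical random potentials, not from the lecture) for binary variables:

```python
# Brute-force check of p(a,b,c,d,e) = (1/Z) psi(a,b,c) psi(c,d) psi(c,e).
import itertools
import numpy as np

rng = np.random.default_rng(0)
psi_abc = rng.random((2, 2, 2))   # psi(a, b, c)
psi_cd  = rng.random((2, 2))      # psi(c, d)
psi_ce  = rng.random((2, 2))      # psi(c, e)

def unnorm(a, b, c, d, e):
    return psi_abc[a, b, c] * psi_cd[c, d] * psi_ce[c, e]

Z = sum(unnorm(*s) for s in itertools.product((0, 1), repeat=5))
p = lambda *s: unnorm(*s) / Z
# Sanity check: the normalized joint sums to one (up to rounding).
print(sum(p(*s) for s in itertools.product((0, 1), repeat=5)))
```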
14
Junction Tree 1
Image from Sam Roweis
15
Junction Tree 2
Image from Sam Roweis
16
Message Passing in Junction Tree
• Potential
– Ω_U, the space of U (a subset of the set of all nodes/vertices V), is the Cartesian product of the state sets of the nodes of U.
– A discrete potential on U is a mapping from Ω_U to the non-negative real numbers.
– Each clique and separator in the junction tree has a potential (actually the marginalized joint distribution on the nodes in the clique/separator).
• Propagation/message passing between two adjacent cliques C1, C2 (S0 is their separator):
– Marginalize C1's potential to get a new potential for S0: φ*_S0 = Σ_{C1\S0} φ_C1
– Update C2's potential: φ*_C2 = φ_C2 · φ*_S0 / φ_S0
– Update S0's potential to its new potential φ*_S0.
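A small numpy sketch of one such update between two cliques (my own illustration; the potentials are made up):

```python
# One junction-tree message: cliques C1 = {a, c} and C2 = {c, b} with
# separator S0 = {c}, all variables binary.
import numpy as np

phi_C1 = np.array([[0.3, 0.7], [0.6, 0.4]])   # phi_C1[a, c]
phi_C2 = np.array([[0.5, 0.5], [0.2, 0.8]])   # phi_C2[c, b]
phi_S0 = np.ones(2)                           # phi_S0[c], initialized to 1

# 1. Marginalize C1's potential onto the separator: phi*_S0 = sum over C1\S0.
phi_S0_new = phi_C1.sum(axis=0)               # sum out a, keep c

# 2. Update C2 by the ratio of new to old separator potential
#    (broadcast over b): phi*_C2 = phi_C2 * phi*_S0 / phi_S0
phi_C2 = phi_C2 * (phi_S0_new / phi_S0)[:, None]

# 3. The separator keeps its new potential.
phi_S0 = phi_S0_new
print(phi_S0, phi_C2)
```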
17
Message Passing in General
• BayesNet forms a tree:
– Pearl's algorithm is message passing, first out and then back in from a given node.
• Not a tree (has loops):
– Turn loops into cliques until the net is a tree, then use Pearl's algorithm.
• Cliques turn out to be too big:
– Exact computation is exponential in the size of the largest cliques.
– Use approximation algorithms (many).
18
Towards Decisions
19
From Bayes' Net to Decision/Influence Network
Start out with a causal Bayesian network; in this case, possible causes of leaf loss in an apple tree. We want to know what to do about this.
We duplicate the network because we are going to add an intervention: treating sickness.
The intervention will cost us, but might help with our utility: making a profit when we harvest.
Given the cost, we can now infer the optimal treat/no-treat policy.
20
Influence Example
Replicate the cold net and add decision and cost/utility nodes.
• No fever means a cold is less likely => treat
• No fever, no runny nose => healthy, don't treat
• No fever, runny nose => allergy => treat
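The policy computation itself is just expected-utility maximization: weight each outcome's utility by its posterior probability and pick the decision with the larger sum. A toy Python sketch with invented numbers (not the lecture's actual apple/cold model):

```python
# Expected utility of treat vs. no-treat under a hypothetical posterior.
p_sick = 0.3   # e.g., posterior P(sick | no fever, runny nose), made up

# utility[(decision, sick)]: treatment costs something; untreated sickness
# hurts the harvest profit. All numbers are illustrative only.
utility = {("treat", True): 80, ("treat", False): 90,
           ("no-treat", True): 30, ("no-treat", False): 100}

def expected_utility(decision):
    return (p_sick * utility[(decision, True)]
            + (1 - p_sick) * utility[(decision, False)])

best = max(["treat", "no-treat"], key=expected_utility)
print({d: expected_utility(d) for d in ["treat", "no-treat"]}, "->", best)
```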
21
General
22
Probabilistic graphical models
Graphical models are a subclass of probabilistic models, and come in two flavors:
• Directed graphical models (Bayesian belief nets): alarm network, state-space models, HMMs, Naïve Bayes classifier, PCA/ICA
• Undirected graphical models (Markov nets): Markov random field, Boltzmann machine, Ising model, max-ent model, log-linear models
23
Graphical Models Taxonomy
24
Typical forms for the Conditional Probability Distributions (CPDs) at graph nodes
• For discrete-state nodes:
– Tabular (CPT)
– Decision tree
– Deterministic CPD
– SoftMax (logistic/sigmoid)
– Noisy-OR (see the sketch after this list)
– MLP
– SVM?
• For continuous-state nodes:
– Gaussian
– Mixture of Gaussians
– Linear Gaussian
– Conditional Gaussian
– Regression tree
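As one concrete instance from the discrete list, here is a short sketch of a noisy-OR CPD (my own example; the inhibition probabilities are made up):

```python
# Noisy-OR CPD for a binary child with binary parents: each active parent i
# independently fails to turn the child on with probability q[i]; the "leak"
# term covers causes outside the model.

def noisy_or(parent_states, q, leak=0.99):
    """Return P(child = 1 | parents)."""
    fail = leak  # probability the child stays off with no modeled cause active
    for on, qi in zip(parent_states, q):
        if on:
            fail *= qi
    return 1.0 - fail

q = [0.2, 0.5]  # inhibition probabilities for two hypothetical parents
for states in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(states, round(noisy_or(states, q), 3))
```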
25
We can't always compute exact inference; we then use approximate inference.
Avi's categorization of approximate inference algorithms:
• Approximate computation on an exact model:
– Search methods: beam search, A* search
– Sampling methods: importance sampling, MCMC
– Loopy propagation
– Expectation Propagation
– Variational methods: mean field
• Exact computation on an approximate model:
– Mini-buckets
– Boyen-Koller method for DBNs
– Projection
26
Software
Libraries
27
Bayesian Net Software (Append A)

Name | Authors | Src | API | Exec | Free | Inference | Comments
Bassist | U. Helsinki | C++ | Y | U | 0 | MH | Generates C++ for MCMC.
BayesiaLab | Bayesia Ltd | N | N | - | $ | jtree | Supervised and unsupervised learning, clustering, analysis toolbox, adaptive questionnaires, dynamic models
BNT | Murphy (U.C. Berkeley) | Matlab/C | Y | WUM | 0 | Many | Also handles dynamic models, like HMMs and Kalman filters.
BNJ | Hsu (Kansas) | Java | - | - | 0 | jtree, IS | -
BUGS | MRC/Imperial College | N | N | WU | 0 | Gibbs | -
Deal | Bottcher et al | R | - | - | 0 | None | Structure learning.
GDAGsim | Wilkinson (U. Newcastle) | C | Y | WUM | 0 | Exact | Bayesian analysis of large linear Gaussian directed models.
Genie | U. Pittsburgh | N | WU | WU | 0 | Jtree | -
GMRFsim | Rue (U. Trondheim) | C | Y | WUM | 0 | MCMC | Bayesian analysis of large linear Gaussian undirected models.
GMTk | Bilmes (UW), Zweig (IBM) | N | Y | U | 0 | Jtree | Designed for speech recognition.
Grappa | Green (Bristol) | R | - | - | 0 | Jtree | -
Hugin Expert | Hugin | N | Y | W | $ | Jtree | -
Hydra | Warnes (U. Wash.) | Java | - | - | 0 | MCMC | -
Java Bayes | Cozman (CMU) | Java | Y | WUM | 0 | Varelim, jtree | -
MIM | HyperGraph Software | N | N | W | $ | Jtree | Up to 52 variables.
MSBNx | Microsoft | N | Y | W | 0 | Jtree | -
Netica | Norsys | N | WUM | W | $ | jtree | -
PMT | Pavlovic (BU) | Matlab/C | - | - | 0 | special purpose | -
PNL | Eruhimov (Intel) | C++ | - | - | 0 | Many | A C++ version of BNT; will be released 12/03.
Pulcinella | IRIDIA | Lisp | Y | WUM | 0 | ? | Uses valuation systems for non-probabilistic calculi.
RISO | Dodier (U. Colorado) | Java | Y | WUM | 0 | Polytree | Distributed implementation.
Tetrad | CMU | N | N | WU | 0 | None | -
UnBBayes | ? | Java | - | - | 0 | jtree | K2 for struct learning
Vibes | Winn & Bishop (U. Cambridge) | Java | Y | WU | 0 | Variational | Not yet available.
WinMine | Microsoft | N | N | W | 0 | None | Learns BN or dependency net structure.
XBAIES 2.0 | Cowell (City U.) | N | N | W | 0 | Jtree | -
28
Compare All BayesNet Software
Append A
29
Compare All BayesNet Software
Append A
30
Compare All BayesNet Software
Append A
31
Compare All BayesNet Software: KEY
Append A
32
BN Researchers
MAJOR RESEARCHERS
Microsoft: http://www.research.microsoft.com/research/dtg/ Heckerman & Chickering are big there, currently pushing uses of Dependency Networks
Prof. Russell (Berkeley): http://http.cs.berkeley.edu/~russell/ Wants a more expressive probabilistic language. Currently pushing the Center for Intelligent Systems at Berkeley, http://www.eecs.berkeley.edu/CIS, which brings together a wide range of luminaries
Prof. Jordan (Berkeley): http://www.cs.berkeley.edu/~jordan/ Writing a book; data retrieval, structure learning, clustering, variational methods, all.
Yair Weiss (Berkeley => Hebrew U): http://www.cs.berkeley.edu/~yweiss/ Computationally tractable approximation. Vision; now at Hebrew U.
Prof. Koller (Stanford): http://robotics.stanford.edu/~koller/courses.html Writing a book; probabilistic relational models (PRMs), more expressive languages, all.
Prof. Frey (Waterloo): http://www.cs.toronto.edu/~frey/ Vision models, machine learning reformulations
Prof. Pearl (UCLA): http://bayes.cs.ucla.edu/jp_home.html Founder. Causality theory
Bill Freeman (MIT, was MERL; learning, vision): http://www.ai.mit.edu/people/wtf/ Low-level vision, learning theory; now at MIT
Peter Spirtes (CMU, Tetrad project): http://hss.cmu.edu/HTML/departments/philosophy/people/directory/Peter_Spirtes.html
Kevin Murphy (MIT, BN Toolkit): http://www.ai.mit.edu/~murphyk/ Toolboxes (BNT), computational speedups, tutorials
Jonathan Yedidia (MERL): http://www.merl.com/people/yedidia/ Learning theory
Pietro Perona (CalTech): http://www.erc.caltech.edu/ Vision. The Center for NeuroMorphic Information, http://www.erc.caltech.edu/, brings together machine learning, BN, vision, design, etc.
Ron Parr (Duke University): http://www.cs.duke.edu/~parr/ Game theory, reinforcement learning, multi-agent systems
Nir Friedman (Hebrew U): http://www.cs.huji.ac.il/~nirf/ Computational biology, efficient inference
Avi Pfeffer (Harvard): http://www.eecs.harvard.edu/~avi/ Richer probabilistic expressibility, intelligent systems
Zoubin Ghahramani (Gatsby Institute, London): http://www.gatsby.ucl.ac.uk/~zoubin Variational Bayes
Finn Jensen (Hugin, Denmark): http://www.cs.auc.dk/~fvj Classical (expert-system style) BNs
Uffe Kjaerulff (Hugin, Denmark): http://www.cs.auc.dk/~uk Ditto
Eric Horvitz (Microsoft): http://research.microsoft.com/~horvitz/ Decision making, user interfaces
Tommi Jaakkola (MIT): http://www.ai.mit.edu/people/tommi/tommi.html Theory, structure learning from bio data
Ross Shachter (Stanford): http://www.stanford.edu/dept/MSandE/faculty/shachter/ Influence diagrams
David Spiegelhalter (Univ. College London): http://www.mrc-bsu.cam.ac.uk/BSUsite/AboutUs/People/davids.shtml Bayesian and medical BNs
Steffen Lauritzen (Europe): http://www.math.auc.dk/~steffen/ Statistical theory
Phil Dawid (Univ. College London): http://www.ucl.ac.uk/~ucak06d/ Statistical theory
Kathy Laskey (George Mason): http://www.ucl.ac.uk/~ucak06d/ Object-oriented BNs, military applications
Jeff Bilmes (U. Washington): http://www.ee.washington.edu/faculty/bilmes/ DBNs for speech
Hagai Attias (Microsoft): http://research.microsoft.com/users/hagaia/ Variational and sampling methods for (acoustic) signal processing
Worldwide list of Bayesians (not just networks): http://bayes.stat.washington.edu/bayes_people.html
CONFERENCES
UAI: http://robotics.stanford.edu/~uai01/
NIPS: http://www.cs.cmu.edu/Groups/NIPS/
Append C
33
PNL vs. Other Graphical Models Libraries

Name | Author | Src | Cost .edu | Cost .com | GUI | Un/dir | Utility | DBN | Gauss | Inference | Param Learning | Struct Learning
PNL | Intel | C++ | 0 | 0 or $ | - | U,D* | | + | + | Jtree, BP, Gibbs | + | +
BNT | Murphy | Matlab | 0 | 0 | - | D | + | + | + | Jtree, BP, Gibbs, varelim | + | +
GMTk | Bilmes | C++ | 0 | 0 | - | D | - | + | - | Jtree | + | +
Hugin | Hugin | - | $ | $ | + | D | + | - | + | Jtree | + | -
BUGS | MRC | - | 0 | ∞ | + | D | - | - | + | Gibbs | + | -
Genie | U. Pitt. | - | 0 | ∞ | + | D | + | - | - | Jtree | - | -
MSBN | Microsoft | - | 0 | $ | + | D | + | - | - | Jtree | - | -
WinMine | Microsoft | - | 0 | $ | + | U,D | - | - | - | - | + | +
JavaBayes | Cozman | Java | 0 | ∞ | + | D | - | - | - | Varelim | - | -

Present library: the Intel library (PNL) is much more comprehensive.
Append C
34
Examples of Use
Applications
35
Face Modeling and Recognition Using Bayesian Networks
Gang Song*, Tao Wang, Yimin Zhang, Wei Hu, Guangyou Xu*, Gary Bradski
System:
• Face feature finder (separate)
• Learn a Gabor filter "jet" at each point
• Add a pose switching variable
36
Face Modeling and Recognition Using Bayesian Networks
Gang Song*, Tao Wang, Yimin Zhang, Wei Hu, Guangyou Xu*, Gary Bradski
Results:
• BNPFR – Bayes net with pose
• BNFR – Bayes net without pose
• EHMM – Embedded HMM
• EGM – Gabor jets
37
The Segmentation Problem
Searching over all possible joint configurations J is computationally impractical. Therefore, segmentation takes place in two stages. First, we segment the head and torso, and determine the position of the neck. Then, we jointly segment the upper arms, forearms and hands, and determine the position of the remaining joints.
Step I: (Q*_HT, J*_HT) = argmax_{Q_HT, J_HT} P(Q_HT, J_HT | O)
Step II: (Q*_A, J*_A) = argmax_{Q_A, J_A} P(Q_A, J_A | O, Q*_HT, J*_HT)
where Q_A, Q_HT are state assignments for the arm and head & torso regions, and J_A, J_HT are joints for the arm and head & torso components.
38
Upper Body Model
[Figure: graphical model with Components C (Head H, Torso T, Left/Right Upper Arms Ul/Ur, Left/Right Forearms Fl/Fr, Left/Right Hands Hl/Hr), Joints J (Neck N, Left/Right Shoulders Sl/Sr, Left/Right Elbows El/Er, Left/Right Wrists Wl/Wr), Anthropological Measurements A (Head Size Shd, Torso Size St, Upper Arm Size Sa, Forearm Size Sf, Hand Size Sh), and Observations O_ij. The joint distribution factors into per-region terms such as P(O_ij | q_ij, J, A) and P(q_ij | J, A), times priors on the joints and measurements.]
39
Body Tracking Results
40
Audio-Visual Continuous Speech Recognition: The Overall System
[Block diagram: the audio-video signal splits into an audio path (Acoustic Features, MFCC) and a video path (Face Detection, Mouth Detection, Mouth Tracking, Visual Features); both feed the AV Model for training (Train) and recognition (Reco).]
41
Speaker-Independent AVCSR
A coupled HMM for audio-visual speech recognition:
• Audio observations of size 13, modeled with 3 states, 32 mixtures/state, diagonal covariance matrices (39 English phonemes).
• Visual observations of size 13, modeled with 3 states, 12 mixtures/state, diagonal covariance matrices (13 English visemes).
42
AVCSR Experimental Results
• WER obtained on the XM2VTS database: 300 speakers, 10-digit enumeration sentences.
• The system improves the recognition rate over acoustic-only speech recognition by more than 55% at 0 dB SNR!
43
Bill Freeman (MIT AI Lab) created a simple model of early visual processing:
MRFs for Hyper-Resolution
He presented blurred images and trained on the sharp originals, then tested on new images.
[Figure panels: Input, Cubic Spline, Bayesian Net, Actual.]
44
MRFs for Shape from Shading
The illumination, which changes with each frame, is factored from the reflectance, which stays the same. This model is then used to insert graphics with proper lighting.
[Figure: frames over time; illumination vs. reflectance factorization; graphics inserted with proper lighting.]
45
Blei, Jordan, Malik
46
Blei, Jordan, Malik
47
Blei, Jordan, Malik
48
Blei, Jordan, Malik
49
Example of learned models (from Frey)
50
Example of learned models (from Frey)