Bayesian Networks
Russell and Norvig: Chapter 14. CS440 Fall 2005

Page 1: Bayesian Networks

Bayesian Networks

Russell and Norvig: Chapter 14

CS440 Fall 2005

Page 2: Bayesian Networks

Probabilistic Agent

[Diagram: an agent connected to its environment through sensors and actuators, with a "?" for its internal reasoning]

"I believe that the sun will still exist tomorrow with probability 0.999999 and that it will be sunny with probability 0.6."

Page 3: Bayesian Networks

Problem

At a certain time t, the KB of an agent is some collection of beliefs. At time t the agent's sensors make an observation that changes the strength of one of its beliefs. How should the agent update the strength of its other beliefs?

Page 4: Bayesian Networks

Purpose of Bayesian Networks

Facilitate the description of a collection of beliefs by making explicit causality relations and conditional independence among beliefs. Provide a more efficient way (than by using joint distribution tables) to update belief strengths when new evidence is observed.

Page 5: Bayesian Networks

Other Names

Belief networks
Probabilistic networks
Causal networks

Page 6: Bayesian Networks

Bayesian Networks

A simple, graphical notation for conditional independence assertions resulting in a compact representation for the full joint distribution

Syntax:

a set of nodes, one per variable

a directed, acyclic graph (link = ‘direct influences’)

a conditional distribution for each node given its parents: P(Xi|Parents(Xi))

Page 7: Bayesian Networks

Example

[Network: Weather stands alone; Cavity is the parent of both Toothache and Catch]

Topology of network encodes conditional independence assertions:

Weather is independent of the other variables.
Toothache and Catch are independent given Cavity.

Page 8: Bayesian Networks

Example

I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by a minor earthquake. Is there a burglar?

Network topology reflects "causal" knowledge:
- A burglar can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call

Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls

Page 9: Bayesian Networks

A Simple Belief Network

[DAG: Burglary and Earthquake (causes) are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls (effects)]

Nodes are random variables. The graph is a directed acyclic graph (DAG). Intuitive meaning of an arrow from x to y: "x has direct influence on y".

Page 10: Bayesian Networks

Assigning Probabilities to Roots

[Same burglary network]

P(B) = 0.001
P(E) = 0.002

Page 11: Bayesian Networks

Conditional Probability Tables

[Same network, with P(B) = 0.001 and P(E) = 0.002 at the roots]

B  E  P(A|B,E)
T  T  0.95
T  F  0.94
F  T  0.29
F  F  0.001

Size of the CPT for a node with k parents: ?

Page 12: Bayesian Networks

Conditional Probability Tables

[Same network, with P(B) = 0.001 and P(E) = 0.002 at the roots]

B  E  P(A|B,E)
T  T  0.95
T  F  0.94
F  T  0.29
F  F  0.001

A  P(J|A)
T  0.90
F  0.05

A  P(M|A)
T  0.70
F  0.01

Page 13: Bayesian Networks

What the BN Means

[Same network, with the CPTs P(B) = 0.001, P(E) = 0.002, P(A|B,E), P(J|A), and P(M|A) as above]

P(x1, x2, …, xn) = ∏i=1,…,n P(xi | Parents(Xi))

Page 14: Bayesian Networks

Calculation of Joint Probability

[Same network and CPTs as above]

P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
= P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998
= 0.00062
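
The product can be checked mechanically. Below is a minimal Python sketch (the dictionary encoding is an illustrative choice, not from the slides) that stores the five CPTs above and evaluates this joint entry:

```python
# Minimal sketch: the burglary network's CPTs as plain dictionaries, with
# each entry giving P(node = True | parent values); names are my own.
P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)

def pr(p_true, value):
    """P(X = value) given P(X = True)."""
    return p_true if value else 1.0 - p_true

def joint(j, m, a, b, e):
    """P(J=j, M=m, A=a, B=b, E=e) as a product of local CPT entries."""
    return (pr(P_J[a], j) * pr(P_M[a], m) * pr(P_A[(b, e)], a) *
            pr(P_B, b) * pr(P_E, e))

# John and Mary call, the alarm sounds, no burglary, no earthquake:
print(joint(True, True, True, False, False))   # ≈ 0.00063 (the slide's 0.00062)
```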

Page 15: Bayesian Networks

What The BN Encodes

[Same burglary network]

Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm.

The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm.

For example, John does not observe any burglaries directly.

Page 16: Bayesian Networks

What The BN Encodes

[Same burglary network]

Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm.

The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm.

For instance, the reasons why John and Mary may not call if there is an alarm are unrelated.

Note that these reasons could be other beliefs in the network. The probabilities summarize these non-explicit beliefs.

Page 17: Bayesian Networks

Structure of BN

The relation:

P(x1, x2, …, xn) = ∏i=1,…,n P(xi | Parents(Xi))

means that each belief is independent of its predecessors in the BN given its parents. Said otherwise, the parents of a belief Xi are all the beliefs that "directly influence" Xi.

Usually (but not always) the parents of Xi are its causes and Xi is the effect of these causes

E.g., JohnCalls is influenced by Burglary, but not directly. JohnCalls is directly influenced by Alarm

Page 18: Bayesian Networks

Construction of BN

Choose the relevant sentences (random variables) that describe the domain. Select an ordering X1,…,Xn so that all the beliefs that directly influence Xi come before Xi.

For j = 1,…,n do:
- Add a node in the network labeled by Xj
- Connect the nodes of its parents to Xj
- Define the CPT of Xj

• The ordering guarantees that the BN will have no cycles

Page 19: Bayesian Networks

Cond. Independence Relations

1. Each random variable X is conditionally independent of its non-descendants, given its parents Pa(X). Formally: I(X; NonDesc(X) | Pa(X))
2. Each random variable is conditionally independent of all the other nodes in the graph, given its Markov blanket (its parents, children, and its children's other parents)

[Figure: a node X with its parent, an ancestor, descendants Y1 and Y2, and non-descendants marked]

Page 20: Bayesian Networks

Inference In BN

Set E of evidence variables that are observed, e.g., {JohnCalls, MaryCalls}. Query variable X, e.g., Burglary, for which we would like to know the posterior probability distribution P(X|E), i.e., the distribution conditional on the observations made.

J  M  P(B|J,M)
T  T  ?

Page 21: Bayesian Networks

Inference Patterns

[Four copies of the burglary network, illustrating diagnostic, causal, intercausal, and mixed inference]

• Basic use of a BN: Given new observations, compute the new strengths of some (or all) beliefs

• Other use: Given the strength of a belief, which observation should we gather to make the greatest change in this belief's strength?

Page 22: Bayesian Networks

Types Of Nodes On A Path

[Car network over Radio, Battery, SparkPlugs, Starts, Gas, Moves, with nodes on a path classified as linear, converging, or diverging]

Page 23: Bayesian Networks

Independence Relations In BN

[Same car network with linear, converging, and diverging nodes marked]

Given a set E of evidence nodes, two beliefs connected by an undirected path are independent if one of the following three conditions holds:
1. A node on the path is linear and in E
2. A node on the path is diverging and in E
3. A node on the path is converging and neither this node, nor any descendant, is in E

Page 24: Bayesian Networks

Independence Relations In BN

[Same car network and d-separation conditions as on the previous page]

Gas and Radio are independent given evidence on SparkPlugs

Page 25: Bayesian Networks

Independence Relations In BN

[Same car network and conditions]

Gas and Radio are independent given evidence on Battery

Page 26: Bayesian Networks

Independence Relations In BN

[Same car network and conditions]

Gas and Radio are independent given no evidence, but they are dependent given evidence on Starts or Moves
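
These three examples can be checked mechanically. Below is a small Python sketch of a d-separation test using the moralized-ancestral-graph formulation (equivalent to the three path conditions above); the dictionary-of-parents encoding and the car network's arcs are assumptions read off the slides' examples:

```python
# Sketch of a d-separation test via the moralized ancestral graph, which is
# equivalent to the three path conditions above: X and Y are d-separated by
# E iff they are disconnected after (1) keeping only ancestors of X, Y, E,
# (2) "marrying" co-parents, (3) dropping arc directions and removing E.
def d_separated(parents, x, y, evidence):
    relevant, stack = set(), [x, y, *evidence]     # 1. ancestral subgraph
    while stack:
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(parents.get(n, ()))
    adj = {n: set() for n in relevant}             # 2. moralize + undirect
    for n in relevant:
        ps = [p for p in parents.get(n, ()) if p in relevant]
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for a in ps:
            for b in ps:
                if a != b:
                    adj[a].add(b)
    seen, stack = set(evidence), [x]               # 3. drop E, test reachability
    while stack:
        n = stack.pop()
        if n == y:
            return False
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return True

# Car network; the arcs are an assumption read off the slides' examples:
parents = {"Radio": ["Battery"], "SparkPlugs": ["Battery"],
           "Starts": ["SparkPlugs", "Gas"], "Moves": ["Starts"]}
print(d_separated(parents, "Gas", "Radio", set()))           # True
print(d_separated(parents, "Gas", "Radio", {"SparkPlugs"}))  # True
print(d_separated(parents, "Gas", "Radio", {"Battery"}))     # True
print(d_separated(parents, "Gas", "Radio", {"Moves"}))       # False
```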

Page 27: Bayesian Networks

BN Inference

Simplest case: A → B

P(B) = P(a)P(B|a) + P(~a)P(B|~a)

In general: P(B) = Σ_A P(A) P(B|A)

Chain: A → B → C

P(C) = ???

Page 28: Bayesian Networks

BN Inference

Chain:

X1 → X2 → … → Xn

What is the time complexity to compute P(Xn)?

What is the time complexity if we computed the full joint?
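
One way to see the answer: P(X2) follows from P(X1) with a constant amount of work, and so on down the chain, giving O(n) total, versus the exponentially many entries of the full joint. A minimal sketch for Boolean chains (names and numbers illustrative):

```python
# Minimal sketch: O(n) computation of P(Xn = True) in a Boolean chain
# X1 -> X2 -> ... -> Xn. Each CPT maps the parent's value to
# P(child = True | parent).
def chain_marginal(p_x1, cpts):
    p = p_x1                                  # P(X1 = True)
    for cpt in cpts:                          # one O(1) update per link
        p = p * cpt[True] + (1.0 - p) * cpt[False]
    return p                                  # P(Xn = True)

# A 4-node chain: 3 CPTs -> 3 updates, versus 2^4 entries in the full joint.
print(chain_marginal(0.5, [{True: 0.9, False: 0.2}] * 3))
```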

Page 29: Bayesian Networks

Inference Ex. 2

[Network: Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are the parents of WetGrass]

P(W) = Σ_{C,S,R} P(w|r,s) P(r|c) P(s|c) P(c)
     = Σ_{S,R} P(w|r,s) Σ_C P(r|c) P(s|c) P(c)
     = Σ_{S,R} P(w|r,s) f_C(r,s),  where f_C(S,R) = Σ_C P(r|c) P(s|c) P(c)

The algorithm is computing not individual probabilities, but entire tables.

Two ideas are crucial to avoiding exponential blowup:
- Because of the structure of the BN, some subexpressions in the joint depend only on a small number of variables
- By computing them once and caching the results, we can avoid generating them exponentially many times

Page 30: Bayesian Networks

Variable Elimination

General idea:
- Write the query in the form

  P(X1, e) = Σ_{x2} Σ_{x3} … Σ_{xk} ∏_i P(xi | pa_i)

- Iteratively:
  - Move all irrelevant terms outside of the innermost sum
  - Perform the innermost sum, getting a new term
  - Insert the new term into the product

Page 31: Bayesian Networks

A More Complex Example

The "Asia" network:

[Nodes: Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Abnormality in Chest (A), Bronchitis (B), X-Ray (X), Dyspnea (D). Arcs: V → T; S → L; S → B; T, L → A; A → X; A, B → D]

Page 32: Bayesian Networks

We want to compute P(d). Need to eliminate: v, s, x, t, l, a, b

Initial factors:
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Page 33: Bayesian Networks

We want to compute P(d). Need to eliminate: v, s, x, t, l, a, b

Initial factors:
P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: v. Compute: f_v(t) = Σ_v P(v) P(t|v)

Result: f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Note: f_v(t) = P(t). In general, the result of elimination is not necessarily a probability term.

Page 34: Bayesian Networks

We want to compute P(d). Need to eliminate: s, x, t, l, a, b

Current factors:
f_v(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: s. Compute: f_s(b,l) = Σ_s P(s) P(b|s) P(l|s)

Result: f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)

Summing on s results in a factor with two arguments, f_s(b,l). In general, the result of elimination may be a function of several variables.

Page 35: Bayesian Networks

We want to compute P(d). Need to eliminate: x, t, l, a, b

Current factors:
f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: x. Compute: f_x(a) = Σ_x P(x|a)

Result: f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)

Note: f_x(a) = 1 for all values of a!

Page 36: Bayesian Networks

We want to compute P(d). Need to eliminate: t, l, a, b

Current factors:
f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)

Eliminate: t. Compute: f_t(a,l) = Σ_t f_v(t) P(a|t,l)

Result: f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)

Page 37: Bayesian Networks

We want to compute P(d). Need to eliminate: l, a, b

Current factors:
f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)

Eliminate: l. Compute: f_l(a,b) = Σ_l f_s(b,l) f_t(a,l)

Result: f_x(a) f_l(a,b) P(d|a,b)

Page 38: Bayesian Networks

We want to compute P(d). Need to eliminate: a, b

Current factors:
f_x(a) f_l(a,b) P(d|a,b)

Eliminate: a. Compute: f_a(b,d) = Σ_a f_x(a) f_l(a,b) P(d|a,b)

Eliminate: b. Compute: f_b(d) = Σ_b f_a(b,d)

Result: f_b(d) = P(d)

Page 39: Bayesian Networks

Variable Elimination

We now understand variable elimination as a sequence of rewriting operations

The actual computation is done in the elimination steps.

The computation depends on the order of elimination.

Page 40: Bayesian Networks

Dealing with evidence

How do we deal with evidence?

Suppose we get evidence V = t, S = f, D = t. We want to compute P(L, V = t, S = f, D = t).

[Same Asia network]

Page 41: Bayesian Networks

Dealing with Evidence

We start by writing the factors:

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Since we know that V = t, we don't need to eliminate V. Instead, we can replace the factors P(V) and P(T|V) with

f_P(V) = P(V = t)        f_P(T|V)(T) = P(T | V = t)

These "select" the appropriate parts of the original factors given the evidence. Note that f_P(V) is a constant, and thus does not appear in the elimination of other variables.

Page 42: Bayesian Networks

Dealing with Evidence

Given evidence V = t, S = f, D = t, compute P(L, V = t, S = f, D = t). Initial factors, after setting evidence:

f_P(v) f_P(s) f_P(t|v)(t) f_P(l|s)(l) f_P(b|s)(b) P(a|t,l) P(x|a) f_P(d|a,b)(a,b)

Page 43: Bayesian Networks

Dealing with Evidence

Given evidence V = t, S = f, D = t, compute P(L, V = t, S = f, D = t). Initial factors, after setting evidence:

f_P(v) f_P(s) f_P(t|v)(t) f_P(l|s)(l) f_P(b|s)(b) P(a|t,l) P(x|a) f_P(d|a,b)(a,b)

Eliminating x, we get:

f_P(v) f_P(s) f_P(t|v)(t) f_P(l|s)(l) f_P(b|s)(b) P(a|t,l) f_x(a) f_P(d|a,b)(a,b)

Page 44: Bayesian Networks

Dealing with Evidence

Given evidence V = t, S = f, D = t, compute P(L, V = t, S = f, D = t).

Eliminating x, we get:

f_P(v) f_P(s) f_P(t|v)(t) f_P(l|s)(l) f_P(b|s)(b) P(a|t,l) f_x(a) f_P(d|a,b)(a,b)

Eliminating t, we get:

f_P(v) f_P(s) f_P(l|s)(l) f_P(b|s)(b) f_x(a) f_t(a,l) f_P(d|a,b)(a,b)

Page 45: Bayesian Networks

Dealing with Evidence

Given evidence V = t, S = f, D = t, compute P(L, V = t, S = f, D = t).

Eliminating a (after x and t), we get:

f_P(v) f_P(s) f_P(l|s)(l) f_P(b|s)(b) f_a(b,l)

Page 46: Bayesian Networks

Dealing with Evidence

Given evidence V = t, S = f, D = t, compute P(L, V = t, S = f, D = t).

Eliminating b, we finally get:

f_P(v) f_P(s) f_P(l|s)(l) f_b(l)

Page 47: Bayesian Networks

Variable Elimination Algorithm

Let X1,…,Xm be an ordering on the non-query variables:

Σ_{X2} … Σ_{Xm} ∏_j P(Xj | Parents(Xj))

For i = m,…,1:
- Leave in the summation for Xi only factors mentioning Xi
- Multiply the factors, getting a factor that contains a number for each value of the variables mentioned, including Xi
- Sum out Xi, getting a factor f that contains a number for each value of the variables mentioned, not including Xi
- Replace the multiplied factor in the summation
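
A compact Python sketch of this loop, representing each factor as a (scope, table) pair over Boolean variables (the representation is an illustrative choice, not prescribed by the slides):

```python
from itertools import product

# A factor is a (scope, table) pair: scope is a tuple of variable names,
# table maps each assignment tuple to a number.
def multiply(f1, f2):
    (s1, t1), (s2, t2) = f1, f2
    scope = s1 + tuple(v for v in s2 if v not in s1)
    table = {}
    for vals in product([True, False], repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = t1[tuple(a[v] for v in s1)] * t2[tuple(a[v] for v in s2)]
    return scope, table

def sum_out(factor, var):
    scope, table = factor
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + p   # sum out var
    return new_scope, new_table

def eliminate(factors, order):
    for var in order:
        mentioning = [f for f in factors if var in f[0]]  # only factors mentioning var
        rest = [f for f in factors if var not in f[0]]
        prod = mentioning[0]
        for f in mentioning[1:]:
            prod = multiply(prod, f)                      # multiply the factors
        factors = rest + [sum_out(prod, var)]             # replace them in the sum
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Tiny demo, a chain A -> B: eliminate A to get the marginal factor on B.
fA = (("A",), {(True,): 0.3, (False,): 0.7})
fB = (("A", "B"), {(True, True): 0.9, (True, False): 0.1,
                   (False, True): 0.2, (False, False): 0.8})
print(eliminate([fA, fB], ["A"]))   # (('B',), {(True,): 0.41, (False,): 0.59})
```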

Page 48: Bayesian Networks

Complexity of variable elimination

Suppose in one elimination step we compute

f_x(y1,…,yk) = Σ_x f'_x(x, y1,…,yk)

where

f'_x(x, y1,…,yk) = ∏_{i=1..m} f_i(x, y_{i,1},…,y_{i,l_i})

This requires:

m × |Val(X)| × ∏_i |Val(Y_i)| multiplications (for each value of x, y1,…,yk, we do m multiplications)

|Val(X)| × ∏_i |Val(Y_i)| additions (for each value of y1,…,yk, we do |Val(X)| additions)

Complexity is exponential in the number of variables in the intermediate factor!

Page 49: Bayesian Networks

Understanding Variable Elimination

We want to select "good" elimination orderings that reduce complexity.

This can be done by examining a graph-theoretic property of the "induced" graph; we will not cover this in class.

This reduces the problem of finding a good ordering to a graph-theoretic operation that is well understood; unfortunately, computing it is NP-hard!

Page 50: Bayesian Networks

Exercise: Variable elimination

[Network: smart and study are the parents of prepared; smart, prepared, and fair are the parents of pass]

p(smart) = .8    p(study) = .6    p(fair) = .9

Sm  St  P(Pr|Sm,St)
T   T   .9
T   F   .5
F   T   .7
F   F   .1

Sm  Pr  F   P(Pa|Sm,Pr,F)
T   T   T   .9
T   T   F   .1
T   F   T   .7
T   F   F   .1
F   T   T   .7
F   T   F   .1
F   F   T   .2
F   F   F   .1

Query: What is the probability that a student is smart, given that they pass the exam?
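
For checking your answer, here is a brute-force enumeration sketch (not variable elimination, just the defining sums) that computes P(smart | pass) from the CPTs above:

```python
from itertools import product

# Brute-force check for the exercise: enumerate all assignments and apply
# the definition P(smart | pass) = P(smart, pass) / P(pass).
P_PR = {(True, True): .9, (True, False): .5,    # P(prepared | smart, study)
        (False, True): .7, (False, False): .1}
P_PA = {(True, True, True): .9, (True, True, False): .1,   # P(pass | smart, prepared, fair)
        (True, False, True): .7, (True, False, False): .1,
        (False, True, True): .7, (False, True, False): .1,
        (False, False, True): .2, (False, False, False): .1}

def pr(p_true, v):
    return p_true if v else 1.0 - p_true

num = den = 0.0
for sm, st, f, prep, pa in product([True, False], repeat=5):
    w = (pr(.8, sm) * pr(.6, st) * pr(.9, f) *
         pr(P_PR[(sm, st)], prep) * pr(P_PA[(sm, prep, f)], pa))
    if pa:
        den += w
        if sm:
            num += w
print(num / den)   # P(smart | pass)
```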

Page 51: Bayesian Networks

Approaches to inference

Exact inference:
- Inference in simple chains
- Variable elimination
- Clustering / join tree algorithms

Approximate inference:
- Stochastic simulation / sampling methods
- Markov chain Monte Carlo methods

Page 52: Bayesian Networks

Stochastic simulation - direct

Suppose you are given values for some subset of the variables, G, and want to infer values for the unknown variables, U. Randomly generate a very large number of instantiations from the BN:

- Generate instantiations for all variables: start at root variables and work your way "forward"
- Rejection sampling: keep those instantiations that are consistent with the values for G
- Use the frequency of values for U to get estimated probabilities

Accuracy of the results depends on the size of the sample (it asymptotically approaches the exact results).

Page 53: Bayesian Networks

Direct Stochastic Simulation

[Network: Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are the parents of WetGrass]

P(WetGrass|Cloudy)?
P(WetGrass|Cloudy) = P(WetGrass ∧ Cloudy) / P(Cloudy)

1. Repeat N times:
   1.1. Guess Cloudy at random
   1.2. For each guess of Cloudy, guess Sprinkler and Rain, then WetGrass
2. Compute the ratio of the # runs where WetGrass and Cloudy are True over the # runs where Cloudy is True
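
A sketch of this procedure in Python; the slide gives no CPT numbers for the sprinkler network, so the values below are illustrative assumptions:

```python
import random

# Sketch of direct sampling + rejection; CPT values are assumed, not
# from the slides.
P_C = 0.5                                        # P(Cloudy)
P_S = {True: 0.1, False: 0.5}                    # P(Sprinkler | Cloudy)
P_R = {True: 0.8, False: 0.2}                    # P(Rain | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,  # P(WetGrass | Sprinkler, Rain)
       (False, True): 0.90, (False, False): 0.00}

def sample():
    c = random.random() < P_C        # roots first, then work "forward"
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, w

cloudy = wet_and_cloudy = 0
for _ in range(100_000):
    c, w = sample()
    if c:                            # keep only runs where Cloudy is True
        cloudy += 1
        wet_and_cloudy += w
print(wet_and_cloudy / cloudy)       # estimate of P(WetGrass | Cloudy)
```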

Page 54: Bayesian Networks

Exercise: Direct sampling

[Same network and CPTs as in the variable elimination exercise: p(smart) = .8, p(study) = .6, p(fair) = .9, with the P(Pr|Sm,St) and P(Pa|Sm,Pr,F) tables above]

Topological order = …?
Random number generator: .35, .76, .51, .44, .08, .28, .03, .92, .02, .42

Page 55: Bayesian Networks

Likelihood weighting

Idea: Don't generate samples that need to be rejected in the first place! Sample only from the unknown variables Z. Weight each sample according to the likelihood that it would occur, given the evidence E.
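
A sketch of likelihood weighting on the same sprinkler network, estimating P(Cloudy | WetGrass = true); the CPT numbers are again illustrative assumptions:

```python
import random

# Sketch of likelihood weighting with evidence WetGrass = True;
# CPT values are assumed, not from the slides.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}

def weighted_sample():
    c = random.random() < P_C        # sample only the non-evidence variables
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    return c, P_W[(s, r)]            # weight = likelihood of the evidence

num = den = 0.0
for _ in range(100_000):
    c, w = weighted_sample()
    den += w
    if c:
        num += w
print(num / den)                     # estimate of P(Cloudy | WetGrass = True)
```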

Page 56: Bayesian Networks

Learning Bayesian Networks

Page 57: Bayesian Networks

Learning Bayesian networks

[Figure: Data + prior information go into an Inducer, which outputs a network over E, R, B, A, C together with its CPTs, e.g. a table for P(A | E, B)]

Page 58: Bayesian Networks

Known Structure -- Complete Data

[Figure: records over E, B, A, such as <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, …, <N,Y,Y>, plus the fixed structure E, B → A with unknown CPT entries, go into the Inducer, which outputs the same structure with an estimated CPT P(A | E, B)]

Network structure is specified; the Inducer needs to estimate parameters.
Data does not contain missing values.

Page 59: Bayesian Networks

Unknown Structure -- Complete Data

[Figure: the same records over E, B, A go into the Inducer, which now outputs both a structure and its CPTs]

Network structure is not specified; the Inducer needs to select arcs and estimate parameters.
Data does not contain missing values.

Page 60: Bayesian Networks

Known Structure -- Incomplete Data

[Figure: records over E, B, A with missing entries, such as <Y,N,N>, <Y,?,Y>, <N,N,Y>, <N,Y,?>, …, <?,Y,Y>, go into the Inducer along with the fixed structure]

Network structure is specified.
Data contains missing values; we consider assignments to the missing values.

Page 61: Bayesian Networks

Known Structure / Complete Data

Given a network structure G and a choice of parametric family for P(Xi|Pai), learn the parameters for the network.

Goal: construct a network that is "closest" to the probability distribution that generated the data.

Page 62: Bayesian Networks

Learning Parameters for a Bayesian Network

[Network: E, B → A → C]

Training data has the form:

D = { <E[1], B[1], A[1], C[1]>, …, <E[M], B[M], A[M], C[M]> }
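
For the known-structure, complete-data case this reduces to counting. A sketch using the <E, B, A> records shown on the earlier slides (only the five listed records; the slide's data set continues beyond them):

```python
from collections import Counter

# Sketch of maximum-likelihood CPT estimation from complete data for a
# fixed structure E, B -> A: P(a | e, b) is estimated by N(e, b, a) / N(e, b).
data = [("Y", "N", "N"), ("Y", "Y", "Y"), ("N", "N", "Y"),
        ("N", "Y", "Y"), ("N", "Y", "Y")]

joint_counts = Counter(data)                          # N(e, b, a)
parent_counts = Counter((e, b) for e, b, _ in data)   # N(e, b)

def p_a_given(e, b, a="Y"):
    if parent_counts[(e, b)] == 0:
        return None                  # unseen parent configuration
    return joint_counts[(e, b, a)] / parent_counts[(e, b)]

print(p_a_given("N", "Y"))           # 2/2 = 1.0 on this toy sample
```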

Page 63: Bayesian Networks

Unknown Structure -- Complete Data

[Same figure as on Page 59: complete records over E, B, A; the Inducer selects arcs and estimates parameters]

Network structure is not specified.
Data does not contain missing values.

Page 64: Bayesian Networks

Benefits of Learning Structure

- Discover structural properties of the domain: ordering of events, relevance
- Identifying independencies → faster inference
- Predict the effect of actions: involves learning causal relationships among the variables

Page 65: Bayesian Networks

Why Struggle for Accurate Structure?

Adding an arc:
- Increases the number of parameters to be fitted
- Wrong assumptions about causality and domain structure

Missing an arc:
- Cannot be compensated for by accurate fitting of parameters
- Also misses causality and domain structure

[Figure: the Earthquake / Alarm Set / Burglary / Sound network, alongside a variant with an added arc and a variant with a missing arc]

Page 66: Bayesian Networks

Approaches to Learning Structure

Constraint based:
- Perform tests of conditional independence
- Search for a network that is consistent with the observed dependencies and independencies

Pros & cons:
- Intuitive; follows closely the construction of BNs
- Separates structure learning from the form of the independence tests
- Sensitive to errors in individual tests

Page 67: Bayesian Networks

Approaches to Learning Structure

Score based:
- Define a score that evaluates how well the (in)dependencies in a structure match the observations
- Search for a structure that maximizes the score

Pros & cons:
- Statistically motivated
- Can make compromises
- Takes the structure of conditional probabilities into account
- Computationally hard

Page 68: Bayesian Networks

Heuristic Search

Define a search space:
- nodes are possible structures
- edges denote adjacency of structures

Traverse this space looking for high-scoring structures.

Search techniques:
- Greedy hill-climbing
- Best-first search
- Simulated annealing
- ...

Page 69: Bayesian Networks

Heuristic Search (cont.)

Typical operations on a network over S, C, E, D:

[Figure: add an arc (e.g. add C → D), delete an arc (e.g. delete C → E), reverse an arc (e.g. reverse C → E)]

Page 70: Bayesian Networks

Exploiting Decomposability in Local Search

Caching: to update the score after a local change, we only need to re-score the families that were changed in the last move.

[Figure: four variants of the S, C, E, D network differing by single-edge changes]

Page 71: Bayesian Networks

Greedy Hill-Climbing

Simplest heuristic local search. Start with a given network:
- empty network
- best tree
- a random network

At each iteration:
- Evaluate all possible changes
- Apply the change that leads to the best improvement in score
- Reiterate

Stop when no modification improves the score.

Each step requires evaluating approximately n new changes.
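
A skeleton of this loop (score and neighbors are assumed to be supplied: a network-scoring function and a generator of the legal one-edge changes):

```python
# Skeleton of greedy hill-climbing over structures. score() and neighbors()
# are assumed: a network score and the add / delete / reverse one-edge
# moves that keep the graph acyclic.
def greedy_hill_climb(start, score, neighbors):
    current, current_score = start, score(start)
    while True:
        best, best_score = None, current_score
        for candidate in neighbors(current):   # evaluate all possible changes
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        if best is None:                       # no modification improves the score
            return current
        current, current_score = best, best_score
```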

Page 72: Bayesian Networks

Greedy Hill-Climbing: Possible Pitfalls

Greedy hill-climbing can get stuck in:
- Local maxima: all one-edge changes reduce the score
- Plateaus: some one-edge changes leave the score unchanged; this happens because equivalent networks receive the same score and are neighbors in the search space

Both occur during structure search. Standard heuristics can escape both:
- Random restarts
- TABU search

Page 73: Bayesian Networks

Summary

Belief update
Role of conditional independence
Belief networks
Causality ordering
Inference in BN
Stochastic simulation
Learning BNs

Page 74: Bayesian Networks

A Bayesian Network

The "ICU alarm" network: 37 variables, 509 parameters (instead of 2^37).

[Figure: the full ICU alarm network; its variables include PCWP, CO, HRBP, HREKG, HRSAT, ERRCAUTER, HRHISTORY, CATECHOL, SAO2, EXPCO2, ARTCO2, VENTALV, VENTLUNG, VENITUBE, DISCONNECT, MINVOLSET, VENTMACH, KINKEDTUBE, INTUBATION, PULMEMBOLUS, PAP, SHUNT, ANAPHYLAXIS, MINOVL, PVSAT, FIO2, PRESS, INSUFFANESTH, TPR, LVFAILURE, ERRBLOWOUTPUT, STROEVOLUME, LVEDVOLUME, HYPOVOLEMIA, CVP, BP]