Introduction to Bayesian Networks
Representation: What is the joint probability distribution
P(B, E, R, A, N) over five binary variables?
• How many states in total?
• How many free parameters do we need?
• How many table entries should we sum over to get P(R=r)?
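The three questions above can be answered by simple counting. The sketch below assumes all five variables are binary, as stated on the slide; the variable names are illustrative only.

```python
# Counting for a full joint distribution over n binary variables (here n = 5).
n_vars = 5
n_values = 2

# Every combination of values is one state, i.e. one row of the joint table.
n_states = n_values ** n_vars

# The probabilities must sum to 1, so the last entry is determined by the rest.
n_free_params = n_states - 1

# To get P(R=r) we fix R and sum over every assignment of the other four variables.
n_entries_for_marginal = n_values ** (n_vars - 1)

print(n_states, n_free_params, n_entries_for_marginal)  # 32 31 16
```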
Joint Probability Distribution
• The full joint distribution is sufficient to represent the complete domain and to perform any type of probabilistic inference
• Problems (n = number of random variables, d = number of values per variable):
– Space complexity. A full joint distribution requires storing O(d^n) numbers
– Inference complexity. Computing queries requires O(d^n) steps
– Acquisition problem. Who is going to define all the probabilities?
Joint Probability Distribution
P(B, E, R, A, N) = P(B) P(E|B) P(R|B, E) P(A|B, E, R) P(N|B, E, R, A)
VS
Bayesian networks
• A Bayesian network is a graph in which:
– A set of random variables makes up the nodes in the network.
– A set of directed links or arrows connects pairs of nodes.
– Each node has a conditional probability table that quantifies the effects the parents have on the node.
– Directed, acyclic graph (DAG), i.e. no directed cycles.
A Bayesian network modeling the joint probability distribution P(B, E, R, A, N)
Conditional probability tables (CPTs): P(B), P(E), P(R|E), P(A|B, E), P(N|A)
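To see why the network is more compact than the full joint table, one can count free parameters per conditional probability table. This is a minimal sketch, assuming binary variables and the parent sets read off the CPTs P(B), P(E), P(R|E), P(A|B,E), P(N|A); the dictionary name `parents` is illustrative.

```python
# Parents of each node, read off the CPTs P(B), P(E), P(R|E), P(A|B,E), P(N|A).
parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "N": ["A"]}

# A binary node needs one free parameter per joint assignment of its parents.
bn_params = {node: 2 ** len(pa) for node, pa in parents.items()}
total_bn = sum(bn_params.values())

# The full joint table over the same variables needs 2^n - 1 free parameters.
total_joint = 2 ** len(parents) - 1

print(bn_params)              # {'B': 1, 'E': 1, 'R': 2, 'A': 4, 'N': 2}
print(total_bn, total_joint)  # 10 31
```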
Independences in BNs
• Three basic independence structures:
• Indirect cause: Burglary is independent of NeighborCall given Alarm
– P(N|A, B) = P(N|A)
– P(N, B|A) = P(N|A)P(B|A)
• Common cause: Alarm is independent of RadioAnnounce given Earthquake
– P(A|E, R) = P(A|E)
– P(A, R|E) = P(A|E)P(R|E)
• Common effect:
– Burglary is independent of Earthquake when Alarm is not known
– Burglary and Earthquake become dependent given Alarm!
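The common-effect case can be checked numerically. The sketch below uses made-up CPT values (the slides do not give any numbers) for P(B), P(E) and P(A|B,E), and computes posteriors of Burglary by brute-force summation over the joint. The helper names `joint` and `prob_burglary` are illustrative.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_B = 0.001   # P(B=1)
P_E = 0.002   # P(E=1)
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)

def joint(b, e, a):
    """P(B=b, E=e, A=a) = P(b) P(e) P(a | b, e)."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa

def prob_burglary(**evidence):
    """P(B=1 | evidence), summing the joint over states consistent with evidence."""
    num = den = 0.0
    for b, e, a in product((0, 1), repeat=3):
        state = {"b": b, "e": e, "a": a}
        if any(state[k] != v for k, v in evidence.items()):
            continue
        p = joint(b, e, a)
        den += p
        num += p * b
    return num / den

p_b = prob_burglary()                    # prior
p_b_given_e = prob_burglary(e=1)         # Alarm unknown: E tells us nothing about B
p_b_given_a = prob_burglary(a=1)         # Alarm observed
p_b_given_a_e = prob_burglary(a=1, e=1)  # Earthquake "explains away" the alarm

print(p_b, p_b_given_e, p_b_given_a, p_b_given_a_e)
```

With these assumed numbers, observing Earthquake leaves the belief in Burglary unchanged while Alarm is unknown, but lowers it sharply once Alarm is known.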
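Path blocking
A path between X and Y is blocked by the evidence set Z if it contains a node C in one of these configurations:
• With linear substructure: X → C → Y, C in Z
• With common cause structure: X ← C → Y, C in Z
• With common effect structure: X → C ← Y, neither C nor any of its descendants in Z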
Independences in BNs
• Earthquake ┴ Burglary | NeighborCall?
• Burglary ┴ NeighborCall?
• Burglary ┴ RadioAnnounce | Earthquake?
• Burglary ┴ RadioAnnounce | NeighborCall?
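The queries above can be checked mechanically. Rather than enumerating paths, this sketch uses an equivalent test: restrict the graph to the ancestors of the variables involved, connect parents that share a child, drop edge directions, remove the evidence nodes, and see whether the two variables are still connected. The graph structure is taken from the CPTs P(B), P(E), P(R|E), P(A|B,E), P(N|A); the function names are illustrative.

```python
# Parents of each node in the example network.
PARENTS = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "N": ["A"]}

def ancestral_set(nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen

def d_separated(x, y, given=()):
    """True if x and y are independent given `given` according to the graph."""
    keep = ancestral_set({x, y, *given})
    # Build the undirected "moral" graph on the ancestral set.
    neighbors = {node: set() for node in keep}
    for child in keep:
        pa = PARENTS[child]
        for p in pa:
            neighbors[child].add(p)
            neighbors[p].add(child)
            for q in pa:            # marry parents of a common child
                if p != q:
                    neighbors[p].add(q)
    # Search from x without passing through evidence nodes.
    blocked = set(given)
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node in seen or node in blocked:
            continue
        seen.add(node)
        stack.extend(neighbors[node])
    return y not in seen

print(d_separated("E", "B", ["N"]))  # False
print(d_separated("B", "N"))         # False
print(d_separated("B", "R", ["E"]))  # True
print(d_separated("B", "R", ["N"]))  # False
```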
Markov Assumption
• Each variable is independent of its non-descendants, given its parents in the Bayesian network
• N ┴ B, E, R | A
• R ┴ B, A, N | E
• …
Full Joint Distribution in BNs
P(B, E, A, R, N) = P(N|A) P(R|E) P(A|B, E) P(E) P(B)
Chain Rule
P(X_1, X_2, ..., X_n) = \prod_{i=1}^{n} P(X_i | Pa(X_i))
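The factorization can be exercised directly. The sketch below fills the five CPTs with made-up numbers (none are given in the slides), builds each joint entry as the product of local factors, and checks two things: the 32 entries sum to 1, and the marginal P(R=1) obtained by summing 16 joint entries matches the value computed from P(E) and P(R|E) alone.

```python
from itertools import product

def bern(p_true, value):
    """Probability that a binary variable takes `value` when P(value=1) = p_true."""
    return p_true if value else 1.0 - p_true

# Hypothetical CPT entries, for illustration only.
P_B, P_E = 0.001, 0.002
P_R = {1: 0.9, 0: 0.0001}                                         # P(R=1 | E)
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}   # P(A=1 | B, E)
P_N = {1: 0.9, 0: 0.05}                                           # P(N=1 | A)

def joint(b, e, r, a, n):
    """P(B,E,R,A,N) = P(B) P(E) P(R|E) P(A|B,E) P(N|A)."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_R[e], r)
            * bern(P_A[(b, e)], a) * bern(P_N[a], n))

# All 32 joint entries must sum to 1.
total = sum(joint(*state) for state in product((0, 1), repeat=5))

# Marginal P(R=1): fix R and sum over the 16 assignments of the other variables.
p_r = sum(joint(b, e, 1, a, n) for b, e, a, n in product((0, 1), repeat=4))

# The same marginal using only the factors that involve R and its parent E.
p_r_direct = P_E * P_R[1] + (1 - P_E) * P_R[0]

print(total, p_r, p_r_direct)
```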
Types of Inference Tasks
Compute the probability or most likely state of a set of query variables given observed values of some other variables
• Belief updating
• Most probable explanation (MPE) queries
• Maximum A Posteriori (MAP) queries
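The three query types can be illustrated by brute-force enumeration on the five-variable example. This is a sketch only: the CPT numbers are made up, the evidence N=1 is an arbitrary choice, and enumeration is used for clarity rather than efficiency.

```python
from itertools import product

VARS = ("B", "E", "R", "A", "N")

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

# Hypothetical CPT entries, for illustration only.
P_B, P_E = 0.001, 0.002
P_R = {1: 0.9, 0: 0.0001}                                         # P(R=1 | E)
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}   # P(A=1 | B, E)
P_N = {1: 0.9, 0: 0.05}                                           # P(N=1 | A)

def joint(s):
    """Joint probability of a full assignment s (dict from variable to 0/1)."""
    return (bern(P_B, s["B"]) * bern(P_E, s["E"]) * bern(P_R[s["E"]], s["R"])
            * bern(P_A[(s["B"], s["E"])], s["A"]) * bern(P_N[s["A"]], s["N"]))

def consistent_states(evidence):
    """Yield every full assignment that agrees with the evidence."""
    for values in product((0, 1), repeat=len(VARS)):
        s = dict(zip(VARS, values))
        if all(s[k] == v for k, v in evidence.items()):
            yield s

evidence = {"N": 1}
states = list(consistent_states(evidence))
p_evidence = sum(joint(s) for s in states)

# Belief updating: posterior of a single query variable, P(B=1 | N=1).
belief_b = sum(joint(s) for s in states if s["B"] == 1) / p_evidence

# MPE: most likely joint assignment of ALL unobserved variables.
best = max(states, key=joint)
mpe = {v: best[v] for v in VARS if v not in evidence}

# MAP: most likely assignment of a SUBSET (here B and E), summing out the rest.
map_scores = {}
for s in states:
    key = (s["B"], s["E"])
    map_scores[key] = map_scores.get(key, 0.0) + joint(s)
map_be = max(map_scores, key=map_scores.get)

print(belief_b, mpe, map_be)
```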
Inference Direction
• Types of inference by reasoning direction
Examples
• Diagnostic inferences: from effects to causes.
• Causal inferences: from causes to effects.
• Intercausal inferences: between causes of a common effect.
• Mixed inferences: combining two or more of the above.
Influence diagram
• Add action nodes and utility nodes to Bayesian networks to enable rational decision making
• Algorithm:
For each value of action node
compute expected value of utility node given action, evidence
Return MEU (maximum expected utility) action
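The algorithm above can be sketched in a few lines. The posterior over states would normally come from inference in the network; here it is simply supplied as a dictionary. The decision problem (calling the police after an alarm), the probabilities and the utilities are all hypothetical.

```python
def expected_utility(action, posterior, utility):
    """Sum of P(state | evidence) * U(action, state) over all states."""
    return sum(p * utility(action, state) for state, p in posterior.items())

def meu_action(actions, posterior, utility):
    """Return the action with maximum expected utility."""
    return max(actions, key=lambda a: expected_utility(a, posterior, utility))

# Hypothetical posterior over the unobserved state, given the evidence.
posterior = {"burglary": 0.37, "no_burglary": 0.63}

# Hypothetical utility table U(action, state).
UTILITY = {
    ("call_police", "burglary"): 100,
    ("call_police", "no_burglary"): -20,
    ("ignore", "burglary"): -200,
    ("ignore", "no_burglary"): 0,
}

def utility(action, state):
    return UTILITY[(action, state)]

actions = ["call_police", "ignore"]
best = meu_action(actions, posterior, utility)
print(best, expected_utility(best, posterior, utility))
```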
• Idea: compute value of acquiring each possible piece of evidence
– Can be done directly from influence diagram
• Example: buying oil drilling rights
Two blocks A and B, exactly one has oil, worth k
Prior probabilities 0.5 each, mutually exclusive
Current price of each block is k/2
“Consultant” offers accurate survey of A. Fair price?
• Solution: compute expected value of information
= expected value of best action given the information minus expected value of best action without information
• Survey may say “oil in A” or “no oil in A”, prob. 0.5 each (given!)
= [0.5 * value of “buy A” given “oil in A” + 0.5 * value of “buy B” given “no oil in A”] - 0
= (0.5 * k/2) + (0.5 * k/2) - 0 = k/2
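The arithmetic of the oil example can be reproduced exactly with rational numbers. The sketch sets k = 1 (the result scales linearly with k) and assumes that "buy nothing" with profit 0 is also an available action, which does not change the outcome. The function name `best_value` is illustrative.

```python
from fractions import Fraction

k = Fraction(1)              # value of the oil; results scale linearly with k
price = k / 2                # current price of each block
p_oil_in_a = Fraction(1, 2)  # prior probability that the oil is in block A

def best_value(p_a):
    """Expected profit of the best action when P(oil in A) = p_a."""
    buy_a = p_a * k - price
    buy_b = (1 - p_a) * k - price
    return max(buy_a, buy_b, Fraction(0))   # 0 = buy nothing (assumed option)

# Without the survey we act on the prior.
value_without = best_value(p_oil_in_a)

# The survey is accurate, so afterwards P(oil in A) is either 1 or 0.
value_with = (p_oil_in_a * best_value(Fraction(1))
              + (1 - p_oil_in_a) * best_value(Fraction(0)))

value_of_information = value_with - value_without
print(value_without, value_with, value_of_information)  # 0 1/2 1/2
```

The fair price of the survey under these assumptions is therefore k/2, matching the derivation above.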
Value of information
[Figure: influence diagram with nodes Oil, Survey, Buy and U (utility)]