Oregon State University – CS430 Intro to AI
Probabilistic Representation and Reasoning
(c) 2003 Thomas G. Dietterich 1
Probabilistic Representation and Reasoning
Given a set of facts/beliefs/rules/evidence:
Evaluate a given statement: determine the probability of a statement
Find a statement that optimizes a set of constraints: most probable explanation (MPE), the setting of the hidden variables that best explains the observations
Probability Theory
Random Variables:
Boolean: W1,2 (just like propositional logic); two possible values {true, false}
Discrete: Weather ∈ {sunny, cloudy, rainy, snow}
Continuous: Temperature ∈ ℝ
Propositions: W1,2 = true, Weather = sunny, Temperature = 65. These can be combined as in propositional logic.
Example
Consider a car described by 3 random variables:
Gas ∈ {true, false}: there is gas in the tank
Meter ∈ {empty, full}: the gas gauge shows the tank is empty or full
Starts ∈ {yes, no}: the car starts when you turn the key in the ignition
Joint Probability Distribution
Gas Meter Starts P(G,M,S)
false empty no 0.1386
false empty yes 0.0014
false full no 0.0594
false full yes 0.0006
true empty no 0.0240
true empty yes 0.0560
true full no 0.2160
true full yes 0.5040
Each row is called a “primitive event”
Rows are mutually exclusive and exhaustive
Corresponds to an “8-sided coin” with the indicated probabilities
Any Query Can Be Answered from the Joint Distribution
Gas Meter Starts P(G,M,S)
false empty no 0.1386
false empty yes 0.0014
false full no 0.0594
false full yes 0.0006
true empty no 0.0240
true empty yes 0.0560
true full no 0.2160
true full yes 0.5040
P(Gas = false ∧ Meter = full ∧ Starts = yes) = 0.0006
P(Gas = false) = 0.2; this is the sum of all cells where Gas = false
In general: to compute P(Q), for any proposition Q, add up the probability in all cells where Q is true
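The "sum the matching cells" rule can be sketched directly in code (Python here purely for illustration; the table is the joint distribution from the slide above):

```python
# The joint distribution P(G,M,S), one entry per primitive event.
joint = {
    (False, "empty", "no"):  0.1386, (False, "empty", "yes"): 0.0014,
    (False, "full",  "no"):  0.0594, (False, "full",  "yes"): 0.0006,
    (True,  "empty", "no"):  0.0240, (True,  "empty", "yes"): 0.0560,
    (True,  "full",  "no"):  0.2160, (True,  "full",  "yes"): 0.5040,
}

def prob(query):
    """Sum the probabilities of all primitive events where `query` is true."""
    return sum(p for (g, m, s), p in joint.items() if query(g, m, s))

print(round(prob(lambda g, m, s: not g and m == "full" and s == "yes"), 4))  # 0.0006
print(round(prob(lambda g, m, s: not g), 4))  # 0.2
```

Any proposition over G, M, and S can be passed as the `query` predicate; the loop is just the "add up the cells where Q is true" rule.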
Notation
P(G,M,S) denotes the entire joint distribution (in the book, P is boldface). It is a table or function that maps from G, M, and S to a probability.
P(true,empty,no) denotes a single probability value:
P(Gas=true ∧ Meter=empty ∧ Starts=no)
Operations on Probability Tables (1): Marginalization
Marginalization ("summing away"): Σ_{M,S} P(G,M,S) = P(G). P(G) is called a "marginal probability" distribution. It consists of two probabilities:
Gas P(G)
false P(false,empty,yes) + P(false,empty,no) + P(false,full,yes) + P(false,full,no)
true P(true,empty,yes) + P(true,empty,no) + P(true,full,yes) + P(true,full,no)
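A minimal sketch of marginalization in code, restating the joint table from the earlier slide and summing away M and S:

```python
# Sum away Meter and Starts from P(G,M,S) to obtain the marginal P(G).
joint = {
    (False, "empty", "no"):  0.1386, (False, "empty", "yes"): 0.0014,
    (False, "full",  "no"):  0.0594, (False, "full",  "yes"): 0.0006,
    (True,  "empty", "no"):  0.0240, (True,  "empty", "yes"): 0.0560,
    (True,  "full",  "no"):  0.2160, (True,  "full",  "yes"): 0.5040,
}

marginal = {}
for (g, m, s), p in joint.items():
    marginal[g] = marginal.get(g, 0.0) + p  # sum over all (m, s) settings

print(round(marginal[False], 4), round(marginal[True], 4))  # 0.2 0.8
```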
Conditional Probability
Suppose we observe that M=full. What is the probability that the car will start? P(S=yes | M=full)
Definition: P(A|B) = P(A ∧ B) / P(B)
Conditional Probability
Gas Meter Starts P(G,M,S)
false empty no 0.1386
false empty yes 0.0014
false full no 0.0594
false full yes 0.0006
true empty no 0.0240
true empty yes 0.0560
true full no 0.2160
true full yes 0.5040
Select cells that match the condition (M=full)
Delete the remaining cells and the M column
Renormalize the table to obtain P(S,G|M=full)
Sum away Gas: Σ_G P(S,G | M=full) = P(S|M=full)
Read the answer from the P(S=yes | M=full) cell
Selected cells (the M=full rows of the joint):
Gas Starts P(G, M=full, S)
false no 0.0594
false yes 0.0006
true no 0.2160
true yes 0.5040

Renormalized (divide by P(M=full) = 0.78):
Gas Starts P(G,S|M=full)
false no 0.0762
false yes 0.0008
true no 0.2769
true yes 0.6462

Sum away Gas:
Starts P(S|M=full)
no 0.3531
yes 0.6469
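The select / renormalize / sum-away steps above can be sketched in a few lines, again restating the joint table from the earlier slides:

```python
# Compute P(S | M=full) by selecting, renormalizing, and summing away G.
joint = {
    (False, "empty", "no"):  0.1386, (False, "empty", "yes"): 0.0014,
    (False, "full",  "no"):  0.0594, (False, "full",  "yes"): 0.0006,
    (True,  "empty", "no"):  0.0240, (True,  "empty", "yes"): 0.0560,
    (True,  "full",  "no"):  0.2160, (True,  "full",  "yes"): 0.5040,
}

# Step 1: select cells matching the condition and drop the M column.
selected = {(g, s): p for (g, m, s), p in joint.items() if m == "full"}

# Step 2: renormalize to obtain P(G,S | M=full).
z = sum(selected.values())                  # P(M=full) = 0.78
cond = {k: p / z for k, p in selected.items()}

# Step 3: sum away Gas to obtain P(S | M=full).
p_s = {}
for (g, s), p in cond.items():
    p_s[s] = p_s.get(s, 0.0) + p

print(round(p_s["no"], 4), round(p_s["yes"], 4))  # 0.3531 0.6469
```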
Operations on Probability Tables (2): Conditionalizing
Construct P(G,S | M) by normalizing the subtable corresponding to M=full and the subtable corresponding to M=empty:
Gas Meter Starts P(G,M,S)
false empty no 0.1386
false empty yes 0.0014
false full no 0.0594
false full yes 0.0006
true empty no 0.0240
true empty yes 0.0560
true full no 0.2160
true full yes 0.5040
Gas Starts P(G,S|M=empty) P(G,S|M=full)
false no 0.6300 0.0762
false yes 0.0064 0.0008
true no 0.1091 0.2769
true yes 0.2545 0.6462
Chain Rule of Probability
P(A,B,C) = P(A|B,C) · P(B|C) · P(C)
Proof: P(A|B,C) · P(B|C) · P(C) = [P(A,B,C)/P(B,C)] · [P(B,C)/P(C)] · P(C) = P(A,B,C), applying the definition of conditional probability twice.
Chain Rule (2)
Holds for distributions too:
P(A,B,C) = P(A | B,C) · P(B | C) · P(C)
This means that for each setting of A, B, and C, we can substitute into the equation, and it is true.
Belief Networks (1): Independence
Defn: Two random variables X and Y are independent iff P(X,Y) = P(X) · P(Y)
Example: X is a coin with P(X=heads) = 0.4; Y is a coin with P(Y=heads) = 0.8. Joint distribution:
P(X,Y) X=heads X=tails
Y=heads 0.32 0.48
Y=tails 0.08 0.12
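The factorization can be checked mechanically from the joint table on this slide: compute both marginals and verify P(X,Y) = P(X) · P(Y) in every cell (a sketch, not from the slides):

```python
# Check independence: does the joint factor into its marginals?
joint = {
    ("heads", "heads"): 0.32, ("tails", "heads"): 0.48,
    ("heads", "tails"): 0.08, ("tails", "tails"): 0.12,
}
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p   # marginal P(X)
    py[y] = py.get(y, 0.0) + p   # marginal P(Y)

independent = all(abs(joint[(x, y)] - px[x] * py[y]) < 1e-9
                  for x in px for y in py)
print(independent)  # True
```

The same check applied to the car's joint P(G,M,S) would fail, since Meter and Starts are only conditionally independent given Gas, not independent outright.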
Belief Networks (2): Conditional Independence
Defn: Two random variables X and Y are conditionally independent given Z iff P(X,Y | Z) = P(X|Z) · P(Y|Z)
Example: P(S,M | G) = P(S | G) · P(M | G). Intuition: G independently causes S and M.
S M G=false G=true
no empty 0.693 0.030
no full 0.297 0.270
yes empty 0.007 0.070
yes full 0.003 0.630

S G=false G=true
no 0.99 0.30
yes 0.01 0.70

M G=false G=true
empty 0.70 0.10
full 0.30 0.90
Operations on Probability Tables (3): Conformal Product
Allocate space for the resulting table and then fill in each cell with the product of the corresponding cells:
P(S,M | G) = P(S | G) · P(M | G)
S M G=false G=true
no empty a00·b00 a01·b01
no full a00·b10 a01·b11
yes empty a10·b00 a11·b01
yes full a10·b10 a11·b11

S G=false G=true
no a00 a01
yes a10 a11

M G=false G=true
empty b00 b01
full b10 b11
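A generic conformal product can be sketched by representing each table as a map from variable assignments to numbers and multiplying cells that agree on the shared variables (the table representation here is my own choice, not the course's):

```python
def conformal_product(t1, t2):
    """Each table maps a sorted tuple of (variable, value) pairs to a number.
    Cells are multiplied whenever they agree on all shared variables."""
    out = {}
    for a1, v1 in t1.items():
        d1 = dict(a1)
        for a2, v2 in t2.items():
            if any(var in d1 and d1[var] != val for var, val in a2):
                continue  # incompatible on a shared variable
            merged = dict(d1)
            merged.update(a2)
            out[tuple(sorted(merged.items()))] = v1 * v2
    return out

# P(S|G) and P(M|G) from the earlier slide, with Gas in {True, False}.
p_s_g = {(("G", g), ("S", s)): v for (g, s), v in
         {(False, "no"): 0.99, (True, "no"): 0.30,
          (False, "yes"): 0.01, (True, "yes"): 0.70}.items()}
p_m_g = {(("G", g), ("M", m)): v for (g, m), v in
         {(False, "empty"): 0.70, (True, "empty"): 0.10,
          (False, "full"): 0.30, (True, "full"): 0.90}.items()}

p_sm_g = conformal_product(p_s_g, p_m_g)
print(round(p_sm_g[(("G", True), ("M", "full"), ("S", "yes"))], 3))  # 0.63
```

The result matches the P(S,M | G) table computed on the conditional-independence slide.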
Properties of Conformal Products
Commutative
Associative
Work on normalized or unnormalized tables
Work on joint or conditional tables
Conditional Independence Allows Us to Simplify the Joint Distribution
P(G,M,S) = P(M,S | G) · P(G) [chain rule]
= P(M | G) · P(S | G) · P(G) [conditional independence]
Bayesian Networks
One node for each random variable
Each node stores a probability distribution P(node | parents(node))
Only direct dependencies are shown
Joint distribution is the conformal product of the node distributions: P(G,M,S) = P(G) · P(M | G) · P(S | G)
Network: Gas → Meter, Gas → Starts
Inference in Bayesian Networks
Suppose we observe that M=full. What is the probability that the car will start? P(S=yes | M=full)
Before, we handled this by the following steps:
Remove all rows corresponding to M=empty
Normalize the remaining rows to get P(S,G|M=full)
Sum over G: Σ_G P(S,G|M=full) = P(S | M=full)
Read the answer from the S=yes entry in the table
We want to get the same result, but without constructing the joint distribution first.
Inference in Bayesian Networks (2)
Remove all rows corresponding to M=empty from all nodes:
P(G) – unchanged
P(M | G) becomes P[G]
P(S | G) – unchanged
Sum over G: Σ_G P(G) · P[G] · P(S | G)
Normalize to get P(S|M=full)
Read the answer from the S=yes entry in the table
Inference with Tables
P(M|G) G=false G=true
empty 0.70 0.10
full 0.30 0.90

P(S|G) G=false G=true
no 0.99 0.30
yes 0.01 0.70

G P(G)
false 0.20
true 0.80
Inference with Tables
Step 1: Delete the M=empty rows from all tables. P(M|G) becomes the one-row table P[G]:
P[G] G=false G=true
0.30 0.90
P(G) and P(S|G) are unchanged.
Inference with Tables
Step 1: Delete M=empty rows from all tables
Step 2: Perform algebra to push the summation inwards (a no-op in this case)
Inference with Tables
Step 1: Delete M=empty rows from all tables
Step 2: Perform algebra to push summation inwards (no-op in this case)
Step 3: Form the conformal product P(G) · P[G] · P(S|G):
S G=false G=true
no 0.0594 0.2160
yes 0.0006 0.5040
Inference with Tables
Step 1: Delete M=empty rows from all tables
Step 2: Perform algebra to push summation inwards (no-op in this case)
Step 3: Form conformal product
Step 4: Sum away G
S
no 0.2754
yes 0.5046
Inference with Tables
Step 1: Delete M=empty rows from all tables
Step 2: Perform algebra to push summation inwards (no-op in this case)
Step 3: Form conformal product
Step 4: Sum away G
Step 5: Normalize
S
no 0.3531
yes 0.6469
Inference with Tables
Step 1: Delete M=empty rows from all tables
Step 2: Perform algebra to push summation inwards (no-op in this case)
Step 3: Form conformal product
Step 4: Sum away G
Step 5: Normalize
S
no 0.3531
yes 0.6469
Step 6: Read answer from table: 0.6469
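The six steps above can be sketched end-to-end in code, starting from the three node tables rather than the joint (Gas is written as a Boolean):

```python
# Steps 1-6: condition on M=full, form the conformal product
# P(G) · P[G] · P(S|G), sum away G, normalize, and read the answer.
p_g = {False: 0.20, True: 0.80}                         # P(G)
p_m_g = {(False, "empty"): 0.70, (False, "full"): 0.30,
         (True, "empty"): 0.10, (True, "full"): 0.90}   # P(M|G)
p_s_g = {(False, "no"): 0.99, (False, "yes"): 0.01,
         (True, "no"): 0.30, (True, "yes"): 0.70}       # P(S|G)

# Step 1: delete the M=empty rows; P(M|G) becomes the evidence table P[G].
ev_g = {g: p for (g, m), p in p_m_g.items() if m == "full"}

# Steps 3-4: conformal product, then sum away G.
p_s = {"no": 0.0, "yes": 0.0}
for g in (False, True):
    for s in ("no", "yes"):
        p_s[s] += p_g[g] * ev_g[g] * p_s_g[(g, s)]

# Step 5: normalize.  Step 6: read the answer.
z = sum(p_s.values())
answer = {s: p / z for s, p in p_s.items()}
print(round(answer["yes"], 4))  # 0.6469
```

Note that the joint distribution is never materialized; only two-entry and four-entry tables are ever built.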
Notes
We never created the joint distribution
Deleting the M=empty rows from the individual tables followed by the conformal product has the same effect as performing the conformal product first and then deleting the M=empty rows
Normalization can be postponed to the end
Another Example: "Asia"
(all variables Boolean)
Suppose we observe Sneeze. What is P(Cold | Sneeze) = P(Co|S)?
Network: Cat → Allergy, Cat → Scratch; Allergy → Sneeze, Cold → Sneeze
Answering the Query
Joint distribution: Σ_{A,Ca,Sc} P(Co) · P(Sn | Co,A) · P(Ca) · P(A | Ca) · P(Sc | Ca)
Apply the evidence sn (Sneeze = true); P(Sn | Co,A) becomes P[Co,A]:
Σ_{A,Ca,Sc} P(Co) · P[Co,A] · P(Ca) · P(A | Ca) · P(Sc | Ca)
Push summations in as far as possible:
P(Co) · Σ_A P[Co,A] · Σ_Ca P(A | Ca) · P(Ca) · Σ_Sc P(Sc | Ca)
Evaluate, innermost sums first:
P(Co) · Σ_A P[Co,A] · Σ_Ca P(A | Ca) · P(Ca) · P[Ca]
P(Co) · Σ_A P[Co,A] · P[A]
P(Co) · P[Co]
Normalize and extract the answer
Pruning Leaves
Leaf nodes not involved in the evidence or the query can be pruned. Example: Scratch.
(Evidence: Sneeze. Query: Cold.)
Greedy Algorithm for Choosing the Elimination Order
nodes = set of tables (after applying evidence)
V = variables to sum over
while |nodes| > 1 do
  Generate all pairs of tables in nodes that share at least one variable
  Compute the size of the table that would result from the conformal product of each pair (summing over as many variables in V as possible)
  Let (T1,T2) be the pair with the smallest resulting size
  Delete T1 and T2 from nodes
  Add the conformal product Σ_V T1 · T2 to nodes
end
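The pairing loop can be sketched by tracking only each table's set of variables, where "size" counts the variables remaining after summing (the function name and representation are illustrative, not from the slides):

```python
def greedy_order(tables, sum_vars):
    """tables: list of frozensets of variable names.
    sum_vars: variables to sum away.  A variable can be summed as soon as
    no other table mentions it.  Returns the chosen (T1, T2, result) steps."""
    tables = list(tables)
    steps = []
    while len(tables) > 1:
        best = None
        for i in range(len(tables)):
            for j in range(i + 1, len(tables)):
                if not tables[i] & tables[j]:
                    continue  # only pair tables sharing a variable
                product = tables[i] | tables[j]
                others = set().union(*(t for k, t in enumerate(tables)
                                       if k not in (i, j)))
                # Sum away every sum-variable no other table mentions.
                summable = {v for v in product
                            if v in sum_vars and v not in others}
                result = product - summable
                if best is None or len(result) < best[0]:
                    best = (len(result), i, j, result)
        _, i, j, result = best
        steps.append((tables[i], tables[j], result))
        tables = [t for k, t in enumerate(tables) if k not in (i, j)]
        tables.append(frozenset(result))
    return steps

# The Asia-example tables: P(Co), P[Co,A], P(A|Ca), P(Ca); sum over A, Ca.
tabs = [frozenset({"Co"}), frozenset({"Co", "A"}),
        frozenset({"A", "Ca"}), frozenset({"Ca"})]
for t1, t2, res in greedy_order(tabs, {"A", "Ca"}):
    print(sorted(t1), sorted(t2), "->", sorted(res))
```

On this input the loop reproduces the choices on the next three slides: first Σ_Ca P(A|Ca) · P(Ca), then Σ_A P[Co,A] · P[A], then the final product over Co.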
Example of Greedy Algorithm
Given tables P(Co), P[Co,A], P(A|Ca), P(Ca). Variables to sum: A, Ca
Choose: P[A] = Σ_Ca P(A|Ca) · P(Ca)

candidate pair  size of product  sum over  size of result
P(Co) · P[Co,A]  2  nil  2
P[Co,A] · P(A|Ca)  3  A  2
P(A|Ca) · P(Ca)  2  Ca  1
Example of Greedy Algorithm (2)
Given tables P(Co), P[Co,A], P[A]. Variables to sum: A
Choose: P[Co] = Σ_A P[Co,A] · P[A]

candidate pair  size of product  sum over  size of result
P(Co) · P[Co,A]  2  nil  2
P[Co,A] · P[A]  2  A  1
Example of Greedy Algorithm (3)
Given tables P(Co), P[Co]. Variables to sum: none
Choose: P2[Co] = P(Co) · P[Co]
Normalize and extract the answer

candidate pair  size of product  sum over  size of result
P(Co) · P[Co]  1  nil  1
Bayesian Network for WUMPUS
P(P1,1, P1,2, …, P4,4, B1,1, B1,2, …, B4,4)
One pit variable Px,y and one breeze variable Bx,y per square of the 4×4 cave. Each breeze variable depends on the pit variables of its own square and the adjacent squares; for example, B1,1 has parents P1,1, P1,2, P2,1.
Probabilistic Inference in WUMPUS
Suppose we have observed:
No breeze in [1,1]
Breeze in [1,2] and [2,1]
No pit in [1,1], [1,2], and [2,1]
What is the probability of a pit in [1,3]?
P(P1,3 | ¬B1,1, B1,2, B2,1, ¬P1,1, ¬P1,2, ¬P2,1)
(Squares [1,1], [1,2], [2,1] are known to be OK; the frontier squares [1,3], [2,2], [3,1] may contain pits.)
What is P(P1,3 | ¬B1,1, B1,2, B2,1, ¬P1,1, ¬P1,2, ¬P2,1)?
Evidence: B1,1 = false, B1,2 = true, B2,1 = true; P1,1 = P1,2 = P2,1 = false. Query: P1,3.
Prune Leaves Not Involved in Query or Evidence
All breeze nodes other than the observed B1,1, B1,2, and B2,1 are leaves not involved in the query or evidence, so they can be pruned.
Prune Independent Nodes
After pruning the leaves, pit nodes that are now independent of the query given the evidence (e.g. P1,4 and the remaining far squares) can also be pruned.
Solve Remaining Network
Remaining nodes: P1,1, P1,2, P2,1, P2,2, P3,1, P1,3 and B1,1, B1,2, B2,1, with evidence B1,1 = false, B1,2 = true, B2,1 = true, P1,1 = P1,2 = P2,1 = false, and query P1,3:
Σ_{P2,2,P3,1} P(B1,1|P1,1,P1,2,P2,1) · P(B1,2|P1,1,P1,2,P1,3) · P(B2,1|P1,1,P2,1,P2,2,P3,1) · P(P1,1) · P(P1,2) · P(P2,1) · P(P2,2) · P(P1,3) · P(P3,1)
Performing the Inference
NORM{ Σ_{P2,2,P3,1} P(B1,1|P1,1,P1,2,P2,1) · P(B1,2|P1,1,P1,2,P1,3) · P(B2,1|P1,1,P2,1,P2,2,P3,1) · P(P1,1) · P(P1,2) · P(P2,1) · P(P2,2) · P(P1,3) · P(P3,1) }
= NORM{ Σ_{P2,2,P3,1} P[P1,3] · P[P2,2,P3,1] · P(P2,2) · P(P1,3) · P(P3,1) }
= NORM{ P[P1,3] · P(P1,3) · Σ_{P2,2} P(P2,2) · Σ_{P3,1} P[P2,2,P3,1] · P(P3,1) }
We have reduced the inference to a simple computation over 2×2 tables.
P(P1,3 | evidence) = ⟨0.69, 0.31⟩: a 31% chance of a pit!
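The ⟨0.69, 0.31⟩ result can be reproduced by brute-force enumeration over the three frontier pit variables, assuming (as in the usual WUMPUS model, not restated on this slide) a pit prior of 0.2 per square and a breeze iff some adjacent square has a pit:

```python
from itertools import product

# Enumerate the frontier pits P(1,3), P(2,2), P(3,1).  The evidence
# (no pits in [1,1], [1,2], [2,1]; breeze in [1,2] and [2,1]) requires
# a pit in [1,3] or [2,2], and a pit in [2,2] or [3,1].
PIT = 0.2
num = den = 0.0
for p13, p22, p31 in product([False, True], repeat=3):
    if not ((p13 or p22) and (p22 or p31)):
        continue  # inconsistent with the observed breezes
    w = 1.0
    for pit in (p13, p22, p31):
        w *= PIT if pit else 1 - PIT
    den += w
    if p13:
        num += w

print(round(num / den, 2))  # 0.31
```

Enumeration is feasible here only because pruning left three unknown pit variables; the variable-elimination computation above scales to the full network.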
Summary
The joint distribution is analogous to the truth table for propositional logic. It is exponentially large, but any query can be answered using it
Conditional independence allows us to factor the joint distribution using conformal products
Conditional independence relationships are conveniently visualized and encoded in a belief network DAG
Given evidence, we can reason efficiently by algebraic manipulation of the factored representation