CSCI 5582 Artificial Intelligence (Fall 2006): Lecture 15, Jim Martin
Today 10/19
• Review
• Belief Net Computing
• Sequential Belief Nets
Review
• Normalization
• Belief Net Semantics
Normalization
• What do I know about P(~A | something) and P(A | the same something)?
– They sum to 1.
Normalization
• What if I have P(A,Y)/P(Y) and P(~A,Y)/P(Y), and I can compute the numerators but not the denominator?
– Ignore the denominator: compute the numerators, then normalize.
– P(A|Y) = P(A,Y) / (P(A,Y) + P(~A,Y))
– P(~A|Y) = P(~A,Y) / (P(A,Y) + P(~A,Y))
• Alpha * <0.12, 0.08> = <0.6, 0.4>, where Alpha = 1/(0.12 + 0.08) = 5
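In code, Alpha is just one over the sum of the unnormalized values. A minimal sketch (plain Python; the function name is mine):

```python
def normalize(values):
    """Scale a list of unnormalized probabilities so they sum to 1."""
    alpha = 1.0 / sum(values)          # here alpha = 1 / (0.12 + 0.08) = 5.0
    return [alpha * v for v in values]

print(normalize([0.12, 0.08]))  # [0.6, 0.4], up to floating-point error
```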
Bayesian Belief Nets
• A compact notation for representing conditional independence assumptions, and hence a compact way of representing a joint distribution.
• Syntax:
– A directed acyclic graph, one node per variable
– Each node augmented with local conditional probability tables
Bayesian Belief Nets
• Nodes with no incoming arcs (root nodes) simply have priors associated with them
• Nodes with incoming arcs have tables enumerating
– P(Node | Conjunction of Parents)
– where a parent is a node at the other end of an incoming arc
Bayesian Belief Nets: Semantics
• The full joint distribution for the N variables in a Belief Net can be recovered from the information in the tables.
P(X1, …, XN) = ∏_{i=1..N} P(Xi | Parents(Xi))
Alarm Example

[Figure: the burglary alarm network; Burglary and Earthquake point to Alarm, and Alarm points to JohnCalls and MaryCalls, each node with its probability table.]
• P(J^M^A^~B^~E) = P(J|A) * P(M|A) * P(A|~B^~E) * P(~B) * P(~E)
= 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.00063
• In other words, the probability of an atomic event can be read right off the network as the product of one table entry per variable.
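That product can be checked mechanically. A small sketch using the table entries quoted on the slide:

```python
# CPT entries for the alarm network, as quoted on the slide.
p_j_given_a   = 0.90    # P(J|A)
p_m_given_a   = 0.70    # P(M|A)
p_a_given_nbe = 0.001   # P(A|~B,~E)
p_not_b       = 0.999   # P(~B)
p_not_e       = 0.998   # P(~E)

# P(J^M^A^~B^~E): one table entry per variable, multiplied together.
p = p_j_given_a * p_m_given_a * p_a_given_nbe * p_not_b * p_not_e
print(p)  # roughly 0.00063
```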
Events
• P(M^J^E^B^A) + P(M^J^E^B^~A) + P(M^J^E^~B^A) + …
Chain Rule Basis
P(B,E,A,J,M) = P(M|B,E,A,J) P(B,E,A,J)
P(B,E,A,J) = P(J|B,E,A) P(B,E,A)
P(B,E,A) = P(A|B,E) P(B,E)
P(B,E) = P(B|E) P(E)
Chain Rule Basis
• P(B,E,A,J,M)
• = P(M|B,E,A,J) P(J|B,E,A) P(A|B,E) P(B|E) P(E)   (the full chain rule)
• = P(M|A) P(J|A) P(A|B,E) P(B) P(E)   (after applying the network's independence assumptions)
Details
• Where do the graphs come from?
– Initially, from the intuitions of domain experts
• Where do the numbers come from?
– Hopefully, from hard data
– Sometimes from experts' intuitions
• How can we compute things efficiently?
– Exactly, by not redoing things unnecessarily
– Approximately, when exact computation is too expensive
Computing with BBNs
• Normal scenario
– You have a belief net consisting of a bunch of variables
• Some of which you know to be true (evidence)
• Some of which you're asking about (query)
• Some you haven't specified (hidden)
Example
• Probability that there’s a burglary given that John and Mary are calling
• P(B|J,M)
– B is the query variable
– J and M are evidence variables
– A and E are hidden variables
• P(B|J,M) = alpha * P(B,J,M)
= alpha * [P(B,J,M,A,E) + P(B,J,M,~A,E) + P(B,J,M,A,~E) + P(B,J,M,~A,~E)]
From the Network
P(B|J,M) = α P(B) Σ_e P(E) Σ_a P(A|B,E) P(J|A) P(M|A)
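That nested sum can be evaluated by brute-force enumeration over the hidden variables. The sketch below fills in the CPT entries that do not appear on these slides with the numbers usually used for this example, so the exact answer is an assumption:

```python
# Alarm-network CPTs; only some of these values appear on the slides,
# the rest are the usual textbook numbers for this example.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=True | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=True | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=True | A)

def joint(b, e, a, j, m):
    """Full-joint entry: product of one table entry per variable."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

def p_b_given_jm():
    """P(B | J, M): sum out the hidden A and E, then normalize."""
    unnorm = {}
    for b in (True, False):
        unnorm[b] = sum(joint(b, e, a, True, True)
                        for e in (True, False) for a in (True, False))
    alpha = 1.0 / (unnorm[True] + unnorm[False])
    return alpha * unnorm[True]

print(p_b_given_jm())  # about 0.284 with these numbers
```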
Expression Tree

[Figure: the sum-of-products computation drawn as an expression tree; repeated subtrees show work that can be shared.]
Speedups
• Don't recompute things.
– Dynamic programming
• Don't compute some things at all.
– Ignore variables that can't affect the outcome.
Example
• John calls given burglary
• P(J|B)
P(J|B) = α P(B) Σ_e P(E) Σ_a P(A|B,E) P(J|A) Σ_m P(M|A)

Note that Σ_m P(M|A) = 1, so the M term contributes nothing and can be dropped.
Variable Elimination
• Every variable that is not an ancestor of a query variable or an evidence variable is irrelevant to the query.
– Operationally…
• You can eliminate any leaf node that isn't a query or evidence variable.
• That may produce new leaves. Keep going.
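The leaf-pruning procedure can be sketched directly on the graph. The representation (a map from each node to the set of its parents) and the function name are mine:

```python
def prune_irrelevant(parents, query_and_evidence):
    """Repeatedly remove leaf nodes that are neither query nor evidence.
    `parents` maps each node to the set of its parents."""
    nodes = set(parents)
    while True:
        # A leaf is a node that is no remaining node's parent.
        leaves = {n for n in nodes
                  if not any(n in parents[m] for m in nodes if m != n)}
        removable = leaves - set(query_and_evidence)
        if not removable:
            return nodes
        nodes -= removable
        parents = {n: parents[n] & nodes for n in nodes}

# The alarm net: B and E are parents of A; A is the parent of J and M.
alarm = {'B': set(), 'E': set(), 'A': {'B', 'E'}, 'J': {'A'}, 'M': {'A'}}

# For P(J|B), M is a leaf that is neither query nor evidence, so it goes.
print(prune_irrelevant(alarm, {'J', 'B'}))  # the set {'B', 'E', 'A', 'J'}
```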
Break
• Questions?
Chain Rule Basis
P(B,E,A,J,M) = P(M|B,E,A,J) P(B,E,A,J)
P(B,E,A,J) = P(J|B,E,A) P(B,E,A)
P(B,E,A) = P(A|B,E) P(B,E)
P(B,E) = P(B|E) P(E)
Chain Rule
P(E1,E2,E3,E4,E5) = P(E5|E1,E2,E3,E4) P(E1,E2,E3,E4)
P(E1,E2,E3,E4) = P(E4|E1,E2,E3) P(E1,E2,E3)
P(E1,E2,E3) = P(E3|E1,E2) P(E1,E2)
P(E1,E2) = P(E2|E1) P(E1)
Rewriting, that's just
P(E1) P(E2|E1) P(E3|E1,E2) P(E4|E1,E2,E3) P(E5|E1,E2,E3,E4)

The probability of a sequence of events is just the product of the conditional probability of each event given its predecessors (parents/causes, in belief net terms).
Markov Assumption
• This is just a sequence-based independence assumption, just like with belief nets.
– Not all the parents matter
• Remember: P(toothache | catch, cavity) = P(toothache | cavity)
• Now: P(Event_N | Event_1, …, Event_N-1) = P(Event_N | Event_N-K, …, Event_N-1), i.e., only the last K events matter
First Order Markov Assumption
P(E1)P(E2|E1)P(E3|E1,E2)P(E4|E1,E2,E3)P(E5|E1,E2,E3,E4)

becomes

P(E1)P(E2|E1)P(E3|E2)P(E4|E3)P(E5|E4)
Markov Models
• As with all our models, let's assume some fixed inventory of possible events that can occur in time.
• Let's assume for now that at any given point in time all events are possible, although not equally likely.
Markov Models
• You can view simple Markov assumptions as arising from underlying probabilistic state machines.
• In the simplest case (first order), events correspond to states and the probabilities are governed by probabilities on the transitions in the machine.
Weather
• Let's say we're tracking the weather, and there are 4 possible events (exactly one per day)
– Sun, clouds, rain, snow
Example

[Figure: a 4-state machine with states Rain, Snow, Sun, and Clouds, with transition arcs among them.]
Belief Net Version

[Figure: the model unrolled over time; one node per time step, each taking the values Rain, Snow, Sun, or Clouds, each step pointing to the next.]
Example
• In this case we need a 4x4 matrix of transition probabilities
– For example, P(Rain|Cloudy) or P(Sunny|Sunny), etc.
• And we need a set of initial probabilities, such as P(Rain). That's just an array of 4 numbers.
• So to get the probability of a sequence like
– rain rain rain snow
– you just march through the state machine:
– P(rain) P(rain|rain) P(rain|rain) P(snow|rain)
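A sketch of the march, with made-up initial and transition probabilities (every number below is invented for illustration):

```python
# Hypothetical transition probabilities P(next | current); each row sums to 1.
trans = {
    'rain':   {'rain': 0.6, 'snow': 0.1, 'sun': 0.1, 'clouds': 0.2},
    'snow':   {'rain': 0.2, 'snow': 0.5, 'sun': 0.1, 'clouds': 0.2},
    'sun':    {'rain': 0.1, 'snow': 0.0, 'sun': 0.6, 'clouds': 0.3},
    'clouds': {'rain': 0.3, 'snow': 0.1, 'sun': 0.3, 'clouds': 0.3},
}
init = {'rain': 0.25, 'snow': 0.25, 'sun': 0.25, 'clouds': 0.25}

def sequence_prob(seq):
    """P(seq) = P(seq[0]) * product of P(seq[i] | seq[i-1])."""
    p = init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= trans[prev][cur]
    return p

# P(rain) P(rain|rain) P(rain|rain) P(snow|rain) = 0.25 * 0.6 * 0.6 * 0.1
print(sequence_prob(['rain', 'rain', 'rain', 'snow']))  # 0.009, up to rounding
```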
Belief Net Version

[Figure: the rain rain rain snow sequence as an unrolled belief net, one Rain/Snow/Sun/Clouds node per day.]
Example
• Say that I tell you that
– rain rain rain snow has happened
– How would you answer:
• What's the most likely thing to happen next?
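Under the first-order Markov assumption only the last observed state (snow) matters, so the answer is an argmax over one row of the transition matrix. The row below is invented for illustration:

```python
# Hypothetical transition probabilities out of the last observed state, snow.
from_snow = {'rain': 0.2, 'snow': 0.5, 'sun': 0.1, 'clouds': 0.2}

# After rain rain rain snow, the prediction is the argmax of snow's row.
prediction = max(from_snow, key=from_snow.get)
print(prediction)  # 'snow', since 0.5 is the largest entry in this made-up row
```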
Belief Net Version

[Figure: the unrolled net extended by one more time step; the prediction is the Max over that step's values Rain, Snow, Sun, Clouds.]
Weird Example
• What if you couldn't actually see the weather?
– You're a security guard who lives and works in a secure facility underground.
– You watch people coming and going with various things (snow boots, umbrellas, ice cream cones).
– Can you figure out the weather?
Hidden Markov Models
• Add an output to the states; i.e., when a state is entered, it outputs a symbol.
• You can view the outputs, but not the states directly.
– States can output different symbols at different times.
– The same symbol can come from many states.
Hidden Markov Models
• The point
– The observable sequence of symbols does not uniquely determine a sequence of states.
• Can we nevertheless reason about the underlying model, given the observations?
Hidden Markov Model Assumptions
• Now we're going to make two independence assumptions
– The state we're in depends probabilistically only on the state we were last in (the first-order Markov assumption)
– The symbol we're seeing depends probabilistically only on the state we're in
Hidden Markov Models
• Now the model needs
– The initial state priors
• P(State_i)
– The transition probabilities (as before)
• P(State_j | State_k)
– The output probabilities
• P(Observation_i | State_k)
HMMs
• The joint probability of a state sequence and an observation sequence is…
P(X0, X1, …, Xt, E1, …, Et) = P(X0) ∏_{i=1..t} P(Xi | Xi-1) P(Ei | Xi)
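A direct transcription of that product, with invented parameters in the spirit of the guard's weather example (the states, symbols, and every number are assumptions):

```python
# Hypothetical two-state HMM over weather, emitting what the guard observes.
prior = {'rain': 0.4, 'sun': 0.6}                          # P(X0)
trans = {'rain': {'rain': 0.7, 'sun': 0.3},
         'sun':  {'rain': 0.2, 'sun': 0.8}}                # P(Xi | Xi-1)
emit  = {'rain': {'umbrella': 0.9, 'ice_cream': 0.1},
         'sun':  {'umbrella': 0.2, 'ice_cream': 0.8}}      # P(Ei | Xi)

def joint_prob(states, observations):
    """P(X0..Xt, E1..Et) = P(X0) * prod_i P(Xi|Xi-1) P(Ei|Xi).
    `states` has one more element than `observations` (the initial X0)."""
    p = prior[states[0]]
    for i, obs in enumerate(observations, start=1):
        p *= trans[states[i - 1]][states[i]] * emit[states[i]][obs]
    return p

# P(X0=sun) * P(rain|sun) P(umbrella|rain) * P(rain|rain) P(umbrella|rain)
print(joint_prob(['sun', 'rain', 'rain'], ['umbrella', 'umbrella']))
```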
Noisy Channel Applications
• The hidden model represents an original signal (a sequence of words, letters, etc.)
• This signal is corrupted probabilistically; use an HMM to recover the original signal.
• Applications: speech, OCR, language translation, spelling correction, …
Three Problems
• The probability of an observation sequence given a model
– Forward algorithm
– Prediction falls out from this
• The most likely path through a model given an observed sequence
– Viterbi algorithm
– Sometimes called decoding
• Finding the most likely model (parameters) given an observed sequence
– EM algorithm
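As a taste of the second problem, here is a compact Viterbi sketch on a tiny two-state HMM. The parameters are invented for illustration, and I follow the common convention that every state in the path (including the first) emits a symbol:

```python
# Hypothetical two-state HMM: find the most likely state sequence
# for an observation sequence (Viterbi decoding via dynamic programming).
prior = {'rain': 0.4, 'sun': 0.6}
trans = {'rain': {'rain': 0.7, 'sun': 0.3},
         'sun':  {'rain': 0.2, 'sun': 0.8}}
emit  = {'rain': {'umbrella': 0.9, 'ice_cream': 0.1},
         'sun':  {'umbrella': 0.2, 'ice_cream': 0.8}}

def viterbi(observations):
    states = list(prior)
    # best[s]: probability of the best path ending in state s;
    # back[t][s]: the predecessor state that achieved it.
    best = {s: prior[s] * emit[s][observations[0]] for s in states}
    back = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            p, arg = max((prev_best[r] * trans[r][s], r) for r in states)
            best[s] = p * emit[s][obs]
            pointers[s] = arg
        back.append(pointers)
    # Trace the best final state back to the start.
    last = max(best, key=best.get)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

print(viterbi(['umbrella', 'umbrella', 'ice_cream']))  # ['rain', 'rain', 'sun']
```

The forward algorithm has exactly the same shape, with the max over predecessors replaced by a sum.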