Lecture 08: Decision Making
Fahiem Bacchus, University of Toronto
Decision Making Under Uncertainty.
Chapter 16 in your text.
CSC384h: Intro to Artificial Intelligence
Preferences
I give the robot a planning problem: I want coffee.
But the coffee maker is broken, so the robot reports "No plan!"
Preferences
We really want more robust behavior: the robot should know what to do if my primary goal can't be satisfied. I should provide it with some indication of my preferences over alternatives, e.g., coffee is better than tea, tea better than water, water better than nothing, etc.
But it's more complex: the robot could wait 45 minutes for the coffee maker to be fixed. What's better: tea now, or coffee in 45 minutes? We could express preferences over <beverage, time> pairs.
Preference Orderings
A preference ordering ≽ is a ranking of all possible states of affairs (worlds) S.
These could be outcomes of actions, truth assignments, states in a search problem, etc.
s ≽ t means that state s is at least as good as t.
s ≻ t means that state s is strictly preferred to t.
We insist that ≽ is:
reflexive: s ≽ s for all states s
transitive: if s ≽ t and t ≽ w, then s ≽ w
connected: for all states s, t, either s ≽ t or t ≽ s
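As a small aside (not from the slides), these three properties are easy to check mechanically. The Python sketch below treats ≽ as a set of (s, t) pairs; the particular states and ordering are made-up examples.

from itertools import product

def is_preference_ordering(states, geq):
    # geq is a set of pairs (s, t) meaning "s is at least as good as t"
    reflexive = all((s, s) in geq for s in states)
    transitive = all((s, w) in geq
                     for s, t, w in product(states, repeat=3)
                     if (s, t) in geq and (t, w) in geq)
    connected = all((s, t) in geq or (t, s) in geq
                    for s, t in product(states, repeat=2))
    return reflexive and transitive and connected

# Hypothetical ordering: coffee ≽ tea ≽ water ≽ nothing
order = ["coffee", "tea", "water", "nothing"]
geq = {(s, t) for i, s in enumerate(order) for t in order[i:]}
print(is_preference_ordering(order, geq))  # True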
Why Impose These Conditions?
The structure of a preference ordering imposes certain "rationality requirements" (it is a weak ordering). E.g., why transitivity?
Suppose you (strictly) prefer coffee to tea, tea to OJ, and OJ to coffee.
If you prefer X to Y, you'll trade me Y plus $1 for X.
Then I can construct a "money pump" and extract arbitrary amounts of money from you.
Decision Problems: Certainty
A decision problem under certainty consists of:
a set of decisions D (e.g., paths in a search graph, plans, actions, …)
a set of outcomes or states S (e.g., states you could reach by executing a plan)
an outcome function f : D → S giving the outcome of any decision
a preference ordering ≽ over S
A solution to a decision problem is any d* ∊ D such that f(d*) ≽ f(d) for all d ∊ D.
Decision Problems: Certainty
Recall: a decision problem under certainty is a set of decisions D, a set of outcomes or states S, an outcome function f : D → S, and a preference ordering ≽ over S. A solution is any d* ∊ D such that f(d*) ≽ f(d) for all d ∊ D.
E.g., in classical planning we say that any goal state s is preferred or equal to every other state. So d* is a solution iff f(d*) is a goal state, i.e., d* is a solution iff it is a plan that achieves the goal.
More generally, in classical planning we might consider different goals with different values, and we want d* to be a plan that optimizes our value.
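As an illustrative sketch (not from the text), a decision problem under certainty can be solved by directly comparing outcomes under the ordering; the decision names, outcome function, and numeric ranking below are hypothetical stand-ins for ≽.

def solve_certain(decisions, outcome, at_least_as_good):
    # Return some d* with outcome(d*) at least as good as outcome(d) for every d
    for d_star in decisions:
        if all(at_least_as_good(outcome(d_star), outcome(d)) for d in decisions):
            return d_star
    return None

# Hypothetical example: a numeric rank stands in for the preference ordering
rank = {"coffee": 3, "tea": 2, "water": 1}
outcome = {"plan1": "tea", "plan2": "coffee", "plan3": "water"}.get
best = solve_certain(["plan1", "plan2", "plan3"], outcome,
                     lambda s, t: rank[s] >= rank[t])
print(best)  # plan2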
Decision Making under Uncertainty
Suppose actions don't have deterministic outcomes. E.g., when the robot pours coffee, it spills 20% of the time, making a mess.
Preferences: chc, ¬mess ≻ ¬chc, ¬mess ≻ ¬chc, mess
What should the robot do?
The decision getcoffee leads to a good outcome and a bad outcome, each with some probability.
The decision donothing leads to a medium outcome for sure.
Should the robot be optimistic? Pessimistic? Really, the odds of success should influence the decision. But how?
[Diagram: getcoffee leads to chc, ¬mess with probability .8 and to ¬chc, mess with probability .2; donothing leads to ¬chc, ¬mess for sure.]
Utilities
Rather than just ranking outcomes, we must quantify our degree of preference. E.g., how much more important is having coffee than having tea?
A utility function U : S → ℝ associates a real-valued utility with each outcome (state); U(s) quantifies our degree of preference for s.
Note: U induces a preference ordering ≽U over the states S defined by s ≽U t iff U(s) ≥ U(t). ≽U is reflexive, transitive, and connected.
Expected Utility
With utilities we can compute expected utilities!
In decision making under uncertainty, each decision d induces a distribution Pr_d over possible outcomes: Pr_d(s) is the probability of outcome s under decision d.
The expected utility of decision d is defined as:
EU(d) = Σ_{s ∈ S} Pr_d(s) U(s)
Expected Utility
Say U(chc, ¬ms) = 10, U(¬chc, ¬ms) = 5, U(¬chc, ms) = 0.
Then EU(getcoffee) = .8 × 10 + .2 × 0 = 8 and EU(donothing) = 5.
If instead U(chc, ¬ms) = 10, U(¬chc, ¬ms) = 9, U(¬chc, ms) = 0, then EU(getcoffee) = 8 and EU(donothing) = 9.
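A quick sketch that reproduces these numbers (the outcome distributions are the ones from the getcoffee/donothing example above; the code itself is only illustrative):

def expected_utility(dist, U):
    # EU(d) = sum over outcomes s of Pr_d(s) * U(s)
    return sum(p * U[s] for s, p in dist.items())

Pr = {
    "getcoffee": {("chc", "no_mess"): 0.8, ("no_chc", "mess"): 0.2},
    "donothing": {("no_chc", "no_mess"): 1.0},
}
U1 = {("chc", "no_mess"): 10, ("no_chc", "no_mess"): 5, ("no_chc", "mess"): 0}
U2 = {("chc", "no_mess"): 10, ("no_chc", "no_mess"): 9, ("no_chc", "mess"): 0}

for U in (U1, U2):
    eus = {d: expected_utility(dist, U) for d, dist in Pr.items()}
    print(eus, "->", max(eus, key=eus.get))  # pick the decision with greatest EU
# {'getcoffee': 8.0, 'donothing': 5.0} -> getcoffee
# {'getcoffee': 8.0, 'donothing': 9.0} -> donothing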
The MEU Principle
The principle of maximum expected utility (MEU) states that the optimal decision under conditions of uncertainty is the decision with greatest expected utility.
In our example: if my utility function is the first one, my robot should get coffee; if your utility function is the second one, your robot should do nothing.
Computational Issues
At some level, the solution to a decision problem is trivial: the complexity lies in the fact that the decisions and the outcome function are rarely specified explicitly.
E.g., in a planning or search problem, you construct the set of decisions by constructing plans or exploring search paths; then we have to evaluate the expected utility of each. Computationally hard!
E.g., suppose we find a plan achieving some expected utility e. Can we stop searching? We must convince ourselves that no better plan exists. Generally this requires searching the entire plan space, unless we have some clever tricks.
Decision Problems: Uncertainty
A decision problem under uncertainty consists of:
a set of decisions D
a set of outcomes or states S
an outcome function Pr : D → Δ(S), where Δ(S) is the set of distributions over S (e.g., Pr_d)
a utility function U over S
A solution to a decision problem under uncertainty is any d* ∊ D such that EU(d*) ≥ EU(d) for all d ∊ D.
Expected Utility: Notes
Note that this viewpoint accounts for both uncertainty in action outcomes and uncertainty in the state of knowledge, as well as any combination of the two.
[Figures: Stochastic actions (from s0, each of actions a and b leads to one of several states with the probabilities shown) and Uncertain knowledge (the current state is itself uncertain, e.g., a .7/.3 distribution over possible states, and each action maps each possible state to outcomes).]
Expected Utility: Notes
Why MEU? Where do utilities come from?
The underlying foundations of utility theory tightly couple utility with action/choice: a utility function can be determined by asking someone about their preferences for actions in specific scenarios (or "lotteries" over outcomes).
Utility functions needn't be unique: if I multiply U by a positive constant, all decisions have the same relative utility; if I add a constant to U, same thing. U is unique up to positive affine transformation.
So What are the Complications?
Outcome space is large: like all of our problems, state spaces can be huge, and we don't want to spell out distributions like Pr_d explicitly.
Solution: Bayes nets (or the related influence diagrams).
So What are the Complications?
Decision space is large: usually our decisions are not one-shot actions; rather they involve sequential choices (like plans). If we treat each plan as a distinct decision, the decision space is too large to handle directly.
Solution: use dynamic programming methods to construct optimal plans (actually generalizations of plans, called policies, as in game trees).
A Simple Example
Suppose we have two actions, a and b, and we have time to execute two actions in sequence. This means we can do one of: [a,a], [a,b], [b,a], [b,b].
Actions are stochastic: action a induces a distribution Pr_a(s_i | s_j) over states. E.g., Pr_a(s2 | s1) = .9 means the probability of moving to state s2 when a is performed at s1 is .9. There is a similar distribution for action b.
How good is a particular sequence of actions?
Distributions for Action Sequences
[Tree: from s1, action a reaches s2 (.9) or s3 (.1) and action b reaches s12 (.2) or s13 (.8). From s2: a reaches s4 (.5) or s5 (.5), b reaches s6 (.6) or s7 (.4). From s3: a reaches s8 (.2) or s9 (.8), b reaches s10 (.7) or s11 (.3). From s12: a reaches s14 (.1) or s15 (.9), b reaches s16 (.2) or s17 (.8). From s13: a reaches s18 (.2) or s19 (.8), b reaches s20 (.7) or s21 (.3).]
Distributions for Action Sequences
The sequence [a,a] gives a distribution over "final states": Pr(s4) = .45, Pr(s5) = .45, Pr(s8) = .02, Pr(s9) = .08.
Similarly, [a,b] gives Pr(s6) = .54, Pr(s7) = .36, Pr(s10) = .07, Pr(s11) = .03, and there are similar distributions for the sequences [b,a] and [b,b].
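A small sketch that computes these distributions by chaining the one-step probabilities from the tree (only the branches confirmed by the numbers above are transcribed; the s12/s13 branches would be added the same way):

# One-step transitions Pr_act(next | current), read off the tree
trans = {
    "a": {"s1": {"s2": 0.9, "s3": 0.1},
          "s2": {"s4": 0.5, "s5": 0.5},
          "s3": {"s8": 0.2, "s9": 0.8}},
    "b": {"s2": {"s6": 0.6, "s7": 0.4},
          "s3": {"s10": 0.7, "s11": 0.3}},
}

def sequence_distribution(start, actions):
    # Push the distribution over states forward through each action in turn
    dist = {start: 1.0}
    for act in actions:
        new_dist = {}
        for state, p in dist.items():
            for nxt, q in trans[act][state].items():
                new_dist[nxt] = new_dist.get(nxt, 0.0) + p * q
        dist = new_dist
    return dist

print(sequence_distribution("s1", ["a", "a"]))  # approx {s4: .45, s5: .45, s8: .02, s9: .08}
print(sequence_distribution("s1", ["a", "b"]))  # approx {s6: .54, s7: .36, s10: .07, s11: .03}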
How Good is a Sequence?
We associate utilities with the "final" outcomes: how good is it to end up at s4, s5, s6, …?
Now we have:
EU(aa) = .45u(s4) + .45u(s5) + .02u(s8) + .08u(s9)
EU(ab) = .54u(s6) + .36u(s7) + .07u(s10) + .03u(s11)
etc.
Utilities for Action Sequences
[Tree as before, with the leaves labeled by their utilities u(s4), u(s5), …, u(s21).]
This looks a lot like a game tree, but with chance nodes instead of min nodes (we average instead of minimizing).
Action Sequences are not sufficient
Suppose we do a first; we could reach s2 or s3.
At s2, assume: EU(a) = .5u(s4) + .5u(s5) > EU(b) = .6u(s6) + .4u(s7).
At s3, assume: EU(a) = .2u(s8) + .8u(s9) < EU(b) = .7u(s10) + .3u(s11).
Then after doing a first, we want to do a next if we reach s2, but we want to do b second if we reach s3.
Policies
This suggests that when dealing with uncertainty we want to consider policies, not just sequences of actions (plans).
We have eight policies for this decision tree:
[a; if s2 a, if s3 a]    [b; if s12 a, if s13 a]
[a; if s2 a, if s3 b]    [b; if s12 a, if s13 b]
[a; if s2 b, if s3 a]    [b; if s12 b, if s13 a]
[a; if s2 b, if s3 b]    [b; if s12 b, if s13 b]
Contrast this with the four "plans" [a; a], [a; b], [b; a], [b; b].
Note: each plan corresponds to a policy, so we can only gain by allowing the decision maker to use policies.
Evaluating Policies
The number of plans (sequences) of length k is exponential in k: |A|^k if A is our action set.
The number of policies is even larger: if we have n = |A| actions and m = |O| outcomes per action, then we have (nm)^k policies.
Fortunately, dynamic programming can be used. E.g., suppose EU(a) > EU(b) at s2; then never consider a policy that does anything else at s2.
How to do this? Back values up the tree, much like minimax search.
Decision Trees
Squares denote choice nodes: these denote action choices by the decision maker (decision nodes).
Circles denote chance nodes: these denote uncertainty regarding action effects; "Nature" will choose the child with the specified probability.
Terminal nodes are labeled with utilities: they denote the utility of the final state (or they could denote the utility of the "trajectory" (branch) to the decision maker).
[Example: from s1, action a yields utility 5 with probability .9 or utility 2 with .1; action b yields utility 4 with .2 or utility 3 with .8.]
Evaluating Decision Trees
The procedure is exactly like game trees, except for one key difference: the "opponent" is "nature", which simply chooses outcomes at chance nodes with the specified probabilities, so we take expectations instead of minimizing.
Back values up the tree:
U(t) is defined for all terminals (part of the input)
U(n) = exp{U(c) : c a child of n} if n is a chance node (the probability-weighted average of the children's values)
U(n) = max{U(c) : c a child of n} if n is a choice node
At any choice node (state), the decision maker chooses the action that leads to the highest-utility child.
Evaluating a Decision Tree
U(n3) = .9*5 + .1*2 = 4.7
U(n4) = .8*3 + .2*4 = 3.2
U(s2) = max{U(n3), U(n4)} = 4.7; the decision at s2 is a or b, whichever achieves the max (here a).
U(n1) = .3 U(s2) + .7 U(s3)
U(s1) = max{U(n1), U(n2)}; the decision at s1 is whichever of a, b achieves the max.
[Tree: at s1, action a leads to chance node n1 (s2 with .3, s3 with .7) and action b leads to chance node n2; at s2, action a leads to chance node n3 (utility 5 with .9, 2 with .1) and action b leads to n4 (utility 3 with .8, 4 with .2).]
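A compact sketch of this backup on the subtree rooted at s2 (node names follow the figure; utilities are maxed at choice nodes and averaged at chance nodes, as described on the previous slide):

def evaluate(node):
    kind = node[0]
    if kind == "leaf":          # ("leaf", utility)
        return node[1]
    if kind == "chance":        # ("chance", [(prob, child), ...]) -> expectation
        return sum(p * evaluate(child) for p, child in node[1])
    if kind == "choice":        # ("choice", {action: child}) -> max over actions
        return max(evaluate(child) for child in node[1].values())

n3 = ("chance", [(0.9, ("leaf", 5)), (0.1, ("leaf", 2))])
n4 = ("chance", [(0.8, ("leaf", 3)), (0.2, ("leaf", 4))])
s2 = ("choice", {"a": n3, "b": n4})

print(evaluate(n3), evaluate(n4), evaluate(s2))  # approx 4.7, 3.2, 4.7 (so the decision at s2 is a)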
Decision Tree Policies
Note that we don't just compute values, but policies for the tree. A policy assigns a decision to each choice node in the tree.
Some policies can't be distinguished in terms of their expected values. E.g., if a policy chooses a at node s1, its choice at s4 doesn't matter, because s4 won't be reached.
Two policies are implementationally indistinguishable if they disagree only at unreachable decision nodes; reachability is determined by the policies themselves.
Key Assumption: Observability
Full observability: we must know the initial state and the outcome of each action. Specifically, to implement the policy, we must be able to resolve the uncertainty of any chance node that is followed by a decision node.
E.g., after doing a at s1, we must know which of the outcomes (s2 or s3) was realized, so we know what action to do next (note: s2 and s3 may prescribe different actions).
Computational Issues
The savings compared to explicit policy evaluation are substantial.
We evaluate only O((nm)^d) nodes in a tree of depth d, so the total computational cost is O((nm)^d).
Note that this is how many policies there are, but evaluating a single policy explicitly requires substantial computation: O(m^d). So the total computation for explicitly evaluating each policy would be O(n^d m^2d)!
There is tremendous value to the dynamic programming solution.
Computational Issues
Tree size grows exponentially with depth.
Possible solutions: bounded lookahead with heuristics (as in game trees); heuristic search procedures (as in A*).
Other Issues
Specification: suppose each state is an assignment to variables; then representing action probability distributions is complex (and the branching factor could be immense).
Possible solutions: represent the distributions using Bayes nets; solve problems using decision networks (or influence diagrams).
Large State Spaces (Variables)
To represent the outcomes of actions or decisions, we need to specify distributions:
Pr(s | d): the probability of outcome s given decision d
Pr(s | a, s'): the probability of state s given that action a is performed in state s'
But the state space is exponential in the number of variables, so spelling out these distributions explicitly is intractable.
Bayes nets can be used to represent actions: this is just a joint distribution over the variables, conditioned on the action/decision and the previous state.
Decision Networks
Decision networks (more commonly known as influence diagrams) provide a way of representing sequential decision problems. The basic idea:
represent the variables in the problem as you would in a BN
add decision variables (variables that you "control")
add utility variables (how good different states are)
Sample Decision Network
[Network: chance nodes Disease, TstResult, Chills, Fever; decision nodes BloodTst and Drug; value node U.]
Decision Networks: Chance Nodes
Chance nodes: random variables, denoted by circles; as in a BN, they depend probabilistically on their parents.
Example CPTs:
Disease prior: Pr(flu) = .3, Pr(mal) = .1, Pr(none) = .6
Fever given Disease: Pr(f | flu) = .5, Pr(f | mal) = .3, Pr(f | none) = .05
TstResult given Disease and BloodTst:
Pr(pos | flu, bt) = .2, Pr(neg | flu, bt) = .8, Pr(null | flu, bt) = 0
Pr(pos | mal, bt) = .9, Pr(neg | mal, bt) = .1, Pr(null | mal, bt) = 0
Pr(pos | no, bt) = .1, Pr(neg | no, bt) = .9, Pr(null | no, bt) = 0
Pr(pos | D, ¬bt) = 0, Pr(neg | D, ¬bt) = 0, Pr(null | D, ¬bt) = 1 (for any D)
Decision Networks: Decision Nodes
Decision nodes: variables the decision maker sets, denoted by squares. Their parents reflect the information available at the time the decision is to be made.
In the example decision node, the actual values of Ch and Fev will be observed before the decision to take the test must be made; the agent can make a different decision for each instantiation of the parents.
[Figure: Chills and Fever are parents of the decision node BloodTst; BT ∊ {bt, ¬bt}.]
Decision Networks: Value Node
Value node: specifies the utility of a state, denoted by a diamond. The utility depends only on the state of the parents of the value node. Generally there is only one value node in a decision network.
In the example, utility depends only on Disease and Drug:
U(fludrug, flu) = 20, U(fludrug, mal) = -300, U(fludrug, none) = -5
U(maldrug, flu) = -30, U(maldrug, mal) = 10, U(maldrug, none) = -20
U(no drug, flu) = -10, U(no drug, mal) = -285, U(no drug, none) = 30
Decision Networks: Assumptions
Decision nodes are totally ordered: decision variables D1, D2, …, Dn, with decisions made in sequence. E.g., BloodTst (yes/no) is decided before Drug (fd, md, no).
Decision Networks: Assumptions
No-forgetting property: any information available when decision Di is made is available when decision Dj is made (for i < j). Thus all parents of Di are also parents of Dj.
The network does not show these "implicit parents", but the links are present and must be considered when specifying the network parameters and when computing.
[Figure: dashed arcs from Chills, Fever, and BloodTst into Drug ensure the no-forgetting property.]
Policies
Let Par(Di) be the parents of decision node Di, and Dom(Par(Di)) the set of assignments to those parents.
A policy δ is a set of mappings δi, one for each decision node Di, where δi : Dom(Par(Di)) → Dom(Di); that is, δi associates a decision with each assignment to the parents of Di.
For example, a policy for BT might be:
δ_BT(c, f) = bt
δ_BT(c, ¬f) = ¬bt
δ_BT(¬c, f) = bt
δ_BT(¬c, ¬f) = ¬bt
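In code, each such mapping δi is just a lookup table from parent assignments to decision values; a tiny sketch of the BT policy above (value names are spelled out instead of using ¬):

# delta_BT : Dom(Chills) x Dom(Fever) -> Dom(BloodTst)
delta_BT = {
    ("chills", "fever"):       "bt",
    ("chills", "no_fever"):    "no_bt",
    ("no_chills", "fever"):    "bt",
    ("no_chills", "no_fever"): "no_bt",
}

def decide_blood_test(chills, fever):
    # The policy dictates the decision for this parent instantiation
    return delta_BT[(chills, fever)]

print(decide_blood_test("chills", "no_fever"))  # no_bt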
Value of a Policy
The value of a policy δ is the expected utility obtained when the decision nodes are executed according to δ.
Given an assignment x to the set X of all chance variables, let δ(x) denote the assignment to the decision variables dictated by δ: e.g., the assignment to D1 is determined by its parents' assignment in x; the assignment to D2 is determined by its parents' assignment in x along with whatever was assigned to D1; etc.
Value of δ: EU(δ) = Σ_X P(X, δ(X)) U(X, δ(X))
Optimal Policies
An optimal policy is a policy δ* such that EU(δ*) ≥ EU(δ) for all policies δ.
We can use the dynamic programming principle to avoid enumerating all policies.
We can also exploit the structure of the decision network, using variable elimination to aid in the computation.
Computing the Best Policy
We can work backwards as follows.
First compute the optimal policy for Drug (the last decision): for each assignment to its parents (C, F, BT, TR) and for each decision value (D = md, fd, none), compute the expected value of choosing that value of D.
Set the policy choice for each value of the parents to be the value of D that has the maximum value, e.g., δ_D(c, f, bt, pos) = md.
Computing the Best Policy
Next compute the policy for BT, given the policy δ_D(C, F, BT, TR) just determined for Drug.
Since δ_D(C, F, BT, TR) is fixed, we can treat Drug as a normal random variable with deterministic probabilities: for any instantiation of its parents, the value of Drug is fixed by the policy δ_D.
This means we can solve for the optimal policy for BT just as before; the only uninstantiated variables are random variables (once we fix BT's parents).
Computing the Best Policy
How do we compute these expected values? Suppose we have the assignment <c, f, bt, pos> to the parents of Drug, and we want to compute the EU of deciding to set Drug = md. We can run variable elimination!
Treat C, F, BT, TR, Dr as evidence; this reduces the factors (e.g., U restricted to bt, md depends only on Dis). Eliminate the remaining variables (here only Disease is left). We are left with the factor:
U() = Σ_Dis P(Dis | c, f, bt, pos, md) U(Dis)
Computing the Best Policy
We now know the EU of doing Dr = md when c, f, bt, pos holds. We can do the same for fd and no-drug to decide which choice is best.
Computing Expected Utilities
The preceding illustrates a general phenomenon: computing expected utilities with BNs is quite easy. Utility nodes are just factors that can be dealt with using variable elimination:
EU = Σ_{A,B,C} P(A,B,C) U(B,C)
   = Σ_{A,B,C} P(C|B) P(B|A) P(A) U(B,C)
Just eliminate the variables in the usual way.
[Network: A → B → C, with value node U depending on B and C.]
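A brute-force sketch of this computation for the small chain network shown; the CPT and utility numbers are invented for illustration (actual variable elimination would sum variables out one at a time, but on a network this small direct enumeration gives the same answer):

from itertools import product

# Hypothetical CPTs for the chain A -> B -> C (boolean variables)
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
P_C_given_B = {True: {True: 0.5, False: 0.5}, False: {True: 0.1, False: 0.9}}
U = {(True, True): 10, (True, False): 0, (False, True): 4, (False, False): 1}  # U(B, C)

# EU = sum over A, B, C of P(A) P(B|A) P(C|B) U(B, C)
eu = sum(P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c] * U[(b, c)]
         for a, b, c in product([True, False], repeat=3))
print(eu)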
Optimizing Policies: Key Points
If a decision node D has no decisions that followit, we can find its policy by instantiating each ofits parents and computing the expected utility of
each decision for each parent instantiation
Optimizing Policies: Key Points
No-forgetting means that all other decisions are instantiated (they must be parents), so it's easy to compute the expected utility using VE.
The number of computations is quite large, though: we run an expected utility calculation (VE) for each parent instantiation together with each possible decision D might allow.
Policy: choose the max decision for each parent instantiation.
Optimizing Policies: Key Points
When a decision node D is optimized, it can be treated as a random variable: for each instantiation of its parents we now know what value the decision should take. Just treat the policy as a new CPT: for a given parent instantiation x, D gets δ(x) with probability 1 (all other decisions get probability zero).
If we optimize from the last decision to the first, at each point we can optimize a specific decision by a (bunch of) simple VE calculations; its successor decisions (already optimized) are just normal nodes in the BN (with CPTs).
Decision Network Notes
Decision networks are commonly used by decision analysts to help structure decision problems, and much work has been put into computationally effective techniques to solve them.
The complexity is much greater than BN inference: we need to solve a number of BN inference problems, one for each setting of the decision node's parents and the decision node's value.
Real Estate Investment
[Figure not captured in the transcript: a decision network for a real estate investment problem.]
DBN Decision Nets for Planning
[Figure: a DBN-style decision network unrolled over time, with state variables T, L, C, R, M at steps t-2, t-1, t, t+1, an action node Act at each step, and a utility node U.]
A Detailed Decision Net Example
Setting: you want to buy a used car, but there’sa good chance it is a “lemon” (i.e., prone tobreakdown). Before deciding to buy it, you can
take it to a mechanic for inspection. They willgive you a report on the car, labeling it either“good” or “bad”. A good report is positivelycorrelated with the car being sound, while abad report is positively correlated with the carbeing a lemon.
A Detailed Decision Net Example
However, the report costs $50. So you could risk it, and buy the car without the report.
Owning a sound car is better than having no car, which is better than owning a lemon.
Car Buyer's Network
Lemon prior: P(l) = 0.5, P(¬l) = 0.5
Report (values good, bad, none) given Lemon and Inspect:
  l, i:    P(g) = 0.2, P(b) = 0.8, P(n) = 0
  ¬l, i:   P(g) = 0.9, P(b) = 0.1, P(n) = 0
  l, ¬i:   P(g) = 0,   P(b) = 0,   P(n) = 1
  ¬l, ¬i:  P(g) = 0,   P(b) = 0,   P(n) = 1
Utility (an additional -50 applies if you inspect):
  U(b, l) = -600, U(b, ¬l) = 1000, U(¬b, l) = -300, U(¬b, ¬l) = -300
[Network: chance nodes Lemon and Report; decision nodes Inspect and Buy; value node U.]
Evaluate Last Decision: Buy (1)
EU(B | I, R) = Σ_L P(L | I, R, B) U(L, B): the probability of the remaining variable in the utility function, times the utility function. Note P(L | I, R, B) = P(L | I, R), since B is a decision variable that does not influence L.
Case I = i, R = g:
P(L | i, g): use variable elimination. The query variable L is the only remaining variable, so we only need to normalize (no summations).
P(L, i, g) = P(L) P(g | L, i), hence P(L | i, g) = normalize([P(l)P(g | l, i), P(¬l)P(g | ¬l, i)]) = normalize([0.5*0.2, 0.5*0.9]) = [.18, .82]
EU(buy) = P(l | i, g) U(buy, l) + P(¬l | i, g) U(buy, ¬l) - 50 = .18*-600 + .82*1000 - 50 = 662
EU(¬buy) = P(l | i, g) U(¬buy, l) + P(¬l | i, g) U(¬buy, ¬l) - 50 = .18*-300 + .82*-300 - 50 = -350
So the optimal δ_Buy(i, g) = buy.
Evaluate Last Decision: Buy (2)
Case I = i, R = b:
P(L, i, b) = P(L) P(b | L, i)
P(L | i, b) = normalize([P(l)P(b | l, i), P(¬l)P(b | ¬l, i)]) = normalize([0.5*0.8, 0.5*0.1]) = [.89, .11]
EU(buy) = P(l | i, b) U(l, buy) + P(¬l | i, b) U(¬l, buy) - 50 = .89*-600 + .11*1000 - 50 = -474
EU(¬buy) = P(l | i, b) U(l, ¬buy) + P(¬l | i, b) U(¬l, ¬buy) - 50 = .89*-300 + .11*-300 - 50 = -350
So the optimal δ_Buy(i, b) = ¬buy.
Evaluate Last Decision: Buy (3)
Case I = ¬i, R = n:
P(L, ¬i, n) = P(L) P(n | L, ¬i)
P(L | ¬i, n) = normalize([P(l)P(n | l, ¬i), P(¬l)P(n | ¬l, ¬i)]) = normalize([0.5*1, 0.5*1]) = [.5, .5]
EU(buy) = P(l | ¬i, n) U(l, buy) + P(¬l | ¬i, n) U(¬l, buy) = .5*-600 + .5*1000 = 200 (no inspection cost)
EU(¬buy) = P(l | ¬i, n) U(l, ¬buy) + P(¬l | ¬i, n) U(¬l, ¬buy) = .5*-300 + .5*-300 = -300
So the optimal δ_Buy(¬i, n) = buy.
Overall, the optimal policy for Buy is: δ_Buy(i, g) = buy; δ_Buy(i, b) = ¬buy; δ_Buy(¬i, n) = buy.
Note: we don't bother computing the policy for (i, n), (¬i, g), or (¬i, b), since these parent instantiations occur with probability 0.
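A sketch that recomputes this Buy policy straight from the network's numbers. Note the slides round P(L | i, g) to [.18, .82] and P(L | i, b) to [.89, .11] before multiplying, so the EUs there (662 and -474) differ slightly from the full-precision values below (about 659 and -472); the policy chosen is the same.

# Car buyer's network, transcribed from the slides
P_lemon = {"l": 0.5, "nl": 0.5}
P_report = {                       # P(Report | Lemon, Inspect), Report in {g, b, n}
    ("l", "i"):   {"g": 0.2, "b": 0.8, "n": 0.0},
    ("nl", "i"):  {"g": 0.9, "b": 0.1, "n": 0.0},
    ("l", "ni"):  {"g": 0.0, "b": 0.0, "n": 1.0},
    ("nl", "ni"): {"g": 0.0, "b": 0.0, "n": 1.0},
}
U = {("buy", "l"): -600, ("buy", "nl"): 1000,
     ("nobuy", "l"): -300, ("nobuy", "nl"): -300}
INSPECT_COST = 50

def eu_buy(inspect, report, buy):
    # EU(Buy | Inspect, Report) = sum_L P(L | I, R) U(Buy, L), minus the inspection cost
    joint = {L: P_lemon[L] * P_report[(L, inspect)][report] for L in ("l", "nl")}
    z = sum(joint.values())
    cost = INSPECT_COST if inspect == "i" else 0
    return sum(joint[L] / z * U[(buy, L)] for L in ("l", "nl")) - cost

# For each reachable (Inspect, Report) instantiation, pick the better Buy decision
for inspect, report in [("i", "g"), ("i", "b"), ("ni", "n")]:
    eus = {buy: eu_buy(inspect, report, buy) for buy in ("buy", "nobuy")}
    print(inspect, report, eus, "->", max(eus, key=eus.get))
# Output: (i, g) -> buy, (i, b) -> nobuy, (ni, n) -> buy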
Evaluate First Decision: Inspect
EU(I) = Σ_{L,R} P(L, R | I) U(L, δ_Buy(I, R)), where P(R, L | I) = P(R | L, I) P(L | I).
Using the table below:
EU(i) = .1*-600 + .4*-300 + .45*1000 + .05*-300 - 50 = 237.5 - 50 = 187.5
EU(¬i) = P(l | ¬i, n) U(l, buy) + P(¬l | ¬i, n) U(¬l, buy) = .5*-600 + .5*1000 = 200
So the optimal decision is δ_Inspect = ¬i (don't inspect).

R, L     P(R, L | i)       δ_Buy(i, R)    U(L, δ_Buy) with cost
g, l     0.2*.5 = .1       buy            -600 - 50 = -650
b, l     0.8*.5 = .4       ¬buy           -300 - 50 = -350
g, ¬l    0.9*.5 = .45      buy            1000 - 50 = 950
b, ¬l    0.1*.5 = .05      ¬buy           -300 - 50 = -350
Value of Information
So the optimal policy is: don't inspect, and buy the car; EU = 200.
Notice that the EU of inspecting the car, then buying it iff you get a good report, is 237.5 less the cost of the inspection (50). So the inspection is not worth the improvement in EU.
But suppose the inspection cost $25: then it would be worth it (EU = 237.5 - 25 = 212.5 > EU(¬i)).
The expected value of information associated with the inspection is 37.5: it improves expected utility by this amount, ignoring the cost of the inspection. How? It gives the opportunity to change the decision (¬buy if the report is bad).
You should be willing to pay up to $37.5 for the report.