Computer Science CPSC 502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)



TRANSCRIPT

Page 1: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Computer Science CPSC 502

Lecture 12

Decisions Under Uncertainty

(Ch. 9, up to 9.3)

Page 2: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Representational Dimensions

[Course map figure: Environment (Deterministic vs. Stochastic) crossed with Problem Type (Static: Query; Sequential: Planning). Representations include Vars + Constraints, Logics, STRIPS, Belief Nets, Decision Nets, and Markov Processes; reasoning techniques include Arc Consistency, Search, Variable Elimination (exact and approximate), Temporal Inference, and Value Iteration.]

This concludes the module on answering queries in stochastic environments

Page 3: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Representational Dimensions

[Same course-map figure as on the previous slide.]

Now we will look at acting in stochastic environments

Page 4: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Lecture Overview

• Single-Stage Decision Problems
  – Utilities and optimal decisions
  – Single-stage decision networks
  – Variable elimination (VE) for computing the optimal decision

• Sequential Decision Problems
  – General decision networks
  – Policies
  – Finding optimal policies with VE

4

Page 5: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Decisions Under Uncertainty: Intro

• Earlier in the course, we focused on decision making in deterministic domains
  – Search/CSPs: single-stage decisions
  – Planning: sequential decisions

• Now we face stochastic domains
  – So far we've considered how to represent and update beliefs
  – What if an agent has to make decisions (act) under uncertainty?

• Making decisions under uncertainty is important
  – We represent the world probabilistically so we can use our beliefs as the basis for making decisions

5

Page 6: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Decisions Under Uncertainty: Intro

• An agent's decision will depend on
  – What actions are available
  – What beliefs the agent has
  – Which goals the agent has

• Differences between the deterministic and stochastic settings
  – Obvious difference in representation: we need to represent our uncertain beliefs
  – Actions are pretty straightforward: represented as decision variables
  – Goals are more interesting: we move from all-or-nothing goals to a richer notion:
    • rating how happy the agent is in different situations
  – Putting these together, we extend Bayesian networks into a new representation called Decision Networks

6

Page 7: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Delivery Robot Example

• The robot needs to reach a certain room
• The robot can go
  • the short way: faster, but with more obstacles, thus more prone to accidents that can damage the robot
  • the long way: slower, but less prone to accidents
• Which way should it go? Is it more important for the robot to arrive fast, or to minimize the risk of damage?
• The robot can choose to wear pads to protect itself in case of an accident, or not to wear them. Pads slow it down
• Again, there is a tradeoff between reducing the risk of damage and arriving fast

• Possible outcomes:
  • No pads, no accident
  • Pads, no accident
  • Pads, accident
  • No pads, accident

Page 8: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Next

• We'll see how to represent and reason about situations of this nature using decision trees, as well as
  • Probability, to measure the uncertainty in action outcomes
  • Utility, to measure the agent's preferences over the various outcomes
  • These are combined into a measure of expected utility that can be used to identify the action with the best expected outcome
• This is the best that an intelligent agent can do when it needs to act in a stochastic environment

Page 9: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Decision Tree for the Delivery Robot Example

• Decision variable 1: the robot can choose to wear pads
  – Yes: protection against accidents, but extra weight
  – No: fast, but no protection
• Decision variable 2: the robot can choose the way
  – Short way: quick, but higher chance of accident
  – Long way: safe, but slow
• Random variable: is there an accident?

[Decision tree figure: the agent decides on WearPads and WhichWay; chance decides whether an accident occurs.]

9

Page 10: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Delivery Robot Example

[Same content as the previous slide: the complete decision tree for the WearPads and WhichWay decisions, with chance deciding whether an accident occurs.]

10

Page 11: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Possible worlds and decision variables

• A possible world specifies a value for each random variable and each decision variable
• For each assignment of values to all decision variables
  – the probabilities of the worlds satisfying that assignment sum to 1

[Decision tree figure: for the branch shown (the short way), the accident probabilities are 0.2 and 0.8, which sum to 1.]

11

Page 12: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Possible worlds and decision variables

• A possible world specifies a value for each random variable and each decision variable
• For each assignment of values to all decision variables
  – the probabilities of the worlds satisfying that assignment sum to 1

[Decision tree figure: the branches shown have accident probabilities 0.01/0.99 (long way) and 0.2/0.8 (short way), each pair summing to 1.]

Page 13: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Possible worlds and decision variables

• A possible world specifies a value for each random variable and each decision variable
• For each assignment of values to all decision variables
  – the probabilities of the worlds satisfying that assignment sum to 1

[Decision tree figure, further expanded: every WearPads/WhichWay branch shown has accident probabilities summing to 1 (0.01/0.99 for the long way, 0.2/0.8 for the short way).]

Page 14: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Possible worlds and decision variables

• A possible world specifies a value for each random variable and each decision variable
• For each assignment of values to all decision variables
  – the probabilities of the worlds satisfying that assignment sum to 1

[Complete decision tree: for each of the four WearPads/WhichWay combinations, the accident probabilities are 0.01/0.99 (long way) or 0.2/0.8 (short way), each pair summing to 1.]

Page 15: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Utility

• Utility: a measure of desirability of possible worlds to an agent
  – Let U be a real-valued function such that U(w) represents an agent's degree of preference for world w
  – Here expressed by a number in [0, 100]

• Simple goals can still be specified
  – Worlds that satisfy the goal have utility 100
  – Other worlds have utility 0

• Utilities can be more complicated
  – For example, in the robot delivery domain, they could involve:
    • Reached the target room?
    • Time taken
    • Amount of damage
    • Energy left

15

Page 16: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Utility for the Robot Example

• Which would be a reasonable utility function for our robot?
• Which are the best and worst scenarios?

[Decision tree figure: each branch is labelled with its accident probability (0.01/0.99 for the long way, 0.2/0.8 for the short way); the utility of each outcome is still to be filled in.]

Page 17: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Utility / Preferences

• Utility: a measure of desirability of possible worlds to an agent
  – Let U be a real-valued function such that U(w) represents an agent's degree of preference for world w.

Would this be a reasonable utility function for our robot?

Which way   Accident   Wear Pads   Utility   World
short       true       true        35        w0, moderate damage
short       false      true        95        w1, reaches room, quick, extra weight
long        true       true        30        w2, moderate damage, low energy
long        false      true        75        w3, reaches room, slow, extra weight
short       true       false       3         w4, severe damage
short       false      false       100       w5, reaches room, quick
long        true       false       0         w6, severe damage, low energy
long        false      false       80        w7, reaches room, slow
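Since later slides compute with these numbers, here is the same utility table as a small Python dictionary (a minimal sketch; the variable name and key layout are my own choices, not from the slides):

```python
# Utility table for the delivery-robot example (values from the slide).
# Keys: (which_way, accident, wear_pads) -> utility in [0, 100].
UTILITY = {
    ("short", True,  True):  35,   # w0: moderate damage
    ("short", False, True):  95,   # w1: reaches room, quick, extra weight
    ("long",  True,  True):  30,   # w2: moderate damage, low energy
    ("long",  False, True):  75,   # w3: reaches room, slow, extra weight
    ("short", True,  False):  3,   # w4: severe damage
    ("short", False, False): 100,  # w5: reaches room, quick
    ("long",  True,  False):  0,   # w6: severe damage, low energy
    ("long",  False, False): 80,   # w7: reaches room, slow
}

print(UTILITY[("short", False, True)])  # 95
```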

Page 18: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Utility: Simple Goals

– Can simple (boolean) goals still be specified?

Which way   Accident   Wear Pads   Utility
long        true       true
long        true       false
long        false      true
long        false      false
short       true       true
short       true       false
short       false      true
short       false      false

Page 19: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Utility for the Robot Example

• Now, how do we combine utility and probability to decide what to do?

[Decision tree figure: each outcome is labelled with its probability and its utility: for (pads, short) the outcomes have probabilities 0.2/0.8 and utilities 35/95; for (pads, long) 0.01/0.99 and 30/75; for (no pads, short) 0.2/0.8 and 3/100; for (no pads, long) 0.01/0.99 and 0/80.]

Page 20: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Optimal decisions: combining Utility and Probability

• Each set of decisions defines a probability distribution over possible outcomes
• Each outcome has a utility
• For each set of decisions, we need to know their expected utility
  – the value for the agent of achieving a certain probability distribution over outcomes (possible worlds)
• The expected utility of a set of decisions is obtained by weighting the utility of the relevant possible worlds by their probability
• We want to find the decision with maximum expected utility

[Figure: for one scenario, the outcomes have utilities 35 and 95 with probabilities 0.2 and 0.8. What is the value of this scenario?]

Page 21: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Expected utility of a decision

• The expected utility of decision D = di is

  E(U | D = di) = Σ_{w ⊨ (D = di)} P(w) × U(w)

• What is the expected utility of WearPads = yes, Way = short?

[Decision tree figure with probabilities (0.2/0.8 for the short way, 0.01/0.99 for the long way) and utilities (35, 95, 30, 75, 3, 100, 0, 80); the E[U|D] column is still to be computed.]

Page 22: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Expected utility of a decision

• The expected utility of decision D = di is

  E(U | D = di) = Σ_{w ⊨ (D = di)} P(w) × U(w)

• What is the expected utility of WearPads = yes, Way = short?
  0.2 × 35 + 0.8 × 95 = 83

[Decision tree figure with expected utilities filled in: E[U|D] = 83 for (pads, short), 74.55 for (pads, long), 80.6 for (no pads, short), 79.2 for (no pads, long).]
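A minimal Python sketch of this expected-utility calculation for the robot example, assuming the probabilities and utilities given on the slides (the dictionary layout and function name are my own):

```python
# E(U | D = d) = sum over worlds w consistent with d of P(w) * U(w).
UTILITY = {  # (which_way, accident, wear_pads) -> U(w), from the earlier table
    ("short", True, True): 35,  ("short", False, True): 95,
    ("long", True, True): 30,   ("long", False, True): 75,
    ("short", True, False): 3,  ("short", False, False): 100,
    ("long", True, False): 0,   ("long", False, False): 80,
}
P_ACCIDENT = {"short": 0.2, "long": 0.01}  # P(Accident = true | WhichWay)

def expected_utility(which_way, wear_pads):
    p = P_ACCIDENT[which_way]
    return (p * UTILITY[(which_way, True, wear_pads)]
            + (1 - p) * UTILITY[(which_way, False, wear_pads)])

for way in ("short", "long"):
    for pads in (True, False):
        print(way, pads, expected_utility(way, pads))
# (short, pads)    -> 0.2*35 + 0.8*95   = 83.0
# (long,  pads)    -> 0.01*30 + 0.99*75 = 74.55
# (short, no pads) -> 0.2*3 + 0.8*100   = 80.6
# (long,  no pads) -> 0.01*0 + 0.99*80  = 79.2
```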

Page 23: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Lecture Overview

• Single-Stage Decision Problems
  – Utilities and optimal decisions
  – Single-stage decision networks
  – Variable elimination (VE) for computing the optimal decision

• Sequential Decision Problems
  – General decision networks
  – Policies
  – Finding optimal policies with VE

23

Page 24: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Single Action vs. Sequence of Actions

• Single Action (aka One-Off Decisions)
  – One or more primitive decisions that can be treated as a single macro decision to be made before acting
  – E.g., "WearPads" and "WhichWay" can be combined into the macro decision (WearPads, WhichWay) with domain {yes, no} × {long, short}

• Sequence of Actions (Sequential Decisions)
  – Repeat:
    • make observations
    • decide on an action
    • carry out the action
  – The agent has to take actions not knowing what the future brings
    • This is fundamentally different from everything we've seen so far
    • Planning was sequential, but we could still think first and then act

24

Page 25: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Optimal single-stage decision

• Given a single (macro) decision variable D
  – the agent can choose D = di for any value di ∈ dom(D)
  – an optimal single decision is a value di ∈ dom(D) that maximizes the expected utility E(U | D = di)

Page 26: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

What is the optimal decision in the example?

[Decision tree figure: conditional probabilities 0.01/0.99 (long way) and 0.2/0.8 (short way), outcome utilities 30, 75, 0, 80, 35, 95, 3, 100, and expected utilities E[U|D] = 83 (pads, short), 74.55 (pads, long), 80.6 (no pads, short), 79.2 (no pads, long).]

Which is the optimal decision here?

Page 27: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Optimal decision in robot delivery example

[Same decision tree as on the previous slide, with expected utilities E[U|D] = 83 (pads, short), 74.55 (pads, long), 80.6 (no pads, short), 79.2 (no pads, long).]

Best decision: (wear pads, short way)

Page 28: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Single-Stage decision networks

• Extend belief networks with:
  – Decision nodes, whose values the agent chooses
    • Parents: only other decision nodes allowed
    • Domain is the set of possible actions
    • Drawn as a rectangle
  – Exactly one utility node
    • Parents: all random and decision variables on which the utility depends
    • Does not have a domain
    • Drawn as a diamond

• The network explicitly shows dependencies
  – E.g., which variables affect the probability of an accident?

28

Page 29: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Types of nodes in decision networks

• A random variable is drawn as an ellipse
  – Arcs into the node represent probabilistic dependence
  – As in Bayesian networks: a random variable is conditionally independent of its non-descendants given its parents

• A decision variable is drawn as a rectangle
  – Arcs into the node represent the information available when the decision is made

• A utility node is drawn as a diamond
  – Arcs into the node represent the variables that the utility depends on
  – It specifies a utility for each instantiation of its parents

Page 30: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Example Decision Network

Decision nodes do not have an associated table. The utility node does not have a domain.

Utility table:

Which way   Accident   Wear Pads   Utility
long        true       true        30
long        true       false       0
long        false      true        75
long        false      false       80
short       true       true        35
short       true       false       3
short       false      true        95
short       false      false       100

Conditional probability table:

Which Way W   Accident A   P(A|W)
long          true         0.01
long          false        0.99
short         true         0.2
short         false        0.8

Page 31: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Computing the optimal decision: we can use VE

• Denote
  – the random variables as X1, …, Xn
  – the decision variables as D
  – the parents of node N as pa(N)

• To find the optimal decision we can use VE:
  1. Create a factor for each conditional probability and for the utility
  2. Sum out all random variables, one at a time
     • This creates a factor on D that gives the expected utility for each di
  3. Choose the di with the maximum value in the factor

  E(U) = Σ_{X1, …, Xn} P(X1, …, Xn | D) × U(pa(U))

Page 32: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Computing the optimal decision: we can use VE

• Denote
  – the random variables as X1, …, Xn
  – the decision variables as D
  – the parents of node N as pa(N)

• To find the optimal decision we can use VE:
  1. Create a factor for each conditional probability and for the utility
  2. Sum out all random variables, one at a time
     • This creates a factor on D that gives the expected utility for each di
  3. Choose the di with the maximum value in the factor

  E(U) = Σ_{X1, …, Xn} P(X1, …, Xn | D) × U(pa(U))
       = Σ_{X1, …, Xn} [ Π_i P(Xi | pa(Xi)) ] × U(pa(U))

  (here the parent sets pa(Xi) and pa(U) can include the decision variables)

Page 33: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

VE Example: Step 1, create initial factors

Abbreviations: W = Which Way, P = Wear Pads, A = Accident

f1(A, W) = P(A | W):

Which Way W   Accident A   f1(A, W)
long          true         0.01
long          false        0.99
short         true         0.2
short         false        0.8

f2(A, W, P) = U(A, W, P):

Which way W   Accident A   Pads P   f2(A, W, P)
long          true         true     30
long          true         false    0
long          false        true     75
long          false        false    80
short         true         true     35
short         true         false    3
short         false        true     95
short         false        false    100

  E(U) = Σ_A P(A | W) × U(A, W, P) = Σ_A f1(A, W) × f2(A, W, P)
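As a sketch, the two Step 1 factors can be written as Python dictionaries (the representation is my own; the numbers are the ones in the tables above):

```python
# Step 1 factors for the robot example, as dicts keyed by variable assignments.
f1 = {  # f1(A, W) = P(A | W)
    (True, "long"): 0.01, (False, "long"): 0.99,
    (True, "short"): 0.2, (False, "short"): 0.8,
}
f2 = {  # f2(A, W, P) = U(A, W, P)
    (True, "long", True): 30,   (True, "long", False): 0,
    (False, "long", True): 75,  (False, "long", False): 80,
    (True, "short", True): 35,  (True, "short", False): 3,
    (False, "short", True): 95, (False, "short", False): 100,
}

# Sanity check: for each value of W, P(A | W) sums to 1.
for w in ("long", "short"):
    assert abs(f1[(True, w)] + f1[(False, w)] - 1.0) < 1e-9
```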

Page 34: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

VE example: step 2, sum out A

Step 2a: compute the product f(A, W, P) = f1(A, W) × f2(A, W, P), i.e.
f(A=a, W=w, P=p) = f1(A=a, W=w) × f2(A=a, W=w, P=p)

(f1(A, W) and f2(A, W, P) are the factors created in Step 1.)

Which way W   Accident A   Pads P   f(A, W, P)
long          true         true     0.01 × 30
long          true         false    ???
long          false        true     ???
long          false        false    ???
short         true         true     ???
short         true         false    ???
short         false        true     ???
short         false        false    ???

Page 35: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

VE example: step 2, sum out A

Step 2a: compute the product f(A, W, P) = f1(A, W) × f2(A, W, P), i.e.
f(A=a, W=w, P=p) = f1(A=a, W=w) × f2(A=a, W=w, P=p)

Which way W   Accident A   Pads P   f(A, W, P)
long          true         true     0.01 × 30
long          true         false    0.01 × 0
long          false        true     0.99 × 75
long          false        false    0.99 × 80
short         true         true     0.2 × 35
short         true         false    0.2 × 3
short         false        true     0.8 × 95
short         false        false    0.8 × 100

(f1 and f2 are the factors from Step 1.)

Page 36: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

VE example: step 2, sum out A

Step 2b: sum A out of the product f(A, W, P):

  f3(W, P) = Σ_A f(A, W, P)

Which way W   Pads P   f3(W, P)
long          true
long          false
short         true
short         false

Which way W   Accident A   Pads P   f(A, W, P)
long          true         true     0.01 × 30
long          true         false    0.01 × 0
long          false        true     0.99 × 75
long          false        false    0.99 × 80
short         true         true     0.2 × 35
short         true         false    0.2 × 3
short         false        true     0.8 × 95
short         false        false    0.8 × 100

The final factor encodes the expected utility of each decision

Page 37: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

VE example: step 2, sum out A

Step 2b: sum A out of the product f(A, W, P):

  f3(W, P) = Σ_A f(A, W, P)

Which way W   Pads P   f3(W, P)
long          true     0.01 × 30 + 0.99 × 75 = 74.55
long          false
short         true     0.2 × 35 + 0.8 × 95 = 83
short         false

(f(A, W, P) is the product table from Step 2a.)

The final factor encodes the expected utility of each decision

Page 38: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Expected utility of a decision

• The expected utility of decision D = di is

  E(U | D = di) = Σ_{w ⊨ (D = di)} P(w) × U(w)

• What is the expected utility of WearPads = yes, Way = short?
  0.2 × 35 + 0.8 × 95 = 83

[Same annotated decision tree as before, with E[U|D] = 83, 74.55, 80.6, 79.2.]

Page 39: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

VE example: step 2, sum out A

Step 2b: sum A out of the product f(A, W, P):

  f3(W, P) = Σ_A f(A, W, P)

Which way W   Pads P   f3(W, P)
long          true     0.01 × 30 + 0.99 × 75 = 74.55
long          false    0.01 × 0 + 0.99 × 80 = 79.2
short         true     0.2 × 35 + 0.8 × 95 = 83
short         false    0.2 × 3 + 0.8 × 100 = 80.6

(f(A, W, P) is the product table from Step 2a.)

The final factor encodes the expected utility of each decision

Page 40: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

VE example: step 3, choose the decision with max E(U)

Which way W   Pads P   f3(W, P)
long          true     0.01 × 30 + 0.99 × 75 = 74.55
long          false    0.01 × 0 + 0.99 × 80 = 79.2
short         true     0.2 × 35 + 0.8 × 95 = 83
short         false    0.2 × 3 + 0.8 × 100 = 80.6

The final factor encodes the expected utility of each decision.
• Thus, taking the short way but wearing pads is the best choice, with an expected utility of 83.

Page 41: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Variable Elimination for Single-Stage Decision Networks: Summary

1. Create a factor for each conditional probability and for the utility
2. Sum out all random variables, one at a time
   – This creates a factor on D that gives the expected utility for each di
3. Choose the di with the maximum value in the factor

(A compact sketch of these three steps on the robot example follows below.)

41
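Here is a minimal, self-contained Python sketch of the three steps on the robot example, assuming a simple dictionary-based factor representation (names and layout are my own, not from the slides or any particular library):

```python
from itertools import product

# Step 1: factors from the slides, f1(A, W) = P(A | W) and f2(A, W, P) = U(A, W, P).
f1 = {
    (True, "long"): 0.01, (False, "long"): 0.99,
    (True, "short"): 0.2, (False, "short"): 0.8,
}
f2 = {
    (True, "long", True): 30,   (True, "long", False): 0,
    (False, "long", True): 75,  (False, "long", False): 80,
    (True, "short", True): 35,  (True, "short", False): 3,
    (False, "short", True): 95, (False, "short", False): 100,
}

# Step 2: multiply the factors and sum out the random variable A,
# leaving a factor f3 on the decision variables (W, P).
f3 = {
    (w, p): sum(f1[(a, w)] * f2[(a, w, p)] for a in (True, False))
    for w, p in product(("long", "short"), (True, False))
}

# Step 3: choose the decision with the maximum expected utility.
best = max(f3, key=f3.get)
print(f3)              # {('long', True): 74.55, ('long', False): 79.2, ('short', True): 83.0, ('short', False): 80.6}
print(best, f3[best])  # ('short', True) 83.0
```

The resulting factor matches f3(W, P) from the worked example: the maximizing decision is (short way, wear pads), with expected utility 83.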

Page 42: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Lecture Overview

• Single-Stage Decision Problems
  – Utilities and optimal decisions
  – Single-stage decision networks
  – Variable elimination (VE) for computing the optimal decision

• Sequential Decision Problems
  – General decision networks
  – Policies
  – Finding optimal policies with VE

42

Page 43: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Sequential Decision Problems

• An intelligent agent doesn't make a multi-step decision and carry it out blindly
  – It would take into account the new observations it makes along the way

• A more typical scenario:
  – The agent observes, acts, observes, acts, …
  – Subsequent actions can depend on what is observed
  – What is observed often depends on previous actions
  – Often the sole reason for carrying out an action is to provide information for future actions
    • For example: diagnostic tests, spying

• General decision networks:
  – Just like single-stage decision networks, with one exception: the parents of decision nodes can include random variables

43

Page 44: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Sequential decisions: Simplest possible

• Only one decision! (but different from one-off decisions)
• Early in the morning: shall I take my umbrella today? (I'll have to go for a long walk at noon)
• Relevant random variables?

Page 45: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Sequential Decision Problems: Example

• In our fire alarm domain:
  • If there is a report, you can decide to call the fire department
  • Before doing that, you can decide to check whether you can see smoke, but this takes time and will delay calling
  • A decision (e.g. Call) can depend on a random variable (e.g. SeeSmoke)

• Each decision Di has an information set of variables pa(Di), whose values will be known at the time decision Di is made
  – pa(CheckSmoke) = {Report}
  – pa(Call) = {Report, CheckSmoke, SeeSmoke}

[Decision network figure: decision nodes (rectangles) are where the agent decides; chance nodes (ellipses) are where chance decides.]

Page 46: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Sequential Decision Problems

• What should an agent do?
  – The agent observes, acts, observes, acts, …
  – Subsequent actions can depend on what is observed
    • What is observed often depends on previous actions

• The agent needs a conditional plan of what it will do given every possible set of circumstances

• We will formalize this conditional plan as a policy

46

Page 47: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Policies for Sequential Decision Problems

Definition (Policy)
A policy is a sequence δ1, …, δn of decision functions

  δi : dom(pa(Di)) → dom(Di)

This policy means that when the agent has observed o ∈ dom(pa(Di)), it will do δi(o).

There are 2^2 = 4 possible decision functions δcs for CheckSmoke:
– a decision function needs to specify a value for each instantiation of its parents

Report   δcs1   δcs2   δcs3   δcs4
true
false

Page 48: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Policies for Sequential Decision Problems

Definition (Policy)
A policy is a sequence δ1, …, δn of decision functions

  δi : dom(pa(Di)) → dom(Di)

This policy means that when the agent has observed o ∈ dom(pa(Di)), it will do δi(o).

There are 2^2 = 4 possible decision functions δcs for CheckSmoke:
– a decision function needs to specify a value for each instantiation of its parents

Report   δcs1   δcs2   δcs3   δcs4
true     T      T      F      F
false    T      F      T      F
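A small Python sketch that enumerates decision functions in this way (the helper decision_functions is my own, not from the slides): each decision function maps every instantiation of the parents to a value of the decision variable.

```python
from itertools import product

def decision_functions(parent_domains, decision_domain):
    """Enumerate all decision functions: one chosen value per parent instantiation."""
    parent_insts = list(product(*parent_domains))  # all instantiations of the parents
    for choices in product(decision_domain, repeat=len(parent_insts)):
        yield dict(zip(parent_insts, choices))

# CheckSmoke: one binary parent (Report), binary domain -> 2^2 = 4 decision functions.
cs_functions = list(decision_functions([(True, False)], (True, False)))
print(len(cs_functions))    # 4

# Call: three binary parents (Report, CheckSmoke, SeeSmoke) -> 2^8 = 256 decision functions.
call_functions = list(decision_functions([(True, False)] * 3, (True, False)))
print(len(call_functions))  # 256
```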

Page 49: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Policies for Sequential Decision Problems

Definition (Policy)
A policy is a sequence δ1, …, δn of decision functions

  δi : dom(pa(Di)) → dom(Di)

When the agent has observed o ∈ dom(pa(Di)), it will do δi(o).

How many possible decision functions δcall are there for Call (whose parents are Report, CheckSmoke, and SeeSmoke)?

(R, CS, SS):   ttt  ttf  tft  tff  ftt  ftf  fft  fff
δcall1         T    T    T    T    T    T    T    T
δcall2         T    T    T    T    T    T    T    F
δcall3         T    T    T    T    T    T    F    T
δcall4         T    T    T    T    T    T    F    F
δcall5         T    T    T    T    T    F    T    T
…              …    …    …    …    …    …    …    …
δcall256       F    F    F    F    F    F    F    F

49

Page 50: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Policies for Sequential Decision Problems

Definition (Policy)
A policy is a sequence δ1, …, δn of decision functions

  δi : dom(pa(Di)) → dom(Di)

When the agent has observed o ∈ dom(pa(Di)), it will do δi(o).

There are 2^8 = 256 possible decision functions δcall for Call: one value of Call for each of the 2^3 = 8 instantiations of Report, CheckSmoke, and SeeSmoke (see the table of δcall1, …, δcall256 on the previous slide).

50

Page 51: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Policies for Sequential Decision Problems

• Policies for sequential decisions are the counterpart of single decisions for one-off decision problems
  – They are just much more complex, because they tell the agent what to do at any step of the decision sequence, given any possible combination of available information

Sample decision (one-off): one of the four macro decisions

Which way   Wear Pads
long        true        (d1)
long        false       (d2)
short       true        (d3)
short       false       (d4)

Sample policy: one of the four decision functions for CheckSmoke

Report   δcs1   δcs2   δcs3   δcs4
true     t      t      f      f
false    t      f      t      f

Sample policy: one of the 256 decision functions δcall1, …, δcall256 for Call, each specifying a value of Call for every instantiation of Report, CheckSmoke, and SeeSmoke.

Page 52: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

How many policies are there?

• If a decision D has k binary parents, there are 2^k different assignments of values to the parents
• If there are b possible values for a decision variable with k binary parents, there are b^(2^k) different decision functions for that variable
  – because there are 2^k possible instantiations of the parents, and for every instantiation the decision function could pick any of the b values
• If there are d decision variables, each with k binary parents and b possible values
  – there are … different policies

52

Page 53: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

How many policies are there?

• If a decision D has k binary parents, there are 2^k different assignments of values to the parents
• If there are b possible values for a decision variable with k binary parents, there are b^(2^k) different decision functions for that variable
  – because there are 2^k possible instantiations of the parents, and for every instantiation the decision function could pick any of the b values
• If there are d decision variables, each with k binary parents and b possible values
  – there are (b^(2^k))^d different policies
  – because there are b^(2^k) possible decision functions for each decision, and a policy is a combination of d such decision functions
• We want to find the "most promising" policy given the inherent uncertainty, i.e. the policy with the maximum expected utility
  – But of course we want to avoid enumerating all the policies: VE

53
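As a quick numeric check of these counts on the fire-alarm example, here is a short Python sketch (the helper name is my own):

```python
# Number of decision functions for a decision with k binary parents and b possible values.
def num_decision_functions(k, b=2):
    return b ** (2 ** k)

print(num_decision_functions(k=1))  # CheckSmoke: 1 parent (Report)                     -> 4
print(num_decision_functions(k=3))  # Call: 3 parents (Report, CheckSmoke, SeeSmoke)    -> 256

# A policy combines one decision function per decision variable,
# so with the two decisions CheckSmoke and Call there are 4 * 256 = 1024 policies.
print(num_decision_functions(k=1) * num_decision_functions(k=3))  # 1024
```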

Page 54: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Lecture Overview

• Single-Stage Decision Problems
  – Utilities and optimal decisions
  – Single-stage decision networks
  – Variable elimination (VE) for computing the optimal decision

• Sequential Decision Problems
  – General decision networks
  – Policies
  – Finding optimal policies with VE

54

Page 55: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Expected Value of a Policy

• Like one-off decisions, policies are selected based on their expected utility
• Each possible world w has a probability P(w) and a utility U(w)
• The expected utility of policy π is

  E(U | π) = Σ_{w ⊨ π} P(w) × U(w)

  (the sum ranges over the possible worlds that satisfy the policy)

• The optimal policy is one with the maximum expected utility.

Page 56: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

When does a possible world satisfy a policy?

• Possible world w satisfies policy π, written w ⊨ π, if the value of each decision variable is the value selected by its decision function in the policy (when applied in w).

Decision function δcs2 for CheckSmoke:

Report   CheckSmoke
true     true
false    false

Decision function δcall20 for Call:

Report   CheckSmoke   SeeSmoke   Call
true     true         true       false
true     true         false      false
true     false        true       true
true     false        false      false
false    true         true       true
false    true         false      false
false    false        true       false
false    false        false      false

The policy π consists of δcs2 and δcall20.

A possible world w2 assigns:
Fire = true, Tampering = false, Alarm = true, Leaving = true, Report = true, Smoke = true, SeeSmoke = true, CheckSmoke = true, Call = true

Does w2 ⊨ π?

Page 57: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

One last operation on factors: maxing out a variable

• Maxing out a variable is similar to marginalization
  – But instead of taking the sum of some values, we take the max

B A C f3(A,B,C)

t t t 0.03

t t f 0.07

f t t 0.54

f t f 0.36

t f t 0.06

t f f 0.14

f f t 0.48

f f f 0.32

A C f4(A,C)

t t 0.54

t f 0.36

f t ?

f f

maxB f3(A,B,C) = f4(A,C)

  (max_X1 f)(X2, …, Xj) = max_{x ∈ dom(X1)} f(X1 = x, X2, …, Xj)

57

Page 58: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

One last operation on factors: maxing out a variable

• Maxing out a variable is similar to marginalization
  – But instead of taking the sum of some values, we take the max

B A C f3(A,B,C)

t t t 0.03

t t f 0.07

f t t 0.54

f t f 0.36

t f t 0.06

t f f 0.14

f f t 0.48

f f f 0.32

A C f4(A,C)

t t 0.54

t f 0.36

f t 0.48

f f 0.32

maxB f3(A,B,C) = f4(A,C)

  (max_X1 f)(X2, …, Xj) = max_{x ∈ dom(X1)} f(X1 = x, X2, …, Xj)

58
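A minimal Python sketch of maxing B out of the factor f3(A, B, C) above, using a dictionary-based factor representation of my own choosing:

```python
# f3(A, B, C) from the slide; keys are (B, A, C), matching the table's column order.
f3 = {
    (True, True, True): 0.03,   (True, True, False): 0.07,
    (False, True, True): 0.54,  (False, True, False): 0.36,
    (True, False, True): 0.06,  (True, False, False): 0.14,
    (False, False, True): 0.48, (False, False, False): 0.32,
}

# Max out B: for each (A, C), keep the maximum over the values of B.
f4 = {
    (a, c): max(f3[(b, a, c)] for b in (True, False))
    for a in (True, False) for c in (True, False)
}
print(f4)  # {(True, True): 0.54, (True, False): 0.36, (False, True): 0.48, (False, False): 0.32}
```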

Page 59: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

The no-forgetting property

• A decision network has the no-forgetting property if
  – Decision variables are totally ordered: D1, …, Dm
  – If a decision Di comes before Dj, then
    • Di is a parent of Dj
    • any parent of Di is a parent of Dj

59

Page 60: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Idea for finding optimal policies with VE

• Idea for finding optimal policies with variable elimination (VE):
  Dynamic programming: precompute optimal future decisions
  – Consider the last decision D to be made
    • Find the optimal decision D = d for each instantiation of D's parents
      – For each instantiation of D's parents, this is just a single-stage decision problem
    • Create a factor of these maximum values: max out D
      – I.e., for each instantiation of the parents, what is the best utility I can achieve by making this last decision optimally?
  – Recurse to find the optimal policy for the reduced network (now with one less decision)

60

Page 61: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Finding optimal policies with VE

1. Create a factor for each CPT and a factor for the utility
2. While there are still decision variables:
   – 2a: Sum out the random variables that are not parents of a decision node
     • E.g. …
   – 2b: Max out the last decision variable D in the total ordering
     • Keep track of its decision function
3. Sum out any remaining variables: this gives the expected utility of the optimal policy.

This is Algorithm VE_DN in P&M, Section 9.3.3, p. 393

61

Page 62: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Finding optimal policies with VE

1. Create a factor for each CPT and a factor for the utility
2. While there are still decision variables:
   – 2a: Sum out the random variables that are not parents of a decision node
     • E.g. Tampering, Fire, Alarm, Smoke, Leaving
   – 2b: Max out the last decision variable D in the total ordering
     • Keep track of its decision function
3. Sum out any remaining variables: this gives the expected utility of the optimal policy.

This is Algorithm VE_DN in P&M, Section 9.3.3, p. 393

62

Page 63: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Eliminate the decision variables: step 3 details

• Select the variable D that corresponds to the latest decision to be made
  – this variable will appear in only one factor, together with (some of) its parents
• Eliminate D by maximizing. This returns:
  – The optimal decision function for D: arg max_D f
  – A new factor to use in VE: max_D f
• Repeat until there are no more decision nodes.

Example: Eliminate CheckSmoke

Factor containing CheckSmoke:

Report   CheckSmoke   Value
true     true         -5.0
true     false        -5.6
false    true         -23.7
false    false        -17.5

New factor (max over CheckSmoke):

Report   Value
true
false

Decision function (arg max over CheckSmoke):

Report   CheckSmoke
true
false
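A short Python sketch of this elimination step applied to the factor above (the numbers are from the slide; the dictionary layout and variable names are my own):

```python
# Factor on (Report, CheckSmoke) remaining when CheckSmoke is the latest decision.
factor = {
    (True, True): -5.0,   (True, False): -5.6,
    (False, True): -23.7, (False, False): -17.5,
}

new_factor = {}         # max over CheckSmoke, for each value of Report
decision_function = {}  # arg max over CheckSmoke, for each value of Report
for report in (True, False):
    best_cs = max((True, False), key=lambda cs: factor[(report, cs)])
    decision_function[report] = best_cs
    new_factor[report] = factor[(report, best_cs)]

print(new_factor)         # {True: -5.0, False: -17.5}
print(decision_function)  # {True: True, False: False}
```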

Page 64: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Computational complexity of VE for finding optimal policies

• We saw: for d decision variables (each with k binary parents and b possible actions), there are (b^(2^k))^d policies
  – All combinations of the b^(2^k) decision functions per decision
• Variable elimination saves the final exponent:
  – Dynamic programming: consider each decision function only once
  – Resulting complexity: O(d × b^(2^k))
  – Much faster than enumerating policies (or searching in policy space), but still doubly exponential
  – Solution: approximation algorithms for finding optimal policies

64
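To get a feel for the difference, a small illustrative Python sketch comparing the number of policies with the d · b^(2^k) term for a few (d, k) settings, with b = 2:

```python
def num_policies(d, k, b=2):
    return (b ** (2 ** k)) ** d   # enumerate all combinations of decision functions

def ve_cost_term(d, k, b=2):
    return d * b ** (2 ** k)      # each decision function considered only once

for d, k in [(2, 2), (3, 3), (4, 4)]:
    print(d, k, num_policies(d, k), ve_cost_term(d, k))
# (2, 2):            256 policies  vs.      32
# (3, 3):     16,777,216 policies  vs.     768
# (4, 4):        ~1.8e19 policies  vs. 262,144
```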

Page 65: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Value of Information

• What would help the agent make a better Umbrella decision?

• The value of information of a random variable X for decision D is:
  – the utility of the network with an arc from X to D, minus the utility of the network without the arc.

Page 66: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Value of Information (cont.)

• The value of information provides a bound on how much you should be prepared to pay for a sensor. How much is a perfect weather forecast worth?
  • Original maximum expected utility: 77
  • Maximum expected utility when we know Weather: 91
  • A better forecast is worth at most: 91 - 77 = 14

Page 67: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Value of Information

• The value of information provides a bound on how much you should be prepared to pay for a sensor. How much is a perfect fire sensor worth?
  • Original maximum expected utility: -22.6
  • Maximum expected utility when we know Fire: -2
  • A perfect fire sensor is worth: -2 - (-22.6) = 20.6
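Expressed as a one-line helper in Python (a trivial sketch using the numbers from these two slides):

```python
# Value of information of X for decision D:
# MEU of the network with an arc X -> D, minus MEU of the network without it.
def value_of_information(meu_with_arc, meu_without_arc):
    return meu_with_arc - meu_without_arc

print(value_of_information(91, 77))      # perfect weather forecast: worth at most 14
print(value_of_information(-2, -22.6))   # perfect fire sensor: worth 20.6
```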

Page 68: Computer Science CPSC  502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)

Learning Goals For Decision Under Uncertainty

• One-Off decisions
  – Compare and contrast stochastic single-stage (one-off) decisions vs. multi-stage (sequential) decisions
  – Define a utility function on possible worlds
  – Define and compute optimal one-off decisions
  – Represent one-off decisions as single-stage decision networks
  – Compute optimal decisions by Variable Elimination

• Sequential decision networks
  – Represent sequential decision problems as decision networks
  – Explain the no-forgetting property

• Policies
  – Verify whether a possible world satisfies a policy
  – Define the expected utility of a policy
  – Compute the number of policies for a decision problem
  – Compute the optimal policy by Variable Elimination

• Define and compute the value of information