1 (c) 2002-3, C. Boutilier, E. Hansen, D. Weld Logistics Reading for Wed Project Meetings Thanks to Craig Boutilier, Eric Hansen

Upload: opal-davis

Post on 19-Dec-2015


1(c) 2002-3, C. Boutilier, E. Hansen, D. Weld

Logistics

• Reading for Wed
• Project Meetings
• Thanks to Craig Boutilier, Eric Hansen


Outline

• BDDs & ADDs
• MDP Review


BDD Definition

Defn
• A Binary Decision Diagram (BDD) is a directed acyclic graph with two terminal nodes (0-terminal, 1-terminal). Each non-terminal node has an index identifying an input variable of the Boolean function and has two outgoing edges, called the 0-edge and the 1-edge.

Why care?
• Compact representation of Boolean functions
• $B^n \to B$


OBDD Definition

An OBDD (ordered BDD) is a BDD in which the input variables appear in the same fixed order on every path of the graph and no variable appears more than once on a path.
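A minimal Python sketch of these definitions, not taken from the slides. The names `Node` and `evaluate` are hypothetical, and the variable order x1, x2, x3 is an assumption. It hand-builds an OBDD for the function (x3 and x2) or not x1 and evaluates it by following 0-edges and 1-edges.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Node:
    """Non-terminal node: tests one input variable and has two outgoing edges."""
    var: str
    low: "BDD"   # 0-edge, followed when var is False
    high: "BDD"  # 1-edge, followed when var is True


# Terminals are the Python booleans: False is the 0-terminal, True the 1-terminal.
BDD = Union[bool, Node]


def evaluate(bdd: BDD, assignment: Dict[str, bool]) -> bool:
    """Follow one root-to-terminal path selected by the assignment."""
    while isinstance(bdd, Node):
        bdd = bdd.high if assignment[bdd.var] else bdd.low
    return bdd


# OBDD for (x3 and x2) or not x1 under the assumed order x1, x2, x3.
x3_node = Node("x3", False, True)
x2_node = Node("x2", False, x3_node)
f = Node("x1", True, x2_node)
```

Every path tests the variables in the same order and at most once, which is what makes this diagram an OBDD rather than just a BDD.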


Example: (x3 and x2) or not x1

[Figure: Binary decision tree over x1, x2, x3 for this function, next to the equivalent OBDD]


BDD reduction example

(x3 and x2) or not x1

[Figure sequence: starting from the binary decision tree over x1, x2, x3, the slides alternate reduction steps and their results (After ELIMINATION, MERGING, After MERGING, MERGING, After MERGING, ELIMINATION, After ELIMINATION), ending with the OBDD]
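The two reduction rules named on these slides, ELIMINATION and MERGING, can be sketched in Python as follows. This is an illustrative brute-force construction, not the slides' procedure: it enumerates all 2^n assignments, and `build_robdd` is a hypothetical name. Terminals are the integers 0 and 1; every other node gets an integer id of 2 or more.

```python
def build_robdd(func, order):
    """Build a reduced OBDD for func under the given variable order.

    func takes a dict {variable: bool}. Returns (root_id, unique), where
    unique maps (var, low_id, high_id) to the id of each non-terminal node.
    """
    unique = {}

    def mk(var, low, high):
        if low == high:
            # ELIMINATION: both edges agree, so the test on var is redundant.
            return low
        key = (var, low, high)
        if key not in unique:
            unique[key] = len(unique) + 2
        # MERGING: an identical (var, low, high) triple reuses the same node.
        return unique[key]

    def build(i, env):
        if i == len(order):
            return 1 if func(env) else 0
        var = order[i]
        low = build(i + 1, {**env, var: False})
        high = build(i + 1, {**env, var: True})
        return mk(var, low, high)

    return build(0, {}), unique


def example(env):
    """The function from the slides: (x3 and x2) or not x1."""
    return (env["x3"] and env["x2"]) or not env["x1"]


root, nodes = build_robdd(example, ["x1", "x2", "x3"])
```

With the assumed order x1, x2, x3, the full decision tree has 7 non-terminal nodes, while `nodes` ends up with 3 entries, one per variable.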


Unary and Binary Operations

Negation:
• Computing not f

• Just exchange 0-terminal and 1-terminal.

• Constant time

• No increase in size!


[Figure: OBDDs for x1 and for x2, the OBDD for x1 and x2, and the OBDD for (x1 and x2) or x3]


Restriction Operation

Set function argument $x_i$ to constant k (0 or 1).

$F[x_i = k] = F(x_1, \ldots, x_{i-1}, k, x_{i+1}, \ldots, x_n)$

$F_x$ equivalent to $F[x = 1]$

$F_{\bar{x}}$ equivalent to $F[x = 0]$


Restriction Execution Example

[Figure: three diagrams over variables a, b, c, d: Argument F, Restriction F[b=1], and Reduced Result]
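A small Python sketch of restriction followed by reduction. The function F on the slide cannot be recovered from the transcript, so this uses a hypothetical F = ((a or b) and c) or d with the order a, b, c, d mapped to levels 1 to 4. Nodes are tuples `(level, low, high)` and terminals are the integers 0 and 1; `mk` and `restrict` are illustrative names.

```python
def mk(level, low, high):
    """Create a node, dropping it when both edges lead to the same place."""
    return low if low == high else (level, low, high)


def restrict(f, level, k):
    """Return F[x_level = k] for an OBDD f built from (level, low, high) tuples."""
    if isinstance(f, int) or f[0] > level:
        return f                      # x_level cannot occur at or below this node
    if f[0] == level:
        return f[2] if k else f[1]    # redirect to the 1-edge or the 0-edge
    return mk(f[0], restrict(f[1], level, k), restrict(f[2], level, k))


# Hypothetical argument F = ((a or b) and c) or d, with a=1, b=2, c=3, d=4.
d_node = (4, 0, 1)
c_or_d = (3, d_node, 1)
F = (1, (2, d_node, c_or_d), c_or_d)

result = restrict(F, 2, 1)   # F[b = 1]
```

Because equal tuples compare equal, identical subgraphs are merged implicitly. In this example the node for a disappears once b is fixed to 1, leaving only tests on c and d.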


Properties

Uniqueness
• With respect to each fixed variable order
• Reduced OBDD of a Boolean function f is unique

Is f satisfiable?

Operations
• Function=, apply, restrict, compose
  - Could just merge and reduce
  - Faster to walk both trees, building new one
• Polynomial time in size of BDD
• Fast C library implementations

Popular
• Model checking … And now AI…
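The "walk both trees, building new one" idea can be sketched as a memoized recursive apply. This is an illustrative Python version, not the API of any particular BDD library: nodes are tuples `(level, low, high)` with smaller levels tested first, terminals are the integers 0 and 1, and `mk`, `var`, `apply_op`, and `evaluate` are hypothetical names.

```python
import operator
from functools import lru_cache


def mk(level, low, high):
    """Create a node, dropping redundant tests."""
    return low if low == high else (level, low, high)


def var(level):
    """OBDD for the single variable at this level."""
    return mk(level, 0, 1)


def apply_op(op, f, g):
    """Combine OBDDs f and g with a binary Boolean operator op."""

    @lru_cache(maxsize=None)          # each pair of nodes is processed once
    def walk(u, v):
        if isinstance(u, int) and isinstance(v, int):
            return int(op(u, v))
        lu = u[0] if isinstance(u, tuple) else float("inf")
        lv = v[0] if isinstance(v, tuple) else float("inf")
        top = min(lu, lv)             # earliest variable tested by either root
        u0, u1 = (u[1], u[2]) if lu == top else (u, u)
        v0, v1 = (v[1], v[2]) if lv == top else (v, v)
        return mk(top, walk(u0, v0), walk(u1, v1))

    return walk(f, g)


def evaluate(f, assignment):
    """assignment maps level -> 0/1."""
    while isinstance(f, tuple):
        f = f[2] if assignment[f[0]] else f[1]
    return f


x1, x2, x3 = var(1), var(2), var(3)
x1_and_x2 = apply_op(operator.and_, x1, x2)
g = apply_op(operator.or_, x1_and_x2, x3)    # (x1 and x2) or x3
```

The memo table has at most |f| x |g| entries, which is where the polynomial bound in the size of the BDDs comes from.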


Size of BDDs

$2^{2^n}$ n-input Boolean functions
• Require $2^n$ bits in worst case
• Truth tables always require $2^n$ bits

Many practical functions require much less space in BDD representation.


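Good Ordering vs. Bad Ordering

$(a_1 \land b_1) \lor (a_2 \land b_2) \lor (a_3 \land b_3)$

[Figure: two OBDDs for this function. Good Ordering (a1, b1, a2, b2, a3, b3): Linear Growth. Bad Ordering (a1, a2, a3, b1, b2, b3): Exponential Growth.]

Finding a good ordering is NP-complete (NPC)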
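The effect of the ordering can be checked numerically. The sketch below is a brute-force illustration (the helper `robdd_size` is hypothetical and enumerates all assignments) that counts the non-terminal nodes of the reduced OBDD of (a1 and b1) or (a2 and b2) or (a3 and b3) under an interleaved ordering and under an ordering that lists all a's before all b's.

```python
def robdd_size(func, order):
    """Number of non-terminal nodes in the reduced OBDD of func under order."""
    nodes = set()

    def build(i, env):
        if i == len(order):
            return bool(func(env))
        low = build(i + 1, {**env, order[i]: False})
        high = build(i + 1, {**env, order[i]: True})
        if low == high:               # redundant test: eliminate
            return low
        node = (order[i], low, high)
        nodes.add(node)               # the set merges structurally equal nodes
        return node

    build(0, {})
    return len(nodes)


def pairs(env):
    return any(env[f"a{i}"] and env[f"b{i}"] for i in (1, 2, 3))


good = robdd_size(pairs, ["a1", "b1", "a2", "b2", "a3", "b3"])
bad = robdd_size(pairs, ["a1", "a2", "a3", "b1", "b2", "b3"])
print(good, bad)
```

For three pairs this gives 6 nodes for the interleaved ordering and 14 for the other, matching the linear versus exponential growth described on the slide.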


Symbolic Manipulation with OBDDs

Strategy
• Represent data as set of OBDDs
  - Identical variable orderings
• Express solution method as sequence of symbolic ops
  - Sequence of constructor & query operations
• Implement each operation by OBDD manipulation
  - Do all the work in the constructor operations

Key Algorithmic Properties
• Arguments are OBDDs with identical variable orderings
• Result is OBDD with same ordering
• Each step polynomial complexity (in |OBDD|)


Set Operations

Characteristic Functions
• $A \subseteq \{0,1\}^n$
  - Set of bit vectors of length n
• Represent set A as Boolean function A of n variables: $X \in A$ if and only if $A(X) = 1$

[Figure: Venn diagrams of sets A and B illustrating Union and Intersection]
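A brief Python illustration of the characteristic-function idea, using plain Python predicates rather than OBDDs to keep it short. All helper names are hypothetical. Union corresponds to "or" of the two functions and intersection to "and".

```python
from itertools import product


def characteristic(members):
    """Characteristic function of a set of bit vectors (tuples of 0/1)."""
    members = frozenset(members)
    return lambda x: x in members


def union(chi_a, chi_b):
    return lambda x: chi_a(x) or chi_b(x)


def intersection(chi_a, chi_b):
    return lambda x: chi_a(x) and chi_b(x)


def members_of(chi, n):
    """Recover the set by testing every bit vector of length n."""
    return {x for x in product((0, 1), repeat=n) if chi(x)}


A = {(0, 1), (1, 1)}
B = {(1, 0), (1, 1)}
chi_a, chi_b = characteristic(A), characteristic(B)
```

When the characteristic functions are stored as OBDDs, the same two operations become apply calls on the diagrams instead of Python closures.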


Algebraic Decision Diagram (ADD)

Defn
• A directed acyclic graph with k terminal nodes, each a real number.
• Each non-terminal node has an index to identify an input variable of the Boolean function and has two outgoing edges, called the 0-edge and the 1-edge.

Why care?
• Compact representation of functions: $B^n \to \mathbb{R}$

Efficient operations
• Add, Multiply, Max, …
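Pointwise operations such as Add and Max on ADDs follow the same recursive pattern as the Boolean apply. Below is a hedged Python sketch in which nodes are tuples `(level, low, high)`, terminals are plain numbers, and both arguments are assumed to share a variable order. The names `mk` and `add_apply` and the two example diagrams are illustrative only.

```python
import operator


def mk(level, low, high):
    """Create a node, dropping it when both edges are identical."""
    return low if low == high else (level, low, high)


def add_apply(op, f, g, _cache=None):
    """Apply a binary numeric operator pointwise to two ADDs."""
    cache = {} if _cache is None else _cache
    if (f, g) in cache:
        return cache[(f, g)]
    f_leaf = not isinstance(f, tuple)
    g_leaf = not isinstance(g, tuple)
    if f_leaf and g_leaf:
        result = op(f, g)             # combine two real-valued terminals
    else:
        top = min(n[0] for n in (f, g) if isinstance(n, tuple))
        f0, f1 = (f[1], f[2]) if not f_leaf and f[0] == top else (f, f)
        g0, g1 = (g[1], g[2]) if not g_leaf and g[0] == top else (g, g)
        result = mk(top,
                    add_apply(op, f0, g0, cache),
                    add_apply(op, f1, g1, cache))
    cache[(f, g)] = result
    return result


# Two hypothetical one-variable ADDs: value 10 if variable 1 holds, 20 if variable 2 holds.
r1 = (1, 0.0, 10.0)
r2 = (2, 0.0, 20.0)
total = add_apply(operator.add, r1, r2)
best = add_apply(max, r1, r2)
```

The sum has four distinct leaves (0, 10, 20, 30), while the max needs only three, so the diagram size depends on how many distinct values the result takes.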


Markov Decision Processes

An MDP has four components, S, A, R, Pr:
• (finite) state set S (|S| = n)
• (finite) action set A (|A| = m)
• transition function Pr(s,a,t)
  - each Pr(s,a,-) is a distribution over S
  - represented by set of n x n stochastic matrices
• bounded, real-valued reward function R(s)
  - represented by an n-vector
  - can be generalized to include action costs: R(s,a)
  - can be stochastic (but replaceable by expectation)

Model easily generalizable to countable or continuous state and action spaces


System Dynamics

Finite State Space S

State s1013:
• Loc = 236
• Joe needs printout
• Craig needs coffee
• ...


System Dynamics

Finite Action Space A
• Pick up Printouts?
• Go to Coffee Room?
• Go to charger?


System Dynamics

Transition Probabilities: Pr(si, a, sj)

Prob. = 0.95


System Dynamics

Transition Probabilities: Pr(si, a, sk)

Prob. = 0.05

       s1     s2     ...    sn
s1     0.9    0.05   ...    0.0
s2     0.0    0.20   ...    0.1
...
sn     0.1    0.0    ...    0.0


Reward Process

Reward Function: R(si)
- action costs possible

Reward = -10

       R
s1     12
s2     0.5
...
sn     10


Graphical View of MDP

[Figure: MDP drawn over time, with state nodes St, St+1, St+2, action nodes At, At+1, and reward nodes Rt, Rt+1, Rt+2]


Assumptions

Markovian dynamics (history independence)
• $\Pr(S^{t+1} \mid A^t, S^t, A^{t-1}, S^{t-1}, \ldots, S^0) = \Pr(S^{t+1} \mid A^t, S^t)$

Markovian reward process
• $\Pr(R^t \mid A^t, S^t, A^{t-1}, S^{t-1}, \ldots, S^0) = \Pr(R^t \mid A^t, S^t)$

Stationary dynamics and reward
• $\Pr(S^{t+1} \mid A^t, S^t) = \Pr(S^{t'+1} \mid A^{t'}, S^{t'})$ for all t, t’

Full observability
• though we can’t predict what state we will reach when we execute an action, once it is realized, we know what it is


Policies

Nonstationary policy
• π: S x T → A
• π(s,t) is action to do at state s with t-stages-to-go

Stationary policy
• π: S → A
• π(s) is action to do at state s (regardless of time)
• analogous to reactive or universal plan

These assume or have these properties:
• full observability
• history-independence
• deterministic action choice


Value Iteration (Bellman 1957)

Markov property allows exploitation of DP principle for optimal policy construction
• no need to enumerate $|A|^{Tn}$ possible policies

Value Iteration

$V^0(s) = R(s), \quad \forall s$

$V^k(s) = R(s) + \max_a \sum_{s'} \Pr(s,a,s')\, V^{k-1}(s')$ (Bellman backup)

$\pi^*(s,k) = \arg\max_a \sum_{s'} \Pr(s,a,s')\, V^{k-1}(s')$

$V^k$ is optimal k-stage-to-go value function
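A minimal Python sketch of finite-horizon value iteration with the Bellman backup, using plain lists. The layout `P[a][s][t] = Pr(s, a, t)` and the tiny two-state example are assumptions made for illustration, not part of the slides.

```python
def value_iteration(P, R, horizon):
    """Finite-horizon value iteration.

    P[a][s][t] is Pr(s, a, t); R[s] is the reward of state s.
    Returns (V, policy), where V[s] is the optimal value with `horizon`
    stages to go and policy[k-1][s] is the best action with k stages to go.
    """
    n, m = len(R), len(P)
    V = list(R)                                   # V^0(s) = R(s)
    policy = []
    for _ in range(horizon):
        # Expected next-stage value of each action in each state.
        Q = [[sum(P[a][s][t] * V[t] for t in range(n)) for a in range(m)]
             for s in range(n)]
        policy.append([max(range(m), key=lambda a: Q[s][a]) for s in range(n)])
        V = [R[s] + max(Q[s]) for s in range(n)]  # Bellman backup
    return V, policy


# Hypothetical example: action 0 stays put, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
R = [0.0, 1.0]
V2, pi = value_iteration(P, R, 2)
```

In this example the best choice is to move to the rewarding state and stay there, so the policy switches in state 0 and stays in state 1 at every stage.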


Value Iteration

[Figure: states s1 to s4 drawn at stages Vt-2, Vt-1, Vt, Vt+1; the arcs between stages carry probabilities 0.7, 0.3, 0.4, 0.6]

$V_t(s_4) = R(s_4) + \max\{\, 0.7\, V_{t+1}(s_1) + 0.3\, V_{t+1}(s_4),\ \ 0.4\, V_{t+1}(s_2) + 0.6\, V_{t+1}(s_3) \,\}$


Value Iteration

[Figure: the same diagram of states s1 to s4 at stages Vt-2, Vt-1, Vt, Vt+1, with arc probabilities 0.7, 0.3, 0.4, 0.6]

Πt(s4) = max { }


Value Iteration

Note how DP is used
• optimal soln to k-1 stage problem can be used without modification as part of optimal soln to k-stage problem

Because of finite horizon, policy nonstationary

In practice, Bellman backup computed using:

$Q^k(a,s) = R(s) + \sum_{s'} \Pr(s,a,s')\, V^{k-1}(s'), \quad \forall a$

$V^k(s) = \max_a Q^k(a,s)$


Complexity

T iterations

At each iteration |A| computations of n x n matrix times n-vector: $O(|A|n^3)$

Total $O(T|A|n^3)$

Can exploit sparsity of matrix: $O(T|A|n^2)$


Summary

Resulting policy is optimal

$V^{\pi^*}_k(s) \ge V^{\pi}_k(s), \quad \forall \pi, s, k$

• convince yourself of this; convince yourself that non-Markovian, randomized policies are not necessary

Note: optimal value function is unique, but optimal policy is not


Discounted Infinite Horizon MDPs

Total reward problematic (usually)
• many or all policies have infinite expected reward
• some MDPs (e.g., zero-cost absorbing states) OK

“Trick”: introduce discount factor 0 ≤ β < 1
• future rewards discounted by β per time step

$V^{\pi}(s) = E\left[\sum_{t=0}^{\infty} \beta^t R^t \,\middle|\, \pi, s\right]$

Note:

$V^{\pi}(s) \le E\left[\sum_{t=0}^{\infty} \beta^t R^{\max}\right] = \frac{1}{1-\beta} R^{\max}$

Motivation: economic? failure prob? convenience?


Some Notes

Optimal policy maximizes value at each state

Optimal policies guaranteed to exist (Howard60)

Can restrict attention to stationary policies

• why change action at state s at new time t?

We define $V^*(s) = V^{\pi}(s)$ for some optimal π


Value Equations (Howard 1960)

Value equation for fixed policy value

$V^{\pi}(s) = R(s) + \beta \sum_{s'} \Pr(s, \pi(s), s')\, V^{\pi}(s')$

Bellman equation for optimal value function

$V^*(s) = R(s) + \beta \max_a \sum_{s'} \Pr(s, a, s')\, V^*(s')$


Policy Iteration

Given fixed policy, can compute its value exactly:

$V^{\pi}(s) = R(s) + \beta \sum_{s'} \Pr(s, \pi(s), s')\, V^{\pi}(s')$

Policy iteration exploits this

1. Choose a random policy π
2. Loop:
(a) Evaluate $V^{\pi}$
(b) For each s in S, set $\pi'(s) = \arg\max_a \sum_{s'} \Pr(s, a, s')\, V^{\pi}(s')$
(c) Replace π with π’
Until no improving action possible at any state
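A compact Python sketch of policy iteration for the discounted case. It makes several assumptions for illustration: transitions are stored as `P[a][s][t]`, the initial policy is fixed rather than random, and the exact evaluation step solves the linear system (I - βPπ)V = R with a small hand-written Gauss-Jordan routine (`solve`) so that the example needs no external libraries.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def policy_iteration(P, R, beta):
    """Return (policy, V) for a discounted MDP with discount beta in [0, 1)."""
    n, m = len(R), len(P)
    pi = [0] * n                                   # arbitrary initial policy
    while True:
        # (a) Evaluate the current policy exactly.
        A = [[(1.0 if s == t else 0.0) - beta * P[pi[s]][s][t]
              for t in range(n)] for s in range(n)]
        V = solve(A, R)

        def q(s, a):
            return sum(P[a][s][t] * V[t] for t in range(n))

        # (b) Improve greedily; change an action only if it is strictly better.
        new_pi = list(pi)
        for s in range(n):
            best = max(range(m), key=lambda a: q(s, a))
            if q(s, best) > q(s, pi[s]) + 1e-12:
                new_pi[s] = best
        if new_pi == pi:                           # no improving action anywhere
            return pi, V
        pi = new_pi                                # (c) replace the policy


# Hypothetical example: action 0 stays put, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
R = [0.0, 1.0]
policy, V = policy_iteration(P, R, 0.5)
```

The strict-improvement test keeps the loop from oscillating between equally good actions.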


Policy Iteration Notes

Convergence assured (Howard)
• intuitively: no local maxima in value space, and each policy must improve value; since finite number of policies, will converge to optimal policy

Very flexible algorithm
• need only improve policy at one state (not each state)

Gives exact value of optimal policy

Generally converges much faster than VI
• each iteration more complex, but fewer iterations
• quadratic rather than linear rate of convergence


Outline


Logical or Feature-based Problems

AI problems are most naturally viewed in terms of logical propositions, random variables, objects and relations, etc. (logical, feature-based)

E.g., consider “natural” spec. of robot example
• propositional variables: robot’s location, Craig wants coffee, tidiness of lab, etc.
• could easily define things in first-order terms as well

|S| exponential in number of logical variables
• Spec./Rep’n of problem in state form impractical
• Explicit state-based DP impractical
• Bellman’s curse of dimensionality


Solution?

Require structured representations
• exploit regularities in probabilities, rewards
• exploit logical relationships among variables

Require structured computation
• exploit regularities in policies, value functions
• can aid in approximation (anytime computation)

We start with propositional represent’ns of MDPs
• probabilistic STRIPS
• dynamic Bayesian networks
• BDDs/ADDs


Propositional Representations

States decomposable into state variables

$S = X_1 \times X_2 \times \cdots \times X_n$

Structured representations the norm in AI
• STRIPS, Sit-Calc., Bayesian networks, etc.
• Describe how actions affect/depend on features
• Natural, concise, can be exploited computationally

Same ideas can be used for MDPs


Robot Domain as Propositional MDP

Propositional variables for single user version
• Loc (robot’s locat’n): Off, Hall, MailR, Lab, CoffeeR
• T (lab is tidy): boolean
• CR (coffee request outstanding): boolean
• RHC (robot holding coffee): boolean
• RHM (robot holding mail): boolean
• M (mail waiting for pickup): boolean

Actions/Events
• move to an adjacent location, pickup mail, get coffee, deliver mail, deliver coffee, tidy lab
• mail arrival, coffee request issued, lab gets messy

Rewards
• rewarded for tidy lab, satisfying a coffee request, delivering mail
• (or penalized for their negation)


State Space

State of MDP: assignment to these six variables
• 160 states
• grows exponentially with number of variables

Transition matrices
• 25600 (or 25440) parameters required per matrix
• one matrix per action (6 or 7 or more actions)

Reward function
• 160 reward values needed

Factored state and action descriptions will break this exponential dependence (generally)
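The counts on this slide follow from the variable domains of the robot example: one five-valued variable (Loc) and five booleans. The short Python check below uses illustrative variable names.

```python
loc_values = 5        # Off, Hall, MailR, Lab, CoffeeR
boolean_vars = 5      # T, CR, RHC, RHM, M

n_states = loc_values * 2 ** boolean_vars
matrix_entries = n_states ** 2                  # every entry of an n x n matrix
free_parameters = n_states * (n_states - 1)     # each row sums to 1, so one entry per row is implied

print(n_states, matrix_entries, free_parameters)
```

This prints 160, 25600, and 25440, which explains the "25600 (or 25440)" on the slide: the smaller figure discounts the one entry per row that is fixed by normalization.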


Probabilistic STRIPS

PSTRIPS is a generalization of STRIPS that allows compact action (trans. matrix) represent’n

Intuition:
• state = a list of variable values (one per variable)
• state transitions = changes in variable values
• actions tend to affect only a small number of variables

PSTRIPS gains compactness by describing only how particular variables change under an action
• each distinct outcome of a stochastic action will be described by a “change list” w/ associated probability
• changes/probs can vary with initial conditions


Example of PSTRIPS Action

Action: DelC

Condition      Outcome       Probability
Off, RHC       -CR, -RHC     0.8
               -RHC          0.1
               ∅             0.1
-Off, RHC      -RHC          0.8
               ∅             0.2
-RHC           ∅             1.0

Procedural semantics: replace state values

Much more concise than explicit transition matrix
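The procedural semantics of the DelC action can be sketched in Python. This is an illustration under assumptions: "Off" is treated as a boolean "robot is in the office", rows of the table with no listed outcome are read as empty change lists, and the names `DELC` and `outcomes` are hypothetical.

```python
# Each entry: (condition, [(probability, change list), ...]).
DELC = [
    ({"Off": True, "RHC": True},
     [(0.8, {"CR": False, "RHC": False}), (0.1, {"RHC": False}), (0.1, {})]),
    ({"Off": False, "RHC": True},
     [(0.8, {"RHC": False}), (0.2, {})]),
    ({"RHC": False},
     [(1.0, {})]),
]


def outcomes(action, state):
    """Return [(probability, next_state), ...] for the first matching condition."""
    for condition, effects in action:
        if all(state[v] == value for v, value in condition.items()):
            # Replace only the values named in the change list.
            return [(p, {**state, **changes}) for p, changes in effects]
    raise ValueError("no condition matches this state")


state = {"Off": True, "RHC": True, "CR": True}
result = outcomes(DELC, state)
```

Only the variables in a change list are overwritten; everything else is copied from the current state, which is why the description stays small even when the state has many variables.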


Dynamic Bayesian Networks (DBNs)

Bayesian networks (BNs) a common representation for probability distributions

• A graph (DAG) represents conditional independence

• Tables (CPTs) quantify local probability distributions

Recall Pr(s,a,-) a distribution over S (X1 x ... x Xn)
• BNs can be used to represent this too

Before discussing dynamic BNs (DBNs), we’ll have a brief excursion into Bayesian networks


Bayes Nets

In general, joint distribution P over set of variables (X1 x ... x Xn) requires exponential space for representation and inference

BNs provide a graphical representation of conditional independence relations in P

• usually quite compact

• requires assessment of fewer parameters, those being quite natural (e.g., causal)

• efficient (usually) inference: query answering and belief update


Extreme Independence

If X1, X2,... Xn are mutually independent, then

P(X1, X2,... Xn ) = P(X1)P(X2)... P(Xn)

Joint can be specified with n parameters
• cf. the usual $2^n - 1$ parameters required

Though such extreme independence is unusual, some conditional independence is common in most domains

BNs exploit this conditional independence


An Example Bayes Net

[Figure: Bayes net with nodes Earthquake, Burglary, Alarm, Radio, Nbr1Calls, Nbr2Calls]

Pr(B=t)   Pr(B=f)
0.05      0.95

Pr(A|E,B)
e, b      0.9 (0.1)
e, ¬b     0.2 (0.8)
¬e, b     0.85 (0.15)
¬e, ¬b    0.01 (0.99)


Earthquake Example (cont’d)

If I know whether Alarm, no other evidence influences my degree of belief in Nbr1Calls
• P(N1|N2,A,E,B) = P(N1|A)
• also: P(N2|A,E,B) = P(N2|A) and P(E|B) = P(E)

By the chain rule we have

P(N1,N2,A,E,B) = P(N1|N2,A,E,B) · P(N2|A,E,B) · P(A|E,B) · P(E|B) · P(B)
               = P(N1|A) · P(N2|A) · P(A|B,E) · P(E) · P(B)

Full joint requires only 10 parameters (cf. 32)

[Figure: the Bayes net with nodes Earthquake, Burglary, Alarm, Radio, Nbr1Calls, Nbr2Calls]
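The parameter count can be reproduced by summing one number per parent configuration for each binary variable. The Python sketch below assumes the parent sets implied by the factorization P(N1|A) · P(N2|A) · P(A|B,E) · P(E) · P(B); the Radio node, which the factorization omits, is left out.

```python
# Parents of each binary variable in the factorization.
parents = {
    "B": [],
    "E": [],
    "A": ["E", "B"],
    "N1": ["A"],
    "N2": ["A"],
}

# One probability (that the variable is true) per configuration of its parents.
bn_parameters = sum(2 ** len(ps) for ps in parents.values())

# The explicit joint over five binary variables has this many entries.
joint_entries = 2 ** len(parents)

print(bn_parameters, joint_entries)
```

This gives 10 for the network against 32 entries for the full joint table (31 of them free, since the table sums to one).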


BNs: Qualitative Structure

Graphical structure of BN reflects conditional independence among variables

Each variable X is a node in the DAG

Edges denote direct probabilistic influence
• usually interpreted causally
• parents of X are denoted Par(X)

X is conditionally independent of all nondescendants given its parents
• Graphical test exists for more general independence
• “Markov Blanket”


BNs: Quantification

To complete specification of joint, quantify BN

For each variable X, specify CPT: P(X | Par(X))
• number of params locally exponential in |Par(X)|

If $X_1, X_2, \ldots, X_n$ is any topological sort of the network, then we are assured:

$P(X_n, X_{n-1}, \ldots, X_1) = P(X_n \mid X_{n-1}, \ldots, X_1) \cdot P(X_{n-1} \mid X_{n-2}, \ldots, X_1) \cdots P(X_2 \mid X_1) \cdot P(X_1)$

$= P(X_n \mid Par(X_n)) \cdot P(X_{n-1} \mid Par(X_{n-1})) \cdots P(X_1)$


Inference in BNs

The graphical independence representation gives rise to efficient inference schemes

We generally want to compute Pr(X) or Pr(X|E) where E is (conjunctive) evidence

Computations organized by network topology

One simple algorithm: variable elimination (VE)


Variable Elimination

A factor is a function from some set of variables into a specific value: e.g., f(E,A,N1)

• CPTs are factors, e.g., P(A|E,B) function of A,E,B

VE works by eliminating all variables in turn until there is a factor with only query variable

To eliminate a variable:
• join all factors containing that variable (like DB)

• sum out the influence of the variable on new factor

• exploits product form of joint distribution


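Example of VE: P(N1)

[Figure: Bayes net with nodes Earthqk, Burgl, Alarm, N1, N2]

$P(N1) = \sum_{N2,A,B,E} P(N1,N2,A,B,E)$

$= \sum_{N2,A,B,E} P(N1|A)\, P(N2|A)\, P(B)\, P(A|B,E)\, P(E)$

$= \sum_{A} P(N1|A) \sum_{N2} P(N2|A) \sum_{B} P(B) \sum_{E} P(A|B,E)\, P(E)$

$= \sum_{A} P(N1|A) \sum_{N2} P(N2|A) \sum_{B} P(B)\, f_1(A,B)$

$= \sum_{A} P(N1|A) \sum_{N2} P(N2|A)\, f_2(A)$

$= \sum_{A} P(N1|A)\, f_3(A)$

$= f_4(N1)$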
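The same elimination can be run in a few lines of Python. This is a sketch under stated assumptions: a factor is a pair (variables, table) over boolean variables, all helper names are hypothetical, P(B) and P(A|E,B) use the numbers from the example Bayes net slide (with the rows of P(A|E,B) read in the order e,b / e,¬b / ¬e,b / ¬e,¬b), and the values for P(E), P(N1|A), and P(N2|A) are made up for illustration because the slides do not give them.

```python
from itertools import product


def make_factor(variables, fn):
    """Tabulate fn(env) over all boolean assignments to the variables."""
    variables = tuple(variables)
    table = {vals: fn(dict(zip(variables, vals)))
             for vals in product((True, False), repeat=len(variables))}
    return variables, table


def cpt(child, parents, p_true):
    """Factor for P(child | parents); p_true maps parent values to P(child=True)."""
    def fn(env):
        p = p_true[tuple(env[v] for v in parents)]
        return p if env[child] else 1.0 - p
    return make_factor([child] + list(parents), fn)


def multiply(f, g):
    """Join two factors on their shared variables."""
    (fv, ft), (gv, gt) = f, g
    variables = fv + tuple(v for v in gv if v not in fv)
    return make_factor(variables, lambda env: ft[tuple(env[v] for v in fv)]
                       * gt[tuple(env[v] for v in gv)])


def sum_out(var, f):
    """Sum a variable out of a factor."""
    fv, ft = f
    table = {}
    for vals, p in ft.items():
        key = tuple(v for name, v in zip(fv, vals) if name != var)
        table[key] = table.get(key, 0.0) + p
    return tuple(v for v in fv if v != var), table


def eliminate(factors, order):
    """Variable elimination: join and sum out each variable in turn."""
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        joined = touching[0]
        for f in touching[1:]:
            joined = multiply(joined, f)
        factors = [f for f in factors if var not in f[0]] + [sum_out(var, joined)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result


T, F = True, False
factors = [
    cpt("B", [], {(): 0.05}),                        # from the slides
    cpt("E", [], {(): 0.01}),                        # hypothetical value
    cpt("A", ["E", "B"], {(T, T): 0.9, (T, F): 0.2,
                          (F, T): 0.85, (F, F): 0.01}),
    cpt("N1", ["A"], {(T,): 0.9, (F,): 0.05}),       # hypothetical values
    cpt("N2", ["A"], {(T,): 0.7, (F,): 0.01}),       # hypothetical values
]
p_n1 = eliminate(factors, ["E", "B", "N2", "A"])     # same order as the derivation
```

The intermediate factors produced for E, B, N2, and A play the roles of f1(A,B), f2(A), f3(A), and f4(N1) in the derivation, and none of them involves more than three variables.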


Notes on VE

Each operation is simply a multiplication of factors and summing out a variable

Complexity determined by size of largest factor
• e.g., in example, 3 vars (not 5)

• linear in number of vars, exponential in largest factor

• elimination ordering has great impact on factor size

• optimal elimination orderings: NP-hard

• heuristics, special structure (e.g., polytrees) exist

Practically, inference is much more tractable using structure of this sort


Dynamic BNs

Dynamic Bayes net action representation
• one Bayes net for each action a, representing the set of conditional distributions $\Pr(S^{t+1} \mid A^t, S^t)$

• each state variable occurs at time t and t+1

• dependence of t+1 variables on t variables and other t+1 variables provided (acyclic)

• no quantification of time t variables given (since we don’t care about prior over St)


DBN Representation: DelC

[Figure: two-slice DBN for DelC with time-t nodes Tt, Lt, CRt, RHCt, RHMt, Mt and time-t+1 nodes Tt+1, Lt+1, CRt+1, RHCt+1, RHMt+1, Mt+1]

fCR(Lt,CRt,RHCt,CRt+1)

L    CR   RHC   CR(t+1)   ¬CR(t+1)
O    T    T     0.2       0.8
E    T    T     1.0       0.0
O    F    T     0.0       1.0
E    F    T     0.0       1.0
O    T    F     1.0       0.0
E    T    F     1.0       0.0
O    F    F     0.0       1.0
E    F    F     0.0       1.0

fT(Tt,Tt+1)

T    T(t+1)   ¬T(t+1)
T    0.91     0.09
F    0.0      1.0

fRHM(RHMt,RHMt+1)

RHM  R(t+1)   ¬R(t+1)
T    1.0      0.0
F    0.0      1.0


Benefits of DBN Representation

$\Pr(Rm^{t+1}, M^{t+1}, T^{t+1}, L^{t+1}, Cr^{t+1}, Rc^{t+1} \mid Rm^t, M^t, T^t, L^t, Cr^t, Rc^t)$

$= f_{Rm}(Rm^t, Rm^{t+1}) \cdot f_M(M^t, M^{t+1}) \cdot f_T(T^t, T^{t+1}) \cdot f_L(L^t, L^{t+1}) \cdot f_{Cr}(L^t, Cr^t, Rc^t, Cr^{t+1}) \cdot f_{Rc}(Rc^t, Rc^{t+1})$

- Only 48 parameters vs. 25440 for matrix
- Removes global exponential dependence

         s1     s2     ...    s160
s1       0.9    0.05   ...    0.0
s2       0.0    0.20   ...    0.1
...
s160     0.1    0.0    ...    0.0

[Figure: the explicit 160 x 160 transition matrix shown beside the two-slice DBN over T, L, CR, RHC, RHM, M]
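A partial Python sketch of how the factored form is evaluated. It covers only the three factors whose tables appear on the DelC slide (fCR, fT, fRHM). The factors for L, M, and RHC are not given in the transcript and are omitted, so the function below returns the probability of the next values of CR, T, and RHM only. The location codes "O" and "E" are copied from the table, and the entry for (O, T, F) assumes that each row sums to one.

```python
# P(CR' = True | L, CR, RHC), P(T' = True | T), P(RHM' = True | RHM).
F_CR = {
    ("O", True, True): 0.2, ("E", True, True): 1.0,
    ("O", False, True): 0.0, ("E", False, True): 0.0,
    ("O", True, False): 1.0, ("E", True, False): 1.0,
    ("O", False, False): 0.0, ("E", False, False): 0.0,
}
F_T = {True: 0.91, False: 0.0}
F_RHM = {True: 1.0, False: 0.0}


def bernoulli(p_true, value):
    """Probability of `value` for a boolean with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def transition_prob(state, nxt):
    """Product of the three local factors (a partial transition probability)."""
    return (bernoulli(F_CR[(state["L"], state["CR"], state["RHC"])], nxt["CR"])
            * bernoulli(F_T[state["T"]], nxt["T"])
            * bernoulli(F_RHM[state["RHM"]], nxt["RHM"]))


state = {"L": "O", "CR": True, "RHC": True, "T": True, "RHM": False}
p = transition_prob(state, {"CR": False, "T": True, "RHM": False})
```

Each factor is looked up independently, so the cost of evaluating one transition grows with the number of variables rather than with the number of states.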


Structure in CPTs

Notice that there’s regularity in CPTs
• e.g., fCr(Lt,Crt,Rct,Crt+1) has many similar entries

• corresponds to context-specific independence in BNs

Compact function representations for CPTs can be used to great effect

• decision trees

• algebraic decision diagrams (ADDs/BDDs)

• Horn rules


Action Representation – DBN/ADD

[Figure: the DelC DBN (nodes T, L, CR, RHC, RHM, M at times t and t+1) with the CPT fCR(Lt,CRt,RHCt,CRt+1) drawn as an Algebraic Decision Diagram (ADD): decision nodes CR, RHC, and L with branches t/f and o/e, and leaves 0.0, 1.0, 0.8, 0.2 for CR(t+1)]


Reward Representation

Rewards represented with ADDs in a similar fashion
• save on $2^n$ size of vector rep’n

[Figure: reward ADD over variables JC, CP, CC, JP, BC with leaves 10, 0, 12, and 9]


Reward Representation

Rewards represented similarly
• save on $2^n$ size of vector rep’n

Additive independent reward also very common
• as in multiattribute utility theory
• offers more natural and concise representation for many types of problems

[Figure: reward written as the sum (+) of two small ADDs over variables CP, CC, and CT, with leaves 10, 0 and 20, 0]