Probabilistic Temporal Planning with Uncertain Durations

Mausam
Joint work with Daniel S. Weld
University of Washington, Seattle

Motivation

Three features of real-world planning domains:

Concurrency
  Calibrate while the rover moves
Uncertain effects
  'Grip a rock' may fail
Uncertain durative actions
  Wheels spin, so speed is uncertain

Contributions

Novel challenges
  Large number of decision epochs
    Results to manage this blowup in different cases
  Large branching factors
    Approximation algorithms

Five planning algorithms
  DURprun: optimal
  DURsamp: near-optimal
  DURhyb: anytime with user-defined error
  DURexp: super-fast
  DURarch: balance between speed and quality

Identify fundamental issues for future research

Outline of the talk

Background
Theory
Algorithms and Experiments
Summary and Future Work

Outline of the talk

Background
  MDP
  Decision Epochs: happenings, pivots
Theory
Algorithms and Experiments
Summary and Future Work

Markov Decision Process

S: a set of states, factored into Boolean variables
A: a set of actions
Pr (S × A × S → [0,1]): the transition model
C (A → R): the cost model
s0: the start state
G: a set of absorbing goals

unit duration

GOAL of an MDP

Find a policy (S → A) which:
  minimises the expected cost of reaching a goal
  for a fully observable Markov decision process
  if the agent executes for an indefinite horizon.

Algorithms
  Value iteration, Real-Time Dynamic Programming, etc.
  (iterative dynamic programming algorithms)
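A minimal sketch of value iteration for a goal-directed, cost-minimising MDP of the kind defined above. The tiny domain (states `s0`, `s1`, `g`, actions `safe` and `risky`, and their costs) is invented purely for illustration and does not come from the talk.

```python
def value_iteration(states, actions, pr, cost, goals, eps=1e-9):
    """Return (J, policy) minimising the expected cost of reaching a goal.

    pr[s][a] is a dict {next_state: probability}; cost[a] is the action cost.
    """
    def q(s, a, J):
        # Expected cost of doing a in s, then following J afterwards.
        return cost[a] + sum(p * J[t] for t, p in pr[s][a].items())

    J = {s: 0.0 for s in states}
    while True:
        residual = 0.0
        for s in states:
            if s in goals:
                continue  # absorbing goals keep cost 0
            best = min(q(s, a, J) for a in actions if a in pr[s])
            residual = max(residual, abs(best - J[s]))
            J[s] = best  # in-place (Gauss-Seidel style) Bellman backup
        if residual < eps:
            break

    policy = {
        s: min((a for a in actions if a in pr[s]), key=lambda a: q(s, a, J))
        for s in states
        if s not in goals
    }
    return J, policy


# Hypothetical toy problem: "risky" may reach the goal directly or loop back.
states = ["s0", "s1", "g"]
actions = ["safe", "risky"]
pr = {
    "s0": {"safe": {"s1": 1.0}, "risky": {"g": 0.5, "s0": 0.5}},
    "s1": {"safe": {"g": 1.0}},
}
cost = {"safe": 1.0, "risky": 1.5}
J, policy = value_iteration(states, actions, pr, cost, goals={"g"})
```

In this toy problem the two-step `safe` route costs 2, while repeating `risky` costs 3 in expectation, so the computed policy picks `safe` in `s0`.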

Definitions (Durative Actions)

Assumption: (Prob.) TGP action model
  Preconditions must hold until the end of the action.
  Effects are usable only at the end of the action.

Decision epoch: a time point at which a new action may be started.

Happening: a time point at which an action finishes.

Pivot: a time point at which an action could finish.
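The difference between pivots and happenings can be sketched in a few lines. This assumes discrete durations and a single set of actions with known start times. The function names and the example numbers are illustrative, not taken from the talk.

```python
def pivots(start_times, possible_durations):
    """Time points at which some executing action could finish."""
    return sorted(
        {start_times[a] + d for a in start_times for d in possible_durations[a]}
    )


def happenings(start_times, actual_durations):
    """Time points at which an action actually finishes in one execution."""
    return sorted({start_times[a] + actual_durations[a] for a in start_times})


# Hypothetical example: a and b both start at time 0.
starts = {"a": 0, "b": 0}
possible = {"a": [1, 2], "b": [1, 3]}

all_pivots = pivots(starts, possible)                   # [1, 2, 3]
one_run = happenings(starts, {"a": 2, "b": 3})          # [2, 3]
```

Every happening is a pivot, but not every pivot becomes a happening in a given execution: here time 1 is a pivot at which nothing actually finishes.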

Outline of the talk

Background
Theory
  Explosion of Decision Epochs
Algorithms and Experiments
Summary and Future Work

Decision Epochs (TGP Action Model)

Deterministic durations [Mausam & Weld 05]:
  Decision epochs = set of happenings

Uncertain durations:
  Non-termination has information!
  Theorem: Decision epochs = set of pivots

Illustration: A bimodal distribution

[Figure: duration distribution of action a, plotted against expected completion time.]
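One way to see why non-termination carries information is to condition a duration distribution on the action still running. The sketch below assumes a discrete distribution given as `{duration: probability}`. The bimodal numbers are hypothetical and are not the ones plotted on the slide.

```python
def expected_duration(dist):
    """Mean of a discrete duration distribution {duration: probability}."""
    return sum(d * p for d, p in dist.items())


def condition_on_running(dist, elapsed):
    """Distribution of the total duration given the action has not
    finished after `elapsed` time units."""
    tail = {d: p for d, p in dist.items() if d > elapsed}
    mass = sum(tail.values())
    if mass == 0:
        raise ValueError("action must have finished by now")
    return {d: p / mass for d, p in tail.items()}


# Hypothetical bimodal action: either quick (1) or slow (9).
bimodal = {1: 0.5, 9: 0.5}

before = expected_duration(bimodal)                            # 5.0
after = expected_duration(condition_on_running(bimodal, 1))    # 9.0
```

Observing that the action did not finish at time 1 moves the expected completion time from 5 to 9, even though nothing "happened" at that point. A planner may therefore want to act at such a pivot.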

Conjecture

If all actions have
  duration distributions independent of effects, and
  unimodal duration distributions,
then Decision epochs = set of happenings.

Outline of the talk

Background
Theory
Algorithms and Experiments
  Expected Durations Planner
  Archetypal Durations Planner
Summary and Future Work

Planning with Durative Actions

MDP in an augmented state space

<X, ∅>

<X1, {(a,4), (c,4)}>
X1: application of b on X.

[Figure: timeline of actions a, b, and c started from state X; Time axis marked 0, 2, 4, 6.]

Uncertain Durations: Transition Fn

action a: uniform(1,2)
action b: uniform(1,2)

[Figure: transition function for starting a and b together in <X, ∅>. Four branches, each with probability 0.25, lead to <Xa, {(b,1)}>, <Xb, {(a,1)}>, <Xab, ∅>, and <Xab, ∅>.]
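The 0.25 branch probabilities can be reproduced with a short sketch. It assumes that uniform(1,2) means a discrete distribution over durations 1 and 2 with probability 0.5 each, and that the two durations are independent. The function name and data layout are illustrative only.

```python
from itertools import product


def first_pivot_outcomes(dists):
    """Probability of each set of actions finishing at the earliest pivot.

    dists maps action name -> {duration: probability}; all actions are
    assumed to start together and to have independent durations.
    """
    first = min(min(dist) for dist in dists.values())  # earliest pivot
    names = sorted(dists)
    outcomes = {}
    for flags in product([True, False], repeat=len(names)):
        prob = 1.0
        for name, done in zip(names, flags):
            p_done = dists[name].get(first, 0.0)
            prob *= p_done if done else 1.0 - p_done
        if prob > 0:
            finished = frozenset(n for n, done in zip(names, flags) if done)
            outcomes[finished] = prob
    return outcomes


uniform_1_2 = {1: 0.5, 2: 0.5}  # assumed reading of uniform(1,2)
outcomes = first_pivot_outcomes({"a": uniform_1_2, "b": uniform_1_2})
```

The four outcomes are: only a finishes, only b finishes, both finish, or neither finishes at time 1. Each has probability 0.25. In the last case both actions must finish at time 2, which is consistent with <Xab, ∅> appearing twice among the successors.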

Branching Factor

If
  n actions,
  m possible durations,
  r probabilistic effects,
then
  potential successors = (m-1)[(r+1)^n - r^n - 1] + r^n
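To get a feel for how quickly this count grows, the expression can be evaluated directly. The function name and the sample values of n, m, and r are chosen only for illustration.

```python
def potential_successors(n, m, r):
    """Evaluate (m-1)[(r+1)^n - r^n - 1] + r^n for n concurrent actions,
    m possible durations each, and r probabilistic effects each."""
    return (m - 1) * ((r + 1) ** n - r ** n - 1) + r ** n


small = potential_successors(n=2, m=2, r=1)    # 3
medium = potential_successors(n=3, m=4, r=2)   # 62
large = potential_successors(n=5, m=10, r=3)   # 7263
```

Even a handful of concurrent actions with a few durations and effects gives thousands of potential successors per transition, which is why the approximation algorithms matter.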

Algorithms

Five planning algorithms
  DURprun: optimal
  DURsamp: near-optimal
  DURhyb: anytime with user-defined error
  DURexp: super-fast
  DURarch: balance between speed and quality

Expected Durations Planner (DURexp)

Assign each action a deterministic duration equal to the expected value of its distribution.

Build a deterministic-duration policy for this domain.

Repeat:
  Execute this policy and wait for an interrupt.
    (a) Action terminated as expected: do nothing.
    (b) Action terminated early: replan from this state.
    (c) Action a terminated late: revise a's deterministic duration and replan for this domain.
until the goal is reached.
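The bookkeeping pieces of this loop can be sketched as follows. The slide does not say how a late action's duration is revised; the rule used here (the expected duration conditioned on the action still running) is an assumption, as are all function names. The deterministic planner itself is out of scope for the sketch.

```python
def expected_duration(dist):
    """Mean of a discrete duration distribution {duration: probability}."""
    return sum(d * p for d, p in dist.items())


def determinize(dists):
    """Step 1: one deterministic duration per action (its expected value)."""
    return {a: expected_duration(dist) for a, dist in dists.items()}


def interrupt_kind(planned, elapsed, finished):
    """Classify what the executor observes `elapsed` units after the start."""
    if finished:
        if elapsed == planned:
            return "as expected"   # case (a): do nothing
        return "early" if elapsed < planned else "late"
    # Still running: it is late once the planned duration has passed.
    return "late" if elapsed >= planned else "running"


def revise_duration(dist, elapsed):
    """ASSUMED revision rule for case (c): the expected duration given
    that the action is still running after `elapsed` units."""
    tail = {d: p for d, p in dist.items() if d > elapsed}
    mass = sum(tail.values())
    return sum(d * p for d, p in tail.items()) / mass


# Hypothetical action with two possible durations.
dists = {"a": {1: 0.5, 3: 0.5}}
planned = determinize(dists)                          # {"a": 2.0}
kind = interrupt_kind(planned["a"], 2.0, False)       # "late"
revised = revise_duration(dists["a"], 2.0)            # 3.0
```

A full executor would call the deterministic planner again after an "early" or "late" interrupt, using the revised durations in the late case.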

Planning Time

[Figure: Planning time for Rover and Machine-Shop. x-axis: Problems 1 to 10, grouped by domain (Rover, Machine-Shop); y-axis: planning time in seconds, 0 to 6000; series: DURprun (Pruned), DURsamp (Sampled), DURhyb (Hybrid), DURexp (Exp-Dur).]

Multi-modal distributions

Recall: the conjecture holds only for unimodal distributions.

Decision epochs =
  happenings, if unimodal
  pivots, if multimodal

Multi-modal Durations: Transition Fn

action a: uniform(1,2)
action b: 50%: 1, 50%: 3

[Figure: transition function for starting a and b together in <X, ∅>. Four branches, each with probability 0.25, lead to <Xa, {(b,1)}>, <Xb, {(a,1)}>, <Xab, ∅>, and <X, {(a,1), (b,1)}>.]

Multi-modal Distributions

Expected Durations Planner (DURexp)
  One deterministic duration per action
  Big approximation for a multi-modal distribution

Archetypal Durations Planner (DURarch)
  Limited uncertainty in durations
  One duration per mode of the distribution
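The contrast between one duration per action and one duration per mode can be illustrated with a small sketch. The slide does not say how modes are identified or which representative is used for each. Here the caller supplies the cut points between modes, and each mode is summarised by its conditional mean and probability mass. All of that, including the sample distribution, is an assumption for illustration.

```python
def expected_duration(dist):
    """Single representative used by an expected-durations approach."""
    return sum(d * p for d, p in dist.items())


def archetypal_durations(dist, cuts):
    """Split a discrete distribution {duration: probability} at the given
    cut points and return one (mean duration, probability mass) per mode."""
    bounds = [float("-inf")] + sorted(cuts) + [float("inf")]
    archetypes = []
    for lo, hi in zip(bounds, bounds[1:]):
        part = {d: p for d, p in dist.items() if lo < d <= hi}
        mass = sum(part.values())
        if mass > 0:
            mean = sum(d * p for d, p in part.items()) / mass
            archetypes.append((mean, mass))
    return archetypes


# Hypothetical two-mode distribution: short (1 or 2) or long (9 or 10).
dist = {1: 0.25, 2: 0.25, 9: 0.25, 10: 0.25}

single = expected_duration(dist)                    # 5.5
per_mode = archetypal_durations(dist, cuts=[5])     # [(1.5, 0.5), (9.5, 0.5)]
```

The single expected duration of 5.5 lies in neither mode, whereas the two archetypes keep the short and long cases distinct at the cost of retaining a little uncertainty.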

Planning Time (multi-modal)

[Figure: Planning time in MachineShop (multi-modal). x-axis: Problems 11 to 16; y-axis: planning time, log scale, 100 to 10000; series: DURprun (Pruned), DURsamp (Sampled), DURhyb (Hybrid), DURarch (Arch-Dur), DURexp (Exp-Dur).]

Expected Make-span (multi-modal)

[Figure: Make-span in MachineShop (multi-modal). x-axis: Problems 11 to 16; y-axis: J*(s0), 14 to 28; series: DURprun, DURsamp, DURhyb, DURarch, DURexp.]

Outline of the talk

Background
Theory
Algorithms and Experiments
Summary and Future Work

Observations on Concurrency

Summary

Large number of decision epochs
  Results to manage the explosion in specific cases

Large branching factors
  Expected Durations Planner
  Archetypal Durations Planner (multi-modal)

Handling Complex Action Models

So far: Probabilistic TGP
  Preconditions hold over-all.
  Effects usable only at end.

What about Probabilistic PDDL2.1?
  Preconditions: at-start, over-all, at-end
  Effects: at-start, at-end

Decision epochs must be arbitrary points.

Ramifications

The result is independent of uncertainty!

Existing decision-epoch planners are incomplete.
  SAPA, Prottle, etc.
  All IPC winners

[Figure: example with actions a and b and goal G, annotated with preconditions and effects over p, q, ¬p, and ¬q.]

Related Work

Tempastic (Younes and Simmons '04)
  Generate, Test and Debug

Prottle (Little, Aberdeen, Thiebaux '05)
  Planning-graph-based heuristics

Uncertain durations w/o concurrency
  Foss and Onder '05
  Boyan and Littman '00
  Bresina et al. '02, Dearden et al. '03
