A Hybridized Planner for Stochastic Domains
Mausam and Daniel S. WeldUniversity of Washington, Seattle
Piergiorgio BertoliITC-IRST, Trento
Planning under Uncertainty
(ICAPS’03 Workshop)
Qualitative (disjunctive) uncertainty
Which real problem can you solve?
Quantitative (probabilistic) uncertainty
Which real problem can you model?
The Quantitative View
Markov Decision Process
- models uncertainty with probabilistic outcomes
- general decision-theoretic framework
- algorithms are slow
do we need the full power of decision theory?
is an unconverged partial policy any good?
The Qualitative View
Conditional Planning
- models uncertainty as a logical disjunction of outcomes
- exploits classical planning techniques: FAST
- ignores probabilities: poor solutions

how bad are pure qualitative solutions?
can we improve the qualitative policies?
HybPlan: A Hybridized Planner
- combines probabilistic + disjunctive planners
- produces good solutions in intermediate times
- anytime: makes effective use of resources
- bounds termination with a quality guarantee

Quantitative view: completes a partial probabilistic policy by using qualitative policies in some states
Qualitative view: improves qualitative policies in more important regions
Outline
- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
- Conclusions and Future Work
Markov Decision Process
< S, A, Pr, C, s0, G >
- S : a set of states
- A : a set of actions
- Pr : probabilistic transition model
- C : cost model
- s0 : start state
- G : a set of goals

Find a policy (S → A) that minimizes the expected cost to reach a goal over an indefinite horizon, for a fully observable Markov decision process.
Optimal cost function J* ⇒ optimal policy (act greedily w.r.t. J*)
Real Time Dynamic Programming
(Barto et al. '95; Bonet & Geffner '03)
- Repeat trials until the cost function converges
- Trial = simulate the greedy policy & update the visited states
- Bellman backup: create a better approximation to the cost function at s
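The trial loop can be sketched as follows (a minimal sketch, not the authors' implementation; `actions`, `outcomes`, and `cost` are hypothetical callbacks standing in for the MDP model):

```python
import random

def bellman_backup(s, J, actions, outcomes, cost):
    """Set J[s] to the best one-step lookahead value; return the greedy action."""
    best_a, best_q = None, float("inf")
    for a in actions(s):
        # Q(s,a) = C(s,a) + sum_s' Pr(s'|s,a) * J(s'); unseen states default to 0
        q = cost(s, a) + sum(p * J.get(s2, 0.0) for p, s2 in outcomes(s, a))
        if q < best_q:
            best_a, best_q = a, q
    J[s] = best_q
    return best_a

def rtdp_trial(s0, goals, J, actions, outcomes, cost, max_steps=1000):
    """One RTDP trial: follow the greedy policy from s0, backing up each visited state."""
    s, visited = s0, {}
    for _ in range(max_steps):
        if s in goals:
            break
        a = bellman_backup(s, J, actions, outcomes, cost)
        visited[s] = visited.get(s, 0) + 1
        # sample a successor according to the probabilistic transition model
        r, acc = random.random(), 0.0
        for p, s2 in outcomes(s, a):
            acc += p
            if r <= acc:
                s = s2
                break
    return visited
```

The `visited` counts are what HybPlan later uses to decide where RTDP's partial policy can be trusted.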
Planning with Disjunctive Uncertainty
< S, A, T, s0, G >
- S : a set of states
- A : a set of actions
- T : disjunctive transition model
- s0 : the start state
- G : a set of goals

Find a strong-cyclic policy (S → A) that guarantees reaching a goal over an indefinite horizon, for a fully observable planning problem.
Model Based Planner (Bertoli et al.)
- States, transitions, etc. are represented logically
- Uncertainty ⇒ multiple possible successor states

Planning Algorithm
- Iteratively removes "bad" states
- Bad = states that don't reach anywhere, or reach other bad states
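The pruning idea can be sketched with explicit sets (a simplified sketch only; MBP itself works symbolically, and `actions` and `succ` are hypothetical callbacks for the disjunctive model):

```python
def strong_cyclic_solve(states, actions, succ, goals):
    """Iteratively prune 'bad' states: keep a state only if some action keeps
    every disjunctive successor inside the surviving set AND a goal remains
    reachable through surviving states. Returns a policy dict (state -> action)."""
    good = set(states)
    while True:
        # candidate actions: every possible successor stays inside `good`
        cand = {s: [a for a in actions(s)
                    if all(s2 in good for s2 in succ(s, a))]
                for s in good if s not in goals}
        # backward pass: keep states with a candidate action that can reach a goal
        reach, policy, changed = set(goals), {}, True
        while changed:
            changed = False
            for s, alist in cand.items():
                if s in reach:
                    continue
                for a in alist:
                    if any(s2 in reach for s2 in succ(s, a)):
                        reach.add(s)
                        policy[s] = a
                        changed = True
                        break
        if reach == good:        # fixpoint: no more bad states to remove
            return policy
        good = reach             # prune and repeat
```

States outside the returned policy (other than goals) are the "bad" states the slide refers to.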
Outline
- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
- Conclusions and Future Work
HybPlan Top-Level Code
0. run MBP to find a solution to the goal
1. run RTDP for some time
2. compute the partial greedy policy (rtdp)
3. compute the hybridized policy (hyb) by
   - hyb(s) = rtdp(s) if visited(s) > threshold
   - hyb(s) = mbp(s) otherwise
4. clean hyb by removing
   - dead ends
   - probability-1 cycles
5. evaluate hyb
6. save the best policy obtained so far
repeat until
   1) resources are exhausted, or
   2) a satisfactory policy is found
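Step 3, the policy merge, is the core of the hybridization; a minimal sketch (the function name and arguments are illustrative, not from the paper):

```python
def hybridize(rtdp_policy, mbp_policy, visited, threshold):
    """Trust RTDP's greedy action where it has enough experience,
    fall back to MBP's strong-cyclic action everywhere else."""
    hyb = dict(mbp_policy)                 # MBP covers every solvable state
    for s, a in rtdp_policy.items():
        if visited.get(s, 0) > threshold:  # RTDP visited s often enough
            hyb[s] = a
    return hyb
```

Because the MBP policy is proper, the merged policy is defined on every state RTDP has not yet explored, which is what lets HybPlan terminate early with a complete policy.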
[Figure: grid-world example; initial cost function is 0 everywhere, Goal marked]
Q1(s,N) = 1 + 0.5 × 0 + 0.5 × 0
Q1(s,N) = 1
Q1(s,S) = Q1(s,W) = Q1(s,E) = 1
J1(s) = 1
Let greedy action be North
Bellman Backup
1. run RTDP for some time
[Figure: grid-world example; simulation of the greedy action]
Simulation of Greedy Action
1. run RTDP for some time
[Figure: grid-world example; cost function after the first trial (visited states updated to 1)]
Cost Function after First Trial
1. run RTDP for some time
[Figure: grid-world example; hybridized policy combining RTDP and MBP actions]
Construct Hybridized Policy w/ MBP
3. compute hybridized policy (hyb)
(threshold = 0)
[Figure: grid-world example; costs 1 through 5 along the hybridized policy]
After the first trial, J(hyb) = 5
Evaluate Hybridized Policy
5. evaluate hyb
6. store hyb
[Figure: grid-world example; the hybridized policy contains a cycle that never reaches the Goal]
Probability-1 Cycles
repeat
  find a state s in the cycle
  hyb(s) = mbp(s)
until the cycle is broken
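The cycle-repair loop on the slide can be sketched as follows (an illustrative sketch; `succ` is a hypothetical callback listing an action's possible successors, and we assume, as in HybPlan, that MBP's policy is proper on every state it covers):

```python
def trapped_states(hyb, succ, goals):
    """States whose probability of ever reaching a goal under hyb is exactly 0,
    i.e. states caught in probability-1 cycles or dead ends."""
    can_reach = set(goals)
    changed = True
    while changed:
        changed = False
        for s, a in hyb.items():
            if s not in can_reach and any(s2 in can_reach for s2 in succ(s, a)):
                can_reach.add(s)
                changed = True
    return set(hyb) - can_reach

def break_cycles(hyb, mbp_policy, succ, goals):
    """Repeatedly redirect one trapped state to MBP's action until none remain."""
    while True:
        bad = trapped_states(hyb, succ, goals)
        fixable = [s for s in bad if hyb.get(s) != mbp_policy.get(s)]
        if not fixable:
            return hyb
        s = fixable[0]
        hyb[s] = mbp_policy[s]
```

Switching a cycle state to MBP's action restores goal reachability because the MBP policy is strong-cyclic, so every state it covers can still reach a goal.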
Error Bound
J*(s0) ≤ 5
J*(s0) ≥ 1
⇒ Error(hyb) = 5 − 1 = 4
Termination
- when a policy of the required error bound is found
- when the planning time is exhausted
- when the available memory is exhausted
Properties
- outputs a proper policy
- anytime algorithm (once MBP terminates)
- HybPlan = RTDP, if infinite resources are available
- HybPlan = MBP, if resources are extremely limited
- HybPlan = better than both, otherwise
Outline
- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
  - Anytime Properties
  - Scalability
Conclusions and Future Work
Scalability
Problem | Time before memory exhausts | J(rtdp) | J(mbp) | J(hyb)
Rov5    | ~1100 sec   |  55.36 | 67.04  | 48.16
Rov2    | ~800 sec    |      ∞ | 65.22  | 49.91
Mach9   | ~1500 sec   | 143.95 | 66.50  | 48.49
Mach6   | ~300 sec    |      ∞ | 71.56  | 71.56
Elev14  | ~10000 sec  |      ∞ | 46.49  | 44.48
Elev15  | ~10000 sec  |      ∞ | 233.07 | 87.46

(∞ : no proper policy found before memory was exhausted)
Conclusions
- First algorithm that integrates disjunctive and probabilistic planners.
- Experiments show that HybPlan:
  - is anytime
  - scales better than RTDP
  - produces better-quality solutions than MBP
  - can interleave planning and execution
Hybridized Planning: A General Notion
Hybridize other pairs of planners
- an optimal or close-to-optimal planner
- a sub-optimal but fast planner
to yield a planner that produces a good-quality solution in intermediate running times.

Examples
- POMDP : RTDP/PBVI with POND/MBP/BBSP
- Oversubscription Planning : A* with greedy solutions
- Concurrent MDP : Sampled RTDP with single-action RTDP