A Hybridized Planner for Stochastic Domains
Mausam and Daniel S. WeldUniversity of Washington, Seattle
Piergiorgio BertoliITC-IRST, Trento
Planning under Uncertainty
(ICAPS’03 Workshop)
Qualitative (disjunctive) uncertainty
Which real problem can you solve?
Quantitative (probabilistic) uncertainty
Which real problem can you model?
The Quantitative View
Markov Decision Process
- models uncertainty with probabilistic outcomes
- general decision-theoretic framework
- algorithms are slow
do we need the full power of decision theory?
is an unconverged partial policy any good?
The Qualitative View
Conditional Planning
- models uncertainty as a logical disjunction of outcomes
- exploits classical planning techniques: FAST
- ignores probabilities: poor solutions

how bad are pure qualitative solutions?
can we improve the qualitative policies?
HybPlan: A Hybridized Planner
- combines probabilistic + disjunctive planners
- produces good solutions in intermediate times
- anytime: makes effective use of resources
- bounds termination with a quality guarantee

Quantitative view: completes a partial probabilistic policy by using qualitative policies in some states
Qualitative view: improves qualitative policies in more important regions
Outline
- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
- Conclusions and Future Work
Markov Decision Process
< S, A, Pr, C, s0, G >
- S : a set of states
- A : a set of actions
- Pr : probabilistic transition model
- C : cost model
- s0 : start state
- G : a set of goals

Find a policy (S → A) that minimizes the expected cost to reach a goal over an indefinite horizon, for a fully observable Markov decision process.
Optimal cost function J* ⇒ optimal policy (act greedily w.r.t. J*)
Real Time Dynamic Programming
(Barto et al. '95; Bonet & Geffner '03)
- Repeat trials until the cost function converges
- Trial = simulate the greedy policy & update the visited states
- Bellman backup: create a better approximation to the cost function at s
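The trial loop can be sketched as follows (a minimal sketch, not the authors' implementation; `actions`, `outcomes`, and `cost` are hypothetical callbacks standing in for the MDP model):

```python
import random

def bellman_backup(s, J, actions, outcomes, cost):
    """Set J[s] to the best one-step lookahead value; return the greedy action."""
    best_a, best_q = None, float("inf")
    for a in actions(s):
        # Q(s,a) = C(s,a) + sum_s' Pr(s'|s,a) * J(s'); unseen states default to 0
        q = cost(s, a) + sum(p * J.get(s2, 0.0) for p, s2 in outcomes(s, a))
        if q < best_q:
            best_a, best_q = a, q
    J[s] = best_q
    return best_a

def rtdp_trial(s0, goals, J, actions, outcomes, cost, max_steps=1000):
    """One RTDP trial: follow the greedy policy from s0, backing up each visited state."""
    s, visited = s0, {}
    for _ in range(max_steps):
        if s in goals:
            break
        a = bellman_backup(s, J, actions, outcomes, cost)
        visited[s] = visited.get(s, 0) + 1
        # sample a successor according to the probabilistic transition model
        r, acc = random.random(), 0.0
        for p, s2 in outcomes(s, a):
            acc += p
            if r <= acc:
                s = s2
                break
    return visited
```

The `visited` counts are what HybPlan later uses to decide where RTDP's partial policy can be trusted.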
Planning with Disjunctive Uncertainty
< S, A, T, s0, G >
- S : a set of states
- A : a set of actions
- T : disjunctive transition model
- s0 : the start state
- G : a set of goals

Find a strong-cyclic policy (S → A) that guarantees reaching a goal over an indefinite horizon, for a fully observable planning problem.
Model Based Planner (Bertoli et al.)
- States, transitions, etc. are represented logically
- Uncertainty ⇒ multiple possible successor states

Planning Algorithm
- Iteratively removes "bad" states
- Bad = states that don't reach anywhere, or reach other bad states
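The pruning idea can be sketched with explicit sets (a simplified sketch only; MBP itself works symbolically, and `actions` and `succ` are hypothetical callbacks for the disjunctive model):

```python
def strong_cyclic_solve(states, actions, succ, goals):
    """Iteratively prune 'bad' states: keep a state only if some action keeps
    every disjunctive successor inside the surviving set AND a goal remains
    reachable through surviving states. Returns a policy dict (state -> action)."""
    good = set(states)
    while True:
        # candidate actions: every possible successor stays inside `good`
        cand = {s: [a for a in actions(s)
                    if all(s2 in good for s2 in succ(s, a))]
                for s in good if s not in goals}
        # backward pass: keep states with a candidate action that can reach a goal
        reach, policy, changed = set(goals), {}, True
        while changed:
            changed = False
            for s, alist in cand.items():
                if s in reach:
                    continue
                for a in alist:
                    if any(s2 in reach for s2 in succ(s, a)):
                        reach.add(s)
                        policy[s] = a
                        changed = True
                        break
        if reach == good:        # fixpoint: no more bad states to remove
            return policy
        good = reach             # prune and repeat
```

States outside the returned policy (other than goals) are the "bad" states the slide refers to.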
Outline
- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
- Conclusions and Future Work
HybPlan Top-Level Code
0. run MBP to find a solution to the goal
1. run RTDP for some time
2. compute the partial greedy policy (rtdp)
3. compute the hybridized policy (hyb) by
   - hyb(s) = rtdp(s) if visited(s) > threshold
   - hyb(s) = mbp(s) otherwise
4. clean hyb by removing
   - dead ends
   - probability-1 cycles
5. evaluate hyb
6. save the best policy obtained so far
repeat until
   1) resources are exhausted, or
   2) a satisfactory policy is found
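Step 3, the policy merge, is the core of the hybridization; a minimal sketch (the function name and arguments are illustrative, not from the paper):

```python
def hybridize(rtdp_policy, mbp_policy, visited, threshold):
    """Trust RTDP's greedy action where it has enough experience,
    fall back to MBP's strong-cyclic action everywhere else."""
    hyb = dict(mbp_policy)                 # MBP covers every solvable state
    for s, a in rtdp_policy.items():
        if visited.get(s, 0) > threshold:  # RTDP visited s often enough
            hyb[s] = a
    return hyb
```

Because the MBP policy is proper, the merged policy is defined on every state RTDP has not yet explored, which is what lets HybPlan terminate early with a complete policy.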
[Figure: grid-world example; initial cost function is 0 everywhere, Goal marked]
Q1(s,N) = 1 + 0.5 × 0 + 0.5 × 0
Q1(s,N) = 1
Q1(s,S) = Q1(s,W) = Q1(s,E) = 1
J1(s) = 1
Let greedy action be North
Bellman Backup
1. run RTDP for some time
[Figure: grid-world example; simulation of the greedy action]
Simulation of Greedy Action
1. run RTDP for some time
[Figure: grid-world example; cost function after the first trial (visited states updated to 1)]
Cost Function after First Trial
1. run RTDP for some time
[Figure: grid-world example; hybridized policy combining RTDP and MBP actions]
Construct Hybridized Policy w/ MBP
3. compute hybridized policy (hyb)
(threshold = 0)
[Figure: grid-world example; costs 1 through 5 along the hybridized policy]
After the first trial, J(hyb) = 5
Evaluate Hybridized Policy
5. evaluate hyb
6. store hyb
[Figure: grid-world example; the hybridized policy contains a cycle that never reaches the Goal]
Probability-1 Cycles
repeat
  find a state s in the cycle
  hyb(s) = mbp(s)
until the cycle is broken
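The cycle-repair loop on the slide can be sketched as follows (an illustrative sketch; `succ` is a hypothetical callback listing an action's possible successors, and we assume, as in HybPlan, that MBP's policy is proper on every state it covers):

```python
def trapped_states(hyb, succ, goals):
    """States whose probability of ever reaching a goal under hyb is exactly 0,
    i.e. states caught in probability-1 cycles or dead ends."""
    can_reach = set(goals)
    changed = True
    while changed:
        changed = False
        for s, a in hyb.items():
            if s not in can_reach and any(s2 in can_reach for s2 in succ(s, a)):
                can_reach.add(s)
                changed = True
    return set(hyb) - can_reach

def break_cycles(hyb, mbp_policy, succ, goals):
    """Repeatedly redirect one trapped state to MBP's action until none remain."""
    while True:
        bad = trapped_states(hyb, succ, goals)
        fixable = [s for s in bad if hyb.get(s) != mbp_policy.get(s)]
        if not fixable:
            return hyb
        s = fixable[0]
        hyb[s] = mbp_policy[s]
```

Switching a cycle state to MBP's action restores goal reachability because the MBP policy is strong-cyclic, so every state it covers can still reach a goal.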
Error Bound
J*(s0) ≤ 5
J*(s0) ≥ 1
⇒ Error(hyb) = 5 − 1 = 4
Termination
- when a policy of the required error bound is found
- when the planning time is exhausted
- when the available memory is exhausted
Properties
- outputs a proper policy
- anytime algorithm (once MBP terminates)
- HybPlan = RTDP, if infinite resources are available
- HybPlan = MBP, if resources are extremely limited
- HybPlan = better than both, otherwise
Outline
- Motivation
- Planning with Probabilistic Uncertainty (RTDP)
- Planning with Disjunctive Uncertainty (MBP)
- Hybridizing RTDP and MBP (HybPlan)
- Experiments
  - Anytime Properties
  - Scalability
Conclusions and Future Work
Scalability
Problem | Time before memory exhausts | J(rtdp) | J(mbp) | J(hyb)
Rov5    | ~1100 sec   |  55.36 | 67.04  | 48.16
Rov2    | ~800 sec    |      ∞ | 65.22  | 49.91
Mach9   | ~1500 sec   | 143.95 | 66.50  | 48.49
Mach6   | ~300 sec    |      ∞ | 71.56  | 71.56
Elev14  | ~10000 sec  |      ∞ | 46.49  | 44.48
Elev15  | ~10000 sec  |      ∞ | 233.07 | 87.46

(∞ : no proper policy found before memory was exhausted)
Conclusions
- First algorithm that integrates disjunctive and probabilistic planners.
- Experiments show that HybPlan:
  - is anytime
  - scales better than RTDP
  - produces better-quality solutions than MBP
  - can interleave planning and execution
Hybridized Planning: A General Notion
Hybridize other pairs of planners
- an optimal or close-to-optimal planner
- a sub-optimal but fast planner
to yield a planner that produces a good-quality solution in intermediate running times.

Examples
- POMDP : RTDP/PBVI with POND/MBP/BBSP
- Oversubscription Planning : A* with greedy solutions
- Concurrent MDP : Sampled RTDP with single-action RTDP