![Page 1: A Framework for Planning in Continuous-time Stochastic Domains Håkan L. S. Younes Carnegie Mellon University David J. MuslinerReid G. Simmons Honeywell](https://reader035.vdocuments.mx/reader035/viewer/2022070323/56649dc75503460f94abc028/html5/thumbnails/1.jpg)
A Framework for Planning in Continuous-time Stochastic Domains
Håkan L. S. Younes, Carnegie Mellon University
David J. Musliner, Honeywell Laboratories
Reid G. Simmons, Carnegie Mellon University
Introduction
Policy generation for complex domains:
  Uncertainty in outcome and timing of actions and events
  Time as a continuous quantity
  Concurrency
Rich goal formalism:
  Achievement, maintenance, prevention
  Deadlines
Motivating Example
Deliver package from CMU to Honeywell
[Figure: map showing CMU and the PIT airport in Pittsburgh, and the MSP airport and Honeywell in Minneapolis]
Elements of Uncertainty
Uncertain duration of flight and taxi ride
Plane can get full without reservation
Taxi might not be at airport when arriving in Minneapolis
Package can get lost at airports
Modeling Uncertainty
Associate a delay distribution F(t) with each action/event a
F(t) is the cumulative distribution function for the delay from when a is enabled until it triggers
[Figure: Pittsburgh Taxi model. State "driving" moves to state "at airport" on event "arrive", with delay distribution U(20,40)]
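As a small illustration of a delay distribution, the sketch below writes out F(t) for the U(20,40) delay of the "arrive" event and compares it with sampled delays. The function names and the seed are illustrative assumptions, not part of the talk.

```python
import random

def uniform_cdf(t, a, b):
    """F(t) for a delay distributed as U(a, b)."""
    if t <= a:
        return 0.0
    if t >= b:
        return 1.0
    return (t - a) / (b - a)

def sample_uniform_delay(a, b, rng):
    """Draw one delay (time from enabling to triggering) from U(a, b)."""
    return rng.uniform(a, b)

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
delays = [sample_uniform_delay(20, 40, rng) for _ in range(10_000)]

# The fraction of sampled delays that are at most 30 should be close to F(30).
frac = sum(d <= 30 for d in delays) / len(delays)
print(uniform_cdf(30, 20, 40), round(frac, 2))
```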
Concurrency
Concurrent semi-Markov processes
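[Figure: two concurrent processes. Pittsburgh Taxi (PT): "driving" to "at airport" on event "arrive", U(20,40). Minneapolis Taxi (MT): "at airport" to "moving" on event "move", Exp(1/40); "moving" to "at airport" on event "return", U(10,20). Example joint trajectory: (PT driving, MT at airport) at t=0 becomes (PT driving, MT moving) at t=24 when the MT "move" event triggers.]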
Generalized semi-Markov process
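A minimal sketch of how the two taxi processes could be simulated together as one generalized semi-Markov process. It assumes the usual clock-based reading: every enabled event holds a sampled trigger time, the earliest one fires, and events that remain enabled keep their clocks. The event table layout, function names, and horizon are assumptions made for illustration; Exp(1/40) is read as an exponential distribution with rate 1/40.

```python
import random

# Event table: name -> (component, source state, target state, delay sampler).
EVENTS = {
    "arrive": ("PT", "driving", "at airport", lambda r: r.uniform(20, 40)),
    "move": ("MT", "at airport", "moving", lambda r: r.expovariate(1 / 40)),
    "return": ("MT", "moving", "at airport", lambda r: r.uniform(10, 20)),
}

def enabled(state):
    """Events whose source state matches the current state of their component."""
    return {e for e, (comp, src, _, _) in EVENTS.items() if state[comp] == src}

def simulate(horizon, rng):
    """Return a list of (time, joint state) pairs up to the time horizon."""
    state = {"PT": "driving", "MT": "at airport"}
    now = 0.0
    # Each enabled event gets a clock holding its absolute trigger time.
    clocks = {e: now + EVENTS[e][3](rng) for e in enabled(state)}
    trace = [(now, dict(state))]
    while clocks:
        event = min(clocks, key=clocks.get)  # earliest trigger time wins
        if clocks[event] > horizon:
            break
        now = clocks.pop(event)
        comp, _, target, _ = EVENTS[event]
        state[comp] = target
        still_enabled = enabled(state)
        # Keep clocks of events that stay enabled, drop the disabled ones.
        clocks = {e: t for e, t in clocks.items() if e in still_enabled}
        # Newly enabled events get freshly sampled clocks.
        for e in still_enabled - set(clocks):
            clocks[e] = now + EVENTS[e][3](rng)
        trace.append((now, dict(state)))
    return trace

if __name__ == "__main__":
    for time, joint in simulate(120, random.Random(1)):
        print(f"t={time:6.1f}  PT {joint['PT']:<10}  MT {joint['MT']}")
```

Note that the "arrive" clock is not resampled when the Minneapolis taxi moves, which is what distinguishes this from simulating a single semi-Markov process over the joint state.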
Rich Goal Formalism
Goals specified as CSL formulae:
  φ ::= true | a | φ ∧ φ | ¬φ | Pr≥θ(ρ)
  ρ ::= φ U≤t φ | ◊≤t φ | □≤t φ
Goal for Motivating Example
Probability at least 0.9 that the package reaches Honeywell within 300 minutes without getting lost on the way:
  Pr≥0.9(¬pkg lost U≤300 pkg@Honeywell)
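To make the time-bounded "until" in this goal concrete, here is a hedged sketch of checking it on a single sample path. The path encoding (a list of entry times paired with sets of atomic propositions), the function name, and the three example paths are all hypothetical and chosen only for illustration.

```python
def holds_until(path, phi1, phi2, bound):
    """Check "phi1 U<=bound phi2" on a piecewise-constant sample path.

    path is a list of (entry_time, state) pairs in time order, where each
    state is a set of atomic propositions; phi1 and phi2 are predicates.
    """
    for entry_time, state in path:
        if entry_time > bound:
            return False  # deadline passed before phi2 became true
        if phi2(state):
            return True   # phi2 reached in time, phi1 held in all earlier states
        if not phi1(state):
            return False  # phi1 violated before phi2 was reached
    return False          # path ended without reaching phi2

not_lost = lambda s: "pkg lost" not in s
at_honeywell = lambda s: "pkg@Honeywell" in s

# Hypothetical sample paths (times in minutes).
on_time = [(0, {"pkg@CMU"}), (35, {"pkg@PIT"}), (190, {"pkg@MSP"}), (250, {"pkg@Honeywell"})]
lost = [(0, {"pkg@CMU"}), (35, {"pkg@PIT"}), (190, {"pkg lost"})]
late = [(0, {"pkg@CMU"}), (35, {"pkg@PIT"}), (190, {"pkg@MSP"}), (320, {"pkg@Honeywell"})]

for name, path in [("on_time", on_time), ("lost", lost), ("late", late)]:
    print(name, holds_until(path, not_lost, at_honeywell, 300))
```

A path counts as a positive sample when the check returns True and as a negative sample otherwise.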
Problem Specification
Given:
  Complex domain model: stochastic discrete event system
  Initial state
  Probabilistic temporally extended goal: CSL formula
Wanted:
  Policy satisfying goal formula in initial state
Generate, Test and Debug [Simmons 88]
Generate initial policy
Test if policy is good
Debug and repair policy
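[Flowchart: Generate, then Test. If the policy is good, stop. If it is bad, Debug and repair, then repeat the Test.]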
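The control flow of this loop can be sketched as follows. The three callables and the iteration cap are placeholders assumed for illustration; the toy usage treats a policy as a plain integer only to make the loop runnable.

```python
def generate_test_debug(generate, test, debug, max_iterations=100):
    """Generic Generate, Test and Debug loop.

    generate() returns an initial policy.
    test(policy) returns (is_good, sample_paths).
    debug(policy, sample_paths) returns a repaired policy.
    """
    policy = generate()
    for _ in range(max_iterations):
        is_good, sample_paths = test(policy)
        if is_good:
            return policy
        policy = debug(policy, sample_paths)
    return None  # no acceptable policy found within the iteration cap

# Toy usage: the "policy" is an integer that is considered good once it reaches 3.
result = generate_test_debug(
    generate=lambda: 0,
    test=lambda p: (p >= 3, []),
    debug=lambda p, paths: p + 1,
)
print(result)
```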
Generate
Ways of generating initial policy:
  Generate policy for relaxed problem
  Use existing policy for similar problem
  Start with null policy
  Start with random policy
Not focus of this talk!
Test
Use discrete event simulation to generate sample execution paths
Use acceptance sampling to verify probabilistic CSL goal conditions
Debug
Analyze sample paths generated in test step to find reasons for failure
Change policy to reflect outcome of failure analysis
More on Test Step
Error Due to Sampling
Probability of false negative: ≤ α
  Rejecting a good policy
Probability of false positive: ≤ β
  Accepting a bad policy
(1 − β)-soundness
Acceptance Sampling
Hypothesis: Pr≥θ(ρ)
Performance of Test
[Plot: probability of accepting Pr≥θ(ρ) as true (vertical axis) against the actual probability of ρ holding (horizontal axis)]
Ideal Performance
[Plot: probability of accepting Pr≥θ(ρ) as true (vertical axis) against the actual probability of ρ holding (horizontal axis), with regions labeled "False negatives" and "False positives"]
Realistic Performance
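Indifference region: from θ − δ to θ + δ
[Plot: probability of accepting Pr≥θ(ρ) as true (vertical axis) against the actual probability of ρ holding (horizontal axis), with regions labeled "False negatives" and "False positives" on either side of the indifference region]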
Sequential Acceptance Sampling [Wald 45]
Hypothesis: Pr≥θ(ρ)
True, false, or another sample?
Graphical Representation of Sequential Test
[Plot: number of positive samples (vertical axis) against number of samples (horizontal axis)]
Graphical Representation of Sequential Test
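We can find an acceptance line and a rejection line given θ, δ, α, and β
[Plot: number of positive samples (vertical axis) against number of samples (horizontal axis). The acceptance line A_{θ,δ,α,β}(n) and the rejection line R_{θ,δ,α,β}(n) split the plane into three regions: Accept, Continue sampling, and Reject.]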
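The two lines can be written down explicitly for Wald's sequential probability ratio test. The sketch below assumes the hypotheses p ≥ θ + δ against p ≤ θ − δ, with α bounding false negatives and β bounding false positives; it requires 0 < θ − δ and θ + δ < 1. Function names, the sample cap, and the example parameters are illustrative assumptions.

```python
import math
import random

def acceptance_lines(theta, delta, alpha, beta):
    """Return (accept_line, reject_line) as functions of the sample count n."""
    p0, p1 = theta + delta, theta - delta
    a = math.log(p1 / p0)              # log-ratio contribution of a positive sample (< 0)
    b = math.log((1 - p1) / (1 - p0))  # log-ratio contribution of a negative sample (> 0)
    slope = b / (b - a)                # common slope, lies between p1 and p0
    accept_intercept = math.log((1 - alpha) / beta) / (b - a)
    reject_intercept = -math.log((1 - beta) / alpha) / (b - a)
    return (lambda n: slope * n + accept_intercept,
            lambda n: slope * n + reject_intercept)

def sequential_test(sample, theta, delta, alpha, beta, max_samples=1_000_000):
    """Draw samples until the count of positives crosses one of the two lines.

    sample() returns True for a positive sample path, False for a negative one.
    Returns (verdict, samples_used); verdict is None if the cap is reached.
    """
    accept_line, reject_line = acceptance_lines(theta, delta, alpha, beta)
    positives = 0
    for n in range(1, max_samples + 1):
        positives += bool(sample())
        if positives >= accept_line(n):
            return True, n
        if positives <= reject_line(n):
            return False, n
    return None, max_samples

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical policy whose paths satisfy the goal with probability 0.95.
    print(sequential_test(lambda: rng.random() < 0.95, 0.9, 0.01, 0.01, 0.01))
```

The number of samples needed is not fixed in advance: it grows as the actual probability approaches the indifference region and shrinks when the policy is clearly good or clearly bad.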
Graphical Representation of Sequential Test
Reject hypothesis
[Plot: number of positive samples (vertical axis) against number of samples (horizontal axis), with regions labeled Reject, Accept, and Continue sampling]
Graphical Representation of Sequential Test
Accept hypothesis
[Plot: number of positive samples (vertical axis) against number of samples (horizontal axis), with regions labeled Reject, Accept, and Continue sampling]
Anytime Policy Verification
Find best acceptance and rejection lines after each sample in terms of α and β
[Plot: number of positive samples (vertical axis) against number of samples (horizontal axis)]
Verification Example
Initial policy for example problem
[Plot: error probability (vertical axis, 0 to 0.5) against CPU time in seconds (horizontal axis, 0 to 0.12), with curves labeled "reject" and "accept"; δ = 0.01]
More on Debug Step
Role of Negative Sample Paths
Negative sample paths provide evidence on how the policy can fail
  "Counterexamples"
Generic Repair Procedure
1. Select some state along some negative sample path
2. Change the action planned for the selected state
Need heuristics to make informed state/action choices
Scoring States
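Assign −1 to the last state along a negative sample path and propagate backwards, discounting by γ at each step
Add over all negative sample paths
[Figure: two negative sample paths with their state values.
  Path 1: s1 (−γ⁴), s5 (−γ³), s2 (−γ²), s9 (−γ), failure (−1)
  Path 2: s1 (−γ³), s5 (−γ²), s3 (−γ), failure (−1)
  State s5, highlighted on both paths, receives the sum of its values over the two paths.]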
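A minimal sketch of this scoring scheme, assuming that the value −1 assigned to the last state is multiplied by a discount factor (named gamma here, with 0.9 as an arbitrary example value) at each step backwards, and that values are summed per state over all negative sample paths. The function name and the choice to rank states by lowest total are illustrative assumptions.

```python
from collections import defaultdict

def score_states(negative_paths, gamma):
    """Sum backward-propagated failure values per state over all paths."""
    scores = defaultdict(float)
    for path in negative_paths:
        value = -1.0                  # value of the last state on the path
        for state in reversed(path):
            scores[state] += value
            value *= gamma            # discount one step further from the failure
    return dict(scores)

# The two negative sample paths from the slide.
paths = [
    ["s1", "s5", "s2", "s9", "failure"],
    ["s1", "s5", "s3", "failure"],
]
scores = score_states(paths, gamma=0.9)

# Candidate state for repair: the non-failure state with the lowest total score.
worst = min((s for s in scores if s != "failure"), key=scores.get)
print(worst, round(scores[worst], 3))
```

States that appear on many negative paths, or close to the failure, accumulate the most negative totals, which gives one possible heuristic for choosing where to change the policy.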
Example
Package gets lost at Minneapolis airport while waiting for the taxi
Repair: store package until taxi arrives
Verification of Repaired Policy
[Plot: error probability (vertical axis, 0 to 0.5) against CPU time in seconds (horizontal axis, 0 to 0.45), with curves labeled "reject" and "accept"; δ = 0.01]
Comparing Policies
Use acceptance sampling:
  Pair samples from the verification of two policies
  Count pairs where the policies differ
  Prefer the first policy if, among the pairs where the policies differ, the probability that the first policy is better is at least 0.5
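One way to read this comparison is as a sequential test with threshold 0.5, run only over the pairs on which the two policies disagree. The sketch below follows that reading with Wald's test; the pair encoding, the function name, and the default values of delta, alpha, and beta are assumptions chosen for illustration.

```python
import math

def prefer_first(pairs, delta=0.05, alpha=0.05, beta=0.05):
    """Sequentially test whether the first policy wins at least half of the
    pairs on which the two policies differ.

    pairs is an iterable of (ok_first, ok_second) booleans, one per pair of
    sample paths. Returns True (prefer first), False (prefer second), or
    None if the pairs run out before a decision is reached.
    """
    p0, p1 = 0.5 + delta, 0.5 - delta
    log_accept = math.log(beta / (1 - alpha))
    log_reject = math.log((1 - beta) / alpha)
    llr = 0.0  # log-likelihood ratio of "p <= p1" against "p >= p0"
    for ok_first, ok_second in pairs:
        if ok_first == ok_second:
            continue  # pairs where the policies agree are not counted
        if ok_first:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr <= log_accept:
            return True
        if llr >= log_reject:
            return False
    return None

print(prefer_first([(True, False)] * 50))
print(prefer_first([(False, True)] * 50))
print(prefer_first([(True, True)] * 50))
```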
Summary
Framework for dealing with complex stochastic domains
Efficient sampling-based anytime verification of policies
Initial work on debug and repair heuristics