decision-theoretic planning with asynchronous events
DESCRIPTION
Decision-Theoretic Planning with Asynchronous Events. Håkan L. S. Younes Carnegie Mellon University. Introduction. Asynchronous processes are abundant in the real world Discrete-time models are inappropriate for systems with asynchronous events - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/1.jpg)
Decision-Theoretic Planning with Asynchronous Events
Håkan L. S. YounesCarnegie Mellon University
![Page 2: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/2.jpg)
2
Introduction
Asynchronous processes are abundant in the real world
Discrete-time models are inappropriate for systems with asynchronous events
Generalized semi-Markov (decision) processes are great for this!
![Page 3: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/3.jpg)
3
Stochastic Processes with Asynchronous Events
m1 m2
m1 upm2 up
t = 0
![Page 4: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/4.jpg)
4
Stochastic Processes with Asynchronous Events
m1 m2
m1 upm2 up
m1 upm2 down
m2 crashes
t = 0 t = 2.5
![Page 5: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/5.jpg)
5
Stochastic Processes with Asynchronous Events
m1 m2
m1 upm2 up
m1 upm2 down
m1 downm2 down
m1 crashesm2 crashes
t = 0 t = 2.5 t = 3.1
![Page 6: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/6.jpg)
6
Stochastic Processes with Asynchronous Events
m1 m2
m1 upm2 up
m1 upm2 down
m1 downm2 down
m1 downm2 up
m1 crashesm1 crashes m2 rebooted
t = 0 t = 2.5 t = 3.1 t = 4.9
![Page 7: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/7.jpg)
7
A Model of StochasticDiscrete Event Systems
Generalized semi-Markov process (GSMP) [Matthes 1962] A set of events E A set of states S
![Page 8: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/8.jpg)
8
Events
In a state s, events Es E are enabled
With each event e is associated: A distribution Ge governing the time e
must remain enabled before it triggers
A next-state probability distribution pe(s′|s)
![Page 9: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/9.jpg)
9
Semantics of GSMP Model Associate a real-valued clock te with e For each e Es sample te from Ge
Let e* = argmine Es te, t* = mine Es te
Sample s′ from pe*(s′|s) For each e Es′
te′ = te – t* if e Es \ {e*} sample te′ from Ge otherwise
Repeat with s = s′ and te = te′
![Page 10: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/10.jpg)
10
Semantics: Example
m1 upm2 up
m1 upm2 down
m2 crashes
m2 crashes: 2.5m1 crashes: 3.1
m1 crashes: 0.6reboot m2: 2.4
![Page 11: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/11.jpg)
11
Notes on Semantics
Events that remain enabled across state transitions without triggering are not rescheduled Asynchronous events! Differs from semi-Markov process in
this respect Continuous-time Markov chain if all Ge
are exponential distributions
![Page 12: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/12.jpg)
12
General State-Space Markov Chain (GSSMC)
Model is Markovian if we include the clocks in the state space Extended state space X Next-state distribution f(x′|x) well-defined
Clock values are not know to observer Time events have been enabled is know
![Page 13: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/13.jpg)
13
Observation Model
An observation o is a state s and a real value ue for each event representing the time e has currently been enabled f(x|o) is well-defined
![Page 14: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/14.jpg)
14
Observations: Example
m1 upm2 up
m1 upm2 down
m2 crashes
m2 crashes: 2.5m1 crashes: 3.1
m1 crashes: 0.6reboot m2: 2.4
m1 upm2 up
m1 upm2 down
m2 crashes
m2 crashes: 0.0m1 crashes: 0.0
m1 crashes: 2.5reboot m2: 0.0
Observed model:
Actual model:
![Page 15: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/15.jpg)
15
Actions and Policies (GSMDPs)
Identify a set A E of controllable events (actions)
A policy is a mapping from observations to sets of actions Action choice can change at any time
in a state s
![Page 16: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/16.jpg)
16
Rewards and Discounts
Lump sum reward k(s,e,s′) associated with transition from s to s′ caused by e
Continuous reward rate r(s,a) associated with a being enabled in s
Discount factor Unit reward earned at time t counts as e
–
t
![Page 17: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/17.jpg)
17
Value Function for GSMDPs
X Sse
tt
X X
tt
t
dxsoxobsVsesksspeosceoxf
dxxdsoxobsVseskoxxfedtosceoxfoV
)),,(()*,,()|())(,(11
)|(
)),,(()*,,(),|())(,()|()(
***
**
0
![Page 18: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/18.jpg)
18
GSMDP Solution Method [Younes & Simmons 2004]
Continuous-time MDPGSMDP Discrete-time MDP
Phase-type distributions Uniformization[Jensen 1953]
GSMDP Continuous-time MDP
![Page 19: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/19.jpg)
19
Continuous Phase-Type Distributions [Neuts 1981]
Time to absorption in a continuous-time Markov chain with n transient states
eetX t Q 1]Pr[
rows
1
1
ne
![Page 20: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/20.jpg)
20
Exponential Distribution
1
0
Q
tetX 1]Pr[
![Page 21: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/21.jpg)
21
Two-Phase Coxian Distribution
01
10p1
(1 – p)1
2
2
11
0 p
Q
![Page 22: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/22.jpg)
22
Generalized Erlang Distribution
n – 1 10 …p
(1 – p)
001
000
00
0
0
00
p
Q
![Page 23: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/23.jpg)
23
Method of Moments
Approximate general distribution G with phase-type distribution PH by matching the first n moments
![Page 24: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/24.jpg)
24
Moments of a Distribution
The ith moment: i = E[X i]
Mean: 1
Variance: 2 = 2 – 1
2 Coefficient of variation: cv = /1
![Page 25: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/25.jpg)
25
Matching One Moment
Exponential distribution: = 1/1
![Page 26: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/26.jpg)
26
Matching Two Moments
cv 2
0 1
1
1
Exponential Distribution
![Page 27: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/27.jpg)
27
Matching Two Moments
cv 2
0 1
1
1
Exponential Distribution
2
1
cvn
1
1
npp
)1)(1(2
44221
2
222
cvn
cvnnncvnp
Generalized Erlang Distribution
![Page 28: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/28.jpg)
28
cv 2
Matching Two Moments
0 1
2
1
cvn
1
1
npp
)1)(1(2
44221
2
222
cvn
cvnnncvnp
Generalized Erlang Distribution
21
2
21
2
1
cv
22
1
cvp
Two-Phase Coxian Distribution
1
1
Exponential Distribution
![Page 29: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/29.jpg)
29
Matching Moments: Example 1
Weibull distribution: W(1,1/2) 1 = 2, cv2 = 5 10
1/10
9/10
1/10
one momenttwo moments
W(1,1/2)
1 2 3 4 5 6 7 8
0.5
1.0
t
F(t)
![Page 30: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/30.jpg)
30
Matching Moments: Example 2
Uniform distribution: U(0,1) 1 = 1/2, cv2 = 1/3 10
6 62
6
0.5
1.0F(t)
U(0,1)
t
one momenttwo moments
1 2
![Page 31: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/31.jpg)
31
Matching More Moments
Closed-form solution for matching three moments of positive distributions [Osogami & Harchol-Balter 2003] Combination of Erlang distribution
and two-phase Coxian distribution
![Page 32: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/32.jpg)
32
Approximating GSMDP with Continuous-time MDP
Each event with a non-exponential distribution is approximated by a set of events with exponential distributions Phases become part of state
description
![Page 33: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/33.jpg)
33
Policy Execution
Phases represent discretization into random-length intervals of the time events have been enabled
Phases are not part of real model Simulate phase transition during
execution
![Page 34: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/34.jpg)
34
The Foreman’s Dilemma
When to enable “Service” action in “Working” state?
Workingc = 1
Failedc = 0
Servicedc = 0.5
ServiceExp(10)
ReturnExp(1)
FailG
ReplaceExp(1/100)
![Page 35: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/35.jpg)
35
The Foreman’s Dilemma: Optimal Solution
Find t0 that maximizes v0
0201
0
210
2
1
1
1
1001
1
11
)(1)(11
)(1)(
vvvv
dtveetFtfveetFtfv ttXY
ttYX
t
XX
ttX
dxxftF
tte
tttf
0
0)(10
0
)()(
10
0)(
0
Y is the time to failure in “Working” state
![Page 36: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/36.jpg)
36
The Foreman’s Dilemma: SMDP Solution
Same formulas, but restricted choice: Action is immediately enabled (t0 = 0) Action is never enabled (t0 = ∞)
![Page 37: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/37.jpg)
37
3 moments2 momentsSMDP1 moment
The Foreman’s Dilemma: Performance
Failure-time distribution: U(5,x)
x
Perc
ent
of
opti
mal
100
90
80
70
60
505 10 15 20 25 30 35 40 45 50
![Page 38: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/38.jpg)
38
3 moments2 momentsSMDP1 moment
The Foreman’s Dilemma: Performance
Failure-time distribution: W(1.6x,4.5)
x
Perc
ent
of
opti
mal
100
90
80
70
60
505 10 15 20 25 30 35 400
![Page 39: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/39.jpg)
39
System Administration
Network of n machines Reward rate c(s) = k in states where
k machines are up One crash event and one reboot
action per machine At most one action enabled at any
time
![Page 40: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/40.jpg)
40
System Administration: Performance
3 moments2 moments1 moment
Rew
ard
50
45
40
35
30
25
20
15
1 2 3 4 5 6 7 8 9 10 11 12 13n
Reboot-time distribution: U(0,1)
![Page 41: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/41.jpg)
41
System Administration: Performance
1 moment 2 moments 3 moments
size
states
q time (s)
states q time (s)
states q time (s)
4 16 5 0.36 32 9 3.57 112 21.0
10.30
5 32 6 0.82 80 10 7.72 272 22.0
22.33
6 64 7 1.89 192 11 16.24 640 23.0
40.98
7 128 8 3.65 448 12 28.04 1472 24.0
69.06
8 256 9 6.98 1024 13 48.11 3328 25.0
114.63
9 512 10 16.04 2304 14 80.27 7424 26.0
176.93
10 1024 11 33.58 5120 15 136.4 16384 27.0
291.70
11 2048 12 66.00 24576 16 264.17 35840 28.0
481.10
12 4096 13 111.96 53248 17 646.97 77824 29.0
1051.33
13 8192 14 210.03 114688
18 2588.95
167936
30.0
3238.16
2n (n+1)2n (1.5n+1)2n
![Page 42: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/42.jpg)
42
The Role of Phases
Foreman’s dilemma: Phases permit delay in enabling of
actions System administration:
Phases allow us to take into account the time an action has already been enabled
![Page 43: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/43.jpg)
43
Summary
Generalized semi-Markov (decision) processes allow asynchronous events
Phase-type distributions can be used to approximate a GSMDP with an MDP Allows us to approximately solve GSMDPs
using existing MDP techniques Phase does matter!
![Page 44: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/44.jpg)
44
Future Work
Discrete phase-type distributions Handles deterministic distributions Avoids uniformization step
Value function approximation Take advantage of GSMDP structure
Other optimization criteria Finite horizon, etc.
![Page 45: Decision-Theoretic Planning with Asynchronous Events](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813948550346895da0e3a1/html5/thumbnails/45.jpg)
45
Tempastic-DTP
A tool for GSMDP planning:http://www.cs.cmu.edu/~lorens/tempastic-
dtp.html