![Page 1: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/1.jpg)
A Call Admission Control for Service Differentiation and
Fairness Management in WDM Grooming Networks
Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris
BroadNet 2004 proceedings
Presented by Zhanxiang
February 7, 2005
![Page 2: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/2.jpg)
Goal & Contribution
• Goal:
– Fairness control and service differentiation in a WDM grooming network, while also maximizing the overall utilization.
• Contributions:
– An optimal CAC policy providing fairness control by using a Markov Decision Process approach;
– A heuristic decomposition algorithm for multi-link and multi-wavelength networks.
![Page 3: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/3.jpg)
Quick Review of MDP
• DTMC
• DTMDP
– We focus on the discrete-time MDP (DTMDP) because a continuous-time MDP (CTMDP) is usually solved by discretization.
![Page 4: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/4.jpg)
DTMC
A DTMC $\{X_n \mid n = 0, 1, 2, \ldots\}$ is a discrete-time, discrete-value random sequence such that, given $X_0, X_1, \ldots, X_{n-1}$, the next random variable $X_n$ depends only on $X_{n-1}$ through the transition probability

$$P[X_n = j \mid X_{n-1} = i, X_{n-2} = i_{n-2}, \ldots, X_0 = i_0] = P[X_n = j \mid X_{n-1} = i]$$

pmf of $X_n$: $p_j(n) = P(X_n = j)$
Transition probability: $p_{jk}(n) = P(X_{m+n} = k \mid X_m = j)$, $m \ge 0$
Chapman-Kolmogorov equation: $p_{ij}(m+n) = \sum_k p_{ik}(m)\, p_{kj}(n)$, $m, n, i, j \ge 0$
Originate from Professor Malathi Veeraraghavan’s slides.
![Page 5: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/5.jpg)
DTMC
Initial distribution: $p(0) = [p_0(0), p_1(0), \ldots]$
Transition probability matrix P:

$$P = \begin{bmatrix} p_{00} & p_{01} & p_{02} & \cdots \\ \vdots & \vdots & \vdots & \\ p_{i0} & p_{i1} & p_{i2} & \cdots \\ \vdots & \vdots & \vdots & \end{bmatrix}$$

n-step transition matrix: $P(n) = P^n$
Probabilities that the system is in state j after n transitions:
$p_j(n) = \sum_i p_i(0)\, [P^n]_{ij}$ and $p(n) = p(0)\, P^n$
Originate from Professor Malathi Veeraraghavan’s slides.
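The relation p(n) = p(0) P^n can be checked numerically. The following is a minimal sketch in plain Python; the two-state matrix `P`, the initial distribution `p0` and all function names are hypothetical and chosen only for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def n_step_matrix(P, n):
    """Return P^n, the n-step transition matrix (P^0 is the identity)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def distribution_after(p0, P, n):
    """Return p(n) = p(0) P^n as a list of state probabilities."""
    Pn = n_step_matrix(P, n)
    size = len(p0)
    return [sum(p0[i] * Pn[i][j] for i in range(size)) for j in range(size)]


# Hypothetical two-state chain (numbers are illustrative only).
P = [[0.9, 0.1],
     [0.5, 0.5]]
p0 = [1.0, 0.0]
p2 = distribution_after(p0, P, 2)  # approximately [0.86, 0.14]
```

The same helpers also let one verify the Chapman-Kolmogorov equation, since `n_step_matrix(P, 3)` should agree with `mat_mul(n_step_matrix(P, 1), n_step_matrix(P, 2))` up to rounding.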
![Page 6: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/6.jpg)
DTMC
• Two states i and j communicate if for some n and n’, pij(n)>0 and pji(n’)>0.
• A MC is irreducible if all of its states communicate.
• A state of a MC is periodic if there exists some integer m>0 such that pii(m)>0 and some integer d>1 such that pii(n)>0 only if d|n.
Originate from Professor Malathi Veeraraghavan’s slides.
![Page 7: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/7.jpg)
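DTMC
The n-step transition probabilities $p_{ij}(n)$ of finite, irreducible and aperiodic MCs become independent of i and n as $n \to \infty$. Let $q_j = \lim_{n \to \infty} p_{ij}(n)$.
$v_j$: long-run proportion,

$$v_j = \lim_{n \to \infty} p_j(n) = \lim_{n \to \infty} \sum_i p_i(0)\, [P^n]_{ij} = q_j$$

As $n \to \infty$, every row of $P^n$ converges to $V = [v_0, v_1, \ldots]$, the steady-state probability vector, which satisfies

$$v_j = \sum_i v_i\, p_{ij} \quad (j = 0, 1, \ldots), \qquad v_j \ge 0, \qquad \sum_j v_j = 1$$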
Originate from Professor Malathi Veeraraghavan’s slides.
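For a finite, irreducible and aperiodic chain, the steady-state vector can be approximated by repeatedly applying v ← vP until it stops changing. This is a minimal sketch under that assumption; the function name, tolerance and example matrix are hypothetical.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate v with v = vP and sum(v) = 1 by power iteration.

    Assumes P is the row-stochastic matrix of a finite, irreducible,
    aperiodic chain, so the iteration converges to a unique limit.
    """
    size = len(P)
    v = [1.0 / size] * size  # any starting distribution works
    for _ in range(max_iter):
        nxt = [sum(v[i] * P[i][j] for i in range(size)) for j in range(size)]
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    return v


# Hypothetical two-state chain; balance gives 0.1 * v0 = 0.5 * v1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
v = stationary_distribution(P)  # close to [5/6, 1/6]
```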
![Page 8: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/8.jpg)
Decision Theory
• Probability Theory: describes what an agent should believe based on evidence.
+
• Utility Theory: describes what an agent wants.
=
• Decision Theory: describes what an agent should do.
Originate from David W. Kirsch’s slides
![Page 9: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/9.jpg)
Markov Decision Process
• MDP is defined by:
State Space: S
Action Space: A
Reward Function: R: S → {real number}
Transition Function: T: S × A → S (deterministic)
T: S × A → Power(S) (stochastic)
The transition function describes the effect of an action in state s. In the second (stochastic) case, the transition function yields a probability distribution P(s’|s,a) over the next states.
Originate from David W. Kirsch’s slides and modified by Zhanxiang
![Page 10: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/10.jpg)
How an MDP differs from a DTMC
• MDP is like a DTMC, except the transition matrix depends on the action taken by the decision maker (a.k.a. agent) at each time step.
Ps,a,s' = P [S(t+1)=s' | S(t)=s, A(t)=a]
[Figure: the DTMC transition matrix is indexed by current state s and next state s’; the MDP transition matrix adds a third index, action a.]
![Page 11: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/11.jpg)
MDP Actions
• Stochastic Actions:
– T : S × A → PowerSet(S)
For each state and action we specify a probability distribution over next states, P(s’ | s, a).
• Deterministic Actions:
– T : S × A → S
For each state and action we specify a new state. Hence the transition probabilities will be 1 or 0.
![Page 12: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/12.jpg)
Action Selection & Maximum Expected Utility
• Assume we assign a utility U(s) to each state s
• Expected Utility for an action a in state s is
$EU(a \mid s) = \sum_{s'} P(s' \mid s, a)\, U(s')$
• MEU Principle: An agent should choose an action that maximizes the agent’s EU.
Originate from David W. Kirsch’s slides and modified by Zhanxiang
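The expected-utility formula and the MEU principle can be illustrated with a small lookup-table sketch. The state names, action names, probabilities and utilities below are hypothetical.

```python
def expected_utility(P, U, s, a):
    """EU(a|s) = sum over s' of P(s'|s,a) * U(s')."""
    return sum(prob * U[s_next] for s_next, prob in P[s][a].items())


def meu_action(P, U, s):
    """Return the action with maximum expected utility in state s."""
    return max(P[s], key=lambda a: expected_utility(P, U, s, a))


# Hypothetical model: P[s][a] maps next states to probabilities.
P = {"s0": {"stay": {"s0": 1.0},
            "go": {"s1": 0.8, "s0": 0.2}}}
U = {"s0": 0.0, "s1": 10.0}

best = meu_action(P, U, "s0")  # "go", since EU(go|s0) = 8 > EU(stay|s0) = 0
```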
![Page 13: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/13.jpg)
Policy & Following a Policy
• Policy: a mapping from S to A, π : S → A
• Procedure for following a policy:
1. Determine current state s
2. Execute action π(s)
3. Repeat 1-2
Originate from David W. Kirsch’s slides modified by Zhanxiang
![Page 14: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/14.jpg)
Solution to an MDP
• In deterministic processes, solution is a plan.
• In observable stochastic processes, solution is a policy
• A policy’s quality is measured by its EU
Notation: π ≡ a policy
π(s) ≡ the recommended action in state s
π* ≡ the optimal policy
(maximum expected utility)
Originate from David W. Kirsch’s slides and modified by Zhanxiang
![Page 15: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/15.jpg)
Should we let U(s)=R(s)?
• In the definition of MDP we introduced R(s), which depends on specific properties of a state.
• Shall we let U(s)=R(s)?
– Often very good for choosing single-action decisions.
– Not feasible for choosing action sequences, which implies that R(s) alone is not enough to solve an MDP.
![Page 16: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/16.jpg)
Assigning Utility to Sequences
• How to add rewards?
- simple sum
- mean reward rate
Problem: Infinite Horizon → infinite reward
- discounted rewards
R(s0,s1,s2,…) = R(s0) + cR(s1) + c^2 R(s2) + … where 0<c≤1
Originate from David W. Kirsch’s slides modified by Zhanxiang
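The effect of discounting can be seen with a short numerical sketch. The function name and the reward sequences are hypothetical; the point is that with c < 1 a bounded reward stream has a finite total even as the horizon grows.

```python
def discounted_return(rewards, c):
    """Return R(s0) + c*R(s1) + c^2*R(s2) + ... for a finite reward list."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (c ** t) * r
    return total


short = discounted_return([1.0, 1.0, 1.0], 0.5)   # 1 + 0.5 + 0.25 = 1.75
long_run = discounted_return([1.0] * 200, 0.5)    # approaches 1 / (1 - 0.5) = 2
undiscounted = discounted_return([1.0] * 200, 1.0)  # simple sum grows with horizon
```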
![Page 17: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/17.jpg)
How to define U(s)?
• Define $U^\pi(s)$, which is specific to each π:
$U^\pi(s) = E\left(\sum_t R(s_t) \mid \pi, s_0 = s\right)$
• Define $U(s) = \max_\pi \{U^\pi(s)\} = U^{\pi^*}(s)$
• We can calculate U(s) on the basis of R(s):
$U(s) = R(s) + \max_\pi \sum_{s'} P(s' \mid s, \pi(s))\, U(s')$
Bellman equation
If we solve the Bellman equation for each state, we obtain the optimal policy π* for the given MDP on the basis of U(s).
Originate from David W. Kirsch’s slides and modified by Zhanxiang
![Page 18: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/18.jpg)
Value Iteration Algorithm
• We have to solve |S| simultaneous Bellman equations
• Can’t solve directly, so use an iterative approach:
1. Begin with an arbitrary utility function U0
2. For each s, calculate U(s) from R(s) and U0
3. Use these new utility values to update U0
4. Repeat steps 2-3 until U0 converges
This equilibrium is a unique solution! (see R&N for proof)
Originate from David W. Kirsch’s slides
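The four steps above can be sketched as follows. This is a minimal illustration, not the slides' implementation: it assumes a discount factor `c < 1` (so that the iteration converges), a reward that depends only on the state, and a hypothetical two-state model.

```python
def value_iteration(states, actions, P, R, c=0.9, tol=1e-9, max_iter=10_000):
    """Iterate U(s) <- R(s) + c * max_a sum_s' P(s'|s,a) U(s') to a fixed point."""
    U = {s: 0.0 for s in states}  # step 1: arbitrary initial utilities
    for _ in range(max_iter):
        new_U = {}
        for s in states:  # step 2: recompute every U(s) from R and the old U
            best = max(sum(p * U[s2] for s2, p in P[s][a].items())
                       for a in actions)
            new_U[s] = R[s] + c * best
        delta = max(abs(new_U[s] - U[s]) for s in states)
        U = new_U  # step 3: replace the old utilities
        if delta < tol:  # step 4: stop once the values have converged
            break
    # Read the greedy policy off the converged utilities.
    policy = {s: max(actions,
                     key=lambda a: sum(p * U[s2] for s2, p in P[s][a].items()))
              for s in states}
    return U, policy


# Hypothetical two-state MDP with deterministic actions.
states = ["low", "high"]
actions = ["stay", "move"]
P = {"low": {"stay": {"low": 1.0}, "move": {"high": 1.0}},
     "high": {"stay": {"high": 1.0}, "move": {"low": 1.0}}}
R = {"low": 0.0, "high": 1.0}
U, policy = value_iteration(states, actions, P, R)
```

In this example the fixed point is U(high) = 1 / (1 - 0.9) = 10 and U(low) = 0.9 × 10 = 9, with the policy "move" in `low` and "stay" in `high`.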
![Page 19: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/19.jpg)
State Space and Policy Definition in this paper
• The authors’ idea of using an MDP is great, but I’m not comfortable with the state space definition and the policy definition.
• If I were the author, I would define the system state space and policy as follows:
– S’ = S × E
where $S = \{(n_1, n_2, \ldots, n_K) \mid \sum_k t_k n_k \le T\}$ and E = {class ck call arrivals} ∪ {class ck call departures} ∪ {dummy events}
– Policy π : S → A
![Page 20: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/20.jpg)
Network Model :: Definitions
OADM: Optical Add/Drop Multiplexer
WC: wavelength converter
TSI: time-slot interchanger
L: # of links a WDM grooming network contains
M: # of origin-destination pairs the network includes
W: # of wavelengths in a fiber in each link
T: # of time slots each wavelength includes
K: # of classes of traffic streams
ck: traffic stream classes differ by their b/w requirements
tk: # of time slots required by class ck traffic to be established
nk: # of class ck calls currently in the system
![Page 21: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/21.jpg)
Network model :: assumptions
• For each o-d pair, class ck arrivals are distributed according to a Poisson process with rate λk.
• The call holding time of class ck is exponentially distributed with mean 1/μk . Unless otherwise stated, we assume 1/μk = 1.
• Any arriving call from any class is blocked when no wavelength has tk available time slots.
• Blocked calls do not interfere with the system.
• The switching nodes are non-blocking.
• No preemption.
![Page 22: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/22.jpg)
Fairness definition
• There is no significant difference between the blocking probabilities experienced by different classes of users;
![Page 23: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/23.jpg)
CS & CP
• Complete Sharing (CS): not fair
– No resources reserved for any class of calls;
– Calls with lower b/w requirement and higher arrival rate may starve calls with higher b/w requirement and lower arrival rate.
• Complete Partitioning (CP): fair, but
– A portion of resources is dedicated to each class of calls;
– May not maximize the overall utilization of available resources.
![Page 24: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/24.jpg)
Single-link single-wavelength(0)
• System state space S:
$S = \{(n_1, n_2, \ldots, n_K) \mid \sum_{k=1}^{K} t_k n_k \le T\}$
• Operators:
– $A_k s = (n_1, n_2, \ldots, n_k + 1, \ldots, n_K)$
– $D_k s = (n_1, n_2, \ldots, n_k - 1, \ldots, n_K)$
– $A_k P_a s = (n_1, n_2, \ldots, n_k + a, \ldots, n_K)$
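To make the state space and the operators concrete, here is a minimal sketch that enumerates S for a small, hypothetical configuration and applies the arrival and departure operators. The function names and the values of `t` and `T` are assumptions for illustration; class indices are 0-based in the code.

```python
from itertools import product


def state_space(t, T):
    """All tuples (n_1, ..., n_K) with sum_k t_k * n_k <= T."""
    ranges = [range(T // tk + 1) for tk in t]
    return [s for s in product(*ranges)
            if sum(tk * nk for tk, nk in zip(t, s)) <= T]


def arrive(s, k, a=1):
    """A_k P_a s: a class-k arrival with action a (1 = accept, 0 = reject)."""
    return s[:k] + (s[k] + a,) + s[k + 1:]


def depart(s, k):
    """D_k s: one class-k call leaves the system."""
    return s[:k] + (s[k] - 1,) + s[k + 1:]


# Hypothetical example: two classes needing 1 and 2 slots, T = 4 slots.
t = (1, 2)
T = 4
S = state_space(t, T)            # 9 feasible states
accepted = arrive((1, 1), 0)     # (2, 1)
rejected = arrive((1, 1), 1, 0)  # (1, 1): a rejected call leaves s unchanged
after_departure = depart((1, 1), 1)  # (1, 0)
```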
![Page 25: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/25.jpg)
Single-link single-wavelength(1)
• Sampling rate
$v = \sum_{k=1}^{K} \left( \lfloor T/t_k \rfloor\, \mu_k + \lambda_k \right)$
• Only one single transition can occur during each time slot.
• A transition can correspond to an event of
– 1) Class ck call arrival
– 2) Class ck call departure
– 3) Fictitious or dummy event (caused by high sampling rate)
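The sampling rate and the three event types can be illustrated numerically. The sketch below assumes the standard uniformization construction, which the slides do not spell out: in state s, a class-k arrival occurs with probability λk/v, a class-k departure with probability nk·μk/v, and the remaining probability mass goes to the dummy event. All names and numbers are hypothetical.

```python
def sampling_rate(t, T, lam, mu):
    """v = sum_k (floor(T / t_k) * mu_k + lambda_k)."""
    return sum((T // tk) * mk + lk for tk, lk, mk in zip(t, lam, mu))


def event_probabilities(s, t, T, lam, mu):
    """Per-slot probabilities of arrivals, departures and the dummy event.

    Assumption (uniformization): rates are divided by v, and whatever
    probability is left over is assigned to the fictitious event.
    """
    v = sampling_rate(t, T, lam, mu)
    arrivals = [lk / v for lk in lam]
    departures = [nk * mk / v for nk, mk in zip(s, mu)]
    dummy = 1.0 - sum(arrivals) - sum(departures)
    return arrivals, departures, dummy


# Hypothetical example: two classes, T = 4 slots, mean holding time 1.
t = (1, 2)
T = 4
lam = (2.0, 1.0)
mu = (1.0, 1.0)
v = sampling_rate(t, T, lam, mu)  # (4*1 + 2) + (2*1 + 1) = 9
arrivals, departures, dummy = event_probabilities((1, 1), t, T, lam, mu)
```

Because v upper-bounds the total event rate in every feasible state, the dummy probability is never negative, which is why at most one real transition needs to be considered per slot.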
![Page 26: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/26.jpg)
Single-link single-wavelength(2)
• Reward function R:
• Value function
![Page 27: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/27.jpg)
Single-link single-wavelength(3)
• Optimal value function:
• Optimal Policy:
![Page 28: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/28.jpg)
Single-link single-wavelength(4)
• Value iteration to compute Vn(s)
![Page 29: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/29.jpg)
Single-link single-wavelength(5)
• Action decision:
If Vn(AkP1s) >= Vn(AkP0s) then a=1; else a=0;
Based on the equation below.
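The decision rule can be sketched as a comparison of two table entries. The value table `V` below is hypothetical and not computed from the paper's reward function; treating a state missing from the table as infeasible (call blocked) is also an assumption of this sketch.

```python
def admit(V, s, k):
    """Return a = 1 if V(A_k P_1 s) >= V(A_k P_0 s), else a = 0.

    V maps feasible state tuples to values; k is a 0-based class index.
    """
    accepted = s[:k] + (s[k] + 1,) + s[k + 1:]  # A_k P_1 s
    rejected = s                                # A_k P_0 s leaves s unchanged
    if accepted not in V:
        return 0  # assumption: an infeasible next state means the call is blocked
    return 1 if V[accepted] >= V[rejected] else 0


# Hypothetical value table over three feasible states.
V = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): -0.5}
a_class0 = admit(V, (0, 0), 0)  # 1: accepting leads to a higher value
a_class1 = admit(V, (0, 0), 1)  # 0: accepting leads to a lower value
```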
![Page 30: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/30.jpg)
My understanding
• The author’s idea of using MDP is great
![Page 31: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/31.jpg)
Example
![Page 32: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/32.jpg)
Matlab toolbox calculation
![Page 33: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/33.jpg)
Heuristic decomposition algorithm
• Step 1: For each hop i, partition the set of available wavelengths into subsets, dedicated to each of o-d pairs using hop i.
• Step 2: Assume that arrivals are uniformly distributed among the Wm wavelengths; thus, the arrival rate of class ck for each of the Wm wavelengths is given by: λk/Wm.
![Page 34: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/34.jpg)
Heuristic decomposition algorithm (2)
• Step 3: Compute the CAC policy with respect to λk/Wm.
• Step 4: Using the CAC policy computed in Step 3, we determine the optimal action for each of the Wm wavelengths, individually.
![Page 35: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/35.jpg)
Performance comparison
We define the offered load per o-d pair as a sum over the classes $k = 1, \ldots, K$ [formula not recoverable];
$BP_k$ as the blocking performance of class $c_k$ calls;
Suppose that $c_i$ and $c_j$ calls experience the highest and lowest blocking probabilities in the network, then we define the fairness ratio as

$$f_r := \frac{BP_i}{BP_j}$$
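The fairness ratio is simply the largest per-class blocking probability divided by the smallest, so a value close to 1 indicates a fair policy. A minimal sketch, with hypothetical blocking probabilities:

```python
def fairness_ratio(bp):
    """f_r = BP_i / BP_j, with i the worst-off and j the best-off class."""
    highest = max(bp)
    lowest = min(bp)
    if lowest == 0:
        raise ValueError("fairness ratio is undefined when a class is never blocked")
    return highest / lowest


# Hypothetical per-class blocking probabilities (not results from the paper).
unfair = fairness_ratio([0.02, 0.08, 0.04])  # about 4
fair = fairness_ratio([0.05, 0.05, 0.05])    # exactly 1
```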
![Page 36: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/36.jpg)
Performance comparison
![Page 37: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/37.jpg)
Performance comparison
![Page 38: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/38.jpg)
Performance comparison
![Page 39: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/39.jpg)
Relation to our work
• We can utilize MDP to model our bandwidth allocation problem in call admission control to achieve fairness;
• But in heterogeneous network the bandwidth granularity problem is still there;
![Page 40: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/40.jpg)
Possible Constraints
• Under some conditions the optimal policy of an MDP exists.
![Page 41: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/41.jpg)
![Page 42: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/42.jpg)
Backup
• Other MDP representations
![Page 43: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/43.jpg)
Markov Assumption
• Markov Assumption: The next state’s conditional probability depends only on a finite history of previous states (R&N)
kth order Markov Process
• Andrei Markov (1913)
• Markov Assumption: The next state’s conditional probability depends only on its immediately previous state (J&B)
1st order Markov Process
The definitions are equivalent!!!
Any algorithm that makes the 1st order Markov Assumption can be applied to any Markov Process
Originate from David W. Kirsch’s slides
![Page 44: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/44.jpg)
MDP
• A Markov Decision Process (MDP) model contains:
– A set of possible world states S
– A set of possible actions A
– A real valued reward function R(s,a)
– A description T(s,a) of each action’s effects in each state.
![Page 45: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/45.jpg)
How an MDP differs from a DTMC
• A Markov Decision Process (MDP) is just like a Markov Chain, except the transition matrix depends on the action taken by the decision maker (agent) at each time step.
Ps,a,s' = P [S(t+1)=s' | S(t)=s, A(t)=a]
• The agent receives a reward R(s,a), which depends on the action and the state.
• The goal is to find a function, called a policy, which specifies which action to take in each state, so as to maximize some function of the sequence of rewards (e.g., the mean or expected discounted sum).
![Page 47: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/47.jpg)
Transition Matrix
[Figure: the DTMC transition matrix is indexed by current state s and next state s’; the MDP transition matrix adds a third index, action a.]
![Page 48: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/48.jpg)
MDP Policy
• A policy π is a mapping from S to A, π : S → A
• Assumes full observability: the new state resulting from executing an action will be known to the system
![Page 49: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/49.jpg)
Evaluating a Policy
• How good is a policy π in terms of a sequence of actions?
– For deterministic actions, just total the rewards obtained... but the result may be infinite.
– For stochastic actions, use the expected total reward obtained instead… which again typically yields an infinite value.
• How do we compare policies of infinite value?
![Page 50: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/50.jpg)
Discounting to prefer earlier rewards
• A value function, $V^\pi : S \to \mathbb{R}$, represents the expected objective value obtained by following policy π from each state in S.
• Bellman equations relate the value function to itself via the problem dynamics.
![Page 51: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/51.jpg)
Bellman Equations
$$V^\pi(s) = R(s, \pi(s)) + \gamma \sum_{s' \in S} T(s, \pi(s), s')\, V^\pi(s')$$

$$V^*(s) = \max_\pi \left[ R(s, \pi(s)) + \gamma \sum_{s' \in S} T(s, \pi(s), s')\, V^*(s') \right]$$

$\gamma$ is the discount factor.
There is one equation for each state in S.
Thus we have to solve $|S|$ simultaneous Bellman equations.
![Page 52: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/52.jpg)
Value Iteration Algorithm
Can’t solve directly, so use an iterative approach:
1. Begin with an arbitrary utility vector V;
2. For each s, calculate V*(s) from R(s,π) and V;
3. Use these new utility values V*(s) to update V;
4. Repeat steps 2-3 until V converges;
This equilibrium is a unique solution!
![Page 53: A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris](https://reader036.vdocuments.mx/reader036/viewer/2022081519/56649ec55503460f94bcf430/html5/thumbnails/53.jpg)
MDP Solution
Solution to the $|S|$ Bellman equations:

$$V^*(s) = \max_\pi \left[ R(s, \pi(s)) + \gamma \sum_{s' \in S} T(s, \pi(s), s')\, V^*(s') \right]$$

$$\pi^* = \arg\max_\pi \left[ R(s, \pi(s)) + \gamma \sum_{s' \in S} T(s, \pi(s), s')\, V(s') \right]$$

$\pi^*$ is the policy obtained when V converges.