
Discrete MDP Problem with Admission and Inventory Control in Service Facility Systems

C. Selvakumar, P. Maheswari, and C. Elango
Research Department of Mathematics, Cardamom Planters’ Association College, Bodinayakanur 625 513. E-mail: [email protected]

International Journal of Computational and Applied Mathematics, ISSN 1819-4966, Volume 12, Number 1 (2017). © Research India Publications, http://www.ripublication.com

Abstract

In this article, we study a discrete-time MDP model of a service facility system maintaining inventory. Decisions are taken at discrete time epochs to control both admission to the service facility and inventory replenishment. The queue before the server is divided into an eligible queue and a potential queue, and a control mechanism transfers customers from the potential queue to the eligible queue. An MDP with the average cost criterion is used to find the optimal policy to be implemented for the system. A numerical example is provided to illustrate the problem vividly.

Keywords: Markov Decision Processes, Inventory Control, Admission Control, Service Facility System, Average Cost Criteria.

1. Introduction

The Markov decision model is a versatile and powerful tool for analyzing probabilistic sequential decision processes with an infinite planning horizon. The model is a fusion of two concepts: Markov processes and dynamic programming. The dynamic programming (DP) concept was developed by Bellman in the early 1950s. The basic principles of DP are states, the principle of optimality, and functional equations.

At much the same time as Bellman (1957) developed the theory of dynamic programming, Howard (1960) used basic principles of Markov chain theory and dynamic programming to develop a policy-iteration algorithm for Markov decision process problems, that is, sequential decision processes with an infinite planning horizon. The theoretical foundations of Howard's policy-iteration method were developed by Blackwell (1962), Denardo and Fox (1968), and Veinott (1966). The linear programming method for Markov decision models was first given by De Ghellinck (1960) and Manne (1960), and developed further by Derman (1970) and Hordijk and Kallenberg (1979, 1984). Another method for solving MDP problems is the value-iteration algorithm, developed by Odoni (1969) and Hastings (1971), who derived lower and upper bounds for the minimal average cost.

Markov decision models find applications in a wide variety of fields. Important applications in the field of machine maintenance were carried out in the eighties (Golabi et al. (1982), Kawai (1983), Stengos and Thomas (1980), and Tijms and Van der Duyn Schouten (1985)). A survey of real applications of MDP models can be found in White (1985).

In this article, we consider a discrete-time MDP in a service facility system in which inventory is maintained to complete the service. The arrival of customers to the system is controlled by taking decisions at discrete decision epochs. The demand for service exists throughout the period, and arriving customers wait in a queue when there is a customer at the counter for service. The revenue/cost structure and the demand distribution are constant throughout the time period. The maximum inventory of the system is assumed to be $M$. In the last section a numerical example is provided to illustrate the model.

2. Model description

(i) The system is observed every $t_0$ units of time and the decision epochs are $0, t_0, 2t_0, \ldots$

(ii) Admission to the service facility is controlled by splitting the queue into an eligible queue and a potential queue. At each decision epoch the controller observes the number of items in stock and the number of customers in the system (eligible queue + server).

(iii) Number of customers to be admitted at time epoch $t$ = number of items in stock $-$ number of customers in the eligible queue at time $t$. Other customers are rejected.

(iv) Customers arrive at the service facility system according to a probability distribution $g$, and arriving customers are placed in the potential customer queue.

(v) Only the eligible queue (main queue) customers get service.

(vi) No partial service completion is allowed during any period.

(vii) Every serviced customer takes one unit item from inventory and departs the system at the end of the period.

(viii) The $(M-1, M)$ policy is adopted for replenishing inventory (one-for-one policy). Replenishment is instantaneous.

(ix) The decision to order additional stock is made at the beginning of each period and delivery occurs instantaneously.

Let $X_t$ denote the number of customers in the system immediately prior to decision epoch $t$, and let $Z_t$ be the number of customers who arrive in period $t$. Customers arriving in period $t-1$ enter the potential customer queue. At decision epoch $t$ the controller admits $u_t \le I_t - X_t$ customers from the potential customer queue into the system. Let $Y_t$ denote the number of “possible service completions” during period $t$, and let $I_t$ denote the number of items in stock at time epoch $t$.

The evolution over one period is summarized below; the case expression for $X_{t+1}$ is given after the table.

Time      Potential Queue      System
$t$           $Z_{t-1}$           $X_t$
$t^+$             0               $X_t + u_t$
$t+1$           $Z_t$             $X_{t+1}$

Hence $t^+$ denotes a point in time immediately after the control has been implemented but prior to any service completions. The system state is denoted by the pair $(X_t, I_t)$.

The two components of the system state are given by
$$X_{t+1} = \begin{cases} X_t + u_t + Z_t - Y_t & \text{if } Y_t \le I_t,\\ X_t + u_t + Z_t - I_t & \text{if } Y_t > I_t, \end{cases}$$
$$I_{t+1} = I_t - \min(Y_t, I_t).$$
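To make these dynamics concrete, the following minimal Python sketch advances the state $(X_t, I_t)$ by one period. It is illustrative only: the pmfs are made up, and capping completions by $\min(Y_t,\, X_t + u_t,\, I_t)$ (customers present and stock on hand) is our reading of the case expressions above, not code from the paper.

    import random

    def step(x, inv, u, f_pmf, g_pmf):
        """One period of the chain: x customers in system, inv items in
        stock, u admitted from the potential queue (0 <= u <= inv - x)."""
        assert 0 <= u <= inv - x
        x_plus = x + u                                    # system just after admission
        y = random.choices(range(len(f_pmf)), f_pmf)[0]   # possible completions Y_t ~ f
        z = random.choices(range(len(g_pmf)), g_pmf)[0]   # arrivals Z_t ~ g
        served = min(y, x_plus, inv)   # assumption: completions limited by customers and stock
        return x_plus - served + z, inv - served          # (X_{t+1}, I_{t+1})

    # Example: one period from state (X_t, I_t) = (1, 3) with u_t = 1.
    print(step(1, 3, 1, f_pmf=[0.2, 0.5, 0.3], g_pmf=[0.4, 0.4, 0.2]))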


We can admit only $u_t \le I_t - X_t$ customers, so that $0 \le u_t \le I_t$. The random variable $Y_t$ assumes non-negative integer values and follows the time-invariant probability distribution $f(n) = \Pr\{Y_t = n\}$, $n = 0, 1, 2, \ldots$, and $Z_t$ assumes non-negative integer values and follows the time-invariant probability distribution $g(n) = \Pr\{Z_t = n\}$, $n = 0, 1, 2, \ldots$

Reward/cost structure:

The stationary cost structure consists of three components: a constant cost of $R$ units for every completed service, an expected holding cost $h(x)$ per period when there are $x$ items in inventory, and a waiting cost $k(y)$ per period when there are $y$ customers in the system.

3. MDP formulation

We consider the MDP problem having five components (a 5-tuple) $(T, S, A_s, p_t(\cdot \mid s, a), r_t)$.

Decision Epochs:
$$T = \{0, t_0, 2t_0, \ldots\}$$

States:
$S_1$: the number of customers in the system; $S_2$: the number of items in stock;
$$S_1 = \{0, 1, 2, \ldots, N\}, \quad S_2 = \{0, 1, 2, \ldots, M\}, \quad S = S_1 \times S_2, \quad \text{where } N \le M.$$

Actions:
An action is a pair $a = (l, m)$, where $l$ is the number of customers admitted and $m$ the number of items ordered, $l, m \in \{0, 1, 2\}$. The action sets $A_s = A_{(s_1, s_2)}$ are:
$A_{(0,0)} = \{(2,2)\}$,
$A_{(0,s_2)} = \{(2,0), (2,1)\}$ for $1 \le s_2 \le M-1$,
$A_{(0,M)} = \{(2,0)\}$,
$A_{(s_1,s_2)} = \{(0,0), (0,1), (1,0), (1,1)\}$ for $1 \le s_1 \le N-1$, $1 \le s_2 \le M-1$,
$A_{(s_1,M)} = \{(0,0), (1,0)\}$ for $1 \le s_1 \le N-1$,
$A_{(N,M)} = \{(0,0)\}$.

Cost:
$$c_t(s, a) = R\, E[\min(Y_t, s_1 + a_1)] + h(s_2 + a_2) + k(s_1 + a_1), \qquad a \in A_s,\ s = (s_1, s_2) \in S.$$

Transition Probability:
$$p_t(s' \mid s, a) = \begin{cases}
f(s_2 + a_2 - s_2')\, g(s_1') & \text{if } s_2' > 0,\\[4pt]
\Big(\displaystyle\sum_{i \ge s_2 + a_2} f(i)\Big)\, g(s_1') & \text{if } s_2' = 0,\ s_2 + a_2 > 0,\\[4pt]
g(s_1') & \text{if } s_2' = s_2 + a_2 = 0,\\[4pt]
0 & \text{otherwise},
\end{cases}$$
where $s = (s_1, s_2)$ and $s' = (s_1', s_2')$.

The expected number of service completions in period $t$ is
$$E[\min(Y_t, s_1 + a_1)] = \sum_{i=1}^{s_1 + a_1 - 1} i\, f(i) + (s_1 + a_1) \sum_{i = s_1 + a_1}^{\infty} f(i).$$
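The identity can be verified numerically. The sketch below is ours (assuming a finitely supported pmf $f$); it computes $E[\min(Y, n)]$ with $n = s_1 + a_1$ both directly and via the two-sum form above.

    def expected_min(f_pmf, n):
        """E[min(Y, n)] for f_pmf[i] = Pr{Y = i}, i = 0, 1, ..., len(f_pmf) - 1."""
        direct = sum(min(i, n) * p for i, p in enumerate(f_pmf))
        # Two-sum form from the text (the i = 0 term contributes nothing):
        two_sum = (sum(i * f_pmf[i] for i in range(min(n, len(f_pmf))))
                   + n * sum(f_pmf[n:]))
        assert abs(direct - two_sum) < 1e-12
        return direct

    print(expected_min([0.2, 0.5, 0.3], 1))   # 0.8 = Pr{Y >= 1}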

4. Analysis

The one-step costs are given by $c_t(s, a)$, $s = (s_1, s_2)$.

Let $(X_t, I_t)$ denote the state of the system at decision epoch $t$ (the beginning of the $t$-th period). Assume a stationary policy $R$; then the transition probability is
$$p(s' \mid s, a) = \Pr\{(X_{t+1}, I_{t+1}) = s' \mid (X_t, I_t) = s\}, \qquad s = (s_1, s_2),\ s' = (s_1', s_2') \in S,$$
regardless of the past history of the system up to time epoch $t$. Then $\{(X_t, I_t) : t \ge 0\}$ is a Markov chain with discrete state space $S = S_1 \times S_2$. The $t$-step transition probabilities of the Markov chain under policy $R$ are given by
$$p_t(s' \mid s, R) = \Pr\{(X_t, I_t) = s' \mid (X_0, I_0) = s\}, \qquad s = (s_1, s_2),\ s' = (s_1', s_2') \in S.$$


Define $V_t(s, R)$, $s = (s_1, s_2)$, as the total expected cost over the first $t$ decision epochs with initial state $(s_1, s_2)$ when policy $R$ is adopted. Then
$$V_t(s, R) = \sum_{k=0}^{t-1} \sum_{s' \in S} p_k(s' \mid s, R)\, c_{s'}(R), \qquad s = (s_1, s_2),\ s' = (s_1', s_2') \in S,$$
where $c_s(R)$ = service cost per period + holding cost of inventory per period + waiting cost of customers per period, that is,
$$c_s(R) = C_1 K + h I + C_2 L,$$
where $K$ denotes the number of customers served per period, $I$ the average inventory in stock during the $t$-th period, and $L$ the number of customers in the eligible queue plus the one at the service counter.
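In matrix form this is $V_t = \sum_{k=0}^{t-1} P^k c$, which the following generic NumPy sketch (our notation, not the paper's code) evaluates for every initial state.

    import numpy as np

    def total_expected_cost(P, c, t):
        """V_t(s, R) for every initial state s: sum_{k=0}^{t-1} P^k c,
        where P is the transition matrix and c the one-step cost vector under R."""
        V = np.zeros(len(c))
        Pk = np.eye(len(c))        # p_0(s'|s, R): the identity matrix
        for _ in range(t):
            V += Pk @ c            # add the expected cost at epoch k
            Pk = Pk @ P            # advance to the (k+1)-step probabilities
        return V

Dividing $V_t(s, R)$ by $t$ for large $t$ approximates the average cost function defined in the next section.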

5. Cost Analysis

The average cost function $g_s(R)$ is given by
$$g_s(R) = \lim_{t \to \infty} \frac{1}{t}\, V_t(s, R), \qquad s = (s_1, s_2) \in S.$$
The existence of the limits underlying this average cost function follows from the theorem below (Puterman (1994) and Tijms (2003)).

Theorem 5.1

For all $s = (s_1, s_2)$, $s' = (s_1', s_2') \in S$, the limit
$$\lim_{t \to \infty} \frac{1}{t} \sum_{k=1}^{t} p_k(s' \mid s, R)$$
always exists, and for any $s' = (s_1', s_2') \in S$,
$$\lim_{t \to \infty} \frac{1}{t} \sum_{k=1}^{t} p_k(s' \mid s') = \begin{cases} \dfrac{1}{\mu_{s'}} & \text{if state } s' \text{ is recurrent},\\[4pt] 0 & \text{if state } s' \text{ is transient}, \end{cases}$$
where $\mu_{s'}$ denotes the mean recurrence time from state $s' = (s_1', s_2')$ to itself.

Also,
$$\lim_{t \to \infty} \frac{1}{t} \sum_{k=1}^{t} p_k(s' \mid s) = f_s(s') \lim_{t \to \infty} \frac{1}{t} \sum_{k=1}^{t} p_k(s' \mid s') = \frac{f_s(s')}{\mu_{s'}}, \qquad s, s' \in S,$$
where $f_s(s')$ denotes the probability that the chain ever visits state $s'$ starting from $s$. Since the Markov chain $\{(X_t, I_t) : t = 0, 1, 2, \ldots\}$ is an irreducible unichain, all its states are ergodic and there is a unique equilibrium distribution.


Thus,
$$\pi_{s'}(R) = \lim_{t \to \infty} \frac{1}{t} \sum_{k=1}^{t} p_k(s' \mid s, R), \qquad s = (s_1, s_2),\ s' = (s_1', s_2') \in S,$$
exists and is independent of the initial state $s$, and the vector $\pi$ satisfies $\pi P = \pi$ and $\sum_{s \in S} \pi_s = 1$.
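For a finite unichain this equilibrium distribution can be computed directly from $\pi P = \pi$ and $\sum_s \pi_s = 1$; a small generic sketch (ours):

    import numpy as np

    def stationary(P):
        """Solve pi P = pi together with sum(pi) = 1 for a unichain matrix P."""
        n = P.shape[0]
        A = np.vstack([P.T - np.eye(n), np.ones(n)])   # stack the normalization row
        b = np.concatenate([np.zeros(n), [1.0]])
        pi, *_ = np.linalg.lstsq(A, b, rcond=None)     # consistent overdetermined system
        return pi

Applied to the admission chain of Section 8 under the initial policy, this gives $\pi_0 \approx 0.4857$, consistent with the average cost $g(R^1) = 10\,\pi_0 \approx 4.857$ found in Iteration 1 below.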

6. Optimal Policy

A stationary policy $R^*$ is said to be an average cost optimal policy if $g_{(s_1,s_2)}(R^*) \le g_{(s_1,s_2)}(R)$ for each stationary policy $R$, uniformly in the initial state $(s_1, s_2)$.

The relative values associated with a given policy $R$ provide a tool for constructing a new policy $R^*$ whose average cost is no more than that of the current policy $R$.

The objective is to improve the given policy $R$, whose average cost is $g(R)$ and whose relative values are $v_s(R)$, $s = (s_1, s_2) \in S$, by constructing a new policy $R^*$ such that for each $s = (s_1, s_2) \in S$,
$$c_s(R^*) - g(R) + \sum_{s' \in S} p(s' \mid s, R^*)\, v_{s'}(R) \le v_s(R), \qquad (1)$$
where $s = (s_1, s_2)$ and $s' = (s_1', s_2')$. We obtain an improved rule $R^*$ with $g(R^*) \le g(R)$. To find the optimal policy $R^*_s$ satisfying (1), we minimize the cost function
$$c_s(a) - g(R) + \sum_{s' \in S} p_t(s' \mid s, a)\, v_{s'}(R)$$
over all actions $a \in A(s)$.

7. Algorithm

Step 0 (Initialization): Choose a stationary policy $R$ for the periodic-review admission control in the service facility system maintaining inventory.

Step 1 (Value determination): For the current policy $R$, compute the unique solution $\{g(R), v_s(R)\}$ of the linear equations
$$v_s = c_s(R) - g(R) + \sum_{s' \in S} p_t(s' \mid s, R)\, v_{s'}, \qquad s = (s_1, s_2) \in S,$$
$$v_{\bar{s}} = 0,$$
where $\bar{s} = (\bar{s}_1, \bar{s}_2)$ is an arbitrarily chosen state in $S$.

Step 2 (Policy improvement): For each state $s = (s_1, s_2) \in S$, determine an action $a_s$ attaining
$$\min_{a \in A(s)} \Big\{ c_s(a) - g(R) + \sum_{s' \in S} p_t(s' \mid s, a)\, v_{s'}(R) \Big\}.$$
The new stationary policy $R^*$ is obtained by choosing $R^*_s = a_s$.

Step 3 (Convergence test): If the new policy $R^*$ equals $R$, the old one, then the search stops with policy $R$. Otherwise, go to Step 1 with $R$ replaced by the new $R^*$.
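A compact NumPy sketch of Steps 0-3 follows. It is a generic implementation under our own conventions (states indexed $0, \ldots, n-1$; P[s] maps each admissible action of state $s$ to a transition row; the relative value of state 0 is pinned to zero), not code from the paper.

    import numpy as np

    def evaluate(P, c):
        """Step 1: solve v_s = c_s - g + sum_s' p(s'|s) v_s' with v_0 = 0."""
        n = len(c)
        M = np.zeros((n + 1, n + 1))
        M[:n, :n] = np.eye(n) - P      # (I - P) v
        M[:n, n] = 1.0                 # + g on the left-hand side
        M[n, 0] = 1.0                  # normalization v_0 = 0
        sol = np.linalg.solve(M, np.concatenate([c, [0.0]]))
        return sol[n], sol[:n]         # (g, v)

    def policy_iteration(P, c):
        """P[s][a]: transition row (np.array); c[s][a]: one-step cost."""
        n = len(P)
        policy = [next(iter(P[s])) for s in range(n)]          # Step 0: any policy
        while True:
            Pr = np.array([P[s][policy[s]] for s in range(n)])
            cr = np.array([c[s][policy[s]] for s in range(n)])
            g, v = evaluate(Pr, cr)                            # Step 1
            new = [min(P[s], key=lambda a, s=s: c[s][a] - g + P[s][a] @ v)
                   for s in range(n)]                          # Step 2: improve
            if new == policy:                                  # Step 3: converged
                return policy, g, v
            policy = new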

8. Numerical Example

Consider an MDP formulation of a service facility system with inventory maintenance which controls customer admission to the system and the ordering level of inventory. Decisions are taken at equidistant time epochs to admit the eligible number of customers by observing the inventory level of the system. The inventory maintained in the system is reviewed at the decision epochs and refilled up to the maximum level $M$. Decisions for the inventory are made at each time epoch.

For the system we take $N = 5$ and $M = 5$. Let the state space be $S_1 = \{0,1,2,3,4,5\}$ and $S_2 = \{0,1,2,3,4,5\}$, with the admissible states (those with $s_1 \le s_2$) given by
$$S = \{(0,0), (0,1), (0,2), (0,3), (0,4), (0,5), (1,1), (1,2), (1,3), (1,4), (1,5), (2,2), (2,3), (2,4), (2,5), (3,3), (3,4), (3,5), (4,4), (4,5), (5,5)\},$$
where $N \le M$.

Admission Control:

Assume that the costs for holding at level $s_1 \in S_1$ are, respectively, $c_4 = 3$, $c_3 = 5$, $c_2 = 7$, $c_1 = 9$, $c_0 = c_f = 10$. The transition probabilities of the admission (system-size) component are:

s1 \ s1'     5      4      3      2      1      0
5          0.2   0.55   0.25      0      0      0
4            0   0.15   0.65   0.20      0      0
3            0      0   0.25    0.6   0.15      0
2            0      0      0   0.15   0.75    0.1
1            0      0      0      0   0.15   0.85
0            0      0      0      0    0.9    0.1

Inventory Control:

Let us assume that the costs for ordering inventory at level $s_2 \in S_2$ are, respectively, $cp_4 = 5$, $cp_3 = 4$, $cp_2 = 3.2$, $cp_1 = 2$, $cp_0 = c_f = 1.5$.


Then $C_h$ (holding cost) $= 0.1$ per item of inventory and the inventory (ordering) cost $= 0.3$ per item. The transition probabilities of the inventory component are:

s2 \ s2'     5      4      3      2      1      0
5          0.3    0.4    0.2    0.1      0      0
4            0    0.2    0.4    0.3    0.1      0
3            0      0   0.25   0.55   0.15   0.05
2            0      0      0   0.25   0.65    0.1
1            0      0      0      0    0.4    0.6
0            0      0      0      0      0    1.0

Computational Procedure:

For any given policy $R$, the policy improvement quantity is given by
$$T_s(a, R) = c_s(a) - g(R) + \sum_{s' \in S} p_t(s' \mid s, a)\, v_{s'}(R),$$
where $T_s(a, R) = v_s(R)$ for $a = R_s$.
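In code, the test quantity is a one-liner once $g(R)$ and $v(R)$ are available from the value-determination step (a sketch with our names; P[s][a] is a NumPy row vector):

    def T(s, a, c, P, g, v):
        """Policy-improvement test quantity T_s(a, R)."""
        return c[s][a] - g + P[s][a] @ v   # c_s(a) - g(R) + sum_s' p(s'|s,a) v_s'(R)

By construction $T_s(R_s, R) = v_s(R)$, so any action with a smaller test value improves on the current policy.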

Iteration 1:

For the given policy $R^1 = (0,0,0,0,0,2)$ (admission control), the linear equations connecting the average cost $g(R^1)$ and the relative values are given by

v5 = -g + 0.2v5 + 0.55v4 + 0.25v3

v4 = -g + 0.15v4 + 0.65v3 + 0.2v2

v3 = -g + 0.25v3 + 0.6v2 + 0.15v1

v2 = -g + 0.15v2 + 0.75v1 + 0.1v0

v1 = -g + 0.15v1 + 0.85v0

v0 = 10- g + 0.9v1 + 0.1v0

By assuming $v_5 = 0$ and solving, we get
$$v_5(R^1) = 0,\ v_4(R^1) = 4.687757456,\ v_3(R^1) = 9.115505026,\ v_2(R^1) = 14.58329214,$$
$$v_1(R^1) = 19.62530895,\ v_0(R^1) = 25.33959466,\ g(R^1) = 4.857142857.$$

For the given policy $R^1 = (0,0,0,0,0,2)$ (inventory control), the linear equations connecting the average cost $m(R^1)$ and the relative values are given by

w5 = -m + 0.3w5 + 0.4w4 + 0.2w3 + 0.1w2

w4 = -m + 0.2w4 + 0.4w3 + 0.3w2 + 0.1w1

w3 = -m + 0.25w3 + 0.55w2 + 0.15w1 + 0.05w0


w2 = -m + 0.25w2 + 0.65w1 + 0.1w0

w1 = -m + 0.4w1 + 0.6w0

w0 = 1.5 – m + 1.0w0

By assuming $w_5 = 0$ and solving, we get
$$w_5(R^1) = 0,\ w_4(R^1) = 1.527777778,\ w_3(R^1) = 2.500000000,\ w_2(R^1) = 3.888888889,$$
$$w_1(R^1) = 5.555555556,\ w_0(R^1) = 8.055555556,\ m(R^1) = 1.500000000.$$
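Both linear systems above can be solved mechanically. The sketch below reproduces the admission values (states ordered $5, 4, \ldots, 0$, normalization $v_5 = 0$, and the one-step cost 10 in state 0, exactly as in the equations above); the inventory system is solved the same way with its own matrix and cost 1.5 in state 0.

    import numpy as np

    # Admission chain under R^1; rows/columns ordered s_1 = 5, 4, 3, 2, 1, 0.
    P = np.array([[0.20, 0.55, 0.25, 0.00, 0.00, 0.00],
                  [0.00, 0.15, 0.65, 0.20, 0.00, 0.00],
                  [0.00, 0.00, 0.25, 0.60, 0.15, 0.00],
                  [0.00, 0.00, 0.00, 0.15, 0.75, 0.10],
                  [0.00, 0.00, 0.00, 0.00, 0.15, 0.85],
                  [0.00, 0.00, 0.00, 0.00, 0.90, 0.10]])
    c = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 10.0])

    n = 6
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = np.eye(n) - P          # (I - P) v
    M[:n, n] = 1.0                     # + g
    M[n, 0] = 1.0                      # pin v_5 = 0 (state 5 is index 0)
    sol = np.linalg.solve(M, np.concatenate([c, [0.0]]))
    v, g = sol[:n], sol[n]
    print(g)   # 4.857142857... = g(R^1)
    print(v)   # 0, 4.6877..., 9.1155..., 14.5832..., 19.6253..., 25.3395...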

The test quantity $T_s(a, R)$ for admission and inventory control has the following values, written as $T_{(s_1, s_2)}((a_1, a_2), R)$: here $(s_1, s_2)$ denotes the number of customers admitted to the system (eligible queue) and the number of items in stock, respectively, and $(a_1, a_2)$ denotes the decisions (actions) for admission and inventory control, respectively.

T_{(0,1)}((a_1,a_2), R^1): (2,0) = 15.65555555, (2,1) = 13.3
T_{(0,2)}((a_1,a_2), R^1): (2,0) = 14.08888889, (2,1) = 14.3
T_{(0,3)}((a_1,a_2), R^1): (2,0) = 12.80000000, (2,1) = 14.9
T_{(0,4)}((a_1,a_2), R^1): (2,0) = 11.92777779, (2,1) = 15.7
T_{(1,1)}((a_1,a_2), R^1): (0,0) = 25.28086450, (0,1) = 22.92530895, (1,0) = 14.65555556, (1,1) = 12.3
T_{(1,2)}((a_1,a_2), R^1): (0,0) = 23.71419784, (0,1) = 23.92530895, (1,0) = 13.08888889, (1,1) = 13.3
T_{(1,3)}((a_1,a_2), R^1): (0,0) = 22.42530895, (0,1) = 24.52530895, (1,0) = 11.80000000, (1,1) = 13.9
T_{(1,4)}((a_1,a_2), R^1): (0,0) = 21.55308674, (0,1) = 25.32530895, (1,0) = 10.92777778, (1,1) = 14.7
T_{(2,2)}((a_1,a_2), R^1): (0,0) = 18.67218103, (0,1) = 18.88329214, (1,0) = 11.08888889, (1,1) = 11.3
T_{(2,3)}((a_1,a_2), R^1): (0,0) = 17.38329214, (0,1) = 19.48329214, (1,0) = 9.800000000, (1,1) = 11.9
T_{(2,4)}((a_1,a_2), R^1): (0,0) = 16.51106993, (0,1) = 20.28329214, (1,0) = 8.927777779, (1,1) = 12.7
T_{(3,3)}((a_1,a_2), R^1): (0,0) = 11.91550503, (0,1) = 14.01550503, (1,0) = 7.800000000, (1,1) = 9.9
T_{(3,4)}((a_1,a_2), R^1): (0,0) = 11.04328281, (0,1) = 14.81550503, (1,0) = 6.927777779, (1,1) = 10.7
T_{(4,4)}((a_1,a_2), R^1): (0,0) = 6.615535235, (0,1) = 10.38775746, (1,0) = 4.927777779, (1,1) = 8.7
T_{(1,5)}((a_1,a_2), R^1): (0,0) = 19.62530895, (1,0) = 9
T_{(2,5)}((a_1,a_2), R^1): (0,0) = 14.58329214, (1,0) = 7
T_{(3,5)}((a_1,a_2), R^1): (0,0) = 9.115505027, (1,0) = 5
T_{(4,5)}((a_1,a_2), R^1): (0,0) = 4.687757456, (1,0) = 3
T_{(0,5)}((2,0), R^1) = 10
T_{(0,0)}((2,2), R^1) = 10 + 1.5
T_{(5,5)}((0,0), R^1) = 0


The new policy will be $R^2 = (0,1,1,1,1,2)$ (admission control) and $R^2 = (0,0,0,0,1,2)$ (inventory control). Since the new policy $R^2$ is different from the initial policy $R^1$, we perform another iteration.

Iteration 2:

For the policy $R^2$ (admission control), the linear equations connecting the average cost $g(R^2)$ and the relative values are given by

v5 = -g + 0.2v5 + 0.55v4 + 0.25v3

v4 = 3 - g + 0.15v4 + 0.65v3 + 0.2v2

v3 = 5 - g + 0.25v3 + 0.6v2 + 0.15v1

v2 = 7 - g + 0.15v2 + 0.75v1 + 0.1v0

v1 = 9 - g + 0.15v1 + 0.85v0

v0 = 10 - g + 0.9v1 + 0.1v0

By assuming $v_5 = 0$ and solving, we get
$$v_5(R^2) = 0,\ v_4(R^2) = 9.870448179,\ v_3(R^2) = 16.22787115,\ v_2(R^2) = 21.63739496,$$
$$v_1(R^2) = 24.49453782,\ v_0(R^2) = 25.06596639,\ g(R^2) = 9.485714286.$$

For the policy $R^2$ (inventory control), the linear equations connecting the average cost $m(R^2)$ and the relative values are given by

w5 = -m + 0.3w5 + 0.4w4 + 0.2w3 + 0.1w2

w4 = -m + 0.2w4 + 0.4w3 + 0.3w2 + 0.1w1

w3 = -m +0.25w3 + 0.55w2 + 0.15w1 + 0.05w0

w2 = -m + 0.25w2 + 0.65w1 + 0.1w0

w1 = 2 - m + 0.4w1 + 0.6w0

w0 = 1.5 - m + 1.0w0

By assuming $w_5 = 0$ and solving, we get
$$w_5(R^2) = 0,\ w_4(R^2) = 1.558994709,\ w_3(R^2) = 2.423809524,\ w_2(R^2) = 3.916402116,$$
$$w_1(R^2) = 6.027513228,\ w_0(R^2) = 5.194179894,\ m(R^2) = 1.500000000.$$

The test quantity $T_s(a, R)$ for admission and inventory control has the following values:

T_{(0,1)}((a_1,a_2), R^2): (2,0) = 14.12751323, (2,1) = 13.3
T_{(0,2)}((a_1,a_2), R^2): (2,0) = 14.11640212, (2,1) = 14.3
T_{(0,3)}((a_1,a_2), R^2): (2,0) = 12.72380952, (2,1) = 14.9
T_{(0,4)}((a_1,a_2), R^2): (2,0) = 11.95899471, (2,1) = 15.7
T_{(1,1)}((a_1,a_2), R^2): (0,0) = 19.62205105, (0,1) = 18.79453782, (1,0) = 13.12751323, (1,1) = 12.3
T_{(1,2)}((a_1,a_2), R^2): (0,0) = 19.61093994, (0,1) = 19.79453782, (1,0) = 13.11640212, (1,1) = 13.3
T_{(1,3)}((a_1,a_2), R^2): (0,0) = 18.21834733, (0,1) = 20.39453782, (1,0) = 11.72380952, (1,1) = 13.9
T_{(1,4)}((a_1,a_2), R^2): (0,0) = 17.45353253, (0,1) = 21.19453782, (1,0) = 10.95899471, (1,1) = 14.7
T_{(2,2)}((a_1,a_2), R^2): (0,0) = 18.75379709, (0,1) = 18.93739497, (1,0) = 11.11640212, (1,1) = 11.3
T_{(2,3)}((a_1,a_2), R^2): (0,0) = 17.36120448, (0,1) = 19.53739497, (1,0) = 9.723809524, (1,1) = 11.9
T_{(2,4)}((a_1,a_2), R^2): (0,0) = 16.59638968, (0,1) = 20.33739497, (1,0) = 8.958994710, (1,1) = 12.7
T_{(3,3)}((a_1,a_2), R^2): (0,0) = 13.95168067, (0,1) = 16.12787116, (1,0) = 7.723809524, (1,1) = 9.9
T_{(3,4)}((a_1,a_2), R^2): (0,0) = 13.18686587, (0,1) = 16.92787116, (1,0) = 6.958994710, (1,1) = 10.7
T_{(4,4)}((a_1,a_2), R^2): (0,0) = 8.829442893, (0,1) = 12.57044818, (1,0) = 4.958994710, (1,1) = 8.7
T_{(1,5)}((a_1,a_2), R^2): (0,0) = 15.49453782, (1,0) = 9
T_{(2,5)}((a_1,a_2), R^2): (0,0) = 14.63739497, (1,0) = 7
T_{(3,5)}((a_1,a_2), R^2): (0,0) = 11.22787116, (1,0) = 5
T_{(4,5)}((a_1,a_2), R^2): (0,0) = 6.870448183, (1,0) = 3
T_{(0,5)}((2,0), R^2) = 10
T_{(0,0)}((2,2), R^2) = 10 + 1.5
T_{(5,5)}((0,0), R^2) = 0

The new policy is $R^3 = (0,1,1,1,1,2)$ (admission control) and $R^3 = (0,0,0,0,1,2)$ (inventory control), which is identical to the policy $R^2$. After two iterations we have therefore obtained the optimal policies $R^* = (0,1,1,1,1,2)$ (admission control) and $R^* = (0,0,0,0,1,2)$ (inventory control). It is beneficial to admit customers to the system at states 1, 2, 3, 4 only, and a replenishment order is placed when the inventory level is in state 1. At state (0,0), compulsory admission and replenishment are suggested.

9. Conclusion and Future Research

In this article we presented an application of Markov Decision Processes (MDP) to admission and replenishment control using the classical approach of policy iteration. This result can be extended to admission and service control in service facility systems. We are currently studying Markov decision processes in discrete time with admission and service control. In future we would like to extend the model to control both service and replenishment orders simultaneously.


Acknowledgement

P. Maheswari's research is supported by the University Grants Commission, Govt. of India, under the NFOBC Scheme (F./2015-16/NFO-2015-17-OBC-TAM-46773/(SA-III/Website)).

References

[1] Bellman, R. (1957), Dynamic Programming, Princeton University Press, Princeton, NJ.
[2] Berman, O. and Sapna, K.P. (2001), Optimal control of service for facilities holding inventory, Computers and Operations Research, 28: 429-441.
[3] Blackwell, D. (1962), Discrete dynamic programming, Ann. Math. Statist., 33, 719-726.
[4] De Ghellinck, G. (1960), Les problèmes de décisions séquentielles, Cahiers Centre Etudes Recherche Opér., 2, 161-179.
[5] Denardo, E.V. and Fox, B.L. (1968), Multi-chain Markov renewal programs, SIAM J. Appl. Math., 16, 468-487.
[6] Derman, C. (1970), Finite State Markovian Decision Processes, Academic Press, New York.
[7] Elango, C. and Rozario, G.M., Optimal policy for an inventory system with partial backlogging, working paper, Madurai Kamaraj University.
[8] Golabi, K., Kulkarni, R.B. and Way, C.B. (1982), A statewide pavement management system, Interfaces, 12, no. 6, 5-21.
[9] Hastings, N.A.J. (1971), Bounds on the gain of a Markov decision process, Operat. Res., 19, 240-244.
[10] He, Q.-M. and Buzacott, J. (2002), Optimal and near-optimal inventory control policies for a make-to-order inventory-production system, European Journal of Operational Research, 141: 113-132.
[11] Hordijk, A. and Kallenberg, L.C.M. (1979), Linear programming and Markov decision chains, Management Sci., 25, 352-362.
[12] Hordijk, A. and Kallenberg, L.C.M. (1984), Constrained undiscounted stochastic dynamic programming, Math. Operat. Res., 9, 276-289.
[13] Howard, R.A. (1960), Dynamic Programming and Markov Processes, John Wiley and Sons, Inc., New York.
[14] Kawai, H. (1983), An optimal ordering and replacement policy of a Markovian degradation system under complete observation, part I, J. Operat. Res. Soc. Japan, 26, 279-290.
[15] Manne, A. (1960), Linear programming and sequential decisions, Management Sci., 6, 259-267.
[16] Mine, H. and Osaki, S. (1970), Markov Decision Processes, American Elsevier Publishing Company Inc., New York.
[17] Odoni, A. (1969), On finding the maximal gain for Markov decision processes, Operat. Res., 17, 857-860.
[18] Puterman, M.L. (1994), Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley and Sons, Inc., New York.
[19] Selvakumar, C., Maheswari, P. and Elango, C., Discrete MDP problem in service facility systems with inventory management, communicated to (ICMMCMSE2017).
[20] Stengos, D. and Thomas, L.C. (1980), The blast furnaces problem, Eur. J. Operat. Res., 4, 330-336.
[21] Tijms, H.C. (2003), A First Course in Stochastic Models, John Wiley and Sons Ltd, England.
[22] Tijms, H.C. and Van der Duyn Schouten, F.A. (1985), A Markov decision algorithm for optimal inspections and revisions in a maintenance system with partial information, Eur. J. Operat. Res., 21, 245-253.
[23] Veinott, A.F. Jr. (1966), On finding optimal policies in discrete dynamic programming with no discounting, Ann. Math. Statist., 37, 1284-1294.
[24] White, D.J. (1985), Real applications of Markov decision processes, Interfaces, 15:6, 73-83.
