

1

Operations Research

Prepared by:

Abed Alhameed Mohammed Alfarra

Supervised by:

Dr. Sana’a Wafa Al-Sayegh

2nd Semester 2008-2009

ITGD4207

University of Palestine

2

ITGD4207 Operations Research

Chapter 14

Markov Decision Processes

3

Outline

Introduction to MDPs
Definition MDP
Solution
MDP Basics and Terminology
Markov Assumption
A Prototype Example 1
Example 2

4

Introduction to MDPs

• A Markov Decision Process is a discrete-time stochastic control process characterized by a set of states; in each state there are several actions from which the decision maker must choose.

• For a state s and an action a, a state transition function Pa(s) determines the transition probabilities to the next state. The decision maker earns a reward for each state transition.

• Roots in operations research

• Also used in economics, communications engineering, ecology, and performance modeling

5

Definition MDP

• Defined formally as a tuple <S, A, T, R>:
– S: set of states
– A: set of actions
– T: transition function
• T(s, a, s') = P(s' | s, a), the probability that action a in state s at time t will lead to state s' at time t + 1
– R: reward function
• R(s, a) = cost or reward of taking action a in state s

6

Definition MDP

• The goal is to maximize some cumulative function of the rewards, typically the expected discounted sum over a potentially infinite horizon:

E[ Σ (t = 0 … ∞) γ^t R(st, at) ], where 0 ≤ γ < 1 is the discount factor.
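The discounted cumulative reward can be computed directly; a minimal sketch, where the reward sequence and the discount factor γ = 0.9 are illustrative assumptions, not values from the slides:

```python
# Discounted return: sum over t of gamma^t * r_t (gamma is the discount factor)
def discounted_return(rewards, gamma=0.9):
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Four steps of reward 1 with gamma = 0.9:
print(discounted_return([1, 1, 1, 1]))  # 1 + 0.9 + 0.81 + 0.729 ≈ 3.439
```

Because γ < 1, rewards earned sooner count more, and the infinite sum stays finite even over an unbounded horizon.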

7

Solution

• The solution to a Markov Decision Process can be expressed as a policy π, a function from states to actions. Once a Markov decision process is combined with a policy in this way, the action for each state is fixed and the resulting combination behaves like a Markov chain.
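The point that an MDP plus a policy behaves like a Markov chain can be shown in a few lines; the two-state MDP and the state/action names below are hypothetical, chosen only for illustration:

```python
# A tiny hypothetical 2-state MDP: T[s][a] maps to next-state probabilities.
T = {
    "s0": {"stay": {"s0": 1.0}, "go": {"s1": 1.0}},
    "s1": {"stay": {"s1": 1.0}, "go": {"s0": 0.5, "s1": 0.5}},
}

def induced_chain(T, policy):
    # Fixing one action per state removes all choice:
    # what remains is an ordinary Markov chain.
    return {s: T[s][policy[s]] for s in T}

policy = {"s0": "go", "s1": "stay"}
print(induced_chain(T, policy))  # {'s0': {'s1': 1.0}, 's1': {'s1': 1.0}}
```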

8

MDP Basics and Terminology

• Goal is to choose a sequence of actions for optimality
• Defined as <S, A, T, R>
• MDP models:

– Finite horizon: maximize the expected reward for the next n steps

– Infinite horizon: maximize the expected discounted reward

– Transition model: maximize average expected reward per transition

– Goal state: maximize expected reward (minimize expected cost) to some target state G

9

Markov Assumption

• Markov Assumption: transition probabilities (and rewards) from any given state depend only on the state, not on previous history

• Where you end up after an action depends only on the current state

• Choose a sequence of actions (not just one decision or one action)
– Utility is based on a sequence of decisions

10

A prototype Example 1

A manufacturer has one key machine at the core of one of its production processes. Because of heavy use, the machine deteriorates rapidly in both quality and output. Therefore, at the end of each week, a thorough inspection is done that results in classifying the condition of the machine into one of four possible states:

11

The following matrix shows the relative frequency (probability) of each possible transition from the state in one week (a row of the matrix) to the state in the following week (a column of the matrix):

State    0      1      2      3
0        0      7/8    1/16   1/16
1        0      3/4    1/8    1/8
2        0      0      1/2    1/2
3        1      0      0      0

12

The expected costs per week from this source are as follows:

Find the expected average cost per unit time:

Total cost when the machine enters state 3 = $6,000

13

Solution

14

π0 = π3

π1 = 7/8 π0 + 3/4 π1
π1 − 3/4 π1 = 7/8 π0
0.25 π1 = 7/8 π0
π1 = 3.5 π0

π2 = 1/16 π0 + 1/8 π1 + 1/2 π2
π2 − 1/2 π2 = 1/16 π0 + 1/8 π1
0.5 π2 = 1/16 π0 + 1/8 π1
π2 = 0.125 π0 + 0.25 π1

π3 = 1/16 π0 + 1/8 π1 + 1/2 π2 = π0

1 = π0 + π1 + π2 + π3
1 = π0 + 3.5 π0 + 0.125 π0 + 0.25 (3.5 π0) + π0
1 = (1 + 3.5 + 0.125 + 0.875 + 1) π0
1 = 6.5 π0
π0 = 2/13 ≈ 0.154

(1) π1 = 3.5 (2/13) = 7/13

(2) π2 = 0.125 (2/13) + 0.25 (7/13) = 2/13

(3) π3 = 2/13
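The steady-state probabilities can be checked numerically. A sketch using the transition matrix implied by the balance equations above (row i gives the probabilities of moving from state i to each state):

```python
import numpy as np

# Transition matrix implied by the balance equations (states 0-3)
P = np.array([[0, 7/8, 1/16, 1/16],
              [0, 3/4, 1/8,  1/8 ],
              [0, 0,   1/2,  1/2 ],
              [1, 0,   0,    0   ]])

# Steady state: solve pi P = pi together with the normalization sum(pi) = 1
A = np.vstack([P.T - np.eye(4), np.ones(4)])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # ≈ [2/13, 7/13, 2/13, 2/13] ≈ [0.154, 0.538, 0.154, 0.154]
```

The extra row of ones replaces one redundant balance equation with the normalization constraint, which pins down the unique stationary distribution.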

15

16

Example 2

Assume we have 3 brands of household detergent:

Ariel, Tide, Omo

competing to attract customers.

A study of the market found that the current shares of the three brands are as follows:
Ariel = 40%, Tide = 35%, Omo = 25%

17

The study estimated the changes in demand for all three brands over a regular six-week period. The rates of conversion from one brand to another during the study period were measured as shown in the following table:

State (from \ to)   Ariel   Tide    Omo
Ariel               0.9     0.05    0.05
Tide                0.1     0.8     0.1
Omo                 0.1     0.15    0.75

18

Find the market share of sales volume for each detergent during the next period, based on the current share estimates and the transition matrix of probabilities.

19

Solution

• Market share for Ariel = (0.40*0.9 + 0.35*0.1 + 0.25*0.1) = 0.42

• Market share for Tide = (0.40*0.05 + 0.35*0.8 + 0.25*0.15) = 0.3375

• Market share for Omo = (0.40*0.05 + 0.35*0.1 + 0.25*0.75) = 0.2425
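The next-period shares are just the current share vector multiplied by the transition matrix; a minimal sketch using the table and shares given above:

```python
# Current shares (order: Ariel, Tide, Omo) and the weekly transition table
shares = [0.40, 0.35, 0.25]
P = [[0.9, 0.05, 0.05],   # from Ariel
     [0.1, 0.8,  0.1 ],   # from Tide
     [0.1, 0.15, 0.75]]   # from Omo

# Next-period share of brand j = sum over i of shares[i] * P[i][j]
next_shares = [sum(shares[i] * P[i][j] for i in range(3)) for j in range(3)]
print([round(x, 4) for x in next_shares])  # [0.42, 0.3375, 0.2425]
```

Repeating the multiplication projects shares further periods ahead, exactly as a Markov chain evolves.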

20

• Comparing the new shares with the current ones, we find:

– Ariel's share of the local market increased by 2% (from 40% to 42%)

– Tide's share of the domestic market declined by 1.25% (from 35% to 33.75%)

– Omo's share declined by 0.75% (from 25% to 24.25%)

21