Partially Observable Markov Decision Process (Chapters 15 & 16)
TKK | Automation Technology Laboratory
Partially Observable Markov Decision Process
(Chapters 15 & 16)
José Luis Peralta
TKK | Automation Technology Laboratory | AS-84.4340 Postgraduate Course in Automation Technology
Contents
• POMDP
• Example POMDP
• Finite World POMDP algorithm
• Practical Considerations
• Approximate POMDP Techniques
Partially Observable Markov Decision Processes (POMDP)
• POMDP: uncertainty in the measurements (the state) and uncertainty in the control effects
• Adapt the previous Value Iteration Algorithm (VIA)
Partially Observable Markov Decision Processes (POMDP)
• POMDP: the world cannot be sensed directly
• Measurements are incomplete, noisy, etc.
• Partial observability: the robot has to estimate a posterior distribution over the possible world states.
Partially Observable Markov Decision Processes (POMDP)
• POMDP: an algorithm that finds the optimal control policy exists for a FINITE world, i.e. when the state space, the action space, the space of observations, and the planning horizon are all finite.
• The computation is complex; for the continuous case there are approximations.
Partially Observable Markov Decision Partially Observable Markov Decision ProcessesProcesses(POMDP)(POMDP)
• The algorithm we are going to study all based in Value Iteration (VI).
with
The same as previous but is not observable
• Robot has to make decision in the BELIEF STATE Robot’s internal knowledge about the state of the
environment Space of posteriori distribution over state
x
( )b
TKK | Automation Technology LaboratoryAS-84.4340 Postgraduate Course in Automation Technology
Partially Observable Markov Decision Processes (POMDP)
• So, in belief space:
  V_T(b) = γ max_u [ r(b, u) + ∫ V_{T-1}(b') p(b' | u, b) db' ]
with r(b, u) = ∫ r(x, u) b(x) dx.
• Control policy:
  π_T(b) = argmax_u [ r(b, u) + ∫ V_{T-1}(b') p(b' | u, b) db' ]
Partially Observable Markov Decision Processes (POMDP)
• Belief bel: each value in a POMDP is a function of an entire probability distribution b.
• Problems: a finite state space gives a continuous belief space; a continuous state space gives an infinitely-dimensional belief space, a continuum.
• There is also complexity in calculating the value function, because of the integral over all distributions.
Partially Observable Markov Decision Processes (POMDP)
• In the end, an optimal solution exists for the interesting special case of a finite world: state space, action space, space of observations, and planning horizon all finite.
• The solutions of the VF are piecewise linear functions over the belief space. This arises because:
  • expectation is a linear operation
  • the robot is able to select different controls in different parts of the belief space
Example POMDP
2 states: x1, x2; 3 control actions: u1, u2, u3
Example POMDP
When executing u1 or u2, the payoff is:
  r(x1, u1) = -100    r(x2, u1) = +100
  r(x1, u2) = +100    r(x2, u2) = -50
Dilemma: the payoffs are opposite in each state, so knowledge of the state translates directly into payoff.
Example POMDP
To acquire knowledge, the robot has the control u3 (cost of waiting, cost of sensing, etc.):
  r(x1, u3) = r(x2, u3) = -1
u3 affects the state of the world in a non-deterministic manner:
  p(x1' | x1, u3) = 0.2    p(x2' | x1, u3) = 0.8
  p(x1' | x2, u3) = 0.8    p(x2' | x2, u3) = 0.2
Example POMDP
• Benefit: before each control decision, the robot can sense. By sensing, the robot gains knowledge about the state, makes better control decisions, and gets a higher expected payoff.
• In the case of control action u3, the robot senses without a terminal action.
Example POMDP
• The measurement model is governed by the following probability distribution:
  p(z1 | x1) = 0.7    p(z2 | x1) = 0.3
  p(z1 | x2) = 0.3    p(z2 | x2) = 0.7
Example POMDP
This example is easy to graph over the belief space (2 states).
• Belief state: p1 = b(x1), p2 = b(x2), but p2 = 1 - p1, so we just graph p1.
Example POMDP
• Control policy: a function that maps the unit interval [0; 1] to the space of all actions,
  π : [0; 1] → u
Example POMDP – Control Choice
• Control choice (when to execute what control?)
• Payoff in POMDPs: first consider the immediate payoff of u1, u2, u3. The payoff is now a function of the belief state, so for b = (p1, p2) the expected payoff is
  r(b, u) = p1 r(x1, u) + p2 r(x2, u)
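As a quick numeric check, the expected payoff r(b, u) = p1 r(x1, u) + p2 r(x2, u) can be computed directly. This is a small Python sketch: the dictionary layout and function name are mine, while the payoff values are the ones from the slides above.

```python
# Expected immediate payoff r(b, u) for the two-state example.
# The belief is represented by p1 = b(x1); p2 = 1 - p1.

# r[u][x]: payoff of action u in state x
r = {
    "u1": {"x1": -100, "x2": 100},
    "u2": {"x1": 100, "x2": -50},
    "u3": {"x1": -1, "x2": -1},
}

def expected_payoff(p1, u):
    """r(b, u) = p1 * r(x1, u) + (1 - p1) * r(x2, u)."""
    return p1 * r[u]["x1"] + (1 - p1) * r[u]["x2"]

print(expected_payoff(0.0, "u1"))  # 100.0: certain to be in x2, u1 pays off
print(expected_payoff(1.0, "u2"))  # 100.0: certain to be in x1, u2 pays off
```

Note how each r(b, u) is linear in p1; this linearity is what later makes the value function piecewise linear.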
Example POMDP – Control Choice
• First we calculate V1 (T = 1): the robot simply selects the action with the highest expected payoff,
  V1(b) = max_u r(b, u)
This is a piecewise linear, convex function: the maximum of the individual payoff functions.
Example POMDP – Control Choice
• First we calculate V1 (T = 1): the robot simply selects the action with the highest expected payoff.
• The transition occurs where r(b, u1) = r(b, u2), i.e. at p1 = 3/7.
• Optimal policy: π1(b) = u1 if p1 ≤ 3/7, u2 if p1 > 3/7.
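The horizon-1 value function and the 3/7 crossing point can be reproduced in a few lines. This is an illustrative Python sketch using the example's payoff values; the function names are mine.

```python
# V1(b) = max_u r(b, u): the pointwise maximum of linear payoff
# functions, with u1/u2 swapping optimality at p1 = 3/7.

def r_b(p1, u):
    payoffs = {"u1": (-100, 100), "u2": (100, -50), "u3": (-1, -1)}
    rx1, rx2 = payoffs[u]
    return p1 * rx1 + (1 - p1) * rx2

def V1(p1):
    return max(r_b(p1, u) for u in ("u1", "u2", "u3"))

def pi1(p1):
    """Horizon-1 policy: the action with the highest expected payoff."""
    return max(("u1", "u2", "u3"), key=lambda u: r_b(p1, u))

print(pi1(0.2))            # u1
print(pi1(0.8))            # u2
print(round(V1(3/7), 2))   # 14.29: the minimum of V1, at the crossing
```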
Example POMDP - Sensing
• Now we have perception. What if the robot can sense before it chooses a control? How does that affect the optimal value function?
• Sensing information about the state enables the robot to choose a better control action.
• In the previous example, at p1 = 3/7 the expected payoff is 100/7 ≈ 14.3. How much better will this be after sensing?
Example POMDP – Control Choice
Belief after sensing as a function of the belief before sensing, given by Bayes rule:
  p1' = p(x1 | z1) = p(z1 | x1) p1 / p(z1),  with p(z1) = 0.7 p1 + 0.3 (1 - p1) = 0.4 p1 + 0.3
Finally, for example with p1 = 0.4:
  p1' = (0.7 · 0.4) / (0.4 · 0.4 + 0.3) = 0.28 / 0.46 ≈ 0.6087
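The Bayes update above, including the 0.6087 example, can be checked numerically. A Python sketch with my own naming; the measurement probabilities are the ones from the measurement-model slide.

```python
# Bayes-rule belief update after measuring z1:
# p1' = p(z|x1) p1 / (p(z|x1) p1 + p(z|x2) (1 - p1)).

P_Z_GIVEN_X = {  # p(z | x) from the measurement model
    ("z1", "x1"): 0.7, ("z1", "x2"): 0.3,
    ("z2", "x1"): 0.3, ("z2", "x2"): 0.7,
}

def belief_after(z, p1):
    """Posterior p1' after observing z, starting from prior p1."""
    num = P_Z_GIVEN_X[(z, "x1")] * p1
    den = num + P_Z_GIVEN_X[(z, "x2")] * (1 - p1)
    return num / den

print(round(belief_after("z1", 0.4), 4))  # 0.6087, as on the slide
```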
Example POMDP – Control Choice
How does this affect the value function?
Example POMDP – Control Choice
Mathematically:
  V1(b | z1) = V1 evaluated at p1' = p(x1 | z1)
That is, we just replace p1 by p(x1 | z1) in the value function V1.
Example POMDP – Control Choice
However, our interest is the complete expected value function after sensing, which also considers the probability of sensing the other measurement z2. This is given by:
  V1bar(b) = p(z1) V1(b | z1) + p(z2) V1(b | z2)
Example POMDP – Control Choice
And this results in:
Example POMDP – Control Choice
Mathematically:
  V1bar(b) = Σ_i p(zi) V1(b | zi)
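The expected value function after sensing can be sketched as follows. This Python illustration (function names mine) reuses the example's payoffs and measurement model, and checks the known property that sensing can never decrease the expected value, since V1 is convex.

```python
# V1bar(b) = p(z1) V1(b|z1) + p(z2) V1(b|z2):
# expectation of the horizon-1 value over the possible measurements.

def r_b(p1, u):
    payoffs = {"u1": (-100, 100), "u2": (100, -50), "u3": (-1, -1)}
    rx1, rx2 = payoffs[u]
    return p1 * rx1 + (1 - p1) * rx2

def V1(p1):
    return max(r_b(p1, u) for u in ("u1", "u2", "u3"))

def V1bar(p1):
    total = 0.0
    for pz_x1, pz_x2 in ((0.7, 0.3), (0.3, 0.7)):  # z1, then z2
        pz = pz_x1 * p1 + pz_x2 * (1 - p1)          # p(z)
        p1_post = pz_x1 * p1 / pz                    # Bayes update
        total += pz * V1(p1_post)
    return total

# Sensing never hurts: V1bar(b) >= V1(b) for any belief.
print(V1bar(3/7) >= V1(3/7))  # True
```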
Example POMDP - Prediction
To plan at a horizon larger than T = 1, we have to take the state transition into consideration and project our value function accordingly. According to our transition probability model for u3:
  p1' = 0.2 p1 + 0.8 (1 - p1)
If p1 = 0, then p1' = 0.8; if p1 = 1, then p1' = 0.2. In between, the expectation is linear.
Example POMDP – Prediction
And this results in:
Example POMDP – Prediction
And adding u1 and u2 we have:
Example POMDP – Prediction
Mathematically, this is the value function projected through the transition model of u3, with the fixed cost of u3 (-1) included.
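The projection step can be sketched numerically. The helper names below are mine; the transition numbers and the -1 cost of u3 are from the slides.

```python
# Projecting a belief (and thus a value function) through u3's
# transition model: p1' = 0.2 p1 + 0.8 (1 - p1), then subtracting
# u3's fixed cost of 1.

def project_p1(p1):
    """Belief over x1 after executing u3."""
    return 0.2 * p1 + 0.8 * (1 - p1)

def V1(p1):
    payoffs = {"u1": (-100, 100), "u2": (100, -50), "u3": (-1, -1)}
    return max(p1 * rx1 + (1 - p1) * rx2 for rx1, rx2 in payoffs.values())

def projected_value(p1):
    """Value of executing u3 and then acting optimally for one step."""
    return V1(project_p1(p1)) - 1

print(project_p1(0.0))  # 0.8: certainty in x2 becomes mostly-x1
print(project_p1(1.0))  # 0.2: certainty in x1 becomes mostly-x2
```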
Example POMDP – Pruning
Full backup without pruning:
• at T = 20, the value function is defined over 10^547,864 linear functions
• at T = 30, the value function is defined over 10^561,012,337 linear functions
Impractical! Efficient approximate POMDP algorithms are needed.
Finite World POMDP algorithm
To understand this, read the mathematical derivation of POMDPs, pp. 531-536 in [1].
Example POMDP – Practical Considerations
It looks easy, so let's try something more “real”: the probabilistic robot “RoboProb”.
11 states: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11
5 control actions: u1, u2, u3, u4, u5 (u5 = sense without moving)
The states form the grid
  1  2  3  4
  5  6  7
  8  9  10  11
Transition model: each motion command succeeds with probability 0.8 and slips sideways with probability 0.1 to either neighboring direction.
Example POMDP – Practical Considerations
It looks easy, so let's try something more “real”: the probabilistic robot “RoboProb”.
“Reward” payoff over the grid (the same set for all control actions):
  -0.04  -0.04  -0.04   +1
  -0.04  -0.04  -1
  -0.04  -0.04  -0.04  -0.04
Examples:
  r(x1, u1) = -0.04    r(x8, u2) = -0.04
  r(x7, u5) = -1       r(x7, u3) = -1
Example POMDP – Practical Considerations
It's getting kind of hard :S…
Transition probability p(xj | xi, uk). Example: p(xj | xi, u1), with rows the current state and columns the posterior state:
        1    2    3    4    5    6    7    8    9    10   11
   1   0.9  0.1  0    0    0    0    0    0    0    0    0
   2   0.1  0.8  0.1  0    0    0    0    0    0    0    0
   3   0    0.1  0.8  0.1  0    0    0    0    0    0    0
   4   0    0    0    1    0    0    0    0    0    0    0
   5   0.8  0    0    0    0.2  0    0    0    0    0    0
   6   0    0    0.8  0    0    0.1  0.1  0    0    0    0
   7   0    0    0    0    0    0    1    0    0    0    0
   8   0    0    0    0    0.8  0    0    0.1  0.1  0    0
   9   0    0    0    0    0    0    0    0.1  0.8  0.1  0
  10   0    0    0    0    0    0.8  0    0    0.1  0    0.1
  11   0    0    0    0    0    0    0.8  0    0    0.1  0.1
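A quick way to catch typos in a matrix like this is to verify that every row is a probability distribution over posterior states. A Python sketch using the matrix above:

```python
# Sanity check on RoboProb's transition matrix for u1:
# every row of p(x_j | x_i, u1) must sum to 1.

P_U1 = [
    [0.9, 0.1, 0,   0,   0,   0,   0,   0,   0,   0,   0  ],
    [0.1, 0.8, 0.1, 0,   0,   0,   0,   0,   0,   0,   0  ],
    [0,   0.1, 0.8, 0.1, 0,   0,   0,   0,   0,   0,   0  ],
    [0,   0,   0,   1,   0,   0,   0,   0,   0,   0,   0  ],
    [0.8, 0,   0,   0,   0.2, 0,   0,   0,   0,   0,   0  ],
    [0,   0,   0.8, 0,   0,   0.1, 0.1, 0,   0,   0,   0  ],
    [0,   0,   0,   0,   0,   0,   1,   0,   0,   0,   0  ],
    [0,   0,   0,   0,   0.8, 0,   0,   0.1, 0.1, 0,   0  ],
    [0,   0,   0,   0,   0,   0,   0,   0.1, 0.8, 0.1, 0  ],
    [0,   0,   0,   0,   0,   0.8, 0,   0,   0.1, 0,   0.1],
    [0,   0,   0,   0,   0,   0,   0.8, 0,   0,   0.1, 0.1],
]

for i, row in enumerate(P_U1, start=1):
    assert abs(sum(row) - 1.0) < 1e-9, f"row {i} sums to {sum(row)}"
print("all rows sum to 1")
```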
Example POMDP – Practical Considerations
It's getting kind of hard :S…
Example: p(xj | xi, u5). Since u5 senses without moving, the transition matrix is the 11 × 11 identity:
  p(xj | xi, u5) = 1 if i = j, 0 otherwise
Example POMDP – Practical Considerations
It's getting kind of hard :S…
Measurement probability p(zj | xi), with rows the current state and columns the probability of measuring zj:
  p(zj | xi) = 0.7 if i = j, 0.03 otherwise
(each row sums to 0.7 + 10 · 0.03 = 1)
Example POMDP – Practical Considerations
It's getting kind of hard :S…
Belief states:
  p1 = b(x1), p2 = b(x2), p3 = b(x3), …, p11 = b(x11) = 1 - p1 - p2 - … - p10
Impossible to graph!
Example POMDP – Practical Considerations
It's getting kind of hard :S…
Each linear function results from executing a control u, followed by observing a measurement z, and then executing a control u'.
Example POMDP – Practical Considerations
It's getting kind of hard :S…
• Defining the measurement probability
• Defining the “reward” payoff
• Defining the transition probability
• Merging the transition (control) probabilities
Example POMDP – Practical Considerations
It's getting kind of hard :S…
With N the number of states and NC the number of controls, one backup requires:
• setting the beliefs
• executing u: NC times
• sensing z: N times
• executing u': NC times
Example POMDP – Practical Considerations
Now what…?
Calculating r(b, u) = Σ_i pi r(xi, u) takes N terms per control.
The real problem is to compute p(b' | b, u).
Example POMDP – Practical Considerations
The real problem is to compute p(b' | b, u). The key factor in this update is the conditional probability p(b' | b, u):
• Given a belief b and a control action u, the outcome is a distribution over distributions.
• Because the next belief b' is also based on the next measurement, and the measurement itself is generated stochastically, this probability specifies a distribution over probability distributions.
Example POMDP – Practical Considerations
The real problem is to compute p(b' | b, u). So we condition on the measurement as well, p(b' | b, u, z), which contains only one non-zero term: the belief b' that results from the Bayes filter update with b, u, and z.
Example POMDP – Practical Considerations
The real problem is to compute p(b' | b, u). Arriving at:
  p(b' | b, u) = ∫ p(b' | b, u, z) p(z | b, u) dz
We just integrate over measurements z instead of beliefs b'. Because our space is finite, we have the sum
  p(b' | b, u) = Σ_z p(b' | b, u, z) p(z | b, u)
with p(z | b, u) = Σ_i p(z | xi') Σ_j p(xi' | xj, u) pj.
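For the finite case, the belief update inside p(b' | b, u) can be sketched on the earlier 2-state example: for each measurement z there is exactly one resulting belief, and z occurs with probability p(z | b, u). The function names below are mine; the transition and measurement numbers are the example's.

```python
# Finite-world belief update: prediction through the transition
# model, then a measurement update that also yields p(z | b, u).

P_TRANS_U3 = [[0.2, 0.8],   # p(x' | x1, u3): distribution over x1', x2'
              [0.8, 0.2]]   # p(x' | x2, u3)
P_MEAS = {"z1": [0.7, 0.3], "z2": [0.3, 0.7]}  # p(z | x1), p(z | x2)

def predict(b, p_trans):
    """b_bar_i = sum_j p(x_i' | x_j, u) b_j."""
    n = len(b)
    return [sum(p_trans[j][i] * b[j] for j in range(n)) for i in range(n)]

def update(b_bar, z):
    """Measurement update: b'_i proportional to p(z | x_i) b_bar_i."""
    unnorm = [P_MEAS[z][i] * b_bar[i] for i in range(len(b_bar))]
    p_z = sum(unnorm)                  # p(z | b, u), the normalizer
    return [w / p_z for w in unnorm], p_z

b = [0.5, 0.5]
b_bar = predict(b, P_TRANS_U3)
b_next, p_z1 = update(b_bar, "z1")
print(b_bar, b_next, p_z1)  # [0.5, 0.5] [0.7, 0.3] 0.5
```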
Example POMDP – Practical Considerations
The real problem is to compute p(b' | b, u). At the end, we have something like this.
So, this VIA is far from practical: for any reasonable number of distinct states, measurements, and controls, the complexity of the value function is prohibitive, even for relatively modest planning horizons.
Hence the need for approximations.
Approximate POMDP Techniques
• Here we have 3 approximate probabilistic planning and control algorithms: QMDP, AMDP, and MC-POMDP.
• They have varying degrees of practical applicability.
• All 3 algorithms rely on approximations of the POMDP value function.
• They differ in the nature of their approximations.
Approximate POMDP Techniques - QMDP
• The QMDP framework considers uncertainty only for a single action choice: it assumes that after the immediate next control action, the state of the world suddenly becomes observable.
• Full observability makes it possible to use the MDP-optimal value function.
• QMDP generalizes the MDP value function to belief spaces through the mathematical expectation operator.
• Planning in QMDPs is as efficient as in MDPs, but the value function generally overestimates the true value of a belief state.
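A minimal QMDP sketch on the earlier 2-state example. This is an illustration of the idea, not the algorithm listing in [1]: treating u1 and u2 as terminal by giving them no future value is my modeling assumption.

```python
# QMDP: run MDP-style value iteration as if the state were
# observable, then pick the action maximizing the expected
# Q-value under the belief: argmax_u sum_x b(x) Q(x, u).

STATES = (0, 1)                      # x1, x2
ACTIONS = ("u1", "u2", "u3")
R = {"u1": (-100, 100), "u2": (100, -50), "u3": (-1, -1)}
P = {"u3": [[0.2, 0.8], [0.8, 0.2]]}  # only u3 continues the process

def qmdp_values(horizon):
    V = [0.0, 0.0]
    Q = {}
    for _ in range(horizon):
        for a in ACTIONS:
            for s in STATES:
                future = 0.0
                if a == "u3":        # u1/u2 are terminal in this model
                    future = sum(P["u3"][s][s2] * V[s2] for s2 in STATES)
                Q[(s, a)] = R[a][s] + future
        V = [max(Q[(s, a)] for a in ACTIONS) for s in STATES]
    return Q

def qmdp_action(belief, Q):
    """argmax_u sum_x b(x) Q(x, u)."""
    return max(ACTIONS, key=lambda a: sum(belief[s] * Q[(s, a)] for s in STATES))

Q = qmdp_values(horizon=2)
print(qmdp_action([1.0, 0.0], Q))  # u2: worth 100 vs 99 for u3
print(qmdp_action([0.0, 1.0], Q))  # u1
print(qmdp_action([0.5, 0.5], Q))  # u3, valued as if the state became
                                   # observable afterwards (99): an overestimate
```

The last line shows the overestimation mentioned above: under an uncertain belief, QMDP credits u3 with near-full payoff because it assumes the state becomes observable after one step.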
Approximate POMDP Techniques - QMDP
• Algorithm: see [1].
• The QMDP framework considers uncertainty only for a single action choice.
Approximate POMDP Techniques - AMDP
• The Augmented MDP (AMDP) maps the belief into a lower-dimensional representation, over which it then performs exact value iteration.
• The “classical” representation consists of the most likely state under a belief, along with the belief entropy.
• AMDPs are like MDPs with one added dimension in the state representation, one that measures the global degree of uncertainty.
• To implement an AMDP, it is necessary to learn the state transition and the reward function in the low-dimensional belief space.
Approximate POMDP Techniques - AMDP
• The “classical” representation consists of the most likely state under a belief, along with the belief entropy:
  b_bar = ( argmax_x b(x), H(b) ),  H(b) = -Σ_x b(x) log b(x)
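The summary statistic itself is easy to compute. A Python sketch of the representation only, not the full AMDP planner; the function name is mine.

```python
# AMDP's low-dimensional belief summary: the most likely state
# plus the belief entropy H(b) = -sum_x b(x) log b(x).
import math

def amdp_state(belief):
    most_likely = max(range(len(belief)), key=lambda i: belief[i])
    # sum of -p log p over the support of the belief
    entropy = sum(-p * math.log(p) for p in belief if p > 0.0)
    return most_likely, entropy

print(amdp_state([1.0, 0.0, 0.0]))    # (0, 0.0): certainty, zero entropy
print(amdp_state([0.5, 0.25, 0.25]))  # state 0, positive entropy
```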
Approximate POMDP Techniques - AMDP
(Figure: true vs. estimated mean and covariance.)
Approximate POMDP Techniques - AMDP
• The application of AMDPs to mobile robot navigation is called coastal navigation.
• It anticipates uncertainty and selects motions that trade off overall path length against the uncertainty accrued along a path.
• The resulting trajectories differ significantly from any non-probabilistic solution.
• Being temporarily lost is acceptable if the robot can later re-localize with sufficiently high probability.
Approximate POMDP Techniques - AMDP
• AMDP algorithm: see [1].
Approximate POMDP Techniques - MC-POMDP
• The Monte Carlo POMDP (MC-POMDP) is the particle filter version of POMDPs.
• It calculates a value function defined over sets of particles.
• MC-POMDPs use a local learning technique: a locally weighted learning rule combined with a proximity test based on the KL divergence.
• MC-POMDPs then apply Monte Carlo sampling to implement an approximate value backup.
• The resulting algorithm is a full-fledged POMDP algorithm whose computational complexity and accuracy are both functions of the parameters of the learning algorithm.
Approximate POMDP Techniques - MC-POMDP
• Particle set representing the belief b
• Value function defined over particle sets
Approximate POMDP Techniques - MC-POMDP
• MC-POMDP algorithm: see [1].
Approximate POMDP Techniques - MC-POMDP
Particle filter update:
• Start from a discrete Monte Carlo representation of p(x_k | y_1:k), a set of N particles x_k^(i).
• Draw new particles from the proposal distribution p(x_{k+1}^(i) | x_k^(i)).
• Given the new observation y_{k+1}, evaluate the importance weights using the likelihood function: w_{k+1}^(i) = p(y_{k+1} | x_{k+1}^(i)).
• Resample the particles.
• The result is a discrete Monte Carlo representation (approximation) of p(x_{k+1} | y_1:k+1).
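The update loop above can be sketched generically. This is a Python illustration; the toy 1-D motion and likelihood models are stand-ins of my own, not RoboProb's models.

```python
# One particle-filter step: draw from the motion model, weight by
# the measurement likelihood, resample in proportion to the weights.
import math
import random

def particle_filter_step(particles, motion_model, likelihood, y):
    # 1. Draw new particles from the proposal p(x_{k+1} | x_k)
    predicted = [motion_model(x) for x in particles]
    # 2. Importance weights w^(i) = p(y | x^(i))
    weights = [likelihood(y, x) for x in predicted]
    # 3. Resample with replacement, proportional to the weights
    return random.choices(predicted, weights=weights, k=len(particles))

def motion(x):
    return x + random.gauss(0.0, 0.5)   # random-walk motion model

def like(y, x):
    return math.exp(-0.5 * (y - x) ** 2)  # Gaussian-shaped likelihood

random.seed(0)
particles = [random.uniform(-5.0, 5.0) for _ in range(500)]

for y in (1.0, 1.1, 0.9):  # three noisy observations near 1
    particles = particle_filter_step(particles, motion, like, y)

estimate = sum(particles) / len(particles)
print(estimate)
```

After three observations near 1, the particle mean should land close to 1, which is the behavior the MC-POMDP belief representation relies on.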
References and Links
• References
[1] Thrun, Burgard, Fox. Probabilistic Robotics. MIT Press, 2005.
• Links
http://en.wikipedia.org/wiki/Partially_observable_Markov_decision_process
http://www.cs.cmu.edu/~trey/zmdp/
http://www.cassandra.org/pomdp/index.shtml
http://www.cs.duke.edu/~mlittman/topics/pomdp-page.html
Exercise
Exercise 1 in [1], Chapter 15: A person faces two doors. Behind one is a tiger, behind the other a reward of +10. The person can either listen or open one of the doors. When opening the door with a tiger, the person will be eaten, which has an associated cost of -20. Listening costs -1. When listening, the person will hear a roaring noise that indicates the presence of the tiger, but only with 0.85 probability will the person be able to localize the noise correctly. With 0.15 probability, the noise will appear as if it came from the door hiding the reward.
Your questions:
(a) Provide the formal model of the POMDP, in which you define the state, action, and measurement spaces, the cost function, and the associated probability functions. (b) What is the expected cumulative payoff/cost of the open-loop action sequence: "Listen, listen, open door 1"? Explain your calculation.
(c) What is the expected cumulative payoff/cost of the open-loop action sequence: "Listen, then open the door for which we did not hear a noise"? Again, explain your calculation.