Based on slides by Nicholas Roy, MIT
Finding Approximate POMDP Solutions through Belief Compression
Reliable Navigation
Conventional trajectories may not be robust to localisation error
[Figure legend: estimated robot position, robot position distribution, true robot position, goal position.]
Perception and Control
[Diagram: world state → perception → control (control algorithms)]
Perception and Control
Assumed full observability: world state → probabilistic perception model → P(x) → argmax P(x) → control (brittle)
Exact POMDP planning: world state → probabilistic perception model → P(x) → control (intractable)
Perception and Control
Assume full observability: brittle
Exact POMDP planning: intractable
[Diagram: world state → probabilistic perception model → P(x) → compressed P(x) → control]
Main Insight
[Diagram: world state → probabilistic perception model → P(x) → low-dimensional P(x) → control]
Good policies for real world POMDPs can be found by planning over low-dimensional representations of the belief space.
Belief Space Structure
The controller may be globally uncertain... but not usually.
Coastal Navigation
Represent beliefs using $\tilde{b} = \langle \arg\max_s b(s);\ H(b) \rangle$
Discretise into low-dimensional belief space MDP
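The coastal-navigation representation pairs the most likely state with the entropy of the belief. A minimal sketch of that statistic, assuming a belief is a plain list of probabilities over discrete states; the name `amdp_statistic` and the example numbers are illustrative and not taken from the slides.

```python
import math

def amdp_statistic(belief):
    """Compress a belief into (most likely state index, entropy in nats)."""
    ml_state = max(range(len(belief)), key=lambda s: belief[s])
    # Terms with p == 0 contribute nothing to the entropy, so skip them.
    entropy = -sum(p * math.log(p) for p in belief if p > 0.0)
    return ml_state, entropy

peaked = [0.01, 0.97, 0.01, 0.01]   # well-localised robot (hypothetical numbers)
flat = [0.25, 0.25, 0.25, 0.25]     # globally uncertain robot

print(amdp_statistic(peaked))
print(amdp_statistic(flat))
```

Both beliefs may share the same most likely state, so the entropy coordinate is what lets a planner tell a confident belief from an uncertain one.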
Coastal Navigation
A Hard Navigation Problem
[Figure: Average Distance to Goal. Bar chart comparing Maximum Likelihood and AMDP; vertical axis: distance in m, 0 to 9.]
Dimensionality Reduction
Principal Components Analysis
[Diagram: original beliefs expressed through weights and characteristic beliefs.]
Principal Components Analysis
Given belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$, $m \ll n$.
[Figure: collection of beliefs drawn from a 200-state problem; horizontal axis: state; vertical axis: probability of being in state.]
[Figure: one sample distribution.]
m = 9 gives this representation for one sample distribution
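A minimal sketch of compressing beliefs with ordinary PCA, assuming NumPy and an SVD-based fit. The slides give no algorithm, so the function names and the synthetic Gaussian-bump beliefs are illustrative assumptions; only the sizes (200 states, m = 9) follow the slide.

```python
import numpy as np

def pca_compress(beliefs, m):
    """beliefs: (N, n) array, one belief per row.

    Returns the mean belief, m bases (characteristic beliefs) and the
    (N, m) weights, i.e. the low-dimensional coordinates of each belief.
    """
    mean = beliefs.mean(axis=0)
    centered = beliefs - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    bases = vt[:m]                      # top m right singular vectors
    weights = centered @ bases.T
    return mean, bases, weights

def pca_reconstruct(mean, bases, weights):
    """Map low-dimensional weights back to full-length beliefs."""
    return mean + weights @ bases

# Hypothetical data: 50 unimodal beliefs over 200 states.
states = np.arange(200)
centres = np.linspace(20, 180, 50)
beliefs = np.exp(-0.5 * ((states[None, :] - centres[:, None]) / 5.0) ** 2)
beliefs /= beliefs.sum(axis=1, keepdims=True)

mean, bases, weights = pca_compress(beliefs, m=9)
approx = pca_reconstruct(mean, bases, weights)
print("max abs error:", np.abs(approx - beliefs).max())
print("smallest reconstructed entry:", approx.min())
```

Nothing in a linear reconstruction keeps entries non-negative, so the smallest reconstructed entry is worth inspecting when the beliefs have large low-probability regions.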
Principal Components Analysis
Many real world POMDP distributions are characterised by large regions of low probability.
Idea: Create fitting criterion that is (exponentially) stronger in low-probability regions (E-PCA)
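One way to obtain a criterion of this kind is an exponential link, b ≈ exp(U b̃), fitted with the loss Σ exp(U b̃) − b ∘ (U b̃). The sketch below assumes that loss, since the slide does not spell out a formula, and it swaps in plain gradient descent with backtracking rather than whatever optimiser the original work used. The names `epca_loss`, `epca_fit`, `U` and `V` are illustrative.

```python
import numpy as np

def epca_loss(B, U, V):
    """Sum over all entries of exp(UV) - B * (UV)."""
    Z = U @ V
    with np.errstate(over="ignore", invalid="ignore"):
        return float(np.sum(np.exp(Z) - B * Z))

def epca_fit(B, m, iters=200, seed=0):
    """B: (n, N) array, one belief per column.

    Returns U (n, m), V (m, N) and the list of accepted loss values.
    """
    rng = np.random.default_rng(seed)
    n, N = B.shape
    U = 0.01 * rng.standard_normal((n, m))
    V = 0.01 * rng.standard_normal((m, N))
    losses = [epca_loss(B, U, V)]
    step = 1.0
    for _ in range(iters):
        G = np.exp(U @ V) - B              # gradient of the loss w.r.t. U @ V
        gU, gV = G @ V.T, U.T @ G
        while step > 1e-12:                # backtrack until the loss drops
            U_new, V_new = U - step * gU, V - step * gV
            new_loss = epca_loss(B, U_new, V_new)
            if new_loss < losses[-1]:
                break
            step *= 0.5
        else:
            break                          # no improving step was found
        U, V = U_new, V_new
        losses.append(new_loss)
        step *= 2.0                        # try a larger step next time
    return U, V, losses

# Hypothetical data: 15 peaked beliefs over 40 states.
states = np.arange(40)
centres = np.linspace(5, 35, 15)
B = np.exp(-0.5 * ((states[:, None] - centres[None, :]) / 4.0) ** 2)
B /= B.sum(axis=0, keepdims=True)

U, V, losses = epca_fit(B, m=3)
recon = np.exp(U @ V)
print("loss:", losses[0], "->", losses[-1])
print("smallest reconstructed entry:", recon.min())
```

Because the reconstruction passes through `exp`, it cannot go negative, and errors in log space weigh more heavily where the probabilities are small.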
Example E-PCA
[Figure: four panels showing reconstructions with 1 basis, 2 bases, 3 bases and 4 bases; horizontal axis: state; vertical axis: probability of being in state.]
Example Reduction
Finding Dimensionality
E-PCA will indicate the appropriate number of bases, depending on the beliefs encountered
Planning
[Diagram: original POMDP (states S1, S2, S3) → E-PCA → low-dimensional belief space $\tilde{B}$ → discretise → discrete belief space MDP]
Model Parameters
Reward function
Back-project to high dimensional belief: $\tilde{b} \to b$, where $b$ gives $p(s)$ over $s_1, s_2, s_3$
Compute expected reward from belief: $\tilde{R}(\tilde{b}) = E_b(R(s)) = \sum_{S} p(s) R(s)$
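A minimal sketch of the two steps, back-projection followed by the expected reward. The exponential back-projection in `recover_belief` is an assumption borrowed from E-PCA, and the bases, weights and rewards are made-up numbers for a three-state problem.

```python
import math

def recover_belief(bases, weights):
    """Back-project: b(s) proportional to exp(sum_k bases[s][k] * weights[k]).

    The exponential link is an assumption; the slides only say "back-project".
    """
    unnorm = [math.exp(sum(u * w for u, w in zip(row, weights))) for row in bases]
    total = sum(unnorm)
    return [x / total for x in unnorm]

def expected_reward(belief, reward):
    """E_b[R(s)] = sum over states of p(s) * R(s)."""
    return sum(p * r for p, r in zip(belief, reward))

bases = [[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]]   # one row per state s1, s2, s3
weights = [0.5, -0.2]                            # low-dimensional belief
reward = [0.0, -1.0, 10.0]                       # R(s) for s1, s2, s3

belief = recover_belief(bases, weights)
print(belief)
print(expected_reward(belief, reward))
```

The reward of a low-dimensional belief is therefore always a weighted average of the state rewards, so it stays between the smallest and largest R(s).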
Model Parameters
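[Diagram with two columns: low dimension, full dimension.]
1. For each belief $\tilde{b}_i$ and action $a$
2. Recover full belief $b_i$
3. Propagate according to action
4. Propagate according to observation, giving $b_j$
5. Recover $\tilde{b}_j$
6. Set $\tilde{T}(\tilde{b}_i, a, \tilde{b}_j)$ to probability of observation:
$$\tilde{T}(\tilde{b}_i, a, \tilde{b}_j) = \sum_{k=1}^{|Z|} \sum_{l=1}^{|S|} \sum_{m=1}^{|S|} p(z_k \mid s_l)\, p(s_l \mid s_m, a)\, b_i(s_m)$$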
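The six steps can be sketched as a single function that fills one row of the low-dimensional transition model. This is one reading of step 6: the probability of each observation is accumulated at the grid point nearest to the compressed posterior. The arguments `compress` and `grid`, and the toy three-state model, are hypothetical stand-ins for the E-PCA mapping and the discretised belief space.

```python
def transition_row(b_i, action, P, O, compress, grid):
    """Return T~(b~_i, action, .) as a list over the grid points.

    b_i is the recovered full belief (step 2).
    P[action][m][l] = p(s_l | s_m, action), O[k][l] = p(z_k | s_l).
    """
    n = len(b_i)
    # Step 3: propagate through the action.
    b_a = [sum(P[action][m][l] * b_i[m] for m in range(n)) for l in range(n)]
    row = [0.0] * len(grid)
    for obs_lik in O:
        joint = [obs_lik[l] * b_a[l] for l in range(n)]
        p_z = sum(joint)                    # probability of this observation
        if p_z == 0.0:
            continue
        b_j = [x / p_z for x in joint]      # step 4: condition on the observation
        low = compress(b_j)                 # step 5: back to low dimension
        j = min(range(len(grid)),
                key=lambda g: sum((x - y) ** 2 for x, y in zip(grid[g], low)))
        row[j] += p_z                       # step 6: accumulate the probability
    return row

# Toy model: 3 states in a corridor, 1 action, 2 observations.
P = {"right": [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]]}
O = [[0.8, 0.2, 0.2], [0.2, 0.8, 0.8]]
grid = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0), (0.5, 0.5)]
compress = lambda b: b[:2]                  # placeholder for the E-PCA projection

row = transition_row([1.0, 0.0, 0.0], "right", P, O, compress, grid)
print(row)
```

Since every observation is assigned to exactly one grid point, each row sums to one and can be used directly as an MDP transition distribution.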
Robot Navigation Example
[Figure legend: true (hidden) robot position, goal position, goal state, initial distribution.]
Robot Navigation Example
[Figure legend: true robot position, goal position.]
Policy Comparison
[Figure: Average Distance to Goal. Bar chart comparing Maximum Likelihood, AMDP and E-PCA (6 bases); vertical axis: distance in m, 0 to 9.]
People Finding
People Finding as a POMDP
Fully observable robot
Position of person unknown
[Figure legend: robot position, true person position.]
Finding and Tracking People
[Figure legend: robot position, true person position.]
People Finding as a POMDP
Factored belief space
2 dimensions: fully-observable robot position
6 dimensions: distribution over person positions
Regular grid gives ≈ $10^{16}$ states
Variable Resolution
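Non-regular grid using samples
[Figure: sampled low-dimensional beliefs $\tilde{b}_1, \ldots, \tilde{b}_5$ with transitions $\tilde{T}(\tilde{b}_1, a_1, \tilde{b}_2)$ and $\tilde{T}(\tilde{b}_1, a_2, \tilde{b}_5)$.]
Compute model parameters using nearest-neighbour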
Refining the Grid
Sample beliefs according to policy
Construct new model
Keep new belief if $\tilde{V}(\tilde{b}'_1) > \tilde{V}(\tilde{b}_1)$
The Optimal Policy
[Figure panels: original distribution; reconstruction using E-PCA and 6 bases. Legend: robot position, true person position.]
Policy Comparison
[Figure: Average time to find person. Bar chart comparing Closest, Densest, Maximum Likelihood, E-PCA and Refined E-PCA; vertical axis: average # of actions to find person, 0 to 250.]
E-PCA: 72 states
Refined E-PCA: 260 states
Fully observable MDP
Nick’s Thesis Contributions
Good policies for real world POMDPs can be found by planning over a low-dimensional representation of the belief space, using E-PCA.
POMDPs can scale to bigger, more complicated real-world problems.
POMDPs can be used for real deployed robots.