Approximate POMDP planning: Overcoming the curse of history!
Presented by: Joelle Pineau
Joint work with: Geoff Gordon and Sebastian Thrun
Machine Learning Lunch - March 10, 2003


TRANSCRIPT

Page 1: Approximate POMDP planning: Overcoming the curse of history!

Approximate POMDP planning:

Overcoming the curse of history!

Presented by: Joelle Pineau

Joint work with: Geoff Gordon and Sebastian Thrun

Machine Learning Lunch - March 10, 2003

Page 2

To use or not to use a POMDP

• POMDPs provide a rich framework for sequential decision-making, which can model:

– varying rewards across actions and goals

– uncertainty in the action effects

– uncertainty in the state of the world

Page 3

Existing applications of POMDPs

– Maintenance scheduling

» Puterman, 1994

– Robot navigation

» Koenig & Simmons, 1995;

Roy & Thrun, 1999

– Helicopter control

» Bagnell & Schneider, 2001;

Ng et al., 2002

– Dialogue modeling

» Roy, Pineau & Thrun, 2000;

Paek & Horvitz, 2000

– Preference elicitation

» Boutilier, 2002


Page 7

Graphical Model Representation

POMDP is an n-tuple { S, A, Ω, b, T, O, R }:

S = state set
A = action set
Ω = observation set
b(s) = initial belief
T(s,a,s') = state-to-state transition probabilities
O(s,a,o) = observation generation probabilities
R(s,a) = reward function

[Figure: two-slice graphical model. What goes on: hidden states st-1 → st, actions at-1, at, rewards rt-1, rt. What we see: observations ot-1, ot. What we infer: beliefs bt-1 → bt.]
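The inferred belief bt above is maintained by Bayesian filtering: each (action, observation) pair maps the current belief to a new one. Here is a minimal sketch in plain Python, using the T(s,a,s') and O(s,a,o) definitions from this slide; the 2-state model numbers are invented purely for illustration:

```python
def belief_update(b, a, o, T, O):
    """Compute b' where b'(s') ~ O(s',a,o) * sum_s T(s,a,s') * b(s)."""
    S = range(len(b))
    # Predict: push the belief through the transition model.
    predicted = [sum(b[s] * T[s][a][s2] for s in S) for s2 in S]
    # Correct: weight each state by the observation likelihood, then normalize.
    unnorm = [O[s2][a][o] * predicted[s2] for s2 in S]
    z = sum(unnorm)
    return [p / z for p in unnorm]

# Hypothetical 2-state, 1-action, 2-observation model (illustration only).
T = [[[0.9, 0.1]], [[0.2, 0.8]]]   # T[s][a][s']
O = [[[0.8, 0.2]], [[0.3, 0.7]]]   # O[s'][a][o]
b = [0.5, 0.5]
print(belief_update(b, a=0, o=0, T=T, O=O))
```

Seeing observation o=0, which is more likely in s1, shifts probability mass toward s1.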

Page 8

Understanding the belief state

• A belief is a probability distribution over states, where Dim(B) = |S|-1

– E.g. let S={s1, s2}: the belief space is the segment 0 ≤ P(s1) ≤ 1.

Page 9

Understanding the belief state

• A belief is a probability distribution over states, where Dim(B) = |S|-1

– E.g. let S={s1, s2, s3}: the belief space is a triangle in the (P(s1), P(s2)) plane, with P(s1), P(s2) ≥ 0 and P(s1)+P(s2) ≤ 1.

Page 10

Understanding the belief state

• A belief is a probability distribution over states, where Dim(B) = |S|-1

– E.g. let S={s1, s2, s3, s4}: the belief space is a tetrahedron over (P(s1), P(s2), P(s3)).

Page 11

The first curse of POMDP planning

• The curse of dimensionality:

– dimension of the belief = # of states

» dimension of planning problem = # of states

– related to the MDP curse of dimensionality

Page 12

Planning for POMDPs

• Learning a value function V(b), ∀b ∈ B:

    V(b) = max_{a∈A} [ R(b,a) + γ Σ_{b'∈B} T(b,a,b') V(b') ]

• Learning an action-selection policy π(b), ∀b ∈ B:

    π(b) = argmax_{a∈A} [ R(b,a) + γ Σ_{b'∈B} T(b,a,b') V(b') ]


Page 18

Exact value iteration for POMDPs

• Simple problem: |S|=2, |A|=3, |Ω|=2

Iteration   # hyper-planes
0           1
1           3
2           27
3           2187
4           14,348,907

[Figure: Vn(b) drawn as the upper surface of the hyper-planes, over b = P(s1).]

Page 19

Properties of exact value iteration

• Value function is always piecewise-linear convex

• Many hyper-planes can be pruned away

|S|=2, |A|=3, |Ω|=2

Iteration   # hyper-planes (after pruning)
0           1
1           3
2           5
3           9
4           7
5           13
10          27
15          47
20          59

Page 20

Is pruning sufficient?

|S|=20, |A|=6, |Ω|=8

Iteration   # hyper-planes
0           1
1           5
2           213
3           ?????

Not for this problem!


Page 22

The second curse of POMDP planning

• The curse of dimensionality:

– the dimension of each hyper-plane = # of states

• The curse of history:

– the number of hyper-planes grows exponentially with the planning horizon

Complexity of one step of POMDP value iteration:

    |S|² · |A| · |Vn|^|Ω|

where the |S|² factor is the dimensionality and the |Vn|^|Ω| factor is the history.
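The blow-up tabulated on the exact-value-iteration slides follows directly from the recurrence n_{k+1} = |A| · n_k^|Ω| (before pruning). A quick sanity check:

```python
def hyperplane_counts(num_actions, num_observations, horizon):
    """Worst-case alpha-vector counts under n_{k+1} = |A| * n_k ** |Omega|."""
    counts = [1]                      # one hyper-plane at horizon 0
    for _ in range(horizon):
        counts.append(num_actions * counts[-1] ** num_observations)
    return counts

# |A|=3, |Omega|=2, as in the toy problem: 1, 3, 27, 2187, 14348907
print(hyperplane_counts(3, 2, 4))
```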

Page 23

Possible approximation approaches

• Ignore the belief: overcomes both curses; very fast; performs poorly in high-entropy beliefs. [Littman et al., 1995]

• Discretize the belief: overcomes the curse of history (sort of); scales exponentially with # states. [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 1998; Zhou & Hansen, 2001]

• Compress the belief: overcomes the curse of dimensionality. [Poupart & Boutilier, 2002; Roy & Gordon, 2002]

• Plan for trajectories: can diminish both curses; requires restricted policy class; local minima, slow-changing gradients. [Baxter & Bartlett, 2000; Ng & Jordan, 2002]


Page 27

A new algorithm: Point-based value iteration

• Main idea:

– Select a small set of belief points → Focus on reachable beliefs

– Plan for those belief points only → Learn value and its gradient

[Figure: V(b) over b = P(s1), with belief points b0, b1, b2 reached via (a,o) transitions.]


Page 36

Point-based value update

• Initialize the value function (…and skip ahead a few iterations)

• For each b ∈ B:

– For each (a,o): Project forward b → b^{a,o} and find best value:

    α_b^{a,o} = argmax_{α∈Vn} Σ_s α(s) b^{a,o}(s)

– Sum over observations:

    α_b^a(s) = R(s,a) + γ Σ_{o,s'} T(s,a,s') O(s',a,o) α_b^{a,o}(s')

– Max over actions:

    Vn+1 ← argmax_{a∈A} Σ_s α_b^a(s) b(s)

[Figure: the resulting vectors ba1, ba2 over b = P(s1).]
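The three steps above can be sketched in plain Python. This is a sketch under the slide's definitions, not the authors' code; `gamma` is the discount factor, and the R/T/O array layout is an assumption:

```python
def point_based_backup(b, V, actions, observations, R, T, O, gamma=0.95):
    """One point-based backup at belief b.

    V: current set of alpha-vectors (each a list of length |S|).
    R[s][a], T[s][a][s'], O[s'][a][o] follow the slide's definitions.
    Returns the single alpha-vector retained for b.
    """
    S = range(len(b))

    def next_belief(a, o):
        pred = [sum(b[s] * T[s][a][s2] for s in S) for s2 in S]
        unnorm = [O[s2][a][o] * pred[s2] for s2 in S]
        z = sum(unnorm) or 1.0           # guard against impossible (a,o)
        return [p / z for p in unnorm]

    best_alpha, best_value = None, float("-inf")
    for a in actions:
        # Step 1: project forward and pick the best alpha-vector per (a,o).
        chosen = {}
        for o in observations:
            bao = next_belief(a, o)
            chosen[o] = max(V, key=lambda alpha: sum(alpha[s] * bao[s] for s in S))
        # Step 2: sum over observations.
        alpha_a = [R[s][a] + gamma * sum(T[s][a][s2] * O[s2][a][o] * chosen[o][s2]
                                         for o in observations for s2 in S)
                   for s in S]
        # Step 3: max over actions, evaluated at b.
        value = sum(alpha_a[s] * b[s] for s in S)
        if value > best_value:
            best_alpha, best_value = alpha_a, value
    return best_alpha
```

One backup per belief point yields exactly one alpha-vector per b, which is why the point-based set stays small.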


Page 38

Complexity of value update

                   Exact Update          Point-based Update
I - Projection     |S|²|A||Ω|·n          |S|²|A||Ω|·|B|
II - Sum           |S|²|A|·n^|Ω|         |S||A||Ω|·|B|²
III - Max          |S||A|·n^|Ω|          |S||A|·|B|

where:  S = # states,  A = # actions,  Ω = # observations,
        n = # solution vectors at iteration n,  B = # belief points

Page 39

Theoretical properties of point-based updates

• Theorem: For any belief set B and any horizon n, the error of the PBVI algorithm ε_n = ||Vn^B − Vn*||∞ is bounded by:

    ε_n ≤ (Rmax − Rmin) · ε_B / (1 − γ)²

    where ε_B = max_{b'} min_{b∈B} ||b − b'||₁

Page 40

Back to the full algorithm

• Main idea:

– Select a small set of belief points → PART II

– Plan for those belief points only → PART I

[Figure: V(b) over b = P(s1), with belief points b0, b1, b2 reached via (a,o) transitions.]

Page 41

Experimental results: Lasertag domain

State space = RobotPosition × OpponentPosition

Observable: RobotPosition - always; OpponentPosition - only if same as Robot

Action space = {North, South, East, West, Tag}

Opponent strategy: Move away from robot w/ Pr=0.8

|S|=870, |A|=5, |Ω|=30

Page 42

Performance of PBVI on Lasertag domain

[Figure: comparison of policies; captions read "Opponent tagged 59% of trials" and "Opponent tagged 17% of trials".]

Page 43

Performance on well-known POMDPs

Maze33 (|S|=36, |A|=5, |Ω|=17):

Method  Reward  Time(s)  B     %Goal
QMDP    0.198   0.19     n.a.  n.a.
Grid    0.94    n.v.     174   n.a.
PBUA    2.30    12166    660   n.a.
PBVI    2.25    3448     470   n.a.

Hallway (|S|=60, |A|=5, |Ω|=20):

Method  Reward  Time(s)  B     %Goal
QMDP    0.261   0.51     n.a.  47
Grid    n.v.    n.v.     n.a.  n.v.
PBUA    0.53    450      300   100
PBVI    0.53    288      86    95

Hallway2 (|S|=92, |A|=5, |Ω|=17):

Method  Reward  Time(s)  B     %Goal
QMDP    0.109   1.44     n.a.  22
Grid    n.v.    n.v.     337   98
PBUA    0.35    27898    1840  100
PBVI    0.34    360      95    98

Page 44

Back to the full algorithm

• Main idea:

– Select a small set of belief points → PART II

– Plan for those belief points only → PART I

[Figure: V(b) over b = P(s1), with belief points b0, b1, b2 reached via (a,o) transitions.]


Page 48

Selecting good belief points

• What can we learn from policy search methods?

– Focus on reachable beliefs.

• What can we learn from MDP exploration techniques?

– Select widely-spaced beliefs, rather than near-by beliefs.

[Figure: from b, candidate successors b^{a1,o1}, b^{a1,o2}, b^{a2,o1}, b^{a2,o2} along the P(s1) axis.]


Page 53

How does PBVI actually select belief points?

• Start with B = {b0}

• For each belief point b ∈ B:

– For each action a ∈ A:

» Generate a new belief ba by applying a and stochastically picking an observation o.

– Add to B the belief ba which is farthest away from any b' ∈ B:

» B ← B ∪ argmax_{ba} [ min_{b'∈B} Σ_s |ba(s) − b'(s)| ]

• Repeat until |B| = desired set size

[Figure: from b, sampled successors ba1, ba2 along the P(s1) axis.]
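One expansion phase of this procedure might look like the following sketch. Here `sample_next_belief(b, a)` is a hypothetical stand-in, not defined on the slide: it would apply action a, sample an observation, and return the updated belief:

```python
def l1_distance(b1, b2):
    return sum(abs(x - y) for x, y in zip(b1, b2))

def expand_belief_set(B, actions, sample_next_belief):
    """Grow B: for each b, sample one successor per action and keep the
    candidate farthest (in L1 norm) from all beliefs collected so far."""
    new_points = []
    for b in B:
        candidates = [sample_next_belief(b, a) for a in actions]
        farthest = max(candidates,
                       key=lambda c: min(l1_distance(c, bb) for bb in B + new_points))
        new_points.append(farthest)
    return B + new_points   # |B| roughly doubles each phase
```

The L1 distance matches the norm appearing in the error bound, which is why spreading points in L1 directly tightens the approximation guarantee.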


Page 56

The anytime PBVI algorithm

• Alternate between:

– Growing the set of belief points (e.g. B doubles in size every time)

– Planning for those belief points

• Terminate when you run out of time or have a good policy.

• Lasertag results:

– 13 phases: |B|=1334

– ran out of time!

• Hallway2 results:

– 8 phases: |B|=95

– found a good policy.
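The anytime loop is just this alternation; the skeleton below is an illustration, with all four callbacks standing in for the expansion and backup routines on the earlier slides (their names are assumptions, not the authors' API):

```python
def anytime_pbvi(b0, expand, plan, have_time, policy_good_enough):
    """Alternate belief-set expansion and point-based planning until
    time runs out or the policy is judged good enough."""
    B = [b0]        # start from the initial belief
    V = None        # value function (set of alpha-vectors)
    while have_time() and not policy_good_enough(V):
        B = expand(B)      # belief set roughly doubles
        V = plan(B, V)     # point-based value updates over B
    return V, B
```

Because each phase leaves a usable value function behind, the algorithm can be stopped after any phase and still return a policy.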

Page 57: Approximate POMDP planning: Overcoming the curse of history!

Machine Learning Lunch - March 10, 2003 Joelle Pineau

Alternative belief expansion heuristics

• Compare 4 approaches to belief expansion:

– Random (RA)

P(s1)

Page 58: Approximate POMDP planning: Overcoming the curse of history!

Machine Learning Lunch - March 10, 2003 Joelle Pineau

Alternative belief expansion heuristics

• Compare 4 approaches to belief expansion:

– Random (RA)

– Stochastic Simulation with Random Action (SSRA)

P(s1)

b

a2,o2 a1,o2

ba1,o2ba2,o2

Page 59: Approximate POMDP planning: Overcoming the curse of history!

Machine Learning Lunch - March 10, 2003 Joelle Pineau

Alternative belief expansion heuristics

• Compare 4 approaches to belief expansion:

– Random (RA)

– Stochastic Simulation with Random Action (SSRA)

– Stochastic Simulation with Greedy Action (SSGA)

P(s1)

b

a2,o2

ba2,o2

(b)=a2

Page 60

Validation of the belief expansion heuristic

• Compare 4 approaches to belief expansion:

– Random (RA)

– Stochastic Simulation with Random Action (SSRA)

– Stochastic Simulation with Greedy Action (SSGA)

– Stochastic Simulation with Explorative Action (SSEA): pick the action whose sampled successor is farthest from b, i.e. max_a ||b − ba||

[Figure: from b, sampled successors b^{a1,o2}, b^{a2,o2} along the P(s1) axis.]

Page 61

Validation of the belief expansion heuristic

• Hallway domain: |S|=60, |A|=5, |Ω|=20

Page 62

Validation of the belief expansion heuristic

• Hallway2 domain: |S|=92, |A|=5, |Ω|=17

Page 63

Validation of the belief expansion heuristic

• Tag domain: |S|=870, |A|=5, |Ω|=30

Page 64

Summary

• POMDPs suffer from the curse of history:

» # of beliefs grows exponentially with the planning horizon

• PBVI addresses the curse of history by limiting planning to a small set of likely beliefs.

• Strengths of PBVI include:

» anytime algorithm;

» polynomial-time value updates;

» bounded approximation error;

» empirical results showing we can solve problems up to ~1000 states.