


Probabilistic Robotics
Tutorial AAAI-2000

Sebastian Thrun
Computer Science and Robotics

Carnegie Mellon University

© Sebastian Thrun, CMU, 2000

Recommended Readings

Probabilistic Algorithms in Robotics

(basic survey, 95 references)

AI Magazine (to appear Dec 2000)

Also: Tech Report: CMU-CS-00-126

http://www.cs.cmu.edu/~thrun/papers/thrun.probrob.html


Collaborators and Funding

Anita Arendra, Michael Beetz, Maren Bennewitz, Eric Bauer, Joachim Buhmann, Wolfram Burgard, Armin B. Cremers, Frank Dellaert, Dieter Fox, Dirk Hähnel, John Langford,

Gerhard Lakemeyer

Dimitris Margaritis, Michael Montemerlo, Sungwoo Park, Frank Pfenning, Joelle Pineau, Martha Pollack, Charles Rosenberg, Nicholas Roy, Jamieson Schulte, Reid Simmons, Dirk Schulz, Wolli Steiner

Special thanks: Kurt Konolige, John Leonard, Andrew Moore, Reid Simmons

Sponsored by: DARPA (TMR, CoABS, MARS), NSF (CAREER, IIS, LIS),

EC, Daimler Benz, Microsoft + others


Tutorial Goal

To familiarize you with the probabilistic paradigm in robotics:

• Basic techniques
• Advantages
• Pitfalls and limitations
• Successful applications
• Open research issues


Tutorial Outline

Introduction

Probabilistic State Estimation
• Localization
• Mapping

Probabilistic Decision Making
• Planning
• Exploration

Conclusion


Robotics Yesterday


Robotics Today


Robotics Tomorrow?


Current Trends in Robotics

Robots are moving away from factory floors to

• Entertainment, toys
• Personal services
• Medical, surgery
• Industrial automation (mining, harvesting, …)
• Hazardous environments (space, underwater)


Robots are Inherently Uncertain

Uncertainty arises from four major factors:
• Environment: stochastic, unpredictable
• Robot: stochastic
• Sensors: limited, noisy
• Models: inaccurate


Probabilistic Robotics

p(a | b) = p(b | a) p(a) / p(b)
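As a minimal numeric illustration of Bayes rule above (the prior and sensor-model numbers below are made up, not from the tutorial):

```python
# Bayes rule: p(a | b) = p(b | a) p(a) / p(b)
# Hypothetical example: a = "door is open", b = "sensor reports open".
p_a = 0.3            # prior p(a)
p_b_given_a = 0.9    # sensor model p(b | a)
p_b_given_not_a = 0.2

# Normalizer p(b) via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # posterior belief that the door is open
```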


Probabilistic Robotics

Key idea: Explicit representation of uncertainty (using the calculus of probability theory)

Perception = state estimation
Action = utility optimization


Advantages of Probabilistic Paradigm

 Can accommodate inaccurate models
 Can accommodate imperfect sensors
 Robust in real-world applications
 Best known approach to many hard robotics problems


Pitfalls

 Computationally demanding
 False assumptions
 Approximate


Trends in Robotics

Classical Robotics (mid-70’s)
• exact models
• no sensing necessary

Reactive Paradigm (mid-80’s)
• no models
• relies heavily on good sensing

Hybrids (since 90’s)
• model-based at higher levels
• reactive at lower levels

Probabilistic Robotics (since mid-90’s)
• seamless integration of models and sensing
• inaccurate models, inaccurate sensors


Example: Museum Tour-Guide Robots

Rhino, 1997 Minerva, 1998


Rhino (Univ. Bonn + CMU, 1997)

W. Burgard, A.B. Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D. Schulz, W. Steiner, S. Thrun


Minerva (CMU + Univ. Bonn, 1998)

Minerva

S. Thrun, M. Beetz, M. Bennewitz, W. Burgard, A.B. Cremers, F. Dellaert, D. Fox, D. Hähnel, C. Rosenberg, N. Roy, J. Schulte, D. Schulz


“How Intelligent Is Minerva?”

(visitor poll) fish: 5.7%, dog: 29.5%, monkey: 25.4%, human: 36.9%, amoeba: 2.5%


“Is Minerva Alive?"

(visitor poll) yes: 69.8%, no: 27.0%, undecided: 3.2%



“Are You Under 10 Years of Age?”


Nature of Sensor Data

Odometry Data Range Data


Technical Challenges

Navigation
• Environment crowded, unpredictable
• Environment unmodified
• “Invisible” hazards
• Walking speed or faster
• High failure costs

Interaction
• Individuals and crowds
• Museum visitors’ first encounter
• Age 2 through 99
• Spend less than 15 minutes



Tutorial Outline

Introduction

Probabilistic State Estimation
• Localization
• Mapping

Probabilistic Decision Making
• Planning
• Exploration

Conclusion


The Localization Problem

Estimate the robot’s coordinates s = (x, y, θ) from sensor data:
• Position tracking (error bounded)
• Global localization (unbounded error)
• Kidnapping (recovery from failure)

Ingemar Cox (1991): “Using sensory information to locate the robot in its environment is the most fundamental problem to provide a mobile robot with autonomous capabilities.”

see also [Borenstein et al, 96]


(figure: belief distribution p(s) over robot position s)

Probabilistic Localization

[Simmons/Koenig 95][Kaelbling et al 96][Burgard et al 96]


Bayes Filters

b(s_t) = p(s_t | d_{0..t})        d = data, o = observation, a = action, t = time, s = state

p(s_t | o_t, a_{t-1}, o_{t-1}, ..., o_0)
  = η p(o_t | s_t, a_{t-1}, ..., o_0) p(s_t | a_{t-1}, ..., o_0)        (Bayes)
  = η p(o_t | s_t) p(s_t | a_{t-1}, ..., o_0)                           (Markov)
  = η p(o_t | s_t) ∫ p(s_t | s_{t-1}, a_{t-1}) p(s_{t-1} | a_{t-1}, ..., o_0) ds_{t-1}
  = η p(o_t | s_t) ∫ p(s_t | s_{t-1}, a_{t-1}) b(s_{t-1}) ds_{t-1}      (Markov)

[Kalman 60, Rabiner 85]


Bayes Filters are Familiar to AI!

 Kalman filters
 Hidden Markov Models
 Dynamic Bayes networks
 Partially Observable Markov Decision Processes (POMDPs)

b(s_t) = η p(o_t | s_t) ∫ p(s_t | s_{t-1}, a_{t-1}) b(s_{t-1}) ds_{t-1}
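The Bayes filter recursion can be sketched in a few lines of Python. This is a minimal discrete illustration; the 5-cell corridor, motion model, and sensor likelihoods below are invented for the example, not part of the tutorial:

```python
def bayes_filter_step(belief, trans, obs_likelihood):
    """One Bayes filter update over a discrete state space.

    trans[s_next][s_prev] = p(s_next | s_prev, a); obs_likelihood[s] = p(o | s).
    """
    n = len(belief)
    # Prediction: integrate the motion model over the prior belief.
    predicted = [sum(trans[s][sp] * belief[sp] for sp in range(n)) for s in range(n)]
    # Correction: weight by the observation likelihood, then normalize (η).
    unnorm = [obs_likelihood[s] * predicted[s] for s in range(n)]
    eta = sum(unnorm)
    return [u / eta for u in unnorm]

# Hypothetical 5-cell corridor; "move right" succeeds with prob 0.8, else stay.
n = 5
trans = [[0.0] * n for _ in range(n)]
for sp in range(n):
    trans[min(sp + 1, n - 1)][sp] += 0.8
    trans[sp][sp] += 0.2

belief = [1.0 / n] * n                       # uniform prior (global localization)
obs = [0.1, 0.1, 0.8, 0.1, 0.1]              # sensor: probably cell 2
belief = bayes_filter_step(belief, trans, obs)
print(max(range(n), key=belief.__getitem__))  # most likely cell
```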


Markov Assumption

p(o_t | s_t, a_{t-1}, o_{t-1}, ..., o_0) = p(o_t | s_t)             } used above
p(s_t | s_{t-1}, a_{t-1}, ..., o_0) = p(s_t | s_{t-1}, a_{t-1})     }

Knowledge of the current state renders past and future independent:

p(o_{t+1}, ..., o_T, a_t, ... | s_t, o_0, ..., o_t) = p(o_{t+1}, ..., o_T, a_t, ... | s_t)

Implied assumptions:
• “Static World Assumption”
• “Independent Noise Assumption”


Localization With Bayes Filters

b(s_t) = η p(o_t | s_t) ∫ p(s_t | s_{t-1}, a_{t-1}) b(s_{t-1}) ds_{t-1}

With a known map m:

b(s_t | m) = η p(o_t | s_t, m) ∫ p(s_t | s_{t-1}, a_{t-1}, m) b(s_{t-1} | m) ds_{t-1}

(figures: motion model p(s | a, s', m) for an action a executed at pose s', and observation model p(o | s, m) for laser data o)


Xavier: (R. Simmons, S. Koenig, CMU 1996)

Markov localization in a topological map


Markov Localization in Grid Map

[Burgard et al 96] [Fox 99]


What is the Right Representation?

Kalman filter

[Schiele et al. 94], [Weiß et al. 94], [Borenstein 96], [Gutmann et al. 96, 98], [Arras 98]

Piecewise constant (metric, topological)

[Nourbakhsh et al. 95], [Simmons et al. 95], [Kaelbling et al. 96], [Burgard et al. 96], [Konolige et al. 99]

Variable resolution (eg, trees)

[Burgard et al. 98]

Multi-hypothesis

[Weckesser et al. 98], [Jensfelt et al. 99]


Idea: Represent Belief Through Samples

• Particle filters[Doucet 98, deFreitas 98]

• Condensation algorithm[Isard/Blake 98]

• Monte Carlo localization[Fox/Dellaert/Burgard/Thrun 99]

b(s_t | m) = η p(o_t | s_t, m) ∫ p(s_t | s_{t-1}, a_{t-1}, m) b(s_{t-1} | m) ds_{t-1}

Monte Carlo Localization (MCL)

MCL: Robot Motion
b(s_t) ← ∫ p(s_t | a_{t-1}, s_{t-1}, m) b(s_{t-1}) ds_{t-1}    (move each sample according to the motion model)

MCL: Importance Sampling
b(s_t) ← η p(o_t | s_t, m) b(s_t)    (weight each sample by the observation likelihood p(o_t | s_t, m))


b(s_t | m) = η p(o_t | s_t, m) ∫ p(s_t | s_{t-1}, a_{t-1}, m) b(s_{t-1} | m) ds_{t-1}

Particle Filters

Represent b(s_t) by a set of weighted particles {⟨s_t^(i), w_t^(i)⟩}:

• draw s_{t-1}^(i) from b(s_{t-1})
• draw s_t^(i) from p(s_t | s_{t-1}^(i), a_{t-1}, m)
• importance factor for s_t^(i):

w_t^(i) = target distribution / proposal distribution
        = η p(o_t | s_t^(i), m) p(s_t^(i) | s_{t-1}^(i), a_{t-1}, m) Bel(s_{t-1}^(i)) / [ p(s_t^(i) | s_{t-1}^(i), a_{t-1}, m) Bel(s_{t-1}^(i)) ]
        ∝ p(o_t | s_t^(i), m)
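The sample–weight–resample loop above can be sketched as follows. This is a toy 1-D illustration with invented motion and sensor models (Gaussian motion noise, a range sensor measuring distance to a wall at x = 10), not the tutorial's implementation:

```python
import math
import random

def mcl_step(particles, action, obs, motion_sample, obs_likelihood):
    """One Monte Carlo Localization update: motion sampling,
    importance weighting by the observation, and resampling."""
    # Draw s_t^(i) ~ p(s_t | s_{t-1}^(i), a_{t-1}).
    moved = [motion_sample(p, action) for p in particles]
    # Importance factor: w^(i) proportional to p(o_t | s_t^(i)).
    weights = [obs_likelihood(obs, p) for p in moved]
    # Resample with probability proportional to the weights.
    return random.choices(moved, weights=weights, k=len(particles))

# Hypothetical models (all numbers invented for illustration).
def motion_sample(x, u):
    return x + u + random.gauss(0.0, 0.1)

def obs_likelihood(z, x):
    expected = 10.0 - x            # distance to a wall at x = 10
    return math.exp(-0.5 * ((z - expected) / 0.5) ** 2)

random.seed(0)
particles = [random.uniform(0.0, 10.0) for _ in range(1000)]  # global uncertainty
true_x = 2.0
for _ in range(5):                 # robot moves right 1 m per step
    true_x += 1.0
    z = 10.0 - true_x              # noise-free range reading
    particles = mcl_step(particles, 1.0, z, motion_sample, obs_likelihood)

estimate = sum(particles) / len(particles)
print(round(estimate, 1))          # should be close to the true pose 7.0
```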


Monte Carlo Localization


Monte Carlo Localization, cont’d


Performance Comparison

Monte Carlo localization vs. Markov localization (grids)


Monte Carlo Localization

Approximate Bayes Estimation/Filtering
• Full posterior estimation
• Converges in O(1/#samples) [Tanner ’93]
• Robust: multiple hypotheses with degree of belief
• Efficient: focuses computation where needed
• Any-time: by varying the number of samples
• Easy to implement


Pitfall: The World is not Markov!

Distance filters [Fox et al 1998]: discard a range measurement o_t if the probability that it is shorter than expected under the map,

p(o_t is short) = ∫ p(o_t | s_t, m) b(s_t) ds_t,

exceeds a threshold (0.99); such readings are likely caused by unmodeled dynamic obstacles (people).


Avoiding Collisions with Invisible Hazards

Raw sensors

Ray-trace each candidate action a through the current belief:

p(o | a) = ∫ I_{raytrace(a, s_t) ∈ m} b(s_t) ds_t

and execute only actions that are safe with high probability: p(o | a*) ≥ 0.99.

Virtual sensors added


Multi-Robot Localization

Robots can detect each other (using cameras)

[Fox et al, 1999]


Probabilistic Localization: Lessons Learned

 Probabilistic Localization = Bayes filters
 Particle filters: approximate the posterior by random samples
 Extensions:
• Filter for dynamic environments
• Safe avoidance of invisible hazards
• People tracking
• Multi-robot localization
• Recovery from total failures [eg Lenser et al 00, Thrun et al 00]


Tutorial Outline

Introduction

Probabilistic State Estimation
• Localization
• Mapping

Probabilistic Decision Making
• Planning
• Exploration

Conclusion


The Problem: Concurrent Mapping and Localization

70 m


The Problem: Concurrent Mapping and Localization


On-Line Mapping with Rhino


Concurrent Mapping and Localization

Is a chicken-and-egg problem:
• Mapping with known poses is “simple”
• Localization with a known map is “simple”
• But in combination, the problem is hard!

Today’s best solutions are all probabilistic!


Mapping: Outline

 Posterior estimation with known poses: Occupancy grids
 Maximum likelihood: ML*
 Maximum likelihood: EM
 Posterior estimation: EKF (SLAM)


Mapping as Posterior Estimation

b(s_t) = η p(o_t | s_t) ∫ p(s_t | s_{t-1}, a_{t-1}) b(s_{t-1}) ds_{t-1}

Joint posterior over pose and map:

b(s_t, m_t) = η p(o_t | s_t, m_t) ∫∫ p(s_t, m_t | s_{t-1}, m_{t-1}, a_{t-1}) b(s_{t-1}, m_{t-1}) ds_{t-1} dm_{t-1}

Assume a static map, m_t = m_{t-1} = m:

b(s_t, m) = η p(o_t | s_t, m) ∫ p(s_t | s_{t-1}, a_{t-1}, m) b(s_{t-1}, m) ds_{t-1}
          = η p(o_t | s_t, m) ∫ p(s_t | s_{t-1}, a_{t-1}) b(s_{t-1}, m) ds_{t-1}

[Smith, Self, Cheeseman 90, Chatila et al 91, Durrant-Whyte et al 92-00, Leonard et al. 92-00]


Kalman Filters

N-dimensional Gaussian:

b(s_t, m) = N(μ, Σ),   μ = (x, y, θ, l_1, …, l_N)

with Σ the full covariance matrix over the robot pose (x, y, θ) and the landmark locations l_1, …, l_N.

Can handle hundreds of dimensions.


Underwater Mapping

By: Louis L. Whitcomb, Johns Hopkins University


Underwater Mapping - Example

“Autonomous Underwater Vehicle Navigation,” John Leonard et al, 1998


Mapping with Extended Kalman Filters

Courtesy of [Leonard et al 1998]


The Key Assumption: the inverse sensor model p(s_t | o_t, m) must be Gaussian.

Main problem: data association
• Indistinguishable features → posterior multi-modal
• Distinguishable features → posterior uni-modal

In practice:
• Extract a small set of highly distinguishable features from the sensor data
• Discard all other data
• If ambiguous, take the best guess for the landmark identity


Mapping Algorithms - Comparison

               SLAM (Kalman)
Output         Posterior
Convergence    Strong
Local minima   No
Real time      Yes
Odom. error    Unbounded
Sensor noise   Gaussian
# features     10^3
Feature uniq.  Yes
Raw data       No


Mapping: Outline

 Posterior estimation with known poses: Occupancy grids
 Maximum likelihood: ML*
 Maximum likelihood: EM
 Posterior estimation: EKF (SLAM)


Mapping with Expectation Maximization

Idea: maximum likelihood (with unknown data association)

b(m) = η p(m) ∫ ⋯ ∫ ∏_t p(o_t | s_t, m) p(s_t | s_{t-1}, a_{t-1}, m) ds_1 ds_2 ⋯ ds_t

b(s_t, m) = η p(o_t | s_t, m) ∫ p(s_t | s_{t-1}, a_{t-1}) b(s_{t-1}, m) ds_{t-1}

EM: maximize the log-likelihood by iterating    [Dempster et al. 77]

E-step:  Q(m | m^[k]) = E_{s_{0..t}}[ log p(s_{0..t}, d_{0..t} | m) | m^[k], d_{0..t} ]
         = Markov localization (bi-directional)

M-step:  m^[k+1] = argmax_m Q(m | m^[k])
         = mapping with known poses

[Thrun et al. 98]


map(1)


backward

forward

map(2)map(1)


backward

forward

map(10)


CMU’s Wean Hall (80 x 25 meters)

15 landmarks 16 landmarks

17 landmarks 27 landmarks


EM Mapping, Example (width 45 m)


Mapping Algorithms - Comparison

               SLAM (Kalman)   EM
Output         Posterior       ML/MAP
Convergence    Strong          Weak?
Local minima   No              Yes
Real time      Yes             No
Odom. error    Unbounded       Unbounded
Sensor noise   Gaussian        Any
# features     10^3            –
Feature uniq.  Yes             No
Raw data       No              Yes


Mapping: Outline

 Posterior estimation with known poses: Occupancy grids
 Maximum likelihood: ML*
 Maximum likelihood: EM
 Posterior estimation: EKF (SLAM)


Incremental ML Mapping, Online

Idea: step-wise maximum likelihood

b(s_t, m_t) = η p(o_t | s_t, m_t) ∫∫ p(s_t, m_t | s_{t-1}, m_{t-1}, a_{t-1}) b(s_{t-1}, m_{t-1}) ds_{t-1} dm_{t-1}

Incremental ML estimate:

⟨ŝ_t, m̂_t⟩ = argmax_{s_t, m_t} p(o_t | s_t, m_t) p(s_t, m_t | ŝ_{t-1}, m̂_{t-1}, a_{t-1})


Incremental ML: Not A Good Idea

(figure: robot path; accumulated odometry error causes a mismatch when the loop is closed)


ML* Mapping, Online

Idea: step-wise maximum likelihood

1. Incremental ML estimate:

⟨ŝ_t, m̂_t⟩ = argmax_{s_t, m_t} p(o_t | s_t, m_t) p(s_t, m_t | ŝ_{t-1}, m̂_{t-1}, a_{t-1})

2. Posterior over poses:

b(s_t) = η p(o_t | s_t, m̂_{t-1}) ∫ p(s_t | s_{t-1}, a_{t-1}, m̂_{t-1}) b(s_{t-1}) ds_{t-1}

[Gutmann/Konolige 00, Thrun et al. 00]


ML* Mapping, Online
Courtesy of Kurt Konolige, SRI

[Gutmann & Konolige, 00]


ML* Mapping, Online

Yellow flashes:

artificially distorted map (30 deg, 50 cm)

[Thrun et al. 00]


Mapping with Poor Odometry

map and exploration path

raw data

DARPA Urban Robot


Mapping Without(!) Odometry

raw data (no odometry) → map


Localization in Multi-Robot Mapping


Localization in Multi-Robot Mapping
Courtesy of Kurt Konolige, SRI

[Gutmann & Konolige, 00]


3D Mapping

two laser range finders


3D Structure Mapping (Real-Time)


3D Texture Mapping

raw image sequence, panoramic camera


3D Texture Mapping


Mapping Algorithms - Comparison

               SLAM (Kalman)   EM          ML*
Output         Posterior       ML/MAP      ML/MAP
Convergence    Strong          Weak?       No
Local minima   No              Yes         Yes
Real time      Yes             No          Yes
Odom. error    Unbounded       Unbounded   Unbounded
Sensor noise   Gaussian        Any         Any
# features     10^3            –           –
Feature uniq.  Yes             No          No
Raw data       No              Yes         Yes


Mapping: Outline

 Posterior estimation with known poses: Occupancy grids
 Maximum likelihood: ML*
 Maximum likelihood: EM
 Posterior estimation: EKF (SLAM)


Occupancy Grids: From scans to maps


Occupancy Grid Maps

b(s_t) = η p(o_t | s_t) ∫ p(s_t | s_{t-1}, a_{t-1}) b(s_{t-1}) ds_{t-1}

Assumptions: poses s_0 … s_t known; occupancy binary; grid cells independent.    [Elfes/Moravec 88]

For each grid cell m^[xy], assume a static map, m_t^[xy] = m_{t-1}^[xy]:

b(m_t^[xy]) = η p(o_t | m_t^[xy]) ∫ p(m_t^[xy] | m_{t-1}^[xy], a_{t-1}) b(m_{t-1}^[xy]) dm_{t-1}^[xy]
            = η p(o_t | m^[xy]) b(m_{t-1}^[xy])

where, by Bayes rule, p(o_t | m^[xy]) = p(m^[xy] | o_t) p(o_t) / p(m^[xy]).


Example

CAD map occupancy grid map

The Tech Museum, San Jose


Mapping Algorithms - Comparison

               SLAM (Kalman)   EM          ML*         Occupancy Grids
Output         Posterior       ML/MAP      ML/MAP      Posterior
Convergence    Strong          Weak?       No          Strong
Local minima   No              Yes         Yes         No
Real time      Yes             No          Yes         Yes
Odom. error    Unbounded       Unbounded   Unbounded   None
Sensor noise   Gaussian        Any         Any         Any
# features     10^3            –           –           –
Feature uniq.  Yes             No          No          No
Raw data       No              Yes         Yes         Yes


Mapping: Lessons Learned

 Concurrent mapping and localization: a hard robotics problem
 Best known algorithms are probabilistic:
1. EKF/SLAM: full posterior estimation, but restrictive assumptions (data association)
2. EM: maximum likelihood, solves data association
3. ML*: less robust, but online
4. Occupancy grids: binary Bayes filter, assumes known poses (= much easier)


Tutorial Outline

Introduction

Probabilistic State Estimation
• Localization
• Mapping

Probabilistic Decision Making
• Planning
• Exploration

Conclusion


The Decision Making Problem

Central Question: What should a robot do next?

Embraces:
• control (short term, tight feedback)
• planning (longer term, looser feedback)

Probabilistic paradigm: considers uncertainty
• current
• future


Planning under Uncertainty

                      Environment    State                 Model
Classical planning    deterministic  observable            deterministic, accurate
MDP, universal plans  stochastic     observable            stochastic, accurate
POMDPs                stochastic     partially observable  stochastic, inaccurate


Classical Situation

(figure: grid world with two terminal states, heaven and hell)

• World deterministic
• State observable


MDP-Style Planning

(figure: grid world with two terminal states, heaven and hell)

• World stochastic
• State observable

[Koditschek 87, Barto et al. 89]

• Policy
• Universal plan
• Navigation function


Stochastic, Partially Observable

(figure: two doors, “heaven?” and “hell?”, plus a sign that disambiguates them)

[Sondik 72] [Littman/Cassandra/Kaelbling 97]


Stochastic, Partially Observable

(figure: two possible worlds: the sign indicates heaven on one side and hell on the other, or vice versa)


Stochastic, Partially Observable

(figure: from the start state there is a 50%/50% chance of either world; reading the sign reveals which)


Outline

Deterministic, fully observable

Stochastic, fully observable, discrete states/actions (MDPs)

Stochastic, partially observable, discrete (POMDPs, Augmented MDPs)

Stochastic, partially observable, continuous (Monte Carlo POMDPs)


Robot Planning Frameworks

                          Classical AI/robot planning
State/actions             discrete & continuous
State                     observable
Environment               deterministic
Plans                     sequences of actions
Completeness              yes
Optimality                rarely
State space size          huge, often continuous, 6 dimensions
Computational complexity  varies


MDP-Style Planning

(figure: grid world with two terminal states, heaven and hell)

• World stochastic
• State observable

[Koditschek 87, Barto et al. 89]

• Policy
• Universal plan
• Navigation function


Markov Decision Process (discrete)

(figure: discrete MDP with states s1 … s5, stochastic transition probabilities such as 0.7/0.3 and 0.9/0.1, and rewards r = 10, r = 1, and r = 0)

[Bellman 57] [Howard 60] [Sutton/Barto 98]


Value Iteration

 Value function of policy π:

V^π(s) = E[ Σ_t γ^t r_t | s_0 = s, a_t = π(s_t) ]

 Bellman equation for the optimal value function:

V(s) = r(s) + γ max_a ∫ p(s' | a, s) V(s') ds'

 Value iteration: recursively estimate the value function:

V̂(s) ← r(s) + γ max_a ∫ p(s' | a, s) V̂(s') ds'

 Greedy policy:

π(s) = argmax_a ∫ p(s' | a, s) V̂(s') ds'

[Bellman 57] [Howard 60] [Sutton/Barto 98]
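The Bellman backup above can be sketched on a tiny MDP. The 4-state chain, noisy motion model, and reward placement below are invented for illustration (discount γ = 0.9):

```python
# Value iteration on a hypothetical 4-state chain MDP.
# States 0..3; actions "left"/"right" succeed with prob 0.9, else stay.
GAMMA = 0.9
N = 4
REWARD = [0.0, 0.0, 0.0, 1.0]   # reward only at the goal state 3

def step_dist(s, a):
    """p(s' | s, a) as a dict; hypothetical noisy motion on a chain."""
    target = max(0, min(N - 1, s + (1 if a == "right" else -1)))
    return {target: 0.9, s: 0.1} if target != s else {s: 1.0}

V = [0.0] * N
for _ in range(100):   # iterate the Bellman backup to (near) convergence
    V = [REWARD[s] + GAMMA * max(
            sum(p * V[s2] for s2, p in step_dist(s, a).items())
            for a in ("left", "right"))
         for s in range(N)]

# Greedy policy extracted from the converged value function.
policy = [max(("left", "right"),
              key=lambda a: sum(p * V[s2] for s2, p in step_dist(s, a).items()))
          for s in range(N)]
print(policy)  # the greedy policy heads toward the rewarding state
```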


Value Iteration for Motion Planning (assumes knowledge of the robot’s location)


Continuous Environments

From: A Moore & C.G. Atkeson “The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State spaces,” Machine Learning 1995


Approximate Cell Decomposition [Latombe 91]

From: A Moore & C.G. Atkeson “The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State spaces,” Machine Learning 1995


Parti-Game [Moore 96]

From: A Moore & C.G. Atkeson “The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State spaces,” Machine Learning 1995


Robot Planning Frameworks

                          Classical AI/robot planning           Value iteration in MDPs  Parti-Game
State/actions             discrete & continuous                 discrete                 continuous
State                     observable                            observable               observable
Environment               deterministic                         stochastic               stochastic
Plans                     sequences of actions                  policy                   policy
Completeness              yes                                   yes                      yes
Optimality                rarely                                yes                      no
State space size          huge, often continuous, 6 dimensions  millions                 n/a
Computational complexity  varies                                quadratic                n/a


Stochastic, Partially Observable

(figure: from the start state there is a 50%/50% chance that the sign marks heaven on the left and hell on the right, or the reverse; the robot may act immediately or first read the sign)


A Quiz

sensors          # states          actions        size of belief space?
perfect          3: s1, s2, s3     deterministic  3
perfect          3: s1, s2, s3     stochastic     3
abstract states  3                 deterministic  2^3 − 1: s1, s2, s3, s12, s13, s23, s123
stochastic       3                 deterministic  2-dim continuous*: p(S=s1), p(S=s2)
none             3                 stochastic     2-dim continuous*: p(S=s1), p(S=s2)
stochastic       1-dim continuous  deterministic  ∞-dim continuous*
stochastic       1-dim continuous  stochastic     ∞-dim continuous*
stochastic       ∞-dim continuous  stochastic     aargh!

*) countable, but continuous for all practical purposes


Introduction to POMDPs

(figure: two-state example; actions a and b yield different payoffs in s1 and s2, and the resulting value function over the belief p(s1) is piecewise linear)

[Sondik 72, Littman, Kaelbling, Cassandra ’97]

 Value function (finite horizon): piecewise linear, convex
 Most efficient algorithm today: Witness algorithm


Value Iteration in POMDPs

 Value function of policy π:

V^π(b) = E[ Σ_t γ^t r_t | b_0 = b, a_t = π(b_t) ]

 Bellman equation for the optimal value function:

V(b) = r(b) + γ max_a ∫ p(b' | a, b) V(b') db'

 Value iteration: recursively estimate the value function:

V̂(b) ← r(b) + γ max_a ∫ p(b' | a, b) V̂(b') db'

 Greedy policy:

π(b) = argmax_a ∫ p(b' | a, b) V̂(b') db'

Substitute b for s.


Missing Terms: Belief Space

 Expected reward:

r(b) = ∫ r(s) b(s) ds

 Next-state density:

p(o' | a, b) = ∫∫ p(o' | s') p(s' | a, s) b(s) ds ds'

p(b' | a, b) = ∫ p(b' | o', a, b) p(o' | a, b) do'

where p(b' | o', a, b) is a Dirac distribution: b' follows from b, a, and o' by a Bayes filter!


Value Iteration in Belief Space

(figure: from belief state b, an action and an observation o lead to the next belief state b'; the value Q(b, a) is computed by backing up max_a Q(b', a) over next states s' and rewards r')


Why is This So Complex?

State Space Planning(no state uncertainty)

Belief Space Planning(full state uncertainties)

?


Augmented MDPs:

Idea: summarize the belief b by its most likely state and its uncertainty (entropy):

b̄ = ⟨ argmax_s b(s), H[b] ⟩

i.e., the conventional state space augmented by one uncertainty (entropy) dimension.

[Roy et al, 98/99]
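Computing this belief summary is straightforward; a minimal sketch over a discrete belief (the state names and probabilities below are made up):

```python
import math

def augmented_state(belief):
    """Compress a discrete belief into (most likely state, entropy),
    following the Augmented-MDP idea of state + one uncertainty dimension."""
    s_ml = max(belief, key=belief.get)
    h = -sum(p * math.log(p) for p in belief.values() if p > 0)
    return s_ml, h

# Hypothetical belief over three coarse locations.
b = {"corridor": 0.7, "room_a": 0.2, "room_b": 0.1}
s, h = augmented_state(b)
print(s, round(h, 3))  # most likely state, entropy in nats
```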


Path Planning with Augmented MDPs

(figures: conventional planner vs. probabilistic planner exploiting information gain)

[Roy et al, 98/99]


Robot Planning Frameworks

                          Classical AI/robot planning           Value iteration in MDPs  Parti-Game  POMDP                 Augmented MDP
State/actions             discrete & continuous                 discrete                 continuous  discrete              discrete
State                     observable                            observable               observable  partially observable  partially observable
Environment               deterministic                         stochastic               stochastic  stochastic            stochastic
Plans                     sequences of actions                  policy                   policy      policy                policy
Completeness              yes                                   yes                      yes         yes                   no
Optimality                rarely                                yes                      no          yes                   no
State space size          huge, often continuous, 6 dimensions  millions                 n/a         dozens                thousands
Computational complexity  varies                                quadratic                n/a         exponential           O(N^4)


Decision Making: Lessons Learned

 Four sources of uncertainty:
• Environment unpredictable
• Robot wear and tear
• Sensor limitations
• Models inaccurate

 Two implications:
• Need a policy instead of a simple (open-loop) plan
• Policy must be conditioned on the belief state

 Approaches:
• MDP: only works with perfect sensors and models
• POMDPs: general framework, but scaling limitations
• Augmented MDPs: lower computation, but approximate


Tutorial Outline

Introduction

Probabilistic State Estimation
• Localization
• Mapping

Probabilistic Decision Making
• Planning
• Exploration

Conclusion


Exploration: Maximize Knowledge Gain

 Pick the action a that maximizes knowledge gain.    [Thrun 93] [Yamauchi 96] [Burgard et al 00] + many others

Expected map entropy after executing a:

H[m | a] = ∫∫ H[m with o] p(o | s') p(s' | s, a) b(s) ds do

 Constant-time actions:  maximize H[m] − H[m | a]

 Variable-time actions:  maximize (H[m] − H[m | a]) / E[time(a)]


Practical Implementation

For each candidate location ⟨x, y⟩:
• estimate the number of cells the robot can sense there
• estimate the cost of getting there (value iteration)

[Simmons et al 00]
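The gain-per-cost rule can be sketched as follows. This is a toy illustration: the current cell occupancy probability is used as a proxy for the entropy a visit would remove, and the candidate locations, visible cells, and travel times are all invented:

```python
import math

def cell_entropy(p):
    """Binary entropy (nats) of one grid cell's occupancy probability."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def best_target(candidates):
    """Greedy exploration: pick the candidate maximizing expected
    knowledge gain per unit travel cost."""
    def score(c):
        gain = sum(cell_entropy(p) for p in c["visible_cells"])  # proxy for H[m] - H[m|a]
        return gain / c["travel_time"]
    return max(candidates, key=score)

# Hypothetical candidates: A sees three unknown cells but is far away;
# B sees two nearly-known cells but is close.
candidates = [
    {"name": "A", "visible_cells": [0.5, 0.5, 0.5], "travel_time": 10.0},
    {"name": "B", "visible_cells": [0.9, 0.1],      "travel_time": 2.0},
]
print(best_target(candidates)["name"])
```

Note how the time normalization matters: A offers more absolute gain, but B wins once travel cost is taken into account.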


Real-Time Exploration


Coordinated Multi-Robot Exploration

 Robots place “bids” for target areas
 Greedy assignment of robots to areas
 Exploration strategies and assignments continuously re-evaluated while the robots are in motion

[Burgard et al 00] [Simmons et al 00]


Collaborative Exploration and Mapping


San Antonio Results


Benefit of Cooperation

[Burgard et al 00]


Exploration: Lessons Learned

 Exploration = greedily maximize knowledge gain
 Greedy methods can be very effective
 Facilitates multi-robot coordination


Tutorial Outline

Introduction

Probabilistic State Estimation
• Localization
• Mapping

Probabilistic Decision Making
• Planning
• Exploration

Conclusion


Problem Summary

In robotics, there is no such thing as:
• a perfect sensor
• a deterministic environment
• a deterministic robot
• an accurate model

Therefore: uncertainty is inherent in robotics.


Key Idea

Probabilistic Robotics: represents uncertainty explicitly and reasons with it

• Perception = posterior estimation
• Action = optimization of expected utility


Examples Covered Today

 Localization
 Mapping
 Planning
 Exploration
 Multi-robot


Successful Applications of Probabilistic Robotics

 Industrial outdoor navigation [Durrant-Whyte, 95]
 Underwater vehicles [Leonard et al, 98]
 Coal mining [Singh 98]
 Missile guidance
 Indoor navigation [Simmons et al, 97]
 Robo-Soccer [Lenser et al, 00]
 Museum tour-guides [Burgard et al, 98, Thrun 99]
 + many others


Relation to AI

Probabilistic methods highly successful in a range of sub-fields of AI

• Speech recognition
• Language processing
• Expert systems
• Computer vision
• Data mining

• (and many others)


Open Research Issues

 Better representations, faster algorithms
 Learning with domain knowledge (eg, models, behaviors)
 High-level reasoning and robot programming using the probabilistic paradigm
 Theory: eg, surpassing the Markov assumption
 Frameworks for probabilistic programming
 Innovative applications
