DARPA Mobile Autonomous Robot Software, May 2000
Adaptive Intelligent Mobile Robotics
William D. Smart, Presenter
Leslie Pack Kaelbling, PI
Artificial Intelligence Laboratory
MIT
Progress to Date
• Fast bootstrapped reinforcement learning
  • algorithmic techniques
  • demo on robot
• Optical-flow based navigation
  • flow algorithm implemented
  • pilot navigation experiments on robot
  • pilot navigation experiments in simulation testbed
Making RL Really Work
Typical RL methods require far too much data to be practical in an online setting. Address the problem by
• strong generalization techniques
• using human input to bootstrap
Let humans do what they’re good at
Let learning algorithms do what they’re good at
JAQL
Learning a value function in a continuous state and action space
• based on locally weighted regression (fancy version of nearest neighbor)
• algorithm knows what it knows
• use meta-knowledge to be conservative about dynamic-programming updates
Problems with Q-Learning on Robots
• Huge state spaces/sparse data
• Continuous states and actions
• Slow to propagate values
• Safety during exploration
• Lack of initial knowledge
Value Function Approximation
Use a function approximator instead of a table
• generalization
• deals with continuous spaces and actions
• Q-learning with VFA has been shown to diverge, even in benign cases
Which function approximator should we use to minimize problems?
[Diagram: function approximator F takes s and a as inputs and outputs Q(s,a)]
Locally Weighted Regression
• Store all previous data points
• Given a query point, find k nearest points
• Fit a locally linear model to these points, giving closer ones more weight
• Use KD-trees to make lookups more efficient
• Fast learning from a single data point
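The steps listed on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenters' implementation: the function name `lwr_predict`, the Gaussian kernel, and the default values of `k` and `bandwidth` are assumptions made for the example, and a brute-force neighbor search stands in for the KD-tree.

```python
import numpy as np

def lwr_predict(X, y, query, k=10, bandwidth=0.1):
    """Predict y at `query` from a weighted linear fit to its k nearest stored points."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    query = np.atleast_1d(np.asarray(query, dtype=float))

    # Brute-force nearest-neighbor search; a KD-tree would replace this step.
    dists = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(dists)[:k]

    # Gaussian kernel (an assumed choice): closer points get more weight.
    w = np.exp(-(dists[nearest] / bandwidth) ** 2)

    # Weighted least squares on [1, x]: scale each row by sqrt(w), then solve.
    A = np.hstack([np.ones((len(nearest), 1)), X[nearest]])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y[nearest] * sw, rcond=None)
    return float(np.concatenate([[1.0], query]) @ beta)
```

Because every data point is stored and the model is refit at query time, "learning" from a new sample amounts to appending one row to `X` and one value to `y`.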
Locally Weighted Regression
[Figure: original function]
Locally Weighted Regression
[Figure: bandwidth = 0.1, 500 training points]
Problems with Approximate Q-Learning
Errors are amplified by backups
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma\, Q_{next} - Q(s_t, a_t) \right)$$
$$Q_{next} = \max_a Q(s_{t+1}, a)$$
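A small numeric sketch may help show why the backup on this slide amplifies approximation errors. It uses the standard Q-learning update with learning rate `alpha` and discount `gamma`; the function name `q_backup` and all of the numbers are illustrative assumptions, not values from the talk.

```python
def q_backup(q_sa, reward, next_q_values, alpha=0.2, gamma=0.9):
    """One Q-learning backup for a single (s_t, a_t) pair."""
    q_next = max(next_q_values)                  # Q_next = max_a Q(s_{t+1}, a)
    td_error = reward + gamma * q_next - q_sa    # r_{t+1} + gamma * Q_next - Q(s_t, a_t)
    return q_sa + alpha * td_error

# Same transition twice; in the second call one next-state estimate is
# overestimated by the function approximator (0.8 becomes 2.8).
accurate = q_backup(1.0, 0.0, [1.0, 1.2, 0.8])
inflated = q_backup(1.0, 0.0, [1.0, 1.2, 2.8])
```

Because the backup takes a maximum over the next-state estimates, a single overestimate is selected and copied into the value being updated, which then feeds later backups in turn.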
One Source of Errors
Independent Variable Hull
Interpolation is safe; extrapolation is not, so
• construct hull around known points
• do local regression if the query point is within the hull
• give a default prediction if not
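One common way to build such a hull, sketched below, is the elliptical region defined by the regression hat matrix: a query counts as inside when its leverage does not exceed the largest leverage among the stored points. The slide does not say which construction was used, so treat this choice as an assumption. The helper names (`fit_hull`, `in_hull`, `predict_or_default`), the tolerance, and the default value are also hypothetical.

```python
import numpy as np

def _design(X):
    """Prepend a constant column so the hull is centered on the data, not the origin."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_hull(X):
    """Return (A^T A)^-1 and the largest leverage among the stored points."""
    A = _design(X)
    xtx_inv = np.linalg.pinv(A.T @ A)
    leverages = np.einsum("ij,jk,ik->i", A, xtx_inv, A)  # diagonal of the hat matrix
    return xtx_inv, float(leverages.max())

def in_hull(query, xtx_inv, max_leverage, tol=1e-9):
    """True if the query's leverage is no larger than any stored point's."""
    q = _design(query)[0]
    return float(q @ xtx_inv @ q) <= max_leverage + tol

def predict_or_default(query, X, predict, default=0.0):
    """Run the local regression only inside the hull; otherwise answer 'don't know'."""
    xtx_inv, max_leverage = fit_hull(X)
    if in_hull(query, xtx_inv, max_leverage):
        return predict(query)
    return default
```

With the four corners of the unit square as stored points, a query at the center falls inside the hull, while a query at (3, 3) falls outside and receives the default prediction.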
Recap
Use LWR to represent the value function
• generalization
• continuous spaces
Use IVH and “don’t know”
• conservative predictions
• safer backups
Incorporating Human Input
Humans can help a lot, even if they can’t perform the task very well.
• Provide some initial successful trajectories through the space
• Trajectories are not used for supervised learning, but to guide the reinforcement-learning methods through useful parts of the space
• Learn models of the dynamics of the world and of the reward structure
• Once learned models are good, use them to update the value function and policy as well.
Give Some Trajectories
Supply an example policy
• Need not be optimal and might be very wrong
• Code or human-controlled
Used to generate experience
• Follow example policy and record experiences
• Shows learner “interesting” parts of the space
• “Bad” initial policies might be better
Two Learning Phases
[Diagram, Phase One: Learning System, Supplied Control Policy, and Environment boxes, linked by arrows labeled A, R, and O]
Two Learning Phases
[Diagram, Phase Two: Learning System, Supplied Control Policy, and Environment boxes, linked by arrows labeled A, R, and O]
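The two phases can be illustrated with a toy example. The sketch assumes that in phase one the supplied policy chooses the actions while the learner only observes the transitions, and that in phase two the learner's own greedy policy takes control. The diagrams themselves do not survive in this transcript, so that reading is an assumption. The 1-D corridor, the tabular Q-function, and every name and constant below are hypothetical stand-ins for the continuous robot task.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)
GOAL = 5

def env_step(state, action):
    """Hypothetical 1-D corridor: reward 10 at the goal, 0 everywhere else."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (10.0 if done else 0.0), done

def make_learner(alpha=0.5, gamma=0.9):
    q = defaultdict(float)

    def update(s, a, r, s_next):
        q_next = 0.0 if s_next == GOAL else max(q[(s_next, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * q_next - q[(s, a)])

    def greedy(s):
        return max(ACTIONS, key=lambda a: q[(s, a)])

    return q, update, greedy

def run_episode(choose_action, update, start=0, max_steps=200):
    """Run one episode and return the number of steps taken."""
    state = start
    for step in range(1, max_steps + 1):
        action = choose_action(state)
        next_state, reward, done = env_step(state, action)
        update(state, action, reward, next_state)  # learner sees every transition
        state = next_state
        if done:
            return step
    return max_steps

rng = random.Random(0)
q, update, greedy = make_learner()

def supplied_policy(state):
    """Deliberately imperfect example policy: heads for the goal 70% of the time."""
    return +1 if rng.random() < 0.7 else -1

# Phase one: the supplied policy drives; the learner passively updates Q.
phase_one_steps = [run_episode(supplied_policy, update) for _ in range(20)]

# Phase two: the learner's greedy policy takes over and keeps learning.
phase_two_steps = [run_episode(greedy, update) for _ in range(5)]
```

In this toy setting the learner never has to explore from scratch: by the time it takes control, the supplied trajectories have already propagated value back along the corridor.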
What does this Give Us?
• Natural way to insert human knowledge
• Keeps robot safe in early stages of learning
• Bootstraps information into the Q-function
Experimental Results: Corridor-Following
Corridor-Following
3 continuous state dimensions
• corridor angle
• offset from middle
• distance to end of corridor
1 continuous action dimension
• rotation velocity
Supplied example policy
• Average 110 steps to goal
Corridor-Following
Experimental setup
• Initial training runs start from roughly the middle of the corridor
• Translation speed has a fixed policy
• Evaluation on a number of set starting points
• Reward
  • 10 at end of corridor
  • 0 everywhere else
Corridor-Following
[Figure: “Average steps to goal”. Steps to goal (65 to 125) versus training runs (-25 to 25) over Phase 1 and Phase 2; legend labels: “Best” possible, Average training]
Corridor Following: Initial Policy
Corridor Following: After Phase 1
Corridor Following: After Phase 1
Corridor Following: After Phase 2
Conclusions
VFA can be made more stable
• Locally weighted regression
• Independent variable hull
• Conservative backups
Bootstrapping value function really helps
• Initial supplied trajectories
• Two learning phases
Optical Flow
Get range information visually by computing optical flow field
• nearer objects cause flow of higher magnitude
• expansion pattern means you’re going to hit
• rate of expansion tells you when
• elegant control laws based on center and rate of expansion (derived from human and fly behavior)
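The "rate of expansion tells you when" point can be made concrete under a simplifying assumption: for pure forward translation toward a surface, the flow at an image point is radial from the focus of expansion (FOE), with magnitude equal to the point's distance from the FOE divided by the time to contact. The sketch below estimates that time with a least-squares fit. The function name, the known-FOE input, and the per-frame units are assumptions for illustration; this is not the control law used in the talk.

```python
import numpy as np

def time_to_contact(points, flow, foe):
    """Estimate time to contact (in frames) from a purely expanding flow field.

    points: N x 2 image positions; flow: N x 2 displacements per frame;
    foe: assumed-known focus of expansion (x, y).
    """
    r = np.asarray(points, dtype=float) - np.asarray(foe, dtype=float)
    v = np.asarray(flow, dtype=float)
    # Least-squares fit of v = r / tau, so expansion rate = <r, v> / <r, r>.
    expansion_rate = np.sum(r * v) / np.sum(r * r)
    # No expansion (or contraction) means no predicted collision.
    return np.inf if expansion_rate <= 0 else 1.0 / expansion_rate
```

A flow field that moves every point outward by 1/25 of its distance from the FOE per frame yields an estimate of 25 frames.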
Approaching a Wall
Balance Strategy
Simple obstacle-avoidance strategy
• compute flow field
• compute average magnitude of flow in each hemifield
• turn away from the side with higher magnitude (because it has closer objects)
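The strategy above can be sketched for a dense flow field stored as an H x W x 2 array. The sign convention (positive means turn left), the normalized difference, and the `gain` parameter are assumptions chosen for the example. Computing the flow field itself is left out.

```python
import numpy as np

def balance_turn(flow, gain=1.0):
    """Turn command from an H x W x 2 flow field; positive means turn left (assumed)."""
    flow = np.asarray(flow, dtype=float)
    magnitude = np.linalg.norm(flow, axis=2)     # per-pixel flow magnitude
    mid = magnitude.shape[1] // 2
    left = magnitude[:, :mid].mean()             # average over left hemifield
    right = magnitude[:, mid:].mean()            # average over right hemifield
    # More flow on the right means closer objects there, so turn left (positive).
    return gain * (right - left) / (left + right + 1e-9)
```

Normalizing by the total flow keeps the command in the range [-gain, gain] regardless of the robot's speed. That is one design option among several; a plain difference would also follow the slide's description.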
Balance Strategy in Action
Crystal Space
Crystal Space
Crystal Space
Next Steps
• Extend RL architecture to include model-learning and planning
• Apply RL techniques to tune parameters in optical-flow
• Build topological maps using visual information
• Build highly complex simulated environment
• Integrate planning and learning in multi-layer system