MACHINE LEARNING AND ROBOTICS
Lisa Lyons
10/22/08
OUTLINE
Machine Learning Basics and Terminology
An Example: DARPA Grand/Urban Challenge
Multi-Agent Systems
Netflix Challenge (if time permits)
INTRODUCTION
Machine learning is commonly associated with robotics
When some people think of robots, they think of machines like WALL-E (right): human-looking, with feelings, and capable of complex tasks
Goals for machine learning in robotics aren’t usually this advanced, but some think we’re getting there
Next three slides outline some goals that motivate researchers to continue work in this area
HOUSEHOLD ROBOT TO ASSIST HANDICAPPED
Could come preprogrammed with general procedures and behaviors
Needs to be able to learn to recognize objects and obstacles and maybe even its owner (face recognition?)
Also needs to be able to manipulate objects without breaking them
May not always have all information about its environment (poor lighting, obscured objects)
FLEXIBLE MANUFACTURING ROBOT
Configurable robot that could manufacture multiple items
Must learn to manipulate new types of parts without damaging them
LEARNING SPOKEN DIALOG SYSTEM FOR REPAIRS
Given some initial information about a system, a robot could converse with a human and help to repair it
Speech understanding is a very hard problem in itself
MACHINE LEARNING BASICS AND TERMINOLOGY
With applications and examples in robotics
LEARNING ASSOCIATIONS
Association Rule – the conditional probability P(Y|X) that event Y will happen given that event X already has
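As a minimal sketch of the idea, P(Y|X) can be estimated by counting how often Y appears among the observations that contain X. The function name `confidence` and the observation log below are hypothetical and only serve as an illustration.

```python
def confidence(transactions, x, y):
    """Estimate P(y | x) as (#observations with x and y) / (#observations with x)."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return None  # x was never observed, so the estimate is undefined
    return sum(1 for t in with_x if y in t) / len(with_x)


# Hypothetical log: each set lists things a household robot saw together.
observations = [
    {"cup", "table"},
    {"cup", "table", "plate"},
    {"cup", "floor"},
    {"plate", "table"},
]

p_table_given_cup = confidence(observations, "cup", "table")  # 2 of the 3 cup sightings
```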
CLASSIFICATION
Classification – model where input is assigned to a class based on some data
Prediction – assuming a future scenario is similar to a past one, using past data to decide what this scenario would look like
Pattern Recognition – a method used to make predictions (e.g., face recognition, speech recognition)
Knowledge Extraction – learning a rule from data
Outlier Detection – finding exceptions to the rules
REGRESSION
Linear regression is an example
Both Classification and Regression are “Supervised Learning” strategies where the goal is to find a mapping from input to output
Example: Navigation of autonomous car
Training Data: actions of human drivers in various situations
Input: data from sensors (like GPS or video)
Output: angle to turn steering wheel
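To make the input-to-output mapping concrete, here is a small sketch that fits a straight line by least squares. It assumes a single made-up sensor feature (lateral offset from the lane center, in meters) and made-up steering angles recorded from a human driver; a real system would use far richer inputs.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: lateral offset (m) -> steering angle (degrees).
offsets = [-2.0, -1.0, 0.0, 1.0, 2.0]
angles = [10.0, 5.0, 0.0, -5.0, -10.0]

slope, intercept = fit_line(offsets, angles)


def predict_angle(offset):
    """Steering angle predicted for an unseen lateral offset."""
    return slope * offset + intercept
```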
UNSUPERVISED LEARNING
Only have input
Want to find regularities in the input
Density Estimation: finding patterns in the input space
Clustering: find groupings in the input
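Clustering can be illustrated with a tiny k-means sketch on one-dimensional data. The range readings and starting centers are invented for the example, and k-means is only one of many clustering methods.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on scalars: assign each value to its nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical range readings (m) that fall into two groups.
readings = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
centers, clusters = kmeans_1d(readings, centers=[0.0, 6.0])
```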
REINFORCEMENT LEARNING
Policy: the sequence of correct actions to reach the goal
Learn from past good policies
Example: robot navigating unknown environment in search of a goal
Some data may be missing
May be multiple agents in the system
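The navigation example can be sketched with tabular Q-learning, one common reinforcement learning method (the slides do not name a specific algorithm). The five-cell corridor, the reward of 1 at the goal, and all parameter values are assumptions chosen to keep the example small.

```python
import random

N_STATES = 5                        # corridor cells 0..4; cell 4 is the goal
ACTIONS = {"left": -1, "right": +1}


def step(state, action):
    """Deterministic move; walls clamp the position. Reward 1 only on reaching the goal."""
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from randomly explored episodes that start in cell 0."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(200):                      # cap on episode length
            action = rng.choice(list(ACTIONS))    # purely random exploration
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if state == N_STATES - 1:
                break
    return q


def greedy_policy(q):
    """Best-valued action in every non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


policy = greedy_policy(train())
```

Because Q-learning is off-policy, the robot can explore at random and still recover the policy that heads straight for the goal.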
POSSIBLE APPLICATIONS
Exploring a world
Learning object properties
Learning to interact with the world and with objects
Optimizing actions
Recognizing states in world model
Monitoring actions to ensure correctness
Recognizing and repairing errors
Planning
Learning action rules
Deciding actions based on tasks
WHAT WE EXPECT ROBOTS TO DO
Be able to react promptly and correctly to changes in environment or internal state
Work in situations where information about the environment is imperfect or incomplete
Learn through their experience and human guidance
Respond quickly to human interaction
Unfortunately, these are very high expectations which don’t always correlate very well with machine learning techniques
DIFFERENCES BETWEEN OTHER TYPES OF MACHINE LEARNING AND ROBOTICS
Other ML Applications:
Planning can frequently be done offline
Actions usually deterministic
No major time constraints
Robotics:
Often require simultaneous planning and execution (online)
Actions could be nondeterministic depending on data (or lack thereof)
Real-time often required
AN EXAMPLE: DARPA GRAND/URBAN CHALLENGE
THE CHALLENGE
Defense Advanced Research Projects Agency (DARPA)
Goal: to build a vehicle capable of traversing unrehearsed off-road terrain
Started in 2003
142-mile course through Mojave
No one made it through more than 5% of the course in 2004 race
In 2005, 195 teams registered, 23 teams raced, 5 teams finished
THE RULES
Must traverse a desert course up to 175 miles long in under 10 h
Course kept secret until 2h before the race
Must follow speed limits for specific areas of the course to protect infrastructure and ecology
If a faster vehicle needs to overtake a slower one, the slower one is paused so that vehicles don’t have to handle dynamic passing
Teams given data on the course 2h before race so that no global path planning was required
A DARPA GRAND CHALLENGE VEHICLE CRASHING
A DARPA GRAND CHALLENGE VEHICLE THAT DID NOT CRASH
…namely Stanley, the winner of the 2005 challenge
TERRAIN MAPPING AND OBSTACLE DETECTION
Data from 5 laser scanners mounted on top of the car is used to generate a point cloud of what’s in front of the car
Classification problem: Drivable, Occupied, Unknown
Area in front of vehicle is modeled as a grid
Stanley’s system finds the probability that ∆h > δ where ∆h is the observed height difference between nearby laser points in a certain cell
If this probability is higher than some threshold α, the system defines the cell as occupied
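A minimal sketch of the test above, assuming the measurement noise on ∆h is Gaussian with a known standard deviation `sigma` (an assumption for this example; the slides do not give the noise model). The default values for δ and α are placeholders, not Stanley's tuned parameters.

```python
import math


def prob_exceeds(delta_h, sigma, delta):
    """P(true height difference > delta) for a Gaussian centered on the measured delta_h."""
    z = (delta - delta_h) / sigma
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


def classify_cell(delta_h, sigma, delta=0.15, alpha=0.05):
    """Label one grid cell; delta_h is None when no laser points fell in the cell."""
    if delta_h is None:
        return "unknown"
    if prob_exceeds(delta_h, sigma, delta) > alpha:
        return "occupied"
    return "drivable"
```

For instance, with these placeholder values a 0.30 m step measured with 0.05 m noise is labeled occupied, while a 0.02 m bump is labeled drivable.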
(CONT.)
A discriminative learning algorithm is used to tune the parameters
Data is taken as a human driver drives through a mapped terrain avoiding obstacles (supervised learning)
Algorithm uses coordinate ascent to determine δ and α
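Coordinate ascent improves one parameter at a time while holding the others fixed, and stops when no single-parameter change helps. The sketch below searches over candidate grids; the toy objective is a stand-in for agreement with the human-driven training data, and the grids and starting point are assumptions.

```python
def coordinate_ascent(objective, start, grids, max_rounds=20):
    """Maximize objective(*params) by sweeping one coordinate at a time over its grid."""
    params = list(start)
    best = objective(*params)
    for _ in range(max_rounds):
        improved = False
        for i, grid in enumerate(grids):
            for candidate in grid:
                trial = params[:i] + [candidate] + params[i + 1:]
                score = objective(*trial)
                if score > best:
                    best, params, improved = score, trial, True
        if not improved:
            break  # no coordinate can be improved any further
    return tuple(params), best


def toy_objective(delta, alpha):
    """Hypothetical stand-in for classification quality; peaks at delta=0.2, alpha=0.1."""
    return -((delta - 0.2) ** 2) - ((alpha - 0.1) ** 2)


params, score = coordinate_ascent(
    toy_objective,
    start=(0.05, 0.5),
    grids=[[0.05, 0.1, 0.15, 0.2, 0.25], [0.01, 0.05, 0.1, 0.5]],
)
```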
COMPUTER VISION ASPECT
Lasers only make it safe for car to drive < 25 mph
Needs to go faster to satisfy time constraint
Color camera is used for long-range obstacle detection
Still the same classification problem
Now there are more factors to consider – lighting, material, dust on lens
Stanley takes adaptive approach
VISION ALGORITHM
1. Take out the sky
2. Map a quadrilateral on camera video corresponding with laser sensor boundaries
3. As long as this region is deemed drivable, use the pixels in the quad as a training set for the concept of drivable surface
4. Maintain Gaussians that model the color of drivable terrain
5. Adapt by adjusting previous Gaussians and/or throwing them out and adding new ones
Adjustment allows for slow adjustment to lighting conditions
Replacement allows for rapid change in color of the road
6. Label regions as drivable if their pixel values are near one or more of the Gaussians and they are connected to laser quadrilateral
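Steps 4 to 6 can be sketched roughly as follows. This is a simplified illustration, not Stanley's implementation: it assumes RGB pixels, a diagonal covariance, invented thresholds, and it leaves out the connectivity check against the laser quadrilateral.

```python
from dataclasses import dataclass


@dataclass
class ColorGaussian:
    mean: tuple   # (r, g, b)
    var: tuple    # per-channel variance (diagonal covariance assumed)
    count: int    # number of pixels that support this model


def fit_gaussian(pixels):
    """Fit one Gaussian to the training pixels taken from the quadrilateral."""
    n = len(pixels)
    mean = tuple(sum(p[c] for p in pixels) / n for c in range(3))
    # Floor the variance at 1.0 so a uniform patch does not give zero variance.
    var = tuple(max(sum((p[c] - mean[c]) ** 2 for p in pixels) / n, 1.0) for c in range(3))
    return ColorGaussian(mean, var, n)


def distance(g, pixel):
    """Squared Mahalanobis distance under the diagonal covariance."""
    return sum((pixel[c] - g.mean[c]) ** 2 / g.var[c] for c in range(3))


def adapt(models, new, max_models=3, merge_dist=9.0):
    """Adjust a matching Gaussian, or replace the weakest one with the new Gaussian."""
    for i, g in enumerate(models):
        if distance(g, new.mean) <= merge_dist:
            total = g.count + new.count
            w = new.count / total
            mean = tuple((1 - w) * g.mean[c] + w * new.mean[c] for c in range(3))
            var = tuple((1 - w) * g.var[c] + w * new.var[c] for c in range(3))
            models[i] = ColorGaussian(mean, var, total)   # slow adjustment
            return models
    if len(models) >= max_models:
        weakest = min(range(len(models)), key=lambda i: models[i].count)
        models.pop(weakest)                               # throw one out
    models.append(new)                                    # rapid change of road color
    return models


def is_drivable(models, pixel, max_dist=9.0):
    """A pixel counts as drivable if it is near at least one Gaussian."""
    return any(distance(g, pixel) <= max_dist for g in models)
```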
ROAD BOUNDARIES
Best way to avoid obstacles on a desert road is to find road boundaries and drive down the middle
Uses low-pass one-dimensional Kalman Filters to determine road boundary on both sides of vehicle
Small obstacles don’t really affect the boundary found
Large obstacles over time have a stronger effect
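A one-dimensional Kalman filter with a small process noise behaves like a low-pass filter, which is why a brief outlier barely moves the boundary estimate while a persistent change eventually does. The sketch below tracks a single lateral boundary offset; the noise values and measurements are assumptions for illustration.

```python
class Kalman1D:
    """Scalar Kalman filter for a slowly drifting quantity."""

    def __init__(self, x0, p0, process_var, meas_var):
        self.x = x0            # current estimate (e.g., boundary offset in meters)
        self.p = p0            # variance of the estimate
        self.q = process_var   # how fast the true value is allowed to drift
        self.r = meas_var      # measurement noise variance

    def update(self, z):
        self.p += self.q                   # predict: uncertainty grows
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (z - self.x)         # correct toward the measurement
        self.p *= (1.0 - k)
        return self.x


kf = Kalman1D(x0=0.0, p0=1.0, process_var=0.01, meas_var=1.0)
for _ in range(50):
    kf.update(2.0)             # steady boundary 2 m to the side
before = kf.x
after = kf.update(10.0)        # one spurious reading, e.g. a small obstacle
```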
SLOPE AND RUGGEDNESS
If terrain becomes too rugged or steep, vehicle must slow down to maintain control
Slope is found from vehicle’s pitch estimate
Ruggedness is determined by taking data from vehicle’s z accelerometer with gravity and vehicle vibration filtered out
PATH PLANNING
No global planning necessary
Coordinate system used is base trajectory + lateral offset
Base trajectory is smoothed version of driving corridor on the map given to contestants before the race
PATH SMOOTHING
Base trajectory computed in 4 steps:
1. Points are added to the map in proportion to local curvature
2. Least-squares optimization is used to adjust trajectories for smoothing
3. Cubic spline interpolation is used to find a path that can be resampled efficiently
4. Calculate the speed limit
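One plausible way to connect curvature and the speed limit, sketched under the assumption that the limit comes from bounding lateral acceleration (a_lat = v² · κ). The three-point curvature estimate and the numeric limits are illustrative choices, not values from the slides.

```python
import math


def menger_curvature(a, b, c):
    """Curvature (1/radius) of the circle through three 2D points; 0 if they are collinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    ab = math.hypot(b[0] - a[0], b[1] - a[1])
    bc = math.hypot(c[0] - b[0], c[1] - b[1])
    ca = math.hypot(a[0] - c[0], a[1] - c[1])
    product = ab * bc * ca
    return 0.0 if product == 0 else 2.0 * abs(cross) / product


def speed_limit(curvature, a_lat_max, v_max):
    """Largest speed (m/s) keeping v^2 * curvature within a_lat_max, capped at v_max."""
    if curvature <= 0.0:
        return v_max
    return min(v_max, math.sqrt(a_lat_max / curvature))


# Three points on a circle of radius 10 m give a curvature of about 0.1 per meter.
kappa = menger_curvature((10.0, 0.0), (0.0, 10.0), (-10.0, 0.0))
limit = speed_limit(kappa, a_lat_max=2.5, v_max=20.0)
```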
ONLINE PATH PLANNING
Determines the actual trajectory of vehicle during race
Search algorithm that minimizes a linear combination of continuous cost functions
Subject to dynamic and kinematic constraints: max lateral acceleration, max steering angle, max steering rate, max acceleration
Penalize hitting obstacles, leaving corridor, leaving center of road
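The cost-minimizing search can be sketched as picking, from a handful of candidate lateral offsets, the one with the lowest weighted sum of penalties. The penalty shapes, the weights, and the 1 m obstacle clearance are assumptions, and the dynamic and kinematic constraints are omitted for brevity.

```python
def path_cost(offset, obstacles, corridor_half_width, weights):
    """Linear combination of obstacle, corridor, and centering penalties for one offset."""
    w_obstacle, w_corridor, w_center = weights
    # Penalty rises as the path passes closer than 1 m to an obstacle.
    obstacle_pen = sum(max(0.0, 1.0 - abs(offset - o)) for o in obstacles)
    # Penalty for the distance driven outside the corridor.
    corridor_pen = max(0.0, abs(offset) - corridor_half_width)
    # Penalty for leaving the center of the road.
    center_pen = abs(offset)
    return w_obstacle * obstacle_pen + w_corridor * corridor_pen + w_center * center_pen


def best_offset(candidates, obstacles, corridor_half_width, weights=(10.0, 5.0, 1.0)):
    """Candidate lateral offset (m) with the lowest total cost."""
    return min(candidates, key=lambda off: path_cost(off, obstacles, corridor_half_width, weights))


candidates = [-2.0, -1.0, 0.0, 1.0, 2.0]
clear_road = best_offset(candidates, obstacles=[], corridor_half_width=1.5)
blocked = best_offset(candidates, obstacles=[0.5], corridor_half_width=1.5)
```

With no obstacles the center of the road wins; with an obstacle 0.5 m right of center the search swerves one meter to the left instead of leaving the corridor.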
MULTI-AGENT SYSTEMS
RECURSIVE MODELING METHOD (RMM)
Agents model the belief states of other agents
Bayesian methods implemented
Useful in homogeneous non-communicating Multi-Agent Systems (MAS)
Has to be cut off at some point (don’t want a situation where agent A thinks that agent B thinks that agent A thinks that…)
Agents can affect other agents by affecting the environment to produce a desired reaction
HETEROGENEOUS NON-COMMUNICATING MAS
Competitive and cooperative learning possible
Competitive learning more difficult because agents may end up in “arms race”
Credit-assignment problem
Can’t tell if agent benefitted because its actions were good or if opponent’s actions were bad
Experts and observers have proven useful
Different agents may be given different roles to reach the goal
Supervised learning to “teach” each agent how to do its part
COMMUNICATION
Allowing agents to communicate can lead to deeper levels of planning since agents know (or think they know) the beliefs of others
Could allow one agent to “train” another to follow its actions using reinforcement learning
Negotiations
Commitment
Autonomous robots could understand their position in an environment by querying other robots for their believed positions and making a guess based on that (Markov localization, SLAM)
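As a much-simplified illustration of "making a guess" from other robots' reports, the sketch below fuses several one-dimensional position estimates by inverse-variance weighting, which is the Bayesian combination of independent Gaussian estimates. It is not Markov localization or SLAM, and the report format is an assumption.

```python
def fuse_estimates(estimates):
    """Combine (position, variance) reports; more certain reports get more weight."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    position = sum(w * pos for w, (pos, _) in zip(weights, estimates)) / total
    return position, 1.0 / total


# Hypothetical reports from two other robots about this robot's position (m).
position, variance = fuse_estimates([(10.0, 1.0), (14.0, 3.0)])
```

The fused estimate lies closer to the more confident report, and its variance is smaller than either input.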
NETFLIX CHALLENGE (if time permits)
REFERENCES
Alpaydin, E. Introduction to Machine Learning. Cambridge, Mass.: MIT Press, 2004.
Kreuziger, J. “Application of Machine Learning to Robotics – An Analysis.” In Proceedings of the Second International Conference on Automation, Robotics, and Computer Vision (ICARCV '92). 1992.
Mitchell et al. “Machine Learning.” Annu. Rev. Comput. Sci. 1990. 4:417-33.
Stone, P and Veloso, M. “Multiagent Systems: A Survey from a Machine Learning Perspective.” Autonomous Robots 8, 345-383, 2000.
Thrun et al. “Stanley: The Robot that Won the DARPA Grand Challenge.” Journal of Field Robotics 23(9), 661-692, 2006.