Exploration and Other Applications of Reinforcement Learning in Robotics AS-84.4340 Postgraduate Seminar in Automation Technology Juhana Ahtiainen


Post on 01-Feb-2018


Page 1: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics

Exploration and Other Applications of Reinforcement Learning in Robotics

AS-84.4340 Postgraduate Seminar in Automation Technology

Juhana Ahtiainen

Page 2: Outline

Introduction
Exploration
  Information gain
  Monte Carlo algorithm
  Active localization
  Mapping in occupancy grids
  Multi-robot extension
  Exploration for SLAM
Other applications of reinforcement learning
  Recent advances
  RoboCup
  Humanoid robots
Summary
Exercise

Page 3: Introduction

Reinforcement learning is a sub-area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward.
Exploration is the problem of controlling a robot so as to maximize its knowledge about the external world.

Page 4: Introduction

The environment is usually modelled as a finite-state Markov Decision Process (MDP).
Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states.
Unlike in supervised learning, correct input-output pairs are never presented.

Page 5: Exploration

The exploration problem is paramount in robotics
  Abandoned mines, nuclear disasters, Mars...
The exploration problem comes in many forms
  Acquiring a map of a static environment
    Known pose
  Moving factors (dynamic environment)
    E.g. the pursuit-evasion problem
  Active localization
    Known map
  SLAM
Virtually anywhere in robotics

Page 6: POMDP and exploration

Is exploration fully subsumed by the POMDP framework?
  Turn the POMDP into an algorithm whose sole goal is to maximize information
  Payoff function = e.g. information gain
Exploring with a POMDP is often not a good idea
  The number of unknown variables is huge
  So is the number of possible observations

Page 7: Ch 17: Practical algorithms

For high-dimensional exploration problems
All are greedy (look-ahead is limited to a single exploration action)
An exploration action can involve a sequence of control actions
  E.g. select a location anywhere in the map; moving there is considered a single exploration action

Page 8: Information gain

The key to exploration is information
The entropy H_p(x) of a probability distribution p is the expected information E[-log p(x)]
  Entropy is at its maximum when p is a uniform distribution and at its minimum when p is a point-mass distribution
In exploration we seek to minimize the expected entropy of the belief after executing an action
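The two extremes named above can be checked numerically. The following is a minimal sketch for a discrete distribution; the function name `entropy` and the example distributions are illustrative choices, not part of the slides.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = E[-log p] of a discrete distribution, in nats."""
    # Terms with zero probability contribute nothing (0 * log 0 is taken as 0).
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

uniform = [0.25, 0.25, 0.25, 0.25]    # maximum uncertainty over four states
point_mass = [1.0, 0.0, 0.0, 0.0]     # state known exactly
skewed = [0.7, 0.1, 0.1, 0.1]         # somewhere in between

h_uniform = entropy(uniform)
h_point = entropy(point_mass)
h_skewed = entropy(skewed)
```

For four states the uniform distribution reaches log 4 nats, the point mass has zero entropy, and every other distribution falls between the two.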

Page 9: Information gain

Conditional entropy of the state x' after executing action u and measuring z:

Information gain associated with action u in belief b is given by the difference:

Conditional entropy with measurement integrated out:
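The formulas for these three quantities were images on the slide and are absent from the transcript. A reconstruction in the notation of Probabilistic Robotics (ref. 1, Ch. 17) is sketched here; B(b, z, u) denotes the belief obtained from b after executing u and observing z, and the exact form used on the original slide is an assumption.

```latex
% Conditional entropy of x' after executing u and measuring z
H_b(x' \mid z, u) = -\int B(b, z, u)(x') \, \log B(b, z, u)(x') \, dx'

% Information gain associated with action u in belief b
I_b(u) = H_p(x) - E_z\!\left[ H_b(x' \mid z, u) \right]

% Conditional entropy with the measurement integrated out
H_b(x' \mid u) \approx E_z\!\left[ H_b(x' \mid z, u) \right]
  = \iiint H_b(x' \mid z, u) \, p(z \mid x') \, p(x' \mid u, x) \, b(x) \, dz \, dx' \, dx
```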

Page 10: Greedy techniques

Expected information gain lets us formulate exploration as a decision-theoretic problem of the kind addressed in the previous presentations
Optimal exploration maximizes the difference between the information gain and the costs

Page 11: Greedy techniques

To evaluate the utility of u, compute the expected entropy after executing u and observing z

Previous equation resolves to:
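The equations on this slide are missing from the transcript. Assuming the formulation of Probabilistic Robotics (ref. 1, Ch. 17), with r(x, u) a payoff (the negative of the cost) and α a factor trading information against cost, the greedy policy and its simplified form would read as follows. The symbols r and α are taken from the book, not from the slide.

```latex
% Greedy exploration: information gain traded off against cost
\pi(b) = \arg\max_u \; \alpha \left( H_p(x) - E_z\!\left[ H_b(x' \mid z, u) \right] \right)
         + \int r(x, u) \, b(x) \, dx

% H_p(x) does not depend on u, so it drops out of the argmax
\pi(b) = \arg\max_u \; -\alpha \, E_z\!\left[ H_b(x' \mid z, u) \right]
         + \int r(x, u) \, b(x) \, dx
```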

Page 12: Exploration techniques

Most of them are greedy
  Optimal at time horizon 1
  Enormous branching factor in exploration
The goal is to acquire new information
  New belief state
  Adjust policy
Exploration policies have to be highly reactive

Page 13: Monte Carlo exploration

Samples a state x from the momentary belief b
Also samples the next state x' and the corresponding measurement z
Computes the new posterior belief
Evaluates the entropy-cost trade-off
Returns the action with the highest Monte Carlo information gain-cost value
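The steps above can be sketched for a small discrete world. Everything concrete here is an assumption made for illustration: a cyclic corridor of eight cells with one door, deterministic motion, a binary door sensor that is right 90% of the time, and the names `DOORS`, `P_HIT`, `bayes_filter` and `monte_carlo_exploration`.

```python
import math
import random

N_CELLS = 8
DOORS = {2}      # hypothetical map: the only cell with a door
P_HIT = 0.9      # assumed probability that the door sensor is correct

def entropy(b):
    return -sum(p * math.log(p) for p in b if p > 0.0)

def move(x, u):
    return (x + u) % N_CELLS          # deterministic motion, for simplicity

def p_z(z, x):
    return P_HIT if z == (x in DOORS) else 1.0 - P_HIT

def bayes_filter(b, z, u):
    predicted = [0.0] * N_CELLS
    for x, bx in enumerate(b):        # prediction step
        predicted[move(x, u)] += bx
    post = [p_z(z, x) * predicted[x] for x in range(N_CELLS)]  # correction
    total = sum(post)
    return [p / total for p in post]

def monte_carlo_exploration(b, actions, cost, alpha=1.0, n_samples=200, rng=random):
    rho = {u: 0.0 for u in actions}
    for _ in range(n_samples):
        x = rng.choices(range(N_CELLS), weights=b)[0]   # sample x from b
        for u in actions:
            x_next = move(x, u)                          # sample next state x'
            truth = x_next in DOORS
            z = truth if rng.random() < P_HIT else not truth  # sample z
            b_next = bayes_filter(b, z, u)               # new posterior belief
            rho[u] += -cost(u) - alpha * entropy(b_next) # entropy-cost trade-off
    return max(rho, key=rho.get)

# The robot is in cell 1 or cell 5 with equal probability.
belief = [0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0]
best = monte_carlo_exploration(belief, actions=[0, 1],
                               cost=lambda u: 0.1 * abs(u),
                               rng=random.Random(0))
```

Staying put yields a reading that is equally likely in both candidate cells, so the belief does not change. Moving one cell forward brings the robot next to the door in one hypothesis only, and the reduced entropy outweighs the small movement cost.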

Page 14: Monte Carlo exploration

May still be very time consuming
The number of possible measurements can be huge
  E.g. a robot with 24 ultrasonic sensors that each report one byte of range data
  256^24 possible sonar scans in a specific location
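To give a sense of scale, the count can be worked out directly: each of the 24 sensors reports one of 256 values.

```latex
256^{24} = \left(2^{8}\right)^{24} = 2^{192} \approx 6.3 \times 10^{57}
```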

Page 15: Active localization

The simplest case of exploration: estimating the state of a low-dimensional variable
Here we seek information about the robot's pose but have a map of the environment
Moving to the right place can make localization very fast

Page 16: Active localization

Can be solved greedily, but we need to define exploration actions differently
  E.g. target locations in the robot's coordinate frame
  This is fine if we can devise a low-level module that maps such an action back into low-level controls

Pages 17-22: Example (figure slides)

Page 23: Analysis of active localization

Greedy
  Cannot compose multiple exploration actions
Action definition
  Open-loop control while moving, no measurements
  A real robot can abandon the target point (e.g. a closed door)
    Not considered during planning
Performs well in practice

Page 24: Learning occupancy grid maps

Mapping problems include many more unknown variables
We treat the information gain as independent between different grid cells
How to compute the gain
  Entropy
  Expected information gain
  Binary gain (frontier-based exploration)

Page 25: Calculating the information gain

Entropy
  Straightforward
  The brighter the cell, the larger the entropy
Expected information gain
  Entropy only measures current information
  Requires assumptions on the nature of the information

Page 26: Calculating the information gain

Binary gain
  Simplest of all
  By far the most popular
  Very crude approximation of the expected information
  Tends to work well in practice
  Core of frontier-based exploration
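The entropy and binary gain maps can be sketched on a toy grid. This is an illustrative sketch under stated assumptions: cells hold an occupancy probability or `None` if never updated, never-updated cells sit at a prior of 0.5, and a frontier is taken to be a free cell with a never-updated 4-neighbour. None of the function names come from the slides.

```python
import math

def cell_entropy(p):
    """Binary entropy (nats) of a cell with occupancy probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def entropy_gain(grid, prior=0.5):
    """Entropy gain map; never-updated cells (None) are scored at the prior."""
    return [[cell_entropy(prior if p is None else p) for p in row] for row in grid]

def binary_gain(grid):
    """1 for cells never updated, 0 for cells updated at least once."""
    return [[1 if p is None else 0 for p in row] for row in grid]

def frontiers(grid, free_below=0.5):
    """Free cells that border at least one never-updated cell."""
    rows, cols = len(grid), len(grid[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            p = grid[r][c]
            if p is None or p >= free_below:
                continue                      # unknown or occupied: skip
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is None:
                    found.append((r, c))
                    break
    return found

# Left part observed (mostly free, one obstacle), right column never seen.
grid = [[0.1, 0.1, None],
        [0.1, 0.9, None]]
```

The binary map needs no probabilities at all, only a record of which cells have been touched, which is one reason it is so cheap to maintain.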

Page 27: Propagating gain

Definition of an exploration action
  Simple but effective: move to an x-y location along the minimum-cost path, then sense all grid cells within a small circle around the robot
Value iteration finds the best greedy exploration action
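Propagating gain through the map with value iteration can be sketched as follows. The 4-connected grid, the uniform `step_cost`, and the function names are assumptions made for this example; the robot then follows the steepest ascent of the resulting value function.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def propagate_gain(gain, blocked, step_cost=1.0):
    """Value iteration: V(cell) = max(gain(cell), max over neighbours V - cost)."""
    rows, cols = len(gain), len(gain[0])
    neg_inf = float("-inf")
    V = [[gain[r][c] if gain[r][c] > 0 and not blocked[r][c] else neg_inf
          for c in range(cols)] for r in range(rows)]
    changed = True
    while changed:                      # sweep until no value improves
        changed = False
        for r in range(rows):
            for c in range(cols):
                if blocked[r][c]:
                    continue
                for dr, dc in NEIGHBOURS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and not blocked[nr][nc]:
                        candidate = V[nr][nc] - step_cost
                        if candidate > V[r][c]:
                            V[r][c] = candidate
                            changed = True
    return V

def greedy_path(V, start):
    """Follow the steepest ascent of V from the robot's cell."""
    path, (r, c) = [start], start
    while True:
        best = (r, c)
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(V) and 0 <= nc < len(V[0]) and V[nr][nc] > V[best[0]][best[1]]:
                best = (nr, nc)
        if best == (r, c):
            return path
        r, c = best
        path.append(best)

# A corridor with one unexplored cell of gain 10 at the far end.
gain = [[0.0, 0.0, 0.0, 0.0, 10.0]]
blocked = [[False] * 5]
V = propagate_gain(gain, blocked)
path = greedy_path(V, (0, 0))
```

Each step away from the unexplored cell lowers the value by the step cost, so the path produced by `greedy_path` is also a minimum-cost path to the chosen cell.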

Page 28: Learning OG maps - summary

Crude approximation
Totally ignores the information acquired as the robot moves
Tends to work well in practice

Page 29: Extension to multi-robot systems

Acquire a map through cooperative exploration
The speed-up with K robots is usually linear, and might even be 2K
  A single robot might have to traverse many areas twice
Coordination
  Static greedy task allocation techniques

Page 30: Multi-robot coordination (figure annotations)

Value function for each robot (minimum at the robot's pose)
Reset the gain map to zero in the vicinity of the chosen cell
Optimal cell to explore for each robot
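The allocation loop described by these annotations can be sketched as below. Assumptions made for this example: travel cost is the Manhattan distance from the robot's pose (a stand-in for the per-robot value function), robots choose in a fixed order, and `radius` and `beta` are hypothetical parameters.

```python
def assign_goals(gain, robots, radius=1, beta=1.0):
    """Greedy goal assignment: each robot takes the best remaining cell in turn."""
    gain = [row[:] for row in gain]            # work on a copy of the gain map
    goals = {}
    for name, (rr, rc) in robots.items():
        best, best_utility = None, float("-inf")
        for r, row in enumerate(gain):
            for c, g in enumerate(row):
                # Utility = gain minus travel cost (zero cost at the robot's pose).
                utility = g - beta * (abs(r - rr) + abs(c - rc))
                if utility > best_utility:
                    best, best_utility = (r, c), utility
        goals[name] = best
        br, bc = best
        # Reset the gain in the vicinity of the chosen cell so that the
        # remaining robots are steered elsewhere.
        for r in range(len(gain)):
            for c in range(len(gain[0])):
                if abs(r - br) <= radius and abs(c - bc) <= radius:
                    gain[r][c] = 0.0
    return goals

gain_map = [[0.0, 0.0, 5.0, 5.0, 0.0, 0.0, 8.0]]
robots = {"A": (0, 0), "B": (0, 1)}
goals = assign_goals(gain_map, robots)
uncoordinated = assign_goals(gain_map, {"B": (0, 1)})
```

Without the reset, robot B would head for the same nearby cell as robot A; with it, B is sent to the distant high-gain cell instead.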

Page 31: Summary of multi-robot exploration

Simple...
Each robot greedily picks the best available goal and prohibits the others from picking the same cell
Easily trapped in a local minimum
  Crossing paths
Improved coordination techniques enable robots to trade goals

Page 32: Example (figure slide)

Page 33: SLAM in exploration

In SLAM we know neither the map nor the pose
Without knowledge of the pose, integrating sensor information can lead to serious errors
A robot that focuses only on its pose does not move
  Entropy decomposition!

Page 34: Entropy decomposition

Full SLAM posterior:

This implies:

The expectation is taken over the path posterior
SLAM entropy is the sum of the path entropy and the expected entropy of the map
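The formulas referred to on this slide are missing from the transcript. Assuming the standard factorization of the full SLAM posterior used in Probabilistic Robotics (ref. 1), with x_{1:t} the robot path, m the map, z_{1:t} the measurements and u_{1:t} the controls, the decomposition can be written as:

```latex
% Full SLAM posterior, factored into path and map
p(x_{1:t}, m \mid z_{1:t}, u_{1:t})
  = p(x_{1:t} \mid z_{1:t}, u_{1:t}) \; p(m \mid x_{1:t}, z_{1:t}, u_{1:t})

% This implies: SLAM entropy = path entropy + expected map entropy
H\!\left[ p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) \right]
  = H\!\left[ p(x_{1:t} \mid z_{1:t}, u_{1:t}) \right]
  + E\!\left[ H\!\left[ p(m \mid x_{1:t}, z_{1:t}, u_{1:t}) \right] \right]

% with the expectation taken over p(x_{1:t} \mid z_{1:t}, u_{1:t})
```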

Page 35: Derivation of decomposition

Page 36: Exploration in FastSLAM

Based on grid-based FastSLAM (Ch 13.10)
  The posterior is represented by a set of particles
  Each particle contains a robot path
  Each particle also contains an occupancy grid map
The FastSLAM exploration algorithm is a test-and-evaluate algorithm
  Proposes courses of action for exploration
  Evaluates these actions by measuring the residual entropy
  Selects the action that minimizes the resulting entropy

Page 37: FastSLAM summary

The FastSLAM exploration algorithm is an extension of the Monte Carlo exploration algorithm with two insights
  It applies to the full sequence of controls
  There are two types of entropy!
    One pertaining to the robot's path
    One pertaining to the map

Pages 38-40: Example of SLAM (figure slides)

Page 41: Reinforcement learning in robotics

Reinforcement learning offers one of the most general frameworks for taking traditional robotics towards true autonomy and versatility
Applied successfully in many well-defined, low-dimensional, discrete problems
  Backgammon (Tesauro 1994)
  Elevator control (Crites & Barto 1996)
  Helicopter control (Bagnell & Schneider 2001)

Page 42: Reinforcement learning in robotics

Google Scholar:
  About 19,700 results for "Reinforcement learning in robotics"
  Recent articles (since 2003): 4,340

Page 43: Recent advances (2003)

Curse of dimensionality
Hierarchical reinforcement learning
  Temporal abstractions
  Decisions are not required at each step
Semi-Markov Decision Processes (SMDP)
  Generalization of the MDP
  The time between one decision and the next is a random variable, real- or integer-valued
  Allows the decision maker to choose actions whenever the system state changes
  Models the system evolution in continuous time
  Allows the time spent in a particular state to follow an arbitrary probability distribution
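The effect of a variable decision interval shows up in the learning update. Below is a minimal sketch of a tabular SMDP Q-learning step for integer-valued durations; it is a common textbook form, not something the slides specify, and the state and action labels are made up for the example.

```python
from collections import defaultdict

def smdp_q_update(Q, s, a, reward, tau, s_next, actions, alpha=0.1, gamma=0.9):
    """One SMDP Q-learning step.

    reward: discounted reward accumulated while the action was running
    tau:    number of time steps the action lasted (1 recovers plain Q-learning)
    """
    # The future value is discounted by gamma**tau instead of gamma, because
    # tau steps elapse before the next decision point.
    target = reward + (gamma ** tau) * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)
Q[("s1", "pass")] = 1.0
new_value = smdp_q_update(Q, "s0", "hold", reward=2.0, tau=3, s_next="s1",
                          actions=["hold", "pass"], alpha=0.5, gamma=0.5)
```

A temporally extended action such as "hold the ball until a taker comes close" is then treated as a single decision, however many simulator steps it takes.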

Page 44: Reinforcement learning in RoboCup (http://www.robocup.org/)

Keepaway = keepers vs. takers
  Max 4 vs. 3
  Large state space SMDP
Also in RoboCup
  Kicking the ball into the goal while avoiding an opponent
  Full team (11) learning collaborative passing and shooting (MC)
  Learning low-level skills (dribbling, passing, kicking)

Page 45: RL for humanoid robots

Applying RL to high-dimensional movement systems like humanoid robots remains an unsolved problem
Greedy algorithms are likely to fail
Natural Actor-Critic (Peters et al. 2005)
  Efficiently optimizes nonlinear motor primitives
  Based on a natural gradient formulation

Page 46: Summary

Reinforcement learning has been applied successfully in many areas of robotics
  High dimensions are still a problem
Exploration is one application of RL
  Maximize the knowledge gained by the robot
  Active localization – seeking pose, map known
  Mapping – pose known at all times
  SLAM – decomposition of entropy, map and pose unknown

Page 47: References

1. S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, Cambridge, MA, 2005.
2. Wikipedia, http://en.wikipedia.org/wiki/Reinforcement_learning (Sutton, Richard S., and Barto, Andrew G. (1998). Reinforcement Learning: An Introduction. MIT Press.)
3. Barto, A. G. and Mahadevan, S. (2003). Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems, vol. 13(4), pages 341-379.
4. Peter Stone, Richard S. Sutton, and Gregory Kuhlmann. Reinforcement Learning for RoboCup-Soccer Keepaway. Adaptive Behavior, 13(3):165–188, 2005.
5. Peters J, Vijayakumar S, Schaal S (2003). Reinforcement learning for humanoid robotics. In: Humanoids2003, Third IEEE-RAS International Conference on Humanoid Robots, Karlsruhe, Germany, Sept. 29-30.
6. Maja J Matarić, Reinforcement Learning in the Multi-Robot Domain, Autonomous Robots, 4(1), Mar 1997, 73-83.
7. William D. Smart and Leslie Pack Kaelbling, Effective Reinforcement Learning for Mobile Robots, International Conference on Robotics and Automation, May 11-15, 2002.
8. Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215–219.
9. Crites, R. H., & Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems (Vol. 8, pp. 1017–1023). Cambridge, MA: The MIT Press.
10. Bagnell, J. A., & Schneider, J. (2001). Autonomous helicopter control using reinforcement learning policy search methods. In International Conference on Robotics and Automation (pp. 1615–1620). IEEE.
11. Jan Peters, Sethu Vijayakumar, Stefan Schaal (2005), Natural Actor-Critic, in the Proceedings of the 16th European Conference on Machine Learning (ECML 2005).

Page 48: Exercise