a quick look at the “reinforcement learning”...

Post on 12-Jun-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

MVA-RL Course

A Quick Look at the“Reinforcement Learning” course

A. LAZARIC (SequeL Team @INRIA-Lille)ENS Cachan - Master 2 MVA

SequeL – INRIA Lille

Why

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 2/16

Why: Important Problems

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 3/16

Why: Important Problems

I Autonomous robotics

I Elder careI Exploration of

unknown/dangerousenvironments

I Robotics for entertainment

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 4/16

Why: Important Problems

I Autonomous roboticsI Elder care

I Exploration ofunknown/dangerousenvironments

I Robotics for entertainment

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 4/16

Why: Important Problems

I Autonomous roboticsI Elder careI Exploration of

unknown/dangerousenvironments

I Robotics for entertainment

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 4/16

Why: Important Problems

I Autonomous roboticsI Elder careI Exploration of

unknown/dangerousenvironments

I Robotics for entertainment

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 4/16

Why: Important Problems

I Autonomous roboticsI Financial applications

I Trading execution algorithmsI Portfolio managementI Option pricing

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 5/16

Why: Important Problems

I Autonomous roboticsI Financial applications

I Trading execution algorithms

I Portfolio managementI Option pricing

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 5/16

Why: Important Problems

I Autonomous roboticsI Financial applications

I Trading execution algorithmsI Portfolio management

I Option pricing

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 5/16

Why: Important Problems

I Autonomous roboticsI Financial applications

I Trading execution algorithmsI Portfolio managementI Option pricing

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 5/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy management

I Energy grid integrationI Maintenance schedulingI Energy market regulationI Energy production

management

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 6/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy management

I Energy grid integration

I Maintenance schedulingI Energy market regulationI Energy production

management

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 6/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy management

I Energy grid integrationI Maintenance scheduling

I Energy market regulationI Energy production

management

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 6/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy management

I Energy grid integrationI Maintenance schedulingI Energy market regulation

I Energy productionmanagement

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 6/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy management

I Energy grid integrationI Maintenance schedulingI Energy market regulationI Energy production

management

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 6/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy managementI Recommender systems

I Web advertisingI Product recommendationI Date matching

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 7/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy managementI Recommender systems

I Web advertising

I Product recommendationI Date matching

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 7/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy managementI Recommender systems

I Web advertisingI Product recommendation

I Date matching

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 7/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy managementI Recommender systems

I Web advertisingI Product recommendationI Date matching

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 7/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy managementI Recommender systemsI Social applications

I Bike sharing optimizationI Election campaignI ER service optimizationI Intelligent Tutoring Systems

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 8/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy managementI Recommender systemsI Social applications I Bike sharing optimization

I Election campaignI ER service optimizationI Intelligent Tutoring Systems

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 8/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy managementI Recommender systemsI Social applications I Bike sharing optimization

I Election campaign

I ER service optimizationI Intelligent Tutoring Systems

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 8/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy managementI Recommender systemsI Social applications I Bike sharing optimization

I Election campaignI ER service optimization

I Intelligent Tutoring Systems

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 8/16

Why: Important Problems

I Autonomous roboticsI Financial applicationsI Energy managementI Recommender systemsI Social applications I Bike sharing optimization

I Election campaignI ER service optimizationI Intelligent Tutoring Systems

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 8/16

What

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 9/16

What: Decision-Making under Uncertainty

Agent

Environment

state /actuationaction /

perception

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 10/16

How: Reinforcement Learning

Reinforcement learning is learning what to do – how tomap situations to actions – so as to maximize a

numerical reward signal in an unknown uncertainenvironment. The learner is not told which actions to

take, as in most forms of machine learning, but she mustdiscover which actions yield the most reward by tryingthem (trial–and–error). In the most interesting and

challenging cases, actions may affect not only theimmediate reward but also the next situation and,

through that, all subsequent rewards (delayed reward).

“An introduction to reinforcement learning”,Sutton and Barto (1998).

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 11/16

How: the Course

Agent

Environment

state /actuationaction /

perception

Formal and rigorous approach tothe RL’s way to decision-making under uncertainty

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 12/16

How: the Course

Agent

Environment

state /actuationaction /

perception

Formal and rigorous approach tothe RL’s way to decision-making under uncertainty

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 12/16

How: the Course

Agent

Environment

state /actuationaction /

perception

Formal and rigorous approach tothe RL’s way to decision-making under uncertainty

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 12/16

How: the Course

Agent

Environment

state /actuationaction /

perception

Formal and rigorous approach tothe RL’s way to decision-making under uncertainty

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 12/16

How: the Course

Agent

Environment

state /actuationaction /

perception

Formal and rigorous approach tothe RL’s way to decision-making under uncertainty

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 12/16

How: the Course

Agent

Environment

state /actuationaction /

perception

Formal and rigorous approach tothe RL’s way to decision-making under uncertainty

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 12/16

How: the Course

Agent

Environment

state /actuationaction /

perception

Formal and rigorous approach tothe RL’s way to decision-making under uncertainty

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 12/16

How: the Course

Agent

Environment

state /actuationaction /

perception

Formal and rigorous approach tothe RL’s way to decision-making under uncertainty

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 12/16

What: the Highlights of the CourseHow to model an RL problem

I What: Markov decision processI Tools: probability, processes, Markov chain

How to solve exactly an RL problem

How to solve incrementally an RL problem

How to efficiently explore in an RL problem

How to solve approximately an RL problem

With examples from resource optimization, trade execution,(computer) games, recommendation systems.

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 13/16

What: the Highlights of the CourseHow to model an RL problem

How to solve exactly an RL problemI What: Dynamic programmingI Tools: fixed point, operators

How to solve incrementally an RL problem

How to efficiently explore in an RL problem

How to solve approximately an RL problem

With examples from resource optimization, trade execution,(computer) games, recommendation systems.

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 13/16

What: the Highlights of the CourseHow to model an RL problem

How to solve exactly an RL problem

How to solve incrementally an RL problemI What: temporal difference, Q-learningI Tools: stochastic approximation

How to efficiently explore in an RL problem

How to solve approximately an RL problem

With examples from resource optimization, trade execution,(computer) games, recommendation systems.

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 13/16

What: the Highlights of the CourseHow to model an RL problem

How to solve exactly an RL problem

How to solve incrementally an RL problem

How to efficiently explore in an RL problemI What: multi-armed bandit problemI Tools: concentration inequalities

How to solve approximately an RL problem

With examples from resource optimization, trade execution,(computer) games, recommendation systems.

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 13/16

What: the Highlights of the CourseHow to model an RL problem

How to solve exactly an RL problem

How to solve incrementally an RL problem

How to efficiently explore in an RL problem

How to solve approximately an RL problemI What: approximate dynamic programmingI Tools: statistical learning theory

With examples from resource optimization, trade execution,(computer) games, recommendation systems.

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 13/16

What: the Highlights of the Course

How to model an RL problem

How to solve exactly an RL problem

How to solve incrementally an RL problem

How to efficiently explore in an RL problem

How to solve approximately an RL problem

With examples from resource optimization, trade execution,(computer) games, recommendation systems.

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 13/16

When/What/Where

I 7 lectures

I 4 practical sessions (and homework) [1 point each]

I 1 final project (report and oral presentation) [16 points]

Opportunities for spring internship and Ph.D. positions.

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 14/16

When/What/Whereresearchers.lille.inria.fr/˜lazaric/Webpage/Teaching.html

Date Topic Classroom29/09 Intro/MDP Conference06/10 Dynamic Programming Condorcet13/10 RL Algorithms Condorcet20/10 TP on DP and RL Condorcet27/10 Multi-arm Bandit (1) Condorcet03/11 TP on Bandit Amphi Curie10/11 Multi-arm Bandit (2) [projects] Amphi Curie17/11 TP on Bandit Condorcet24/11 Approximate DP Condorcet01/12 TP on ADP Condorcet08/12 Sample Complexity of ADP Condorcet15/12 Guest lecture (TBD)

mid-Jan Evaluation (TBD)

Lectures are from 11am to 1pm, TP from 11am to 1:15pm.

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 15/16

Reinforcement Learning

Alessandro Lazaricalessandro.lazaric@inria.fr

sequel.lille.inria.fr

top related