MVA-RL Course

A Quick Look at the “Reinforcement Learning” course

A. LAZARIC (SequeL Team @INRIA-Lille)
ENS Cachan - Master 2 MVA

SequeL – INRIA Lille

A. LAZARIC – Introduction to Reinforcement Learning Fall 2015 - 2/16

Why: Important Problems

- Autonomous robotics
  - Elder care
  - Exploration of unknown/dangerous environments
  - Robotics for entertainment
- Financial applications
  - Trading execution algorithms
  - Portfolio management
  - Option pricing
- Energy management
  - Energy grid integration
  - Maintenance scheduling
  - Energy market regulation
  - Energy production management
- Recommender systems
  - Web advertising
  - Product recommendation
  - Date matching
- Social applications
  - Bike sharing optimization
  - Election campaign
  - ER service optimization
  - Intelligent Tutoring Systems

What: Decision-Making under Uncertainty

[Figure: interaction loop with the Environment, labeled "action / actuation" and "state / perception".]

How: Reinforcement Learning

Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal in an unknown uncertain environment. The learner is not told which actions to take, as in most forms of machine learning, but she must discover which actions yield the most reward by trying them (trial-and-error). In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards (delayed reward).

“An introduction to reinforcement learning”, Sutton and Barto (1998).
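The trial-and-error and delayed-reward ideas in the quotation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the slides: a hypothetical four-state chain where only the last transition is rewarded, a purely random behaviour policy, and tabular Q-learning as the update rule.

```python
import random

N_STATES = 4        # hypothetical toy chain: states 0..3, state 3 is the terminal goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor (assumed value)
ALPHA = 0.5         # learning rate (assumed value)


def step(state, action):
    """Deterministic chain dynamics: reward 1 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=200, seed=0):
    """Learn action values purely from sampled transitions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # The learner is never told the right action: it tries actions at random.
            action = rng.choice(ACTIONS)
            next_state, reward = step(state, action)
            # The delayed reward propagates backwards through the bootstrapped target.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy action in each non-terminal state after learning.
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

In this toy setting the learned values favour moving right in every state, even though only the final step is ever rewarded.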

How: the Course

A formal and rigorous approach to the RL way of decision-making under uncertainty.

What: the Highlights of the Course

How to model an RL problem
- What: Markov decision process
- Tools: probability, processes, Markov chain

How to solve exactly an RL problem
- What: dynamic programming
- Tools: fixed point, operators

How to solve incrementally an RL problem
- What: temporal difference, Q-learning
- Tools: stochastic approximation

How to efficiently explore in an RL problem
- What: multi-armed bandit problem
- Tools: concentration inequalities

How to solve approximately an RL problem
- What: approximate dynamic programming
- Tools: statistical learning theory

With examples from resource optimization, trade execution, (computer) games, recommendation systems.
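To make the "solve exactly" highlight (dynamic programming, fixed point, operators) concrete, here is a minimal sketch of value iteration. The four-state chain MDP, the discount factor, and all function names are hypothetical choices for illustration, not material from the course.

```python
GAMMA = 0.9         # discount factor (assumed value)
N_STATES = 4        # hypothetical chain: states 0..3, state 3 is terminal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Known deterministic model: reward 1 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def bellman_operator(v):
    """Apply the Bellman optimality operator once to the value vector v."""
    new_v = [0.0] * N_STATES          # the terminal state keeps value 0
    for s in range(N_STATES - 1):
        new_v[s] = max(r + GAMMA * v[s2]
                       for s2, r in (step(s, a) for a in ACTIONS))
    return new_v


def value_iteration(tol=1e-10):
    """Iterate the operator until the value vector stops changing (fixed point)."""
    v = [0.0] * N_STATES
    while True:
        new_v = bellman_operator(v)
        if max(abs(a - b) for a, b in zip(v, new_v)) < tol:
            return new_v
        v = new_v


v_star = value_iteration()   # approximately [0.81, 0.9, 1.0, 0.0]
```

Unlike a learning method, this sketch assumes the model (`step`) is fully known, which is what makes an exact solution possible.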

When/What/Where

- 7 lectures
- 4 practical sessions (and homework) [1 point each]
- 1 final project (report and oral presentation) [16 points]

Opportunities for spring internship and Ph.D. positions.

researchers.lille.inria.fr/~lazaric/Webpage/Teaching.html

Date      Topic                              Classroom
29/09     Intro/MDP                          Conference
06/10     Dynamic Programming                Condorcet
13/10     RL Algorithms                      Condorcet
20/10     TP on DP and RL                    Condorcet
27/10     Multi-arm Bandit (1)               Condorcet
03/11     TP on Bandit                       Amphi Curie
10/11     Multi-arm Bandit (2) [projects]    Amphi Curie
17/11     TP on Bandit                       Condorcet
24/11     Approximate DP                     Condorcet
01/12     TP on ADP                          Condorcet
08/12     Sample Complexity of ADP           Condorcet
15/12     Guest lecture (TBD)
mid-Jan   Evaluation (TBD)

Lectures are from 11am to 1pm, TP from 11am to 1:15pm.

Reinforcement Learning

Alessandro Lazaric
alessandro.lazaric@inria.fr

sequel.lille.inria.fr
