Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein MTAT.03.292 Seminar in Computational Neuroscience Zurab Bzhalava

Post on 03-Jan-2016


Reinforcement learning and human behavior

Hanan Shteingart and Yonatan Loewenstein

MTAT.03.292 Seminar in Computational Neuroscience

Zurab Bzhalava

Introduction

• Operant Learning

• The dominant computational approach to modeling operant learning is model-free RL

• Human behavior is far more complex than model-free RL alone can capture

• Remaining Challenges

Reinforcement Learning

RL: A class of learning problems in which an agent interacts with an unfamiliar, dynamic and stochastic environment

Goal: Learn a policy to maximize some measure of long-term reward

Markov Decision Process

• A (finite) set of states S

• A (finite) set of actions A

• Transition model: T(s, a, s’) = P(s’ | a, s)

• Reward function: R(s)

• γ is a discount factor, γ ∈ [0, 1]

• Policy π

• Optimal policy π*
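
A minimal sketch of how the components on this slide fit together, assuming a hypothetical two-state, two-action MDP (the state names, action names, probabilities and rewards below are illustrative and do not come from the slides). It uses value iteration, a standard dynamic-programming method, to approximate the optimal values and then reads off a greedy policy as an estimate of π*.

```python
# Hypothetical toy MDP; every name and number here is an illustrative assumption.
STATES = ["poor", "rich"]
ACTIONS = ["save", "gamble"]

# T[s][a] maps each next state s' to P(s' | a, s).
T = {
    "poor": {"save": {"poor": 0.8, "rich": 0.2},
             "gamble": {"poor": 0.5, "rich": 0.5}},
    "rich": {"save": {"poor": 0.1, "rich": 0.9},
             "gamble": {"poor": 0.6, "rich": 0.4}},
}
R = {"poor": 0.0, "rich": 1.0}  # reward function R(s)
GAMMA = 0.9                     # discount factor


def expected_next_value(V, s, a):
    """Expected value of the successor state after taking action a in state s."""
    return sum(p * V[s2] for s2, p in T[s][a].items())


def value_iteration(tol=1e-8):
    """Repeat the optimality backup until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {
            s: R[s] + GAMMA * max(expected_next_value(V, s, a) for a in ACTIONS)
            for s in STATES
        }
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return new_V
        V = new_V


def greedy_policy(V):
    """Pick, in each state, the action with the highest expected next value."""
    return {s: max(ACTIONS, key=lambda a: expected_next_value(V, s, a))
            for s in STATES}


V_star = value_iteration()
pi_star = greedy_policy(V_star)
print(pi_star)
```

In this toy example the agent needs full knowledge of T and R, which is exactly what a learner in an unfamiliar environment lacks; RL algorithms have to estimate values or the model from experience instead.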

Markov Decision Process

Bellman equation:
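
The equation on this slide did not survive in the transcript. Under the definitions given on the previous slide (reward function R(s), transition model T(s, a, s’), discount factor γ), the textbook Bellman optimality equation would read as follows; this is the standard form and not necessarily the exact notation the slide used:

```latex
V^{*}(s) = R(s) + \gamma \max_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^{*}(s'),
\qquad
\pi^{*}(s) = \arg\max_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^{*}(s')
```

Here V*(s) denotes the value of state s under the optimal policy π*, a symbol assumed for this sketch.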

Biological Algorithms

• Behavioral control

• Evaluate the world quickly

• Choose appropriate behavior based on those valuations

Midbrain dopamine neurons

• Central role in guiding our behavior and thoughts

• Valuation of our world

– Value of money

– Other human beings

• Major role in decision-making

• Reward-dependent learning

• Malfunction in mental illness

• Related to Parkinson's disease

• Schizophrenia

Reinforcement signals define an agent's goals

1. organism is in state X and receives reward information;

2. organism queries stored value of state X;

3. organism updates stored value of state X based on current reward information;

4. organism selects action based on stored policy;

5. organism transitions to state Y and receives reward information.
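
The five steps above can be sketched as a temporal-difference, TD(0), value update. This is a minimal sketch under assumptions not stated on the slide: a hypothetical deterministic chain X → Y → END, a fixed policy folded into the transitions, and illustrative values for the learning rate and discount factor. The ordering is slightly adapted, because the TD update needs the stored value of the successor state.

```python
ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)

# Hypothetical chain: X -> Y -> END. Leaving Y yields a reward of 1.
NEXT_STATE = {"X": "Y", "Y": "END"}
REWARD = {"X": 0.0, "Y": 1.0}


def run_episode(V):
    """Walk the chain once, updating stored state values in place."""
    state = "X"
    while state != "END":
        reward = REWARD[state]                 # reward information
        next_state = NEXT_STATE[state]         # act under the fixed policy, transition
        # Query stored values and compare prediction with outcome.
        delta = reward + GAMMA * V[next_state] - V[state]
        V[state] += ALPHA * delta              # update stored value of the state
        state = next_state
    return V


V = {"X": 0.0, "Y": 0.0, "END": 0.0}
for _ in range(500):
    run_episode(V)

print({s: round(v, 3) for s, v in V.items()})
```

With enough episodes the stored values approach the discounted future reward from each state: about 1 for Y and about 0.9 for X in this toy chain.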

The reward-prediction error hypothesis

Difference between the experienced and predicted “reward” of an event

• Neurons of the ventral tegmental area

• Phasic activity changes encode a 'prediction error about summed future reward'
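
In temporal-difference models, a prediction error about summed future reward is commonly written as below. The symbols are standard textbook notation assumed here, not taken from the slide: r_t is the reward at time t, V the learned value estimate, γ the discount factor and α a learning rate.

```latex
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

A positive δ_t means the outcome was better than predicted, a negative δ_t means it was worse, and δ_t = 0 means the outcome was fully predicted.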

Prediction-error signal encoded in dopamine neuron firing.

Value binding

Human reward responses

Human reward responses

Model-based RL vs Model-free RL

• Goal-directed (model-based) vs habitual (model-free) behaviors

• Implemented by two anatomically distinct systems (subject of debate)

• Some findings suggest:

– Medial striatum is more engaged during planning

– Lateral striatum is more engaged during choices in extensively trained tasks
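
To make the distinction concrete, here is a minimal sketch contrasting the two ways of obtaining action values. The task, names and numbers are hypothetical, and the outcome-devaluation scenario is a common illustration rather than something shown on the slide. A model-free learner caches values and changes them only through experienced rewards; a model-based learner recomputes values from its model of transitions and outcomes, so it reacts immediately when an outcome loses its value.

```python
ALPHA = 0.5  # learning rate (assumed value)

# Hypothetical one-step task: each action leads deterministically to an outcome.
transition = {"left": "food", "right": "water"}
outcome_value = {"food": 1.0, "water": 0.2}


def model_free_update(Q, action, reward):
    """Move the cached value of an action toward the experienced reward."""
    Q[action] += ALPHA * (reward - Q[action])


def model_based_values(transition, outcome_value):
    """Compute action values on demand by looking ahead through the model."""
    return {action: outcome_value[outcome]
            for action, outcome in transition.items()}


# Training: the model-free learner experiences both actions repeatedly.
Q = {"left": 0.0, "right": 0.0}
for _ in range(20):
    for action, outcome in transition.items():
        model_free_update(Q, action, outcome_value[outcome])

# Devaluation: food loses its value (for example through satiety),
# and no further experience with the actions takes place.
outcome_value["food"] = 0.0

mf_choice = max(Q, key=Q.get)                   # still driven by cached values
mb_values = model_based_values(transition, outcome_value)
mb_choice = max(mb_values, key=mb_values.get)   # reflects the new outcome value

print("model-free choice: ", mf_choice)
print("model-based choice:", mb_choice)
```

The cached values keep favoring the devalued action until new experience corrects them, which is the habit-like signature, whereas the lookahead switches at once, which is the goal-directed signature.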

Model-based RL vs Model-free RL

(b) Model-free RL

(c) Model-based RL

Human subjects exhibited a mixture of both effects.

Challenges in relating human behavior to RL algorithms

• Humans tend to alternate rather than repeat an action after receiving a positively surprising payoff

• Tremendous heterogeneity in reports on human operant learning

• Open question: do humans probability-match (choose each option in proportion to its reward probability) or not?
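
As a worked example of why probability matching is notable, consider a hypothetical two-option task in which exactly one option is rewarded on each trial, option A with probability p and option B with probability 1 − p (the value p = 0.7 is illustrative). Always choosing the better option is rewarded with probability max(p, 1 − p), while choosing A on a fraction p of trials is rewarded with probability p² + (1 − p)².

```python
def maximizing_accuracy(p):
    """Expected reward rate when always choosing the more likely option."""
    return max(p, 1.0 - p)


def matching_accuracy(p):
    """Expected reward rate when choosing option A on a fraction p of trials."""
    return p * p + (1.0 - p) * (1.0 - p)


p = 0.7
print(f"maximizing: {maximizing_accuracy(p):.2f}")  # 0.70
print(f"matching:   {matching_accuracy(p):.2f}")    # 0.58
```

Matching is therefore suboptimal in such a stationary task, which is why its presence or absence in human data bears on which learning algorithm people use.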

Heterogeneity in world model

Learning the world model

Reference List:

• Reinforcement learning and human behavior. Hanan Shteingart and Yonatan Loewenstein

• The ubiquity of model-based reinforcement learning. Bradley B Doll, Dylan A Simon and Nathaniel D Daw

• Computational roles for dopamine in behavioral control. P. Read Montague, Steven E. Hyman and Jonathan D. Cohen