Learning, Volatility and the ACC
Tim Behrens
FMRIB + Psychology, University of Oxford; FIL, UCL
[Figure, panel B: reward history weight (β) against trials into the past (i-1 to i-8); CON. Kennerley et al., Nature Neuroscience, 2006]
[Figure, panel B: reward history weight (β) against trials into the past (i-1 to i-8); ACCs and CON. Kennerley et al., Nature Neuroscience, 2006]
Anatomy - Differences in connections between ACCs and ACCg.
• Connections unique to the sulcus are mainly with motor regions:
• Primary motor cortex
• Premotor cortex
• Parietal motor areas
• Spinal cord
• ACCs has information about our own actions.
Anatomy - Differences in connections between ACCs and ACCg.
• Connections unique to the gyrus are mainly with regions that process emotional and biological stimuli:
• Periaqueductal grey
• Hypothalamus
• STS/STG
• Insula/Temporal pole connections are stronger to the gyrus
• ACCg has access to information about other agents.
Anatomy - Shared connections between ACCs and ACCg.
• Some shared connections:
• Orbitofrontal cortex
• Amygdala
• Ventral striatum
• ACCg and ACCs are strongly interconnected.
• Both regions have access to and influence over reward and value processing.
[Figure, panel B: reward history weight (β) against trials into the past (i-1 to i-8); ACCs and CON. Kennerley et al., Nature Neuroscience, 2006]
[Figure: reward history weight (β) against trials into the past (i-1 to i-8); CON. Kennerley et al., Nat Neurosci 2006; Sugrue et al., Science 2005]
What determines the integration length?
Kennerley et al., Nat Neurosci 2006; Sugrue et al., Science 2005
VOLATILE: Reward probabilities change approximately every 25 trials.
STABLE: Reward probabilities change only after hundreds of trials.
[Figure: reward history weight (β) against trials into the past (i-1 to i-8); CON]
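The two environments can be sketched as reward schedules that differ only in how often the reward probability switches. This is a minimal illustration, not the task code: the function names, the probabilities 0.8 and 0.2, and the stable block length of 300 trials are assumptions made here for the example.

```python
def make_schedule(n_trials, block_len, p_a=0.8, p_b=0.2):
    """Return the reward probability on each trial.

    The probability alternates between p_a and p_b every block_len trials.
    p_a and p_b are illustrative values, not taken from the slides.
    """
    schedule = []
    for t in range(n_trials):
        block = t // block_len  # index of the block this trial falls in
        schedule.append(p_a if block % 2 == 0 else p_b)
    return schedule


def count_switches(schedule):
    """Count the trials on which the reward probability changes."""
    return sum(1 for prev, cur in zip(schedule, schedule[1:]) if prev != cur)


# Volatile: a switch roughly every 25 trials.
volatile = make_schedule(n_trials=100, block_len=25)
# Stable: no switch within these 100 trials (block length assumed to be 300).
stable = make_schedule(n_trials=100, block_len=300)
```

In the volatile schedule old outcomes quickly stop being informative, which is the intuition behind a shorter integration length there.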
Reinforcement learning
• We need to continually re-appraise the value of an action based on each new experience.
prediction ($V_t$) → outcome → prediction error ($\delta$) → $\alpha \times \delta$ → new prediction ($V_{t+1}$)
Updating beliefs on the basis of new information
$V_{t+1} = V_t + \alpha \times \delta$
• The learning rate ($\alpha$) is the weight given to the current information.
• The prediction error ($\delta$) is the information available from this event.
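The update rule on this slide can be written as a few lines of code. This is a minimal sketch, assuming outcomes coded as 1 (reward) or 0 (no reward) and a fixed learning rate. The function name, the starting value of 0.5, and the outcome sequence are made up for illustration.

```python
def update_value(v, outcome, alpha):
    """Apply one delta-rule step.

    Returns the new prediction and the prediction error that produced it.
    """
    delta = outcome - v          # prediction error: outcome minus prediction
    v_next = v + alpha * delta   # move the prediction by a fraction alpha of the error
    return v_next, delta


v = 0.5  # assumed starting prediction
for outcome in [1, 1, 0, 1]:  # made-up outcome sequence
    v, delta = update_value(v, outcome, alpha=0.2)
```

With alpha close to 1 the prediction jumps almost all the way to the latest outcome. With alpha close to 0 it barely moves.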
The learning rate and the value of information.
$V_{t+1} = V_t + \alpha \times \delta$
The learning rate should represent the value of the current information for guiding future beliefs.
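Unrolling the update shows how the learning rate relates to how far back outcomes are integrated. This is a sketch under stated assumptions: a fixed learning rate with $0 < \alpha \le 1$, a prediction error $\delta_t = r_t - V_t$, and a starting value $V_1$. The symbol $r_t$, the outcome on trial $t$, is introduced here and is not on the slides.

$$
\begin{aligned}
V_{t+1} &= V_t + \alpha (r_t - V_t) = (1-\alpha) V_t + \alpha r_t \\
        &= \alpha \sum_{k=0}^{t-1} (1-\alpha)^k \, r_{t-k} + (1-\alpha)^t V_1
\end{aligned}
$$

Under these assumptions the outcome $k$ trials back carries weight $\alpha (1-\alpha)^k$. A high learning rate concentrates the weight on the most recent trials, and a low one spreads it over a longer history. This holds only for a fixed learning rate and is meant as intuition, not as the model used in the cited studies.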
Changes in reward estimates occur throughout the task…
Behrens, Woolrich, Walton, Rushworth, Nature Neuroscience, 2007
…as do changes in volatility estimates
ACC effect size predicts learning rate across subjects
Behrens, Woolrich, Walton & Rushworth Nat Neurosci 2007
Sources of information
• Probability that confederate advice is good
• Probability that correct colour is blue
• Value of action information
• Value of social information
Behrens, Hunt, Woolrich, Rushworth Nature 2008
Reward Prediction Error
$\delta = \text{Reward} - \text{Expectation}$
$V_{t+1} = V_t + \alpha \times \delta$
[Figure: effect size against time, with the outcome marked]
Behrens, Hunt, Woolrich, Rushworth Nature 2008
Prediction error on a social partner.
$\delta = \text{Lie event} - \text{Lie prediction}$
$V_{t+1} = V_t + \alpha \times \delta$
[Figure: effect size against time, with the outcome marked]
Behrens, Hunt, Woolrich, Rushworth Nature 2008
The value of information and the ACC
[Figure panels: Value of reward information; Value of social information]
$V_{t+1} = V_t + \alpha \times \delta$
Conclusions
• ACC codes a learning signal when information is observed.
• This signal predicts the speed of learning.
• Learning from our own actions and learning from others’ actions are processed in parallel in ACCs and ACCg.
• The outputs of these parallel learning processes are combined in the reward system.
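The last two conclusions can be illustrated with two delta-rule learners running side by side, one tracking the probability that blue is correct and one tracking the probability that the confederate's advice is good. This is a hedged sketch only. The function names, the learning rates, the trial data, and in particular the combination rule are assumptions made for this example: `combine` treats the two sources as independent and multiplies their evidence, and the slides do not specify how the outputs are combined.

```python
def learn(estimate, event, alpha):
    """Delta rule: move the estimate toward the observed event (0 or 1)."""
    return estimate + alpha * (event - estimate)


def combine(p_blue_reward, p_advice_good, advice_is_blue):
    """Merge both sources into one probability that blue is correct.

    Assumes the two sources are independent (an assumption of this sketch).
    """
    # What the advice implies about blue, given how much the adviser is trusted.
    p_blue_social = p_advice_good if advice_is_blue else 1.0 - p_advice_good
    num = p_blue_reward * p_blue_social
    den = num + (1.0 - p_blue_reward) * (1.0 - p_blue_social)
    return num / den


p_blue = 0.5          # estimate from reward history (own actions)
p_advice_good = 0.5   # estimate from social history (confederate)
alpha_reward, alpha_social = 0.3, 0.3  # illustrative learning rates

# Each trial: (advice_is_blue, blue_was_correct). Made-up data.
trials = [(True, True), (False, False), (True, True), (True, False)]
for advice_is_blue, blue_correct in trials:
    p_choice = combine(p_blue, p_advice_good, advice_is_blue)  # belief before feedback
    advice_good = advice_is_blue == blue_correct
    # The two learners update in parallel, each from its own prediction error.
    p_blue = learn(p_blue, int(blue_correct), alpha_reward)
    p_advice_good = learn(p_advice_good, int(advice_good), alpha_social)
```

Keeping separate learning rates for the two learners lets each source be weighted by how informative it currently is.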