Classical Conditioning III: Temporal difference learning
PSY/NEU338: Animal learning and decision making: Psychological, computational and neural perspectives
understanding a new concept
Ivan Pavlov
recap: animals learn predictions
CS = Conditional Stimulus
US = Unconditional Stimulus
UR = Unconditional Response (reflex); CR = Conditional Response (reflex)
what makes conditioning Pavlovian?
procedurally: Pavlovian/classical conditioning is a learning situation in which the reinforcer does not depend on the animal’s response
from the animal’s point of view: the conditioned response is unavoidable, like a reflex, not utilitarian or flexible; it is a direct result of a prediction
But... 1) Rescorla’s control condition
temporal contiguity is not enough - need contingency
P(food | light) > P(food | no light)
Credits: Randy Gallistel
But... 2) Kamin’s blocking
[Design: Phase I: noise → food; Phase II: noise + light → food]
contingency is also not enough... need surprise
P(food | noise+light) ≠ P(food | noise alone)
The idea: error-driven learning
Change in value is proportional to the difference between actual and predicted outcome
Rescorla & Wagner (1972)
Two assumptions/hypotheses: (1) learning is driven by error (formalizes the notion of surprise); (2) summation of predictors is linear
$$\Delta V(CS_i) = \eta \left[ R_{US} - \sum_{j \in \mathrm{trial}} V(CS_j) \right]$$
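A minimal sketch of this update rule in Python (function name, cue labels, and the learning rate are illustrative, not taken from the slides), with a hypothetical Kamin-blocking run as a usage example:

```python
def rescorla_wagner_update(V, present_cues, r_us, eta=0.1):
    """One Rescorla-Wagner trial.
    V: dict mapping each CS to its current associative value.
    present_cues: the CSs presented on this trial.
    r_us: magnitude of the US on this trial (0 if the US is omitted).
    eta: learning rate."""
    # assumption (2): predictions of all cues present on the trial sum linearly
    prediction = sum(V.get(cs, 0.0) for cs in present_cues)
    # assumption (1): learning is driven by the error between actual and predicted outcome
    delta = r_us - prediction
    for cs in present_cues:
        V[cs] = V.get(cs, 0.0) + eta * delta
    return V

# hypothetical blocking simulation: noise alone in Phase I, noise + light in Phase II
V = {}
for _ in range(100):
    rescorla_wagner_update(V, ["noise"], r_us=1.0)            # Phase I: noise -> food
for _ in range(100):
    rescorla_wagner_update(V, ["noise", "light"], r_us=1.0)   # Phase II: compound -> food
print(V)  # V["noise"] is near 1.0; V["light"] stays near 0, i.e. the light is blocked
```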
what does the theory explain?
R-W:
acquisition: √
extinction: √
blocking: √
overshadowing: √
temporal relationships: ✗
overexpectation: √
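Two of these entries can be illustrated with the sketch above (numbers purely illustrative): omitting the US produces extinction, and pairing two separately trained cues with a single US produces overexpectation, with both cues losing value:

```python
V = {}
for _ in range(100):
    rescorla_wagner_update(V, ["light"], r_us=1.0)            # acquisition: light -> food
    rescorla_wagner_update(V, ["tone"], r_us=1.0)             # acquisition: tone -> food
for _ in range(100):
    rescorla_wagner_update(V, ["light", "tone"], r_us=1.0)    # overexpectation: compound, single US
print(V)            # both values fall toward 0.5

for _ in range(100):
    rescorla_wagner_update(V, ["light"], r_us=0.0)            # extinction: light, no food
print(V["light"])   # falls toward 0
```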
but: second-order conditioning
[Figure: conditioned responding as a function of the number of phase 2 pairings]
animals learn that a predictor of a predictor is also a predictor!
⇒ they are not interested solely in predicting immediate reinforcement...
phase 1: CS1 → US
phase 2: CS2 → CS1
test: CS2 → ?
what would Rescorla-Wagner learning predict here?
what do you think will happen?
Challenge:
• Can you modify the R-W learning rule to account for second-order conditioning?
• Group work:
- what is the fundamental problem here?
- ideas how to solve it?
hint: dopamine
[Brain diagram labels: Dorsal Striatum (Caudate, Putamen), Ventral Tegmental Area, Substantia Nigra, Nucleus Accumbens (ventral striatum), Prefrontal Cortex]
Parkinson’s Disease → Motor control / initiation?
Drug addiction, gambling, natural rewards → Reward pathway? → Learning?
Also involved in:
• Working memory
• Novel situations
• ADHD
• Schizophrenia
• …
the anhedonia hypothesis (Wise, ’80s)
• Anhedonia = inability to experience positive emotional states derived from obtaining a desired or biologically significant stimulus
• Neuroleptics (dopamine antagonists) cause anhedonia
• Dopamine areas are a target of intra-cranial self-stimulation (ICSS)
• Dopamine is important for reward-mediated conditioning
[Figure: dopamine responses to a predictable reward and to an omitted reward (Schultz et al., ’90s)]
but...
• behavioral puzzle: second-order conditioning
• neural puzzle: dopamine responds to reward-predicting stimuli instead of to rewards
• think about it…
understanding the brain: where do we start?!
David Marr (1945-1980) proposed three levels of analysis:
1. the problem (Computational Level)
2. the strategy (Algorithmic Level)
3. how it’s actually done by networks of neurons (Implementational Level)
The problem: optimal prediction of future reinforcement
let’s start over, this time from the top...
want to predict expected sum of future reinforcement
$$V_t = E\left[\sum_{i=t+1}^{\infty} r_i\right]$$
want to predict expected sum of discounted future reinforcement (0 < γ < 1)
$$V_t = E\left[\sum_{i=t+1}^{\infty} \gamma^{\,i-t-1} r_i\right]$$
want to predict expected sum of future reinforcement in a trial/episode
$$V_t = E\left[\sum_{i=t+1}^{T} r_i\right]$$
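A small numerical sketch of these targets (the reward sequence, discount factor, and variable names are made up for illustration; a single sample sequence stands in for the expectation):

```python
rewards = [0.0, 0.0, 1.0, 0.5]   # hypothetical r_1 ... r_T within one trial, T = 4
gamma = 0.9                      # discount factor, 0 < gamma < 1
t = 0                            # predict from the start of the trial

# expected sum of future reinforcement in the trial/episode (undiscounted)
V_episode = sum(rewards[t:])
# expected sum of discounted future reinforcement, weight gamma^(i - t - 1) on r_i
V_discounted = sum(gamma ** (i - t - 1) * r
                   for i, r in enumerate(rewards[t:], start=t + 1))
print(V_episode, V_discounted)   # 1.5 and roughly 1.17
```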
(note: t indexes time within a trial)
$$
\begin{aligned}
V_t &= E[r_{t+1} + r_{t+2} + \dots + r_T] \\
    &= E[r_{t+1}] + E[r_{t+2} + \dots + r_T] \\
    &= E[r_{t+1}] + V_{t+1}
\end{aligned}
$$
Think football…
What would be a sensible learning rule here?
How is this different from Rescorla-Wagner?
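A minimal sketch of one candidate rule, the temporal-difference update that gives this lecture its title, read directly off the recursion above (tabular per-timestep values; names and learning rate are illustrative):

```python
def td_update(V, t, r_next, eta=0.1):
    """V: list of value estimates, one per time step within the trial (with V[T] = 0).
    t: current time step; r_next: reward observed at time t + 1; eta: learning rate."""
    # sample-based target for V[t], from the recursion V_t = E[r_{t+1}] + V_{t+1}
    target = r_next + V[t + 1]
    delta = target - V[t]     # temporal-difference prediction error
    V[t] += eta * delta
    return delta
```

Unlike Rescorla-Wagner, the error compares the prediction at time t with the next reward plus the prediction at the next time step, so a stimulus that merely predicts a predictor can still acquire value.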