Learning in large-scale spiking neural networks
Trevor Bekolay
Center for Theoretical Neuroscience, University of Waterloo
Master's thesis presentation, August 17, 2011
Outline
Unsupervised learning
Supervised learning
Reinforcement learning
Unsupervised learning
The problem
How does the brain learn without feedback?
Neurons connect at synapses
[Figure: synapse schematic showing the axon terminal (presynaptic), synaptic cleft, and dendritic spine (postsynaptic), with neurotransmitter, neurotransmitter receptors, and a voltage-gated calcium channel labelled.]
Synaptic strengths change
Definitions
This is called synaptic plasticity.
When a synapse gets stronger, it is potentiated.
When a synapse gets weaker, it is depressed.
Modelling LTP/LTD
[Figure: two experimental plots. Left: % change in population EPSP amplitude over 0–6 hours, with two ~100 Hz stimulations of 10 s each (potentiation). Right: % change in EPSP slope over −10 to 30 minutes, with 3 Hz stimulation for 5 min (depression).]
BCM rule

Δω_{ij} = κ a_i a_j (a_j − θ)

where θ is the modification threshold: postsynaptic activity a_j below θ yields depression, and activity above θ yields potentiation.
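As a concrete reading of the rule, here is a minimal sketch in Python; the learning rate kappa and the example activities are illustrative values, not taken from the thesis.

```python
# Minimal BCM sketch; `kappa` and the example activities are illustrative.
def bcm_update(a_pre, a_post, theta, kappa=1e-4):
    """Return the BCM weight change kappa * a_pre * a_post * (a_post - theta)."""
    return kappa * a_pre * a_post * (a_post - theta)

print(bcm_update(a_pre=50.0, a_post=80.0, theta=60.0))  # positive: potentiation
print(bcm_update(a_pre=50.0, a_post=40.0, theta=60.0))  # negative: depression
```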
Modelling LTP/LTD with timing
[Figure: STDP schematic and curve. A presynaptic neuron i spikes at t_i^{pre} and a postsynaptic neuron j at t_j^{post}; the change in the weight ω_{ij} depends on the spike-timing difference over roughly ±40 ms, with pre-post pairs causing potentiation and post-pre pairs causing depression.]
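To make the timing dependence concrete, here is a minimal pair-based STDP sketch; the exponential window is the standard form, but the amplitudes (A_plus, A_minus) and time constant (tau) are illustrative values, not taken from the slides.

```python
import math

# Pair-based STDP sketch: pre-before-post potentiates, post-before-pre
# depresses, with exponentially decaying windows. A_plus, A_minus, and
# tau are illustrative values.
def stdp_dw(t_post, t_pre, A_plus=1.0, A_minus=0.5, tau=0.02):
    dt = t_post - t_pre  # seconds
    if dt > 0:
        return A_plus * math.exp(-dt / tau)   # potentiation
    return -A_minus * math.exp(dt / tau)      # depression

print(stdp_dw(t_post=0.010, t_pre=0.0))   # pre-post pair: positive
print(stdp_dw(t_post=0.0, t_pre=0.010))   # post-pre pair: negative
```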
What’s the difference?
Theorem: If spike trains have Poisson statistics, STDP can be mapped onto a BCM rule (Izhikevich, 2003; Pfister & Gerstner, 2006).
Test with a simulation
Δω_{ij} = κ a_i φ(a_j, θ)

[Figure: simulation protocol delivering a presynaptic pulse and a postsynaptic pulse, varying the pulse delay and the pulse rate.]
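A sketch of this pairing protocol, assuming φ(a_j, θ) = a_j(a_j − θ) as on the BCM slide and assuming spikes are low-pass filtered into activity traces; all constants here are illustrative, not the thesis values.

```python
# Pulse-pairing protocol sketch: deliver pre pulses at a fixed rate, post
# pulses delayed by `delay`, filter them into activities, and accumulate
# the rate-based weight change. Constants are illustrative.
def pairing_protocol(delay, rate=20.0, theta=0.33, kappa=1e-3,
                     t_total=1.0, dt=1e-4, tau_a=0.02):
    steps = int(t_total / dt)
    steps_per_period = int(round(1.0 / rate / dt))
    delay_steps = int(round(delay / dt))
    a_pre = a_post = 0.0
    dw = 0.0
    for step in range(steps):
        pre = step % steps_per_period == 0
        post = step >= delay_steps and (step - delay_steps) % steps_per_period == 0
        # Exponentially filtered activity traces.
        a_pre += -a_pre / tau_a * dt + (1.0 if pre else 0.0)
        a_post += -a_post / tau_a * dt + (1.0 if post else 0.0)
        dw += kappa * a_pre * a_post * (a_post - theta) * dt
    return dw

print(pairing_protocol(delay=0.01, rate=20.0))  # post follows pre by 10 ms
```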
Recreated STDP curve
[Figure: % change in ω_{ij} vs. t_post − t_pre (−0.03 to 0.01 s) at a 20 Hz pulse rate, for θ = 0.30, 0.33, and 0.36.]
Exhibits frequency dependence
[Figure: the same curve at a 40 Hz pulse rate; the magnitude and shape of the weight change depend on the pulse rate.]
Frequency dependence details
[Figure: % change in ω_{ij} vs. stimulation frequency (1–100 Hz). Left: simulation with post-pre pairs (θ = 0.30 and θ = 0.33). Middle: experiment (Kirkwood), light-reared vs. dark-reared. Right: simulation with pre-post pairs (θ = 0.30 and θ = 0.33).]
Supervised learning
The problem
Given input x and output y, find G such that G(x) = y
Learning nonlinear functions
Function                             Input dims   Output dims   Neurons (learning)   Neurons (control)
f(x) = x1x2                          2            1             420                  420
f(x) = x1x2 + x3x4                   4            1             756                  756
f(x) = [x1x2, x1x3, x2x3]            3            3             756                  756
f(x) = [x1, x2] ⊗ [x3, x4]           4            2             700                  704 (2-layer), 1500 (3-layer)
f(x) = [x1, x2, x3] ⊗ [x4, x5, x6]   6            3             800                  804 (2-layer), 1700 (3-layer)
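The ⊗ in this table is read here as circular convolution, the binding operation common in these vector-based models; that interpretation is an assumption, and the FFT-based sketch below reflects it.

```python
import numpy as np

# Circular convolution via FFT; the reading of the table's ⊗ as circular
# convolution is an assumption.
def circ_conv(x, y):
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

print(circ_conv(np.array([0.5, -0.3]), np.array([0.2, 0.8])))  # 2-D output
```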
Simulation results
[Figure: relative error (learning vs. analytic) over learning time for each function. Panels: f(x) = x1x2 (0–60 s); f(x) = x1x2 + x3x4 and f(x) = [x1x2, x1x3, x2x3] (0–100 s); f(x) = [x1, x2] ⊗ [x3, x4] (0–140 s) and f(x) = [x1, x2, x3] ⊗ [x4, x5, x6] (0–400 s), the last two each with 2-layer and 3-layer control baselines.]
Simulation results
[Figure: time to learn (seconds) vs. input dimensions (1–7), output dimensions (1–3), and input+output dimensions (3–9); times range roughly from 20 to 100 s.]
≈ 2.25 hours to learn a 500-D → 500-D function
Large-scale models
Neural Engineering Framework
1. Representation (encoding and decoding)
2. Transformation
Representation
[Figure: a time-varying scalar signal encoded into spike trains and decoded back over 0.5 seconds; the encoding and decoding panels show the represented value between −1 and 1.]
Encoding vectors represent neuron sensitivity
[Figure: LIF tuning curves (firing rate vs. represented value from −1 to 1), and the corresponding encoding vectors projected in 2-D space.]
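A minimal sketch of NEF representation as described above, simplifying the LIF response to a rectified-linear nonlinearity and choosing random gains, biases, and 1-D encoders; decoders are solved by least squares over sampled inputs.

```python
import numpy as np

# NEF-style representation sketch: rectified-linear "neurons" stand in for
# LIF, with random gains, biases, and 1-D encoders.
rng = np.random.default_rng(0)
n_neurons = 100
encoders = rng.choice([-1.0, 1.0], size=n_neurons)  # 1-D encoding vectors
gains = rng.uniform(0.5, 2.0, size=n_neurons)
biases = rng.uniform(-1.0, 1.0, size=n_neurons)

def activities(x):
    # Encoding: a_i = G[alpha_i * (e_i . x) + b_i], with G rectified-linear here.
    return np.maximum(0.0, gains * encoders * x + biases)

# Decoding: solve for decoders d so that x_hat = sum_i a_i(x) * d_i.
xs = np.linspace(-1, 1, 200)
A = np.array([activities(x) for x in xs])           # (200, n_neurons)
decoders, *_ = np.linalg.lstsq(A, xs, rcond=None)

print(activities(0.3) @ decoders)  # approximately 0.3
```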
Plasticity in the NEF
Δω_{ij} = κ a_i α_j e_j · E

where e_j is the encoding vector of neuron j, α_j is its gain, and E is the error to be minimized.
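As a matrix-form sketch of this rule (the array shapes and the value of κ are illustrative):

```python
import numpy as np

# dW[i, j] = kappa * a_pre[i] * alpha_post[j] * (enc_post[j] . error)
# Shapes: a_pre (n_pre,), alpha_post (n_post,), enc_post (n_post, d), error (d,)
def error_driven_update(a_pre, alpha_post, enc_post, error, kappa=1e-4):
    e_dot_E = enc_post @ error                        # (n_post,)
    return kappa * np.outer(a_pre, alpha_post * e_dot_E)
```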
Calculating E online
[Figure: network diagram. Input x drives a population whose plastic connection computes the unknown G(x) = ?; the output y feeds an error population via f(y) = −y. Connections are marked excitatory/inhibitory, modulatory, or plastic.]
Our solution
[Figure: the same network, with the error population's output delivered as a modulatory signal to the plastic connection.]
Network provides E + EM rule: Δω_{ij} = κ a_i α_j e_j · E
Reinforcement learning
[Figure: reinforcement learning task diagram (labels: D, G, A, Go, L, R, Rw, Rt).]
Behavioural results
[Figure: probability of moving left/right over trials 40–160 under shifting reward contingencies (L:21% R:63%, L:63% R:21%, L:12% R:72%, L:72% R:12%), with reference lines for "always move left" and "always move right". Panels show the average over 200 runs, one experimental run, and one simulation run.]
Action selection (basal ganglia)
π(s) = argmax_a Q(s_{t+1}, a_{t+1})
Changing Q-values
Ventral striatum calculates reward prediction error
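A tabular sketch tying this to the action-selection slide: greedy selection over Q-values, and an update driven by the reward prediction error δ. The tabular form, learning rate, and discount factor are illustrative simplifications of the neural model.

```python
import numpy as np

# Greedy action selection and a reward-prediction-error update over a
# Q-table; a simplification of the neural model, with illustrative lr/gamma.
def select_action(Q, s):
    return int(np.argmax(Q[s]))          # pi(s) = argmax_a Q(s, a)

def update_Q(Q, s, a, r, s_next, lr=0.1, gamma=0.9):
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]   # reward prediction error
    Q[s, a] += lr * delta
    return delta

Q = np.zeros((2, 2))                     # two states, two actions (L, R)
print(update_Q(Q, s=0, a=1, r=1.0, s_next=1))         # delta = 1.0
```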
Neural results
[Figure: spike rasters (neuron vs. trial, across the delay, approach, reward, and delay task phases) for experimental and simulated data.]
Contributions
- Showed evidence that BCM captures qualitative features of STDP
- Proposed a biologically plausible solution to the supervised learning problem
- Created a model that solves a stochastic reinforcement learning task and matches biology
A unifying learning rule
Δω_{ij} = α_j a_i [κ_u a_j (a_j − θ) + κ_s e_j · E]
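A sketch of the unified update, combining the unsupervised BCM-like term and the supervised error-driven term in one rule; the array shapes and the κ_u, κ_s values are illustrative.

```python
import numpy as np

# dW[i, j] = alpha_post[j] * a_pre[i] *
#            (kappa_u * a_post[j] * (a_post[j] - theta) + kappa_s * (enc_post[j] . error))
def unified_update(a_pre, a_post, alpha_post, enc_post, error, theta,
                   kappa_u=1e-5, kappa_s=1e-4):
    unsupervised = kappa_u * a_post * (a_post - theta)   # (n_post,)
    supervised = kappa_s * (enc_post @ error)            # (n_post,)
    return np.outer(a_pre, alpha_post * (unsupervised + supervised))
```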
Acknowledgements
Thank you for listening and (possibly) reading!