Chapter 4, Part 3: Differential Hebbian Learning & Differential Competitive Learning
TRANSCRIPT
Tutor: Prof. Gao    Reporter: Wang Ying
2006.10.30
Review

Signal Hebbian learning law:

    \dot{m}_{ij} = -m_{ij} + S_i S_j

Competitive learning law:

    \dot{m}_{ij} = S_j(y_j) [S_i(x_i) - m_{ij}]
Part I: Differential Hebbian Learning

Learning law:

    \dot{m}_{ij} = -m_{ij} + S_i S_j + \dot{S}_i \dot{S}_j

Its simpler version:

    \dot{m}_{ij} = -m_{ij} + \dot{S}_i \dot{S}_j

Hebbian correlations promote spurious causal associations among concurrently active units. Differential correlations estimate the concurrent, and presumably causal, variation among active units.
Differential Hebbian Learning: topics
- Fuzzy Cognitive Maps (FCMs)
- Adaptive Causal Inference
- Klopf's Drive-Reinforcement Model
- Concomitant Variation as Statistical Covariance
- Pulse-Coded Differential Hebbian Learning
Fuzzy Cognitive Maps (FCMs)

FCMs are fuzzy signed directed graphs with feedback. An FCM models the world as a collection of concept classes and causal relations between classes.

The directed edge e_ij from causal concept C_i to concept C_j measures how much C_i causes C_j.

Example: C_i: sales of computers; C_j: profits.
Fuzzy Cognitive Map of South African Politics

[Figure: FCM digraph over nine concept nodes C1-C9. Recovered node labels (translated): foreign investment, mining, black employment, white racial radicalism, job-reservation laws, black tribal unity, strength of apartheid governance, National Party supporters.]
Causal Connection Matrix

The edge values form the connection matrix E (rows C1-C9, columns C1-C9):

         C1 C2 C3 C4 C5 C6 C7 C8 C9
    C1 [  0  1  1  0  0  0  0  1  1 ]
    C2 [  0  0  1  0  0  0  0  1  0 ]
    C3 [  0  0  0  1  0  1  0  1  1 ]
    C4 [  0  0  0  0  0  1  1  0  1 ]
    C5 [  0  1  1  0  0  1  1  0  0 ]
    C6 [  0  0  0  1  0  0  1  1  0 ]
    C7 [  0  0  0  0  1  0  0  1  0 ]
    C8 [  0  0  0  0  0  0  1  0  0 ]
    C9 [  0  0  0  0  1  0  0  1  0 ]
TAM recall process

We start with the foreign-investment policy:

    C_1 = (1 0 0 0 0 0 0 0 0)

Then

    C_1 E = (0 1 1 0 0 0 0 1 1) -> (1 1 1 0 0 0 0 1 1) = C_2

The arrow indicates the threshold operation with, say, 1/2 as the threshold value. So zero causal input produces zero causal output. The first component of C_2 equals 1 because we are testing the foreign-investment policy option, so we keep that concept switched on. Next

    C_2 E = (0 1 2 1 1 1 1 4 1) -> (1 1 1 1 0 0 0 1 1) = C_3

Next

    C_3 E = (0 1 2 1 1 0 0 4 1) -> (1 1 1 1 0 0 0 1 1) = C_3

So C_3 is a fixed point of the FCM dynamical system.
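The recall procedure described above is just repeated vector-matrix multiplication, thresholding at 1/2, and clamping the tested concept on. A minimal Python sketch on a small hypothetical three-concept FCM (the matrix here is invented for illustration; it is not the South African example):

```python
# Hypothetical 3-concept FCM with one negative causal edge.
E = [
    [0, 1, -1],
    [0, 0,  1],
    [0, 1,  0],
]

def recall(E, start, clamp=0, max_steps=20):
    """Iterate C -> threshold(C E) until the state repeats (fixed point or cycle)."""
    C = list(start)
    seen = [tuple(C)]
    for _ in range(max_steps):
        raw = [sum(C[i] * E[i][j] for i in range(len(E))) for j in range(len(E))]
        C = [1 if v > 0.5 else 0 for v in raw]   # threshold at 1/2
        C[clamp] = 1                             # keep the tested concept on
        if tuple(C) in seen:
            return C
        seen.append(tuple(C))
    return C

print(recall(E, [1, 0, 0]))  # -> [1, 1, 0], a fixed point of this toy FCM
```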
Strengths and weaknesses of FCMs

Advantages. Experts can:
1. represent factual and evaluative concepts in an interactive framework;
2. quickly draw FCM pictures or respond to questionnaires;
3. consent or dissent to the local causal structure and perhaps the global equilibrations.

The FCM knowledge-representation and inferencing structure reduces to simple vector-matrix operations, favors integrated-circuit implementation, and allows extension to neural, statistical, or dynamical-systems techniques.

Disadvantages. An FCM equally encodes the expert's knowledge or ignorance, wisdom or prejudice. Since different experts differ in how they assign causal strengths to edges, and in which concepts they deem causally relevant, an FCM may merely encode its designer's biases, and may not even encode them accurately.
Combination of FCMs

We combine arbitrary FCM connection matrices E_1, ..., E_k by adding augmented FCM matrices F_1, ..., F_k. We add the F_i pointwise to yield the combined FCM matrix F:

    F = \sum_i F_i

Some experts may be more credible than others. We can weight each expert with a nonnegative credibility weight w_i by multiplicatively weighting the expert's augmented FCM matrix:

    F = \sum_i w_i F_i

Adding FCM matrices represents a simple form of causal learning.
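The credibility-weighted combination can be sketched in a few lines. The two small matrices and the weights below are invented for illustration:

```python
# Pointwise combination of experts' augmented FCM matrices,
# each scaled by a nonnegative credibility weight w_i.

def combine(matrices, weights):
    n = len(matrices[0])
    return [[sum(w * F[r][c] for F, w in zip(matrices, weights))
             for c in range(n)] for r in range(n)]

F1 = [[0, 1, 0], [0, 0, 1], [-1, 0, 0]]   # expert 1 (invented)
F2 = [[0, 1, 1], [0, 0, -1], [0, 0, 0]]   # expert 2 (invented)

F = combine([F1, F2], [1.0, 0.5])         # expert 1 fully credible, expert 2 half
print(F)  # [[0.0, 1.5, 0.5], [0.0, 0.0, 0.5], [-1.0, 0.0, 0.0]]
```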
Adaptive Causal Inference

We infer causality between variables when we observe concomitant variation or lagged variation between them. If B changes when A changes, we suspect a causal relationship; the more correlated the changes, the stronger the suspected causal link.

Time derivatives measure changes. Products of derivatives correlate changes. This leads to the simplest differential Hebbian learning law:

    \dot{e}_{ij} = -e_{ij} + \dot{C}_i \dot{C}_j
The passive decay term -e_{ij} forces zero causality between unchanging concepts.

The concomitant-variation term \dot{C}_i \dot{C}_j indicates causal increase or decrease according to joint concept movement. If C_i and C_j both increase or both decrease, the product of derivatives is positive; if one increases while the other decreases, the product is negative.

The concomitant-variation term provides a simple causal "arrow of time".
Klopf's Drive-Reinforcement Model

Harry Klopf independently proposed the following discrete variant of differential Hebbian learning:

    \Delta m_{ij}(t) = \Delta S_j(y_j(t)) \sum_{k=1}^{T} c_k |m_{ij}(t-k)| \Delta S_i(x_i(t-k))

where the synaptic difference \Delta m_{ij}(t) updates the current synaptic efficacy m_{ij}(t) in the first-order difference equation

    m_{ij}(t+1) = m_{ij}(t) + \Delta m_{ij}(t)
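One step of the discrete drive-reinforcement update above might be sketched as follows. The learning-rate sequence c_k, the history buffers, and the sample values are all invented for illustration:

```python
# One Klopf drive-reinforcement step: the postsynaptic difference
# dS_j(t) gates a sum of recent presynaptic differences dS_i(t-k),
# each amplified by the synapse's past magnitude |m_ij(t-k)|.

def klopf_step(m_now, m_past, dSi_past, dSj_now, c):
    """m_past[k-1] = m_ij(t-k), dSi_past[k-1] = dS_i(t-k), for k = 1..T."""
    delta = dSj_now * sum(ck * abs(mk) * dk
                          for ck, mk, dk in zip(c, m_past, dSi_past))
    return m_now + delta   # m_ij(t+1) = m_ij(t) + delta_m_ij(t)

# T = 2 lags, invented values:
m_next = klopf_step(m_now=1.0, m_past=[1.0, 2.0],
                    dSi_past=[1, 0], dSj_now=1, c=[0.5, 0.25])
print(m_next)  # 1.5
```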
The term drive reinforcement arises from variables and their velocities. Klopf defines a neuronal drive as the weighted signal m_{ij} S_i and a neuronal reinforcer as the weighted difference m_{ij} \dot{S}_i.

A differentiable version of the drive-reinforcement model takes the form:

    \dot{m}_{ij} = -m_{ij} + |m_{ij}| \dot{S}_i \dot{S}_j

The synaptic magnitude |m_{ij}| amplifies the synapse's plasticity. In particular, suppose the ij-th synapse is excitatory: m_{ij} > 0. Then we can derive:

    \dot{m}_{ij} = m_{ij} [-1 + \dot{S}_i \dot{S}_j]

Implicitly the passive decay coefficient scales the -m_{ij} term. The coefficient will usually be much smaller than unity to prevent rapid forgetting.
Drive-reinforcement synapses can rapidly encode neuronal signal information. Moreover, signal velocities or directions tend to be more robust and more noise tolerant than raw signals.

Unfortunately, drive-reinforcement synapses tend to zero as they equilibrate, and they equilibrate exponentially quickly. This holds for both excitatory and inhibitory synapses.
The equilibrium condition \dot{m}_{ij} = 0 implies that

    0 = -m_{ij} + |m_{ij}| \dot{S}_i \dot{S}_j

or m_{ij} = 0 in general. This would hold equally in a signal Hebbian model if we replaced the signal product S_i S_j with the magnitude-weighted product |m_{ij}| S_i S_j.

Klopf apparently overcomes this tendency in his simulations by forbidding zero synaptic values: |m_{ij}(t)| >= 0.1.
The simple differential Hebbian learning law

    \dot{m}_{ij} = -m_{ij} + \dot{S}_i \dot{S}_j

equilibrates to

    m_{ij} = \dot{S}_i \dot{S}_j

More generally the differential Hebbian law learns an exponentially weighted average of sampled concomitant variations, since it has the solution

    m_{ij}(t) = m_{ij}(0) e^{-t} + \int_0^t \dot{S}_i(s) \dot{S}_j(s) e^{s-t} ds

in direct analogy to the signal-Hebbian integral equation.
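A quick Euler integration illustrates the equilibration: with a constant concomitant-variation product, the weight relaxes exponentially toward that product. The step size and horizon are arbitrary choices:

```python
# Euler sketch of dm/dt = -m + dSi_dSj with a constant
# concomitant-variation product dSi_dSj. The weight decays
# exponentially toward that product, matching the exponentially
# weighted average in the closed-form solution.

def integrate(dSi_dSj, m0=0.0, dt=0.01, steps=2000):
    m = m0
    for _ in range(steps):
        m += dt * (-m + dSi_dSj)
    return m

m = integrate(0.8)
print(round(m, 3))  # -> 0.8, the equilibrium value
```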
Concomitant Variation as Statistical Covariance

The very term concomitant variation resembles the term covariance. In differential Hebbian learning we interpreted variation as time change, and concomitance as conjunction or product. Alternatively we can interpret variation spatially as a statistical variance or covariance.

Sejnowski has cast synaptic modification as a mean-squared optimization problem and derived a covariance-based solution. After some simplifications the optimal solution takes the form of the covariance learning law

    \dot{m}_{ij} = -m_{ij} + Cov[S_i(x_i), S_j(y_j)]
Since

    Cov[x, z] = E[xz] - m_x m_z = E[xz] - E[x] E[z]

we can derive

    \dot{m}_{ij} = -m_{ij} + E[S_i S_j] - E[S_i] E[S_j]

The stochastic-approximation approach estimates the unknown expectation E[S_i S_j] with the observed realization product S_i S_j. So we estimate a random process with its observed time samples:

    \dot{m}_{ij} = -m_{ij} + S_i S_j - E[S_i] E[S_j]
Suppose instead that we estimate the unknown joint-expectation term E[...] in

    Cov[S_i, S_j] = E[(S_i - E[S_i])(S_j - E[S_j])]

as the observed time samples in the integrand. This leads to the new covariance learning law

    \dot{m}_{ij} = -m_{ij} + (S_i - E[S_i])(S_j - E[S_j])

How should a synapse estimate the unknown averages E[S_i(t)] and E[S_j(t)] at each time t?
We can lag the stochastic-approximation estimate slightly in time to make a martingale assumption. A martingale assumption estimates the immediate future as the present, or the present as the immediate past:

    E[S_i(t)] \approx S_i(s)    for s < t

for some time instant s arbitrarily close to t. The assumption increases in accuracy as s approaches t. Substituting these estimates turns the centered products into products of consecutive signal differences:

    \dot{m}_{ij}(t) = -m_{ij}(t) + [S_i(t) - S_i(t-1)][S_j(t) - S_j(t-1)]
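Under the martingale estimate, the centered covariance product becomes a product of consecutive signal differences, i.e. a discrete differential-Hebbian term. A sketch with invented signal samples:

```python
# Replace the unknown means E[S(t)] with the immediately preceding
# samples: the covariance term (S_i - E[S_i])(S_j - E[S_j]) becomes
# (S_i(t) - S_i(t-1)) * (S_j(t) - S_j(t-1)).

Si = [0.1, 0.3, 0.4, 0.4, 0.2]   # invented presynaptic signal samples
Sj = [0.0, 0.2, 0.5, 0.5, 0.1]   # invented postsynaptic signal samples

def diff_hebb_terms(Si, Sj):
    return [(Si[t] - Si[t-1]) * (Sj[t] - Sj[t-1]) for t in range(1, len(Si))]

terms = diff_hebb_terms(Si, Sj)
print(terms)  # positive when both signals move the same way, e.g. the last term
```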
This approximation assumes that the signal processes are well behaved: continuous, of finite variance, and at least approximately wide-sense stationary.

In an approximate sense, when time averages resemble ensemble averages, differential Hebbian learning and covariance learning coincide.
Pulse-Coded Differential Hebbian Learning

The velocity-difference property for pulse-coded signal functions:

    \dot{S}_i(t) = x_i(t) - S_i(t),    \dot{S}_j(t) = y_j(t) - S_j(t)

The pulse-coded differential Hebbian law replaces the signal velocities in the usual differential Hebbian law

    \dot{m}_{ij} = -m_{ij} + \dot{S}_i \dot{S}_j + n_{ij}

with the two differences:

    \dot{m}_{ij} = -m_{ij} + [x_i - S_i][y_j - S_j] + n_{ij}
                 = -m_{ij} + S_i S_j + x_i y_j - x_i S_j - y_j S_i + n_{ij}

When no pulses are present (x_i = y_j = 0), the pulse-coded differential Hebbian law reduces to the random-signal Hebbian law.
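A small check of the velocity-difference expansion above (signal values invented): with no pulses present, the pulse-coded term collapses to the signal-Hebb product, and the expanded four-term form matches the factored form:

```python
# Pulse-coded concomitant-variation term: dS_i * dS_j with the
# velocity-difference estimates dS_i = x_i - S_i, dS_j = y_j - S_j.

def dhl_term(x_i, S_i, y_j, S_j):
    return (x_i - S_i) * (y_j - S_j)

S_i, S_j = 0.6, 0.4   # invented expected pulse frequencies

# No pulses (x_i = y_j = 0): the term equals the signal-Hebb product.
assert abs(dhl_term(0, S_i, 0, S_j) - S_i * S_j) < 1e-12

# Expanded form S_i S_j + x_i y_j - x_i S_j - y_j S_i matches term by term.
x_i, y_j = 1, 1
expanded = S_i * S_j + x_i * y_j - x_i * S_j - y_j * S_i
assert abs(dhl_term(x_i, S_i, y_j, S_j) - expanded) < 1e-12
print("expansion checks pass")
```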
Replace the binary pulse functions with bipolar pulse functions, and suppose the pulses and the expected pulse frequencies are pairwise independent. Then the average behavior reduces to

    E[\dot{m}_{ij}] = -E[m_{ij}] + E[S_i] E[S_j]

the ensemble-averaged random-signal Hebbian learning law or, equivalently, the classical deterministic-signal Hebbian learning law.
In the language of estimation theory, both random-signal Hebbian learning and random pulse-coded differential Hebbian learning provide unbiased estimators of signal Hebbian learning.

The pulse frequencies S_i and S_j can be interpreted ergodically (time averages equaling space averages) as conditional ensemble averages:

    S_i(t) = E[x_i(t) | x_i(s), 0 <= s < t]
Substituting these martingale assumptions into the pulse-coded differential Hebbian law gives

    \dot{m}_{ij} = -m_{ij} + [x_i - E[x_i(t) | x_i(s)]][y_j - E[y_j(t) | y_j(s)]] + n_{ij}

This suggests that random pulse-coded differential Hebbian learning provides a real-time stochastic approximation to covariance learning:

    \dot{m}_{ij} = -m_{ij} + Cov[S_i, S_j] + n_{ij}

This shows again how differential Hebbian learning and covariance learning coincide when appropriate time averages resemble ensemble averages.
Part II: Differential Competitive Learning

Learning law:

    \dot{m}_{ij} = \dot{S}_j(y_j) [S_i(x_i) - m_{ij}] + n_{ij}

Learn only if change! The signal velocity \dot{S}_j is a local reinforcement mechanism. Its sign indicates whether the jth neuron is winning or losing, and its magnitude measures by how much.
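A one-step sketch of the law above, with the signal velocity idealized as +1 (just won), -1 (just lost), or 0 (no change); the inputs and learning rate are invented:

```python
# Discrete differential-competitive step: the weight vector moves
# toward the input only when the jth neuron's win state changes.
# dSj = +1 (win), -1 (loss), 0 (no change in win state).

def dcl_step(m, x, dSj, rate=0.5):
    return [mi + rate * dSj * (xi - mi) for mi, xi in zip(m, x)]

m = [0.0, 0.0]
m = dcl_step(m, [1.0, 0.0], +1)   # just won: move toward x
print(m)  # [0.5, 0.0]
m = dcl_step(m, [1.0, 0.0], 0)    # win state unchanged: no learning
print(m)  # [0.5, 0.0]
```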
Differential Competitive Learning

If the velocity-difference property replaces the competitive signal velocity \dot{S}_j, then the pulse-coded differential competitive learning law is just the difference of nondifferential competitive laws:

    \dot{m}_{ij} = [y_j - S_j][S_i - m_{ij}] + n_{ij}
                 = y_j [S_i - m_{ij}] - S_j [S_i - m_{ij}] + n_{ij}

[Figure: a winning F_Y neuron (y_j(t) = 1) drives m_ij toward S_i; a losing neuron (y_j(t) = 0) does not.]
Competitive signal velocity & supervised reinforcement function

Both use a sign change to punish misclassification.

Both tend to rapidly estimate unknown pattern-class centroids.

The unsupervised signal velocity does not depend on unknown class memberships; it estimates this information with instantaneous win-rate information.

Even though it uses less information, DCL performs comparably to supervised competitive learning (SCL)!
Computation of postsynaptic signal velocity

By the velocity-difference property, the nonlinear derivative \dot{S}_j reduces to the locally available difference:

    \dot{S}_j = y_j - S_j

Since y_j(t) equals 0 or 1 and 0 < S_j < 1, the difference lies between -1 and 1: \dot{S}_j > 0 when the postsynaptic pulse is present (y_j = 1), and \dot{S}_j < 0 when it is absent (y_j = 0). So the signal velocity at time t is estimated by the mere presence or absence of the postsynaptic pulse y_j(t).

This estimate suits high-speed sensory environments, where stimulus patterns shift constantly. In slower, stabler pattern environments y_j \approx S_j, and the velocity estimate stays near zero.
The differential-competitive synaptic conjecture then states:

- A synapse can physically detect, electrochemically, the presence or absence of the postsynaptic pulse y_j as a change in the postsynaptic neuron's polarization.
- A synapse can clearly detect the presynaptic pulse train x_i(t), and thus the pulse train's pulse count S_i(t) in the most recent 30 milliseconds or so.

[Figure: a synapse receiving an incoming pulse train x_i(t) and detecting the postsynaptic pulse y_j electrochemically.]
Behavior patterns involved in animal learning

Klopf and Gluck suggest that input signal velocities provide the pattern information for animal learning. Classical signal Hebbian learning, pulse-coded differential Hebbian learning, and pulse-coded differential competitive learning all process signals to store, recognize, and recall patterns.

Noisy synaptic vectors can locally estimate pattern centroids in real time without supervision.
Differential Competitive Learning as Delta Modulation

The discrete differential competitive learning law

    m_j(k+1) = m_j(k) + \Delta S_j(y_j(k)) [x_k - m_j(k)]

represents a neural version of adaptive delta modulation. In communication theory, delta-modulation systems transmit consecutive sampled amplitude differences instead of the sampled amplitude values themselves. A delta-modulation system may transmit only +1 or -1 signals, indicating local increase or decrease in the underlying sampled waveform.
The signal difference \Delta S_j can be approximated as the activation difference:

    \Delta S_j = S_j(y_j(t)) - S_j(y_j(t-1)) \approx sgn[y_j(t) - y_j(t-1)]

The signum operator sgn(.) behaves as a modified threshold function:

    sgn(x) =  1  if x > 0
              0  if x = 0
             -1  if x < 0

It fixes the step size of the delta modulation; a variable step size results in adaptive delta modulation.
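The sgn-gated update can be sketched as a delta modulator tracking the input; the samples and step size are invented:

```python
# Delta-modulation view of DCL: approximate dS_j by
# sgn(y_j(k) - y_j(k-1)), so the synaptic vector moves in
# fixed-size gated steps toward the current sample.

def sgn(v):
    return (v > 0) - (v < 0)   # 1, 0, or -1

def dcl_delta(m, x, y_now, y_prev, step=0.2):
    g = sgn(y_now - y_prev)
    return [mi + step * g * (xi - mi) for mi, xi in zip(m, x)]

m = [0.0]
m = dcl_delta(m, [1.0], y_now=1, y_prev=0)   # rising pulse edge: g = +1
print(m)  # [0.2]
m = dcl_delta(m, [1.0], y_now=0, y_prev=1)   # falling pulse edge: g = -1
print(m)  # moves back away from x
```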
Consecutive differences, more informative than consecutive samples

We define the statistical correlation \rho(x, z) between random variables x and z; it takes values in the bipolar interval [-1, 1], and x and z are positively correlated if \rho > 0. For the zero-mean sequence below,

    \rho[y_j(k), y_j(k-1)] = E[y_j(k) y_j(k-1)] / \sqrt{E[y_j^2(k)] E[y_j^2(k-1)]}

Let d_k denote the pulse difference:

    d_k = y_j(k) - y_j(k-1)

Suppose the wide-sense-stationary random sequence {y_j(k)} is zero mean, and each term has the same finite variance.
The random sequence {d_k} also has zero mean. The above properties simplify its variance to

    V[d_k] = E[y_j^2(k)] + E[y_j^2(k-1)] - 2 E[y_j(k) y_j(k-1)]
           = 2\sigma^2 (1 - \rho[y_j(k), y_j(k-1)])

If consecutive samples are highly positively correlated, \rho > 1/2, the differences have less variance than the samples.

In the pulse-coded case, when the jth neuron wins it emits a dense pulse train; this winning pulse frequency may be sufficiently high to satisfy the \rho > 1/2 property.
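A numeric spot-check of the variance claim; the AR(1)-style sequence here is invented for illustration:

```python
# For a zero-mean sequence whose consecutive samples are highly
# positively correlated (rho > 1/2), consecutive differences have
# smaller variance than the samples themselves: V[d] = 2(1-rho)V[y].

import math
import random

random.seed(0)
rho = 0.9                      # strong positive lag-1 correlation
y = [0.0]
for _ in range(20000):
    y.append(rho * y[-1] + math.sqrt(1 - rho**2) * random.gauss(0, 1))

d = [b - a for a, b in zip(y, y[1:])]          # consecutive differences

def var(s):                    # zero-mean variance estimate
    return sum(v * v for v in s) / len(s)

print(var(d) < var(y))  # True: differences are the less noisy code
```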