TRANSCRIPT
Exploiting Cognitive Constraints To Improve Machine-Learning Memory Models
Michael C. Mozer
Department of Computer Science, University of Colorado, Boulder
Why Care About Human Memory?
The neural architecture of human vision has inspired computer vision. Perhaps the cognitive architecture of memory can inspire the design of RAM systems.
Understanding human memory is essential for ML systems that predict what information will be accessible or interesting to people at any moment.
E.g., selecting material for students to review to maximize long-term retention (Lindsey et al., 2014)
The World’s Most Boring Task
Stimulus X -> Response a
Stimulus Y -> Response b
[Figure: response-latency distributions (frequency vs. response latency)]
Sequential Dependencies
Dual Priming Model (Wilder, Jones, & Mozer, 2009; Jones, Curran, Mozer, & Wilder, 2013)
Recent trial history leads to expectation of next stimulus
Response latencies are fast when reality matches expectation
Expectation is based on exponentially decaying traces of two different stimulus properties
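The bullets above can be sketched as a toy simulation. This is an illustrative reading, not the published parameterization: two exponentially decaying traces (a fast and a slow one) of a single stimulus property vote on the expected next stimulus; the decay rates, the neutral 0.5 prior, and the equal-weight mixture are all assumptions.

```python
# Toy dual-trace expectation: two exponentially decaying traces of recent
# stimulus history combine into an expectation about the next stimulus.
# All numeric choices are illustrative assumptions.

def update_trace(trace, observed, decay):
    """Exponentially decaying trace: shrink old evidence, add the new
    observation. `observed` is 1.0 if the property matched this trial, else 0.0."""
    return decay * trace + (1.0 - decay) * observed

def expectation(history, decay_fast=0.5, decay_slow=0.9):
    """Combine a fast and a slow trace of recent 'X' occurrences into a
    probability-like expectation that the next stimulus is 'X'."""
    fast = slow = 0.5          # neutral prior (assumed)
    for stim in history:
        obs = 1.0 if stim == 'X' else 0.0
        fast = update_trace(fast, obs, decay_fast)
        slow = update_trace(slow, obs, decay_slow)
    return 0.5 * (fast + slow)  # equal-weight mixture (assumed)

# After a run of X's, the expectation of X is high, so a matching stimulus
# would produce a fast response; a run of Y's pushes the expectation down.
print(expectation(['X', 'X', 'X', 'X']))  # > 0.5
print(expectation(['Y', 'Y', 'Y', 'Y']))  # < 0.5
```

The two decay rates play the role of the two decaying traces in the model: the fast trace captures the most recent trials, the slow trace the longer-run base rate.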
Examining Longer-Term Dependencies (Wilder, Jones, Ahmed, Curran, & Mozer, 2013)
Declarative Memory
Cepeda, Vul, Rohrer, Wixted, & Pashler (2008)
[Timeline: study → test]
Forgetting Is Influenced By The Temporal Distribution Of Study
Spaced study produces more robust & durable learning than massed study
Experimental Paradigm To Study Spacing Effect
Cepeda, Vul, Rohrer, Wixted, & Pashler (2008)
[Graph: % recall vs. intersession interval (days)]
Optimal Spacing Between Study Sessions as a Function of Retention Interval
Predicting The Spacing Curve
[Diagram: characterization of student and domain (forgetting after one session) plus the intersession interval feed the Multiscale Context Model, which outputs predicted recall]
[Graph: predicted % recall vs. intersession interval (days)]
Multiscale Context Model (Mozer et al., 2009)
Neural network
Explains spacing effects
Multiple Time Scale Model (Staddon, Chelaru, & Higa, 2002)
Cascade of leaky integrators
Explains rate-sensitive habituation
Kording, Tenenbaum, Shadmehr (2007)
Kalman filter
Explains motor adaptation
Key Features Of Models
Each time an event occursin the environment…
A memory of this eventis stored via multiple traces
Traces decay exponentiallyat different rates
Memory strength isweighted sum of traces
Slower scales are downweighted relative to faster scales
Slower scales store memory (learn) only when faster scales fail to predict event
[Figure: trace strength over time for fast, medium, and slow traces; memory strength is their sum; each event occurrence bumps the traces upward]
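The key features listed above can be captured in a small simulation. This is a toy sketch, not any of the three cited models: the decay rates, learning rates, and readout weights are arbitrary illustrative choices. It does reproduce the qualitative spacing effect: spaced events push memory into the slower, more durable traces.

```python
# Toy multiscale trace memory with the four listed properties:
# multiple traces, exponential decay at different rates, weighted-sum
# readout (slower scales downweighted), and slower scales learning only
# the prediction error left by faster scales. All numbers are assumptions.

class MultiscaleTrace:
    def __init__(self):
        self.decays = [0.5, 0.9, 0.99]   # fast, medium, slow persistence
        self.lrates = [0.5, 0.3, 0.1]    # error absorbed per scale
        self.weights = [1.0, 0.5, 0.25]  # slower scales downweighted
        self.traces = [0.0, 0.0, 0.0]

    def strength(self):
        # memory strength = weighted sum of traces
        return sum(w * m for w, m in zip(self.weights, self.traces))

    def step(self, event=0.0):
        # every trace decays exponentially at its own rate
        for i, d in enumerate(self.decays):
            self.traces[i] *= d
        if event:
            predicted = 0.0
            for i in range(len(self.traces)):
                # a slower scale learns only what faster scales
                # failed to predict about the event
                error = max(0.0, event - predicted)
                self.traces[i] += self.lrates[i] * error
                predicted += self.traces[i]

def retention(schedule, tail=50):
    """Final memory strength, measured `tail` steps after the schedule ends."""
    mem = MultiscaleTrace()
    for e in schedule:
        mem.step(e)
    for _ in range(tail):
        mem.step(0.0)
    return mem.strength()

massed = retention([1.0, 1.0])               # two back-to-back events
spaced = retention([1.0] + [0.0] * 9 + [1.0])  # same events, spaced apart
print(spaced > massed)  # spacing yields more durable memory
```

With massed events the fast trace still predicts the second event, so the slow traces barely learn; with spacing the fast trace has decayed, the prediction error reaches the slow scales, and the memory survives the long retention interval better.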
Exponential Mixtures ➜ Scale Invariance
Infinite mixture of exponentials gives exactly power function
Finite mixture of exponentials gives good approximation to power function
With appropriately chosen weights and decay rates, can fit arbitrary power functions
[Figure: sum of three exponential curves yields an approximate power-law curve]
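The exact infinite-mixture claim is the standard Gamma-function identity (stated here for completeness, with $\alpha > 0$): weighting the exponentials $e^{-\lambda t}$ by a power-law density over decay rates $\lambda$ gives exactly a power function of time.

```latex
\int_{0}^{\infty} \lambda^{\alpha-1} e^{-\lambda t} \, d\lambda
  = \frac{\Gamma(\alpha)}{t^{\alpha}}
  \qquad \text{(substitute } u = \lambda t \text{)}
```

A finite sum $\sum_i w_i e^{-\lambda_i t}$ with the $\lambda_i$ spread over several orders of magnitude approximates the same curve, which is why a handful of traces suffices in practice.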
Relationship To Memory Models In Ancient NN Literature
Focused back prop (Mozer, 1989), LSTM (Hochreiter & Schmidhuber, 1997)
Little/no decay
Multiscale backprop (Mozer, 1992), Tau net (Nguyen & Cottrell, 1997)
Learned decay constants
No enforced dominance of fast scales over slow scales
Hierarchical recurrent net (El Hihi & Bengio, 1995)
Fixed decay constants
History compression (Schmidhuber, 1992;Schmidhuber, Mozer, & Prelinger, 1993)
Event based, not time based
Sketch of Multiscale Memory Module
xt: activation of ‘event’ in input to be remembered, in [0,1]
mt: memory trace strength at time t
Activation rule (memory update) based on error between memory readout and input
Activation rule consistent with the 3 models (for the Kording model, ignore KF uncertainty)
This update is differentiable ➜ can backprop through memory module
Redistributes activation across time scales in a manner that is dependent on temporal distribution of input events
Could add output gate as well to make it even more LSTM-like
[Diagram: memory update circuit — input xt compared (+1/−1 connections) against memory mt; fixed decay Δ; learned weights]
Sketch of Multiscale Memory Module
Pool of self-recurrent neurons with fixed time constants
Input is the response of a feature-detection neuron
This memory module stores the particular feature that is detected
When the feature is present, the memory updates; the update depends on the error between the memory state and the input (xt indicates whether the feature is detected at time t)
When the feature is detected, the memory state is compared to the input, and a correction is made so the memory represents the input strongly
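One hypothetical reading of the update rule sketched on these slides, in code: fixed per-scale decay, a readout over the traces, and an error-driven correction gated by the input activation. The mixing weights and per-scale gains (learned in the talk's module) and the decay constants are all assumed values here, and the readout weighting is a guess; every operation is differentiable, so the module could be trained by backprop.

```python
import numpy as np

class MultiscaleMemoryModule:
    """Toy sketch of the multiscale memory module. Hypothetical
    parameterization: decays, gains, and mixing weights are assumptions."""

    def __init__(self, decays=(0.5, 0.9, 0.99)):
        self.decays = np.array(decays)                 # fixed time constants
        self.mix = np.ones(len(decays)) / len(decays)  # learned in practice
        self.gain = np.array([0.5, 0.3, 0.1])          # learned per-scale gains
        self.traces = np.zeros(len(decays))            # pool of self-recurrent units

    def step(self, x_t):
        """x_t in [0,1]: activation of the 'event' to be remembered."""
        self.traces = self.decays * self.traces  # fixed exponential decay
        m_t = self.mix @ self.traces             # readout: mixture of traces
        error = x_t - m_t                        # the +1/-1 comparison in the diagram
        # correction is gated by the input activation, so the memory
        # updates only to the extent the feature is present
        self.traces = self.traces + x_t * self.gain * error
        return self.mix @ self.traces

mem = MultiscaleMemoryModule()
after_event = mem.step(1.0)  # feature present: memory charges up
after_gap = mem.step(0.0)    # feature absent: memory only decays
```

Adding an output gate on the readout, as the earlier slide suggests, would bring this still closer to an LSTM cell.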
Why Care About Human Memory?
Understanding human memory is essential for ML systems that predict what information will be accessible or interesting to people at any moment.
E.g., shopping patterns
E.g., pronominal reference
E.g., music preferences