TRANSCRIPT
Exploiting Cognitive Constraints To Improve Machine-Learning Memory Models
Michael C. Mozer
Department of Computer Science, University of Colorado, Boulder
Why Care About Human Memory?
The neural architecture of human vision has inspired computer vision. Perhaps the cognitive architecture of memory can inspire the design of RAM systems.
Understanding human memory is essential for ML systems that predict what information will be accessible or interesting to people at any moment.
E.g., selecting material for students to review to maximize long-term retention (Lindsey et al., 2014)
The World’s Most Boring Task
Stimulus X -> Response a
Stimulus Y -> Response b
[Figure: response-latency distributions (frequency vs. response latency)]
Sequential Dependencies
Dual Priming Model (Wilder, Jones, & Mozer, 2009; Jones, Curran, Mozer, & Wilder, 2013)
Recent trial history leads to expectation of next stimulus
Response latencies are fast when reality matches expectation
Expectation is based on exponentially decaying traces of two different stimulus properties
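The bullets above can be sketched as a toy simulation. This is an illustrative reading, not the published parameterization: two exponentially decaying traces (a fast and a slow one) of a single stimulus property vote on the expected next stimulus; the decay rates, the neutral 0.5 prior, and the equal-weight mixture are all assumptions.

```python
# Toy dual-trace expectation: two exponentially decaying traces of recent
# stimulus history combine into an expectation about the next stimulus.
# All numeric choices are illustrative assumptions.

def update_trace(trace, observed, decay):
    """Exponentially decaying trace: shrink old evidence, add the new
    observation. `observed` is 1.0 if the property matched this trial, else 0.0."""
    return decay * trace + (1.0 - decay) * observed

def expectation(history, decay_fast=0.5, decay_slow=0.9):
    """Combine a fast and a slow trace of recent 'X' occurrences into a
    probability-like expectation that the next stimulus is 'X'."""
    fast = slow = 0.5          # neutral prior (assumed)
    for stim in history:
        obs = 1.0 if stim == 'X' else 0.0
        fast = update_trace(fast, obs, decay_fast)
        slow = update_trace(slow, obs, decay_slow)
    return 0.5 * (fast + slow)  # equal-weight mixture (assumed)

# After a run of X's, the expectation of X is high, so a matching stimulus
# would produce a fast response; a run of Y's pushes the expectation down.
print(expectation(['X', 'X', 'X', 'X']))  # > 0.5
print(expectation(['Y', 'Y', 'Y', 'Y']))  # < 0.5
```

The two decay rates play the role of the two decaying traces in the model: the fast trace captures the most recent trials, the slow trace the longer-run base rate.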
Examining Longer-Term Dependencies (Wilder, Jones, Ahmed, Curran, & Mozer, 2013)
Declarative Memory
Cepeda, Vul, Rohrer, Wixted, & Pashler (2008)
[Timeline: study → test]
Forgetting Is Influenced By The Temporal Distribution Of Study
Spaced study produces more robust & durable learning than massed study
Experimental Paradigm To Study Spacing Effect
Cepeda, Vul, Rohrer, Wixted, & Pashler (2008)
[Graph: % recall vs. intersession interval (days)]
Optimal Spacing Between Study Sessions as a Function of Retention Interval
Predicting The Spacing Curve
[Diagram: characterization of student and domain (forgetting after one session) plus the intersession interval feed the Multiscale Context Model, which outputs predicted recall]
[Graph: predicted % recall vs. intersession interval (days)]
Multiscale Context Model (Mozer et al., 2009)
Neural network
Explains spacing effects
Multiple Time Scale Model (Staddon, Chelaru, & Higa, 2002)
Cascade of leaky integrators
Explains rate-sensitive habituation
Kording, Tenenbaum, Shadmehr (2007)
Kalman filter
Explains motor adaptation
Key Features Of Models
Each time an event occursin the environment…
A memory of this eventis stored via multiple traces
Traces decay exponentiallyat different rates
Memory strength isweighted sum of traces
Slower scales are downweighted relative to faster scales
Slower scales store memory (learn) only when faster scales fail to predict event
[Figure: trace strength over time for fast, medium, and slow traces; memory strength is their sum; each event occurrence bumps the traces upward]
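The key features listed above can be captured in a small simulation. This is a toy sketch, not any of the three cited models: the decay rates, learning rates, and readout weights are arbitrary illustrative choices. It does reproduce the qualitative spacing effect: spaced events push memory into the slower, more durable traces.

```python
# Toy multiscale trace memory with the four listed properties:
# multiple traces, exponential decay at different rates, weighted-sum
# readout (slower scales downweighted), and slower scales learning only
# the prediction error left by faster scales. All numbers are assumptions.

class MultiscaleTrace:
    def __init__(self):
        self.decays = [0.5, 0.9, 0.99]   # fast, medium, slow persistence
        self.lrates = [0.5, 0.3, 0.1]    # error absorbed per scale
        self.weights = [1.0, 0.5, 0.25]  # slower scales downweighted
        self.traces = [0.0, 0.0, 0.0]

    def strength(self):
        # memory strength = weighted sum of traces
        return sum(w * m for w, m in zip(self.weights, self.traces))

    def step(self, event=0.0):
        # every trace decays exponentially at its own rate
        for i, d in enumerate(self.decays):
            self.traces[i] *= d
        if event:
            predicted = 0.0
            for i in range(len(self.traces)):
                # a slower scale learns only what faster scales
                # failed to predict about the event
                error = max(0.0, event - predicted)
                self.traces[i] += self.lrates[i] * error
                predicted += self.traces[i]

def retention(schedule, tail=50):
    """Final memory strength, measured `tail` steps after the schedule ends."""
    mem = MultiscaleTrace()
    for e in schedule:
        mem.step(e)
    for _ in range(tail):
        mem.step(0.0)
    return mem.strength()

massed = retention([1.0, 1.0])               # two back-to-back events
spaced = retention([1.0] + [0.0] * 9 + [1.0])  # same events, spaced apart
print(spaced > massed)  # spacing yields more durable memory
```

With massed events the fast trace still predicts the second event, so the slow traces barely learn; with spacing the fast trace has decayed, the prediction error reaches the slow scales, and the memory survives the long retention interval better.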
Exponential Mixtures ➜ Scale Invariance
Infinite mixture of exponentials gives exactly power function
Finite mixture of exponentials gives good approximation to power function
With appropriately chosen weights and decay rates, can fit arbitrary power functions
[Figure: sum of three exponential curves yields an approximate power-law curve]
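The exact infinite-mixture claim is the standard Gamma-function identity (stated here for completeness, with $\alpha > 0$): weighting the exponentials $e^{-\lambda t}$ by a power-law density over decay rates $\lambda$ gives exactly a power function of time.

```latex
\int_{0}^{\infty} \lambda^{\alpha-1} e^{-\lambda t} \, d\lambda
  = \frac{\Gamma(\alpha)}{t^{\alpha}}
  \qquad \text{(substitute } u = \lambda t \text{)}
```

A finite sum $\sum_i w_i e^{-\lambda_i t}$ with the $\lambda_i$ spread over several orders of magnitude approximates the same curve, which is why a handful of traces suffices in practice.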
Relationship To Memory Models In Ancient NN Literature
Focused back prop (Mozer, 1989), LSTM (Hochreiter & Schmidhuber, 1997)
Little/no decay
Multiscale backprop (Mozer, 1992), Tau net (Nguyen & Cottrell, 1997)
Learned decay constants
No enforced dominance of fast scales over slow scales
Hierarchical recurrent net (El Hihi & Bengio, 1995)
Fixed decay constants
History compression (Schmidhuber, 1992;Schmidhuber, Mozer, & Prelinger, 1993)
Event based, not time based
Sketch of Multiscale Memory Module
xt: activation of ‘event’ in input to be remembered, in [0,1]
mt: memory trace strength at time t
Activation rule (memory update) based on error between memory readout and input
Activation rule consistent with the 3 models (for the Kording model, ignore KF uncertainty)
This update is differentiable ➜ can backprop through memory module
Redistributes activation across time scales in a manner that is dependent on temporal distribution of input events
Could add output gate as well to make it even more LSTM-like
[Diagram: memory update circuit — input xt compared (+1/−1 connections) against memory mt; fixed decay Δ; learned weights]
Sketch of Multiscale Memory Module
Pool of self-recurrent neurons with fixed time constants
Input is the response of a feature-detection neuron
This memory module stores the particular feature that is detected
When the feature is present, the memory updates; the update depends on the error between the memory state and the input (xt indicates whether the feature is detected at time t)
When the feature is detected, the memory state is compared to the input, and a correction is made so the memory represents the input strongly
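One hypothetical reading of the update rule sketched on these slides, in code: fixed per-scale decay, a readout over the traces, and an error-driven correction gated by the input activation. The mixing weights and per-scale gains (learned in the talk's module) and the decay constants are all assumed values here, and the readout weighting is a guess; every operation is differentiable, so the module could be trained by backprop.

```python
import numpy as np

class MultiscaleMemoryModule:
    """Toy sketch of the multiscale memory module. Hypothetical
    parameterization: decays, gains, and mixing weights are assumptions."""

    def __init__(self, decays=(0.5, 0.9, 0.99)):
        self.decays = np.array(decays)                 # fixed time constants
        self.mix = np.ones(len(decays)) / len(decays)  # learned in practice
        self.gain = np.array([0.5, 0.3, 0.1])          # learned per-scale gains
        self.traces = np.zeros(len(decays))            # pool of self-recurrent units

    def step(self, x_t):
        """x_t in [0,1]: activation of the 'event' to be remembered."""
        self.traces = self.decays * self.traces  # fixed exponential decay
        m_t = self.mix @ self.traces             # readout: mixture of traces
        error = x_t - m_t                        # the +1/-1 comparison in the diagram
        # correction is gated by the input activation, so the memory
        # updates only to the extent the feature is present
        self.traces = self.traces + x_t * self.gain * error
        return self.mix @ self.traces

mem = MultiscaleMemoryModule()
after_event = mem.step(1.0)  # feature present: memory charges up
after_gap = mem.step(0.0)    # feature absent: memory only decays
```

Adding an output gate on the readout, as the earlier slide suggests, would bring this still closer to an LSTM cell.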
Why Care About Human Memory?
Understanding human memory is essential for ML systems that predict what information will be accessible or interesting to people at any moment.
E.g., shopping patterns
E.g., pronominal reference
E.g., music preferences