Cognitive Processing as
Inference + Control (Sequential DM)
Angela J. Yu
Dept. of Cognitive Science
University of California, San Diego
Information Processing as Bayesian Inference
• Perception (Ernst & Banks, 2002; Kersten & Yuille, 2003; Battaglia et al., 2003)
• Attentional selection (Dayan & Zemel, 1999; Yu & Dayan, 2005; Yu, Dayan, & Cohen, 2009; Whiteley & Sahani, 2012)
• Sensorimotor Learning (Körding & Wolpert, 2004)
• Temporal sequence learning (Yu & Dayan, 2005; Jones, Mozer, & Kinoshita, 2009; Behrens et al., 2007; Nassar et al., 2012)
Behavior: Beyond Info Processing
• Which option to take?
• When to make a decision?
• Where to acquire data?
Bayesian inference
• efficient representation of information
• normative means of combining information
We need information control
Outline
• Perceptual decision-making
• 2AFC vs. Go-NoGo
• Stop-signal task and inhibitory control
• Visual search / active sensing
(Newsome, Britten, & Movshon, 1989)
Poster Child: Perceptual Decision-Making
Random-dot Coherent Motion Task
(Roitman & Shadlen, 2002)
Drift-Diffusion Model
(Smith & Ratcliff, 2004)
Diffusion Model ⇔ Neurobiology
LIP Neural Response
Diffusion Model ⇔ Behavioral Data
Model: harder ⇒
• slower
• more errors
[Figure: Accuracy vs. Coherence and <RT> vs. Coherence, from easy to hard, with the decision boundary]
(Roitman & Shadlen, 2002)
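The qualitative prediction above (harder ⇒ slower, more errors) can be illustrated with a minimal discrete-time drift-diffusion simulation. This is only a sketch: the function name `simulate_ddm`, the drift values, the bound, and the unit noise are illustrative assumptions, not parameters fitted to the cited data.

```python
import random

def simulate_ddm(drift, bound=5.0, noise_sd=1.0, n_trials=2000, seed=0, max_steps=10000):
    """Simulate a symmetric two-boundary random walk; return (accuracy, mean RT in steps).

    drift stands in for motion coherence (assumption): higher drift = easier trial.
    The upper boundary is treated as the correct choice.
    """
    rng = random.Random(seed)
    n_correct = 0
    total_steps = 0
    for _ in range(n_trials):
        evidence = 0.0
        steps = 0
        # Accumulate noisy evidence until a boundary is crossed
        while abs(evidence) < bound and steps < max_steps:
            evidence += drift + rng.gauss(0.0, noise_sd)
            steps += 1
        n_correct += evidence >= bound
        total_steps += steps
    return n_correct / n_trials, total_steps / n_trials

if __name__ == "__main__":
    for drift in (0.4, 0.2, 0.05):
        acc, rt = simulate_ddm(drift)
        print(f"drift={drift}: accuracy={acc:.2f}, mean RT={rt:.1f} steps")
```

With a fixed boundary, lowering the drift should both lower accuracy and lengthen the mean decision time, which is the pattern the slide attributes to the model.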
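Bayesian (Sequential) Statistical Inference
Generative Model
• hidden variable $s$ (e.g. stimulus properties), with prior $p(s; \theta)$
• observations $\ldots, x_{t-1}, x_t, x_{t+1}, \ldots$, each with likelihood $p(x \mid s; \theta)$
• independent/iid noise: $p(\mathbf{x}_t \mid s; \theta) = \prod_{i=1}^{t} p(x_i \mid s; \theta)$, where $\mathbf{x}_t = (x_1, \ldots, x_t)$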
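$P_t := P(s = 1 \mid \mathbf{x}_t) \propto p(x_t \mid s = 1)\, P(s = 1 \mid \mathbf{x}_{t-1})$
$P_0 = p(s = 1)$
$P_t = \frac{f_1(x_t)\, P_{t-1}}{Z_t}, \quad P_{t+1} = \frac{f_1(x_{t+1})\, P_t}{Z_{t+1}}, \quad P_{t+2} = \frac{f_1(x_{t+2})\, P_{t+1}}{Z_{t+2}}$
(here $f_1(x) = p(x \mid s = 1)$ and $Z_t$ is the normalizing constant)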
Perceptual Inference: Evidence Accumulation
Iterative Application of Bayes’ Rule
Inference about hidden states based on data
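The iterative application of Bayes' rule can be sketched in a few lines. The Gaussian likelihoods, their means (`mu1`, `mu0`), and the function names are assumptions made for illustration; the slides do not specify the observation model.

```python
import math
import random

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def update_belief(p_prev, x, mu1=0.5, mu0=-0.5, sd=1.0):
    """One step of Bayes' rule: P_t from P_{t-1} and the new observation x_t."""
    f1 = gaussian_pdf(x, mu1, sd)            # likelihood under s = 1
    f0 = gaussian_pdf(x, mu0, sd)            # likelihood under s = 0
    z = f1 * p_prev + f0 * (1.0 - p_prev)    # normalizing constant Z_t
    return f1 * p_prev / z

def accumulate(observations, p0=0.5):
    """Return the belief trajectory P_0, P_1, ..., P_t."""
    beliefs = [p0]
    for x in observations:
        beliefs.append(update_belief(beliefs[-1], x))
    return beliefs

if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.5, 1.0) for _ in range(20)]   # data generated with s = 1
    print([round(p, 3) for p in accumulate(xs)])
```

Because each step only needs the previous posterior and the newest observation, the full data history never has to be stored.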
[Figure: at each time step the observer chooses a1, a2, or wait]
Decision problem: produce not only choice d but data size τ
Information Control: How Much Data?
+: more accurate
-: more time
Sequential Decisions: Bayes Risk Minimization
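Decision policy: $\pi : \mathbf{x}_t \mapsto \{L, R, \text{cont}\}$
[Figure: at each step, choose L/R or continue]
Loss function (Bayes risk): $L(\pi) = c\,E(\tau) + P(d \neq s)$
Optimal policy: minimizes total expected loss (delay and errors); c determines the trade-off
Algorithm: Bellman’s Dynamic Programming Principle (1952); slow
$V(P_t) = \min\left(Q_s(P_t),\, Q_c(P_t)\right) = \min\left(Q_s(P_t),\; c + E[V(P_{t+1}) \mid P_t]\right)$, with $P_{t+1} = P_{t+1}(x_{t+1})$
($Q_s$: expected cost of stopping now; $Q_c$: expected cost of continuing)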
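The Bellman recursion can be solved numerically by value iteration on a discretized belief grid. The sketch below assumes binary observations that match the hidden state with probability `q` (an illustrative simplification, not the observation model of the talk) and takes the stopping cost to be the error probability min(P, 1 - P); all function names are hypothetical.

```python
def interp(values, p):
    """Linear interpolation of a function tabulated on a uniform grid over [0, 1]."""
    pos = p * (len(values) - 1)
    i = min(int(pos), len(values) - 2)
    frac = pos - i
    return (1 - frac) * values[i] + frac * values[i + 1]

def decision_thresholds(c=0.01, q=0.7, n=201, tol=1e-10, max_iter=2000):
    """Value iteration for V(P) = min(Q_s(P), c + E[V(P')]); returns the continue region (lo, hi)."""
    grid = [i / (n - 1) for i in range(n)]
    q_stop = [min(p, 1.0 - p) for p in grid]     # error probability if we stop now
    value = q_stop[:]
    for _ in range(max_iter):
        q_cont = []
        for p in grid:
            px1 = q * p + (1 - q) * (1 - p)      # predictive probability of x = 1
            p_up = q * p / px1                   # posterior after x = 1
            p_dn = (1 - q) * p / (1 - px1)       # posterior after x = 0
            q_cont.append(c + px1 * interp(value, p_up) + (1 - px1) * interp(value, p_dn))
        new_value = [min(s, k) for s, k in zip(q_stop, q_cont)]
        delta = max(abs(a - b) for a, b in zip(new_value, value))
        value = new_value
        if delta < tol:
            break
    # Beliefs where continuing is cheaper than stopping
    region = [p for p, s, k in zip(grid, q_stop, q_cont) if k < s]
    return min(region), max(region)

if __name__ == "__main__":
    print(decision_thresholds())
```

Under these assumptions the continue region is an interval around P = 0.5, so the optimal policy reduces to two belief thresholds, which connects the dynamic-programming solution to the boundary-crossing rule on the next slide.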
Wald & Wolfowitz (1948): hypothesis 1 (left) vs. hypothesis 2 (right)
Optimal policy
• accumulate evidence over time: Pr(left) versus Pr(right)
• stop if total evidence exceeds “left” or “right” boundary
[Figure: evidence over time for trial 1 and trial 2, between the “left” and “right” boundaries]
Optimal Decision-Making in 2AFC Task
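The boundary-crossing rule is the sequential probability ratio test (SPRT). A minimal sketch follows, assuming equal-variance Gaussian observations and Wald's approximate boundaries for target error rates `alpha` and `beta`; the means, error rates, and function name are illustrative assumptions.

```python
import math

def sprt(observations, mu_left=-0.5, mu_right=0.5, sd=1.0, alpha=0.05, beta=0.05):
    """Accumulate the log-likelihood ratio and stop at the first boundary crossing.

    alpha: tolerated probability of choosing "right" when "left" is true (assumption).
    beta:  tolerated probability of choosing "left" when "right" is true (assumption).
    Returns (decision, number of observations used).
    """
    upper = math.log((1 - beta) / alpha)   # "right" boundary (Wald's approximation)
    lower = math.log(beta / (1 - alpha))   # "left" boundary (Wald's approximation)
    llr = 0.0
    for t, x in enumerate(observations, start=1):
        # log p(x | right) - log p(x | left) for equal-variance Gaussians
        llr += (mu_right - mu_left) * (x - 0.5 * (mu_right + mu_left)) / sd ** 2
        if llr >= upper:
            return "right", t
        if llr <= lower:
            return "left", t
    return "undecided", len(observations)

if __name__ == "__main__":
    print(sprt([1.0] * 10))
    print(sprt([-1.0] * 10))
```

Tightening `alpha` and `beta` moves the boundaries apart, trading longer decisions for fewer errors.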
Outline
• Perceptual decision-making
• 2AFC vs. Go-NoGo
• Stop-signal task and inhibitory control
• Visual search
2-Alternative Forced Choice vs Go/NoGo
[Figure: 2AFC (left vs. right) and Go/NoGo (go vs. nogo) tasks. Panels: Data: Error rate; Data: RT (ms); Model: Error rate; Model: RT (steps); each comparing 2AFC and GNG by stimulus (left/nogo, right/go).]
(Data from Bacon-Mace et al., 2007)
• Subjects show a “go bias”: ⇑FA/hits & ⇓RT in GNG
• Does this imply fundamentally different neural/cognitive processing?
Experimental Design
Rational Inference & Decision-Making
• shared (Bayesian) inference process & loss function
• asymmetric cost in GNG: Go terminates trial, NoGo does not
[Figure: Sensory processing and action selection for stimulus d ∈ {A, B}. Evidence x_1, ..., x_t updates the belief b_t; at each step the policy maps the belief to L, R, or wait in 2AFC, and to Go or wait in GNG.]
(Shenoy & Yu, NIPS, 2012)
Loss function (Bayes risk): $L(\pi) = c\,E(\tau) + P(d \neq s)$
Optimal Decision Threshold and Go Bias
• Optimal thresholds for 2AFC constant, collapsing just before deadline (Frazier & Yu, 2007)
• Optimal threshold for GNG time-varying
✴ lower at trial start - represents opportunity cost of waiting
✴ earlier Go responses: more hits and false alarms
[Figure: A. Optimal decision policy: decision threshold on belief over time, 2AFC and GNG. B. Model: Error rate. C. Model: RT.]
(Frazier & Yu, NIPS, 2008)
(Shenoy & Yu, NIPS, 2012)
Model Reproduces Go Bias
[Figure: Data (error rate; RT in ms) and model (error rate; RT in steps) for 2AFC and GNG, by stimulus (left/nogo, right/go).]
• A rational decision-maker also exhibits the Go bias
• Go bias is a natural consequence of the extra time cost of NoGo
• Go bias need not imply cognitive/neural processing differences between 2AFC & GNG
(Shenoy & Yu, NIPS, 2012)
[Figure: Data: Error rate (go, nogo); Data: RT in ms (go, FA); Model: Error rate (go, nogo); Model: RT in time steps (go, FA). x-axis: % Nogo trials (20, 50, 80).]
Model Accounts for NoGo Frequency Effects
• Prediction: Go bias depends on P(NG)
• ↑P(NG) ⇒ ↓FA, ↑RT
• Frequent NoGo trials diminish temporal advantage of Go response
• Behavioral data confirm prediction
(Data from Nieuwenhuis et al., 2007)
(Shenoy & Yu, NIPS, 2012)
• 2AFC: not significantly affected by deadline
• GNG: ↑ deadline ⇒ ↑Go bias (↑FA/hits, ↓RT)
[Figure: 2AFC choice accuracy: error rate by response (Left, Right), Early vs. Late. GNG choice accuracy: error rate by response (Nogo, Go), Early vs. Late. Reaction times (time steps) by task (GNG, 2AFC), Early vs. Late.]
Model Prediction: Influence of Deadline
(Shenoy & Yu, NIPS, 2012)
Outline
• Perceptual decision-making
• 2AFC vs. Go-Nogo
• Stop-signal task and inhibitory control
• Visual search / active sensing
go stimulus
(infrequent) stop signal
Time
Experimental Paradigm: Stop Signal Task
(from Emeric et al., 2007)
stop-signal delay
⇑ stop-signal delay ➟ ⇑ stop errors
[Figure: Inhibition functions (fraction of stop errors vs. time) and go RT distributions (fraction vs. time) for stop-trial frequencies r = 0.1, 0.34, 0.51, 0.69. Panels: Data (time in ms); Optimal Model (time in steps); Race Approximation (time in steps).]
Pradeep Shenoy & Angela J. Yu University of California, San Diego
A. Go and stop stimulus identities (d and s) are inferred from sensory evidence and tracked through the trial as belief states $p_d(t)$ and $p_s(t)$.
B. The action selection policy π maps belief to a choice of going or waiting, so as to minimize overall expected cost. Stopping is a repeated choice of the wait action.
C. Example belief-to-action mapping for one time step.
$\text{Loss} = c\,\langle t \rangle + c_s \cdot P\{\text{stop error}\} + P\{\text{go error}\}$
$\pi : (\mathbf{x}_t, \mathbf{y}_t) \to \{\text{left}, \text{right}, \text{wait}\}$
$\pi : (p_d^t, p_s^t) \to \{\text{left}, \text{right}, \text{wait}\}$
Stimulus frequency influences stop-go tradeoff
As stop trial frequency (r) is increased, go RT increases and the subject makes fewer stop errors (A, C) (data from [5], rhesus macaque).
Model predicts these changes (B, D). Increased stop trial frequency leads to:
✦ faster processing of stop signals.
✦ trading off go RT and stop error costs (see loss fn).
Inhibitory Control and Norepinephrine
Norepinephrine = unexpected uncertainty = P(stop)
We propose that NE reports unexpected uncertainty, i.e., uncertainty about task context (including stimulus/response/reward mappings). Since the stop signal represents a change in context, NE levels correspond to an internal estimate of P(stop).
✦ go RT and SSRT are traded off under the influence of atomoxetine (A), in a dose-dependent manner (C) (data from [7,8]).
✦ atomoxetine increases NE levels and, in our model, subjects’ internal P(stop).
✦ Model predictions of going & stopping performance reproduce observed behavior (B, D).
Humans adjust optimally to sequential experience
A. Subjects sequentially estimate prior probability $r_t$ of upcoming stop trial
✦ α = probability of reset, similar to dynamic belief model (Yu/Cohen [6]).
B. Subjects’ internal sequential prior fluctuates with experienced trial history
✦ can be inferred using model shown in (A).
C, D. Recovered P(stop) values strongly affect go RT, stop error rate. This approximately linear tradeoff matches optimal stop-go tradeoff (red line).
(unpublished behavioral data from humans)
Race model approximation of optimal behavior
Neural activity in stop tasks suggests race-like neural mechanisms (see e.g., [9]).
We study a drift-diffusion implementation of the race model (A) as a possible mechanism underlying optimal decision-making.
The 4 free parameters were chosen to best approximate the optimal model’s behavior.
Our model fits suggest:
✦ Optimal behavior can be approximated by race-like implementations
✦ SSRT decreases as stop trials are more prevalent
✦ (3) rate, threshold, offset change in systematic ways, testable with recordings from stopping-related neural populations.
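The basic race idea can be sketched as follows. This is the simple independent race between finishing times, not the 4-parameter drift-diffusion implementation described above; the Gaussian go finishing times, the constant SSRT, and all numeric values are illustrative assumptions.

```python
import random

def inhibition_function(ssds, go_mean=400.0, go_sd=80.0, ssrt=200.0, n_trials=5000, seed=0):
    """Fraction of stop errors at each stop-signal delay (SSD, in ms) under an independent race.

    A stop trial is an error when the go process finishes before the stop
    process, i.e. when go finishing time < SSD + SSRT.
    """
    rng = random.Random(seed)
    go_times = [rng.gauss(go_mean, go_sd) for _ in range(n_trials)]
    result = {}
    for ssd in ssds:
        stop_finish = ssd + ssrt
        errors = sum(1 for g in go_times if g < stop_finish)
        result[ssd] = errors / n_trials
    return result

if __name__ == "__main__":
    for ssd, frac in inhibition_function([100, 200, 300]).items():
        print(f"SSD={ssd} ms: stop error fraction={frac:.2f}")
```

Longer delays give the go process a head start, so the stop error fraction rises with SSD, which is the shape of the inhibition function.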
Inhibitory control is the ability to modify or withhold actions in response to changing environmental statistics and task demands
✦ deficits associated with brain disorders such as ADHD, substance abuse, OCD.
✦ studied using paradigms such as the stop signal task, with behavioral, neural and pharmacological experiments.
Stop signal task [1]:
✦ go action (e.g., 2AFC response) occasionally interrupted by a stop signal, instructing cancelation of go response (A).
✦ race model proposes a race between finishing times of independent go and stop processes (B); postulates stop signal reaction time (SSRT)
✦ SSRT used as index of inhibitory ability (e.g., longer in ADHD [2]), and affected by atomoxetine (a norepinephrine reuptake inhibitor used in treating ADHD).
Goal: investigate role of NE in inhibitory control using our optimal decision-making model [3] for the stop signal task.
✦ Hypothesis: NE reports subjects’ internal unexpected uncertainty, i.e., uncertainty about the likelihood of a context-changing stop signal.
[Figure panels: A. Stop Signal Task. B. Stop signal reaction time (race model).]
Our model explains the influence of factors such as reward on stopping behavior [3], and fMRI correlates of stimulus anticipation in the task (Poster [4])
Optimal decision-making for the stop task
[Figure panels: A. Generative model for sensory evidence. B. Optimal action selection. C. Example policy at time t, over belief states $p_d(t)$ and $p_s(t)$.]
[Figure panels: A. Data: Go RT cumulative dist. B. Model: Go RT cumulative dist. C. Data: Stop error rates. D. Model: Stop error rates.]
[Figure panels: A. Sequential estimation of P(stop). B. Example estimation sequence. C. Behavioral data: Go RT. D. Behavioral data: stop error rate.]
$P(r_k \mid \mathbf{s}_k) \propto P(s_k \mid r_k)\left((1-\alpha)\,P(r_{k-1} \mid \mathbf{s}_{k-1}) + \alpha\,P_0(r_k)\right)$
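The sequential estimate of P(stop) can be sketched on a discretized grid of stop rates r. The uniform reset prior P0, the default `alpha`, the grid size, and the function name are assumptions for illustration; the recursion follows the update with reset probability α given on the poster.

```python
def dbm_stop_prior(trials, alpha=0.25, n=101):
    """Trial-by-trial predicted P(stop) before each trial.

    trials: sequence of 1 (stop trial) or 0 (go trial).
    alpha:  probability that the stop rate resets to the generic prior P0.
    """
    grid = [(i + 0.5) / n for i in range(n)]   # candidate stop rates r
    p0 = [1.0 / n] * n                         # assumed uniform reset prior P0(r)
    posterior = p0[:]
    predictions = []
    for s in trials:
        # Predictive prior: keep the previous posterior with prob. 1 - alpha, reset with prob. alpha
        prior = [(1 - alpha) * post + alpha * base for post, base in zip(posterior, p0)]
        predictions.append(sum(r * w for r, w in zip(grid, prior)))
        # Bernoulli likelihood of the observed trial type under each candidate rate
        likelihood = [r if s == 1 else 1 - r for r in grid]
        unnorm = [l * w for l, w in zip(likelihood, prior)]
        z = sum(unnorm)
        posterior = [u / z for u in unnorm]
    return predictions

if __name__ == "__main__":
    print([round(p, 3) for p in dbm_stop_prior([1, 1, 1, 0, 0, 0])])
```

The predicted P(stop) rises after a run of stop trials and falls after a run of go trials, which is the fluctuation with trial history described in (B).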
Discussion & Conclusion
✦ Optimal decision-making explains influence of stop trial frequency on inhibitory control in the stop signal task.
✦ Humans adjust behavior optimally on a trial-to-trial basis to account for changing stimulus expectations.
✦ Norepinephrine may represent subjects’ internal expectation of task-relevant stimuli in the task, accounting for its effect on inhibitory control
✦ We are exploring rational models for other inhibitory control tasks such as the Go-nogo task (see poster on Wednesday [10]), and connections between inhibitory ability measures in these tasks.
Check out our other posters! [4,10]
References
[1] Logan & Cowan (1984). Psych Review, 91 (3).
[2] Lipszyc & Schachar (2010). J. Int Neurophys. Soc. 1(1).
[3] Shenoy & Yu (2011). Frontiers Hum. Neurosci. 5(48).
[4] Ide, Shenoy, Yu & Li (SfN 2011 Poster): Monday 11am (403.20/YY24).
[5] Emeric et al. (2007). Vision Res. 47(1).
[6] Yu & Cohen (2009). Advances in Neur Inf Proc Sys. 21.
[7] Bari et al. (2009). Psychopharmacology 205(2).
[8] Eagle et al. (2008). Psychopharmacology 199(3).
[9] Hanes, Patterson & Schall (1998). J. Neurophys. 79.
[10] Shenoy & Yu (SfN 2011 Poster): Wednesday 2pm (931.01/WW57).
(Shenoy & Yu, Frontiers in Human Neuroscience, 2011)
Sensory Processing = Bayesian Inference
Action Selection = Sequential Decision-Making
[Figure: at each time step (20 ms, 40 ms, 60 ms) choose L, R, or wait. Waiting: + more accurate, - more time]
Decision Policy π: (x1, …, xt) ⇒ {go(L), go(R), wait}
(Shenoy & Yu, Frontiers in Human Neuroscience, 2011)
Model: Optimal Action Selection
$L_\pi = c\,\langle\tau\rangle + c_s\, r\, P(\tau < D \mid s=1) + (1-r)\, P(\tau < D, \delta \neq d \mid s=0) + (1-r)\, P(\tau = D \mid s=0)$
• expected cost = time cost + stop error (non-canceled) + go error (wrong response) + go error (deadline)
• d: true target; s: stop trial; δ: chosen target; τ: response time; D: deadline; r: freq(stop trials)
Policy: (x1, y1), …, (xt, yt) ⇒ {go(L), go(R), wait}
Objective: minimize expected (average) cost
Behavior cost function
(Shenoy & Yu, Frontiers in Human Neuroscience, 2011)
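To make the four cost terms concrete, the expected cost can be estimated from a set of trial outcomes. This is a sketch only: the trial encoding (`is_stop`, `tau`, `correct`) and the function name are assumptions, and a withheld response is encoded as `tau` equal to the deadline D.

```python
def expected_cost(trials, c, c_s, deadline):
    """Empirical version of the loss: time cost + stop errors + wrong go responses + missed go deadlines.

    trials: list of (is_stop, tau, correct) tuples, where tau is the response
    time (tau == deadline means no response) and correct says whether the
    chosen target matched the true target.
    """
    n = len(trials)
    stop = [t for t in trials if t[0]]
    go = [t for t in trials if not t[0]]
    r = len(stop) / n                                   # frequency of stop trials
    mean_tau = sum(t[1] for t in trials) / n            # <tau>
    # P(tau < D | s = 1): responded on a stop trial (non-canceled)
    p_stop_err = sum(1 for t in stop if t[1] < deadline) / len(stop) if stop else 0.0
    # P(tau < D, delta != d | s = 0): responded with the wrong target on a go trial
    p_go_wrong = sum(1 for t in go if t[1] < deadline and not t[2]) / len(go) if go else 0.0
    # P(tau = D | s = 0): no response before the deadline on a go trial
    p_go_miss = sum(1 for t in go if t[1] >= deadline) / len(go) if go else 0.0
    return c * mean_tau + c_s * r * p_stop_err + (1 - r) * p_go_wrong + (1 - r) * p_go_miss

if __name__ == "__main__":
    example = [(False, 10, True), (False, 12, False), (False, 20, False), (True, 8, False)]
    print(expected_cost(example, c=0.01, c_s=2.0, deadline=20))
```

Raising `c_s` or the stop frequency r makes early responding more expensive, which is the lever behind the reward and frequency effects discussed on the following slides.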
Optimal Decision Policy Maps Belief State ⇒ Go & Wait Regions
[Figure: Go & Wait regions of the belief state: P(go) (L vs. R) against P(stop) (absent vs. present), partitioned into wait, go (R), and go (L).]
Exact dynamic programming (discretized belief state)
Longer stop signal delay results in more errors
Data: error vs. SSD
Model-Data Comparison: Effects of SSD
Model: error vs. SSD
(from Emeric et al., 2007)
(Shenoy & Yu, Frontiers in Human Neuroscience, 2011)
Data: (from Leotti & Wager, 2009)
[Figure: Data (Go Bias / No Bias / Stop Bias) and Model (Low / Med / High stop cost; Opt and Race). Panels: Stop Errors (fraction); Go RT (ms in data, steps in model); SSRT (ms in data, steps in model).]
Reward/Motivation ⇒ Stopping Behavior
Stop error rate Go reaction time Stopping Latency
(Shenoy & Yu, Frontiers in Human Neuroscience, 2011)
(Emeric et al., 2007)
Stimulus Frequency ⇒ Stopping Behavior
[Figure: Cumulative GO reaction time and stop error rate (inhibition function) for stop-trial fractions 0.1, 0.34, 0.51, 0.69. Rows: Model (Optimal Model and Race Approximation; time in steps) and Data (time in ms).]
(Shenoy & Yu, NIPS, 2010)
Trial-to-Trial Expectation: Behavioral Data
[Figure: Dynamic Belief Model (DBM): sample sequence of stop trials and go trials with the prior P(stop) and prediction error by trial. Inhibition functions (fraction of stop errors vs. time) for r = 0.1, 0.34, 0.51, 0.69: Data (time in ms) and Optimal Model (time in steps).]
Go RT vs. Stop Expectancy Stop ER vs. Stop Expectancy
(Ide, Shenoy, Yu*, & Li*, J. Neuroscience, 2013)
(Yu & Cohen, NIPS, 2009)
Neural Encoding of Probabilistic Prediction
(Ide, Shenoy, Yu*, & Li*, J. Neuroscience, 2013)
pre-SMA encodes Bayesian posterior prediction: P(stop|observation)
[Figure: Sample sequence of trials with prior P(stop) and prediction error. Bayesian model P(stop) (mean-subtracted) and pre-SMA fMRI signal for low, med, and high P(stop). Activation map with T-values from 3.9 to 6.]
Neural Encoding of Stimulus Prediction Error
(Ide, Shenoy, Yu*, & Li*, J. Neuroscience, 2013)
dACC encodes Bayesian surprise: |outcome-P(stop)|
[Figure: Sample sequence of trials with prior P(stop) and prediction error. Bayesian model (Bayesian surprise) and dACC fMRI signal (PSC) on Go and Stop trials for low, med, and high P(stop). Activation map with T-values 5.1 and 6.2.]
Neural Alterations Due to Stimulant Use
(Harlé, Shenoy, Stewart, Tapert, Yu*, & Paulus*, 2013, under review)
• Less PE response in dACC in OSU
• Deficit correlated with lifetime cocaine use
• Despite similar behavioral adjustments in OSU & CS
Outline
• Perceptual decision-making
• 2AFC vs. Go-Nogo
• Stop-signal task and inhibitory control
• Visual search / active sensing
Active Vision (Search) & Prior Expectation
(Yu, Huang, Shenoy, & Schultz, 2013, under review)
[Figure: a, b. Search display with TARGET location probabilities 1:3:9. c. Search time (ms) and accuracy, 1:3:9 vs. 1:1:1. d. Fixation distribution (1:3:9): choice fraction by location (1, 3, 9) for 1st and 2nd fixation.]
At each time step, the subject has 3 possible actions:
• Stay: keep observing current patch
• Switch: change to another location
• Respond: choose current patch as the target
Bayesian Inference & Decision-Making
(Ahmad & Yu, UAI, 2013)
Bayesian inference: data ⇒ P(target location|data), Pt
Decision policy π: Pt → {respond, fixate l1, fixate l2, ....}
Loss function:
Optimal policy minimizes expected cost:
Context-Dependent Active Controller (C-DAC)
Algorithm: exact DP; RBF- and GP-approx. of Q-factors
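The inference step (data ⇒ P(target location|data)) can be sketched for a single fixation. The binary observation model with reliability `beta` is an assumption for illustration (the slide does not give the likelihood), and the function name is hypothetical; the 1:3:9 prior comes from the search task above.

```python
def update_location_belief(belief, fixated, x, beta=0.7):
    """Posterior over target location after observing x (0 or 1) at the fixated location.

    Assumed observation model: P(x = 1) = beta if the target is at the
    fixated location, and 1 - beta if the target is elsewhere.
    """
    unnorm = []
    for loc, p in enumerate(belief):
        p_one = beta if loc == fixated else 1 - beta
        like = p_one if x == 1 else 1 - p_one
        unnorm.append(like * p)
    z = sum(unnorm)
    return [u / z for u in unnorm]

if __name__ == "__main__":
    prior = [1 / 13, 3 / 13, 9 / 13]          # 1:3:9 spatial prior
    print(update_location_belief(prior, fixated=2, x=1))
    print(update_location_belief(prior, fixated=0, x=0))
```

A decision policy such as C-DAC would then map this belief to respond or to the next fixation location; that control step is not implemented here.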
C-DAC Decision Policy
• Blue: stop, green: fixate location 1, orange: fixate location 2, maroon: fixate location 3
• Sensitive to behavioral params: time cost, SNR, switch cost
• Produces fixation location as well as duration
• Outperforms greedy MAP (Najemnik & Geisler, 2005) and InfoMax (Butko & Movellan, 2010) in speed + accuracy
(Ahmad & Yu, UAI, 2013)
Spatial Prior: C-DAC & Human Behavior
(Ahmad, Huang, & Yu, 2013, under review)
• Spatial statistics affect data interpretation and collection in both human subjects and C-DAC model
• More likely to identify 9-loc as target than 1&3 (hits & FAs)
• Faster to identify 9-loc as target but slower if it’s distractor
[Figure: comparisons of the 9 location vs. the 1&3 locations, and of target (T) vs. distractor (D)]
Bonus Application
• Human active learning
Multi-Armed Bandit Problem
(Zhang & Yu, Cogsci, 2013)
• Subject pulls an arm on each trial (win/lose); 20 trials
• Exploration vs. exploitation trade-off
• Optimal policy (DP) is computationally intensive
• Heuristic policies
✴ win-stay-lose-shift
✴ ε-greedy
✴ ε-infomax
✴ “knowledge gradient”
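Two of the heuristic policies listed above are simple enough to sketch directly. The versions below are common textbook variants and may differ in detail from those fitted in the cited study; the Beta(1, 1) prior behind the posterior means and the function names are assumptions.

```python
import random

def wsls(prev_arm, prev_reward, n_arms, rng):
    """Win-stay-lose-shift: repeat the last arm after a win, otherwise switch at random."""
    if prev_arm is None:
        return rng.randrange(n_arms)          # first trial: pick any arm
    if prev_reward == 1:
        return prev_arm                       # win: stay
    others = [k for k in range(n_arms) if k != prev_arm]
    return rng.choice(others)                 # loss: shift to another arm

def epsilon_greedy(wins, losses, epsilon, rng):
    """With probability epsilon explore uniformly, otherwise pick the arm with the highest estimated reward rate."""
    if rng.random() < epsilon:
        return rng.randrange(len(wins))
    # Posterior mean reward rate under an assumed Beta(1, 1) prior
    means = [(w + 1) / (w + l + 2) for w, l in zip(wins, losses)]
    return max(range(len(means)), key=means.__getitem__)

if __name__ == "__main__":
    rng = random.Random(0)
    print(wsls(1, 1, 4, rng), wsls(1, 0, 4, rng))
    print(epsilon_greedy([5, 0], [0, 5], 0.1, rng))
```

Neither heuristic looks at the number of trials remaining, which is one way they differ from the horizon-sensitive knowledge gradient policy.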
[Figure: Agreement with People for WSLS, eG, Optimal, eINFO, KG; Model Agreement with Optimal for WSLS, eG, eINFO, KG; Basic Learn vs. Meta Learn.]
KG: Myopic Approximation to Optimal Bayes
(Zhang & Yu, Cogsci, 2013)
$v_k^{KG,t} = E\left[\max_{k'} \hat\theta_{k'}^{t+1} \mid D^t = k, B^t\right] - \max_{k'} \hat\theta_{k'}^{t}$
$D^{KG,t,g} = \arg\max_k \; \hat\theta_k^{t,g} + (T - t - 1)\, v_k^{KG,t,g}$
• Incremental value of one exploitation sample
• Horizon determines value of exploitation vs. exploration
• KG best account of human choice
• KG closest to optimal policy
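For binary rewards, the knowledge gradient rule can be computed exactly in a few lines. The sketch assumes each arm's belief state is a Beta(a, b) posterior with mean a / (a + b); that representation and the function names are assumptions, not details taken from the slides.

```python
def kg_values(alphas, betas):
    """Knowledge gradient v_k: expected gain in the best posterior mean from one pull of arm k."""
    means = [a / (a + b) for a, b in zip(alphas, betas)]
    best_now = max(means)
    values = []
    for k, (a, b) in enumerate(zip(alphas, betas)):
        best_other = max((m for j, m in enumerate(means) if j != k), default=0.0)
        mean_if_win = (a + 1) / (a + b + 1)     # posterior mean of arm k after a win
        mean_if_loss = a / (a + b + 1)          # posterior mean of arm k after a loss
        expected_best = (means[k] * max(mean_if_win, best_other)
                         + (1 - means[k]) * max(mean_if_loss, best_other))
        values.append(expected_best - best_now)
    return values

def kg_choice(alphas, betas, t, horizon):
    """Choose argmax_k of (posterior mean + (T - t - 1) * v_k)."""
    means = [a / (a + b) for a, b in zip(alphas, betas)]
    v = kg_values(alphas, betas)
    scores = [m + (horizon - t - 1) * vk for m, vk in zip(means, v)]
    return max(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    # Arm 0: little data (Beta(1, 1)); arm 1: better known, mean 0.6 (Beta(6, 4))
    print(kg_choice([1, 6], [1, 4], t=9, horizon=20))    # many trials left
    print(kg_choice([1, 6], [1, 4], t=19, horizon=20))   # last trial
```

With many trials remaining, the uncertain arm is preferred because information about it can still pay off; on the last trial the rule reduces to picking the arm with the highest posterior mean.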
Summary
• A unified framework for cognitive control: Bayesian inference + control (sequential decision-making)
• Explains
✴ speed-accuracy trade-off in perception
✴ go bias in GNG compared to 2AFC
✴ effects of difficulty/reward/prior in inhibitory control
✴ dynamics of visual search; influence of spatial prior
✴ exploration-exploitation tradeoff in active learning
• Guides neural data analysis and experimental design
Relevance to B.R.A.I.N. Initiative
Cognitive models guide neural data analysis and experimental design
David Marr (1969): Three Levels of Analysis
Computation (why?)
• goals of computation
• why things work the way they do
Algorithm (how?)
• representation of input/output
• how one is transformed into the other
Implementation (what?)
• physical realization of the computations
• neural representation and dynamics
[Figure annotations: speed, accuracy, opportunity cost; activation map with T-values 5.1 and 6.2]
Thanks to...
• Yu Lab
✤ Pradeep Shenoy
✤ Sheeraz Ahmad
✤ Crane Huang
✤ Joseph Schilz
• Collaborators
✤ Peter Frazier, Savas Dayanik
✤ Jaime Ide, Chiang-Shan Li, Katie Harlé, Martin Paulus
• You all