
Cognitive Processing as Inference + Control (Sequential DM)

Angela J. Yu

Dept. of Cognitive Science

University of California, San Diego

Information Processing as Bayesian Inference

• Perception (Ernst & Banks, 2002; Kersten & Yuille, 2003; Battaglia et al., 2003)

• Attentional selection (Dayan & Zemel, 1999; Yu & Dayan, 2005; Yu, Dayan, & Cohen, 2009; Whiteley & Sahani, 2012)

• Sensorimotor learning (Körding & Wolpert, 2004)

• Temporal sequence learning (Yu & Dayan, 2005; Jones, Mozer, & Kinoshita, 2009; Behrens et al., 2007; Nassar et al., 2012)

Behavior: Beyond Info Processing

• Which option to take?

• When to make a decision?

• Where to acquire data?

Bayesian inference:

• efficient representation of information

• normative means of combining information

We need information control

• Perceptual decision-making

• 2AFC vs. Go-NoGo

• Stop-signal task and inhibitory control

• Visual search / active sensing

Outline


Poster Child: Perceptual Decision-Making

Random-dot Coherent Motion Task (Newsome, Britten, & Movshon, 1989)

(Roitman & Shadlen, 2002)

Drift-Diffusion Model

(Smith & Ratcliff, 2004)

Diffusion Model ⇔ Neurobiology

LIP Neural Response

Diffusion Model ⇔ Behavioral Data

Model: harder ⇒

• slower

• more errors

[Figure: accuracy vs. coherence and mean RT vs. coherence, from easy to hard, compared with the diffusion model and its decision boundary] (Roitman & Shadlen, 2002)
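The diffusion model's qualitative prediction (harder ⇒ slower, more errors) can be illustrated with a short simulation. This is a minimal sketch under assumed, illustrative parameters: `simulate_ddm`, the drift values, bound, noise level, and time step are hypothetical and not fitted to the Roitman & Shadlen data; a lower drift simply stands in for lower motion coherence.

```python
import numpy as np

def simulate_ddm(drift, bound=1.0, noise=1.0, dt=0.001, max_t=5.0,
                 n_trials=2000, seed=0):
    """Simulate a symmetric drift-diffusion model (illustrative parameters).

    Evidence x starts at 0 and follows dx = drift*dt + noise*sqrt(dt)*N(0, 1)
    until it reaches +bound (correct) or -bound (error). Trials that reach
    neither bound by max_t are dropped. Returns (accuracy, mean decision time).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)
    rt = np.full(n_trials, np.nan)
    correct = np.zeros(n_trials, dtype=bool)
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, int(max_t / dt) + 1):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        x[idx] += drift * dt + noise * np.sqrt(dt) * rng.standard_normal(idx.size)
        up = x[idx] >= bound
        done = up | (x[idx] <= -bound)
        rt[idx[done]] = step * dt
        correct[idx[up]] = True
        active[idx[done]] = False
    finished = ~np.isnan(rt)
    return correct[finished].mean(), rt[finished].mean()

# Lower drift stands in for lower motion coherence (a harder stimulus).
for drift in (2.0, 0.5):
    acc, mean_rt = simulate_ddm(drift)
    print(f"drift={drift}: accuracy={acc:.3f}, mean decision time={mean_rt:.3f} s")
```

For this symmetric model the accuracy also has a closed form, 1 / (1 + exp(-2 · drift · bound / noise²)), which the simulated values should approach as the time step shrinks.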

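Bayesian (Sequential) Statistical Inference

Generative Model

• hidden variable s (e.g. stimulus properties), with prior p(s; θ)

• observations …, x_{t-1}, x_t, x_{t+1}, … with likelihood p(x | s; θ)

• independent (iid) noise, so that

$$p(\mathbf{x}_t \mid s; \theta) = \prod_{i=1}^{t} p(x_i \mid s; \theta), \qquad \mathbf{x}_t = (x_1, \dots, x_t)$$

Perceptual Inference: Evidence Accumulation

Iterative Application of Bayes' Rule

$$P_t := P(s = 1 \mid \mathbf{x}_t) \propto p(x_t \mid s = 1)\, P(s = 1 \mid \mathbf{x}_{t-1})$$

$$P_0 = p(s = 1), \quad P_t = \frac{f_1(x_t)\, P_{t-1}}{Z_t}, \quad P_{t+1} = \frac{f_1(x_{t+1})\, P_t}{Z_{t+1}}, \quad P_{t+2} = \frac{f_1(x_{t+2})\, P_{t+1}}{Z_{t+2}}, \ \dots$$

where $f_1(x) = p(x \mid s = 1)$ and $Z_t$ is the normalizing constant.

Inference about hidden states based on data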
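The iterative Bayes' rule for evidence accumulation can be checked numerically. This sketch assumes a hypothetical Gaussian generative model (x_i ~ N(+mu, sd²) if s = 1, x_i ~ N(−mu, sd²) if s = 0, iid given s); the function names are illustrative. It shows that updating P_t one observation at a time gives the same posterior as multiplying all the iid likelihoods at once.

```python
import numpy as np

def gaussian_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def sequential_posterior(xs, prior=0.5, mu=0.5, sd=1.0):
    """Iterative Bayes' rule for a binary hidden variable s.

    Assumed generative model (illustrative): x_i ~ N(+mu, sd^2) if s = 1 and
    x_i ~ N(-mu, sd^2) if s = 0, i.i.d. given s. Returns [P_0, P_1, ..., P_t].
    """
    p = prior
    trajectory = [p]
    for x in xs:
        f1 = gaussian_pdf(x, mu, sd)    # likelihood under s = 1
        f0 = gaussian_pdf(x, -mu, sd)   # likelihood under s = 0
        z = f1 * p + f0 * (1 - p)       # normalizing constant Z_t
        p = f1 * p / z                  # P_t = f1(x_t) P_{t-1} / Z_t
        trajectory.append(p)
    return np.array(trajectory)

def batch_posterior(xs, prior=0.5, mu=0.5, sd=1.0):
    """Same posterior from the product of the i.i.d. likelihoods."""
    xs = np.asarray(xs)
    l1 = np.prod(gaussian_pdf(xs, mu, sd)) * prior
    l0 = np.prod(gaussian_pdf(xs, -mu, sd)) * (1 - prior)
    return l1 / (l1 + l0)

rng = np.random.default_rng(1)
xs = rng.normal(0.5, 1.0, size=20)      # data generated with s = 1
traj = sequential_posterior(xs)
print(traj[-1], batch_posterior(xs))
```

The sequential form is what makes the inference online: only P_{t-1} needs to be stored, not the whole data history.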

Information Control: How Much Data?

[Figure: decision tree; at each time step choose a1, a2, or wait]

Decision problem: produce not only a choice d but also a data size τ

+: more accurate

-: more time

Sequential Decisions: Bayes Risk Minimization

Decision policy: $\pi : \mathbf{x}_t \mapsto \{L, R, \text{cont}\}$

[Figure: at each step the policy chooses L/R or continue]

Loss function (Bayes risk): $\mathcal{L}(\pi) = c\,\mathbb{E}[\tau] + P(d \neq s)$

Optimal policy: minimizes total expected loss (delay and errors); c determines the trade-off

Algorithm: Bellman's Dynamic Programming Principle (1952); slow

$$V(P_t) = \min\big(Q_s(P_t),\, Q_c(P_t)\big) = \min\Big(Q_s(P_t),\ c + \mathbb{E}_{P_{t+1}(x_{t+1})}\big[V(P_{t+1}) \mid P_t\big]\Big)$$

Optimal Decision-Making in 2AFC Task

Wald & Wolfowitz (1948): hypothesis 1 (left) vs. hypothesis 2 (right)

Optimal policy

• accumulate evidence over time: Pr(left) versus Pr(right)

• stop if total evidence exceeds the "left" or "right" boundary

[Figure: evidence trajectories on trial 1 and trial 2 between the "left" and "right" boundaries]
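The Bellman recursion above can be solved numerically on a discretized belief state. This is a sketch under stated assumptions, not the authors' implementation: observations are binary with P(x=1|s=1) = q and P(x=1|s=0) = 1 − q, the stopping cost is taken to be Q_s(P) = min(P, 1 − P) (the error probability when choosing the more probable alternative, matching the P(d ≠ s) term), and `solve_2afc_policy` is a hypothetical name. The continuation region that it returns is the interval between the "left" and "right" boundaries.

```python
import numpy as np

def solve_2afc_policy(c=0.01, q=0.7, n_grid=1001, tol=1e-10, max_iter=10_000):
    """Value iteration for V(P) = min(Q_s(P), c + E[V(P') | P]).

    Assumptions (illustrative): binary observations with P(x=1|s=1) = q and
    P(x=1|s=0) = 1 - q; stopping cost Q_s(P) = min(P, 1 - P); belief grid of
    n_grid points with linear interpolation between grid points.
    Returns the grid, the value function, and the (lower, upper) boundaries.
    """
    p = np.linspace(0.0, 1.0, n_grid)
    q_stop = np.minimum(p, 1.0 - p)
    px1 = q * p + (1 - q) * (1 - p)     # predictive probability of x = 1
    px0 = 1.0 - px1
    post1 = q * p / px1                 # belief after observing x = 1
    post0 = (1 - q) * p / px0           # belief after observing x = 0

    def continue_cost(v):
        return c + px1 * np.interp(post1, p, v) + px0 * np.interp(post0, p, v)

    v = q_stop.copy()
    for _ in range(max_iter):
        v_new = np.minimum(q_stop, continue_cost(v))
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    cont = continue_cost(v) < q_stop    # beliefs where waiting is optimal
    return p, v, (p[cont].min(), p[cont].max())

p, v, (lower, upper) = solve_2afc_policy(c=0.01)
print(f"continue while {lower:.3f} <= P_t <= {upper:.3f}")
```

Raising the per-step cost c narrows the continuation region, which is the speed-accuracy trade-off controlled by c in the loss function.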


2-Alternative Forced Choice vs Go/NoGo

[Figure: experimental design, 2AFC task (left/right) vs. Go/NoGo task (go/nogo); panels: data error rate and RT (ms), model error rate and RT (time steps), for 2AFC and GNG by stimulus (A: NoGo/left, B: Go/right)]

(Data from Bacon-Macé et al., 2007)

• Subjects show a "go bias" in GNG: ⇑ FA/hit rates and ⇓ RT

• Does this imply fundamentally different neural/cognitive processing?

Rational Inference & Decision-Making

• shared (Bayesian) inference process & loss function

• asymmetric cost in GNG: Go terminates trial, NoGo does not

[Figure: shared model structure. Sensory processing: the stimulus (A or B) generates evidence x_1, …, x_t. Action selection: at each step the policy π_t(b_t) maps the belief b_t to {L, R, wait} in 2AFC and to {Go, wait} in GNG]

(Shenoy & Yu, NIPS, 2012)

Loss function (Bayes risk): $\mathcal{L}(\pi) = c\,\mathbb{E}[\tau] + P(d \neq s)$

Optimal Decision Threshold and Go Bias

• Optimal thresholds for 2AFC are constant, collapsing just before the deadline (Frazier & Yu, 2007)

• Optimal threshold for GNG is time-varying

✴ lower at trial start, reflecting the opportunity cost of waiting

✴ earlier Go responses: more hits and false alarms

[Figure: (A) optimal decision policy, belief vs. time with decision thresholds for 2AFC and GNG; (B) model error rate; (C) model RT; shown alongside data error rate and RT]

(Frazier & Yu, NIPS, 2008)
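The link between the asymmetric GNG cost and the go bias can be illustrated with backward induction over a discretized belief. This is a toy sketch, not the published model: `go_thresholds`, the binary observation model, and the parameter values are assumptions. The only asymmetry built in is the one stated on the slides: a Go response ends the trial, whereas NoGo is registered only by waiting until the deadline, so every step of waiting on a NoGo trial still incurs the time cost c.

```python
import numpy as np

def go_thresholds(task, c=0.01, q=0.7, deadline=20, n_grid=501):
    """Finite-horizon DP sketch contrasting 2AFC and Go/NoGo (GNG).

    Hypothetical setup: belief p = P(stimulus is the Go/'right' type), binary
    observations with P(x=1 | Go) = q and P(x=1 | NoGo) = 1 - q, and a cost c
    per time step of waiting.
    - 2AFC: either response may be made at any step (cost = error probability);
      a response is forced at the deadline.
    - GNG: Go costs 1 - p and ends the trial; NoGo is registered only by waiting
      until the deadline, which then costs p (a miss if the stimulus was Go).
    Returns, for t = 0..deadline-1, the smallest p at which Go/'right' is optimal.
    """
    p = np.linspace(0.0, 1.0, n_grid)
    px1 = q * p + (1 - q) * (1 - p)                 # P(x = 1 | belief p)
    px0 = 1.0 - px1
    post1, post0 = q * p / px1, (1 - q) * p / px0   # beliefs after x = 1 and x = 0
    go_cost = 1.0 - p
    if task == "2afc":
        respond_cost = np.minimum(go_cost, p)       # best of 'right' and 'left'
        allowed = p >= 0.5                          # where 'right' is the better response
        v = respond_cost.copy()                     # forced response at the deadline
    elif task == "gng":
        respond_cost = go_cost                      # only Go can be issued early
        allowed = np.ones_like(p, dtype=bool)
        v = p.copy()                                # implicit NoGo at the deadline
    else:
        raise ValueError("task must be '2afc' or 'gng'")
    thresholds = np.empty(deadline)
    for t in reversed(range(deadline)):
        q_wait = c + px1 * np.interp(post1, p, v) + px0 * np.interp(post0, p, v)
        go_region = allowed & (go_cost <= q_wait)
        thresholds[t] = p[go_region].min()
        v = np.minimum(respond_cost, q_wait)
    return thresholds

th_2afc = go_thresholds("2afc")
th_gng = go_thresholds("gng")
print(np.round(th_2afc, 3))
print(np.round(th_gng, 3))
```

Because waiting is never cheaper in GNG than in 2AFC, the GNG Go threshold can never exceed the 2AFC "right" threshold, so Go responses come earlier and on weaker evidence (more hits and false alarms). The exact time course of the thresholds depends on the assumed costs and is not meant to reproduce the published figure.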


Model Reproduces Go Bias

[Figure: data vs. model error rate and RT for 2AFC and GNG, by stimulus (A: NoGo, B: Go)]

• A rational decision-maker also exhibits the Go bias

• The Go bias is a natural consequence of the extra time cost of NoGo

• The Go bias need not imply cognitive/neural processing differences between 2AFC & GNG


Model Accounts for NoGo Frequency Effects

• Prediction: Go bias depends on P(NG)

• ↑P(NG) ⇒ ↓FA, ↑RT

• Frequent NoGo trials diminish the temporal advantage of the Go response

• Behavioral data confirm the prediction

[Figure: (A) data error rate on go and nogo trials and (B) data RT (ms) for go responses and false alarms (FA) vs. % NoGo trials (20, 50, 80); (C) model error rate and (D) model RT (time steps) for the same conditions]

(Data from Nieuwenhuis et al., 2007)


Model Prediction: Influence of Deadline (Shenoy & Yu, NIPS, 2012)

• 2AFC: not significantly affected by the deadline

• GNG: ↑ deadline ⇒ ↑ Go bias (↑FA/hits, ↓RT)

[Figure: (A) 2AFC choice accuracy (error rate by Left/Right response); (B) reaction times (time steps) for GNG and 2AFC; (C) GNG choice accuracy (error rate by NoGo/Go response); each for early vs. late deadline]


Experimental Paradigm: Stop Signal Task

[Figure: trial timeline; a go stimulus is followed, after a stop-signal delay, by an (infrequent) stop signal] (from Emeric et al., 2007)

⇑ stop-signal delay ➟ ⇑ stop errors

[Figure: inhibition functions (fraction of stop errors vs. SSD) and go RT distributions (cumulative fraction) for stop-trial frequencies r = 0.1, 0.34, 0.51, 0.69: optimal model and race approximation (time steps), and data (ms)]

Pradeep Shenoy & Angela J. Yu, University of California, San Diego

A. Go and stop stimulus identities (d and s) are inferred from sensory evidence and tracked through the trial as belief states p_d(t) and p_s(t).

B. The action selection policy π maps belief to a choice of going or waiting, so as to minimize overall expected cost. Stopping = repeated choice of the wait action.

C. Example belief → action mapping for one time step.

$\text{Loss} = c\langle t\rangle + c_s\, P\{\text{stop error}\} + P\{\text{go error}\}$

$\pi : (x_t, y_t) \mapsto \{\text{left}, \text{right}, \text{wait}\}$, equivalently $\pi : (p_d^t, p_s^t) \mapsto \{\text{left}, \text{right}, \text{wait}\}$

Stimulus frequency influences stop-go tradeoff

As stop trial frequency (r) increases, go RT increases and subjects make fewer stop errors (A, C) (data from [5], rhesus macaque).

The model predicts these changes (B, D); increased stop trial frequency leads to:
✦ faster processing of stop signals
✦ trading off go RT and stop error costs (see loss fn)

Inhibitory Control and Norepinephrine

Norepinephrine = unexpected uncertainty = P(stop)

We propose that NE reports unexpected uncertainty, i.e., uncertainty about task context (including stimulus/response/reward mappings). Since the stop signal represents a change in context, NE levels correspond to an internal estimate of P(stop).

✦ Go RT and SSRT are traded off under the influence of atomoxetine (A), in a dose-dependent manner (C) (data from [7,8]).
✦ Atomoxetine increases NE levels and, in our model, subjects' internal P(stop).
✦ Model predictions of going and stopping performance reproduce the observed behavior (B, D).


Humans adjust optimally to sequential experience

A. Subjects sequentially estimate the prior probability r_t of an upcoming stop trial.
✦ α = probability of reset, similar to the dynamic belief model (Yu & Cohen [6]).

B. Subjects' internal sequential prior fluctuates with experienced trial history.
✦ It can be inferred using the model shown in (A).

C, D. Recovered P(stop) values strongly affect go RT and stop error rate. This approximately linear tradeoff matches the optimal stop-go tradeoff (red line). (Unpublished behavioral data from humans.)

Race model approximation of optimal behavior

Neural activity in stop tasks suggests race-like neural mechanisms (see, e.g., [9]).

We study a drift-diffusion implementation of the race model (A) as a possible mechanism underlying optimal decision-making. The 4 free parameters were chosen to best approximate the optimal model's behavior.

Our model fits suggest:
✦ Optimal behavior can be approximated by race-like implementations.
✦ SSRT decreases as stop trials become more prevalent.
✦ Rate, threshold, and offset change in systematic ways, testable with recordings from stopping-related neural populations.
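A drift-diffusion implementation of the race can be sketched as two independent accumulators. The parameterization here is hypothetical (the poster does not list its 4 fitted parameters): `race_inhibition_function`, the go and stop drift rates, the shared threshold, the stop-process offset, and the noise level are illustrative assumptions. The sketch computes the inhibition function, i.e. the stop-error rate as a function of SSD in time steps.

```python
import numpy as np

def race_inhibition_function(ssds, v_go=0.1, v_stop=0.3, threshold=2.0,
                             stop_offset=2, noise=0.2, max_steps=200,
                             n_trials=4000, seed=0):
    """Inhibition function of a race between two independent diffusions (sketch).

    Hypothetical parameterization (not the fitted model): the go accumulator
    starts at step 0; the stop accumulator starts at step SSD + stop_offset.
    Each adds its drift plus Gaussian noise per time step and finishes when it
    first reaches `threshold`. A stop error occurs when go finishes first.
    Returns the stop-error rate for each SSD (in time steps).
    """
    rng = np.random.default_rng(seed)

    def finish_times(drift):
        increments = drift + noise * rng.standard_normal((n_trials, max_steps))
        crossed = np.cumsum(increments, axis=1) >= threshold
        # first crossing step (1-based); processes that never cross get +inf
        return np.where(crossed.any(axis=1), crossed.argmax(axis=1) + 1, np.inf)

    go_t = finish_times(v_go)
    stop_t = finish_times(v_stop)   # measured from the stop process's own onset
    return np.array([np.mean(go_t < ssd + stop_offset + stop_t) for ssd in ssds])

ssds = list(range(0, 41, 5))
errs = race_inhibition_function(ssds)
print(np.round(errs, 3))
```

Because the same simulated finishing times are reused for every SSD, the inhibition function is nondecreasing in SSD by construction, matching the qualitative pattern ⇑ stop-signal delay ➟ ⇑ stop errors.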

Inhibitory control is the ability to modify or withhold actions in response to changing environmental statistics and task demands.
✦ Deficits are associated with brain disorders such as ADHD, substance abuse, and OCD.
✦ It is studied using paradigms such as the stop signal task, with behavioral, neural, and pharmacological experiments.

Stop signal task [1]:
✦ A go action (e.g., a 2AFC response) is occasionally interrupted by a stop signal, instructing cancellation of the go response (A).
✦ The race model proposes a race between the finishing times of independent go and stop processes (B), and postulates a stop signal reaction time (SSRT).
✦ SSRT is used as an index of inhibitory ability (e.g., longer in ADHD [2]) and is affected by atomoxetine (a norepinephrine reuptake inhibitor used in treating ADHD).

Goal: investigate the role of NE in inhibitory control using our optimal decision-making model [3] for the stop signal task.
✦ Hypothesis: NE reports subjects' internal unexpected uncertainty, i.e., uncertainty about the likelihood of a context-changing stop signal.

A. Stop Signal Task
B. Stop signal reaction time (race model)

Our model explains the influence of factors such as reward on stopping behavior [3], and fMRI correlates of stimulus anticipation in the task (Poster [4]).

Optimal decision-making for the stop task

A. Generative model for sensory evidence; B. Optimal action selection; C. Example policy at time t

[Figure labels: loss minimized by the policy; belief axes p_d(t) and p_s(t)]

A. Data: Go RT cumulative dist. B. Model: Go RT cumulative dist.
C. Data: Stop error rates D. Model: Stop error rates

A. Sequential estimation of P(stop) B. Example estimation sequence

$$P(r_k \mid \mathbf{s}_k) \propto P(s_k \mid r_k)\Big((1 - \alpha)\, P(r_{k-1} \mid \mathbf{s}_{k-1}) + \alpha\, P_0(r_k)\Big)$$

C. Behavioral data: Go RT D. Behavioral data: stop error rate
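The sequential estimate of P(stop) in the equation above can be implemented on a grid over r. This is a sketch under assumptions: `dbm_predictions` is a hypothetical name, the reset prior P0 is taken to be Beta(a, b)-shaped with illustrative a and b, α is the reset probability as on the poster, and trials are coded 0 (go) and 1 (stop). It returns the predicted P(stop) before each trial and the unsigned prediction error |outcome − P(stop)|.

```python
import numpy as np

def dbm_predictions(trials, alpha=0.2, a=2.0, b=5.0, n_grid=200):
    """Dynamic Belief Model (DBM) sketch for trial-by-trial P(stop).

    Assumptions (illustrative): r_k is the stop-trial probability on trial k;
    with probability alpha it is redrawn from a Beta(a, b)-shaped prior P0,
    otherwise it keeps its previous value. trials holds 0 (go) / 1 (stop).
    Returns the predicted P(stop) before each trial and |outcome - P(stop)|.
    """
    r = (np.arange(n_grid) + 0.5) / n_grid              # grid over r in (0, 1)
    p0 = r ** (a - 1) * (1 - r) ** (b - 1)
    p0 /= p0.sum()
    posterior = p0.copy()
    predictions, surprise = [], []
    for s in trials:
        prior_k = (1 - alpha) * posterior + alpha * p0   # reset with probability alpha
        p_stop = np.sum(r * prior_k)                      # predictive P(stop)
        predictions.append(p_stop)
        surprise.append(abs(s - p_stop))
        likelihood = r if s == 1 else 1 - r               # P(s_k | r_k)
        posterior = likelihood * prior_k
        posterior /= posterior.sum()
    return np.array(predictions), np.array(surprise)

trials = [0] * 10 + [1] * 5
preds, surprise = dbm_predictions(trials)
print(np.round(preds, 3))
```

The reset term keeps the estimate responsive: a run of go trials pulls P(stop) down, and a run of stop trials pulls it back up within a few trials.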

Discussion & Conclusion

✦ Optimal decision-making explains the influence of stop trial frequency on inhibitory control in the stop signal task.
✦ Humans adjust behavior optimally on a trial-to-trial basis to account for changing stimulus expectations.
✦ Norepinephrine may represent subjects' internal expectation of task-relevant stimuli in the task, accounting for its effect on inhibitory control.
✦ We are exploring rational models for other inhibitory control tasks such as the Go-NoGo task (see poster on Wednesday [10]), and connections between inhibitory ability measures in these tasks.

Check out our other posters! [4,10]

References

[1] Logan & Cowan (1984). Psychological Review, 91(3).
[2] Lipszyc & Schachar (2010). J. Int. Neuropsychol. Soc., 1(1).
[3] Shenoy & Yu (2011). Front. Hum. Neurosci., 5(48).
[4] Ide, Shenoy, Yu & Li (SfN 2011 poster): Monday 11am (403.20/YY24).
[5] Emeric et al. (2007). Vision Res., 47(1).
[6] Yu & Cohen (2009). Advances in Neural Information Processing Systems, 21.
[7] Bari et al. (2009). Psychopharmacology, 205(2).
[8] Eagle et al. (2008). Psychopharmacology, 199(3).
[9] Hanes, Patterson & Schall (1998). J. Neurophysiol., 79.
[10] Shenoy & Yu (SfN 2011 poster): Wednesday 2pm (931.01/WW57).

(Shenoy & Yu, Frontiers in Human Neuroscience, 2011)

Sensory Processing = Bayesian Inference

Action Selection = Sequential Decision-Making

[Figure: decision tree over time steps (20 ms, 40 ms, 60 ms); at each step choose L, R, or wait; +: more accurate, -: more time]

Decision Policy π: (x_1, …, x_t) ⇒ {go(L), go(R), wait}

Model: Optimal Action Selection

Behavior cost function (expected cost = time cost + stop error (non-canceled) + go error (wrong response) + go error (deadline)):

$$L_\pi = c\langle\tau\rangle + c_s\, r\, P(\tau < D \mid s = 1) + (1 - r)\, P(\tau < D,\ \delta \neq d \mid s = 0) + (1 - r)\, P(\tau = D \mid s = 0)$$

d: true target; s: stop trial; δ: chosen target; τ: response time; D: deadline; r: frequency of stop trials

Policy: (x_1, y_1), …, (x_t, y_t) ⇒ {go(L), go(R), wait}

Objective: minimize expected (average) cost

Optimal Decision Policy Maps Belief State ⇒ Go & Wait Regions

[Figure: go (L), go (R), and wait regions as a function of the go-stimulus belief and P(stop)]

Exact dynamic programming (discretized belief state)

Model-Data Comparison: Effects of SSD

Longer stop signal delay results in more errors

[Figure: data, error vs. SSD (from Emeric et al., 2007); model, error vs. SSD]


Reward/Motivation ⇒ Stopping Behavior

[Figure: stop error rate, go reaction time, and stopping latency (SSRT). Data (from Leotti & Wager, 2009): fraction of stop errors, go RT (ms), and SSRT (ms) under Go Bias, No Bias, and Stop Bias conditions. Model: the same measures for the optimal model (Opt) and the race approximation (Race) under low, medium, and high stop cost (time steps)]


Stimulus Frequency ⇒ Stopping Behavior

[Figure: stop error rate (fraction of errors vs. SSD) and cumulative go reaction time distributions for stop-trial frequencies 0.1, 0.34, 0.51, and 0.69; model (time steps) and data (ms; Emeric et al., 2007)]

Trial-to-Trial Expectation: Behavioral Data

[Figure: Dynamic Belief Model (DBM) applied to a sequence of stop and go trials, showing the prior P(stop) and the prediction error across trials; stop error fraction vs. SSD (data in ms, model in time steps) for r = 0.1, 0.34, 0.51, 0.69]

[Figure panels: Go RT vs. stop expectancy; stop error rate vs. stop expectancy]

(Ide, Shenoy, Yu*, & Li*, J. Neuroscience, 2013)

(Yu & Cohen, NIPS, 2009)

Neural Encoding of Probabilistic Prediction (Ide, Shenoy, Yu*, & Li*, J. Neuroscience, 2013)

pre-SMA encodes the Bayesian posterior prediction: P(stop | observation)

[Figure: sample sequence of prior P(stop) and prediction error across trials; mean-subtracted P(stop) from the Bayesian model and pre-SMA fMRI signal for low, medium, and high P(stop); T-value scale 3.9 to 6]

Neural Encoding of Stimulus Prediction Error (Ide, Shenoy, Yu*, & Li*, J. Neuroscience, 2013)

dACC encodes Bayesian surprise: |outcome − P(stop)|

[Figure: sample sequence; Bayesian surprise from the model and dACC fMRI signal (PSC) on go and stop trials for low, medium, and high P(stop); T-value scale 5.1 to 6.2]

Neural Alterations Due to Stimulant Use (Harlé, Shenoy, Stewart, Tapert, Yu*, & Paulus*, 2013, under review)

• Reduced prediction error (PE) response in dACC in OSU

• Deficit correlated with lifetime cocaine use

• Despite similar behavioral adjustments in OSU & CS


Active Vision (Search) & Prior Expectation (Yu, Huang, Shenoy, & Schultz, 2013, under review)

[Figure: (a) search display with one target among three locations; target-location frequency ratios 1:3:9 vs. 1:1:1; (b, c) accuracy and search time (ms) in the 1:3:9 vs. 1:1:1 conditions; (d) fixation distribution (1:3:9): fraction of 1st and 2nd fixations to the locations with frequency 1, 3, and 9]

At each time step, the subject has 3 possible actions:

• Stay: keep observing the current patch

• Switch: change to another location

• Respond: choose the current patch as the target

Bayesian Inference & Decision-Making (Ahmad & Yu, UAI, 2013)

Bayesian inference: data ⇒ P(target location | data), P_t

Decision policy π: P_t → {respond, fixate l_1, fixate l_2, …}

Loss function:

Optimal policy minimizes expected cost:

Context-Dependent Active Controller (C-DAC)

Algorithm: exact DP; RBF- and GP-approximations of Q-factors
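The inference step, data ⇒ P(target location | data), can be sketched for a three-location display. This covers only the belief update, not the C-DAC policy itself, and the observation model is an assumption: fixating location j yields a binary sample that is 1 with probability β if j holds the target and 1 − β otherwise. `update_location_belief` and β are hypothetical; the 1:3:9 spatial prior is taken from the search experiment.

```python
import numpy as np

def update_location_belief(belief, fixated, x, beta=0.7):
    """Posterior over target location after one observation (sketch).

    Assumed observation model (illustrative): fixating location j yields a
    binary sample x that is 1 with probability beta if j is the target and
    1 - beta otherwise. belief[k] = P(target at location k | data so far).
    """
    belief = np.asarray(belief, dtype=float)
    is_target = np.arange(belief.size) == fixated
    p_x1 = np.where(is_target, beta, 1 - beta)   # P(x = 1 | target at k)
    likelihood = p_x1 if x == 1 else 1 - p_x1
    posterior = likelihood * belief
    return posterior / posterior.sum()

# Spatial prior from the 1:3:9 condition, normalized to sum to 1
prior = np.array([1.0, 3.0, 9.0]) / 13.0
b1 = update_location_belief(prior, fixated=2, x=1)   # target-like sample at location 3
b2 = update_location_belief(b1, fixated=2, x=0)      # distractor-like sample at location 3
print(np.round(b1, 3), np.round(b2, 3))
```

A policy such as C-DAC would then map the updated belief P_t to the next action (respond, or fixate another location), trading off the time cost, switch cost, and error cost.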

C-DAC Decision Policy

• Blue: stop, green: fixate location 1, orange: fixate location 2, maroon: fixate location 3

• Sensitive to behavioral params: time cost, SNR, switch cost

• Produces fixation location as well as duration

• Outperforms greedy MAP (Najemnik & Geisler, 2005) and InfoMax (Butko & Movellan, 2010) in speed + accuracy

(Ahmad & Yu, UAI, 2013)

Spatial Prior: C-DAC & Human Behavior (Ahmad, Huang, & Yu, 2013, under review)

• Spatial statistics affect data interpretation and collection in both human subjects and the C-DAC model

• More likely to identify the 9-location as target than the 1- and 3-locations (hits & FAs)

• Faster to identify the 9-location as target, but slower if it is a distractor

[Figure: 9 vs. 1&3 locations, as target (T) and as distractor (D)]

Bonus Application

• Human active learning

Multi-Armed Bandit Problem (Zhang & Yu, CogSci, 2013)

• Subjects pull an arm on each trial (win/lose); 20 trials

• Exploration vs. exploitation trade-off

• Optimal policy (DP) is computationally intensive

• Heuristic policies
✴ win-stay-lose-shift
✴ ε-greedy
✴ ε-infomax
✴ "knowledge gradient"

[Figure: (left) agreement with people's choices for WSLS, ε-greedy (eG), Optimal, ε-infomax (eINFO), and KG; (right) model agreement with the optimal policy for WSLS, eG, eINFO, and KG; each under basic learning and meta-learning]

KG: Myopic Approximation to Optimal Bayes (Zhang & Yu, CogSci, 2013)

$$v^{KG,t}_k = \mathbb{E}\Big[\max_{k'} \hat\theta^{t+1}_{k'} \,\Big|\, D^t = k,\ B^t\Big] - \max_{k'} \hat\theta^{t}_{k'}$$

Incremental value of one exploitation sample

$$D^{KG,t,g} = \arg\max_k\ \hat\theta^{t,g}_k + (T - t - 1)\, v^{KG,t,g}_k$$

Horizon determines value of exploitation vs. exploration

• KG: best account of human choice

• KG: closest to optimal policy
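The KG rule above can be computed in closed form for win/lose bandits if each arm's reward rate has a Beta posterior. This is a sketch under that assumption (Beta-Bernoulli beliefs, t counted from 0, a single game so the index g is dropped); `knowledge_gradient` and `kg_choice` are hypothetical names. For each arm it evaluates the expected best posterior mean after one more pull of that arm, minus the current best mean, then weights that value by the remaining horizon T − t − 1.

```python
import numpy as np

def knowledge_gradient(alpha, beta):
    """KG value of one more sample from each Bernoulli arm (sketch).

    alpha[k], beta[k]: Beta posterior parameters for arm k (assumed model).
    v_k = E[max_k' theta_hat'_{k'} | pull k] - max_k' theta_hat_{k'}.
    Returns the posterior means theta_hat and the KG values v.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    theta = alpha / (alpha + beta)                      # posterior means
    best = theta.max()
    v = np.empty_like(theta)
    for k in range(theta.size):
        others = np.delete(theta, k)
        best_other = others.max() if others.size else -np.inf
        win = (alpha[k] + 1) / (alpha[k] + beta[k] + 1)     # mean after a win
        lose = alpha[k] / (alpha[k] + beta[k] + 1)          # mean after a loss
        expected_best = (theta[k] * max(win, best_other)
                         + (1 - theta[k]) * max(lose, best_other))
        v[k] = expected_best - best
    return theta, v

def kg_choice(alpha, beta, t, horizon):
    """D^KG = argmax_k theta_hat_k + (T - t - 1) v_k, with t counted from 0."""
    theta, v = knowledge_gradient(alpha, beta)
    return int(np.argmax(theta + (horizon - t - 1) * v))

# Arm 0 is well sampled (mean 0.6); arm 1 is barely sampled (mean 0.5).
theta, v = knowledge_gradient([12, 1], [8, 1])
print(theta, v)
print(kg_choice([12, 1], [8, 1], t=0, horizon=20),
      kg_choice([12, 1], [8, 1], t=19, horizon=20))
```

Early in a 20-trial game the uncertain arm wins because its information value is multiplied by many remaining trials; on the last trial the horizon term vanishes and KG simply exploits the arm with the higher mean.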

Summary

• A unified framework for cognitive control: Bayesian inference + control (sequential decision-making)

• Explains

✴ speed-accuracy trade-off in perception
✴ go bias in GNG compared to 2AFC
✴ effects of difficulty/reward/prior in inhibitory control
✴ dynamics of visual search; influence of spatial prior
✴ exploration-exploitation tradeoff in active learning

• Guides neural data analysis and experimental design

Relevance to B.R.A.I.N. Initiative

Cognitive models guide neural data analysis and experimental design

David Marr (1982): Three Levels of Analysis

Computation (why?)
• goals of computation
• why things work the way they do

Algorithm (how?)
• representation of input/output
• how one is transformed into the other

Implementation (what?)
• physical realization of the computations
• neural representation and dynamics

[Figure: illustrations for each level, including speed, accuracy, and opportunity cost; Q-factors over belief b and time t; and an fMRI T-value map]

Thanks to...

• Yu Lab
✤ Pradeep Shenoy
✤ Sheeraz Ahmad
✤ Crane Huang
✤ Joseph Schilz

• Collaborators
✤ Peter Frazier, Savas Dayanik
✤ Jaime Ide, Chiang-Shan Li, Katie Harlé, Martin Paulus

• You all
