What’s optimal about N choices?
Tyler McMillen & Phil Holmes,
PACM/CSBMB/Conte Center,
Princeton University.
Banbury, Bunbury, May 2005 at CSH.
Thanks to NSF & NIMH.
Neuro-inspired decision-making models*
1. The two-alternative forced-choice task (2-AFC). Optimal decisions: SPRT, LAM and DDM**.
2. Optimal performance curves.
3. MSPRT: an asymptotically optimal scheme for n > 2 choices (Dragalin et al., 1990-2000).
4. LAM realizations of n-AFC; mean RT vs ER; Hick’s law.
5. Summary
(the maximal order statistics)
* Optimality viewpoint: maybe animals can’t do it, but they can’t do better.
** Sequential probability ratio test, leaky accumulator model, drift-diffusion model.
2-AFC, SPRT, LAM & DDM
Choosing between 2 alternatives with noisy incoming data drawn from one of two densities, p1(x) or p2(x).
Set thresholds +Z, -Z and form the running product of likelihood ratios, R_n = prod_{i=1..n} p2(x_i)/p1(x_i), i.e. the accumulated log likelihood ratio log R_n = sum_i log[p2(x_i)/p1(x_i)].
Decide 1 (resp. 2) when log R_n first falls below -Z (resp. exceeds +Z).
Theorem (Wald, 1947; Barnard, 1946): SPRT is optimal among fixed or variable sample size tests in the sense that, for a given error rate (ER), expected # samples to decide is minimal. (Or, for given # samples, ER is minimal.)
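The Wald/Barnard result can be illustrated with a minimal simulation sketch, assuming two equal-variance Gaussian hypotheses (the function name and all parameter values here are illustrative, not from the slides; this sketch uses the common convention that crossing +z selects hypothesis 1, the mirror image of the sign convention above):

```python
import random

def sprt(stream, z, mu1=1.0, mu2=-1.0, sigma=1.0):
    """Wald's SPRT for two Gaussian hypotheses H1: N(mu1, sigma^2),
    H2: N(mu2, sigma^2). Accumulate the log likelihood ratio
    log[p1(x)/p2(x)] and stop at the first crossing of +/- z."""
    log_lr = 0.0
    for n, x in enumerate(stream, start=1):
        # log p1(x) - log p2(x) for equal-variance Gaussians
        log_lr += (mu1 - mu2) * (x - 0.5 * (mu1 + mu2)) / sigma**2
        if log_lr >= z:
            return 1, n          # decide H1
        if log_lr <= -z:
            return 2, n          # decide H2
    return 0, len(stream)        # sample budget exhausted: no decision

random.seed(1)
samples = [random.gauss(1.0, 1.0) for _ in range(1000)]  # data from H1
choice, n_samples = sprt(samples, z=3.0)
print(choice, n_samples)
```

Raising z trades more samples for a lower error rate, which is exactly the trade-off the theorem says SPRT makes optimally.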
DDM is the continuum limit of SPRT. Let the sampling rate grow: log R_n becomes a drift-diffusion process
dx = a dt + c dW, with drift a and noise strength c,
integrated until x first crosses one of the decision thresholds +Z, -Z.
[Movie: DDM sample paths crossing the thresholds.]
Extensive modeling of behavioral data (Stone, Laming, Ratcliff et al., ~1960-2005).
There’s also increasing neural evidence for DDM:
FEF: Schall, Stuphorn & Brown, Neuron, 2002.
LIP: Gold & Shadlen, Neuron, 2002.
Balanced LAM reduces to DDM on an invariant line. The linearized 2-unit LAM is
dx1 = (-k x1 - w x2 + I1) dt + c dW1,
dx2 = (-k x2 - w x1 + I2) dt + c dW2
(a race model if w = 0). Uncouple via y1 = x1 + x2, y2 = x2 - x1:
a stable OU flow in y1 if k + w is large, pure drift-diffusion in y2 if k = w (balanced).
Absolute thresholds in (x1, x2) become relative thresholds on the difference (x2 - x1)!
LAM sample paths collapse towards an attracting invariant manifold. (cf. C. Brody: Machens et al., Science, 2005)
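The collapse onto the attracting manifold can be checked numerically. A rough Euler-Maruyama sketch for a balanced, linearized 2-unit LAM (all parameter values are illustrative assumptions, not taken from the slides):

```python
import math
import random

# Euler-Maruyama simulation of a linearized, balanced 2-unit LAM:
#   dx_i = (-k*x_i - w*x_j + I_i) dt + c dW_i,  with k = w (balanced).
# The sum mode y1 = x1 + x2 is a stable OU process attracted to
# (I1 + I2)/(k + w); the difference x2 - x1 is a pure drift-diffusion.
k = w = 2.0
I1, I2 = 1.2, 1.0
c, dt, T = 0.1, 0.001, 5.0
random.seed(0)

def simulate():
    """Run one trial and return the final value of the sum x1 + x2."""
    x1 = x2 = 0.0
    for _ in range(int(T / dt)):
        s = math.sqrt(dt)
        dx1 = (-k * x1 - w * x2 + I1) * dt + c * s * random.gauss(0, 1)
        dx2 = (-k * x2 - w * x1 + I2) * dt + c * s * random.gauss(0, 1)
        x1, x2 = x1 + dx1, x2 + dx2
    return x1 + x2

y1_star = (I1 + I2) / (k + w)   # attracting value of the sum mode
final_sums = [simulate() for _ in range(50)]
mean_sum = sum(final_sums) / len(final_sums)
print(round(y1_star, 3), round(mean_sum, 3))
```

Across trials the sum mode hugs y1_star while the difference mode keeps diffusing — the one-dimensional DDM picture.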
First passage across threshold determines choice.
Simple expressions for first passage times and ERs:
ER = 1/(1 + exp(2aZ/c^2)), DT = (Z/a) tanh(aZ/c^2).
Reduction to 2 parameters, the signal-to-noise ratio ã = (a/c)^2 and the normalized threshold z̃ = Z/a:
ER = 1/(1 + exp(2 ã z̃)), DT = z̃ tanh(ã z̃). (1)
Can compute thresholds that maximize reward rate (Gold-Shadlen, 2002; Bogacz et al., 2004-5). This leads to …
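A sketch of how the closed forms feed the threshold optimization. The reward-rate definition used here, RR = (1 - ER)/(DT + D) with a fixed inter-trial delay D, is one common choice and may differ in detail from the slides’ definition; parameter values are illustrative:

```python
import math

# Closed-form DDM error rate and mean decision time for drift a,
# noise c, symmetric thresholds +/- z:
#   ER = 1/(1 + exp(2az/c^2)),   DT = (z/a) tanh(az/c^2)
def er(a, c, z):
    return 1.0 / (1.0 + math.exp(2.0 * a * z / c**2))

def dt(a, c, z):
    return (z / a) * math.tanh(a * z / c**2)

def rr(a, c, z, D):
    """Reward rate: correct responses per unit time, with delay D."""
    return (1.0 - er(a, c, z)) / (dt(a, c, z) + D)

# Crude grid search for the reward-rate-maximizing threshold.
a, c, D = 1.0, 1.0, 2.0
zs = [i * 0.001 for i in range(1, 5000)]
z_opt = max(zs, key=lambda z: rr(a, c, z, D))
print(round(z_opt, 3), round(er(a, c, z_opt), 4), round(dt(a, c, z_opt), 4))
```

Note the optimum is interior: too low a threshold wastes trials on errors, too high a threshold wastes time per trial.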
Optimal performance curves (OPCs): human behavioral data. The best performers are optimal, but what about the rest? A bad objective function, or bad learners?
Left: RR as defined previously; right: a family of RRs weighted for accuracy (arrow: increasing accuracy weight). Learning is not considered here. (Bogacz et al., 2004; Simen, 2005.)
N-AFC: MSPRT & LAM
MSPRT chooses among n alternatives by a max vs. next test: accumulate the log likelihood L_i of each alternative and stop when the leading L_i first exceeds the runner-up by a threshold, choosing the leader.
MSPRT is asymptotically optimal in the sense that # samples is minimal in the limit of low ERs (Dragalin et al, IEEE trans., 1999-2000).
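A toy sketch of a max-vs-next stopping rule on n noisy accumulators. This captures the flavor of the test, not Dragalin’s exact construction; the Gaussian evidence model and all parameter values are illustrative assumptions:

```python
import random

def max_vs_next(n, z, target=0, mu=1.0, max_steps=100000, rng=random):
    """Max-vs-next stopping rule: n evidence accumulators, where
    channel `target` has positive drift mu and the rest drift 0.
    Stop when the leader exceeds the runner-up by z."""
    L = [0.0] * n
    for t in range(1, max_steps + 1):
        for i in range(n):
            L[i] += (mu if i == target else 0.0) + rng.gauss(0.0, 1.0)
        ranked = sorted(range(n), key=lambda i: L[i], reverse=True)
        if L[ranked[0]] - L[ranked[1]] >= z:
            return ranked[0], t   # (chosen alternative, # samples)
    return -1, max_steps          # no decision within the budget

random.seed(2)
choice, steps = max_vs_next(n=4, z=8.0)
print(choice, steps)
```

Sweeping n at fixed z in a sketch like this is one way to see the log(n-1) growth of mean decision time discussed next.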
A LAM realization of MSPRT (Usher-McClelland, 2001) asymptotically predicts mean RT scaling as log(n-1) (cf. Usher et al., 2002).
The log(n-1) dependence is similar to Hick’s Law: RT = A + B log n, or RT = B log(n+1).
W.E. Hick, Q.J. Exp. Psych, 1952.
We can provide a theoretical basis and predict explicit SNR and ER dependence in the coefficients A, B.
Multiplicative constants blow up logarithmically as ER -> 0.
Behavior for small and larger ERs: an empirical formula, generalizing (1). (2)
But a running max vs next test is computationally costly (?). LAM can approximately execute a max vs average test via absolute thresholds. The n-unit LAM is decoupled by separating the average mode y1 = (x1 + … + xn)/n from the difference modes.
y1 is attracted to the hyperplane y1 = A, so max vs average becomes an absolute test!
Attraction is faster for larger n: the stable eigenvalue λ1 ~ n.
[Figure: drift-diffusion on the attracting hyperplane.]
Max vs average is not optimal, but it’s not so bad:
[Figures: performance of the absolute, max vs average, and max vs next tests.]
Unbalanced LAMs - OU processes
Max vs next and max vs average coincide for n = 2. As n increases, max vs average deteriorates, approaching absolute-test performance. But it’s still better for n < 8-10!
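This qualitative comparison can be probed with a toy simulation that reads all three stopping rules off the same accumulators (thresholds, drifts, and trial counts are illustrative assumptions; in a careful comparison each rule’s threshold would be tuned separately to equalize ER):

```python
import random

def run_trial(n, z, mu=1.0, rng=random, max_steps=10000):
    """One evidence stream; three stopping rules watch the same
    accumulators: absolute (leader crosses z), max vs average,
    and max vs next. Returns {rule: (choice, steps)}."""
    L = [0.0] * n
    out = {}
    for t in range(1, max_steps + 1):
        for i in range(n):
            L[i] += (mu if i == 0 else 0.0) + rng.gauss(0.0, 1.0)
        top = max(range(n), key=L.__getitem__)
        second = max(L[j] for j in range(n) if j != top)
        avg = sum(L) / n
        if "absolute" not in out and L[top] >= z:
            out["absolute"] = (top, t)
        if "max_vs_avg" not in out and L[top] - avg >= z:
            out["max_vs_avg"] = (top, t)
        if "max_vs_next" not in out and L[top] - second >= z:
            out["max_vs_next"] = (top, t)
        if len(out) == 3:
            break
    return out

random.seed(3)
results = [run_trial(n=4, z=6.0) for _ in range(200)]
# Fraction of trials on which each rule picked the true target (0).
acc = {r: sum(res.get(r, (-1, 0))[0] == 0 for res in results) / len(results)
       for r in ("absolute", "max_vs_avg", "max_vs_next")}
print(acc)
```

Repeating this over a range of n shows max vs average tracking max vs next for small n and drifting toward absolute-test behavior as n grows.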
Simple LAM/DD predicts log (n-1), not log n or log (n+1) as in Hick’s law:
but a distribution of starting points gives approximately log n scaling for 2 < n < 8, and ER and SNR effects may also enter.
The effect of nonlinear activation functions, bounded below, is to shift scaling toward linear in n:
The limited dynamic range degrades performance, but can be offset by suitable bias (recentering).
[Figure legends: nonlinear LAMs; linearized LAM.]
Summary: N-AFC
• MSPRT max vs next test is asymptotically optimal in the low-ER limit.
• LAM (& race model) can perform the max vs next test.
• Hick’s law, RT = A + B log n, emerges for max vs next, max vs average & absolute tests. A, B are smallest for max vs next, OK for max vs average.
• LAM executes a max vs average test on its attracting hyperplane using absolute thresholds.
• Variable start points give log n scaling for ‘small n.’
• Nonlinear LAMs degrade performance: RT ~ n for sufficiently small dynamic range.
More info: http://mae.princeton.edu/people/e21/holmes/profile.html