What’s optimal about N choices?
Tyler McMillen & Phil Holmes,
PACM/CSBMB/Conte Center,
Princeton University.
Banbury, Bunbury, May 2005 at CSH.
Thanks to NSF & NIMH.
Neuro-inspired decision-making models*
1. The two-alternative forced-choice task (2-AFC). Optimal decisions: SPRT, LAM and DDM**.
2. Optimal performance curves.
3. MSPRT: an asymptotically optimal scheme for n > 2 choices (Dragalin et al., 1990-2000).
4. LAM realizations of n-AFC; mean RT vs ER; Hick’s law.
5. Summary
(the maximal order statistics)
* Optimality viewpoint: maybe animals can’t do it, but they can’t do better.
** Sequential probability ratio test, leaky accumulator model, drift-diffusion model.
2-AFC, SPRT, LAM & DDM
Choosing between 2 alternatives with noisy incoming data drawn from one of two densities, p1(x) or p2(x).
Set thresholds +Z, -Z and form the running product of likelihood ratios, R_n = prod_{i=1..n} p2(x_i)/p1(x_i), i.e. the accumulated log likelihood ratio log R_n = sum_i log[p2(x_i)/p1(x_i)].
Decide 1 (resp. 2) when log R_n first falls below -Z (resp. exceeds +Z).
Theorem (Wald, 1947; Barnard, 1946): SPRT is optimal among fixed or variable sample size tests in the sense that, for a given error rate (ER), expected # samples to decide is minimal. (Or, for given # samples, ER is minimal.)
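The Wald/Barnard result can be illustrated with a minimal simulation sketch, assuming two equal-variance Gaussian hypotheses (the function name and all parameter values here are illustrative, not from the slides; this sketch uses the common convention that crossing +z selects hypothesis 1, the mirror image of the sign convention above):

```python
import random

def sprt(stream, z, mu1=1.0, mu2=-1.0, sigma=1.0):
    """Wald's SPRT for two Gaussian hypotheses H1: N(mu1, sigma^2),
    H2: N(mu2, sigma^2). Accumulate the log likelihood ratio
    log[p1(x)/p2(x)] and stop at the first crossing of +/- z."""
    log_lr = 0.0
    for n, x in enumerate(stream, start=1):
        # log p1(x) - log p2(x) for equal-variance Gaussians
        log_lr += (mu1 - mu2) * (x - 0.5 * (mu1 + mu2)) / sigma**2
        if log_lr >= z:
            return 1, n          # decide H1
        if log_lr <= -z:
            return 2, n          # decide H2
    return 0, len(stream)        # sample budget exhausted: no decision

random.seed(1)
samples = [random.gauss(1.0, 1.0) for _ in range(1000)]  # data from H1
choice, n_samples = sprt(samples, z=3.0)
print(choice, n_samples)
```

Raising z trades more samples for a lower error rate, which is exactly the trade-off the theorem says SPRT makes optimally.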
DDM is the continuum limit of SPRT. Let the sampling rate grow: log R_n becomes a drift-diffusion process
dx = a dt + c dW, with drift a and noise strength c,
integrated until x first crosses one of the decision thresholds +Z, -Z.
[Movie: DDM sample paths crossing the thresholds.]
Extensive modeling of behavioral data (Stone, Laming, Ratcliff et al., ~1960-2005).
There’s also increasing neural evidence for DDM:
FEF: Schall, Stuphorn & Brown, Neuron, 2002.
LIP: Gold & Shadlen, Neuron, 2002.
Balanced LAM reduces to DDM on an invariant line. The linearized 2-unit LAM is
dx1 = (-k x1 - w x2 + I1) dt + c dW1,
dx2 = (-k x2 - w x1 + I2) dt + c dW2
(a race model if w = 0). Uncouple via y1 = x1 + x2, y2 = x2 - x1:
a stable OU flow in y1 if k + w is large, pure drift-diffusion in y2 if k = w (balanced).
Absolute thresholds in (x1, x2) become relative thresholds on the difference (x2 - x1)!
LAM sample paths collapse towards an attracting invariant manifold. (cf. C. Brody: Machens et al., Science, 2005)
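The collapse onto the attracting manifold can be checked numerically. A rough Euler-Maruyama sketch for a balanced, linearized 2-unit LAM (all parameter values are illustrative assumptions, not taken from the slides):

```python
import math
import random

# Euler-Maruyama simulation of a linearized, balanced 2-unit LAM:
#   dx_i = (-k*x_i - w*x_j + I_i) dt + c dW_i,  with k = w (balanced).
# The sum mode y1 = x1 + x2 is a stable OU process attracted to
# (I1 + I2)/(k + w); the difference x2 - x1 is a pure drift-diffusion.
k = w = 2.0
I1, I2 = 1.2, 1.0
c, dt, T = 0.1, 0.001, 5.0
random.seed(0)

def simulate():
    """Run one trial and return the final value of the sum x1 + x2."""
    x1 = x2 = 0.0
    for _ in range(int(T / dt)):
        s = math.sqrt(dt)
        dx1 = (-k * x1 - w * x2 + I1) * dt + c * s * random.gauss(0, 1)
        dx2 = (-k * x2 - w * x1 + I2) * dt + c * s * random.gauss(0, 1)
        x1, x2 = x1 + dx1, x2 + dx2
    return x1 + x2

y1_star = (I1 + I2) / (k + w)   # attracting value of the sum mode
final_sums = [simulate() for _ in range(50)]
mean_sum = sum(final_sums) / len(final_sums)
print(round(y1_star, 3), round(mean_sum, 3))
```

Across trials the sum mode hugs y1_star while the difference mode keeps diffusing — the one-dimensional DDM picture.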
First passage across threshold determines choice.
Simple expressions for first passage times and ERs:
ER = 1/(1 + exp(2aZ/c^2)), DT = (Z/a) tanh(aZ/c^2).
Reduction to 2 parameters, the signal-to-noise ratio ã = (a/c)^2 and the normalized threshold z̃ = Z/a:
ER = 1/(1 + exp(2 ã z̃)), DT = z̃ tanh(ã z̃). (1)
Can compute thresholds that maximize reward rate (Gold-Shadlen, 2002; Bogacz et al., 2004-5). This leads to …
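A sketch of how the closed forms feed the threshold optimization. The reward-rate definition used here, RR = (1 - ER)/(DT + D) with a fixed inter-trial delay D, is one common choice and may differ in detail from the slides’ definition; parameter values are illustrative:

```python
import math

# Closed-form DDM error rate and mean decision time for drift a,
# noise c, symmetric thresholds +/- z:
#   ER = 1/(1 + exp(2az/c^2)),   DT = (z/a) tanh(az/c^2)
def er(a, c, z):
    return 1.0 / (1.0 + math.exp(2.0 * a * z / c**2))

def dt(a, c, z):
    return (z / a) * math.tanh(a * z / c**2)

def rr(a, c, z, D):
    """Reward rate: correct responses per unit time, with delay D."""
    return (1.0 - er(a, c, z)) / (dt(a, c, z) + D)

# Crude grid search for the reward-rate-maximizing threshold.
a, c, D = 1.0, 1.0, 2.0
zs = [i * 0.001 for i in range(1, 5000)]
z_opt = max(zs, key=lambda z: rr(a, c, z, D))
print(round(z_opt, 3), round(er(a, c, z_opt), 4), round(dt(a, c, z_opt), 4))
```

Note the optimum is interior: too low a threshold wastes trials on errors, too high a threshold wastes time per trial.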
Optimal performance curves (OPCs): human behavioral data. The best performers are optimal, but what about the rest? A bad objective function, or bad learners?
Left: RR as defined previously; right: a family of RRs weighted for accuracy (arrow: increasing accuracy weight). Learning is not considered here. (Bogacz et al., 2004; Simen, 2005.)
N-AFC: MSPRT & LAM
MSPRT chooses among n alternatives by a max vs. next test: accumulate the log likelihood L_i of each alternative and stop when the leading L_i first exceeds the runner-up by a threshold, choosing the leader.
MSPRT is asymptotically optimal in the sense that # samples is minimal in the limit of low ERs (Dragalin et al, IEEE trans., 1999-2000).
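A toy sketch of a max-vs-next stopping rule on n noisy accumulators. This captures the flavor of the test, not Dragalin’s exact construction; the Gaussian evidence model and all parameter values are illustrative assumptions:

```python
import random

def max_vs_next(n, z, target=0, mu=1.0, max_steps=100000, rng=random):
    """Max-vs-next stopping rule: n evidence accumulators, where
    channel `target` has positive drift mu and the rest drift 0.
    Stop when the leader exceeds the runner-up by z."""
    L = [0.0] * n
    for t in range(1, max_steps + 1):
        for i in range(n):
            L[i] += (mu if i == target else 0.0) + rng.gauss(0.0, 1.0)
        ranked = sorted(range(n), key=lambda i: L[i], reverse=True)
        if L[ranked[0]] - L[ranked[1]] >= z:
            return ranked[0], t   # (chosen alternative, # samples)
    return -1, max_steps          # no decision within the budget

random.seed(2)
choice, steps = max_vs_next(n=4, z=8.0)
print(choice, steps)
```

Sweeping n at fixed z in a sketch like this is one way to see the log(n-1) growth of mean decision time discussed next.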
A LAM realization of MSPRT (Usher-McClelland, 2001) asymptotically predicts mean RT scaling as log(n-1) (cf. Usher et al., 2002).
The log(n-1) dependence is similar to Hick’s Law: RT = A + B log n, or RT = B log(n+1).
W.E. Hick, Q.J. Exp. Psych, 1952.
We can provide a theoretical basis and predict explicit SNR and ER dependence in the coefficients A, B.
Multiplicative constants blow up logarithmically as ER -> 0.
Behavior for small and larger ERs: an empirical formula, generalizing (1). (2)
But a running max vs next test is computationally costly (?). LAM can approximately execute a max vs average test via absolute thresholds. The n-unit LAM is decoupled by separating the average mode y1 = (x1 + … + xn)/n from the difference modes.
y1 is attracted to the hyperplane y1 = A, so max vs average becomes an absolute test!
Attraction is faster for larger n: the stable eigenvalue λ1 ~ n.
[Figure: drift-diffusion on the attracting hyperplane.]
Max vs average is not optimal, but it’s not so bad:
[Figures: performance of the absolute, max vs average, and max vs next tests.]
Unbalanced LAMs - OU processes
Max vs next and max vs average coincide for n = 2. As n increases, max vs average deteriorates, approaching absolute-test performance. But it’s still better for n < 8-10!
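This qualitative comparison can be probed with a toy simulation that reads all three stopping rules off the same accumulators (thresholds, drifts, and trial counts are illustrative assumptions; in a careful comparison each rule’s threshold would be tuned separately to equalize ER):

```python
import random

def run_trial(n, z, mu=1.0, rng=random, max_steps=10000):
    """One evidence stream; three stopping rules watch the same
    accumulators: absolute (leader crosses z), max vs average,
    and max vs next. Returns {rule: (choice, steps)}."""
    L = [0.0] * n
    out = {}
    for t in range(1, max_steps + 1):
        for i in range(n):
            L[i] += (mu if i == 0 else 0.0) + rng.gauss(0.0, 1.0)
        top = max(range(n), key=L.__getitem__)
        second = max(L[j] for j in range(n) if j != top)
        avg = sum(L) / n
        if "absolute" not in out and L[top] >= z:
            out["absolute"] = (top, t)
        if "max_vs_avg" not in out and L[top] - avg >= z:
            out["max_vs_avg"] = (top, t)
        if "max_vs_next" not in out and L[top] - second >= z:
            out["max_vs_next"] = (top, t)
        if len(out) == 3:
            break
    return out

random.seed(3)
results = [run_trial(n=4, z=6.0) for _ in range(200)]
# Fraction of trials on which each rule picked the true target (0).
acc = {r: sum(res.get(r, (-1, 0))[0] == 0 for res in results) / len(results)
       for r in ("absolute", "max_vs_avg", "max_vs_next")}
print(acc)
```

Repeating this over a range of n shows max vs average tracking max vs next for small n and drifting toward absolute-test behavior as n grows.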
Simple LAM/DD predicts log (n-1), not log n or log (n+1) as in Hick’s law:
but a distribution of starting points gives approximately log n scaling for 2 < n < 8, and ER and SNR effects may also enter.
The effect of nonlinear activation functions, bounded below, is to shift scaling toward linear in n:
The limited dynamic range degrades performance, but can be offset by suitable bias (recentering).
[Figure legends: nonlinear LAMs; linearized LAM.]
Summary: N-AFC
• MSPRT max vs next test is asymptotically optimal in the low-ER limit.
• LAM (& race model) can perform the max vs next test.
• Hick’s law, RT = A + B log n, emerges for max vs next, max vs average & absolute tests. A, B are smallest for max vs next, OK for max vs average.
• LAM executes a max vs average test on its attracting hyperplane using absolute thresholds.
• Variable start points give log n scaling for ‘small n.’
• Nonlinear LAMs degrade performance: RT ~ n for sufficiently small dynamic range.
More info: http://mae.princeton.edu/people/e21/holmes/profile.html