

Selective enhancement of associative learning by microstimulation of the anterior caudate

Ziv M Williams & Emad N Eskandar

Primates have the remarkable ability to rapidly adjust or modify associations between visual cues and specific motor responses. Whereas little is known as to how such adjustments in behavioral policy are implemented, recent learning models suggest that the anterior striatum is optimally positioned to have a role in this process. We recorded from single units and delivered microstimulation in the striatum of rhesus monkeys performing an associative learning task. Caudate activity during reinforcement was closely correlated with the rate of learning and peaked during the steepest portion of the learning curve when new associations were being acquired. Moreover, delivering microstimulation in the caudate during the reinforcement period significantly increased the rate of learning without altering the monkeys’ ultimate performance. These findings suggest that the caudate is responsible for implementing selective adjustments to the ‘associative weights’ between sensory cues and motor responses during learning, thus enhancing the likelihood of selecting profitable actions.

Reinforcement-based associative learning is a critical adaptive mechanism that allows animals to effectively associate or ‘link’ between specific sensory cues and motor responses that are likely to lead to reward. Although the neuronal mechanisms underlying the acquisition of selective visual-motor associations remain unclear, there is increasing anatomic and physiologic evidence to suggest that the anterior striatum is key to this process. Medium spiny neurons in the anterior striatum receive projections from associative cortical areas presumed to have a role in learning and motor selection and project back to those same areas by way of the globus pallidus and thalamus¹⁻⁶. In addition, it is likely that the same neurons are modulated by phasic dopamine release from neurons in the ventral tegmental area and the substantia nigra pars compacta (SNpc) that are sensitive to actual and predicted reward⁴,⁷. Experiments using intracranial self-stimulation (ICSS), in which rodents repetitively perform actions to receive electrical stimulation in dopamine-rich areas of the midbrain and striatum, have provided an especially powerful tool for examining reinforcement-based learning⁸,⁹. Although these experiments principally rely on the performance of single stereotypical actions, they indicate a possible anatomic basis for the enhancement of ‘rewarded’ behavior. Experiments using in vivo preparations also demonstrate that ICSS of the SNpc can lead to potentiation of corticostriatal synapses that are behaviorally relevant¹⁰, further suggesting that the formation of specific associations probably depends on convergent activation from associative cortical areas and dopaminergic areas sensitive to reward.

On the basis of these observations, recent learning models have hypothesized that the striatum may provide a key function in learning by adjusting or modifying the ‘associative weights’ between sensory cues and motor responses that are likely to lead to reward⁴,⁹⁻¹³. In practical terms, these models suggest that when a particular pairing of a sensory cue and a motor response is followed by reward, the striatum acts to enhance their association, thereby increasing the animals’ likelihood of selecting the same action on subsequent cue iterations.

In the current study, we recorded single-unit activity and delivered electrical microstimulation in the anterior striatum of awake behaving primates trained to concurrently learn discrete new visual-motor associations. We found that neuronal activity recorded in the caudate was closely correlated with the change in the monkeys’ behavior during learning. Moreover, delivering microstimulation in the caudate during the reinforcement period significantly enhanced the rate of learning for specific associations without altering their ultimate acquisition or maintenance. Stimulation did not alter performance for concurrently learned nonstimulated associations and had little effect on learning performance when delivered in the putamen.

RESULTS

Learning behavior

We trained two rhesus monkeys to perform an associative learning task. During the task, the monkeys learned, by trial and error, to associate new visual images with specific joystick movements in one of four radial directions (Methods). In a given block of trials, we used four images. Two were randomly selected from a group of highly familiar images, and two were new images that the monkey had not seen before. A feedback tone indicated whether or not the correct target was reached, which was then followed by the actual reward (Fig. 1a). We presented an average of 16.5 ± 0.6 (mean ± s.e.m.) new images for each recorded cell over numerous trial blocks, such that multiple images and associated target locations were tested for each cell.

Received 23 December 2005; accepted 7 February 2006; published online 26 February 2006; doi:10.1038/nn1662

MGH-HMS Center for Nervous System Repair, Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA. Correspondence should be addressed to E.N.E. ([email protected]).

562 VOLUME 9 | NUMBER 4 | APRIL 2006 NATURE NEUROSCIENCE

ARTICLES © 2006 Nature Publishing Group http://www.nature.com/natureneuroscience


Of these, the monkeys learned 12.4 ± 0.5 to criterion. The learning criterion was defined by the state-space model and was reached by the monkeys after an average of 4.1 ± 0.2 consecutive correct trials¹⁴,¹⁵. The monkeys reached criterion after an average of 9.7 ± 0.1 total trials (correct and incorrect).

Neither monkey demonstrated a response bias to a particular direction of movement at the start of the trial block (χ² test, P > 0.05), and the monkeys rarely selected the same incorrect target two or more times in a row. Furthermore, we found no difference in the number of trials it took the monkeys to reach learning criterion between images that were presented first or second during each block (rank sum test, P > 0.05).
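The response-bias check described above can be illustrated with a goodness-of-fit χ² statistic against equal use of the four directions. This is only a minimal sketch: the counts are hypothetical, the function name is invented for illustration, and the text does not specify exactly how the χ² test was set up.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Hypothetical first-trial choices for the four joystick directions
# (illustrative numbers, not data from the study).
first_trial_choices = [27, 24, 26, 23]
stat = chi_square_uniform(first_trial_choices)

# Critical value of the chi-square distribution for 3 degrees of
# freedom at alpha = 0.05.
CRITICAL_DF3_ALPHA_05 = 7.815
biased = stat > CRITICAL_DF3_ALPHA_05
print(f"chi2 = {stat:.2f}, bias detected: {biased}")
```

With four directions there are three degrees of freedom, so a statistic below 7.815 corresponds to P > 0.05, that is, no detectable directional bias.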

Neuronal responses during learning

We recorded 171 cells from the dorsal anterior caudate and 72 cells from the rostral putamen. Of these, 153 caudate and 64 putaminal cells were classified as phasically active neurons (PANs), probably corresponding to medium spiny projection cells¹⁶. Tonically active neurons were not included in the current analysis¹⁷. The PANs recorded from the caudate are the main focus of this report, although the putaminal neurons provide an important contrast and will be discussed in that context.

Among caudate neurons, 23% were modulated during image presentation, 37% during movement, 39% during feedback and 40% during reward delivery (repeated-measures analysis of variance (ANOVA), P < 0.05; Fig. 1b and Table 1). The most prominent learning-related responses were observed during the feedback and reward periods. In 28% of neurons, there was a gradual increase and in 24% a gradual decrease in activity that plateaued once learning occurred (bootstrap test, P < 0.01; Fig. 1c, left). However, in a larger proportion of neurons (41%), there was an increase in activity that peaked near the time when the learning criterion was reached, when associations were being acquired, and then gradually decreased once the associations were made (Fig. 1c, right).

Learning curve and learning rate related neuronal activity

Neuronal responses during the different periods of the task were examined as a function of the monkeys’ trial-by-trial performance, the learning curve, and the slope of this function, the learning rate (Methods). The learning rate indicates the rate of change in behavior that occurs during learning. Thus, low rates indicate no change in behavior, whereas high rates indicate periods of rapid change in behavior. The learning rate peaks during the steepest portion of the learning curve, at which time the animals are actively acquiring new associations between new images and particular movements⁴,¹³. Once an association is learned, however, there are no further changes in behavior and the learning rate returns to a low level.
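The relationship between the learning curve and the learning rate can be sketched numerically. The example below is an illustration under stated assumptions, not the state-space estimate used in the study: the learning curve is a hypothetical logistic rise from chance (0.25 for four targets) to 1, and the learning rate is taken as a finite-difference slope. All function names and parameter values are invented for the example.

```python
import math

def logistic_learning_curve(n_trials, midpoint, steepness, chance=0.25):
    """Hypothetical probability correct per trial, rising from chance toward 1."""
    return [
        chance + (1.0 - chance) / (1.0 + math.exp(-steepness * (t - midpoint)))
        for t in range(n_trials)
    ]

def learning_rate(curve):
    """Slope of the curve: central differences, one-sided at both ends."""
    last = len(curve) - 1
    rate = []
    for i in range(len(curve)):
        lo, hi = max(0, i - 1), min(last, i + 1)
        rate.append((curve[hi] - curve[lo]) / (hi - lo))
    return rate

# Assumed parameters: 13 trials, steepest point at trial 6.
curve = logistic_learning_curve(n_trials=13, midpoint=6, steepness=1.0)
rate = learning_rate(curve)
peak_trial = max(range(len(rate)), key=lambda i: rate[i])
print(f"learning rate peaks at trial {peak_trial}")
```

In this sketch the slope is largest at the midpoint of the curve and close to zero on the first and last trials, which mirrors the description of a learning rate that peaks while associations are being acquired and returns to a low level afterwards.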

Across the population of caudate cells, the most robust learning-related changes in activity were found during the feedback period (ANOVA, P < 0.0001; Fig. 2a,b). Activity during feedback correlated

[Figure 1 image not reproduced. Panel a: timeline of trial events (fixation, image presentation, delay, go cue, joystick movement, reach target, feedback sound, juice reward, erase image). Panel b: rasters by trial number and perievent histograms of activity (spikes per s) against time (ms), with feedback and reward marked. Panel c: percent correct and activity (spikes per s) against trial number; legend: learning curve, learning rate, neuronal activity.]

Figure 1 Single neuronal responses during learning. (a) Schematic illustration and timeline of events for the main task. (b) Perievent histogram and rasters for two single neurons over the course of one learning block while the monkey learned to associate a new image with a movement toward a specific target location. Activity is aligned to the feedback tone (first dashed line). Each row in the raster represents a single trial; the trial numbers are shown on the left. Black squares at right, correct trials; arrowheads, trial at which the learning criterion was reached. (c) Learning performance and mean neuronal activity during the feedback period. Gray line, monkeys’ behavioral performance (learning curve); dashed lines, upper and lower confidence bounds (99%) estimated from the monkeys’ performance. Black line, learning rate (first derivative or slope of the learning curve). Thick black line, average firing rate for the cell during the feedback period. In the left plot, neuronal activity closely correlates with the learning curve, whereas in the right plot, neuronal activity correlates more closely with the learning rate. Arrowheads, trial at which learning criterion was reached.


strongly with the learning rate (correlation coefficient of the learning rate: r_r = 0.28; H0: r = 0, t-test, P < 10⁻⁷), and only weakly with the learning curve (correlation coefficient of the learning curve: r_c = 0.06, P > 0.05; Fig. 2c,d). This was associated with a larger proportion of cells demonstrating a positive correlation with the rate of learning (Fig. 2e,f). In comparison, learning-related activity during image presentation was not significantly correlated with the learning rate (r_r = 0.05, P > 0.05) but was weakly correlated with the learning curve (r_c = 0.11, P < 0.05; Fig. 2a,b). On average, neuronal criteria preceded behavioral criteria by 1.8 ± 0.05 trials (Fig. 2f, inset). Learning-related changes during feedback were not present for familiar images (r_r = –0.10, r_c = –0.12, P > 0.05) or for images in which the monkeys did not reach learning criterion (r_r = 0.04, r_c = 0.06, P > 0.05). Learning-related changes were also not present for incorrect trials (r_r = –0.04, r_c = 0.04, P > 0.05). Caudate cells thus tended to show the greatest activity at feedback during the steepest portion of the learning curve when new associations were being acquired and less activity at the beginning and end of the curve when there was little or no change in learning behavior. Cells with learning rate–related activity were more prevalent anteriorly within the head of the caudate (linear regression, P < 0.01; Fig. 3).
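The two correlation measures reported above, r_r (activity against learning rate) and r_c (activity against learning curve), can be illustrated with a short sketch. Everything here is hypothetical: the firing rates, learning curve, learning rate and per-cell r values are invented numbers, and treating the t-test as a one-sample test of the population r values against zero is an assumption about the analysis rather than a detail stated in the text.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def one_sample_t(values, mu0=0.0):
    """t statistic for H0: population mean equals mu0."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)

# Hypothetical single cell: feedback-period firing (spikes per s) per trial,
# with an invented learning curve and its slope.
firing = [4.0, 6.0, 9.0, 10.0, 7.0, 5.0, 4.0]
curve = [0.25, 0.30, 0.45, 0.70, 0.90, 0.97, 1.00]
rate = [0.05, 0.10, 0.20, 0.225, 0.135, 0.05, 0.03]
r_c = pearson_r(firing, curve)  # correlation with the learning curve
r_r = pearson_r(firing, rate)   # correlation with the learning rate

# Hypothetical r_r values for eight cells, tested against zero.
population_r = [0.35, 0.10, 0.42, 0.28, 0.19, 0.33, 0.25, 0.31]
t_stat = one_sample_t(population_r)
print(f"r_c = {r_c:.2f}, r_r = {r_r:.2f}, t = {t_stat:.2f}")
```

A cell whose activity rises and then falls around the time of learning, as in this invented example, correlates strongly with the learning rate and only weakly with the learning curve.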

In contrast to the caudate, the activity of rostral putaminal neurons was more closely correlated with the learning curve rather than with the learning rate. Most neurons demonstrated a progressive increase (47%) or decrease (20%) in activity with learning, whereas a smaller proportion (12%) demonstrated activity that peaked during the steep portion of the learning curve. Across the population, there was a significant correlation with the learning curve during feedback (r_c = 0.39; P < 0.00001) and little correlation with the learning rate (r_r = –0.08; P > 0.05; Fig. 4).

Controls for association formation, novelty and reward

A control task was introduced in which we tested 43 caudate cells to examine whether the observed responses depended on the monkeys actively learning the appropriate visual-motor associations. In this task, the sequence of movement directions was ‘copied’ from the previous block of trials onto a separate set of new images, but the direction of movement was instructed by a color change in one of the targets¹¹. Thus, the monkeys were not required to learn the appropriate motor responses to the new images (Supplementary Methods) but still performed the same sequence of movements and experienced an identical succession of correct and incorrect feedback tones and rewards. In this task, there was no significant correlation between neuronal activity and progression through the sequence of trials

[Figure 2 image not reproduced. Panels a, b: correlation (r) by task period (image, delay, movement, feedback, reward). Panels c–e: activity (spikes per s) and percent correct against trial number or correct trial number; legend: learning curve, learning rate, neuronal activity. Panel f: cell count against correlation (r) for curve and rate.]

Figure 2 Population responses in the caudate during learning. (a) Correlation coefficients were calculated for the population of caudate cells (n = 153) by comparing neuronal activity during each 500-ms time period (that is, image, delay, feedback) with either the learning curve (gray line) or the learning rate (black line). Each point along the curve represents the mean r value for the population during that time period. Asterisks, intervals during which the distribution of r values was significantly different from chance (P < 0.01). (b) Positively correlated (solid line) and negatively correlated (dashed line) learning rate–related responses for the population during learning. (c) Mean neuronal activity and behavioral performance aligned to learning criterion (trial 0). Gray line, learning curve; black line, learning rate; averaged across all trials (confidence bounds not shown). Thick black line, mean neuronal activity during the feedback period minus baseline. Only cells with either a significant learning curve or significant learning rate–related activity were included in this plot (n = 112). (d) Mean neuronal activity during feedback for monkey P (n = 90, solid line), and monkey N (n = 22, dashed line). (e) Mean neuronal activity during feedback for positively (solid line) and negatively (dashed line) correlated responses. (f) Distribution of learning curve– and learning rate–related r values for all cells at the time of feedback. Black bars, cells with significant r values (P < 0.01). Gray and black arrows, mean learning curve–related and learning rate–related r values, respectively, for the population. Inset, distribution of lag times between neuronal criteria and behavioral criteria for learning-modulated cell. Gray and black arrows, mean lag for cells with significant learning curve–related and learning rate–related r values, respectively. Error bars indicate s.e.m.

Table 1 Neuronal modulation in the caudate and putamen

           Image      Movement   Feedback   Reward     Any         Total
Caudate    35 (23%)   57 (37%)   60 (39%)   61 (40%)   110 (72%)   153 (100%)
Putamen    18 (28%)   32 (50%)   37 (58%)   28 (43%)   58 (90%)    64 (100%)

Each column indicates the number and percent of cells modulated during each time period. The fifth column indicates the number and percent of cells modulated during any (one or more) of the four periods.


(r_r = –0.06, P > 0.05; r_c = –0.02, P > 0.05), whereas the same cells showed significant learning rate–related activity in the standard task (P < 0.05). As in the standard task, there was little correlation between learning performance and neuronal activity on incorrect trials (P < 0.05).

Learning-related responses were not attributable to the use of unfamiliar stimuli. In another control task involving 53 cells, new images that had been learned in the previous block of trials were reintroduced in the following block but were now associated with an alternate set of target locations. In these trials, neuronal activity during feedback was again significantly correlated with the rate of learning (r_r = 0.21, P < 0.01; r_c = 0.01, P > 0.05).

Learning-related responses were also not due to simple changes in the frequency or schedule of reward delivery as learning progressed. In a control task involving 23 cells, we replayed an identical sequence of image presentations, cursor movements, feedback and reward delivery from the previous block of trials while the monkeys maintained fixation without moving the joystick. We found no peak in activity as the monkeys progressed through these trials (r_r = 0.01, P > 0.05) compared to the standard trials (P < 0.05). Moreover, we found no correlation between learning-related activity during feedback with arm movement within the peripheral target circles or eye movement within the fixation window (two-dimensional Kolmogorov-Smirnov test, P > 0.05). There was also no correlation between learning-related activity and licking activity following reward (bootstrap test, P > 0.05) suggesting that it was unlikely that this response pattern was due to simple changes in motor activity during the task.

Effect of microstimulation on learning

These findings suggested that neuronal activity in the caudate was closely correlated to the acquisition of new visual-motor associations when the change in learning behavior was the greatest. Nonetheless, these findings alone did not demonstrate that the caudate is causally involved in modifying the monkeys’ actual behavior. To test this notion more directly, we used microstimulation¹⁸,¹⁹ in the caudate and putamen. We hypothesized that if the caudate is indeed involved in modifying the association between sensory cues and motor responses during learning, then introducing microstimulation at the time of reinforcement may enhance or retard the rate at which specific associations are learned. We delivered high-frequency microstimulation (HFM) during the feedback-reward period following correct responses for one of the two concurrently learned new images in each block. As a control, these trial blocks were also interleaved with other blocks in which low-frequency microstimulation (LFM) or no stimulation (baseline) was delivered.

We found that performance on trials in which new images were coupled with HFM in the caudate was significantly better than performance on nonstimulated trials (two-tailed t-test, incremented

[Figure 3 image not reproduced. Axes: location (cm) along the anterior-posterior (A, P) and medial-lateral (M, L) dimensions, and depth (cm); color scale: r_r value.]

Figure 3 Locations of all recording sites within the caudate. Zero indicates the position of the anterior commissure in the anterior-posterior (A-P) dimension, the midline in the medial-lateral (M-L) dimension, and the dorsal margin of the caudate in depth. Color codes indicate the mean learning rate–related neuronal response during feedback (r_r) at each site.

[Figure 4 image not reproduced. Axes: activity (spikes per s) and percent correct against correct trial number.]

Figure 4 Population responses in the putamen during learning. Mean activity during feedback for putaminal cells (n = 52), with the same convention as in Figure 2c.

[Figure 5 image not reproduced. Panels a, b: percent correct against trial number. Panel c: proportion of images against trial number to criterion. Panel d: location (cm) along the anterior-posterior (A, P) and medial-lateral (M, L) dimensions, and depth (cm); color scale: SI_r value.]

Figure 5 Effect of microstimulation in the caudate on learning. (a) Learning curves are aligned to the first correct trial (0). Blue line, mean performance for new images in which HFM was delivered in the caudate following correct responses during the reinforcement period (triggered at feedback). Black line, mean performance for new images in which no stimulation was delivered. Error bars indicate s.e.m (61 new images per curve). Thick areas along curve, trial points at which performance on stimulated trials was significantly different from performance on nonstimulated trials (P < 0.05). (b) Effect of microstimulation on learning for monkey P (n = 51 images, solid lines) and monkey N (n = 10 images, dashed lines), with the same convention as above. (c) Distribution of learning criteria for stimulated and nonstimulated images. Abscissa, total number of trials (beginning with the first correct trial) it took the monkeys to reach learning criterion. Arrowheads, average number of trials to reach criterion for all images in each set, with the same color convention as above. (d) Locations of all stimulation sites within the caudate, with location coordinates following the same convention as in Figure 3. Color codes indicate the mean SI_r for each site.


comparisons, P < 0.05; Fig. 5a,b). This was associated with a significant increase in the steepness, or rate of rise, in the monkeys’ learning performance (selectivity index of the rate of learning: SI_r = 0.22, t-test, P < 0.001). In addition, the monkeys took significantly fewer trials (30%) to reach learning criteria for stimulated new images compared to nonstimulated new images (rank sum test, P < 0.05; Fig. 5c). That is, the monkeys seemed to learn the correct visual-motor associations more quickly and reached learning criteria earlier when the correct responses were coupled with HFM during the reinforcement period. There was a slight tendency for microstimulation to exert a stronger effect on learning behavior anteriorly within the head of the caudate (linear regression, P = 0.12; Fig. 5d). In contrast to its effects on initial learning, HFM had little effect on the monkeys’ final performance at the asymptote once the associations had been successfully learned (selectivity index of the final performance: SI_fp = 0.01, P > 0.05; Fig. 6a). No significant change in learning performance was found on LFM trials (SI_r = –0.02, SI_fp = –0.01; Fig. 6a).

Controls for response bias and motivation

Each image was associated with a unique movement direction. Hence, it was important to determine whether the improvement in learning could be attributed to a nonspecific response bias toward the stimulated target location. We therefore evaluated performance for nonstimulated new images that were learned concurrently with stimulated images but corresponded to a different direction of movement. If HFM led to a simple directional bias, then we would expect performance to deteriorate on concurrent nonstimulated trials. However, we found no difference in the rate of learning (SI_r = 0.06) or final performance (SI_fp = 0.01) on the nonstimulated trials compared to baseline (Fig. 6a). Similarly, there was no deterioration in performance on familiar trials, indicating that the effect of stimulation was selective for specific associations and did not arise from a nonspecific response bias.

It was also important to determine whether stimulation in the caudate was somehow perceived as being ‘pleasurable’. In a separate task, monkeys were presented with a pair of targets that changed color simultaneously and from which the monkeys could select freely. Different paired combinations of reward and stimulation were presented in separate blocks of trials. Upon reaching the target, the monkeys received either the standard reward, reward with HFM, HFM alone or none of the above. There was no preference in target selection for trials in which HFM versus no HFM was delivered or for trials in which reward with HFM versus reward alone was delivered (binomial test, P > 0.05; Supplementary Fig. 1 online), suggesting that HFM did not lead to a simple hedonic response bias. The effect of caudate stimulation on learning therefore did not seem to lead the monkeys to favor a particular motor response but, rather, acted to enhance specific rewarded image-response associations.
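The free-choice comparison above rests on a binomial test of whether one target of a pair was chosen more often than expected by chance. The sketch below shows one way such a test could be computed; the choice counts are hypothetical, the function name is invented for illustration, and the exact two-sided form used here is an assumption, since the text does not say which variant was applied.

```python
from math import comb

def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided binomial P value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count under the null probability p.
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # Small relative tolerance so symmetric outcomes count as ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

# Hypothetical counts: the HFM-paired target chosen on 27 of 50 free-choice trials.
hfm_choices, total_choices = 27, 50
p_value = binomial_two_sided_p(hfm_choices, total_choices)
print(f"P = {p_value:.3f}")
```

With counts this close to an even split the P value is well above 0.05, the pattern described in the text as no preference; a strongly lopsided split such as 40 of 50 would give a very small P value.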

Changes in learning performance with stimulation did not seem to result from a general increase in attention or motivational drive. When stimulation was delivered following both correct and incorrect responses, there was no difference in the learning rate (SI_r = 0.06) or final performance compared to baseline (SI_fp = 0.02; Fig. 6b). When stimulation was delivered on incorrect trials alone, the rate of rise in learning performance was significantly blunted (SI_r = –0.25, P < 0.05) and the number of trials to reach criterion was significantly prolonged (rank sum test, P < 0.01; Fig. 6b). Diminished performance was not due to repeated selection of trials in which HFM was delivered but, rather, due to a decreased likelihood of selecting the correct target location on subsequent trials (χ² test, P > 0.05). Final performance at the asymptote remained unchanged from baseline (SI_fp = –0.05; Fig. 6c). Moreover, when stimulation was delivered during the image

[Figure 6, panels a–d: learning curves plotting percent correct against trial number. Panel titles: Caudate: stimulation on feedback, HFM & LFM (n = 61); Caudate: stimulation on feedback, correct & incorrect (n = 53); Caudate: stimulation on image presentation (n = 27). Legend entries: No stim, HFM correct, LFM correct, HFM incorrect, HFM correct & incorrect, No stim concurrent.]

Figure 6 Stimulation controls. Learning curves are aligned to the first correct trial (0). Microstimulation trains were triggered either on feedback or on image presentation, depending on the experiment type (indicated in plot title). Solid colored lines in each figure represent the monkeys’ performance for trials in which stimulation was delivered for one of the two concurrently learned new images. Dashed lines, with their corresponding color codes, represent the monkeys’ performance for the concurrently learned nonstimulated new images. Black lines, trials in which no stimulation was delivered for either of the two new images (baseline). Error bars (s.e.m.) are displayed only on the black curve for clarity. Thick areas along the curves, trial points at which performance on stimulated trials was significantly different from baseline (P < 0.05). The legend in the bottom right of each figure indicates the color code for each task type. The number of learning blocks used to construct each curve is indicated in parentheses. All curves within each figure were obtained from interleaved trial sets performed within the same task sessions. (a) Comparison of LFM and HFM in the caudate. The black and blue curves shown here are the same as those shown in Figure 5a. (b) Stimulation in the caudate during correct trials, incorrect trials or both. (c) Comparison between baseline performance and performance on trials in which HFM was delivered for incorrect trials. The black and red lines shown here are the same as those shown in b. (d) Stimulation in the caudate during the image presentation epoch.

[Figure 7: learning curves plotting percent correct against trial number. Panel title: Putamen: stimulation on feedback, HFM & LFM (n = 41). Legend entries: No stim, HFM correct, LFM correct, No stim concurrent.]

Figure 7 Effect of microstimulation in the putamen on learning. Same convention as in Figure 6a.

presentation period in a separate set of trials, there was no significant effect on learning (SIr = –0.01, SIfp = 0.02; Fig. 6d), indicating that the delivery of stimulation in time periods other than the reinforcement period was not sufficient to alter learning performance.

In comparison to the caudate, there was no enhancement in learning when HFM was delivered in the rostral putamen. Rather, we noted a small nonsignificant decrease in performance (SIr = –0.16, P > 0.05; SIfp = –0.01, P > 0.05; Fig. 7), suggesting that the effect of stimulation in enhancing learning was selective to the caudate and was not generalized to other areas of the anterior striatum.

DISCUSSION

These findings suggest that the caudate is directly involved in augmenting selective visual-motor associations during learning and that this process probably occurs at the time of reinforcement. In practical terms, we propose the following simplified synthesis. The anterior caudate has convergent input from homologous associative cortical areas that respond to behaviorally relevant cues and from dopaminergic neurons that encode reward 1–7. In the initial phases of associative learning, a new stimulus is coupled with some type of arbitrary motor behavior. By trial and error, a correct and thus rewarded response eventually occurs. This being an unexpected reward, there is a strong phasic burst of dopaminergic activity and the set of caudate neurons that is active becomes potentiated 7,9,10. This process continues over the next few trials, reinforcing that particular visual-motor association. As the association is learned, the dopaminergic input is decreased, but the synaptic weights of that particular group of neurons remain enhanced as long as subsequent behavior remains coupled with reward.

Although the precise mechanisms by which microstimulation in the striatum exerts its effects are unknown, its influence on learning may be analogous to that of the dopaminergic responses observed with reward. For example, delivering tetanic electrical stimulation in the substantia nigra can lead to the long-term potentiation of activated corticostriatal synapses that are behaviorally relevant 10. Additionally, in striatal slice preparations, long-term potentiation or depression of specific corticostriatal synapses is based on the combination of afferent cortical and dopaminergic activity 9,20. By enhancing dopamine release 21,22, high-frequency microstimulation in the caudate may similarly result in the potentiation of particular corticostriatal synapses, or potentiate activated synapses directly.

The pattern of responses observed in the current study closely mirrors the actor-critic architecture proposed to have a role in associative learning. On the basis of this model, dopamine serves as the critic by signaling errors in predicted reward. The actor, in turn, uses the signal provided by the critic to implement direct changes to the monkeys’ behavior in the form of a policy 4,11–13. When unexpected reward is delivered shortly after a sensory cue is paired with a particular action, the actor implements changes to the ‘associative weights’ that link the specific stimulus to that particular action. Consistent with this model, we observed that caudate activity in single-unit recordings peaked when the change in behavior was greatest, at which time the strengthening of associative weights is hypothesized to occur 4. Once the associations were learned, however, there was little further change in behavior, and neuronal activity again returned to a low level. In agreement with these findings, we observed that microstimulation in the caudate during reinforcement led to a significant increase in the slope of learning, or the rate of change in the monkeys’ behavior. This occurred without affecting the monkeys’ ultimate performance once the associations were already learned, even though microstimulation was present for all rewarded responses. Furthermore, the effect was highly selective for actively represented visual-motor associations without altering the rate of learning for concurrently learned new or highly established familiar associations. That is, only microstimulation that was temporally linked with a rewarded sensory-motor pair led to the selective augmentation of that particular association. The anterior caudate thus seemed to function as the actor, implementing direct changes to the monkeys’ behavior in response to selective sensory cues.

The notion that the striatum may be involved in associative learning is supported by human imaging studies demonstrating caudate activation in subjects performing an instrumental conditioning task 11 and by studies in primates demonstrating activity changes correlated with learning 6,23–27. What has been less clear, however, is whether the striatum is involved in implementing adjustments to the associations between sensory cues and the monkeys’ actual motor behavior. Our findings suggest that the caudate acts to dynamically enhance or strengthen profitable behavioral policies during learning, thus increasing the organism’s likelihood of selecting actions that lead to reward.

METHODS

Electrophysiology. Two adult male rhesus monkeys (Macaca mulatta) were used in the study. A titanium head post, plastic recording chamber and scleral search coil were surgically implanted in strict accordance with guidelines set by the animal review committee at Massachusetts General Hospital. Neuronal activity was amplified, band-pass filtered between 200 Hz and 5 kHz, and sampled at 20 kHz. Spikes were stored and sorted offline using a template-matching algorithm (Spike 2, Cambridge Electronics Design). Eye position, joystick position and electromyographic licking activity were each sampled and recorded at 1 kHz.

Stimulation parameters. Biphasic stimulating pulses, with a cathodal phase leading, were delivered through a 300–600 kΩ tungsten microelectrode (Bak Electronics pulse generator and stimulus isolation unit) 19. Phase length and interphase intervals were both 0.2 ms. Pulse frequency was either 20 Hz (low-frequency microstimulation) or 200 Hz (high-frequency microstimulation) at 200 µA. Stimulus trains lasted for 1000 ms and were triggered immediately after the feedback tone or image presentation depending on the experiment type. Stimulation epochs and durations were set to include either the reinforcement period (feedback and reward) or image presentation period (image presentation and delay).
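As a worked check of the stimulation parameters, the following minimal sketch computes the number of pulses per train and the charge per phase. It assumes rectangular constant-current phases and a current amplitude expressed in microamperes (the usual range for microstimulation); the class and function names are illustrative and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainParams:
    """Stimulus-train parameters as stated in the text (names are hypothetical)."""
    frequency_hz: float          # 20 Hz (LFM) or 200 Hz (HFM)
    amplitude_ua: float = 200.0  # current amplitude, assumed to be in microamperes
    phase_ms: float = 0.2        # duration of each phase
    interphase_ms: float = 0.2   # gap between the cathodal and anodal phases
    train_ms: float = 1000.0     # total train duration


def pulses_per_train(p: TrainParams) -> int:
    # Number of biphasic pulses delivered in one train.
    return round(p.frequency_hz * p.train_ms / 1000.0)


def charge_per_phase_nc(p: TrainParams) -> float:
    # Rectangular phase assumed: charge = current x time.
    # microamperes x milliseconds = nanocoulombs.
    return p.amplitude_ua * p.phase_ms


def pulse_width_ms(p: TrainParams) -> float:
    # Cathodal phase + interphase interval + anodal phase.
    return 2 * p.phase_ms + p.interphase_ms


hfm = TrainParams(frequency_hz=200.0)
lfm = TrainParams(frequency_hz=20.0)
```

Under these assumptions a 1000 ms train contains 200 pulses at 200 Hz and 20 pulses at 20 Hz, each phase carries about 40 nC, and each biphasic pulse occupies about 0.6 ms of the 5 ms inter-pulse period at 200 Hz.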

Learning task. During the task, the monkeys were required to learn, by trial and error, to associate new visual images displayed at the center of the monitor with a specific movement. The monkeys used a joystick to guide a cursor emerging from the center of the screen to one of four targets displayed in the periphery. Each image was associated with only one rewarded target location and did not overlap with target locations associated with the other images. Two of the images were randomly selected from a group of familiar images that the monkeys were well-trained on, and the other two from a group of randomly generated new images that the monkeys had not previously seen. To ensure an even distribution of new and familiar trials, each of the four images randomly appeared once within a set of four consecutive trials. Trials in which the monkey selected an incorrect target location were repeated. After the monkeys completed a block of 18 correct trials for each image, a new set of images and target locations was selected. This learning process was repeated multiple times for each cell.
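The image ordering described above, in which each of the four images appears once within every set of four consecutive trials, can be sketched as follows. This is only an illustration under simplifying assumptions: it omits the repetition of incorrect trials and the 18-correct-trials stopping rule, and the function and image names are hypothetical.

```python
import random


def trial_order(images, n_sets, rng):
    """Return a presentation order in which each image appears exactly once
    within every consecutive set of len(images) trials."""
    order = []
    for _ in range(n_sets):
        block = list(images)
        rng.shuffle(block)  # random order within the set
        order.extend(block)
    return order


rng = random.Random(0)  # fixed seed so the sketch is reproducible
images = ["familiar_1", "familiar_2", "new_1", "new_2"]
order = trial_order(images, n_sets=5, rng=rng)
```

Shuffling within fixed-size sets keeps new and familiar images evenly distributed while still making the next image unpredictable within each set.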

Behavioral analysis. The monkeys were considered to have successfully learned to associate a given new image with the correct target location if they had selected the correct target at least five times in a row. The probability of the monkeys having obtained this number of consecutive correct trials out of an average of 30 trials was 0.0192 and therefore unlikely to have occurred by chance. We used a state-space model to estimate the trial on which the monkey had first learned the association (that is, had reached learning criterion 14,15). We defined the learning criterion as the first trial in which the lower 99% confidence bound obtained from the Gaussian state equation was greater than 0.25. The criterion describes the trial during which the monkey has first successfully made the correct association.
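The chance probability quoted above can be approximated with a short dynamic-programming sketch. It assumes independent guesses with a chance level of 0.25 (one of four targets) on each of 30 trials; the text does not describe how the reported value was computed, so this is one plausible reading and not the authors' procedure. The function name is hypothetical.

```python
def prob_run_at_least(n_trials, run_length, p_correct):
    """Probability of at least one run of `run_length` consecutive correct
    responses within `n_trials` independent trials."""
    # state[j] = probability that no qualifying run has occurred yet and the
    # current streak of correct responses has length j (0 <= j < run_length).
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(n_trials):
        nxt = [0.0] * run_length
        for j, prob in enumerate(state):
            nxt[0] += prob * (1.0 - p_correct)   # streak broken by an error
            if j + 1 < run_length:
                nxt[j + 1] += prob * p_correct   # streak extended
            # otherwise the run is completed and this mass leaves `state`
        state = nxt
    return 1.0 - sum(state)


p_chance = prob_run_at_least(n_trials=30, run_length=5, p_correct=0.25)
```

Under these assumptions the result is close to the reported 0.0192.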

The mean behavioral performance, or learning curve, for each image was estimated from the monkeys’ binary responses (correct or incorrect) using a Bernoulli probability model. This provided a continuous estimate, ranging from 0 to 1, of the monkeys’ performance. The learning rate was calculated by approximating the central difference derivative of the learning curve, which describes the slope (that is, rate of change) in the monkeys’ performance 28.
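A central difference derivative of a learning curve sampled once per trial can be sketched as below. The handling of the first and last trials (one-sided differences) is an assumption, since the text does not say how the edges were treated, and the example curve is made up for illustration.

```python
def central_difference(curve):
    """Approximate the slope of a learning curve sampled once per trial."""
    n = len(curve)
    if n < 2:
        raise ValueError("need at least two points")
    slope = [0.0] * n
    # One-sided differences at the two ends (an assumption, see lead-in).
    slope[0] = curve[1] - curve[0]
    slope[-1] = curve[-1] - curve[-2]
    # Central difference for interior trials: (P[k+1] - P[k-1]) / 2.
    for k in range(1, n - 1):
        slope[k] = (curve[k + 1] - curve[k - 1]) / 2.0
    return slope


# Hypothetical learning curve (probability correct on successive trials).
curve = [0.25, 0.25, 0.30, 0.50, 0.75, 0.90, 0.95, 0.95]
rate = central_difference(curve)
peak_trial = max(range(len(rate)), key=rate.__getitem__)
```

In this made-up curve the learning rate peaks on the steepest part of the curve and falls back toward zero once performance reaches its asymptote.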

Parametric analysis of behavioral performance was obtained by fitting the learning curves to a standard logistic equation. Performance $P_k$ at trial number $k = 1, \ldots, K$ was defined by the equation

$$P_k = P_i + \frac{P_f - P_i}{1 + \exp(-\gamma (k - \delta))},$$

where $P_i$ is defined as the initial performance and $P_f$ is the final performance at the asymptotes. The rate of rise in the learning curve is governed by the constant $\gamma$, and the constant $\delta$ defines the inflection point of the curve. We found that 3% of the curves did not converge to the function using a Levenberg-Marquardt modification to the fitting algorithm; these were therefore excluded from further analysis 29. There was no difference in mean residual values between the task types, indicating that the goodness of fit was similar across trial conditions (one-way ANOVA, P > 0.05).
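To make the roles of the logistic parameters concrete, the sketch below evaluates the curve and checks two of its properties: performance at the inflection point lies halfway between the initial and final values, and the slope there equals the rate constant times (final minus initial performance) divided by 4. The names `gamma` and `delta` stand for the rate and inflection-point constants of the equation above; the parameter values are illustrative and not taken from the paper, and no fitting is performed here.

```python
import math


def logistic_performance(k, p_i, p_f, gamma, delta):
    """P_k = P_i + (P_f - P_i) / (1 + exp(-gamma * (k - delta)))."""
    return p_i + (p_f - p_i) / (1.0 + math.exp(-gamma * (k - delta)))


def max_slope(p_i, p_f, gamma):
    # Analytical derivative of the logistic at its inflection point k = delta.
    return gamma * (p_f - p_i) / 4.0


# Illustrative parameter values only.
p_i, p_f, gamma, delta = 0.25, 0.90, 0.8, 6.0
midpoint = logistic_performance(delta, p_i, p_f, gamma, delta)

# Numerical slope at the inflection point, for comparison with max_slope.
h = 1e-5
numeric_slope = (
    logistic_performance(delta + h, p_i, p_f, gamma, delta)
    - logistic_performance(delta - h, p_i, p_f, gamma, delta)
) / (2 * h)
```

Because the peak slope scales with the rate constant, a larger rate constant corresponds directly to a steeper learning curve for a given initial and final performance.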

Selectivity indices (SI) for the learning rate (SIr) and final performance (SIfp) on stimulation trials were defined as (A – B)/(A + B), where A and B are the values for each of the two trial conditions (stimulated versus nonstimulated trials). The significance of change in SI was determined by using a one-tailed t-test (H0: SI = 0, range –1 to 1; P < 0.05). Thus, a learning-rate SI (SIr) of 0 would indicate that there is no difference in the rate of rise in performance (the rate constant of the fitted logistic curve) between stimulated and nonstimulated trials. A positive SIr would indicate that the learning rate was higher on stimulated trials, whereas a negative SIr would indicate that the learning rate was lower.
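The selectivity index can be illustrated with a minimal sketch. The input values are hypothetical, and the helper that converts an index back into a ratio A/B is added here only to aid interpretation; it is not part of the original analysis.

```python
def selectivity_index(a, b):
    """(A - B) / (A + B); lies between -1 and 1 when A and B are non-negative."""
    if a + b == 0:
        raise ValueError("A + B must be non-zero")
    return (a - b) / (a + b)


def ratio_from_si(si):
    """Invert the index: A / B = (1 + SI) / (1 - SI)."""
    if si >= 1:
        raise ValueError("SI must be below 1")
    return (1 + si) / (1 - si)


# Hypothetical rate constants for stimulated (A) and nonstimulated (B) trials.
si_faster = selectivity_index(0.9, 0.6)
si_equal = selectivity_index(0.6, 0.6)
si_slower = selectivity_index(0.3, 0.5)
```

Read this way, an index of –0.25 corresponds to a stimulated-to-nonstimulated ratio of 0.6, and an index of 0 to a ratio of 1.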

Correlating neuronal activity with learning. Firing rates during image presentation, movement, feedback and reward time periods (500 ms) were correlated to the monkeys’ learning performance by aligning neuronal activity for each of the new images to their corresponding learning criteria. Mean neuronal activity curves and behavioral performance curves were then obtained and the correlation coefficients (r) were calculated from the aligned data. This allowed us to evaluate the activity in each cell as a function of learning for multiple images and associated target locations before, during and after the criterion had been reached. Correct (rewarded) and incorrect trials were analysed separately. All cells were included for the analysis regardless of whether they demonstrated significant perievent activity. Cells with evidence of drift during baseline were not included at any point in the study.

We used a bootstrap test to evaluate the significance of correlation between neuronal activity and behavioral performance for individual cells. In each cell, neuronal activity was randomly shuffled 1,000 times across trials and correlation coefficients were recalculated from the shuffled data. If the rank of the calculated r value was higher than 99% or lower than 1% of the shuffled distribution, then the correlation was considered to be significant.
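The shuffling procedure described above (in effect a permutation-style test) can be sketched as follows. The Pearson correlation, the fixed random seed, and the example data are assumptions made for illustration; the text does not specify these details, and the function names are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def shuffle_test(activity, performance, n_shuffles=1000, seed=0):
    """Return (observed r, fraction of shuffled r values below the observed r)."""
    rng = random.Random(seed)
    observed = pearson_r(activity, performance)
    shuffled = list(activity)
    below = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the trial-by-trial pairing
        if pearson_r(shuffled, performance) < observed:
            below += 1
    return observed, below / n_shuffles


def is_significant(rank_fraction):
    # Observed r above 99% or below 1% of the shuffled distribution.
    return rank_fraction > 0.99 or rank_fraction < 0.01


# Hypothetical data: firing rate (spikes/s) rising with behavioral performance.
performance = [0.25, 0.30, 0.40, 0.55, 0.70, 0.80, 0.88, 0.92, 0.94, 0.95]
activity = [5.1, 5.9, 8.2, 10.8, 14.1, 16.3, 17.4, 18.6, 18.7, 19.2]
r_obs, rank = shuffle_test(activity, performance)
```

Shuffling activity across trials destroys any genuine relationship with performance while preserving the distribution of firing rates, so the shuffled coefficients serve as a null distribution against which the observed r is ranked.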

Learning criteria for neuronal responses were independently calculated for each cell by using a modification to the state-space model described previously 14,15. Criteria for cells with significant learning rate–related responses were obtained by calculating the integral (area under the curve) of the response function. Criteria for cells with significant learning curve–related responses were obtained from the linear function of the response curve.

Note: Supplementary information is available on the Nature Neuroscience website.

ACKNOWLEDGMENTS

We thank E.N. Brown, W.F. Asaad, J.D. Macklis and R. Amirnovin for their thoughtful comments and critical review of the paper. Funding for this project was provided by the National Institutes of Neurological Disorders and Stroke, the Parkinson Disease Foundation and the Harvard Center for Neurodegeneration and Repair.

COMPETING INTERESTS STATEMENT

The authors declare that they have no competing financial interests.

Published online at http://www.nature.com/natureneuroscience

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Alexander, G.E. Basal ganglia-thalamocortical circuits: their role in control of movements. J. Clin. Neurophysiol. 11, 420–431 (1994).
2. Flaherty, A.W. & Graybiel, A.M. Input-output organization of the sensorimotor striatum in the squirrel monkey. J. Neurosci. 14, 599–610 (1994).
3. Hoover, J.E. & Strick, P.L. Multiple output channels in the basal ganglia. Science 259, 819–821 (1993).
4. Houk, J.C., Davis, J.L. & Beiser, D.G. eds. Models of Information Processing in the Basal Ganglia (MIT Press, Cambridge, Massachusetts, 1995).
5. McClure, S.M., Laibson, D.I., Loewenstein, G. & Cohen, J.D. Separate neural systems value immediate and delayed monetary rewards. Science 306, 503–507 (2004).
6. Pasupathy, A. & Miller, E.K. Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 433, 873–876 (2005).
7. Schultz, W. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27 (1998).
8. Fibiger, H.C., LePiane, F.G., Jakubovic, A. & Phillips, A.G. The role of dopamine in intracranial self-stimulation of the ventral tegmental area. J. Neurosci. 7, 3888–3896 (1987).
9. Wickens, J.R., Reynolds, J.N. & Hyland, B.I. Neural mechanisms of reward-related motor learning. Curr. Opin. Neurobiol. 13, 685–690 (2003).
10. Reynolds, J.N., Hyland, B.I. & Wickens, J.R. A cellular mechanism of reward-related learning. Nature 413, 67–70 (2001).
11. O’Doherty, J. et al. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452–454 (2004).
12. O’Doherty, J.P. Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr. Opin. Neurobiol. 14, 769–776 (2004).
13. Suri, R.E. & Schultz, W. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience 91, 871–890 (1999).
14. Smith, A.C. et al. Dynamic analysis of learning in behavioral experiments. J. Neurosci. 24, 447–461 (2004).
15. Wirth, S. et al. Single neurons in the monkey hippocampus and learning of new associations. Science 300, 1578–1581 (2003).
16. Alexander, G.E. & DeLong, M.R. Microstimulation of the primate neostriatum. II. Somatotopic organization of striatal microexcitable zones and their relation to neuronal response properties. J. Neurophysiol. 53, 1417–1430 (1985).
17. Aosaki, T., Kimura, M. & Graybiel, A.M. Temporal and spatial characteristics of tonically active neurons of the primate’s striatum. J. Neurophysiol. 73, 1234–1252 (1995).
18. Cohen, M.R. & Newsome, W.T. What electrical microstimulation has revealed about the neural basis of cognition. Curr. Opin. Neurobiol. 14, 169–177 (2004).
19. Tehovnik, E.J. Electrical stimulation of neural tissue to evoke behavioral responses. J. Neurosci. Methods 65, 1–17 (1996).
20. Hernandez-Lopez, S., Bargas, J., Surmeier, D.J., Reyes, A. & Galarraga, E. D1 receptor activation enhances evoked discharge in neostriatal medium spiny neurons by modulating an L-type Ca2+ conductance. J. Neurosci. 17, 3334–3342 (1997).
21. Borland, L.M., Shi, G., Yang, H. & Michael, A.C. Voltammetric study of extracellular dopamine near microdialysis probes acutely implanted in the striatum of the anesthetized rat. J. Neurosci. Methods 146, 149–158 (2005).
22. Cragg, S.J., Hille, C.J. & Greenfield, S.A. Dopamine release and uptake dynamics within nonhuman primate striatum in vitro. J. Neurosci. 20, 8209–8217 (2000).
23. Hadj-Bouziane, F. & Boussaoud, D. Neuronal activity in the monkey striatum during conditional visuomotor learning. Exp. Brain Res. 153, 190–196 (2003).
24. Hollerman, J.R., Tremblay, L. & Schultz, W. Influence of reward expectation on behavior-related neuronal activity in primate striatum. J. Neurophysiol. 80, 947–963 (1998).
25. Jog, M.S., Kubota, Y., Connolly, C.I., Hillegaart, V. & Graybiel, A.M. Building neural representations of habits. Science 286, 1745–1749 (1999).
26. Kawagoe, R., Takikawa, Y. & Hikosaka, O. Expectation of reward modulates cognitive signals in the basal ganglia. Nat. Neurosci. 1, 411–416 (1998).
27. Miyachi, S., Hikosaka, O., Miyashita, K., Karadi, Z. & Rand, M.K. Differential roles of monkey striatum in learning of sequential hand movement. Exp. Brain Res. 115, 1–5 (1997).
28. Press, W.H., Teukolsky, S.A., Vetterling, W.T. & Flannery, B.P. Numerical Recipes in C++: The Art of Scientific Computing 2nd edn. (Cambridge Univ. Press, Cambridge, UK, 2002).
29. Schraudolph, N.N. Fast curvature matrix-vector products for second-order gradient descent. Neural Comput. 14, 1723–1738 (2002).

Nature Neuroscience Supplement

Materials and Methods

Electrophysiological set-up

The recording chamber was centered on the left side at stereotactic coordinates A16, L9 (inter-aural). All recording and stimulations in the caudate and putamen were performed rostral to the anterior commissure. Magnetic resonance imaging (1.5 Tesla) was used to confirm target locations. Single-unit recordings were made with tungsten microelectrodes that were advanced through a guide tube and grid system using a hydraulic micro-drive (Crist Instruments). Electromyographic activity was recorded using tin-disk surface electrodes placed below the jaw to ensure a broad and early detection of licking activity.

Trial sequence

Each trial within a learning block began with presentation of a central fixation point surrounded by four gray circular targets (distance 11° from the center). The animals were required to fixate within 1° of the fixation point for the entire trial. Once the animals fixated for 500 ms, a novel or familiar image would appear for 500 ms at the center of the screen, with the fixation point visible in the middle. After a variable delay of another 500–1000 ms, the fixation point would change color, at which time the monkeys used the joystick to move a cursor from the center of the screen to one of the targets. The cursor became visible when it was 3° away from the fixation point to limit smooth pursuit or saccadic eye movements. Once the cursor reached the target, another 100 ms would lapse, and then one of two feedback tones would sound indicating whether the animal had chosen the correct or incorrect target. The animals were then required to maintain the cursor within the target window for an additional 500 ms in order to receive the reward. If during a trial, the animal broke fixation, moved prematurely, or failed to maintain the cursor within the target circle (radial deflection in joystick of 2.5°), the trial would abort and no reward would be given. Once reward was delivered, 500 ms would lapse, the image would erase, and the sequence would repeat again.

Controls for motor activity

To control for potential activity associated with saccade planning or joystick movements, multiple novel images and associated target-locations were tested per cell. On average, for each cell recorded, 4.1 ± 0.8 novel images were tested for each of the four target-locations or directions of movement. Novel images corresponding to each of the target-locations were also interleaved with an equal number of familiar images corresponding to the same target-locations on alternate trials.

In order to further limit the potential effects of joystick and eye movements, the animals were required to maintain eye fixation within a narrow window throughout the duration of the trial, and joystick position within the target window during the feedback period in order for them to receive reward. No systematic difference in mean eye position or cursor position was found during the feedback period when comparing trials before and after learning criterion (2D Kolmogorov-Smirnov test, P > 0.05). Similarly, no difference was found in the mean magnitude or vector of movement (rank-sum test, P > 0.05). Electromyographic licking activity was also examined in one monkey for both the standard and replay control task (detailed below). In neither task type was there a correlation between the magnitude of licking activity during reward and the animals’ rate of learning (bootstrap test, P > 0.05).

Behavioral control tasks

Guided-movement task. Animals performed alternating blocks of standard and control trials. In one block of trials, the monkeys performed the standard learning task. On the alternate block, a new pair of novel images was used. However, in these control trials the movement directions were ‘copied’ from the previous block, and were now indicated by a color change in one of the targets. The animals were previously trained to know that a change in color in one of the targets indicated the correct movement direction. Hence this did not constitute new learning. The sequence of image presentations, target selections, and reward delivery was the same as that of the previous standard task (including both correct and incorrect trials). Thus, as the trials progressed, novel images were repeatedly paired with movements to specific target-locations, but reward delivery was not contingent on the animals learning to associate the central visual image with a specific direction of movement. Trials that were aborted because of fixation breaks or premature movements during the standard task were not included in the control task.

In this task, the animals selected the colored target-location, even on non-rewarded trials, above 95% of the time. To examine whether or not the monkeys were actively learning new visual-motor associations during this control task, we tested one monkey with a third set of trials in which the color change that was previously used to instruct the monkeys’ movements was now eliminated once the animal completed the control trial block. We found that this monkey’s performance fell to 23% after the color change was eliminated (χ² test, P > 0.05). The animal then took a similar number of trials to reach criterion as during the standard task, suggesting that the monkey had not learned the relevant visual-motor associations during the guided-movement control task.

Novelty task. Animals performed alternating blocks of standard and control trials. Once the animal completed a block of trials for the standard learning task, the same two novel images were presented again in the following block. However, each image was now associated with a different target-location. In this manner, the animals had to learn a new visual-motor association but with the same set of images present. Blocks of standard and control trials were alternated such that an equal number of trials from each task type were recorded per cell.

Replay task. Animals performed alternating blocks of standard and control trials. In the control trials, the exact same sequence of image presentations, cursor movements, feedback tones and reward delivery from the previous block of trials in the standard task were replayed to the monkeys but without the animal moving the joystick. During these trials, the animals were required to maintain their gaze on the central fixation spot throughout the trial. If the monkeys moved the joystick at any time point during the trial, no reward would be given, and the trial would repeat.

Stimulation trials and controls

The monkeys performed the standard learning task as before, but with the introduction of stimulation trains during set time epochs during the trial. Stimulated trials were always interleaved with an equal number of concurrently learned non-stimulated trials. In this way, we could compare the effect of microstimulation on the learning of one novel image to that of a concurrently learned non-stimulated novel image. Blocks of trials were randomly interleaved such that stimulation was delivered during the feedback-reward epoch, or image presentation epoch, of correct trials alone, incorrect trials alone, or both correct and incorrect trials. In addition, we randomly interleaved learning blocks in which stimulation was delivered for one of the two novel images with learning blocks in which no stimulation was delivered for either of the novel images (baseline). For trials in which stimulation was delivered for incorrect trials alone, stimulation was delivered for all incorrect choices (i.e. all three incorrect target-locations, but not the correct target-location).

In a separate set of control trials, the animals were presented with a fixation point but no central image and were then shown two randomly selected targets that changed color indicating that movement could start. Each pair of targets corresponded to the following combinations of reward and HFS delivery: reward alone vs. nothing, reward + HFS vs. reward alone, reward alone vs. HFS alone and HFS alone vs. nothing. On these trials, the animals were free to move to either of the two targets, and once 18 trials were completed, a new set of target-locations and reward/HFS contingencies was used.

Joystick and eye positions were continuously recorded during stimulation. Consistent with prior studies evaluating the effect of anterior striatal stimulation on movement 1, we found that stimulation in this location did not alter mean cursor position or eye position within their target windows (2D Kolmogorov-Smirnov test, P > 0.05), nor did it alter the mean vector of movement at any time point during the task (rank-sum test, P > 0.05). No stimulations were performed posterior to the anterior commissure in the “motor” sub-territory of the striatum.

References

1. Alexander, G. E. & DeLong, M. R. Microstimulation of the primate neostriatum. II. Somatotopic organization of striatal microexcitable zones and their relation to neuronal response properties. J Neurophysiol. 53, 1417-1430 (1985).

Legends

Figure 1. Paired choice control task. Bars indicate the mean probability (over 18 consecutive trials) of selecting one target over the other, with a choice proportion of 0.5 indicating an equal probability of selecting either target. Error bars indicate s.e.m.