
JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR

DECISION RULES AND SIGNAL DETECTABILITY IN A REINFORCEMENT-DENSITY DISCRIMINATION

MICHAEL L. COMMONS

COLUMBIA UNIVERSITY

Two probabilistic schedules of reinforcement, one richer in reinforcement, the other leaner, were overlapping stimuli to be discriminated in a choice situation. One of two schedules was in effect for 12 seconds. Then, during a 6-second choice period, the first left-key peck was reinforced if the richer schedule had been in effect, and the first right-key peck was reinforced if the leaner schedule had been in effect. The two schedule stimuli may be viewed as two binomial distributions of the number of reinforcement opportunities. Each schedule yielded different frequencies of 16 substimuli. Each substimulus had a particular type of outcome pattern for the 12 seconds during which a schedule was in effect, and consisted of four consecutive light-cued 3-second T-cycles, each having 0 or 1 reinforced center-key pecks. Substimuli therefore contained 0 to 4 reinforcers. On any 3-second cycle, the first center-key peck darkened that key and was reinforced with probability .75 or .25 in the richer or leaner schedules, respectively. In terms of the theory of signal detection, detectability neared the maximum possible d' for all four pigeons. Left-key peck probability increased when the number of reinforcers in a substimulus increased, when these occurred closer to choice, or when pellets were larger for correct left-key pecks than for correct right-key pecks. Averaged over different temporal patterns of reinforcement in a substimulus, substimuli with the same number of reinforcers produced choice probabilities that matched relative expected payoff rather than maximized one alternative.

Key words: discrimination, decision rules, reinforcement, memory, matching, maximizing, signal detection, T-schedule, key peck, pigeon

Most studies of reinforcement schedules have been concerned with the effect of reinforcement schedules on responding maintained by these schedules. The present approach is to examine discriminative properties of probabilistic reinforcement schedules in a choice procedure (Estes, Burke, Atkinson, & Frankmann, 1957; Shimp, 1973). Psychophysical techniques then permit an analysis of stimulus control by reinforcement schedules. Rilling and MacDiarmid (1965), Pliskoff and Goldiamond (1966), Hobson (1975), and Lattal (1975) used schedules of reinforcement

This research was supported in part by Dare Association Grant #103. Portions of the data were reported at the 1973 Eastern Psychological Association convention, and served to partially fulfill the requirements for a PhD degree. The author wishes to thank John Anthony Nevin and Bruce Schneider for their helpful supervision; Patrice M. Miller, Thomas R. Farrell, and Peter Pastor for assistance in conducting the experiment; and my numerous colleagues for editorial suggestions. Reprints may be obtained from Michael L. Commons, Laboratory of Human Development, Harvard Graduate School of Education, Harvard University, Larsen Hall, Appian Way, Cambridge, Massachusetts 02138.

where the number of responses per reinforcer were the discriminative stimuli. In the present study, reinforcement density provided the basis for discrimination.

Using a signal detection analysis, Rilling and MacDiarmid (1965) studied how pigeons discriminated fixed-ratio (FR) reinforcement schedules. When a center key came on, one of two FR schedules was arranged for responding on it. Meeting the center-key requirement darkened that key and turned on two side keys. After the lower-valued FR, which may be considered the signal plus noise (S + N), a left-key peck was reinforced and a right-key peck darkened the chamber. After the higher-valued FR, which may be considered noise (N), a right-key peck was reinforced and a left-key peck darkened the chamber. The discriminability of the signal plus noise (S + N) schedules from the noise (N) schedules was measured from isosensitivity curves where the coordinates were the conditional probabilities of left-key pecks given signal plus noise, P(Hit) = P(L|S + N), and of left-key pecks given noise alone, P(False Alarm) = P(L|N). These choice


1979, 32, 101-120, NUMBER 1 (JULY)


probabilities provided a sensitive index of the pigeon's discrimination of the schedules.
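In count form, the two coordinates of such an isosensitivity curve can be computed directly from a session's choice tallies. A minimal sketch (the trial counts below are hypothetical, not data from the paper):

```python
def hit_false_alarm(left_correct, right_error, left_error, right_correct):
    """P(Hit) = left pecks on rich trials / all rich trials (LC + RE);
    P(False Alarm) = left pecks on lean trials / all lean trials (LE + RC)."""
    p_hit = left_correct / (left_correct + right_error)
    p_fa = left_error / (left_error + right_correct)
    return p_hit, p_fa

# Hypothetical tallies from one 256-trial session:
p_hit, p_fa = hit_false_alarm(110, 18, 30, 98)
# p_hit = 110/128 = 0.859375; p_fa = 30/128 = 0.234375
```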

Since a major difference between reinforcement schedules is their reinforcement density, and since that difference has been shown to control rates and probabilities of responding (Catania & Reynolds, 1968; Herrnstein, 1970), the present study examined the discriminative control exercised by different reinforcement densities.

The discriminability of reinforcement density was examined in two ways. First, isosensitivity (ROC) curves were determined to see how well pigeons discriminated schedules differing in their reinforcement probability, p, and corresponding reinforcement density, Nd. A discriminability index, d', was found for the actual birds by comparing the obtained data to points and curves derived from assuming various models of optimizing ideal-observer performance (an observer that maximizes obtained reinforcement).

Second, decision rules were estimated from the data and compared to an ideal observer's decision rules. A decision rule is defined as the conditional probability of a particular choice given that certain situations or parameters of situations have occurred. It describes the control of choice by various parameters of a stimulus, such as reinforcement amount and proximity of reinforcement opportunity to choice. It also provides a framework in terms of which various versions of maximization laws and matching laws may be examined.
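As a concrete illustration, a decision rule at the level of substimulus density can be estimated as the empirical conditional probability P(L | Nd) from a log of (density, choice) trial records. The trial log below is invented for illustration:

```python
from collections import Counter

def estimate_decision_rule(trials):
    """Empirical decision rule: P(left-key choice | substimulus density Nd),
    estimated from a sequence of (density, choice) records."""
    totals = Counter(nd for nd, _ in trials)
    lefts = Counter(nd for nd, choice in trials if choice == "L")
    return {nd: lefts[nd] / totals[nd] for nd in sorted(totals)}

# Invented trial log: choices tend toward "L" as density grows.
log = [(0, "R"), (0, "R"), (1, "R"), (1, "L"),
       (2, "L"), (2, "R"), (3, "L"), (3, "L"), (4, "L")]
estimate_decision_rule(log)
# -> {0: 0.0, 1: 0.5, 2: 0.5, 3: 1.0, 4: 1.0}
```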

METHOD

Subjects

Four male White Carneaux pigeons with extensive multiple-schedule histories served. The birds were maintained at approximately 80% of their free-feeding weights throughout the experiment.

Apparatus

Two standard Lehigh Valley pigeon test chambers with special 1519B Pigeon Intelligence Panels were used. Three translucent Lehigh Valley pecking keys, 2.54 cm in diameter, were spaced 8.25 cm apart center to center, with the center of the outer keys 9.21 cm from the chamber sides. All keys were 25.40 cm from the floor. A Model 111-05 stimulus light transilluminated each key. A peck of at least 9.0 g (.09 N) operated a key's microswitch. The key colors were red, green, and yellowish white for the left, right, and center keys, respectively.

The bottom of a feeder aperture (an opening 5.72 cm wide and 5.08 cm deep) was centered 10.0 cm above the floor. Two pellet feeders in each box dispensed pellets into the hopper. One feeder delivered 20-mg pellets after designated center-key pecks, and after correct side-key pecks under some conditions. The second feeder delivered either 45-mg pellets or 97-mg pellets, depending on the condition. The sounds of the different pellets as they fell into the hopper were discriminably different to a human observer. The hopper light was illuminated for 2.7 sec upon feeder operation.

A houselight located above the center key was illuminated during most of the procedure except after a peck at an incorrect key during pretraining.

Stimuli

The two stimuli to be discriminated in the present situation may be viewed from two different perspectives. One emphasizes that each stimulus is a schedule of reinforcement; the other looks at each stimulus as a distribution of the number of reinforcement opportunities programmed on a center key.

From the first view, two T-schedules of reinforcement, one rich in reinforcement, Srich, the other lean, Slean, were overlapping stimuli to be discriminated in a choice situation. A T-schedule (Schoenfeld, Cumming, & Hearst, 1956) consists of a T-sec cycle which contains a subcycle, tD, the time period when behavior may be reinforced with some probability, p = p(SR+|R). The T-schedule modifications used here were similar to Weissman's (1961) changes, particularly his procedure of cueing the beginning of the tD portion of a 90-sec T-cycle by illuminating the key. With this procedure, only the first response in tD produces a reinforcer, with probability p(SR+|R) = 1. In the present experiment, each trial's 12-sec stimulus period contained four 3-sec cycles during which a substimulus sampled from one of the two p-valued T-schedule stimuli was presented.

From the other view, these two schedule stimuli can be considered as consisting of binomial distributions of the number of reinforcement opportunities in the sampled substimulus, with each of the two schedule stimuli yielding different frequencies of the same 16 possible substimuli, as shown in both Figure 1 and Table 1. As noted above, each substimulus sample consisted of four consecutive light-cued 3-sec cycles (tD = T = 3). On each 3-sec cycle, the first center-key peck darkened that key and could be followed by reinforcement. The probability of reinforcement for a center-key peck was p(SR+|R) = .75 when the rich schedule was in effect and .25 when the lean schedule was in effect. If a bird failed to peck the key in 3 sec, food was not delivered and the next 3-sec cycle was immediately initiated. Each set of four cycles will be referred to as a substimulus sample, Sn. The pigeon's task was to discriminate whether the rich or the lean schedule was in effect during this sample of four cycles.

There are four levels of description of the

stimuli.

1. On a "gross molar" or gross level, each probabilistic reinforcement schedule, either the rich (p = .75) or the lean (p = .25), is viewed as a single "stimulus," here called Srich and Slean respectively; sampling considerations are of no concern.

Fig. 1. Two sets of binomial frequency distributions show the actual rich and lean stimulus reinforcement distributions for center-key pecks. The substimulus densities, the number of possible reinforcement opportunities in four cycles, ranged from 0 to 4. On the rich schedule the probability of reinforcement for the first center peck in a cycle, p, was .75, and on the lean, p was .25.

2. At the "molar" level, all substimulus samples with the same number of reinforcements are viewed as the same "stimulus". Each substimulus, Sn, has a reinforcement density, D(Sn), equal to Nd, the number of center-key peck reinforcement opportunities over the four cycles in the substimulus. Density, the defining characteristic of a "stimulus" at this level of analysis, ranges from zero to four reinforcements per substimulus, giving rise to five such "stimuli". The same-density substimulus, SNd, is the general member of a given "stimulus" set, with density Nd. This level of analysis distinguishes between numbers of reinforcements within a sample stimulus but not between the particular 3-sec cycles within that 12-sec stimulus period on which those reinforcements are programmed.

3. At the "molecular" level, the definition of a "stimulus" involves the pattern of reinforcement in a substimulus. Each substimulus is represented as a four-digit binary number. A number such as 0001 indicates a sample with three cycles without reinforcement opportunities followed by one cycle with an opportunity. The substimuli, Sn, are numbered from 0000 to 1111. The left-most digit represents the cycle furthest in time from choice, and the right-most represents the cycle immediately before choice.

4. At the "micro" level, the definition of a "stimulus" depends on whether or not there is a reinforcement opportunity on a particular cycle, t seconds preceding choice, irrespective of what is programmed for its neighbors.

There are 16 possible combinations in a 4-cycle substimulus. These are shown in Table 1 along with the probability that the particular substimulus will occur given that the rich (.75) or lean (.25) schedule was presented. The probability that a particular set of four events will occur was obtained by expanding the binomial (p + q)^4 with p = .75 or .25 and q = 1 − p. Actual presentation frequencies deviated somewhat from the expected frequencies because of sampling. Therefore, both the expected and actual frequencies of each event across the 256 trials in a session are given in the table.
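The binomial expansion just described can be checked computationally; the following sketch reproduces the theoretical molar frequencies of Table 1 (128 trials per schedule in a 256-trial session):

```python
from math import comb

def theoretical_frequencies(p, trials=128, cycles=4):
    """Expected number of trials (out of `trials`) whose substimulus
    contains Nd reinforcement opportunities, for per-cycle probability p."""
    return {nd: comb(cycles, nd) * p**nd * (1 - p)**(cycles - nd) * trials
            for nd in range(cycles + 1)}

rich = theoretical_frequencies(0.75)
lean = theoretical_frequencies(0.25)
# rich -> {0: 0.5, 1: 6.0, 2: 27.0, 3: 54.0, 4: 40.5}
# lean -> {0: 40.5, 1: 54.0, 2: 27.0, 3: 6.0, 4: 0.5}
# Each individual pattern with Nd ones occurs trials * p**Nd * (1-p)**(4-Nd)
# times, e.g. 1.5 times for each density-1 pattern on the rich schedule.
```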


Table 1

Description of stimuli. For each of the five possible molar classifications of substimuli, with number of reinforced cycles, Nd, going from 0 to 4, and each of the 16 possible molecular substimuli, Sn, going from 0000 to 1111, the theoretical and sampled probabilities of reinforcement are given for the rich (LC), p = .75, and the lean (RC), p = .25, schedules. The sum of the molecular frequencies equals the molar frequencies. On a molecular level, 0 indicates a nonreinforced cycle and 1 a reinforced cycle, the digit on the right being closest to choice. The theoretical and sampled numbers of occurrences of molar-classified substimuli and molecular substimuli in the 256 trials from a session are also given.

Nd    Substimulus   Rich (LC) frequency    Lean (RC) frequency    Corrects (LC + RC)
      (binary)      Theoretical  Sampled   Theoretical  Sampled   Theoretical  Sampled

      0000             0.5         1         40.5        38          41          39
0                      0.5         1         40.5        38          41          39

      1000             1.5         1         13.5        17          15          18
      0100             1.5         1         13.5        16          15          17
      0010             1.5         3         13.5        14          15          17
      0001             1.5         2         13.5        10          15          12
1                      6.0         7         54.0        57          60          64

      1100             4.5         3          4.5         4           9           7
      1010             4.5         5          4.5         6           9          11
      1001             4.5         6          4.5         4           9          10
      0110             4.5         3          4.5         5           9           8
      0101             4.5         6          4.5         6           9          12
      0011             4.5         4          4.5         4           9           8
2                     27.0        27         27.0        29          54          56

      1110            13.5        17          1.5         2          15          19
      1101            13.5        14          1.5         1          15          15
      1011            13.5        11          1.5         0          15          11
      0111            13.5        12          1.5         2          15          14
3                     54.0        54          6.0         5          60          59

      1111            40.5        38          0.5         0          41          38
4                     40.5        38          0.5         0          41          38

Total                128         127        128         129         256         256

A molecular-level stimulus, even 0000 or 1111, could occur with either schedule stimulus. However, the greater the number of cycles having a reinforcement opportunity, the greater the likelihood that the rich schedule was in effect. Therefore, the lean (or rich) schedule was more likely to be in effect when 0 or 1 (or 3 or 4) reinforcement opportunities were presented on a trial (respectively). Substimuli with 2 reinforcement opportunities occurred about equally often given either schedule.

The procedure used here may be interpreted not only in terms of four cycles of a cued T-schedule but also in terms of a cued random-interval (RI) or variable-interval (VI) schedule with an average interval length equal to 3 sec divided by p(SR+|R), which is 12 sec and 4 sec for the lean and rich stimuli, respectively. The T-schedule in some ways resembles a VI schedule, and therefore the results from the present experiment may help to interpret multiple and concurrent VI-VI schedule performance. For the organism, a cued T-schedule differs from clocked VI (in which the beginning of an interval is timed from the end of the preceding interval, irrespective of the organism's behavior) in two ways: (a) the beginning of every cycle (interval) was cued, and (b) the first response in the first cycle (interval) could be reinforced. The cue light that began the clocked cycle and the reinforcement schedule brought


the birds near the hopper and insured a short-latency center-key peck at the beginning of each cycle. This presumably insured the observation of reinforcement delivery. Clock-determined cycle length together with constant reinforcement probability within each cycle provided a precise a priori description of a trial's four-cycle reinforcement pattern and a uniquely defined substimulus.

A number of possible distributions could have been used to program a substimulus' reinforcement opportunities during the trial's stimulus period. Rilling and MacDiarmid (1965), Pliskoff and Goldiamond (1966), and Hobson (1975) used nonoverlapping distributions of response events. Because decision rules that reflect how various aspects of the gross schedule stimuli control choice were of interest here, it was important to have the discrimination difficult enough to guarantee that the birds would make a number of errors. To insure this, there had to be trials where the particular temporal sequence of reinforcement and nonreinforcement opportunities could have come from either schedule stimulus distribution. This overlap of the two well-defined stimulus distributions could allow some of the notions from signal detection theory to be applied and examined. On a trial, for each cycle within a substimulus sample from a stimulus distribution, the first key peck had to have the same reinforcement probability. Hence, binomial distributions, β(N, p) and β(N, 1 − p), with N = 4 (cycles), a lean-schedule p value of .25, and a rich-schedule p value of .75, were dictated.
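The relation to a cued random-interval schedule noted above can be illustrated by simulation: if the first peck in each 3-sec cycle is reinforced with probability p, the mean time between reinforcers approaches 3/p sec (12 sec for the lean schedule, 4 sec for the rich). A sketch, assuming the bird pecks on every cycle:

```python
import random

def mean_interreinforcement_interval(p, cycle_sec=3.0, cycles=200_000, seed=7):
    """Simulate cued T-cycles in which the first peck is reinforced with
    probability p; return the mean time between reinforcers, which
    approaches cycle_sec / p for large cycle counts."""
    rng = random.Random(seed)
    reinforcers = sum(rng.random() < p for _ in range(cycles))
    return cycles * cycle_sec / reinforcers

mean_interreinforcement_interval(0.25)  # close to 12 sec (lean)
mean_interreinforcement_interval(0.75)  # close to 4 sec (rich)
```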

Procedure

The experiment was divided into two stages, pretraining and training. Sessions were 76.8 min and 256 trials long. During training, the stimuli to be discriminated were the lean and rich reinforcement schedules described above. During pretraining, however, the probability of reinforcement for a center-key peck for the rich schedule was 1.0 and for the lean schedule was 0. During the first step of pretraining, the substimulus consisted of two 3-sec cycles. Within a trial, each cycle's first key peck was reinforced with the same probability as all other cycles in that trial and always darkened the center key and, at first, also the houselight, both of which were reilluminated at the start of the next cycle. If no pecks occurred, the light(s) remained on for 6 sec and no reinforcement was delivered.

On completion of the 12-sec stimulus presentation, a 6-sec choice period began with the illumination of the two side keys, while the center key either stayed dark or was darkened. For a rich-schedule substimulus, the first left-key peck was reinforced ("Left Correct," or LC) and a right-key peck was unreinforced ("Right Error," or RE). For a lean-schedule substimulus, the first right-key peck was reinforced ("Right Correct," or RC) and a left-key peck was unreinforced ("Left Error," or LE). The first side-key peck darkened and deactivated both keys until the choice period ended.

The second step of pretraining began when

at least 95% of the choices were correct for two successive sessions. Each substimulus now consisted of four cycles. The probability that a center-key peck would be followed by reinforcement remained at 1.0 when the rich schedule was in effect and at 0 when the lean schedule was in effect. Corrective training (given as needed) for position preferences was to present the rich schedule continually for excessive right-key pecking and the lean schedule for excessive left-key pecking. Together, the two pretraining steps took up to 30 sessions.

The four birds were then trained to discriminate between the probabilistic rich (p = .75) and lean (p = .25) schedules. Each bird was trained under different biasing conditions until discrimination performance was stable. These conditions differed in terms of the size of the pellet obtained for correct choices of the left and right keys: for a correct left-key peck, LC, pellet sizes were 97 mg, 45 mg, 20 mg, 20 mg, and 20 mg, and for a correct right-key peck, RC, they were 20 mg, 20 mg, 20 mg, 45 mg, and 97 mg, for Conditions 1, 2, 3, 4, and 5, respectively. Thus the ratios of pellet sizes for correct left-key to correct right-key pecks under the five conditions were, in order, 5:1, 2.25:1, 1:1, 1:2.25, and 1:5. The corresponding relative amounts of reinforcement for correct left-key pecks were .833, .692, .500, .308, and .167.

Data were obtained at each biasing condition after choice behavior stabilized. Stabilization was said to have occurred when the direction of change in left-key peck probability, p(L), had reversed itself at least twice after a session, and the average of the next two sessions' change was less than ±.05. Data were


first collected following training with pellets of equal size (20 mg) for both correct left-key and correct right-key pecks (Biasing Condition 3). Then half the birds were given a 97-mg pellet for correct left-key pecks and a 20-mg pellet for correct right-key pecks (Biasing Condition 1), and the other half were given a 45-mg pellet for correct left-key pecks and a 20-mg pellet for correct right-key pecks (Biasing Condition 2). Then the birds in Biasing Condition 1 were switched to Biasing Condition 2, and the birds in Biasing Condition 2 were switched to Biasing Condition 1. Next, the conditions were reversed with the choice now biased to the right; half the birds were given a 20-mg pellet for correct left-key pecks and a 97-mg pellet for correct right-key pecks (Biasing Condition 5), and half were given a 20-mg pellet for correct left-key pecks and a 45-mg pellet for correct right-key pecks (Biasing Condition 4). Then Biasing Conditions 4 and 5 were switched for each group of birds.

RESULTS

Figure 2 shows the sensitivity data of Birds 27, 29, 31, and 85. The probability of a left-key peck given that a rich schedule sample was presented, P(Hit), is shown as a function of the probability of a left-key peck given that the lean schedule sample was presented, P(False Alarm). The x's on the solid isosensitivity curves, also called receiver operating characteristic (ROC) curves, are plots of theoretical probabilities of hits versus false alarms expected for the distributions shown in Figure 1, given (in a continuous model described below) three different levels of sensitivity, or d'. These x's would be obtained if a pigeon followed a decision rule described by statistical decision theory. A particular integral value of substimulus reinforcement density serves as a criterion. For substimuli perceived to have a reinforcement density above this criterion only left-key pecks are made; for substimuli below, only right-key pecks are made.

These three isosensitivity functions are found by determining P(Hit) and P(False Alarm) coordinates from an appropriate Lebesgue integral of the binomial distributions in Figure 1, on the assumption that the means of the rich and lean stimuli were separated by 2.0, 1.5, and 1.0 reinforcement opportunities (maximal to moderate separation). For the binomial distributions in Figure 1, criteria were set at each boundary where the number of reinforcement opportunities incremented. These criteria were used to establish the points indicated by x's shown in Figure 2.

To test whether or not birds in some sense

"counted" the number of reinforcements in a substimulus, performance was compared to predictions obtained from an ideal observer who counted and whose performance depended on an assumed discrete distribution. Performance was compared also to predictions obtained from an ideal observer who did not count, whose performance depended on an assumed pointwise discontinuous distribution. The details of the models are presented in the Appendix. Two versions of each model are considered. One assumes that only the difference between the mean numbers of reinforcements for the rich and lean schedules decreases (while the variance remains constant), and the other assumes that the differences between probability of reinforcement for the rich and lean schedules decrease, producing a decrease in separation between means. These two forms of the two models are presented so that the d' values corresponding to the theoretical points or curves may be more clearly interpreted and more nearly comparable to results from other experiments. The actual data points did not cluster around the points predicted by either form of the discrete counting model, suggesting that the birds did not, in the sense defined by these models, "count" reinforcements.

At the gross molar level, if the birds followed the statistical decision rule described earlier and correctly identified each substimulus density, the probability of hits and false alarms obtained would fall on the top isosensitivity curve. Points below this line indicate that performance is worse than that of the second ideal observer (d' = 2.19). Biases introduced by changing the relative amount of reinforcement for correct left-key and correct right-key pecks should appear as different points on the same isosensitivity curve.
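The theoretical ROC points for this criterion rule can be generated directly from the two binomial density distributions: for each criterion c, respond left whenever Nd ≥ c, giving P(Hit) = P(Nd ≥ c | rich) and P(False Alarm) = P(Nd ≥ c | lean). A sketch (the d' line uses a conventional equal-variance Gaussian transform, not the specific models of the paper's Appendix):

```python
from math import comb
from statistics import NormalDist

def density_probs(p, cycles=4):
    """Binomial probability of each density Nd for per-cycle probability p."""
    return [comb(cycles, nd) * p**nd * (1 - p)**(cycles - nd)
            for nd in range(cycles + 1)]

rich, lean = density_probs(0.75), density_probs(0.25)

# One ROC point per criterion c: peck left iff perceived density Nd >= c.
roc = [(sum(lean[c:]), sum(rich[c:])) for c in range(1, 5)]
# c = 2 gives (P(FA), P(Hit)) = (0.26171875, 0.94921875).

z = NormalDist().inv_cdf
d_primes = [z(hit) - z(fa) for fa, hit in roc]  # Gaussian d' per criterion
```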

Greater sensitivity is displayed in Figure 2 by empirical points lying closer to the top one of the curves for ideal observers. Point clustering indicates a lack of day-to-day variation. The degree of the sensitivity for the four birds, in order from least to most sensitive bird, is 85, 27, 29, and 31, with the sensitivities of


Fig. 2. Isosensitivity curves for Birds 27, 29, 31, and 85. The probability of a Hit, P(Hit), is equal to the number of correct left-key pecks divided by the number of possible correct left-key pecks. The probability of a False Alarm, P(False Alarm), is equal to the number of incorrect left-key pecks divided by the number of possible correct right-key pecks. Circled symbols are averages for 4 days. The solid lines show isosensitivity curves for three values of d'. The top curve would be obtained if the subject followed a second ideal decision rule (see text).

Birds 27 and 29 being similar. The values of d' for many of the conditions were usually between the top and bottom isosensitivity curves, except for Bird 85 at Biasing Conditions 3 and 5. The biasing effect was least for Bird 31.

The difference between maximum attainable performance and the obtained performance might be attributable to normally distributed random errors. Later information will show that this assumption is not appropriate here, insofar as it is possible to obtain representations of the decision rules and to find nonnormally distributed error sources, such as the effect of the decrease in stimulus control as a function of increasing time and


therefore decreasing memory for important events.

The effects of three experimental variables that may control choice will be analyzed: (a) the relative reinforcement amount for a correct left-key peck, (b) the number of reinforcement opportunities in a substimulus, and (c) reinforcement opportunity proximity in time to the choice point.

Bias, which is the probability of a left-key peck, p(L) = (LC + LE)/(LC + LE + RE + RC), was examined as a function of biasing condition in Figure 3. Relative reinforcement amount biasing, A_L, is the ratio of the pellet size delivered for a correct left-key peck to the sum of left-correct pellet size and right-correct pellet size. Bias, or p(L), is an increasing function of increasing A_L: as shown in Figure 3, the left-key peck probability ranged from a p(L) of about .31 with a relative reinforcement amount of .17 to a p(L) of about .80 with a relative reinforcement amount of .83.

Bias did not differ substantially from bird to bird. Bird 85's one anomalous point might be attributable to its data being from the first condition conducted (Biasing Condition 3). This one point corresponds to a loss of substantial reinforcement, as seen by the low d' in Figure 2. After exposure to other biasing conditions, the behavior of this bird was similar to that of the others.

Fig. 3. The probability of a left-key peck, p(L), plotted against the relative amount of reinforcement, A_L (biasing), for correct left-key pecks, for Birds 27, 29, 31, and 85.

Recall from above that substimuli may be described on three other levels. On the molar level, only reinforcement density is considered, and substimuli take on reinforcement density values from 0 to 4. On a molecular level, there are 16 substimuli, each indicating a particular reinforcement pattern that may occur over the 4 cycles. On the molecular level, substimuli are described using binary notation. For example, substimuli with a density value of 1 at the molar level can have molecular values of 0001, 0010, 0100, and 1000.

Figure 4 shows both molecular and molar level data. At the molecular level, each row shows one bird's p(L) for each substimulus density during the four final sessions of training on that condition. Each point indicates the probability of a left-key peck to a given substimulus, p(L|Sn), under the given condition. At the molar level, for each biasing value, the decision rules are indicated by the average left-peck probability for substimuli at a given density, P(L|SNd). This weighted average was found by summing P(L|Sn) for each same-density substimulus, weighted by the frequency of the substimulus: P(L|SNd) = Σ_{D(Sn) = Nd} P(L|Sn)P(Sn).

The empirical fact that the molecular points do not fall on the line through the weighted average P(L|SNd) suggests that, although the substimuli are equivalent on a molar level by definition, they are not equivalent on a molecular level. That is, at the molecular level, substimuli act differently in determining the left-key peck probability. The proximity of reinforcement to choice appears to have a substantial impact. This will be considered next.

The effect of the proximity to choice of a reinforcement opportunity will be shown first, followed by the effect of the proximity to choice of the omission of a reinforcement opportunity. At the molar level, density 1 substimuli, D(Sn) = Nd = 1, provide exactly one reinforcement opportunity, and density 3 substimuli provide exactly 3 reinforcement opportunities. For the density 3 substimuli, one of the four cycles has a reinforcement opportunity omitted, making density 1 and 3 substimuli comparable opposites.

Figure 5 shows the effects of the distance between a choice and a single reinforcement


Table 2

Expected molar-level decisions to peck left to a substimulus with a reinforcement density of Nd. The predictions of p(L|SNd)*, based upon matching the probability of a left-key peck for a given density substimulus to the relative expected payoff for that choice, are shown as a function of the number of cycles in a substimulus containing a reinforcement opportunity (Nd) and the relative reinforcement amount for correct left-key pecks (Biasing Condition). The probability of a left peck being correct for a given density substimulus, p(LC|SNd), times the relative reinforcement amount for a correct left peck, ApL, yields the expected payoff for a left peck; the expected payoff for a left peck relative to the summed expected payoffs for left and right pecks yields the relative expected payoff.

Expected p(L)* at the Molar Level

Nd = number of reinforcers in a substimulus; p(LC|SNd) (p(RC|SNd)) = probability of a left (right) peck being correct; ApL, ApR = relative reinforcement amounts; EP = expected payoff; Rel EP(L|SNd) = relative expected payoff for a left peck.

Nd   Biasing   p(LC|SNd)     ApL    ApR    EP(L|SNd)  EP(R|SNd)  Rel EP(L|SNd)
               (p(RC|SNd))
4    1         1.0000        .833   .167   .833       .000       1.0000
     2         (.0000)       .692   .308   .692       .000       1.0000
     3                       .500   .500   .500       .000       1.0000
     4                       .308   .692   .308       .000       1.0000
     5                       .167   .833   .167       .000       1.0000
3    1         .9125         .833   .167   .763       .014       .982
     2         (.0845)       .692   .308   .634       .026       .961
     3                       .500   .500   .468       .042       .915
     4                       .308   .692   .282       .059       .828
     5                       .167   .833   .153       .071       .683
2    1         .4821         .833   .167   .402       .086       .823
     2         (.5179)       .692   .308   .334       .166       .668
     3                       .500   .500   .241       .259       .482
     4                       .308   .692   .148       .359       .292
     5                       .167   .833   .080       .432       .157
1    1         .1093         .833   .167   .091       .148       .280
     2         (.8907)       .692   .308   .076       .274       .216
     3                       .500   .500   .055       .445       .109
     4                       .308   .692   .034       .617       .052
     5                       .167   .833   .018       .742       .024
0    1         .0256         .833   .167   .021       .162       .116
     2         (.9744)       .692   .308   .018       .300       .056
     3                       .500   .500   .013       .487       .026
     4                       .308   .692   .008       .675       .012
     5                       .167   .833   .004       .812       .005
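The table's derived columns can be verified mechanically. The sketch below is a hedged illustration, not the paper's own computation: it applies the expected-payoff definitions to a density-2 row, where the left- and right-correct probabilities sum to one; the inputs .4821 and .500 are taken directly from the table.

```python
def expected_payoffs(p_lc, a_pl):
    """Expected payoffs for left and right pecks, given the probability
    p_lc that a left peck is correct and the relative left amount a_pl."""
    ep_l = p_lc * a_pl                 # EP(L|S) = p(LC|S) * ApL
    ep_r = (1.0 - p_lc) * (1.0 - a_pl) # EP(R|S) = p(RC|S) * ApR
    rel = ep_l / (ep_l + ep_r)         # Rel EP(L|S)
    return ep_l, ep_r, rel

# Density 2 under Biasing Condition 3 (equal reinforcement amounts):
ep_l, ep_r, rel = expected_payoffs(0.4821, 0.500)
```

With equal amounts, the relative expected payoff reduces to p(LC|SNd) itself, which is why the density-2, Biasing-3 entry is .482.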

opportunity and a following choice at all five biasing conditions. The deviation in the left-key peck probability from the average probability, Δp(L), is shown as a function of the cycle during which the reinforcement opportunity occurred. Here, Δp(L) equals p(L|Sn) minus the average of p(L|Sn) with D(Sn) = 1. In other words, Δp(L) equals p(L) for a substimulus with a single reinforcement opportunity minus the average p(L) for all density 1 substimuli. Note that the expected substimulus for density 1 substimuli is (1/4, 1/4, 1/4, 1/4), which is close to substimulus (0,0,0,0). Therefore, Δp(L) reflects not only the deviation of p(L) from the average p(L), but to a greater extent the effect of adding an opportunity to a given cycle. The value of Δp(L) is largest when a reinforcement opportunity occurs closest to choice, as is the case for substimulus 0001. It rapidly decreases as the opportunity moves away from choice. The change in slope for biases 4 and 5 is less pronounced for Birds 85 and 29, since in these last two cases those birds almost always pecked the right key. For the other birds, a smaller but similar effect is seen for bias 5.
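The density-1 bookkeeping just described can be sketched in Python. The schedule probabilities (.75 and .25 per cycle) are from the Method; the p(L|Sn) values below are illustrative placeholders, not the obtained bird data.

```python
from itertools import product

# The 16 substimuli as 4-cycle binary patterns (rightmost digit is the
# cycle nearest choice), grouped by reinforcement density D(Sn).
patterns = ["".join(map(str, bits)) for bits in product((0, 1), repeat=4)]
density = {s: s.count("1") for s in patterns}

def pattern_prob(s, p):
    """Frequency of pattern s under a schedule that reinforces the
    first peck of each cycle with probability p (.75 rich, .25 lean)."""
    prob = 1.0
    for c in s:
        prob *= p if c == "1" else 1 - p
    return prob

# Hypothetical p(L|Sn) for the four density-1 substimuli (placeholder
# values chosen so that control grows as the opportunity nears choice).
p_left = {"1000": 0.20, "0100": 0.25, "0010": 0.35, "0001": 0.55}

# Weighted molar average over same-density substimuli; with density 1
# all four patterns are equally frequent, so this equals the plain mean.
w = {s: pattern_prob(s, 0.75) for s in p_left}
avg = sum(p_left[s] * w[s] for s in p_left) / sum(w.values())

# Deviation dp(L) = p(L|Sn) - average p(L) of the density-1 substimuli.
dp = {s: p_left[s] - avg for s in p_left}
```

Under these placeholder values the deviation is largest for 0001, the pattern with its single opportunity on the cycle nearest choice, mirroring the ordering reported in Figure 5.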

Fig. 4. Left-key peck probability, p(L), for each individual substimulus at each biasing, arranged so that all same-reinforcement-density substimuli are grouped above that reinforcement density number. The solid line shows the average p(L) for each substimulus density.

Fig. 5. The positive change in p(L), equal to the p(L) of a substimulus with a single reinforcement opportunity on the ith cycle, minus the average p(L) of those density 1 substimuli, plotted as a function of the cycle before choice, i, in which that reinforcement opportunity fell: Δp(L) = p(L|Sn) − average p(L|Sn), D(Sn) = Nd = 1. The effect of biasing is also shown.

Figure 6 shows −Δp(L) as a function of the distance between a reinforcement omission and a following choice at all five biasing conditions. The value of −Δp(L) equals −(p(L|Sn) minus the average p(L|Sn) with D(Sn) = 3). That is, the right-hand side of the equation is the value of p(L) for a substimulus with a single reinforcement omission in the ith cycle, minus the average p(L) of those density 3 substimuli, all times minus one. This deviation reflects the control exerted by a "missing" reinforcement opportunity. The curves in Figures 5 and 6 have very similar shapes. The biasing effect in Figure 6 is reversed from that in Figure 5, in the sense that there is more flattening for lower-numbered biasing conditions, when payoff is greater for a correct left than for a right choice. In either case, the flattening may represent a floor or a ceiling effect at the more extreme values. The birds, ranked in order of increasing sensitivity to reinforcement omission, were Birds 31, 27, 29, and 85, with Birds 29 and 85 being very close. This ranking correlates well with the overall sensitivity seen in the isosensitivity curves in Figure 2.

Fig. 6. The negative change in p(L), equal to the negative of the p(L) to a substimulus with a single reinforcement opportunity omission on the ith cycle, minus the average p(L) of those density 3 substimuli, plotted as a function of the cycle before choice, i, in which that reinforcement opportunity was omitted: −Δp(L) = −(p(L|Sn) − average p(L|Sn), D(Sn) = Nd = 3). This is the complementary graph to the previous one. The effect of biasing is shown.

DISCUSSION

Figure 2 suggests that the sensitivity displayed by three birds was not greatly inferior, at a gross molar level, to that of an ideal observer. Let us now ask how choice behavior was controlled at the gross molar, molar, molecular, and micro levels, and in what sense birds did not discriminate reinforcement density with maximum possible sensitivity.

The present data suggest that choice behavior was controlled by reinforcement factors, including both the amount of reinforcement for a correct choice and the number of reinforcement opportunities during the stimulus period, as well as the length of time by which stimulus events preceded choice.

It will be shown here that the decision rules at the molar level for making a left-key peck as a function of the number of reinforcement opportunities in a substimulus can be described as matching to relative expected payoff. This deviation from an optimizing ideal observer's rule, in which the response alternative with the higher expected payoff is maximized, appears to be due to diminution in the control over choice by earlier events within the substimulus, i.e., by forgetting. This molar matching law fails to account for systematic differences in p(L) at the molecular level.

Maximizing and matching models will next be contrasted with actual performance at the four different levels of analysis. These contrasts will help to identify a bird's decision rule and to show how it differs from an ideal observer's rule.

A general form of matching and maximizing derived from Herrnstein's (1970) account will be applied at each level. Assume first that, in a discrete-trial choice situation, obtained response probability may be treated as equivalent to the relative rate of responding in a free-operant situation. Obtained response probability, p(L), may be conditionalized upon a given stimulus condition, Si, or p(L|Si). As described above, this stimulus condition is specified differently at each level of description. Assume second that programmed relative reinforcement proportion is equivalent to programmed relative reinforcement rate, rather than obtained relative reinforcement rate. The specification of p(L|Si) calls for a generalization of the notion of programmed relative reinforcement proportion to the programmed relative expected payoff for a left-key peck in a given stimulus condition. The relative EP(L|Si) is the ratio of the expected payoff for a left-key peck, EP(L|Si), to the sum of the expected payoffs for both key pecks:

Rel EP(L|Si) = EP(L|Si) / [EP(L|Si) + EP(R|Si)].  (1)

The expected payoff for a left-key peck in a given stimulus condition, EP(L|Si), may be interpreted as the product of the conditional probability of reinforcement for a left-key peck in the given stimulus condition, p(LC|Si), and its relative reinforcement amount, ApL:

EP(L|Si) = p(LC|Si) · ApL;  (2)

EP(R|Si) = p(RC|Si) · ApR
         = [1 − p(LC|Si)] · (1 − ApL).  (3)

The relative reinforcement amount, ApL = ALC/(ALC + ARC), where ALC is the amount of reinforcement for a correct left-key peck and ARC is the amount of reinforcement for a correct right-key peck. Relative expected payoff retains the common notion of programmed relative amount of reinforcement (Catania, 1963).

A decision rule, p(L|Si)*, is designed to predict the obtained probability of a left peck, p(L|Si). Specifically, such a rule predicts the left-key peck proportion under a given set of stimulus conditions as a function of the relative expected payoff for making that response under those stimulus conditions. Two such decision rules will be considered here. One is a maximizing and the other is a matching rule.

To compactly write the maximizing rule, an indicator notation will be introduced (Loeve, 1963). The indicator of set A is the function which assigns the value 1 to all points in A and the value 0 at all points in the complement, A'. When A = {x | x > k}, the indicator of A will be denoted by Ik(x). Here, k = .5 and x = Rel EP(L|Si). Therefore,

I.5(x) = 1 if x > .5
       = 0 if x ≤ .5.

The maximization relation states that p(L|Si) is equal to the indicator of the set of all Rel EP(L|Si) whose values are greater than .5. The maximizing rule then is:

p(L|Si)* = I.5[Rel EP(L|Si)]
         = 1 if Rel EP(L|Si) > .5
         = 0 if Rel EP(L|Si) ≤ .5,  (4)

and the matching rule is:

p(L|Si)* = Rel EP(L|Si).  (5)

At the gross molar level, two sets of decision rules might apply, depending on the extent to which the rich- and lean-schedule stimuli were discriminable from one another. The case where the two stimuli are not discriminable is equivalent to a probability-learning situation with p(L|Si)* = p(L)*, since payoff-controlled response bias completely overshadows probability-of-stimulus-presentation-controlled bias (Nevin, 1969). The corresponding first form of maximizing, p(L)* = I.5[Rel EP(L)], is called overall response alternative maximizing: irrespective of stimulus differences, one alternative is always chosen. This may be contrasted with a first form of matching, p(L)* = Rel EP(L), which is called overall payoff matching, where an alternative is chosen in proportion to overall payoff, a degenerate form of expected payoff. The case where the stimuli are perfectly discriminable may be idealized by a second form of either maximizing or matching. According to conditional maximizing, p(L|Si)* = I.5[Rel EP(L|Si)], where i = lean, rich; one response alternative is always chosen on trials where a particular stimulus is presented (Coulson, Koffer, & Coulson, 1971; Herrnstein & Loveland, 1975). This is the same as matching the probability of a response to the stimulus presentation probability, p(L|Si)* = p(Si), a degenerate form of matching to relative expected payoff when relative reinforcement amount is disregarded.
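The maximizing and matching rules of Equations 4 and 5, together with the expected-payoff definitions of Equations 1 through 3, can be written compactly as a sketch; the numerical check uses the density-2, Biasing-Condition-1 entries of Table 2.

```python
def rel_ep_left(p_lc, a_pl):
    """Relative expected payoff for a left-key peck (Equations 1-3):
    EP(L) = p(LC|S)*ApL and EP(R) = (1 - p(LC|S))*(1 - ApL)."""
    ep_l = p_lc * a_pl
    ep_r = (1.0 - p_lc) * (1.0 - a_pl)
    return ep_l / (ep_l + ep_r)

def maximizing_rule(rel_ep):
    """Equation 4: the indicator I.5 -- always choose the side whose
    relative expected payoff exceeds .5."""
    return 1.0 if rel_ep > 0.5 else 0.0

def matching_rule(rel_ep):
    """Equation 5: predicted p(L) equals the relative expected payoff."""
    return rel_ep

# Density 2 under Biasing Condition 1 (ApL = .833): Table 2 gives
# Rel EP(L) = .823; maximizing predicts 1.0, matching predicts .823.
rel = rel_ep_left(0.4821, 0.833)
```

The contrast between the two rules is plain here: for the same stimulus condition, maximizing predicts exclusive left-key pecking while matching predicts a graded probability equal to the relative expected payoff.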

Figure 3 showed that as biasing increased, the overall bias, p(L), linearly increased. This result suggests imperfect discrimination at the gross molar level. The slope of the lines suggests a slight degree of overall payoff undermatching in the direction away from overall response alternative maximizing. Conditional maximizing would not be expected, because an ideal observer could only imperfectly discriminate between the two overlapping probabilistic stimuli. Although matching in the probability-learning sense at the gross molar level seems more clearly indicated than conditional maximizing for unequal biasing, the birds did discriminate between the two stimuli. It is advisable, therefore, to continue the analysis to see how overall matching arises when there is this discrimination.

Both a maximizing model and a matching model are considered next at the molar level. For an optimizing ideal observer, the maximizing model is:

p(L|SNd)* = I.5[Rel EP(L|SNd)], Nd = 0, 1, 2, 3, 4; Si = SNd,  (6)

and p(L|SNd) is independent of the proximity of a reinforcement opportunity to a following choice. Sensitivity could be reduced by any deviation from this ideal decision rule due to any combination of control by: (a) proximity of events to choice, (b) biasing, and (c) a ratio of expected payoff for the optimal response relative to the other response too close to one. To maximize reinforcement for choice within the range of biases used here, the optimizing ideal observer always pecks right for density 0 and 1 substimuli, left for density 3 and 4 substimuli, and pecks either right or left for density 2 substimuli, depending on whether biasing was to the right or left. This ideal decision rule maximizes reinforcement for all biasing since Rel EP(L|SNd) < .3 and Rel EP(R|SNd) > .7 for density 0 and 1 substimuli, and Rel EP(R|SNd) < .3 and Rel EP(L|SNd) > .7 for density 3 and 4 substimuli, as was shown in Table 2. For density 2 substimuli, an ideal observer always maximizes by pecking the higher-payoff side key. This conditional maximizing could produce matching of the response probability to stimulus presentation probability, as described previously.

At the molar level, empirical evidence of

Terman and Terman (1972), Coulson et al. (1971), and Herrnstein and Loveland (1975) suggests that an alternative will be maximized in a noncorrection situation as long as relative reinforcement rate for alternative responses differs significantly from .5. Also, the same key should be pecked following events with the same value, where value is measured along a critical dimension such as reinforcement density. If substimulus reinforcement density alone controls choice, one response alternative would be maximized.

A value of Rel EP(R|Si) somewhat close to .5 does not always interfere with maximizing. For instance, rats maximized p(L) when Rel EP(L) = .6 in a situation where a left-bar press led to a variable-interval delay of 10 sec followed by a differential-reinforcement-of-other-behavior 5-sec schedule (tand VID 10 DRO 5), and a right-bar press led to a tand VID 25 DRO 5-sec schedule (Coulson et al., 1971). In the present case, for low-reinforcement-density substimuli, p(L) should tend toward 0 and, for high-density substimuli, toward 1, since correct choices were differentially reinforced, and differences in the temporal location of reinforcement for a substimulus were irrelevant,

as were the variations in the comparison stimuli in the Coulson et al. experiment.

Pigeons on a concurrent variable-ratio variable-ratio schedule maximized reinforcements per response by pecking the key correlated with the smaller ratio, so that p(L) often approximated 0 or 1 (Herrnstein & Loveland, 1975). In the present study, both alternative key pecks were often reinforced for almost every substimulus and for same-density substimuli, since the rich and lean schedules overlapped. However, this should not have much effect on maximization as long as the relative amount of reinforcement for left- versus right-key pecks is not too close to .5 (Herrnstein & Loveland, 1975).

The points connected by the solid lines in Figure 4 illustrate that the empirical decision rules deviated from the ideal observer's maximizing decision rule, Equation 6, in three ways. First, although density 0 and 4 substimuli produced correct choices almost always, by minimizing p(L) in the former and maximizing p(L) in the latter case, density 1 and 3 substimuli, which should have produced the same p(L) as 0 and 4, respectively, did not. Hence, at the molar level density was not discriminated as an ideal observer would. Second, the proximity of a reinforcement opportunity to choice produced a large ordered deviation in each p(L|Sn) from the average p(L|Sn) of same-density substimuli, as shown in Figures 4, 5, and 6. Third, the probability of pecking a given key to same-density substimuli was neither the same nor maximized for all same-density substimuli (except when there were no other substimuli of the same density). This indicated that the birds neither "counted" reinforcement opportunities nor perfectly discriminated density; either each substimulus was not discriminated or one response alternative for a given substimulus was not maximized.

Instead of behaving like ideal observers, the birds behaved as if they had a second strategy which minimized the reinforcement loss for choice that is produced by poor memory for events in the stimuli. With this second strategy, they matched some central tendency of p(L), e.g., the average p(L) or the median p(L) for same-density substimuli, to the relative expected payoff for those same-density substimuli. The matching model's predicted left-peck probability, p(L|Si)*, equals the relative expected payoff for a left-key peck to a given density-of-reinforcement substimulus. The Si in this case are all the substimuli with the same density, SNd:

p(L|SNd)* = Rel EP(L|SNd), D(Sn) = Nd, Nd = 0, 1, 2, 3, 4.  (7)

The predicted p(L|SNd)* versus substimulus reinforcement density is plotted in the top left panel of Figure 7. The biasing parameter generates the five curves, each plotted separately in its own panel.

This prediction in effect states that a subject matches its choice to the relative expected values of the alternatives (Herrnstein, 1970). If a correction procedure were used, instead of the present noncorrection procedure, so that an incorrect choice did not "waste" a reinforcement opportunity, the present model would reduce to Herrnstein's model. Then the probability of a trial ending with a correct peck, given any substimulus, would be one, and only relative bias would have an effect. The biasing does not change the shape of the theoretical p(L|SNd)* decision curves; it just shifts them. Also, sensitivity remains constant across changes in biasing. Temporal location of reinforcement opportunities within a substimulus should have no effect.

On a molar level, consider that the left-key peck probability for each same-density substimulus, p(L|SNd), closely matches the relative expected payoff for left-key pecks to those same-density substimuli. Compare the average p(L|SNd) lines in Figure 4 to the predicted p(L|SNd)* lines in Figure 7. Since the relationship between p(L|Sn) and reinforcement opportunity proximity is nonlinear, as was shown in Figures 5 and 6, the median p(L|Sn) at each biasing value and each substimulus density better describes the decision rules than the average shown in Figure 4. However, the values obtained with the two measures did not differ very much. A measure of central tendency that represents performance at the molar level masks the effect of individual same-density substimuli. In Figure 7, the median p(L|Sn), D(Sn) = Nd, for Birds 27, 29, and 31 is compared to the expected p(L|SNd) for each bias. Bird 85's data were omitted for three reasons: (a) the effect of biasing was disorderly, as shown by the anomalous point in Figure 3 and in the bottom panel, biasing condition 3,


in Figure 4; (b) there was not much of an effect of including Bird 85's data other than at the anomalous point; and (c) actual data points were preferred as representatives of central tendency, rather than the average of two values. This situation is possible only when using three birds. The predicted points are the circled points connected by lines. Where the obtained points fall on the predicted values they are omitted. Only five (20%) of the obtained points differed by more than 5% from the predicted p(L|SNd)* values.

The overall matching obtained at both gross molar and molar levels was an average produced by more left pecks to substimuli with reinforcement closer to choice than to substimuli having the same reinforcement but with reinforcement further from choice.

Matching at the molar level should be the second-best strategy. Shifts away from overall matching produce lower total amounts of reinforcement. If a bird shifts to the left more for 1110 to maximize reinforcement for choice given that substimulus, then it also shifts to the left for 0001. What is gained by the shift for 1110 is more than offset by the loss for the shift at 0001. This assumes that a bird discriminates density as well as possible given a limited memory. To have constant overall sensitivity, changes in response probability to one substimulus must result in changes in response probability to all. A bird cannot shift bias for just one or two substimuli, because that would change its overall sensitivity to reinforcement density. It can only shift overall bias. This forces a bird to set the overall bias and establish an anchor point. The Rel EP(L|SNd) sets the bias for a set of same-reinforcement-density substimuli, thereby setting the overall bias.

This rationale for matching does not, however, explain how matching behavior is generated. Local maximizing in some form would occur if the criteria for choice were conditional on, for instance, certainty about substimuli. There is no way to distinguish local maximizing of reinforcement from molecular matching in the present study, because there is no independent estimate of the perceived value of a substimulus, nor is there an independent estimate of the degree of impaired sensitivity to the events within a substimulus. One property of substimuli is known, however: the temporal property.

The effect of concatenating events in a temporal sequence must be taken into account to explain p(L|Sn) at the molecular level. First, the temporal properties of Sn will be discussed. At this level, temporal factors partially account for the deterioration of reinforcement-density discrimination. For same-density substimuli, p(L) decreased as the reinforcement opportunities occurred further from choice, as was seen in Figure 5. For the obverse case, p(L) increased to a smaller extent as the missing reinforcement opportunity occurred further from choice, as was seen in Figure 6. Reinforcement opportunity occurrence exerted more control over choice than nonoccurrence.

At the micro level, control by a single event

preceding choice decreased as a function of the event's temporal separation from choice. In Figure 8, control by an event x seconds before a choice in the present experiment is compared to control in a delayed matching-to-sample experiment by Berryman, Cumming, and Nevin (1963). There, the sample was either a red, green, or blue center key. An observing center-key peck turned it off and, after a delay, turned on the two side-key comparison stimuli. A peck on a side key with a hue that matched the center key's hue produced reinforcement; a nonmatching side-key peck was followed by a blackout.

To directly compare these results, comparable strength-of-control measures must be used. In the Berryman et al. (1963) experiment, the correct-choice probability, p(C), may be said to define stimulus control by a sample event. As time between the sample event and the choice increased from 0 to 25 seconds, the median p(C) for three birds decreased from approximately 1.0 to .5.

Fig. 7. Median left-key peck probability for three birds, median p(L|SNd), to same-density substimuli. The curves shown in panel 1 are the expected left-key-peck probabilities, p(L|SNd)*, given Equation 7.

Fig. 8. The left y-axis has the median probability of a correct match-to-sample for three pigeons. The triangles represent that median plotted against different delays between the discriminative stimulus and the choice, from Berryman, Cumming, and Nevin (1963). The right-hand axis has the probability of a correct discrimination scaled in the reverse manner. The independent variable in the second case is time before choice of the singular occurrence of a reinforcement opportunity for substimuli with a reinforcement density of 1, and time before choice for the singular nonoccurrence of a reinforcement opportunity for substimuli with a reinforcement density of 3.

Matters were more complicated in the present study. In the unbiased condition, the event whose control over choice was examined was either a single addition of a reinforcement opportunity on one of four cycles (density 1 substimuli) or a single omission of a reinforcement opportunity on one of four cycles (density 3 substimuli). As time between either the single addition or omission and choice increases, p(C) = p(LC) + p(RC) increases. These increases in control show up as increases in the probability of an error, p(E). For density 1 substimuli, an ideal observer would always peck the right key, since EP(L|Sn) < .055; any increase in p(L) as the single event occurred closer to choice would lead to slight increases in p(LC) and to large decreases in p(RC), with their sum, p(C), decreasing. Similarly, as the omitted reinforcement opportunity occurred closer to choice, p(L) would decrease and error probability, p(E), would increase. Hence, an increase in control by these single events would be reflected in a decrease in p(C), which would go from approximately .9 for no control to .5 for the maximum control achieved in the present experiment. The maximum control would occur for the shortest time span, that is, the cycle only 3 sec away from choice, with p(C) = .5. To read strength of control directly in both experiments, the right-hand y-axis of Figure 8 was inverted. The median strength of single-event control for Birds 29, 31, and 85 in the present experiment (on the right-hand y-axis) may then be directly compared to the median strength of discriminative control for Birds 170, 171, and 172 in the Berryman et al. (1963) experiment (on the left-hand y-axis), as a function of the intervening interval size. The three curves have similar shapes, the upward displacement of the extra-reinforcement curves from the missing-reinforcement curves being due to the differential effect of an additional event on a blank background versus a missing event on a background involving reinforcement.

Two models show how results from the molar and micro levels may be combined to explain performance at the molecular level. Because the class of same-density substimuli has only one member each for density 0 and 4, any predictions at the molar level for them are identical to predictions at the molecular level; therefore, differences between those predictions are trivial. For density 1, 2, and 3, there are four, six, and four substimuli, respectively. In the first model, a weighted mean value of p(L|Sn), D(Sn) = Nd, is set by the matching law, and the deviations from this weighted mean are set by some forgetting law. For density 1 and 3 substimuli the memory curves are relatively simple. Both the curve for single reinforcement events (density 1) and the flatter curve for single missing reinforcement events (density 3) show the decrement in control exerted by events over time, as found in delayed-matching-to-sample studies. For density 2 substimuli, there are two alternative approaches. Either a third kind of forgetting curve has to be invoked to describe control by the compound of two reinforcement opportunities and two missing reinforcement opportunities, or control by the ensemble of events has to be predicted from how each event's control over choice combines with the control by other events. A memory scale can be constructed from the concatenation of such events (Krantz, Luce, Suppes, & Tversky, 1971). Alternatively, in a second model, the remembered value of each substimulus is probabilistically determined, with each value of Sn being a random variable with a mean and a variance. One response alternative would be maximized as a function of the perceived relative expected payoff, which changes trial to trial, an idea similar in some respects to Shimp's (1976) momentary maximizing model, the average across trials equaling that predicted by the matching law.


APPENDIX

Two models of an ideal observer could be assumed: one based on a discrete binomial distribution, and the other based on a pointwise discontinuous binomial distribution. The discrete form has probability represented by a vertical line at each possible integral substimulus density value, Nd, that could serve as a criterion. The pointwise discontinuous (almost-everywhere continuous) distribution has probability represented by bars centered over these same criterion points, extending ½ reinforcement opportunity to the left and to the right. If the discrete distribution is assumed, each isosensitivity function consists of a set of discrete points, as shown by the x's on the top left isosensitivity curve, d' = 2/σ. To find P(Hit) and P(False Alarm) for a given criterion, Nd, and separation between means, ΔM, the discrete binomial probability function is integrated from the criterion value to plus infinity. For P(Hit), the discrete probability function for the rich stimulus is integrated by summing all values of P(Hit) at substimulus density values, Nd, above the criterion. The value of P(Hit) at a given substimulus density value, Nd, is equal to the frequency of LC divided by the frequency of LC + RE at that density. For P(False Alarm), the discrete probability function for the lean stimulus is integrated by summing each value of P(False Alarm) determined at a given Nd value above the same criterion.

A second ideal observer model can be represented as having only a pointwise discontinuous probability density distribution. In the only pointwise discontinuous case, the discrete binomial distributions are modified in order to obtain isosensitivity points intermediate to the values arising at the discrete criterion points that were obtained from the discrete binomial probability distributions. To connect these isosensitivity points, the probability at a given integral value of Nd in the probability distribution is replaced by the numerically proportional probability density. The probability density at the left of an interval, p(S_Nd), is assigned to nonintegral values within the interval, Nd to Nd + 1. This converts the discrete probability distribution into a probability density distribution (as was done by Egan, 1975). To find the theoretical values of P(Hit) and P(False Alarm) for the only pointwise discontinuous case, one integrates over each continuous portion up to the criterion and then sums the integrals of these portions.
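The discrete ideal-observer computation can be sketched numerically. This is my sketch, not the original analysis program: it assumes N = 4 cycles with per-cycle reinforcement probabilities .75 (rich) and .25 (lean), and a rule that reports "rich" whenever the substimulus density is at or above the criterion (placing the criterion at, rather than just above, an integer is my assumption). The only pointwise discontinuous case then joins these points linearly.

```python
from math import comb

def binom_pmf(n, p, k):
    """Probability of exactly k reinforced cycles out of n at per-cycle probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def roc_point(criterion, n=4, p_rich=0.75, p_lean=0.25):
    """P(Hit), P(False Alarm) for a rule that reports 'rich' whenever
    the substimulus density Nd is at or above the criterion."""
    p_hit = sum(binom_pmf(n, p_rich, k) for k in range(criterion, n + 1))
    p_fa = sum(binom_pmf(n, p_lean, k) for k in range(criterion, n + 1))
    return p_hit, p_fa

# One discrete isosensitivity point per criterion placement, Nd = 0 through 5
points = [roc_point(c) for c in range(6)]
```

Sweeping the criterion from 0 to 5 traces the full isosensitivity function, from (1, 1) down to (0, 0).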

Different d' values were assigned to the discrete and the only pointwise discontinuous isosensitivity functions. First, d' was calculated in the discrete case. For the two functions lower on the graph, the two additional d's were then found either by assuming a fixed standard deviation while the difference between the means of the two distributions, ΔM, gets smaller as sensitivity deteriorates, or by assuming that the degradation in sensitivity is due to a decrease in the difference between p and q, where q = 1 − p.

Three theoretical sets of isosensitivity points in the discrete probability case, corresponding to d' = 2.31, 1.73, and 1.15, were obtained in the following fashion. The values of d' = (M₁ − M₂)/σ (Egan, 1975) were found for the distributions when their means were maximally separated (2.0 reinforcement opportunities), moved together to ¾ of maximal separation (1.5), and to ½ of maximal separation (1.0). The means, M_rich and M_lean, or M₁ and M₂, were found from M = Np, where M = Nd, the density of the gross level stimulus, N = number of cycles, and p = probability of reinforcement for the first center-key peck in a cycle. For the maximum separation of the two distributions, ΔM = M_rich − M_lean is calculated as follows: M_rich = 4 × ¾ = 3 = Nd for the rich density substimuli; M_lean = 4 × ¼ = 1 = Nd for the lean density substimuli; and ΔM = M_rich − M_lean = 3 − 1 = 2.0. For the three sets of points, a fixed standard deviation, σ = √(Np(1 − p)) = √(4(¾)(¼)) = √¾ = .866, was assumed for this first discrete case. Since the inverse, 1/σ = 1.1547, for the constant σ, and d' = (1/σ)ΔM, the d' values were 2.31, 1.73, and 1.15 when ΔM = 2.0, 1.5, and 1.0, respectively.
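The discrete-case arithmetic can be checked in a few lines (a sketch using the N and p values given above; variable names are mine):

```python
from math import sqrt

N = 4                                    # T-cycles per substimulus
p_rich, p_lean = 0.75, 0.25              # per-cycle reinforcement probabilities

M_rich, M_lean = N * p_rich, N * p_lean  # binomial means: 3.0 and 1.0
sigma = sqrt(N * p_rich * (1 - p_rich))  # sqrt(3/4) ≈ .866, same for both schedules

d_primes = [round(dM / sigma, 2) for dM in (2.0, 1.5, 1.0)]
print(d_primes)  # [2.31, 1.73, 1.15]
```

Note that p(1 − p) is the same for p = .75 and p = .25, so the rich and lean distributions share one standard deviation.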

In the only pointwise discontinuous case, the area from the criterion to infinity is the same as the corresponding sum in the discrete case, so that P(Hit) and P(False Alarm) are the same at the criterion points and are linear in between. The d' values assigned to the isosensitivity curves shown in Figure 2 are derived next from the only pointwise discontinuous probability distribution case. These d' values in the pointwise discontinuous case are



2.19, 1.64, and 1.10, which are smaller than the corresponding isosensitivity point values. Remember that d' = (1/σ)ΔM. While the difference between the means is the same in the discrete and only pointwise discontinuous cases, the standard deviation in the former, σ = .866, is smaller than the σ = .913 of the latter, as will be shown for the only pointwise discontinuous case. The mean, μ, used to calculate the variance and the corresponding standard deviation is found from:

μ = Σ_{n=0}^{4} ∫_n^{n+1} x p(x) dx = 3.5,

for the rich stimulus and 1.5 for the lean stimulus. Note that the maximal separation, ΔM = 3.5 − 1.5 = 2, which was the same as in the discrete case. The variance is found from:

σ² = Σ_{n=0}^{4} ∫_n^{n+1} (x − μ)² p(x) dx.

Instead of setting a fixed σ, if the degradation in sensitivity were due to moving p towards q, the ΔM values would be the same. As p − q approaches 0, σ approaches 1 in the discrete case, which is not very different from .866, and in the almost-continuous case σ would approach 1.04, which is not very different from .913. The values of d' in either case would not be very different.
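The pointwise discontinuous figures follow because smearing each probability mass uniformly over a unit interval adds the variance of a unit uniform distribution, 1/12, to the discrete variance. A sketch of that arithmetic (variable names are mine):

```python
from math import sqrt

var_discrete = 4 * 0.75 * 0.25         # Np(1 - p) = 0.75, as in the discrete case
var_pointwise = var_discrete + 1 / 12  # unit-interval smear adds 1/12 to the variance
sigma = sqrt(var_pointwise)            # ≈ .913

d_primes = [round(dM / sigma, 2) for dM in (2.0, 1.5, 1.0)]
print(d_primes)                        # [2.19, 1.64, 1.1]

# Degraded-sensitivity limit as p approaches q = .5:
sigma_limit = sqrt(4 * 0.5 * 0.5 + 1 / 12)  # ≈ 1.04, the almost-continuous limit
```

Dividing the same ΔM values by this slightly larger σ reproduces the three pointwise d' values quoted above.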

REFERENCES

Berryman, R., Cumming, W. W., & Nevin, J. A. Acquisition of delayed matching in the pigeon. Journal of the Experimental Analysis of Behavior, 1963, 6, 101-107.

Catania, A. C. Concurrent performances: a baseline for the study of reinforcement magnitude. Journal of the Experimental Analysis of Behavior, 1963, 6, 299-301.

Catania, A. C., & Reynolds, G. S. A quantitative analysis of responding maintained by interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 1968, 11, 327-383.

Coulson, G., Koffer, K., & Coulson, V. Maximization of pellet density by rats in a two-choice operant situation. Psychonomic Science, 1971, 24, 145-146.

Egan, J. P. Signal detection theory and ROC analysis. New York: Academic Press, 1975.

Estes, W. K., Burke, C. J., Atkinson, R. C., & Frankmann, J. P. Probabilistic discrimination learning. Journal of Experimental Psychology, 1957, 54, 233-239.

Herrnstein, R. J. On the law of effect. Journal of the Experimental Analysis of Behavior, 1970, 13, 243-266.

Herrnstein, R. J., & Loveland, D. H. Maximizing and matching on concurrent ratio schedules. Journal of the Experimental Analysis of Behavior, 1975, 24, 107-116.

Hobson, S. L. Discriminability of fixed-ratio schedules for pigeons: effect of absolute ratio size. Journal of the Experimental Analysis of Behavior, 1975, 23, 25-35.

Krantz, D. H., Luce, R. D., Suppes, P., & Tversky, A. Foundations of measurement (Vol. 1). New York: Academic Press, 1971.

Lattal, K. A. Reinforcement contingencies as discriminative stimuli. Journal of the Experimental Analysis of Behavior, 1975, 23, 241-246.

Loeve, M. Probability theory (3rd ed.). Princeton: Van Nostrand, 1963.

Nevin, J. A. Signal detection theory and operant behavior. Review of D. M. Green and J. A. Swets, Signal detection theory and psychophysics. Journal of the Experimental Analysis of Behavior, 1969, 12, 475-480.

Pliskoff, S. S., & Goldiamond, I. Some discriminative properties of fixed-ratio performance in the pigeon. Journal of the Experimental Analysis of Behavior, 1966, 9, 1-9.

Rilling, M., & MacDiarmid, C. Signal detection in fixed-ratio schedules. Science, 1965, 148, 526-527.

Schoenfeld, W. N., Cumming, W. W., & Hearst, E. On the classification of reinforcement schedules. Proceedings of the National Academy of Sciences, 1956, 42, 563-570.

Shimp, C. P. Probabilistic discrimination learning in the pigeon. Journal of Experimental Psychology, 1973, 97, 292-304.

Shimp, C. P. Short-term memory in the pigeon: relative recency. Journal of the Experimental Analysis of Behavior, 1976, 25, 55-62.

Swets, J. A., Tanner, W. P., & Birdsall, T. G. Decision processes in perception. In J. A. Swets (Ed.), Signal detection and recognition by human observers. New York: Wiley, 1964.

Terman, M., & Terman, J. S. Concurrent variation of response bias and sensitivity in an operant psychophysical test. Perception and Psychophysics, 1972, 11, 428-432.

Weissman, A. Impairment of performance when a discriminative stimulus is correlated with a reinforcement contingency. Journal of the Experimental Analysis of Behavior, 1961, 4, 365-369.

Received November 8, 1976
Final acceptance January 30, 1979