Wixted, J. T. & Gaitan, S. (2002)



Animal Learning & Behavior, 2002, 30 (4), 289-305
Copyright 2002 Psychonomic Society, Inc.

Cognitive theories as reinforcement history surrogates: The case of likelihood ratio models of human recognition memory

JOHN T. WIXTED and SANTINO C. GAITAN
University of California, San Diego, La Jolla, California

B. F. Skinner (1977) once argued that cognitive theories are essentially surrogates for the organism’s (usually unknown) reinforcement history. In this article, we argue that this notion applies rather directly to a class of likelihood ratio models of human recognition memory. The point is not that such models are fundamentally flawed or that they are not useful and should be abandoned. Instead, the point is that the role of reinforcement history in shaping memory decisions could help to explain what otherwise must be explained by assuming that subjects are inexplicably endowed with the relevant distributional information and computational abilities. To the degree that a role for an organism’s reinforcement history is appreciated, the importance of animal memory research in understanding human memory comes into clearer focus. As Skinner was also fond of pointing out, it is only in the animal laboratory that an organism’s history of reinforcement can be precisely controlled and its effects on behavior clearly understood.

This article was supported by NIMH Grant 2 R01 MH55648-04A1. Correspondence concerning this article should be addressed to J. T. Wixted, Department of Psychology, University of California, San Diego, La Jolla, CA 92093 (e-mail: jwixted@ucsd.edu). Note: This article was invited by the editors.

In a well-known article entitled “Why I Am Not a Cognitive Psychologist,” B. F. Skinner (1977) made the following points:

The variables of which human behavior is a function lie in the environment. (p. 1)

Cognitive psychologists study these relations between organism and environment, but they seldom deal with them directly. Instead, they invent internal surrogates which become the subject matter of their science. (p. 1)

The mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects. (p. 9)

And in his well-known book Science and Human Behavior (1953), Skinner also asserted that

the practice of looking inside the organism for an explanation of behavior has tended to obscure the variables which are immediately available for a scientific analysis. These variables lie outside the organism, in its immediate environment and in its environmental history. (p. 31)

Implicit in these quotations (and rather explicit in some of Skinner’s other writings) is an indictment of cognitive theories and the research they generate. According to Skinner, the status of cognitive theories as reinforcement history surrogates and the dubious value of such theories for the understanding of behavior are tightly connected notions. However, to us, the two notions seem to be quite distinct. The utility of cognitive theories seems well established, and the research generated by them has revealed interesting and nonobvious human capabilities (some of which we will review in this article). Nevertheless, it might still be true that cognitive models often serve as reinforcement history surrogates, thereby obscuring the environmental variables that are the more distal and potentially observable and controllable causes of behavior. In that sense, cognitive theories, useful though they may be, are open to criticism.

Skinner (1977) developed the cognitive-theories-as-surrogates idea by considering very general explanatory constructs that show up in one way or another in many different theories. Thus, for example, explanations appealing to an organism’s “knowledge” or “representations,” ideas that form the foundation of many cognitive models, were criticized as offering no real understanding and as drawing attention away from the true causes of behavior. Criticizing the very foundation of cognitive theories renders the argument widely applicable while, perhaps, simultaneously (and somewhat paradoxically) limiting its impact. Unless the argument is addressed to specific theories that have been advanced to explain specific findings in the literature, it may come across as philosophical speculation that is of interest mainly to philosophers. That the argument does apply to specific theories is what we will try to show in this article, and our case in point will be signal-detection–based likelihood ratio models of human recognition memory. Several comprehensive models of human recognition memory have been proposed in recent years that are based on the idea that the memory system computes a likelihood ratio in order to determine whether or not an item was recently encountered on a list. These models assume that the understanding of human recognition memory is contingent on the understanding of the underlying computational machinery. Our central claim is that Skinner may have been right about this theoretical machinery: It is largely a stand-in for the organism’s learning history. To the extent that this point is appreciated, the relevance of animal memory research to the understanding of human memory becomes much clearer than it otherwise might be. Indeed, it is only in the animal laboratory that the connection between an organism’s prior reinforcement history and its current memory decisions is clearly evident (Alsop, 1998). To quote Skinner (1953) again, “We study the behavior of animals because it is simpler. Basic processes are revealed more easily and can be recorded over longer periods of time. ... We may arrange genetic histories to control certain variables and special life histories to control others” (p. 38). Illustrating the validity of this point is the main purpose of the present article.

The likelihood ratio models that we will consider in making our case represent one form of signal detection theory, so we will begin our inquiry into this issue with a brief overview of that basic framework.

Signal Detection Theory

Since the late 1960s, theories of human recognition memory have involved concepts drawn from signal detection theory. In a typical recognition memory task, subjects first study a list of items (e.g., words or pictures) and are then presented with a recognition test in which items are presented one at a time for an old/new recognition decision. A response of old means that the subject believes that the word appeared on the list; new means that the subject believes that it did not. Typically, half of the test items actually did appear on the list (targets), and half did not (lures). Imagine that subjects correctly respond old to 84% of the targets (i.e., the hit rate is .84) and incorrectly respond old to 16% of the lures (i.e., the false alarm rate is .16). How should we conceptualize a performance like this? Signal detection theory offers one appealing answer. This theory comes in two basic versions, one of which is fairly simple and the other of which is more complex.

In its simplest form, signal detection theory holds that a decision about whether or not an item was recently encountered on a list depends on its level of familiarity. If the item’s level of familiarity exceeds a criterion value, it is judged to be old; otherwise, it is judged to be new. As is illustrated in the upper panel of Figure 1, familiarity values associated with the old and the new items (i.e., the targets and the lures, respectively) are usually assumed to be normally distributed, with the mean of the target distribution located at a higher point on the familiarity axis than the mean of the lure distribution. The decision criterion specifies the familiarity value above which an item is declared to be old, and it can be placed anywhere along the familiarity axis. In Figure 1, it is placed exactly halfway between the means of the target and the lure distributions. The shaded area in Figure 1 represents the proportion of lures that are incorrectly judged to be old because they are associated with familiarity values that fall above the criterion. That proportion is known as the false alarm rate. The false alarm rate for the situation depicted in Figure 1 would be about 16% (whereas the hit rate would be 84%).
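The hit and false alarm rates quoted for Figure 1 can be checked with a short sketch. It assumes unit-variance Gaussian distributions whose means lie 2 standard deviations apart, which is consistent with the .84 and .16 rates but is an assumption of this example; the function name `hit_and_false_alarm_rates` is hypothetical.

```python
from statistics import NormalDist


def hit_and_false_alarm_rates(d_prime, criterion):
    """Return (hit rate, false alarm rate) for an equal-variance Gaussian model.

    Assumption of this sketch: lures ~ N(0, 1) and targets ~ N(d_prime, 1),
    with the criterion expressed in standard deviations above the lure mean.
    """
    lures = NormalDist(mu=0.0, sigma=1.0)
    targets = NormalDist(mu=d_prime, sigma=1.0)
    # An item is called "old" when its familiarity falls above the criterion.
    hit_rate = 1.0 - targets.cdf(criterion)
    false_alarm_rate = 1.0 - lures.cdf(criterion)
    return hit_rate, false_alarm_rate


# Criterion placed halfway between the two means, as in Figure 1.
hit, fa = hit_and_false_alarm_rates(d_prime=2.0, criterion=1.0)
print(f"hit rate = {hit:.2f}, false alarm rate = {fa:.2f}")
```

Under these assumptions the printed rates round to .84 and .16, matching the values in the text.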

Figure 1. Upper panel: standard familiarity-based signal detection model of recognition memory. The decision axis represents a familiarity scale that ranges from low to high. Lower panel: likelihood ratio version of the signal detection account of recognition memory. The decision axis represents a log likelihood ratio scale that ranges from minus infinity to plus infinity.

An important assumption of the detection theory just described is that the decision axis represents a strength-of-evidence variable, such as familiarity. Although many signal detection models of recognition memory make precisely that assumption, another class of detection models does not. These models assume that the decision axis represents a log likelihood ratio scale. According to a likelihood ratio model, recognition decisions are based on a statistical computation. If the computed odds that the item appeared on the list are high enough (usually, greater than even), the item is declared to be old; otherwise, it is declared to be new. The lower panel of Figure 1 illustrates the likelihood ratio model, and it is immediately apparent that it looks a lot like the familiarity model shown in the upper panel of Figure 1. The only real difference is the decision axis, which now represents a log likelihood ratio scale.

Likelihood ratio models assume that the operative psychological variable is an odds ratio associated with the test item, not its level of familiarity. The odds ratio is equal to the likelihood that the item was drawn from the target distribution divided by the likelihood that it was drawn from the lure distribution. Graphically, this value is given by the height of the target distribution divided by the height of the lure distribution at the point on the familiarity axis where the item falls. Consider, for example, an item that generates a level of familiarity equal to the mean familiarity of the targets. The height of the target distribution at its mean is 7.38 times the height of the lure distribution (h_L) at that same point. As such, a test item that generates this level of familiarity (i.e., a familiarity of μ_Target) is 7.38 times as likely to have been drawn from the target distribution as from the lure distribution. Whenever the odds are greater than even, as they are in this case, an unbiased subject would declare the item to be old. Note that the natural log of 7.38 is 2.0, so the same item that generates a level of familiarity falling at the mean of the target distribution in the upper panel of Figure 1 generates a log likelihood ratio of 2.0 in the lower panel of Figure 1. Similarly, an item whose familiarity value falls midway between the means of the target and the lure distributions is associated with a likelihood ratio of 1.0 (i.e., the heights of the two distributions are equal at that point), which translates to a log likelihood ratio of 0 in the lower panel of Figure 1.
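The ratio of heights described above can be computed directly. The sketch below again assumes unit-variance Gaussians separated by 2 standard deviations (the separation is an assumption of the example, chosen to match Figure 1), and the function name `likelihood_ratio` is hypothetical.

```python
import math
from statistics import NormalDist


def likelihood_ratio(familiarity, d_prime):
    """Height of the target density divided by the height of the lure density.

    Assumption of this sketch: lures ~ N(0, 1) and targets ~ N(d_prime, 1).
    """
    lures = NormalDist(mu=0.0, sigma=1.0)
    targets = NormalDist(mu=d_prime, sigma=1.0)
    return targets.pdf(familiarity) / lures.pdf(familiarity)


d_prime = 2.0
at_target_mean = likelihood_ratio(d_prime, d_prime)    # item at the target mean
at_midpoint = likelihood_ratio(d_prime / 2, d_prime)   # item midway between means

print(at_target_mean, math.log(at_target_mean))
print(at_midpoint, math.log(at_midpoint))
```

With these assumptions the ratio at the target mean equals e squared (about 7.39, which the text reports as 7.38), so its natural log is 2; at the midpoint the ratio is 1 and its log is 0.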

Whereas the strength version of signal detection theory requires only that the subject have some criterion familiarity value in mind in order to arrive at a recognition decision, the likelihood ratio version requires much more complete knowledge of the shapes and locations of the underlying familiarity distributions in order to make the relevant statistical computations. Although those added assumptions may seem implausible, it seems clear that subjects sometimes behave as if they do possess the relevant knowledge. The evidence bearing on this claim will be reviewed next, and the question we will later ask is this: Are humans biologically endowed with the requisite knowledge and computational ability, or have they been trained to behave that way as a function of prior (extralaboratory) consequences?

Explaining the Mirror Effect

The old/new decision criterion. What is the advantage of assuming a likelihood ratio model of memory? The familiarity and likelihood ratio accounts are equally able to explain, say, a hit rate of .84 and a false alarm rate of .16 (as is shown in Figure 1), but Figure 2 illustrates an important difference between them. This figure shows what the expected result would be if subjects maintained the same decision criterion as d′ (the standardized distance between the means of the target and the lure distributions) changed from low to high. The upper panel shows the familiarity-based model, and the lower panel shows the likelihood ratio model. We assume that d′ was affected by a standard strength manipulation (e.g., in the strong condition, subjects had more time to study the items than in the weak condition) and that this manipulation affected the mean familiarity of the target items without affecting the mean familiarity of the lures (which, in both conditions, are simply items that did not appear on the list).

Note that the definition of the decision criterion differs between the two accounts. In the strength version, the criterion is a particular level of familiarity (so that items yielding higher familiarity levels than that are judged to be old). In the likelihood ratio version, the criterion is a particular odds ratio (so that items yielding higher odds than that are judged to be old).

In the familiarity-based model shown in the upper panel of Figure 2, the criterion is placed 0.75 standard deviations above the mean of the lure distribution whether the targets are weak or strong (i.e., the criterion is fixed at a particular point on the familiarity axis). As such, the hit rate increases considerably as a function of strength, but the false alarm rate remains constant. If the familiarity of the lures does not change as the targets are strengthened (which is the simplest assumption), the only way that the false alarm rate would change is if the criterion moved across conditions.

In the likelihood ratio model, the criterion is placed at the point where the odds are even that the item was drawn from the target or the lure distributions (i.e., where the odds ratio is 1.0). The point of even odds occurs where the heights of the two distributions are equal, and that occurs where the distributions intersect. In the familiarity model, the point of intersection occurs at one level of familiarity in the weak case and at a higher level of familiarity in the strong case. But these two familiarity values both translate to a value of 0 on the log likelihood ratio axis (i.e., ln[1.0] = 0). Thus, from a likelihood ratio point of view, the criterion remains fixed at 0 across the two strength conditions.

Note the different patterns of hit and false alarm rates yielded by these two fixed-criterion models. Unlike the familiarity model, the likelihood ratio model naturally predicts a mirror effect (i.e., as the hit rate goes up, the false alarm rate goes down), and that pattern happens to be a nearly universal finding in the recognition memory literature (Glanzer, Adams, Iverson, & Kim, 1993). The mirror effect is typically observed not only for strength manipulations, but also for manipulations of word frequency, concreteness, and a variety of other variables (Glanzer & Adams, 1990). The ability of the likelihood ratio model to account for the mirror effect in a natural way is one of its most attractive features. Indeed, largely because of the mirror effect, recent theorizing about recognition memory appears to reflect a shift away from the idea that the decision axis represents a strength-of-evidence scale (like familiarity) and toward the idea that it represents a likelihood ratio scale. In fact, three of the most recent theories of human recognition memory are based fundamentally on the notion that recognition memory decisions are based on a likelihood ratio computation. These include theories known as attention-likelihood theory (Glanzer et al., 1993), retrieving effectively from memory (Shiffrin & Steyvers, 1997), and subjective-likelihood theory (McClelland & Chappell, 1998).
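The contrast between the two fixed-criterion models can be made concrete with a small numerical sketch. It assumes unit-variance Gaussians with d′ = 1 in the weak condition and d′ = 2 in the strong condition (illustrative values, not taken from Figure 2), and it uses the fact that for equal-variance Gaussians the even-odds point lies midway between the two means. All function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def rates(d_prime, criterion):
    """Hit and false alarm rates; lures ~ N(0, 1), targets ~ N(d_prime, 1)."""
    hit_rate = 1.0 - STANDARD_NORMAL.cdf(criterion - d_prime)
    false_alarm_rate = 1.0 - STANDARD_NORMAL.cdf(criterion)
    return hit_rate, false_alarm_rate


def fixed_familiarity_criterion(d_prime):
    # Same point on the familiarity axis regardless of strength.
    return 0.75


def fixed_likelihood_ratio_criterion(d_prime):
    # Point where the odds ratio is 1: midway between the two means.
    return d_prime / 2.0


for name, rule in (("fixed familiarity", fixed_familiarity_criterion),
                   ("fixed likelihood ratio", fixed_likelihood_ratio_criterion)):
    for label, d_prime in (("weak", 1.0), ("strong", 2.0)):
        hit, fa = rates(d_prime, rule(d_prime))
        print(f"{name:22s} {label:6s} hit={hit:.2f} fa={fa:.2f}")
```

Under these assumptions the false alarm rate is identical across conditions for the fixed familiarity criterion, whereas the fixed likelihood ratio criterion yields a higher hit rate and a lower false alarm rate in the strong condition, which is the mirror pattern.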

Figure 2. Upper panel: familiarity-based signal detection model of weak and strong conditions over which the decision criterion remains fixed on the decision axis. Lower panel: likelihood ratio signal detection model of weak and strong conditions over which the decision criterion remains fixed on the decision axis.

Although the very existence of the mirror effect supports the likelihood ratio view, that effect can still be easily represented (and explained) with the simpler version that assumes a familiarity axis. Thus, for example, imagine that in one condition, the hit rate is .84 and the false alarm rate is .16 (so that d′ = 2), and in another condition, the hit rate is .69 and the false alarm rate is .31 (so that d′ = 1). Further imagine that the mean familiarity of the lure distribution is the same in both conditions, but the mean familiarity of the targets is greater in the strong condition than in the weak condition. Figure 3 shows how to represent this situation assuming a familiarity axis. Basically, when the target distribution weakens, the criterion shifts to the left in such a way as to remain midway between the means of the target and the lure distributions. Subjects might very well be inclined to do this because, after a strong list, they would presumably know that a test item, had it appeared on the list, would be very familiar, so a stringent criterion could be used without missing too many of the targets. After a weak list, by contrast, they would know that a target item might be somewhat unfamiliar even if it had appeared on the list, so a less stringent criterion might be used so as not to miss too many of the targets. If the criterion did shift in this way across conditions (which would not involve a likelihood ratio computation), a mirror effect would be observed. Nevertheless, a likelihood ratio theorist would say that the apparent criterion shift is an illusion. What looks like a criterion shift on the familiarity axis actually reflects the fact that the subject uses the same criterion odds ratio in both cases.
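The d′ values attached to these hit and false alarm rates follow from the usual equal-variance conversion, d′ = z(hit rate) − z(false alarm rate). The sketch below assumes that conversion and a unit-variance Gaussian model; the function name `d_prime_and_criterion` is hypothetical.

```python
from statistics import NormalDist


def d_prime_and_criterion(hit_rate, false_alarm_rate):
    """Recover d' and the criterion location from a hit/false alarm pair.

    Assumption of this sketch: equal-variance Gaussian model with the lure
    mean at 0, so the criterion is in standard deviations above that mean.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -z(false_alarm_rate)
    return d_prime, criterion


for hit_rate, false_alarm_rate in ((0.84, 0.16), (0.69, 0.31)):
    d_prime, criterion = d_prime_and_criterion(hit_rate, false_alarm_rate)
    print(f"H={hit_rate}, FA={false_alarm_rate}: "
          f"d'={d_prime:.2f}, criterion={criterion:.2f}")
```

The recovered d′ values are approximately 2 and 1, and in both conditions the criterion sits at about d′/2, which is the leftward shift that Figure 3 depicts.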

Confidence criteria. The argument just presented in connection with the movement of the old/new decision criterion as a function of strength can be generalized to the situation in which subjects are asked not only for an old/new judgment, but also for a confidence rating. Thus, for example, when asked to decide whether or not a test item appeared on a study list, the subject might be offered six choice alternatives labeled −−−, −−, −, +, ++, +++, where the first three represent new responses of varying confidence and the last three represent old responses of varying confidence. If the subject is absolutely sure that a particular test item appeared on the list, the “+++” button would be pressed. A less certain old response would be indicated by pressing the “++” button instead.

A straightforward extension of the basic detection model holds that confidence judgments are reached in much the same way that old/new decisions are reached (e.g., Macmillan & Creelman, 1991). That is, as is illustrated in Figure 4, a different criterion for each confidence rating is theoretically placed somewhere along the familiarity axis. Familiarity values that fall above O_H receive an old response with high confidence (i.e., +++), those that fall between O_M and O_H receive an old response with medium confidence (++), and those that fall between C and O_M receive an old response with low confidence (+).¹ On the other side of the criterion, familiarity values that fall below C but above N_M receive a new response with low confidence (−), those that fall below N_M but above N_H receive a new response with medium confidence (−−), and those that fall below N_H receive a new response with high confidence (−−−).
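The mapping from a familiarity value to one of the six ratings amounts to locating the value among five ordered criteria. The following sketch illustrates that mapping; the criterion locations in `CRITERIA` are hypothetical numbers chosen only for illustration, and the function name `confidence_rating` is likewise an assumption.

```python
from bisect import bisect_right

# Hypothetical criterion locations in standard deviations above the lure
# mean, listed in ascending order: N_H, N_M, C, O_M, O_H.
CRITERIA = [-0.5, 0.25, 1.0, 1.75, 2.5]

# Ratings from high-confidence "new" to high-confidence "old"
# (ASCII hyphens stand in for minus signs).
LABELS = ["---", "--", "-", "+", "++", "+++"]


def confidence_rating(familiarity, criteria=CRITERIA):
    """Return the rating for a familiarity value.

    The number of criteria lying at or below the value indexes the rating,
    so a value exactly on a criterion is assigned to the higher category
    (a tie-breaking choice made for this sketch).
    """
    return LABELS[bisect_right(criteria, familiarity)]


for value in (-1.0, 0.0, 0.5, 1.2, 2.0, 3.0):
    print(f"familiarity {value:+.1f} -> {confidence_rating(value)}")
```

Moving the entries of `CRITERIA` while holding the familiarity values fixed changes the ratings, which is the sense in which the placement of the confidence criteria, rather than memory strength alone, determines the response.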

As was indicated earlier, the likelihood ratio model holds that the mirror effect occurs because subjects use the same likelihood ratio to make decisions in both the strong and the weak conditions. A fixed likelihood ratio criterion shows up as a shifting criterion as a function of strength when the situation is represented by the familiarity version of detection theory (as in Figure 3). A similar story applies to the way in which confidence criteria change as a function of strength. If subjects maintain constant likelihood ratios for each of the confidence criteria, then, when represented in terms of the familiarity version of detection theory, the confidence criteria should move in a predictable way as d′ decreases. More specifically, as will be described next, the criteria should diverge, or fan out, as d′ approaches zero. Figure 5 shows three different ways the confidence criteria might shift as d′ changes from high to low. In all three cases, the old/new decision criterion shifts to the left (thereby yielding a mirror effect), but the confidence criteria shift in different ways. The upper panel shows the confidence criteria shifting in lockstep with the decision criterion, the middle panel shows the confidence criteria converging toward the old/new decision criterion, and the lower panel shows the confidence criteria diverging (as the likelihood ratio account uniquely predicts).

Figure 3. Familiarity-based signal detection model of weak and strong conditions over which the decision criterion shifts on the decision axis.

According to the likelihood ratio account, the old/new decision criterion, C, is placed in such a way as to maintain a likelihood ratio of 1 for unbiased responding. Similarly, the O_H criterion is placed in such a way as to maintain a larger likelihood ratio, such as 10 to 1. That is, any item that is 10 or more times as likely to have come from the target distribution as from the lure distribution receives an old response with high confidence. Different subjects might use different specific odds ratios before making a high-confidence response, but we will assume that O_H falls where the likelihood ratio is 10 to 1 for the sake of simplicity. Similarly, the N_H criterion (i.e., the criterion for saying new with high confidence) might be placed in such a way as to maintain a small likelihood ratio, such as 1 to 10 (.1). That is, any item with only a 1 in 10 chance (or less) of having been drawn from the target distribution receives a new response with high confidence. As d′ decreases, this model assumes that the criteria move in such a way as to maintain these confidence-specific likelihood ratios.

How could subjects possibly adjust the criteria in this way? The likelihood ratio model assumes that the subject not only has information about the location of the target and lure distributions, but also has information about their mathematical forms. Where the latter information might come from is usually underspecified (and is the main focus of this article). If the target and lure distributions are Gaussian, with means of d′ and 0, respectively, and common standard deviation of σ, the probability of encountering a test item with a familiarity of f, given that it is a target, p(f | target), is

$$p(f \mid \text{target}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - d')^2 / \sigma^2}, \tag{1}$$

and the probability of encountering a test item with a familiarity of f, given that it is a lure, p(f | lure), is

$$p(f \mid \text{lure}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - 0)^2 / \sigma^2}, \tag{2}$$

where f is a value along the familiarity axis. The mean of the lure distribution is arbitrarily set to 0 for the sake of simplicity. All other values (e.g., the mean of the target distribution and the location of the decision criterion) are measured with respect to the mean of the lure distribution. The mean of the target distribution, d′, represents the number of standard deviations the mean of that distribution falls above the mean of the lure distribution. Indeed, whether we are considering the mean of the target distribution or the location of the decision and confidence criteria, the values represent the number of standard deviation units above (or below) the mean of the lure distribution.

According to likelihood ratio models of human recognition memory, subjects use their knowledge of the mathematical forms of the target and lure distributions to place the old/new decision criterion, C, at the point on the familiarity axis where the height of the signal distribution equals that of the noise distribution, that is, where the likelihood ratio, p(f | target) / p(f | lure), is equal to 1. This is the optimal location of the decision criterion because this is where the odds that the test item was drawn from the target distribution exactly equal the odds that it was drawn from the lure distribution. For any familiarity value greater than that, the odds that the item was drawn from the target distribution (i.e., that it appeared on the list) are better than even, in which case an old response makes sense. As was indicated above, the remaining confidence criteria are placed in such a way as to maintain specific odds ratios that are greater than or less than 1. If O_H is associated with a likelihood ratio of 10 to 1, that confidence criterion is placed on the familiarity axis at the point where the height of the target distribution is 10 times the height of the lure distribution, regardless of the value of d′.

For any given likelihood ratio, L, the criterion is placed on the familiarity axis at the point, f, that satisfies the following equation:

$$L(f) = \frac{p(f \mid \text{target})}{p(f \mid \text{lure})}. \tag{3}$$

The right side of the equation is simply the ratio of the height of the target (signal plus noise) distribution to the height of the lure (noise) distribution at the point f on the familiarity axis. That is, using the identities expressed in Equations 1 and 2,

$$L(f) = \frac{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - d')^2 / \sigma^2}}{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - 0)^2 / \sigma^2}},$$

which, after setting σ to 1 for the sake of simplicity and rearranging terms, simplifies to

$$L(f) = e^{-.5 (f - d')^2 + .5 f^2}.$$

Solving this equation for f yields

$$f = \frac{d'}{2} + \frac{\ln[L(f)]}{d'}. \tag{4}$$

Figure 4. Illustration of the placement of confidence criteria in a signal detection framework. The old/new decision criterion is represented by C, and the remaining confidence criteria are represented by O_M, O_H, N_M, and N_H. Familiarity values falling above O_M or O_H receive medium- or high-confident old responses (respectively), whereas familiarity values that fall below N_M or N_H receive medium- or high-confident new responses (respectively).

To determine where a particular criterion should be placed on the familiarity axis to satisfy a particular odds ratio, one need only enter the value of L theoretically associated with that criterion. For example, theoretically, the old/new decision criterion, C, is placed at the point where the odds ratio is 1, which is to say L(f) = 1. Substituting 1 for L(f) (and C for f) in Equation 4 reveals that C = d′/2. That is, according to this model, the decision criterion should be placed midway between the target and the lure distributions no matter what the value of d′ is. If the criterion were always placed in that location in order to always maintain a likelihood ratio of 1, a mirror effect would always be observed (and it almost always is). If O_H is placed at the point on the familiarity axis where the odds are 10 to 1 in favor of the item’s having been drawn from the target distribution, Equation 4 indicates that

In other words OH is placed at a point higher than d92 butby an amount that varies with d9 The more general ver-sion of this equation is OH = d92 + k1d9 where k1 is theconstant log likelihood ratio associated with OH (which is

O dd

dd

H = cent +cent

= cent +cent

2

10

22 30

ln( )

Figure 5 Illustration of the movement of the confidence criteria as afunction of d9 according to the lockstep range and likelihood ratio mod-els (upper middle and lower panels respectively) In each panel the topfigure represents a strong condition and the bottom figure represents aweak condition

296 WIXTED AND GAITAN

10 in this example but could differ for different subjects)Similarly if NH is placed at the point where the odds areonly 1 in 10 in favor of the itemrsquos having been drawn fromthe target distribution Equation 2 indicates that

The more general version of this equation is NH = d92 +k2d9 where k2 is the constant log likelihood ratio associ-ated with NH This model predicts that the distance be-tween OH and NH on the familiarity axis will be inverselyrelated to d9 That distance (ie OH minus NH ) is given by

Thus as d9 becomes very large OH and NH should con-verge to the point that they coincide on the evidence axis(ie the distance between them should decrease to 0) Bycontrast as d9 approaches 0 those two criteria should moveinfinitely far apart
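As a rough numerical illustration of Equation 4 (with σ = 1), the following Python sketch computes where C, O_H, and N_H would fall for a few values of d′. The function name `criterion_for_odds` and the particular d′ values are assumptions chosen for illustration; only the odds of 1, 10 to 1, and 1 in 10 come from the example in the text.

```python
import math

def criterion_for_odds(d_prime: float, odds: float) -> float:
    """Equation 4 with sigma = 1: f = d'/2 + ln(L)/d'."""
    return d_prime / 2 + math.log(odds) / d_prime

# Hypothetical d' values, from a strong condition to a weak one.
for d_prime in (2.0, 1.0, 0.5):
    c = criterion_for_odds(d_prime, 1.0)      # old/new criterion, L = 1
    o_h = criterion_for_odds(d_prime, 10.0)   # high-confident "old", L = 10
    n_h = criterion_for_odds(d_prime, 0.1)    # high-confident "new", L = 1/10
    print(f"d'={d_prime:.1f}  N_H={n_h:+.2f}  C={c:+.2f}  "
          f"O_H={o_h:+.2f}  O_H-N_H={o_h - n_h:.2f}")
```

Under these assumptions, the printed distance between O_H and N_H grows as d′ shrinks, which is the fanning-out pattern the likelihood ratio account predicts.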

The upshot of all of this is that, when the situation is represented with a familiarity scale, likelihood ratio models predict that the decision criterion (C) should move in such a way that the hit rate decreases and the false alarm rate increases as d′ decreases (i.e., a mirror effect should be observed), because C will always be placed at d′/2 on the familiarity axis, since that is the point where the odds ratio is equal to 1. The more general version of this account is that all of the confidence criteria should diverge as d′ decreases (as in the lower panel of Figure 5). Mirror effects are usually observed in real data, and, as will be described in more detail next, Stretch and Wixted (1998a) showed that the confidence criteria move in essentially the manner predicted by the likelihood ratio account as well.

How are the locations of the decision criterion and the various confidence criteria inferred from actual behavioral data? The process is not as mysterious as it might seem. Consider first how d′ is computed. For a given hit and false alarm rate, only one arrangement of the target and lure distributions and the criterion is possible, assuming an equal-variance detection model. For example, symmetrical hit and false alarm rates of .84 and .16 yield a d′ of 2.0, with a criterion placement of d′/2 (i.e., C = 1.0 in this example). Standard deviation units are used in both cases. That is, the only way to represent these hit and false alarm rates in an equal-variance signal detection model is for the means of the two distributions to be separated by two standard deviations and for the criterion to be placed one standard deviation above the mean of the lure distribution. No other placement of the criterion would correspond to a false alarm rate of .16, and, given that, no other placement of the mean of the target distribution would yield a hit rate of .84. In practice, standard equations exist for making the relevant computations:

$$d' = z(\text{HIT}) - z(\text{FA})$$

and

$$C = -z(\text{FA}).$$

Thus, in this example, C = −z(.16) = −(−1.0) = 1.0, which is to say that the criterion is placed one standard deviation above the mean of the lure distribution. This method of computing the placement of C involves all “old” responses to lures with a confidence rating of “maybe old” or greater (which is to say, all “old” responses to lures: +, ++, or +++). A very similar method can be used to determine the locations of the various confidence criteria. To determine where the O_M confidence criterion is placed, for example, one can use only those responses to lures that received a confidence rating of “probably old” or greater (i.e., all false alarms committed by choosing the ++ or +++ alternatives). From these responses, a new false alarm rate can be computed, and the location of the O_M criterion can be estimated by converting that false alarm rate into a z score (exactly as one usually does for “old” responses that exceed the old/new decision criterion). In practice, a more comprehensive method is used (one that allows for unequal variance and that takes into account confidence-specific hit rates as well), but the method just described works about as well and is conceptually much simpler (Stretch & Wixted, 1998a). The locations of the various confidence criteria on the familiarity axis are simply the cumulative confidence-specific false alarm rates converted to a z score.
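The computations just described can be sketched in a few lines of Python. This is only an illustration under the equal-variance assumption: the six-point rating labels follow the text, but the lure response counts in `lure_counts` are hypothetical numbers invented for the example, and the function names are not taken from any published analysis code.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # converts a proportion into a z score

def d_prime(hit_rate, fa_rate):
    """d' = z(HIT) - z(FA)."""
    return z(hit_rate) - z(fa_rate)

def criterion(fa_rate):
    """C = -z(FA), in standard deviation units above the lure mean."""
    return -z(fa_rate)

# The example from the text: hit rate .84, false alarm rate .16.
# Both values come out close to the 2.0 and 1.0 quoted above.
print(round(d_prime(0.84, 0.16), 2), round(criterion(0.16), 2))

# Hypothetical counts of responses to 200 lures, from "sure new" to "sure old".
lure_counts = {"---": 60, "--": 50, "-": 40, "+": 30, "++": 14, "+++": 6}
n_lures = sum(lure_counts.values())

def confidence_criterion(lowest_rating_included):
    """Location of the criterion below a rating, from the cumulative FA rate."""
    order = list(lure_counts)
    start = order.index(lowest_rating_included)
    cumulative_fa = sum(lure_counts[r] for r in order[start:]) / n_lures
    return -z(cumulative_fa)

# Each criterion is the lower bound of the named rating category.
for name, rating in (("N_H", "--"), ("N_M", "-"), ("C", "+"),
                     ("O_M", "++"), ("O_H", "+++")):
    print(name, round(confidence_criterion(rating), 2))
```

With these made-up counts, the five estimated locations come out in the order N_H < N_M < C < O_M < O_H, as they must whenever the cumulative false alarm rates decrease from one rating to the next.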

Stretch and Wixted (1998a) tested the predictions of the likelihood ratio account with a very simple experiment. In one condition, subjects studied lists of words that were presented quite slowly, after which they received a recognition test (this was the strong condition). In another, they studied lists of words that were presented more rapidly, followed by a recognition test (the weak condition). For both conditions, the locations of the various confidence criteria were computed in essentially the manner described above. As uniquely predicted by the likelihood ratio account, the confidence criteria fanned out on the familiarity axis as d′ decreased. Figure 6 presents the relevant estimates of the locations of the confidence criteria in the strong and the weak conditions. Note that, whereas the location of the decision criterion, C, decreased on the familiarity axis as conditions changed from strong to weak, the location of the high-confident “old” criterion, O_H, remained essentially constant, which means that even though the overall false alarm rate increased in the weak condition, the high-confident false alarm rate did not. This phenomenon is uniquely predicted by the likelihood ratio account.

Thus, whether we consider the basic mirror effect or the more complete picture provided by ratings of confidence, subjects behave as if they are able to maintain essentially (although not necessarily exactly) constant likelihood ratios across conditions. The fact that subjects are able to do this is rather mysterious, at least on the surface, because it is not clear how they would have acquired the detailed information it requires (and likelihood ratio models typically have little to say about this key issue).

Within-list strength manipulations. It is worth noting that, although subjects often behave in accordance with the predictions of likelihood ratio models, there are conditions under which they do not. All this really means is that it is a mistake to think of humans as being fully informed likelihood ratio computers. For the sake of completeness, we will turn now to a brief discussion of the conditions under which subjects do not behave in the manner predicted by likelihood ratio accounts, after which we will return to the central question of how it is that they often do.

Strength is usually manipulated between lists to produce a mirror effect, but Stretch and Wixted (1998b) also manipulated strength within list. In each list, half the words were colored red, and they were presented five times each. The other half were colored blue, and they appeared only once each. The red and the blue words were randomly intermixed, and the red word repetitions were randomly scattered throughout the list. On the subsequent recognition test, the red and the blue targets were randomly intermixed with red and blue lures. Obviously, the hit rate in the strong (red) condition was expected to exceed the hit rate in the weak (blue) condition, and it did. Of interest was whether or not the false alarm rate in the strong condition (i.e., to red lures) would be less than the false alarm rate in the weak condition (i.e., to blue lures). If so, the within-list strength manipulation would yield results similar to those produced by the between-list strength manipulation. However, the results reported by Stretch and Wixted (1998b) showed a different pattern. In two experiments in which strength was cued by color, the hit rate to the strong items was much higher than the hit rate to the weak items (as was expected), but the false alarm rates were nearly identical. Similar results when other list materials were used were also recently reported by Morrell, Gaitan, and Wixted (2002). These results correspond to the situation depicted earlier in the upper panel of Figure 2, according to which the decision criterion remains fixed on the familiarity axis as a function of strength. A fully informed likelihood ratio computer would not respond in this manner but would, instead, adjust the criterion as a function of strength in order to maintain a constant likelihood ratio across conditions (as in the between-list strength manipulation).

The main difference between the within- and the between-list strength manipulations lies in how rapidly the conditions change during testing. In the between-list case, the conditions change slowly (e.g., all the target items are weak after one list, and all are strong after the other). In the within-list case, the conditions change at a rapid rate (e.g., a strong test item might be followed by a weak test item, followed by a strong test item, and so on). Under such conditions, subjects tend to aggregate over those conditions, rather than respond in accordance with each condition separately. We shall see later that a similar issue arises in recognition memory experiments involving animal subjects, but the important point for the time being is that these data show that organisms do not always behave as if they were fully informed likelihood ratio computers.

Figure 6. Estimates of the locations of the confidence criteria (O_H, C, and N_H) in the strong and weak conditions of a recognition memory experiment reported by Stretch and Wixted (1998a).

Even though subjects do not always behave in exactly the manner one would expect according to a likelihood ratio model, they usually do, and the question of interest concerns how they acquire the information needed to do that. More specifically, what accounts for the fact that, when strength is manipulated between lists, subjects have information available so that the confidence criteria fan out on the familiarity axis as d′ decreases in just the manner predicted by the likelihood ratio account? That is a question left largely unanswered by likelihood ratio accounts, and suggesting an answer to it is the issue we will consider next.

Mirror Effects in Pigeons

What is often overlooked in the human cognitive literature is the possibility that subjects have experienced years of making recognition decisions and expressing various levels of confidence. Presumably, those expressions of confidence have been followed by social consequences that have served to shape their behavior. Conceivably, it is those consequences that shaped behavior to mimic what a history-less likelihood ratio computer might do. We begin our investigation into this possibility by making the point that pigeons show a mirror effect, too. In pigeon research, unlike in human research, recognition memory decisions are followed by experimentally arranged consequences, and the connection between where a pigeon places its decision criterion and the consequences it has received in the past is easier to appreciate. That explicitly arranged reinforcers could influence the location of the decision criterion was recognized from the earliest days of detection theory (e.g., Green & Swets, 1966), but research into that issue never really blossomed in the literature on humans. The most influential account along these lines in the literature on animals, where relevant research did blossom, was proposed some time later by Davison and Tustin (1978). Nevertheless, it is probably fair to say that the connection between their research (and the research their ideas generated) and what a likelihood ratio theorist might say about the movement of confidence criteria across conditions is not easy to see. Making that connection explicit is what we will try to accomplish in this section and the next.

Mirror effects in pigeons can be observed by using a simple procedure in which the birds are required to discriminate the prior presence or absence of a sample stimulus. In a typical experiment of this kind, sample and no-sample trials are randomly intermixed, and the pigeon’s task is to report whether or not a sample appeared earlier in the trial. On sample trials, the end of the intertrial interval (ITI) is followed by the presentation of a sample (e.g., the brief illumination of a keylight or the houselight) and then by the retention interval. On no-sample trials, by contrast, the end of the ITI is followed immediately by the onset of the nominal retention interval (i.e., no sample is presented). Following the retention interval, two choice stimuli are presented (e.g., red vs. green). A response to one choice alternative is correct on sample trials, and a response to the other is correct on no-sample trials. Pigeons learn to perform this task with high levels of accuracy when the retention interval is short, and manipulating the retention interval across conditions yields a mirror effect.

In Wixted’s (1993) Experiment 2, 4 pigeons were exposed to an ascending series of retention intervals (0.50, 2, 6, 12, and 24 sec), with each retention interval condition in effect for 15 sessions. The results are shown in Table 1. The hit rate represents the probability of a correct response on sample trials (i.e., a correct declaration that the sample occurred), and the false alarm rate represents the probability of an incorrect response (i.e., an incorrect declaration that the sample occurred) on no-sample trials. The data presented in boldface in Table 1 were averaged over the last 5 sessions of each condition and over the 4 subjects, and these data constitute the main results of the experiment. The data shown in standard typeface are from the first session of each new retention interval condition (save for the 0.50-sec condition, because the birds had not yet learned the task in the 1st session of that condition). The data from the 1st session of each condition are instructive and will be discussed in more detail after we consider the main findings. The main results show that, not surprisingly, d′ decreased as the retention interval increased. The more important point is that, as the hit rate decreased, the false alarm rate reliably increased. In other words, a mirror effect was observed, just as it almost always is in human recognition memory experiments.

What explains the existence of a mirror effect in this case? Figure 7 presents sense-of-prior-occurrence distributions for three retention interval conditions (short, medium, and long). These distributions are exactly analogous to the familiarity distributions previously discussed in relation to human recognition memory, and they represent the bird’s subjective sense that a sample was presented. That subjective sense is assumed to vary from trial to trial on both sample and no-sample trials (according to Gaussian or Gaussian-like distributions), even for a constant retention interval. Some sense of prior occurrence is evident even on no-sample trials, perhaps owing to the lingering influence of samples presented on earlier trials or to inherent noise in the memory system.

When the retention interval is short (upper panel), the sample distribution falls far to the right because, despite some variability, the subjective sense of prior occurrence will generally be quite high (and will fall well above the decision criterion). Thus, on the majority of sample trials, the subject would respond correctly (“yes”). When the retention interval increases (and the sense of prior occurrence weakens), the sample distribution gradually shifts toward the no-sample distribution (middle panel). The no-sample distribution does not change as a function of retention interval, because there is no memory trace established on those trials (so there is no trace that weakens with retention interval). When the retention interval is very long, so that memory for the sample fades completely, the sample and no-sample distributions will overlap, so that performance on sample and no-sample trials will be indistinguishable (a situation that is approached in the lower panel).

Table 1
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Between Conditions

Retention Interval (sec)    Hit Rate    FA Rate    d′
 0.5                        .94         .10        2.83
 2.0 (1st)                  .84         .08
 2.0                        .88         .14        2.26
 6.0 (1st)                  .60         .14
 6.0                        .88         .18        2.10
12.0 (1st)                  .67         .19
12.0                        .74         .19        1.52
24.0 (1st)                  .46         .24
24.0                        .65         .28         .97

Note. The data were taken from Experiment 2 of Wixted (1993). The values in boldface (the rows not marked “1st”) were averaged over the last five sessions of each condition and over the 4 subjects. The values in standard typeface (the rows marked “1st”) are from the first session of each new retention interval condition (except for the 0.50-sec condition).

Because each retention interval was in effect for 15 sessions, the birds had time to adjust the location of the decision criterion in light of the reinforcement consequences for doing so. For an equal-variance model, reinforcers will be maximized when the criterion is placed midway between the two distributions. Leaving a particular retention interval in effect for a prolonged period gives the birds time to realize that and to adjust the decision criterion accordingly. When the birds adjust the decision criterion in such a way as to maximize reinforcement, their performance exhibits a mirror effect.
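The claim that the midpoint maximizes reinforcement under an equal-variance model can be checked numerically. The sketch below is an illustration only: it assumes σ = 1, equally likely sample and no-sample trials, and reinforcement of every correct choice, and the function names, grid range, and d′ values are hypothetical choices rather than anything taken from the experiments.

```python
from statistics import NormalDist

def proportion_reinforced(criterion, d_prime):
    """Expected proportion of reinforced choices for a given criterion.

    Assumes sigma = 1, equally likely trial types, and reinforcement of
    every correct choice.
    """
    hit = 1 - NormalDist(mu=d_prime, sigma=1).cdf(criterion)
    correct_rejection = NormalDist(mu=0, sigma=1).cdf(criterion)
    return 0.5 * hit + 0.5 * correct_rejection

def best_criterion(d_prime, step=0.01):
    """Grid search over criteria from -3.00 to 6.00 in the given step."""
    candidates = [i * step for i in range(-300, 601)]
    return max(candidates, key=lambda c: proportion_reinforced(c, d_prime))

# Hypothetical d' values for short, medium, and long retention intervals.
for d_prime in (2.0, 1.0, 0.5):
    c = best_criterion(d_prime)
    print(f"d'={d_prime:.1f}  best criterion={c:.2f}  midpoint={d_prime / 2:.2f}")
```

Under these assumptions, the grid search lands on d′/2 for each value of d′, so a bird that tracks the reinforcement-maximizing criterion would lower its criterion as the sample distribution moves toward the no-sample distribution.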

That the birds actually do shift the criterion in light of the reinforcement consequences is perhaps seen most easily by considering how they behave when a retention interval is first increased from one condition to the next. The relevant data are presented in standard typeface in Table 1. Note that each time a retention interval was increased (e.g., from 0.5 to 2 sec or from 2 to 6 sec), the hit rate dropped immediately (and precipitously), whereas the false alarm rate generally did not change. Over the next 15 sessions, both the hit rate and, to a lesser extent, the false alarm rate increased.

What explains that pattern? Presumably, on the first day that a new, longer retention interval was introduced, the effect on the mean of the sample distribution was immediate. That is, because of the longer retention interval, the memory trace of the sample faded to a greater extent, leading to a less intense sense of prior occurrence, on average. But that was the only immediate effect. The mean of the no-sample distribution did not change when the retention interval was increased, because there was no memory trace established on such trials to fade away. The criterion did not change, because the bird did not have enough time to learn where the optimal placement of the decision criterion might be in order to yield a higher number of reinforcers. As time passed, though, the bird learned to shift the criterion to the left on the sense-of-prior-occurrence axis (as is shown in Figure 7), thereby increasing both the hit rate and the false alarm rate.

Figure 7. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated between conditions. The location of the decision criterion, C, changes with each condition.

The increase in the hit and the false alarm rates as a function of training at a particular retention interval is unmistakably asymmetric. That is, generally speaking, the hit rate increased a lot, whereas the false alarm rate increased only a little. Part of the explanation for this is that the appropriate detection model for this situation is not the equal-variance model we have used throughout this article, but an unequal-variance model (so that the no-sample distribution is more variable than the sample distribution). A criterion sweeping across a narrow sample distribution will affect the hit rate to a much greater extent than a criterion sweeping across a wide no-sample distribution will affect the false alarm rate. In fact, in a separate experiment, Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.
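The complication mentioned above can be made concrete. With Gaussian distributions of unequal variance, the log likelihood ratio is a quadratic function of f, so setting it to zero generally gives two solutions. The Python sketch below solves that quadratic; the function name and the parameter values (d′ = 2.0, a sample standard deviation of 1.0, and a no-sample standard deviation of 1.5) are hypothetical and chosen only to illustrate the point, and d′ is assumed to be greater than 0.

```python
import math
from statistics import NormalDist

def unit_likelihood_points(d_prime, sd_sample, sd_no_sample):
    """Values of f at which p(f | S) = p(f | NS), i.e., likelihood ratio = 1.

    ln L(f) = ln(sd_ns / sd_s) - (f - d')^2 / (2 sd_s^2) + f^2 / (2 sd_ns^2),
    which is a quadratic a*f^2 + b*f + c in f.
    """
    a = 1 / (2 * sd_no_sample**2) - 1 / (2 * sd_sample**2)
    b = d_prime / sd_sample**2
    c = -d_prime**2 / (2 * sd_sample**2) + math.log(sd_no_sample / sd_sample)
    if a == 0:
        # Equal variances: the quadratic term vanishes, leaving one crossing.
        return [-c / b]
    root = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

equal = unit_likelihood_points(2.0, 1.0, 1.0)
unequal = unit_likelihood_points(2.0, 1.0, 1.5)
print("equal variance:  ", [round(f, 3) for f in equal])
print("unequal variance:", [round(f, 3) for f in unequal])

# Check that the two densities really are equal at each returned point.
for f in unequal:
    print(round(NormalDist(2.0, 1.0).pdf(f) / NormalDist(0.0, 1.5).pdf(f), 6))
```

In the equal-variance case the single crossing is at d′/2, whereas the unequal-variance case returns two crossings, one between the two means and one far out in the upper tail where the wider no-sample distribution again overtakes the sample distribution.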

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted (1993) are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of d′.
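The contrast between the two manipulations can be sketched with the equal-variance model. In the Python example below, the d′ values and the fixed criterion of 1.25 are hypothetical; the point is only to show what a fixed criterion and a criterion that tracks d′/2 each imply for the hit and false alarm rates as the sample distribution weakens.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal cumulative distribution function

def rates(d_prime, criterion):
    """Hit and false alarm rates for an equal-variance model (sigma = 1)."""
    hit = 1 - phi(criterion - d_prime)   # sample distribution above the criterion
    false_alarm = 1 - phi(criterion)     # no-sample distribution above the criterion
    return hit, false_alarm

d_primes = [2.5, 2.0, 1.5, 1.0]  # hypothetical: d' falls as retention interval grows
fixed_c = 1.25                   # hypothetical criterion that never moves

print("d'    fixed C: hit  FA     C = d'/2: hit  FA")
for d in d_primes:
    hit_fixed, fa_fixed = rates(d, fixed_c)
    hit_mid, fa_mid = rates(d, d / 2)
    print(f"{d:.1f}           {hit_fixed:.2f} {fa_fixed:.2f}"
          f"              {hit_mid:.2f} {fa_mid:.2f}")
```

With the fixed criterion, the false alarm rate stays the same while the hit rate falls, which resembles the within-session pattern; with the criterion at d′/2, the hit rate falls and the false alarm rate rises, which is the mirror effect seen when retention interval is manipulated between conditions.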

Table 2
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)    Hit Rate    FA Rate    d′
 0.5                        .94         .14        2.63
 2.0                        .93         .20        2.32
 6.0                        .73         .17        1.56
12.0                        .63         .16        1.32

Note. The data were taken from Experiment 1 of Wixted (1993).

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

White and Wixted’s (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird’s (or a human’s) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird’s behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to f in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of f was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given f is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given f. According to White and Wixted’s model, birds learn the relative probabilities of reinforcement for each value of f along the sense-of-prior-occurrence axis.² If, for a given value of f, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of f was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of f.

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of f is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, f, on the sense-of-prior-occurrence axis.

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution, with mean d′ and standard deviation σ, is given by

$$p(f \mid S) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f-d')^2/\sigma^2}, \qquad (5)$$

and the no-sample (or lure) distribution, with mean 0 and standard deviation σ, is given by

$$p(f \mid NS) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f-0)^2/\sigma^2}, \qquad (6)$$

where f now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and p(f | S) and p(f | NS) represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio p(f | S) / p(f | NS) on a given trial. What the pigeons instead learn, according to White and Wixted’s (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given f, which will be denoted as p(R_S | f), and the probability that a reinforcer is arranged on the no-sample choice alternative given f, which will be denoted as p(R_NS | f). Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given f is

$$p(R_S \mid f) = \frac{p(f \mid S)}{p(f \mid S) + p(f \mid NS)}\, r_S, \qquad (7)$$

where r_S is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced (r_S is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given f is

$$p(R_{NS} \mid f) = \frac{p(f \mid NS)}{p(f \mid S) + p(f \mid NS)}\, r_{NS}, \qquad (8)$$

where r_NS is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced (r_NS is also usually 1.0).

White and Wixted’s (1999) model assumes that pigeons learn p(R_S | f) and p(R_NS | f) during the course of training and that, in light of this information, they respond to the choice alternatives in accordance with the matching law. If p(B_S | f) represents the probability of choosing the sample choice alternative given f, and p(B_NS | f) represents the probability of choosing the no-sample choice alternative given f, then

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(R_S \mid f)}{p(R_{NS} \mid f)}, \qquad (9)$$

which, after replacing p(R_S | f) and p(R_NS | f) with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(f \mid S)}{p(f \mid NS)} \cdot \frac{r_S}{r_{NS}}. \qquad (10)$$

These equations state that the behavior ratio given f is equal to the reinforcer ratio given f, which is essentially the matching law for each value of f along the sense-of-prior-occurrence axis.

If r_S = r_NS = 1.0 (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(f \mid S)}{p(f \mid NS)}.$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing p(f | S) and p(f | NS) with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f-d')^2/\sigma^2}}{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f-0)^2/\sigma^2}}.$$

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when r_S = r_NS, a likelihood ratio computation is not assumed to be governing the bird’s behavior. What drives the bird’s choice, according to White and Wixted’s (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of f was generated by a sample as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted’s account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives depending on what f happens to be. What f happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.
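The relation between the reinforcer probabilities and the likelihood ratio can be checked with a short Python sketch of Equations 7 through 10. It assumes σ = 1 and equally likely sample and no-sample trials, as in the text; the function names, the value d′ = 2.0, and the particular values of f are hypothetical choices made for illustration.

```python
from statistics import NormalDist

def reinforcer_probabilities(f, d_prime, r_s=1.0, r_ns=1.0):
    """Equations 7 and 8 with sigma = 1 and equally likely trial types."""
    p_f_given_s = NormalDist(mu=d_prime, sigma=1).pdf(f)
    p_f_given_ns = NormalDist(mu=0, sigma=1).pdf(f)
    total = p_f_given_s + p_f_given_ns
    p_rs = p_f_given_s / total * r_s     # reinforcer arranged on sample choice
    p_rns = p_f_given_ns / total * r_ns  # reinforcer arranged on no-sample choice
    return p_rs, p_rns

def behavior_ratio(f, d_prime, r_s=1.0, r_ns=1.0):
    """Equation 9: the choice ratio matches the learned reinforcer ratio."""
    p_rs, p_rns = reinforcer_probabilities(f, d_prime, r_s, r_ns)
    return p_rs / p_rns

def likelihood_ratio(f, d_prime):
    """Ratio of the sample density to the no-sample density at f."""
    return (NormalDist(mu=d_prime, sigma=1).pdf(f)
            / NormalDist(mu=0, sigma=1).pdf(f))

d_prime = 2.0  # hypothetical value
for f in (0.0, 1.0, 1.5, 3.0):
    print(f"f={f:.1f}  behavior ratio={behavior_ratio(f, d_prime):.3f}  "
          f"likelihood ratio={likelihood_ratio(f, d_prime):.3f}")
```

When r_S = r_NS, the two printed columns agree at every value of f, even though the behavior ratio is computed only from learned reinforcer probabilities. Passing unequal values of `r_s` and `r_ns` scales the behavior ratio by r_S/r_NS, as Equation 10 requires.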

There is some point on the sense-of-prior-occurrence axis (i.e., some value of f) that leads the bird to be indifferent between the two choice alternatives, because at that point the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, p(B_S | f) = p(B_NS | f). If r_S = r_NS = 1.0, it is easily shown that the value of f that yields indifference is d′/2. If p(B_S | f) = p(B_NS | f), which is to say that p(B_S | f) / p(B_NS | f) = 1, then, according to Equation 10,

$$1 = \frac{p(f \mid S)}{p(f \mid NS)} \cdot \frac{r_S}{r_{NS}}.$$

Solving this equation for f provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for p(f | S) and p(f | NS) from Equations 5 and 6 and canceling out the 1/√(2πσ²) terms that appear in the numerator and denominator yields

$$1 = \frac{e^{-.5(f-d')^2}}{e^{-.5f^2}} \cdot \frac{r_S}{r_{NS}}$$

(assuming that σ = 1 for simplicity). When solved for f, this equation yields

$$f = \frac{d'}{2} + \frac{\ln(r_{NS}/r_S)}{d'}. \qquad (11)$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if r_S = r_NS, so that ln(r_NS/r_S) = 0, the indifference point occurs at d′/2. This familiarity value corresponds to the indifference point and could be represented as f_{1/1} = d′/2, where the subscript 1/1 represents the ratio of programmed reinforcer probabilities (r_NS/r_S). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when d′ changes (thereby yielding a mirror effect). The difference is that White and Wixted’s (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of f. In this case, the pigeon has learned that when f = d′/2, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than d′/2, the odds favor a reinforcer’s being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than d′/2, the odds favor a reinforcer’s being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same: namely, that a mirror effect should be observed.

Again although we assume that the indifference pointrepresents the learned value of f associated with equal re-

f d

r

r

d= cent +

aeligegraveccedil

oumloslashdivide

cent2

ln

NS

S

15

5 0

2

2= aelig

egraveccediloumloslashdivideeacute

eumlecirc

- - cent

- -

r

re

e

f d

f

S

NS

( )

( )

1 = aeligegraveccedil

oumloslashdivideeacuteeumlecirc

r

rp f

p fS

NS

S

NS( | )

( | )

p B f

p B f

e

e

f d

f

( | )

( | )

( )

( )

S

NS

=

eacute

eumlecirc

eacute

eumlecirc

- - cent

- -

1

2

1

2

25

2

5 0

2 2

2

ps

ps

s

s

p B f

p B f

p f

p f

( | )

( | )( | )

( | )S

NS

SNS

=

p B f

p B f

r

r

p f

p f

( | )

( | )( | )

( | )S

NS

S

NS

SNS

= aeligegraveccedil

oumloslashdivideeacuteeumlecirc

p B f

p B f

p R f

p R f

( | )

( | )

( | )

( | )S

NS

S

NS

=

REINFORCEMENT HISTORY SURROGATES 303

inforcement probabilities this is not to say that the situa-tion cannot be represented with a criterion placed on a de-cision axis as in Figure 7 What the criterion representsthough is not a value that the organism has calculated onthe basis of its inherent knowledge of the situation or avalue that the bird has decided to use as a criterion valueagainst which to evaluate all trial-by-trial sense-of-prior-occurrence values Instead the criterion simply depictsthe point on the sense-of-prior-occurrence axis where ac-cording to the subjectrsquos learning history the probability ofreinforcement for choosing the sample alternative is thesame as the probability of reinforcement for choosing theno-sample alternative
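The derivation of Equation 11 can be checked numerically. The sketch below is only an illustration under the assumptions stated in the text (equal-variance Gaussian distributions with $\sigma = 1$, lure mean 0, target mean $d'$); the function names are hypothetical and do not come from White and Wixted (1999). It computes the indifference point from Equation 11 and confirms that the choice ratio of Equation 10 equals 1 at that point.

```python
import math


def gaussian_pdf(x: float, mean: float, sd: float = 1.0) -> float:
    """Height of a normal density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def indifference_point(d_prime: float, r_s: float, r_ns: float) -> float:
    """Equation 11 (sigma = 1): f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2 + math.log(r_ns / r_s) / d_prime


def choice_ratio(f: float, d_prime: float, r_s: float, r_ns: float) -> float:
    """Equation 10: p(B_S | f) / p(B_NS | f) = (r_S / r_NS) * p(f | S) / p(f | NS)."""
    likelihood_ratio = gaussian_pdf(f, d_prime) / gaussian_pdf(f, 0.0)
    return (r_s / r_ns) * likelihood_ratio


if __name__ == "__main__":
    d_prime = 2.0  # assumed value, chosen only for illustration
    for r_s, r_ns in [(1.0, 1.0), (0.1, 1.0)]:
        f_star = indifference_point(d_prime, r_s, r_ns)
        ratio = choice_ratio(f_star, d_prime, r_s, r_ns)
        print(f"r_S={r_s}, r_NS={r_ns}: f*={f_star:.3f}, choice ratio={ratio:.3f}")
```

With equal reinforcer probabilities the indifference point sits at $d'/2$; making reinforcers ten times as likely on the no-sample alternative moves it upward by $\ln(10)/d'$.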

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if $r_{NS} > r_S$), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in $d'$ on the location of the confidence criteria, discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as $d'$ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., $r_{NS} = 1.0$ and $r_S = .1$). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence that the sample occurred before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when $r_{NS} = 10r_S$? Substituting 10 for $r_{NS}/r_S$ in Equation 11 and solving for $f$ yields

$$f_{10:1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, $f_{10:1}$ (the indifference point when $r_{NS} = 10r_S$) occurs at a point higher than $d'/2$, but by an amount that varies with $d'$. The more general version of this equation is $f_{10:1} = d'/2 + k_1/d'$, where $k_1$ is the constant $\ln(r_{NS}/r_S)$. Similarly, if $r_S = 10r_{NS}$, then

$$f_{1:10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is $f_{1:10} = d'/2 + k_2/d'$, where $k_2$ is the constant $\ln(r_{NS}/r_S)$ for this condition. This model predicts that the distance between $f_{10:1}$ and $f_{1:10}$ on the sense-of-prior-occurrence axis will be inversely related to $d'$. That distance (i.e., $f_{10:1}$ minus $f_{1:10}$) is given by

$$f_{10:1} - f_{1:10} = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}.$$

Thus, as $d'$ becomes very large, $f_{10:1}$ and $f_{1:10}$ should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as $d'$ approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon’s sensitivity to variations in relative reinforcer probabilities should be low when $d'$ is large and high when $d'$ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When $d'$ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of $r_{NS}$ to $r_S$) had little effect on behavior. When $d'$ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as $d'$ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as $d'$ decreased (because that is just another way of saying that sensitivity to reinforcement increased as $d'$ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism’s reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.
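The fanning out of the indifference points can be illustrated with a few lines of arithmetic. This is a minimal sketch that assumes Equation 11 with $\sigma = 1$ and reinforcer ratios of 10:1 and 1:10; the helper names and the particular $d'$ values are hypothetical choices made for illustration, not values from the experiments.

```python
import math


def indifference_point(d_prime: float, ratio_ns_to_s: float) -> float:
    """Equation 11 (sigma = 1), written in terms of the ratio r_NS / r_S."""
    return d_prime / 2 + math.log(ratio_ns_to_s) / d_prime


def indifference_spread(d_prime: float, ratio: float = 10.0) -> float:
    """Distance between the ratio:1 and 1:ratio indifference points.

    Algebraically this is (k1 - k2) / d' = 2 * ln(ratio) / d'.
    """
    upper = indifference_point(d_prime, ratio)
    lower = indifference_point(d_prime, 1.0 / ratio)
    return upper - lower


if __name__ == "__main__":
    # Illustrative d' values only: smaller d' stands in for a longer retention interval.
    for d_prime in (0.5, 1.0, 2.0, 4.0):
        print(f"d'={d_prime}: spread={indifference_spread(d_prime):.2f}")
```

The spread shrinks in proportion to $1/d'$, which is the sense in which sensitivity to the reinforcer ratio is high when $d'$ is small and low when $d'$ is large.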

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \text{target})}{p(f \mid \text{lure})} > 10,$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of $f$, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{\text{Old}} \mid f)}{p(R_{\text{New}} \mid f)} > 10,$$

where $p(R_{\text{Old}} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for an old response occurred, and $p(R_{\text{New}} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{\text{Old}} \mid f)}{p(B_{\text{New}} \mid f)} = \frac{p(R_{\text{Old}} \mid f)}{p(R_{\text{New}} \mid f)}, \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{\text{Old}} \mid f)}{p(B_{\text{New}} \mid f)} = \left(\frac{r_{\text{Old}}}{r_{\text{New}}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right]. \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left(\frac{r_{\text{Old}}}{r_{\text{New}}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right] > 10,$$

where $r_{\text{Old}}$ and $r_{\text{New}}$ represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to $r_S$ and $r_{NS}$, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of $r_S$ and $r_{NS}$ in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, $r_{\text{Old}} = r_{\text{New}}$. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of $f$ along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of $f$.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if $r_{NS} = 10r_S$, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at $d'/2$, but at a higher point: namely, $d'/2 + 2.30/d'$. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific $f \rightarrow$ reinforcement outcome relations, but instead that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog’s neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog’s training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism’s neural network without detailing the origins of the relevant distributional information. A neural network may, indeed, perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject’s reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising, indeed, if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject’s neural network, but in the subject’s history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that “the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects” (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of “nothing” in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user’s guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion’s distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value, $f$, is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)

Page 2: Wixted, J. T. & Gaitan, S. (2002)


putes a likelihood ratio in order to determine whether or not an item was recently encountered on a list. These models assume that the understanding of human recognition memory is contingent on the understanding of the underlying computational machinery. Our central claim is that Skinner may have been right about this theoretical machinery: It is largely a stand-in for the organism’s learning history. To the extent that this point is appreciated, the relevance of animal memory research to the understanding of human memory becomes much clearer than it otherwise might be. Indeed, it is only in the animal laboratory that the connection between an organism’s prior reinforcement history and its current memory decisions is clearly evident (Alsop, 1998). To quote Skinner (1953) again, “We study the behavior of animals because it is simpler. Basic processes are revealed more easily and can be recorded over longer periods of time. ... We may arrange genetic histories to control certain variables and special life histories to control others” (p. 38). Illustrating the validity of this point is the main purpose of the present article.

The likelihood ratio models that we will consider in making our case represent one form of signal detection theory, so we will begin our inquiry into this issue with a brief overview of that basic framework.

Signal Detection Theory

Since the late 1960s, theories of human recognition memory have involved concepts drawn from signal detection theory. In a typical recognition memory task, subjects first study a list of items (e.g., words or pictures) and are then presented with a recognition test in which items are presented one at a time for an old/new recognition decision. A response of old means that the subject believes that the word appeared on the list; new means that the subject believes that it did not. Typically, half of the test items actually did appear on the list (targets), and half did not (lures). Imagine that subjects correctly respond old to 84% of the targets (i.e., the hit rate is .84) and incorrectly respond old to 16% of the lures (i.e., the false alarm rate is .16). How should we conceptualize a performance like this? Signal detection theory offers one appealing answer. This theory comes in two basic versions, one of which is fairly simple and the other of which is more complex.

In its simplest form, signal detection theory holds that a decision about whether or not an item was recently encountered on a list depends on its level of familiarity. If the item’s level of familiarity exceeds a criterion value, it is judged to be old; otherwise, it is judged to be new. As is illustrated in the upper panel of Figure 1, familiarity values associated with the old and the new items (i.e., the targets and the lures, respectively) are usually assumed to be normally distributed, with the mean of the target distribution located at a higher point on the familiarity axis than the mean of the lure distribution. The decision criterion specifies the familiarity value above which an item is declared to be old, and it can be placed anywhere along the familiarity axis. In Figure 1, it is placed exactly halfway between the means of the target and the lure distributions. The shaded area in Figure 1 represents the proportion of lures that are incorrectly judged to be old because they are associated with familiarity values that fall above the criterion. That proportion is known as the false alarm rate. The false alarm rate for the situation depicted in Figure 1 would be about .16 (whereas the hit rate would be .84).
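The hit and false alarm rates quoted for Figure 1 follow directly from the areas under the two normal curves. The sketch below assumes unit-variance distributions, a lure mean of 0, and a target mean of $d' = 2$ (the value implied by the .84 and .16 rates), with the criterion halfway between the means; the function name is hypothetical.

```python
from statistics import NormalDist


def hit_and_false_alarm_rates(d_prime: float, criterion: float) -> tuple[float, float]:
    """Return (hit rate, false alarm rate) for an equal-variance Gaussian model.

    Lures are assumed to be N(0, 1) and targets N(d', 1) on the familiarity
    axis; an item is called "old" when its familiarity exceeds the criterion.
    """
    lure = NormalDist(mu=0.0, sigma=1.0)
    target = NormalDist(mu=d_prime, sigma=1.0)
    hit_rate = 1.0 - target.cdf(criterion)        # targets above the criterion
    false_alarm_rate = 1.0 - lure.cdf(criterion)  # lures above the criterion
    return hit_rate, false_alarm_rate


if __name__ == "__main__":
    hit, fa = hit_and_false_alarm_rates(d_prime=2.0, criterion=1.0)
    print(f"hit rate = {hit:.2f}, false alarm rate = {fa:.2f}")
```

Moving the criterion to the right lowers both rates, and moving it to the left raises both, which is why a criterion shift alone cannot raise hits while lowering false alarms.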

An important assumption of the detection theory just described is that the decision axis represents a strength-of-evidence variable, such as familiarity. Although many signal detection models of recognition memory make precisely that assumption, another class of detection models does not. These models assume that the decision axis represents a log likelihood ratio scale. According to a likelihood ratio model, recognition decisions are based on a statistical computation. If the computed odds that the item appeared on the list are high enough (usually, greater than even), the item is declared to be old; otherwise, it is declared to be new. The lower panel of Figure 1 illustrates the likelihood ratio model, and it is immediately apparent that it looks a lot like the familiarity model shown in the upper panel of Figure 1. The only real difference is the decision axis, which now represents a log likelihood scale.

Figure 1. Upper panel: standard familiarity-based signal detection model of recognition memory. The decision axis represents a familiarity scale that ranges from low to high. Lower panel: likelihood ratio version of the signal detection account of recognition memory. The decision axis represents a log likelihood ratio scale that ranges from minus infinity to plus infinity.

Likelihood ratio models assume that the operative psychological variable is an odds ratio associated with the test item, not its level of familiarity. The odds ratio is equal to the likelihood that the item was drawn from the target distribution divided by the likelihood that it was drawn from the lure distribution. Graphically, this value is given by the height of the target distribution divided by the height of the lure distribution at the point on the familiarity axis where the item falls. Consider, for example, an item that generates a level of familiarity equal to the mean familiarity of the targets. The height of the target distribution at its mean is 7.38 times the height of the lure distribution ($h_L$). As such, a test item that generates this level of familiarity (i.e., a familiarity of $\mu_{\text{Target}}$) is 7.38 times as likely to have been drawn from the target distribution as from the lure distribution. Whenever the odds are greater than even, as they are in this case, an unbiased subject would declare the item to be old. Note that the natural log of 7.38 is 2.0, so the same item that generates a level of familiarity falling at the mean of the target distribution in the upper panel of Figure 1 generates a log likelihood ratio of 2.0 in the lower panel of Figure 1. Similarly, an item whose familiarity value falls midway between the means of the target and the lure distributions is associated with a likelihood ratio of 1.0 (i.e., the heights of the two distributions are equal at that point), which translates to a log likelihood ratio of 0 in the lower panel of Figure 1.
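The mapping from a familiarity value to a log likelihood ratio can be made concrete. The following sketch assumes the equal-variance Gaussian model with a lure mean of 0, a target mean of $d' = 2$, and $\sigma = 1$; under those assumptions the log likelihood ratio is $f d' - d'^2/2$. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def likelihood_ratio(f: float, d_prime: float) -> float:
    """Height of the target density divided by height of the lure density at f."""
    target = NormalDist(mu=d_prime, sigma=1.0)
    lure = NormalDist(mu=0.0, sigma=1.0)
    return target.pdf(f) / lure.pdf(f)


def log_likelihood_ratio(f: float, d_prime: float) -> float:
    """Natural log of the likelihood ratio; equals f * d' - d'**2 / 2 when sigma = 1."""
    return math.log(likelihood_ratio(f, d_prime))


if __name__ == "__main__":
    d_prime = 2.0  # assumed target mean, matching hit = .84 and false alarm = .16
    for label, f in [("target mean", d_prime), ("midpoint", d_prime / 2)]:
        print(f"{label}: LR = {likelihood_ratio(f, d_prime):.2f}, "
              f"log LR = {log_likelihood_ratio(f, d_prime):.2f}")
```

At the target mean the ratio is $e^{2}$, about 7.39 (given as 7.38 in the text), with a log of 2.0; at the midpoint the ratio is 1.0 and its log is 0.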

Whereas the strength version of signal detection theory requires only that the subject have some criterion familiarity value in mind in order to arrive at a recognition decision, the likelihood ratio version requires much more complete knowledge of the shapes and locations of the underlying familiarity distributions in order to make the relevant statistical computations. Although those added assumptions may seem implausible, it seems clear that subjects sometimes behave as if they do possess the relevant knowledge. The evidence bearing on this claim will be reviewed next, and the question we will later ask is this: Are humans biologically endowed with the requisite knowledge and computational ability, or have they been trained to behave that way as a function of prior (extra-laboratory) consequences?

Explaining the Mirror Effect

The old/new decision criterion. What is the advantage of assuming a likelihood ratio model of memory? The familiarity and likelihood ratio accounts are equally able to explain, say, a hit rate of .84 and a false alarm rate of .16 (as is shown in Figure 1), but Figure 2 illustrates an important difference between them. This figure shows what the expected result would be if subjects maintained the same decision criterion as $d'$ (the standardized distance between the means of the target and the lure distributions) changed from low to high. The upper panel shows the familiarity-based model, and the lower panel shows the likelihood ratio model. We assume that $d'$ was affected by a standard strength manipulation (e.g., in the strong condition, subjects had more time to study the items than in the weak condition) and that this manipulation affected the mean familiarity of the target items without affecting the mean familiarity of the lures (which, in both conditions, are simply items that did not appear on the list).

Note that the definition of the decision criterion differs between the two accounts. In the strength version, the criterion is a particular level of familiarity (so that items yielding higher familiarity levels than that are judged to be old). In the likelihood ratio version, the criterion is a particular odds ratio (so that items yielding higher odds than that are judged to be old).

In the familiarity-based model shown in the upper panel of Figure 2, the criterion is placed 0.75 standard deviations above the mean of the lure distribution whether the targets are weak or strong (i.e., the criterion is fixed at a particular point on the familiarity axis). As such, the hit rate increases considerably as a function of strength, but the false alarm rate remains constant. If the familiarity of the lures does not change as the targets are strengthened (which is the simplest assumption), the only way that the false alarm rate would change is if the criterion moved across conditions.

In the likelihood ratio model, the criterion is placed at the point where the odds are even that the item was drawn from the target or the lure distributions (i.e., where the odds ratio is 1.0). The point of even odds occurs where the heights of the two distributions are equal, and that occurs where the distributions intersect. In the familiarity model, the point of intersection occurs at one level of familiarity in the weak case and at a higher level of familiarity in the strong case. But these two familiarity values both translate to a value of 0 on the log likelihood ratio axis (i.e., ln[1.0] = 0). Thus, from a likelihood ratio point of view, the criterion remains fixed at 0 across the two strength conditions.
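The two fixed-criterion models just described can be compared numerically. This is a sketch under stated assumptions: unit-variance Gaussian distributions, a lure mean of 0, and illustrative $d'$ values of 1 (weak) and 2 (strong) that are not taken from Figure 2. The familiarity criterion is fixed 0.75 standard deviations above the lure mean, and the likelihood ratio criterion is fixed at log odds of 0. Function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def rates_at_familiarity_criterion(d_prime: float, criterion: float) -> tuple[float, float]:
    """(hit rate, false alarm rate) when "old" is reported for familiarity above criterion."""
    hit_rate = 1.0 - STANDARD_NORMAL.cdf(criterion - d_prime)
    false_alarm_rate = 1.0 - STANDARD_NORMAL.cdf(criterion)
    return hit_rate, false_alarm_rate


def rates_at_log_odds_criterion(d_prime: float, log_odds: float = 0.0) -> tuple[float, float]:
    """Same rates when the criterion is a fixed log likelihood ratio.

    With sigma = 1 the log likelihood ratio is f * d' - d'**2 / 2, so a fixed
    log-odds criterion corresponds to familiarity f = d'/2 + log_odds / d'.
    """
    familiarity_criterion = d_prime / 2 + log_odds / d_prime
    return rates_at_familiarity_criterion(d_prime, familiarity_criterion)


if __name__ == "__main__":
    for label, d_prime in [("weak", 1.0), ("strong", 2.0)]:
        fam = rates_at_familiarity_criterion(d_prime, criterion=0.75)
        odds = rates_at_log_odds_criterion(d_prime)
        print(f"{label}: fixed familiarity H={fam[0]:.2f} FA={fam[1]:.2f} | "
              f"fixed odds H={odds[0]:.2f} FA={odds[1]:.2f}")
```

Under the fixed familiarity criterion only the hit rate changes with strength, whereas under the fixed odds criterion the hit rate rises and the false alarm rate falls as $d'$ grows.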

Note the different patterns of hit and false alarm ratesyielded by these two fixed-criterion models Unlike thefamiliarity model the likelihood ratio model naturally pre-dicts a mirror effect (ie as the hit rate goes up the falsealarm rate goes down) and that pattern happens to be anearly universal finding in the recognition memory liter-ature (Glanzer Adams Iverson amp Kim 1993) The mir-ror effect is typically observed not only for strength ma-nipulations but also for manipulations of word frequencyconcreteness and a variety of other variables (Glanzer ampAdams 1990) The ability of the likelihood ratio model toaccount for the mirror effect in a natural way is one of itsmost attractive features Indeed largely because of themirror effect recent theorizing about recognition memoryappears to reflect a shift away from the idea that the de-cision axis represents a strength-of-evidence scale (likefamiliarity) and toward the idea that it represents a likeli-hood ratio scale In fact three of the most recent theoriesof human recognition memory are based fundamentallyon the notion that recognition memory decisions are


based on a likelihood ratio computation. These include theories known as attention-likelihood theory (Glanzer et al., 1993), retrieving effectively from memory (Shiffrin & Steyvers, 1997), and subjective-likelihood theory (McClelland & Chappell, 1998).

Although the very existence of the mirror effect supports the likelihood ratio view, that effect can still be easily represented (and explained) with the simpler version that assumes a familiarity axis. Thus, for example, imagine that in one condition the hit rate is .84 and the false alarm rate is .16 (so that d′ = 2), and in another condition the hit rate is .69 and the false alarm rate is .31 (so that d′ = 1). Further imagine that the mean familiarity of the lure distribution is the same in both conditions, but

Figure 2. Upper panel: familiarity-based signal detection model of weak and strong conditions over which the decision criterion remains fixed on the decision axis. Lower panel: likelihood ratio signal detection model of weak and strong conditions over which the decision criterion remains fixed on the decision axis.


the mean familiarity of the targets is greater in the strong condition than in the weak condition. Figure 3 shows how to represent this situation, assuming a familiarity axis. Basically, when the target distribution weakens, the criterion shifts to the left in such a way as to remain midway between the means of the target and the lure distributions. Subjects might very well be inclined to do this because, after a strong list, they would presumably know that a test item, had it appeared on the list, would be very familiar, so a stringent criterion could be used without missing too many of the targets. After a weak list, by contrast, they would know that a target item might be somewhat unfamiliar even if it had appeared on the list, so a less stringent criterion might be used so as not to miss too many of the targets. If the criterion did shift in this way across conditions (which would not involve a likelihood ratio computation), a mirror effect would be observed. Nevertheless, a likelihood ratio theorist would say that the apparent criterion shift is an illusion. What looks like a criterion shift on the familiarity axis actually reflects the fact that the subject uses the same criterion odds ratio in both cases.

Confidence criteria. The argument just presented in connection with the movement of the old/new decision criterion as a function of strength can be generalized to the situation in which subjects are asked not only for an old/new judgment, but also for a confidence rating. Thus, for example, when asked to decide whether or not a test item appeared on a study list, the subject might be offered six choice alternatives labeled ---, --, -, +, ++, and +++, where the first three represent new responses of varying confidence and the last three represent old responses of varying confidence. If the subject is absolutely sure that a particular test item appeared on the list, the "+++" button would be pressed. A less certain old response would be indicated by pressing the "++" button instead.

A straightforward extension of the basic detection model holds that confidence judgments are reached in much the same way that old/new decisions are reached (e.g., Macmillan & Creelman, 1991). That is, as is illustrated in Figure 4, a different criterion for each confidence rating is theoretically placed somewhere along the familiarity axis. Familiarity values that fall above O_H receive an old response with high confidence (i.e., +++), those that fall between O_M and O_H receive an old response with medium confidence (++), and those that fall between C and O_M receive an old response with low confidence (+).¹ On the other side of the criterion, familiarity values that fall below C but above N_M receive a new response with low confidence (-), those that fall below N_M but above N_H receive a new response with medium confidence (--), and those that fall below N_H receive a new response with high confidence (---).
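The mapping from familiarity to confidence rating described above can be sketched as a simple lookup over ordered criteria. The criterion locations below are hypothetical values chosen only to illustrate the mechanics; the names `CRITERIA`, `RATINGS`, and `confidence_rating` are likewise illustrative and do not come from the article.

```python
from bisect import bisect_right

# Hypothetical criterion locations on the familiarity axis, ordered low to high:
# N_H < N_M < C < O_M < O_H (values chosen only for illustration).
CRITERIA = [-0.5, 0.3, 1.0, 1.7, 2.5]
RATINGS = ["---", "--", "-", "+", "++", "+++"]

def confidence_rating(familiarity):
    """Map a familiarity value to one of the six confidence ratings."""
    # bisect_right counts how many criteria lie at or below the value,
    # which is the index of the rating region the value falls in.
    return RATINGS[bisect_right(CRITERIA, familiarity)]

for value in (-1.0, 0.0, 0.5, 1.2, 2.0, 3.0):
    print(f"familiarity {value:5.1f} -> {confidence_rating(value)}")
```

Five criteria partition the axis into six regions, one per rating, which is why a single sorted list suffices.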

As was indicated earlier, the likelihood ratio model holds that the mirror effect occurs because subjects use the same likelihood ratio to make decisions in both the strong and the weak conditions. A fixed likelihood ratio criterion shows up as a shifting criterion as a function of strength when the situation is represented by the familiarity version of detection theory (as in Figure 3). A similar story applies to the way in which confidence criteria change as a function of strength. If subjects maintain constant likelihood ratios for each of the confidence criteria, then, when represented in terms of the familiarity version of detection theory, the confidence criteria should move in a predictable way as d′ decreases. More specifically, as will be described next, the criteria should diverge, or fan out, as d′ approaches zero. Figure 5 shows three different ways the confidence criteria might shift as d′ changes from high to low. In all three cases, the old/new decision criterion shifts to the left (thereby yielding a mirror effect), but the confidence criteria shift in different ways. The upper panel shows the confidence criteria shifting in lockstep with the decision criterion, the middle panel shows the confidence criteria converging toward the old/new decision criterion, and the lower panel shows the confidence criteria diverging (as the likelihood ratio account uniquely predicts).

According to the likelihood ratio account, the old/new decision criterion, C, is placed in such a way as to maintain a likelihood ratio of 1 for unbiased responding. Similarly, the O_H criterion is placed in such a way as to maintain a larger likelihood ratio, such as 10 to 1. That is, any item that is 10 or more times as likely to have come from

Figure 3. Familiarity-based signal detection model of weak and strong conditions over which the decision criterion shifts on the decision axis.


the target distribution than the lure distribution receives an old response with high confidence. Different subjects might use different specific odds ratios before making a high-confident response, but we will assume that O_H falls where the likelihood ratio is 10 to 1 for the sake of simplicity. Similarly, the N_H criterion (i.e., the criterion for saying new with high confidence) might be placed in such a way as to maintain a small likelihood ratio, such as 1 to 10 (.1). That is, any item with only a 1 in 10 chance (or less) of having been drawn from the target distribution receives a no response with high confidence. As d′ decreases, this model assumes that the criteria move in such a way as to maintain these confidence-specific likelihood ratios.
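To see what maintaining these likelihood ratios requires, the criterion locations can be found numerically, directly from the heights of the two distributions. The sketch below assumes unit-variance Gaussians and the 10-to-1, 1-to-1, and 1-to-10 ratios used in the text; the d′ values, the bisection search, and the function names are illustrative choices, not part of the article.

```python
import math
from statistics import NormalDist

def log_likelihood_ratio(f, d_prime):
    """ln[p(f | target) / p(f | lure)] for unit-variance Gaussians."""
    target = NormalDist(d_prime, 1.0).pdf(f)
    lure = NormalDist(0.0, 1.0).pdf(f)
    return math.log(target / lure)

def criterion_for_ratio(ratio, d_prime, lo=-20.0, hi=20.0):
    """Find the familiarity value at which the likelihood ratio equals `ratio`.

    Uses bisection; with equal variances and d_prime > 0 the log likelihood
    ratio rises monotonically with familiarity, so the solution is unique.
    """
    goal = math.log(ratio)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if log_likelihood_ratio(mid, d_prime) < goal:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for d in (2.0, 1.0, 0.5):   # hypothetical strong-to-weak conditions
    n_h = criterion_for_ratio(0.1, d)
    c = criterion_for_ratio(1.0, d)
    o_h = criterion_for_ratio(10.0, d)
    print(f"d'={d:.1f}: N_H={n_h:6.2f}  C={c:5.2f}  O_H={o_h:5.2f}")
```

In this sketch the gap between O_H and N_H widens as d′ shrinks, which is the fanning-out pattern the likelihood ratio account predicts.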

How could subjects possibly adjust the criteria in this way? The likelihood ratio model assumes that the subject not only has information about the location of the target and lure distributions, but also has information about their mathematical forms. Where the latter information might come from is usually underspecified (and is the main focus of this article). If the target and lure distributions are Gaussian with means of d′ and 0, respectively, and common standard deviation of σ, the probability of encountering a test item with a familiarity of f given that it is a target, p(f | target), is

$$p(f \mid \text{target}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5 (f - d')^{2}/\sigma^{2}} \tag{1}$$

and the probability of encountering a test item with a familiarity of f given that it is a lure, p(f | lure), is

$$p(f \mid \text{lure}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5 (f - 0)^{2}/\sigma^{2}} \tag{2}$$

where f is a value along the familiarity axis. The mean of the lure distribution is arbitrarily set to 0 for the sake of simplicity. All other values (e.g., the mean of the target distribution and the location of the decision criterion) are measured with respect to the mean of the lure distribution. The mean of the target distribution, d′, represents the number of standard deviations the mean of that distribution falls above the mean of the lure distribution. Indeed, whether we are considering the mean of the target distribution or the location of the decision and confidence criteria, the values represent the number of standard deviation units above (or below) the mean of the lure distribution.

According to likelihood ratio models of human recognition memory, subjects use their knowledge of the mathematical forms of the target and lure distributions to place the old/new decision criterion, C, at the point on the familiarity axis where the height of the signal distribution equals that of the noise distribution, that is, where the likelihood ratio, p(f | target) / p(f | lure), is equal to 1. This is the optimal location of the decision criterion because this is where the odds that the test item was drawn from the target distribution exactly equal the odds that it was drawn from the lure distribution. For any familiarity value greater than that, the odds that the item was drawn from the target distribution (i.e., that it appeared on the list) are better than even, in which case an old response makes sense. As was indicated above, the remaining confidence criteria are placed in such a way as to maintain specific odds ratios that are greater than or less than 1. If O_H is associated with a likelihood ratio of 10 to 1, that confidence criterion is placed on the familiarity axis at the point where the height of the target distribution is 10 times the height of the lure distribution, regardless of the value of d′.

For any given likelihood ratio, L, the criterion is placed on the familiarity axis at the point, f, that satisfies the following equation:

$$L(f) = \frac{p(f \mid \text{target})}{p(f \mid \text{lure})} \tag{3}$$

The right side of the equation is simply the ratio of the height of the target (signal plus noise) distribution to the height of the lure (noise) distribution at the point f on the familiarity axis. That is, using the identities expressed in Equations 1 and 2,

$$L(f) = \frac{\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5 (f - d')^{2}/\sigma^{2}}}{\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5 (f - 0)^{2}/\sigma^{2}}},$$

which, after setting σ to 1 for the sake of simplicity and rearranging terms, simplifies to

$$L(f) = e^{-0.5 (f - d')^{2} + 0.5 f^{2}}.$$

Solving this equation for f yields

$$f = \frac{d'}{2} + \frac{\ln[L(f)]}{d'} \tag{4}$$

Figure 4. Illustration of the placement of confidence criteria in a signal detection framework. The old/new decision criterion is represented by C, and the remaining confidence criteria are represented by O_M, O_H, N_M, and N_H. Familiarity values falling above O_M or O_H receive medium- or high-confident old responses (respectively), whereas familiarity values that fall below N_M or N_H receive medium- or high-confident new responses (respectively).

To determine where a particular criterion should be placed on the familiarity axis to satisfy a particular odds ratio, one need only enter the value of L theoretically associated with that criterion. For example, theoretically, the old/new decision criterion, C, is placed at the point where the odds ratio is 1, which is to say L(f) = 1. Substituting 1 for L(f) (and C for f) in Equation 4 reveals that C = d′/2. That is, according to this model, the decision criterion should be placed midway between the target and the lure distributions, no matter what the value of d′ is. If the criterion were always placed in that location in order to always maintain a likelihood ratio of 1, a mirror effect would always be observed (and it almost always is). If O_H is placed at the point on the familiarity axis where the odds are 10 to 1 in favor of the item's having been drawn from the target distribution, Equation 4 indicates that

$$O_H = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, O_H is placed at a point higher than d′/2, but by an amount that varies with d′. The more general version of this equation is O_H = d′/2 + k1/d′, where k1 is the constant log likelihood ratio associated with O_H (which is

Figure 5. Illustration of the movement of the confidence criteria as a function of d′, according to the lockstep, range, and likelihood ratio models (upper, middle, and lower panels, respectively). In each panel, the top figure represents a strong condition, and the bottom figure represents a weak condition.


ln(10) ≈ 2.30 in this example, but could differ for different subjects). Similarly, if N_H is placed at the point where the odds are only 1 in 10 in favor of the item's having been drawn from the target distribution, Equation 4 indicates that

$$N_H = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is N_H = d′/2 + k2/d′, where k2 is the constant log likelihood ratio associated with N_H. This model predicts that the distance between O_H and N_H on the familiarity axis will be inversely related to d′. That distance (i.e., O_H minus N_H) is given by

$$O_H - N_H = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}.$$

Thus, as d′ becomes very large, O_H and N_H should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two criteria should move infinitely far apart.

The upshot of all of this is that when the situation is represented with a familiarity scale, likelihood ratio models predict that the decision criterion (C) should move in such a way that the hit rate decreases and the false alarm rate increases as d′ decreases (i.e., a mirror effect should be observed), because C will always be placed at d′/2 on the familiarity axis, since that is the point where the odds ratio is equal to 1. The more general version of this account is that all of the confidence criteria should diverge as d′ decreases (as in the lower panel of Figure 5). Mirror effects are usually observed in real data, and, as will be described in more detail next, Stretch and Wixted (1998a) showed that the confidence criteria move in essentially the manner predicted by the likelihood ratio account as well.

How are the locations of the decision criterion and the various confidence criteria inferred from actual behavioral data? The process is not as mysterious as it might seem. Consider first how d′ is computed. For a given hit and false alarm rate, only one arrangement of the target and lure distributions and the criterion is possible, assuming an equal-variance detection model. For example, symmetrical hit and false alarm rates of .84 and .16 yield a d′ of 2.0, with a criterion placement of d′/2 (i.e., C = 1.0 in this example). Standard deviation units are used in both cases. That is, the only way to represent these hit and false alarm rates in an equal-variance signal detection model is for the means of the two distributions to be separated by two standard deviations and for the criterion to be placed one standard deviation above the mean of the lure distribution. No other placement of the criterion would correspond to a false alarm rate of .16, and, given that, no other placement of the mean of the target distribution would yield a hit rate of .84. In practice, standard equations exist for making the relevant computations:

$$d' = z(\text{HIT}) - z(\text{FA})$$

and

$$C = -z(\text{FA}).$$

Thus, in this example, C = −z(.16) = −(−1.0) = 1.0, which is to say that the criterion is placed one standard deviation above the mean of the lure distribution. This method of computing the placement of C involves all old responses to lures with a confidence rating of maybe old or greater (which is to say, all old responses to lures: +, ++, or +++). A very similar method can be used to determine the locations of the various confidence criteria. To determine where the O_M confidence criterion is placed, for example, one can use only those responses to lures that received a confidence rating of probably old or greater (i.e., all false alarms committed by choosing the ++ or +++ alternatives). From these responses, a new false alarm rate can be computed, and the location of the O_M criterion can be estimated by converting that false alarm rate into a z score (exactly as one usually does for old responses that exceed the old/new decision criterion). In practice, a more comprehensive method is used (one that allows for unequal variance and that takes into account confidence-specific hit rates as well), but the method just described works about as well and is conceptually much simpler (Stretch & Wixted, 1998a). The locations of the various confidence criteria on the familiarity axis are simply the cumulative confidence-specific false alarm rates converted to a z score (with the sign reversed, as in the equation for C).

Stretch and Wixted (1998a) tested the predictions of the likelihood ratio account with a very simple experiment. In one condition, subjects studied lists of words that were presented quite slowly, after which they received a recognition test (this was the strong condition). In another, they studied lists of words that were presented more rapidly, followed by a recognition test (the weak condition). For both conditions, the locations of the various confidence criteria were computed in essentially the manner described above. As uniquely predicted by the likelihood ratio account, the confidence criteria fanned out on the familiarity axis as d′ decreased. Figure 6 presents the relevant estimates of the locations of the confidence criteria in the strong and the weak conditions. Note that whereas the location of the decision criterion, C, decreased on the familiarity axis as conditions changed from strong to weak, the location of the high-confident old criterion, O_H, remained essentially constant, which means that even though the overall false alarm rate increased in the weak condition, the high-confident false alarm rate did not. This phenomenon is uniquely predicted by the likelihood ratio account.

Thus, whether we consider the basic mirror effect or the more complete picture provided by ratings of confidence, subjects behave as if they are able to maintain essentially (although not necessarily exactly) constant likelihood ratios across conditions. The fact that subjects are able to do this is rather mysterious, at least on the surface, because it is not clear how they would have acquired the detailed information it requires (and likelihood ratio models typically have little to say about this key issue).
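The simple estimation procedure described earlier in this section, in which cumulative confidence-specific false alarm rates are converted to z scores, can be sketched as follows. The response counts are hypothetical, and applying the same cumulative rule to the new-side criteria (N_M and N_H) is an assumption made here for symmetry; the names used are illustrative only.

```python
from statistics import NormalDist

# Hypothetical counts of responses to LURES for each confidence rating.
lure_counts = {"---": 30, "--": 25, "-": 25, "+": 10, "++": 6, "+++": 4}

ORDER = ["---", "--", "-", "+", "++", "+++"]
# Each criterion is paired with the least confident rating lying above it.
LOWEST_RATING_ABOVE = {"N_H": "--", "N_M": "-", "C": "+", "O_M": "++", "O_H": "+++"}

def criterion_locations(counts):
    """Estimate criterion locations in lure standard deviation units."""
    total = sum(counts.values())
    z = NormalDist().inv_cdf
    locations = {}
    for name, rating in LOWEST_RATING_ABOVE.items():
        start = ORDER.index(rating)
        # Proportion of lures given this rating or any more confident "old" rating.
        cumulative_fa = sum(counts[r] for r in ORDER[start:]) / total
        locations[name] = -z(cumulative_fa)   # same form as C = -z(FA)
    return locations

for name, location in criterion_locations(lure_counts).items():
    print(f"{name}: {location:5.2f}")
```

Running the same computation separately for a strong and a weak condition would give the kind of criterion estimates that are compared across conditions.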

Within-list strength manipulations. It is worth noting that, although subjects often behave in accordance with the predictions of likelihood ratio models, there are conditions under which they do not. All this really means is that it is a mistake to think of humans as being fully informed likelihood ratio computers. For the sake of completeness, we will turn now to a brief discussion of the conditions under which subjects do not behave in the manner predicted by likelihood ratio accounts, after which we will return to the central question of how it is that they often do.

Strength is usually manipulated between lists to produce a mirror effect, but Stretch and Wixted (1998b) also manipulated strength within list. In each list, half the words were colored red, and they were presented five times each. The other half were colored blue, and they appeared only once each. The red and the blue words were randomly intermixed, and the red word repetitions were randomly scattered throughout the list. On the subsequent recognition test, the red and the blue targets were randomly intermixed with red and blue lures. Obviously, the hit rate in the strong (red) condition was expected to exceed the hit rate in the weak (blue) condition, and it did. Of interest was whether or not the false alarm rate in the strong condition (i.e., to red lures) would be less than the false alarm rate in the weak condition (i.e., to blue lures). If so, the within-list strength manipulation would yield results similar to those produced by the between-list strength manipulation. However, the results reported by Stretch and Wixted (1998b) showed a different pattern. In two experiments in which strength was cued by color, the hit rate to the strong items was much higher than the hit rate to the weak items (as was expected), but the false alarm rates were nearly identical. Similar results, when other list materials were used, were also recently reported by Morrell, Gaitan, and Wixted (2002). These results correspond to the situation depicted earlier in the upper panel of Figure 2, according to which the decision criterion remains fixed on the familiarity axis as a function of strength. A fully informed likelihood ratio computer would not respond in this manner, but would, instead, adjust the criterion as a function of strength in order to maintain a constant likelihood ratio across conditions (as in the between-list strength manipulation).

The main difference between the within- and the between-list strength manipulations lies in how rapidly the conditions change during testing. In the between-list case, the conditions change slowly (e.g., all the target items are weak after one list, and all are strong after the other). In the within-list case, the conditions change at a rapid rate (e.g., a strong test item might be followed by a weak test item, followed by a strong test item, and so on). Under such conditions, subjects tend to aggregate over those conditions, rather than respond in accordance with each condition separately. We shall see later that a similar issue arises in recognition memory experiments involving animal subjects, but the important point for the time being is that these data show that organisms do not always behave as if they were fully informed likelihood ratio computers.

Even though subjects do not always behave in exactly the manner one would expect according to a likelihood ratio model, they usually do, and the question of interest concerns how they acquire the information needed to do that. More specifically, what accounts for the fact that, when strength is manipulated between lists, subjects have

Figure 6. Estimates of the locations of the confidence criteria (O_H, C, and N_H) in the strong and weak conditions of a recognition memory experiment reported by Stretch and Wixted (1998a).


information available so that the confidence criteria fan out on the familiarity axis as d′ decreases in just the manner predicted by the likelihood ratio account? That is a question left largely unanswered by likelihood ratio accounts, and suggesting an answer to it is the issue we will consider next.

Mirror Effects in Pigeons

What is often overlooked in the human cognitive literature is the possibility that subjects have experienced years of making recognition decisions and expressing various levels of confidence. Presumably, those expressions of confidence have been followed by social consequences that have served to shape their behavior. Conceivably, it is those consequences that shaped behavior to mimic what a history-less likelihood ratio computer might do. We begin our investigation into this possibility by making the point that pigeons show a mirror effect, too. In pigeon research, unlike in human research, recognition memory decisions are followed by experimentally arranged consequences, and the connection between where a pigeon places its decision criterion and the consequences it has received in the past is easier to appreciate. That explicitly arranged reinforcers could influence the location of the decision criterion was recognized from the earliest days of detection theory (e.g., Green & Swets, 1966), but research into that issue never really blossomed in the literature on humans. The most influential account along these lines in the literature on animals, where relevant research did blossom, was proposed some time later by Davison and Tustin (1978). Nevertheless, it is probably fair to say that the connection between their research (and the research their ideas generated) and what a likelihood ratio theorist might say about the movement of confidence criteria across conditions is not easy to see. Making that connection explicit is what we will try to accomplish in this section and the next.

Mirror effects in pigeons can be observed by using a simple procedure in which the birds are required to discriminate the prior presence or absence of a sample stimulus. In a typical experiment of this kind, sample and no-sample trials are randomly intermixed, and the pigeon's task is to report whether or not a sample appeared earlier in the trial. On sample trials, the end of the intertrial interval (ITI) is followed by the presentation of a sample (e.g., the brief illumination of a keylight or the houselight) and then by the retention interval. On no-sample trials, by contrast, the end of the ITI is followed immediately by the onset of the nominal retention interval (i.e., no sample is presented). Following the retention interval, two choice stimuli are presented (e.g., red vs. green). A response to one choice alternative is correct on sample trials, and a response to the other is correct on no-sample trials. Pigeons learn to perform this task with high levels of accuracy when the retention interval is short, and manipulating the retention interval across conditions yields a mirror effect.

In Wixted's (1993) Experiment 2, 4 pigeons were exposed to an ascending series of retention intervals (0.50, 2, 6, 12, and 24 sec), with each retention interval condition in effect for 15 sessions. The results are shown in Table 1. The hit rate represents the probability of a correct response on sample trials (i.e., a correct declaration that the sample occurred), and the false alarm rate represents the probability of an incorrect response (i.e., an incorrect declaration that the sample occurred) on no-sample trials. The data presented in boldface in Table 1 were averaged over the last 5 sessions of each condition and over the 4 subjects, and these data constitute the main results of the experiment. The data shown in standard typeface are from the first session of each new retention interval condition (save for the 0.50-sec condition, because the birds had not yet learned the task in the 1st session of that condition). The data from the 1st session of each condition are instructive and will be discussed in more detail after we consider the main findings. The main results show that, not surprisingly, d′ decreased as the retention interval increased. The more important point is that as the hit rate decreased, the false alarm rate reliably increased. In other words, a mirror effect was observed, just as it almost always is in human recognition memory experiments.

What explains the existence of a mirror effect in this case? Figure 7 presents sense-of-prior-occurrence distributions for three retention interval conditions (short, medium, and long). These distributions are exactly analogous to the familiarity distributions previously discussed in relation to human recognition memory, and they represent the bird's subjective sense that a sample was presented. That subjective sense is assumed to vary from trial to trial on both sample and no-sample trials (according to Gaussian or Gaussian-like distributions), even for a constant retention interval. Some sense of prior occurrence is evident even on no-sample trials, perhaps owing to the lingering influence of samples presented on earlier trials or to inherent noise in the memory system.

When the retention interval is short (upper panel), the sample distribution falls far to the right because, despite some variability, the subjective sense of prior occurrence

Table 1
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Between Conditions

Retention Interval (sec)    Hit Rate    FA Rate    d′
 0.5                        .94         .10        2.83
 2.0 (1st)                  .84         .08
 2.0                        .88         .14        2.26
 6.0 (1st)                  .60         .14
 6.0                        .88         .18        2.10
12.0 (1st)                  .67         .19
12.0                        .74         .19        1.52
24.0 (1st)                  .46         .24
24.0                        .65         .28         .97

Note. The data were taken from Experiment 2 of Wixted (1993). The values in boldface in the original table (here, the rows not marked "1st") were averaged over the last five sessions of each condition and over the 4 subjects. The values in standard typeface (the rows marked "1st") are from the first session of each new retention interval condition (except for the 0.50-sec condition).
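As a rough consistency check on Table 1, d′ and the criterion location can be recomputed from the tabled hit and false alarm rates using the standard equal-variance formulas d′ = z(HIT) − z(FA) and C = −z(FA). This is only a sketch: the tabled d′ values were presumably averaged over subjects, so recomputing from the averaged rates is an approximation, and the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # converts a proportion to a z score

# (retention interval in sec, hit rate, FA rate, reported d') for the
# last-five-session rows of Table 1.
TABLE_1 = [
    (0.5, 0.94, 0.10, 2.83),
    (2.0, 0.88, 0.14, 2.26),
    (6.0, 0.88, 0.18, 2.10),
    (12.0, 0.74, 0.19, 1.52),
    (24.0, 0.65, 0.28, 0.97),
]

# Recompute d' and the criterion location for every row.
recomputed = [(interval, z(hit) - z(fa), -z(fa), reported)
              for interval, hit, fa, reported in TABLE_1]

for interval, d_prime, criterion, reported in recomputed:
    print(f"{interval:5.1f} sec: d'={d_prime:.2f} (reported {reported:.2f}), "
          f"C={criterion:.2f}")
```

Under the equal-variance assumption, the recomputed criterion moves steadily toward the mean of the no-sample distribution as the retention interval grows.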


will generally be quite high (and will fall well above the decision criterion). Thus, on the majority of sample trials, the subject would respond correctly (yes). When the retention interval increases (and the sense of prior occurrence weakens), the sample distribution gradually shifts toward the no-sample distribution (middle panel). The no-sample distribution does not change as a function of retention interval, because there is no memory trace established on those trials (so there is no trace that weakens with retention interval). When the retention interval is very long, so that memory for the sample fades completely, the sample and no-sample distributions will overlap, so that performance on sample and no-sample trials will be indistinguishable (a situation that is approached in the lower panel).

Because each retention interval was in effect for 15 sessions, the birds had time to adjust the location of the decision criterion in light of the reinforcement consequences for doing so. For an equal-variance model, reinforcers will be maximized when the criterion is placed midway between the two distributions. Leaving a particular retention interval in effect for a prolonged period gives the birds time to realize that and to adjust the decision criterion accordingly. By adjusting the decision criterion in such a way as to maximize reinforcement, the performance of the birds exhibits a mirror effect.
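The claim that reinforcers are maximized midway between the distributions can be illustrated with a small grid search. The sketch assumes unit-variance Gaussians, equally frequent sample and no-sample trials, and reinforcement for every correct response, so that maximizing reinforcement amounts to maximizing the proportion of correct choices; the d′ values and function names are hypothetical.

```python
from statistics import NormalDist

def proportion_correct(criterion, d_prime, p_sample=0.5):
    """Expected proportion of correct (reinforced) choices for a criterion."""
    hit = 1.0 - NormalDist(d_prime, 1.0).cdf(criterion)      # "yes" on sample trials
    correct_rejection = NormalDist(0.0, 1.0).cdf(criterion)  # "no" on no-sample trials
    return p_sample * hit + (1.0 - p_sample) * correct_rejection

def best_criterion(d_prime, step=0.01):
    """Grid search for the criterion that maximizes proportion correct."""
    candidates = [i * step for i in range(-300, 501)]
    return max(candidates, key=lambda c: proportion_correct(c, d_prime))

for d in (2.0, 1.0, 0.5):   # hypothetical d' values
    print(f"d'={d:.1f}: best criterion ~ {best_criterion(d):.2f} (d'/2 = {d / 2:.2f})")
```

With unequal trial frequencies or unequal reinforcer probabilities, the maximizing criterion would no longer sit at the midpoint.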

That the birds actually do shift the criterion in light of the reinforcement consequences is perhaps seen most easily by considering how they behave when a retention interval is first increased from one condition to the next. The relevant data are presented in standard typeface in Table 1. Note that each time a retention interval was increased (e.g., from 0.5 to 2 sec or from 2 to 6 sec), the hit rate dropped immediately (and precipitously), whereas the false alarm rate generally did not change. Over the next 15 sessions, both the hit rate and, to a lesser extent, the false alarm rate increased.

What explains that pattern? Presumably, on the first day that a new, longer retention interval was introduced, the effect on the mean of the sample distribution was immediate. That is, because of the longer retention interval, the memory trace of the sample faded to a greater extent, leading to a less intense sense of prior occurrence, on average. But that was the only immediate effect. The mean of the no-sample distribution did not change when the retention interval was increased, because there was no memory trace established on such trials to fade away. The criterion did not change, because the bird did not have enough time to learn where the optimal placement of the decision criterion might be in order to yield a higher number of reinforcers. As time passed, though, the bird learned to shift the criterion to the left on the sense-of-prior-occurrence axis (as is shown in Figure 7), thereby increasing both the hit rate and the false alarm rate.

Figure 7. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated between conditions. The location of the decision criterion, C, changes with each condition.

The increase in the hit and the false alarm rates as a function of training at a particular retention interval is unmistakably asymmetric. That is, generally speaking, the hit rate increased a lot, whereas the false alarm rate increased only a little. Part of the explanation for this is that the appropriate detection model for this situation is not the equal-variance model we have used throughout this article, but an unequal-variance model (so that the no-sample distribution is more variable than the sample distribution). A criterion sweeping across a narrow sample distribution will affect the hit rate to a much greater extent than a criterion sweeping across a wide no-sample distribution will affect the false alarm rate. In fact, in a separate experiment, Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.
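The complication mentioned above, two points at which the likelihood ratio equals 1.0, can be made concrete. Setting the log densities of two Gaussians equal gives a quadratic in f, which has two roots whenever the variances differ. The parameter values below are hypothetical (a sample distribution with mean 2 and standard deviation 1, and a wider no-sample distribution with mean 0 and standard deviation 1.5), and the function name is illustrative.

```python
import math
from statistics import NormalDist

def equal_likelihood_points(mu_sample, sd_sample, sd_no_sample):
    """Return the value(s) of f at which the two densities are equal.

    sample ~ N(mu_sample, sd_sample); no-sample ~ N(0, sd_no_sample),
    with mu_sample > 0. Equating the log densities gives
    a*f**2 + b*f + c = 0.
    """
    a = 1.0 / (2.0 * sd_no_sample**2) - 1.0 / (2.0 * sd_sample**2)
    b = mu_sample / sd_sample**2
    c = -mu_sample**2 / (2.0 * sd_sample**2) + math.log(sd_no_sample / sd_sample)
    if a == 0.0:                       # equal variances: a single crossing point
        return [-c / b]
    root = math.sqrt(b * b - 4.0 * a * c)
    return sorted([(-b + root) / (2.0 * a), (-b - root) / (2.0 * a)])

MU, SD_SAMPLE, SD_NO_SAMPLE = 2.0, 1.0, 1.5   # hypothetical parameters
for f in equal_likelihood_points(MU, SD_SAMPLE, SD_NO_SAMPLE):
    ratio = NormalDist(MU, SD_SAMPLE).pdf(f) / NormalDist(0.0, SD_NO_SAMPLE).pdf(f)
    print(f"f = {f:.3f}: likelihood ratio = {ratio:.3f}")
```

With equal standard deviations the quadratic term vanishes and the function returns the single midpoint, which is the equal-variance case used throughout the article.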

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant, but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted (1993) are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of $d'$.
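As a check on how the $d'$ scores in Table 2 relate to the hit and false alarm rates, the following sketch applies the standard equal-variance formula, $d' = z(\mathrm{hit}) - z(\mathrm{FA})$, to the rates as read from that table. The formula is an assumption about how the scores were obtained; the reported values may have been computed from unrounded rates or averaged across birds, so agreement only to within rounding should be expected.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF

# (retention interval in s, hit rate, false alarm rate, reported d')
# Values as read from Table 2, with decimal points restored.
table_2 = [
    (0.5, 0.94, 0.14, 2.63),
    (2.0, 0.93, 0.20, 2.32),
    (6.0, 0.73, 0.17, 1.56),
    (12.0, 0.63, 0.16, 1.32),
]


def d_prime(hit, fa):
    """Equal-variance sensitivity index: z(hit) - z(false alarm)."""
    return z(hit) - z(fa)


for interval, hit, fa, reported in table_2:
    computed = d_prime(hit, fa)
    print(f"{interval:>4} s: computed d' = {computed:.2f}, reported d' = {reported:.2f}")
```

The point of the exercise is that the decline in $d'$ across retention intervals is carried almost entirely by the hit rate, whereas the false alarm rate stays within a narrow band.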

Table 2
Hit and False Alarm (FA) Rates and d' Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)    Hit Rate    FA Rate    d'
 0.5                        .94         .14        2.63
 2.0                        .93         .20        2.32
 6.0                        .73         .17        1.56
12.0                        .63         .16        1.32

Note. The data were taken from Experiment 1 of Wixted (1993).

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

White and Wixted's (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird's (or a human's) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird's behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to $f$ in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of $f$ was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given $f$ is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given $f$. According to White and Wixted's model, birds learn the relative probabilities of reinforcement for each value of $f$ along the sense-of-prior-occurrence axis (see Note 2). If, for a given value of $f$, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of $f$ was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of $f$.

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of $f$ is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution, with mean $d'$ and standard deviation $\sigma$, is given by

$$p(f \mid S) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - d')^2 / \sigma^2}, \tag{5}$$

and the no-sample (or lure) distribution, with mean 0 and standard deviation $\sigma$, is given by

$$p(f \mid NS) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - 0)^2 / \sigma^2}, \tag{6}$$

where $f$ now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and $p(f \mid S)$ and $p(f \mid NS)$ represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio of $p(f \mid S)$ to $p(f \mid NS)$ on a given trial. What the pigeons instead learn, according to White and Wixted's (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given $f$, which will be denoted as $p(R_S \mid f)$, and the probability that a reinforcer is arranged on the no-sample choice alternative given $f$, which will be denoted as $p(R_{NS} \mid f)$. Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given $f$ is

$$p(R_S \mid f) = \frac{p(f \mid S)}{p(f \mid S) + p(f \mid NS)}\, r_S, \tag{7}$$

where $r_S$ is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced ($r_S$ is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given $f$ is

$$p(R_{NS} \mid f) = \frac{p(f \mid NS)}{p(f \mid S) + p(f \mid NS)}\, r_{NS}, \tag{8}$$

where $r_{NS}$ is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced ($r_{NS}$ is also usually 1.0).
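Equations 5 through 8 can be evaluated directly. The sketch below is only an illustration of those four equations under the stated assumption that sample and no-sample trials are equally likely; the function names and the example values ($d' = 2$, $f = 1.5$) are hypothetical and do not come from White and Wixted (1999).

```python
import math


def gaussian_pdf(f, mean, sigma=1.0):
    """Gaussian density with the given mean and sigma (Equations 5 and 6)."""
    return math.exp(-0.5 * (f - mean) ** 2 / sigma ** 2) / math.sqrt(2 * math.pi * sigma ** 2)


def reinforcer_probs(f, d_prime, r_s=1.0, r_ns=1.0, sigma=1.0):
    """Return (p(R_S | f), p(R_NS | f)) as defined in Equations 7 and 8.

    Assumes that sample and no-sample trials are equally likely.
    """
    p_f_s = gaussian_pdf(f, d_prime, sigma)  # height of the sample distribution at f
    p_f_ns = gaussian_pdf(f, 0.0, sigma)     # height of the no-sample distribution at f
    total = p_f_s + p_f_ns
    return r_s * p_f_s / total, r_ns * p_f_ns / total


# Hypothetical example: d' = 2 and a sense of prior occurrence of f = 1.5.
p_rs, p_rns = reinforcer_probs(f=1.5, d_prime=2.0)
print(f"p(R_S | f) = {p_rs:.3f}, p(R_NS | f) = {p_rns:.3f}")
```

With $r_S = r_{NS} = 1.0$, the two probabilities sum to 1 for every value of $f$, and they are equal at $f = d'/2$, where the two distributions have the same height.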

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, $f$, on the sense-of-prior-occurrence axis.


White and Wixted's (1999) model assumes that pigeons learn $p(R_S \mid f)$ and $p(R_{NS} \mid f)$ during the course of training and that, in light of this information, they respond to the choice alternatives in accordance with the matching law. If $p(B_S \mid f)$ represents the probability of choosing the sample choice alternative given $f$, and $p(B_{NS} \mid f)$ represents the probability of choosing the no-sample choice alternative given $f$, then

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(R_S \mid f)}{p(R_{NS} \mid f)}, \tag{9}$$

which, after replacing $p(R_S \mid f)$ and $p(R_{NS} \mid f)$ with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right]. \tag{10}$$

These equations state that the behavior ratio given $f$ is equal to the reinforcer ratio given $f$, which is essentially the matching law for each value of $f$ along the sense-of-prior-occurrence axis.

If $r_S = r_{NS} = 1.0$ (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(f \mid S)}{p(f \mid NS)}.$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing $p(f \mid S)$ and $p(f \mid NS)$ with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - d')^2 / \sigma^2}}{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - 0)^2 / \sigma^2}}.$$

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when $r_S = r_{NS}$, a likelihood ratio computation is not assumed to be governing the bird's behavior. What drives the bird's choice, according to White and Wixted's (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of $f$ was generated by a sample as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted's account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives, depending on what $f$ happens to be. What $f$ happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.

There is some point on the sense-of-prior-occurrence axis (i.e., some value of $f$) that leads the bird to be indifferent between the two choice alternatives, because at that point the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, $p(B_S \mid f) = p(B_{NS} \mid f)$. If $r_S = r_{NS} = 1.0$, it is easily shown that the value of $f$ that yields indifference is $d'/2$. If $p(B_S \mid f) = p(B_{NS} \mid f)$, which is to say that $p(B_S \mid f) / p(B_{NS} \mid f) = 1$, then, according to Equation 10,

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right].$$

Solving this equation for $f$ provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for $p(f \mid S)$ and $p(f \mid NS)$ from Equations 5 and 6 and canceling out the $1/\sqrt{2\pi\sigma^2}$ terms that appear in the numerator and denominator yields

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{e^{-.5 (f - d')^2}}{e^{-.5 (f - 0)^2}}\right]$$

(assuming that $\sigma = 1$ for simplicity). When solved for $f$, this equation yields

$$f = \frac{d'}{2} + \frac{\ln(r_{NS} / r_S)}{d'}. \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if $r_S = r_{NS}$, so that $\ln(r_{NS}/r_S) = 0$, the indifference point occurs at $d'/2$. This familiarity value corresponds to the indifference point and could be represented as $f_{1:1} = d'/2$, where the subscript 1:1 represents the ratio of programmed reinforcer probabilities ($r_{NS}/r_S$). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when $d'$ changes (thereby yielding a mirror effect). The difference is that White and Wixted's (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of $f$. In this case, the pigeon has learned that when $f = d'/2$, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than $d'/2$, the odds favor a reinforcer's being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than $d'/2$, the odds favor a reinforcer's being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same, namely, that a mirror effect should be observed.

Again, although we assume that the indifference point represents the learned value of $f$ associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject's learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.
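The claim that Equation 11 gives the point of indifference can be checked numerically by evaluating the behavior ratio of Equation 10 at that point. The sketch below assumes $\sigma = 1$, as in the derivation; the function names and the particular values of $d'$, $r_S$, and $r_{NS}$ are illustrative choices, not values taken from any experiment.

```python
import math


def indifference_point(d_prime, r_s=1.0, r_ns=1.0):
    """Equation 11 with sigma = 1: f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2 + math.log(r_ns / r_s) / d_prime


def behavior_ratio(f, d_prime, r_s=1.0, r_ns=1.0):
    """Equation 10 with sigma = 1: (r_S / r_NS) * p(f | S) / p(f | NS).

    The Gaussian normalizing constants cancel, leaving a ratio of exponentials.
    """
    likelihood_ratio = math.exp(-0.5 * (f - d_prime) ** 2) / math.exp(-0.5 * f ** 2)
    return (r_s / r_ns) * likelihood_ratio


d_prime = 1.5  # hypothetical sensitivity
conditions = [(1.0, 1.0), (0.1, 1.0), (1.0, 0.1)]  # hypothetical (r_S, r_NS) pairs
for r_s, r_ns in conditions:
    f_star = indifference_point(d_prime, r_s, r_ns)
    ratio = behavior_ratio(f_star, d_prime, r_s, r_ns)
    print(f"r_S = {r_s}, r_NS = {r_ns}: f* = {f_star:.3f}, behavior ratio at f* = {ratio:.3f}")
```

In each condition the behavior ratio at the computed point is 1 (up to floating-point error), and with equal reinforcer probabilities the point falls at $d'/2$.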

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if $r_{NS} > r_S$), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in $d'$ on the location of the confidence criteria discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as $d'$ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., $r_{NS} = 1.0$ and $r_S = .1$). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence that the sample occurred before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when $r_{NS} = 10 r_S$? Substituting 10 for $r_{NS}/r_S$ in Equation 11 and solving for $f$ yields

$$f_{10:1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, $f_{10:1}$ (the indifference point when $r_{NS} = 10 r_S$) occurs at a point higher than $d'/2$, but by an amount that varies with $d'$. The more general version of this equation is $f_{10:1} = d'/2 + k_1/d'$, where $k_1$ is the constant $\ln(r_{NS}/r_S)$. Similarly, if $r_S = 10 r_{NS}$, then

$$f_{1:10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is $f_{1:10} = d'/2 + k_2/d'$, where $k_2$ is the constant $\ln(r_{NS}/r_S)$ for this condition. This model predicts that the distance between $f_{10:1}$ and $f_{1:10}$ on the sense-of-prior-occurrence axis will be inversely related to $d'$. That distance (i.e., $f_{10:1}$ minus $f_{1:10}$) is given by

$$f_{10:1} - f_{1:10} = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}.$$

Thus, as $d'$ becomes very large, $f_{10:1}$ and $f_{1:10}$ should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as $d'$ approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon's sensitivity to variations in relative reinforcer probabilities should be low when $d'$ is large and high when $d'$ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When $d'$ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of $r_{NS}$ to $r_S$) had little effect on behavior. When $d'$ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as $d'$ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as $d'$ decreased (because that is just another way of saying that sensitivity to reinforcement increased as $d'$ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism's reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and, more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})} > 10,$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of $f$, the odds that an old response would be correct, on the basis of prior feedback, are 10 to 1 in favor:

$$\frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)} > 10,$$

where $p(R_{\mathrm{Old}} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for an old response occurred, and $p(R_{\mathrm{New}} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)}, \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \left(\frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}}\right)\left[\frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})}\right]. \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left(\frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}}\right)\left[\frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})}\right] > 10,$$

where $r_{\mathrm{Old}}$ and $r_{\mathrm{New}}$ represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to $r_S$ and $r_{NS}$, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of $r_S$ and $r_{NS}$ in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, $r_{\mathrm{Old}} = r_{\mathrm{New}}$. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of $f$ along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of $f$.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if $r_{NS} = 10 r_S$, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at $d'/2$, but at a higher point, namely, $d'/2 + 2.30/d'$. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific $f \rightarrow$ reinforcement outcome relations, but instead that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog's neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog's training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.
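The indifference points discussed in this section (e.g., $d'/2 + 2.30/d'$ when $r_{NS} = 10 r_S$) and the way they fan out as $d'$ decreases can be tabulated with a few lines of code. This is a sketch of Equation 11 with $\sigma = 1$ only; the function names and the particular $d'$ values are illustrative assumptions.

```python
import math


def indifference_point(d_prime, reinforcer_ratio):
    """Equation 11 with sigma = 1; reinforcer_ratio stands for r_NS / r_S."""
    return d_prime / 2 + math.log(reinforcer_ratio) / d_prime


def criterion_spread(d_prime):
    """Distance between the 10:1 and 1:10 indifference points, (k1 - k2) / d'."""
    return indifference_point(d_prime, 10.0) - indifference_point(d_prime, 0.1)


for d_prime in (0.5, 1.0, 2.0, 4.0):
    high = indifference_point(d_prime, 10.0)
    low = indifference_point(d_prime, 0.1)
    print(f"d' = {d_prime}: f(10:1) = {high:.2f}, f(1:10) = {low:.2f}, "
          f"spread = {criterion_spread(d_prime):.2f}")
```

Because $k_1 - k_2 = 2\ln(10)$, the spread is inversely proportional to $d'$: halving $d'$ doubles the distance between the two indifference points.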

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism's neural network without detailing the origins of the relevant distributional information. A neural network may indeed perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject's reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising, indeed, if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject's neural network, but in the subject's history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that "the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects" (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of "nothing" in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion's distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value, $f$, is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)

Page 3: Wixted, J. T. & Gaitan, S. (2002)


clared to be new. The lower panel of Figure 1 illustrates the likelihood ratio model, and it is immediately apparent that it looks a lot like the familiarity model shown in the upper panel of Figure 1. The only real difference is the decision axis, which now represents a log likelihood scale.

Likelihood ratio models assume that the operative psychological variable is an odds ratio associated with the test item, not its level of familiarity. The odds ratio is equal to the likelihood that the item was drawn from the target distribution divided by the likelihood that it was drawn from the lure distribution. Graphically, this value is given by the height of the target distribution divided by the height of the lure distribution at the point on the familiarity axis where the item falls. Consider, for example, an item that generates a level of familiarity equal to the mean familiarity of the targets. The height of the target distribution at its mean is 7.38 times the height of the lure distribution (hL). As such, a test item that generates this level of familiarity (i.e., a familiarity of μTarget) is 7.38 times as likely to have been drawn from the target distribution as from the lure distribution. Whenever the odds are greater than even, as they are in this case, an unbiased subject would declare the item to be old. Note that the natural log of 7.38 is 2.0, so the same item that generates a level of familiarity falling at the mean of the target distribution in the upper panel of Figure 1 generates a log likelihood ratio of 2.0 in the lower panel of Figure 1. Similarly, an item whose familiarity value falls midway between the means of the target and the lure distributions is associated with a likelihood ratio of 1.0 (i.e., the heights of the two distributions are equal at that point), which translates to a log likelihood ratio of 0 in the lower panel of Figure 1.
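The ratio of heights described above can be checked numerically. The following is a minimal sketch, assuming equal-variance Gaussian distributions with unit standard deviation and a target mean of 2 (the value implied by a log likelihood ratio of 2.0 at the target mean); the function names are illustrative and do not come from the article.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Height of a Gaussian distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(familiarity, d_prime, sd=1.0):
    """Height of the target distribution divided by the height of the lure distribution."""
    return normal_pdf(familiarity, d_prime, sd) / normal_pdf(familiarity, 0.0, sd)


d_prime = 2.0  # assumed separation between the lure mean (0) and the target mean

# An item whose familiarity equals the target mean.
lr_at_target_mean = likelihood_ratio(d_prime, d_prime)
# An item whose familiarity falls midway between the two means.
lr_at_midpoint = likelihood_ratio(d_prime / 2.0, d_prime)

print(lr_at_target_mean, math.log(lr_at_target_mean))
print(lr_at_midpoint, math.log(lr_at_midpoint))
```

Under these assumptions the ratio at the target mean equals e squared (roughly 7.39, which the text reports as 7.38), and the ratio at the midpoint equals 1, so the two log likelihood ratios are 2 and 0.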

Whereas the strength version of signal detection theory requires only that the subject have some criterion familiarity value in mind in order to arrive at a recognition decision, the likelihood ratio version requires much more complete knowledge of the shapes and locations of the underlying familiarity distributions in order to make the relevant statistical computations. Although those added assumptions may seem implausible, it seems clear that subjects sometimes behave as if they do possess the relevant knowledge. The evidence bearing on this claim will be reviewed next, and the question we will later ask is this: Are humans biologically endowed with the requisite knowledge and computational ability, or have they been trained to behave that way as a function of prior (extralaboratory) consequences?

Explaining the Mirror Effect

The old/new decision criterion. What is the advantage of assuming a likelihood ratio model of memory? The familiarity and likelihood ratio accounts are equally able to explain, say, a hit rate of .84 and a false alarm rate of .16 (as is shown in Figure 1), but Figure 2 illustrates an important difference between them. This figure shows what the expected result would be if subjects maintained the same decision criterion as d′ (the standardized distance between the means of the target and the lure distributions) changed from low to high. The upper panel shows the familiarity-based model, and the lower panel shows the likelihood ratio model. We assume that d′ was affected by a standard strength manipulation (e.g., in the strong condition, subjects had more time to study the items than in the weak condition) and that this manipulation affected the mean familiarity of the target items without affecting the mean familiarity of the lures (which, in both conditions, are simply items that did not appear on the list).

Note that the definition of the decision criterion differs between the two accounts. In the strength version, the criterion is a particular level of familiarity (so that items yielding higher familiarity levels than that are judged to be old). In the likelihood ratio version, the criterion is a particular odds ratio (so that items yielding higher odds than that are judged to be old).

In the familiarity-based model shown in the upper panel of Figure 2, the criterion is placed 0.75 standard deviations above the mean of the lure distribution whether the targets are weak or strong (i.e., the criterion is fixed at a particular point on the familiarity axis). As such, the hit rate increases considerably as a function of strength, but the false alarm rate remains constant. If the familiarity of the lures does not change as the targets are strengthened (which is the simplest assumption), the only way that the false alarm rate would change is if the criterion moved across conditions.

In the likelihood ratio model, the criterion is placed at the point where the odds are even that the item was drawn from the target or the lure distributions (i.e., where the odds ratio is 1.0). The point of even odds occurs where the heights of the two distributions are equal, and that occurs where the distributions intersect. In the familiarity model, the point of intersection occurs at one level of familiarity in the weak case and at a higher level of familiarity in the strong case. But these two familiarity values both translate to a value of 0 on the log likelihood ratio axis (i.e., ln[1.0] = 0). Thus, from a likelihood ratio point of view, the criterion remains fixed at 0 across the two strength conditions.

Note the different patterns of hit and false alarm rates yielded by these two fixed-criterion models. Unlike the familiarity model, the likelihood ratio model naturally predicts a mirror effect (i.e., as the hit rate goes up, the false alarm rate goes down), and that pattern happens to be a nearly universal finding in the recognition memory literature (Glanzer, Adams, Iverson, & Kim, 1993). The mirror effect is typically observed not only for strength manipulations, but also for manipulations of word frequency, concreteness, and a variety of other variables (Glanzer & Adams, 1990). The ability of the likelihood ratio model to account for the mirror effect in a natural way is one of its most attractive features. Indeed, largely because of the mirror effect, recent theorizing about recognition memory appears to reflect a shift away from the idea that the decision axis represents a strength-of-evidence scale (like familiarity) and toward the idea that it represents a likelihood ratio scale. In fact, three of the most recent theories of human recognition memory are based fundamentally on the notion that recognition memory decisions are based on a likelihood ratio computation. These include theories known as attention-likelihood theory (Glanzer et al., 1993), retrieving effectively from memory (Shiffrin & Steyvers, 1997), and subjective-likelihood theory (McClelland & Chappell, 1998).
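The contrast between the two fixed-criterion models can be made concrete with a short calculation. The sketch below assumes equal-variance Gaussian distributions with unit standard deviation, uses the 0.75 criterion mentioned for the familiarity model, and places the likelihood ratio criterion at d′/2 (the point where two equal-variance distributions intersect). The d′ values of 1 and 2 for the weak and strong conditions are assumed for illustration, and the function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # unit-variance familiarity scale


def hit_and_false_alarm_rates(d_prime, criterion):
    """Return (hit rate, false alarm rate) for a criterion on the familiarity axis.

    Lures are centered on 0 and targets on d_prime, both with unit variance.
    """
    hit_rate = 1.0 - STANDARD_NORMAL.cdf(criterion - d_prime)
    false_alarm_rate = 1.0 - STANDARD_NORMAL.cdf(criterion)
    return hit_rate, false_alarm_rate


def fixed_familiarity_model(d_prime, criterion=0.75):
    """The criterion stays at the same familiarity value in every condition."""
    return hit_and_false_alarm_rates(d_prime, criterion)


def fixed_likelihood_ratio_model(d_prime):
    """The criterion stays at an odds ratio of 1.0, where the curves cross (d'/2)."""
    return hit_and_false_alarm_rates(d_prime, d_prime / 2.0)


# Assumed d' values for the weak and strong conditions (illustrative only).
for label, d_prime in (("weak", 1.0), ("strong", 2.0)):
    fam_hit, fam_fa = fixed_familiarity_model(d_prime)
    lr_hit, lr_fa = fixed_likelihood_ratio_model(d_prime)
    print(f"{label:6s} familiarity: hit={fam_hit:.2f} fa={fam_fa:.2f} | "
          f"likelihood ratio: hit={lr_hit:.2f} fa={lr_fa:.2f}")
```

With these assumed values, the familiarity model yields the same false alarm rate in both conditions, whereas the likelihood ratio model yields a higher hit rate and a lower false alarm rate in the strong condition, which is the mirror pattern.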

Although the very existence of the mirror effect supports the likelihood ratio view, that effect can still be easily represented (and explained) with the simpler version that assumes a familiarity axis. Thus, for example, imagine that in one condition, the hit rate is .84 and the false alarm rate is .16 (so that d′ = 2), and in another condition, the hit rate is .69 and the false alarm rate is .31 (so that d′ = 1). Further imagine that the mean familiarity of the lure distribution is the same in both conditions, but the mean familiarity of the targets is greater in the strong condition than in the weak condition. Figure 3 shows how to represent this situation, assuming a familiarity axis. Basically, when the target distribution weakens, the criterion shifts to the left in such a way as to remain midway between the means of the target and the lure distributions. Subjects might very well be inclined to do this because, after a strong list, they would presumably know that a test item, had it appeared on the list, would be very familiar, so a stringent criterion could be used without missing too many of the targets. After a weak list, by contrast, they would know that a target item might be somewhat unfamiliar even if it had appeared on the list, so a less stringent criterion might be used, so as not to miss too many of the targets. If the criterion did shift in this way across conditions (which would not involve a likelihood ratio computation), a mirror effect would be observed. Nevertheless, a likelihood ratio theorist would say that the apparent criterion shift is an illusion. What looks like a criterion shift on the familiarity axis actually reflects the fact that the subject uses the same criterion odds ratio in both cases.

Figure 2. Upper panel: familiarity-based signal detection model of weak and strong conditions over which the decision criterion remains fixed on the decision axis. Lower panel: likelihood ratio signal detection model of weak and strong conditions over which the decision criterion remains fixed on the decision axis.

Confidence criteria. The argument just presented in connection with the movement of the old/new decision criterion as a function of strength can be generalized to the situation in which subjects are asked not only for an old/new judgment, but also for a confidence rating. Thus, for example, when asked to decide whether or not a test item appeared on a study list, the subject might be offered six choice alternatives labeled −−−, −−, −, +, ++, and +++, where the first three represent new responses of varying confidence and the last three represent old responses of varying confidence. If the subject is absolutely sure that a particular test item appeared on the list, the “+++” button would be pressed. A less certain old response would be indicated by pressing the “++” button instead.

A straightforward extension of the basic detection model holds that confidence judgments are reached in much the same way that old/new decisions are reached (e.g., Macmillan & Creelman, 1991). That is, as is illustrated in Figure 4, a different criterion for each confidence rating is theoretically placed somewhere along the familiarity axis. Familiarity values that fall above OH receive an old response with high confidence (i.e., +++), those that fall between OM and OH receive an old response with medium confidence (++), and those that fall between C and OM receive an old response with low confidence (+).¹ On the other side of the criterion, familiarity values that fall below C but above NM receive a new response with low confidence (−), those that fall below NM but above NH receive a new response with medium confidence (−−), and those that fall below NH receive a new response with high confidence (−−−).
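The mapping from familiarity to a confidence rating described above can be sketched as a lookup against five ordered criteria. The criterion values below are hypothetical (the article does not give numerical locations at this point), and the function name is illustrative.

```python
from bisect import bisect_right

# Ratings ordered from high-confidence "new" to high-confidence "old".
RATINGS = ["---", "--", "-", "+", "++", "+++"]


def confidence_rating(familiarity, criteria):
    """Return the rating for a familiarity value.

    criteria must be in ascending order: [N_H, N_M, C, O_M, O_H].
    A value exactly equal to a criterion is assigned to the higher rating,
    an arbitrary choice because such ties have zero probability.
    """
    return RATINGS[bisect_right(criteria, familiarity)]


# Hypothetical criterion locations in standard deviation units above the lure mean.
example_criteria = [-0.5, 0.25, 1.0, 1.75, 2.5]

for value in (-1.0, 0.0, 0.5, 1.2, 2.0, 3.0):
    print(value, confidence_rating(value, example_criteria))
```

Because the criteria are ordered, a single binary search recovers the rating, and changing the criterion locations changes only the list passed to the function.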

As was indicated earlier, the likelihood ratio model holds that the mirror effect occurs because subjects use the same likelihood ratio to make decisions in both the strong and the weak conditions. A fixed likelihood ratio criterion shows up as a shifting criterion as a function of strength when the situation is represented by the familiarity version of detection theory (as in Figure 3). A similar story applies to the way in which confidence criteria change as a function of strength. If subjects maintain constant likelihood ratios for each of the confidence criteria, then, when represented in terms of the familiarity version of detection theory, the confidence criteria should move in a predictable way as d′ decreases. More specifically, as will be described next, the criteria should diverge, or fan out, as d′ approaches zero. Figure 5 shows three different ways the confidence criteria might shift as d′ changes from high to low. In all three cases, the old/new decision criterion shifts to the left (thereby yielding a mirror effect), but the confidence criteria shift in different ways. The upper panel shows the confidence criteria shifting in lockstep with the decision criterion, the middle panel shows the confidence criteria converging toward the old/new decision criterion, and the lower panel shows the confidence criteria diverging (as the likelihood ratio account uniquely predicts).

According to the likelihood ratio account, the old/new decision criterion, C, is placed in such a way as to maintain a likelihood ratio of 1 for unbiased responding. Similarly, the OH criterion is placed in such a way as to maintain a larger likelihood ratio, such as 10 to 1. That is, any item that is 10 or more times as likely to have come from the target distribution as from the lure distribution receives an old response with high confidence. Different subjects might use different specific odds ratios before making a high-confident response, but we will assume that OH falls where the likelihood ratio is 10 to 1 for the sake of simplicity. Similarly, the NH criterion (i.e., the criterion for saying new with high confidence) might be placed in such a way as to maintain a small likelihood ratio, such as 1 to 10 (0.1). That is, any item with only a 1 in 10 chance (or less) of having been drawn from the target distribution receives a new response with high confidence. As d′ decreases, this model assumes that the criteria move in such a way as to maintain these confidence-specific likelihood ratios.

Figure 3. Familiarity-based signal detection model of weak and strong conditions over which the decision criterion shifts on the decision axis.

How could subjects possibly adjust the criteria in this way? The likelihood ratio model assumes that the subject not only has information about the location of the target and lure distributions, but also has information about their mathematical forms. Where the latter information might come from is usually underspecified (and is the main focus of this article). If the target and lure distributions are Gaussian with means of d′ and 0, respectively, and common standard deviation of σ, the probability of encountering a test item with a familiarity of f given that it is a target, p(f | target), is

$$p(f \mid \text{target}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - d')^2 / \sigma^2}, \tag{1}$$

and the probability of encountering a test item with a familiarity of f given that it is a lure, p(f | lure), is

$$p(f \mid \text{lure}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - 0)^2 / \sigma^2}, \tag{2}$$

where f is a value along the familiarity axis. The mean of the lure distribution is arbitrarily set to 0 for the sake of simplicity. All other values (e.g., the mean of the target distribution and the location of the decision criterion) are measured with respect to the mean of the lure distribution. The mean of the target distribution, d′, represents the number of standard deviations the mean of that distribution falls above the mean of the lure distribution. Indeed, whether we are considering the mean of the target distribution or the location of the decision and confidence criteria, the values represent the number of standard deviation units above (or below) the mean of the lure distribution.

According to likelihood ratio models of human recognition memory, subjects use their knowledge of the mathematical forms of the target and lure distributions to place the old/new decision criterion, C, at the point on the familiarity axis where the height of the signal distribution equals that of the noise distribution (that is, where the likelihood ratio, p(f | target)/p(f | lure), is equal to 1). This is the optimal location of the decision criterion because this is where the odds that the test item was drawn from the target distribution exactly equal the odds that it was drawn from the lure distribution. For any familiarity value greater than that, the odds that the item was drawn from the target distribution (i.e., that it appeared on the list) are better than even, in which case an old response makes sense. As was indicated above, the remaining confidence criteria are placed in such a way as to maintain specific odds ratios that are greater than or less than 1. If OH is associated with a likelihood ratio of 10 to 1, that confidence criterion is placed on the familiarity axis at the point where the height of the target distribution is 10 times the height of the lure distribution, regardless of the value of d′.

For any given likelihood ratio, L, the criterion is placed on the familiarity axis at the point f that satisfies the following equation:

$$L(f) = \frac{p(f \mid \text{target})}{p(f \mid \text{lure})}. \tag{3}$$

The right side of the equation is simply the ratio of the height of the target (signal plus noise) distribution to the height of the lure (noise) distribution at the point f on the familiarity axis. That is, using the identities expressed in Equations 1 and 2,

$$L(f) = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - d')^2 / \sigma^2}}{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - 0)^2 / \sigma^2}},$$

which, after setting σ to 1 for the sake of simplicity and rearranging terms, simplifies to

$$L(f) = e^{-0.5 (f - d')^2 + 0.5 f^2}.$$

Solving this equation for f yields

$$f = \frac{d'}{2} + \frac{\ln[L(f)]}{d'}. \tag{4}$$

To determine where a particular criterion should be placed on the familiarity axis to satisfy a particular odds ratio, one need only enter the value of L theoretically associated with that criterion. For example, theoretically, the old/new decision criterion, C, is placed at the point where the odds ratio is 1, which is to say, L(f) = 1. Substituting 1 for L(f) (and C for f) in Equation 4 reveals that C = d′/2. That is, according to this model, the decision criterion should be placed midway between the target and the lure distributions, no matter what the value of d′ is. If the criterion were always placed in that location in order to always maintain a likelihood ratio of 1, a mirror effect would always be observed (and it almost always is). If OH is placed at the point on the familiarity axis where the odds are 10 to 1 in favor of the item’s having been drawn from the target distribution, Equation 4 indicates that

$$O_H = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, OH is placed at a point higher than d′/2, but by an amount that varies with d′. The more general version of this equation is OH = d′/2 + k1/d′, where k1 is the constant log likelihood ratio associated with OH (which is ln[10] ≈ 2.30 in this example, but could differ for different subjects). Similarly, if NH is placed at the point where the odds are only 1 in 10 in favor of the item’s having been drawn from the target distribution, Equation 4 indicates that

$$N_H = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is NH = d′/2 + k2/d′, where k2 is the constant log likelihood ratio associated with NH. This model predicts that the distance between OH and NH on the familiarity axis will be inversely related to d′. That distance (i.e., OH minus NH) is given by

$$O_H - N_H = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}.$$

Thus, as d′ becomes very large, OH and NH should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two criteria should move infinitely far apart.

Figure 4. Illustration of the placement of confidence criteria in a signal detection framework. The old/new decision criterion is represented by C, and the remaining confidence criteria are represented by OM, OH, NM, and NH. Familiarity values falling above OM or OH receive medium- or high-confident old responses (respectively), whereas familiarity values that fall below NM or NH receive medium- or high-confident new responses (respectively).

Figure 5. Illustration of the movement of the confidence criteria as a function of d′ according to the lockstep, range, and likelihood ratio models (upper, middle, and lower panels, respectively). In each panel, the top figure represents a strong condition, and the bottom figure represents a weak condition.

The upshot of all of this is that, when the situation is represented with a familiarity scale, likelihood ratio models predict that the decision criterion (C) should move in such a way that the hit rate decreases and the false alarm rate increases as d′ decreases (i.e., a mirror effect should be observed), because C will always be placed at d′/2 on the familiarity axis, since that is the point where the odds ratio is equal to 1. The more general version of this account is that all of the confidence criteria should diverge as d′ decreases (as in the lower panel of Figure 5). Mirror effects are usually observed in real data, and, as will be described in more detail next, Stretch and Wixted (1998a) showed that the confidence criteria move in essentially the manner predicted by the likelihood ratio account as well.

How are the locations of the decision criterion and the various confidence criteria inferred from actual behavioral data? The process is not as mysterious as it might seem. Consider first how d′ is computed. For a given hit and false alarm rate, only one arrangement of the target and lure distributions and the criterion is possible, assuming an equal-variance detection model. For example, symmetrical hit and false alarm rates of .84 and .16 yield a d′ of 2.0, with a criterion placement of d′/2 (i.e., C = 1.0 in this example). Standard deviation units are used in both cases. That is, the only way to represent these hit and false alarm rates in an equal-variance signal detection model is for the means of the two distributions to be separated by two standard deviations and for the criterion to be placed one standard deviation above the mean of the lure distribution. No other placement of the criterion would correspond to a false alarm rate of .16, and, given that, no other placement of the mean of the target distribution would yield a hit rate of .84. In practice, standard equations exist for making the relevant computations:

$$d' = z(\text{HIT}) - z(\text{FA})$$

and

$$C = -z(\text{FA}).$$

Thus, in this example, C = −z(.16) = −(−1.0) = 1.0, which is to say that the criterion is placed one standard deviation above the mean of the lure distribution. This method of computing the placement of C involves all old responses to lures with a confidence rating of maybe old or greater (which is to say, all old responses to lures: +, ++, or +++). A very similar method can be used to determine the locations of the various confidence criteria. To determine where the OM confidence criterion is placed, for example, one can use only those responses to lures that received a confidence rating of probably old or greater (i.e., all false alarms committed by choosing the ++ or +++ alternatives). From these responses, a new false alarm rate can be computed, and the location of the OM criterion can be estimated by converting that false alarm rate into a z score (exactly as one usually does for old responses that exceed the old/new decision criterion). In practice, a more comprehensive method is used (one that allows for unequal variance and that takes into account confidence-specific hit rates as well), but the method just described works about as well and is conceptually much simpler (Stretch & Wixted, 1998a). The locations of the various confidence criteria on the familiarity axis are simply the cumulative confidence-specific false alarm rates converted to a z score.

Stretch and Wixted (1998a) tested the predictions of the likelihood ratio account with a very simple experiment. In one condition, subjects studied lists of words that were presented quite slowly, after which they received a recognition test (this was the strong condition). In another, they studied lists of words that were presented more rapidly, followed by a recognition test (the weak condition). For both conditions, the locations of the various confidence criteria were computed in essentially the manner described above. As uniquely predicted by the likelihood ratio account, the confidence criteria fanned out on the familiarity axis as d′ decreased. Figure 6 presents the relevant estimates of the locations of the confidence criteria in the strong and the weak conditions. Note that, whereas the location of the decision criterion, C, decreased on the familiarity axis as conditions changed from strong to weak, the location of the high-confident old criterion, OH, remained essentially constant, which means that even though the overall false alarm rate increased in the weak condition, the high-confident false alarm rate did not. This phenomenon is uniquely predicted by the likelihood ratio account.

Thus, whether we consider the basic mirror effect or the more complete picture provided by ratings of confidence, subjects behave as if they are able to maintain essentially (although not necessarily exactly) constant likelihood ratios across conditions. The fact that subjects are able to do this is rather mysterious, at least on the surface, because it is not clear how they would have acquired the detailed information it requires (and likelihood ratio models typically have little to say about this key issue).
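The simplified estimation method described above (cumulative confidence-specific false alarm rates converted to z scores) can be sketched in a few lines. The response proportions below are hypothetical values chosen only for illustration, not data from Stretch and Wixted (1998a), and the function name is an assumption. The sign is reversed in the same way as in the standard equation for C, so that each criterion is expressed in standard deviation units above the lure mean.

```python
from statistics import NormalDist


def criteria_from_lure_ratings(lure_proportions):
    """Estimate criterion locations from rating proportions given to lures.

    lure_proportions is ordered from '---' to '+++' and should sum to 1.
    Returns locations ordered [N_H, N_M, C, O_M, O_H], each computed as
    minus the z score of the cumulative rate of responding at or above
    the rating that the criterion bounds from below.
    """
    z = NormalDist().inv_cdf
    locations = []
    cumulative = 0.0
    # Walk downward from the most confident "old" rating, accumulating
    # the proportion of lures rated at that level or higher.
    for proportion in reversed(lure_proportions[1:]):
        cumulative += proportion
        locations.append(-z(cumulative))
    return list(reversed(locations))


# Hypothetical proportions of lure responses for ---, --, -, +, ++, +++.
example_lure_proportions = [0.30, 0.25, 0.20, 0.13, 0.08, 0.04]

for name, location in zip(("N_H", "N_M", "C", "O_M", "O_H"),
                          criteria_from_lure_ratings(example_lure_proportions)):
    print(f"{name}: {location:.2f}")
```

In this sketch the old/new criterion C is recovered from the overall false alarm rate (all +, ++, and +++ responses to lures), and each remaining criterion comes from a progressively narrower or wider cumulative rate.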

Within-list strength manipulations. It is worth noting that, although subjects often behave in accordance with the predictions of likelihood ratio models, there are conditions under which they do not. All this really means is that it is a mistake to think of humans as being fully informed likelihood ratio computers. For the sake of completeness, we will turn now to a brief discussion of the conditions under which subjects do not behave in the manner predicted by likelihood ratio accounts, after which we will return to the central question of how it is that they often do.

Strength is usually manipulated between lists to produce a mirror effect, but Stretch and Wixted (1998b) also manipulated strength within list. In each list, half the words were colored red, and they were presented five times each. The other half were colored blue, and they appeared only once each. The red and the blue words were randomly intermixed, and the red word repetitions were randomly scattered throughout the list. On the subsequent recognition test, the red and the blue targets were randomly intermixed with red and blue lures. Obviously, the hit rate in the strong (red) condition was expected to exceed the hit rate in the weak (blue) condition, and it did. Of interest was whether or not the false alarm rate in the strong condition (i.e., to red lures) would be less than the false alarm rate in the weak condition (i.e., to blue lures). If so, the within-list strength manipulation would yield results similar to those produced by the between-list strength manipulation. However, the results reported by Stretch and Wixted (1998b) showed a different pattern. In two experiments in which strength was cued by color, the hit rate to the strong items was much higher than the hit rate to the weak items (as was expected), but the false alarm rates were nearly identical. Similar results, when other list materials were used, were also recently reported by Morrell, Gaitan, and Wixted (2002). These results correspond to the situation depicted earlier in the upper panel of Figure 2, according to which the decision criterion remains fixed on the familiarity axis as a function of strength. A fully informed likelihood ratio computer would not respond in this manner, but would instead adjust the criterion as a function of strength in order to maintain a constant likelihood ratio across conditions (as in the between-list strength manipulation).

The main difference between the within- and the between-list strength manipulations lies in how rapidly the conditions change during testing. In the between-list case, the conditions change slowly (e.g., all the target items are weak after one list, and all are strong after the other). In the within-list case, the conditions change at a rapid rate (e.g., a strong test item might be followed by a weak test item, followed by a strong test item, and so on). Under such conditions, subjects tend to aggregate over those conditions, rather than respond in accordance with each condition separately. We shall see later that a similar issue arises in recognition memory experiments involving animal subjects, but the important point for the time being is that these data show that organisms do not always behave as if they were fully informed likelihood ratio computers.

Even though subjects do not always behave in exactly the manner one would expect according to a likelihood ratio model, they usually do, and the question of interest concerns how they acquire the information needed to do that. More specifically, what accounts for the fact that, when strength is manipulated between lists, subjects have information available so that the confidence criteria fan out on the familiarity axis as d′ decreases in just the manner predicted by the likelihood ratio account? That is a question left largely unanswered by likelihood ratio accounts, and suggesting an answer to it is the issue we will consider next.

Figure 6. Estimates of the locations of the confidence criteria (OH, C, and NH) in the strong and weak conditions of a recognition memory experiment reported by Stretch and Wixted (1998a).
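The fanning out referred to here follows directly from Equation 4. The sketch below assumes unit standard deviation and the illustrative likelihood ratios used earlier (10 to 1 for OH and 1 to 10 for NH); the function names and the particular d′ values are assumptions made for the example.

```python
import math


def criterion_location(d_prime, likelihood_ratio):
    """Equation 4 with sigma = 1: f = d'/2 + ln(L)/d'."""
    return d_prime / 2.0 + math.log(likelihood_ratio) / d_prime


def high_confidence_spread(d_prime, old_ratio=10.0, new_ratio=0.1):
    """Distance between O_H and N_H on the familiarity axis."""
    return (criterion_location(d_prime, old_ratio)
            - criterion_location(d_prime, new_ratio))


# Assumed d' values running from a strong to a weak condition.
for d_prime in (2.0, 1.0, 0.5):
    c = criterion_location(d_prime, 1.0)
    o_h = criterion_location(d_prime, 10.0)
    n_h = criterion_location(d_prime, 0.1)
    print(f"d'={d_prime}: N_H={n_h:.2f}  C={c:.2f}  O_H={o_h:.2f}  "
          f"spread={high_confidence_spread(d_prime):.2f}")
```

Holding the three likelihood ratios constant, C tracks d′/2 while the spread between the high-confidence criteria grows in inverse proportion to d′, which is the divergence the likelihood ratio account predicts.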

Mirror Effects in Pigeons

What is often overlooked in the human cognitive literature is the possibility that subjects have experienced years of making recognition decisions and expressing various levels of confidence. Presumably, those expressions of confidence have been followed by social consequences that have served to shape their behavior. Conceivably, it is those consequences that shaped behavior to mimic what a history-less likelihood ratio computer might do. We begin our investigation into this possibility by making the point that pigeons show a mirror effect, too. In pigeon research, unlike in human research, recognition memory decisions are followed by experimentally arranged consequences, and the connection between where a pigeon places its decision criterion and the consequences it has received in the past is easier to appreciate. That explicitly arranged reinforcers could influence the location of the decision criterion was recognized from the earliest days of detection theory (e.g., Green & Swets, 1966), but research into that issue never really blossomed in the literature on humans. The most influential account along these lines in the literature on animals, where relevant research did blossom, was proposed some time later by Davison and Tustin (1978). Nevertheless, it is probably fair to say that the connection between their research (and the research their ideas generated) and what a likelihood ratio theorist might say about the movement of confidence criteria across conditions is not easy to see. Making that connection explicit is what we will try to accomplish in this section and the next.

Mirror effects in pigeons can be observed by using a simple procedure in which the birds are required to discriminate the prior presence or absence of a sample stimulus. In a typical experiment of this kind, sample and no-sample trials are randomly intermixed, and the pigeon’s task is to report whether or not a sample appeared earlier in the trial. On sample trials, the end of the intertrial interval (ITI) is followed by the presentation of a sample (e.g., the brief illumination of a keylight or the houselight) and then by the retention interval. On no-sample trials, by contrast, the end of the ITI is followed immediately by the onset of the nominal retention interval (i.e., no sample is presented). Following the retention interval, two choice stimuli are presented (e.g., red vs. green). A response to one choice alternative is correct on sample trials, and a response to the other is correct on no-sample trials. Pigeons learn to perform this task with high levels of accuracy when the retention interval is short, and manipulating the retention interval across conditions yields a mirror effect.

In Wixted’s (1993) Experiment 2, 4 pigeons were exposed to an ascending series of retention intervals (0.50, 2, 6, 12, and 24 sec), with each retention interval condition in effect for 15 sessions. The results are shown in Table 1. The hit rate represents the probability of a correct response on sample trials (i.e., a correct declaration that the sample occurred), and the false alarm rate represents the probability of an incorrect response (i.e., an incorrect declaration that the sample occurred) on no-sample trials. The data presented in boldface in Table 1 were averaged over the last 5 sessions of each condition and over the 4 subjects, and these data constitute the main results of the experiment. The data shown in standard typeface are from the first session of each new retention interval condition (save for the 0.50-sec condition, because the birds had not yet learned the task in the 1st session of that condition). The data from the 1st session of each condition are instructive and will be discussed in more detail after we consider the main findings. The main results show that, not surprisingly, d′ decreased as the retention interval increased. The more important point is that as the hit rate decreased, the false alarm rate reliably increased. In other words, a mirror effect was observed, just as it almost always is in human recognition memory experiments.
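As a check on these main results, the standard equal-variance computations (d′ as the difference between the z-transformed hit and false alarm rates, and C as minus the z-transformed false alarm rate) can be applied to the end-of-condition hit and false alarm rates in Table 1. This is a minimal sketch with illustrative function names. Because the tabled d′ scores were averaged over subjects, values recomputed from the averaged rates are expected to be close to, but not necessarily identical with, the reported ones.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # converts a proportion to a z score


def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance sensitivity: z(HIT) - z(FA)."""
    return z(hit_rate) - z(false_alarm_rate)


def criterion_location(false_alarm_rate):
    """Criterion in standard deviation units above the no-sample mean: -z(FA)."""
    return -z(false_alarm_rate)


# (retention interval in sec, hit rate, FA rate, reported d') taken from the
# end-of-condition (boldface) rows of Table 1.
TABLE_1_MAIN_RESULTS = [
    (0.5, 0.94, 0.10, 2.83),
    (2.0, 0.88, 0.14, 2.26),
    (6.0, 0.88, 0.18, 2.10),
    (12.0, 0.74, 0.19, 1.52),
    (24.0, 0.65, 0.28, 0.97),
]

for interval, hit, fa, reported in TABLE_1_MAIN_RESULTS:
    print(f"{interval:5.1f} sec: d'={d_prime(hit, fa):.2f} "
          f"(reported {reported:.2f}), C={criterion_location(fa):.2f}")
```

On this computation the estimated criterion moves toward the mean of the no-sample distribution as the retention interval lengthens, which is the familiarity-axis description of the rising false alarm rate.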

What explains the existence of a mirror effect in this case? Figure 7 presents sense-of-prior-occurrence distributions for three retention interval conditions (short, medium, and long). These distributions are exactly analogous to the familiarity distributions previously discussed in relation to human recognition memory, and they represent the bird's subjective sense that a sample was presented. That subjective sense is assumed to vary from trial to trial on both sample and no-sample trials (according to Gaussian or Gaussian-like distributions), even for a constant retention interval. Some sense of prior occurrence is evident even on no-sample trials, perhaps owing to the lingering influence of samples presented on earlier trials or to inherent noise in the memory system.

When the retention interval is short (upper panel), the sample distribution falls far to the right because, despite some variability, the subjective sense of prior occurrence will generally be quite high (and will fall well above the decision criterion). Thus, on the majority of sample trials, the subject would respond correctly (yes). When the retention interval increases (and the sense of prior occurrence weakens), the sample distribution gradually shifts toward the no-sample distribution (middle panel). The no-sample distribution does not change as a function of retention interval, because there is no memory trace established on those trials (so there is no trace that weakens with retention interval). When the retention interval is very long, so that memory for the sample fades completely, the sample and no-sample distributions will overlap, so that performance on sample and no-sample trials will be indistinguishable (a situation that is approached in the lower panel).

Table 1
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Between Conditions

Retention Interval (sec)    Hit Rate    FA Rate    d′
 0.5                        .94         .10        2.83
 2.0 (1st)                  .84         .08
 2.0                        .88         .14        2.26
 6.0 (1st)                  .60         .14
 6.0                        .88         .18        2.10
12.0 (1st)                  .67         .19
12.0                        .74         .19        1.52
24.0 (1st)                  .46         .24
24.0                        .65         .28        0.97

Note. The data were taken from Experiment 2 of Wixted (1993). The values in boldface (here, the rows not marked "1st") were averaged over the last five sessions of each condition and over the 4 subjects. The values in standard typeface (here, the rows marked "1st") are from the first session of each new retention interval condition (except for the 0.50-sec condition).

Because each retention interval was in effect for 15 sessions, the birds had time to adjust the location of the decision criterion in light of the reinforcement consequences for doing so. For an equal-variance model, reinforcers will be maximized when the criterion is placed midway between the two distributions. Leaving a particular retention interval in effect for a prolonged period gives the birds time to realize that and to adjust the decision criterion accordingly. By adjusting the decision criterion in such a way as to maximize reinforcement, the performance of the birds exhibits a mirror effect.
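The claim that the midpoint criterion maximizes reinforcement in the equal-variance model can be checked numerically. The sketch below assumes unit-variance Gaussian distributions, equally likely sample and no-sample trials, and reinforcement of every correct choice; the function names and the grid search are illustrative choices, not part of the original analysis.

```python
from statistics import NormalDist


def expected_reinforcement(criterion: float, d_prime: float, sigma: float = 1.0) -> float:
    """Probability that a trial ends in a reinforced (correct) choice."""
    sample = NormalDist(d_prime, sigma)
    no_sample = NormalDist(0.0, sigma)
    hit_rate = 1.0 - sample.cdf(criterion)             # "sample" chosen on sample trials
    correct_rejection_rate = no_sample.cdf(criterion)  # "no sample" chosen on no-sample trials
    return 0.5 * hit_rate + 0.5 * correct_rejection_rate


def best_criterion(d_prime: float, step: float = 0.01) -> float:
    """Grid search over criterion placements from -3 to 6 on the decision axis."""
    candidates = [i * step for i in range(-300, 601)]
    return max(candidates, key=lambda c: expected_reinforcement(c, d_prime))


for d in (2.83, 1.52, 0.97):
    print(f"d' = {d:.2f}  best criterion = {best_criterion(d):.2f}  midpoint = {d / 2:.3f}")
```

The best grid point falls at, or within one grid step of, d′/2 for each value of d′, so a criterion that tracks the midpoint as d′ shrinks lowers the hit rate and raises the false alarm rate together.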

That the birds actually do shift the criterion in light of the reinforcement consequences is perhaps seen most easily by considering how they behave when a retention interval is first increased from one condition to the next. The relevant data are presented in standard typeface in Table 1. Note that each time a retention interval was increased (e.g., from 0.5 to 2 sec or from 2 to 6 sec), the hit rate dropped immediately (and precipitously), whereas the false alarm rate generally did not change. Over the next 15 sessions, both the hit rate and, to a lesser extent, the false alarm rate increased.

What explains that pattern? Presumably, on the first day that a new, longer retention interval was introduced, the effect on the mean of the sample distribution was immediate. That is, because of the longer retention interval, the memory trace of the sample faded to a greater extent, leading to a less intense sense of prior occurrence, on average. But that was the only immediate effect. The mean of the no-sample distribution did not change when the retention interval was increased, because there was no memory trace established on such trials to fade away. The criterion did not change, because the bird did not have enough time to learn where the optimal placement of the decision criterion might be in order to yield a higher number of reinforcers. As time passed, though, the bird learned to shift the criterion to the left on the sense-of-prior-occurrence axis (as is shown in Figure 7), thereby increasing both the hit rate and the false alarm rate.

The increase in the hit and the false alarm rates as a function of training at a particular retention interval is unmistakably asymmetric. That is, generally speaking, the hit rate increased a lot, whereas the false alarm rate increased only a little. Part of the explanation for this is that the appropriate detection model for this situation is not the equal-variance model we have used throughout this article, but an unequal-variance model (so that the no-sample distribution is more variable than the sample distribution). A criterion sweeping across a narrow sample distribution will affect the hit rate to a much greater extent than a criterion sweeping across a wide no-sample distribution will affect the false alarm rate. In fact, in a separate experiment, Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.

Figure 7. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated between conditions. The location of the decision criterion, C, changes with each condition.

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted (1993) are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of d′.
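The contrast between the two procedures can be made concrete by converting the false alarm rates in Tables 1 and 2 into implied criterion locations. This is a rough sketch that assumes the simplified equal-variance model with a fixed, unit-variance no-sample distribution centered at 0 (the text notes that an unequal-variance model fits better), so that the criterion is c = -z(FA). The names below are illustrative.

```python
from statistics import NormalDist


def criterion_from_fa(fa_rate: float) -> float:
    """Criterion location relative to a no-sample distribution with mean 0 and SD 1."""
    return -NormalDist().inv_cdf(fa_rate)


# (retention interval in sec, FA rate, reported d') from the main rows of Table 1.
BETWEEN_CONDITIONS = [(0.5, 0.10, 2.83), (2.0, 0.14, 2.26), (6.0, 0.18, 2.10),
                      (12.0, 0.19, 1.52), (24.0, 0.28, 0.97)]
# (retention interval in sec, FA rate) from Table 2.
WITHIN_SESSIONS = [(0.5, 0.14), (2.0, 0.20), (6.0, 0.17), (12.0, 0.16)]

BETWEEN_CRITERIA = [criterion_from_fa(fa) for _, fa, _ in BETWEEN_CONDITIONS]
WITHIN_CRITERIA = [criterion_from_fa(fa) for _, fa in WITHIN_SESSIONS]

for (interval, _, d), c in zip(BETWEEN_CONDITIONS, BETWEEN_CRITERIA):
    print(f"between  {interval:5.1f} sec  criterion = {c:.2f}  d'/2 = {d / 2:.2f}")
for (interval, _), c in zip(WITHIN_SESSIONS, WITHIN_CRITERIA):
    print(f"within   {interval:5.1f} sec  criterion = {c:.2f}")
```

On these assumptions, the implied criterion moves steadily leftward across the between-conditions series and stays fairly close to d′/2, whereas it varies comparatively little across the within-session retention intervals.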

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

Table 2
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)    Hit Rate    FA Rate    d′
 0.5                        .94         .14        2.63
 2.0                        .93         .20        2.32
 6.0                        .73         .17        1.56
12.0                        .63         .16        1.32

Note. The data were taken from Experiment 1 of Wixted (1993).

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.

White and Wixted's (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird's (or a human's) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird's behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to f in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of f was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given f is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given f. According to White and Wixted's model, birds learn the relative probabilities of reinforcement for each value of f along the sense-of-prior-occurrence axis.² If, for a given value of f, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of f was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of f.

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of f is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution, with mean d′ and standard deviation σ, is given by

$$p(f \mid \mathrm{S}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - d')^2 / \sigma^2}, \tag{5}$$

and the no-sample (or lure) distribution, with mean 0 and standard deviation σ, is given by

$$p(f \mid \mathrm{NS}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - 0)^2 / \sigma^2}, \tag{6}$$

where f now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and p(f | S) and p(f | NS) represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio p(f | S) / p(f | NS) on a given trial. What the pigeons instead learn, according to White and Wixted's (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given f, which will be denoted as p(R_S | f), and the probability that a reinforcer is arranged on the no-sample choice alternative given f, which will be denoted as p(R_NS | f). Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given f is

$$p(R_{\mathrm{S}} \mid f) = \left[\frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})}\right] r_{\mathrm{S}}, \tag{7}$$

where r_S is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced (r_S is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given f is

$$p(R_{\mathrm{NS}} \mid f) = \left[\frac{p(f \mid \mathrm{NS})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})}\right] r_{\mathrm{NS}}, \tag{8}$$

where r_NS is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced (r_NS is also usually 1.0).

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, f, on the sense-of-prior-occurrence axis.


White and Wixted's (1999) model assumes that pigeons learn p(R_S | f) and p(R_NS | f) during the course of training and that, in light of this information, they respond to the choice alternatives in accordance with the matching law. If p(B_S | f) represents the probability of choosing the sample choice alternative given f and p(B_NS | f) represents the probability of choosing the no-sample choice alternative given f, then

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{p(R_{\mathrm{S}} \mid f)}{p(R_{\mathrm{NS}} \mid f)}, \tag{9}$$

which, after replacing p(R_S | f) and p(R_NS | f) with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \left(\frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}}\right)\left[\frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})}\right]. \tag{10}$$

These equations state that the behavior ratio given f is equal to the reinforcer ratio given f, which is essentially the matching law for each value of f along the sense-of-prior-occurrence axis.

If r_S = r_NS = 1.0 (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})}.$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing p(f | S) and p(f | NS) with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - d')^2 / \sigma^2}}{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - 0)^2 / \sigma^2}}.$$

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when r_S = r_NS, a likelihood ratio computation is not assumed to be governing the bird's behavior. What drives the bird's choice, according to White and Wixted's (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of f was generated by a sample, as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted's account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives, depending on what f happens to be. What f happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.

There is some point on the sense-of-prior-occurrence axis (i.e., some value of f) that leads the bird to be indifferent between the two choice alternatives, because at that point, the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, p(B_S | f) = p(B_NS | f). If r_S = r_NS = 1.0, it is easily shown that the value of f that yields indifference is d′/2. If p(B_S | f) = p(B_NS | f), which is to say that p(B_S | f) / p(B_NS | f) = 1, then, according to Equation 10,

$$1 = \left(\frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}}\right)\left[\frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})}\right].$$

Solving this equation for f provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for p(f | S) and p(f | NS) from Equations 5 and 6 and canceling out the 1/√(2πσ²) terms that appear in the numerator and denominator yields

$$1 = \left(\frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}}\right)\left[\frac{e^{-.5 (f - d')^2}}{e^{-.5 (f - 0)^2}}\right]$$

(assuming that σ = 1 for simplicity). When solved for f, this equation yields

$$f = \frac{d'}{2} + \frac{\ln\left(r_{\mathrm{NS}} / r_{\mathrm{S}}\right)}{d'}. \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if r_S = r_NS, so that ln(r_NS/r_S) = 0, the indifference point occurs at d′/2. This familiarity value corresponds to the indifference point and could be represented as f_1:1 = d′/2, where the subscript 1:1 represents the ratio of programmed reinforcer probabilities (r_NS/r_S). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when d′ changes (thereby yielding a mirror effect). The difference is that White and Wixted's (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of f. In this case, the pigeon has learned that when f = d′/2, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than d′/2, the odds favor a reinforcer's being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than d′/2, the odds favor a reinforcer's being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same, namely, that a mirror effect should be observed.

Again, although we assume that the indifference point represents the learned value of f associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject's learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.
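The chain from the Gaussian densities through the matching law to the indifference point (Equations 5 through 11) can be sketched in a few lines. This is an illustrative sketch only: it assumes σ = 1 by default and equally likely sample and no-sample trials, as in the text, and the function names are hypothetical rather than taken from White and Wixted (1999).

```python
import math
from statistics import NormalDist


def reinforcer_probabilities(f, d_prime, r_s=1.0, r_ns=1.0, sigma=1.0):
    """Equations 7 and 8: p(R_S | f) and p(R_NS | f) for equally likely trial types."""
    p_f_s = NormalDist(d_prime, sigma).pdf(f)  # Equation 5
    p_f_ns = NormalDist(0.0, sigma).pdf(f)     # Equation 6
    total = p_f_s + p_f_ns
    return r_s * p_f_s / total, r_ns * p_f_ns / total


def choice_ratio(f, d_prime, r_s=1.0, r_ns=1.0, sigma=1.0):
    """Equation 9 (matching law): p(B_S | f) / p(B_NS | f) = p(R_S | f) / p(R_NS | f)."""
    p_rs, p_rns = reinforcer_probabilities(f, d_prime, r_s, r_ns, sigma)
    return p_rs / p_rns


def indifference_point(d_prime, r_s=1.0, r_ns=1.0):
    """Equation 11 (sigma = 1): the value of f at which the choice ratio equals 1."""
    return d_prime / 2.0 + math.log(r_ns / r_s) / d_prime


d = 2.0
f_equal = indifference_point(d)
print(f_equal, choice_ratio(f_equal, d))

# Unequal reinforcer probabilities shift the indifference point (Equation 11).
f_biased = indifference_point(d, r_s=0.1, r_ns=1.0)
print(f_biased, choice_ratio(f_biased, d, r_s=0.1, r_ns=1.0))
```

With r_S = r_NS, the choice ratio returned by `choice_ratio` coincides with the likelihood ratio p(f | S) / p(f | NS), even though the sketch computes it only from reinforcer probabilities.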

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if r_NS > r_S), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in d′ on the location of the confidence criteria discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as d′ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., r_NS = 1.0 and r_S = .1). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence that the sample occurred before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when r_NS = 10r_S? Substituting 10 for r_NS/r_S in Equation 11 and solving for f yields

$$f_{10:1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, f_10:1 (the indifference point when r_NS = 10r_S) occurs at a point higher than d′/2, but by an amount that varies with d′. The more general version of this equation is f_10:1 = d′/2 + k_1/d′, where k_1 is the constant ln(r_NS/r_S). Similarly, if r_S = 10r_NS, then

$$f_{1:10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

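The more general version of this equation is f_1:10 = d′/2 + k_2/d′, where k_2 is the constant ln(r_NS/r_S), which is negative in this case. This model predicts that the distance between f_10:1 and f_1:10 on the sense-of-prior-occurrence axis will be inversely related to d′. That distance (i.e., f_10:1 minus f_1:10) is given by

$$f_{10:1} - f_{1:10} = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}.$$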

Thus, as d′ becomes very large, f_10:1 and f_1:10 should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two indifference points should move infinitely far apart.
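The predicted fanning out of the two indifference points can be illustrated numerically. The sketch below applies Equation 11 with σ = 1 to reinforcer ratios of 10:1 and 1:10; the function names and the particular d′ values are illustrative assumptions, not values from the experiments.

```python
import math


def indifference_point(d_prime: float, r_ns_over_r_s: float) -> float:
    """Equation 11 with sigma = 1: f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2.0 + math.log(r_ns_over_r_s) / d_prime


def fan_out(d_prime: float, ratio: float = 10.0) -> float:
    """Distance between the indifference points for ratio:1 and 1:ratio, (k_1 - k_2) / d'."""
    upper = indifference_point(d_prime, ratio)
    lower = indifference_point(d_prime, 1.0 / ratio)
    return upper - lower


for d in (0.5, 1.0, 2.0, 4.0):
    print(f"d' = {d:3.1f}  f_10:1 - f_1:10 = {fan_out(d):.2f}")
```

Because k_1 − k_2 = 2 ln(10) here, the distance is about 4.61/d′, so halving d′ doubles the separation between the two indifference points.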

All of this is just another way of saying that the pigeon's sensitivity to variations in relative reinforcer probabilities should be low when d′ is large and high when d′ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When d′ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of r_NS to r_S) had little effect on behavior. When d′ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as d′ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as d′ decreased (because that is just another way of saying that sensitivity to reinforcement increased as d′ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism's reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and, more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider
again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})} > 10,$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of f, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)} > 10,$$

where p(R_Old | f) represents the past probability that, given this value of f, socially reinforcing feedback for an old response occurred, and p(R_New | f) represents the past probability that, given this value of f, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)}, \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \left(\frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}}\right)\left[\frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})}\right]. \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left(\frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}}\right)\left[\frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})}\right] > 10,$$

where r_Old and r_New represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to r_S and r_NS, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of r_S and r_NS in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, r_Old = r_New. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of f along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of f.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if r_NS = 10r_S, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at d′/2, but at a higher point, namely, d′/2 + 2.30/d′. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific f → reinforcement outcome relations, but instead that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog's neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog's training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism's neural network without detailing the origins of the relevant distributional information. A neural network may indeed perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject's reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising indeed if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject's neural network, but in the subject's history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that "the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects" (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals, whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of "nothing" in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, S. C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM-retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion's distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value f is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)


based on a likelihood ratio computation. These include theories known as attention-likelihood theory (Glanzer et al., 1993), retrieving effectively from memory (Shiffrin & Steyvers, 1997), and subjective-likelihood theory (McClelland & Chappell, 1998).

Figure 2. Upper panel: familiarity-based signal detection model of weak and strong conditions over which the decision criterion remains fixed on the decision axis. Lower panel: likelihood ratio signal detection model of weak and strong conditions over which the decision criterion remains fixed on the decision axis.

Although the very existence of the mirror effect supports the likelihood ratio view, that effect can still be easily represented (and explained) with the simpler version that assumes a familiarity axis. Thus, for example, imagine that in one condition the hit rate is .84 and the false alarm rate is .16 (so that d′ = 2), and in another condition the hit rate is .69 and the false alarm rate is .31 (so that d′ = 1). Further imagine that the mean familiarity of the lure distribution is the same in both conditions but the mean familiarity of the targets is greater in the strong condition than in the weak condition. Figure 3 shows how to represent this situation assuming a familiarity axis. Basically, when the target distribution weakens, the criterion shifts to the left in such a way as to remain midway between the means of the target and the lure distributions. Subjects might very well be inclined to do this because, after a strong list, they would presumably know that a test item, had it appeared on the list, would be very familiar, so a stringent criterion could be used without missing too many of the targets. After a weak list, by contrast, they would know that a target item might be somewhat unfamiliar even if it had appeared on the list, so a less stringent criterion might be used so as not to miss too many of the targets. If the criterion did shift in this way across conditions (which would not involve a likelihood ratio computation), a mirror effect would be observed. Nevertheless, a likelihood ratio theorist would say that the apparent criterion shift is an illusion. What looks like a criterion shift on the familiarity axis actually reflects the fact that the subject uses the same criterion odds ratio in both cases.
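The hit and false alarm rates in this example follow directly from an equal-variance Gaussian model with the criterion held midway between the two means. The sketch below is only an illustration under that assumption; the helper name `hit_and_false_alarm` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def hit_and_false_alarm(d_prime, criterion):
    """Return (hit rate, false alarm rate) for an equal-variance model.

    The lure distribution has mean 0 and the target distribution has mean
    d_prime; an "old" response is made whenever familiarity exceeds criterion.
    """
    hit = 1 - STD_NORMAL.cdf(criterion - d_prime)
    false_alarm = 1 - STD_NORMAL.cdf(criterion)
    return hit, false_alarm


# Criterion kept midway between the means (d'/2) in both conditions.
strong = hit_and_false_alarm(2.0, criterion=1.0)
weak = hit_and_false_alarm(1.0, criterion=0.5)
print([round(rate, 2) for rate in strong], [round(rate, 2) for rate in weak])
```

Moving from the strong to the weak condition lowers the hit rate and raises the false alarm rate, which is the mirror pattern described above.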

Confidence criteria. The argument just presented in connection with the movement of the old/new decision criterion as a function of strength can be generalized to the situation in which subjects are asked not only for an old/new judgment but also for a confidence rating. Thus, for example, when asked to decide whether or not a test item appeared on a study list, the subject might be offered six choice alternatives labeled −−−, −−, −, +, ++, and +++, where the first three represent new responses of varying confidence and the last three represent old responses of varying confidence. If the subject is absolutely sure that a particular test item appeared on the list, the "+++" button would be pressed. A less certain old response would be indicated by pressing the "++" button instead.

A straightforward extension of the basic detection model holds that confidence judgments are reached in much the same way that old/new decisions are reached (e.g., Macmillan & Creelman, 1991). That is, as is illustrated in Figure 4, a different criterion for each confidence rating is theoretically placed somewhere along the familiarity axis. Familiarity values that fall above O_H receive an old response with high confidence (i.e., +++), those that fall between O_M and O_H receive an old response with medium confidence (++), and those that fall between C and O_M receive an old response with low confidence (+) (see Note 1). On the other side of the criterion, familiarity values that fall below C but above N_M receive a new response with low confidence (−), those that fall below N_M but above N_H receive a new response with medium confidence (−−), and those that fall below N_H receive a new response with high confidence (−−−).
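The mapping from a familiarity value to one of the six ratings can be sketched as a lookup against the five ordered criteria. The criterion values and the function name `confidence_rating` below are hypothetical, chosen only to illustrate the rule.

```python
from bisect import bisect_right

# Rating labels ordered from high-confident new to high-confident old.
LABELS = ["---", "--", "-", "+", "++", "+++"]


def confidence_rating(familiarity, criteria):
    """Map a familiarity value to a rating.

    criteria must be ascending: [N_H, N_M, C, O_M, O_H]. A value exactly
    equal to a criterion is assigned to the higher category here; that tie
    rule is a convention of this sketch, not something the model specifies.
    """
    return LABELS[bisect_right(criteria, familiarity)]


# Hypothetical criterion placements, in standard deviation units above the lure mean.
example_criteria = [-0.5, 0.25, 1.0, 1.75, 2.5]
ratings = [confidence_rating(f, example_criteria) for f in (-1.0, 0.0, 0.5, 1.2, 2.0, 3.0)]
print(ratings)
```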

As was indicated earlier, the likelihood ratio model holds that the mirror effect occurs because subjects use the same likelihood ratio to make decisions in both the strong and the weak conditions. A fixed likelihood ratio criterion shows up as a shifting criterion as a function of strength when the situation is represented by the familiarity version of detection theory (as in Figure 3). A similar story applies to the way in which confidence criteria change as a function of strength. If subjects maintain constant likelihood ratios for each of the confidence criteria, then, when represented in terms of the familiarity version of detection theory, the confidence criteria should move in a predictable way as d′ decreases. More specifically, as will be described next, the criteria should diverge, or fan out, as d′ approaches zero. Figure 5 shows three different ways the confidence criteria might shift as d′ changes from high to low. In all three cases, the old/new decision criterion shifts to the left (thereby yielding a mirror effect), but the confidence criteria shift in different ways. The upper panel shows the confidence criteria shifting in lockstep with the decision criterion, the middle panel shows the confidence criteria converging toward the old/new decision criterion, and the lower panel shows the confidence criteria diverging (as the likelihood ratio account uniquely predicts).

Figure 3. Familiarity-based signal detection model of weak and strong conditions over which the decision criterion shifts on the decision axis.

According to the likelihood ratio account, the old/new decision criterion, C, is placed in such a way as to maintain a likelihood ratio of 1 for unbiased responding. Similarly, the O_H criterion is placed in such a way as to maintain a larger likelihood ratio, such as 10 to 1. That is, any item that is 10 or more times as likely to have come from the target distribution as from the lure distribution receives an old response with high confidence. Different subjects might use different specific odds ratios before making a high-confident response, but we will assume that O_H falls where the likelihood ratio is 10 to 1, for the sake of simplicity. Similarly, the N_H criterion (i.e., the criterion for saying new with high confidence) might be placed in such a way as to maintain a small likelihood ratio, such as 1 to 10 (.1). That is, any item with only a 1 in 10 chance (or less) of having been drawn from the target distribution receives a new response with high confidence. As d′ decreases, this model assumes that the criteria move in such a way as to maintain these confidence-specific likelihood ratios.

How could subjects possibly adjust the criteria in this way? The likelihood ratio model assumes that the subject not only has information about the location of the target and lure distributions, but also has information about their mathematical forms. Where the latter information might come from is usually underspecified (and is the main focus of this article). If the target and lure distributions are Gaussian with means of d′ and 0, respectively, and common standard deviation of σ, the probability of encountering a test item with a familiarity of f given that it is a target, p(f | target), is

\[ p(f \mid \text{target}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - d')^{2} / \sigma^{2}}, \tag{1} \]

and the probability of encountering a test item with a familiarity of f given that it is a lure, p(f | lure), is

\[ p(f \mid \text{lure}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - 0)^{2} / \sigma^{2}}, \tag{2} \]

where f is a value along the familiarity axis. The mean of the lure distribution is arbitrarily set to 0 for the sake of simplicity. All other values (e.g., the mean of the target distribution and the location of the decision criterion) are measured with respect to the mean of the lure distribution. The mean of the target distribution, d′, represents the number of standard deviations the mean of that distribution falls above the mean of the lure distribution. Indeed, whether we are considering the mean of the target distribution or the location of the decision and confidence criteria, the values represent the number of standard deviation units above (or below) the mean of the lure distribution.

According to likelihood ratio models of human recognition memory, subjects use their knowledge of the mathematical forms of the target and lure distributions to place the old/new decision criterion, C, at the point on the familiarity axis where the height of the signal distribution equals that of the noise distribution, that is, where the likelihood ratio, p(f | target) / p(f | lure), is equal to 1. This is the optimal location of the decision criterion because this is where the odds that the test item was drawn from the target distribution exactly equal the odds that it was drawn from the lure distribution. For any familiarity value greater than that, the odds that the item was drawn from the target distribution (i.e., that it appeared on the list) are better than even, in which case an old response makes sense. As was indicated above, the remaining confidence criteria are placed in such a way as to maintain specific odds ratios that are greater than or less than 1. If O_H is associated with a likelihood ratio of 10 to 1, that confidence criterion is placed on the familiarity axis at the point where the height of the target distribution is 10 times the height of the lure distribution, regardless of the value of d′.

For any given likelihood ratio, L, the criterion is placed on the familiarity axis at the point, f, that satisfies the following equation:

\[ L(f) = \frac{p(f \mid \text{target})}{p(f \mid \text{lure})}. \tag{3} \]

The right side of the equation is simply the ratio of the height of the target (signal plus noise) distribution to the height of the lure (noise) distribution at the point f on the familiarity axis. That is, using the identities expressed in Equations 1 and 2,

\[ L(f) = \frac{\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - d')^{2} / \sigma^{2}}}{\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - 0)^{2} / \sigma^{2}}}, \]

which, after setting σ to 1 for the sake of simplicity and rearranging terms, simplifies to

\[ L(f) = e^{-.5 (f - d')^{2} + .5 f^{2}}. \]

Solving this equation for f yields

\[ f = \frac{d'}{2} + \frac{\ln[L(f)]}{d'}. \tag{4} \]

Figure 4. Illustration of the placement of confidence criteria in a signal detection framework. The old/new decision criterion is represented by C, and the remaining confidence criteria are represented by O_M, O_H, N_M, and N_H. Familiarity values falling above O_M or O_H receive medium- or high-confident old responses (respectively), whereas familiarity values that fall below N_M or N_H receive medium- or high-confident new responses (respectively).

To determine where a particular criterion should be placed on the familiarity axis to satisfy a particular odds ratio, one need only enter the value of L theoretically associated with that criterion. For example, theoretically, the old/new decision criterion, C, is placed at the point where the odds ratio is 1, which is to say, L(f) = 1. Substituting 1 for L(f) (and C for f) in Equation 4 reveals that C = d′/2. That is, according to this model, the decision criterion should be placed midway between the target and the lure distributions no matter what the value of d′ is. If the criterion were always placed in that location in order to always maintain a likelihood ratio of 1, a mirror effect would always be observed (and it almost always is). If O_H is placed at the point on the familiarity axis where the odds are 10 to 1 in favor of the item's having been drawn from the target distribution, Equation 4 indicates that

\[ O_H = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}. \]

In other words, O_H is placed at a point higher than d′/2, but by an amount that varies with d′. The more general version of this equation is O_H = d′/2 + k_1/d′, where k_1 is the constant log likelihood ratio associated with O_H (which is ln 10 ≈ 2.30 in this example but could differ for different subjects). Similarly, if N_H is placed at the point where the odds are only 1 in 10 in favor of the item's having been drawn from the target distribution, Equation 4 indicates that

\[ N_H = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}. \]

The more general version of this equation is N_H = d′/2 + k_2/d′, where k_2 is the constant log likelihood ratio associated with N_H. This model predicts that the distance between O_H and N_H on the familiarity axis will be inversely related to d′. That distance (i.e., O_H minus N_H) is given by

\[ O_H - N_H = \left( \frac{d'}{2} + \frac{k_1}{d'} \right) - \left( \frac{d'}{2} + \frac{k_2}{d'} \right) = \frac{k_1 - k_2}{d'}. \]

Thus, as d′ becomes very large, O_H and N_H should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two criteria should move infinitely far apart.

Figure 5. Illustration of the movement of the confidence criteria as a function of d′, according to the lockstep, range, and likelihood ratio models (upper, middle, and lower panels, respectively). In each panel, the top figure represents a strong condition, and the bottom figure represents a weak condition.

The upshot of all of this is that when the situation is represented with a familiarity scale, likelihood ratio models predict that the decision criterion (C) should move in such a way that the hit rate decreases and the false alarm rate increases as d′ decreases (i.e., a mirror effect should be observed), because C will always be placed at d′/2 on the familiarity axis, since that is the point where the odds ratio is equal to 1. The more general version of this account is that all of the confidence criteria should diverge as d′ decreases (as in the lower panel of Figure 5). Mirror effects are usually observed in real data, and, as will be described in more detail next, Stretch and Wixted (1998a) showed that the confidence criteria move in essentially the manner predicted by the likelihood ratio account as well.

How are the locations of the decision criterion and the various confidence criteria inferred from actual behavioral data? The process is not as mysterious as it might seem. Consider first how d′ is computed. For a given hit and false alarm rate, only one arrangement of the target and lure distributions and the criterion is possible, assuming an equal-variance detection model. For example, symmetrical hit and false alarm rates of .84 and .16 yield a d′ of 2.0, with a criterion placement of d′/2 (i.e., C = 1.0 in this example). Standard deviation units are used in both cases. That is, the only way to represent these hit and false alarm rates in an equal-variance signal detection model is for the means of the two distributions to be separated by two standard deviations and for the criterion to be placed one standard deviation above the mean of the lure distribution. No other placement of the criterion would correspond to a false alarm rate of .16, and, given that, no other placement of the mean of the target distribution would yield a hit rate of .84. In practice, standard equations exist for making the relevant computations:

\[ d' = z(\text{HIT}) - z(\text{FA}) \]

and

\[ C = -z(\text{FA}). \]

Thus, in this example, C = −z(.16) = −(−1.0) = 1.0, which is to say that the criterion is placed one standard deviation above the mean of the lure distribution. This method of computing the placement of C involves all old responses to lures with a confidence rating of maybe old or greater (which is to say, all old responses to lures: +, ++, or +++). A very similar method can be used to determine the locations of the various confidence criteria. To determine where the O_M confidence criterion is placed, for example, one can use only those responses to lures that received a confidence rating of probably old or greater (i.e., all false alarms committed by choosing the ++ or +++ alternatives). From these responses, a new false alarm rate can be computed, and the location of the O_M criterion can be estimated by converting that false alarm rate into a z score (exactly as one usually does for old responses that exceed the old/new decision criterion). In practice, a more comprehensive method is used (one that allows for unequal variance and that takes into account confidence-specific hit rates as well), but the method just described works about as well and is conceptually much simpler (Stretch & Wixted, 1998a). The locations of the various confidence criteria on the familiarity axis are simply the cumulative confidence-specific false alarm rates converted to a z score.

Stretch and Wixted (1998a) tested the predictions of the likelihood ratio account with a very simple experiment. In one condition, subjects studied lists of words that were presented quite slowly, after which they received a recognition test (this was the strong condition). In another, they studied lists of words that were presented more rapidly, followed by a recognition test (the weak condition). For both conditions, the locations of the various confidence criteria were computed in essentially the manner described above. As uniquely predicted by the likelihood ratio account, the confidence criteria fanned out on the familiarity axis as d′ decreased. Figure 6 presents the relevant estimates of the locations of the confidence criteria in the strong and the weak conditions. Note that whereas the location of the decision criterion, C, decreased on the familiarity axis as conditions changed from strong to weak, the location of the high-confident old criterion, O_H, remained essentially constant, which means that even though the overall false alarm rate increased in the weak condition, the high-confident false alarm rate did not. This phenomenon is uniquely predicted by the likelihood ratio account.

Thus, whether we consider the basic mirror effect or the more complete picture provided by ratings of confidence, subjects behave as if they are able to maintain essentially (although not necessarily exactly) constant likelihood ratios across conditions. The fact that subjects are able to do this is rather mysterious, at least on the surface, because it is not clear how they would have acquired the detailed information it requires (and likelihood ratio models typically have little to say about this key issue).
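The fanning out of the criteria implied by Equation 4 can be illustrated numerically. The sketch below assumes σ = 1 and criterion odds of 1 to 10, 1 to 1, and 10 to 1 for N_H, C, and O_H; the function names and the two d′ values are illustrative assumptions, not values fitted to any data.

```python
import math


def criterion_location(d_prime, likelihood_ratio):
    """Equation 4 with sigma = 1: the familiarity value f at which L(f) equals likelihood_ratio."""
    return d_prime / 2 + math.log(likelihood_ratio) / d_prime


def likelihood_ratio_at(f, d_prime):
    """L(f) = exp(-.5 (f - d')^2 + .5 f^2): target density over lure density at f."""
    return math.exp(-0.5 * (f - d_prime) ** 2 + 0.5 * f ** 2)


# Assumed criterion odds: 1 to 10 for N_H, 1 to 1 for C, 10 to 1 for O_H.
ODDS = {"N_H": 0.1, "C": 1.0, "O_H": 10.0}

placements = {}
spreads = {}
for d_prime in (2.0, 1.0):  # a strong and a weak condition (hypothetical values)
    placements[d_prime] = {
        name: criterion_location(d_prime, odds) for name, odds in ODDS.items()
    }
    # Distance between O_H and N_H, which should equal (k_1 - k_2) / d'.
    spreads[d_prime] = placements[d_prime]["O_H"] - placements[d_prime]["N_H"]
    rounded = {name: round(value, 2) for name, value in placements[d_prime].items()}
    print(d_prime, rounded, round(spreads[d_prime], 2))
```

Under these assumptions, halving d′ doubles the distance between O_H and N_H while C stays at d′/2, which is the divergence the likelihood ratio account predicts.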

Within-list strength manipulations. It is worth noting that, although subjects often behave in accordance with the predictions of likelihood ratio models, there are conditions under which they do not. All this really means is that it is a mistake to think of humans as being fully informed likelihood ratio computers. For the sake of completeness, we will turn now to a brief discussion of the conditions under which subjects do not behave in the manner predicted by likelihood ratio accounts, after which we will return to the central question of how it is that they often do.

Strength is usually manipulated between lists to produce a mirror effect, but Stretch and Wixted (1998b) also manipulated strength within list. In each list, half the words were colored red, and they were presented five times each. The other half were colored blue, and they appeared only once each. The red and the blue words were randomly intermixed, and the red word repetitions were randomly scattered throughout the list. On the subsequent recognition test, the red and the blue targets were randomly intermixed with red and blue lures. Obviously, the hit rate in the strong (red) condition was expected to exceed the hit rate in the weak (blue) condition, and it did. Of interest was whether or not the false alarm rate in the strong condition (i.e., to red lures) would be less than the false alarm rate in the weak condition (i.e., to blue lures). If so, the within-list strength manipulation would yield results similar to those produced by the between-list strength manipulation. However, the results reported by Stretch and Wixted (1998b) showed a different pattern. In two experiments in which strength was cued by color, the hit rate to the strong items was much higher than the hit rate to the weak items (as was expected), but the false alarm rates were nearly identical. Similar results, when other list materials were used, were also recently reported by Morrell, Gaitan, and Wixted (2002). These results correspond to the situation depicted earlier in the upper panel of Figure 2, according to which the decision criterion remains fixed on the familiarity axis as a function of strength. A fully informed likelihood ratio computer would not respond in this manner but would, instead, adjust the criterion as a function of strength in order to maintain a constant likelihood ratio across conditions (as in the between-list strength manipulation).

The main difference between the within- and the between-list strength manipulations lies in how rapidly the conditions change during testing. In the between-list case, the conditions change slowly (e.g., all the target items are weak after one list, and all are strong after the other). In the within-list case, the conditions change at a rapid rate (e.g., a strong test item might be followed by a weak test item, followed by a strong test item, and so on). Under such conditions, subjects tend to aggregate over those conditions rather than respond in accordance with each condition separately. We shall see later that a similar issue arises in recognition memory experiments involving animal subjects, but the important point for the time being is that these data show that organisms do not always behave as if they were fully informed likelihood ratio computers.

Even though subjects do not always behave in exactly the manner one would expect according to a likelihood ratio model, they usually do, and the question of interest concerns how they acquire the information needed to do that. More specifically, what accounts for the fact that when strength is manipulated between lists, subjects have information available so that the confidence criteria fan out on the familiarity axis as d′ decreases in just the manner predicted by the likelihood ratio account? That is a question left largely unanswered by likelihood ratio accounts, and suggesting an answer to it is the issue we will consider next.

Figure 6. Estimates of the locations of the confidence criteria (O_H, C, and N_H) in the strong and weak conditions of a recognition memory experiment reported by Stretch and Wixted (1998a).
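Criterion estimates of the kind plotted in Figure 6 can be obtained, in the simple equal-variance version of the method, by converting cumulative false alarm rates to z scores. The sketch below uses made-up response counts for lures; the counts and the function name `criteria_from_lure_ratings` are assumptions for illustration and are not data from Stretch and Wixted (1998a).

```python
from statistics import NormalDist

CRITERION_NAMES = ["N_H", "N_M", "C", "O_M", "O_H"]


def criteria_from_lure_ratings(counts):
    """Estimate criterion locations from lure response counts.

    counts are ordered ---, --, -, +, ++, +++. Each criterion is placed at
    -z(proportion of lure responses falling above it), in standard deviation
    units relative to the lure mean. Proportions of exactly 0 or 1 are not
    handled here (inv_cdf is undefined for them).
    """
    total = sum(counts)
    z = NormalDist().inv_cdf
    criteria = {}
    for index, name in enumerate(CRITERION_NAMES):
        proportion_above = sum(counts[index + 1:]) / total
        criteria[name] = -z(proportion_above)
    return criteria


# Hypothetical counts for 100 lure trials (overall false alarm rate = .16).
lure_counts = [30, 25, 29, 10, 4, 2]
estimates = criteria_from_lure_ratings(lure_counts)
print({name: round(value, 2) for name, value in estimates.items()})
```

Because the overall false alarm rate in these made-up counts is .16, the estimate of C comes out close to 1.0, in line with the worked example given earlier.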

Mirror Effects in Pigeons

What is often overlooked in the human cognitive literature is the possibility that subjects have experienced years of making recognition decisions and expressing various levels of confidence. Presumably, those expressions of confidence have been followed by social consequences that have served to shape their behavior. Conceivably, it is those consequences that shaped behavior to mimic what a history-less likelihood ratio computer might do. We begin our investigation into this possibility by making the point that pigeons show a mirror effect, too. In pigeon research, unlike in human research, recognition memory decisions are followed by experimentally arranged consequences, and the connection between where a pigeon places its decision criterion and the consequences it has received in the past is easier to appreciate. That explicitly arranged reinforcers could influence the location of the decision criterion was recognized from the earliest days of detection theory (e.g., Green & Swets, 1966), but research into that issue never really blossomed in the literature on humans. The most influential account along these lines in the literature on animals, where relevant research did blossom, was proposed some time later by Davison and Tustin (1978). Nevertheless, it is probably fair to say that the connection between their research (and the research their ideas generated) and what a likelihood ratio theorist might say about the movement of confidence criteria across conditions is not easy to see. Making that connection explicit is what we will try to accomplish in this section and the next.

Mirror effects in pigeons can be observed by using a simple procedure in which the birds are required to discriminate the prior presence or absence of a sample stimulus. In a typical experiment of this kind, sample and no-sample trials are randomly intermixed, and the pigeon's task is to report whether or not a sample appeared earlier in the trial. On sample trials, the end of the intertrial interval (ITI) is followed by the presentation of a sample (e.g., the brief illumination of a keylight or the houselight) and then by the retention interval. On no-sample trials, by contrast, the end of the ITI is followed immediately by the onset of the nominal retention interval (i.e., no sample is presented). Following the retention interval, two choice stimuli are presented (e.g., red vs. green). A response to one choice alternative is correct on sample trials, and a response to the other is correct on no-sample trials. Pigeons learn to perform this task with high levels of accuracy when the retention interval is short, and manipulating the retention interval across conditions yields a mirror effect.

In Wixted's (1993) Experiment 2, 4 pigeons were exposed to an ascending series of retention intervals (0.50, 2, 6, 12, and 24 sec), with each retention interval condition in effect for 15 sessions. The results are shown in Table 1. The hit rate represents the probability of a correct response on sample trials (i.e., a correct declaration that the sample occurred), and the false alarm rate represents the probability of an incorrect response (i.e., an incorrect declaration that the sample occurred) on no-sample trials. The data presented in boldface in Table 1 were averaged over the last 5 sessions of each condition and over the 4 subjects, and these data constitute the main results of the experiment. The data shown in standard typeface are from the first session of each new retention interval condition (save for the 0.50-sec condition, because the birds had not yet learned the task in the 1st session of that condition). The data from the 1st session of each condition are instructive and will be discussed in more detail after we consider the main findings. The main results show that, not surprisingly, d′ decreased as the retention interval increased. The more important point is that as the hit rate decreased, the false alarm rate reliably increased. In other words, a mirror effect was observed, just as it almost always is in human recognition memory experiments.
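As a rough check on the main results, d′ can be recomputed from the averaged hit and false alarm rates with the standard equal-variance equation d′ = z(HIT) − z(FA). The values below are the last-five-session rows as read from Table 1; the variable names are illustrative. Small discrepancies from the published d′ scores are expected, because the rates are rounded and the published scores may have been averaged across birds.

```python
from statistics import NormalDist


def z(p):
    """Convert a proportion to a z score (inverse of the standard normal CDF)."""
    return NormalDist().inv_cdf(p)


def d_prime(hit, false_alarm):
    """Equal-variance sensitivity: d' = z(HIT) - z(FA)."""
    return z(hit) - z(false_alarm)


# (retention interval in sec, hit rate, FA rate, published d') as read from Table 1.
TABLE_1_MAIN = [
    (0.5, 0.94, 0.10, 2.83),
    (2.0, 0.88, 0.14, 2.26),
    (6.0, 0.88, 0.18, 2.10),
    (12.0, 0.74, 0.19, 1.52),
    (24.0, 0.65, 0.28, 0.97),
]

recomputed = [d_prime(hit, fa) for _, hit, fa, _ in TABLE_1_MAIN]
for (interval, hit, fa, published), value in zip(TABLE_1_MAIN, recomputed):
    print(interval, hit, fa, published, round(value, 2))
```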

What explains the existence of a mirror effect in this case? Figure 7 presents sense-of-prior-occurrence distributions for three retention interval conditions (short, medium, and long). These distributions are exactly analogous to the familiarity distributions previously discussed in relation to human recognition memory, and they represent the bird’s subjective sense that a sample was presented. That subjective sense is assumed to vary from trial to trial on both sample and no-sample trials (according to Gaussian or Gaussian-like distributions), even for a constant retention interval. Some sense of prior occurrence is evident even on no-sample trials, perhaps owing to the lingering influence of samples presented on earlier trials or to inherent noise in the memory system.

When the retention interval is short (upper panel), the sample distribution falls far to the right because, despite some variability, the subjective sense of prior occurrence will generally be quite high (and will fall well above the decision criterion). Thus, on the majority of sample trials, the subject would respond correctly (yes). When the retention interval increases (and the sense of prior occurrence weakens), the sample distribution gradually shifts toward the no-sample distribution (middle panel). The no-sample distribution does not change as a function of retention interval because there is no memory trace established on those trials (so there is no trace that weakens with retention interval). When the retention interval is very long, so that memory for the sample fades completely, the sample and no-sample distributions will overlap, so that performance on sample and no-sample trials will be indistinguishable (a situation that is approached in the lower panel).

Table 1
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Between Conditions

Retention Interval (sec)    Hit Rate    FA Rate    d′
 0.5                        .94         .10        2.83
 2.0 (1st)                  .84         .08
 2.0                        .88         .14        2.26
 6.0 (1st)                  .60         .14
 6.0                        .88         .18        2.10
12.0 (1st)                  .67         .19
12.0                        .74         .19        1.52
24.0 (1st)                  .46         .24
24.0                        .65         .28         .97

Note. The data were taken from Experiment 2 of Wixted (1993). The values in boldface (the rows not marked “1st”) were averaged over the last five sessions of each condition and over the 4 subjects. The values in standard typeface (the rows marked “1st”) are from the first session of each new retention interval condition (except for the 0.50-sec condition).

Because each retention interval was in effect for 15 sessions, the birds had time to adjust the location of the decision criterion in light of the reinforcement consequences for doing so. For an equal-variance model, reinforcers will be maximized when the criterion is placed midway between the two distributions. Leaving a particular retention interval in effect for a prolonged period gives the birds time to realize that and to adjust the decision criterion accordingly. By adjusting the decision criterion in such a way as to maximize reinforcement, the birds exhibit a mirror effect.
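The claim that the midpoint criterion maximizes reinforcement in the equal-variance case can be illustrated numerically. The sketch below is not the authors’ analysis. It assumes unit-variance Gaussian distributions, equally likely sample and no-sample trials, and reinforcement of every correct choice, and it uses a simple grid search. The names `expected_correct` and `best_criterion` and the chosen d′ values are illustrative.

```python
from statistics import NormalDist


def expected_correct(criterion: float, d_prime: float) -> float:
    """Proportion of reinforced (correct) choices, assuming equally likely
    sample and no-sample trials and unit-variance Gaussian distributions."""
    hit = 1 - NormalDist(mu=d_prime).cdf(criterion)      # "yes" on sample trials
    correct_rejection = NormalDist(mu=0).cdf(criterion)  # "no" on no-sample trials
    return 0.5 * hit + 0.5 * correct_rejection


def best_criterion(d_prime: float, step: float = 0.01) -> float:
    """Grid search for the criterion that maximizes expected_correct."""
    grid = [i * step for i in range(-300, 601)]
    return max(grid, key=lambda c: expected_correct(c, d_prime))


for dp in (3.0, 2.0, 1.0):  # assumed d' values, large to small
    c = best_criterion(dp)
    hit = 1 - NormalDist(mu=dp).cdf(c)
    fa = 1 - NormalDist(mu=0).cdf(c)
    print(f"d' = {dp}: criterion = {c:.2f}, hit = {hit:.2f}, FA = {fa:.2f}")
```

Under these assumptions the best criterion sits at d′/2, so as d′ shrinks the hit rate falls while the false alarm rate rises, which is the mirror effect.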

That the birds actually do shift the criterion in light of the reinforcement consequences is perhaps seen most easily by considering how they behave when a retention interval is first increased from one condition to the next. The relevant data are presented in standard typeface in Table 1. Note that each time a retention interval was increased (e.g., from 0.5 to 2 sec or from 2 to 6 sec), the hit rate dropped immediately (and precipitously), whereas the false alarm rate generally did not change. Over the next 15 sessions, both the hit rate and, to a lesser extent, the false alarm rate increased.

What explains that pattern? Presumably, on the first day that a new, longer retention interval was introduced, the effect on the mean of the sample distribution was immediate. That is, because of the longer retention interval, the memory trace of the sample faded to a greater extent, leading to a less intense sense of prior occurrence, on average. But that was the only immediate effect. The mean of the no-sample distribution did not change when the retention interval was increased because there was no memory trace established on such trials to fade away. The criterion did not change because the bird did not have enough time to learn where the optimal placement of the decision criterion might be in order to yield a higher number of reinforcers. As time passed, though, the bird learned to shift the criterion to the left on the sense-of-prior-occurrence axis (as is shown in Figure 7), thereby increasing both the hit rate and the false alarm rate.

The increase in the hit and the false alarm rates as a function of training at a particular retention interval is unmistakably asymmetric. That is, generally speaking, the hit rate increased a lot, whereas the false alarm rate increased only a little. Part of the explanation for this is that the appropriate detection model for this situation is not the equal-variance model we have used throughout this article, but an unequal-variance model (so that the no-sample distribution is more variable than the sample distribution). A criterion sweeping across a narrow sample distribution will affect the hit rate to a much greater extent than a criterion sweeping across a wide no-sample distribution will affect the false alarm rate. In fact, in a separate experiment, Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.

Figure 7. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated between conditions. The location of the decision criterion, C, changes with each condition.
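The remark that an unequal-variance Gaussian model yields two points at which the likelihood ratio equals 1.0 can be checked numerically. Setting the log of the two densities equal gives a quadratic in f, which the sketch below solves. The function name and the parameter values (d′ = 2, a sample standard deviation of 1.0, a no-sample standard deviation of 1.5) are assumptions chosen only for illustration, not estimates from the experiments.

```python
import math
from statistics import NormalDist


def unit_likelihood_ratio_points(d_prime, sd_sample, sd_no_sample):
    """Values of f at which p(f | S) equals p(f | NS) for Gaussian
    distributions with means d_prime and 0 and unequal standard deviations
    (the two standard deviations must differ, or the quadratic degenerates)."""
    a = 1 / (2 * sd_no_sample**2) - 1 / (2 * sd_sample**2)
    b = d_prime / sd_sample**2
    c = -d_prime**2 / (2 * sd_sample**2) - math.log(sd_sample / sd_no_sample)
    root = math.sqrt(b**2 - 4 * a * c)
    return sorted([(-b + root) / (2 * a), (-b - root) / (2 * a)])


# Assumed illustrative values: no-sample distribution wider than sample.
points = unit_likelihood_ratio_points(d_prime=2.0, sd_sample=1.0, sd_no_sample=1.5)
sample, no_sample = NormalDist(2.0, 1.0), NormalDist(0.0, 1.5)
for f in points:
    print(f"f = {f:.2f}, likelihood ratio = {sample.pdf(f) / no_sample.pdf(f):.3f}")
```

With equal variances the quadratic term vanishes and only a single crossing point remains, which is why the equal-variance model is the simpler one to work with.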

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted (1993) are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of d′.
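The contrast between the within-session and between-condition patterns can be sketched with the same equal-variance model. In the hypothetical example below, a criterion held fixed (as assumed for within-session manipulations) leaves the false alarm rate constant while the hit rate falls, whereas a criterion that tracks the midpoint d′/2 (as assumed for between-condition manipulations) produces a rising false alarm rate. The d′ values and the fixed criterion of 1.0 are assumptions for illustration, not fitted values.

```python
from statistics import NormalDist


def rates(d_prime: float, criterion: float) -> tuple[float, float]:
    """Hit and false alarm rates for unit-variance Gaussians at a criterion."""
    hit = 1 - NormalDist(mu=d_prime).cdf(criterion)
    fa = 1 - NormalDist(mu=0.0).cdf(criterion)
    return hit, fa


fixed_criterion = 1.0  # assumed value, held constant within a session
for dp in (2.5, 2.0, 1.5, 1.0):  # assumed d' values for increasing delays
    hit_fixed, fa_fixed = rates(dp, fixed_criterion)
    hit_mid, fa_mid = rates(dp, dp / 2)  # criterion tracks the midpoint
    print(f"d'={dp}: fixed hit={hit_fixed:.2f} FA={fa_fixed:.2f} | "
          f"midpoint hit={hit_mid:.2f} FA={fa_mid:.2f}")
```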

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

Table 2
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)    Hit Rate    FA Rate    d′
 0.5                        .94         .14        2.63
 2.0                        .93         .20        2.32
 6.0                        .73         .17        1.56
12.0                        .63         .16        1.32

Note. The data were taken from Experiment 1 of Wixted (1993).

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.

White and Wixted’s (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird’s (or a human’s) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird’s behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to f in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of f was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given f is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given f. According to White and Wixted’s model, birds learn the relative probabilities of reinforcement for each value of f along the sense-of-prior-occurrence axis.² If, for a given value of f, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of f was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of f.

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of f is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution, with mean d′ and standard deviation σ, is given by

\[ p(f \mid \mathrm{S}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - d')^2 / \sigma^2}, \tag{5} \]

and the no-sample (or lure) distribution, with mean 0 and standard deviation σ, is given by

\[ p(f \mid \mathrm{NS}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - 0)^2 / \sigma^2}, \tag{6} \]

where f now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and p(f | S) and p(f | NS) represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio p(f | S) / p(f | NS) on a given trial. What the pigeons instead learn, according to White and Wixted’s (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given f, which will be denoted as p(RS | f), and the probability that a reinforcer is arranged on the no-sample choice alternative given f, which will be denoted as p(RNS | f). Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given f is

\[ p(R_{\mathrm{S}} \mid f) = \left[ \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})} \right] r_{\mathrm{S}}, \tag{7} \]

where rS is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced (rS is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given f is

\[ p(R_{\mathrm{NS}} \mid f) = \left[ \frac{p(f \mid \mathrm{NS})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})} \right] r_{\mathrm{NS}}, \tag{8} \]

where rNS is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced (rNS is also usually 1.0).
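The quantities in Equations 5 through 8 are straightforward to compute. The sketch below is a minimal illustration, not the authors’ code. It evaluates the two Gaussian densities at a given f, forms the reinforcer probabilities of Equations 7 and 8, and then applies the matching law to obtain the probability of choosing the sample alternative. The function names and the value d′ = 2 are assumptions for illustration, and σ defaults to 1.

```python
from statistics import NormalDist


def reinforcer_probabilities(f, d_prime, r_s=1.0, r_ns=1.0, sigma=1.0):
    """Equations 7 and 8: probability that a reinforcer is arranged on the
    sample and on the no-sample alternative, given f (equally likely trial types)."""
    p_f_s = NormalDist(d_prime, sigma).pdf(f)   # Equation 5
    p_f_ns = NormalDist(0.0, sigma).pdf(f)      # Equation 6
    total = p_f_s + p_f_ns
    return (p_f_s / total) * r_s, (p_f_ns / total) * r_ns


def p_choose_sample(f, d_prime, r_s=1.0, r_ns=1.0, sigma=1.0):
    """Matching law: choice probability equals relative reinforcer probability."""
    p_rs, p_rns = reinforcer_probabilities(f, d_prime, r_s, r_ns, sigma)
    return p_rs / (p_rs + p_rns)


d_prime = 2.0  # assumed value for illustration
for f in (0.0, 1.0, 2.0):
    p_rs, p_rns = reinforcer_probabilities(f, d_prime)
    likelihood_ratio = NormalDist(d_prime).pdf(f) / NormalDist(0.0).pdf(f)
    print(f"f={f}: reinforcer ratio={p_rs / p_rns:.3f}, "
          f"likelihood ratio={likelihood_ratio:.3f}, "
          f"p(choose sample)={p_choose_sample(f, d_prime):.3f}")
```

With rS = rNS, the learned reinforcer ratio at each f coincides with the likelihood ratio, even though nothing in the sketch represents the bird as computing a likelihood ratio.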

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, f, on the sense-of-prior-occurrence axis.


White and Wixted’s (1999) model assumes that pigeons learn p(RS | f) and p(RNS | f) during the course of training and that, in light of this information, they respond to the choice alternatives in accordance with the matching law. If p(BS | f) represents the probability of choosing the sample choice alternative given f, and p(BNS | f) represents the probability of choosing the no-sample choice alternative given f, then

\[ \frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{p(R_{\mathrm{S}} \mid f)}{p(R_{\mathrm{NS}} \mid f)}, \tag{9} \]

which, after replacing p(RS | f) and p(RNS | f) with their equivalent values from Equations 7 and 8, can be rewritten as

\[ \frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \left( \frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}} \right) \left[ \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})} \right]. \tag{10} \]

These equations state that the behavior ratio given f is equal to the reinforcer ratio given f, which is essentially the matching law for each value of f along the sense-of-prior-occurrence axis.

If rS = rNS = 1.0 (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

\[ \frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})}. \]

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing p(f | S) and p(f | NS) with their equivalent values from Equations 5 and 6, this equation becomes

\[ \frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - d')^2 / \sigma^2}}{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - 0)^2 / \sigma^2}}. \]

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when rS = rNS, a likelihood ratio computation is not assumed to be governing the bird’s behavior. What drives the bird’s choice, according to White and Wixted’s (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of f was generated by a sample as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted’s account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives depending on what f happens to be. What f happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.

There is some point on the sense-of-prior-occurrence axis (i.e., some value of f) that leads the bird to be indifferent between the two choice alternatives, because at that point the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, p(BS | f) = p(BNS | f). If rS = rNS = 1.0, it is easily shown that the value of f that yields indifference is d′/2. If p(BS | f) = p(BNS | f), which is to say that p(BS | f) / p(BNS | f) = 1, then, according to Equation 10,

\[ 1 = \left( \frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}} \right) \left[ \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})} \right]. \]

Solving this equation for f provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for p(f | S) and p(f | NS) from Equations 5 and 6 and canceling out the 1/√(2πσ²) terms that appear in the numerator and denominator yields

\[ 1 = \left( \frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}} \right) \left[ \frac{e^{-0.5 (f - d')^2}}{e^{-0.5 (f - 0)^2}} \right] \]

(assuming that σ = 1 for simplicity). When solved for f, this equation yields

\[ f = \frac{d'}{2} + \frac{\ln\left( r_{\mathrm{NS}} / r_{\mathrm{S}} \right)}{d'}. \tag{11} \]

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if rS = rNS, so that ln(rNS/rS) = 0, the indifference point occurs at d′/2. This familiarity value corresponds to the indifference point and could be represented as f1:1 = d′/2, where the subscript 1:1 represents the ratio of programmed reinforcer probabilities (rNS:rS). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when d′ changes (thereby yielding a mirror effect). The difference is that White and Wixted’s (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of f. In this case, the pigeon has learned that when f = d′/2, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than d′/2, the odds favor a reinforcer’s being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than d′/2, the odds favor a reinforcer’s being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same: namely, that a mirror effect should be observed.

Again, although we assume that the indifference point represents the learned value of f associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject’s learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.
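The closed-form indifference point of Equation 11 can be checked against a direct numerical solution of Equation 10. The sketch below is a check under stated assumptions, not part of the original article. It takes σ = 1, finds by bisection the value of f at which the behavior ratio of Equation 10 equals 1, and compares it with Equation 11. The function names, the search bounds, and the unequal reinforcer probabilities (rS = .5, rNS = 1.0) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def behavior_ratio(f, d_prime, r_s=1.0, r_ns=1.0):
    """Equation 10 with sigma = 1: (r_S / r_NS) times the likelihood ratio at f."""
    return (r_s / r_ns) * NormalDist(d_prime).pdf(f) / NormalDist(0.0).pdf(f)


def indifference_by_bisection(d_prime, r_s=1.0, r_ns=1.0, lo=-10.0, hi=10.0):
    """Find f where the behavior ratio equals 1 (the ratio increases with f
    whenever d_prime is positive)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if behavior_ratio(mid, d_prime, r_s, r_ns) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def indifference_closed_form(d_prime, r_s=1.0, r_ns=1.0):
    """Equation 11: f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2 + math.log(r_ns / r_s) / d_prime


for dp in (0.5, 1.0, 2.0):  # assumed d' values
    numeric = indifference_by_bisection(dp, r_s=0.5, r_ns=1.0)
    exact = indifference_closed_form(dp, r_s=0.5, r_ns=1.0)
    print(f"d' = {dp}: bisection = {numeric:.4f}, Equation 11 = {exact:.4f}")
```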

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if rNS > rS), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in d′ on the location of the confidence criteria discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as d′ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., rNS = 1.0 and rS = .1). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence that the sample occurred before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when rNS = 10rS? Substituting 10 for rNS/rS in Equation 11 and solving for f yields

\[ f_{10:1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}. \]

In other words, f10:1 (the indifference point when rNS = 10rS) occurs at a point higher than d′/2, but by an amount that varies with d′. The more general version of this equation is f10:1 = d′/2 + k1/d′, where k1 is the constant ln(rNS/rS). Similarly, if rS = 10rNS, then

\[ f_{1:10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}. \]

The more general version of this equation is f1:10 = d′/2 + k2/d′, where k2 is the constant ln(rNS/rS) for this condition. This model predicts that the distance between f10:1 and f1:10 on the sense-of-prior-occurrence axis will be inversely related to d′. That distance (i.e., f10:1 minus f1:10) is given by

\[ f_{10:1} - f_{1:10} = \left( \frac{d'}{2} + \frac{k_1}{d'} \right) - \left( \frac{d'}{2} + \frac{k_2}{d'} \right) = \frac{k_1 - k_2}{d'}. \]

Thus, as d′ becomes very large, f10:1 and f1:10 should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon’s sensitivity to variations in relative reinforcer probabilities should be low when d′ is large and high when d′ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When d′ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of rNS to rS) had little effect on behavior. When d′ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as d′ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as d′ decreased (because that is just another way of saying that sensitivity to reinforcement increased as d′ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism’s reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.
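The predicted fanning out of the indifference points can be tabulated directly from Equation 11. The sketch below is illustrative only. It assumes σ = 1 and an arbitrary set of d′ values, and it evaluates the two indifference points for reinforcer ratios of 10:1 and 1:10 together with the distance between them.

```python
import math


def indifference_point(d_prime: float, ratio_ns_to_s: float) -> float:
    """Equation 11 with sigma = 1: f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2 + math.log(ratio_ns_to_s) / d_prime


for dp in (3.0, 2.0, 1.0, 0.5):  # assumed d' values, large to small
    high = indifference_point(dp, 10.0)  # r_NS = 10 r_S
    low = indifference_point(dp, 0.1)    # r_S = 10 r_NS
    print(f"d' = {dp}: f(10:1) = {high:.2f}, f(1:10) = {low:.2f}, "
          f"distance = {high - low:.2f}")
```

The distance equals 2 ln(10)/d′, so it grows without bound as d′ approaches 0 and shrinks toward 0 as d′ becomes large.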

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and, more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

\[ \frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})} > 10, \]

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of f, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

\[ \frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)} > 10, \]

where p(ROld | f) represents the past probability that, given this value of f, socially reinforcing feedback for an old response occurred, and p(RNew | f) represents the past probability that, given this value of f, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

\[ \frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)}, \tag{12} \]

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

\[ \frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \left( \frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}} \right) \left[ \frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})} \right]. \tag{13} \]

Thus, a high-confident old response is likely to occur when, for example,

\[ \left( \frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}} \right) \left[ \frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})} \right] > 10, \]

where rOld and rNew represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to rS and rNS, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of rS and rNS in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, rOld = rNew. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of f along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of f.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if rNS = 10rS, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at d′/2, but at a higher point, namely, d′/2 + 2.30/d′. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).
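The distinction drawn in this section between matching and maximizing amounts to two different mappings from a learned reinforcer ratio to the probability of an old response. The sketch below illustrates that arithmetic only. The function names are hypothetical, and treating a ratio of exactly 1 as indifference under maximizing is an assumption made here for completeness.

```python
def p_old_matching(reinforcer_ratio: float) -> float:
    """Matching: the old/new response ratio equals the reinforcer ratio,
    so p(old) = ratio / (1 + ratio)."""
    return reinforcer_ratio / (1 + reinforcer_ratio)


def p_old_maximizing(reinforcer_ratio: float) -> float:
    """Maximizing: always choose the alternative with the better odds
    (indifference at a ratio of exactly 1 is an assumption of this sketch)."""
    if reinforcer_ratio > 1:
        return 1.0
    if reinforcer_ratio < 1:
        return 0.0
    return 0.5


for ratio in (0.1, 1.0, 4.0, 10.0):  # p(R_Old | f) / p(R_New | f)
    print(f"reinforcer ratio = {ratio}: matching p(old) = "
          f"{p_old_matching(ratio):.3f}, maximizing p(old) = "
          f"{p_old_maximizing(ratio):.1f}")
```

Under matching, a reinforcer ratio of 10 makes an old response 10 times as likely as a new response without making it certain. Under maximizing, the old response always occurs once the ratio exceeds 1.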

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific f → reinforcement outcome relations, but instead that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog’s neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog’s training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism's neural network without detailing the origins of the relevant distributional information. A neural network may, indeed, perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject's reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising, indeed, if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject's neural network, but in the subject's history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that “the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects” (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals, whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of “nothing” in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM–retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion's distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value, f, is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)


the mean familiarity of the targets is greater in the strong condition than in the weak condition. Figure 3 shows how to represent this situation, assuming a familiarity axis. Basically, when the target distribution weakens, the criterion shifts to the left in such a way as to remain midway between the means of the target and the lure distributions. Subjects might very well be inclined to do this because, after a strong list, they would presumably know that a test item, had it appeared on the list, would be very familiar, so a stringent criterion could be used without missing too many of the targets. After a weak list, by contrast, they would know that a target item might be somewhat unfamiliar even if it had appeared on the list, so a less stringent criterion might be used so as not to miss too many of the targets. If the criterion did shift in this way across conditions (which would not involve a likelihood ratio computation), a mirror effect would be observed. Nevertheless, a likelihood ratio theorist would say that the apparent criterion shift is an illusion. What looks like a criterion shift on the familiarity axis actually reflects the fact that the subject uses the same criterion odds ratio in both cases.
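The way a midpoint criterion produces a mirror effect can be illustrated with a short numerical sketch. It assumes an equal-variance Gaussian model with unit standard deviation; the d′ values of 2.0 (strong) and 1.0 (weak) and the function name are hypothetical choices made for illustration, not values from the article.

```python
from statistics import NormalDist


def hit_and_false_alarm(d_prime, criterion):
    """Return (hit rate, false alarm rate) for an equal-variance model.

    The lure distribution is N(0, 1) and the target distribution is
    N(d_prime, 1); an "old" response is given when familiarity exceeds
    the criterion.
    """
    hit = 1 - NormalDist(mu=d_prime, sigma=1).cdf(criterion)
    false_alarm = 1 - NormalDist(mu=0, sigma=1).cdf(criterion)
    return hit, false_alarm


# Illustrative (assumed) strengths; the criterion sits midway between the means.
strong_hit, strong_fa = hit_and_false_alarm(d_prime=2.0, criterion=1.0)
weak_hit, weak_fa = hit_and_false_alarm(d_prime=1.0, criterion=0.5)

print(f"strong: hit={strong_hit:.2f}, false alarm={strong_fa:.2f}")
print(f"weak:   hit={weak_hit:.2f}, false alarm={weak_fa:.2f}")
```

With the criterion held at the midpoint, weakening the targets lowers the hit rate and raises the false alarm rate at the same time, which is the mirror pattern.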

Confidence criteria. The argument just presented in connection with the movement of the old/new decision criterion as a function of strength can be generalized to the situation in which subjects are asked not only for an old/new judgment, but also for a confidence rating. Thus, for example, when asked to decide whether or not a test item appeared on a study list, the subject might be offered six choice alternatives labeled −−−, −−, −, +, ++, and +++, where the first three represent new responses of varying confidence and the last three represent old responses of varying confidence. If the subject is absolutely sure that a particular test item appeared on the list, the “+++” button would be pressed. A less certain old response would be indicated by pressing the “++” button instead.

A straightforward extension of the basic detection model holds that confidence judgments are reached in much the same way that old/new decisions are reached (e.g., Macmillan & Creelman, 1991). That is, as is illustrated in Figure 4, a different criterion for each confidence rating is theoretically placed somewhere along the familiarity axis. Familiarity values that fall above O_H receive an old response with high confidence (i.e., +++), those that fall between O_M and O_H receive an old response with medium confidence (++), and those that fall between C and O_M receive an old response with low confidence (+).¹ On the other side of the criterion, familiarity values that fall below C but above N_M receive a new response with low confidence (−), those that fall below N_M but above N_H receive a new response with medium confidence (−−), and those that fall below N_H receive a new response with high confidence (−−−).

As was indicated earlier, the likelihood ratio model holds that the mirror effect occurs because subjects use the same likelihood ratio to make decisions in both the strong and the weak conditions. A fixed likelihood ratio criterion shows up as a shifting criterion as a function of strength when the situation is represented by the familiarity version of detection theory (as in Figure 3). A similar story applies to the way in which confidence criteria change as a function of strength. If subjects maintain constant likelihood ratios for each of the confidence criteria, then, when represented in terms of the familiarity version of detection theory, the confidence criteria should move in a predictable way as d′ decreases. More specifically, as will be described next, the criteria should diverge, or fan out, as d′ approaches zero. Figure 5 shows three different ways the confidence criteria might shift as d′ changes from high to low. In all three cases, the old/new decision criterion shifts to the left (thereby yielding a mirror effect), but the confidence criteria shift in different ways. The upper panel shows the confidence criteria shifting in lockstep with the decision criterion, the middle panel shows the confidence criteria converging toward the old/new decision criterion, and the lower panel shows the confidence criteria diverging (as the likelihood ratio account uniquely predicts).
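The predicted fanning out can be previewed numerically. The sketch below uses the criterion placement rule that the article derives for the equal-variance, unit-variance case, f = d′/2 + ln(L)/d′; the likelihood ratios of 1/10, 1, and 10 follow the article's running example, while the two d′ values and the function names are assumptions made for illustration.

```python
import math

# Likelihood ratios assumed for three of the criteria (10 to 1 and 1 to 10
# follow the article's example; other subjects could use other values).
LIKELIHOOD_RATIOS = {"N_H": 0.1, "C": 1.0, "O_H": 10.0}


def criterion_location(d_prime, likelihood_ratio):
    """Point on the familiarity axis where target/lure density ratio equals L."""
    return d_prime / 2 + math.log(likelihood_ratio) / d_prime


def criteria(d_prime):
    """Locations of N_H, C, and O_H for a given d'."""
    return {
        name: criterion_location(d_prime, ratio)
        for name, ratio in LIKELIHOOD_RATIOS.items()
    }


strong = criteria(2.0)  # hypothetical strong condition
weak = criteria(1.0)    # hypothetical weak condition

for label, locations in (("strong", strong), ("weak", weak)):
    rounded = {name: round(value, 2) for name, value in locations.items()}
    print(label, rounded)
```

In this sketch the decision criterion C moves left as d′ drops, while O_H and N_H move apart, which corresponds to the diverging pattern in the lower panel of Figure 5.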

According to the likelihood ratio account, the old/new decision criterion, C, is placed in such a way as to maintain a likelihood ratio of 1 for unbiased responding. Similarly, the O_H criterion is placed in such a way as to maintain a larger likelihood ratio, such as 10 to 1. That is, any item that is 10 or more times as likely to have come from the target distribution as from the lure distribution receives an old response with high confidence. Different subjects might use different specific odds ratios before making a high-confident response, but we will assume that O_H falls where the likelihood ratio is 10 to 1, for the sake of simplicity. Similarly, the N_H criterion (i.e., the criterion for saying new with high confidence) might be placed in such a way as to maintain a small likelihood ratio, such as 1 to 10 (.1). That is, any item with only a 1 in 10 chance (or less) of having been drawn from the target distribution receives a new response with high confidence. As d′ decreases, this model assumes that the criteria move in such a way as to maintain these confidence-specific likelihood ratios.

Figure 3. Familiarity-based signal detection model of weak and strong conditions, over which the decision criterion shifts on the decision axis.

How could subjects possibly adjust the criteria in this way? The likelihood ratio model assumes that the subject not only has information about the location of the target and lure distributions, but also has information about their mathematical forms. Where the latter information might come from is usually underspecified (and is the main focus of this article). If the target and lure distributions are Gaussian, with means of d′ and 0, respectively, and common standard deviation of σ, the probability of encountering a test item with a familiarity of f, given that it is a target, p(f | target), is

$$p(f \mid \text{target}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - d')^2/\sigma^2}, \tag{1}$$

and the probability of encountering a test item with a familiarity of f, given that it is a lure, p(f | lure), is

$$p(f \mid \text{lure}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - 0)^2/\sigma^2}, \tag{2}$$

where f is a value along the familiarity axis. The mean of the lure distribution is arbitrarily set to 0 for the sake of simplicity. All other values (e.g., the mean of the target distribution and the location of the decision criterion) are measured with respect to the mean of the lure distribution. The mean of the target distribution, d′, represents the number of standard deviations the mean of that distribution falls above the mean of the lure distribution. Indeed, whether we are considering the mean of the target distribution or the location of the decision and confidence criteria, the values represent the number of standard deviation units above (or below) the mean of the lure distribution.

According to likelihood ratio models of human recognition memory, subjects use their knowledge of the mathematical forms of the target and lure distributions to place the old/new decision criterion, C, at the point on the familiarity axis where the height of the signal distribution equals that of the noise distribution; that is, where the likelihood ratio, p(f | target)/p(f | lure), is equal to 1. This is the optimal location of the decision criterion, because this is where the odds that the test item was drawn from the target distribution exactly equal the odds that it was drawn from the lure distribution. For any familiarity value greater than that, the odds that the item was drawn from the target distribution (i.e., that it appeared on the list) are better than even, in which case an old response makes sense. As was indicated above, the remaining confidence criteria are placed in such a way as to maintain specific odds ratios that are greater than or less than 1. If O_H is associated with a likelihood ratio of 10 to 1, that confidence criterion is placed on the familiarity axis at the point where the height of the target distribution is 10 times the height of the lure distribution, regardless of the value of d′.

For any given likelihood ratio, L, the criterion is placed on the familiarity axis at the point f that satisfies the following equation:

$$L(f) = \frac{p(f \mid \text{target})}{p(f \mid \text{lure})}. \tag{3}$$

The right side of the equation is simply the ratio of the height of the target (signal plus noise) distribution to the height of the lure (noise) distribution at the point f on the familiarity axis. That is, using the identities expressed in Equations 1 and 2,

$$L(f) = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - d')^2/\sigma^2}}{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - 0)^2/\sigma^2}},$$

which, after setting σ to 1 for the sake of simplicity and rearranging terms, simplifies to

$$L(f) = e^{-.5(f - d')^2 + .5f^2}.$$

Solving this equation for f yields

$$f = \frac{d'}{2} + \frac{\ln[L(f)]}{d'}. \tag{4}$$

Figure 4. Illustration of the placement of confidence criteria in a signal detection framework. The old/new decision criterion is represented by C, and the remaining confidence criteria are represented by O_M, O_H, N_M, and N_H. Familiarity values falling above O_M or O_H receive medium- or high-confident old responses (respectively), whereas familiarity values that fall below N_M or N_H receive medium- or high-confident new responses (respectively).

To determine where a particular criterion should be placed on the familiarity axis to satisfy a particular odds ratio, one need only enter the value of L theoretically associated with that criterion. For example, theoretically, the old/new decision criterion, C, is placed at the point where the odds ratio is 1, which is to say L(f) = 1. Substituting 1 for L(f) (and C for f) in Equation 4 reveals that C = d′/2. That is, according to this model, the decision criterion should be placed midway between the target and the lure distributions no matter what the value of d′ is. If the criterion were always placed in that location in order to always maintain a likelihood ratio of 1, a mirror effect would always be observed (and it almost always is). If O_H is placed at the point on the familiarity axis where the odds are 10 to 1 in favor of the item's having been drawn from the target distribution, Equation 4 indicates that

$$O_H = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, O_H is placed at a point higher than d′/2, but by an amount that varies with d′. The more general version of this equation is O_H = d′/2 + k_1/d′, where k_1 is the constant log likelihood ratio associated with O_H (which is ln(10) ≈ 2.30 in this example, but could differ for different subjects). Similarly, if N_H is placed at the point where the odds are only 1 in 10 in favor of the item's having been drawn from the target distribution, Equation 4 indicates that

$$N_H = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is N_H = d′/2 + k_2/d′, where k_2 is the constant log likelihood ratio associated with N_H. This model predicts that the distance between O_H and N_H on the familiarity axis will be inversely related to d′. That distance (i.e., O_H minus N_H) is given by

$$O_H - N_H = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}.$$

Thus, as d′ becomes very large, O_H and N_H should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two criteria should move infinitely far apart.

Figure 5. Illustration of the movement of the confidence criteria as a function of d′ according to the lockstep, range, and likelihood ratio models (upper, middle, and lower panels, respectively). In each panel, the top figure represents a strong condition, and the bottom figure represents a weak condition.

The upshot of all of this is that, when the situation is represented with a familiarity scale, likelihood ratio models predict that the decision criterion (C) should move in such a way that the hit rate decreases and the false alarm rate increases as d′ decreases (i.e., a mirror effect should be observed), because C will always be placed at d′/2 on the familiarity axis, since that is the point where the odds ratio is equal to 1. The more general version of this account is that all of the confidence criteria should diverge as d′ decreases (as in the lower panel of Figure 5). Mirror effects are usually observed in real data, and, as will be described in more detail next, Stretch and Wixted (1998a) showed that the confidence criteria move in essentially the manner predicted by the likelihood ratio account as well.

How are the locations of the decision criterion and the various confidence criteria inferred from actual behavioral data? The process is not as mysterious as it might seem. Consider first how d′ is computed. For a given hit and false alarm rate, only one arrangement of the target and lure distributions and the criterion is possible, assuming an equal-variance detection model. For example, symmetrical hit and false alarm rates of .84 and .16 yield a d′ of 2.0, with a criterion placement of d′/2 (i.e., C = 1.0 in this example). Standard deviation units are used in both cases. That is, the only way to represent these hit and false alarm rates in an equal-variance signal detection model is for the means of the two distributions to be separated by two standard deviations and for the criterion to be placed one standard deviation above the mean of the lure distribution. No other placement of the criterion would correspond to a false alarm rate of .16, and given that, no other placement of the mean of the target distribution would yield a hit rate of .84. In practice, standard equations exist for making the relevant computations:

$$d' = z(\text{HIT}) - z(\text{FA})$$

and

$$C = -z(\text{FA}).$$

Thus, in this example, C = −z(.16) = −(−1.0) = 1.0, which is to say that the criterion is placed one standard deviation above the mean of the lure distribution. This method of computing the placement of C involves all old responses to lures with a confidence rating of maybe old or greater (which is to say, all old responses to lures: +, ++, or +++). A very similar method can be used to determine the locations of the various confidence criteria. To determine where the O_M confidence criterion is placed, for example, one can use only those responses to lures that received a confidence rating of probably old or greater (i.e., all false alarms committed by choosing the ++ or +++ alternatives). From these responses, a new false alarm rate can be computed, and the location of the O_M criterion can be estimated by converting that false alarm rate into a z score (exactly as one usually does for old responses that exceed the old/new decision criterion). In practice, a more comprehensive method is used (one that allows for unequal variance and that takes into account confidence-specific hit rates as well), but the method just described works about as well and is conceptually much simpler (Stretch & Wixted, 1998a). The locations of the various confidence criteria on the familiarity axis are simply the cumulative confidence-specific false alarm rates converted to a z score.

Stretch and Wixted (1998a) tested the predictions of the likelihood ratio account with a very simple experiment. In one condition, subjects studied lists of words that were presented quite slowly, after which they received a recognition test (this was the strong condition). In another, they studied lists of words that were presented more rapidly, followed by a recognition test (the weak condition). For both conditions, the locations of the various confidence criteria were computed in essentially the manner described above. As uniquely predicted by the likelihood ratio account, the confidence criteria fanned out on the familiarity axis as d′ decreased. Figure 6 presents the relevant estimates of the locations of the confidence criteria in the strong and the weak conditions. Note that whereas the location of the decision criterion, C, decreased on the familiarity axis as conditions changed from strong to weak, the location of the high-confident old criterion, O_H, remained essentially constant, which means that even though the overall false alarm rate increased in the weak condition, the high-confident false alarm rate did not. This phenomenon is uniquely predicted by the likelihood ratio account.

Thus, whether we consider the basic mirror effect or the more complete picture provided by ratings of confidence, subjects behave as if they are able to maintain essentially (although not necessarily exactly) constant likelihood ratios across conditions. The fact that subjects are able to do this is rather mysterious, at least on the surface, because it is not clear how they would have acquired the detailed information it requires (and likelihood ratio models typically have little to say about this key issue).
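The simpler estimation method described above (converting cumulative confidence-specific false alarm rates to z scores) can be sketched in a few lines. This is an illustrative sketch only, assuming the equal-variance model; the response counts are hypothetical, and the function and variable names are not taken from the article.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF: converts a proportion to a z score.
z = NormalDist().inv_cdf


def d_prime(hit_rate, fa_rate):
    """Equal-variance sensitivity: d' = z(HIT) - z(FA)."""
    return z(hit_rate) - z(fa_rate)


def criterion_from_fa(fa_rate):
    """Criterion location relative to the lure mean: -z(FA)."""
    return -z(fa_rate)


def confidence_criteria(lure_counts, n_lures):
    """Estimate O_H, O_M, and C from cumulative false alarm rates.

    lure_counts maps each "old" rating ("+", "++", "+++") to the number of
    lures that received that rating.
    """
    cumulative = 0
    locations = {}
    # Work down from the most confident rating so the counts accumulate.
    for rating, name in (("+++", "O_H"), ("++", "O_M"), ("+", "C")):
        cumulative += lure_counts[rating]
        locations[name] = criterion_from_fa(cumulative / n_lures)
    return locations


# Hypothetical data: responses to 100 lures at each "old" confidence level.
example_counts = {"+": 9, "++": 5, "+++": 2}
locations = confidence_criteria(example_counts, n_lures=100)

print("d' for HIT=.84, FA=.16:", round(d_prime(0.84, 0.16), 2))
print({name: round(value, 2) for name, value in locations.items()})
```

Because the overall false alarm rate in these hypothetical counts is .16, the estimated C lands close to 1.0, matching the worked example in the text, and the stricter criteria fall progressively further to the right.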

Within-list strength manipulations. It is worth noting that although subjects often behave in accordance with the predictions of likelihood ratio models, there are conditions under which they do not. All this really means is that it is a mistake to think of humans as being fully informed likelihood ratio computers. For the sake of completeness, we will turn now to a brief discussion of the conditions under which subjects do not behave in the manner predicted by likelihood ratio accounts, after which we will return to the central question of how it is that they often do.

Strength is usually manipulated between lists to produce a mirror effect, but Stretch and Wixted (1998b) also manipulated strength within list. In each list, half the words were colored red, and they were presented five times each. The other half were colored blue, and they appeared only once each. The red and the blue words were randomly intermixed, and the red word repetitions were randomly scattered throughout the list. On the subsequent recognition test, the red and the blue targets were randomly intermixed with red and blue lures. Obviously, the hit rate in the strong (red) condition was expected to exceed the hit rate in the weak (blue) condition, and it did. Of interest was whether or not the false alarm rate in the strong condition (i.e., to red lures) would be less than the false alarm rate in the weak condition (i.e., to blue lures). If so, the within-list strength manipulation would yield results similar to those produced by the between-list strength manipulation. However, the results reported by Stretch and Wixted (1998b) showed a different pattern. In two experiments in which strength was cued by color, the hit rate to the strong items was much higher than the hit rate to the weak items (as was expected), but the false alarm rates were nearly identical. Similar results, when other list materials were used, were also recently reported by Morrell, Gaitan, and Wixted (2002). These results correspond to the situation depicted earlier in the upper panel of Figure 2, according to which the decision criterion remains fixed on the familiarity axis as a function of strength. A fully informed likelihood ratio computer would not respond in this manner but would, instead, adjust the criterion as a function of strength in order to maintain a constant likelihood ratio across conditions (as in the between-list strength manipulation).

The main difference between the within- and the between-list strength manipulations lies in how rapidly the conditions change during testing. In the between-list case, the conditions change slowly (e.g., all the target items are weak after one list, and all are strong after the other). In the within-list case, the conditions change at a rapid rate (e.g., a strong test item might be followed by a weak test item, followed by a strong test item, and so on). Under such conditions, subjects tend to aggregate over those conditions rather than respond in accordance with each condition separately. We shall see later that a similar issue arises in recognition memory experiments involving animal subjects, but the important point for the time being is that these data show that organisms do not always behave as if they were fully informed likelihood ratio computers.

Even though subjects do not always behave in exactly the manner one would expect according to a likelihood ratio model, they usually do, and the question of interest concerns how they acquire the information needed to do that. More specifically, what accounts for the fact that, when strength is manipulated between lists, subjects have information available so that the confidence criteria fan out on the familiarity axis as d′ decreases in just the manner predicted by the likelihood ratio account? That is a question left largely unanswered by likelihood ratio accounts, and suggesting an answer to it is the issue we will consider next.

Figure 6. Estimates of the locations of the confidence criteria (O_H, C, and N_H) in the strong and weak conditions of a recognition memory experiment reported by Stretch and Wixted (1998a).

Mirror Effects in Pigeons

What is often overlooked in the human cognitive literature is the possibility that subjects have experienced years of making recognition decisions and expressing various levels of confidence. Presumably, those expressions of confidence have been followed by social consequences that have served to shape their behavior. Conceivably, it is those consequences that shaped behavior to mimic what a history-less likelihood ratio computer might do. We begin our investigation into this possibility by making the point that pigeons show a mirror effect too. In pigeon research, unlike in human research, recognition memory decisions are followed by experimentally arranged consequences, and the connection between where a pigeon places its decision criterion and the consequences it has received in the past is easier to appreciate. That explicitly arranged reinforcers could influence the location of the decision criterion was recognized from the earliest days of detection theory (e.g., Green & Swets, 1966), but research into that issue never really blossomed in the literature on humans. The most influential account along these lines in the literature on animals, where relevant research did blossom, was proposed some time later by Davison and Tustin (1978). Nevertheless, it is probably fair to say that the connection between their research (and the research their ideas generated) and what a likelihood ratio theorist might say about the movement of confidence criteria across conditions is not easy to see. Making that connection explicit is what we will try to accomplish in this section and the next.

Mirror effects in pigeons can be observed by using a simple procedure in which the birds are required to discriminate the prior presence or absence of a sample stimulus. In a typical experiment of this kind, sample and no-sample trials are randomly intermixed, and the pigeon's task is to report whether or not a sample appeared earlier in the trial. On sample trials, the end of the intertrial interval (ITI) is followed by the presentation of a sample (e.g., the brief illumination of a keylight or the houselight) and then by the retention interval. On no-sample trials, by contrast, the end of the ITI is followed immediately by the onset of the nominal retention interval (i.e., no sample is presented). Following the retention interval, two choice stimuli are presented (e.g., red vs. green). A response to one choice alternative is correct on sample trials, and a response to the other is correct on no-sample trials. Pigeons learn to perform this task with high levels of accuracy when the retention interval is short, and manipulating the retention interval across conditions yields a mirror effect.

In Wixted's (1993) Experiment 2, 4 pigeons were exposed to an ascending series of retention intervals (0.50, 2, 6, 12, and 24 sec), with each retention interval condition in effect for 15 sessions. The results are shown in Table 1. The hit rate represents the probability of a correct response on sample trials (i.e., a correct declaration that the sample occurred), and the false alarm rate represents the probability of an incorrect response (i.e., an incorrect declaration that the sample occurred) on no-sample trials. The data presented in boldface in Table 1 were averaged over the last 5 sessions of each condition and over the 4 subjects, and these data constitute the main results of the experiment. The data shown in standard typeface are from the first session of each new retention interval condition (save for the 0.50-sec condition, because the birds had not yet learned the task in the 1st session of that condition). The data from the 1st session of each condition are instructive and will be discussed in more detail after we consider the main findings. The main results show that, not surprisingly, d′ decreased as the retention interval increased. The more important point is that as the hit rate decreased, the false alarm rate reliably increased. In other words, a mirror effect was observed, just as it almost always is in human recognition memory experiments.

What explains the existence of a mirror effect in this case? Figure 7 presents sense-of-prior-occurrence distributions for three retention interval conditions (short, medium, and long). These distributions are exactly analogous to the familiarity distributions previously discussed in relation to human recognition memory, and they represent the bird's subjective sense that a sample was presented. That subjective sense is assumed to vary from trial to trial on both sample and no-sample trials (according to Gaussian or Gaussian-like distributions), even for a constant retention interval. Some sense of prior occurrence is evident even on no-sample trials, perhaps owing to the lingering influence of samples presented on earlier trials or to inherent noise in the memory system.

When the retention interval is short (upper panel), the sample distribution falls far to the right because, despite some variability, the subjective sense of prior occurrence will generally be quite high (and will fall well above the decision criterion). Thus, on the majority of sample trials, the subject would respond correctly (yes). When the retention interval increases (and the sense of prior occurrence weakens), the sample distribution gradually shifts toward the no-sample distribution (middle panel). The no-sample distribution does not change as a function of retention interval, because there is no memory trace established on those trials (so there is no trace that weakens with retention interval). When the retention interval is very long, so that memory for the sample fades completely, the sample and no-sample distributions will overlap, so that performance on sample and no-sample trials will be indistinguishable (a situation that is approached in the lower panel).

Table 1
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Between Conditions

Retention Interval (sec)    Hit Rate    FA Rate    d′
0.5                         .94         .10        2.83
2.0 (1st)                   .84         .08
2.0                         .88         .14        2.26
6.0 (1st)                   .60         .14
6.0                         .88         .18        2.10
12.0 (1st)                  .67         .19
12.0                        .74         .19        1.52
24.0 (1st)                  .46         .24
24.0                        .65         .28        0.97

Note. The data were taken from Experiment 2 of Wixted (1993). The values in boldface (the rows not marked “1st”) were averaged over the last five sessions of each condition and over the 4 subjects. The values in standard typeface (the rows marked “1st”) are from the first session of each new retention interval condition (except for the 0.50-sec condition).

Because each retention interval was in effect for 15 sessions, the birds had time to adjust the location of the decision criterion in light of the reinforcement consequences for doing so. For an equal-variance model, reinforcers will be maximized when the criterion is placed midway between the two distributions. Leaving a particular retention interval in effect for a prolonged period gives the birds time to realize that and to adjust the decision criterion accordingly. Because the birds adjust the decision criterion in such a way as to maximize reinforcement, their performance exhibits a mirror effect.
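The claim that reinforcers are maximized when the criterion sits midway between the two distributions can be checked numerically. The sketch below is ours, not the authors' analysis. It assumes unit-variance Gaussian distributions, equal numbers of sample and no-sample trials, and reinforcement of every correct choice; the function names and the grid range are illustrative.

```python
from statistics import NormalDist

def expected_reinforcers(criterion, d_prime):
    """Proportion of trials reinforced when 'yes' is chosen whenever f > criterion.

    Assumes equal numbers of sample and no-sample trials, unit-variance
    Gaussian distributions, and reinforcement of every correct choice.
    """
    hit = 1 - NormalDist(d_prime, 1).cdf(criterion)       # correct 'yes' on sample trials
    correct_rejection = NormalDist(0, 1).cdf(criterion)   # correct 'no' on no-sample trials
    return 0.5 * hit + 0.5 * correct_rejection

def best_criterion(d_prime, step=0.01):
    """Grid search for the criterion that maximizes expected reinforcers."""
    grid = [i * step for i in range(-300, 601)]
    return max(grid, key=lambda c: expected_reinforcers(c, d_prime))

for dp in (2.8, 1.5, 1.0):   # hypothetical d' values
    print(dp, round(best_criterion(dp), 2))
```

Under these assumptions, the maximizing criterion lands at approximately d′/2 for each value of d′, so a bird that tracks reinforcement should move its criterion leftward as d′ falls.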

That the birds actually do shift the criterion in light of the reinforcement consequences is perhaps seen most easily by considering how they behave when a retention interval is first increased from one condition to the next. The relevant data are presented in standard typeface in Table 1. Note that each time a retention interval was increased (e.g., from 0.5 to 2 sec or from 2 to 6 sec), the hit rate dropped immediately (and precipitously), whereas the false alarm rate generally did not change. Over the next 15 sessions, both the hit rate and, to a lesser extent, the false alarm rate increased.

What explains that pattern? Presumably, on the first day that a new, longer retention interval was introduced, the effect on the mean of the sample distribution was immediate. That is, because of the longer retention interval, the memory trace of the sample faded to a greater extent, leading to a less intense sense of prior occurrence, on average. But that was the only immediate effect. The mean of the no-sample distribution did not change when the retention interval was increased, because there was no memory trace established on such trials to fade away. The criterion did not change, because the bird did not have enough time to learn where the optimal placement of the decision criterion might be in order to yield a higher number of reinforcers. As time passed, though, the bird learned to shift the criterion to the left on the sense-of-prior-occurrence axis (as is shown in Figure 7), thereby increasing both the hit rate and the false alarm rate.
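The two-stage account just described (an immediate drop in the sample mean, followed by a gradual criterion shift) can be illustrated with a small sketch. The d′ values below are hypothetical, the distributions are assumed to be unit-variance Gaussians, and the trained criterion is assumed to sit at d′/2; none of these numbers come from the article.

```python
from statistics import NormalDist

def rates(d_prime, criterion):
    """Hit and false alarm rates for unit-variance Gaussians and a given criterion."""
    hit = 1 - NormalDist(d_prime, 1).cdf(criterion)
    fa = 1 - NormalDist(0, 1).cdf(criterion)
    return hit, fa

old_d, new_d = 2.2, 1.5    # hypothetical d' before and after the delay is lengthened
old_c = old_d / 2          # criterion learned under the old condition

before = rates(old_d, old_c)               # end of the old condition
first_session = rates(new_d, old_c)        # weaker trace, criterion not yet moved
after_training = rates(new_d, new_d / 2)   # criterion shifted left to the new midpoint

for label, (hit, fa) in [("before", before), ("first session", first_session),
                         ("after training", after_training)]:
    print(f"{label:15s} hit = {hit:.2f}  FA = {fa:.2f}")
```

In this sketch the false alarm rate is unchanged in the first session while the hit rate drops, and both rise once the criterion moves left. The equal-variance assumption makes the two increases symmetric, which is not what the birds' data show.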

Figure 7. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated between conditions. The location of the decision criterion, C, changes with each condition.

The increase in the hit and the false alarm rates as a function of training at a particular retention interval is unmistakably asymmetric. That is, generally speaking, the hit rate increased a lot, whereas the false alarm rate increased only a little. Part of the explanation for this is that the appropriate detection model for this situation is not the equal-variance model we have used throughout this article, but an unequal-variance model (so that the no-sample distribution is more variable than the sample distribution). A criterion sweeping across a narrow sample distribution will affect the hit rate to a much greater extent than a criterion sweeping across a wide no-sample distribution will affect the false alarm rate. In fact, in a separate experiment, Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.
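The remark that an unequal-variance Gaussian model produces two points at which the likelihood ratio equals 1.0 can be verified directly. Setting ln p(f | S) − ln p(f | NS) = 0 gives a quadratic in f. The sketch below solves it; the function name and the parameter values (d′ = 2.0, standard deviations of 1.0 and 1.25) are hypothetical choices of ours, not estimates from the article.

```python
import math
from statistics import NormalDist

def equal_likelihood_points(d_prime, sd_sample, sd_no_sample):
    """Values of f where p(f|S) = p(f|NS) for Gaussians N(d', sd_S) and N(0, sd_NS)."""
    # ln p(f|S) - ln p(f|NS) = 0 rearranges to a*f**2 + b*f + c = 0.
    a = 1 / (2 * sd_no_sample**2) - 1 / (2 * sd_sample**2)
    b = d_prime / sd_sample**2
    c = -d_prime**2 / (2 * sd_sample**2) + math.log(sd_no_sample / sd_sample)
    if a == 0:   # equal variances: a single crossing at d'/2
        return [-c / b]
    root = math.sqrt(b**2 - 4 * a * c)
    return sorted([(-b + root) / (2 * a), (-b - root) / (2 * a)])

# No-sample distribution more variable than the sample distribution (hypothetical values)
print(equal_likelihood_points(2.0, 1.0, 1.25))
# Equal variances: one crossing, at d'/2
print(equal_likelihood_points(2.0, 1.0, 1.0))
```

With unequal variances the wider distribution dominates in both tails, so the densities cross twice; with equal variances the quadratic term vanishes and only the midpoint remains.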

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted (1993) are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of d′.

Table 2
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)    Hit Rate    FA Rate    d′
0.5                         .94         .14        2.63
2.0                         .93         .20        2.32
6.0                         .73         .17        1.56
12.0                        .63         .16        1.32

Note. The data were taken from Experiment 1 of Wixted (1993).

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

White and Wixted’s (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird’s (or a human’s) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird’s behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to f in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of f was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given f is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given f. According to White and Wixted’s model, birds learn the relative probabilities of reinforcement for each value of f along the sense-of-prior-occurrence axis.² If, for a given value of f, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of f was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of f.
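The four-to-one example above can be written out as a short sketch of the matching law. The function name and the two reinforcement probabilities (.8 and .2) are hypothetical values chosen only so that their ratio is 4; they are not taken from the article.

```python
def choice_probability(p_reinf_sample, p_reinf_no_sample):
    """Matching law: the choice proportion equals the relative reinforcer probability."""
    return p_reinf_sample / (p_reinf_sample + p_reinf_no_sample)

# Reinforcement is four times as likely for 'sample' as for 'no sample' at this f
p_sample = choice_probability(0.8, 0.2)
odds = p_sample / (1 - p_sample)   # behavior ratio, sample : no sample
print(p_sample, odds)
```

The behavior ratio comes out at about 4 to 1, matching the reinforcer ratio, without any computation over the underlying distributions.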

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of f is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution, with mean d′ and standard deviation σ, is given by

$$p(f \mid \mathrm{S}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5\,(f - d')^{2}/\sigma^{2}} \tag{5}$$

and the no-sample (or lure) distribution, with mean 0 and standard deviation σ, is given by

$$p(f \mid \mathrm{NS}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5\,(f - 0)^{2}/\sigma^{2}} \tag{6}$$

where f now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and p(f | S) and p(f | NS) represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio p(f | S)/p(f | NS) on a given trial. What the pigeons instead learn, according to White and Wixted’s (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given f, which will be denoted as p(RS | f), and the probability that a reinforcer is arranged on the no-sample choice alternative given f, which will be denoted as p(RNS | f). Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given f is

$$p(R_{\mathrm{S}} \mid f) = \left[\frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})}\right] r_{\mathrm{S}} \tag{7}$$

where rS is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced (rS is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given f is

$$p(R_{\mathrm{NS}} \mid f) = \left[\frac{p(f \mid \mathrm{NS})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})}\right] r_{\mathrm{NS}} \tag{8}$$

where rNS is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced (rNS is also usually 1.0).

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, f, on the sense-of-prior-occurrence axis.


White and Wixted’s (1999) model assumes that pigeons learn p(RS | f) and p(RNS | f) during the course of training and that, in light of this information, they respond to the choice alternatives in accordance with the matching law. If p(BS | f) represents the probability of choosing the sample choice alternative given f, and p(BNS | f) represents the probability of choosing the no-sample choice alternative given f, then

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{p(R_{\mathrm{S}} \mid f)}{p(R_{\mathrm{NS}} \mid f)} \tag{9}$$

which, after replacing p(RS | f) and p(RNS | f) with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \left(\frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}}\right)\left[\frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})}\right] \tag{10}$$

These equations state that the behavior ratio given f is equal to the reinforcer ratio given f, which is essentially the matching law for each value of f along the sense-of-prior-occurrence axis.

If rS = rNS = 1.0 (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})}$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing p(f | S) and p(f | NS) with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{\left[\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5\,(f - d')^{2}/\sigma^{2}}\right]}{\left[\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5\,(f - 0)^{2}/\sigma^{2}}\right]}$$

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when rS = rNS, a likelihood ratio computation is not assumed to be governing the bird’s behavior. What drives the bird’s choice, according to White and Wixted’s (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of f was generated by a sample, as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted’s account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives, depending on what f happens to be. What f happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.

There is some point on the sense-of-prior-occurrence axis (i.e., some value of f) that leads the bird to be indifferent between the two choice alternatives, because at that point, the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, p(BS | f) = p(BNS | f). If rS = rNS = 1.0, it is easily shown that the value of f that yields indifference is d′/2. If p(BS | f) = p(BNS | f), which is to say that p(BS | f)/p(BNS | f) = 1, then, according to Equation 10,

$$1 = \left(\frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}}\right)\left[\frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})}\right]$$

Solving this equation for f provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for p(f | S) and p(f | NS) from Equations 5 and 6 and canceling out the 1/√(2πσ²) terms that appear in the numerator and denominator yields

$$1 = \left(\frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}}\right)\left[\frac{e^{-0.5\,(f - d')^{2}}}{e^{-0.5\,(f - 0)^{2}}}\right]$$

(assuming that σ = 1, for simplicity). When solved for f, this equation yields

$$f = \frac{d'}{2} + \frac{\ln\!\left(r_{\mathrm{NS}}/r_{\mathrm{S}}\right)}{d'} \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if rS = rNS, so that ln(rNS/rS) = 0, the indifference point occurs at d′/2. This familiarity value corresponds to the indifference point and could be represented as f1:1 = d′/2, where the subscript 1:1 represents the ratio of programmed reinforcer probabilities (rNS:rS). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions, even when d′ changes (thereby yielding a mirror effect). The difference is that White and Wixted’s (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of f. In this case, the pigeon has learned that when f = d′/2, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than d′/2, the odds favor a reinforcer’s being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than d′/2, the odds favor a reinforcer’s being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same: namely, that a mirror effect should be observed.

Again, although we assume that the indifference point represents the learned value of f associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject’s learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.
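The chain from Equations 5 and 6 through Equation 11 can be checked numerically. The sketch below is an illustration under the same assumptions as the text (Gaussian distributions with σ = 1 and equally likely sample and no-sample trials); the function names, the value d′ = 1.5, and the reinforcement probabilities are hypothetical choices of ours.

```python
import math
from statistics import NormalDist

def behavior_ratio(f, d_prime, r_s=1.0, r_ns=1.0, sigma=1.0):
    """p(B_S|f) / p(B_NS|f) under the matching law (Equations 7-10)."""
    p_f_s = NormalDist(d_prime, sigma).pdf(f)   # Equation 5
    p_f_ns = NormalDist(0.0, sigma).pdf(f)      # Equation 6
    total = p_f_s + p_f_ns
    p_r_s = p_f_s / total * r_s                 # Equation 7
    p_r_ns = p_f_ns / total * r_ns              # Equation 8
    return p_r_s / p_r_ns                       # Equation 9

def indifference_point(d_prime, r_s=1.0, r_ns=1.0):
    """Equation 11 (sigma = 1): f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2 + math.log(r_ns / r_s) / d_prime

d = 1.5                                                # hypothetical d'
f_equal = indifference_point(d)                        # r_S = r_NS
f_biased = indifference_point(d, r_s=0.1, r_ns=1.0)    # r_NS = 10 r_S
print(f_equal, behavior_ratio(f_equal, d))
print(f_biased, behavior_ratio(f_biased, d, r_s=0.1, r_ns=1.0))
```

In both cases the behavior ratio at the computed indifference point is 1 (up to rounding), and with equal reinforcement probabilities the behavior ratio at any f equals the likelihood ratio, even though the code only ever combines learned reinforcer probabilities.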

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if rNS > rS), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in d′ on the location of the confidence criteria discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as d′ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., rNS = 1.0 and rS = .1). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence that the sample occurred before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when rNS = 10rS? Substituting 10 for rNS/rS in Equation 11 and solving for f yields

$$f_{10:1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}$$

In other words, f10:1 (the indifference point when rNS = 10rS) occurs at a point higher than d′/2, but by an amount that varies with d′. The more general version of this equation is f10:1 = d′/2 + k1/d′, where k1 is the constant ln(rNS/rS) (here, ln 10 ≈ 2.30). Similarly, if rS = 10rNS, then

$$f_{1:10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}$$

The more general version of this equation is f1:10 = d′/2 + k2/d′, where k2 is the constant ln(rNS/rS) (here, ln 0.10 ≈ −2.30). This model predicts that the distance between f10:1 and f1:10 on the sense-of-prior-occurrence axis will be inversely related to d′. That distance (i.e., f10:1 minus f1:10) is given by

$$f_{10:1} - f_{1:10} = \left(\frac{d'}{2} + \frac{k_{1}}{d'}\right) - \left(\frac{d'}{2} + \frac{k_{2}}{d'}\right) = \frac{k_{1} - k_{2}}{d'}$$

Thus, as d′ becomes very large, f10:1 and f1:10 should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon’s sensitivity to variations in relative reinforcer probabilities should be low when d′ is large and high when d′ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When d′ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of rNS to rS) had little effect on behavior. When d′ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as d′ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as d′ decreased (because that is just another way of saying that sensitivity to reinforcement increased as d′ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism’s reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and, more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider
again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})} > 10$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of f, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)} > 10$$

where p(ROld | f) represents the past probability that, given this value of f, socially reinforcing feedback for an old response occurred, and p(RNew | f) represents the past probability that, given this value of f, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)} \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \left(\frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}}\right)\left[\frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})}\right] \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left(\frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}}\right)\left[\frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})}\right] > 10$$

where rOld and rNew represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to rS and rNS, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of rS and rNS in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, rOld = rNew. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of f along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of f.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if rNS = 10rS, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at d′/2, but at a higher point, namely, d′/2 + 2.30/d′. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific f → reinforcement outcome relations, but instead that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog’s neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog’s training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.
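The fan-out prediction discussed earlier in this section, in which the distance between the 10:1 and 1:10 indifference points is inversely related to d′, can be sketched numerically from Equation 11. The code below is an illustration under the text's simplifying assumption that σ = 1; the function names and the particular d′ values are hypothetical.

```python
import math

def indifference_point(d_prime, reinforcer_ratio):
    """Equation 11 with sigma = 1; reinforcer_ratio is r_NS / r_S."""
    return d_prime / 2 + math.log(reinforcer_ratio) / d_prime

def criterion_spread(d_prime, ratio=10.0):
    """Distance f_10:1 - f_1:10 = (k1 - k2) / d', with k1 = ln(ratio), k2 = -ln(ratio)."""
    return indifference_point(d_prime, ratio) - indifference_point(d_prime, 1 / ratio)

for dp in (3.0, 2.0, 1.0, 0.5):   # hypothetical d' values
    print(f"d' = {dp:3.1f}  spread = {criterion_spread(dp):.2f}")
```

The spread equals 2 ln(10)/d′, so it grows without bound as d′ approaches 0 and shrinks toward 0 as d′ becomes large, which is the same pattern the confidence criteria show in human data.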

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism’s neural network without detailing the origins of the relevant distributional information. A neural network may indeed perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject’s reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising indeed if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject's neural network but in the subject's history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that "the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects" (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals, whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of "nothing" in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion's distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value, f, is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)

Page 6: Wixted, J. T. & Gaitan, S. (2002)


the target distribution than the lure distribution receives an old response with high confidence. Different subjects might use different specific odds ratios before making a high-confident response, but we will assume that O_H falls where the likelihood ratio is 10 to 1, for the sake of simplicity. Similarly, the N_H criterion (i.e., the criterion for saying new with high confidence) might be placed in such a way as to maintain a small likelihood ratio, such as 1 to 10 (.1). That is, any item with only a 1 in 10 chance (or less) of having been drawn from the target distribution receives a no response with high confidence. As d′ decreases, this model assumes that the criteria move in such a way as to maintain these confidence-specific likelihood ratios.

How could subjects possibly adjust the criteria in this way? The likelihood ratio model assumes that the subject not only has information about the location of the target and lure distributions, but also has information about their mathematical forms. Where the latter information might come from is usually underspecified (and is the main focus of this article). If the target and lure distributions are Gaussian with means of d′ and 0, respectively, and a common standard deviation of σ, the probability of encountering a test item with a familiarity of f given that it is a target, p(f | target), is

\[
p(f \mid \text{target}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - d')^{2} / \sigma^{2}}, \tag{1}
\]

and the probability of encountering a test item with a familiarity of f given that it is a lure, p(f | lure), is

\[
p(f \mid \text{lure}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - 0)^{2} / \sigma^{2}}, \tag{2}
\]

where f is a value along the familiarity axis. The mean of the lure distribution is arbitrarily set to 0 for the sake of simplicity. All other values (e.g., the mean of the target distribution and the location of the decision criterion) are measured with respect to the mean of the lure distribution. The mean of the target distribution, d′, represents the number of standard deviations the mean of that distribution falls above the mean of the lure distribution. Indeed, whether we are considering the mean of the target distribution or the location of the decision and confidence criteria, the values represent the number of standard deviation units above (or below) the mean of the lure distribution.

According to likelihood ratio models of human recognition memory, subjects use their knowledge of the mathematical forms of the target and lure distributions to place the old/new decision criterion, C, at the point on the familiarity axis where the height of the signal distribution equals that of the noise distribution; that is, where the likelihood ratio, p(f | target) / p(f | lure), is equal to 1. This is the optimal location of the decision criterion because this is where the odds that the test item was drawn from the target distribution exactly equal the odds that it was drawn from the lure distribution. For any familiarity value greater than that, the odds that the item was drawn from the target distribution (i.e., that it appeared on the list) are better than even, in which case an old response makes sense. As was indicated above, the remaining confidence criteria are placed in such a way as to maintain specific odds ratios that are greater than or less than 1. If O_H is associated with a likelihood ratio of 10 to 1, that confidence criterion is placed on the familiarity axis at the point where the height of the target distribution is 10 times the height of the lure distribution, regardless of the value of d′.

Figure 4. Illustration of the placement of confidence criteria in a signal detection framework. The old/new decision criterion is represented by C, and the remaining confidence criteria are represented by O_M, O_H, N_M, and N_H. Familiarity values falling above O_M or O_H receive medium- or high-confident old responses (respectively), whereas familiarity values that fall below N_M or N_H receive medium- or high-confident new responses (respectively).

For any given likelihood ratio, L, the criterion is placed on the familiarity axis at the point, f, that satisfies the following equation:

\[
L(f) = \frac{p(f \mid \text{target})}{p(f \mid \text{lure})}. \tag{3}
\]

The right side of the equation is simply the ratio of the height of the target (signal plus noise) distribution to the height of the lure (noise) distribution at the point f on the familiarity axis. That is, using the identities expressed in Equations 1 and 2,

\[
L(f) = \frac{\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - d')^{2} / \sigma^{2}}}{\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - 0)^{2} / \sigma^{2}}},
\]

which, after setting σ to 1 for the sake of simplicity and rearranging terms, simplifies to

\[
L(f) = e^{-.5 (f - d')^{2} + .5 f^{2}}.
\]

Solving this equation for f yields

\[
f = \frac{d'}{2} + \frac{\ln[L(f)]}{d'}. \tag{4}
\]

To determine where a particular criterion should be placed on the familiarity axis to satisfy a particular odds ratio, one need only enter the value of L theoretically associated with that criterion. For example, theoretically, the old/new decision criterion, C, is placed at the point where the odds ratio is 1, which is to say L(f) = 1. Substituting 1 for L(f) (and C for f) in Equation 4 reveals that C = d′/2. That is, according to this model, the decision criterion should be placed midway between the target and the lure distributions, no matter what the value of d′ is. If the criterion were always placed in that location in order to always maintain a likelihood ratio of 1, a mirror effect would always be observed (and it almost always is). If O_H is placed at the point on the familiarity axis where the odds are 10 to 1 in favor of the item's having been drawn from the target distribution, Equation 4 indicates that

\[
O_H = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.
\]

In other words, O_H is placed at a point higher than d′/2, but by an amount that varies with d′. The more general version of this equation is O_H = d′/2 + k1/d′, where k1 is the constant log likelihood ratio associated with O_H (which is ln(10) in this example, but could differ for different subjects). Similarly, if N_H is placed at the point where the odds are only 1 in 10 in favor of the item's having been drawn from the target distribution, Equation 4 indicates that

\[
N_H = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.
\]

The more general version of this equation is N_H = d′/2 + k2/d′, where k2 is the constant log likelihood ratio associated with N_H. This model predicts that the distance between O_H and N_H on the familiarity axis will be inversely related to d′. That distance (i.e., O_H minus N_H) is given by

\[
O_H - N_H = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}.
\]

Thus, as d′ becomes very large, O_H and N_H should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two criteria should move infinitely far apart.

The upshot of all of this is that when the situation is represented with a familiarity scale, likelihood ratio models predict that the decision criterion (C) should move in such a way that the hit rate decreases and the false alarm rate increases as d′ decreases (i.e., a mirror effect should be observed), because C will always be placed at d′/2 on the familiarity axis, since that is the point where the odds ratio is equal to 1. The more general version of this account is that all of the confidence criteria should diverge as d′ decreases (as in the lower panel of Figure 5). Mirror effects are usually observed in real data, and as will be described in more detail next, Stretch and Wixted (1998a) showed that the confidence criteria move in essentially the manner predicted by the likelihood ratio account as well.

Figure 5. Illustration of the movement of the confidence criteria as a function of d′ according to the lockstep, range, and likelihood ratio models (upper, middle, and lower panels, respectively). In each panel, the top figure represents a strong condition and the bottom figure represents a weak condition.

How are the locations of the decision criterion and the various confidence criteria inferred from actual behavioral data? The process is not as mysterious as it might seem. Consider first how d′ is computed. For a given hit and false alarm rate, only one arrangement of the target and lure distributions and the criterion is possible, assuming an equal-variance detection model. For example, symmetrical hit and false alarm rates of .84 and .16 yield a d′ of 2.0, with a criterion placement of d′/2 (i.e., C = 1.0 in this example). Standard deviation units are used in both cases. That is, the only way to represent these hit and false alarm rates in an equal-variance signal detection model is for the means of the two distributions to be separated by two standard deviations and for the criterion to be placed one standard deviation above the mean of the lure distribution. No other placement of the criterion would correspond to a false alarm rate of .16, and given that, no other placement of the mean of the target distribution would yield a hit rate of .84. In practice, standard equations exist for making the relevant computations:

\[
d' = z(\text{HIT}) - z(\text{FA})
\]

and

\[
C = -z(\text{FA}).
\]

Thus, in this example, C = −z(.16) = −(−1.0) = 1.0, which is to say that the criterion is placed one standard deviation above the mean of the lure distribution. This method of computing the placement of C involves all old responses to lures with a confidence rating of maybe old or greater (which is to say, all old responses to lures: +, ++, or +++). A very similar method can be used to determine the locations of the various confidence criteria. To determine where the O_M confidence criterion is placed, for example, one can use only those responses to lures that received a confidence rating of probably old or greater (i.e., all false alarms committed by choosing the ++ or +++ alternatives). From these responses, a new false alarm rate can be computed, and the location of the O_M criterion can be estimated by converting that false alarm rate into a z score (exactly as one usually does for old responses that exceed the old/new decision criterion). In practice, a more comprehensive method is used (one that allows for unequal variance and that takes into account confidence-specific hit rates as well), but the method just described works about as well and is conceptually much simpler (Stretch & Wixted, 1998a). The locations of the various confidence criteria on the familiarity axis are simply the cumulative confidence-specific false alarm rates converted to a z score.

Stretch and Wixted (1998a) tested the predictions of the likelihood ratio account with a very simple experiment. In one condition, subjects studied lists of words that were presented quite slowly, after which they received a recognition test (this was the strong condition). In another, they studied lists of words that were presented more rapidly, followed by a recognition test (the weak condition). For both conditions, the locations of the various confidence criteria were computed in essentially the manner described above. As uniquely predicted by the likelihood ratio account, the confidence criteria fanned out on the familiarity axis as d′ decreased. Figure 6 presents the relevant estimates of the locations of the confidence criteria in the strong and the weak conditions. Note that whereas the location of the decision criterion, C, decreased on the familiarity axis as conditions changed from strong to weak, the location of the high-confident old criterion, O_H, remained essentially constant, which means that even though the overall false alarm rate increased in the weak condition, the high-confident false alarm rate did not. This phenomenon is uniquely predicted by the likelihood ratio account.

Thus, whether we consider the basic mirror effect or the more complete picture provided by ratings of confidence, subjects behave as if they are able to maintain essentially (although not necessarily exactly) constant likelihood ratios across conditions. The fact that subjects are able to do this is rather mysterious, at least on the surface, because it is not clear how they would have acquired the detailed information it requires (and likelihood ratio models typically have little to say about this key issue).
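The criterion placements implied by Equation 4, and the z-score computations described above, can be sketched in a few lines. This is only an illustration under the equal-variance assumption with the standard deviation set to 1. The function names and the particular d′ values are hypothetical, and the likelihood ratios of 10, 1, and .1 are the illustrative values used in the text.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf  # converts a proportion into a z score


def criterion_location(d_prime, likelihood_ratio):
    """Equation 4 with sigma = 1: f = d'/2 + ln(L)/d'."""
    return d_prime / 2.0 + math.log(likelihood_ratio) / d_prime


def d_prime_from_rates(hit_rate, fa_rate):
    """d' = z(HIT) - z(FA)."""
    return z(hit_rate) - z(fa_rate)


def criterion_from_fa(fa_rate):
    """C = -z(FA), measured from the mean of the lure distribution."""
    return -z(fa_rate)


# The criteria fan out as d' decreases (hypothetical d' values).
for d_prime in (2.0, 1.0, 0.5):
    n_h = criterion_location(d_prime, 0.1)
    c = criterion_location(d_prime, 1.0)
    o_h = criterion_location(d_prime, 10.0)
    print(f"d'={d_prime}: N_H={n_h:.2f}, C={c:.2f}, O_H={o_h:.2f}, O_H-N_H={o_h - n_h:.2f}")

# The worked example from the text: hit and false alarm rates of .84 and .16.
print(d_prime_from_rates(0.84, 0.16), criterion_from_fa(0.16))
```

The distance O_H − N_H printed for each d′ equals (k1 − k2)/d′ with k1 = ln(10) and k2 = ln(.1), so halving d′ doubles the spread. The .84/.16 example returns values very close to, though not exactly, 2.0 and 1.0, because those rates are rounded.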

Within-list strength manipulations. It is worth noting that, although subjects often behave in accordance with the predictions of likelihood ratio models, there are conditions under which they do not. All this really means is that it is a mistake to think of humans as being fully informed likelihood ratio computers. For the sake of completeness, we will turn now to a brief discussion of the conditions under which subjects do not behave in the manner predicted by likelihood ratio accounts, after which we will return to the central question of how it is that they often do.

Strength is usually manipulated between lists to produce a mirror effect, but Stretch and Wixted (1998b) also manipulated strength within list. In each list, half the words were colored red, and they were presented five times each. The other half were colored blue, and they appeared only once each. The red and the blue words were randomly intermixed, and the red word repetitions were randomly scattered throughout the list. On the subsequent recognition test, the red and the blue targets were randomly intermixed with red and blue lures. Obviously, the hit rate in the strong (red) condition was expected to exceed the hit rate in the weak (blue) condition, and it did. Of interest was whether or not the false alarm rate in the strong condition (i.e., to red lures) would be less than the false alarm rate in the weak condition (i.e., to blue lures). If so, the within-list strength manipulation would yield results similar to those produced by the between-list strength manipulation. However, the results reported by Stretch and Wixted (1998b) showed a different pattern. In two experiments in which strength was cued by color, the hit rate to the strong items was much higher than the hit rate to the weak items (as was expected), but the false alarm rates were nearly identical. Similar results, when other list materials were used, were also recently reported by Morrell, Gaitan, and Wixted (2002). These results correspond to the situation depicted earlier in the upper panel of Figure 2, according to which the decision criterion remains fixed on the familiarity axis as a function of strength. A fully informed likelihood ratio computer would not respond in this manner but would, instead, adjust the criterion as a function of strength in order to maintain a constant likelihood ratio across conditions (as in the between-list strength manipulation).
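The contrast between the two patterns can be made concrete with a small sketch. It assumes an equal-variance Gaussian model with unit standard deviation. The d′ values (2.0 for strong, 1.0 for weak) and the fixed criterion of 0.75 are hypothetical numbers chosen only for illustration; they are not estimates from Stretch and Wixted (1998b).

```python
from statistics import NormalDist

LURE = NormalDist(0.0, 1.0)


def hit_and_fa(d_prime, criterion):
    """Hit and false alarm rates for a given d' and criterion location."""
    hit = 1.0 - NormalDist(d_prime, 1.0).cdf(criterion)
    fa = 1.0 - LURE.cdf(criterion)
    return hit, fa


STRONG_D, WEAK_D = 2.0, 1.0  # hypothetical strengths

# Likelihood ratio account: the criterion sits at d'/2 in each condition.
lr_strong = hit_and_fa(STRONG_D, STRONG_D / 2.0)
lr_weak = hit_and_fa(WEAK_D, WEAK_D / 2.0)

# Fixed-criterion account: one criterion is shared by both conditions.
FIXED_C = 0.75  # hypothetical location
fixed_strong = hit_and_fa(STRONG_D, FIXED_C)
fixed_weak = hit_and_fa(WEAK_D, FIXED_C)

print("criterion at d'/2:", lr_strong, lr_weak)
print("fixed criterion:  ", fixed_strong, fixed_weak)
```

With the criterion at d′/2, the weak condition shows both a lower hit rate and a higher false alarm rate (a mirror effect). With a fixed criterion, only the hit rate differs and the two false alarm rates are identical, which is the pattern reported for the within-list manipulation.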

The main difference between the within- and the between-list strength manipulations lies in how rapidly the conditions change during testing. In the between-list case, the conditions change slowly (e.g., all the target items are weak after one list, and all are strong after the other). In the within-list case, the conditions change at a rapid rate (e.g., a strong test item might be followed by a weak test item, followed by a strong test item, and so on). Under such conditions, subjects tend to aggregate over those conditions rather than respond in accordance with each condition separately. We shall see later that a similar issue arises in recognition memory experiments involving animal subjects, but the important point for the time being is that these data show that organisms do not always behave as if they were fully informed likelihood ratio computers.

Even though subjects do not always behave in exactly the manner one would expect according to a likelihood ratio model, they usually do, and the question of interest concerns how they acquire the information needed to do that. More specifically, what accounts for the fact that, when strength is manipulated between lists, subjects have information available so that the confidence criteria fan out on the familiarity axis as d′ decreases in just the manner predicted by the likelihood ratio account? That is a question left largely unanswered by likelihood ratio accounts, and suggesting an answer to it is the issue we will consider next.

Figure 6. Estimates of the locations of the confidence criteria (O_H, C, and N_H) in the strong and weak conditions of a recognition memory experiment reported by Stretch and Wixted (1998a).

Mirror Effects in Pigeons

What is often overlooked in the human cognitive literature is the possibility that subjects have experienced years of making recognition decisions and expressing various levels of confidence. Presumably, those expressions of confidence have been followed by social consequences that have served to shape their behavior. Conceivably, it is those consequences that shaped behavior to mimic what a history-less likelihood ratio computer might do. We begin our investigation into this possibility by making the point that pigeons show a mirror effect, too. In pigeon research, unlike in human research, recognition memory decisions are followed by experimentally arranged consequences, and the connection between where a pigeon places its decision criterion and the consequences it has received in the past is easier to appreciate. That explicitly arranged reinforcers could influence the location of the decision criterion was recognized from the earliest days of detection theory (e.g., Green & Swets, 1966), but research into that issue never really blossomed in the literature on humans. The most influential account along these lines in the literature on animals, where relevant research did blossom, was proposed some time later by Davison and Tustin (1978). Nevertheless, it is probably fair to say that the connection between their research (and the research their ideas generated) and what a likelihood ratio theorist might say about the movement of confidence criteria across conditions is not easy to see. Making that connection explicit is what we will try to accomplish in this section and the next.

Mirror effects in pigeons can be observed by using a simple procedure in which the birds are required to discriminate the prior presence or absence of a sample stimulus. In a typical experiment of this kind, sample and no-sample trials are randomly intermixed, and the pigeon's task is to report whether or not a sample appeared earlier in the trial. On sample trials, the end of the intertrial interval (ITI) is followed by the presentation of a sample (e.g., the brief illumination of a keylight or the houselight) and then by the retention interval. On no-sample trials, by contrast, the end of the ITI is followed immediately by the onset of the nominal retention interval (i.e., no sample is presented). Following the retention interval, two choice stimuli are presented (e.g., red vs. green). A response to one choice alternative is correct on sample trials, and a response to the other is correct on no-sample trials. Pigeons learn to perform this task with high levels of accuracy when the retention interval is short, and manipulating the retention interval across conditions yields a mirror effect.

In Wixted's (1993) Experiment 2, 4 pigeons were exposed to an ascending series of retention intervals (0.50, 2, 6, 12, and 24 sec), with each retention interval condition in effect for 15 sessions. The results are shown in Table 1. The hit rate represents the probability of a correct response on sample trials (i.e., a correct declaration that the sample occurred), and the false alarm rate represents the probability of an incorrect response (i.e., an incorrect declaration that the sample occurred) on no-sample trials. The data presented in boldface in Table 1 were averaged over the last 5 sessions of each condition and over the 4 subjects, and these data constitute the main results of the experiment. The data shown in standard typeface are from the first session of each new retention interval condition (save for the 0.50-sec condition, because the birds had not yet learned the task in the 1st session of that condition). The data from the 1st session of each condition are instructive and will be discussed in more detail after we consider the main findings. The main results show that, not surprisingly, d′ decreased as the retention interval increased. The more important point is that as the hit rate decreased, the false alarm rate reliably increased. In other words, a mirror effect was observed, just as it almost always is in human recognition memory experiments.
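As a rough check on the main results, d′ and the criterion location can be recomputed from the group hit and false alarm rates using the standard equal-variance formulas, d′ = z(HIT) − z(FA) and C = −z(FA). The sketch below is an illustration, not the authors' analysis. The rates are the final-session values as read from Table 1, with decimal points restored, and the constant and function names are hypothetical. A d′ computed from averaged rates need not match exactly a d′ averaged over subjects.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # converts a proportion into a z score

# (retention interval in sec, hit rate, false alarm rate, d' listed in Table 1)
TABLE_1_FINAL_SESSIONS = [
    (0.5, 0.94, 0.10, 2.83),
    (2.0, 0.88, 0.14, 2.26),
    (6.0, 0.88, 0.18, 2.10),
    (12.0, 0.74, 0.19, 1.52),
    (24.0, 0.65, 0.28, 0.97),
]


def d_prime(hit_rate, fa_rate):
    """Equal-variance sensitivity: z(HIT) - z(FA)."""
    return z(hit_rate) - z(fa_rate)


def criterion(fa_rate):
    """Criterion location relative to the no-sample mean: -z(FA)."""
    return -z(fa_rate)


for interval, hit, fa, listed in TABLE_1_FINAL_SESSIONS:
    print(f"{interval:>5} sec: d'={d_prime(hit, fa):.2f} (listed {listed}), C={criterion(fa):.2f}")
```

Under these assumptions the recomputed d′ values track the listed ones closely, and the inferred criterion moves steadily toward the no-sample mean as the retention interval grows.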

What explains the existence of a mirror effect in this case? Figure 7 presents sense-of-prior-occurrence distributions for three retention interval conditions (short, medium, and long). These distributions are exactly analogous to the familiarity distributions previously discussed in relation to human recognition memory, and they represent the bird's subjective sense that a sample was presented. That subjective sense is assumed to vary from trial to trial on both sample and no-sample trials (according to Gaussian or Gaussian-like distributions), even for a constant retention interval. Some sense of prior occurrence is evident even on no-sample trials, perhaps owing to the lingering influence of samples presented on earlier trials or to inherent noise in the memory system.

When the retention interval is short (upper panel), the sample distribution falls far to the right because, despite some variability, the subjective sense of prior occurrence will generally be quite high (and will fall well above the decision criterion). Thus, on the majority of sample trials, the subject would respond correctly (yes). When the retention interval increases (and the sense of prior occurrence weakens), the sample distribution gradually shifts toward the no-sample distribution (middle panel). The no-sample distribution does not change as a function of retention interval, because there is no memory trace established on those trials (so there is no trace that weakens with retention interval). When the retention interval is very long, so that memory for the sample fades completely, the sample and no-sample distributions will overlap, so that performance on sample and no-sample trials will be indistinguishable (a situation that is approached in the lower panel).

Table 1
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Between Conditions

Retention Interval (sec)    Hit Rate    FA Rate    d′
0.5                         .94         .10        2.83
2.0 (1st)                   .84         .08
2.0                         .88         .14        2.26
6.0 (1st)                   .60         .14
6.0                         .88         .18        2.10
12.0 (1st)                  .67         .19
12.0                        .74         .19        1.52
24.0 (1st)                  .46         .24
24.0                        .65         .28         .97

Note. The data were taken from Experiment 2 of Wixted (1993). The values in boldface were averaged over the last five sessions of each condition and over the 4 subjects. The values in standard typeface are from the first session of each new retention interval condition (except for the 0.50-sec condition).

Because each retention interval was in effect for 15 sessions, the birds had time to adjust the location of the decision criterion in light of the reinforcement consequences for doing so. For an equal-variance model, reinforcers will be maximized when the criterion is placed midway between the two distributions. Leaving a particular retention interval in effect for a prolonged period gives the birds time to realize that and to adjust the decision criterion accordingly. Because the birds adjust the decision criterion in such a way as to maximize reinforcement, their performance exhibits a mirror effect.
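The claim that reinforcers are maximized midway between the two distributions can be checked with a simple grid search. This is a sketch under assumptions that go beyond the text: sample and no-sample trials are equally frequent, every correct response is reinforced with the same probability, and both distributions are Gaussian with unit standard deviation. The function names and grid settings are hypothetical.

```python
from statistics import NormalDist

NO_SAMPLE = NormalDist(0.0, 1.0)


def proportion_reinforced(criterion, d_prime):
    """Expected proportion of reinforced (correct) trials for a given criterion."""
    hit_rate = 1.0 - NormalDist(d_prime, 1.0).cdf(criterion)
    correct_rejection_rate = NO_SAMPLE.cdf(criterion)
    # Equal numbers of sample and no-sample trials are assumed.
    return 0.5 * hit_rate + 0.5 * correct_rejection_rate


def best_criterion(d_prime, lo=-3.0, hi=5.0, step=0.01):
    """Grid search for the criterion that maximizes reinforcement."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda c: proportion_reinforced(c, d_prime))


for d_prime in (2.5, 1.5, 0.5):
    print(d_prime, round(best_criterion(d_prime), 2))
```

For each d′ the search settles at or very near d′/2. A bird tracking reinforcement would therefore move its criterion toward the no-sample mean as d′ shrinks, which raises the false alarm rate while the hit rate falls.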

That the birds actually do shift the criterion in light of the reinforcement consequences is perhaps seen most easily by considering how they behave when a retention interval is first increased from one condition to the next. The relevant data are presented in standard typeface in Table 1. Note that each time a retention interval was increased (e.g., from 0.5 to 2 sec or from 2 to 6 sec), the hit rate dropped immediately (and precipitously), whereas the false alarm rate generally did not change. Over the next 15 sessions, both the hit rate and, to a lesser extent, the false alarm rate increased.

What explains that pattern? Presumably, on the first day that a new, longer retention interval was introduced, the effect on the mean of the sample distribution was immediate. That is, because of the longer retention interval, the memory trace of the sample faded to a greater extent, leading to a less intense sense of prior occurrence, on average. But that was the only immediate effect. The mean of the no-sample distribution did not change when the retention interval was increased, because there was no memory trace established on such trials to fade away. The criterion did not change, because the bird did not have enough time to learn where the optimal placement of the decision criterion might be in order to yield a higher number of reinforcers. As time passed, though, the bird learned to shift the criterion to the left on the sense-of-prior-occurrence axis (as is shown in Figure 7), thereby increasing both the hit rate and the false alarm rate.

The increase in the hit and the false alarm rates as a function of training at a particular retention interval is unmistakably asymmetric. That is, generally speaking, the hit rate increased a lot, whereas the false alarm rate increased only a little. Part of the explanation for this is that the appropriate detection model for this situation is not the equal-variance model we have used throughout this article, but an unequal-variance model (so that the no-sample distribution is more variable than the sample distribution). A criterion sweeping across a narrow sample distribution will affect the hit rate to a much greater extent than a criterion sweeping across a wide no-sample distribution will affect the false alarm rate. In fact, in a separate experiment, Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.

Figure 7. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated between conditions. The location of the decision criterion, C, changes with each condition.
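The complication mentioned above, that unequal-variance Gaussian distributions yield two points where the likelihood ratio equals 1.0, follows from the fact that the log likelihood ratio becomes quadratic in f. The sketch below solves that quadratic. The parameter values (a sample distribution with mean 2 and standard deviation 1, and a no-sample distribution with mean 0 and standard deviation 1.5) are hypothetical and chosen only to illustrate the point; the function name is likewise an assumption.

```python
import math
from statistics import NormalDist


def unit_likelihood_ratio_points(mu_sample, sd_sample, sd_no_sample):
    """Values of f where the sample and no-sample densities have equal height.

    Setting the log likelihood ratio to zero gives a*f**2 + b*f + c = 0, with
    the no-sample mean fixed at 0. The two standard deviations must differ,
    otherwise a is zero and the equation is linear with a single solution.
    """
    a = 1.0 / (2.0 * sd_no_sample**2) - 1.0 / (2.0 * sd_sample**2)
    b = mu_sample / sd_sample**2
    c = -mu_sample**2 / (2.0 * sd_sample**2) + math.log(sd_no_sample / sd_sample)
    discriminant = b * b - 4.0 * a * c
    if discriminant < 0:
        return []
    root = math.sqrt(discriminant)
    return sorted([(-b + root) / (2.0 * a), (-b - root) / (2.0 * a)])


points = unit_likelihood_ratio_points(mu_sample=2.0, sd_sample=1.0, sd_no_sample=1.5)
sample, no_sample = NormalDist(2.0, 1.0), NormalDist(0.0, 1.5)
for f in points:
    print(f, sample.pdf(f) / no_sample.pdf(f))
```

Both returned points have a likelihood ratio of 1, so a rule of the form "respond yes whenever the likelihood ratio exceeds 1" no longer corresponds to a single criterion on the axis. That is the difficulty the equal-variance simplification avoids.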

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant, but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted (1993) are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of d′.

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

Table 2
Hit and False Alarm (FA) Rates and $d'$ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)    Hit Rate    FA Rate    d'
0.5                         .94         .14        2.63
2.0                         .93         .20        2.32
6.0                         .73         .17        1.56
12.0                        .63         .16        1.32

Note. The data were taken from Experiment 1 of Wixted (1993).

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.
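The $d'$ scores in Table 2 can be approximately recovered from the tabled hit and false alarm rates under the equal-variance Gaussian model. The following is a minimal sketch, not the analysis code used for the original experiment: the function names `d_prime` and `criterion` are illustrative, and small discrepancies from the tabled values are to be expected because the published rates are rounded.

```python
from statistics import NormalDist

# Hit and false alarm rates from Table 2 (Experiment 1 of Wixted, 1993),
# keyed by retention interval in seconds.
TABLE_2 = {
    0.5: (0.94, 0.14),
    2.0: (0.93, 0.20),
    6.0: (0.73, 0.17),
    12.0: (0.63, 0.16),
}

# Inverse of the standard normal cumulative distribution function.
_z = NormalDist().inv_cdf


def d_prime(hit_rate, fa_rate):
    """Distance between the sample and no-sample means, in SD units."""
    return _z(hit_rate) - _z(fa_rate)


def criterion(fa_rate):
    """Criterion location C, measured from the mean of the no-sample distribution."""
    return -_z(fa_rate)


for interval, (hit, fa) in TABLE_2.items():
    print(f"{interval:5.1f} s   d' = {d_prime(hit, fa):.2f}   C = {criterion(fa):.2f}")
```

Because the criterion estimate depends only on the false alarm rate, a roughly constant false alarm rate across retention intervals translates directly into a roughly constant C, while the falling hit rate shows up as a falling $d'$.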

White and Wixted's (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird's (or a human's) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird's behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to $f$ in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of $f$ was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given $f$ is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given $f$. According to White and Wixted's model, birds learn the relative probabilities of reinforcement for each value of $f$ along the sense-of-prior-occurrence axis.² If, for a given value of $f$, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of $f$ was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of $f$.

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of $f$ is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution, with mean $d'$ and standard deviation $\sigma$, is given by

$$p(f \mid S) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - d')^2 / \sigma^2}, \tag{5}$$

and the no-sample (or lure) distribution, with mean 0 and standard deviation $\sigma$, is given by

$$p(f \mid NS) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - 0)^2 / \sigma^2}, \tag{6}$$

where $f$ now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial and $p(f \mid S)$ and $p(f \mid NS)$ represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio $p(f \mid S) / p(f \mid NS)$ on a given trial. What the pigeons instead learn, according to White and Wixted's (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given $f$, which will be denoted as $p(R_S \mid f)$, and the probability that a reinforcer is arranged on the no-sample choice alternative given $f$, which will be denoted as $p(R_{NS} \mid f)$. Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given $f$ is

$$p(R_S \mid f) = \left[ \frac{p(f \mid S)}{p(f \mid S) + p(f \mid NS)} \right] r_S, \tag{7}$$

where $r_S$ is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced ($r_S$ is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given $f$ is

$$p(R_{NS} \mid f) = \left[ \frac{p(f \mid NS)}{p(f \mid S) + p(f \mid NS)} \right] r_{NS}, \tag{8}$$

where $r_{NS}$ is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced ($r_{NS}$ is also usually 1.0).
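Equations 5 through 8 can be evaluated numerically to see how the reinforcement probabilities change along the sense-of-prior-occurrence axis. The sketch below is illustrative only: the function names, the choice of $d' = 2.0$, and the probed values of $f$ are assumptions made for the example rather than values taken from White and Wixted (1999).

```python
import math


def gaussian_pdf(f, mean, sd=1.0):
    """Height of a Gaussian density at f (the form used in Equations 5 and 6)."""
    return math.exp(-0.5 * ((f - mean) / sd) ** 2) / math.sqrt(2 * math.pi * sd ** 2)


def reinforcer_probs(f, d_prime, r_s=1.0, r_ns=1.0, sd=1.0):
    """Return (p(R_S | f), p(R_NS | f)) as in Equations 7 and 8.

    Assumes, as in the text, that sample and no-sample trials are equally likely.
    """
    p_f_given_s = gaussian_pdf(f, d_prime, sd)   # Equation 5
    p_f_given_ns = gaussian_pdf(f, 0.0, sd)      # Equation 6
    total = p_f_given_s + p_f_given_ns
    return r_s * p_f_given_s / total, r_ns * p_f_given_ns / total


# Hypothetical illustration: d' = 2.0 with both reinforcer probabilities at 1.0.
for f in (0.0, 1.0, 2.0):
    p_rs, p_rns = reinforcer_probs(f, d_prime=2.0)
    print(f"f = {f:.1f}   p(R_S|f) = {p_rs:.3f}   p(R_NS|f) = {p_rns:.3f}")
```

With equal reinforcer probabilities, the two values are equal at the midpoint between the distributions, and the sample alternative is increasingly favored as $f$ grows.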

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, $f$, on the sense-of-prior-occurrence axis.


White and Wixted's (1999) model assumes that pigeons learn $p(R_S \mid f)$ and $p(R_{NS} \mid f)$ during the course of training and, in light of this information, they respond to the choice alternatives in accordance with the matching law. If $p(B_S \mid f)$ represents the probability of choosing the sample choice alternative given $f$ and $p(B_{NS} \mid f)$ represents the probability of choosing the no-sample choice alternative given $f$, then

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(R_S \mid f)}{p(R_{NS} \mid f)}, \tag{9}$$

which, after replacing $p(R_S \mid f)$ and $p(R_{NS} \mid f)$ with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \left( \frac{r_S}{r_{NS}} \right) \left[ \frac{p(f \mid S)}{p(f \mid NS)} \right]. \tag{10}$$

These equations state that the behavior ratio given $f$ is equal to the reinforcer ratio given $f$, which is essentially the matching law for each value of $f$ along the sense-of-prior-occurrence axis.

If $r_S = r_{NS} = 1.0$ (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(f \mid S)}{p(f \mid NS)}.$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing $p(f \mid S)$ and $p(f \mid NS)$ with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - d')^2 / \sigma^2}}{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - 0)^2 / \sigma^2}}.$$

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when $r_S = r_{NS}$, a likelihood ratio computation is not assumed to be governing the bird's behavior. What drives the bird's choice, according to White and Wixted's (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of $f$ was generated by a sample as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted's account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives, depending on what $f$ happens to be. What $f$ happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.

There is some point on the sense-of-prior-occurrence axis (i.e., some value of $f$) that leads the bird to be indifferent between the two choice alternatives, because at that point the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, $p(B_S \mid f) = p(B_{NS} \mid f)$. If $r_S = r_{NS} = 1.0$, it is easily shown that the value of $f$ that yields indifference is $d'/2$. If $p(B_S \mid f) = p(B_{NS} \mid f)$, which is to say that $p(B_S \mid f) / p(B_{NS} \mid f) = 1$, then, according to Equation 10,

$$1 = \left( \frac{r_S}{r_{NS}} \right) \left[ \frac{p(f \mid S)}{p(f \mid NS)} \right].$$

Solving this equation for $f$ provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for $p(f \mid S)$ and $p(f \mid NS)$ from Equations 5 and 6 and canceling out the $1/\sqrt{2\pi\sigma^2}$ terms that appear in the numerator and denominator yields

$$1 = \left( \frac{r_S}{r_{NS}} \right) \left[ \frac{e^{-.5 (f - d')^2}}{e^{-.5 (f - 0)^2}} \right]$$

(assuming that $\sigma = 1$ for simplicity). Taking logarithms of both sides gives $\ln(r_{NS}/r_S) = f d' - d'^2/2$, so that, when solved for $f$, this equation yields

$$f = \frac{d'}{2} + \frac{\ln\left( r_{NS}/r_S \right)}{d'}. \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if $r_S = r_{NS}$, so that $\ln(r_{NS}/r_S) = 0$, the indifference point occurs at $d'/2$. This familiarity value corresponds to the indifference point and could be represented as $f_{1:1} = d'/2$, where the subscript 1:1 represents the ratio of programmed reinforcer probabilities ($r_{NS}/r_S$). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when $d'$ changes (thereby yielding a mirror effect). The difference is that White and Wixted's (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of $f$. In this case, the pigeon has learned that when $f = d'/2$, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than $d'/2$, the odds favor a reinforcer's being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than $d'/2$, the odds favor a reinforcer's being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same: namely, that a mirror effect should be observed.

Again, although we assume that the indifference point represents the learned value of $f$ associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject's learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.
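The relation between the matching-law choice rule and the indifference point can be checked numerically. The following sketch assumes the equal-variance Gaussian model with $\sigma = 1$; the function names and the example values of $d'$, $r_S$, and $r_{NS}$ are hypothetical and chosen only for illustration.

```python
import math


def log_behavior_ratio(f, d_prime, r_s=1.0, r_ns=1.0):
    """Natural log of the right side of Equation 10, with sigma = 1."""
    # Log likelihood ratio of the two Gaussians: -.5(f - d')^2 + .5(f - 0)^2.
    log_likelihood_ratio = -0.5 * (f - d_prime) ** 2 + 0.5 * f ** 2
    return math.log(r_s / r_ns) + log_likelihood_ratio


def p_choose_sample(f, d_prime, r_s=1.0, r_ns=1.0):
    """Matching-law probability of choosing the sample alternative given f."""
    ratio = math.exp(log_behavior_ratio(f, d_prime, r_s, r_ns))
    return ratio / (1.0 + ratio)


def indifference_point(d_prime, r_s=1.0, r_ns=1.0):
    """Equation 11: the value of f at which the two alternatives are chosen equally."""
    return d_prime / 2 + math.log(r_ns / r_s) / d_prime


# Hypothetical conditions: equal reinforcer probabilities, and r_NS ten times r_S.
for r_s, r_ns in ((1.0, 1.0), (0.1, 1.0)):
    f_star = indifference_point(2.0, r_s, r_ns)
    print(f"r_S = {r_s}, r_NS = {r_ns}: indifference at f = {f_star:.3f}, "
          f"p(choose sample) there = {p_choose_sample(f_star, 2.0, r_s, r_ns):.3f}")
```

Note that the choice probability is graded rather than all-or-none: under matching, values of $f$ just above the indifference point only slightly favor the sample alternative, which is one way to see that no hard criterion is being assumed.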

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if $r_{NS} > r_S$), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in $d'$ on the location of the confidence criteria discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as $d'$ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., $r_{NS} = 1.0$ and $r_S = .1$). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence that the sample occurred before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when $r_{NS} = 10 r_S$? Substituting 10 for $r_{NS}/r_S$ in Equation 11 and solving for $f$ yields

$$f_{10:1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, $f_{10:1}$ (the indifference point when $r_{NS} = 10 r_S$) occurs at a point higher than $d'/2$, but by an amount that varies with $d'$. The more general version of this equation is $f_{10:1} = d'/2 + k_1/d'$, where $k_1$ is the constant $\ln(r_{NS}/r_S)$. Similarly, if $r_S = 10 r_{NS}$, then

$$f_{1:10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is $f_{1:10} = d'/2 + k_2/d'$, where $k_2$ is the constant $\ln(r_{NS}/r_S)$ for this condition. This model predicts that the distance between $f_{10:1}$ and $f_{1:10}$ on the sense-of-prior-occurrence axis will be inversely related to $d'$. That distance (i.e., $f_{10:1}$ minus $f_{1:10}$) is given by

$$f_{10:1} - f_{1:10} = \left( \frac{d'}{2} + \frac{k_1}{d'} \right) - \left( \frac{d'}{2} + \frac{k_2}{d'} \right) = \frac{k_1 - k_2}{d'}.$$

Thus, as $d'$ becomes very large, $f_{10:1}$ and $f_{1:10}$ should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as $d'$ approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon's sensitivity to variations in relative reinforcer probabilities should be low when $d'$ is large and high when $d'$ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When $d'$ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of $r_{NS}$ to $r_S$) had little effect on behavior. When $d'$ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as $d'$ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as $d'$ decreased (because that is just another way of saying that sensitivity to reinforcement increased as $d'$ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism's reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and, more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider
again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \text{target})}{p(f \mid \text{lure})} > 10,$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of $f$, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{Old} \mid f)}{p(R_{New} \mid f)} > 10,$$

where $p(R_{Old} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for an old response occurred, and $p(R_{New} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{Old} \mid f)}{p(B_{New} \mid f)} = \frac{p(R_{Old} \mid f)}{p(R_{New} \mid f)}, \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{Old} \mid f)}{p(B_{New} \mid f)} = \left( \frac{r_{Old}}{r_{New}} \right) \left[ \frac{p(f \mid \text{target})}{p(f \mid \text{lure})} \right]. \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left( \frac{r_{Old}}{r_{New}} \right) \left[ \frac{p(f \mid \text{target})}{p(f \mid \text{lure})} \right] > 10,$$

where $r_{Old}$ and $r_{New}$ represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to $r_S$ and $r_{NS}$, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of $r_S$ and $r_{NS}$ in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, $r_{Old} = r_{New}$. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of $f$ along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of $f$.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if $r_{NS} = 10 r_S$, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at $d'/2$ but at a higher point, namely, $d'/2 + 2.30/d'$. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific $f$ → reinforcement outcome relations but, instead, that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog's neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog's training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.
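The claim that learned reinforcer odds can mimic a likelihood ratio without any knowledge of the underlying distributions can be illustrated with a small simulation. This is a sketch under strong simplifying assumptions that are not taken from the source: targets and lures are equally likely, familiarity is Gaussian with $\sigma = 1$, every correct response would have been reinforced (so that $r_{Old} = r_{New} = 1.0$), and the learner simply keeps a tally for each small interval of $f$. The value of $d'$, the bin width, and all function names are hypothetical.

```python
import math
import random


def simulate_reinforcement_history(d_prime=1.5, n_trials=100_000,
                                   bin_width=0.25, seed=0):
    """Tally, for each small interval of f, how often an 'old' response and how
    often a 'new' response would have been reinforced in the past."""
    rng = random.Random(seed)
    old_reinforced = {}  # bin index -> count of trials on which 'old' was correct
    new_reinforced = {}  # bin index -> count of trials on which 'new' was correct
    for _ in range(n_trials):
        is_target = rng.random() < 0.5
        f = rng.gauss(d_prime if is_target else 0.0, 1.0)
        b = math.floor(f / bin_width)
        tally = old_reinforced if is_target else new_reinforced
        tally[b] = tally.get(b, 0) + 1
    return old_reinforced, new_reinforced


def learned_log_odds(old_reinforced, new_reinforced, b):
    """Log of the experienced reinforcer ratio for bin b (right side of Equation 12)."""
    return math.log(old_reinforced[b] / new_reinforced[b])


def log_likelihood_ratio(f, d_prime):
    """Log of p(f | target) / p(f | lure) for equal-variance Gaussians, sigma = 1."""
    return d_prime * f - d_prime ** 2 / 2


old, new = simulate_reinforcement_history()
for b in (0, 3, 6, 9):
    center = (b + 0.5) * 0.25  # midpoint of the familiarity interval
    print(f"f ~ {center:.3f}: learned log odds = {learned_log_odds(old, new, b):+.2f}, "
          f"log likelihood ratio = {log_likelihood_ratio(center, 1.5):+.2f}")
```

The tallies never use the Gaussian formulas, yet the experienced odds in each interval track the likelihood ratio at that point on the axis, apart from sampling noise and the coarseness of the bins.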

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism's neural network without detailing the origins of the relevant distributional information. A neural network may indeed perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject's reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising indeed if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject's neural network but in the subject's history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that "the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects" (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of "nothing" in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM--retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion's distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value, $f$, is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)


sociated with that criterion. For example, theoretically, the old/new decision criterion, C, is placed at the point where the odds ratio is 1, which is to say, $L(f) = 1$. Substituting 1 for $L(f)$ (and C for $f$) in Equation 2 reveals that $C = d'/2$. That is, according to this model, the decision criterion should be placed midway between the target and the lure distributions no matter what the value of $d'$ is. If the criterion were always placed in that location in order to always maintain a likelihood ratio of 1, a mirror effect would always be observed (and it almost always is). If $O_H$ is placed at the point on the familiarity axis where the odds are 10 to 1 in favor of the item's having been drawn from the target distribution, Equation 2 indicates that

$$O_H = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, $O_H$ is placed at a point higher than $d'/2$, but by an amount that varies with $d'$. The more general version of this equation is $O_H = d'/2 + k_1/d'$, where $k_1$ is the constant log likelihood ratio associated with $O_H$ (which is $\ln(10)$ in this example, but could differ for different subjects). Similarly, if $N_H$ is placed at the point where the odds are only 1 in 10 in favor of the item's having been drawn from the target distribution, Equation 2 indicates that

$$N_H = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is $N_H = d'/2 + k_2/d'$, where $k_2$ is the constant log likelihood ratio associated with $N_H$. This model predicts that the distance between $O_H$ and $N_H$ on the familiarity axis will be inversely related to $d'$. That distance (i.e., $O_H$ minus $N_H$) is given by

$$O_H - N_H = \left( \frac{d'}{2} + \frac{k_1}{d'} \right) - \left( \frac{d'}{2} + \frac{k_2}{d'} \right) = \frac{k_1 - k_2}{d'}.$$

Thus, as $d'$ becomes very large, $O_H$ and $N_H$ should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as $d'$ approaches 0, those two criteria should move infinitely far apart.

Figure 5. Illustration of the movement of the confidence criteria as a function of $d'$, according to the lockstep, range, and likelihood ratio models (upper, middle, and lower panels, respectively). In each panel, the top figure represents a strong condition, and the bottom figure represents a weak condition.

The upshot of all of this is that when the situation is represented with a familiarity scale, likelihood ratio models predict that the decision criterion (C) should move in such a way that the hit rate decreases and the false alarm rate increases as $d'$ decreases (i.e., a mirror effect should be observed), because C will always be placed at $d'/2$ on the familiarity axis, since that is the point where the odds ratio is equal to 1. The more general version of this account is that all of the confidence criteria should diverge as $d'$ decreases (as in the lower panel of Figure 5). Mirror effects are usually observed in real data, and, as will be described in more detail next, Stretch and Wixted (1998a) showed that the confidence criteria move in essentially the manner predicted by the likelihood ratio account as well.

How are the locations of the decision criterion and thevarious confidence criteria inferred from actual behavioraldata The process is not as mysterious as it might seemConsider first how d9 is computed For a given hit and falsealarm rate only one arrangement of the target and lure dis-tributions and the criterion is possible assuming an equal-variance detection model For example symmetrical hitand false alarm rates of 84 and 16 yield a d9 of 20 witha criterion placement of d92 (ie C = 10 in this exam-ple) Standard deviation units are used in both cases Thatis the only way to represent these hit and false alarm ratesin an equal-variance signal detection model is for themeans of the two distributions to be separated by two stan-dard deviations and for the criterion to be placed one stan-dard deviation above the mean of the lure distribution Noother placement of the criterion would correspond to afalse alarm rate of 16 and given that no other placementof the mean of the target distribution would yield a hit rate

of 84 In practice standard equations exist for making therelevant computations

and

Thus in this example C = 2z(16) = 2(210) = 10which is to say that the criterion is placed one standarddeviation above the mean of the lure distribution Thismethod of computing the placement of C involves all oldresponses to lures with a confidence rating of maybe oldor greater (which is to say all old responses to lures 111 or 111) A very similar method can be used todetermine the locations of the various confidence crite-ria To determine where the OM confidence criterion isplaced for example one can use only those responses tolures that received a confidence rating of probably old orgreater (ie all false alarms committed by choosing the11 or 111 alternatives) From these responses a newfalse alarm rate can be computed and the location of theOM criterion can be estimated by converting that falsealarm rate into a z score (exactly as one usually does forold responses that exceed the oldnew decision criterion)In practice a more comprehensive method is used (onethat allows for unequal variance and that takes into accountconfidence-specific hit rates as well) but the method justdescribed works about as well and is conceptually muchsimpler (Stretch amp Wixted 1998a) The locations of the var-ious confidence criteria on the familiarity axis are simplythe cumulative confidence-specific false alarm rates con-verted to a z score

Stretch and Wixted (1998a) tested the predictions of thelikelihood ratio account with a very simple experiment Inone condition subjects studied lists of words that were pre-sented quite slowly after which they received a recognitiontest (this was the strong condition) In another they stud-ied lists of words that were presented more rapidly fol-lowed by a recognition test (the weak condition) For bothconditions the locations of the various confidence criteriawere computed in essentially the manner described aboveAs uniquely predicted by the likelihood ratio account theconfidence criteria fanned out on the familiarity axis as d9decreased Figure 6 presents the relevant estimates of thelocations of the confidence criteria in the strong and theweak conditions Note that whereas the location of the de-cision criterion C decreased on the familiarity axis as con-ditions changed from weak to strong the location of thehigh-confident old criterion OH remained essentially con-stant which means that even though the overall false alarmrate increased in the weak condition the high-confidentfalse alarm rate did not This phenomenon is uniquely pre-dicted by the likelihood ratio account

Thus whether we consider the basic mirror effect or themore complete picture provided by ratings of confidencesubjects behave as if they are able to maintain essentially(although not necessarily exactly) constant likelihood ra-tios across conditions The fact that subjects are able to do

C z= - ( ) FA

cent = -d z z( ) ( )HIT FA

O N d k

dd k

d

k k

d

H H- = cent +cent

aeligegraveccedil

oumloslashdivide

- cent +cent

aeligegraveccedil

oumloslashdivide

=-cent

2 21 2

1 2

N dd

dd

H = cent +cent

= cent -cent

20 10

22 30

ln( )

REINFORCEMENT HISTORY SURROGATES 297

this is rather mysterious at least on the surface because itis not clear how they would have acquired the detailed in-formation it requires (and likelihood ratio models typicallyhave little to say about this key issue)
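The way constant likelihood ratios translate into criterion locations can be sketched numerically. The following is a minimal illustration, not the authors' procedure: it assumes the equal-variance Gaussian model with unit standard deviation and uses the general form given in the text, in which a criterion held at a fixed odds value sits at d′/2 plus the log of those odds divided by d′. The function names and the two d′ values are hypothetical choices made only for illustration.

```python
import math


def criterion_location(d_prime: float, odds: float) -> float:
    """Point on the familiarity axis where the likelihood ratio equals `odds`.

    Assumes equal-variance Gaussians with unit standard deviation:
    lures centered on 0 and targets centered on d_prime.
    """
    if d_prime <= 0 or odds <= 0:
        raise ValueError("d_prime and odds must both be positive")
    return d_prime / 2 + math.log(odds) / d_prime


def confidence_criteria(d_prime: float, odds_old: float = 10.0,
                        odds_new: float = 0.1) -> dict:
    """Locations of N_H, C, and O_H for fixed (hypothetical) target odds."""
    return {
        "N_H": criterion_location(d_prime, odds_new),
        "C": criterion_location(d_prime, 1.0),
        "O_H": criterion_location(d_prime, odds_old),
    }


if __name__ == "__main__":
    # Illustrative d' values only; they are not estimates from any experiment.
    for label, d_prime in [("strong", 2.0), ("weak", 1.0)]:
        crit = confidence_criteria(d_prime)
        spread = crit["O_H"] - crit["N_H"]
        print(f"{label}: N_H={crit['N_H']:.2f}  C={crit['C']:.2f}  "
              f"O_H={crit['O_H']:.2f}  O_H - N_H={spread:.2f}")
```

Under these assumptions the spread between the two extreme criteria equals (k_1 − k_2)/d′, so halving d′ doubles the distance between them, which is the fanning-out pattern described above.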

Within-list strength manipulations. It is worth noting that although subjects often behave in accordance with the predictions of likelihood ratio models, there are conditions under which they do not. All this really means is that it is a mistake to think of humans as being fully informed likelihood ratio computers. For the sake of completeness, we will turn now to a brief discussion of the conditions under which subjects do not behave in the manner predicted by likelihood ratio accounts, after which we will return to the central question of how it is that they often do.

Strength is usually manipulated between lists to produce a mirror effect, but Stretch and Wixted (1998b) also manipulated strength within list. In each list, half the words were colored red, and they were presented five times each. The other half were colored blue, and they appeared only once each. The red and the blue words were randomly intermixed, and the red word repetitions were randomly scattered throughout the list. On the subsequent recognition test, the red and the blue targets were randomly intermixed with red and blue lures. Obviously, the hit rate in the strong (red) condition was expected to exceed the hit rate in the weak (blue) condition, and it did. Of interest was whether or not the false alarm rate in the strong condition (i.e., to red lures) would be less than the false alarm rate in the weak condition (i.e., to blue lures). If so, the within-list strength manipulation would yield results similar to those produced by the between-list strength manipulation. However, the results reported by Stretch and Wixted (1998b) showed a different pattern. In two experiments in which strength was cued by color, the hit rate to the strong items was much higher than the hit rate to the weak items (as was expected), but the false alarm rates were nearly identical. Similar results, when other list materials were used, were also recently reported by Morrell, Gaitan, and Wixted (2002). These results correspond to the situation depicted earlier in the upper panel of Figure 2, according to which the decision criterion remains fixed on the familiarity axis as a function of strength. A fully informed likelihood ratio computer would not respond in this manner but would, instead, adjust the criterion as a function of strength in order to maintain a constant likelihood ratio across conditions (as in the between-list strength manipulation).

The main difference between the within- and the between-list strength manipulations lies in how rapidly the conditions change during testing. In the between-list case, the conditions change slowly (e.g., all the target items are weak after one list, and all are strong after the other). In the within-list case, the conditions change at a rapid rate (e.g., a strong test item might be followed by a weak test item, followed by a strong test item, and so on). Under such conditions, subjects tend to aggregate over those conditions, rather than respond in accordance with each condition separately. We shall see later that a similar issue arises in recognition memory experiments involving animal subjects, but the important point for the time being is that these data show that organisms do not always behave as if they were fully informed likelihood ratio computers.

Figure 6. Estimates of the locations of the confidence criteria (O_H, C, and N_H) in the strong and weak conditions of a recognition memory experiment reported by Stretch and Wixted (1998a).

Even though subjects do not always behave in exactly the manner one would expect according to a likelihood ratio model, they usually do, and the question of interest concerns how they acquire the information needed to do that. More specifically, what accounts for the fact that when strength is manipulated between lists, subjects have information available so that the confidence criteria fan out on the familiarity axis as d′ decreases in just the manner predicted by the likelihood ratio account? That is a question left largely unanswered by likelihood ratio accounts, and suggesting an answer to it is the issue we will consider next.

Mirror Effects in Pigeons

What is often overlooked in the human cognitive literature is the possibility that subjects have experienced years of making recognition decisions and expressing various levels of confidence. Presumably, those expressions of confidence have been followed by social consequences that have served to shape their behavior. Conceivably, it is those consequences that shaped behavior to mimic what a history-less likelihood ratio computer might do. We begin our investigation into this possibility by making the point that pigeons show a mirror effect, too. In pigeon research, unlike in human research, recognition memory decisions are followed by experimentally arranged consequences, and the connection between where a pigeon places its decision criterion and the consequences it has received in the past is easier to appreciate. That explicitly arranged reinforcers could influence the location of the decision criterion was recognized from the earliest days of detection theory (e.g., Green & Swets, 1966), but research into that issue never really blossomed in the literature on humans. The most influential account along these lines in the literature on animals, where relevant research did blossom, was proposed some time later by Davison and Tustin (1978). Nevertheless, it is probably fair to say that the connection between their research (and the research their ideas generated) and what a likelihood ratio theorist might say about the movement of confidence criteria across conditions is not easy to see. Making that connection explicit is what we will try to accomplish in this section and the next.

Mirror effects in pigeons can be observed by using a simple procedure in which the birds are required to discriminate the prior presence or absence of a sample stimulus. In a typical experiment of this kind, sample and no-sample trials are randomly intermixed, and the pigeon’s task is to report whether or not a sample appeared earlier in the trial. On sample trials, the end of the intertrial interval (ITI) is followed by the presentation of a sample (e.g., the brief illumination of a keylight or the houselight) and then by the retention interval. On no-sample trials, by contrast, the end of the ITI is followed immediately by the onset of the nominal retention interval (i.e., no sample is presented). Following the retention interval, two choice stimuli are presented (e.g., red vs. green). A response to one choice alternative is correct on sample trials, and a response to the other is correct on no-sample trials. Pigeons learn to perform this task with high levels of accuracy when the retention interval is short, and manipulating the retention interval across conditions yields a mirror effect.

In Wixted’s (1993) Experiment 2, 4 pigeons were exposed to an ascending series of retention intervals (0.50, 2, 6, 12, and 24 sec), with each retention interval condition in effect for 15 sessions. The results are shown in Table 1. The hit rate represents the probability of a correct response on sample trials (i.e., a correct declaration that the sample occurred), and the false alarm rate represents the probability of an incorrect response (i.e., an incorrect declaration that the sample occurred) on no-sample trials. The data presented in boldface in Table 1 were averaged over the last 5 sessions of each condition and over the 4 subjects, and these data constitute the main results of the experiment. The data shown in standard typeface are from the first session of each new retention interval condition (save for the 0.50-sec condition, because the birds had not yet learned the task in the 1st session of that condition). The data from the 1st session of each condition are instructive and will be discussed in more detail after we consider the main findings. The main results show that, not surprisingly, d′ decreased as the retention interval increased. The more important point is that as the hit rate decreased, the false alarm rate reliably increased. In other words, a mirror effect was observed, just as it almost always is in human recognition memory experiments.

What explains the existence of a mirror effect in this case? Figure 7 presents sense-of-prior-occurrence distributions for three retention interval conditions (short, medium, and long). These distributions are exactly analogous to the familiarity distributions previously discussed in relation to human recognition memory, and they represent the bird’s subjective sense that a sample was presented. That subjective sense is assumed to vary from trial to trial on both sample and no-sample trials (according to Gaussian or Gaussian-like distributions), even for a constant retention interval. Some sense of prior occurrence is evident even on no-sample trials, perhaps owing to the lingering influence of samples presented on earlier trials or to inherent noise in the memory system.

Table 1
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Between Conditions

Retention Interval (sec)    Hit Rate    FA Rate    d′
 0.5                        .94         .10        2.83
 2.0 (1st)                  .84         .08
 2.0                        .88         .14        2.26
 6.0 (1st)                  .60         .14
 6.0                        .88         .18        2.10
12.0 (1st)                  .67         .19
12.0                        .74         .19        1.52
24.0 (1st)                  .46         .24
24.0                        .65         .28         .97

Note. The data were taken from Experiment 2 of Wixted (1993). The values in boldface (the rows not labeled "1st") were averaged over the last five sessions of each condition and over the 4 subjects. The values in standard typeface (the rows labeled "1st") are from the first session of each new retention interval condition (except for the 0.50-sec condition).

When the retention interval is short (upper panel), the sample distribution falls far to the right, because, despite some variability, the subjective sense of prior occurrence will generally be quite high (and will fall well above the decision criterion). Thus, on the majority of sample trials, the subject would respond correctly (yes). When the retention interval increases (and the sense of prior occurrence weakens), the sample distribution gradually shifts toward the no-sample distribution (middle panel). The no-sample distribution does not change as a function of retention interval, because there is no memory trace established on those trials (so there is no trace that weakens with retention interval). When the retention interval is very long, so that memory for the sample fades completely, the sample and no-sample distributions will overlap, so that performance on sample and no-sample trials will be indistinguishable (a situation that is approached in the lower panel).

Because each retention interval was in effect for 15 sessions, the birds had time to adjust the location of the decision criterion in light of the reinforcement consequences for doing so. For an equal-variance model, reinforcers will be maximized when the criterion is placed midway between the two distributions. Leaving a particular retention interval in effect for a prolonged period gives the birds time to realize that and to adjust the decision criterion accordingly. When the birds adjust the decision criterion in such a way as to maximize reinforcement, their performance exhibits a mirror effect.
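The criterion shift described here can be checked against the final-session rows of Table 1 with the standard equal-variance formulas, d′ = z(HIT) − z(FA) and C = −z(FA). The sketch below is only an illustration under that equal-variance assumption; the helper names are hypothetical, and because the published d′ values were averaged over subjects, recomputing them from the averaged rates may differ slightly in the second decimal place.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z(p: float) -> float:
    """Inverse of the standard normal CDF (p must lie strictly between 0 and 1)."""
    return _STANDARD_NORMAL.inv_cdf(p)


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Equal-variance sensitivity: d' = z(HIT) - z(FA)."""
    return z(hit_rate) - z(fa_rate)


def criterion(fa_rate: float) -> float:
    """Criterion location in SD units above the no-sample mean: C = -z(FA)."""
    return -z(fa_rate)


# (retention interval in sec, hit rate, FA rate): final-session rows of Table 1.
TABLE_1 = [
    (0.5, 0.94, 0.10),
    (2.0, 0.88, 0.14),
    (6.0, 0.88, 0.18),
    (12.0, 0.74, 0.19),
    (24.0, 0.65, 0.28),
]

if __name__ == "__main__":
    for interval, hit, fa in TABLE_1:
        d = d_prime(hit, fa)
        print(f"{interval:5.1f} sec  d'={d:.2f}  C={criterion(fa):.2f}  d'/2={d / 2:.2f}")
```

Comparing the C column with d′/2 shows how closely the inferred criterion tracks the midpoint between the two distributions as the retention interval grows.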

That the birds actually do shift the criterion in light of the reinforcement consequences is perhaps seen most easily by considering how they behave when a retention interval is first increased from one condition to the next. The relevant data are presented in standard typeface in Table 1. Note that each time a retention interval was increased (e.g., from 0.5 to 2 sec or from 2 to 6 sec), the hit rate dropped immediately (and precipitously), whereas the false alarm rate generally did not change. Over the next 15 sessions, both the hit rate and, to a lesser extent, the false alarm rate increased.

What explains that pattern? Presumably, on the first day that a new, longer retention interval was introduced, the effect on the mean of the sample distribution was immediate. That is, because of the longer retention interval, the memory trace of the sample faded to a greater extent, leading to a less intense sense of prior occurrence, on average. But that was the only immediate effect. The mean of the no-sample distribution did not change when the retention interval was increased, because there was no memory trace established on such trials to fade away. The criterion did not change, because the bird did not have enough time to learn where the optimal placement of the decision criterion might be in order to yield a higher number of reinforcers. As time passed, though, the bird learned to shift the criterion to the left on the sense-of-prior-occurrence axis (as is shown in Figure 7), thereby increasing both the hit rate and the false alarm rate.

The increase in the hit and the false alarm rates as a function of training at a particular retention interval is unmistakably asymmetric. That is, generally speaking, the hit rate increased a lot, whereas the false alarm rate increased only a little. Part of the explanation for this is that the appropriate detection model for this situation is not the equal-variance model we have used throughout this article, but an unequal-variance model (so that the no-sample distribution is more variable than the sample distribution). A criterion sweeping across a narrow sample distribution will affect the hit rate to a much greater extent than a criterion sweeping across a wide no-sample distribution will affect the false alarm rate. In fact, in a separate experiment, Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.

Figure 7. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated between conditions. The location of the decision criterion, C, changes with each condition.

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted (1993) are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of d′.
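The contrast between a fixed criterion (within-session manipulation) and a criterion that tracks d′/2 (between-condition manipulation) can be made concrete with a small calculation. This is a sketch under the equal-variance Gaussian assumption with unit standard deviation; the d′ values and the fixed criterion are hypothetical numbers chosen for illustration and are not fitted to Table 1 or Table 2.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cumulative distribution function


def rates(d_prime: float, criterion: float) -> tuple:
    """Hit and false alarm rates for a criterion on the sense-of-prior-occurrence axis.

    The no-sample distribution is centered on 0 and the sample distribution
    on d_prime; both are assumed to have unit standard deviation.
    """
    hit = 1.0 - _PHI(criterion - d_prime)
    false_alarm = 1.0 - _PHI(criterion)
    return hit, false_alarm


# Hypothetical d' values for short, medium, and long retention intervals.
D_PRIMES = [2.5, 1.5, 0.5]
# Fixed criterion: set once at the midpoint for the shortest interval.
FIXED_CRITERION = D_PRIMES[0] / 2

if __name__ == "__main__":
    for d in D_PRIMES:
        fixed_hit, fixed_fa = rates(d, FIXED_CRITERION)
        moving_hit, moving_fa = rates(d, d / 2)
        print(f"d'={d:.1f}  fixed C: hit={fixed_hit:.2f} FA={fixed_fa:.2f}  "
              f"C=d'/2: hit={moving_hit:.2f} FA={moving_fa:.2f}")
```

With the criterion held fixed, only the hit rate changes as d′ falls; with the criterion at d′/2, the hit rate falls while the false alarm rate rises, which is the mirror effect.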

Table 2
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)    Hit Rate    FA Rate    d′
 0.5                        .94         .14        2.63
 2.0                        .93         .20        2.32
 6.0                        .73         .17        1.56
12.0                        .63         .16        1.32

Note. The data were taken from Experiment 1 of Wixted (1993).

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

White and Wixted’s (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird’s (or a human’s) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird’s behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to f in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of f was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given f is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given f. According to White and Wixted’s model, birds learn the relative probabilities of reinforcement for each value of f along the sense-of-prior-occurrence axis.² If, for a given value of f, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of f was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of f.

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of f is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution with mean d′ and standard deviation σ is given by

$$p(f \mid \mathrm{S}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - d')^2/\sigma^2}, \tag{5}$$

and the no-sample (or lure) distribution with mean 0 and standard deviation σ is given by

$$p(f \mid \mathrm{NS}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - 0)^2/\sigma^2}, \tag{6}$$

where f now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and p(f | S) and p(f | NS) represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio of p(f | S)/p(f | NS) on a given trial. What the pigeons instead learn, according to White and Wixted’s (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given f, which will be denoted as p(R_S | f), and the probability that a reinforcer is arranged on the no-sample choice alternative given f, which will be denoted as p(R_NS | f). Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given f is

$$p(R_{\mathrm{S}} \mid f) = \left[\frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})}\right] r_{\mathrm{S}}, \tag{7}$$

where r_S is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced (r_S is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given f is

$$p(R_{\mathrm{NS}} \mid f) = \left[\frac{p(f \mid \mathrm{NS})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})}\right] r_{\mathrm{NS}}, \tag{8}$$

where r_NS is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced (r_NS is also usually 1.0).

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, f, on the sense-of-prior-occurrence axis.


White and Wixtedrsquos (1999) model assumes that pi-geons learn p(RS | f )and p(RNS | f ) during the course oftraining and in light of this information they respond tothe choice alternatives in accordance with the matchinglaw If p(BS | f ) represents the probability of choosing thesample choice alternative given f and p(BNS | f ) representsthe probability of choosing the no-sample choice alterna-tive given f then

(9)

which after replacing p(RS | f ) and p(RNS | f ) with theirequivalent values from Equations 7 and 8 can be rewrit-ten as

(10)

These equations state that the behavior ratio given f isequal to the reinforcer ratio given f which is essentiallythe matching law for each value of f along the sense-of-prior-occurrence axis

If rS = rNS = 10 (ie if the probability of reinforcementfor a correct response on sample and no-sample trials re-spectively is 10) as is typically the case Equation 10 re-duces to

Note that the term on the right side of this equation is thelikelihood ratio That is after replacing p( f | S) and p( f |NS) with their equivalent values from Equations 5 and 6this equation becomes

Although the choice behavior ratio (left side of theequation) happens to equal the likelihood ratio (right sideof the equation) when rS = rNS a likelihood ratio compu-tation is not assumed to be governing the birdrsquos behaviorWhat drives the birdrsquos choice according to White andWixtedrsquos (1999) model are the learned odds that a rein-forcer is arranged on one alternative or the other not thecomputed odds that the current value of f was generatedby a sample as opposed to no sample In a typical con-current VI VI experiment pigeons are generally assumedto learn the probabilities of reinforcement associated withthe two choice alternatives and to respond in accordancewith the matching law White and Wixtedrsquos account makesexactly the same assumption except that it assumes thatthe birds learn different reinforcement probabilities asso-ciated with the two alternatives depending on what f hap-pens to be What f happens to be is determined in part bythe form of the underlying sense-of-prior-occurrence dis-tributions although the birds are not assumed to have anydirect knowledge of that

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(R_S \mid f)}{p(R_{NS} \mid f)} \tag{9}$$

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right] \tag{10}$$

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(f \mid S)}{p(f \mid NS)}$$

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - d')^2/\sigma^2}}{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - 0)^2/\sigma^2}}$$

There is some point on the sense-of-prior-occurrence axis (i.e., some value of f) that leads the bird to be indifferent between the two choice alternatives, because at that point the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, $p(B_S \mid f) = p(B_{NS} \mid f)$. If $r_S = r_{NS} = 1.0$, it is easily shown that the value of f that yields indifference is $d'/2$. If $p(B_S \mid f) = p(B_{NS} \mid f)$, which is to say that $p(B_S \mid f)/p(B_{NS} \mid f) = 1$, then, according to Equation 10,

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right]$$

Solving this equation for f provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for $p(f \mid S)$ and $p(f \mid NS)$ from Equations 5 and 6, and canceling out the $1/\sqrt{2\pi\sigma^2}$ terms that appear in the numerator and denominator, yields

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{e^{-.5(f - d')^2}}{e^{-.5(f - 0)^2}}\right]$$

(assuming that $\sigma = 1$ for simplicity). When solved for f, this equation yields

$$f = \frac{d'}{2} + \frac{\ln(r_{NS}/r_S)}{d'} \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if $r_S = r_{NS}$, so that $\ln(r_{NS}/r_S) = 0$, the indifference point occurs at $d'/2$. This familiarity value corresponds to the indifference point and could be represented as $f_{1/1} = d'/2$, where the subscript 1/1 represents the ratio of programmed reinforcer probabilities ($r_{NS}/r_S$). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when d′ changes (thereby yielding a mirror effect). The difference is that White and Wixted's (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of f. In this case, the pigeon has learned that when $f = d'/2$, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than $d'/2$, the odds favor a reinforcer's being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than $d'/2$, the odds favor a reinforcer's being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same: namely, that a mirror effect should be observed.

Again, although we assume that the indifference point represents the learned value of f associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation, or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject's learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.
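As a rough illustration of Equation 11, the following Python sketch computes the indifference point in closed form and then checks it numerically by finding the value of f at which the right side of Equation 10 equals 1. It assumes equal-variance Gaussian distributions with σ = 1 (as in the text); the function names and the bisection check are illustrative choices, not part of White and Wixted's (1999) analysis.

```python
import math
from statistics import NormalDist


def indifference_point(d_prime, r_ns, r_s):
    """Equation 11 with sigma = 1: f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2 + math.log(r_ns / r_s) / d_prime


def reinforcement_odds(f, d_prime, r_ns, r_s):
    """Right side of Equation 10: (r_S / r_NS) * p(f | S) / p(f | NS)."""
    sample = NormalDist(mu=d_prime, sigma=1.0)
    no_sample = NormalDist(mu=0.0, sigma=1.0)
    return (r_s / r_ns) * sample.pdf(f) / no_sample.pdf(f)


def indifference_by_bisection(d_prime, r_ns, r_s, lo=-20.0, hi=20.0):
    """Hypothetical numerical check: locate the f at which the odds equal 1.

    Assumes d_prime > 0 (so the odds rise monotonically with f) and that
    the solution lies inside [lo, hi].
    """
    for _ in range(100):
        mid = (lo + hi) / 2
        if reinforcement_odds(mid, d_prime, r_ns, r_s) < 1.0:
            lo = mid  # odds still favor the no-sample alternative: move right
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for d_prime in (2.0, 1.0, 0.5):
        for r_ns, r_s in ((1.0, 1.0), (1.0, 0.1), (0.1, 1.0)):
            closed = indifference_point(d_prime, r_ns, r_s)
            numeric = indifference_by_bisection(d_prime, r_ns, r_s)
            print(f"d'={d_prime:.1f} r_NS/r_S={r_ns / r_s:5.1f} "
                  f"closed={closed:7.3f} numeric={numeric:7.3f}")
```

With equal reinforcer probabilities the function returns d′/2 for any d′, which is the mirror-effect prediction described above.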

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if $r_{NS} > r_S$), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in d′ on the location of the confidence criteria discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as d′ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., $r_{NS} = 1.0$ and $r_S = .1$). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense that the sample occurred before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when $r_{NS} = 10 r_S$? Substituting 10 for $r_{NS}/r_S$ in Equation 11 and solving for f yields

$$f_{10/1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}$$

In other words, $f_{10/1}$ (the indifference point when $r_{NS} = 10 r_S$) occurs at a point higher than $d'/2$, but by an amount that varies with d′. The more general version of this equation is $f_{10/1} = d'/2 + k_1/d'$, where $k_1$ is the constant $\ln(r_{NS}/r_S)$ (here, ln 10 ≈ 2.30). Similarly, if $r_S = 10 r_{NS}$, then

$$f_{1/10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}$$

The more general version of this equation is $f_{1/10} = d'/2 + k_2/d'$, where $k_2$ is the constant $\ln(r_{NS}/r_S)$ (here, ln 0.10 ≈ −2.30). This model predicts that the distance between $f_{10/1}$ and $f_{1/10}$ on the sense-of-prior-occurrence axis will be inversely related to d′. That distance (i.e., $f_{10/1}$ minus $f_{1/10}$) is given by

$$f_{10/1} - f_{1/10} = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}$$

Thus, as d′ becomes very large, $f_{10/1}$ and $f_{1/10}$ should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon's sensitivity to variations in relative reinforcer probabilities should be low when d′ is large and high when d′ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When d′ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of $r_{NS}$ to $r_S$) had little effect on behavior. When d′ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as d′ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as d′ decreased (because that is just another way of saying that sensitivity to reinforcement increased as d′ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism's reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and, more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider
again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \text{target})}{p(f \mid \text{lure})} > 10$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of f, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{Old} \mid f)}{p(R_{New} \mid f)} > 10$$

where $p(R_{Old} \mid f)$ represents the past probability that, given this value of f, socially reinforcing feedback for an old response occurred, and $p(R_{New} \mid f)$ represents the past probability that, given this value of f, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{Old} \mid f)}{p(B_{New} \mid f)} = \frac{p(R_{Old} \mid f)}{p(R_{New} \mid f)} \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{Old} \mid f)}{p(B_{New} \mid f)} = \left(\frac{r_{Old}}{r_{New}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right] \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left(\frac{r_{Old}}{r_{New}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right] > 10$$

where $r_{Old}$ and $r_{New}$ represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to $r_S$ and $r_{NS}$, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of $r_S$ and $r_{NS}$ in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, $r_{Old} = r_{New}$. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of f along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of f.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if $r_{NS} = 10 r_S$, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at $d'/2$, but at a higher point: namely, $d'/2 + 2.30/d'$. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific f → reinforcement outcome relations but, instead, that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog's neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog's training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.
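The contrast between matching and maximizing drawn in connection with Equations 12 and 13 can be made concrete with a small Python sketch. It assumes equal-variance Gaussian target and lure distributions with σ = 1 and only two response alternatives (old vs. new), so that the matching ratio converts to a response probability of odds / (1 + odds). The function names and the default values of `r_old` and `r_new` are hypothetical.

```python
import math
from statistics import NormalDist


def old_new_odds(f, d_prime, r_old=1.0, r_new=1.0):
    """Right side of Equation 13: (r_Old / r_New) * p(f | target) / p(f | lure)."""
    target = NormalDist(mu=d_prime, sigma=1.0)
    lure = NormalDist(mu=0.0, sigma=1.0)
    return (r_old / r_new) * target.pdf(f) / lure.pdf(f)


def p_old_matching(odds):
    """Matching: p(B_Old) / p(B_New) equals the odds, so p(B_Old) = odds / (1 + odds)."""
    return odds / (1.0 + odds)


def p_old_maximizing(odds):
    """Maximizing: always respond old once the odds exceed 1."""
    return 1.0 if odds > 1.0 else 0.0


if __name__ == "__main__":
    d_prime = 2.0
    # Familiarity value at which the odds are 10 to 1 in favor of "old"
    # when r_Old = r_New (d'/2 + ln(10)/d').
    f = d_prime / 2 + math.log(10.0) / d_prime
    odds = old_new_odds(f, d_prime)
    print(f"odds = {odds:.2f}")
    print(f"p(old), matching   = {p_old_matching(odds):.3f}")
    print(f"p(old), maximizing = {p_old_maximizing(odds):.3f}")
```

Under matching, an old response at this familiarity value is 10 times as likely as a new response but is not guaranteed; under maximizing, it always occurs.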

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism's neural network without detailing the origins of the relevant distributional information. A neural network may, indeed, perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject's reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising, indeed, if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject's neural network but in the subject's history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that "the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects" (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of "nothing" in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion's distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value, f, is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)

Page 8: Wixted, J. T. & Gaitan, S. (2002)

10 in this example, but could differ for different subjects). Similarly, if $N_H$ is placed at the point where the odds are only 1 in 10 in favor of the item's having been drawn from the target distribution, Equation 2 indicates that

$$N_H = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}$$

The more general version of this equation is $N_H = d'/2 + k_2/d'$, where $k_2$ is the constant log likelihood ratio associated with $N_H$. This model predicts that the distance between $O_H$ and $N_H$ on the familiarity axis will be inversely related to d′. That distance (i.e., $O_H$ minus $N_H$) is given by

$$O_H - N_H = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}$$

Thus, as d′ becomes very large, $O_H$ and $N_H$ should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two criteria should move infinitely far apart.

The upshot of all of this is that when the situation is represented with a familiarity scale, likelihood ratio models predict that the decision criterion (C) should move in such a way that the hit rate decreases and the false alarm rate increases as d′ decreases (i.e., a mirror effect should be observed), because C will always be placed at $d'/2$ on the familiarity axis, since that is the point where the odds ratio is equal to 1. The more general version of this account is that all of the confidence criteria should diverge as d′ decreases (as in the lower panel of Figure 5). Mirror effects are usually observed in real data, and, as will be described in more detail next, Stretch and Wixted (1998a) showed that the confidence criteria move in essentially the manner predicted by the likelihood ratio account as well.

How are the locations of the decision criterion and the various confidence criteria inferred from actual behavioral data? The process is not as mysterious as it might seem. Consider first how d′ is computed. For a given hit and false alarm rate, only one arrangement of the target and lure distributions and the criterion is possible, assuming an equal-variance detection model. For example, symmetrical hit and false alarm rates of .84 and .16 yield a d′ of 2.0, with a criterion placement of $d'/2$ (i.e., C = 1.0 in this example). Standard deviation units are used in both cases. That is, the only way to represent these hit and false alarm rates in an equal-variance signal detection model is for the means of the two distributions to be separated by two standard deviations and for the criterion to be placed one standard deviation above the mean of the lure distribution. No other placement of the criterion would correspond to a false alarm rate of .16, and given that, no other placement of the mean of the target distribution would yield a hit rate of .84. In practice, standard equations exist for making the relevant computations:

$$d' = z(\text{HIT}) - z(\text{FA})$$

and

$$C = -z(\text{FA})$$

Thus, in this example, C = −z(.16) = −(−1.0) = 1.0, which is to say that the criterion is placed one standard deviation above the mean of the lure distribution. This method of computing the placement of C involves all old responses to lures with a confidence rating of maybe old or greater (which is to say, all old responses to lures: +, ++, or +++). A very similar method can be used to determine the locations of the various confidence criteria. To determine where the $O_M$ confidence criterion is placed, for example, one can use only those responses to lures that received a confidence rating of probably old or greater (i.e., all false alarms committed by choosing the ++ or +++ alternatives). From these responses, a new false alarm rate can be computed, and the location of the $O_M$ criterion can be estimated by converting that false alarm rate into a z score (exactly as one usually does for old responses that exceed the old/new decision criterion). In practice, a more comprehensive method is used (one that allows for unequal variance and that takes into account confidence-specific hit rates as well), but the method just described works about as well and is conceptually much simpler (Stretch & Wixted, 1998a). The locations of the various confidence criteria on the familiarity axis are simply the cumulative confidence-specific false alarm rates converted to a z score.

Stretch and Wixted (1998a) tested the predictions of the likelihood ratio account with a very simple experiment. In one condition, subjects studied lists of words that were presented quite slowly, after which they received a recognition test (this was the strong condition). In another, they studied lists of words that were presented more rapidly, followed by a recognition test (the weak condition). For both conditions, the locations of the various confidence criteria were computed in essentially the manner described above. As uniquely predicted by the likelihood ratio account, the confidence criteria fanned out on the familiarity axis as d′ decreased. Figure 6 presents the relevant estimates of the locations of the confidence criteria in the strong and the weak conditions. Note that whereas the location of the decision criterion, C, decreased on the familiarity axis as conditions changed from strong to weak, the location of the high-confident old criterion, $O_H$, remained essentially constant, which means that even though the overall false alarm rate increased in the weak condition, the high-confident false alarm rate did not. This phenomenon is uniquely predicted by the likelihood ratio account.

Thus, whether we consider the basic mirror effect or the more complete picture provided by ratings of confidence, subjects behave as if they are able to maintain essentially (although not necessarily exactly) constant likelihood ratios across conditions. The fact that subjects are able to do this is rather mysterious, at least on the surface, because it is not clear how they would have acquired the detailed information it requires (and likelihood ratio models typically have little to say about this key issue).
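The computations described above, d′ = z(HIT) − z(FA) and C = −z(FA), together with the simplified method of locating confidence criteria from cumulative false alarm rates, can be sketched in Python as follows. The lure response counts are invented purely for illustration, and the function names are hypothetical; the z transform is taken to be the inverse of the standard normal cumulative distribution function.

```python
from statistics import NormalDist

# z transform: inverse of the standard normal CDF (undefined for rates of 0 or 1).
_z = NormalDist().inv_cdf


def d_prime(hit_rate, fa_rate):
    """Equal-variance sensitivity: d' = z(HIT) - z(FA)."""
    return _z(hit_rate) - _z(fa_rate)


def criterion(fa_rate):
    """Criterion location relative to the lure mean: C = -z(FA)."""
    return -_z(fa_rate)


def confidence_criteria(lure_counts):
    """Locate each criterion from cumulative confidence-specific FA rates.

    `lure_counts` maps ratings to response counts for lures, ordered from
    the most confident "old" rating to the most confident "new" rating.
    The returned dict is keyed by the lowest rating included in each
    cumulative false alarm rate (e.g., "+" gives the old/new criterion C).
    """
    total = sum(lure_counts.values())
    ratings = list(lure_counts)
    locations = {}
    cumulative = 0
    for rating in ratings[:-1]:  # the final rating would give a rate of 1.0
        cumulative += lure_counts[rating]
        locations[rating] = criterion(cumulative / total)
    return locations


if __name__ == "__main__":
    print(f"d' = {d_prime(0.84, 0.16):.2f}, C = {criterion(0.16):.2f}")
    # Hypothetical counts of responses to 200 lures.
    counts = {"+++": 4, "++": 12, "+": 16, "-": 48, "--": 70, "---": 50}
    for rating, location in confidence_criteria(counts).items():
        print(f"criterion below {rating:>3}: {location:6.2f}")
```

In this made-up example the cumulative false alarm rate for ratings of + or greater is .16, so the entry keyed by "+" reproduces the old/new criterion C from the worked example in the text.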

Within-list strength manipulations. It is worth noting that although subjects often behave in accordance with the predictions of likelihood ratio models, there are conditions under which they do not. All this really means is that it is a mistake to think of humans as being fully informed likelihood ratio computers. For the sake of completeness, we will turn now to a brief discussion of the conditions under which subjects do not behave in the manner predicted by likelihood ratio accounts, after which we will return to the central question of how it is that they often do.

Strength is usually manipulated between lists to produce a mirror effect, but Stretch and Wixted (1998b) also manipulated strength within list. In each list, half the words were colored red, and they were presented five times each. The other half were colored blue, and they appeared only once each. The red and the blue words were randomly intermixed, and the red word repetitions were randomly scattered throughout the list. On the subsequent recognition test, the red and the blue targets were randomly intermixed with red and blue lures. Obviously, the hit rate in the strong (red) condition was expected to exceed the hit rate in the weak (blue) condition, and it did. Of interest was whether or not the false alarm rate in the strong condition (i.e., to red lures) would be less than the false alarm rate in the weak condition (i.e., to blue lures). If so, the within-list strength manipulation would yield results similar to those produced by the between-list strength manipulation. However, the results reported by Stretch and Wixted (1998b) showed a different pattern. In two experiments in which strength was cued by color, the hit rate to the strong items was much higher than the hit rate to the weak items (as was expected), but the false alarm rates were nearly identical. Similar results, when other list materials were used, were also recently reported by Morrell, Gaitan, and Wixted (2002). These results correspond to the situation depicted earlier in the upper panel of Figure 2, according to which the decision criterion remains fixed on the familiarity axis as a function of strength. A fully informed likelihood ratio computer would not respond in this manner but would, instead, adjust the criterion as a function of strength in order to maintain a constant likelihood ratio across conditions (as in the between-list strength manipulation).

The main difference between the within- and the between-list strength manipulations lies in how rapidly the conditions change during testing. In the between-list case, the conditions change slowly (e.g., all the target items are weak after one list, and all are strong after the other). In the within-list case, the conditions change at a rapid rate (e.g., a strong test item might be followed by a weak test item, followed by a strong test item, and so on). Under such conditions, subjects tend to aggregate over those conditions rather than respond in accordance with each condition separately. We shall see later that a similar issue arises in recognition memory experiments involving animal subjects, but the important point for the time being is that these data show that organisms do not always behave as if they were fully informed likelihood ratio computers.

Figure 6. Estimates of the locations of the confidence criteria ($O_H$, C, and $N_H$) in the strong and weak conditions of a recognition memory experiment reported by Stretch and Wixted (1998a).

Even though subjects do not always behave in exactly the manner one would expect according to a likelihood ratio model, they usually do, and the question of interest concerns how they acquire the information needed to do that. More specifically, what accounts for the fact that when strength is manipulated between lists, subjects have information available so that the confidence criteria fan out on the familiarity axis as d′ decreases in just the manner predicted by the likelihood ratio account? That is a question left largely unanswered by likelihood ratio accounts, and suggesting an answer to it is the issue we will consider next.
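The predicted fanning out of the confidence criteria can be illustrated with a short Python sketch. It assumes the equal-variance model with σ = 1, in which the point on the familiarity axis where the target-to-lure likelihood ratio equals a given odds value is d′/2 + ln(odds)/d′, and it uses the 10-to-1 odds from the earlier example for the high-confident criteria. The function name and the particular d′ values are illustrative only.

```python
import math


def lr_criterion(d_prime, odds):
    """Familiarity value at which p(f | target) / p(f | lure) equals `odds`.

    Equal-variance Gaussian model with sigma = 1: d'/2 + ln(odds) / d'.
    """
    return d_prime / 2 + math.log(odds) / d_prime


if __name__ == "__main__":
    for d in (2.0, 1.0, 0.5):
        o_h = lr_criterion(d, 10.0)  # high-confident old: odds 10 to 1 in favor
        c = lr_criterion(d, 1.0)     # old/new criterion: even odds, i.e., d'/2
        n_h = lr_criterion(d, 0.1)   # high-confident new: odds 10 to 1 against
        print(f"d'={d:.1f}  N_H={n_h:6.2f}  C={c:5.2f}  O_H={o_h:6.2f}  "
              f"O_H - N_H={o_h - n_h:5.2f}")
```

The distance between the two extreme criteria equals (k1 − k2)/d′ with k1 = ln 10 and k2 = ln 0.10, so it grows as d′ shrinks, which is the fan-out pattern that subjects' data approximate.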

Mirror Effects in Pigeons

What is often overlooked in the human cognitive literature is the possibility that subjects have experienced years of making recognition decisions and expressing various levels of confidence. Presumably, those expressions of confidence have been followed by social consequences that have served to shape their behavior. Conceivably, it is those consequences that shaped behavior to mimic what a history-less likelihood ratio computer might do. We begin our investigation into this possibility by making the point that pigeons show a mirror effect, too. In pigeon research, unlike in human research, recognition memory decisions are followed by experimentally arranged consequences, and the connection between where a pigeon places its decision criterion and the consequences it has received in the past is easier to appreciate. That explicitly arranged reinforcers could influence the location of the decision criterion was recognized from the earliest days of detection theory (e.g., Green & Swets, 1966), but research into that issue never really blossomed in the literature on humans. The most influential account along these lines in the literature on animals, where relevant research did blossom, was proposed some time later by Davison and Tustin (1978). Nevertheless, it is probably fair to say that the connection between their research (and the research their ideas generated) and what a likelihood ratio theorist might say about the movement of confidence criteria across conditions is not easy to see. Making that connection explicit is what we will try to accomplish in this section and the next.

Mirror effects in pigeons can be observed by using a simple procedure in which the birds are required to discriminate the prior presence or absence of a sample stimulus. In a typical experiment of this kind, sample and no-sample trials are randomly intermixed, and the pigeon's task is to report whether or not a sample appeared earlier in the trial. On sample trials, the end of the intertrial interval (ITI) is followed by the presentation of a sample (e.g., the brief illumination of a keylight or the houselight) and then by the retention interval. On no-sample trials, by contrast, the end of the ITI is followed immediately by the onset of the nominal retention interval (i.e., no sample is presented). Following the retention interval, two choice stimuli are presented (e.g., red vs. green). A response to one choice alternative is correct on sample trials, and a response to the other is correct on no-sample trials. Pigeons learn to perform this task with high levels of accuracy when the retention interval is short, and manipulating the retention interval across conditions yields a mirror effect.

In Wixted's (1993) Experiment 2, 4 pigeons were exposed to an ascending series of retention intervals (0.50, 2, 6, 12, and 24 sec), with each retention interval condition in effect for 15 sessions. The results are shown in Table 1. The hit rate represents the probability of a correct response on sample trials (i.e., a correct declaration that the sample occurred), and the false alarm rate represents the probability of an incorrect response (i.e., an incorrect declaration that the sample occurred) on no-sample trials. The data presented in boldface in Table 1 were averaged over the last 5 sessions of each condition and over the 4 subjects, and these data constitute the main results of the experiment. The data shown in standard typeface are from the first session of each new retention interval condition (save for the 0.50-sec condition, because the birds had not yet learned the task in the 1st session of that condition). The data from the 1st session of each condition are instructive and will be discussed in more detail after we consider the main findings. The main results show that, not surprisingly, d′ decreased as the retention interval increased. The more important point is that as the hit rate decreased, the false alarm rate reliably increased. In other words, a mirror effect was observed, just as it almost always is in human recognition memory experiments.
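The mirror effect in the main results of Table 1 can be checked with a few lines of Python. The sketch below uses the hit and false alarm rates averaged over the last five sessions of each condition, as read from Table 1, and recomputes d′ as z(HIT) − z(FA). Recomputing d′ from group-averaged rates is an assumption made here for illustration; it need not reproduce the tabled d′ values exactly if those were computed in some other way (e.g., per bird and then averaged).

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF

# (retention interval in sec, hit rate, FA rate, reported d') from Table 1,
# last-five-session averages only.
TABLE_1_MAIN = [
    (0.5, 0.94, 0.10, 2.83),
    (2.0, 0.88, 0.14, 2.26),
    (6.0, 0.88, 0.18, 2.10),
    (12.0, 0.74, 0.19, 1.52),
    (24.0, 0.65, 0.28, 0.97),
]


def recomputed_d_prime(hit_rate, fa_rate):
    """Equal-variance d' from a single pair of rates: z(HIT) - z(FA)."""
    return _z(hit_rate) - _z(fa_rate)


def shows_mirror_effect(rows):
    """True if hit rates never rise and FA rates never fall across rows."""
    hits = [row[1] for row in rows]
    fas = [row[2] for row in rows]
    hits_fall = all(a >= b for a, b in zip(hits, hits[1:]))
    fas_rise = all(a <= b for a, b in zip(fas, fas[1:]))
    return hits_fall and fas_rise


if __name__ == "__main__":
    for interval, hit, fa, reported in TABLE_1_MAIN:
        print(f"{interval:5.1f} sec  hit={hit:.2f}  FA={fa:.2f}  "
              f"d' reported={reported:.2f}  recomputed={recomputed_d_prime(hit, fa):.2f}")
    print("mirror effect:", shows_mirror_effect(TABLE_1_MAIN))
```

As the retention interval lengthens, the hit rate falls while the false alarm rate rises, which is the pattern the text identifies as a mirror effect.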

What explains the existence of a mirror effect in this case? Figure 7 presents sense-of-prior-occurrence distributions for three retention interval conditions (short, medium, and long). These distributions are exactly analogous to the familiarity distributions previously discussed in relation to human recognition memory, and they represent the bird’s subjective sense that a sample was presented. That subjective sense is assumed to vary from trial to trial on both sample and no-sample trials (according to Gaussian or Gaussian-like distributions), even for a constant retention interval. Some sense of prior occurrence is evident even on no-sample trials, perhaps owing to the lingering influence of samples presented on earlier trials or to inherent noise in the memory system.

When the retention interval is short (upper panel), the sample distribution falls far to the right because, despite some variability, the subjective sense of prior occurrence will generally be quite high (and will fall well above the decision criterion). Thus, on the majority of sample trials, the subject would respond correctly (yes). When the retention interval increases (and the sense of prior occurrence weakens), the sample distribution gradually shifts toward the no-sample distribution (middle panel). The no-sample distribution does not change as a function of retention interval, because there is no memory trace established on those trials (so there is no trace that weakens with retention interval). When the retention interval is very long, so that memory for the sample fades completely, the sample and no-sample distributions will overlap, so that performance on sample and no-sample trials will be indistinguishable (a situation that is approached in the lower panel).

Table 1
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Between Conditions

Retention Interval (sec)    Hit Rate    FA Rate    d′
0.5                         .94         .10        2.83
2.0 (1st)                   .84         .08
2.0                         .88         .14        2.26
6.0 (1st)                   .60         .14
6.0                         .88         .18        2.10
12.0 (1st)                  .67         .19
12.0                        .74         .19        1.52
24.0 (1st)                  .46         .24
24.0                        .65         .28        .97

Note. The data were taken from Experiment 2 of Wixted (1993). The values in boldface (here, the rows not marked “1st”) were averaged over the last five sessions of each condition and over the 4 subjects. The values in standard typeface (the rows marked “1st”) are from the first session of each new retention interval condition (except for the 0.50-sec condition).

Because each retention interval was in effect for 15 sessions, the birds had time to adjust the location of the decision criterion in light of the reinforcement consequences for doing so. For an equal-variance model, reinforcers will be maximized when the criterion is placed midway between the two distributions. Leaving a particular retention interval in effect for a prolonged period gives the birds time to realize that and to adjust the decision criterion accordingly. By adjusting the decision criterion in such a way as to maximize reinforcement, the birds produce a mirror effect in their performance.
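The link between a midpoint criterion and the mirror effect can be illustrated with a minimal sketch. It assumes unit-variance Gaussian distributions centred at d′ (sample) and 0 (no sample), equally likely trial types, and equal reinforcement for both kinds of correct response; the d′ values and the helper names `rates` and `p_correct` are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def rates(d_prime: float, criterion: float) -> tuple[float, float]:
    """Hit and false alarm rates for unit-variance distributions centred at d' and 0."""
    hit = 1.0 - PHI(criterion - d_prime)  # area of the sample distribution above the criterion
    fa = 1.0 - PHI(criterion)             # area of the no-sample distribution above the criterion
    return hit, fa

def p_correct(d_prime: float, criterion: float) -> float:
    """Proportion of reinforced (correct) choices when both trial types are equally likely."""
    hit, fa = rates(d_prime, criterion)
    return 0.5 * hit + 0.5 * (1.0 - fa)

# A coarse grid search locates the reinforcement-maximizing criterion for d' = 2.0.
grid = [i / 100 for i in range(-200, 401)]
best = max(grid, key=lambda c: p_correct(2.0, c))
print(f"best criterion for d' = 2.0: {best:.2f}")

# With the criterion kept at the midpoint, hits fall and false alarms rise as d' shrinks.
for d in (2.8, 2.0, 1.0):  # hypothetical d' values, short to long retention interval
    hit, fa = rates(d, criterion=d / 2)
    print(f"d' = {d:.1f}  hit = {hit:.2f}  FA = {fa:.2f}")
```

The grid search is used only to make the maximization visible; the same midpoint follows analytically from the symmetry of the two equal-variance distributions.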

That the birds actually do shift the criterion in light of the reinforcement consequences is perhaps seen most easily by considering how they behave when a retention interval is first increased from one condition to the next. The relevant data are presented in standard typeface in Table 1. Note that each time a retention interval was increased (e.g., from 0.5 to 2 sec or from 2 to 6 sec), the hit rate dropped immediately (and precipitously), whereas the false alarm rate generally did not change. Over the next 15 sessions, both the hit rate and, to a lesser extent, the false alarm rate increased.

What explains that pattern? Presumably, on the first day that a new, longer retention interval was introduced, the effect on the mean of the sample distribution was immediate. That is, because of the longer retention interval, the memory trace of the sample faded to a greater extent, leading to a less intense sense of prior occurrence, on average. But that was the only immediate effect. The mean of the no-sample distribution did not change when the retention interval was increased, because there was no memory trace established on such trials to fade away. The criterion did not change, because the bird did not have enough time to learn where the optimal placement of the decision criterion might be in order to yield a higher number of reinforcers. As time passed, though, the bird learned to shift the criterion to the left on the sense-of-prior-occurrence axis (as is shown in Figure 7), thereby increasing both the hit rate and the false alarm rate.
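The two-stage account just described (an immediate drop in the sample mean, followed by a gradual leftward criterion shift) can be sketched numerically. The example assumes the equal-variance Gaussian model with unit variance; the d′ values are hypothetical and are not fitted to Table 1.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def rates(d_prime: float, criterion: float) -> tuple[float, float]:
    """Hit and false alarm rates for unit-variance distributions centred at d' and 0."""
    return 1.0 - PHI(criterion - d_prime), 1.0 - PHI(criterion)

old_d, new_d = 2.2, 1.0        # hypothetical d' before and after lengthening the interval
old_criterion = old_d / 2      # criterion learned under the shorter interval

before = rates(old_d, old_criterion)
first_session = rates(new_d, old_criterion)  # sample mean has moved, criterion has not
adjusted = rates(new_d, new_d / 2)           # criterion relearned at the new midpoint

for label, (hit, fa) in [("before", before), ("first session", first_session),
                         ("after adjustment", adjusted)]:
    print(f"{label:>16}: hit = {hit:.2f}  FA = {fa:.2f}")
```

In this sketch the false alarm rate is unchanged in the first session because neither the no-sample distribution nor the criterion has moved, and both rates rise once the criterion shifts left.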

Figure 7. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated between conditions. The location of the decision criterion, C, changes with each condition.

The increase in the hit and the false alarm rates as a function of training at a particular retention interval is unmistakably asymmetric. That is, generally speaking, the hit rate increased a lot, whereas the false alarm rate increased only a little. Part of the explanation for this is that the appropriate detection model for this situation is not the equal-variance model we have used throughout this article, but an unequal-variance model (so that the no-sample distribution is more variable than the sample distribution). A criterion sweeping across a narrow sample distribution will affect the hit rate to a much greater extent than a criterion sweeping across a wide no-sample distribution will affect the false alarm rate. In fact, in a separate experiment, Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.
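The complication noted above (two points at which the likelihood ratio equals 1.0) can be made concrete with a short sketch. It assumes a Gaussian sample distribution with mean d′ and standard deviation `sd_s`, and a Gaussian no-sample distribution with mean 0 and a different standard deviation `sd_ns`; all parameter values and function names are hypothetical. Setting the log likelihood ratio to zero gives a quadratic in f, which has two roots whenever the variances differ.

```python
import math

def log_likelihood_ratio(f: float, d_prime: float, sd_s: float, sd_ns: float) -> float:
    """ln[p(f | S) / p(f | NS)] for Gaussian sample and no-sample distributions."""
    return (math.log(sd_ns / sd_s)
            - 0.5 * ((f - d_prime) / sd_s) ** 2
            + 0.5 * (f / sd_ns) ** 2)

def unit_ratio_points(d_prime: float, sd_s: float, sd_ns: float) -> list[float]:
    """Roots of a*f^2 + b*f + c = 0, i.e., where the likelihood ratio equals 1.

    Requires sd_s != sd_ns; with equal variances the quadratic term vanishes
    and there is a single crossing at d'/2.
    """
    a = 0.5 / sd_ns ** 2 - 0.5 / sd_s ** 2
    b = d_prime / sd_s ** 2
    c = math.log(sd_ns / sd_s) - 0.5 * (d_prime / sd_s) ** 2
    root = math.sqrt(b * b - 4 * a * c)
    return sorted(((-b + root) / (2 * a), (-b - root) / (2 * a)))

# Hypothetical case: the no-sample distribution is wider than the sample distribution.
low, high = unit_ratio_points(d_prime=2.0, sd_s=1.0, sd_ns=1.5)
print(f"likelihood ratio equals 1 at f = {low:.3f} and f = {high:.3f}")
```

Between the two roots the sample distribution is the taller one; outside them the wider no-sample distribution is taller, which is why a single likelihood ratio criterion no longer maps onto a single point on the axis.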

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted (1993) are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of d′.

Table 2
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)    Hit Rate    FA Rate    d′
0.5                         .94         .14        2.63
2.0                         .93         .20        2.32
6.0                         .73         .17        1.56
12.0                        .63         .16        1.32

Note. The data were taken from Experiment 1 of Wixted (1993).

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

White and Wixted’s (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird’s (or a human’s) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird’s behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to f in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of f was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given f is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given f. According to White and Wixted’s model, birds learn the relative probabilities of reinforcement for each value of f along the sense-of-prior-occurrence axis.² If, for a given value of f, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of f was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of f.
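The four-to-one example above amounts to a one-line calculation under strict matching. The sketch below assumes the simplest form of the matching law (choice proportion equals relative reinforcement probability); the function name and the particular probabilities are illustrative only.

```python
def p_choose_sample(p_reinf_sample: float, p_reinf_no_sample: float) -> float:
    """Strict matching: choice proportion equals relative reinforcement probability."""
    return p_reinf_sample / (p_reinf_sample + p_reinf_no_sample)

# Hypothetical reinforcement probabilities given f, in a 4:1 ratio.
p = p_choose_sample(0.8, 0.2)
print(f"p(choose sample) = {p:.2f}; choice odds = {p / (1 - p):.1f} to 1")
```

A 4:1 reinforcer ratio therefore corresponds to choosing the sample alternative on 80% of the trials that produce that value of f, rather than on all of them.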

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of f is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.

Earlier, we noted that, if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution, with mean d′ and standard deviation σ, is given by

$$p(f \mid S) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - d')^2 / \sigma^2}, \tag{5}$$

and the no-sample (or lure) distribution, with mean 0 and standard deviation σ, is given by

$$p(f \mid NS) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - 0)^2 / \sigma^2}, \tag{6}$$

where f now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and p(f | S) and p(f | NS) represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio p(f | S) / p(f | NS) on a given trial. What the pigeons instead learn, according to White and Wixted’s (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given f, which will be denoted as p(R_S | f), and the probability that a reinforcer is arranged on the no-sample choice alternative given f, which will be denoted as p(R_NS | f). Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given f is

$$p(R_S \mid f) = \left[\frac{p(f \mid S)}{p(f \mid S) + p(f \mid NS)}\right] r_S, \tag{7}$$

where r_S is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced (r_S is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given f is

$$p(R_{NS} \mid f) = \left[\frac{p(f \mid NS)}{p(f \mid S) + p(f \mid NS)}\right] r_{NS}, \tag{8}$$

where r_NS is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced (r_NS is also usually 1.0).
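Equations 5 through 8 translate directly into a few lines of code. The sketch below is one possible rendering, assuming equally likely sample and no-sample trials and a common standard deviation; the function names `gaussian_pdf` and `reinforcer_probabilities`, and the parameter values in the usage lines, are illustrative rather than taken from the article.

```python
import math

def gaussian_pdf(f: float, mean: float, sd: float = 1.0) -> float:
    """Gaussian density, the common form of Equations 5 and 6."""
    return math.exp(-0.5 * (f - mean) ** 2 / sd ** 2) / math.sqrt(2 * math.pi * sd ** 2)

def reinforcer_probabilities(f: float, d_prime: float, r_s: float = 1.0,
                             r_ns: float = 1.0, sd: float = 1.0) -> tuple[float, float]:
    """Return (p(R_S | f), p(R_NS | f)) as in Equations 7 and 8."""
    p_f_s = gaussian_pdf(f, d_prime, sd)   # Equation 5: sample distribution, mean d'
    p_f_ns = gaussian_pdf(f, 0.0, sd)      # Equation 6: no-sample distribution, mean 0
    total = p_f_s + p_f_ns
    return r_s * p_f_s / total, r_ns * p_f_ns / total

# Hypothetical case: d' = 2, a strong sense of prior occurrence (f = 2), and r_S = r_NS = 1.
p_rs, p_rns = reinforcer_probabilities(f=2.0, d_prime=2.0)
print(f"p(R_S | f) = {p_rs:.3f}, p(R_NS | f) = {p_rns:.3f}")
```

With r_S = r_NS = 1.0 the two probabilities sum to 1, since on every trial a reinforcer is arranged on exactly one of the two alternatives.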

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, f, on the sense-of-prior-occurrence axis.


White and Wixted’s (1999) model assumes that pigeons learn p(R_S | f) and p(R_NS | f) during the course of training, and in light of this information, they respond to the choice alternatives in accordance with the matching law. If p(B_S | f) represents the probability of choosing the sample choice alternative given f, and p(B_NS | f) represents the probability of choosing the no-sample choice alternative given f, then

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(R_S \mid f)}{p(R_{NS} \mid f)}, \tag{9}$$

which, after replacing p(R_S | f) and p(R_NS | f) with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right]. \tag{10}$$

These equations state that the behavior ratio given f is equal to the reinforcer ratio given f, which is essentially the matching law for each value of f along the sense-of-prior-occurrence axis.

If r_S = r_NS = 1.0 (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(f \mid S)}{p(f \mid NS)}.$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing p(f | S) and p(f | NS) with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - d')^2 / \sigma^2}}{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - 0)^2 / \sigma^2}}.$$

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when r_S = r_NS, a likelihood ratio computation is not assumed to be governing the bird’s behavior. What drives the bird’s choice, according to White and Wixted’s (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of f was generated by a sample, as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted’s account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives, depending on what f happens to be. What f happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.

There is some point on the sense-of-prior-occurrence axis (i.e., some value of f) that leads the bird to be indifferent between the two choice alternatives, because, at that point, the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, p(B_S | f) = p(B_NS | f). If r_S = r_NS = 1.0, it is easily shown that the value of f that yields indifference is d′/2. If p(B_S | f) = p(B_NS | f), which is to say that p(B_S | f) / p(B_NS | f) = 1, then, according to Equation 10,

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right].$$

Solving this equation for f provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for p(f | S) and p(f | NS) from Equations 5 and 6 and canceling out the 1/√(2πσ²) terms that appear in the numerator and denominator yields

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{e^{-0.5 (f - d')^2}}{e^{-0.5 (f - 0)^2}}\right]$$

(assuming that σ = 1, for simplicity). Taking the natural logarithm of both sides gives 0 = ln(r_S/r_NS) + fd′ − d′²/2, which, when solved for f, yields

$$f = \frac{d'}{2} + \frac{\ln(r_{NS}/r_S)}{d'}. \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if r_S = r_NS, so that ln(r_NS/r_S) = 0, the indifference point occurs at d′/2. This familiarity value corresponds to the indifference point and could be represented as f_{1:1} = d′/2, where the subscript 1:1 represents the ratio of programmed reinforcer probabilities (r_NS/r_S). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when d′ changes (thereby yielding a mirror effect). The difference is that White and Wixted’s (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of f. In this case, the pigeon has learned that when f = d′/2, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than d′/2, the odds favor a reinforcer’s being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than d′/2, the odds favor a reinforcer’s being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same, namely, that a mirror effect should be observed.

Again, although we assume that the indifference point represents the learned value of f associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject’s learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.
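Equation 11 can be checked numerically. The sketch below assumes σ = 1, as in the derivation, and uses illustrative function names (`indifference_point`, `behavior_ratio`) and hypothetical parameter values. It computes the indifference point from Equation 11 and then confirms that the behavior ratio of Equation 10 equals 1 at that point.

```python
import math

def indifference_point(d_prime: float, r_s: float = 1.0, r_ns: float = 1.0) -> float:
    """Equation 11 with sigma = 1: f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2 + math.log(r_ns / r_s) / d_prime

def behavior_ratio(f: float, d_prime: float, r_s: float = 1.0, r_ns: float = 1.0) -> float:
    """Equation 10 with unit-variance Gaussians; the normalizing constants cancel."""
    return (r_s / r_ns) * math.exp(-0.5 * (f - d_prime) ** 2 + 0.5 * f ** 2)

# Equal reinforcer probabilities: the indifference point sits at d'/2.
print(indifference_point(2.0))

# Hypothetical unequal case, r_NS = 10 * r_S, for several d' values.
for d in (0.5, 1.0, 2.0, 3.0):
    f_star = indifference_point(d, r_s=0.1, r_ns=1.0)
    ratio = behavior_ratio(f_star, d, r_s=0.1, r_ns=1.0)
    print(f"d' = {d:.1f}  indifference point = {f_star:.2f}  behavior ratio there = {ratio:.3f}")
```

In the unequal case the offset from d′/2 is ln(10)/d′, so it grows as d′ shrinks, which is the dependence on d′ that Equation 11 builds in.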

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if r_NS > r_S), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in d′ on the location of the confidence criteria, discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as d′ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., r_NS = 1.0 and r_S = .1). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence that the sample occurred before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when r_NS = 10r_S? Substituting 10 for r_NS/r_S in Equation 11 and solving for f yields

$$f_{10:1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, f_{10:1} (the indifference point when r_NS = 10r_S) occurs at a point higher than d′/2, but by an amount that varies with d′. The more general version of this equation is f_{10:1} = d′/2 + k_1/d′, where k_1 is the constant ln(r_NS/r_S), here ln(10) ≈ 2.30. Similarly, if r_S = 10r_NS, then

$$f_{1:10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is f_{1:10} = d′/2 + k_2/d′, where k_2 is the constant ln(r_NS/r_S), here ln(0.10) ≈ −2.30. This model predicts that the distance between f_{10:1} and f_{1:10} on the sense-of-prior-occurrence axis will be inversely related to d′. That distance (i.e., f_{10:1} minus f_{1:10}) is given by

$$f_{10:1} - f_{1:10} = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}.$$

Thus, as d′ becomes very large, f_{10:1} and f_{1:10} should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon’s sensitivity to variations in relative reinforcer probabilities should be low when d′ is large and high when d′ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When d′ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of r_NS to r_S) had little effect on behavior. When d′ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as d′ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as d′ decreased (because that is just another way of saying that sensitivity to reinforcement increased as d′ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism’s reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \text{target})}{p(f \mid \text{lure})} > 10,$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of f, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{Old} \mid f)}{p(R_{New} \mid f)} > 10,$$

where p(R_Old | f) represents the past probability that, given this value of f, socially reinforcing feedback for an old response occurred, and p(R_New | f) represents the past probability that, given this value of f, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{Old} \mid f)}{p(B_{New} \mid f)} = \frac{p(R_{Old} \mid f)}{p(R_{New} \mid f)}, \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{Old} \mid f)}{p(B_{New} \mid f)} = \left(\frac{r_{Old}}{r_{New}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right]. \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left(\frac{r_{Old}}{r_{New}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right] > 10,$$

where r_Old and r_New represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to r_S and r_NS, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of r_S and r_NS in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, r_Old = r_New. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of f along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of f.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if r_NS = 10r_S, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at d′/2, but at a higher point, namely, d′/2 + 2.30/d′. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific f → reinforcement outcome relations, but instead that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog’s neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog’s training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism's neural network without detailing the origins of the relevant distributional information. A neural network may indeed perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject's reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising indeed if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject's neural network, but in the subject's history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that "the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects" (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of "nothing" in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion's distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value, f, is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)


this is rather mysterious, at least on the surface, because it is not clear how they would have acquired the detailed information it requires (and likelihood ratio models typically have little to say about this key issue).

Within-list strength manipulations. It is worth noting that, although subjects often behave in accordance with the predictions of likelihood ratio models, there are conditions under which they do not. All this really means is that it is a mistake to think of humans as being fully informed likelihood ratio computers. For the sake of completeness, we will turn now to a brief discussion of the conditions under which subjects do not behave in the manner predicted by likelihood ratio accounts, after which we will return to the central question of how it is that they often do.

Strength is usually manipulated between lists to produce a mirror effect, but Stretch and Wixted (1998b) also manipulated strength within list. In each list, half the words were colored red, and they were presented five times each. The other half were colored blue, and they appeared only once each. The red and the blue words were randomly intermixed, and the red word repetitions were randomly scattered throughout the list. On the subsequent recognition test, the red and the blue targets were randomly intermixed with red and blue lures. Obviously, the hit rate in the strong (red) condition was expected to exceed the hit rate in the weak (blue) condition, and it did. Of interest was whether or not the false alarm rate in the strong condition (i.e., to red lures) would be less than the false alarm rate in the weak condition (i.e., to blue lures). If so, the within-list strength manipulation would yield results similar to those produced by the between-list strength manipulation. However, the results reported by Stretch and Wixted (1998b) showed a different pattern. In two experiments in which strength was cued by color, the hit rate to the strong items was much higher than the hit rate to the weak items (as was expected), but the false alarm rates were nearly identical. Similar results, when other list materials were used, were also recently reported by Morrell, Gaitan, and Wixted (2002). These results correspond to the situation depicted earlier in the upper panel of Figure 2, according to which the decision criterion remains fixed on the familiarity axis as a function of strength. A fully informed likelihood ratio computer would not respond in this manner but would, instead, adjust the criterion as a function of strength in order to maintain a constant likelihood ratio across conditions (as in the between-list strength manipulation).

The main difference between the within- and the between-list strength manipulations lies in how rapidly the conditions change during testing. In the between-list case, the conditions change slowly (e.g., all the target items are weak after one list, and all are strong after the other). In the within-list case, the conditions change at a rapid rate (e.g., a strong test item might be followed by a weak test item, followed by a strong test item, and so on). Under such conditions, subjects tend to aggregate over those conditions, rather than respond in accordance with each condition separately. We shall see later that a similar issue arises in recognition memory experiments involving animal subjects, but the important point for the time being is that these data show that organisms do not always behave as if they were fully informed likelihood ratio computers.

Even though subjects do not always behave in exactly the manner one would expect according to a likelihood ratio model, they usually do, and the question of interest concerns how they acquire the information needed to do that. More specifically, what accounts for the fact that, when strength is manipulated between lists, subjects have

Figure 6. Estimates of the locations of the confidence criteria (OHC and NH) in the strong and weak conditions of a recognition memory experiment reported by Stretch and Wixted (1998a).

information available so that the confidence criteria fan out on the familiarity axis as $d'$ decreases in just the manner predicted by the likelihood ratio account? That is a question left largely unanswered by likelihood ratio accounts, and suggesting an answer to it is the issue we will consider next.

Mirror Effects in Pigeons

What is often overlooked in the human cognitive literature is the possibility that subjects have experienced years of making recognition decisions and expressing various levels of confidence. Presumably, those expressions of confidence have been followed by social consequences that have served to shape their behavior. Conceivably, it is those consequences that shaped behavior to mimic what a history-less likelihood ratio computer might do. We begin our investigation into this possibility by making the point that pigeons show a mirror effect, too. In pigeon research, unlike in human research, recognition memory decisions are followed by experimentally arranged consequences, and the connection between where a pigeon places its decision criterion and the consequences it has received in the past is easier to appreciate. That explicitly arranged reinforcers could influence the location of the decision criterion was recognized from the earliest days of detection theory (e.g., Green & Swets, 1966), but research into that issue never really blossomed in the literature on humans. The most influential account along these lines in the literature on animals, where relevant research did blossom, was proposed some time later by Davison and Tustin (1978). Nevertheless, it is probably fair to say that the connection between their research (and the research their ideas generated) and what a likelihood ratio theorist might say about the movement of confidence criteria across conditions is not easy to see. Making that connection explicit is what we will try to accomplish in this section and the next.

Mirror effects in pigeons can be observed by using a simple procedure in which the birds are required to discriminate the prior presence or absence of a sample stimulus. In a typical experiment of this kind, sample and no-sample trials are randomly intermixed, and the pigeon's task is to report whether or not a sample appeared earlier in the trial. On sample trials, the end of the intertrial interval (ITI) is followed by the presentation of a sample (e.g., the brief illumination of a keylight or the houselight) and then by the retention interval. On no-sample trials, by contrast, the end of the ITI is followed immediately by the onset of the nominal retention interval (i.e., no sample is presented). Following the retention interval, two choice stimuli are presented (e.g., red vs. green). A response to one choice alternative is correct on sample trials, and a response to the other is correct on no-sample trials. Pigeons learn to perform this task with high levels of accuracy when the retention interval is short, and manipulating the retention interval across conditions yields a mirror effect.

In Wixted's (1993) Experiment 2, 4 pigeons were exposed to an ascending series of retention intervals (0.50, 2, 6, 12, and 24 sec), with each retention interval condition in effect for 15 sessions. The results are shown in Table 1. The hit rate represents the probability of a correct response on sample trials (i.e., a correct declaration that the sample occurred), and the false alarm rate represents the probability of an incorrect response (i.e., an incorrect declaration that the sample occurred) on no-sample trials. The data presented in boldface in Table 1 were averaged over the last 5 sessions of each condition and over the 4 subjects, and these data constitute the main results of the experiment. The data shown in standard typeface are from the first session of each new retention interval condition (save for the 0.50-sec condition, because the birds had not yet learned the task in the 1st session of that condition). The data from the 1st session of each condition are instructive and will be discussed in more detail after we consider the main findings. The main results show that, not surprisingly, $d'$ decreased as the retention interval increased. The more important point is that, as the hit rate decreased, the false alarm rate reliably increased. In other words, a mirror effect was observed, just as it almost always is in human recognition memory experiments.
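As a rough check on how the $d'$ column of Table 1 relates to the hit and false alarm rates, the standard equal-variance formula $d' = z(\mathrm{hit}) - z(\mathrm{FA})$ can be applied to the averaged rates. This is only a sketch: the function name `d_prime` is illustrative, and the original scores may have been computed per bird before averaging, so small discrepancies from the tabled values are to be expected.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Equal-variance sensitivity: z(hit rate) minus z(false alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Last-five-session averages from Table 1: retention interval -> (hit, FA, reported d')
table1 = {
    0.5: (0.94, 0.10, 2.83),
    2.0: (0.88, 0.14, 2.26),
    6.0: (0.88, 0.18, 2.10),
    12.0: (0.74, 0.19, 1.52),
    24.0: (0.65, 0.28, 0.97),
}

for interval, (hit, fa, reported) in table1.items():
    print(f"{interval:5.1f} sec  computed d' = {d_prime(hit, fa):.2f}  reported = {reported:.2f}")
```

Under these assumptions the computed values fall within rounding distance of the reported ones.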

What explains the existence of a mirror effect in this case? Figure 7 presents sense-of-prior-occurrence distributions for three retention interval conditions (short, medium, and long). These distributions are exactly analogous to the familiarity distributions previously discussed in relation to human recognition memory, and they represent the bird's subjective sense that a sample was presented. That subjective sense is assumed to vary from trial to trial on both sample and no-sample trials (according to Gaussian or Gaussian-like distributions), even for a constant retention interval. Some sense of prior occurrence is evident even on no-sample trials, perhaps owing to the lingering influence of samples presented on earlier trials or to inherent noise in the memory system.

When the retention interval is short (upper panel), the sample distribution falls far to the right, because, despite some variability, the subjective sense of prior occurrence

Table 1
Hit and False Alarm (FA) Rates and $d'$ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Between Conditions

Retention Interval (sec)   Hit Rate   FA Rate   $d'$
0.5                        .94        .10       2.83
2.0 (1st)                  .84        .08
2.0                        .88        .14       2.26
6.0 (1st)                  .60        .14
6.0                        .88        .18       2.10
12.0 (1st)                 .67        .19
12.0                       .74        .19       1.52
24.0 (1st)                 .46        .24
24.0                       .65        .28       .97

Note. The data were taken from Experiment 2 of Wixted (1993). The values in boldface (here, the rows not marked "1st") were averaged over the last five sessions of each condition and over the 4 subjects. The values in standard typeface (the rows marked "1st") are from the first session of each new retention interval condition (except for the 0.50-sec condition).


will generally be quite high (and will fall well above the decision criterion). Thus, on the majority of sample trials, the subject would respond correctly (yes). When the retention interval increases (and the sense of prior occurrence weakens), the sample distribution gradually shifts toward the no-sample distribution (middle panel). The no-sample distribution does not change as a function of retention interval, because there is no memory trace established on those trials (so there is no trace that weakens with retention interval). When the retention interval is very long, so that memory for the sample fades completely, the sample and no-sample distributions will overlap, so that performance on sample and no-sample trials will be indistinguishable (a situation that is approached in the lower panel).

Because each retention interval was in effect for 15 sessions, the birds had time to adjust the location of the decision criterion in light of the reinforcement consequences for doing so. For an equal-variance model, reinforcers will be maximized when the criterion is placed midway between the two distributions. Leaving a particular retention interval in effect for a prolonged period gives the birds time to realize that and to adjust the decision criterion accordingly. By adjusting the decision criterion in such a way as to maximize reinforcement, the performance of the birds exhibits a mirror effect.
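The claim that a reinforcer-maximizing criterion produces a mirror effect can be illustrated with a small numerical search. The sketch below assumes equal-variance Gaussian distributions (standard deviation 1), equally likely sample and no-sample trials, and reinforcement of every correct choice; the $d'$ values and the function names are hypothetical choices made for illustration.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function

def expected_reinforcement(c: float, d_prime: float) -> float:
    """Proportion of trials reinforced when 'sample' is chosen whenever f > c."""
    hit = 1 - PHI(c - d_prime)      # correct choices on sample trials
    correct_rejection = PHI(c)      # correct choices on no-sample trials
    return 0.5 * hit + 0.5 * correct_rejection

def best_criterion(d_prime: float, step: float = 0.001) -> float:
    """Grid search for the criterion that maximizes expected reinforcement."""
    grid = [i * step for i in range(-3000, 6001)]
    return max(grid, key=lambda c: expected_reinforcement(c, d_prime))

results = []
for dp in (2.8, 2.1, 1.0):  # hypothetical d' values, ordered from short to long delay
    c = best_criterion(dp)
    results.append((dp, c, 1 - PHI(c - dp), 1 - PHI(c)))

for dp, c, hit, fa in results:
    print(f"d' = {dp:.1f}  best C = {c:.2f}  hit = {hit:.2f}  FA = {fa:.2f}")
```

In this sketch the best criterion lands at $d'/2$ in each case, so the hit rate falls and the false alarm rate rises as $d'$ declines, which is the mirror pattern.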

That the birds actually do shift the criterion in light of the reinforcement consequences is perhaps seen most easily by considering how they behave when a retention interval is first increased from one condition to the next. The relevant data are presented in standard typeface in Table 1. Note that each time a retention interval was increased (e.g., from 0.5 to 2 sec or from 2 to 6 sec), the hit rate dropped immediately (and precipitously), whereas the false alarm rate generally did not change. Over the next 15 sessions, both the hit rate and, to a lesser extent, the false alarm rate increased.

What explains that pattern? Presumably, on the first day that a new, longer retention interval was introduced, the effect on the mean of the sample distribution was immediate. That is, because of the longer retention interval, the memory trace of the sample faded to a greater extent, leading to a less intense sense of prior occurrence, on average. But that was the only immediate effect. The mean of the no-sample distribution did not change when the retention interval was increased, because there was no memory trace established on such trials to fade away. The criterion did not change, because the bird did not have enough time to learn where the optimal placement of the decision criterion might be in order to yield a higher number of reinforcers. As time passed, though, the bird learned to shift the criterion to the left on the sense-of-prior-occurrence axis (as is shown in Figure 7), thereby increasing both the hit rate and the false alarm rate.

Figure 7. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated between conditions. The location of the decision criterion, C, changes with each condition.

The increase in the hit and the false alarm rates as a function of training at a particular retention interval is unmistakably asymmetric. That is, generally speaking, the hit rate increased a lot, whereas the false alarm rate increased only a little. Part of the explanation for this is that the appropriate detection model for this situation is not the equal-variance model we have used throughout this article, but an unequal-variance model (so that the no-sample distribution is more variable than the sample distribution). A criterion sweeping across a narrow sample distribution will affect the hit rate to a much greater extent than a criterion sweeping across a wide no-sample distribution will affect the false alarm rate. In fact, in a separate experiment, Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.
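The complication mentioned above, that two Gaussian distributions with unequal variances have two points at which the likelihood ratio equals 1.0, can be made concrete. The sketch below sets the log of the likelihood ratio to zero, which gives a quadratic in f; the parameter values ($d' = 2$, sample standard deviation 1, no-sample standard deviation 1.25) are assumptions chosen for illustration and are not estimates from the data.

```python
import math
from statistics import NormalDist

def lr_equals_one_points(d_prime: float, sigma_s: float, sigma_ns: float) -> list[float]:
    """Values of f where p(f | S) = p(f | NS) for Gaussians with unequal variances.

    Setting ln[p(f | S) / p(f | NS)] = 0 gives a*f**2 + b*f + c = 0.
    """
    if sigma_s == sigma_ns:
        raise ValueError("equal variances give a single crossing at d'/2")
    a = 1 / (2 * sigma_ns ** 2) - 1 / (2 * sigma_s ** 2)
    b = d_prime / sigma_s ** 2
    c = -d_prime ** 2 / (2 * sigma_s ** 2) + math.log(sigma_ns / sigma_s)
    root = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b + root) / (2 * a), (-b - root) / (2 * a)])

# Hypothetical parameters: no-sample distribution wider than the sample distribution.
D_PRIME, SIGMA_S, SIGMA_NS = 2.0, 1.0, 1.25
sample = NormalDist(D_PRIME, SIGMA_S)
no_sample = NormalDist(0.0, SIGMA_NS)

crossings = lr_equals_one_points(D_PRIME, SIGMA_S, SIGMA_NS)
for f in crossings:
    print(f"f = {f:.3f}  likelihood ratio = {sample.pdf(f) / no_sample.pdf(f):.6f}")
```

With these assumed values, one crossing lies between the two means and the other lies far out in the upper tail, where the wider no-sample distribution again becomes the taller of the two.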

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant, but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of $d'$.
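One way to compare the two manipulations is to convert each false alarm rate into an implied criterion location. The sketch below assumes a Gaussian no-sample distribution and expresses the criterion in units of its standard deviation, relative to its mean (the sense in which C is used in this article), so that $C = -z(\mathrm{FA})$. The false alarm rates are the averaged values from Tables 1 and 2; the function name `criterion_from_fa` is illustrative.

```python
from statistics import NormalDist

def criterion_from_fa(fa_rate: float) -> float:
    """Criterion location relative to the no-sample mean, in no-sample SD units."""
    return -NormalDist().inv_cdf(fa_rate)

# Retention interval (sec) -> false alarm rate
between_fa = {0.5: 0.10, 2.0: 0.14, 6.0: 0.18, 12.0: 0.19, 24.0: 0.28}  # Table 1
within_fa = {0.5: 0.14, 2.0: 0.20, 6.0: 0.17, 12.0: 0.16}               # Table 2

between_c = [criterion_from_fa(fa) for fa in between_fa.values()]
within_c = [criterion_from_fa(fa) for fa in within_fa.values()]

print("between conditions:", [round(c, 2) for c in between_c])
print("within sessions:   ", [round(c, 2) for c in within_c])
```

Under these assumptions, the implied criterion falls steadily across the between-condition retention intervals, whereas the within-session values vary less and show no consistent trend with retention interval.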

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when

Table 2
Hit and False Alarm (FA) Rates and $d'$ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)   Hit Rate   FA Rate   $d'$
0.5                        .94        .14       2.63
2.0                        .93        .20       2.32
6.0                        .73        .17       1.56
12.0                       .63        .16       1.32

Note. The data were taken from Experiment 1 of Wixted (1993).

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.

the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

White and Wixted's (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird's (or a human's) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird's behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to f in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of f was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given f is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given f. According to White and Wixted's model, birds learn the relative probabilities of reinforcement for each value of f along the sense-of-prior-occurrence axis.² If, for a given value of f, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of f was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of f.
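The four-to-one example above amounts to a one-line application of the matching law at a single value of f. The sketch below is illustrative only; the function name and the particular reinforcement probabilities (.8 and .2) are assumptions chosen to give a 4:1 ratio.

```python
def p_choose_sample(p_reinf_sample: float, p_reinf_no_sample: float) -> float:
    """Matching law at one value of f.

    The behavior ratio equals the learned reinforcer ratio, so the probability
    of choosing 'sample' is its share of the total reinforcement probability.
    """
    return p_reinf_sample / (p_reinf_sample + p_reinf_no_sample)

# Hypothetical learned probabilities at some f: reinforcement is four times
# as likely for the sample choice as for the no-sample choice.
p_sample = p_choose_sample(0.8, 0.2)
behavior_ratio = p_sample / (1 - p_sample)

print(f"p(choose sample) = {p_sample:.2f}, behavior ratio = {behavior_ratio:.1f} to 1")
```

Note that the bird in this account is not choosing deterministically: even when the odds favor the sample alternative 4 to 1, the no-sample alternative is still chosen on a fifth of such trials.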

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of f is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution with mean $d'$ and standard deviation $\sigma$ is given by

$$p(f \mid S) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - d')^2/\sigma^2}, \tag{5}$$

and the no-sample (or lure) distribution with mean 0 and standard deviation $\sigma$ is given by

$$p(f \mid NS) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - 0)^2/\sigma^2}, \tag{6}$$

where f now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and $p(f \mid S)$ and $p(f \mid NS)$ represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio $p(f \mid S)/p(f \mid NS)$ on a given trial. What the pigeons instead learn, according to White and Wixted's (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given f, which will be denoted as $p(R_S \mid f)$, and the probability that a reinforcer is arranged on the no-sample choice alternative given f, which will be denoted as $p(R_{NS} \mid f)$. Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given f is

$$p(R_S \mid f) = \left[\frac{p(f \mid S)}{p(f \mid S) + p(f \mid NS)}\right] r_S, \tag{7}$$

where $r_S$ is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced ($r_S$ is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given f is

$$p(R_{NS} \mid f) = \left[\frac{p(f \mid NS)}{p(f \mid S) + p(f \mid NS)}\right] r_{NS}, \tag{8}$$

where $r_{NS}$ is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced ($r_{NS}$ is also usually 1.0).

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, f, on the sense-of-prior-occurrence axis.

White and Wixted's (1999) model assumes that pigeons learn $p(R_S \mid f)$ and $p(R_{NS} \mid f)$ during the course of training and, in light of this information, they respond to the choice alternatives in accordance with the matching law. If $p(B_S \mid f)$ represents the probability of choosing the sample choice alternative given f, and $p(B_{NS} \mid f)$ represents the probability of choosing the no-sample choice alternative given f, then

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(R_S \mid f)}{p(R_{NS} \mid f)}, \tag{9}$$

which, after replacing $p(R_S \mid f)$ and $p(R_{NS} \mid f)$ with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right]. \tag{10}$$

These equations state that the behavior ratio given f is equal to the reinforcer ratio given f, which is essentially the matching law for each value of f along the sense-of-prior-occurrence axis.

If $r_S = r_{NS} = 1.0$ (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(f \mid S)}{p(f \mid NS)}.$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing $p(f \mid S)$ and $p(f \mid NS)$ with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{e^{-.5(f - d')^2/\sigma^2}}{e^{-.5(f - 0)^2/\sigma^2}}.$$

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when $r_S = r_{NS}$, a likelihood ratio computation is not assumed to be governing the bird's behavior. What drives the bird's choice, according to White and Wixted's (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of f was generated by a sample, as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted's account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives, depending on what f happens to be. What f happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.
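The equivalence described above can be verified numerically. The sketch below computes the reinforcement probabilities for each alternative from Gaussian densities (Equations 7 and 8), forms the matching-law behavior ratio, and compares it with the likelihood ratio. The function names, the default standard deviation of 1, and the value $d' = 2$ are assumptions made for illustration.

```python
import math

def density(f: float, mean: float, sigma: float = 1.0) -> float:
    """Gaussian density, as in Equations 5 and 6."""
    return math.exp(-0.5 * (f - mean) ** 2 / sigma ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

def reinforcer_probs(f: float, d_prime: float, r_s: float = 1.0, r_ns: float = 1.0):
    """p(R_S | f) and p(R_NS | f), assuming sample and no-sample trials are equally likely."""
    p_s, p_ns = density(f, d_prime), density(f, 0.0)
    total = p_s + p_ns
    return r_s * p_s / total, r_ns * p_ns / total

def behavior_ratio(f: float, d_prime: float, r_s: float = 1.0, r_ns: float = 1.0) -> float:
    """Matching law: the choice ratio at f equals the learned reinforcer ratio at f."""
    p_rs, p_rns = reinforcer_probs(f, d_prime, r_s, r_ns)
    return p_rs / p_rns

def likelihood_ratio(f: float, d_prime: float) -> float:
    """Ratio of the sample density to the no-sample density at f."""
    return density(f, d_prime) / density(f, 0.0)

D_PRIME = 2.0  # hypothetical value
for f in (0.0, 0.5, 1.0, 2.5):
    print(f"f = {f:.1f}  behavior ratio = {behavior_ratio(f, D_PRIME):.4f}  "
          f"likelihood ratio = {likelihood_ratio(f, D_PRIME):.4f}")
```

When the two arranged probabilities differ, the same functions show the behavior ratio departing from the likelihood ratio by the factor $r_S/r_{NS}$, which is the sense in which the bird's choices track reinforcers rather than the distributions themselves.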

There is some point on the sense-of-prior-occurrence axis (i.e., some value of f) that leads the bird to be indifferent between the two choice alternatives, because at that point, the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, $p(B_S \mid f) = p(B_{NS} \mid f)$. If $r_S = r_{NS} = 1.0$, it is easily shown that the value of f that yields indifference is $d'/2$. If $p(B_S \mid f) = p(B_{NS} \mid f)$, which is to say that $p(B_S \mid f)/p(B_{NS} \mid f) = 1$, then, according to Equation 10,

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right].$$

Solving this equation for f provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for $p(f \mid S)$ and $p(f \mid NS)$ from Equations 5 and 6 and canceling out the $1/\sqrt{2\pi\sigma^2}$ terms that appear in the numerator and denominator yields

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{e^{-.5(f - d')^2}}{e^{-.5(f - 0)^2}}\right]$$

(assuming that $\sigma = 1$ for simplicity). When solved for f, this equation yields

$$f = \frac{d'}{2} + \frac{\ln\left(r_{NS}/r_S\right)}{d'}. \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if $r_S = r_{NS}$, so that $\ln(r_{NS}/r_S) = 0$, the indifference point occurs at $d'/2$. This familiarity value corresponds to the indifference point and could be represented as $f_{1/1} = d'/2$, where the subscript 1/1 represents the ratio of programmed reinforcer probabilities ($r_{NS}/r_S$). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when $d'$ changes (thereby yielding a mirror effect). The difference is that White and Wixted's (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of f. In this case, the pigeon has learned that, when $f = d'/2$, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than $d'/2$, the odds favor a reinforcer's being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than $d'/2$, the odds favor a reinforcer's being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same, namely, that a mirror effect should be observed.

Again although we assume that the indifference pointrepresents the learned value of f associated with equal re-

1 = aeligegraveccedil

oumloslashdivideeacuteeumlecirc

r

rp f

p fS

NS

S

NS( | )

( | )

p B f

p B f

e

e

f d

f

( | )

( | )

( )

( )

S

NS

=

eacute

eumlecirc

eacute

eumlecirc

- - cent

- -

1

2

1

2

25

2

5 0

2 2

2

ps

ps

s

s

p B f

p B f

p f

p f

( | )

( | )( | )

( | )S

NS

SNS

=

p B f

p B f

r

r

p f

p f

( | )

( | )( | )

( | )S

NS

S

NS

SNS

= aeligegraveccedil

oumloslashdivideeacuteeumlecirc

p B f

p B f

p R f

p R f

( | )

( | )

( | )

( | )S

NS

S

NS

=

REINFORCEMENT HISTORY SURROGATES 303

inforcement probabilities this is not to say that the situa-tion cannot be represented with a criterion placed on a de-cision axis as in Figure 7 What the criterion representsthough is not a value that the organism has calculated onthe basis of its inherent knowledge of the situation or avalue that the bird has decided to use as a criterion valueagainst which to evaluate all trial-by-trial sense-of-prior-occurrence values Instead the criterion simply depictsthe point on the sense-of-prior-occurrence axis where ac-cording to the subjectrsquos learning history the probability ofreinforcement for choosing the sample alternative is thesame as the probability of reinforcement for choosing theno-sample alternative
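The indifference point in Equation 11 can be checked numerically. The following is a minimal sketch, assuming unit-variance Gaussian sense-of-prior-occurrence distributions with the no-sample mean at 0 and the sample mean at d'. The function names and the value d' = 2.0 are illustrative choices, not taken from White and Wixted (1999).

```python
import math


def gaussian_pdf(x, mean, sd=1.0):
    """Density of a Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def choice_ratio(f, d_prime, r_s=1.0, r_ns=1.0):
    """Equation 10: p(B_S | f) / p(B_NS | f) for a sense of prior occurrence f."""
    return (r_s / r_ns) * gaussian_pdf(f, d_prime) / gaussian_pdf(f, 0.0)


def indifference_point(d_prime, r_s=1.0, r_ns=1.0):
    """Equation 11: the value of f at which the choice ratio equals 1."""
    return d_prime / 2.0 + math.log(r_ns / r_s) / d_prime


d_prime = 2.0  # hypothetical discriminability, for illustration only
for r_s, r_ns in ((1.0, 1.0), (0.1, 1.0), (1.0, 0.1)):
    f_star = indifference_point(d_prime, r_s, r_ns)
    ratio = choice_ratio(f_star, d_prime, r_s, r_ns)
    print(f"r_S = {r_s}, r_NS = {r_ns}: f = {f_star:.3f}, choice ratio = {ratio:.3f}")
```

Under these assumptions, the choice ratio evaluated at the point returned by Equation 11 should come out at 1 for every reinforcer ratio, which is what indifference means in the matching account.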

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if r_NS > r_S), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in d' on the location of the confidence criteria discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as d' decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., r_NS = 1.0 and r_S = .1). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when r_NS = 10 r_S? Substituting 10 for r_NS/r_S in Equation 11 and solving for f yields

$$f_{10/1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}$$

In other words, f_{10/1} (the indifference point when r_NS = 10 r_S) occurs at a point higher than d'/2, but by an amount that varies with d'. The more general version of this equation is f_{10/1} = d'/2 + k_1/d', where k_1 is the constant ln(r_NS/r_S), which here equals ln(10). Similarly, if r_S = 10 r_NS, then

$$f_{1/10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}$$

The more general version of this equation is f_{1/10} = d'/2 + k_2/d', where k_2 is the constant ln(r_NS/r_S), which here equals ln(0.10). This model predicts that the distance between f_{10/1} and f_{1/10} on the sense-of-prior-occurrence axis will be inversely related to d'. That distance (i.e., f_{10/1} minus f_{1/10}) is given by

$$f_{10/1} - f_{1/10} = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}$$

Thus, as d' becomes very large, f_{10/1} and f_{1/10} should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d' approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon's sensitivity to variations in relative reinforcer probabilities should be low when d' is large and high when d' is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When d' was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of r_NS to r_S) had little effect on behavior. When d' was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as d' decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as d' decreased (because that is just another way of saying that sensitivity to reinforcement increased as d' decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism's reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and, more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \text{target})}{p(f \mid \text{lure})} > 10,$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of f, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{Old} \mid f)}{p(R_{New} \mid f)} > 10,$$

where p(R_Old | f) represents the past probability that, given this value of f, socially reinforcing feedback for an old response occurred and p(R_New | f) represents the past probability that, given this value of f, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{Old} \mid f)}{p(B_{New} \mid f)} = \frac{p(R_{Old} \mid f)}{p(R_{New} \mid f)} \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{Old} \mid f)}{p(B_{New} \mid f)} = \left(\frac{r_{Old}}{r_{New}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right] \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left(\frac{r_{Old}}{r_{New}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right] > 10,$$

where r_Old and r_New represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to r_S and r_NS, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of r_S and r_NS in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, r_Old = r_New. If we assume, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of f along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of f.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if r_NS = 10 r_S, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at d'/2, but at a higher point, namely, d'/2 + 2.30/d'. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific f → reinforcement outcome relations, but instead that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog's neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog's training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.
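The fanning out of the 10-to-1 points described in this section can be illustrated with a few lines of arithmetic. This is only a sketch under the simplifying assumptions used in the text (equal-variance, unit-variance Gaussian distributions and equal reinforcer probabilities for correct old and new responses). The function name and the d' values are hypothetical.

```python
import math


def ten_to_one_points(d_prime, odds=10.0):
    """Points on the familiarity axis where the odds are `odds` to 1 against
    and `odds` to 1 in favor of the item having been a target."""
    shift = math.log(odds) / d_prime
    return d_prime / 2.0 - shift, d_prime / 2.0 + shift


# Illustrative d' values only; they are not data from any experiment.
for d_prime in (0.5, 1.0, 2.0, 3.0):
    low, high = ten_to_one_points(d_prime)
    print(f"d' = {d_prime:.1f}: against at {low:6.2f}, in favor at {high:6.2f}, "
          f"distance = {high - low:5.2f}")
```

The distance between the two points is 2 ln(10)/d', so it shrinks as d' grows and expands without limit as d' approaches 0, which is the pattern attributed above to both the likelihood ratio account and the reinforcement history account.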

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism's neural network without detailing the origins of the relevant distributional information. A neural network may, indeed, perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject's reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising, indeed, if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject's neural network, but in the subject's history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that "the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects" (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of "nothing" in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion's distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value f is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)


information available so that the confidence criteria fan out on the familiarity axis as d' decreases in just the manner predicted by the likelihood ratio account? That is a question left largely unanswered by likelihood ratio accounts, and suggesting an answer to it is the issue we will consider next.

Mirror Effects in Pigeons

What is often overlooked in the human cognitive literature is the possibility that subjects have experienced years of making recognition decisions and expressing various levels of confidence. Presumably, those expressions of confidence have been followed by social consequences that have served to shape their behavior. Conceivably, it is those consequences that shaped behavior to mimic what a history-less likelihood ratio computer might do. We begin our investigation into this possibility by making the point that pigeons show a mirror effect too. In pigeon research, unlike in human research, recognition memory decisions are followed by experimentally arranged consequences, and the connection between where a pigeon places its decision criterion and the consequences it has received in the past is easier to appreciate. That explicitly arranged reinforcers could influence the location of the decision criterion was recognized from the earliest days of detection theory (e.g., Green & Swets, 1966), but research into that issue never really blossomed in the literature on humans. The most influential account along these lines in the literature on animals, where relevant research did blossom, was proposed some time later by Davison and Tustin (1978). Nevertheless, it is probably fair to say that the connection between their research (and the research their ideas generated) and what a likelihood ratio theorist might say about the movement of confidence criteria across conditions is not easy to see. Making that connection explicit is what we will try to accomplish in this section and the next.

Mirror effects in pigeons can be observed by using a simple procedure in which the birds are required to discriminate the prior presence or absence of a sample stimulus. In a typical experiment of this kind, sample and no-sample trials are randomly intermixed, and the pigeon's task is to report whether or not a sample appeared earlier in the trial. On sample trials, the end of the intertrial interval (ITI) is followed by the presentation of a sample (e.g., the brief illumination of a keylight or the houselight) and then by the retention interval. On no-sample trials, by contrast, the end of the ITI is followed immediately by the onset of the nominal retention interval (i.e., no sample is presented). Following the retention interval, two choice stimuli are presented (e.g., red vs. green). A response to one choice alternative is correct on sample trials, and a response to the other is correct on no-sample trials. Pigeons learn to perform this task with high levels of accuracy when the retention interval is short, and manipulating the retention interval across conditions yields a mirror effect.

In Wixted's (1993) Experiment 2, 4 pigeons were exposed to an ascending series of retention intervals (0.50, 2, 6, 12, and 24 sec), with each retention interval condition in effect for 15 sessions. The results are shown in Table 1. The hit rate represents the probability of a correct response on sample trials (i.e., a correct declaration that the sample occurred), and the false alarm rate represents the probability of an incorrect response (i.e., an incorrect declaration that the sample occurred) on no-sample trials. The data presented in boldface in Table 1 were averaged over the last 5 sessions of each condition and over the 4 subjects, and these data constitute the main results of the experiment. The data shown in standard typeface are from the first session of each new retention interval condition (save for the 0.50-sec condition, because the birds had not yet learned the task in the 1st session of that condition). The data from the 1st session of each condition are instructive and will be discussed in more detail after we consider the main findings. The main results show that, not surprisingly, d' decreased as the retention interval increased. The more important point is that as the hit rate decreased, the false alarm rate reliably increased. In other words, a mirror effect was observed, just as it almost always is in human recognition memory experiments.
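For readers who want to relate hit and false alarm rates to d', the standard equal-variance calculation is d' = z(hit rate) - z(false alarm rate). The sketch below applies it to the final-session values reported in Table 1. It is an illustration only: the function name is hypothetical, and because the published d' scores may have been computed per subject before averaging, small discrepancies from the table are to be expected.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate):
    """Equal-variance d': z(hit rate) minus z(false alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# (retention interval in sec, hit rate, FA rate, d' as reported in Table 1)
TABLE_1_FINAL_SESSIONS = [
    (0.5, 0.94, 0.10, 2.83),
    (2.0, 0.88, 0.14, 2.26),
    (6.0, 0.88, 0.18, 2.10),
    (12.0, 0.74, 0.19, 1.52),
    (24.0, 0.65, 0.28, 0.97),
]

for interval, hit, fa, reported in TABLE_1_FINAL_SESSIONS:
    print(f"{interval:5.1f} sec: computed d' = {d_prime(hit, fa):.2f}, reported = {reported:.2f}")
```

Both columns should show d' falling as the retention interval increases, which is the discriminability half of the mirror effect described above.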

What explains the existence of a mirror effect in this case? Figure 7 presents sense-of-prior-occurrence distributions for three retention interval conditions (short, medium, and long). These distributions are exactly analogous to the familiarity distributions previously discussed in relation to human recognition memory, and they represent the bird's subjective sense that a sample was presented. That subjective sense is assumed to vary from trial to trial on both sample and no-sample trials (according to Gaussian or Gaussian-like distributions), even for a constant retention interval. Some sense of prior occurrence is evident even on no-sample trials, perhaps owing to the lingering influence of samples presented on earlier trials or to inherent noise in the memory system.

When the retention interval is short (upper panel), the sample distribution falls far to the right because, despite some variability, the subjective sense of prior occurrence will generally be quite high (and will fall well above the decision criterion). Thus, on the majority of sample trials, the subject would respond correctly (yes). When the retention interval increases (and the sense of prior occurrence weakens), the sample distribution gradually shifts toward the no-sample distribution (middle panel). The no-sample distribution does not change as a function of retention interval, because there is no memory trace established on those trials (so there is no trace that weakens with retention interval). When the retention interval is very long, so that memory for the sample fades completely, the sample and no-sample distributions will overlap, so that performance on sample and no-sample trials will be indistinguishable (a situation that is approached in the lower panel).

Table 1
Hit and False Alarm (FA) Rates and d' Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Between Conditions

Retention Interval (sec)    Hit Rate    FA Rate    d'
 0.5                        .94         .10        2.83
 2.0 (1st)                  .84         .08
 2.0                        .88         .14        2.26
 6.0 (1st)                  .60         .14
 6.0                        .88         .18        2.10
12.0 (1st)                  .67         .19
12.0                        .74         .19        1.52
24.0 (1st)                  .46         .24
24.0                        .65         .28         .97

Note. The data were taken from Experiment 2 of Wixted (1993). The values in boldface (the rows not marked "1st") were averaged over the last five sessions of each condition and over the 4 subjects. The values in standard typeface (the rows marked "1st") are from the first session of each new retention interval condition (except for the 0.50-sec condition).

Because each retention interval was in effect for 15 sessions, the birds had time to adjust the location of the decision criterion in light of the reinforcement consequences for doing so. For an equal-variance model, reinforcers will be maximized when the criterion is placed midway between the two distributions. Leaving a particular retention interval in effect for a prolonged period gives the birds time to learn that and to adjust the decision criterion accordingly. When the birds adjust the decision criterion in such a way as to maximize reinforcement, their performance exhibits a mirror effect.
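The claim that a midway criterion maximizes reinforcers in the equal-variance case can be checked with a small grid search. The sketch below rests on assumptions that are labeled here because the text does not state them explicitly: unit-variance Gaussian distributions, sample and no-sample trials that are equally frequent, and equal reinforcer probabilities for the two kinds of correct response. All names and the grid range are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_reinforcement(criterion, d_prime, r_s=1.0, r_ns=1.0, p_sample=0.5):
    """Expected reinforcers per trial for a criterion on the decision axis.

    Assumes the no-sample distribution is N(0, 1) and the sample
    distribution is N(d', 1); both are modeling assumptions.
    """
    hit_rate = 1.0 - STD_NORMAL.cdf(criterion - d_prime)
    correct_rejection_rate = STD_NORMAL.cdf(criterion)
    return p_sample * r_s * hit_rate + (1.0 - p_sample) * r_ns * correct_rejection_rate


def best_criterion(d_prime, step=0.01):
    """Grid search for the criterion that maximizes expected reinforcement."""
    grid = [i * step for i in range(-300, 601)]  # criteria from -3.0 to 6.0
    return max(grid, key=lambda c: expected_reinforcement(c, d_prime))


for d in (2.0, 1.0):  # hypothetical d' values
    print(f"d' = {d}: best criterion near {best_criterion(d):.2f}, midpoint = {d / 2:.2f}")
```

With unequal reinforcer probabilities (`r_s` different from `r_ns`), the same search would be expected to move the best criterion away from the midpoint, in the direction discussed later in connection with Equation 11.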

That the birds actually do shift the criterion in light of the reinforcement consequences is perhaps seen most easily by considering how they behave when a retention interval is first increased from one condition to the next. The relevant data are presented in standard typeface in Table 1. Note that each time a retention interval was increased (e.g., from 0.5 to 2 sec or from 2 to 6 sec), the hit rate dropped immediately (and precipitously), whereas the false alarm rate generally did not change. Over the next 15 sessions, both the hit rate and, to a lesser extent, the false alarm rate increased.

What explains that pattern? Presumably, on the first day that a new, longer retention interval was introduced, the effect on the mean of the sample distribution was immediate. That is, because of the longer retention interval, the memory trace of the sample faded to a greater extent, leading to a less intense sense of prior occurrence, on average. But that was the only immediate effect. The mean of the no-sample distribution did not change when the retention interval was increased, because there was no memory trace established on such trials to fade away. The criterion did not change, because the bird did not have enough time to learn where the optimal placement of the decision criterion might be in order to yield a higher number of reinforcers. As time passed, though, the bird learned to shift the criterion to the left on the sense-of-prior-occurrence axis (as is shown in Figure 7), thereby increasing both the hit rate and the false alarm rate.

Figure 7. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated between conditions. The location of the decision criterion, C, changes with each condition.

The increase in the hit and the false alarm rates as a function of training at a particular retention interval is unmistakably asymmetric. That is, generally speaking, the hit rate increased a lot, whereas the false alarm rate increased only a little. Part of the explanation for this is that the appropriate detection model for this situation is not the equal-variance model we have used throughout this article, but an unequal-variance model (so that the no-sample distribution is more variable than the sample distribution). A criterion sweeping across a narrow sample distribution will affect the hit rate to a much greater extent than a criterion sweeping across a wide no-sample distribution will affect the false alarm rate. In fact, in a separate experiment, Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.
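The remark that an unequal-variance Gaussian model produces two points where the likelihood ratio equals 1.0 follows from the fact that the log likelihood ratio becomes quadratic in f. The sketch below solves that quadratic. The parameter values (a sample distribution with mean 2.0 and standard deviation 1.0, a no-sample distribution with mean 0 and standard deviation 1.5) are hypothetical and were chosen only to make the two crossings visible; they are not estimates from Wixted (1993).

```python
import math
from statistics import NormalDist


def unit_likelihood_ratio_points(d_prime, sd_sample, sd_no_sample):
    """Values of f at which p(f | S) = p(f | NS).

    Setting the log likelihood ratio to zero gives a*f^2 + b*f + c = 0.
    With equal variances a is 0 and (for nonzero d') there is one crossing.
    """
    a = 1.0 / (2.0 * sd_no_sample ** 2) - 1.0 / (2.0 * sd_sample ** 2)
    b = d_prime / sd_sample ** 2
    c = -(d_prime ** 2) / (2.0 * sd_sample ** 2) + math.log(sd_no_sample / sd_sample)
    if a == 0.0:
        return [-c / b]
    discriminant = b * b - 4.0 * a * c
    if discriminant < 0.0:
        return []
    root = math.sqrt(discriminant)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])


sample = NormalDist(mu=2.0, sigma=1.0)      # hypothetical sample distribution
no_sample = NormalDist(mu=0.0, sigma=1.5)   # hypothetical, more variable no-sample distribution
for f in unit_likelihood_ratio_points(2.0, 1.0, 1.5):
    print(f"f = {f:.3f}: likelihood ratio = {sample.pdf(f) / no_sample.pdf(f):.3f}")
```

In the equal-variance case the quadratic term vanishes and the function returns the single midpoint d'/2, which is why the simpler model used in the text has a unique indifference point.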

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted (1993) are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of d'.

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.

Table 2
Hit and False Alarm (FA) Rates and d′ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)    Hit Rate    FA Rate    d′
          0.5                 .94         .14      2.63
          2.0                 .93         .20      2.32
          6.0                 .73         .17      1.56
         12.0                 .63         .16      1.32

Note. The data were taken from Experiment 1 of Wixted (1993).
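As a quick check on Table 2, the d′ scores can be recomputed from the hit and false alarm rates. The sketch below assumes the standard equal-variance Gaussian formula, d′ = z(hit rate) − z(false alarm rate), and reads the table entries as proportions (.94, .14, and so on). The function name `d_prime` and the `TABLE_2` list are illustrative and are not taken from Wixted (1993). Because the published rates are rounded to two decimals, the recomputed values can differ from the reported ones by about .01.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Equal-variance Gaussian sensitivity: d' = z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Rows of Table 2: (retention interval in sec, hit rate, FA rate, reported d')
TABLE_2 = [
    (0.5, 0.94, 0.14, 2.63),
    (2.0, 0.93, 0.20, 2.32),
    (6.0, 0.73, 0.17, 1.56),
    (12.0, 0.63, 0.16, 1.32),
]

for interval, hit, fa, reported in TABLE_2:
    print(f"{interval:5.1f} sec  computed d' = {d_prime(hit, fa):.2f}  "
          f"reported d' = {reported:.2f}")
```

The near-constant false alarm rate alongside a falling hit rate is what produces the declining d′ across rows, which is the pattern the fixed-criterion representation in Figure 8 is meant to capture.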

White and Wixted’s (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird’s (or a human’s) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird’s behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to f in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of f was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given f is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given f. According to White and Wixted’s model, birds learn the relative probabilities of reinforcement for each value of f along the sense-of-prior-occurrence axis.² If, for a given value of f, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of f was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of f.
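The idea that reinforcer odds can be learned for each value of f, rather than computed, can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions and is not the procedure or analysis used by White and Wixted (1999). It assumes equal-variance Gaussian distributions with a standard deviation of 1, equally likely sample and no-sample trials, and a reinforcer arranged for every correct choice. The function names, the bin limits (a small interval around f, in the spirit of Note 2), the trial count, and the seed are all hypothetical choices.

```python
import math
import random


def simulate_reinforcer_odds(d_prime, bin_lo, bin_hi, n_trials=200_000, seed=1):
    """Tally, over trials whose f falls in [bin_lo, bin_hi), how often a
    reinforcer was arranged on the sample vs. the no-sample alternative,
    and return the ratio of the two tallies."""
    rng = random.Random(seed)
    arranged_sample = 0
    arranged_no_sample = 0
    for _ in range(n_trials):
        is_sample_trial = rng.random() < 0.5      # assumption: equal trial types
        mean = d_prime if is_sample_trial else 0.0
        f = rng.gauss(mean, 1.0)                  # assumption: Gaussian, sigma = 1
        if bin_lo <= f < bin_hi:
            # Assumption: every correct choice would be reinforced, so the
            # reinforcer is arranged on the alternative matching the trial type.
            if is_sample_trial:
                arranged_sample += 1
            else:
                arranged_no_sample += 1
    return arranged_sample / arranged_no_sample


def likelihood_ratio(f, d_prime):
    """Ratio of the sample to the no-sample Gaussian density at f (sigma = 1)."""
    return math.exp(-0.5 * (f - d_prime) ** 2) / math.exp(-0.5 * f ** 2)


learned_odds = simulate_reinforcer_odds(d_prime=1.0, bin_lo=0.9, bin_hi=1.1)
print(f"experienced reinforcer odds near f = 1.0: {learned_odds:.2f}")
print(f"odds that f = 1.0 came from a sample trial: {likelihood_ratio(1.0, 1.0):.2f}")
```

Under these assumptions, the tallied reinforcer odds in the bin fall close to the odds that the bin's value of f was produced by a sample trial, even though the simulated subject only counts outcomes and never evaluates a density.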

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of f is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution, with mean d′ and standard deviation σ, is given by

$$p(f \mid S) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - d')^{2} / \sigma^{2}}, \tag{5}$$

and the no-sample (or lure) distribution, with mean 0 and standard deviation σ, is given by

$$p(f \mid NS) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - 0)^{2} / \sigma^{2}}, \tag{6}$$

where f now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and p(f | S) and p(f | NS) represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio p(f | S)/p(f | NS) on a given trial. What the pigeons instead learn, according to White and Wixted’s (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given f, which will be denoted as p(R_S | f), and the probability that a reinforcer is arranged on the no-sample choice alternative given f, which will be denoted as p(R_NS | f). Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given f is

$$p(R_S \mid f) = \left[\frac{p(f \mid S)}{p(f \mid S) + p(f \mid NS)}\right] r_S, \tag{7}$$

where r_S is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced (r_S is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given f is

$$p(R_{NS} \mid f) = \left[\frac{p(f \mid NS)}{p(f \mid S) + p(f \mid NS)}\right] r_{NS}, \tag{8}$$

where r_NS is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced (r_NS is also usually 1.0).

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, f, on the sense-of-prior-occurrence axis.
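Equations 5 through 8 can be evaluated numerically for any point on the sense-of-prior-occurrence axis. The following is a minimal sketch, assuming equally likely sample and no-sample trials as in the text. The function names, argument names, and the example values (f = 1.5, d′ = 2.0) are hypothetical and chosen only for illustration.

```python
import math


def gaussian_density(f, mean, sigma=1.0):
    """Gaussian density with the given mean and standard deviation (Equations 5 and 6)."""
    return math.exp(-0.5 * (f - mean) ** 2 / sigma ** 2) / math.sqrt(2 * math.pi * sigma ** 2)


def reinforcer_probabilities(f, d_prime, r_s=1.0, r_ns=1.0, sigma=1.0):
    """Return (p(R_S | f), p(R_NS | f)) following Equations 7 and 8,
    assuming sample and no-sample trials are equally likely."""
    p_f_s = gaussian_density(f, d_prime, sigma)   # p(f | S), sample distribution
    p_f_ns = gaussian_density(f, 0.0, sigma)      # p(f | NS), no-sample distribution
    total = p_f_s + p_f_ns
    return (p_f_s / total) * r_s, (p_f_ns / total) * r_ns


# Illustrative values only: a fairly strong sense of prior occurrence.
p_rs, p_rns = reinforcer_probabilities(f=1.5, d_prime=2.0)
print(f"p(R_S | f) = {p_rs:.3f}, p(R_NS | f) = {p_rns:.3f}")
```

With r_S = r_NS = 1.0, the two probabilities sum to 1, because a reinforcer is then arranged on exactly one of the two alternatives on every trial.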


White and Wixted’s (1999) model assumes that pigeons learn p(R_S | f) and p(R_NS | f) during the course of training and, in light of this information, they respond to the choice alternatives in accordance with the matching law. If p(B_S | f) represents the probability of choosing the sample choice alternative given f, and p(B_NS | f) represents the probability of choosing the no-sample choice alternative given f, then

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(R_S \mid f)}{p(R_{NS} \mid f)}, \tag{9}$$

which, after replacing p(R_S | f) and p(R_NS | f) with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right]. \tag{10}$$

These equations state that the behavior ratio given f is equal to the reinforcer ratio given f, which is essentially the matching law for each value of f along the sense-of-prior-occurrence axis.

If r_S = r_NS = 1.0 (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(f \mid S)}{p(f \mid NS)}.$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing p(f | S) and p(f | NS) with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - d')^{2} / \sigma^{2}}}{\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-.5 (f - 0)^{2} / \sigma^{2}}}.$$

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when r_S = r_NS, a likelihood ratio computation is not assumed to be governing the bird’s behavior. What drives the bird’s choice, according to White and Wixted’s (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of f was generated by a sample as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted’s account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives, depending on what f happens to be. What f happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.

There is some point on the sense-of-prior-occurrence axis (i.e., some value of f) that leads the bird to be indifferent between the two choice alternatives, because at that point the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, p(B_S | f) = p(B_NS | f). If r_S = r_NS = 1.0, it is easily shown that the value of f that yields indifference is d′/2. If p(B_S | f) = p(B_NS | f), which is to say that p(B_S | f)/p(B_NS | f) = 1, then, according to Equation 10,

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right].$$

Solving this equation for f provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for p(f | S) and p(f | NS) from Equations 5 and 6 and canceling out the 1/√(2πσ²) terms that appear in the numerator and denominator yields

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{e^{-.5 (f - d')^{2}}}{e^{-.5 (f - 0)^{2}}}\right]$$

(assuming that σ = 1 for simplicity). When solved for f, this equation yields

$$f = \frac{d'}{2} + \frac{\ln\left(r_{NS} / r_S\right)}{d'}. \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if r_S = r_NS, so that ln(r_NS/r_S) = 0, the indifference point occurs at d′/2. This familiarity value corresponds to the indifference point and could be represented as f_{1:1} = d′/2, where the subscript 1:1 represents the ratio of programmed reinforcer probabilities (r_NS/r_S). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when d′ changes (thereby yielding a mirror effect). The difference is that White and Wixted’s (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of f. In this case, the pigeon has learned that when f = d′/2, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than d′/2, the odds favor a reinforcer’s being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than d′/2, the odds favor a reinforcer’s being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same: namely, that a mirror effect should be observed.

Again, although we assume that the indifference point represents the learned value of f associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject’s learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.
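The relation between the matching-law behavior ratio (Equation 10) and the indifference point (Equation 11) can be checked numerically. The sketch below assumes equal-variance Gaussian distributions with σ = 1, as in the derivation of Equation 11. The function names and the particular d′ values and reinforcer probabilities are hypothetical, chosen only to illustrate the formulas.

```python
import math


def likelihood_ratio(f, d_prime):
    """p(f | S) / p(f | NS) for equal-variance Gaussians with sigma = 1."""
    return math.exp(-0.5 * (f - d_prime) ** 2) / math.exp(-0.5 * f ** 2)


def choice_ratio(f, d_prime, r_s=1.0, r_ns=1.0):
    """Matching-law behavior ratio p(B_S | f) / p(B_NS | f), as in Equation 10."""
    return (r_s / r_ns) * likelihood_ratio(f, d_prime)


def indifference_point(d_prime, r_s=1.0, r_ns=1.0):
    """Value of f at which the behavior ratio equals 1, as in Equation 11."""
    return d_prime / 2 + math.log(r_ns / r_s) / d_prime


for d_prime in (0.5, 1.0, 2.0):
    f_equal = indifference_point(d_prime)                       # r_S = r_NS
    f_unequal = indifference_point(d_prime, r_s=0.1, r_ns=1.0)  # illustrative unequal schedule
    ratio_at_point = choice_ratio(f_unequal, d_prime, r_s=0.1, r_ns=1.0)
    print(f"d' = {d_prime}: indifference at {f_equal:.2f} (equal r), "
          f"{f_unequal:.2f} (unequal r), behavior ratio there = {ratio_at_point:.2f}")
```

With equal reinforcer probabilities the indifference point sits at d′/2 for every d′. With unequal probabilities it is displaced by ln(r_NS/r_S)/d′, so the displacement grows as d′ shrinks.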

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if r_NS > r_S), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in d′ on the location of the confidence criteria discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as d′ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., r_NS = 1.0 and r_S = .1). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially strong sense of prior occurrence before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when r_NS = 10r_S? Substituting 10 for r_NS/r_S in Equation 11 and solving for f yields

$$f_{10:1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, f_{10:1} (the indifference point when r_NS = 10r_S) occurs at a point higher than d′/2, but by an amount that varies with d′. The more general version of this equation is f_{10:1} = d′/2 + k_1/d′, where k_1 is the constant ln(r_NS/r_S), here ln(10). Similarly, if r_S = 10r_NS, then

$$f_{1:10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is f_{1:10} = d′/2 + k_2/d′, where k_2 is the corresponding constant ln(r_NS/r_S), here ln(0.10). This model predicts that the distance between f_{10:1} and f_{1:10} on the sense-of-prior-occurrence axis will be inversely related to d′. That distance (i.e., f_{10:1} minus f_{1:10}) is given by

$$f_{10:1} - f_{1:10} = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}.$$

Thus, as d′ becomes very large, f_{10:1} and f_{1:10} should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon’s sensitivity to variations in relative reinforcer probabilities should be low when d′ is large and high when d′ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When d′ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of r_NS to r_S) had little effect on behavior. When d′ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as d′ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as d′ decreased (because that is just another way of saying that sensitivity to reinforcement increased as d′ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism’s reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and, more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \text{target})}{p(f \mid \text{lure})} > 10,$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of f, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{\text{Old}} \mid f)}{p(R_{\text{New}} \mid f)} > 10,$$

where p(R_Old | f) represents the past probability that, given this value of f, socially reinforcing feedback for an old response occurred, and p(R_New | f) represents the past probability that, given this value of f, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{\text{Old}} \mid f)}{p(B_{\text{New}} \mid f)} = \frac{p(R_{\text{Old}} \mid f)}{p(R_{\text{New}} \mid f)}, \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{\text{Old}} \mid f)}{p(B_{\text{New}} \mid f)} = \left(\frac{r_{\text{Old}}}{r_{\text{New}}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right]. \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left(\frac{r_{\text{Old}}}{r_{\text{New}}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right] > 10,$$

where r_Old and r_New represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to r_S and r_NS, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of r_S and r_NS in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, r_Old = r_New. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of f along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of f.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if r_NS = 10r_S, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at d′/2, but at a higher point, namely, d′/2 + 2.30/d′. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific f → reinforcement outcome relations, but instead that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog’s neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog’s training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism’s neural network without detailing the origins of the relevant distributional information. A neural network may, indeed, perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject’s reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising, indeed, if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject’s neural network but in the subject’s history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that “the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects” (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of “nothing” in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user’s guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion’s distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value f is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)



will generally be quite high (and will fall well above the decision criterion). Thus, on the majority of sample trials, the subject would respond correctly (yes). When the retention interval increases (and the sense of prior occurrence weakens), the sample distribution gradually shifts toward the no-sample distribution (middle panel). The no-sample distribution does not change as a function of retention interval, because there is no memory trace established on those trials (so there is no trace that weakens with retention interval). When the retention interval is very long, so that memory for the sample fades completely, the sample and no-sample distributions will overlap, so that performance on sample and no-sample trials will be indistinguishable (a situation that is approached in the lower panel).

Because each retention interval was in effect for 15 ses-sions the birds had time to adjust the location of the deci-sion criterion in light of the reinforcement consequencesfor doing so For an equal-variance model reinforcers willbe maximized when the criterion is placed midway be-tween the two distributions Leaving a particular retentioninterval in effect for a prolonged period gives the birdstime to realize that and to adjust the decision criterion ac-cordingly By adjusting the decision criterion in such away as to maximize reinforcement the performance of thebirds exhibits a mirror effect

That the birds actually do shift the criterion in light ofthe reinforcement consequences is perhaps seen most eas-ily by considering how they behave when a retention in-terval is first increased from one condition to the next Therelevant data are presented in standard typeface in Table 1Note that each time a retention interval was increased(eg from 05 to 2 sec or from 2 to 6 sec) the hit ratedropped immediately (and precipitously) whereas thefalse alarm rate generally did not change Over the next 15sessions both the hit rate and to a lesser extent the falsealarm rate increased

What explains that pattern? Presumably, on the first day that a new, longer retention interval was introduced, the effect on the mean of the sample distribution was immediate. That is, because of the longer retention interval, the memory trace of the sample faded to a greater extent, leading to a less intense sense of prior occurrence, on average. But that was the only immediate effect. The mean of the no-sample distribution did not change when the retention interval was increased, because there was no memory trace established on such trials to fade away. The criterion did not change, because the bird did not have enough time to learn where the optimal placement of the decision criterion might be in order to yield a higher number of reinforcers. As time passed, though, the bird learned to shift the criterion to the left on the sense-of-prior-occurrence axis (as is shown in Figure 7), thereby increasing both the hit rate and the false alarm rate.
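The sequence described above (an immediate drop in the hit rate with no change in the false alarm rate, followed by a rise in both) can be illustrated with a small sketch. The values of $d'$ and of the criterion below are hypothetical and are not estimates from Table 1; unit-variance Gaussian distributions are assumed.

```python
from statistics import NormalDist


def hit_and_false_alarm_rates(d_prime, criterion):
    """Return (hit rate, false alarm rate) for unit-variance Gaussians."""
    hit_rate = 1.0 - NormalDist(mu=d_prime, sigma=1.0).cdf(criterion)
    false_alarm_rate = 1.0 - NormalDist(mu=0.0, sigma=1.0).cdf(criterion)
    return hit_rate, false_alarm_rate


# Hypothetical values, not estimates from Table 1.
# 1. End of training at a short retention interval (criterion at d'/2).
end_of_short_interval = hit_and_false_alarm_rates(d_prime=2.5, criterion=1.25)
# 2. First day at a longer interval: d' falls, criterion has not yet moved.
first_day_long_interval = hit_and_false_alarm_rates(d_prime=1.5, criterion=1.25)
# 3. After training: criterion has shifted left to the new midpoint.
after_criterion_shift = hit_and_false_alarm_rates(d_prime=1.5, criterion=0.75)

if __name__ == "__main__":
    for label, rates in [
        ("short interval", end_of_short_interval),
        ("first day, long interval", first_day_long_interval),
        ("after criterion shift", after_criterion_shift),
    ]:
        print(label, [round(r, 3) for r in rates])
```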

Figure 7. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated between conditions. The location of the decision criterion, C, changes with each condition.

The increase in the hit and the false alarm rates as a function of training at a particular retention interval is unmistakably asymmetric. That is, generally speaking, the hit rate increased a lot, whereas the false alarm rate increased only a little. Part of the explanation for this is that the appropriate detection model for this situation is not the equal-variance model we have used throughout this article, but an unequal-variance model (so that the no-sample distribution is more variable than the sample distribution). A criterion sweeping across a narrow sample distribution will affect the hit rate to a much greater extent than a criterion sweeping across a wide no-sample distribution will affect the false alarm rate. In fact, in a separate experiment, Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.
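The remark that an unequal-variance Gaussian model has two points at which the likelihood ratio equals 1.0 follows from setting the log of the density ratio to zero, which gives a quadratic in $f$. The sketch below solves that quadratic; the parameter values in the usage example are hypothetical, and a positive $d'$ is assumed.

```python
import math
from statistics import NormalDist


def likelihood_ratio_one_points(d_prime, sigma_sample, sigma_no_sample):
    """Values of f at which p(f | S) / p(f | NS) equals 1.0 (assumes d' > 0)."""
    # Setting ln p(f|S) - ln p(f|NS) = 0 gives a*f**2 + b*f + c = 0.
    a = 1.0 / (2.0 * sigma_no_sample ** 2) - 1.0 / (2.0 * sigma_sample ** 2)
    b = d_prime / sigma_sample ** 2
    c = (-d_prime ** 2 / (2.0 * sigma_sample ** 2)
         + math.log(sigma_no_sample / sigma_sample))
    if math.isclose(a, 0.0, abs_tol=1e-12):
        return [-c / b]  # equal variances: a single crossing at d'/2
    discriminant = b ** 2 - 4.0 * a * c
    if discriminant < 0:
        return []
    root = math.sqrt(discriminant)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])


if __name__ == "__main__":
    # Hypothetical parameters: no-sample distribution wider than sample.
    points = likelihood_ratio_one_points(1.5, 1.0, 1.5)
    sample, no_sample = NormalDist(1.5, 1.0), NormalDist(0.0, 1.5)
    for f in points:
        print(round(f, 3), round(sample.pdf(f) / no_sample.pdf(f), 6))
```

With equal variances the quadratic term vanishes and the single crossing falls at $d'/2$, which is why the equal-variance model keeps the exposition simple.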

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant, but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted (1993) are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of $d'$.

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.

Table 2. Hit and False Alarm (FA) Rates and $d'$ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)    Hit Rate    FA Rate    $d'$
 0.5                        .94         .14        2.63
 2.0                        .93         .20        2.32
 6.0                        .73         .17        1.56
12.0                        .63         .16        1.32

Note. The data were taken from Experiment 1 of Wixted (1993).
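The $d'$ scores in Table 2 can be recomputed from the hit and false alarm rates with the standard equal-variance formula, $d' = z(\text{hit rate}) - z(\text{FA rate})$. The sketch below does this with values as read from Table 2 (decimal points restored from the extracted text, which is an assumption on our part); small discrepancies are expected because the tabled rates are rounded.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance d' from a hit rate and a false alarm rate."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# (retention interval in sec, hit rate, FA rate, reported d'), read from Table 2.
TABLE_2 = [
    (0.5, 0.94, 0.14, 2.63),
    (2.0, 0.93, 0.20, 2.32),
    (6.0, 0.73, 0.17, 1.56),
    (12.0, 0.63, 0.16, 1.32),
]

if __name__ == "__main__":
    for interval, hit, fa, reported in TABLE_2:
        print(interval, round(d_prime(hit, fa), 2), reported)
```

The recomputed values fall within rounding error of the reported ones, and the false alarm rate shows no systematic trend across intervals, which is the pattern the fixed-criterion account in Figure 8 depicts.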

White and Wixted's (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird's (or a human's) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird's behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to $f$ in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of $f$ was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given $f$ is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given $f$. According to White and Wixted's model, birds learn the relative probabilities of reinforcement for each value of $f$ along the sense-of-prior-occurrence axis.² If, for a given value of $f$, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of $f$ was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of $f$.

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of $f$ is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions much like those of a likelihood ratio account.

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution, with mean $d'$ and standard deviation $\sigma$, is given by

$$p(f \mid S) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - d')^2/\sigma^2}, \tag{5}$$

and the no-sample (or lure) distribution, with mean 0 and standard deviation $\sigma$, is given by

$$p(f \mid NS) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - 0)^2/\sigma^2}, \tag{6}$$

where $f$ now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and $p(f \mid S)$ and $p(f \mid NS)$ represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio $p(f \mid S)/p(f \mid NS)$ on a given trial. What the pigeons instead learn, according to White and Wixted's (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given $f$, which will be denoted as $p(R_S \mid f)$, and the probability that a reinforcer is arranged on the no-sample choice alternative given $f$, which will be denoted as $p(R_{NS} \mid f)$. Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given $f$ is

$$p(R_S \mid f) = \frac{p(f \mid S)}{p(f \mid S) + p(f \mid NS)}\, r_S, \tag{7}$$

where $r_S$ is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced ($r_S$ is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given $f$ is

$$p(R_{NS} \mid f) = \frac{p(f \mid NS)}{p(f \mid S) + p(f \mid NS)}\, r_{NS}, \tag{8}$$

where $r_{NS}$ is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced ($r_{NS}$ is also usually 1.0).
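Equations 5 through 8 can be evaluated directly. The sketch below, a minimal illustration with hypothetical parameter values, computes $p(R_S \mid f)$ and $p(R_{NS} \mid f)$ and shows that their ratio, which the matching law described above takes as the choice ratio, equals the likelihood ratio whenever $r_S = r_{NS}$.

```python
import math


def gaussian_density(f, mean, sigma=1.0):
    """Gaussian density used in Equations 5 and 6."""
    return (math.exp(-0.5 * (f - mean) ** 2 / sigma ** 2)
            / math.sqrt(2.0 * math.pi * sigma ** 2))


def reinforcer_probabilities(f, d_prime, r_s=1.0, r_ns=1.0, sigma=1.0):
    """Return (p(R_S | f), p(R_NS | f)) as in Equations 7 and 8.

    Assumes sample and no-sample trials are equally likely.
    """
    p_f_given_s = gaussian_density(f, d_prime, sigma)
    p_f_given_ns = gaussian_density(f, 0.0, sigma)
    total = p_f_given_s + p_f_given_ns
    return r_s * p_f_given_s / total, r_ns * p_f_given_ns / total


if __name__ == "__main__":
    d_prime = 2.0  # hypothetical value
    for f in (0.5, 1.0, 1.5, 2.0):
        p_rs, p_rns = reinforcer_probabilities(f, d_prime)
        likelihood_ratio = (gaussian_density(f, d_prime)
                            / gaussian_density(f, 0.0))
        print(f, round(p_rs / p_rns, 4), round(likelihood_ratio, 4))
```

The two printed columns agree, even though the left one is built only from reinforcer probabilities that could in principle be learned from experience.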

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, $f$, on the sense-of-prior-occurrence axis.


White and Wixted's (1999) model assumes that pigeons learn $p(R_S \mid f)$ and $p(R_{NS} \mid f)$ during the course of training and, in light of this information, respond to the choice alternatives in accordance with the matching law. If $p(B_S \mid f)$ represents the probability of choosing the sample choice alternative given $f$, and $p(B_{NS} \mid f)$ represents the probability of choosing the no-sample choice alternative given $f$, then

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(R_S \mid f)}{p(R_{NS} \mid f)}, \tag{9}$$

which, after replacing $p(R_S \mid f)$ and $p(R_{NS} \mid f)$ with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right]. \tag{10}$$

These equations state that the behavior ratio given $f$ is equal to the reinforcer ratio given $f$, which is essentially the matching law for each value of $f$ along the sense-of-prior-occurrence axis.

If $r_S = r_{NS} = 1.0$ (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(f \mid S)}{p(f \mid NS)}.$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing $p(f \mid S)$ and $p(f \mid NS)$ with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - d')^2/\sigma^2}}{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5(f - 0)^2/\sigma^2}}.$$

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when $r_S = r_{NS}$, a likelihood ratio computation is not assumed to be governing the bird's behavior. What drives the bird's choice, according to White and Wixted's (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of $f$ was generated by a sample, as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted's account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives, depending on what $f$ happens to be. What $f$ happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.

There is some point on the sense-of-prior-occurrence axis (i.e., some value of $f$) that leads the bird to be indifferent between the two choice alternatives, because at that point the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, $p(B_S \mid f) = p(B_{NS} \mid f)$. If $r_S = r_{NS} = 1.0$, it is easily shown that the value of $f$ that yields indifference is $d'/2$. If $p(B_S \mid f) = p(B_{NS} \mid f)$, which is to say that $p(B_S \mid f)/p(B_{NS} \mid f) = 1$, then, according to Equation 10,

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right].$$

Solving this equation for $f$ provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for $p(f \mid S)$ and $p(f \mid NS)$ from Equations 5 and 6 and canceling out the $1/\sqrt{2\pi\sigma^2}$ terms that appear in the numerator and denominator yields

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{e^{-.5(f - d')^2}}{e^{-.5(f - 0)^2}}\right]$$

(assuming that $\sigma = 1$, for simplicity). When solved for $f$, this equation yields

$$f = \frac{d'}{2} + \frac{\ln\left(r_{NS}/r_S\right)}{d'}. \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if $r_S = r_{NS}$, so that $\ln(r_{NS}/r_S) = 0$, the indifference point occurs at $d'/2$. This familiarity value corresponds to the indifference point and could be represented as $f_{1/1} = d'/2$, where the subscript 1/1 represents the ratio of programmed reinforcer probabilities ($r_{NS}/r_S$). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when $d'$ changes (thereby yielding a mirror effect). The difference is that White and Wixted's (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of $f$. In this case, the pigeon has learned that when $f = d'/2$, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than $d'/2$, the odds favor a reinforcer's being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than $d'/2$, the odds favor a reinforcer's being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same, namely, that a mirror effect should be observed.

Again, although we assume that the indifference point represents the learned value of $f$ associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject's learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.
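The closed form in Equation 11 can be checked against a direct numerical search for the value of $f$ at which the right side of Equation 10 equals 1. The sketch below assumes $\sigma = 1$ and $d' > 0$; the bisection routine and the parameter values are illustrative choices, not part of White and Wixted's (1999) analysis.

```python
import math


def indifference_point(d_prime, r_ns, r_s):
    """Closed-form indifference point from Equation 11 (sigma = 1)."""
    return d_prime / 2.0 + math.log(r_ns / r_s) / d_prime


def log_choice_odds(f, d_prime, r_ns, r_s):
    """Natural log of the right side of Equation 10 for unit-variance Gaussians."""
    log_likelihood_ratio = -0.5 * (f - d_prime) ** 2 + 0.5 * f ** 2
    return math.log(r_s / r_ns) + log_likelihood_ratio


def indifference_point_by_bisection(d_prime, r_ns, r_s, lo=-50.0, hi=50.0):
    """Find f where the log choice odds cross zero (increasing in f when d' > 0)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if log_choice_odds(mid, d_prime, r_ns, r_s) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical conditions: (d', r_NS, r_S).
    for d_prime, r_ns, r_s in [(2.0, 1.0, 1.0), (2.0, 1.0, 0.1), (1.0, 1.0, 0.1)]:
        print(d_prime, r_ns, r_s,
              round(indifference_point(d_prime, r_ns, r_s), 4),
              round(indifference_point_by_bisection(d_prime, r_ns, r_s), 4))
```

When the two reinforcer probabilities are equal, both routines return $d'/2$; when they differ, the offset from $d'/2$ is the log reinforcer ratio divided by $d'$.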

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if $r_{NS} > r_S$), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in $d'$ on the location of the confidence criteria discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as $d'$ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., $r_{NS} = 1.0$ and $r_S = .1$). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when $r_{NS} = 10r_S$? Substituting 10 for $r_{NS}/r_S$ in Equation 11 and solving for $f$ yields

$$f_{10/1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, $f_{10/1}$ (the indifference point when $r_{NS} = 10r_S$) occurs at a point higher than $d'/2$, but by an amount that varies with $d'$. The more general version of this equation is $f_{10/1} = d'/2 + k_1/d'$, where $k_1$ is the constant $\ln(r_{NS}/r_S)$. Similarly, if $r_S = 10r_{NS}$, then

$$f_{1/10} = \frac{d'}{2} + \frac{\ln(.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is $f_{1/10} = d'/2 + k_2/d'$, where $k_2$ is the constant $\ln(r_{NS}/r_S)$, which is negative in this case. This model predicts that the distance between $f_{10/1}$ and $f_{1/10}$ on the sense-of-prior-occurrence axis will be inversely related to $d'$. That distance (i.e., $f_{10/1}$ minus $f_{1/10}$) is given by

$$f_{10/1} - f_{1/10} = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}.$$

Thus, as $d'$ becomes very large, $f_{10/1}$ and $f_{1/10}$ should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as $d'$ approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon's sensitivity to variations in relative reinforcer probabilities should be low when $d'$ is large and high when $d'$ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When $d'$ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of $r_{NS}$ to $r_S$) had little effect on behavior. When $d'$ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as $d'$ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as $d'$ decreased (because that is just another way of saying that sensitivity to reinforcement increased as $d'$ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism's reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and, more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider
again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \text{target})}{p(f \mid \text{lure})} > 10,$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of $f$, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{Old} \mid f)}{p(R_{New} \mid f)} > 10,$$

where $p(R_{Old} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for an old response occurred, and $p(R_{New} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{Old} \mid f)}{p(B_{New} \mid f)} = \frac{p(R_{Old} \mid f)}{p(R_{New} \mid f)}, \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize, rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{Old} \mid f)}{p(B_{New} \mid f)} = \left(\frac{r_{Old}}{r_{New}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right]. \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left(\frac{r_{Old}}{r_{New}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right] > 10,$$

where $r_{Old}$ and $r_{New}$ represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to $r_S$ and $r_{NS}$, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of $r_S$ and $r_{NS}$ in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing whether, in life, $r_{Old} = r_{New}$. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of $f$ along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of $f$.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if $r_{NS} = 10r_S$, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at $d'/2$, but at a higher point, namely, $d'/2 + 2.30/d'$. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific $f \rightarrow$ reinforcement outcome relations but, instead, that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog's neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog's training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.
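The distinction between learned and computed odds can be made concrete with a small simulation. In the sketch below, a hypothetical learner never evaluates a Gaussian density; it only tallies, for each binned value of $f$, how often a reinforcer was arranged on each alternative. Several simplifications are our own assumptions: $f$ is binned into intervals of fixed width, every correct response is reinforced ($r_S = r_{NS} = 1.0$), tallies count arranged reinforcers rather than modeling choice-dependent feedback, and the trial count, bin width, and $d'$ are arbitrary.

```python
import math
import random
from collections import defaultdict


def simulate_learned_odds(d_prime=1.5, n_trials=200_000, bin_width=0.25, seed=0):
    """Tally arranged reinforcers per bin of f; return learned odds by bin centre.

    Illustrative assumptions: unit-variance Gaussians, equally likely trial
    types, and a reinforcer arranged on the correct alternative every trial.
    """
    rng = random.Random(seed)
    # bin index -> [reinforcers arranged on sample, on no-sample alternative]
    arranged = defaultdict(lambda: [0, 0])
    for _ in range(n_trials):
        is_sample = rng.random() < 0.5
        f = rng.gauss(d_prime if is_sample else 0.0, 1.0)
        bin_index = math.floor(f / bin_width)
        arranged[bin_index][0 if is_sample else 1] += 1
    return {
        (b + 0.5) * bin_width: s / ns
        for b, (s, ns) in arranged.items()
        if s > 0 and ns > 0
    }


def likelihood_ratio(f, d_prime):
    """p(f | S) / p(f | NS) for unit-variance Gaussians."""
    return math.exp(f * d_prime - d_prime ** 2 / 2.0)


if __name__ == "__main__":
    learned = simulate_learned_odds()
    for centre in sorted(learned):
        if -0.5 <= centre <= 2.0:
            print(round(centre, 3), round(learned[centre], 2),
                  round(likelihood_ratio(centre, 1.5), 2))
```

A learner that matches its choices to these tallied odds behaves approximately as a likelihood ratio computer would, which is the sense in which the computation may be a surrogate for the reinforcement history.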

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism's neural network without detailing the origins of the relevant distributional information. A neural network may, indeed, perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject's reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising, indeed, if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject's neural network, but in the subject's history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and, in that sense, Skinner (1977) was probably right when he asserted that "the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects" (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of "nothing" in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion's distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value, f, is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)

Page 12: Wixted, J. T. & Gaitan, S. (2002)


ment. Wixted (1993) reported an analysis of the receiver operating characteristic that provided clear support for the unequal-variance model. As was discussed in detail by Stretch and Wixted (1998), the unequal-variance detection model complicates likelihood ratio models that assume underlying Gaussian distributions (e.g., under such conditions, there are two points on the sense-of-prior-occurrence axis where the likelihood ratio equals 1.0). The most straightforward way out of this problem is to assume that the distributions are Gaussian-like (e.g., Glanzer et al., 1993, assume a binomial distribution). The essence of the likelihood ratio account does not change by assuming different forms, so we will continue to use the equal-variance Gaussian model for the sake of simplicity.
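To see why the unequal-variance case is awkward for a likelihood ratio rule, the following minimal sketch finds the points at which the likelihood ratio equals 1.0 for two Gaussian densities with different standard deviations. The function name and the parameter values (target mean 1.5, target standard deviation 1.25, lure mean 0, lure standard deviation 1) are illustrative assumptions, not values taken from the article.

```python
import math
from statistics import NormalDist

def unit_likelihood_ratio_points(d_prime, sigma_target):
    """Return the two f values where p(f | target) = p(f | lure).

    Assumes lure ~ N(0, 1) and target ~ N(d_prime, sigma_target) with
    sigma_target != 1. Setting the log likelihood ratio to zero gives a
    quadratic a*f**2 + b*f + c = 0, which has two real roots.
    """
    s2 = sigma_target ** 2
    a = 0.5 - 0.5 / s2
    b = d_prime / s2
    c = -d_prime ** 2 / (2 * s2) - math.log(sigma_target)
    root = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

points = unit_likelihood_ratio_points(d_prime=1.5, sigma_target=1.25)
for f in points:
    ratio = NormalDist(1.5, 1.25).pdf(f) / NormalDist(0.0, 1.0).pdf(f)
    print(f, ratio)  # the ratio is 1 (up to rounding error) at both points
```

With equal variances the quadratic term vanishes and only one crossing point remains, which is why the equal-variance model keeps the likelihood ratio account simple.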

It might still be tempting to argue that the increasing false alarm rate as a function of retention interval evident in Table 1 actually reflects the increasing strength of the no-sample distribution on the sense-of-prior-occurrence axis as the retention interval increased, rather than a criterion shift. An upward movement of the no-sample distribution would certainly increase the false alarm rate even if the location of the decision criterion remained fixed (although why it would increase in strength is not clear). However, evidence against this fixed-criterion interpretation is provided by studies that manipulate retention interval within sessions. Just as human subjects appear to be reluctant to shift the decision criterion item by item during the course of a recognition test, pigeons appear to be reluctant to shift the decision criterion trial by trial within a session. Thus, when retention interval is manipulated within sessions on a sample/no-sample task, the typical result is that the false alarm rate remains constant, but the hit rate decreases rapidly as a function of retention interval (Dougherty & Wixted, 1996; Gaitan & Wixted, 2000; Wixted, 1993). Some relevant data from Wixted (1993) are presented in Table 2, and the theoretical representation of this situation in terms of signal detection theory is shown in Figure 8. The fact that the false alarm rate remains constant suggests that variations in the size of the retention interval do not affect the properties of the no-sample distribution. Instead, the simplest interpretation is that the no-sample distribution and the decision criterion are unaffected by the within-session retention interval manipulation. When retention interval is manipulated between conditions, there is no reason to assume that the properties of the no-sample distribution would now be affected by that manipulation, although such an explanation is technically possible. Instead, it is much simpler (and much more theoretically sensible) to assume that the location of the decision criterion changes as a function of $d'$.
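The $d'$ scores in Table 2 can be approximately recovered from the tabled hit and false alarm rates. The sketch below assumes the standard equal-variance formula, $d'$ = z(hit rate) - z(false alarm rate); the article does not spell out how the scores were computed, and the function name is illustrative.

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Equal-variance Gaussian sensitivity: z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Retention interval (sec) -> (hit rate, false alarm rate), as listed in Table 2.
table_2 = {0.5: (0.94, 0.14), 2.0: (0.93, 0.20), 6.0: (0.73, 0.17), 12.0: (0.63, 0.16)}

for interval, (hit, fa) in table_2.items():
    print(f"{interval:>4} s: d' = {d_prime(hit, fa):.2f}")
```

The computed values agree with the tabled scores (2.63, 2.32, 1.56, 1.32) to within about .01; the small discrepancies presumably reflect rounding of the tabled rates. Note that the false alarm rate stays roughly constant while the hit rate, and therefore $d'$, falls with retention interval.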

As an aside, it should be noted that pigeons are not always reluctant to shift the criterion from trial to trial. In a standard delayed matching-to-sample procedure, for example, pigeons appear to be quite willing to do just that. White and Cooney (1996) provided the most compelling evidence in this regard by showing that birds could learn to associate different reinforcement histories with different retention intervals (manipulated within session) when the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

Table 2
Hit and False Alarm (FA) Rates and $d'$ Scores From a Sample/No-Sample Procedure in Which Retention Interval Was Manipulated Within Sessions

Retention Interval (sec)    Hit Rate    FA Rate    d'
 0.5                        .94         .14        2.63
 2.0                        .93         .20        2.32
 6.0                        .73         .17        1.56
12.0                        .63         .16        1.32

Note. The data were taken from Experiment 1 of Wixted (1993).

Figure 8. Hypothetical signal and noise distributions (corresponding to sample and no-sample trials, respectively) for three retention intervals (short, medium, and long) manipulated within sessions. The location of the decision criterion, C, is fixed.

White and Wixted's (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird's (or a human's) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird's behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to $f$ in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of $f$ was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given $f$ is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given $f$. According to White and Wixted's model, birds learn the relative probabilities of reinforcement for each value of $f$ along the sense-of-prior-occurrence axis.² If, for a given value of $f$, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of $f$ was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of $f$.
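As a worked instance of the matching relation just described (writing $B$ for a choice and $R$ for an arranged reinforcer, symbols used here only for illustration), a 4:1 reinforcer ratio at a given value of $f$ implies

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{p(R_{\mathrm{S}} \mid f)}{p(R_{\mathrm{NS}} \mid f)} = 4, \qquad \text{so} \qquad p(B_{\mathrm{S}} \mid f) = \frac{4}{4 + 1} = .80,$$

assuming that the two choice probabilities sum to 1. Under matching, the bird is therefore expected to choose the sample alternative on about 80% of such trials, not on all of them.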

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of $f$ is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution with mean $d'$ and standard deviation $\sigma$ is given by

$$p(f \mid \mathrm{S}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - d')^2 / \sigma^2}, \tag{5}$$

and the no-sample (or lure) distribution with mean 0 and standard deviation $\sigma$ is given by

$$p(f \mid \mathrm{NS}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - 0)^2 / \sigma^2}, \tag{6}$$

where $f$ now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and $p(f \mid \mathrm{S})$ and $p(f \mid \mathrm{NS})$ represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio $p(f \mid \mathrm{S}) / p(f \mid \mathrm{NS})$ on a given trial. What the pigeons instead learn, according to White and Wixted's (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given $f$, which will be denoted as $p(R_{\mathrm{S}} \mid f)$, and the probability that a reinforcer is arranged on the no-sample choice alternative given $f$, which will be denoted as $p(R_{\mathrm{NS}} \mid f)$. Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given $f$ is

$$p(R_{\mathrm{S}} \mid f) = \left[ \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})} \right] r_{\mathrm{S}}, \tag{7}$$

where $r_{\mathrm{S}}$ is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced ($r_{\mathrm{S}}$ is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given $f$ is

$$p(R_{\mathrm{NS}} \mid f) = \left[ \frac{p(f \mid \mathrm{NS})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})} \right] r_{\mathrm{NS}}, \tag{8}$$

where $r_{\mathrm{NS}}$ is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced ($r_{\mathrm{NS}}$ is also usually 1.0).
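Equations 5 through 8 can be evaluated numerically. The sketch below is a minimal illustration, not the authors' code: the function names and the example values ($d' = 2$, $\sigma = 1$, $f = 1.5$) are assumptions chosen for illustration.

```python
import math

def gaussian_pdf(f, mean, sigma=1.0):
    """Gaussian density, the form used in Equations 5 and 6."""
    return math.exp(-0.5 * (f - mean) ** 2 / sigma ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

def reinforcer_probs(f, d_prime, r_s=1.0, r_ns=1.0, sigma=1.0):
    """Return (p(R_S | f), p(R_NS | f)) following Equations 7 and 8.

    Assumes sample and no-sample trials are equally likely.
    """
    p_f_s = gaussian_pdf(f, d_prime, sigma)  # p(f | S), Equation 5
    p_f_ns = gaussian_pdf(f, 0.0, sigma)     # p(f | NS), Equation 6
    total = p_f_s + p_f_ns
    return r_s * p_f_s / total, r_ns * p_f_ns / total

p_rs, p_rns = reinforcer_probs(f=1.5, d_prime=2.0)
print(p_rs, p_rns, p_rs / p_rns)
```

With $r_{\mathrm{S}} = r_{\mathrm{NS}} = 1.0$, the two probabilities sum to 1 and their ratio equals the likelihood ratio $p(f \mid \mathrm{S}) / p(f \mid \mathrm{NS})$, which for these example values is $e^{f d' - d'^2/2} = e \approx 2.72$.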

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point, $f$, on the sense-of-prior-occurrence axis.


White and Wixted's (1999) model assumes that pigeons learn $p(R_{\mathrm{S}} \mid f)$ and $p(R_{\mathrm{NS}} \mid f)$ during the course of training, and in light of this information, they respond to the choice alternatives in accordance with the matching law. If $p(B_{\mathrm{S}} \mid f)$ represents the probability of choosing the sample choice alternative given $f$, and $p(B_{\mathrm{NS}} \mid f)$ represents the probability of choosing the no-sample choice alternative given $f$, then

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{p(R_{\mathrm{S}} \mid f)}{p(R_{\mathrm{NS}} \mid f)}, \tag{9}$$

which, after replacing $p(R_{\mathrm{S}} \mid f)$ and $p(R_{\mathrm{NS}} \mid f)$ with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \left( \frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}} \right) \left[ \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})} \right]. \tag{10}$$

These equations state that the behavior ratio given $f$ is equal to the reinforcer ratio given $f$, which is essentially the matching law for each value of $f$ along the sense-of-prior-occurrence axis.

If $r_{\mathrm{S}} = r_{\mathrm{NS}} = 1.0$ (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})}.$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing $p(f \mid \mathrm{S})$ and $p(f \mid \mathrm{NS})$ with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - d')^2 / \sigma^2}}{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-.5 (f - 0)^2 / \sigma^2}}.$$

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when $r_{\mathrm{S}} = r_{\mathrm{NS}}$, a likelihood ratio computation is not assumed to be governing the bird's behavior. What drives the bird's choice, according to White and Wixted's (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of $f$ was generated by a sample as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted's account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives, depending on what $f$ happens to be. What $f$ happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.

There is some point on the sense-of-prior-occurrence axis (i.e., some value of $f$) that leads the bird to be indifferent between the two choice alternatives, because at that point, the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, $p(B_{\mathrm{S}} \mid f) = p(B_{\mathrm{NS}} \mid f)$. If $r_{\mathrm{S}} = r_{\mathrm{NS}} = 1.0$, it is easily shown that the value of $f$ that yields indifference is $d'/2$. If $p(B_{\mathrm{S}} \mid f) = p(B_{\mathrm{NS}} \mid f)$, which is to say that $p(B_{\mathrm{S}} \mid f) / p(B_{\mathrm{NS}} \mid f) = 1$, then, according to Equation 10,

$$1 = \left( \frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}} \right) \left[ \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})} \right].$$

Solving this equation for $f$ provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for $p(f \mid \mathrm{S})$ and $p(f \mid \mathrm{NS})$ from Equations 5 and 6 and canceling out the $1/\sqrt{2\pi\sigma^2}$ terms that appear in the numerator and denominator yields

$$1 = \left( \frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}} \right) \left[ \frac{e^{-.5 (f - d')^2}}{e^{-.5 (f - 0)^2}} \right]$$

(assuming that $\sigma = 1$ for simplicity). When solved for $f$, this equation yields

$$f = \frac{d'}{2} + \frac{\ln\left( r_{\mathrm{NS}} / r_{\mathrm{S}} \right)}{d'}. \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if $r_{\mathrm{S}} = r_{\mathrm{NS}}$, so that $\ln(r_{\mathrm{NS}}/r_{\mathrm{S}}) = 0$, the indifference point occurs at $d'/2$. This familiarity value corresponds to the indifference point and could be represented as $f_{1:1} = d'/2$, where the subscript 1:1 represents the ratio of programmed reinforcer probabilities ($r_{\mathrm{NS}}/r_{\mathrm{S}}$). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when $d'$ changes (thereby yielding a mirror effect). The difference is that White and Wixted's (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of $f$. In this case, the pigeon has learned that when $f = d'/2$, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than $d'/2$, the odds favor a reinforcer's being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than $d'/2$, the odds favor a reinforcer's being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same: namely, that a mirror effect should be observed.

Again, although we assume that the indifference point represents the learned value of $f$ associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject's learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.
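Equation 11 can be checked numerically against Equation 10. The following sketch computes the indifference point and confirms that the behavior ratio equals 1 there. The function names and parameter values are illustrative assumptions, and the standard deviation is fixed at 1, as in the derivation.

```python
import math

def behavior_ratio(f, d_prime, r_s=1.0, r_ns=1.0):
    """Equation 10 with unit standard deviation: (r_S / r_NS) * p(f | S) / p(f | NS)."""
    likelihood_ratio = math.exp(-0.5 * (f - d_prime) ** 2) / math.exp(-0.5 * f ** 2)
    return (r_s / r_ns) * likelihood_ratio

def indifference_point(d_prime, r_s=1.0, r_ns=1.0):
    """Equation 11: f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2 + math.log(r_ns / r_s) / d_prime

for d in (0.5, 1.0, 2.0):
    f_equal = indifference_point(d)                       # equal reinforcement: d'/2
    f_biased = indifference_point(d, r_s=0.1, r_ns=1.0)   # r_NS is 10 times r_S
    print(d, f_equal, f_biased, behavior_ratio(f_biased, d, r_s=0.1, r_ns=1.0))
```

In each row the last value is 1 (up to rounding error), and the biased indifference point lies above $d'/2$ by an amount that grows as $d'$ shrinks.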

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if $r_{\mathrm{NS}} > r_{\mathrm{S}}$), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in $d'$ on the location of the confidence criteria discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as $d'$ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., $r_{\mathrm{NS}} = 1.0$ and $r_{\mathrm{S}} = .1$). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence that the sample occurred before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when $r_{\mathrm{NS}} = 10 r_{\mathrm{S}}$? Substituting 10 for $r_{\mathrm{NS}}/r_{\mathrm{S}}$ in Equation 11 and solving for $f$ yields

$$f_{10:1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, $f_{10:1}$ (the indifference point when $r_{\mathrm{NS}} = 10 r_{\mathrm{S}}$) occurs at a point higher than $d'/2$, but by an amount that varies with $d'$. The more general version of this equation is $f_{10:1} = d'/2 + k_1/d'$, where $k_1$ is the constant $\ln(r_{\mathrm{NS}}/r_{\mathrm{S}})$, here $\ln(10) \approx 2.30$. Similarly, if $r_{\mathrm{S}} = 10 r_{\mathrm{NS}}$, then

$$f_{1:10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is $f_{1:10} = d'/2 + k_2/d'$, where $k_2$ is the constant $\ln(r_{\mathrm{NS}}/r_{\mathrm{S}})$, here $\ln(0.10) \approx -2.30$. This model predicts that the distance between $f_{10:1}$ and $f_{1:10}$ on the sense-of-prior-occurrence axis will be inversely related to $d'$. That distance (i.e., $f_{10:1}$ minus $f_{1:10}$) is given by

$$f_{10:1} - f_{1:10} = \left( \frac{d'}{2} + \frac{k_1}{d'} \right) - \left( \frac{d'}{2} + \frac{k_2}{d'} \right) = \frac{k_1 - k_2}{d'}.$$

Thus, as $d'$ becomes very large, $f_{10:1}$ and $f_{1:10}$ should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as $d'$ approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon's sensitivity to variations in relative reinforcer probabilities should be low when $d'$ is large and high when $d'$ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When $d'$ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of $r_{\mathrm{NS}}$ to $r_{\mathrm{S}}$) had little effect on behavior. When $d'$ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as $d'$ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as $d'$ decreased (because that is just another way of saying that sensitivity to reinforcement increased as $d'$ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism's reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})} > 10,$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of $f$, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)} > 10,$$

where $p(R_{\mathrm{Old}} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for an old response occurred, and $p(R_{\mathrm{New}} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)}, \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \left( \frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}} \right) \left[ \frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})} \right]. \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left( \frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}} \right) \left[ \frac{p(f \mid \mathrm{target})}{p(f \mid \mathrm{lure})} \right] > 10,$$

where $r_{\mathrm{Old}}$ and $r_{\mathrm{New}}$ represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to $r_{\mathrm{S}}$ and $r_{\mathrm{NS}}$, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of $r_{\mathrm{S}}$ and $r_{\mathrm{NS}}$ in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, $r_{\mathrm{Old}} = r_{\mathrm{New}}$. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of $f$ along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of $f$.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if $r_{\mathrm{NS}} = 10 r_{\mathrm{S}}$, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at $d'/2$, but at a higher point, namely, $d'/2 + 2.30/d'$. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific $f \rightarrow$ reinforcement outcome relations, but instead that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog's neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog's training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.
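Returning to the indifference-point analysis in this section, the predicted fanning out can be tabulated directly: the distance between the indifference points for reinforcer ratios of 10:1 and 1:10 should shrink as $d'$ grows. The sketch below is a minimal illustration that assumes Equation 11 with a standard deviation of 1; the function names and the particular $d'$ values are illustrative, not taken from the article.

```python
import math

def indifference_point(d_prime, ratio_ns_to_s):
    """Equation 11 with unit standard deviation: f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2 + math.log(ratio_ns_to_s) / d_prime

def criterion_spread(d_prime, ratio=10.0):
    """Distance between the indifference points for r_NS/r_S = ratio and 1/ratio."""
    return indifference_point(d_prime, ratio) - indifference_point(d_prime, 1 / ratio)

for d in (0.5, 1.0, 2.0, 4.0):
    print(f"d' = {d}: spread = {criterion_spread(d):.2f}")
```

The spread equals $2\ln(10)/d'$, so halving $d'$ doubles the distance between the two indifference points, which is the same qualitative pattern as the fanning out of human confidence criteria.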

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism's neural network without detailing the origins of the relevant distributional information. A neural network may, indeed, perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject's reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.



White K G amp Cooney E B (1996) Consequences of rememberingIndependence of performance at different retention levels Journal ofExperimental Psychology Animal Behavior Processes 22 51-59

White K G amp Wixted J T (1999) Psychophysics of rememberingJournal of the Experimental Analysis of Behavior 71 91-113

Wixted J T (1993) A signal detection analysis of memory for nonoc-currence in pigeons Journal of Experimental Psychology Animal Be-havior Processes 19 400-411

NOTES

1 A potentially confusing issue is that the symbol C is sometimes usedas a measure of response bias and its value is equal to the criterionrsquos dis-tance from the point of intersection between the signal and the noise dis-tributions By contrast we use C to denote the location of the criterion rel-ative to the mean of the noise distribution

2 Actually the probability that a continuous random variable assumesa particular value f is zero Thus in what follows we will actually as-sume that the value in question is some small interval on the familiarityaxis that does have a certain probability of occurrence

(Manuscript received November 13 2001revision accepted for publication June 13 2002)


the scheduled relative reinforcer probabilities for the two choice alternatives differed as a function of retention interval. Thus, although subjects are apparently reluctant to shift the criterion trial by trial on a sample/no-sample task (in the case of pigeons) or item by item in a yes/no recognition task (in the case of humans), it is clear that this is not an inviolable principle. Although this issue is clearly important, it is orthogonal to the main issue at hand, which is where the information comes from that allows subjects to behave like likelihood ratio computers whenever they do shift the criterion.

White and Wixted's (1999) Reinforcement History Model

It is important to emphasize that the decision criterion is a device used by theoreticians to represent one aspect of a bird's (or a human's) decision-making process. White and Wixted (1999) proposed that pigeons do not actually place a decision criterion on a subjective sense-of-prior-occurrence axis, even though it can be convenient to represent the situation as such. Instead, they suggested that, given a particular sense of prior occurrence on a given trial, the bird's behavior is governed by its history of reinforcement for making one choice or the other under similar conditions in the past. Consider, for example, a particular sense of prior occurrence equal to f in Figure 9. In the past, this value has sometimes been generated on no-sample trials and sometimes on sample trials. Usually, though, that value of f was generated on sample trials, so the probability of reinforcement for choosing the sample choice alternative given f is considerably higher than the probability of reinforcement for choosing the no-sample choice alternative given f. According to White and Wixted's model, birds learn the relative probabilities of reinforcement for each value of f along the sense-of-prior-occurrence axis (see Note 2). If, for a given value of f, the probability of reinforcement for choosing the sample option is four times that for choosing the no-sample option, then, in accordance with the matching law, the bird will be four times as likely to choose sample as no sample. Thus, instead of computing the odds that the current value of f was produced by a sample trial (as opposed to a no-sample trial), the pigeon learns the odds that a reinforcer is arranged on the sample choice alternative (as opposed to the no-sample choice alternative) for the current value of f.

Note how similar this model is to the likelihood ratio model described earlier. Just as in the likelihood ratio account, the ratio of the height of the signal distribution to the height of the noise distribution for a particular value of f is of critical importance. Where the relevant distributional information comes from, though, is not well specified in contemporary likelihood ratio accounts. White and Wixted (1999) assumed that the organism had learned the relevant reinforcer ratios on the basis of prior experience and responded accordingly. As will be described next, a model like this winds up making predictions that are a lot like those of a likelihood ratio account.
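The matching-law step in the four-to-one example above can be illustrated with a minimal sketch. The function name and the reinforcement probabilities (.8 and .2) are hypothetical values chosen for illustration; they are not taken from White and Wixted (1999), and strict matching is assumed.

```python
def matching_choice_probability(p_reinf_sample, p_reinf_no_sample):
    """Probability of choosing the sample alternative under strict matching.

    Both arguments are the learned probabilities of reinforcement for the
    two choice alternatives at one value of f (hypothetical inputs).
    """
    total = p_reinf_sample + p_reinf_no_sample
    if total <= 0:
        raise ValueError("at least one reinforcement probability must be positive")
    return p_reinf_sample / total


# Hypothetical learned values for one sense-of-prior-occurrence value f:
# reinforcement has been four times as likely for "sample" as for "no sample".
p_choose_sample = matching_choice_probability(0.8, 0.2)
choice_odds = p_choose_sample / (1.0 - p_choose_sample)
print(f"p(choose sample) = {p_choose_sample:.2f}, odds = {choice_odds:.2f} to 1")
```

Under these assumptions the choice odds equal the reinforcer odds, which is all the model requires the bird to have learned for that value of f.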

Earlier, we noted that if the underlying sense-of-prior-occurrence distributions are Gaussian, the sample (or target) distribution, with mean d′ and standard deviation σ, is given by

$$p(f \mid \mathrm{S}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5\,(f - d')^{2}/\sigma^{2}} \tag{5}$$

and the no-sample (or lure) distribution, with mean 0 and standard deviation σ, is given by

$$p(f \mid \mathrm{NS}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5\,(f - 0)^{2}/\sigma^{2}} \tag{6}$$

where f now represents the sense of prior occurrence (analogous to familiarity in human recognition memory tasks) on a particular trial, and p(f | S) and p(f | NS) represent the probability that a particular sense of prior occurrence will occur given a sample (S) or no-sample (NS) trial, respectively. If birds were likelihood ratio computers, we would assume that they know the forms of these distributions and could compute the ratio p(f | S)/p(f | NS) on a given trial. What the pigeons instead learn, according to White and Wixted's (1999) account, is the probability that a reinforcer is arranged on the sample choice alternative given f, which will be denoted as p(R_S | f), and the probability that a reinforcer is arranged on the no-sample choice alternative given f, which will be denoted as p(R_NS | f). Assuming that a sample trial is as likely to occur as a no-sample trial, the probability that a reinforcer is arranged on the sample choice alternative given f is

$$p(R_{\mathrm{S}} \mid f) = \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})}\, r_{\mathrm{S}} \tag{7}$$

where r_S is the experimenter-arranged probability that a correct choice of the sample alternative will be reinforced (r_S is usually 1.0). Similarly, the probability that a reinforcer is arranged on the no-sample choice alternative given f is

$$p(R_{\mathrm{NS}} \mid f) = \frac{p(f \mid \mathrm{NS})}{p(f \mid \mathrm{S}) + p(f \mid \mathrm{NS})}\, r_{\mathrm{NS}} \tag{8}$$

where r_NS is the experimenter-arranged probability that a correct choice of the no-sample alternative will be reinforced (r_NS is also usually 1.0).

Figure 9. Hypothetical sample and no-sample distributions, illustrating their relative heights at a particular point f on the sense-of-prior-occurrence axis.
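Equations 5 through 8 can be evaluated numerically to see how the two learned reinforcer probabilities depend on f. The following is a minimal sketch, assuming equal numbers of sample and no-sample trials as in the text; the function names and the example values (f = 1.5, d′ = 2.0) are illustrative assumptions, not values from the original experiments.

```python
import math


def gaussian_pdf(f, mean, sigma=1.0):
    """Gaussian density used in Equations 5 and 6."""
    return math.exp(-0.5 * ((f - mean) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)


def reinforcer_probabilities(f, d_prime, r_s=1.0, r_ns=1.0, sigma=1.0):
    """Return (p(R_S | f), p(R_NS | f)) as in Equations 7 and 8.

    Assumes sample and no-sample trials are equally likely.
    """
    p_f_s = gaussian_pdf(f, d_prime, sigma)   # Equation 5
    p_f_ns = gaussian_pdf(f, 0.0, sigma)      # Equation 6
    total = p_f_s + p_f_ns
    return r_s * p_f_s / total, r_ns * p_f_ns / total


# Illustrative values only: a fairly strong sense of prior occurrence.
p_rs, p_rns = reinforcer_probabilities(f=1.5, d_prime=2.0)
print(f"p(R_S | f) = {p_rs:.3f}, p(R_NS | f) = {p_rns:.3f}")
print(f"reinforcer ratio = {p_rs / p_rns:.3f}")
```

With r_S = r_NS, the printed reinforcer ratio is the same number as the ratio of the two Gaussian heights at f, even though no likelihood ratio is computed by the learner in this account.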


White and Wixted's (1999) model assumes that pigeons learn p(R_S | f) and p(R_NS | f) during the course of training and that, in light of this information, they respond to the choice alternatives in accordance with the matching law. If p(B_S | f) represents the probability of choosing the sample choice alternative given f, and p(B_NS | f) represents the probability of choosing the no-sample choice alternative given f, then

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{p(R_{\mathrm{S}} \mid f)}{p(R_{\mathrm{NS}} \mid f)}, \tag{9}$$

which, after replacing p(R_S | f) and p(R_NS | f) with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \left(\frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}}\right)\left[\frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})}\right]. \tag{10}$$

These equations state that the behavior ratio given f is equal to the reinforcer ratio given f, which is essentially the matching law for each value of f along the sense-of-prior-occurrence axis.

If r_S = r_NS = 1.0 (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})}.$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing p(f | S) and p(f | NS) with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_{\mathrm{S}} \mid f)}{p(B_{\mathrm{NS}} \mid f)} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5\,(f - d')^{2}/\sigma^{2}}}{\dfrac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-0.5\,(f - 0)^{2}/\sigma^{2}}}.$$

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when r_S = r_NS, a likelihood ratio computation is not assumed to be governing the bird's behavior. What drives the bird's choice, according to White and Wixted's (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of f was generated by a sample as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted's account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives, depending on what f happens to be. What f happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.

There is some point on the sense-of-prior-occurrence axis (i.e., some value of f) that leads the bird to be indifferent between the two choice alternatives, because at that point the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, p(B_S | f) = p(B_NS | f). If r_S = r_NS = 1.0, it is easily shown that the value of f that yields indifference is d′/2. If p(B_S | f) = p(B_NS | f), which is to say that p(B_S | f)/p(B_NS | f) = 1, then, according to Equation 10,

$$1 = \left(\frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}}\right)\left[\frac{p(f \mid \mathrm{S})}{p(f \mid \mathrm{NS})}\right].$$

Solving this equation for f provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for p(f | S) and p(f | NS) from Equations 5 and 6 and canceling out the 1/√(2πσ²) terms that appear in the numerator and denominator yields

$$1 = \left(\frac{r_{\mathrm{S}}}{r_{\mathrm{NS}}}\right)\left[\frac{e^{-0.5\,(f - d')^{2}}}{e^{-0.5\,(f - 0)^{2}}}\right]$$

(assuming that σ = 1, for simplicity). When solved for f, this equation yields

$$f = \frac{d'}{2} + \frac{\ln\!\left(r_{\mathrm{NS}}/r_{\mathrm{S}}\right)}{d'}. \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if r_S = r_NS, so that ln(r_NS/r_S) = 0, the indifference point occurs at d′/2. This familiarity value corresponds to the indifference point and could be represented as f_{1/1} = d′/2, where the subscript 1/1 represents the ratio of programmed reinforcer probabilities (r_NS/r_S). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions, even when d′ changes (thereby yielding a mirror effect). The difference is that White and Wixted's (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of f. In this case, the pigeon has learned that when f = d′/2, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than d′/2, the odds favor a reinforcer's being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than d′/2, the odds favor a reinforcer's being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same: namely, that a mirror effect should be observed.

Again, although we assume that the indifference point represents the learned value of f associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject's learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.
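Equation 11 can be checked against a direct numerical search for the value of f at which the behavior ratio of Equation 10 equals 1. The sketch below assumes σ = 1, as in the text, and d′ > 0; the function names, the search bracket, and the example values are illustrative assumptions.

```python
import math


def log_behavior_ratio(f, d_prime, r_s, r_ns):
    """Natural log of the right side of Equation 10 with sigma = 1.

    ln(r_S / r_NS) - 0.5 (f - d')^2 + 0.5 f^2, which simplifies to
    ln(r_S / r_NS) + f d' - d'^2 / 2 and is increasing in f when d' > 0.
    """
    return math.log(r_s / r_ns) - 0.5 * (f - d_prime) ** 2 + 0.5 * f ** 2


def indifference_closed_form(d_prime, r_s, r_ns):
    """Equation 11: f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2 + math.log(r_ns / r_s) / d_prime


def indifference_by_bisection(d_prime, r_s, r_ns, lo=-50.0, hi=50.0, tol=1e-10):
    """Find f where the log behavior ratio crosses zero (ratio equals 1)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if log_behavior_ratio(mid, d_prime, r_s, r_ns) < 0:
            lo = mid  # ratio still favors no-sample, so move right
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative conditions: equal reinforcer probabilities, then r_NS = 10 r_S.
for r_s, r_ns in ((1.0, 1.0), (0.1, 1.0)):
    exact = indifference_closed_form(2.0, r_s, r_ns)
    numeric = indifference_by_bisection(2.0, r_s, r_ns)
    print(f"r_S = {r_s}, r_NS = {r_ns}: Eq. 11 gives {exact:.4f}, search gives {numeric:.4f}")
```

The two routes agree because taking logarithms of the indifference condition leaves an expression that is linear in f, which is what makes the closed form in Equation 11 possible.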

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if r_NS > r_S), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in d′ on the location of the confidence criteria discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as d′ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., r_NS = 1.0 and r_S = .1). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence that the sample occurred before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when r_NS = 10r_S? Substituting 10 for r_NS/r_S in Equation 11 and solving for f yields

$$f_{10/1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, f_{10/1} (the indifference point when r_NS = 10r_S) occurs at a point higher than d′/2, but by an amount that varies with d′. The more general version of this equation is f_{10/1} = d′/2 + k1/d′, where k1 is the constant ln(r_NS/r_S). Similarly, if r_S = 10r_NS, then

$$f_{1/10} = \frac{d'}{2} + \frac{\ln(.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is f_{1/10} = d′/2 + k2/d′, where k2 is the constant ln(r_NS/r_S) for this condition. This model predicts that the distance between f_{10/1} and f_{1/10} on the sense-of-prior-occurrence axis will be inversely related to d′. That distance (i.e., f_{10/1} minus f_{1/10}) is given by

$$f_{10/1} - f_{1/10} = \left(\frac{d'}{2} + \frac{k_{1}}{d'}\right) - \left(\frac{d'}{2} + \frac{k_{2}}{d'}\right) = \frac{k_{1} - k_{2}}{d'}.$$

Thus, as d′ becomes very large, f_{10/1} and f_{1/10} should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as d′ approaches 0, those two indifference points should move infinitely far apart.

All of this is just another way of saying that the pigeon's sensitivity to variations in relative reinforcer probabilities should be low when d′ is large and high when d′ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When d′ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of r_NS to r_S) had little effect on behavior. When d′ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as d′ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as d′ decreased (because that is just another way of saying that sensitivity to reinforcement increased as d′ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism's reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and, more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \text{target})}{p(f \mid \text{lure})} > 10,$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of f, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)} > 10,$$

where p(R_Old | f) represents the past probability that, given this value of f, socially reinforcing feedback for an old response occurred, and p(R_New | f) represents the past probability that, given this value of f, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \frac{p(R_{\mathrm{Old}} \mid f)}{p(R_{\mathrm{New}} \mid f)}, \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{\mathrm{Old}} \mid f)}{p(B_{\mathrm{New}} \mid f)} = \left(\frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right]. \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left(\frac{r_{\mathrm{Old}}}{r_{\mathrm{New}}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right] > 10,$$

where r_Old and r_New represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to r_S and r_NS, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of r_S and r_NS in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, r_Old = r_New. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of f along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of f.

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if r_NS = 10r_S, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at d′/2, but at a higher point, namely, d′/2 + 2.30/d′. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific f → reinforcement outcome relations but, instead, that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog's neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog's training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.
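Returning to the quantitative side of this section, the indifference points discussed above (d′/2 + 2.30/d′ when r_NS = 10r_S, and the corresponding point when r_S = 10r_NS) can be tabulated to show how they fan out as d′ decreases. This is a minimal sketch based on Equation 11 with σ = 1; the function names and the particular d′ values are illustrative assumptions, not data from any experiment.

```python
import math


def indifference_point(d_prime, ratio_ns_to_s):
    """Equation 11 with sigma = 1; ratio_ns_to_s is r_NS / r_S."""
    return d_prime / 2 + math.log(ratio_ns_to_s) / d_prime


def criterion_spread(d_prime, ratio=10.0):
    """Distance between the 10/1 and 1/10 indifference points, (k1 - k2) / d'."""
    return indifference_point(d_prime, ratio) - indifference_point(d_prime, 1.0 / ratio)


# Illustrative d' values, from weak to strong memory.
for d_prime in (0.5, 1.0, 2.0, 4.0):
    upper = indifference_point(d_prime, 10.0)
    lower = indifference_point(d_prime, 0.1)
    print(f"d' = {d_prime:.1f}: f(10/1) = {upper:.2f}, f(1/10) = {lower:.2f}, "
          f"spread = {criterion_spread(d_prime):.2f}")
```

Because the spread is proportional to 1/d′, halving d′ doubles the distance between the two points, which is the same fanning pattern described for human confidence criteria.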

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism's neural network without detailing the origins of the relevant distributional information. A neural network may indeed perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject's reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising, indeed, if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject's neural network, but in the subject's history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that "the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects" (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of "nothing" in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion's distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value, f, is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)


White and Wixted's (1999) model assumes that pigeons learn $p(R_S \mid f)$ and $p(R_{NS} \mid f)$ during the course of training and, in light of this information, respond to the choice alternatives in accordance with the matching law. If $p(B_S \mid f)$ represents the probability of choosing the sample choice alternative given $f$, and $p(B_{NS} \mid f)$ represents the probability of choosing the no-sample choice alternative given $f$, then

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(R_S \mid f)}{p(R_{NS} \mid f)}, \tag{9}$$

which, after replacing $p(R_S \mid f)$ and $p(R_{NS} \mid f)$ with their equivalent values from Equations 7 and 8, can be rewritten as

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right]. \tag{10}$$

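These equations state that the behavior ratio given $f$ is equal to the reinforcer ratio given $f$, which is essentially the matching law for each value of $f$ along the sense-of-prior-occurrence axis.

If $r_S = r_{NS} = 1.0$ (i.e., if the probability of reinforcement for a correct response on sample and no-sample trials, respectively, is 1.0), as is typically the case, Equation 10 reduces to

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{p(f \mid S)}{p(f \mid NS)}.$$

Note that the term on the right side of this equation is the likelihood ratio. That is, after replacing $p(f \mid S)$ and $p(f \mid NS)$ with their equivalent values from Equations 5 and 6, this equation becomes

$$\frac{p(B_S \mid f)}{p(B_{NS} \mid f)} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - d')^2 / \sigma^2}}{\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-0.5 (f - 0)^2 / \sigma^2}}.$$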

Although the choice behavior ratio (left side of the equation) happens to equal the likelihood ratio (right side of the equation) when $r_S = r_{NS}$, a likelihood ratio computation is not assumed to be governing the bird's behavior. What drives the bird's choice, according to White and Wixted's (1999) model, are the learned odds that a reinforcer is arranged on one alternative or the other, not the computed odds that the current value of $f$ was generated by a sample as opposed to no sample. In a typical concurrent VI VI experiment, pigeons are generally assumed to learn the probabilities of reinforcement associated with the two choice alternatives and to respond in accordance with the matching law. White and Wixted's account makes exactly the same assumption, except that it assumes that the birds learn different reinforcement probabilities associated with the two alternatives, depending on what $f$ happens to be. What $f$ happens to be is determined, in part, by the form of the underlying sense-of-prior-occurrence distributions, although the birds are not assumed to have any direct knowledge of that.
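The numerical equivalence described above can be checked with a minimal sketch. It assumes the Gaussian densities of Equations 5 and 6; the function names and the value of $d'$ are illustrative choices and are not taken from White and Wixted (1999).

```python
import math


def gaussian_pdf(f, mean, sigma=1.0):
    """Normal density of the form used in Equations 5 and 6."""
    return math.exp(-0.5 * ((f - mean) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)


def choice_ratio(f, d_prime, r_s=1.0, r_ns=1.0, sigma=1.0):
    """Behavior ratio p(B_S | f) / p(B_NS | f) as given by Equation 10."""
    likelihood_ratio = gaussian_pdf(f, d_prime, sigma) / gaussian_pdf(f, 0.0, sigma)
    return (r_s / r_ns) * likelihood_ratio


if __name__ == "__main__":
    d_prime = 1.5  # illustrative value, not from the source
    for f in (0.0, 0.75, 1.5):
        # With r_s == r_ns the printed ratio is the likelihood ratio itself.
        print(f, choice_ratio(f, d_prime))
```

With equal reinforcer probabilities the function returns the likelihood ratio, and it returns 1 at $f = d'/2$. Nothing in the sketch requires the organism to compute that ratio. The same numbers would arise from reinforcement odds learned separately for each value of $f$.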

There is some point on the sense-of-prior-occurrence axis (i.e., some value of $f$) that leads the bird to be indifferent between the two choice alternatives, because at that point the experienced probability of reinforcement happens to be the same for both the sample and the no-sample choice alternatives. At the indifference point, $p(B_S \mid f) = p(B_{NS} \mid f)$. If $r_S = r_{NS} = 1.0$, it is easily shown that the value of $f$ that yields indifference is $d'/2$. If $p(B_S \mid f) = p(B_{NS} \mid f)$, which is to say that $p(B_S \mid f)/p(B_{NS} \mid f) = 1$, then, according to Equation 10,

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{p(f \mid S)}{p(f \mid NS)}\right].$$

Solving this equation for $f$ provides the point on the sense-of-prior-occurrence axis that yields indifference. Substituting the appropriate Gaussian density functions for $p(f \mid S)$ and $p(f \mid NS)$ from Equations 5 and 6 and canceling out the $1/\sqrt{2\pi\sigma^2}$ terms that appear in the numerator and denominator yields

$$1 = \left(\frac{r_S}{r_{NS}}\right)\left[\frac{e^{-0.5 (f - d')^2}}{e^{-0.5 (f - 0)^2}}\right]$$

(assuming that $\sigma = 1$ for simplicity). When solved for $f$, this equation yields

$$f = \frac{d'}{2} + \frac{1}{d'} \ln\!\left(\frac{r_{NS}}{r_S}\right). \tag{11}$$

Note the similarity between Equation 11 and Equation 4. According to Equation 11, if $r_S = r_{NS}$, so that $\ln(r_{NS}/r_S) = 0$, the indifference point occurs at $d'/2$. This familiarity value corresponds to the indifference point and could be represented as $f_{1:1} = d'/2$, where the subscript 1:1 represents the ratio of programmed reinforcer probabilities ($r_{NS}/r_S$). This is exactly analogous to a central prediction of the likelihood ratio model, according to which the criterion will be located midway between the target and the lure distributions even when $d'$ changes (thereby yielding a mirror effect). The difference is that White and Wixted's (1999) model assumes that no actual criterion is placed on a decision axis. Instead, the organism is responding in accordance with the matching law on the basis of its prior history of reinforcement for choosing the sample and no-sample alternatives for particular values of $f$. In this case, the pigeon has learned that when $f = d'/2$, the probability of reinforcement is the same for the sample and the no-sample choice alternatives. For any value greater than $d'/2$, the odds favor a reinforcer's being available for choosing the sample alternative (because that is the way it has been in the past), so the bird will no longer be indifferent. For any value less than $d'/2$, the odds favor a reinforcer's being available for choosing the no-sample alternative, so the bird will tend to prefer that alternative. Although reinforcement history and likelihood ratio accounts differ fundamentally, the predictions of the two accounts are the same, namely, that a mirror effect should be observed.
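The algebra that leads to Equation 11 is not spelled out in the text. One way to fill in the intermediate steps, assuming $\sigma = 1$ as above, is to take the natural logarithm of both sides of the indifference condition and expand the squared terms:

```latex
\begin{align*}
1 &= \left(\frac{r_S}{r_{NS}}\right)\frac{e^{-0.5(f-d')^2}}{e^{-0.5 f^2}} \\
0 &= \ln\!\left(\frac{r_S}{r_{NS}}\right) - 0.5\,(f-d')^2 + 0.5\,f^2 \\
0 &= \ln\!\left(\frac{r_S}{r_{NS}}\right) + f d' - \frac{d'^2}{2} \\
f &= \frac{d'}{2} + \frac{1}{d'}\ln\!\left(\frac{r_{NS}}{r_S}\right)
\end{align*}
```

The sign of the logarithm flips in the last step because $-\ln(r_S/r_{NS}) = \ln(r_{NS}/r_S)$.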

Again, although we assume that the indifference point represents the learned value of $f$ associated with equal reinforcement probabilities, this is not to say that the situation cannot be represented with a criterion placed on a decision axis, as in Figure 7. What the criterion represents, though, is not a value that the organism has calculated on the basis of its inherent knowledge of the situation or a value that the bird has decided to use as a criterion value against which to evaluate all trial-by-trial sense-of-prior-occurrence values. Instead, the criterion simply depicts the point on the sense-of-prior-occurrence axis where, according to the subject's learning history, the probability of reinforcement for choosing the sample alternative is the same as the probability of reinforcement for choosing the no-sample alternative.

The similarity between the standard likelihood ratio account and the reinforcement history account proposed by White and Wixted (1999) extends beyond the mirror effect. In fact, if the probabilities of reinforcement for correct responses on sample and no-sample trials differ (e.g., if $r_{NS} > r_S$), the effect of that manipulation on the indifference point along the sense-of-prior-occurrence axis is exactly analogous to the effect of changes in $d'$ on the location of the confidence criteria, discussed earlier in connection with studies of human recognition memory. Earlier, we considered an example in which the high-confident old criterion was placed on the decision axis at the point where the odds that the test item was drawn from the target distribution were 10 to 1 in favor (and the high-confident new criterion was placed where the odds were 10 to 1 against). What we showed is that the distance between these two extreme criteria should increase (i.e., the confidence criteria should fan out, as in the lower panel of Figure 5) as $d'$ decreases, and we also showed that the data qualitatively correspond to this prediction (as is shown in Figure 6). The same story emerges from the reinforcement history account proposed by White and Wixted. Consider, for example, where the choice indifference point would fall on the sense-of-prior-occurrence axis if the probability of reinforcement for a correct no-sample response was 10 times that of a correct sample response (e.g., $r_{NS} = 1.0$ and $r_S = 0.1$). Because reinforcers are so likely to be arranged on the no-sample choice alternative, the pigeon should demand an especially high sense of prior occurrence before actually choosing the sample choice alternative. In other words, according to one way of thinking, the bird should have high confidence that the sample really did occur before choosing that alternative. How high should this indifference point be on the sense-of-prior-occurrence axis when $r_{NS} = 10 r_S$? Substituting 10 for $r_{NS}/r_S$ in Equation 11 and solving for $f$ yields

$$f_{10:1} = \frac{d'}{2} + \frac{\ln(10)}{d'} = \frac{d'}{2} + \frac{2.30}{d'}.$$

In other words, $f_{10:1}$ (the indifference point when $r_{NS} = 10 r_S$) occurs at a point higher than $d'/2$, but by an amount that varies with $d'$. The more general version of this equation is $f_{10:1} = d'/2 + k_1/d'$, where $k_1$ is the constant $\ln(r_{NS}/r_S)$ (here, $\ln 10 \approx 2.30$). Similarly, if $r_S = 10 r_{NS}$, then

$$f_{1:10} = \frac{d'}{2} + \frac{\ln(0.10)}{d'} = \frac{d'}{2} - \frac{2.30}{d'}.$$

The more general version of this equation is $f_{1:10} = d'/2 + k_2/d'$, where $k_2$ is the constant $\ln(r_{NS}/r_S)$ for this condition (here, $\ln 0.10 \approx -2.30$). This model predicts that the distance between $f_{10:1}$ and $f_{1:10}$ on the sense-of-prior-occurrence axis will be inversely related to $d'$. That distance (i.e., $f_{10:1}$ minus $f_{1:10}$) is given by

$$f_{10:1} - f_{1:10} = \left(\frac{d'}{2} + \frac{k_1}{d'}\right) - \left(\frac{d'}{2} + \frac{k_2}{d'}\right) = \frac{k_1 - k_2}{d'}.$$

Thus, as $d'$ becomes very large, $f_{10:1}$ and $f_{1:10}$ should converge to the point that they coincide on the evidence axis (i.e., the distance between them should decrease to 0). By contrast, as $d'$ approaches 0, those two indifference points should move infinitely far apart.
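The inverse relation between this distance and $d'$ can be illustrated with a short sketch based on Equation 11 with $\sigma = 1$. The function names and the $d'$ values in the loop are hypothetical and chosen only for illustration.

```python
import math


def indifference_point(d_prime, r_ns_over_r_s):
    """Equation 11 with sigma = 1: f = d'/2 + ln(r_NS / r_S) / d'."""
    return d_prime / 2 + math.log(r_ns_over_r_s) / d_prime


def criterion_spread(d_prime, ratio=10.0):
    """Distance f_{10:1} - f_{1:10}, which reduces to 2 * ln(ratio) / d'."""
    return indifference_point(d_prime, ratio) - indifference_point(d_prime, 1 / ratio)


if __name__ == "__main__":
    for d_prime in (0.5, 1.0, 2.0, 4.0):  # illustrative d' values, not from the source
        print(d_prime, round(criterion_spread(d_prime), 3))
```

The spread shrinks as $d'$ grows and widens as $d'$ approaches 0, which is the fanning-out pattern described for the confidence criteria.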

All of this is just another way of saying that the pigeon's sensitivity to variations in relative reinforcer probabilities should be low when $d'$ is large and high when $d'$ is small. This was actually the prediction tested by the experiments conducted by White and Wixted (1999). When $d'$ was high (owing to a short retention interval), variations in relative reinforcer probabilities (i.e., variations in the ratio of $r_{NS}$ to $r_S$) had little effect on behavior. When $d'$ was low (owing to a long retention interval), identical variations in relative reinforcer probabilities had a large effect on behavior. In that sense, the birds were behaving like likelihood ratio computers. Because White and Wixted knew the reinforcer probabilities, they were able to plot relative choice probabilities as a function of relative reinforcer probabilities to show that sensitivity changed in the predicted manner as $d'$ decreased. However, they could have instead computed indifference points for each reinforcer ratio condition, and the data would have shown that these points fan out on the sense-of-prior-occurrence axis as $d'$ decreased (because that is just another way of saying that sensitivity to reinforcement increased as $d'$ decreased). The pattern is the same as that observed for humans making confidence judgments, but the connection to the organism's reinforcement history is transparent only for the pigeons, whose reinforcement history we had control over.

Why Do Humans Behave Like Likelihood Ratio Computers?

These considerations offer a new way to understand why humans tend to behave like likelihood ratio computers. As we considered earlier, humans tend to exhibit a mirror effect in recognition memory tasks, and, more generally, they adjust their confidence criteria across conditions in a manner predicted by likelihood ratio models. Let us consider again why a subject might provide a high-confident old response when an item is so familiar that the odds are 10 to 1 in favor of its having been drawn from the target distribution. From the likelihood ratio point of view, this occurs because the subject computes the relative heights of the target and lure distributions on the basis of knowledge of the shapes of the underlying distributions. If

$$\frac{p(f \mid \text{target})}{p(f \mid \text{lure})} > 10,$$

a high-confident old response is given. According to the reinforcement history account, by contrast, a high-confident old response is given when, for the current value of $f$, the odds that an old response would be correct on the basis of prior feedback are 10 to 1 in favor:

$$\frac{p(R_{\text{Old}} \mid f)}{p(R_{\text{New}} \mid f)} > 10,$$

where $p(R_{\text{Old}} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for an old response occurred, and $p(R_{\text{New}} \mid f)$ represents the past probability that, given this value of $f$, socially reinforcing feedback for a new response occurred. If humans obey the matching law, then

$$\frac{p(B_{\text{Old}} \mid f)}{p(B_{\text{New}} \mid f)} = \frac{p(R_{\text{Old}} \mid f)}{p(R_{\text{New}} \mid f)}, \tag{12}$$

which is exactly analogous to Equation 9. According to this equation, a high-confident old response would not always occur, even if the reinforcer ratio on the right side of the equation were equal to 10, because the choice rule involves matching, not maximizing. Still, such a response would be 10 times as likely as a (low-confident) new response. If humans maximize rather than match, they would always respond old when the right side of the equation exceeds 1 (and would always respond old with high confidence when it exceeds, say, 10). The essence of our model does not change whether we assume that humans match or maximize.
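The contrast between matching and maximizing can be made concrete with a small sketch. It assumes that "old" and "new" are the only two responses, so that their probabilities sum to 1. The function names and the treatment of exact ties are illustrative assumptions rather than part of the model described here.

```python
def p_old_matching(reinforcer_ratio):
    """Probability of an 'old' response when behavior matches Equation 12.

    If p(B_Old) / p(B_New) equals the reinforcer ratio and the two
    probabilities sum to 1, then p(B_Old) = ratio / (1 + ratio).
    """
    return reinforcer_ratio / (1.0 + reinforcer_ratio)


def p_old_maximizing(reinforcer_ratio):
    """Probability of an 'old' response under strict maximizing.

    Exact ties are split evenly here, which is an arbitrary assumption.
    """
    if reinforcer_ratio > 1.0:
        return 1.0
    if reinforcer_ratio < 1.0:
        return 0.0
    return 0.5


if __name__ == "__main__":
    for ratio in (0.1, 1.0, 10.0):
        print(ratio, p_old_matching(ratio), p_old_maximizing(ratio))
```

At a reinforcer ratio of 10, a matching subject responds old on 10 of every 11 such occasions, whereas a maximizing subject always responds old. Both rules rank the two responses the same way, which is why the choice between them does not change the essence of the account.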

By using exactly the same reasoning that converted Equation 9 into Equation 10, Equation 12 becomes

$$\frac{p(B_{\text{Old}} \mid f)}{p(B_{\text{New}} \mid f)} = \left(\frac{r_{\text{Old}}}{r_{\text{New}}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right]. \tag{13}$$

Thus, a high-confident old response is likely to occur when, for example,

$$\left(\frac{r_{\text{Old}}}{r_{\text{New}}}\right)\left[\frac{p(f \mid \text{target})}{p(f \mid \text{lure})}\right] > 10,$$

where $r_{\text{Old}}$ and $r_{\text{New}}$ represent the probabilities that reinforcing feedback occurred for correct old and new responses, respectively. These values are exactly analogous to $r_S$ and $r_{NS}$, the experimenter-arranged probabilities of reinforcement for correct sample and no-sample choices. The values of $r_S$ and $r_{NS}$ in a pigeon experiment are controlled by the experimenter and are usually set to 1.0. By contrast, we have no way of knowing if, in life, $r_{\text{Old}} = r_{\text{New}}$. Assuming, for the sake of simplicity, that they are equal, the mathematics of the reinforcement history account becomes equivalent to the mathematics of the likelihood ratio account. Conceptually, though, the two accounts are quite different. The learning model assumes that our human subjects, like our pigeons, have learned the relevant reinforcer ratios associated with each value of $f$ along the familiarity axis. Whereas our birds would be 10 times as likely to choose, say, the sample alternative over the no-sample alternative under these conditions, our humans would be 10 times as likely to say old as new, and if they did say old, they would do so with high confidence (knowing full well that the probability of reinforcing social feedback would be high, because that is the way it has been in the past). In essence, the math of the reinforcement history account reflects what experience has taught the subject about reinforcement outcomes for each value of $f$.
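A rough simulation can illustrate how reinforcer ratios tallied from feedback alone would come to approximate the likelihood ratio. It is only a sketch under stated assumptions that are not from the source: targets and lures are equally likely, familiarity is unit-variance Gaussian, every correct response is reinforced ($r_{\text{Old}} = r_{\text{New}} = 1.0$), and familiarity is grouped into bins of a hypothetical width.

```python
import math
import random


def learned_reinforcer_odds(d_prime, n_trials=200_000, bin_width=0.25, seed=0):
    """Tally, per familiarity bin, how often 'old' vs. 'new' would have been reinforced.

    An 'old' response is reinforced only on target trials and a 'new'
    response only on lure trials, so the tallies are counts of targets
    and lures that produced a familiarity value in each bin.
    """
    rng = random.Random(seed)
    old_counts, new_counts = {}, {}
    for _ in range(n_trials):
        is_target = rng.random() < 0.5
        f = rng.gauss(d_prime if is_target else 0.0, 1.0)
        b = math.floor(f / bin_width)
        counts = old_counts if is_target else new_counts
        counts[b] = counts.get(b, 0) + 1
    odds = {}
    for b, n_old in old_counts.items():
        if b in new_counts:
            centre = (b + 0.5) * bin_width
            odds[centre] = n_old / new_counts[b]
    return odds


def likelihood_ratio(f, d_prime):
    """p(f | target) / p(f | lure) for unit-variance Gaussians."""
    return math.exp(f * d_prime - d_prime ** 2 / 2)


if __name__ == "__main__":
    d_prime = 1.0  # illustrative value
    odds = learned_reinforcer_odds(d_prime)
    for centre in sorted(odds):
        if -1.0 <= centre <= 2.0:
            print(round(centre, 3), round(odds[centre], 2),
                  round(likelihood_ratio(centre, d_prime), 2))
```

In this sketch the simulated learner never evaluates a density function. The agreement between the tallied odds and the likelihood ratio comes entirely from the history of feedback.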

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if $r_{NS} = 10 r_S$, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at $d'/2$, but at a higher point, namely, $d'/2 + 2.30/d'$. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific $f \rightarrow$ reinforcement outcome relations, but instead that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog's neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog's training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism's neural network, without detailing the origins of the relevant distributional information. A neural network may indeed perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject's reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising indeed if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject's neural network, but in the subject's history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that "the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects" (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of "nothing" in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM–retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion's distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value, $f$, is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)

Page 15: Wixted, J. T. & Gaitan, S. (2002)

REINFORCEMENT HISTORY SURROGATES 303

inforcement probabilities this is not to say that the situa-tion cannot be represented with a criterion placed on a de-cision axis as in Figure 7 What the criterion representsthough is not a value that the organism has calculated onthe basis of its inherent knowledge of the situation or avalue that the bird has decided to use as a criterion valueagainst which to evaluate all trial-by-trial sense-of-prior-occurrence values Instead the criterion simply depictsthe point on the sense-of-prior-occurrence axis where ac-cording to the subjectrsquos learning history the probability ofreinforcement for choosing the sample alternative is thesame as the probability of reinforcement for choosing theno-sample alternative

The similarity between the standard likelihood ratioaccount and the reinforcement history account proposedby White and Wixted (1999) extends beyond the mirroreffect In fact if the probabilities of reinforcement for cor-rect responses on sample and no-sample trials differ (egif rNS gt rS) the effect of that manipulation on the indif-ference point along the sense-of-prior-occurrence axis isexactly analogous to the effect of changes in d9 on thelocation of the confidence criteria discussed earlier inconnection with studies of human recognition memoryEarlier we considered an example in which the high-confident old criterion was placed on the decision axis atthe point where the odds that the test item was drawn fromthe target distribution was 10 to 1 in favor (and the high-confident new criterion was placed where the odds were10 to 1 against) What we showed is that the distance be-tween these two extreme criteria should increase (ie theconfidence criteria should fan out as in the lower panelof Figure 5) as d9 decreases and we also showed that thedata qualitatively correspond to this prediction (as is shownin Figure 6) The same story emerges from the reinforce-ment history account proposed by White and Wixted Con-sider for example where the choice indifference pointwould fall on the sense-of-prior-occurrence axis if theprobability of reinforcement for a correct no-sample re-sponse was 10 times that of a correct sample response (egrNS = 10 and rS = 1) Because reinforcers are so likely tobe arranged on the no-sample choice alternative the pi-geon should demand an especially high sense of prior oc-currence that the sample occurred before actually choosingthe sample choice alternative In other words according toone way of thinking the bird should have high confidencethat the sample really did occur before choosing that alter-native How high should this indifference point be on thesense-of-prior-occurrence axis when rNS = 10rS Substi-tuting 10 for rNSrS in Equation 11 and 
solving for f yields

In other words f101 (the indifference point when rNS =10rS) occurs at a point higher than d92 but by an amountthat varies with d9 The more general version of this equa-

tion is f101 = d92 + k1d9 where k1 is the constant rNSrSSimilarly if rS = 10rNS then

The more general version of this equation is f110 = d92 +k2d9 where k2 is the constant rNSrS This model predictsthat the distance between f101 and f110 on the sense-of-prior-occurrence axis will be inversely related to d9 Thatdistance (ie f101 minus f110) is given by

Thus as d9 becomes very large f101 and f110 should con-verge to the point that they coincide on the evidence axis(ie the distance between them should decrease to 0) Bycontrast as d9 approaches 0 those two indifference pointsshould move infinitely far apart

All of this is just another way of saying that the pigeonrsquossensitivity to variations in relative reinforcer probabilitiesshould be low when d9 is large and high when d9 is smallThis was actually the prediction tested by the experimentsconducted by White and Wixted (1999) When d9 washigh (owing to a short retention interval) variations in rel-ative reinforcer probabilities (ie variations in the ratio ofrNS to rS) had little effect on behavior When d9 was low(owing to a long retention interval) identical variations inrelative reinforcer probabilities had a large effect on be-havior In that sense the birds were behaving like likeli-hood ratio computers Because White and Wixted knewthe reinforcer probabilities they were able to plot relativechoice probabilities as a function of relative reinforcerprobabilities to show that sensitivity changed in the pre-dicted manner as d9 decreased However they could haveinstead computed indifference points for each reinforcerratio condition and the data would have shown that thesepoints fan out on the sense-of-prior-occurrence axis as d9decreased (because that is just another way of saying thatsensitivity to reinforcement increased as d9 decreased)The pattern is the same as that observed for humans mak-ing confidence judgments but the connection to the or-ganismrsquos reinforcement history is transparent only for thepigeons whose reinforcement history we had control over

Why Do Humans Behave Like Likelihood Ratio Computers

These considerations offer a new way to understand whyhumans tend to behave like likelihood ratio computers Aswe considered earlier humans tend to exhibit a mirror ef-fect in recognition memory tasks and more generally theyadjust their confidence criteria across conditions in a man-ner predicted by likelihood ratio models Let us consider

f f d k

dd k

d

k k

d

10 1 1 101 2

1 2

2 2

- = cent +cent

aeligegraveccedil

oumloslashdivide

- cent +cent

aeligegraveccedil

oumloslashdivide

= -cent

f dd

dd

1 10 20 10

22 30

ln( )

= cent +cent

= cent -cent

f dd

dd

10 1 210

22 30

ln( )

= cent +cent

= cent +cent

304 WIXTED AND GAITAN

again why a subject might provide a high-confident old re-sponse when an item is so familiar that the odds are 10 to1 in favor of its having been drawn from the target distrib-ution From the likelihood ratio point of view this occursbecause the subject computes the relative heights of the tar-get and lure distributions on the basis of knowledge of theshapes of the underlying distributions If

a high-confidentold response is given According to the re-inforcement history account by contrast a high-confidentold response is given when for the current value of f theodds that an old response would be correct on the basis ofprior feedback is 10 to 1 in favor

where p (ROld | f ) represents the past probability thatgiven this value of f socially reinforcing feedback for anold response occurred and p(RNew | f ) represents the pastprobability that given this value of f socially reinforc-ing feedback for a new response occurred If humansobey the matching law then

(12)

which is exactly analogous to Equation 9 According tothis equation a high-confidentold response would not al-ways occur even if the reinforcer ratio on the right side ofthe equation were equal to 10 because the choice rule in-volves matching not maximizing Still such a responsewould be 10 times as likely as a (low-confident) new re-sponse If humans maximize rather than match theywould always respond old when the right side of the equa-tion exceeds 1 (and would always respond old with highconfidence when it exceeds say 10) The essence of ourmodel does not change whether we assume that humansmatch or maximize

By using exactly the same reasoning that convertedEquation 9 into Equation 10 Equation 12 becomes

(13)

Thus a high-confident old response is likely to occurwhen for example

where rOld and rNew represent the probabilities that rein-forcing feedback occurred for correct old and new re-sponses respectively These values are exactly analogousto rS and rNS the experimenter-arranged probabilities ofreinforcement for correct sample and no-sample choicesThe values of rS and rNS in a pigeon experiment are con-trolled by the experimenter and are usually set to 10 Bycontrast we have no way of knowing if in life rOld =

rNew Assuming for the sake of simplicity that they areequal the mathematics of the reinforcement history ac-count becomes equivalent to the mathematics of the likeli-hood ratio account Conceptually though the two accountsare quite different The learning model assumes that ourhuman subjects like our pigeons have learned the relevantreinforcer ratios associated with each value of f along thefamiliarity axis Whereas our birds would be 10 times aslikely to choose say the sample alternative over the no-sample alternative under these conditions our humanswould be 10 times as likely to say old as new and if theydid say old they would do so with high confidence (know-ing full well that the probability of reinforcing social feed-back would be high because that is the way it has been inthe past) In essence the math of the reinforcement historyaccount reflects what experience has taught the subjectabout reinforcement outcomes for each value of f

For pigeons, we needed to adjust the scheduled reinforcement probabilities in order to move the indifference point on the sense-of-prior-occurrence axis to the relatively high value at which the experienced probabilities of reinforcement for the two choice alternatives were equal. That is, because we cannot easily ask the pigeon for a confidence rating, we instead arrange the reinforcement outcome probabilities in such a way that the bird would be reluctant to choose the sample alternative unless it were highly confident that the sample was actually presented on the current trial. Thus, for example, if $r_{\mathrm{NS}} = 10\,r_{\mathrm{S}}$, the bird would generally not be inclined to choose the sample alternative, because reinforcers are so much more likely to be available on the no-sample choice alternative. Indeed, the indifference point on the sense-of-prior-occurrence axis under these conditions occurs not at $d'/2$ but at a higher point, namely, $d'/2 + 2.30/d'$. For humans, we usually do not alter the relative probability of reinforcement, but their confidence ratings can be used to tell us (theoretically) where on the familiarity axis the odds of prior reinforcement for an old response were high. In this sense, a typical human recognition memory experiment is essentially a probe test revealing the effects of prior learning. A subject expresses high confidence in an old decision when the familiarity of a test item falls at a point where, in the past, the probability of reinforcement for an old response was especially high (e.g., 10 to 1 in favor).
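
The shifted indifference point quoted above can be checked numerically. The sketch below assumes equal-variance Gaussian distributions with unit standard deviation, the no-sample mean at 0, the sample mean at d', and equally frequent sample and no-sample trials; the function name and argument names are hypothetical.

```python
import math


def indifference_point(d_prime: float, reinforcer_ratio: float) -> float:
    """Point on the decision axis where both choices pay off equally.

    `reinforcer_ratio` is r_NS / r_S. Under the stated assumptions the
    likelihood ratio at x is exp(d_prime * x - d_prime**2 / 2). Setting
    it equal to r_NS / r_S and solving for x gives
    x = d_prime / 2 + ln(r_NS / r_S) / d_prime.
    """
    if d_prime <= 0 or reinforcer_ratio <= 0:
        raise ValueError("d_prime and reinforcer_ratio must be positive")
    return d_prime / 2.0 + math.log(reinforcer_ratio) / d_prime


if __name__ == "__main__":
    for d_prime in (0.5, 1.0, 2.0):
        unbiased = indifference_point(d_prime, 1.0)   # equal reinforcement
        shifted = indifference_point(d_prime, 10.0)   # r_NS = 10 * r_S
        print(d_prime, unbiased, shifted)
```

Because ln 10 is approximately 2.30, the second term reproduces the 2.30/d' shift given in the text, and with equal reinforcement probabilities the point falls back to d'/2.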

The reinforcement learning model becomes identical to the likelihood ratio model if we assume that organisms do not learn specific $f$ → reinforcement outcome relations but, instead, that they learn the mathematical forms of the underlying distributions and make the relevant familiarity-specific reinforcer ratio computations (such as a trained neural network might do). Even if this assumption were made, however, the interesting part of the explanation of why humans behave like likelihood ratio computers lies in the training history. To take one analogy, if a dog is trained to catch a Frisbee while running on its hind legs and winking one eye, we could say that the dog’s neural network is performing miraculous feats. But the neural network is just doing what it was trained to do (and it would be doing something else if it had been trained in a different way). The real explanation for why it is that the dog behaves in such an interesting way lies in the specifics of the dog’s training history. Our argument is that the same considerations apply to humans expressing recognition memory confidence judgments, and the best way to gain insight into this process may be to study the effects of reinforcement on the memory behavior of nonhuman subjects.
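
The idea that specific familiarity-to-outcome relations could be learned directly, without any knowledge of the underlying distributions, can be illustrated with a small simulation. Everything in it is a hypothetical setup chosen for illustration: equal-variance Gaussian familiarity distributions, equally frequent targets and lures, a learner that responds at random during its history, and familiarity grouped into bins of width 0.5. It is a sketch of the general idea, not the authors' model.

```python
import math
import random


def learned_reinforcer_ratios(d_prime=1.5, n_trials=200_000, bin_width=0.5,
                              r_old=1.0, r_new=1.0, seed=0):
    """Tally past reinforcement by familiarity bin and return the ratios.

    On each simulated trial an item is a target or a lure with equal
    probability, its familiarity f is drawn from N(d_prime, 1) or N(0, 1),
    and the learner says "old" or "new" at random. Correct "old" responses
    are reinforced with probability r_old, correct "new" responses with
    probability r_new. The result maps each bin's lower edge to the
    estimate of p(R_Old | f) / p(R_New | f).
    """
    rng = random.Random(seed)
    # counts[bin] = [old responses, reinforced old, new responses, reinforced new]
    counts = {}
    for _ in range(n_trials):
        is_target = rng.random() < 0.5
        f = rng.gauss(d_prime if is_target else 0.0, 1.0)
        b = math.floor(f / bin_width)
        c = counts.setdefault(b, [0, 0, 0, 0])
        if rng.random() < 0.5:            # learner says "old"
            c[0] += 1
            if is_target and rng.random() < r_old:
                c[1] += 1
        else:                             # learner says "new"
            c[2] += 1
            if not is_target and rng.random() < r_new:
                c[3] += 1

    ratios = {}
    for b, (n_old, reinf_old, n_new, reinf_new) in counts.items():
        if n_old and n_new and reinf_new:  # skip bins with too little data
            ratios[b * bin_width] = (reinf_old / n_old) / (reinf_new / n_new)
    return ratios


if __name__ == "__main__":
    d_prime = 1.5
    ratios = learned_reinforcer_ratios(d_prime=d_prime)
    for lower in (0.0, 0.5, 1.0, 1.5):
        centre = lower + 0.25
        analytic = math.exp(d_prime * centre - d_prime ** 2 / 2)
        print(f"f in [{lower}, {lower + 0.5}): learned ratio {ratios[lower]:.2f}, "
              f"likelihood ratio at bin centre {analytic:.2f}")
```

The tallied ratios track the likelihood ratio across bins even though the simulated learner never represents the distributions themselves, which is the sense in which a likelihood ratio computation can stand in for a reinforcement history.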

Conclusion

Both humans and pigeons can be shown to behave in accordance with likelihood ratio models that appear to require inherent knowledge of the underlying familiarity distributions. Prevailing explanations for such behavior tend to focus on the computational abilities of the organism’s neural network without detailing the origins of the relevant distributional information. A neural network may, indeed, perform such computations in just the manner suggested by likelihood ratio models, but our point is that the network does so because of the way it was trained by life or (in the case of pigeons) by laboratory experience. The neural network would presumably perform in a different way had the subject’s reinforcement history been such that, for example, incorrect high-confident old responses were reinforced with higher probability than correct ones. In reality, we assume that socially reinforcing feedback is much more likely for correct than for incorrect high-confident old responses.

Abundant research in recent years has suggested that the behavior of humans working on signal detection tasks is governed in lawful ways by reinforcement outcomes (e.g., Erev, 1998; Johnstone & Alsop, 1999, 2000). It would be surprising, indeed, if only laboratory-based reinforcers affected their behavior in this way. Indeed, our central assumption is that humans are exquisitely sensitive to reinforcers that are arranged by life experiences. From this point of view, the interesting part of the story, the part that explains why subjects (including pigeons) appear to behave like likelihood ratio computers, lies not in the inherent properties of the subject’s neural network but in the subject’s history of reinforcement. The behavior of the neural network (and the behavior of the subject) reflects that history, and in that sense, Skinner (1977) was probably right when he asserted that “the mental apparatus studied by cognitive psychology is simply a rather crude version of contingencies of reinforcement and their effects” (p. 9). If so, much of what is interesting about human memory will be illuminated only by studying animals whose reinforcement history can be studied in a much more direct way.

REFERENCES

Alsop, B. (1998). Receiver operating characteristics from nonhuman animals: Some implications and directions for research with humans. Psychonomic Bulletin & Review, 5, 239-252.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336.

Dougherty, D. H., & Wixted, J. T. (1996). Detecting a nonevent: Delayed presence-vs.-absence discrimination in pigeons. Journal of the Experimental Analysis of Behavior, 65, 81-92.

Erev, I. (1998). Signal detection by human observers: A cutoff reinforcement learning model of categorization decisions under uncertainty. Psychological Review, 105, 280-298.

Gaitan, S. C., & Wixted, J. T. (2000). The role of “nothing” in memory for event duration in pigeons. Animal Learning & Behavior, 28, 147-161.

Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 5-16.

Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Johnstone, V., & Alsop, B. (1999). Stimulus presentation ratios and the outcomes for correct responses in signal-detection procedures. Journal of the Experimental Analysis of Behavior, 72, 1-20.

Johnstone, V., & Alsop, B. (2000). Reinforcer control and human signal-detection performance. Journal of the Experimental Analysis of Behavior, 73, 275-290.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user’s guide. New York: Cambridge University Press.

McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105, 734-760.

Morrell, H. E. R., Gaitan, C., & Wixted, J. T. (2002). On the nature of the decision axis in signal-detection-based models of recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 1095-1110.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM-retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5, 1-10.

Stretch, V., & Wixted, J. T. (1998a). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1397-1410.

Stretch, V., & Wixted, J. T. (1998b). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1379-1396.

White, K. G., & Cooney, E. B. (1996). Consequences of remembering: Independence of performance at different retention levels. Journal of Experimental Psychology: Animal Behavior Processes, 22, 51-59.

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113.

Wixted, J. T. (1993). A signal detection analysis of memory for nonoccurrence in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 19, 400-411.

NOTES

1. A potentially confusing issue is that the symbol C is sometimes used as a measure of response bias, and its value is equal to the criterion’s distance from the point of intersection between the signal and the noise distributions. By contrast, we use C to denote the location of the criterion relative to the mean of the noise distribution.

2. Actually, the probability that a continuous random variable assumes a particular value, $f$, is zero. Thus, in what follows, we will actually assume that the value in question is some small interval on the familiarity axis that does have a certain probability of occurrence.
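
The point made in Note 2 can be illustrated numerically. The sketch below assumes unit-variance Gaussian familiarity distributions with the lure mean at 0 and the target mean at d'; the function names and the interval widths are hypothetical choices made for illustration.

```python
import math


def normal_cdf(x: float, mean: float = 0.0) -> float:
    """Cumulative distribution function of a unit-variance Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))


def interval_probability(f: float, width: float, mean: float) -> float:
    """Probability that familiarity falls in [f - width/2, f + width/2]."""
    return normal_cdf(f + width / 2.0, mean) - normal_cdf(f - width / 2.0, mean)


def interval_odds(f: float, width: float, d_prime: float) -> float:
    """Odds of target versus lure for a small interval centred on f."""
    return (interval_probability(f, width, d_prime)
            / interval_probability(f, width, 0.0))


if __name__ == "__main__":
    f, d_prime = 1.5, 1.0
    # A single point has zero probability under either distribution.
    print("zero-width interval:", interval_probability(f, 0.0, d_prime))
    # The ratio of the density heights at f, for comparison.
    print("density ratio:", math.exp(d_prime * f - d_prime ** 2 / 2))
    # Small intervals have non-zero probability, and their odds approach
    # the density ratio as the interval shrinks.
    for width in (0.5, 0.1, 0.01):
        print("width", width, "odds", interval_odds(f, width, d_prime))
```

A zero-width interval has probability zero, but the odds computed over a narrow interval are well defined and approach the ratio of the density heights, which is why the likelihood ratio can be treated as a property of a particular value of f.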

(Manuscript received November 13, 2001; revision accepted for publication June 13, 2002.)

