
Psychological Review Copyright 2000 by the American Psychological Association, Inc. 2000, Vol. 107, No. 2, 227-260 0033-295X/00/$5.00 DOI: 10.1037//0033-295X.107.2.227

Information-Accumulation Theory of Speeded Categorization

Koen Lamberts University of Warwick

A process model of perceptual categorization is presented, in which it is assumed that the earliest stages of categorization involve gradual accumulation of information about object features. The model provides a joint account of categorization choice proportions and response times by assuming that the probability that the information-accumulation process stops at a given time after stimulus presentation is a function of the stimulus information that has been acquired. The model provides an accurate account of categorization response times for integral-dimension stimuli and for separable-dimension stimuli, and it also explains effects of response deadlines and exemplar frequency.

The time course of perceptual categorization has recently become the topic of much research (e.g., Ashby, Boynton, & Lee, 1994; Lamberts, 1995, 1998; Lamberts & Freeman, 1999a, 1999b; Nosofsky & Alfonso-Reese, 1999; Nosofsky & Palmeri, 1997a, 1997b). Whereas previous research on formal modeling of categorization concentrated primarily on the prediction of choice proportions, attention has become focused on the relation between processing time and categorization and on the prediction of categorization response times (RTs). In this article, I present a formal theory of categorization that aims to predict RTs and choice proportions in perceptual categorization tasks. Perceptual categorization is defined as the process of assigning a visually presented object to a category and should be distinguished from categorization of other types of stimuli, such as verbal descriptions of people or situations. Current formal theories of perceptual categorization can be divided into four groups. The first group is that of exemplar models, which assume that categorization of an object depends on the similarity of that object to instances in memory (e.g., Estes, 1994; Kruschke, 1992; Medin & Schaffer, 1978; Nosofsky, 1986). Second, there are decision-bound models, which are based on the multidimensional generalization of classical signal-detection theory (Ashby & Lee, 1991; Ashby & Maddox, 1993). According to these models, stimuli correspond to points in a multidimensional space. The perceptual representations of stimuli are assumed to be variable from trial to trial because of intrinsic noise in the perceptual system. For categorization, people are assumed to establish linear or nonlinear category decision bounds in the multidimensional stimulus space. Categorization depends on the position of the stimulus representation on a given trial relative to the decision bounds. Third are the models in which category decisions are

The research in this article was supported by a research grant from the Economic and Social Research Council.

Noellie Brockdorff, Nick Chater, Evan Heit, Richard Freeman, Robert Nosofsky, Thomas Palmeri, and Dave Peebles provided help and feedback in various stages of the research. I especially thank Steven Chong for running the experiments.

Correspondence concerning this article should be addressed to Koen Lamberts, Department of Psychology, University of Warwick, Coventry CV4 7AL, United Kingdom. Electronic mail may be sent to k.lamberts@warwick.ac.uk.

based on the application of formal rules (e.g., Nosofsky, Palmeri, & McKinley, 1994). Finally, various connectionist models of categorization have been proposed (e.g., Gluck & Bower, 1988), which attempt to explain categorization in terms of associative links between input information and response alternatives. Models from each of these four classes have been successful in predicting choice proportions in a variety of categorization tasks. Thus far, only the exemplar framework and the decision-bound model have been used to predict categorization RTs.

Understanding the time course of categorization not only is of intrinsic interest but also may have profound implications for theorizing about a variety of other cognitive tasks. For instance, there are close theoretical links between categorization and recognition memory (Estes, 1994; Nosofsky, 1991), between categorization and attention (Bundesen, 1990), and between categorization and object recognition (Logothetis & Sheinberg, 1996). Many cognitive processes involve categorization in some form, and a strong process theory of perceptual categorization could form an essential building block for a general theory of cognition.

Although systematic modeling of categorization RTs has been carried out only recently, RTs often have been used in the past as a source of information about categorization processes (e.g., Jolicoeur, Gluck, & Kosslyn, 1984; Lassaline, Wisniewski, & Medin, 1992; Rosch, 1973). In these early studies, the focus was usually on RTs in different types of categorization tasks or for different types of stimuli, without attempts to model details of RT differences between individual stimuli. Recent research has demonstrated that formal theories of categorization are now sufficiently powerful to cope with an extension to RTs, and the research that I report in this article aims to contribute further to this development.

This article is organized as follows. First, I discuss two existing models of categorization RT: one based on the exemplar framework and one based on decision-bound theory. Next, I introduce the extended generalized context model for RTs (EGCM-RT) as an alternative theory of categorization RT. I then provide an overview of existing categorization RT data and discuss applications of the EGCM-RT to these data. Finally, I present three experiments that provide further support for the EGCM-RT.


The Exemplar-Based Random-Walk Model

The exemplar-based random-walk model (EBRW; Nosofsky & Palmeri, 1997b) is an exemplar model of categorization. It assumes that instances are stored in memory during category learning and that category decisions are based on retrieval of stored exemplars. When a test item is presented, the exemplars in memory are activated. The level of activation of each exemplar corresponds to the product of the strength of the exemplar (which reflects its recency and presentation frequency) and its similarity to the stimulus:

a_ij = M_j × s_ij,  (1)

in which a_ij is the activation of exemplar j on presentation of stimulus i, M_j is the strength of exemplar j, and s_ij is the similarity of the stimulus and the exemplar. Similarity is defined as in Nosofsky's (1986) generalized context model (GCM):

s_ij = exp[-c(Σ_{p=1}^{P} w_p |x_ip - x_jp|^r)^{q/r}],  (2)

in which s_ij is the similarity between stimulus i and stored exemplar j, c is a generalization value, w_p is the weight of dimension p (0 ≤ w_p ≤ 1, Σw_p = 1), and x_ip and x_jp are the values of the stimulus and the stored exemplar on dimension p. The generalization value (c) determines the steepness of the function that relates similarity to the number of discrepancies between the representations (see Lamberts, 1994). The exponent r defines the type of metric (city block if r = 1 or euclidean if r = 2), and q determines the relation between distance and similarity (exponential or Gaussian; see Nosofsky, 1986).
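For concreteness, Equations 1 and 2 can be written out as a short Python sketch. The stimulus coordinates, weights, and parameter values below are arbitrary illustrations, not values from the experiments reported in this article.

```python
import math

def gcm_similarity(x_i, x_j, w, c=1.0, r=1, q=1):
    """Similarity between stimulus i and exemplar j (Equation 2).

    x_i, x_j -- sequences of dimension values
    w        -- dimension weights (0 <= w_p <= 1, summing to 1)
    c        -- generalization (steepness) value
    r        -- metric (1 = city block, 2 = euclidean)
    q        -- distance-similarity relation (1 = exponential, 2 = Gaussian)
    """
    distance = sum(wp * abs(a - b) ** r
                   for wp, a, b in zip(w, x_i, x_j)) ** (1 / r)
    return math.exp(-c * distance ** q)

def activation(strength, similarity):
    """Equation 1: exemplar activation as strength times similarity."""
    return strength * similarity

# Two stimuli that differ on one of two equally weighted binary dimensions:
s = gcm_similarity([1, 0], [1, 1], w=[0.5, 0.5], c=2.0)  # exp(-1), about 0.368
```

With r = 1 and q = 1, similarity decays exponentially with the weighted city-block distance, so a single mismatching dimension with weight 0.5 and c = 2.0 yields exp(-1).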

Once the exemplars are activated, they start to race against each other for retrieval. The time required to complete the race for a given exemplar is an exponentially distributed random variable, with a rate parameter that is proportional to the exemplar's activation. The probability density that an exemplar j with activation a_ij completes the race at time t is

f(t) = a_ij × exp(-a_ij × t).  (3)

The probability that a particular exemplar wins the race and is retrieved thus depends on its activation value (and thereby on its strength and on its similarity to the test stimulus) and on the activation values of the other exemplars in the race.

As soon as an exemplar is retrieved, a new race is initiated, and the next exemplar will be retrieved. The retrieval of exemplars drives a random-walk process. This process implies that a random-walk pointer is maintained, which moves between two decision barriers (one for Category A, called +A, and one for Category B, called -B). When an exemplar is retrieved, the pointer moves a certain amount either in the direction of Category A or in the direction of Category B, depending on the category membership of the exemplar. The size of each step can be made to depend on the activation of the retrieved exemplar, but Nosofsky and Palmeri (1997b) assumed that the pointer moves with constant increments. Exemplars are retrieved from memory until the pointer exceeds the criterion value for one of the two categories, after which the corresponding response is initiated. Categorization RTs depend on the number of exemplars that need to be retrieved before the

random-walk pointer crosses a barrier and on the time needed for the retrieval of each exemplar. The time for each step in the random walk is given by

t_step = α + t_w,  (4)

where α is a constant term associated with each step, and t_w is the time taken to retrieve the winning exemplar.

Nosofsky and Palmeri (1997b) also derived analytic predictions from the EBRW. Given test item i, there is a constant probability (p_i) of the random-walk counter moving in the direction of Criterion +A, and a probability (q_i = 1 - p_i) of moving in the direction of Criterion -B. The expected number of steps for the random walk to reach +A or -B is given by

E(N|i) = B/(q_i - p_i) - [(A + B)/(q_i - p_i)] × [1 - (q_i/p_i)^B]/[1 - (q_i/p_i)^{A+B}],  if p_i ≠ q_i;

E(N|i) = AB,  if p_i = q_i.  (5)

It can further be shown that the expected duration of the entire random walk is given by

E(T|i) = E(N|i) × [α + 1/(S_iA + S_iB)],  (6)

in which S_iA and S_iB refer to the summed activations of all exemplars in Categories A and B, respectively. The probability of a Category A response given item i is

P(A|i) = [1 - (q_i/p_i)^B]/[1 - (q_i/p_i)^{A+B}],  if p_i ≠ q_i;

P(A|i) = B/(A + B),  if p_i = q_i,  (7)

with P(B|i) = 1 - P(A|i). RTs are assumed to be a linear function of random-walk duration:

RT(i) = t_res + k × E(T|i).  (8)

The residual time (t_res) contains the duration of perceptual processing and response execution.

The EBRW makes three main predictions about categorization RT (see Nosofsky & Palmeri, 1997b). The first is that category decisions are fastest for stimuli that are highly similar to members from one category and highly dissimilar to members from the other category. Such stimuli will consistently retrieve exemplars from one category, which leads to a consistent random walk in one direction. Second, the EBRW predicts that categorization RTs become faster with practice. The reason for this is that more exemplars are stored in memory with practice, which results in faster winning retrieval times. A third prediction is that individual stimulus familiarity affects categorization RT. Items are familiar if they have been presented frequently or if they are highly similar to numerous other old exemplars. A highly familiar item will result in fast retrieval times and, hence, in short categorization RTs.
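The retrieval race and random walk described above can be sketched as a small Monte Carlo simulation in Python. The exemplar coordinates, strengths, barrier settings, and parameter values are illustrative assumptions, not values used by Nosofsky and Palmeri; over many simulated trials, the proportion of A responses should converge on Equation 7 with p_i = S_iA/(S_iA + S_iB).

```python
import math
import random

def p_response_A(p, A, B):
    """Equation 7: analytic probability of a Category A response, given
    step probability p toward +A and barrier settings A and B."""
    q = 1 - p
    if abs(p - q) < 1e-12:
        return B / (A + B)
    return (1 - (q / p) ** B) / (1 - (q / p) ** (A + B))

def ebrw_trial(stimulus, exemplars, c=2.0, w=(0.5, 0.5), A=3, B=3,
               alpha=0.1, rng=random):
    """One EBRW trial: exemplar activations (Equation 1) set the rates of
    an exponential retrieval race (Equation 3); each retrieval moves the
    random-walk pointer one constant step toward +A or -B.
    exemplars: list of (coords, strength, category) tuples.
    Returns (choice, walk_time, n_steps)."""
    acts = []
    for coords, strength, cat in exemplars:
        d = sum(wp * abs(a - b) for wp, a, b in zip(w, stimulus, coords))
        acts.append((strength * math.exp(-c * d), cat))
    pointer, t, steps = 0, 0.0, 0
    while -B < pointer < A:
        # winner of the race: minimum of the exponential retrieval times
        t_w, cat = min((rng.expovariate(a), cat) for a, cat in acts)
        pointer += 1 if cat == 'A' else -1
        t += alpha + t_w                     # Equation 4
        steps += 1
    return ('A' if pointer >= A else 'B'), t, steps
```

Because each race is a minimum of independent exponentials, the probability that a given exemplar wins is its activation divided by the summed activations, which is what makes the analytic step probability p_i constant across steps.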

The EBRW is intended only as a model of the decision-making components of perceptual categorization; it does not address the time course of the perceptual processes that may precede or coincide with the decision processes. Nosofsky and Palmeri (1997b) noted explicitly that this is the reason why the EBRW applies primarily to stimuli that have integral dimensions, and they


tested the model only with such stimuli. The model implicitly assumes that perceptual processing of integral-dimension stimuli takes place in a single step, because such stimuli cannot or need not be analyzed into discrete components. Nosofsky and Palmeri indicated that the encoding of separable-dimension stimuli might involve serial or limited-capacity processing, which would complicate RT predictions. The EBRW was thus not intended to explain categorization of stimuli that consist of separable dimensions or otherwise require serial perceptual processing. Nevertheless, elsewhere the model has been applied to data obtained with separable-dimension stimuli (Lamberts, 1998; Lamberts & Freeman, 1999a) in an attempt to demonstrate the importance of sequential perceptual processes for understanding the time course of categorization of such stimuli. The EBRW's failure to explain some of the results obtained with separable-dimension stimuli amplifies the argument that an account of such processes is important for a general theory of perceptual categorization.

The Response Time-Distance Model

The RT-distance model (Ashby et al., 1994) is based on the decision-bound theory developed by Ashby and colleagues (e.g., Ashby & Townsend, 1986). A crucial assumption of decision-bound theory is that the perceptual effect of a stimulus can be represented as a point in a multidimensional space and that repeated stimulus presentations do not always result in the same perceptual effect (e.g., Ashby & Gott, 1988; Ashby & Lee, 1991; Ashby & Maddox, 1992, 1993). It is further assumed that practiced observers divide the perceptual stimulus space into regions, which correspond to categories. The response on a given trial is determined by the region in which the stimulus representation falls. The partition between two regions is called a "decision bound." The RT-distance model states that categorization RTs depend on the distance between the perceptual effect of a stimulus and the decision bound. Stimulus representations that are close to the bound yield slow responses, whereas RTs are fast for perceptual effects that are far from the decision bound.
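The RT-distance principle can be expressed in a few lines of Python. The model itself only requires that predicted RT decrease with distance from the bound; the linear bound, the exponential RT mapping, and all numeric values below are illustrative assumptions rather than part of the model.

```python
import math

def rt_distance(stim, normal, offset, rt_base=300.0, scale=400.0):
    """RT-distance sketch: predicted RT decreases with the distance
    between the percept `stim` and a linear decision bound defined by
    normal . x = offset. The exponential decrease is one convenient
    (assumed) choice of monotone mapping."""
    norm = math.sqrt(sum(n * n for n in normal))
    dist = abs(sum(n * x for n, x in zip(normal, stim)) - offset) / norm
    return rt_base + scale * math.exp(-dist)

# A percept lying exactly on the bound yields the slowest predicted RT:
slow = rt_distance((1.0, 1.0), normal=(1.0, -1.0), offset=0.0)  # 700.0
fast = rt_distance((3.0, 0.0), normal=(1.0, -1.0), offset=0.0)
```

Any decreasing function of distance would serve equally well here, which is precisely why the model is descriptive rather than mechanistic.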

The RT-distance model is formulated at a different level of explanation than the EBRW. The EBRW provides a mechanism that explains why certain stimuli are categorized faster than others. The RT-distance model, in contrast, offers a descriptive account of RT differences, without indicating in detail why these differences might arise. Nevertheless, the RT-distance model provides a strong alternative to the EBRW, and it has been very successful in predicting multidimensional categorization RTs (e.g., Ashby et al., 1994; Maddox & Ashby, 1996). In this article, I focus primarily on comparisons between my own model and the EBRW. The reason for this choice is that these two models share many assumptions about the categorization process (they both are exemplar models) and are formulated at the same level (they both provide detailed algorithms). This makes model comparisons relatively easy. Detailed comparisons between the EBRW and the RT-distance model can be found in Nosofsky and Palmeri (1997a, 1997b).

The Extended Generalized Context Model for Response Times

Elements of the EGCM-RT were first presented as part of an account of perceptual categorization under time pressure (when the model was called EGCM; see Lamberts, 1995, 1998; Lamberts & Brockdorff, 1997). The original EGCM was not intended as a model of categorization RT but aimed only to explain the effects of time pressure on categorization. The discussion in this article builds extensively on this previous research. I attempt to show that the EGCM-RT is a viable alternative to the EBRW and the RT-distance model.

Like the EBRW, the EGCM-RT is an exemplar model of categorization. It is assumed that exemplars are stored in memory during category learning and that subsequent category decisions are based on the similarity between the stimulus and the instances in memory. The EGCM-RT assumes that perceptual categorization involves the gradual construction of a stimulus representation through a process of information accumulation. In the model, it is further assumed that the probability that this information-accumulation process stops at a given time after stimulus presentation is a function of the evidence for category membership that has been collected at that time. RT differences between stimuli are explained in terms of the different amounts of information that need to be acquired before a confident decision can be made and in terms of the time course of the information-accumulation process itself.

According to the EGCM-RT, stimuli can be characterized as points in a multidimensional psychological space. Each stimulus dimension defines some aspect of the stimulus. Several different types of stimulus dimensions can be distinguished according to a number of criteria (see, e.g., Garner, 1974). For my purposes, two distinctions are most important. First, dimensions can be classified according to the number of values they can have, leading to a distinction between binary dimensions (e.g., size, represented only as "large" vs. "small") and multivalued dimensions (e.g., size, represented in centimeters). The second distinction refers to dimensions in relation to each other. Separable dimensions can be processed independently, whereas integral dimensions tend to be processed as unitary wholes (Garner, 1974).

A further assumption of the EGCM-RT is that the stimulus information for any given dimension consists of a number of discrete elements (e.g., Gati & Tversky, 1982). These elements are not differentiated. They can be considered as discrete units of information that are sampled or accumulated over time. The proportion of sampled elements is directly related to the discriminability of different values on a dimension. The more elements that are sampled for a given dimension, the more differentiated the psychological representations of different values on that dimension become.

The construction of a perceptual representation of a stimulus involves a process of stochastic, parallel sampling of information elements, without replacement. Element processing is assumed to be discrete: At any point in time after stimulus presentation, a given information element either has been processed ("included") or has not. It is further assumed that the hazard value of element processing is a constant for each dimension, from which it follows that the unconditional probability density that an element e from dimension x is included at time t is given by an exponential density function (see Lamberts, 1995, 1998):

f_e,x(t) = q_x exp(-q_x t),  (9)

in which q_x is the inclusion rate (or processing rate) of elements from dimension x. The probability that an element e from dimension x has been processed at or before time t after the start of perceptual processing is called the cumulative inclusion probability, i_e,x(t), and is given by an exponential distribution function:

i_e,x(t) = 1 - exp(-q_x t).  (10)

The expected mean processing time for elements from dimension x equals 1/q_x. Thus, elements from dimensions with a high value of q are processed faster, on average, than those from dimensions with a low q value.
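Equations 9 and 10 translate directly into code. The two inclusion rates below are invented for illustration (in units of elements per millisecond); only the exponential form is taken from the model.

```python
import math

def inclusion_probability(q_x, t):
    """Equation 10: probability that an element from dimension x has been
    processed at or before time t (rate q_x, mean processing time 1/q_x)."""
    return 1.0 - math.exp(-q_x * t)

# Assumed rates: a fast dimension (mean 50 ms) and a slow one (mean 200 ms).
fast_dim, slow_dim = 0.02, 0.005
p_fast = inclusion_probability(fast_dim, 100)  # 1 - exp(-2), about 0.865
p_slow = inclusion_probability(slow_dim, 100)  # 1 - exp(-0.5), about 0.393
```

At any fixed time after stimulus onset, more elements will tend to have been included from high-rate dimensions, which is what later makes those dimensions dominate the early similarity computations.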

In parallel with the element-sampling process, and whenever a new element has been processed, the similarity of the stimulus representation to the exemplars in memory is computed or updated. Similarity depends on the information elements that have been processed (the set of processed elements across all dimensions is denoted by Φ) and is defined as follows:

s_ij(Φ) = exp{-c[Σ_{p=1}^{P} u_p (φ_p |x_ip - x_jp|)^r]^{1/r}},  (11)

in which so(dP) is the similarity between stimulus i and stored exemplar j given set qb of included elements, c is a generalization value, p is an index for the dimensions (the total number of dimensions is P), up is the utility value of dimension p (0 --< u --< 1, Eu = 1), q~p is the proportion of processed elements from dimen- sion p, and x0, and xjp are the values of the stimulus and the stored exemplar on dimension p. For stimuli with separable dimensions, r is assumed to be 1, whereas r is 2 for stimuli with integral dimensions (see Nosofsky, 1986). The utility value of a dimension indicates how important that dimension is in the similarity com- putation. Dimensions that are highly diagnostic tend to have a high utility value (see Lamberts, 1995; Nosofsky, 1986). The EGCM- RT's similarity definition is a straightforward extension of the definition in Nosofsky's (1986) GCM. The main difference is that the EGCM-RT's similarity definition is dependent on the pro- cessed information elements and thereby is time-dependent as well. According to the EGCM-RT, similarity is inversely related to the overall distance between the stimulus representation and the exemplar representation in a multidimensional space, whereby the distance along any dimension is a function of the proportion of elements that have been processed for that dimension. This simi- larity definition clarifies how information elements affect the dis- criminability of different values along a dimension. As more elements are sampled, representations with different values on a given dimension will drift apart in psychological space. Therefore, the similarity of a stimulus representation to the exemplars stored in memory changes as more elements are processed over time. I chose to express similarity as a function of qb rather than time to emphasize that similarity changes in discrete steps, but an expres- sion in terms of processing time (similar to the one given in Lamberts, 1995) would be possible as well.

Whenever an element has been processed, a decision is made as to whether sufficient stimulus information has been acquired to stop sampling and initiate a response or whether more stimulus information is needed. Here, I present a stopping rule only for the case in which a decision has to be made between two alternative categories, A and B, but extension to multiple categories is not difficult. Computation of the stopping probabilities is based on

Luce's (1963) choice rule. The summed similarity of the stimulus to all exemplars from one category is divided by the summed similarity of the stimulus to all relevant exemplars in memory. If the total similarity to A exemplars is high relative to the total similarity to B exemplars, the probability that the participant will stop sampling and produce an A response is relatively high compared with the probability that the participant will stop sampling and produce a B response. The probability that sampling will stop (for stimulus i and set of processed elements Φ) and that a Category A response will be given is as follows:

P(Stop & A|i, Φ) = ({b_A Σ_{j∈C_A} [m_j s_ij(Φ) + γ]} / ({b_A Σ_{j∈C_A} [m_j s_ij(Φ) + γ]} + {(1 - b_A) Σ_{k∈C_B} [m_k s_ik(Φ) + γ]}))^θ

                 = [S_iA(Φ) / (S_iA(Φ) + S_iB(Φ))]^θ.  (12)

In this equation, j ∈ C_A refers to all exemplars j that belong to Category A, k ∈ C_B refers to the exemplars from Category B, b_A is the response bias for Category A (0 ≤ b_A ≤ 1), and γ represents noise (γ ≥ 0). The parameter m_j refers to the strength of exemplar j. Strength is assumed to be an increasing function of presentation frequency. The parameter θ can have any value greater than or equal to 1. The noise parameter, γ, determines the absolute level of the ratio for a given set of similarity values (Estes, 1994; Nosofsky, Kruschke, & McKinley, 1992). Unless explicitly noted otherwise, γ is assumed to be 0 in the applications of the model. S_iA(Φ) and S_iB(Φ) are convenient shorthand notations for the summed similarities (including exemplar weighting, noise, and bias) within the categories. The choice ratio can also be interpreted as a measure of confidence in the category membership of the stimulus. If the participant is very confident about category membership, there is a high probability that sampling will stop and a response will be initiated.

Analogously, the probability that sampling will stop and a Category B response will be given is

P(Stop & B|i, Φ) = [S_iB(Φ) / (S_iA(Φ) + S_iB(Φ))]^θ.  (13)

The total probability that sampling will stop for a given set of sampled elements is

P(Stop|i, Φ) = P(Stop & A|i, Φ) + P(Stop & B|i, Φ)

             = {[S_iA(Φ)]^θ + [S_iB(Φ)]^θ} / [S_iA(Φ) + S_iB(Φ)]^θ  (14)

if not all elements have been sampled. If all stimulus elements have been sampled, the stopping probability is 1, and a response will be initiated immediately. A further assumption is that P(Stop|i, ∅) = 0, which means that at least one element will be sampled before stopping. Figure 1 illustrates the stopping probabilities for different values of the ratio in Equation 12, for θ = 3.
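The stopping rule of Equations 12-14 reduces to a function of the relative summed similarity. In the sketch below, exemplar strengths are assumed to be already folded into the summed similarities, and the per-exemplar noise term γ is applied once to each category sum, which is a simplification of Equation 12.

```python
def stop_probabilities(sum_sim_a, sum_sim_b, theta=3.0, bias_a=0.5, gamma=0.0):
    """Equations 12-14: probability of stopping with an A response,
    stopping with a B response, and stopping at all, from the summed
    (strength-weighted) similarities to the two categories."""
    S_a = bias_a * (sum_sim_a + gamma)
    S_b = (1.0 - bias_a) * (sum_sim_b + gamma)
    x = S_a / (S_a + S_b)                # relative summed similarity
    p_stop_a = x ** theta                # Equation 12
    p_stop_b = (1.0 - x) ** theta        # Equation 13
    return p_stop_a, p_stop_b, p_stop_a + p_stop_b  # Equation 14

# At maximal uncertainty (equal similarities, theta = 3), the total
# stopping probability is 0.5**3 + 0.5**3 = 0.25, the minimum of the
# U-shaped curve in Figure 1:
pa, pb, p_stop = stop_probabilities(1.0, 1.0)
```

As the evidence becomes more one-sided, the total stopping probability rises toward 1, so sampling tends to continue exactly when the category decision is still uncertain.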


Figure 1. Stopping probabilities in the extended generalized context model for response times as a function of the relative summed similarity S_iA/(S_iA + S_iB) to two categories (A and B) for stimulus i (see Equation 12 in the text). The values plotted are for θ = 3.

The conditional probabilities that Category A or B will be chosen, given that sampling has stopped, are

P(A | Stop, i, Φ) = P(Stop & A | i, Φ) / P(Stop | i, Φ)

                 = [S_iA(Φ)]^θ / ([S_iA(Φ)]^θ + [S_iB(Φ)]^θ)    (15)

and

P(B | Stop, i, Φ) = P(Stop & B | i, Φ) / P(Stop | i, Φ)

                 = [S_iB(Φ)]^θ / ([S_iA(Φ)]^θ + [S_iB(Φ)]^θ)    (16)

These expressions for choice probabilities are different from those in the GCM. The choice rule in the GCM is similar to the rule presented here but with the restriction that θ equals 1 (and that γ is 0). As a result, the GCM's choice rule predicts less deterministic responding than the EGCM-RT's if θ is greater than 1 (all else being equal). It has been shown that the GCM underpredicts the level of deterministic responding in experienced individual observers (e.g., Maddox & Ashby, 1993; McKinley & Nosofsky, 1995). The GCM's standard choice rule works extremely well for predicting choice proportions averaged across individuals (e.g., Nosofsky, 1987) but needs modification to account for individual-participant data. The EGCM-RT's choice rule is virtually identical to the choice rule in an extended version of the GCM proposed by Maddox and Ashby (1993), which fitted individual-participant data far better than the standard GCM (see McKinley & Nosofsky, 1995). Moreover, the response probability predictions of the EBRW can also be expressed in a form similar to those of the EGCM-RT (Nosofsky & Palmeri, 1997b, p. 291). Therefore, the EGCM-RT's choice rule seems to be well suited for the prediction of individual-participant data.
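The effect of θ on response determinism in Equation 15 can be checked numerically; a small sketch (my own notation, with γ = 0):

```python
def choice_prob_a(s_a, s_b, theta=1.0):
    # P(A | Stop) from Equation 15; theta = 1 (with gamma = 0)
    # recovers the standard GCM choice rule.
    return s_a ** theta / (s_a ** theta + s_b ** theta)


# The same 70/30 similarity advantage yields probabilistic responding
# under the GCM rule but near-deterministic responding for large theta.
print(choice_prob_a(0.7, 0.3, theta=1.0))   # about .70 (GCM)
print(choice_prob_a(0.7, 0.3, theta=10.0))  # about .9998
```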

To summarize, the EGCM-RT assumes that (a) information elements from the stimulus dimensions are sampled stochastically in the earliest stages of categorization and (b) the relative summed similarity to exemplars from the alternative categories determines the probability that sampling is interrupted and a response is initiated. This is the only mechanism by which the model predicts RT differences between stimuli. These differences are thus assumed to be due only to the different durations of the element-sampling process for different stimuli. Unlike the EBRW, the EGCM-RT assumes that decision and response execution have the same average duration for all stimuli.

Modeling Conventions and Procedures

The principles outlined above define the EGCM-RT in its most general form. In modeling specific data sets, I adopt several conventions that aim to simplify the modeling effort as much as possible without sacrificing accuracy. Application of the EGCM-RT to predict RT data requires for each stimulus that (a) the probabilities of all possible courses of the element-sampling process are computed and (b) the expected time needed for each course is determined. The multiplication and summation of the course probabilities with the course times then yields an expected perceptual processing time, which can be used to compute a predicted RT.

The first modeling convention relates to the number of information elements that are assumed to be available for different dimensions. In this article, the EGCM-RT is applied to results obtained with two different stimulus types, if number of dimensions and possible dimension values are considered. Stimuli of the first kind, which are used in Experiments 1, 2, and 3, consist of four binary dimensions, the values of which are highly discriminable. Stimuli of the second type (which have been used in many experiments in the literature, some of which are modeled here) consist of two multivalued dimensions, with values that are often not easily discriminable. For the first type of stimulus (with binary dimensions), I assumed that there is only one element to be sampled per dimension. This implies that dimension sampling is all-or-none, an assumption that is often made in feature-based recognition models (e.g., Townsend, Hu, & Evans, 1984). The main reason for adopting this assumption is computational tractability. For stimuli that consist of a large number of dimensions, keeping track of all possible courses of element sampling is very complex if there are many elements per dimension, making parameter estimation difficult. The assumption that binary dimensions are sampled in an all-or-none fashion has been adopted in previous research with the EGCM (see Lamberts, 1998; Lamberts & Freeman, 1999a, 1999b) and has always worked well. I have applied versions of the EGCM-RT with multiple elements per dimension to stimuli with binary dimensions as well, but the model fits were very similar to those obtained with only one element per dimension. Therefore, the all-or-none sampling assumption was maintained throughout for binary dimensions.

For stimuli of the second kind, which consist of only two multivalued dimensions, model complexity is of less concern. Although an all-or-none version of the EGCM has been applied successfully to multivalued dimensions before (Lamberts & Brockdorff, 1997), initial exploration of the EGCM-RT showed that the all-or-none assumption is almost certainly wrong for many stimuli with multivalued dimensions. Therefore, I always assumed that there were five information elements to be sampled per multivalued or continuous dimension. The choice of five as the number of elements was fairly arbitrary, but it reflects a compromise between computational tractability and the need for accurate predictions.

Another modeling convention has to do with the difference between integral and separable dimensions. The perceptual processing of integral-dimension stimuli differs in many respects from that of separable-dimension stimuli (e.g., Garner, 1974; Lockhead, 1972). The dimensions of integral-dimension stimuli are not perceived as distinct units. Therefore, one of the characteristics of integral dimensions seems to be that information about one dimension cannot be acquired without also acquiring information about the other dimension or dimensions. In the modeling, this principle is implemented by assuming that elements from different integral dimensions are sampled in a yoked manner. Whenever an element from one integral dimension is sampled, an element from the other dimension is sampled at the same time. In contrast, the element-sampling process for separable dimensions is completely independent. Sampling an element from a separable dimension does not affect the probability of sampling an element from another dimension. In the following paragraphs, I first illustrate the computation of sampling course probabilities and stopping times for a stimulus that consists of three separable binary dimensions and then for a stimulus that has two separable multivalued dimensions.

Three Separable Binary Dimensions

Figure 2. Possible courses of element sampling for a three-dimensional stimulus with binary dimensions. The elements from the three dimensions are denoted by x, y, and z.

To illustrate the computation of sampling course probabilities, I present an example of a stimulus with three binary dimensions, x, y, and z. The same principles can be applied to compute the relevant probabilities for stimuli with more (or fewer) binary dimensions. Figure 2 shows a tree diagram that contains all possible courses of element sampling for the three-dimensional stimulus. It is assumed that only one information element is available for each dimension. The time course of element sampling is represented from the top to the bottom of the diagram. On the first step, x, y, or z can be included; the possibilities for the next step are shown in the second row of the tree and those for the third step in the third row. In total, there are 15 possible ways in which the element-sampling process can stop. Computation of the probabilities of these 15 events relies on the assumption that the inclusion times for each element are independent and exponentially distributed (see Equations 9 and 10). Which branch of the tree is taken at each step depends probabilistically on the inclusion rates of the different dimensions. Specifically, if there is a set of K elements available for inclusion at a given step, the probability that the element from dimension i is included first is

P(i) = q_i / Σ_{k=1}^{K} q_k    (17)

(this is a basic property of exponential race processes; see Townsend & Ashby, 1983). For example, the probability that x is the first of the three elements to be included on the first step of sampling equals q_x/(q_x + q_y + q_z). If processing then continues, only y and z are left for inclusion. The probability that y is included next (conditional on continuation after x) is q_y/(q_y + q_z), and so forth. The probability that processing stops at each node in the tree is given by Equation 14; of course, the probability that processing continues after each step is 1 minus the stopping probability. The unconditional probability of each way of stopping the feature-sampling process can be computed by simply tracing the correct path down the tree and multiplying the relevant probabilities along the way. As an illustration, consider the probability of the course in which element y is processed first, element x is processed second, and processing terminates immediately after that. The unconditional probability of this event is given by

P(y → x → Stop) = q_y / (q_x + q_y + q_z)
                × [1 − P(Stop | {y})]
                × q_x / (q_x + q_z)
                × P(Stop | {x, y}).
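A path probability of this kind multiplies race-winning probabilities (Equation 17) with continuation and stopping probabilities (Equation 14); a hedged sketch, with my own names and with the stopping rule supplied by the caller:

```python
def p_course_y_x_stop(qx, qy, qz, p_stop):
    """Unconditional probability of the sampling course y -> x -> Stop.

    qx, qy, qz : inclusion rates of the three dimension elements.
    p_stop     : function mapping a frozenset of sampled dimensions to
                 the stopping probability of Equation 14 for that set.
    """
    return (qy / (qx + qy + qz)              # y wins the first race
            * (1 - p_stop(frozenset({'y'})))        # sampling continues
            * qx / (qx + qz)                 # x wins the race among {x, z}
            * p_stop(frozenset({'x', 'y'})))        # sampling stops at {x, y}


# With equal rates and a constant stopping probability of .5,
# the course probability is (1/3)(.5)(1/2)(.5) = 1/24.
print(p_course_y_x_stop(1.0, 1.0, 1.0, lambda s: 0.5))
```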

Similarly, the unconditional probability that processing stops after elements z, y, and x have been processed in that order is given by

P(z → y → x → Stop) = q_z / (q_x + q_y + q_z)
                    × [1 − P(Stop | {z})]
                    × q_y / (q_x + q_y)
                    × [1 − P(Stop | {z, y})]

(on the usual assumption that processing stops as soon as all elements have been included).

Once the probabilities of all courses of sampling have been determined, the expected time to complete each sampling course has to be computed. Again, the assumption that the inclusion process for each separable dimension constitutes an independent Poisson process yields some simple results. The expected time for the first element to complete processing from a given set Φ is given by the mean of an exponential distribution with a rate equal to the sum of the rates of the constituent Poisson processes (see Townsend & Ashby, 1983, p. 42):

E(τ | Φ) = 1 / Σ_{j∈Φ} q_j,    (18)

in which τ is the time to inclusion of the first element of set Φ. Note that E(τ | Φ) is the same regardless of which element is the first to be included. Therefore, if there are three dimensions, the expected time to include the first element (regardless of dimension) is

E(τ | x, y, z) = 1 / (q_x + q_y + q_z).

Because of the memoryless property of Poisson processes, after inclusion of the first element, the inclusion process of the remaining elements will be in the same state as it was at the outset of processing. Therefore, the expected time from the start of processing to inclusion of the second element is the sum of the time taken to process the first element and the expected time for the second element. The expected time for the second element is again given by Equation 18, with Φ now being the set of remaining elements. The same rule gives the processing times of any further elements that may be processed. Thus, for instance, the total expected time to complete the course of sampling for the three-dimensional stimulus in which z, y, and x are included in this order is

E(time_{z,y,x}) = 1/(q_z + q_y + q_x) + 1/(q_y + q_x) + 1/q_x.

Finally, the expected perceptual processing time for the stimulus can be computed by multiplying the probability of each sampling course by the expected processing time for each course and summing these products. The total predicted RT for the stimulus is then equal to the sum of the perceptual processing time and a residual time (t_res). The residual time consists of the sum of the duration of the (brief) initial "dead time" immediately after stimulus presentation, during which no elements can be included (e.g., Busey & Loftus, 1994), and the duration of the response-execution stage.

The computation of predicted choice proportions proceeds entirely analogously to that of RTs. The expected choice for each sampling course is determined, and the multiplication and summation of the course probabilities with the choice values produces a predicted choice proportion for the stimulus.
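The whole procedure, enumerating every sampling course and weighting its expected duration by its probability, can be sketched for one-element-per-dimension stimuli as follows. This is my own illustrative implementation, not the author's code; the stopping rule is passed in as a function, so the sketch stays agnostic about the similarity computations:

```python
def expected_processing_time(q, p_stop):
    """Expected perceptual processing time for separable binary dimensions.

    q      : dict mapping dimension name -> inclusion rate q_i.
    p_stop : function mapping a frozenset of sampled dimensions to
             P(Stop | that set), as in Equation 14; stopping is forced
             (probability 1) once every element has been sampled.
    """
    def walk(sampled, remaining, prob, elapsed):
        if not remaining:              # all elements sampled: stop for sure
            return prob * elapsed
        total_rate = sum(q[d] for d in remaining)
        step = 1.0 / total_rate        # Equation 18: expected time to the
        expected = 0.0                 # next inclusion, by memorylessness
        for d in remaining:
            p_next = q[d] / total_rate             # Equation 17: race winner
            now = sampled | {d}
            p_s = p_stop(frozenset(now))
            branch = prob * p_next
            expected += branch * p_s * (elapsed + step)           # stop here
            expected += walk(now, remaining - {d},
                             branch * (1.0 - p_s), elapsed + step)  # continue
        return expected

    return walk(frozenset(), set(q), 1.0, 0.0)


# If sampling never stops early, three equal-rate elements take
# 1/3 + 1/2 + 1 time units on average, whatever the sampling order.
print(expected_processing_time({'x': 1.0, 'y': 1.0, 'z': 1.0},
                               lambda s: 0.0))
```

The same enumeration yields predicted choice proportions if, at every stopping node, the expected choice (Equations 15 and 16) is accumulated instead of the elapsed time.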

Two Separable Multivalued Dimensions

Figure 3 shows an overview of the sampling courses for a stimulus with two dimensions, with two elements per dimension (normally, I assume that such stimuli have five elements per dimension, but the two-element case suffices to illustrate the computational techniques involved). Computation of the sampling course probabilities proceeds in the same way as for the previous example. If there are K dimensions, the probability that an element from a given dimension i is processed next is given by

P(i) = n_i q_i / Σ_{k=1}^{K} n_k q_k,

in which n_i is the number of elements available for sampling from dimension i. The stopping probabilities and processing times are computed in the same way as they are for stimuli with binary dimensions, with the obvious modifications to accommodate the higher number of elements per dimension.
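A minimal sketch of this generalization of Equation 17 (function and argument names are mine):

```python
def next_element_probs(n, q):
    # Probability that the next sampled element comes from each dimension
    # when n[d] elements remain available on dimension d (the unnumbered
    # generalization of Equation 17 for multivalued dimensions).
    total = sum(n[d] * q[d] for d in n)
    return {d: n[d] * q[d] / total for d in n}


# Two remaining elements on a slow dimension can match a single element
# on a dimension that is sampled twice as fast.
print(next_element_probs({'x': 2, 'y': 1}, {'x': 1.0, 'y': 2.0}))
```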


Figure 3. Possible courses of element sampling for a stimulus with two separable dimensions. It is assumed that each dimension consists of two information elements. x refers to an element from the first dimension, and y refers to an element from the second dimension.

In the following sections, I explore the EGCM-RT's predictions and show how the model can account for several recently published data sets. Before doing so, I provide some independent evidence supporting the EGCM-RT's assumptions about the role of information accumulation in perceptual categorization.

Categorization Under Time Pressure

The idea that the earliest stages of perceptual categorization involve a process of feature sampling was originally proposed as an account of the effects of time pressure in categorization tasks (Lamberts, 1995, 1998; Lamberts & Brockdorff, 1997; Lamberts & Freeman, 1999a, 1999b). In these experiments, the purpose was to manipulate the duration of the perceptual processing stage in object categorization. First, this was attempted by manipulating the total time available for category responses. Lamberts (1995) reported three experiments in which the participants performed binary classifications under time pressure. The stimuli in those experiments were schematic drawings of faces, which varied on four binary dimensions. The experiments consisted of two stages. First, the participants were trained until they could correctly classify all stimuli in the training set. Next, they classified a set of transfer stimuli. Processing time in the transfer stage was restricted by response deadlines of variable duration. There was also a control condition without a deadline. On a typical trial with a response deadline, the stimulus would appear, and the participant had between 600 and 1,600 ms (depending on the condition) to produce a response. The results showed strong deadline effects. If little time was available, responses generally became less consistent and tended toward chance. However, the effects of the response deadlines differed strongly between stimuli. The classification of certain stimuli was hardly affected by the deadline manipulation, whereas other stimuli yielded very different response patterns at different deadlines. The preferred category assignment of some transfer stimuli even reversed when a short deadline was imposed (the importance of this is made clear below).
The EGCM (which shares the feature-sampling assumptions of the EGCM-RT but does not use the stopping rule of the EGCM-RT) provided a very accurate account of the deadline effects on choice proportions in the three experiments. The model was applied on the assumption that the deadline restricted perceptual processing time but left response-execution time unaffected. The analyses also showed that the processing rates of stimulus features were influenced by their physical salience.

Although the results from the deadline experiments are compatible with the EGCM-RT, their interpretation is not without problems. First, it is possible that the deadline manipulation induced systematic strategy differences. If the participants knew that they would have to respond very quickly, they might have processed the stimuli differently than they would have in a situation in which RT was unlimited. The second problem concerns the locus of the deadline effects. Although the data are in agreement with the predictions from the EGCM-RT's stochastic feature-sampling process, there is still a possibility that the deadlines affected the duration of decision making rather than perceptual processing.

The first problem was addressed in a series of experiments reported by Lamberts (1998), who used a different procedure for limiting processing time. Instead of predictable deadlines, unpredictable response signals were used to induce fast responses. This manipulation ruled out systematic strategy differences between conditions. The results from these experiments were very similar to those obtained with predictable deadlines, and the EGCM again provided an excellent account of the choice proportions in the different signal conditions.

To investigate the possibility that the deadlines or response signals interfered with decision making or response execution, rather than with perceptual processing, Lamberts and Freeman (1999b) manipulated the exposure duration of stimuli in a perceptual categorization task, without using deadlines or response signals. The pattern of results was very similar to that obtained with deadlines and response signals, suggesting that all of these manipulations affected the same component processes. It is not entirely clear how the EBRW would account for the results from the experiments with short exposure times. Clearly, the decision process would not be interrupted by termination of the stimulus display after 33 or 66 ms, so some sort of perceptual process seems essential in explaining the effects of exposure time on categorization. Of course, this does not imply that the same perceptual process should explain RT differences between stimuli in all situations (including those with unlimited exposure), but the fact that the EGCM-RT offers a coherent explanation across different experimental tasks seems an important argument in its favor.

Perhaps the most direct support for the EGCM's feature-sampling assumptions was obtained in the two experiments reported by Lamberts and Freeman (1999a). The rationale of the experiments was as follows: According to the EGCM, object representations are built gradually from stochastically sampled dimensions. If this process is interrupted (because available processing time is limited by a response signal or a deadline), the category decision will be based on an incomplete object representation. Therefore, it should be possible to predict how people classify objects under time pressure from the way in which they classify partial objects. After initial category training, the experiments contained two distinct test stages. In one test stage, the participants categorized whole objects under various levels of time pressure, as in Lamberts's (1998) study. In the other test stage, they were asked to categorize partial objects (which were obtained by removing one or more components from the whole objects) without time pressure. The results from the partial-object categorization task were then used to predict categorization under time pressure by using the EGCM's feature-sampling mechanism. Thus, a highly selective and critical test of the feature-sampling process was obtained. The EGCM provided an excellent account of the data, with only inclusion rates and residual time as estimated parameters.

Applications of the Extended Generalized Context Model for Response Times to Response Time Data

Simple Prototype Effects

It is a common finding that category prototypes are classified faster and more reliably than other category members (e.g., Rosch, 1973). To verify whether the EGCM-RT can account for this general result, a simple simulation was carried out. Assume that the participants in an experiment have learned the category structure shown in Table 1. Categories A and B consist of four stimuli each, and each stimulus has four binary and separable dimensions. The multimodal prototype of Category A is 1111, and the prototype of Category B is 0000 (the prototypes were not part of the training set). The EGCM-RT was applied to this stimulus set. It was assumed that all stimulus dimensions had the same utility value (0.25) and inclusion rate (arbitrarily fixed at 0.002). The discriminability parameter (c) was set to a value of 10.0, and the

Table 1
Category Structure in an Imaginary Classification Experiment

Stimulus   Category   Dim. 1   Dim. 2   Dim. 3   Dim. 4
1          A          1        1        1        0
2          A          1        1        0        1
3          A          1        0        1        1
4          A          0        1        1        1
5          B          0        0        0        1
6          B          0        0        1        0
7          B          0        1        0        0
8          B          1        0        0        0

Note. Dim. = dimension.

Table 2
Choice Proportions (P) and Response Times (RTs) Predicted by the Extended Generalized Context Model for RTs for Training Stimulus and Prototype as a Function of θ

        Training stimulus        Prototype
θ       RT       P(A)            RT      P(A)
1       625      .61             625     .71
3       848      .85             763     .98
5       1,008    .94             838     .99
10      1,218    .99             958     1.00

Note. It is assumed that the training stimulus is a member of Category A and that the prototype is from Category A as well. RTs are expressed in milliseconds.

residual time was assumed to be 500 ms. Table 2 shows the choice proportions and RTs predicted by the EGCM-RT for a training stimulus from Category A and the unseen Category A prototype, for different values of 0. The values show a clear prototype superiority effect. The unseen prototype is always classified at least as fast as the training stimulus, and usually faster. At the same time, the prototype is also classified more accurately than the training stimulus. The reason for the RT difference is that, on average, fewer features will be sampled from the prototype than from the training stimuli, and this will produce faster responses. Of course, the predicted prototype effect depends on the parameter settings within the model, but the type of effect shown in Table 2 occurs across a wide range of plausible parameter values.
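The summed similarities that drive these predictions can be sketched roughly as follows, assuming the exponential similarity-distance function of the GCM family with a city-block metric over the sampled dimensions (the full similarity equations appear earlier in the article; utility weights and response biases are omitted here, and all names are my own):

```python
import math

def summed_similarity(probe, exemplars, sampled, c):
    # Summed similarity of a probe to a category's exemplars, computed over
    # the sampled dimensions only, with s = exp(-c * d) on the city-block
    # distance d; a simplified sketch of the model's similarity backbone.
    total = 0.0
    for ex in exemplars:
        d = sum(abs(probe[i] - ex[i]) for i in sampled)
        total += math.exp(-c * d)
    return total


# Category A and B exemplars from Table 1; the unseen prototype 1111 is one
# mismatching feature away from every Category A training stimulus.
cat_a = [(1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1)]
cat_b = [(0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0)]
proto_a = (1, 1, 1, 1)
sampled = range(4)            # all four dimensions sampled
s_a = summed_similarity(proto_a, cat_a, sampled, c=10.0)
s_b = summed_similarity(proto_a, cat_b, sampled, c=10.0)
print(s_a > s_b)              # the prototype strongly favors Category A
```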

The data in Table 2 also indicate how the EGCM-RT predicts a positive correlation between speed and accuracy across stimuli (which means that stimuli that yield fast RTs tend to yield accurate responses). The model shares this general principle with the RT-distance model and the EBRW. However, the strength of the correlation depends on a variety of other factors, such as category structure, stimulus familiarity, utility values, and inclusion rates.

Reaction Times in Probabilistic Categorization

Ashby et al. (1994; see also Ashby & Maddox, 1994) carried out three categorization experiments with two-dimensional stimuli that varied continuously along each dimension. The stimuli were two line segments of varying length that were joined at one corner, rectangles that varied in width and height, or semicircles of varying size with a radial line of varying orientation. Two categories were used, and each category was defined by a bivariate normal distribution. On each trial of the experiment, a stimulus was sampled at random from one of the two categories. The participants then assigned the stimulus to one of the categories and received immediate corrective feedback. The participants were instructed to classify the stimuli as quickly as possible. After an initial practice stage, the participants received 300 experimental trials. Ashby et al. manipulated the shape of the bivariate normal distributions and the amount of overlap between the categories. The ellipses and the circles in Figure 4 represent equal-likelihood contours of the bivariate normal category distributions, as they occurred in the three conditions of Ashby et al.'s Experiment 1. The optimal decision bounds are shown as diagonal lines. The RT-distance hypothesis assumes that participants categorize a



Figure 4. Contours of equal likelihood from the low-, medium-, and high-overlap conditions tested by Ashby et al. (1994, Experiment 1). f(A) and f(B) refer to the probability density of Category A and Category B, respectively. The diagonal lines are optimal decision bounds. From "Categorization Response Time With Multidimensional Stimuli," by F. G. Ashby, G. Boynton, and W. W. Lee, 1994, Perception & Psychophysics, 55, p. 15. Copyright 1994 by the Psychonomic Society. Adapted with permission.

stimulus by determining its location in the stimulus space relative to the decision bound. For the categories shown in Figure 4, the participants would give a Category A response if the stimulus representation fell in the region above and to the left of the decision bound. The optimal decision bound was the same in each overlap condition. Because the category distributions overlapped, it was impossible for the participants to achieve perfect performance, even if they responded according to the optimal decision bound.

The most important result from Ashby et al.'s (1994) experiments was a moderately negative correlation between categorization RT and the (euclidean) distance of a stimulus from the decision bound that separated the categories, as predicted by the RT-distance hypothesis. Stimuli that were located far from the decision bound were classified consistently faster than stimuli that were located close to the decision bound. Moreover, there was no meaningful correlation between RT and stimulus familiarity, which was defined as the summed similarity of a stimulus to all stimuli in the experiment, regardless of their category membership.

To verify whether the EGCM-RT is consistent with Ashby et al.'s (1994) results, I applied the model to the three overlap conditions. Because the stimulus dimensions were continuous, five information elements were assumed to be available for each dimension. The dimensions were assumed to be separable in the simulations. The dimensions of the rectangular stimuli might have been integral (see Ashby et al., 1994), but the other stimuli probably had separable dimensions. The separable-integral distinction is not very important here anyway because separate simulations (which I do not report) assuming integral dimensions yielded very similar results to those that assumed separable dimensions. Distances in the stimulus space were assumed to correspond to a city-block metric.

For each condition (low, medium, and high overlap), a random sample of 150 stimuli from each category was produced. It was assumed that these stimuli constituted the training set. The expected RT for each stimulus was computed by applying the EGCM-RT, with the following parameter values: both utility values equal to 0.5, both inclusion rates equal to 0.01, c = 0.7, θ = 10, and t_res = 0. Figure 5 shows the relation between euclidean distance from the optimal decision bound and perceptual processing time predicted by the EGCM-RT for the 300 stimuli in each condition. There was a clear negative correlation between the two measures in each condition, and the correlations were in the same range as those reported by Ashby et al. (1994). Note that the absolute values of the correlations have no special meaning here because the strength of the correlations depends on the parameter settings of the model. Across a wide range of plausible parameter values, the correlations were negative in all three conditions. Moreover, in this simulation, I ignored the duration and variability of the decision process. Normally, it would be assumed that the duration of this process is a normally distributed random variable with a positive mean, and the absolute strength of the correlation between distance and RT depends also on the standard deviation of this distribution. Finally, it is likely that certain parameter values (e.g., c) differed between the conditions, perhaps as a result of differences in task difficulty, individual differences, and so forth (see Nosofsky & Palmeri, 1997b). Regardless, the main point of the simulation is that the EGCM-RT generally predicts a negative relation between RT and distance from the decision bound.

I also computed correlations between predicted RTs and familiarity. These correlations were usually weak, and (again depending on model parameters) they could be positive or negative, as were those reported by Ashby et al. (1994). Normally, the EGCM-RT does predict an effect of familiarity on RT, everything else being equal (this is further illustrated in the next section). However, as Nosofsky and Palmeri (1997b) pointed out, there was a negative correlation between distance from the decision bound and familiarity in Ashby et al.'s stimulus sets, which explains why there was no simple familiarity effect in those experiments.

The conclusion from this simulation is that the EGCM-RT predicts Ashby et al.'s (1994) results entirely by virtue of its information-accumulation process. For stimuli that are far from the decision bound, relatively little dimensional information is needed to achieve a high level of confidence in one of the categories, and hence RTs will be short. For stimuli close to the boundary, more information is needed, and therefore RTs will be longer.

Nosofsky and Palmeri (1997b) demonstrated that the EBRW also accounts for Ashby et al.'s (1994) central findings. The EBRW predicts shorter RTs for stimuli that are far from the decision bound, because such stimuli tend to be highly similar only to exemplars from their own category. As a result, the random walk marches consistently toward a response barrier and thereby produces short RTs. For stimuli close to the boundary, exemplars from both categories tend to be retrieved, leading to inconsistent walks and slow RTs. Despite the differences between the two models, it is clear that the EBRW and the EGCM-RT can make


[Figure 5 scatter panels (plot data not reproduced): predicted processing time against distance from the decision bound; low overlap, r = -.49; medium overlap, r = -.50; high overlap, r = -.42. Figure 6 panel (plot data not reproduced): the 12 stimuli plotted in brightness-saturation coordinates.]
Figure 6. Category structure in Nosofsky and Palmeri's (1997b) Experiment 1. The stimuli represented by triangles belonged to Category A, and the others belonged to Category B.

very similar predictions for the probabilistic categorization task used by Ashby et al.

Individual Object Categorization Response Times

Nosofsky and Palmeri (1997b) used a set of 12 computer-generated colors as stimuli in their first experiment. The stimuli were of a constant hue and varied only in saturation and brightness (which are considered to be integral dimensions). The colors were divided into two categories of six stimuli each (see Figure 6). Three participants took part in the experiment. On each trial, a stimulus was presented on the computer screen, and the participant was instructed to assign it to one of the categories as rapidly as possible without making errors. Immediate feedback was given after every response. In addition to the speeded categorization task, each participant also rated the similarity of all possible pairs of stimuli. These ratings were used to derive multidimensional scaling (MDS) solutions for the colors. For each participant, the MDS solution provided two coordinate values for each stimulus (one for perceived brightness and one for perceived saturation), such that the distances between the stimuli could be used to predict their rated similarity. As in Nosofsky and Palmeri's research, I used these inferred psychological stimulus coordinates in the modeling instead of the Munsell coordinates. The observed mean RTs across Training Sessions 2-5 for the 3 participants are shown in Table 3. The RTs differed considerably between stimuli and between participants.

Figure 5. Processing times predicted by the extended generalized context model for response times as a function of distance from the decision bound in the low-, medium-, and high-overlap conditions of Ashby et al. (1994).


238 LAMBERTS

Table 3
Predicted and Observed Mean Response Times (in Milliseconds) in Nosofsky and Palmeri's (1997b) Experiment 1

                   Participant 1          Participant 2          Participant 3
Stimulus        Predicted  Observed    Predicted  Observed    Predicted  Observed
 1                 910        815        1,001       982         776        780
 2                 820        795          576       601         729        709
 3               1,192      1,159          966     1,007         940        962
 4                 826        725          560       546         686        661
 5                 911        967          630       665         868        841
 6               1,012      1,068          669       650         747        749
 7                 724        706          550       529         673        641
 8                 915        931          593       619         841        834
 9               1,314      1,346          786       734         828        834
10                 706        744          531       530         677        697
11                 801        911          587       577         714        779
12               1,234      1,208          848       857       1,014      1,007

R² (EGCM-RT)          .910                     .973                  .942
R² (EBRW)ᵃ            .837                     .982                  .913
R² (RT-distance)      .815                     .964                  .952

Note. EGCM-RT = extended generalized context model for response times; EBRW = exemplar-based random-walk model; RT-distance = response time-distance model.
ᵃ The R² values for the EBRW are based on optimization for mean response times only, without taking accuracy data into account and with integer-valued boundaries (see Nosofsky & Palmeri, 1997b, p. 280).

Although the main interest was in modeling RTs, the EGCM-RT was applied jointly to RTs and choice proportions. This would provide some guarantee that accurate RT predictions were not obtained at the expense of choice predictions. Unfortunately, there is no ideal method for combining goodness of fit for RTs and proportions into a single measure. Unless explicitly noted otherwise, in all model applications in this article, best fitting parameter values were estimated by using the summed R² values for RTs and response proportions as a measure of overall goodness of fit. Response proportions were always coded as proportions of Category A responses (rather than as proportions correct), such that their variability would be maximal and it would be relatively easy for a model to produce a high R² value for the response proportions. This technique effectively reduces the impact of the predictions of response proportions on the overall goodness of fit. Because I was mainly concerned with modeling RTs, this method seemed appropriate.
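The summed-R² fit measure can be sketched in a few lines; the R² computation is the standard 1 − SSE/SST, and as a check, applying it to the Participant 1 values from Table 3 recovers the reported .910.

```python
def r_squared(observed, predicted):
    """Proportion of variance accounted for: 1 - SSE/SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def overall_fit(rt_obs, rt_pred, prop_obs, prop_pred):
    """Overall goodness of fit: summed R-squared for RTs and proportions."""
    return r_squared(rt_obs, rt_pred) + r_squared(prop_obs, prop_pred)

# Check against Table 3, Participant 1 (observed and predicted mean RTs, ms):
obs = [815, 795, 1159, 725, 967, 1068, 706, 931, 1346, 744, 911, 1208]
pred = [910, 820, 1192, 826, 911, 1012, 724, 915, 1314, 706, 801, 1234]
print(round(r_squared(obs, pred), 3))  # 0.91, matching the reported R² = .910
```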

For each participant in the experiment, the EGCM-RT was applied with the following assumptions. First, because the dimensions were continuous, it was assumed that five elements were available for sampling from each dimension. Second, because the dimensions were integral, it was assumed that elements from both dimensions were sampled in a yoked manner and that distances in the psychological stimulus space were Euclidean. Six parameter values were estimated: a processing rate (q) for the information elements (because elements from both dimensions were assumed to be sampled in a yoked manner, only one rate needed to be estimated), a utility value for saturation (the value for brightness was determined by this value because utility values sum to 1), a discriminability parameter (c), a response bias value (b), the θ parameter, and the residual time value (tres). The EGCM-RT's RT predictions for each participant are shown in Table 3. As the R² values for each participant indicate, the EGCM-RT provided a good fit for the individual RTs. The parameter values estimated for each participant are shown in Table 4. The EGCM-RT also provided accurate accounts of the response proportions, with R² values of .996, .997, and .998 for Participants 1, 2, and 3, respectively.
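These assumptions translate into a standard exemplar-similarity computation. The sketch below (with illustrative coordinates and a hypothetical c, not the fitted values) contrasts the Euclidean metric used for integral dimensions with the city-block metric usually adopted for separable dimensions.

```python
import math

def similarity(x, y, c, metric="euclidean"):
    """Similarity declines exponentially with psychological distance:
    s = exp(-c * d). Integral dimensions (such as saturation and brightness)
    use the Euclidean metric; separable dimensions use the city-block metric."""
    if metric == "euclidean":
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    else:  # city-block
        d = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-c * d)

# Illustrative MDS-style coordinates (hypothetical values):
s_euclid = similarity((3.0, 4.0), (0.0, 0.0), c=1.0)  # d = 5, so s = exp(-5)
s_city = similarity((3.0, 4.0), (0.0, 0.0), c=1.0, metric="cityblock")  # d = 7
```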

Nosofsky and Palmeri (1997b) applied the EBRW (also with six free parameters) to the data from the same experiment. They first used the model to predict mean RTs for each stimulus in each training block, in a successful attempt to model learning curves. However, they also reported model fits to the mean individual stimulus RTs, without the constraints of also fitting the speedup curves and the accuracy data (see Nosofsky & Palmeri, 1997b, p.

Table 4
Best Fitting Parameter Values for the Extended Generalized Context Model for Response Times, as Applied to Nosofsky and Palmeri's (1997b) Experiments 1 and 2

                          Experiment 1
Parameter            P1         P2         P3      Experiment 2
q                  0.0013     0.0016     0.0021       0.0015
u (saturation)       .708       .479       .744         .475
c                  13.118     10.151      5.655        4.943ᵃ
                                                       4.218ᵇ
b                    .438       .514       .516         .571
θ                  34.527     22.819      3.819        3.003
tres              401.037    311.129    446.024      409.399
δ                                                      1.634

Note. See the text for explanations of θ and δ and for details regarding why δ applied only in Experiment 2. P = participant; q = inclusion rate; u = utility value; c = generalization value; b = bias; tres = residual time.
ᵃ This value is for Condition U7. ᵇ This value is for Condition U8.


280). Because these values (which are shown in Table 3) are probably the highest that can be achieved by the unmodified EBRW, they provide a good standard for comparison with the EGCM-RT. The EGCM-RT performed better than the EBRW for Participants 1 and 3 and slightly worse than the EBRW for Participant 2. Nosofsky and Palmeri also fitted the RT-distance model, which performed similarly to the EBRW and the EGCM-RT (see Table 3; note that Nosofsky and Palmeri [1997b] reported Pearson correlations only between observed and predicted values for the EBRW and the RT-distance model, but because the models have an estimated intercept, R² simply equals the squared Pearson correlation).

The conclusion from these model applications is that the EGCM-RT provides an alternative explanation for the mean RT and accuracy data from Nosofsky and Palmeri's (1997b) Experiment 1. Instead of explaining RT differences between stimuli in terms of a random-walk process driven by retrieved exemplars, the EGCM-RT's account is based on the principle that stimuli differ in the amount of stimulus information that needs to be accumulated before a confident decision can be made.

Individual Exemplar Familiarity and Categorization Response Time

In Experiment 2, Nosofsky and Palmeri (1997b) manipulated stimulus frequency in an attempt to discriminate between the EBRW and the RT-distance account. They used the category structure shown in Figure 7. Again, the stimuli were colors that varied in saturation and brightness. First, the participants learned to classify Stimuli 1, 2, and 3 into Category A and the other five stimuli into Category B. The presentation frequency of Stimuli 7


Figure 7. Category structure in Nosofsky and Palmeri's (1997b) Experiment 2. The stimuli represented by squares belonged to Category A, and the others belonged to Category B. The diagonal line represents a linear decision bound.

Table 5
Mean Response Times (in Milliseconds) and Proportions Correct in Nosofsky and Palmeri's (1997b) Experiment 2 and Predictions From the Extended Generalized Context Model for Response Times

              Response time           Proportion correct
Stimulus    Observed  Predicted     Observed  Predicted

Condition U7
1              750        755          .964       .959
2              794        772          .927       .948
3              648        652          .992       .988
4              859        874          .891       .902
5              740        781          .988       .946
6              846        807          .932       .934
7              703        711          .944       .974
8              648        647          .996       .989

Condition U8
1              795        801          .948       .941
2              834        819          .948       .932
3              677        672          .980       .983
4              897        921          .895       .892
5              819        825          .972       .932
6              896        865          .883       .909
7              672        667          .984       .984
8              752        757          .923       .960

and 8 during training was manipulated between participants. In Condition U7, Stimulus 7 was never presented, whereas Stimulus 8 was not presented in Condition U8. In the transfer task, the participants carried out speeded classifications of the eight stimuli. Their classification RTs were recorded. The participants also provided similarity ratings of all stimulus pairs, and the MDS solution of these ratings was used in all modeling.

The EBRW predicts that classification RT is influenced by individual stimulus familiarity. Unfamiliar stimuli should result in slow retrieval times for stored exemplars and should thereby lead to longer random walks. If a stimulus is highly familiar (because it has been encountered many times before), the stored exemplar that corresponds to that stimulus will have high strength and therefore will produce very short retrieval times. The EBRW thus predicts that, all other things being equal, familiar stimuli are classified faster than unfamiliar stimuli (Nosofsky & Palmeri, 1997b), and the model therefore predicts an effect of presentation frequency for Stimuli 7 and 8. In Condition U7, Stimulus 7 should be classified more slowly than Stimulus 8, whereas the reverse pattern should occur in Condition U8. The RT-distance model, in contrast, does not predict an effect of presentation frequency on RT, because changes in relative frequency would not change the relative distances of Stimuli 7 and 8 from the decision boundary.
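The retrieval-race mechanism can be made concrete with a toy simulation (a sketch of the EBRW's core idea only, not Nosofsky and Palmeri's implementation; the similarity and strength values are hypothetical):

```python
import random

def ebrw_steps(sims_a, sims_b, strengths_a, strengths_b, barrier=3, rng=None):
    """One toy EBRW trial. On each step, every stored exemplar races with an
    exponential retrieval time whose rate is strength x similarity; the
    winner's category moves the walk one unit toward that category's barrier."""
    rng = rng or random.Random(0)
    position = steps = 0
    while abs(position) < barrier:
        t_a = min(rng.expovariate(w * s) for w, s in zip(strengths_a, sims_a))
        t_b = min(rng.expovariate(w * s) for w, s in zip(strengths_b, sims_b))
        position += 1 if t_a < t_b else -1
        steps += 1
    return steps

# A stimulus similar mainly to Category A exemplars drives a consistent walk
# (few steps); an ambiguous stimulus produces an inconsistent, longer walk.
clear = sum(ebrw_steps([.9, .9], [.1, .1], [1, 1], [1, 1],
                       rng=random.Random(s)) for s in range(500)) / 500
ambiguous = sum(ebrw_steps([.5, .5], [.5, .5], [1, 1], [1, 1],
                           rng=random.Random(s)) for s in range(500)) / 500
```

Raising an exemplar's strength shortens its expected retrieval time in this race, which is how presentation frequency speeds classification in the EBRW.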

The observed mean RTs are shown in Table 5. Contrary to the prediction from the RT-distance hypothesis, there was a significant interaction between presentation condition and RT for Stimuli 7 and 8. As predicted by the EBRW, Stimulus 7 was classified more rapidly than Stimulus 8 in Condition U8, and the reverse pattern held in Condition U7. Nosofsky and Palmeri (1997b) fitted the EBRW to the data presented in Table 5. Of most interest are the results from their attempt to jointly model RTs and accuracy data


Figure 8. Average response times (RTs) predicted by the extended generalized context model for RTs across 15 training blocks. The diamonds represent predictions for γ = 0.05, and the squares represent predictions for γ = 0.10. The solid and dashed lines connect the values of the best matching power functions.

(see Nosofsky & Palmeri, 1997b, Tables C2 and C3). To achieve good fits, Nosofsky and Palmeri assumed that the discriminability parameter (c) differed between Conditions U7 and U8. Moreover, they also found that model fit for the accuracy data improved considerably if Stimulus 3 was assumed to have a higher associated discriminability value than the other stimuli. In their modeling with the EBRW, this was implemented by adding an additional free parameter (δ), such that the similarity between Stimulus 3 and any exemplar in memory was computed as

s3j = exp(-δ × c × d3j),

in which d3j is the distance in the psychological space between Stimulus 3 and exemplar j. The best fitting value of δ was 1.300. With these additional assumptions (which are all very reasonable in the context of this experiment), the EBRW (with a total of nine free parameters) accounted for 92.6% of the variance in the RTs and 27.8% of the variance in the proportions of correct responses.
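In the same notation, the δ-scaled similarity is straightforward to compute; the distance and c values below are illustrative only.

```python
import math

def scaled_similarity(d, c, delta=1.0):
    """Similarity with a stimulus-specific discriminability multiplier:
    s = exp(-delta * c * d). With delta > 1, similarity falls off more
    steeply with distance, making the stimulus easier to discriminate."""
    return math.exp(-delta * c * d)

# Illustrative values: with delta = 1.3 (the EBRW's best fitting value),
# Stimulus 3 is less similar to any exemplar at a given distance.
default = scaled_similarity(1.0, 2.0)               # exp(-2)
stimulus3 = scaled_similarity(1.0, 2.0, delta=1.3)  # exp(-2.6)
```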

I applied the EGCM-RT to the data, with similar additional assumptions to those made by Nosofsky and Palmeri (1997b). Thus, separate c values were estimated for the two main experimental conditions, and the c value was multiplied by an additional parameter in the similarity computations for Stimulus 3. Otherwise, the same parameter values applied to both experimental conditions. The EGCM-RT used the psychological stimulus coordinates produced by MDS of similarity ratings of the stimuli, as shown in Nosofsky and Palmeri's Figure 9. Again, the integral-dimension version of the EGCM-RT was applied, with five information elements for each dimension. Instead of modeling proportions of Category A responses, I followed the convention adopted by Nosofsky and Palmeri and coded all choice data as proportions correct. This method greatly reduced the variability in the choice data, but it improved the comparability of the goodness-of-fit figures produced by the EGCM-RT and the EBRW (although it should be noted that Nosofsky and Palmeri [1997b] used different criteria for goodness of fit, so that model comparisons should be made with some caution).

The EGCM-RT's predictions are shown in Table 5, and the best fitting parameter values are listed in Table 4. With eight estimated parameters (one less than the EBRW), the EGCM-RT performed well, accounting for 94.5% of the variance in RTs and 64.6% of the variance in the proportions correct. The model thus fitted both the RT data and the accuracy data better than the EBRW. The EGCM-RT correctly predicted that responses to Stimuli 7 and 8 were faster than responses to most other stimuli. Most important, the EGCM-RT also predicted the interaction between presentation condition and RT for Stimuli 7 and 8. Presentation frequency of individual exemplars has an effect in the EGCM-RT because the strength of exemplars depends on frequency. If a transfer stimulus is highly similar or identical to a strong training exemplar, confidence in one category will be high even if only relatively little stimulus information is available, and hence RTs will be short. The precise effects of variation in stimulus frequency are a complex function of the stimulus characteristics and the category structure.


Figure 9. Example stimuli from Experiment 1. The top stimulus has a value of 0 for all dimensions, and the bottom stimulus has a value of 1 for all dimensions.

Effects of Practice on Categorization Response Time: The Extended Generalized Context Model for Response Times and the Power Law

An almost universal finding in experiments with speeded responses is that the time to respond to items decreases with practice, according to the power law of practice (Logan, 1988, 1992; Newell & Rosenbloom, 1981). In its most general form, the power law states that

RT = A + BN^(-C),   (19)

in which A is the asymptotic RT, B is the difference between initial and final RT, N is the amount of practice, and C is the learning-rate parameter.
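Equation 19 is easy to evaluate numerically; the parameter values below are illustrative, not fitted to any data set.

```python
def power_law_rt(n, a=400.0, b=600.0, c=0.8):
    """Power law of practice (Equation 19): RT = A + B * N**(-C), with
    A the asymptotic RT, B the initial-to-final difference, N the amount
    of practice, and C the learning-rate parameter."""
    return a + b * n ** (-c)

# RT falls steeply at first and then flattens toward the asymptote A.
rts = [power_law_rt(n) for n in range(1, 16)]
print(rts[0])  # 1000.0: A + B at N = 1
```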

Nosofsky and Palmeri (1997b) demonstrated that the power law accurately accounts for the speedup in categorization RT across trials. In their Experiment 1 (discussed in the preceding section), they recorded mean RTs across 150 blocks of practice. The speedups of performance were well described by power-law functions. Nosofsky and Palmeri also showed that the EBRW provided a

good account of the decrease in RTs, making predictions that were virtually indistinguishable from the power-law predictions. The EBRW predicts a speedup with training because more exemplars get stored in memory (or, which is equivalent, existing exemplars get strengthened), which implies that the winning retrieval times will become faster. The faster exemplar retrieval times result in faster steps in the random-walk process and, thus, in faster decisions. Palmeri (1997) further demonstrated that the EBRW can explain speedup with practice in a numerosity judgment task and that the model can also account for effects of within- and between-category similarity on speedup.

In this article, I do not attempt to provide detailed model fits of the EGCM-RT to Nosofsky and Palmeri's (1997b) speedup data. Instead, I merely demonstrate that the EGCM-RT predicts speedup functions that are virtually identical to power functions. Assume that participants receive 15 blocks of training with the category structure shown in Figure 6 (this is the category structure that was used by Nosofsky & Palmeri, 1997b, Experiment 1). Furthermore, assume that the psychological stimulus space corresponds exactly to the physical stimulus space (this assumption is unrealistic, but that does not matter for the purpose of this demonstration). The EGCM-RT was applied with the following parameters: both utility values = 0.5, c = 10, q = 0.001, θ = 10, tres = 500, and γ = 0.01 or 0.02. Exemplar strength was assumed to be equal to presentation frequency. Figure 8 shows the RTs predicted by the EGCM-RT across the 15 blocks of training, averaged across the stimuli shown in Figure 6, and separately for γ = 0.01 and γ = 0.02. The solid and dashed lines in Figure 8 connect the values of the best matching power functions (see Equation 19). It is clear that the EGCM-RT's predictions are indistinguishable from the power-function predictions. In the EGCM-RT, the parameters that independently determine the shape and intercept of the speedup function are γ and tres. The speedup occurs as a result of the increasing importance of exemplar-similarity information relative to the decision noise represented by γ. As exemplars gain strength

Table 6
Category Structure in Experiment 1

                                    Dimension
Assignment and stimulus    Base   Upright   Shade   Top

Category A
 1                           1       1        1      0
 2                           1       0        1      0
 3                           1       0        1      1
 4                           1       1        0      1
 5                           0       1        1      1

Category B
 6                           1       1        0      0
 7                           0       1        1      0
 8                           0       0        0      1
 9                           0       0        0      0

Transfer
10                           0       0        1      0
11                           0       0        1      1
12                           0       1        0      0
13                           0       1        0      1
14                           1       0        0      0
15                           1       0        0      1
16                           1       1        1      1
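The training structure in Table 6 can be encoded directly, and the standard generalized-context-model choice rule (summed exemplar similarity with a city-block metric; a sketch with a hypothetical c and without bias or utility weighting, not the full EGCM-RT) then yields predicted Category A probabilities:

```python
import math

# Training stimuli from Table 6 (dimensions: base, upright, shade, top).
CATEGORY_A = [(1, 1, 1, 0), (1, 0, 1, 0), (1, 0, 1, 1), (1, 1, 0, 1), (0, 1, 1, 1)]
CATEGORY_B = [(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 0, 1), (0, 0, 0, 0)]

def summed_similarity(stim, exemplars, c=1.0):
    """Sum of exp(-c * d) over exemplars, with city-block distance d
    (appropriate for separable dimensions)."""
    return sum(math.exp(-c * sum(abs(a - b) for a, b in zip(stim, ex)))
               for ex in exemplars)

def p_category_a(stim, c=1.0):
    """Relative summed similarity gives the Category A choice probability."""
    sa = summed_similarity(stim, CATEGORY_A, c)
    sb = summed_similarity(stim, CATEGORY_B, c)
    return sa / (sa + sb)

# Transfer Stimulus 16 (1, 1, 1, 1) is close to several Category A exemplars,
# so the rule assigns it a high Category A probability.
print(p_category_a((1, 1, 1, 1)) > 0.5)  # True
```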


Table 7
Observed and Predicted Response Times (in Milliseconds) and Choice Proportions in Experiment 1

Participants 1-5

Stim.  Participant 1            Participant 2            Participant 3            Participant 4            Participant 5
       Obs.   EGCM   EBRW      Obs.   EGCM   EBRW       Obs.   EGCM   EBRW       Obs.   EGCM   EBRW       Obs.  EGCM  EBRW
 1      897    981    969     1,656  2,121  1,834      1,054  1,014    916      1,361  1,225  1,036       656   660   621
        .77    .83    .74      1.00    .99    .98        .93   1.00    .99        .90    .94   1.00        .80   .91   .85
 2    1,112    981    969     1,662  1,870  1,581        743    826    848        907    902  1,013       492   516   532
        .90    .83    .74      1.00   1.00   1.00        .93   1.00   1.00        .77    .95   1.00        .97   .94   .98
 3      838    725    757     1,750  1,544  1,573        723    741    844        840    896  1,013       514   500   508
        .87    .99   1.00      1.00   1.00   1.00        .97   1.00   1.00        .90    .96   1.00       1.00   .97   .99
 4      898    952    924     1,903  2,394  2,360      1,257  1,164  1,187      1,486  1,600  1,642       685   654   674
        .87    .93    .86       .89    .63    .82        .80    .89    .75        .53    .46    .52        .30   .40   .39
 5      748    725    760     2,856  2,937  2,304      1,119  1,109  1,138      1,771  1,660  1,633       553   579   581
        .90    .99   1.00       .96    .98    .84        .90    .99    .84        .60    .67    .59        .90   .95   .93
 6      653    726    688       970  1,333  1,231      1,056  1,048  1,015      1,697  1,558  1,561       587   567   576
        .10    .13    .00       .03    .00    .00        .13    .00    .07        .27    .12    .24        .07   .04   .08
 7      880    981  1,011     2,929  2,859  2,938      1,076  1,203  1,144      1,533  1,682  1,605       659   663   639
        .70    .54    .55       .08    .02    .01        .07    .06    .20        .20    .30    .32        .57   .63   .80
 8      996    952    970     1,851  1,326  1,191        764    773    739        923    874    932       512   514   505
        .67    .67    .74       .00    .00    .00        .00    .00    .00        .03    .02    .00        .07   .07   .01
 9      684    726    687       949    998  1,190        773    693    736        878    866    932       507   497   475
        .03    .12    .00       .00    .00    .00        .03    .00    .00        .00    .01    .00        .00   .03   .00
10    1,072    981  1,011     3,024  2,608  3,197      1,149  1,059  1,144      1,144  1,094  1,195       529   519   546
        .37    .54    .55      1.00    .82    .33        .77    .78    .85        .77    .63    .99        .93   .92   .97
11      849    725    760     3,003  2,686  2,852        941    972    983      1,028  1,092  1,165       513   503   516
        .97    .99   1.00       .08   1.00    .57        .97    .99    .98        .80    .74   1.00       1.00   .97   .99
12      670    726    687       977  1,058  1,194        908    903    837      1,093  1,122  1,085       533   565   566
        .07    .12    .00       .00    .00    .00        .00    .00    .00        .03    .04    .00        .07   .04   .07
13      932    952    970     2,114  1,519  2,114        963  1,005  1,014      1,057  1,160  1,128       641   650   666
        .83    .67    .74       .03    .00    .00        .07    .07    .06        .10    .06    .00        .20   .25   .33
14      657    726    688     1,025  1,273  1,206        788    885    837      1,068  1,051  1,028       494   499   483
        .03    .13    .00       .03    .00    .00        .03    .00    .00        .03    .15    .00        .17   .03   .01
15      944    952    924     2,018  2,200  2,009      1,020    979  1,012      1,112  1,065  1,049       512   518   521
        .87    .93    .86       .10    .00    .00        .03    .02    .06        .25    .25    .00        .07   .09   .02
16      701    725    757     1,830  1,794  1,744        956    922    899      1,159  1,197  1,036       594   576   569
        .97    .99   1.00      1.00   1.00    .99       1.00   1.00   1.00        .97    .96   1.00        .97   .97   .94

Note. For each stimulus, the top row of values are response times, and the bottom row of values are choice proportions. Stim. = stimulus; Obs. = observed;

through repetition, less stimulus information needs to be sampled to achieve a particular level of confidence in the category decision, which explains why responses become faster over time. Note that this explanation of speedup is different from the EBRW's explanation. Although the EGCM-RT and the EBRW have in common that speedups are ultimately attributed to strengthening of exemplars, the EBRW predicts speedups for two reasons: faster retrieval of exemplars and greater exemplar strength relative to background-noise elements (see Nosofsky & Alfonso-Reese, 1999). The EGCM-RT makes no assumptions about changes in exemplar retrieval times but assumes that practice allows people to sample less stimulus information and yet maintain a sufficiently high level of accuracy.

Nosofsky and Palmeri (1997a; 1997b, Experiment 3) also applied the EBRW to data from Garner's (1974) speeded classification tasks and systematically compared the EBRW and the decision-bound model in these tasks. I do not review these experiments in this article because the issues involved are complex and require a more extensive discussion than I can provide here. The experiments that I have reviewed so far are probably sufficient to

support the main point I wanted to make in the introduction, namely, that the EGCM-RT can provide an alternative to the EBRW. In the remainder of this article, I present three speeded categorization experiments that aim to provide direct tests of the EGCM-RT as a model of RTs in speeded categorization tasks. In the previous sections, I have shown that the EGCM-RT can explain categorization RTs for integral-dimension stimuli. However, to my knowledge, there have been no systematic studies of categorization RTs with stimuli that consist of more than two separable dimensions. In the experiments that follow, the stimuli were composed of four clearly distinguishable components. Because the components were spatially nonoverlapping and very distinctive, it seemed reasonable to assume that the dimensions corresponding to these components were perceptually separable.

The focus of the experiments and the modeling is primarily on the evaluation of the EGCM-RT, rather than on a systematic comparison between the EGCM-RT and the EBRW. As I indicated in the introduction, Nosofsky and Palmeri (1997b) emphasized that they did not intend for the EBRW to apply to stimuli with separable dimensions because they accepted that processing of such


Table 7 (continued)

Participants 6-10

Stim.  Participant 6            Participant 7            Participant 8            Participant 9            Participant 10
       Obs.   EGCM   EBRW      Obs.   EGCM   EBRW       Obs.   EGCM   EBRW       Obs.   EGCM   EBRW       Obs.   EGCM   EBRW
 1    1,215  1,224  1,236     1,426  1,427  1,391      2,384  2,273  2,116      1,005  1,046  1,059      1,619  1,613  1,518
       1.00   1.00    .88      1.00    .98    .97        .87    .98    .85       1.00   1.00    .94       1.00    .98    .91
 2    1,128  1,148  1,072     1,417  1,427  1,380      1,385  1,515  1,408      1,022  1,046    973      1,004  1,056    940
       1.00   1.00   1.00      1.00    .98    .97        .90    .98   1.00       1.00   1.00    .97       1.00    .99   1.00
 3      841  1,003  1,065       963    864    990      1,302  1,235  1,360        854    847    903        997    888    940
        .97   1.00   1.00      1.00   1.00   1.00        .90    .99   1.00       1.00   1.00    .99       1.00   1.00   1.00
 4    1,157  1,236  1,178     1,669  1,625  1,498      2,171  2,289  2,244      1,749  1,529  1,384      1,393  1,423  1,346
        .93    .99    .95       .93    .95    .96        .96    .96    .78        .70   1.00    .67        .97    .99    .96
 5    1,285  1,167  1,165       911    962  1,061      1,944  1,960  2,022        945  1,002  1,105      1,304  1,248  1,316
       1.00   1.00    .96      1.00   1.00   1.00        .96    .99    .89       1.00   1.00    .92        .97   1.00    .96
 6    1,025  1,125  1,081       814    765    735      2,104  2,060  2,008      1,569  1,542  1,512      1,419  1,554  1,478
        .00    .00    .02       .03    .00    .00        .07    .03    .12        .21    .16    .33        .00    .04    .06
 7    1,266  1,240  1,239     1,380  1,476  1,533      2,346  2,311  2,429      1,109  1,202  1,299      1,812  1,612  1,860
        .00    .00    .16       .03    .00    .07        .04    .11    .48        .03    .31    .76        .00    .11    .29
 8    1,147  1,081    986     1,301  1,319  1,273      1,515  1,456  1,279      1,217  1,534  1,524      1,125  1,126  1,063
        .00    .00    .00       .00    .00    .02        .00    .02    .00        .04    .09    .21        .00    .08    .00
 9    1,023    904    980       725    757    654      1,237  1,177  1,229      1,812  1,548  1,524      1,157    977  1,063
        .00    .00    .00       .00    .00    .00        .07    .01    .00        .04    .02    .11        .07    .03    .00
10    1,328  1,165  1,160     1,586  1,476  1,564      1,587  1,566  1,625      1,056  1,202  1,210        900  1,056    970
        .17    .20    .97       .00    .00    .07        .97    .90    .98        .10    .31    .85       1.00    .90   1.00
11    1,047  1,038  1,089       964    954  1,066      1,392  1,289  1,456        988  1,002  1,040        924    896    970
        .97   1.00   1.00      1.00   1.00   1.00       1.00    .96   1.00       1.00   1.00    .95       1.00    .95   1.00
12    1,124  1,089  1,081       785    758    655      2,027  2,045  1,927      1,788  1,548  1,534      1,616  1,552  1,525
        .00    .00    .02       .00    .00    .00        .00    .03    .09        .52    .02    .20        .00    .03    .06
13    1,194  1,222  1,264     1,325  1,325  1,327      2,361  2,278  2,406      1,349  1,534  1,507      1,398  1,428  1,388
        .87    .84    .87       .00    .00    .03        .03    .14    .66        .10    .09    .40        .10    .35    .96
14      932    940    999       675    764    737      1,145  1,230  1,322      1,494  1,542  1,532        995    977  1,097
        .00    .00    .00       .00    .00    .00        .03    .03    .00        .08    .16    .24        .00    .10    .00
15      947  1,095  1,071     1,583  1,618  1,529      1,428  1,505  1,502      1,511  1,529  1,481      1,073  1,118  1,097
        .00    .00    .01       .97    .94    .95        .07    .07    .01        .21   1.00    .49        .07    .25    .00
16    1,149  1,132  1,141       862    872    989      1,788  1,918  1,804        990    847    968      1,047  1,242  1,216
       1.00   1.00    .98      1.00   1.00   1.00       1.00    .99    .95       1.00   1.00    .98       1.00   1.00    .98

EBRW = predictions from the exemplar-based random-walk model; EGCM = predictions from the extended generalized context model for response times.

stimuli might involve serial perceptual processing, which is not accounted for in the EBRW. The evidence from the aforementioned experiments on categorization under time pressure confirms that sampling of stimulus information may be an important element of perceptual categorization of separable-dimension stimuli. What remains to be demonstrated is that the EGCM-RT can account for various aspects of RTs and RT distributions in tasks without deadlines or response signals, and this was the main goal of the three experiments reported here.

Experiment 1

This experiment consisted of two stages. In the first stage, the participants learned to assign 9 stimuli to two categories (5 stimuli belonged to Category A and 4 stimuli to Category B). In the second stage, they categorized 16 transfer stimuli and were told to respond as quickly and accurately as possible. The stimuli were realistically rendered images of table lamps, which consisted of four binary dimensions (see Figure 9). RTs and choice proportions in the transfer task were recorded.

Method

Participants. Eleven undergraduate and graduate students from the University of Birmingham participated in this experiment. Most of them had experience with RT experiments, but none of them had participated in a category-learning study before.

Apparatus and stimuli. A Gateway P-90 Pentium computer was used for stimulus presentation, response registration, and timing. The stimuli were presented on a 37-cm Gateway Vivitron 15 monitor, with a resolution of 800 × 600 pixels. Responses were registered by means of two buttons (one under each index finger) mounted 10 cm apart.

The stimuli were fairly realistic, rendered images of table lamps (see Figure 9). The stimuli were about 5 cm high and 2.5 cm wide, and they were viewed from a distance of 1.5 m. The lamps varied on four binary dimensions: base (smooth or stacked), upright (thick or thin), shade (conical or hemispherical), and top (cylindrical or rounded).

Design and procedure. The experiment consisted of a training stage and a transfer stage. The structure of the training stimuli is shown in Table 6. This structure was first used by Medin and Schaffer (1978). I chose it for this experiment because it has been used successfully in numerous categorization studies (including Lamberts, 1995, 1998). The training stimuli


were presented sequentially in blocks of nine. Within each block, all training stimuli occurred in random order. On each training trial, a lamp appeared in the center of the computer screen and remained present until the participant assigned it to a category (A or B) by pressing one of the two response buttons. After each response, auditory correct-incorrect feedback was given. The assignment of the response buttons to the categories was counterbalanced across participants. Training continued until the participants made no errors on three successive blocks of trials.

The transfer stage followed the training stage after a break of about 5 min. In the transfer stage, all 16 stimuli that could be constructed from the four binary features were presented (see Table 6). Each participant carried out a total of 480 transfer trials (30 blocks of trials). Within each block of 16 trials, all transfer stimuli occurred in random order. The participants were instructed to categorize the stimuli as quickly and accurately as possible, using the information they had acquired in the training stage. No correct-incorrect feedback was given in the transfer stage. Each trial started with the presentation of a small fixation cross for 500 ms. The fixation cross was immediately replaced by the transfer stimulus. The stimulus remained present on the computer screen until a response was given. The participants were allowed to take a short break after every 160 trials.

Table 8
Goodness of Fit (R²) for the EGCM-RT and the EBRW, as Applied to Response Times (RTs) and Choice Proportions (Choice) From Experiment 1

                  EGCM-RT              EBRW
Participant     RT    Choice        RT    Choice
 1             .72     .93         .79     .93
 2             .79     .73         .84     .79
 3             .84     .99         .81     .99
 4             .91     .94         .79     .88
 5             .93     .98         .87     .96
 6             .54     .99         .43     .81
 7             .97     .99         .92     .99
 8             .96     .98         .92     .80
 9             .77     .61         .69     .49
10             .85     .96         .92     .77

Note. EGCM-RT = extended generalized context model for response times; EBRW = exemplar-based random-walk model.

Results and Discussion

All participants learned the category structure within a reasonable number of trials. The average number of 9-trial blocks needed to reach criterion performance in the training stage was 37.9 (SD = 21.9).

Before the analyses of the transfer data, trials with an RT of more than 4,000 ms were discarded (1.6% of all trials). Because I was primarily interested in modeling RTs and accuracy data for individual participants, a further screening of the data was carried out to ensure that only participants who achieved a reasonable level of performance were included in the analyses. The criterion for acceptable performance in the transfer stage was at least 75% correct responses across the 9 stimuli that were presented in training. One participant failed to meet this criterion and was therefore excluded from the analyses. The mean RTs and choice data from the remaining 10 participants are shown in Table 7. Single-participant analyses of variance (ANOVAs) of RTs with stimulus as the independent variable yielded reliable effects for all participants (all ps < .001).

The separable-dimensions version of the EGCM-RT was applied to the individual-participant data in Table 7. As I justified in the introduction, I assumed that each stimulus dimension consisted of one information element. The EGCM-RT had 11 free parameters: an inclusion rate (q) for each stimulus dimension, three utility values (the fourth was constrained by the other three), a discriminability value (c), a residual time value (t_res), a bias value (b), and θ. The noise parameter (γ) was set to 0. Best fitting parameter values were computed separately for each participant. Parameter estimation was based on a maximum R² criterion. The best fitting parameter values were those that maximized the sum of the R² values for RTs and response proportions. Table 7 presents an overview of the EGCM-RT's predictions.
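The fit criterion just described, maximizing the summed R² for RTs and response proportions, can be sketched as follows. This is an illustration of the criterion only, not the original estimation code, and the function names are my own:

```python
# Sketch of the summed-R^2 fit criterion described in the text.
# Function names are illustrative, not from the original model code.

def r_squared(observed, predicted):
    """Proportion of variance in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_resid = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total

def fit_objective(obs_rts, pred_rts, obs_choice, pred_choice):
    """Objective maximized during parameter estimation: R^2 for the RTs
    plus R^2 for the choice proportions, computed across stimuli."""
    return r_squared(obs_rts, pred_rts) + r_squared(obs_choice, pred_choice)
```

A search over the model's parameter space would then retain the parameter set for which `fit_objective` is largest.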

The goodness-of-fit data for RTs and choice proportions are shown in Table 8, and the best fitting parameter values are listed in Table 9. Figure 10 summarizes the data for the entire group of participants, showing the relation between predicted RTs averaged across participants and average observed RTs. The goodness-of-fit values in Table 8 show considerable differences between participants.

The EGCM-RT explained more than 80% of the variance in the RTs for Participants 3, 4, 5, 7, 8, and 10 and also provided good accounts of the choice proportions for these participants. Given that the number of collected RTs for each stimulus and each participant was relatively small (≤30), the results for these participants are good and provide support for the model. Of particular interest are the model's (apparent) failures, and I therefore discuss the results from the other participants in some detail. For Participant 1, the EGCM-RT provided a reasonable account of RTs and choice proportions, and there seemed to be no systematic deviations between predictions and observations. Although the EGCM-RT predicted the RTs for Participant 2 rather well (R² = .79), the model's choice predictions were not very good (R² = .73) as compared with the predictions for the other participants. Inspection of the data in Table 7 shows that this finding is primarily due to a complete failure of the model to predict the low proportion of Category A responses for Stimulus 11, which was a new transfer stimulus. The EGCM-RT predicted that Stimulus 11 should be classified consistently into Category A, whereas Participant 2 produced only 8% Category A responses for this stimulus. It is not clear why this participant preferred Category B, given that all other participants produced high proportions of Category A responses for Stimulus 11. This idiosyncratic response pattern cannot be explained by the model. Because it is a unique occurrence, it probably does not indicate a fundamental shortcoming of the model. For Participant 6, the EGCM-RT provided an excellent account of choice proportions (R² = .99) but only a poor account of RTs (R² = .54). It is difficult to pinpoint a single cause of this failure.
The EGCM-RT's predictions differed from the observed RTs by more than 100 ms on Stimuli 3, 5, 9, 10, and 15, but there were no truly excessive differences between observations and predictions. For Participant 6, the only conclusion can be that the EGCM-RT generally failed to account for the main trends in the RTs. Finally, for Participant 9, the EGCM-RT performed reasonably well on RTs (R² = .77) but poorly on choice proportions (R² = .61). The model failed to predict the choice proportions for Stimuli 12 (.02 predicted and .52 observed) and 15 (1.00 predicted and .21 observed). Again, there is no obvious reason for this selective failure of the model.



Table 9
Best Fitting Model Parameters for the Extended Generalized Context Model for Response Times, as Applied to Data From Experiment 1

                                            Participant
Parameter        1        2        3        4        5        6        7        8        9       10
q (base)      0.0011   0.0010   0.0027   0.0011   0.0016   0.0015   0.0018   0.0005   0.0011   0.0008
q (upright)   0.0018   0.0005   0.0029   0.0019   0.0094   0.0041   0.0012   0.6907   0.1053   0.0273
q (shade)     0.2682   8.0369   0.0317   0.0042   1.6468   0.9845   4.0114   0.0039   0.1489   0.0043
q (top)      10.501    0.2029   0.0019   0.0007   0.5734   0.0598   9.779    0.0017   0.0012   0.6325
θ             2.844    9.192    9.193    3.885    3.056   18.221   12.942    6.467   11.211    6.472
t_res       723.1    786.0    488.5    582.3    475.9    498.8    726.3    500.3    289.3    502.5
u (base)       .285     .200     .182     .311     .222     .243     .203     .193     .287     .246
u (upright)    .000     .258     .200     .235     .289     .291     .024     .196     .000     .249
u (shade)      .359     .472     .423     .448     .440     .309     .571     .453     .231     .409
u (top)        .356     .070     .195     .006     .049     .157     .202     .158     .482     .096
c            51.524   17.351   10.827   86.822   15.959    9.869   17.101    7.890    7.149    9.447
b              .583     .305     .428     .413     .457     .441     .322     .471     .697     .531

Note. See the text for an explanation of θ. q = inclusion rate; t_res = residual time; u = utility value; c = generalization value; b = bias.

An important indicator of problems with a model is systematic deviations between observed and predicted values that occur consistently across participants. If the discrepancies between observed and predicted values are due only to noise in the data (i.e., if the model is essentially correct), then one can expect residuals to be uncorrelated across participants. To verify whether the EGCM-RT systematically predicted RTs that were too long or too short for certain stimuli, I carried out t tests on the residuals for each stimulus. Only the test for Stimulus 14 was significant, t(9) = 2.45, p < .05. As shown in Figure 10, the EGCM-RT tended to overestimate the RT for this stimulus. In fact, the EGCM-RT predicted slower responses to Stimulus 14 than was observed for 8 out of 10 participants. It is not clear why this effect emerged, but it does indicate that the model failed to capture some


Figure 10. Average predicted and observed response times (RTs; in milliseconds) for the stimuli in Experiment 1. The predicted values were obtained by averaging the predictions of the extended generalized context model for RTs (EGCM-RT) across participants.

aspect of how the participants processed this stimulus. However, the nonsignificant t tests for the other stimuli are somewhat reassuring, because they do not provide any evidence for further systematic deviations between predictions and data.
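The per-stimulus residual check described above amounts to a one-sample t test across participants. A minimal sketch, with illustrative names of my own (not the original analysis code):

```python
# Illustrative sketch of the per-stimulus residual check: a one-sample
# t test of the observed-minus-predicted RTs across participants,
# against a mean of zero (df = n - 1).
import math
import statistics

def residual_t(observed_rts, predicted_rts):
    """t statistic for the residuals of one stimulus across participants."""
    residuals = [o - p for o, p in zip(observed_rts, predicted_rts)]
    n = len(residuals)
    mean = statistics.mean(residuals)
    se = statistics.stdev(residuals) / math.sqrt(n)  # sample SD, df = n - 1
    return mean / se
```

With 10 participants, the resulting statistic would be compared against a t distribution with 9 degrees of freedom, as in the test reported for Stimulus 14.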

The EGCM-RT's best fitting parameter values are shown in Table 9. The estimated inclusion rates show that the shade dimension was processed fastest, on average, by 6 participants. Another general result is that the base dimension had a relatively low processing rate for all participants. Otherwise, processing rates differed between participants, which precluded a simple interpretation of processing rates in terms of dimensional salience, as suggested by Lamberts (1995). Although the rate of information accumulation is almost certainly influenced by physical salience (Bundesen, 1990; Eriksen & Schultz, 1977; Hoffman & Singh, 1997; Lamberts, 1995, 1998), other factors may be important as well. For instance, Bundesen's theory assumes that processing rates also depend on attention, in that attended elements are processed faster than nonattended elements. The ability to modify processing rates as a function of task context has great adaptive value. If an observer knows that a particular dimension is highly relevant in a given context, the optimal strategy would be to process that dimension as quickly as possible, regardless of its physical salience. The results from Lamberts's (1995, 1998) studies indicate that such top-down influences cannot always overrule the constraints imposed by physical salience, but it is likely that attention can alter the processing rates of features within certain limits. A further discussion of the possible determinants of processing rates is provided in Lamberts (1998) and in Lamberts and Freeman (1999a).

The utility values showed a different pattern from the inclusion rates. With the exception of Participant 9, all participants weighted the shade dimension most heavily. The base dimension received intermediary weights for all participants, whereas the weights of the upright and top dimensions varied. It has been argued that participants tend to use utility values (or dimension weights) that optimize performance for a given category structure (e.g., Kruschke, 1992; Nosofsky, 1984). For the category structure in Experiment 1, the optimal rank order of the dimensions by utility



Table 10
Best Fitting Model Parameters for the Exemplar-Based Random-Walk Model, as Applied to Data From Experiment 1

                                            Participant
Parameter        1         2        3        4        5        6        7        8        9       10
w (base)       .008      .247     .141     .145     .009     .078     .090     .052     .049     .041
w (upright)    .000      .410     .142     .158     .081     .206     .002     .185     .035     .528
w (shade)      .124      .275     .668     .691     .865     .594     .205     .693     .864     .370
w (top)        .868      .067     .049     .006     .044     .122     .702     .070     .052     .061
c            10.790     4.925   12.743   19.484   12.176   15.772   10.445   12.635   13.720   24.716
A             3.915     3.819    2.339    3.927    2.070    1.596    2.783    1.840    1.724    2.072
-B           18.159 1,239.398    3.716    6.429    2.500    1.960    5.761    2.038     .908    1.832
t_res       660.896 1,189.282  551.812  803.814  281.802  603.468  301.625     .000     .000     .000
k              .196      .012    1.967     .252    3.527    3.289    4.458   12.000   64.618   11.298
a            21.345    21.171   38.681  128.434   21.255   70.586   24.546   53.452   13.237   44.577

Note. See the text for explanations of k and a. w = weight; c = generalization value; A and -B = response barriers; t_res = residual time.

value was base or shade (which optimally had equal values), then top, and finally upright (see Lamberts, 1995; Nosofsky, 1984). For most participants, the estimated values corresponded fairly well to the optimal values, although the base dimension was usually weighted less than optimal.

The estimated values for θ ranged from 2.844 for Participant 1 to 18.221 for Participant 6. These values are relatively high, which indicates that the participants ceased sampling stimulus features only if they were confident about category membership. The estimated values for c and b and the residual times differed considerably between participants, confirming the importance of fitting individual-participant data.

Overall, the main conclusion from the model applications is that the EGCM-RT provides a satisfactory account of the individual-participant results from Experiment 1. For most participants, the model captured the differences between stimuli well, often providing an excellent joint account of RTs and choice proportions. Therefore, the temporal characteristics of a feature-sampling process can be used to account for categorization RT differences between stimuli that consist of separable dimensions. The data are largely in agreement with the EGCM-RT's assumption that participants sample object features until they have accumulated enough evidence in favor of one of the categories to initiate a confident response.

Next, the EBRW was applied to the individual-participant RTs and response proportions from Experiment 1, using the same optimization criterion as for the EGCM-RT (summed R²). Although the stimuli used in this experiment were beyond the EBRW's scope, the model can still provide useful reference points for the EGCM-RT, and differences between the models' predictions might give information about the processes that produce RT differences between stimuli in this experiment. For each participant, nine parameters were estimated: three dimension weights, a discrimination value (c), a value for a in the step-time function, a criterion parameter A, a criterion parameter B, a scaling constant (k), and a residual time parameter (t_res). In the EBRW, it is normally assumed that A and B are integers (Nosofsky & Palmeri, 1997b), but I assumed that these parameters were real-valued in the modeling to obtain a maximally general model. (Nosofsky and Alfonso-Reese [1999] also assumed real-valued criterion settings in their applications of the EBRW.) The real-valued criterion

settings can be considered as an approximation of mixtures in the criterion settings across trials. The EBRW's predictions are shown in Table 7, and the goodness-of-fit values are summarized in Table 8. The best fitting parameter values are shown in Table 10, and a scatter plot of average predicted RTs against average observed RTs is provided in Figure 11.

The EBRW fitted the RT data better than the EGCM-RT for Participants 1, 2, and 10, despite having two fewer parameters than the EGCM-RT. Except for Participants 1 and 2, the EBRW fitted the choice proportions worse than the EGCM-RT. Separate t tests on the residual RTs for each stimulus showed significant discrepancies between observed and predicted values for Stimuli 7, 13, and 14. As shown in Figure 11, the EBRW predicted RTs that were too long for these stimuli. Otherwise, the model fitted the data reasonably well.

Figure 11. Average predicted and observed response times (in milliseconds) for the stimuli in Experiment 1. The predicted values were obtained by averaging the predictions of the exemplar-based random-walk model (EBRW) across participants.

Of course, direct comparisons between the EBRW and the EGCM-RT are difficult because of the difference in free parameters and the use of R² as a measure of goodness of fit (which precludes the application of measures such as Akaike information criterion statistics). Therefore, it is important to look for qualitative differences between the models' predictions and to contrast those with the data. Inspection of the data in Table 7 shows that the models generally make similar predictions, and it is difficult to discern obvious and significant discrepancies between them. To obtain an approximate indication of the correspondence between the two models' predictions, I computed Pearson correlations between the two models' residuals across stimuli, separately for each participant and for RTs and choice proportions (this is equivalent to computing partial correlations between the models' predictions, partialing out their correlation with the observed data). These correlations are shown in Table 11. They were all positive and were remarkably high for some participants (e.g., Participants 1 and 9). This finding confirms that the two models tend to make similar predictions for this category structure. The main conclusion from the application of the EBRW is that the model managed to explain significant trends in the data and that the EGCM-RT usually performed at a similar level. This conclusion is similar to that from the experiments with integral-dimension stimuli discussed in the introduction. Overall, the data from Experiment 1 lend further support to the EGCM-RT as an alternative to the EBRW.
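The residual-correlation analysis can be sketched as follows. This is illustrative code of my own for the procedure described in the text: correlate the two models' residuals (observed minus predicted) across stimuli.

```python
# Sketch of the residual-correlation analysis (illustrative, not the
# original analysis code): Pearson correlation between two models'
# residuals across stimuli.
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def residual_correlation(observed, pred_model1, pred_model2):
    """Correlate the two models' residuals (observed - predicted) across
    stimuli; per the text, this equals the partial correlation between
    the models' predictions with the observed data partialed out."""
    res1 = [o - p for o, p in zip(observed, pred_model1)]
    res2 = [o - p for o, p in zip(observed, pred_model2)]
    return pearson(res1, res2)
```

A value near 1 indicates that the two models err in the same direction on the same stimuli, which is the pattern reported in Table 11 for several participants.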

Experiment 2

The primary purpose of Experiment 2 was to collect information about RT distributions and to investigate the EGCM-RT's ability to predict the characteristics of these distributions. Information about RT distributions is often crucial in the evaluation of models, even if they have the ability to account for mean RTs (e.g., Nosofsky & Palmeri, 1997a; Ratcliff, 1979; Ratcliff & Murdock,

1976). Nosofsky and Palmeri (1997a) carried out a detailed study of the

EBRW's ability to account for RT distributions. They recorded RTs in the classic set of speeded classification tasks used by Garner (1974) for distinguishing between stimuli with integral and separable dimensions. The EBRW yielded good quantitative fits to the RT distributions. Therefore, I took the EBRW's ability to

Table 11
Pearson Correlations Between the EBRW's and the EGCM-RT's Residuals Across Stimuli, Separately for Each Participant and for Response Times (RTs) and Choice Proportions

Participant    RT    Choice
 1            .92     .72
 2            .52     .75
 3            .67     .17
 4            .74     .13
 5            .64     .70
 6            .87     .32
 7            .45     .44
 8            .43     .56
 9            .93     .55
10            .49     .67

Note. EBRW = exemplar-based random-walk model; EGCM-RT = extended generalized context model for response times.

Table 12
Category Structure in Experiment 2

                                    Dimension
Assignment and stimulus   Base   Upright   Shade   Top

Category A
  1                         1       1        1      0
  2                         1       1        0      1
  3                         1       0        1      1
  4                         0       0        0      0
Category B
  5                         1       0        0      0
  6                         0       1        0      0
  7                         0       0        1      0
  8                         1       1        1      1
Transfer
  9                         0       0        0      1
 10                         0       0        1      1
 11                         0       1        0      1
 12                         0       1        1      0
 13                         0       1        1      1
 14                         1       0        0      1
 15                         1       0        1      0
 16                         1       1        0      0

model RT distributions as a given and focused on the EGCM-RT's ability to do the same. As in Experiment 1, stimuli with four binary and separable dimensions were used.

To compare predicted and observed RT distributions with sufficient power, large numbers of observations are needed. These can be obtained by testing a large number of participants and then using a procedure such as Vincent averaging to obtain group distributions (see, e.g., Nosofsky & Palmeri, 1997a; Ratcliff, 1979). Because the EGCM-RT is intended as a model of individual-participant RTs, I chose instead to obtain a fairly large number of observations from 2 participants only and attempted to model the distributions obtained from each participant.

Method

Participants. Two undergraduate students participated in this experiment. They were each paid £10 (approximately $15). Although they had participated in experiments with RT registration before, neither of them had previously taken part in a categorization study.

Apparatus and stimuli. The same computer equipment was used as that in Experiment 1. Again, the stimuli were rendered images of table lamps, as shown in Figure 9.

Design and procedure. The experiment consisted of a training stage, followed by four transfer stages. The structure of the stimuli presented in the training stage is shown in Table 12. The category structure was different from that used in Experiment 1. There was no particular reason for choosing the structure shown in Table 12, apart from the fact that it was sufficiently irregular to prevent the participants from using simple classification rules. The training procedure was identical to that used in Experiment 1, with immediate correct-incorrect feedback given after every trial. Training continued until the participants obtained a perfect score on four successive blocks of 8 trials.

In the transfer stages, the 16 stimuli that could be constructed from the four binary dimensions were presented repeatedly for speeded classification (see Table 12). Each transfer stage consisted of 320 trials (presented in random order), with each transfer stimulus presented 20 times. Two transfer sessions were held on the 1st day of testing, and the final two



Table 13
Mean Response Times (RTs; in Milliseconds) and 95% Confidence Limits (CLs) in Experiment 2

            Participant 1        Participant 2
Stimulus     RT    95% CLs        RT    95% CLs
 1           831    ±56           831    ±65
 2           793    ±48           784    ±50
 3           954    ±57           786    ±64
 4           862    ±67           820    ±64
 5           822    ±58           799    ±35
 6           833    ±60           805    ±44
 7           803    ±53           737    ±43
 8         1,079    ±81           882    ±55
 9           885    ±70           947    ±64
10           863    ±64           747    ±45
11           878    ±68           799    ±55
12           839    ±61           703    ±33
13           868    ±71           789    ±60
14           833    ±57           818    ±36
15           944    ±70           693    ±30
16           969    ±89           824    ±56

sessions were held the next day. There was an interval of at least 2 hr between sessions. Because there were four transfer sessions, each participant classified each transfer stimulus a total of 80 times. No feedback was provided in the transfer stages.

Results and Discussion

Because all RTs were faster than 4,000 ms, no outliers were removed, and the data from all trials were analyzed. The mean transfer RTs for each participant are shown in Table 13, and the choice proportions are listed in Table 14. The choice proportions showed some remarkable differences between the participants. Participant 1 consistently classified Stimuli 14, 15, and 16 (none of which were presented in the training stage) into Category B, whereas Participant 2 strongly preferred Category A for these stimuli. Otherwise, the participants tended to make similar choices, although Participant 2 was generally more consistent than Participant 1.

ANOVAs on the RTs yielded reliable main effects of stimulus for Participant 1, F(15, 1264) = 5.197, p < .001, MSE = 85,611, and for Participant 2, F(15, 1264) = 5.863, p < .001, MSE = 53,235. The 95% confidence limits on the RTs are also shown in Table 13. Although the participants' RTs were in the same range, there were some marked differences in the RT patterns between the participants. For instance, Participant 2 responded fastest to Stimulus 15, whereas Participant 1 produced relatively slow responses to that stimulus. Also, Participant 1 responded significantly slower to Stimulus 8 than to any other stimulus, but Participant 2 did not respond particularly slowly to that stimulus.

The crucial question in this experiment was whether the EGCM-RT could account for the RT distributions produced by the 2 participants. As a first step in the analysis of the RT distributions, the RTs for every participant-stimulus combination were divided into 10 decile bins (each containing eight RTs). The mean RT in each bin was computed. Next, the EGCM-RT was fitted jointly to these decile means (henceforth called "deciles" for short) and the choice proportions, separately for each participant.


For each stimulus, the EGCM-RT was used in a Monte Carlo procedure to generate 1,000 RTs and a choice proportion. As for the modeling of Experiment 1, it was assumed that each stimulus dimension consisted of one information element. The same deciles as for the observed RTs were then computed from each set of 1,000 RTs generated by the model, and the squared deviations between observed and predicted deciles were calculated. The overall measure of goodness of fit was the sum of R² for the RT deciles and R² for the choice proportions.
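The decile-binning step used for both the observed and the simulated RTs can be sketched as follows (my own illustrative code for the procedure described above):

```python
# Illustrative sketch of the decile-binning procedure: sort a stimulus's
# RTs, split them into equal-sized bins, and take the mean RT per bin.
def decile_means(rts, n_bins=10):
    """Mean RT within each of `n_bins` equal-sized bins of the sorted RTs.
    With 80 RTs per stimulus, each bin holds 8 RTs, as in Experiment 2.
    Assumes len(rts) is divisible by n_bins."""
    rts = sorted(rts)
    size = len(rts) // n_bins
    return [sum(rts[i * size:(i + 1) * size]) / size for i in range(n_bins)]
```

Applying the same function to the 80 observed RTs and to each set of 1,000 model-generated RTs yields the pairs of decile means whose squared deviations enter the goodness-of-fit measure.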

For each participant, 12 model parameters were estimated. For each stimulus dimension, a processing rate (q) was estimated. There were 3 utility parameters (the 4th was constrained by the other 3), and values were estimated for c, b, and θ. The residual time (which included the delay in onset of stimulus processing and the time for response preparation and execution) was assumed to be a normally distributed random variable, with estimated mean residual time t_res and estimated standard deviation SD_res. The residual time on any simulated trial was a random sample from this distribution.

For Participant 1, the EGCM-RT accounted for 90.2% of the variance in the RT deciles and for 96.0% of the variance in the proportions of Category A responses. The observed and predicted choice proportions are shown in Table 14, and the estimated parameter values for this participant are shown in Table 15. Figure 12 shows the quantile-quantile (QQ) plots for the observed distributions and the distributions produced by the EGCM-RT. A QQ plot compares the quantiles from two samples. If the two samples are drawn from the same distribution, the QQ plot is expected to be close to a straight line (shown in each graph in Figure 12). For most stimuli, there was a close correspondence between observed and predicted RT distributions. To confirm this statistically, I carried out separate Kolmogorov-Smirnov tests of goodness of fit on the observed and predicted distributions, with a groupwise error rate of .05. The only reliable discrepancy between an observed and a predicted distribution occurred for Stimulus 2. For this stimulus, the EGCM-RT's distribution had a longer tail than the observed

Table 14
Observed and Predicted Proportions of Category A Responses in Experiment 2

            Participant 1            Participant 2
Stimulus   Predicted  Observed    Predicted  Observed
 1            .91       .91         1.00       1.00
 2            .94       .91         1.00       1.00
 3            .94       .79         1.00       1.00
 4            .76       .68          .98        .95
 5            .00       .00          .00        .00
 6            .00       .01          .00        .01
 7            .00       .00          .00        .00
 8            .00       .20          .01        .05
 9            .88       .98          .99        .93
10            .02       .01          .05        .01
11            .03       .03          .05        .00
12            .01       .10          .01        .00
13            .00       .06          .00        .00
14            .06       .01          .93       1.00
15            .08       .08          .91       1.00
16            .06       .06          .88        .98



Table 15
Best Fitting Model Parameters for the Extended Generalized Context Model for Response Times, as Applied to Data From Experiment 2

Parameter      Participant 1   Participant 2
q (base)          0.0054          0.0095
q (upright)       0.0049          0.0047
q (shade)         0.0041          0.0058
q (top)           0.0045          0.0057
θ                 7.676          10.003
t_res           523.3           478.1
SD_res           27.278          17.946
u (base)           .237            .269
u (upright)        .253            .255
u (shade)          .311            .260
u (top)            .199            .216
c                10.458          10.512
b                  .327            .503

Note. See the text for an explanation of θ. q = inclusion rate; t_res = residual time; u = utility value; c = generalization value; b = bias.

distribution. It is not clear what may have caused this difference between the observed and predicted distributions. A similar but nonsignificant discrepancy occurred for Stimulus 1. The observed distribution had a somewhat longer tail than the predicted distribution for Stimuli 8, 13, 15, and 16. Generally, however, the model appeared to provide a satisfactory account of the shape and location of the RT distributions. The predicted distributions for Stimuli 3, 4, 5, 6, 7, 9, 10, 11, 12, and 14 were almost identical to the observed distributions.
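The distributional comparison rests on the two-sample Kolmogorov-Smirnov statistic: the maximum absolute difference between the empirical cumulative distribution functions of the observed and the model-generated RT samples. A minimal stdlib sketch (the article does not specify the software used; this is an illustration of the statistic itself):

```python
# Illustrative two-sample Kolmogorov-Smirnov statistic: the maximum
# absolute difference between two empirical CDFs, evaluated at every
# value that occurs in either sample.
def ks_statistic(sample1, sample2):
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    for v in sorted(set(s1) | set(s2)):
        cdf1 = sum(1 for x in s1 if x <= v) / n1
        cdf2 = sum(1 for x in s2 if x <= v) / n2
        d = max(d, abs(cdf1 - cdf2))
    return d
```

The statistic is then referred to the KS sampling distribution (at a groupwise error rate of .05 in the analyses reported here) to decide whether the observed and predicted distributions differ reliably.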

For Participant 2, the EGCM-RT accounted for 89.1% of the variance in the RT quantiles and for 99.1% of the variance in choice proportions for the estimated parameter values shown in Table 15. The choice proportions are listed in Table 14. Again, the QQ plots (see Figure 13) confirmed the model's ability to predict the general shape of RT distributions. Kolmogorov-Smirnov tests revealed significant deviations between observed and predicted distributions for Stimuli 9 and 15. For Stimulus 9, the EGCM-RT predicted RTs that tended to be faster than the observed RTs, whereas the opposite occurred for Stimulus 15. For the other stimuli, there were few systematic deviations between observed and predicted deciles. Therefore, the data from Participant 2 provide further evidence of the EGCM-RT's ability to predict RT distributions for individual stimuli.

The parameter values in Table 15 showed no unusual patterns. The values of all parameters were fairly similar for both participants, except for the bias parameter (b). Participant 1 had a strong bias to produce Category B responses. This bias could explain the difference between the participants in the choice proportions for Stimuli 14, 15, and 16 (note how the EGCM-RT accounts very well for this difference).

The main conclusion from Experiment 2 is that the EGCM-RT can account for the RT distributions in the classification of multidimensional stimuli. The distributions generated by the model had the same shape as the observed distributions. The analysis of the RT distributions thus provides further support for the model.

Experiment 3

The purpose of Experiment 3 was to investigate yet another characteristic of categorization data that can potentially provide a critical test of the EGCM-RT. In the introduction, I already indicated how the data from experiments with response deadlines or response signals have supported the EGCM-RT's assumptions about information accumulation in the earliest stages of perceptual categorization. In these experiments, the deadlines or signals served to produce a wide range of RTs, with different sets of data corresponding to different deadlines or signal intervals. The choice probabilities within each set were computed, and different models were tested on their ability to predict the complex relations that emerged between RT and choice probabilities.

The purpose of Experiment 3 was to collect information about the relation between categorization RT and choice proportions within stimuli without manipulating the time available on different trials. Instead, choice proportions were computed for different response latencies, using only the normally occurring variation in RTs between trials. For each participant and each stimulus in the transfer stage of the experiment, a latency-choice curve (LCC) was constructed. The individual trials for each stimulus were sorted by RT and divided into bins with equal numbers of trials, such that the first bin contained the fastest trials and the last bin the slowest trials. Within each bin, the average RT and the proportion of Category A responses were computed. LCCs were then obtained by plotting these choice proportions against the mean RTs in the corresponding bins.
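The LCC construction just described can be sketched as follows. This is illustrative code of my own; the trial representation (an RT paired with a 0/1 choice indicator) is an assumption for the sketch:

```python
# Illustrative sketch of latency-choice curve (LCC) construction.
# Each trial is an (rt, choice) pair, with choice 1 for a Category A
# response and 0 otherwise.
def latency_choice_curve(trials, n_bins):
    """Sort trials by RT, split into equal-sized bins, and return a
    (mean RT, proportion of Category A responses) pair for each bin.
    Assumes len(trials) is divisible by n_bins."""
    trials = sorted(trials)            # sorts by RT (the first element)
    size = len(trials) // n_bins
    curve = []
    for i in range(n_bins):
        chunk = trials[i * size:(i + 1) * size]
        mean_rt = sum(rt for rt, _ in chunk) / size
        prop_a = sum(choice for _, choice in chunk) / size
        curve.append((mean_rt, prop_a))
    return curve
```

Plotting the second element of each pair against the first yields the LCC for one stimulus and one participant.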

As in Experiments 1 and 2, the stimuli in Experiment 3 were table lamps with four dimensions. The category structure in Experiment 3 is shown in Table 16. Five stimuli belonged to Category A, and four stimuli belonged to Category B. Of particular interest here is Stimulus 5, which belonged to Category A. Stimulus 5 was an exception within its category, because this stimulus actually corresponded to the multimodal prototype of the opposite category and was very dissimilar to the other exemplars from its own category. The EGCM-RT makes strong predictions about the shape of the LCCs for the different stimuli in this experiment. For Stimulus 5, the model predicts that there should be a strong positive relation between RT and accuracy. This stimulus will be categorized with perfect accuracy only if all of its features have been processed. If feature sampling is interrupted before all of the features have been processed, the partial representation of Stimulus 5 is identical to the partial representation of one or more of the exemplars in Category B, which can lead to an incorrect response. As an example, assume that the participant decides to stop feature sampling on a given trial after processing the base, shade, and top dimensions of Stimulus 5 (see Table 16), but without processing the upright dimension. This partial stimulus representation is identical to the corresponding partial representation of Exemplars 5 and 8, which belong to Categories A and B, respectively. This ambiguity will lead to less accurate responses than in a case in which all stimulus dimensions were processed. And if a participant decides to stop sampling even sooner, for instance, after processing only the base and the upright dimensions, the probability of an incorrect Category B response to Stimulus 5 increases still further.
In fact, the EGCM-RT can predict a category crossover effect for Stimulus 5, in which very fast responses produce a majority of Category B choices, whereas slower responses are associated with


Table 16
Category Structure in Experiment 3

                                 Dimension
Category and stimulus   Base   Upright   Shade   Top
A
  1                       0       0        0      1
  2                       0       0        1      0
  3                       0       1        0      0
  4                       1       0        0      0
  5                       1       1        1      1
B
  6                       1       1        1      0
  7                       1       1        0      1
  8                       1       0        1      1
  9                       0       1        1      1

a majority of Category A choices. For the other stimuli in Category A (Stimuli 1-4 in Table 16), the EGCM-RT predicts a weaker relation between RT and accuracy than for Stimulus 5. The regular members of Category A can be categorized accurately even if they are processed incompletely. For the stimuli in Category B, the model also predicts a weaker relation between speed and accuracy than for Stimulus 5, but because partial representations of these stimuli can be confused with Exemplar 5, the relation should be somewhat stronger than for the regular stimuli from Category A.

The category structure of Experiment 3 has been used before in a study of categorization under time pressure (Lamberts & Freeman, 1999a) and in an experiment on categorization of briefly presented objects (Lamberts & Freeman, 1999b). In both experiments, category crossovers occurred for Stimulus 5, with a majority of B responses at short response-signal intervals (Lamberts & Freeman, 1999a) or at the shortest exposure durations (Lamberts & Freeman, 1999b). Whereas these results provide strong support for the EGCM-RT's assumptions about the role of feature sampling in the earliest stages of perceptual categorization, the question remains whether such effects can also be observed if the duration of feature sampling is not directly manipulated but rather is left to the participants' own judgment. Experiment 3 aimed to answer this question and thereby provide another critical test of the EGCM-RT as a model of categorization RT.

Although I do not apply the EBRW to the results from this experiment (which involves separable stimuli and is therefore outside of the model's scope), it is interesting to consider the EBRW's predictions about the relation between RT and accuracy within stimuli. If the response criteria (and other model parameters) remain constant across trials, the EBRW predicts no relation between RT and accuracy within stimuli. According to the model, fast and slow responses should be equally accurate under those conditions. Nevertheless, random-walk models such as the EBRW can predict various relations between speed and accuracy of responses by assuming variability in the random-walk rate parameters across trials, which produces slow errors and fast correct responses, or variability in the response-criterion settings, which yields fast errors and slow correct responses (Nosofsky & Alfonso-Reese, 1999; Ratcliff, Van Zandt, & McKoon, 1999). However, variability in criterion settings is not sufficient to predict a category crossover effect for Stimulus 5, as predicted by the EGCM-RT and observed in previous studies. Although the response criteria may change between trials, the variables that drive the random walk (i.e., the variables that determine similarity to exemplars in memory) do not change with processing time, and this implies that the average pressure on the walk is either consistently toward A or consistently toward B for a given set of parameters and stimuli. Although fast responses might be accurate only at chance level (i.e., 50% A responses and 50% B responses), accuracy should never descend below that level. The only way in which a random-walk model such as the EBRW could predict systematic crossover effects is by assuming that the parameters that drive the random walk itself change in the course of processing. For instance, if the value of the discriminability parameter (c) increases with processing time, a category crossover for Stimulus 5 might be predicted by the EBRW. Although this cannot be ruled out as a possibility, the resulting model would be much more complex than the standard EBRW.

Method

Participants. Two graduate students (1 female and 1 male) from the University of Birmingham participated in this experiment. They were each paid £10 (approximately $15). Both participants had experience with RT experiments, but neither of them had previously taken part in a categorization study.

Apparatus and stimuli. The same computer equipment was used as that in Experiments 1 and 2. Again, the stimuli were realistic, rendered images of table lamps, as shown in Figure 9.

Design and procedure. The experiment consisted of training and transfer stages. The category structure in the training stages is shown in Table 16. Five stimuli belonged to Category A, and four stimuli belonged to Category B. As indicated, Stimulus 5 was very different from the other members of Category A and formed an exception within that category.

The experiment consisted of four sessions held on successive days. Each session started with a training stage, in which the nine stimuli were presented sequentially for categorization in random order and in which immediate correct-incorrect feedback was provided. Training continued until the participant made no errors in five successive blocks of nine trials. Each training stage was immediately followed by a transfer stage, in which the nine stimuli were presented 50 times each in completely randomized order. Because there were four sessions, each participant categorized each stimulus a total of 200 times. For the transfer stages, the participants were instructed to categorize the stimuli as quickly and accurately as possible. No feedback was provided in the transfer stages.

Results and Discussion

Table 17 shows the mean transfer RTs and their 95% confidence limits for both participants. The data from all trials were analyzed. ANOVAs on the RTs yielded reliable main effects of stimulus for Participant 1, F(8, 1791) = 30.74, p < .001, MSE = 53,066, and for Participant 2, F(8, 1791) = 43.06, p < .001, MSE = 34,501. The RT data showed that both participants responded fastest to Stimuli 1-4, which were the "regular" members of Category A. Responses to Stimulus 5, which was the exception in Category A, were considerably slower than responses to the other stimuli in Category A. The choice proportions, which are also shown in Table 17, followed the same pattern as the RTs. Responses to Stimulus 5 were less consistent than those to the other stimuli in Category A, and responses to the stimuli in Category B were generally less consistent than those to the regular members of Category A.


Table 17
Mean Response Times (RTs; in Milliseconds) and Proportions of Category A Responses in Experiment 3

               Participant 1                 Participant 2
Stimulus    RT    95% CL    P(A)         RT    95% CL    P(A)
1          531   ±15.20     .97         473   ±12.67     .98
2          612   ±25.52     .98         537   ±19.09     .75
3          628   ±31.58     .98         506   ±17.86     .96
4          546   ±14.78     .98         472   ±11.42     .99
5          721   ±34.62     .83         709   ±35.71     .56
6          743   ±34.91     .04         642   ±31.14     .07
7          789   ±45.92     .14         633   ±35.74     .31
8          672   ±33.62     .09         583   ±22.73     .28
9          731   ±39.14     .07         655   ±31.99     .13

Note. 95% CL = 95% confidence limits for RTs.

For each participant and each stimulus, an LCC was constructed as follows. First, the 200 individual trials for each stimulus were sorted by RT and divided into five bins of 40 trials each. Within each bin, the average RT and the proportion of Category A responses were computed. LCCs were then obtained by plotting these choice proportions against the mean RTs in the corresponding bins.

The EGCM-RT was used in separate Monte Carlo simulations for each participant. Per stimulus, 1,000 experimental trials were simulated, producing RTs and category choices. These were used to construct predicted LCCs. The model assumptions were identical to those for Experiment 2, and the same 12 parameters were estimated. Again, the criterion for model optimization was the summed R² for RTs and choice proportions (both computed across the nine stimuli by five bins, which equals 45 data points per participant).
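The optimization criterion can be written out as follows. This is a sketch under the assumption that R² is the usual proportion of variance accounted for (1 minus the ratio of residual to total sum of squares); the function names are my own, not the author's:

```python
def r_squared(observed, predicted):
    """Proportion of variance accounted for: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def fit_criterion(obs_rt, pred_rt, obs_p, pred_p):
    """Summed R^2 for RTs and choice proportions (45 data points each
    per participant: 9 stimuli x 5 bins), to be maximized over the
    model's free parameters."""
    return r_squared(obs_rt, pred_rt) + r_squared(obs_p, pred_p)
```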

The observed and predicted LCCs for Participant 1 are shown in Figure 14. The observed LCCs showed considerable differences between stimuli. For Stimuli 1-4, which were the regular members of Category A, accuracy was always high, even on trials with relatively fast RTs. Stimulus 5, the exception in Category A, yielded a very different pattern, with accurate responses on slow trials but chance-level responses on the fastest trials. The stimuli from Category B also showed strong negative relations between response speed and accuracy. Overall, the EGCM-RT predicted these effects well. The model accounted for 93.3% of the variance in the mean RTs across stimuli and bins and for 98.1% of the variance in the choice proportions. For Stimuli 1-4, the model correctly predicted only minor effects of response speed on accuracy. For Stimulus 5, the model predicted the strong relation between RT and accuracy well. The strong effect for this stimulus confirmed the main qualitative prediction from the EGCM-RT, and the model fitting confirmed the model's ability to handle this result. The EGCM-RT also performed well on the stimuli from Category B. The model predicted fairly strong negative correlations between speed and accuracy for these stimuli because they were nearly identical to Stimulus 5 from Category A. In fact, the stimuli from Category B shared more features in common with Stimulus 5 (three) than with each other (two). Therefore, on trials in which participants responded quickly, without processing all of the features of the stimuli from Category B, confusion with Stimulus 5 was bound to lead to less accurate responding. The best fitting parameter values are shown in Table 18. The processing rates of the four dimensions were quite similar, but the top dimension was processed more slowly than the other dimensions. The top dimension also had a relatively low utility value. The values for the other parameters were similar to those estimated for the previous experiments.

The LCCs for Participant 2 are shown in Figure 15. Although the data from this participant showed the same trends as those from Participant 1, there were some differences. On Stimulus 2, Participant 2 responded less accurately than Participant 1. The observed LCCs for Stimuli 5 and 7 showed a category crossover effect. On Stimulus 5, Participant 2 gave a majority of Category B responses on trials with fast RTs but consistently chose Category A on trials with slower RTs. This effect was strong, and it confirmed the EGCM-RT's a priori predictions about the relation between speed and accuracy for this stimulus. An almost equally strong relation between RT and accuracy occurred for Stimulus 7.

The EGCM-RT accounted for 90.6% of the variance in mean RTs and for 83.0% of the variance in the choice proportions. The estimated parameter values (shown in Table 18) were similar to those estimated for Participant 1. The model correctly predicted the high accuracy of responses to Stimuli 1, 3, and 4, regardless of RT. However, the model failed to explain the relatively high number of errors on Stimulus 2. The category crossover for Stimulus 5 was fairly well accounted for, although the model predicted more correct responses than were observed on trials with relatively fast RTs. The crossover effect for Stimulus 7 is rather surprising, and it is not completely explained by the EGCM-RT. The model does predict that accuracy should be close to chance level on the fastest trials for this stimulus, but it does not predict a category crossover. In fact, the crossover for Stimulus 7 is counterintuitive and suggests that this participant used a categorization strategy that was more complex than postulated in the EGCM-RT. It is difficult to understand how incomplete processing of Stimulus 5 could have led to a majority of Category B responses, whereas incomplete processing of Stimulus 7 could have produced mainly Category A responses. It is as if the participant was sometimes aware that processing was incomplete when she initiated a fast response and compensated for this by assigning the stimulus to the opposite category from the one suggested by the stimulus information. Still, it is unclear why she would apply such a strategy only with Stimulus 7.

The general conclusion from the modeling of the LCCs is that the EGCM-RT accounts for the main qualitative differences between the stimuli. Although the model's predictions are not always in exact agreement with the data, the model fits are sufficiently close to conclude that the data support the EGCM-RT's assumptions about the role of feature sampling in producing differences in RTs and accuracy. The data also confirm the results from previous studies using response signals (Lamberts & Freeman, 1999a) or short exposure durations (Lamberts & Freeman, 1999b), in which category crossover effects were observed for stimuli that were exceptions within their category.

Conclusion

The main conclusion from the studies that were reviewed and the model applications that were carried out is that an information-



Table 18
Best Fitting Parameter Values for the Extended Generalized Context Model for Response Times, as Applied to Data From Experiment 3

Parameter      Participant 1   Participant 2
q (base)           0.0070          0.0057
q (upright)        0.0077          0.0071
q (shade)          0.0071          0.0067
q (top)            0.0038          0.0054
θ                  6.195           4.781
t_res            408.2           358.2
SD_res            25.002          22.139
u (base)            .134            .183
u (upright)         .374            .382
u (shade)           .317            .388
u (top)             .175            .047
c                 12.145          12.157
b                   .535            .521

Note. See the text for an explanation of θ. q = inclusion rate; t_res = residual time; u = utility value; c = generalization value; b = bias.

accumulation model such as the EGCM-RT provides an alternative to existing models of RTs in speeded categorization tasks. The EGCM-RT provides a good account of choice proportions and RTs in a variety of experiments by attributing all RT differences to the duration of perceptual processing. The model proved sufficiently powerful to explain all of the relevant trends in the data that were modeled. The EGCM-RT's achievements can be summarized as follows:

1. The model explains the negative relation between RT and distance from the decision bound in probabilistic categorization experiments (Ashby et al., 1994).

2. The EGCM-RT accounts for the individual RTs in speeded categorization experiments with integral-dimension stimuli (Nosofsky & Palmeri, 1997b, Experiment 1) and for the effects of individual stimulus frequency on RT (Nosofsky & Palmeri, 1997b, Experiment 2). The EGCM-RT's goodness of fit in these studies is similar to that of the EBRW.

3. The EGCM-RT predicts power-law speedups with practice.

4. The EGCM-RT predicts mean RTs and choice proportions for individual stimuli with separable dimensions (Experiment 1).

5. The EGCM-RT correctly predicts RT distributions (Experiment 2) and LCCs (Experiment 3) with separable-dimension stimuli.

6. The EGCM-RT explains the results from categorization studies with response deadlines (Lamberts, 1995; Lamberts & Brockdorff, 1997), response signals (Lamberts, 1998; Lamberts & Freeman, 1999a), and short exposure durations (Lamberts & Freeman, 1999b).

The main advantage of the EGCM-RT, compared with the EBRW and the RT-distance model, is its ability to explain how the accumulation of perceptual information can affect various aspects of categorization RT data. Therefore, the EGCM-RT has a broader scope than either alternative, without being excessively complex and without sacrificing accuracy of prediction to generality.

Although the EGCM-RT has been tested only with artificial objects with just a few well-defined dimensions, the results that were presented have implications for the categorization or identification of natural objects. The notion that object recognition involves gradual accumulation of perceptual information and continuous evaluation of the relation between the perceptual representation and exemplars in memory might be useful as part of a general theory of object recognition. However, many important issues need to be resolved before this idea can be considered seriously. For example, what is the nature of the information elements that are presumably sampled in the earliest stages of categorization of natural objects? In all experiments that were reviewed or presented thus far, the stimulus dimensions were well defined, and their values could be determined quite easily. But what are the dimensions of natural objects that could define or constrain the units of information accumulation? One possibility is that objects are processed as if they are composed from a limited set of primitive elements or parts (e.g., the geons in Biederman's, 1987, recognition-by-components theory of object recognition) that can be recovered on the basis of nonaccidental properties that remain constant across viewpoints. Perceptual processing could then involve gradual accumulation of information about object parts or components (Lamberts & Freeman, 1999a). Although such a view is generally compatible with the EGCM-RT's assumptions, it is not universally accepted within the object-recognition literature that parts (always) form the basis of object recognition (see, e.g., Logothetis & Sheinberg, 1996; Tarr & Bulthoff, 1995). Moreover, even if parts are used in recognition, the question remains how a parts-based representation can be translated into the dimension-based representation that is assumed in models like the EGCM-RT and the EBRW. Objects differ widely in the number of parts they possess, and relations between parts are often crucial for establishing object identity. Therefore, it seems unlikely that a representation scheme based on a fixed number of dimensions will suffice to capture the complexity of representations of natural objects. This is clearly an issue that merits further research, but it should be pointed out that the EGCM-RT's assumptions about information accumulation in perceptual categorization can be preserved within alternative representation schemes that may not be based on a strict dimensional representation of stimuli.

Research into categorization RT has taken off only recently, and it is clear that many questions remain unanswered. A first issue, which relates specifically to the EGCM-RT, concerns the assumption that elements from separable dimensions are processed independently. Although this assumption seems justified in some tasks and with certain stimuli (e.g., Lamberts & Freeman, 1999a; Townsend, Hu, & Ashby, 1981), there is considerable evidence in the literature to suggest that independence does not always hold (e.g., Townsend & Ashby, 1982; Townsend et al., 1984). For instance, Sanocki (1991, 1993) showed that stimulus information that is extracted early in processing can modify later phases of object recognition. Such a contingency-based processing scheme reduces the computational burden on the system. A processing system that could implement time-course contingencies would have to be far more active than the EGCM-RT's passive feature-collection mechanism. Also, Busey and Loftus (1994) showed that sampling rates of features can depend on the amount of stimulus information that has already been processed. It is clear that there are many possible alternatives to the simple independent-sampling assumption of the EGCM-RT, and these will have to be considered in the future.


A second issue concerns the role of capacity limitations in the dimension-sampling process. Evidence from visual-search and partial-report tasks suggests that the storage capacity of visual short-term memory can be limited to as few as four or five elements (e.g., Bundesen, 1987, 1990; Bundesen, Pedersen, & Larsen, 1984; Shibuya & Bundesen, 1988). As long as the number of stimulus dimensions remains within these limits, the EGCM-RT will perform as well as a limited-capacity model. However, it remains to be seen how the EGCM-RT performs with stimuli that contain more dimensions than visual short-term memory can hold simultaneously.

A final issue concerns the conceptual relation between the EBRW, the RT-distance model, and the EGCM-RT. From the model applications in this article and the research reported by Nosofsky and Palmeri (1997a, 1997b), it is clear that a differentiation between these models is not easily achieved because they can make similar predictions in many circumstances. The relation between the EBRW and the EGCM-RT is of particular interest here because both models are based on the assumptions of the GCM. As I pointed out already, perhaps the main strength of the EGCM-RT, as compared with the EBRW, is that the EGCM-RT explains phenomena that are almost certainly related to the acquisition of perceptual information, which is beyond the scope of the EBRW. Alternatively, the EBRW uses an elegant and well-understood random-walk mechanism for decision making, which has a long history of success in RT research in a variety of areas (see, e.g., Luce, 1986; Ratcliff et al., 1999). Therefore, it may well be worthwhile to consider a combination of the strong elements from both models into a unified hybrid model of the time course of categorization. In such a model, it could be assumed that the exemplar retrieval process that drives the random walk (according to the principles of the EBRW) is continuously revised in light of the incoming perceptual information (the principles of which are given by the EGCM-RT). Whether such a hybrid account will be superior to its constituent models remains to be seen, but this certainly seems a promising line for further research. One possible drawback of a hybrid model could be that it loses some of the simplicity and tractability of the EGCM-RT and the EBRW (closed-form expressions for RTs and choice are difficult to derive for a random-walk model with complex rate functions), but if a hybrid model offers a truly superior fit to the data, this may not be of primary concern.

References

Ashby, F. G., Boynton, G., & Lee, W. W. (1994). Categorization response time with multidimensional stimuli. Perception & Psychophysics, 55, 11-27.

Ashby, F. G., & Gott, R. E. (1988). Decision rules in the perception and classification of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 33-53.

Ashby, F. G., & Lee, W. W. (1991). Predicting similarity and categorization from identification. Journal of Experimental Psychology: General, 120, 150-172.

Ashby, F. G., & Maddox, W. T. (1992). Complex decision rules in categorization: Contrasting novice and experienced performance. Journal of Experimental Psychology: Human Perception and Performance, 18, 50-71.

Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar and decision bound models of categorization. Journal of Mathematical Psychology, 37, 372-400.

Ashby, F. G., & Maddox, W. T. (1994). A response time theory of separability and integrality in speeded classification. Journal of Mathematical Psychology, 38, 423-466.

Ashby, F. G., & Townsend, J. T. (1986). Varieties of perceptual independence. Psychological Review, 93, 154-179.

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-147.

Bundesen, C. (1987). Visual attention: Race models for selection from multielement displays. Psychological Research, 49, 113-121.

Bundesen, C. (1990). A theory of visual attention. Psychological Review, 97, 523-547.

Bundesen, C., Pedersen, L. F., & Larsen, A. (1984). Measuring efficiency of selection from briefly exposed visual displays: A model for partial report. Journal of Experimental Psychology: Human Perception and Performance, 10, 329-339.

Busey, T. A., & Loftus, G. R. (1994). Sensory and cognitive components of visual information acquisition. Psychological Review, 101, 446-469.

Eriksen, C. W., & Schultz, D. W. (1977). Retinal locus and acuity in visual information processing. Bulletin of the Psychonomic Society, 9, 81-84.

Estes, W. K. (1994). Classification and cognition. New York: Oxford University Press.

Garner, W. R. (1974). The processing of information and structure. New York: Wiley.

Gati, I., & Tversky, A. (1982). Representations of qualitative and quantitative dimensions. Journal of Experimental Psychology: Human Perception and Performance, 8, 325-340.

Gluck, M. A., & Bower, G. H. (1988). From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General, 117, 225-244.

Hoffman, D. D., & Singh, M. (1997). Salience of visual parts. Cognition, 63, 29-78.

Jolicoeur, P., Gluck, M. A., & Kosslyn, S. M. (1984). Pictures and names: Making the connection. Cognitive Psychology, 16, 243-275.

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.

Lamberts, K. (1994). Flexible tuning of similarity in exemplar-based categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1003-1021.

Lamberts, K. (1995). Categorization under time pressure. Journal of Experimental Psychology: General, 124, 161-180.

Lamberts, K. (1998). The time course of categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 695-711.

Lamberts, K., & Brockdorff, N. (1997). Fast categorization of stimuli with multivalued dimensions. Memory & Cognition, 25, 296-304.

Lamberts, K., & Freeman, R. P. J. (1999a). Building object representations from parts: Tests of a stochastic sampling model. Journal of Experimental Psychology: Human Perception and Performance, 25, 904-926.

Lamberts, K., & Freeman, R. P. J. (1999b). Categorization of briefly presented objects. Psychological Research, 62, 107-117.

Lassaline, M. E., Wisniewski, E. J., & Medin, D. L. (1992). Basic levels in artificial and natural categories: Are all basic levels created equal? In B. Burns (Ed.), Percepts, concepts and categories (pp. 327-378). Amsterdam: Elsevier Science.

Lockhead, G. R. (1972). Processing dimensional stimuli: A note. Psychological Review, 79, 410-419.

Logan, G. D. (1988). Toward an instance theory of automatization. Psychological Review, 95, 492-527.

Logan, G. D. (1992). Shapes of reaction time distributions and shapes of learning curves: A test of the instance theory of automaticity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 883-914.

Logothetis, N. K., & Sheinberg, D. L. (1996). Visual object recognition. Annual Review of Neuroscience, 19, 577-621.

Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (pp. 103-189). New York: Wiley.

Luce, R. D. (1986). Response times: Their role in inferring elementary mental organization. Oxford, England: Oxford University Press.

Maddox, W. T., & Ashby, F. G. (1993). Comparing decision bound and exemplar models of categorization. Perception & Psychophysics, 53, 49-70.

Maddox, W. T., & Ashby, F. G. (1996). Perceptual separability, decisional separability, and the identification-speeded classification relationship. Journal of Experimental Psychology: Human Perception and Performance, 22, 795-817.

McKinley, S. C., & Nosofsky, R. M. (1995). Investigations of exemplar and decision bound models in large, ill-defined category structures. Journal of Experimental Psychology: Human Perception and Performance, 21, 128-148.

Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.

Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition and the law of practice. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 1-55). Hillsdale, NJ: Erlbaum.

Nosofsky, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 104-114.

Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.

Nosofsky, R. M. (1987). Attention and learning processes in the identification and categorization of integral stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 87-109.

Nosofsky, R. M. (1991). Tests of an exemplar model for relating perceptual classification and recognition memory. Journal of Experimental Psy- chology: Human Perception and Performance, 17, 3-27.

Nosofsky, R. M., & Alfonso-Reese, L. (1999). Effects of similarity and practice on speeded classification response times and accuracies: Further tests of an exemplar-retrieval model. Memory & Cognition, 27, 78-93.

Nosofsky, R. M., Kruschke, J. K., & McKinley, S. C. (1992). Combining exemplar-based category representations and connectionist learning rules. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 211-233.

Nosofsky, R. M., & Palmeri, T. J. (1997a). Comparing exemplar-retrieval and decision-bound models of speeded perceptual classification. Perception & Psychophysics, 59, 1027-1048.

Nosofsky, R. M., & Palmeri, T. J. (1997b). An exemplar-based random walk model of speeded classification. Psychological Review, 104, 266-300.

Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101, 53-79.

Palmeri, T. J. (1997). Exemplar similarity and the development of automaticity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 324-354.

Ratcliff, R. (1979). Group reaction time distributions and an analysis of distribution statistics. Psychological Bulletin, 86, 446-461.

Ratcliff, R., & Murdock, B. B., Jr. (1976). Retrieval processes in recognition memory. Psychological Review, 83, 190-214.

Ratcliff, R., Van Zandt, T., & McKoon, G. (1999). Connectionist and diffusion models of reaction time. Psychological Review, 106, 261-300.

Rosch, E. (1973). Natural categories. Cognitive Psychology, 4, 328-350.

Sanocki, T. (1991). Effects of early common features on form perception. Perception & Psychophysics, 50, 490-497.

Sanocki, T. (1993). Time course of object identification: Evidence for a global-to-local contingency. Journal of Experimental Psychology: Human Perception and Performance, 19, 878-898.

Shibuya, H., & Bundesen, C. (1988). Visual selection from multielement displays: Measuring and modeling effects of exposure duration. Journal of Experimental Psychology: Human Perception and Performance, 14, 591-600.

Tarr, M. J., & Bülthoff, H. H. (1995). Is human object recognition better described by geon structural descriptions or by multiple views? Comment on Biederman and Gerhardstein (1993). Journal of Experimental Psychology: Human Perception and Performance, 21, 1494-1505.

Townsend, J. T., & Ashby, F. G. (1982). Experimental test of contemporary mathematical models of visual letter recognition. Journal of Experimental Psychology: Human Perception and Performance, 8, 834-864.

Townsend, J. T., & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. New York: Cambridge University Press.

Townsend, J. T., Hu, G. G., & Ashby, F. G. (1981). Perceptual sampling of orthogonal straight line features. Psychological Research, 43, 259-275.

Townsend, J. T., Hu, G. G., & Evans, R. J. (1984). Modeling feature perception in brief displays with evidence for positive interdependencies. Perception & Psychophysics, 36, 35-49.

Received June 20, 1997
Revision received May 15, 1999
Accepted June 18, 1999