
Psychol Res (1990) 52:317 - 329 Psychological Research © Springer-Verlag 1990

Perceptual effects of scene context on object identification

Peter De Graef, Dominiek Christiaens, and Géry d'Ydewalle

Laboratory of Experimental Psychology, University of Leuven, Tiensestraat 102, B-3000 Leuven, Belgium

Received March 14, 1990/Accepted June 24, 1990

Summary. In a number of studies the context provided by a real-world scene has been claimed to have a mandatory, perceptual effect on the identification of individual objects in such a scene. This claim has provided a basis for challenging widely accepted data-driven models of visual perception in order to advocate alternative models with an outspoken top-down character. The present paper offers a review of the evidence to demonstrate that the observed scene-context effects may be the product of post-perceptual and task-dependent guessing strategies. A new research paradigm providing an on-line measure of genuine perceptual effects of context on object identification is proposed. First-fixation durations for objects incidentally fixated during the free exploration of real-world scenes are shown to increase when the objects are improbable in the scene or violate certain aspects of their typical spatial appearance in it. These effects of contextual violations are shown to emerge only at later stages of scene exploration, contrary to the notion of schema-driven scene perception effective from the very first scene fixation. In addition, evidence is reported in support of the existence of a facilitatory component in scene-context effects. This is taken to indicate that the context directly affects the ease of perceptual object processing and does not merely serve as a framework for checking the plausibility of the output of perceptual processes. Finally, our findings are situated against other contrasting results. Some future research questions are highlighted.

Introduction

Most prominent models of visual object perception (e.g., Biederman, 1987; Marr, 1982) view the apprehension of a particular object in an image as exclusively based on a data-driven and preconceptual recovery of the object's structural features (e.g., geons, contour segments) from the image. Research on object perception in full-scene context, however, has suggested that this view may need to be modified if one wishes to model more than the perception of unanticipated, isolated objects.

Offprint requests to: P. De Graef

Several studies recording eye-movement patterns across line drawings of natural scenes have consistently shown shorter first fixations on objects likely to appear in a given scene than on objects unlikely to be encountered in that scene (Antes & Penland, 1981; Friedman, 1979; Loftus & Mackworth, 1978). To the extent that first-fixation durations can be assumed to reflect object-identification time (Friedman, 1979; Henderson, Pollatsek, & Rayner, 1989), this "Probability effect" suggests that object perception in real-world environments cannot be regarded as strictly data-driven and preconceptual. The question then becomes how existing models of object perception should be modified in order to account for this contextual effect.

The dominant position on this issue is that they should incorporate two qualitatively different routes to object perception. Specifically, it has been argued (Friedman, 1979) that the perception of context-consistent objects in natural scenes involves rapid, resource-inexpensive and concept-driven feature detection, while the perception of context-inconsistent or isolated objects requires a time- and resource-consuming data-driven feature recovery. The central assumption underlying this claim is that during the first glance at a scene a scene-specific schema (Biederman, 1981) or frame (Friedman, 1979) is automatically activated. Since the schema presumably contains knowledge about the typical makeup and contents of the scene being viewed, its activation will generate expectations about what objects are likely to be present in that scene and what the typical features of these objects are. Consequently, the perception of context-consistent objects will merely require detection of the features suggested by a global scene-schema, while the perception of context-inconsistent or isolated objects demands a data-driven recovery of features to be matched with a specific object representation.

While it may appear that the incorporation of a schema-mediated route to object perception in data-driven models would solve the problems posed to them by the Probability effect, it remains to be seen whether this is a necessary, or even a promising, approach. This question has been raised by a number of studies that have demonstrated the existence of a priming process between semantically related objects, resulting in shorter identification times for primed objects (Carr, McCauley, Sperber, & Parmelee, 1982; Henderson, Pollatsek, & Rayner, 1987; Huttenlocher & Kubicek, 1983; Kroll & Potter, 1984). Since, as Henderson et al. (1987) point out, semantically related objects also tend to appear in the same real-world settings, the Probability effect may very well be a result of inter-object priming. The major advantages of this alternative to the schema account are its greater simplicity and its compatibility with existing data-driven models of object perception. Rather than having to postulate a complex model with two qualitatively different routes to object perception, one merely needs to assume that data-driven access to an object representation primes related object representations, thus reducing the thresholds for establishing a match between them and the object features recovered from the image.
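The priming account can be illustrated with a toy model in which identification time is the evidence threshold divided by a constant rate of feature recovery. Accessing one object lowers the thresholds of its semantic associates. The class name, the rate, the thresholds, the priming amount and the relatedness table are all hypothetical and chosen only for illustration. They are not parameters from the cited studies.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectMemory:
    """Toy store of object representations, each with a match threshold.

    All numbers are hypothetical and in arbitrary units.
    """
    base_threshold: float = 100.0  # evidence needed for a match when unprimed
    rate: float = 0.5              # evidence recovered from the image per ms
    priming: float = 20.0          # threshold reduction applied to related items
    related: dict[str, set[str]] = field(default_factory=dict)
    thresholds: dict[str, float] = field(default_factory=dict)

    def threshold(self, name: str) -> float:
        return self.thresholds.get(name, self.base_threshold)

    def identify(self, name: str) -> float:
        """Return identification time (ms) for `name`, then prime its associates."""
        time_ms = self.threshold(name) / self.rate
        for other in self.related.get(name, set()):
            # Data-driven access to `name` lowers the match threshold of related items.
            self.thresholds[other] = max(0.0, self.threshold(other) - self.priming)
        return time_ms


memory = ObjectMemory(related={"stove": {"pan", "kettle"}})
print(memory.identify("stove"))        # 200.0: nothing primed yet
print(memory.identify("pan"))          # 160.0: primed by the stove
print(memory.identify("wheelbarrow"))  # 200.0: unrelated, so unprimed
```

The model needs no separate schema route. An object that fits the scene is identified faster only because objects encountered earlier in the same scene tend to be its associates.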

There is, however, one set of studies (Biederman, 1981; Biederman, Mezzanotte, & Rabinowitz, 1982) that suggests that this last model is quite insufficient as an account of object perception in real-world scenes. According to Biederman and his colleagues, the appearance of an object in such scenes defines several object-context relations that provide information about the object's identity. First, they point out, objects in scenes generally are supported by some surface and always occlude other objects or part of the scene's background. Consequently, two object-context relations are defined (i.e., Support and Interposition) that reflect the fundamental physical identity of objects, i.e., that they are entities with a certain mass and density. Second, objects are likely to appear in some scenes and not in others; they tend to occupy privileged positions in scenes they are likely to be found in; and they have typical and stable sizes relative to other objects and scene backgrounds. As a result, three additional relations are defined (i.e., Probability, Position, and Size), which can be informative in establishing the semantic identity of particular objects in a given scene.¹

This analysis suggests that in addition to the likelihood of an object being in a scene, its spatial appearance in that scene may affect its perception. Specifically, the question is raised whether the spatial structure inherent in real-world scenes provides a contextual definition of a set of relational object features (i.e., Support, Interposition, Position, and Size) that are taken into account during object perception. If this were the case, a model integrating inter-object priming with data-driven recovery of structural object features would clearly be inadequate as an account of object perception in natural scenes. While Support and Interposition effects on object perception could still be reconciled with such an account, this would not be possible for similar effects of Position and Size. Indeed, the former types of effects could be situated at the level of image-segmentation processes, which in data-driven models are assumed to be guided by non-specific knowledge of the general, physical constraints governing the appearance of surfaces in 3-D space (Marr, 1978; McArthur, 1982). Position and Size effects, however, would suggest that object perception is influenced by specific knowledge of the typical contents and spatial layout of a particular object-context configuration. Consequently, as Biederman and his colleagues argue, evidence for such effects would seem to indicate that concept-driven processing, guided by scene-specific schemas, needs to be posited in order to fully understand object perception in scenes.

¹ When used to indicate the name of object-context relations or effects produced by these relations, the words support, interposition, probability, position, and size will be capitalized.

The results presented by these authors appear to corroborate this position. Subjects asked to determine whether a prenamed target object had been present at a post-cued position in a briefly (150 ms) presented scene were slower and less accurate in responding when the object at the cued position violated Support, Probability, Position, or Size (i.e., when the object defied gravity or appeared in an atypical scene, position, or relative size). Furthermore, an increase in the number of simultaneous violations of these four relations produced clear increases in miss rates and correct reaction times, along with a very slight, but significant, increase of false-alarm rates. According to Biederman and his colleagues, these results clearly show that the object-identity information provided by these relations is actually used in object perception and becomes available through rapid access to an integrated representation of the typical contents and layout of the viewed scene. Consequently, they conclude that the perception of context-consistent objects in scenes should be viewed as based on a concept-driven detection of structural and relational object features, specified in a global scene-specific schema activated within the course of the first glance at a scene.

However, a number of arguments can be given to caution against this conclusion. The most important one is that response speed and accuracy in this kind of experiment may not measure the perceptibility of the positionally cued object at all. Rather, they may reflect the subject's degree of uncertainty in deciding post-perceptually whether the object was indeed the prenamed target. Specifically, we want to argue that the observed response patterns can quite adequately be explained as a result of educated guessing strategies to which the subjects resorted in the absence of proper structural object-feature information. Indeed, if object perception is based primarily on the recovery of such information from the incoming image, one can safely assume that a 150-ms masked exposure of a complex scene will frequently be insufficient to conclusively identify all the objects in it. Consequently, subjects asked to determine whether a prenamed target object was present at a post-cued position in a scene viewed under these conditions will frequently remain uncertain as to whether this was indeed the case. However, this does not mean that they are completely in the dark, since Antes, Mann, and Penland (1981) and Antes, Penland, and Metzger (1981) demonstrated that scene exposures of 100-150 ms can be sufficient to obtain some idea of the general theme of the scene. Furthermore, while detailed structural features of the cued object may not have been recovered during the scene's exposure, this might be possible for some of its gross spatial properties. These include the proportion of the scene's visual angle occupied by the object, its distance to the scene's ground plane, and its nearness and position in relation to other objects in the scene. From a comparison between these two types of contextual information and their a priori knowledge about the prenamed target object, subjects can generate post-perceptual guesses that will lead to the response patterns Biederman and his colleagues interpreted as reflecting variations in perceptibility of the cued object.

For target-present trials, this post-perceptual comparison will produce evidence against a "yes, the cued object was the target" response whenever a violation of Probability, Size, Position, or Support is introduced. For instance, deriving the theme "kitchen" from a scene will increase the subject's uncertainty that some unidentified blob in that scene is a "wheelbarrow." Clearly, this uncertainty will increase even further when the blob occupies only a tiny portion of the scene, is located at a great distance from the scene's ground plane, and does not appear anywhere near another potentially support-providing surface. Consequently, it is to be expected that the introduction of any of these violations will result in the observed increase in correct reaction times and miss rates.
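The guessing account can be made concrete with a toy Bayesian observer. The scene theme and whatever features were recovered supply a prior that the cued blob is the prenamed target. Each gross spatial cue that is inconsistent with the target then contributes a likelihood ratio below 1. All probabilities and likelihood ratios in this sketch are hypothetical. The observer only illustrates the verbal argument and is not a model fitted by the authors.

```python
import math

# Hypothetical likelihood ratios, P(cue | blob is target) / P(cue | blob is not target),
# for gross contextual cues that are INCONSISTENT with the prenamed target.
VIOLATION_LR = {
    "Probability": 0.3,  # the scene's theme argues against the target
    "Size": 0.5,         # the blob's visual angle is wrong for the target
    "Position": 0.6,     # the blob is far from where the target usually sits
    "Support": 0.6,      # nothing near the blob could support the target
}


def posterior_target(prior: float, violations: list[str]) -> float:
    """Probability that the cued blob is the target after weighing violated cues."""
    log_odds = math.log(prior / (1 - prior))
    for v in violations:
        log_odds += math.log(VIOLATION_LR[v])  # each violation is evidence against "yes"
    return 1 / (1 + math.exp(-log_odds))


def respond_yes(prior: float, violations: list[str], criterion: float = 0.5) -> bool:
    """Say "yes, it was the target" only if the posterior exceeds a fixed criterion."""
    return posterior_target(prior, violations) > criterion


prior = 0.7  # hypothetical: partial feature evidence mildly favors the target
for cues in ([], ["Probability"], ["Probability", "Size"]):
    print(cues, round(posterior_target(prior, cues), 3), respond_yes(prior, cues))
```

In this model, each violation lowers the posterior even though the blob itself was seen no worse. With a fixed criterion, the lower posterior shows up as more misses, and stacking cues (for example, Probability plus Size) lowers the posterior further.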

For trials on which not the target but some other object appears at the cued location, no systematic violation effects should be expected, since the violations do not pertain to the prenamed target. The post-perceptual comparison should therefore be largely noninformative and irrelevant to task performance. Note that Biederman et al.'s (1982) finding of a very slight, but significant, increase in false-alarm rates as more violations are introduced can hardly be viewed as a serious argument for rejecting the post-perceptual interpretation of the data in favor of an explanation in terms of object perceptibility. Indeed, apart from having themselves failed to replicate this finding (Klatsky, Teitelbaum, Mezzanotte, & Biederman, 1981), Biederman and his colleagues are equally unable to account for it. Within the framework of their theory, there is no obvious reason why subjects should be more likely to claim that an object is a prenamed target as the object in question becomes less perceptible.

Clearly, a post-perceptual explanation can deal with the main aspects of the Biederman data at least as well as a theory of schema-mediated object perception does. In fact, a brief analysis of some of the more detailed results reveals that it may even be superior.

A first argument to this effect is that false-alarm rates proved to be consistently higher when the prenamed target was probable in the scene than when it was improbable. Obviously, this is completely in line with the post-perceptual explanation, while it poses problems for the interpretation by Biederman et al. Specifically, this finding implies that knowledge associated with an individual object specified before the scene's exposure plays an important role in determining the subject's response. The question then becomes to what extent one can still maintain that responses in this experiment reflect influences of knowledge contained in a global scene-schema activated during the scene's exposure.

Second, Interposition violations turned out to have no effect at all. Following Biederman's rationale, this implies that thoroughly disturbing an object's featural structure by letting its background pass through it has no effect whatsoever on its perceptibility. One might argue that this only shows that relational object features are much more important for object perception in scenes than structural object features are. However, this interpretation presupposes that relational object features are generally sufficient to uniquely specify an object's identity, which we think is, at the least, a questionable assumption. The post-perceptual explanation, on the other hand, predicts the absence of Interposition effects without having to rely on this assumption. Specifically, it starts out from the idea that 100- to 150-ms scene exposures are generally insufficient to recover featural object structure, which logically entails that disturbances of that structure should have little effect on later decisions concerning the identity of the object.

Finally, it should be mentioned that simultaneous violations of Size and Probability produced higher costs than violations of Probability only. Within the framework of Biederman's theory, this is an inexplicable finding, since it implies that scene-specific schemas contain knowledge about the typical size relations that hold between that scene and all objects that typically do not appear in it. According to the post-perceptual explanation, however, this finding is to be expected. Even if apprehension of the scene's global theme suggests that the target was unlikely to be in it, subjects will still be able to determine whether the relative visual angle occupied by the blob at the cued position conforms to what could be expected if the target were to be placed in that particular scene. As a result, combined Size and Probability violations will produce more evidence against the presence of the prenamed target, leading to greater violation costs.

In conclusion, this discussion leaves us with the suggestion that an object's perception could be mandatorily affected by its spatial relations to the scene in which it appears, but indicates that unequivocal evidence of such effects still remains to be presented. In fact, we feel that this may be the case for the Probability effect as well. Just as we pointed out for the Biederman et al. (1982) study, all other experiments claiming to demonstrate this effect can be criticized for encouraging subjects to use contextual knowledge and information deliberately in order to reach high levels of task performance. These studies fall into two groups. Some require subjects to identify or to detect objects under viewing conditions that provide only degraded perceptual information about the objects (e.g., Antes, Penland, & Metzger, 1981). Others allow for a more detailed display exploration but demand that it be done in preparation for an object- and/or scene-recognition test (Antes & Penland, 1981; Friedman, 1979; Loftus & Mackworth, 1978; Palmer, 1975). As a result, subjects in these studies can be expected to be inclined to actively employ contextual knowledge and information, either to compensate for lack of information on structural object features or to facilitate memory-trace formation (Schank, 1982). Consequently, even if it can be assumed that a reliable measurement of object perceptibility has been employed in at least some of these experiments (e.g., Friedman, 1979), it is difficult to maintain that the observed effects of object probability constitute strong evidence for the notion of genuine and mandatory effects of context on everyday object perception.

Measuring mandatory context effects on object perception

In view of these considerations, and given the important implications that evidence of mandatory context effects could have for present models of object perception, we attempted to develop a paradigm that would provide a less disputable measurement of such effects. Specifically, it involves the recording of eye-movement patterns of subjects freely exploring line drawings of real-world scenes in order to count the number of "non-objects," i.e., meaningless closed figures with an object-like appearance, present in those scenes. This paradigm has two main advantages.

First, it uses a task in which subjects can easily achieve maximum performance without having recourse to an active use of contextual knowledge and information. All they have to do is scan the scene and determine whether any object-like entities they come across correspond to a known object. Given the absence of viewing constraints or mnemonic requirements in this "object decision task" (Huttenlocher & Kubicek, 1983; Kroll & Potter, 1984), the need for, and relevance of, deliberately capitalizing upon context may be considered minimal.

Second, the registration of eye-movement patterns under these conditions provides an unobtrusive, on-line measure of object perceptibility. Specifically, for each object fixated in the course of scene exploration, first-fixation duration can be determined, providing a measure commonly believed to directly reflect object-identification time (e.g., Friedman, 1979; Henderson et al., 1989). Henderson et al. (1989) pointed out that first-fixation duration may be an overly conservative measure of the actual time needed to complete object identification. For that reason, gaze duration was also recorded, i.e., the summed duration of all successive fixations on an object when it is looked at for the first time. However, since gaze durations are inherently more likely to reflect slower, post-perceptual processes as well as initial encoding (Inhoff, 1984; Rayner & Pollatsek, 1987), they should be interpreted with caution.
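The two dependent measures can be computed from a scene's fixation record. This minimal sketch assumes a hypothetical format in which each fixation is a pair of the fixated object (or None for background) and its duration in ms. First-fixation duration is taken from the first fixation that lands on an object. Gaze duration sums that fixation and any immediately following ones on the same object. Later refixations are ignored, as in the definition above.

```python
from typing import Optional, Sequence

Fixation = tuple[Optional[str], float]  # (object fixated or None, duration in ms)


def first_pass_measures(fixations: Sequence[Fixation]) -> dict[str, dict[str, float]]:
    """First-fixation and gaze duration for every object fixated at least once."""
    measures: dict[str, dict[str, float]] = {}
    i = 0
    while i < len(fixations):
        obj, duration = fixations[i]
        if obj is None or obj in measures:  # background, or a later refixation
            i += 1
            continue
        # First time the eyes land on `obj`: sum the run of consecutive fixations on it.
        gaze = 0.0
        j = i
        while j < len(fixations) and fixations[j][0] == obj:
            gaze += fixations[j][1]
            j += 1
        measures[obj] = {"first_fixation": duration, "gaze": gaze}
        i = j
    return measures


# Hypothetical fixation record for one scene.
record = [(None, 210), ("tire", 250), ("tire", 180), (None, 300),
          ("gas pump", 320), ("tire", 150)]
print(first_pass_measures(record))
```

In this example the final 150-ms look back at the tire counts toward neither measure for the tire, because it is not part of the first pass.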

With the use of this paradigm, the effects of various object-context relations on object perception were examined by a comparison of fixation times for objects undergoing relational violations (Violation conditions) with those for the same objects appearing in a normal relation to their context (Base condition). Specifically, the relations involved were Probability, Position, Size, and Support. Interposition was not considered, since it is impossible to violate this relation without disturbing the object's featural structure. Consequently, any costs that violations of this relation produced (i.e., longer object-fixation times in relation to the Base condition) would be difficult to interpret as a genuine contextual effect.
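The same objects appear in the Base condition and in each Violation condition, so a violation cost can be computed as a within-object difference. The sketch below assumes a hypothetical flat table of fixation times keyed by (object, condition), with made-up values. It reports the mean cost per violation type.

```python
from statistics import mean

VIOLATIONS = ("Probability", "Position", "Size", "Support")


def violation_costs(times: dict[tuple[str, str], float]) -> dict[str, float]:
    """Mean within-object cost (Violation minus Base, in ms) for each violation type.

    `times` maps (object, condition) to that object's fixation time in that condition.
    Objects lacking either the Base value or the violation value are left out of
    that particular comparison.
    """
    objects = {obj for obj, _ in times}
    costs: dict[str, float] = {}
    for cond in VIOLATIONS:
        diffs = [times[(obj, cond)] - times[(obj, "Base")]
                 for obj in sorted(objects)
                 if (obj, "Base") in times and (obj, cond) in times]
        if diffs:
            costs[cond] = mean(diffs)
    return costs


times = {  # hypothetical first-fixation durations (ms)
    ("motorcycle", "Base"): 240, ("motorcycle", "Size"): 280,
    ("gas pump", "Base"): 260, ("gas pump", "Size"): 270,
}
print(violation_costs(times))
```

Pairing each object with itself removes object-level differences in size, familiarity, or visual complexity from the comparison. A full analysis would also need inferential statistics over subjects and items, which this sketch omits.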

In addition to determining which of these relations affect object perception, we also decided to explore how the information they provide becomes available to the perceptual system. According to Biederman et al. (1982), this is the result of immediate scene-schema activation during the first glance at a scene, independently of, and prior to, individual object identification and 3-D scene segmentation. This view poses the formidable (and presently unanswered) challenge of outlining how an unlimited variety of possible "blob configurations" can immediately be perceived as instantiations of a limited set of prototypical scene representations. We therefore thought it worthwhile to perform a basic test of its plausibility. Specifically, one condition that needs to be met in order to further entertain this hypothesis is that context effects should already surface in the earliest stages of scene exploration. Indeed, if context effects only appeared later on in scene exploration, there would be no need to postulate the operation of an immediate, scene-encompassing mechanism for image interpretation. Instead, access to contextual information could then be viewed as a gradual process based on data-driven encoding and identification of local scene components and the spatial relations between them. In order to assess the plausibility of these contrasting views, we decided to investigate differences in fixation times for the normal and the violated objects as a function of the ordinal position of their first fixation in the entire fixation sequence recorded for the scene in which they appeared.
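The analysis by ordinal position needs, for each object, the position of its first fixation within the scene's whole fixation sequence. The sketch computes that index and splits it into early and late exploration. The function names and the cutoff value are hypothetical, since the text does not specify how ordinal positions were grouped. A schema account predicts violation costs already in the early group. A gradual, data-driven account allows them to appear only in the late group.

```python
from typing import Optional, Sequence


def first_fixation_ordinals(fixated: Sequence[Optional[str]]) -> dict[str, int]:
    """1-based position of each object's first fixation in a scene's full fixation sequence.

    `fixated` lists, fixation by fixation, which object was looked at (None = elsewhere).
    """
    ordinals: dict[str, int] = {}
    for position, obj in enumerate(fixated, start=1):
        if obj is not None and obj not in ordinals:
            ordinals[obj] = position
    return ordinals


def exploration_stage(ordinal: int, early_cutoff: int = 5) -> str:
    """Label a first fixation 'early' or 'late'; the cutoff of 5 is a hypothetical choice."""
    return "early" if ordinal <= early_cutoff else "late"


sequence = [None, "tire", "tire", None, "gas pump", None, None, "motorcycle"]
for obj, k in first_fixation_ordinals(sequence).items():
    print(obj, k, exploration_stage(k))
```

Violation costs would then be computed separately within each stage and compared.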

A final question addressed in the present study pertained to the precise role of contextual information in object perception. Specifically, does it directly influence the ease with which an object is apprehended in an image? Or does it serve as a framework for testing the plausibility of the output of a strictly data-driven analysis of structural object features? While the former view on context effects has been conceptualized in a number of different ways - e.g., context generates predictions for a concept-driven detection of structural object features (Friedman, 1979); context provides an additional set of relational object-diagnostic features (Biederman et al., 1982); context automatically reduces activation thresholds of stored object representations (Henderson et al., 1987) - all these models imply that a facilitatory component will be present in the context effect. This is in contrast with the "plausibility-checking" view, which holds that contextual information is merely used to endorse or to reject the output of the object-encoding process. Consequently, it predicts that the only effect of context will be to delay or to inhibit conclusive object identification in case of relational inconsistencies.

In order to determine which of these two views is more appropriate, a condition was included in which objects appeared out of scene context, i.e., in an array of isolated objects. Clearly, failure to find longer fixation times in this Isolation condition relative to the Base condition would indicate the absence of a facilitatory effect of contextual information and would suggest that it does not directly influence ease of object encoding. In this case, any violation costs observed in the Base-Violation comparisons could be interpreted as strictly inhibitory effects. If, on the other hand, fixation times were to be longer in the Isolation condition, this would not necessarily imply that the costs produced by any of the specific violations should (at least partly) be attributed to a lack of facilitation. This conclusion can only be drawn from a direct comparison between violated objects and the same objects in an identical scene providing neutral or no information with respect to the violated relations, but completely coherent otherwise. In view of the obvious difficulties associated with the construction of appropriate stimuli for such a comparison, it was decided to perform this Base-Isolation comparison only in order to provide an initial, overall test for the existence of any contextual facilitation.

Fig. 1a-g. Scene Versions 1-5 of the "Gas Station" are presented in 1a-1e, while Version 5 of the "Chemistry Lab" and of the "Playground" are presented in 1f and 1g, respectively, thus illustrating all violations of the Targets "motorcycle" and "gas pump" as well as the unviolated presence of the Bystander "tire" in all versions of the "Gas Station." a. Version 1 of Gas Station: Motorcycle (target 1) and gas pump (target 2) appear unviolated. b. Version 2 of Gas Station: Support violation for motorcycle, Size violation for gas pump. c. Version 3 of Gas Station: Position violation for motorcycle, Support violation for gas pump. d. Version 4 of Gas Station: Size violation for motorcycle, Position violation for gas pump. e. Version 5 of Gas Station: Probability violations for vacuum cleaner and barbecue. f. Version 5 of Chemistry Lab: Probability violations for motorcycle and parking meter. g. Version 5 of Playground: Probability violations for coffee cup and gas pump.

Method

Stimuli. By tracing and somewhat simplifying the contours in photographs and slides of real-world settings, we obtained line drawings of 27 different scenes. For each of these scenes, two objects likely to appear in it were designated as "targets," i.e., objects that would be subjected to relational violations. By the insertion of these objects into the scenes and by manipulation of target-scene relations, five different versions of each scene were constructed. In Version 1 both targets appeared in a perfectly normal relation to their context. In the remaining four versions targets appeared in violation of the object-context relations of interest: in Version 2 target 1 violated Support and target 2 violated Size; in Version 3 target 1 violated Position and target 2 violated Support; in Version 4 target 1 violated Size and target 2 violated Position. Finally, in Version 5 two targets belonging to two of the other scenes were inserted, producing a Probability violation for both objects. In this way a limited number of 135 stimuli (27 scenes × 5 versions) was sufficient to present a broad variety of 54 target objects in 1 Base and 4 Violation conditions. An illustration of the stimulus-construction procedure is provided in Figure 1.
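The rotation of violations over Versions 1-4 can be written down as a small design table. The table makes it easy to verify that every target ends up in the Base condition and in all four Violation conditions. The text does not specify which two scenes donate the Version 5 targets. The cyclic scheme used here is a hypothetical stand-in: scene s receives target 1 of scene s-1 and target 2 of scene s-2. It also does not check the semantic requirement that the borrowed objects be improbable in their new scene.

```python
# Conditions for (target 1, target 2) in Versions 1-4, as listed in the text.
DESIGN = {
    1: ("Base", "Base"),
    2: ("Support", "Size"),
    3: ("Position", "Support"),
    4: ("Size", "Position"),
}


def build_stimuli(n_scenes: int = 27) -> list[dict]:
    """One record per target placed in each (scene, version) stimulus.

    Targets are identified as (home_scene, slot). Version 5 donors follow a
    hypothetical cyclic scheme, not the authors' actual assignment.
    """
    stimuli = []
    for scene in range(n_scenes):
        for version, conditions in DESIGN.items():
            for slot, condition in enumerate(conditions, start=1):
                stimuli.append({"scene": scene, "version": version,
                                "target": (scene, slot), "condition": condition})
        # Version 5: two targets borrowed from two other scenes -> Probability violations.
        for slot, offset in ((1, 1), (2, 2)):
            donor = (scene - offset) % n_scenes
            stimuli.append({"scene": scene, "version": 5,
                            "target": (donor, slot), "condition": "Probability"})
    return stimuli


stimuli = build_stimuli()
conditions_per_target: dict[tuple[int, int], list[str]] = {}
for s in stimuli:
    conditions_per_target.setdefault(s["target"], []).append(s["condition"])

print(len({(s["scene"], s["version"]) for s in stimuli}))  # 135 scene stimuli
print(len(conditions_per_target))                          # 54 targets
```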

In order to verify that the relational manipulations in Scene Versions 2-5 did not incidentally produce visually more complex or disorganized scenes than Scene Version 1 (which could obviously bias comparisons of fixation times for normal and violated targets), a third probable object ("bystander") was selected for each scene. This bystander was inserted in all five versions of its scene (see Figure 1) such that it never violated any spatial relations, so that a comparison of bystander-fixation times as a function of Scene Version could be used to assess any effects the presence of relational anomalies might have on general ease of image processing. A complete list of scenes, targets, and bystanders is provided in Appendix A.

The construction of the 135 scene stimuli (27 scenes × 5 versions) was completed by the insertion of varying numbers of non-objects (most of them adapted from the set provided by Kroll and Potter, 1984). Specifically, 20% of the stimuli contained three different non-objects, 20% contained only two, 20% contained just one, and the remaining 40% contained no non-objects at all. This distribution scheme was used to encourage subjects (who saw all stimuli in order to count the number of non-objects) to explore each display completely. In addition, non-objects were distributed in such a fashion that they could not serve as predictors for other non-objects, targets, scenes, or scene versions. Furthermore, neither the 27 scenes nor the 5 scene versions differed in terms of the average number of non-objects they contained. After this insertion of non-objects, all scene stimuli were made into white-on-black slides subtending 30° × 20°.
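One way to obtain the reported 20/20/20/40% distribution while giving every scene the same average is to assign the five versions of each scene 3, 2, 1, 0, and 0 non-objects in a rotating order. The Python sketch below illustrates this; the rotation rule and the names `PATTERN` and `assign_nonobjects` are hypothetical, since the exact assignment is not described.

```python
from collections import Counter

N_SCENES, N_VERSIONS = 27, 5
# Non-object counts for the five versions of one scene: 20% get 3, 2 and 1; 40% get 0.
PATTERN = (3, 2, 1, 0, 0)


def assign_nonobjects(n_scenes=N_SCENES, n_versions=N_VERSIONS):
    """Map (scene, version) -> number of non-objects, rotating PATTERN across scenes."""
    return {
        (s, v): PATTERN[(s + v) % n_versions]
        for s in range(n_scenes)
        for v in range(n_versions)
    }


counts = assign_nonobjects()
dist = Counter(counts.values())
print({k: dist[k] / len(counts) for k in sorted(dist)})
# {0: 0.4, 1: 0.2, 2: 0.2, 3: 0.2}

# Every scene averages exactly 1.2 non-objects; version means vary slightly
# because 27 scenes do not divide evenly into cycles of five.
scene_means = [sum(counts[(s, v)] for v in range(N_VERSIONS)) / N_VERSIONS
               for s in range(N_SCENES)]
version_means = [sum(counts[(s, v)] for s in range(N_SCENES)) / N_SCENES
                 for v in range(N_VERSIONS)]
```

With this rotation the version means stay between about 1.1 and 1.3 non-objects, so balance across versions is approximate rather than exact.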

Finally, in order to measure ease of object identification out of scene context, all targets and a number of non-objects were presented on 26 white-on-black slides with 4 (non-)objects per slide. Across displays all targets appeared once, while the number of non-objects per slide was again varied to encourage subjects to scan each display completely. In order to avoid inter-object priming effects, no array contained objects that were semantically related and/or likely to appear in the same scene. Objects and non-objects were located 10° from the display's center at the corners of an imaginary square, with the positions of the two categories randomized across displays.
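The array layout amounts to four corner positions at 10° eccentricity, with the (non-)objects randomly assigned to corners. The Python sketch below treats degrees of visual angle as planar coordinates, which is an approximation; the function names are illustrative only.

```python
import math
import random

ECCENTRICITY_DEG = 10.0


def corner_positions(ecc=ECCENTRICITY_DEG):
    """(x, y) offsets in degrees for the corners of a square centred on the display,
    each corner lying ecc degrees from the centre."""
    d = ecc / math.sqrt(2)  # chosen so that hypot(d, d) == ecc
    return [(-d, d), (d, d), (-d, -d), (d, -d)]


def layout_array(items, rng):
    """Randomly assign exactly four (non-)objects to the four corners."""
    if len(items) != 4:
        raise ValueError("each array holds exactly four (non-)objects")
    shuffled = list(items)
    rng.shuffle(shuffled)
    return dict(zip(shuffled, corner_positions()))


rng = random.Random(0)
array = layout_array(["target A", "target B", "non-object 1", "non-object 2"], rng)
```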

Subjects, procedure and apparatus. Eight subjects from the University of Leuven subject pool participated in the experiment. All of the subjects had normal vision and none of them required corrective lenses.

Upon arriving for the experiment, subjects were seated comfortably 184 cm away from a slide projection screen. They were told that they would participate in one of a series of experiments on how good people are at detecting certain kinds of information in images of varying complexity. In this particular experiment they would have to determine whether slides of line drawings depicting real-world scenes and groups of isolated objects contained drawings of nonexistent objects. To illustrate the concept, subjects were given a page (see Appendix B) showing drawings of existing and nonexistent objects. They were then told that their accuracy in detecting non-objects would be evaluated in two ways. First, after each slide they would have to press a response key once for each non-object they had seen in the slide. Second, their eye movements would be registered during the entire display exposure in order to determine whether they had in fact localized all non-objects in the display or had just guessed how many were present.
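As a worked example of the viewing geometry, a region subtending an angle θ at viewing distance D has a linear extent of 2D·tan(θ/2). The Python sketch below assumes a flat screen viewed perpendicularly from its centre; under that assumption the 30° × 20° slides correspond to a projected image of roughly 99 cm × 65 cm at 184 cm, and 1° of visual angle to about 3.2 cm on the screen.

```python
import math

VIEWING_DISTANCE_CM = 184.0


def extent_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Width on a flat screen of a region subtending angle_deg, centred on the line of sight."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def offset_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Distance from the screen centre of a point angle_deg away from the line of sight."""
    return distance_cm * math.tan(math.radians(angle_deg))


width, height = extent_cm(30), extent_cm(20)
print(round(width, 1), round(height, 1))  # 98.6 64.9
print(round(offset_cm(1), 1))  # 3.2
```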

After these instructions, subjects were asked to place their head in a head-and-chin rest in order to eliminate head movements and to keep viewing distance constant. Subsequently, the eye-movement registration system was calibrated and subjects received a series of 164 trials, interrupted for a brief rest whenever the subject showed signs of fatigue. The first three trials served as practice and involved the presentation of three scenes, which did not return for the remainder of the experiment. On the following 135 trials all experimental scene stimuli were presented in an individually randomized order, with the restriction that the same scene or the same target could not appear twice in a row. On the last 26 trials the arrays of isolated objects were presented, again in an individually randomized order.
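A randomized order satisfying the no-repeat restriction can be generated by a greedy random draw with restarts. The Python sketch below shows one possible procedure, not necessarily the one used in the study; its stimulus model reuses a hypothetical Version-5 pairing (scene s borrows target 1 of scene s+1 and target 2 of scene s+2).

```python
import random


def build_stimuli(n_scenes=27):
    """(scene, version, targets) triples; the Version-5 target pairing is HYPOTHETICAL."""
    stimuli = []
    for s in range(n_scenes):
        own = frozenset({(s, 0), (s, 1)})
        for version in range(1, 5):
            stimuli.append((s, version, own))
        borrowed = frozenset({((s + 1) % n_scenes, 0), ((s + 2) % n_scenes, 1)})
        stimuli.append((s, 5, borrowed))
    return stimuli


def conflicts(a, b):
    """True if two stimuli share a scene or a target."""
    return a[0] == b[0] or bool(a[2] & b[2])


def constrained_order(stimuli, rng, max_restarts=1000):
    """Random order in which no two consecutive stimuli share a scene or a target."""
    for _ in range(max_restarts):
        remaining = list(stimuli)
        order = []
        while remaining:
            allowed = [i for i, s in enumerate(remaining)
                       if not order or not conflicts(order[-1], s)]
            if not allowed:
                break  # dead end: start over with a fresh draw
            order.append(remaining.pop(rng.choice(allowed)))
        else:
            return order
    raise RuntimeError("no valid order found; increase max_restarts")


order = constrained_order(build_stimuli(), random.Random(42))
```

Restarting on a dead end keeps the procedure simple; because conflicts are rare relative to the 135 items, only a few restarts are typically needed.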

Each trial involved the following events. First, a fixation cross was presented in the center of the screen and subjects were instructed to fixate it until a slide appeared. Once their gaze had settled on the fixation cross, the experimenter initiated an 8-s exposure of a stimulus slide. Pilot research had indicated that this was sufficient for a complete exploration of the displays. A fixed, experimenter-controlled exposure duration was used in order to create similar viewing conditions for all displays and to prevent subjects from terminating displays before they had examined them completely. After the display had terminated, subjects responded by pressing a key in front of them. These keypresses were not registered, but subjects were not aware of this, since no feedback was given until the end of the experiment, which lasted about 50-60 min.

Throughout the whole series of trials, eye movements were recorded by means of a Debic-80 eye tracker operating on the principles of the pupil-center corneal reflection method (Young & Sheena, 1975). This system has an accuracy of 0.5° and a sampling rate of 50 Hz, and was interfaced with a PDP 11/40 computer keeping a complete record of the X and Y coordinates of the subject's point of regard. This on-line recording was broken down into separate fixations by means of a data-reduction program defining a fixation as a sequence of at least four consecutive measurements (i.e., 80 ms) whose X and Y coordinates did not deviate more than 1° from the mean X and Y coordinates of the previous measurements that constituted the fixation. The coordinates of the fixations were then compared with the coordinates of the targets and bystanders in the displays, in order to determine first-fixation and gaze durations for these objects.
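The fixation criterion and the two fixation-time measures can be sketched in Python as follows. The parser follows the stated rule (at least four 20-ms samples whose X and Y coordinates stay within 1° of the running mean); treating lost samples (None) as fixation breaks is an assumption. First-fixation and gaze duration use their conventional definitions (the duration of the first fixation on an object, and the summed duration of consecutive fixations on it starting with that first fixation), and the rectangular object regions are a simplification.

```python
from dataclasses import dataclass

SAMPLE_MS = 20      # 50-Hz sampling
MIN_SAMPLES = 4     # at least four samples (80 ms)
MAX_DEV_DEG = 1.0   # maximum deviation from the running mean, per axis


@dataclass
class Fixation:
    x: float
    y: float
    start_ms: int
    duration_ms: int


def _emit(samples, start, out):
    """Append a Fixation if the candidate sequence is long enough."""
    if len(samples) >= MIN_SAMPLES:
        mx = sum(p[0] for p in samples) / len(samples)
        my = sum(p[1] for p in samples) / len(samples)
        out.append(Fixation(mx, my, start * SAMPLE_MS, len(samples) * SAMPLE_MS))


def parse_fixations(samples):
    """samples: (x, y) positions in degrees, or None for lost samples (blinks, track loss)."""
    fixations, current, start = [], [], 0
    for i, p in enumerate(samples):
        if p is None:
            _emit(current, start, fixations)
            current = []
            continue
        if current:
            mx = sum(q[0] for q in current) / len(current)
            my = sum(q[1] for q in current) / len(current)
            if abs(p[0] - mx) <= MAX_DEV_DEG and abs(p[1] - my) <= MAX_DEV_DEG:
                current.append(p)
                continue
            _emit(current, start, fixations)
        current, start = [p], i
    _emit(current, start, fixations)
    return fixations


def first_fixation_and_gaze(fixations, region):
    """Return (first-fixation ms, gaze ms, ordinal number of the first fixation) for a
    rectangular region (xmin, ymin, xmax, ymax), or None if it was never fixated."""
    x0, y0, x1, y1 = region

    def inside(f):
        return x0 <= f.x <= x1 and y0 <= f.y <= y1

    for i, f in enumerate(fixations):
        if inside(f):
            gaze = 0
            for g in fixations[i:]:
                if not inside(g):
                    break
                gaze += g.duration_ms
            return f.duration_ms, gaze, i + 1
    return None
```

Returning the ordinal number as well is convenient for the later analyses by Fixation Moment, which depend on when in the scan an object was first fixated.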

Results and discussion

Violation ratings

In order to determine whether object-context relations had been manipulated successfully, four independent raters unfamiliar with the stimuli and the experimental hypotheses received a detailed explanation and examples of the relations and their violations. Subsequently, they were shown all scene slides in an individually randomized order. Along with each scene slide, raters were given a photocopy of the same scene from which targets and bystanders had been removed, while their position was indicated by a red rectangle. For each slide, raters were asked first to identify the objects at the positions indicated on the photocopy, and then to rate the extent to which each of these objects violated the Probability, Position, Size, and Support relations. Separate rating scales ranging from 0 ("the violation is absent") to 10 ("the violation is very clear") were provided for each relation. Raters were encouraged to use the whole rating scale in giving their judgments.

Table 1. Mean object size, violation, and camouflage ratings for fixated targets in the Base and Violation conditions.

              Physical characteristics     Rated violation strength
Condition     Object size   Camouflage     Probability   Position   Size    Support
              (degrees)     (ratings)
Base          7.9           1.8            0.397         0.991      1.015   0.470
Probability   8.2           2.1            6.852         2.426      1.588   0.384
Position      7.4           2.2            0.632         7.050      1.177   2.200
Size          6.2           1.4            0.497         1.908      6.032   0.561
Support       6.4           2.3            0.474         4.014      1.144   8.657

All four raters agreed on the identity of targets and bystanders, with the exception of two targets (i.e., "milk can" and "vice") and one bystander ("crate"). These objects were therefore eliminated from all subsequent analyses.

The degree of consensus in judging violation strength was examined by computing interrater correlations (IRCs) on the remaining 390 ratings (52 targets and 26 bystanders appearing five times each) provided by each rater. Average IRCs were 0.807 for Probability, 0.564 for Position, 0.604 for Size, and 0.791 for Support. The IRC for Position very likely underestimates the actual consensus about the strength of Position violations. Indeed, as Biederman et al. (1982) have already pointed out, it is not clear how raters will tackle the problem of rating Position violations for objects violating Probability and Support. That this situation causes confusion is evident from the substantially higher average IRC for Position (0.669) obtained when only objects rated below 5 on the Support and Probability scales are taken into consideration. In any case, these results do confirm the intuitively plausible hypothesis that an object's typical position and relative size in real-world scenes are less strictly defined than either its likelihood of appearing in a given scene or its need for support. Consequently, if one wishes to examine the possible impact of Position and Size on object perceptibility by violating these relations, it is of the utmost importance to create stimuli in which at least the relative strength of these violations varies considerably.
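The IRCs can be read as the mean Pearson correlation over all rater pairs (six pairs for four raters), computed across the 390 ratings on a given scale. The Python sketch below assumes that reading, since the exact averaging procedure (for instance, whether Fisher's z transform was used) is not stated. The `subset` helper illustrates the restricted Position analysis; which items to keep (e.g., those rated below 5 on the Support and Probability scales) is left to the caller.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equally long rating lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def average_irc(ratings_by_rater):
    """Mean Pearson correlation over all pairs of raters (6 pairs for 4 raters)."""
    pairs = list(combinations(ratings_by_rater, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def subset(ratings_by_rater, keep):
    """Restrict every rater's list to the item indices in keep."""
    return [[r[i] for i in keep] for r in ratings_by_rater]
```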

This condition was tested for the present set of stimuli by examining the mean violation ratings for the normal and violated targets. Since differences in relational

2 Since subjects could freely explore the displays, with an inevitable loss of at least 10% of the eye-position measurements due to blinking and track loss, we could not expect to record an observation for all possible 2,080 target fixations (i.e., 52 targets × 5 Violation conditions × 8 subjects). Fortunately, a large proportion of this total was obtained, i.e., 82% or 1,700 data points. In addition, in a Subjects × Violation analysis of variance on the proportions of missed target fixations, neither main effect yielded an F ratio greater than 1, indicating that misses were randomly distributed across subjects and violation conditions.

violations were to be related to differences in fixation times, only those targets that had been fixated during the actual experiment were taken into account. 2

As can be seen in Table 1, rated violation strengths were in close agreement with the intended manipulations of target-scene relations. Specifically, in each of the violation conditions the intended violation (i.e., the rating on the scale bearing the condition's own name) was always far stronger than any of the other violations. In addition, while the various conditions clearly differed in terms of the intended violations, the differences with respect to the non-intended violations were much smaller. Consequently, the ratings support the position that fixation-time comparisons between the Base and each of the Violation conditions will indeed allow for an assessment of the effects produced by the intended violations.

Finally, an inspection of the mean violation ratings for the fixated bystanders in each of the five scene versions revealed that none of the violations received a rating higher than 1.5. It can therefore be assumed that bystander-scene relations always appeared to be normal. Consequently, comparisons of bystander-fixation times in the Base and Violation versions can be considered to reliably reflect any incidental effects the introduction of relational anomalies may have had on general ease of image processing.

Physical object characteristics

In order to further refine the measurement of context effects on object perceptibility, we also collected data on absolute object size and degree of object camouflage, which in previous research (e.g., Antes & Penland, 1981; Klatsky et al., 1981) have been identified as possible physical determinants of ease of object identification. The absolute size of each of the targets and bystanders in each of the experimental stimuli was measured as maximum object length × width in degrees of visual angle. Camouflage ratings were obtained by having the four judges of violation strength evaluate the degree to which object features were masked by adjacent contours. Ratings were given on a 0 (no camouflage) to 10 (very strong camouflage) scale. While raters were encouraged to use the whole scale, they showed considerable consensus (interrater correlations averaged 0.728) in their judgment that objects were only slightly camouflaged (mean ratings ranged from 1.51 to 2.98 with SDs ranging from 1.23 to 2.34). Mean camouflage ratings and object size for targets fixated in the Base and Violation conditions are presented in Table 1.


Table 2. Mean first-fixation (FFD) and gaze durations (in ms) as a function of Violation.

                     Violation
Measure   Base        Probability   Position    Size        Support
FFD       198 (199)   220 (220)     226 (225)   206 (208)   214 (212)
Gaze      304 (304)   335 (332)     389 (381)   294 (306)   332 (330)

Note: Values in parentheses are means adjusted for differences in object camouflage and size.

Violation effects on object perception

In order to determine which violations had an effect on ease of object perception, first-fixation and gaze durations for normal and violated targets were entered in a Subjects × Violation analysis of variance, revealing a significant effect of Violation on both measures, F (4,28) = 3.85, p <.05 for first-fixation durations, and F (4,28) = 8.02, p <.01 for gaze durations. 3 Dunnett comparisons (Kirk, 1982, pp. 112 f.) between the means for the Base and each of the Violation conditions (see Table 2) showed significantly longer first-fixation durations for targets undergoing Position, tD' (28) = 3.0, p <.05, and Probability, tD' (28) = 2.35, p <.05, violations. Gaze durations for targets violating Position, tD' (28) = 4.97, p <.01, also proved to be longer. Two aspects of these data deserve further consideration.
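The logic of the Subjects × Violation analysis and the Dunnett comparisons can be illustrated with a simplified Python sketch that works on one mean per subject and condition. This is a simplification: the study itself used individual fixations as the unit of analysis with a General Linear Model solution for unequal cell sizes (see footnote 3), so the sketch reproduces the degrees of freedom of the reported tests, not their values. Critical values for Dunnett's test still have to be taken from a table.

```python
from math import sqrt


def rm_anova_oneway(y):
    """One-way repeated-measures ANOVA on an n_subjects x k_conditions table of cell means.

    The Subjects x Condition interaction serves as the error term, giving
    df = (k - 1, (k - 1)(n - 1)), i.e. (4, 28) for 8 subjects and 5 conditions.
    Returns (F, df_effect, df_error, ms_error).
    """
    n, k = len(y), len(y[0])
    grand = sum(map(sum, y)) / (n * k)
    cond = [sum(row[j] for row in y) / n for j in range(k)]
    subj = [sum(row) / k for row in y]
    ss_cond = n * sum((m - grand) ** 2 for m in cond)
    # Residual after removing subject and condition effects = interaction.
    ss_error = sum((y[i][j] - subj[i] - cond[j] + grand) ** 2
                   for i in range(n) for j in range(k))
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    ms_error = ss_error / df_error
    return (ss_cond / df_cond) / ms_error, df_cond, df_error, ms_error


def dunnett_t(y, control=0):
    """t statistics for each condition versus the control column; compare them with
    Dunnett's critical values (not computed here) for k - 1 comparisons."""
    n, k = len(y), len(y[0])
    ms_error = rm_anova_oneway(y)[3]
    means = [sum(row[j] for row in y) / n for j in range(k)]
    se = sqrt(2 * ms_error / n)
    return {j: (means[j] - means[control]) / se for j in range(k) if j != control}
```

Because subject differences are removed before the error term is formed, adding a constant to all of one subject's scores leaves F unchanged, which is the point of the within-subjects design.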

First, while Position and Probability violations produced similar effects on first-fixation durations, the gaze results only reveal a clear effect for Position violations. One explanation for this discrepancy is that Position simply has a stronger effect on object perception and that its violation therefore necessitates a more extensive and time-consuming analysis of object detail than can be carried out within the course of a single object fixation. As has been mentioned before, however, gaze durations are inherently more likely than first-fixation durations to reflect slower (possibly post-perceptual) processes as well. Consequently, the observed pattern of results could also be taken to suggest similar effects of Probability and Position on initial object processing (i.e., encoding or identification), and a stronger Position effect on subsequent aspects of object processing (e.g., evaluating the plausibility of an encoded object or allocating extra fixation time for interesting or puzzling objects).

Secondly, Size and Support violations did not produce significantly longer object-fixation times, which would indicate that these relations have no effect on object perception. But since the Support means more closely resemble the Position

and Probability means than the Base mean, while Biederman et al. (1982) concluded that Support (and especially Size) do have considerable effects on object perceptibility, one might suspect an incidental dilution of these effects in the present experiment. Indeed, an inspection of mean object camouflage and absolute object size (see Table 1) reveals that targets fixated in the Size and Support conditions were smaller, and in the Size condition also somewhat less camouflaged, than those fixated in the Base condition. This is in contrast to the targets fixated in the Position and Probability conditions, which were slightly more camouflaged and more similar in size to the targets fixated in the Base condition. In order to determine whether these incidental differences in physical object characteristics could have obscured the effects of Size and Support violations, object size and camouflage were entered as covariates in the Subjects × Violation analyses. Preliminary tests for the within-cell heterogeneity of regression slopes (Cliff, 1987, pp. 282 f.) proved to be nonsignificant for both first-fixation, F (56,1602) <1, and gaze durations, F (56,1602) = 1.21, p >.10, allowing for a straightforward interpretation of the adjusted effects and condition means. Specifically, the adjusted main effects of Violation remained significant, F (4,28) = 3.71, p <.05 for first-fixation durations, and F (4,28) = 7.21, p <.01 for gaze durations. In addition, object camouflage had an effect on first-fixation, F (1,1658) = 6.58, p <.05, and gaze durations, F (1,1658) = 27.26, p <.01, with a greater degree of camouflage associated with longer fixation times. Finally, absolute object size had an effect on gaze durations, F (1,1658) = 39.16, p <.01, with larger objects receiving longer gazes. Mean target fixation times for the Base and Violation conditions, adjusted for differences in physical object characteristics, are presented in Table 2.
The obvious similarity between the adjusted and the unadjusted means reported in this table already indicates that incidental Base-Violation differences in object size and camouflage were too small to attenuate the measurement of violation effects. This was confirmed by Dunnett tests of the adjusted Base-Violation differences, which yielded a pattern of significances identical to that found for the unadjusted means.
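The adjusted means follow the standard analysis-of-covariance adjustment, in which each condition mean is corrected by the pooled within-condition regression slopes times the condition's deviation from the grand covariate means: adjusted mean_j = mean_j - b1 (X1_j - X1) - b2 (X2_j - X2), with object size and camouflage as the two covariates. The Python sketch below implements this textbook formula; it omits the Subjects factor and the heterogeneity-of-slopes test, so it illustrates the adjustment rather than reproducing the reported analysis.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def pooled_within_slopes(groups):
    """groups: {condition: [(y, x1, x2), ...]}.
    Least-squares slopes (b1, b2) of y on the two covariates, pooled within groups."""
    s11 = s22 = s12 = s1y = s2y = 0.0
    for rows in groups.values():
        my = _mean([r[0] for r in rows])
        m1 = _mean([r[1] for r in rows])
        m2 = _mean([r[2] for r in rows])
        for y, x1, x2 in rows:
            d1, d2, dy = x1 - m1, x2 - m2, y - my
            s11 += d1 * d1
            s22 += d2 * d2
            s12 += d1 * d2
            s1y += d1 * dy
            s2y += d2 * dy
    det = s11 * s22 - s12 * s12  # solve the 2 x 2 normal equations (Cramer's rule)
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


def adjusted_means(groups):
    """Covariate-adjusted condition means: mean_j - b1*(x1_j - x1) - b2*(x2_j - x2)."""
    b1, b2 = pooled_within_slopes(groups)
    all_rows = [r for rows in groups.values() for r in rows]
    g1 = _mean([r[1] for r in all_rows])
    g2 = _mean([r[2] for r in all_rows])
    return {
        c: _mean([r[0] for r in rows])
        - b1 * (_mean([r[1] for r in rows]) - g1)
        - b2 * (_mean([r[2] for r in rows]) - g2)
        for c, rows in groups.items()
    }
```

When conditions differ in their covariate means, the raw difference between condition means mixes the condition effect with covariate effects; the adjustment removes the part predicted by the pooled slopes.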

As a final control on the reliability of the observed violation effects, fixation times for the nonviolated bystanders in the Base and four Violation versions of each scene were entered into a Subjects × Scene Version analysis of variance. Effects of Scene Version did not approach significance, F (4,28) < 1 for both first-fixation and gaze durations, and also failed to do so when adjusted for effects of object camouflage and absolute object size. It can therefore be concluded that the presence of relational anomalies did not affect general ease of image processing and that the observed Base-Violation differences in target-fixation times do in fact reflect specific violation effects on the perceptibility of the violated object.

3 This and all subsequent analyses of object-fixation times were performed with individual object fixations as the unit of analysis. The inevitable disproportionality of cell n's associated with this approach (see footnote 2) was dealt with by using the General Linear Model approach to analysis of variance as outlined by Kirk (1982).

The origin of violation effects

Having established the presence of violation effects, we performed a further analysis in order to investigate whether


they surfaced in the earliest stages of scene exploration, which would support Biederman's (1981) claim that contextual information becomes available through immediate activation of a global scene-schema. For this purpose we determined the ordinal position of all first target fixations in the fixation sequence recorded for the scene in which the target appeared. The median first-fixation number (8) was then used as a cut-off point for dividing all target fixations into an "Early Fixation" and a "Late Fixation" group. Subsequently, fixation times for the normal and the violated targets were entered into a Subjects × Violation × Fixation Moment analysis of variance. For first-fixation durations this analysis revealed a marginally significant effect of Violation, F (4,28) = 2.43, p = .07, a main effect of Fixation Moment, F (1,7) = 38.16, p <.001, with longer first-fixation durations in the Late Fixation group, and a significant Violation × Fixation Moment interaction, F (4,28) = 4.88, p <.05. The gaze results followed the same pattern, with main effects of Violation, F (4,28) = 9.09, p <.001, and Fixation Moment, F (1,7) = 15.14, p <.05, along with a significant Violation × Fixation Moment interaction, F (4,28) = 5.24, p <.05. Mean fixation times as a function of Violation and Fixation Moment are presented in Table 3.
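The median split by Fixation Moment can be sketched in Python as below. The ordinal number is the position of the object's first fixation in its scene's fixation sequence; how observations falling exactly on the median were assigned is not reported, so counting them as early is an assumption.

```python
from statistics import median


def split_by_fixation_moment(first_fixations):
    """first_fixations: list of dicts with an 'ordinal' key (ordinal number of the first
    fixation on the target within its scene's fixation sequence).
    Returns (early, late, cutoff); ties at the median go to 'early' (an assumption)."""
    cutoff = median([f["ordinal"] for f in first_fixations])
    early = [f for f in first_fixations if f["ordinal"] <= cutoff]
    late = [f for f in first_fixations if f["ordinal"] > cutoff]
    return early, late, cutoff
```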

With respect to first fixations, Dunn-Sidak tests (Kirk, 1982, pp. 110 f.) of the Base-Violation differences as a function of Fixation Moment revealed significantly longer durations for late-fixated objects violating Probability, tDS (28) = 3.87, p <.05, or Position, tDS (28) = 3.59, p <.05. A similar, but slightly smaller, effect was found for objects violating Support, tDS (28) = 3.04, p <.05, which explains why the Support effect resembled the Probability and Position effects in the overall analysis but failed to reach significance there. The gaze results replicated this pattern, with significantly longer durations for late-fixated objects violating Probability, tDS (28) = 3.12, p <.05, Position, tDS (28) = 3.58, p <.05, and Support, tDS (28) = 2.98, p <.05. In addition, the Base-Position difference for early-fixated objects also proved to be significant, tDS (28) = 4.51, p <.05. An analysis of covariance, analogous to the previous analyses, was performed to adjust the results for possible confounding influences of incidental differences in absolute object size and object camouflage. As can be seen in Table 3, these differences were again too small to attenuate the pattern of results.
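The Dunn-Sidak procedure controls the familywise error rate by testing each of C comparisons at alpha' = 1 - (1 - alpha)^(1/C). A Python sketch of that correction follows; treating these tests as a family of eight comparisons (four Base-Violation contrasts at each of two Fixation Moments) is an assumption, since the family size is not stated.

```python
def sidak_alpha(family_alpha, n_comparisons):
    """Per-comparison alpha that keeps the familywise error rate at family_alpha
    for n_comparisons independent tests (Dunn-Sidak correction)."""
    return 1 - (1 - family_alpha) ** (1 / n_comparisons)


# Assumed family: 4 Base-Violation contrasts at each of 2 Fixation Moments.
alpha_each = sidak_alpha(0.05, 8)
print(round(alpha_each, 5))  # 0.00639
```

The Sidak level is always slightly less strict than the Bonferroni level alpha/C (here 0.00625), which is why it is sometimes preferred for small families of planned contrasts.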

Overall, the data quite clearly argue against the view that contextual information is available or influences object perception right from the very first scene fixation. Hence, there seems to be no justification for the hypothesis that the observed context effects should be attributed to the immediate activation of a scene-specific schema, independently of detailed encoding of local scene components and the spatial relations between them. Before accepting this conclusion, however, we should address two additional issues.

First, there is the early Position effect on gaze durations. In our opinion, this does not invalidate the above conclusion, since it is not paralleled by a similar effect on first-fixation durations. As was argued above, such a discrepancy strongly suggests that the gaze results also reflect violation effects on processes following initial object encoding or identification. One explanation consistent with this argument is that upon (partial) identification of a violated object during its first fixation, the relational anomaly becomes so evident that it causes a perceptual "double-take", which will be reflected in the gaze durations. The fact that this effect appears only later for the Probability and Support violations would then indicate slower access to the contextual information required for these violations to become evident. This is not an unreasonable hypothesis when one considers the nature of the information needed to detect these various relational violations. For a positional anomaly to become evident, a rudimentary analysis of the violated object's immediate context can often be sufficient (e.g., the identification of a wide variety of objects, such as a telephone booth, a gas pump, a shopping trolley, an ironing board, etc., will create an immediately obvious anomaly when they are perceived to stand on top of another shape). For the detection of a Probability violation, however, such a coarse analysis of local context will generally be less informative (e.g., the identification of a gas pump next to a similar shape will not make the gas pump anomalous unless that shape has been identified as a refrigerator and unless additional contextual information identifies the scene as a kitchen rather than a gas station). A similar argument can be applied to the detection of Support violations. Indeed, unless salient depth cues (e.g., occlusions, shadows, differences in texture gradients) are available in the immediate environment of a floating object, depth information from other parts of the scene will be required in order to anchor the object and its immediate background in a 3-D coordinate system and to detect the Support violation.

Table 3. Mean first-fixation and gaze durations (in ms) as a function of Violation and Fixation Moment.

                                Violation
Fixation moment   Measure   Base        Probability   Position    Size        Support
Early             FFD       193 (195)   193 (192)     203 (201)   200 (203)   191 (188)
                  Gaze      296 (298)   287 (288)     383 (375)   291 (301)   289 (290)
Late              FFD       203 (204)   248 (247)     246 (249)   213 (213)   239 (237)
                  Gaze      323 (314)   383 (378)     391 (387)   296 (311)   374 (370)

Note: For parentheses, see the note to Table 2.

A second argument that might be advanced against interpreting these data as showing gradually developing effects of context on object perception stems from a number of studies demonstrating an overall increase in fixation durations as a function of time spent exploring pictorial displays (e.g., Antes, 1974; Locher & Nodine, 1987). While the precise origin of this increase remains to be established, with candidate explanations including an increase in processing load, a shift from global orientation to detailed examination stages (Nodine, Carmody, & Kundel, 1978), and a decrease in extrafoveal-preview benefit (Henderson et al., 1989), its consistent observation carries an important warning for


Fig. 2. Cumulative proportion of targets fixated as a function of ordinal fixation number and Violation condition. [Plot not reproduced. Y-axis: cumulative proportion of targets fixated, 0.0 to 1.0; x-axis: ordinal fixation number; one curve each for the Base, Probability, Position, Size, and Support conditions.]

the present study. Specifically, it underscores the necessity of ruling out an incidental tendency for normal objects to be fixated earlier in scene exploration than violated objects, which would create the false impression of (late) context effects in our analyses. To examine this possibility, the cumulative proportion of targets fixated was plotted against the ordinal fixation number for the Base and each of the Violation conditions. As can be seen in Figure 2, differences between the Base and the Violation distributions are minimal and show no systematic tendency for objects to be fixated earlier in the Base condition than in the conditions producing violation costs.
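Curves like those in Figure 2 can be computed from each target's ordinal first-fixation number. The Python sketch below uses all targets in a condition as the denominator, so never-fixated targets keep the curve below 1.0; whether the figure used this denominator or only fixated targets is an assumption, and the example data are invented for illustration.

```python
def cumulative_fixated(first_ordinals, max_ordinal):
    """first_ordinals: ordinal fixation number of each target's first fixation, or None
    if the target was never fixated. Returns the cumulative proportion of targets
    fixated by fixation 1, 2, ..., max_ordinal (denominator: all targets)."""
    counts = [0] * (max_ordinal + 1)
    for o in first_ordinals:
        if o is not None and o <= max_ordinal:
            counts[o] += 1
    curve, running = [], 0
    for i in range(1, max_ordinal + 1):
        running += counts[i]
        curve.append(running / len(first_ordinals))
    return curve


# Invented example data for two conditions.
by_condition = {"Base": [1, 2, 2, None], "Position": [1, 3, None, None]}
curves = {c: cumulative_fixated(o, 3) for c, o in by_condition.items()}
```

A systematic lead of the Base curve over the Violation curves would signal that normal objects tend to be fixated earlier, which is the confound this check is meant to rule out.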

It therefore seems reasonable to conclude that at least the first-fixation data reliably reflect gradually developing effects of Probability, Position, and Support on object perception.

Contextual facilitation of object perception

As was pointed out above, we intended to obtain only an overall test for facilitatory effects of context on object perception by means of a Base-Isolation comparison of object-fixation times. A distinction between inhibitory and/or facilitatory influences of specific object-context relations was not considered possible, given the difficulty of creating the appropriate neutral conditions. The pattern of first-fixation data in Table 3, however, does appear to justify the conclusion that the observed violation effects reflect pure inhibition rather than a lack of facilitation. This is suggested by the fact that their late appearance is entirely due to an increase of fixation times for the violated objects, while fixation times for normal objects do not decrease.

However, a closer inspection of studies examining the overall evolution of fixation durations during picture exploration cautions against this conclusion. Specifically, studies reporting fixation-duration increases have typically used stimuli such as TAT cards (Antes, 1974), abstract paintings (Locher & Nodine, 1987), cartoons (Nodine et al., 1978), and arrays of isolated objects (Antes & Penland, 1981; Henderson et al., 1989). Loftus (1983), however, did not find evidence for such an increase when subjects were shown naturalistic scenes. The reason for this discrepancy may well be that coherent real-world scenes provide stronger contextual constraints on the identification of individual scene components. On this view, the nonsignificant increase of object-fixation times in the Base condition may actually be the result of two counteracting processes, i.e., a general tendency for fixation times to increase during scene exploration and a gradually developing contextual facilitation of object identification. By the same token, the increase in first-fixation durations in the Probability, Position, and Support conditions may (at least partly) reflect a lack of the facilitation normally produced by good context, rather than pure inhibition due to bad context.

In view of these considerations, we decided to compare Base-Isolation differences in first-fixation durations as a function of Fixation Moment. Disregarding the possible effects of physical determinants of first-fixation duration, the dual-process interpretation of the Base-Violation differences predicts that a clear Base-Isolation difference (i.e., longer first-fixation durations in the Isolation condition) should be most pronounced in the later stages of display exploration. To test this, we again determined the ordinal fixation number for all first target fixations and used the median (i.e., 8 for the Base condition and 5 for the Isolation condition) as a cut-off point for dividing the observations into "Early Fixation" and "Late Fixation" groups.


A subsequent Subjects × Context × Fixation Moment analysis of variance revealed a significant main effect of Fixation Moment, F (1,7) = 28.68, p <.001, and a Context × Fixation Moment interaction, F (1,7) = 5.96, p <.05.

The means relevant to this interaction are presented in Table 4 and show that the late appearance of a significant Base-Isolation difference, tDS (7) = 4.15, p <.05, is entirely due to a significant increase of first-fixation durations in the Isolation condition, tDS (7) = 4.88, p <.01.4 This pattern of results is quite compatible with the notion that a coherent scene provides gradually developing constraints that directly enhance ease of object encoding, thus compensating for a general tendency for fixation times to increase during scene exploration. Moreover, while our experiment did not allow for a direct test of facilitatory effects of the manipulated object-context relations, the similarity between the patterns of Base-Isolation and Base-Violation differences strongly suggests that Probability, Position, and Support information are part of these contextual constraints.

General discussion

The main concern of the present research was to establish whether scene context has a genuinely perceptual and mandatory effect on individual object identification. For this purpose, we manipulated object-context relations and the presence or absence of good scene context, and examined their effects on first-fixation durations for objects incidentally fixated during a display-exploration task that required no active use of contextual knowledge or information. Longer fixation durations for relationally violated and isolated objects in this task indicated that context does indeed affect object perception and that at least part of this effect is facilitatory.

While a similar general conclusion has been reached in a number of other studies, the specific observations and theoretical accounts offered there are quite inconsistent with the present data. Empirically, the crucial discrepancy is between the apparently immediate context effects on object identification in tachistoscopically presented scenes (Biederman et al., 1982; Boyce, Pollatsek, & Rayner, 1989) and the present failure to observe such effects in the initial stages of free scene exploration. Two explanations can be offered for this discrepancy.

First, as was discussed in the Introduction, the immedi- ate effects may be a product of sophisticated guessing strategies rather than a perceptual phenomenon. Evidence of tremendous drops in task performance when accuracy of

4 While it could be argued that the absence of object camouflage in the Isolation condition may have obscured an early advantage of the Base over the Isolation condition, we do not believe this to be the case. In order to obtain an estimate of the possible camouflage effect in this analysis, we took the regression coefficient associated with object camouflage in the Subjects x Violation x Fixation Moment analysis of covariance on first-fixation durations (i.e., 5.48) and multiplied it with average object camouflage in the Base condition (i.e., 1.8). Clearly, this estimated cam- ouflage effect (i.e., 10 ms) is too small to maintain that it could have obscured an early Base-condition advantage.

Table 4. Mean first-fixation durations (in ms) as a function of Context and Fixation Moment

Context

Fixation moment Base Isolation

Early 193 186 Late 203 231

forced-choice recognition (Antes et al., 1981) of uncued objects is measured rather than accuracy of simple present- absent decisions about precued target objects (Boyce et al., 1989) only serves to strengthen this suspicion. If this inter- pretation is correct, then these studies merely confirm other work (e.g., Antes, Singsaas, & Metzger, 1978; Loftus et al., 1983; Metzger & Antes, 1983) that indicates that low-res- olution background information and gross object charac- teristics such as overall shape, size, and position, can be extracted more rapidly and further out in extrafoveal vision than is the case for detailed object features. Since the former kind of information does have some !~redictive power at least for the categories of objects theft can be expected in the scene, this research may prove to be useful for devising ways of constraining the search space of ob- ject-recognition processes in systems with limited capacity for data-driven image analysis (Hanson & Riseman, 1978; Riseman & Hanson, 1987). Its relevance for modeling everyday human perception of real-world scenes and ob- jects, however, may be rather limited.

A second possible explanation for the discrepancy between immediate and delayed context effects is that they are both perceptual, but reflect different modes of attentional distribution adopted by the viewer. Allowing only one glimpse at a scene in order to detect an object in an uncertain position may in fact encourage viewers to spread their attention evenly over the entire image. Free exploration of a scene in search of non-objects, on the other hand, is more likely to involve attending sequentially to local scene components. To the extent that scene-context effects are actually mediated by a global, scene-specific schema (Biederman, 1981), and the privileged mode of access to this schema involves the apprehension of global information such as background (Boyce et al., 1989) or configurations of gross object shapes (Biederman, 1988), context effects may be slower to appear under the present conditions. Moreover, on this view one might argue that if certain effects of object-context relations (e.g., the Size effect in the present study) do not appear at all during free scene exploration, this only indicates that a spatially wide focus of attention is a prerequisite for encoding the necessary relational information from the image.

Clearly, more work on the perceptual relevance of measurements of context effects and on the role of visual attention is needed to determine which of these two explanations better accounts for the observed discrepancy between immediate and delayed context effects. Depending on the answer to this question, quite different models of real-world scene and object perception will have to be put forward.

Overall, the present data are certainly in agreement with previous research (e.g., Antes & Penland, 1981; Biederman et al., 1982; Friedman, 1979) in arguing against the view that object perception is a strictly data-driven, modular process of structural object feature recovery. However, they also challenge the dominant alternative to this theory (Biederman, 1981), which states that real-world perception is inevitably driven by scene-specific schemas, activated independently of, and prior to, scene parsing and local object encoding.

With respect to specific context effects, the delayed appearance of a Probability effect leaves open the possibility that priming between the representations of successively fixated objects may be the central mechanism underlying this effect in free scene exploration. Research on the orthogonally manipulated effects of prime-target relatedness and target-scene relatedness on object fixation times should shed more light on this issue. As for the Support effect, its late surfacing appears to run counter to the view that its origin should be placed in a disruption of preconceptual image-segmentation processes guided by a physical model of the world (Verfaillie & Wagemans, 1987). However, the use of line drawings in the present research may necessitate a more comprehensive image analysis than would normally be needed to situate an object in 3-D space. An examination of the impact of local depth cues (e.g., shadowing or differences in texture gradients) on the effect could probably clarify this. Finally, the late Position effect suggests that a specific position in a locally and globally interpreted scene constitutes the information necessary for this effect to appear. Whether this implies that general positional qualifications (e.g., "on top of," "on the ground surface," "underneath," etc.) play no part at all is just one of the many questions that still need to be dealt with in the study of real-world scene and object perception.

Acknowledgements. The research presented in this paper was supported by the Belgian Government through agreement RFO/A1/04 of the Incentive Program for Fundamental Research in Artificial Intelligence. The authors would like to thank Rik Delabastita, Marleen De Vijver, Koen Lamberts, and Marcel Lenaerts for their help in the preparation of the stimuli, Karl Verfaillie and Johan Wagemans for their comments at various stages of this project, and Johan Van Rensbergen and Gert Storms for their assistance in the data analysis.

Appendix A

(Scene names in parentheses indicate the scenes in which targets were placed to produce Probability violations.)

Scene | Targets | Bystander
1 Living room | speaker (train platform), vacuum cleaner (gas station) | toy bear
2 Gas station | motorcycle (chemistry lab), gas pump (playground) | tire
3 Beach | flippers (office), shovel (kitchen) | kite
4 Office | telephone (library), keyboard (laundrette) | fan
5 Train platform | push cart (concert hall), clock (backyard) | luggage
6 Waterfront | fork lift (farm), boat (construction site) | crate
7 Workshop | vice (living room), fire extinguisher (dining room) | tool box
8 Chemistry lab | microscope (dining room), mortar (library) | test tubes
9 Playground | scooter (bar), tricycle (construction site) | skateboard
10 Kitchen | blender (living room), rolling pin (workshop) | funnel
11 Bedroom | shoes (classroom), alarm clock (restaurant) | coathanger with shirt
12 Backyard | barbecue (gas station), lawnmower (waterfront) | wheelbarrow
13 Farm | chickens (post office), milkcan (bedroom) | pig
14 Concert hall | piano (waterfront), cello (backyard) | saxophone
15 Library | coat rack (beach), books (street) | glasses
16 Classroom | setsquare (kitchen), globe (laundrette) | briefcase
17 Supermarket | shopping cart (bedroom), pineapple (office) | shopping bag
18 Laundrette | ironing board (bus terminal), iron (locker room) | shirt
19 Locker room | tennis racket (restaurant), soccer ball (post office) | coat
20 Post office | parcel (concert hall), letters (bathroom) | ashtray
21 Restaurant | wine bucket (workshop), saltshaker (bathroom) | glass
22 Bar | cash register (street), coffee cup (playground) | ashtray
23 Bus terminal | car (train platform), bicycle (supermarket) | garbage can
24 Street | telephone booth (farm), parking meter (chemistry lab) | traffic sign
25 Dining room | plant (locker room), coffee pot (classroom) | lamp
26 Construction site | dump truck (beach), bulldozer (bus terminal) | cement mixer
27 Bathroom | toilet (supermarket), blow-dryer (bar) | mirror

Appendix B

[Figure not reproduced. Panels: Objects; Non-objects]

References

Antes, J. R. (1974). The time course of picture viewing. Journal of Experimental Psychology, 103, 62 - 70.

Antes, J. R., Mann, S. M., & Penland, J. G. (1981). Local precedence in picture naming: The importance of obligatory objects. Paper presented at the 1981 meeting of the Psychonomic Society.

Antes, J. R., & Penland, J. G. (1981). Picture context effects on eye movement patterns. In D. F. Fisher, R. A. Monty, & J. W. Senders (Eds.), Eye movements: Cognition and visual perception (pp. 157 - 170). Hillsdale, NJ: Erlbaum.

Antes, J. R., Penland, J. G., & Metzger, R. L. (1981). Processing global information in briefly presented pictures. Psychological Research, 43, 277 - 292.

Antes, J. R., Singsaas, R A., & Metzger, R. L. (1978). Components of pictorial informativeness. Perceptual and Motor Skills, 47, 459 - 464.

Biederman, I. (1981). On the semantics of a glance at a scene. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization. Hillsdale, NJ: Erlbaum.

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115 - 147.

Biederman, I. (1988). Aspects and extensions of a theory of human image understanding. In Z. W. Pylyshyn (Ed.), Computational processes in human vision: An interdisciplinary approach (pp. 370 - 428). Norwood, NJ: Ablex.

Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14, 143 - 177.

Boyce, S. J., Pollatsek, A., & Rayner, K. (1989). Effects of background information on object identification. Journal of Experimental Psychology: Human Perception and Performance, 15, 556 - 566.

Carr, T. H., McCauley, C., Sperber, R. D., & Parmelee, C. M. (1982). Words, pictures and priming: On semantic activation, conscious identification, and the automaticity of information processing. Journal of Experimental Psychology: Human Perception and Performance, 8, 757 - 777.

Cliff, N. (1987). Analyzing multivariate data. Orlando, FL: Harcourt.

Friedman, A. (1979). Framing pictures: The role of knowledge in automatized encoding and memory for gist. Journal of Experimental Psychology: General, 108, 316 - 355.

Hanson, A. R., & Riseman, E. M. (1978). VISIONS: A computer system for interpreting scenes. In A. Hanson & E. Riseman (Eds.), Computer vision systems (pp. 303 - 333). New York: Academic Press.

Henderson, J. M., Pollatsek, A., & Rayner, K. (1987). Effects of foveal priming and extrafoveal preview on object identification. Journal of Experimental Psychology: Human Perception and Performance, 13, 449 - 463.

Henderson, J. M., Pollatsek, A., & Rayner, K. (1989). Covert visual attention and extrafoveal information use during object identification. Perception & Psychophysics, 45, 196 - 208.

Huttenlocher, J., & Kubicek, L. F. (1983). The source of relatedness effects on naming latency. Journal of Experimental Psychology: Learning, Memory and Cognition, 9, 486 - 496.

Inhoff, A. W. (1984). Two stages of word processing during eye fixations in the reading of prose. Journal of Verbal Learning and Verbal Behavior, 23, 612 - 624.

Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences. Monterey, CA: Brooks/Cole.

Klatsky, G. J., Teitelbaum, R. C., Mezzanotte, R. J., & Biederman, I. (1981). Mandatory processing of the background in the detection of objects in scenes. Proceedings of the Human Factors Society, 25, 272 - 276.

Kroll, J. F., & Potter, M. C. (1984). Recognizing words, pictures and concepts: A comparison of lexical, object and reality decisions. Journal of Verbal Learning and Verbal Behavior, 23, 39 - 66.

Locher, P. J., & Nodine, C. F. (1987). Symmetry catches the eye. In J. K. O'Regan & A. Lévy-Schoen (Eds.), Eye Movements: From Physiology to Cognition (pp. 353 - 361). North-Holland: Elsevier.

Loftus, G. R. (1983). Eye fixations on text and scenes. In K. Rayner (Ed.), Eye movements in reading (pp. 359 - 376). New York: Academic Press.

Loftus, G. R., & Mackworth, N. H. (1978). Cognitive determinants of fixation location during picture viewing. Journal of Experimental Psychology: Human Perception and Performance, 4, 565 - 572.

Loftus, G. R., Nelson, W. W., & Kallman, H. J. (1983). Differential acquisition rates for different types of information from pictures. Quarterly Journal of Experimental Psychology, 35A, 187 - 198.

McArthur, D. J. (1982). Computer vision and perceptual psychology. Psychological Bulletin, 92, 283 - 309.

Marr, D. (1978). Representing visual information: A computational approach. In A. R. Hanson & E. M. Riseman (Eds.), Computer vision systems (pp. 61 - 80). Orlando, FL: Academic Press.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: Freeman.

Metzger, R. L., & Antes, J. R. (1983). The nature of processing early in picture perception. Psychological Research, 45, 267 - 274.

Nodine, C. F., Carmody, D. P., & Kundel, H. L. (1978). Searching for NINA. In J. W. Senders, D. F. Fisher, & R. A. Monty (Eds.), Eye movements and the higher psychological functions (pp. 241 - 258). Hillsdale, NJ: Erlbaum.

Palmer, S. E. (1975). Visual perception and world knowledge: Notes on a model of sensory-cognitive interaction. In D. A. Norman & D. E. Rumelhart (Eds.), Explorations in cognition (pp. 279 - 307). San Francisco: Freeman.

Rayner, K., & Pollatsek, A. (1987). Eye movements in reading: A tutorial review. In M. Coltheart (Ed.), Attention and Performance XII (pp. 327 - 362). London: Erlbaum.

Riseman, E. M., & Hanson, A. R. (1987). A methodology for the development of general knowledge-based vision systems. In M. Arbib & A. Hanson (Eds.), Vision, brain and cooperative computation (pp. 285 - 328). Cambridge: MIT Press.

Schank, R. (1982). Dynamic memory: A theory of reminding and learning in computers and people. Cambridge: Cambridge University Press.

Verfaillie, K., & Wagemans, J. (1987). Constraints in perception and cognition of objects, scenes and events. Psychological Report, 72, Laboratory of Experimental Psychology, University of Leuven, Belgium.

Young, L. R., & Sheena, D. (1975). Survey of eye movement recording methods. Behavior Research Methods & Instrumentation, 7, 397 - 429.