

Journal of Experimental Psychology: Learning, Memory, and Cognition 1998, Vol. 24, No. 6, 1521-1543

Copyright 1998 by the American Psychological Association, Inc. 0278-7393/98/$3.00

Syntactic Ambiguity Resolution in Discourse: Modeling the Effects of Referential Context and Lexical Frequency

Michael J. Spivey Cornell University

Michael K. Tanenhaus University of Rochester

Sentences with temporarily ambiguous reduced relative clauses (e.g., The actress selected by the director believed that . . . ) were preceded by discourse contexts biasing a main clause or a relative clause. Eye movements in the disambiguating region (by the director) revealed that, in the relative clause biasing contexts, ambiguous reduced relatives were no more difficult to process than unambiguous reduced relatives or full (unreduced) relatives. Regression analyses demonstrated that the effects of discourse context at the point of ambiguity (e.g., selected) interacted with the past participle frequency of the ambiguous verb. Reading times were modeled using a constraint-based competition framework in which multiple constraints are immediately integrated during parsing and interpretation. Simulations suggested that this framework reconciles the superficially conflicting results in the literature on referential context effects on syntactic ambiguity resolution.

The question of how expectations created by discourse context are used in syntactic ambiguity resolution has been the subject of numerous experiments during the past decade (for a recent review, see Tanenhaus & Trueswell, 1995). Most of these studies have been motivated by contrasting claims made by theories in which ambiguity resolution is primarily guided by discourse-based principles, for example, minimizing new presuppositions while continuously updating a discourse model (cf. Altmann & Steedman, 1988; Crain & Steedman, 1985), and theories in which discourse information is used only to evaluate and, if necessary, revise an initial structure assigned according to simplicity-based structural principles (cf. Frazier, 1987).

The typical study has used sentences with sequences of words that are temporarily ambiguous between a structure that modifies a definite noun phrase and a structure that introduces a new discourse event or entity. In neutral contexts, the modification analysis is typically the less preferred interpretation, resulting in increased processing difficulty if the sentence is disambiguated in favor of the noun phrase modification analysis. For example, The actress selected . . . is temporarily ambiguous between a main clause in which selected is introducing a selecting event with the actress as the agent, as in The actress selected a new costume, and a relative clause in which selected modifies the actress, as in The actress selected by the director believed that her performance was perfect. Discourse context is manipulated by using contexts that introduce either one or two possible referents for the definite noun phrase. Two-referent contexts (e.g., Two actresses were auditioning for a play. The director chose one of the actresses but not the other) provide discourse support for a modification analysis, because modification is required to disambiguate the referent of the noun phrase. In contrast, a context that introduces a unique referent (e.g., An actress and the producer's niece were auditioning for a play. The director chose the actress but not the niece) allows the definite noun phrase in the target sentence to be immediately integrated into the discourse model and thus supports a nonmodification analysis.

Michael J. Spivey, Department of Psychology, Cornell University; Michael K. Tanenhaus, Department of Brain and Cognitive Sciences, University of Rochester.

This work was supported by a National Science Foundation Graduate Research Fellowship and by National Institutes of Health Grant HD27206. We thank Kathleen Eberhard, Chuck Clifton, and three anonymous reviewers for helpful comments on an earlier version of the article; Greg Stevens and Alex Reed for assistance in data collection; and Ken McRae for extensive discussions about the modeling.

Correspondence concerning this article should be addressed to Michael J. Spivey, Department of Psychology, Cornell University, Ithaca, New York 14853-9365. Electronic mail may be sent to mjs41@cornell.edu.

Unfortunately, the results of this literature have been largely inconclusive (for a review, see Spivey-Knowlton & Tanenhaus, 1994; Tanenhaus & Trueswell, 1995). Whereas some studies have found clear effects of referential context (Altmann, Garnham, & Dennis, 1992; Altmann, Garnham, & Henstra, 1994; Altmann & Steedman, 1988; Britt, 1994; Britt, Perfetti, Garrod, & Rayner, 1992; Spivey-Knowlton, Trueswell, & Tanenhaus, 1993; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995; van Berkum, Hagoort, & Brown, 1998), others have found weak or delayed effects (Britt et al., 1992, Experiment 3; Clifton & Ferreira, 1989; Ferreira & Clifton, 1986; Liversedge, 1994; Mitchell, Corley, & Garnham, 1992; Murray & Liversedge, 1994; Spivey-Knowlton, Trueswell, & Tanenhaus, 1993, Experiment 3).

Recently, several research groups have argued that constraint-based models of ambiguity resolution provide a useful framework for rationalizing the superficially contradictory literature on discourse context effects (MacDonald, Pearlmutter, & Seidenberg, 1994; Spivey-Knowlton, Trueswell, & Tanenhaus, 1993). Constraint-based models claim that the comprehension system continuously integrates multiple constraints to converge on a consistent interpretation (Bates & MacWhinney, 1989; McClelland, St. John, & Taraban, 1989; MacDonald et al., 1994; Spivey-Knowlton & Sedivy, 1995; Spivey-Knowlton et al., 1993; Trueswell & Tanenhaus, 1994). Specific models differ in their details, but as a class they share two common features: (a) multiple constraints are combined to compute alternative interpretations in parallel and (b) the alternatives compete with one another during processing. Crucially, constraint-based models predict that the effectiveness of contextual constraints will be modulated by the strength of other relevant constraints.

Important supporting evidence comes from two recent self-paced reading studies using prepositional phrase ambiguities (Britt, 1994; Spivey-Knowlton & Sedivy, 1995). In these studies, the noun inside the prepositional phrase (e.g., The man fixed the door with the rusty lock/screwdriver) determined whether the prepositional phrase modified the noun phrase (with the rusty lock) or the verb phrase (with the rusty screwdriver). Reading times beginning at the noun indicated whether one of the resolutions of the ambiguity was easier than the other. Both studies found that referential manipulations were most effective when verb-based biases were weakest.

However, neither of these studies either quantified the strength of the contributing constraints or provided an explicit mechanism linking constraint integration to processing difficulty. This makes it difficult to evaluate claims about whether context has immediate effects or whether it operates during a second stage of processing. Distinguishing among these alternatives using data from the prepositional phrase ambiguities used in these studies is further complicated because the studies did not use an unambiguous baseline (see, however, studies by Spivey-Knowlton, 1996, and Tanenhaus et al., 1995, reporting referential effects established by visual contexts using spoken sentences with prepositional phrase attachment ambiguities and an unambiguous baseline).

This article examines the influence of discourse constraints on syntactic ambiguity resolution using the relative clause-main clause ambiguity, which affords an unambiguous baseline. In order to consider some of the relevant constraints that enter into this ambiguity, consider the fragment the actress selected . . . . A reduced relative clause is temporarily ambiguous when it contains a verb that uses the same morphological form for the past tense and the passive participle. If selected is a past tense form, the actress is the subject of a main clause in the active voice and would typically play the semantic role of agent. In contrast, if selected is a passive participle, the actress is the object of a relative clause in the passive voice and would typically play the semantic role of theme or patient, though other roles are possible depending on the argument structure of the verb.

Argument structure is also important for evaluating postambiguity constraints that immediately follow the verb (MacDonald, 1994). For example, the verb selected can be used intransitively (The actress selected first) or transitively (The actress selected the lead role). If the verb is followed by a preposition, for example, by, in The actress selected by . . . , then a past tense use of selected would require an intransitive argument structure. In contrast, a past participle use of selected would be transitive because the noun phrase preceding the verb (the actress) is the logical object. Thus, if the verb is typically used transitively, a preposition such as by immediately following the verb provides strong evidence for a relative clause. If the verb is typically used with an intransitive argument structure, then a preposition is still consistent with a main clause.

This analysis suggests that, even setting aside discourse factors, a variety of constraints should be important for resolving the reduced relative ambiguity (MacDonald et al., 1994). These include, but are not limited to, (a) the semantic fit between the first noun phrase and the different thematic roles it would play in a main clause versus a relative clause; (b) the frequency with which the ambiguous verb occurs as a participle in a passive clause as compared with a past tense in an active clause; (c) the frequency with which the verb occurs with certain argument structures; and (d) the information provided by postambiguity constraints.

There is now a substantial literature demonstrating the importance of each of these constraints for the processing of reduced relative clauses (Burgess, Tanenhaus, & Hoffman, 1994; MacDonald, 1994; MacDonald et al., 1994; Pearlmutter & MacDonald, 1995; Tabossi, Spivey-Knowlton, McRae, & Tanenhaus, 1994; Trueswell, 1996; Trueswell, Tanenhaus, & Garnsey, 1994). There is also evidence that these constraints interact. MacDonald et al. (1994) presented a meta-analysis showing that contextual manipulations have been most successful in studies using verbs that are frequently used as participles. Trueswell (1996) has shown that semantic fit interacts with participle frequency as estimated using the Francis and Kučera (1982) frequency counts. MacDonald (1994) has demonstrated interactions among semantic fit, preferred argument structure, and strength of postambiguity constraints. Burgess et al. (1994) showed that thematic fit interacts with the availability of constraining parafoveal information.

The importance of these constraints can be highlighted by comparing the clear garden-path sentence The horse raced past the barn fell with the sentence The land mine buried in the sand exploded. The latter sentence has the same structure but does not cause noticeable processing difficulty. In The horse raced past the barn fell, the horse is a typical agent of a racing event, raced is used about 10 times more frequently as a past tense than as a past participle (Francis & Kučera, 1982), and it is typically used intransitively. All of these factors strongly bias the main clause interpretation. In The land mine buried in the sand exploded, the land mine is an atypical agent of a burying event but a good theme, buried occurs about 5 times more frequently as a past participle than as a past tense (Francis & Kučera, 1982), and it is typically used transitively. Thus all of the constraints conspire in favor of the relative clause interpretation.
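To make the verb-form contrast concrete, the two approximate frequency ratios quoted above can be turned into proportions. The sketch below is only an illustration: the function name and the idea of reading a ratio directly as normalized support for each reading are assumptions made here, not a procedure described in the text.

```python
def verb_form_support(past_tense_per_participle):
    """Turn a past-tense : past-participle ratio into two proportions.

    Hypothetical helper: treating the proportions as support for the
    main clause and reduced relative readings is an assumption.
    """
    main_clause = past_tense_per_participle / (past_tense_per_participle + 1.0)
    return {"main_clause": main_clause, "reduced_relative": 1.0 - main_clause}


# "raced": about 10 past-tense uses for every past-participle use.
raced = verb_form_support(10.0)
# "buried": about 5 past-participle uses for every past-tense use.
buried = verb_form_support(1.0 / 5.0)

print(raced)
print(buried)
```

Under this reading, raced gives the reduced relative a proportion of about 1/11, whereas buried gives it about 5/6, which matches the direction of the contrast drawn between the two sentences.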

From a constraint-based perspective, then, the fact that several previous studies using referential contexts with reduced relatives have failed to find context effects (e.g., Britt et al., 1992; Ferreira & Clifton, 1986; Murray & Liversedge, 1994) may be due to other constraints being so strongly biased in favor of the main clause that discourse effects were masked. Because the recent literature has provided insights into the nature of these constraints, it is possible to create materials in which discourse context is more likely to exhibit strong effects, therefore allowing a clear test of the hypothesis that discourse constraints can apply immediately.

The research reported here had two primary goals. The first goal was to determine whether discourse context would have immediate effects on processing reduced relative clauses under conditions where opposing constraints would not be expected to overwhelm the context effects. The second goal was to model the results using a multiple constraints approach with an explicit competition algorithm (e.g., Spivey-Knowlton, 1994; Stevenson, 1994). The modeling was undertaken to evaluate the claim that discourse context interacts with other constraints and to see whether superficially conflicting patterns of results in the literature could be accounted for solely by differences in the strength of constraints due to differences in the materials. The general framework of the model will be presented in the introduction to Experiment 1, after we have described our stimulus materials.

Experiment 1

This experiment was designed to test whether discourse contexts would reduce or eliminate processing difficulty for reading temporarily ambiguous reduced relative clauses compared with an unambiguous baseline. Reduced relative clauses with morphologically unambiguous verbs were used as the baseline, allowing us to compare ambiguous and unambiguous sentences with the same structure and number of words, as recommended by Ferreira and Henderson (1993).

Table 1 presents a sample set of materials. These materials were previously used in self-paced reading studies reported in Spivey-Knowlton et al. (1993). A complete listing of the materials can be found in the appendix to that article. In the present experiments, we used the same experimental stimuli and design as in Spivey-Knowlton et al. (1993). However, the experimental measure in the present study, eye movements during reading, is often considered to be more sensitive to initial syntactic processing than is self-paced reading (e.g., Rayner, Sereno, Morris, Schmauder, & Clifton, 1989).

Table 1
Sample Materials From Experiments 1 and 2

Stimulus group: Example text

One-referent context: An actress and the producer's niece were auditioning for a play. The director selected/chose the actress but not the niece.

Two-referent context: Two actresses were auditioning for a play. The director selected/chose one of the actresses but not the other.

Ambiguous reduced relative target sentence: The actress/selected/by the director/believed that/her performance was perfect.

Unambiguous reduced relative target sentence (Exp. 1): The actress/chosen/by the director/believed that/her performance was perfect.

Full (unreduced) relative target sentence (Exp. 2): The actress/who was/selected/by the director/believed that/her performance was perfect.

Note. Critical recording regions are shown in target sentences. Exp. = experiment.

The target sentences (see Table 1) began with a noun phrase-verb-prepositional phrase sequence. The noun phrase always began with a definite article followed by an animate noun (e.g., The actress). For the ambiguous sentences, the verb form was morphologically ambiguous between a simple past tense and a passive participle (e.g., selected). Unambiguous sentences were constructed either by replacing the ambiguous verb (e.g., selected) with a similar verb for which the past participle form was morphologically unambiguous (e.g., chosen, in Experiment 1) or by using a full relative clause (e.g., who was selected, in Experiment 2). The prepositional phrase always began with the preposition by and introduced an agent (e.g., by the director). The main clause biasing context introduced a single referent for the initial noun phrase in the target sentence (e.g., an actress), whereas the relative clause biasing context introduced two potential referents (e.g., two actresses).

Figure 1 illustrates some of the constraints that are most relevant for these materials. Discourse context is one source of constraint, with one-referent contexts biasing the main clause more strongly than the relative clause, and two-referent contexts biasing the relative clause more than the main clause. The relative frequency with which the ambiguous verb occurs as a simple past tense and as a passive participle is another important constraint. Trueswell (1996) has shown that the more frequently the verb is used as a passive participle, the stronger the support for the relative clause structure. We should note that past participle frequency, as measured by Francis and Kučera (1982) counts, is only a rough estimate of the tense and voice bias associated with a verb's -ed form (see Footnote 1). Nonetheless, it captures a significant portion of the item-specific variance associated with individual verbs in reduced relative constructions (MacDonald et al., 1994; Trueswell, 1996).

The prepositional phrase that follows the verb in these materials provides more evidence for the relative clause than the main clause. We used the preposition by because it provides a strong constraint in support of a relative clause: by after an -ed verb is typically used to introduce an agent in a passive construction (Hanna, Barker, & Tanenhaus, 1995; McRae, Spivey-Knowlton, & Tanenhaus, 1998). Also, readers typically do not fixate separately on by but instead process it parafoveally when fixating on the preceding verb (Trueswell et al., 1994). Thus, for the purposes of this study, all of these constraints can be treated as being more or less available when the verb is being processed.

Footnote 1. In general, the participle form of a verb can in fact have any tense and is not always in the passive voice, and frequency counts from Francis and Kučera (1982) collapse across all of these uses. However, for the purposes of distinguishing between a main clause and a reduced relative, in sentences where there is no auxiliary verb preceding the morphologically ambiguous verb, the participle form of that verb is typically in the past tense and it must be in the passive voice. Thus, the coarseness of this metric as an indicator of past participle availability limits the degree to which these simple verb-form frequency counts can be expected to account for item-by-item variance in reading times.

[Figure 1 diagram not reproduced. Recoverable labels: Provisional Interpretation of Syntactic Ambiguity (Reduced Relative, Main Clause); Main Clause Bias; Discourse Information.]

Figure 1. A schematic diagram of the competitive integration model. The syntactic alternatives receive input from four sources of constraint, which then receive feedback from the syntactic alternatives. The sources of constraint are defined in terms of the amount of probabilistic support they provide for the reduced relative and for the main clause. The dashed-line boxes indicate domains of normalization, which produce probabilistic competition. Prob. = probability; RR = reduced relative; MC = main clause.

Finally, we are also assuming a configurational bias favoring the main clause over the relative clause. A sentence-initial sequence of noun phrase-verb -ed is more typically the beginning of a main clause than the beginning of a reduced relative clause (MacDonald et al., 1994; McRae et al., 1998; Tabossi et al., 1994). For present purposes, we remain agnostic about whether this configurational bias is best characterized at a structural level (e.g., Gibson, Schütze, & Salomon, 1996; Mitchell, Cuetos, Corley, & Brysbaert, 1995; Stevenson, 1994) or whether it emerges from other more local constraints (e.g., Juliano & Tanenhaus, 1995; MacDonald et al., 1994; Tabor, Juliano, & Tanenhaus, 1997; Trueswell, 1996). Treating the clause bias as a unitary constraint allows the model to remain neutral between constraint-based models that include conditional probabilities based on categories along with other constraints (e.g., Jurafsky, 1996) and models in which the configurational bias would be eliminated when it was decomposed into other lexical constraints (MacDonald et al., 1994). In addition, treating the configurational bias as a separate constraint also allows one to simulate a two-stage model by having it precede other constraints (McRae et al., 1998). Note, however, that in the simulations of constraint-based models we are not treating the configurational bias as the sole parsing principle of a first stage in processing that precedes the use of other constraints, as in models like the two-stage garden-path model of Frazier and colleagues (Frazier, 1987).

Although the constraints we selected are clearly established in the literature, there are other constraints that we did not directly model. Verb argument structure preferences and the thematic fit of the noun phrase to the agent and patient roles of the verb are perhaps the most relevant of these constraints. The extent to which our general modeling effort is compromised by not including certain item-specific constraints depends on the degree to which the items in the experiment vary along that dimension. The greater the variability, the more important it is to include the constraint. It also depends on the extent to which the constraint is subsumed by or correlated with other constraints.

The most important dimension of argument structure for the materials we used is likely to be transitivity. All of our verbs had strong transitive preferences, as suggested by the null context sentence completions in Spivey-Knowlton et al. (1993). In addition, verb form frequency is correlated with transitivity (Trueswell, 1996). Thus, the verb form frequency constraint captures some of the item-specific variability due to argument structure, and the overall transitivity bias should be captured by the main clause bias. However, it seems likely that there is item-specific variability in the materials that is only captured by argument structure. Therefore, the current model will account for less of this variance than it would had argument structure been incorporated as a separate constraint. One clear example of where argument structure and verb form frequency conflict in our stimuli is the verb watched. Watched occurs much more frequently as a simple past than as a past participle (Francis & Kučera, 1982); however, it exhibited a high transitive bias in the verb frame preference norms of Connine, Ferreira, Jones, Clifton, and Frazier (1984). Nonetheless, despite not accounting for some of the variance due to argument structure, the constraints we did select were sufficient to capture a significant amount of item-specific variance.

We also did not include thematic fit as a separate constraint. There were several reasons for this. First, there was little range in how typical the nouns were as themes for the verb they occurred with. Second, in the stimulus items we used, the typicality of the nouns as agents for specific verbs happened to be correlated with verb form frequency for that verb. Stepwise regressions correlating item-specific ambiguity effects with thematic fit and verb form frequency showed that frequency was the better predictor, with no significant effects of goodness of agent once frequency was partialled out. Finally, it is not clear to what extent prototypical thematic fit remains a relevant constraint when the noun phrase refers to a previously established discourse entity whose role in the event referred to by the verb is strongly constrained by the discourse, as was the case for the materials used here.

We are assuming a constraint-based framework in which ambiguity resolution is accomplished by integrating multiple sources of information. During processing, each constraint provides some degree of support for the main clause or the relative clause structures. These alternatives compete for activation, computed as probabilities. As Figure 1 illustrates, discourse constraints will compete with other constraints. Nonetheless, with the current materials, one would expect to see strong effects of referential context because the configurational bias will be somewhat counterbalanced by parafoveal information that supports the relative clause. Under these conditions, two-referent contexts should sharply reduce or even eliminate the main clause preference that would normally be present beginning at the verb. Referential context should also interact with tense frequency, a prediction that we explored in Experiment 2. For one-referent contexts, however, there should still be an initial main clause preference which, as the sentence unfolds, shifts to a relative clause preference as additional evidence supports the relative clause interpretation.
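The competition just described can be sketched in a few lines of code. What follows is a minimal illustration of one way normalized competition with feedback is often formalized, not the model as specified in this section: the update rule, the fixed stopping criterion, the equal weights, and every support value are assumptions chosen only to show how a change in discourse context can tip the outcome while the other constraints stay the same.

```python
def normalize(pair):
    """Scale support for [reduced relative, main clause] so it sums to 1."""
    total = sum(pair)
    return [value / total for value in pair]


def compete(constraints, weights, criterion=0.95, max_cycles=1000):
    """Run competition until one alternative reaches the criterion.

    constraints: name -> [support_RR, support_MC] (hypothetical values)
    weights:     name -> weight, assumed to sum to 1
    Returns ([activation_RR, activation_MC], cycles_used).
    """
    support = {name: normalize(pair) for name, pair in constraints.items()}
    activation = [0.5, 0.5]
    for cycle in range(1, max_cycles + 1):
        # Integration: weighted sum of normalized support per alternative.
        activation = [
            sum(weights[name] * support[name][alt] for name in support)
            for alt in range(2)
        ]
        if max(activation) >= criterion:
            return activation, cycle
        # Feedback: each constraint is pushed toward the currently
        # favored alternative, then renormalized.
        support = {
            name: normalize([
                value + activation[alt] * weights[name] * value
                for alt, value in enumerate(pair)
            ])
            for name, pair in support.items()
        }
    return activation, max_cycles


# Four sources of constraint, with made-up support values [RR, MC].
weights = {"discourse": 0.25, "verb_form": 0.25, "by_phrase": 0.25, "mc_bias": 0.25}
shared = {"verb_form": [0.5, 0.5], "by_phrase": [0.8, 0.2], "mc_bias": [0.1, 0.9]}
two_referent = dict(shared, discourse=[0.7, 0.3])
one_referent = dict(shared, discourse=[0.3, 0.7])

act_two, cycles_two = compete(two_referent, weights)
act_one, cycles_one = compete(one_referent, weights)
print("two-referent:", act_two, cycles_two)
print("one-referent:", act_one, cycles_one)
```

In a sketch of this kind, the number of cycles needed to reach the criterion can be read as a stand-in for processing difficulty: closely matched alternatives take longer to settle than lopsided ones.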

Method

Participants. Twenty undergraduate students from the University of Rochester participated for course credit.

Materials. Initial construction of stimulus materials produced 24 sentences, each with a one-referent context and a two-referent context. Sentence completions on these items revealed that 8 of them did not exhibit a strong contextual constraint (cf. Spivey-Knowlton et al., 1993). To avoid testing for immediate influences from contexts that are not strongly constraining, these items were not used in the eye-tracking study. The remaining 16 sentences, with their one- and two-referent contexts, were identical to those used in self-paced reading experiments by Spivey-Knowlton et al. (1993).

The eye-tracking experiment had a 2 × 2 factorial design with context (one-referent or two-referent) and ambiguity (ambiguous reduced relative or unambiguous reduced relative) as the independent variables. Four of the 16 target sentences were assigned to each of the four experimental conditions, which were rotated to create four versions of each stimulus. Each participant was exposed to only one of the four stimulus lists, and therefore to only one version of any one experimental item. (Consequently, stimulus list was later entered into the statistical analyses as a between-subjects and a between-items factor, which makes the degrees of freedom in those analyses equal to N - 4.) The 16 experimental stimuli were randomly embedded within 32 filler stimuli, with at least one filler stimulus intervening every 2 experimental stimuli. All of the experimental stimuli and half of the filler stimuli were followed by yes-no questions to verify that participants were reading carefully.
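The rotation of items through conditions can be sketched as a Latin square. This is a minimal illustration and not the authors' procedure: the function name `build_lists`, the condition labels, and the modular rotation rule are assumptions. The text states only that four items were assigned to each condition and that conditions were rotated to create four lists.

```python
from collections import Counter

# Hypothetical labels for the 2 x 2 design (context x ambiguity).
CONDITIONS = [
    ("one-referent", "ambiguous"),
    ("one-referent", "unambiguous"),
    ("two-referent", "ambiguous"),
    ("two-referent", "unambiguous"),
]


def build_lists(n_items=16, conditions=CONDITIONS):
    """Rotate items through conditions so each list contains every item once.

    Returns one dict per list, mapping item index -> condition.
    The modular rotation is an assumed scheme, not taken from the source.
    """
    n_lists = len(conditions)
    lists = []
    for list_idx in range(n_lists):
        assignment = {
            item: conditions[(item + list_idx) % n_lists]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


stimulus_lists = build_lists()
# Number of items per condition within the first list.
items_per_condition = Counter(stimulus_lists[0].values())
```

Under this scheme every list holds four items per condition, and every item appears in each condition exactly once across the four lists. With 20 participants and 16 items, treating list as a four-level factor leaves 20 - 4 = 16 and 16 - 4 = 12 error degrees of freedom.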

In constructing the filler items, care was taken to avoid predictable contingencies that might allow participants to become aware of the experimental manipulation. For example, we wished to avoid having participants be able to induce that when an identical pair of referents (e.g., Two actresses) was introduced in context, then a relative clause was likely to follow. Of the 32 filler contexts, 12 began with a pair of identical referents (e.g., Two ministers were walking . . . or Two infants were playing . . .), much like the two-referent experimental contexts, but ended with temporarily ambiguous main clause sentences that referred to both entities (e.g., they). Sixteen of them began with a pair of different referents (e.g., a mathematician and a physicist were discussing . . . or a policeman and a sheriff were working . . .), much like the one-referent experimental sentences. Finally, 4 filler contexts described events primarily involving only one participant. Additionally, of the 32 filler stories (with 3-4 sentences each), 4 contained relative clauses (all of which were in contexts that introduced a pair of different referents). All other filler sentences were main clauses without embedded relatives.

Previous data with these same stimuli and fillers, from self-paced reading (Spivey-Knowlton et al., 1993), showed significant effects of processing difficulty (in the one-referent context) ranging from 55 ms to 110 ms. Using a conservative typical effect size of 65 ms, and a typical standard deviation of 110 ms, we computed a power analysis for the present experimental design. For subject analyses (n = 20), the design yielded a power of .84, and for item analyses (n = 16), the design yielded a power of .77.
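The text does not say which test, tail, or approximation the power calculation used. The sketch below is one possible reconstruction under loud assumptions: a one-tailed test at alpha = .05, a normal approximation to the test statistic, and a standardized effect of 65/110. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_ms, sd_ms, n, alpha=0.05):
    """Approximate power of a one-tailed test (normal approximation).

    Assumptions (not from the source): one-tailed alpha, and the
    standardized effect effect_ms / sd_ms scaled by sqrt(n).
    """
    z = NormalDist()  # standard normal distribution
    noncentrality = (effect_ms / sd_ms) * sqrt(n)
    critical = z.inv_cdf(1 - alpha)
    return 1 - z.cdf(critical - noncentrality)


power_subjects = approx_power(65, 110, 20)  # subject analysis, n = 20
power_items = approx_power(65, 110, 16)     # item analysis, n = 16
```

Under these assumptions the two values land close to the reported .84 and .77. Small discrepancies would be expected if the original calculation used a different test, tail, or distribution.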

Procedure. Contexts and sentences were presented on a 13-inch color monitor, one line at a time. Participants read the sentences by pressing the mouse button to present each new line of text. A line of text spanned no more than 20 degrees of visual angle, with each character and space taking up approximately 15 min arc. The critical recording regions of the target sentences were always contained on one line of text. Eye movements were monitored with a Dr. Bouis Oculometer that measured horizontal eye position continuously (the software sampled this analog signal at 1000 Hz) with accuracy to within 20 min arc. Stimulus presentation and the eye movement record were controlled by a Macintosh II computer. The participant's head was held motionless during trials by a dental bite bar.

At the beginning of the session, the participant's horizontal eye positions were calibrated to horizontal screen positions, and a practice session of 8 trials was conducted. Participants were instructed to read each short story of three or four sentences as naturally as possible and to answer questions accurately. In both practice and experimental sessions, accuracy of the eye position signal was checked in between each trial and adjusted if necessary. Trials in which the track was inaccurate, or in which the participant's first fixation of the target sentence was not at the beginning of the sentence (11% of all critical trials), were excluded from further analysis. The data were not transformed in any other way. During the experimental session of 48 trials, participants were encouraged to remove themselves from the bite bar for a short break in between trials whenever they needed to for comfort. The entire session took approximately 1 hr.

Results

All participants answered the questions with 80% or better accuracy, the cutoff we used for including a participant's data. For eye-movement analysis, the target sentences were segmented into four critical regions: initial noun phrase, verb, by phrase, main verb + one word. The verb region marks the introduction of ambiguity. The by phrase provides strong probabilistic disambiguation but is still formally syntactically ambiguous. The sentence could still turn out to be an intransitive main clause with, for example, a locative by phrase (i.e., The boy stood by the telephone pole). Recall, however, that all of the verbs had strong transitive preferences, making this an unlikely type of continuation. Finally, the main verb region provides complete syntactic disambiguation of the ambiguity as a reduced relative clause.

Reading times. We begin, in Table 2, with a global measure of processing difficulty, total reading time, and we follow this with a more fine-grained analysis, first-pass reading time. Total reading time was measured as the mean total time that participants spent in each region, including regressive fixations (rereadings). Clearly, the syntactic ambiguity in the ambiguous reduced relative produced substantial processing difficulty at the by phrase in the one-referent context but not in the two-referent context.

Analyses of variance (ANOVAs) were computed on total reading times at the verb region and the by phrase region; see top half of Table 3. Results showed a significant increase in total reading time at the verb due to syntactic ambiguity. However, at the by phrase, the interaction between context and ambiguity (Table 3, middle) shows that the difference between ambiguous and unambiguous sentences was reliably modulated by context.

Nonetheless, results from total reading times may conflate initial processing effects and "garden-path recovery" effects (Rayner et al., 1989). We therefore examined first-pass reading times, which separate processing effects during the first forward pass through a region from processing effects during regressive eye movements. Table 4 presents the first-pass reading times for each region. Note, in the one-referent context, the substantial difference between ambiguous and unambiguous first-pass reading times at the by phrase (65 ms), compared with the absence of such a difference in the two-referent context (-7 ms). This suggests that readers experienced difficulty in processing the relative clause in the one-referent context but not in the two-referent context.
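The distinction between the two measures can be made concrete with a small sketch. The scoring rule below is one common convention and is an assumption here, since the exact scoring procedure is not given in the text: first-pass time for a region sums fixations from the first entry into that region until the eyes leave it, and only if no later region has already been fixated. Total time sums every fixation in the region, including rereadings. The function name `reading_times` and the example fixation record are hypothetical.

```python
def reading_times(fixations, n_regions=4):
    """Compute first-pass and total reading time per region.

    fixations: list of (region_index, duration_ms) in temporal order.
    Returns (first_pass, total), each a list with one entry per region.
    """
    total = [0] * n_regions
    first_pass = [0] * n_regions
    furthest = -1    # rightmost region fixated so far
    current = None   # region whose first pass is still in progress
    for region, duration in fixations:
        total[region] += duration
        if region > furthest:
            # Entering new text: a first pass through this region begins.
            furthest = region
            current = region
        elif region != current:
            # Regression or re-entry: the first pass is over.
            current = None
        if region == current:
            first_pass[region] += duration
    return first_pass, total


# Hypothetical trial: regions 0-3 are noun phrase, verb, by phrase, main verb.
example = [(0, 250), (1, 260), (2, 200), (2, 180), (1, 150), (2, 120), (3, 350)]
first_pass, total = reading_times(example)
```

In this made-up trial the regression from the by phrase back to the verb adds to total time at both regions but leaves their first-pass times unchanged, which is why the two measures can dissociate.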

ANOVAs computed for first-pass reading times are shown in the bottom half of Table 3. The only consistently reliable result for first-pass reading times was the interaction between context and ambiguity at the by phrase. This crucial interaction at the by phrase is highlighted in Figure 2. There were no significant effects at the main verb region. Nonetheless, in the two-referent context, reading times were 21 ms longer at the ambiguous verb compared with the unambiguous verb. There are two possible explanations for this effect. First, it could reflect a rapid revision process as argued for by two-stage models. Secondly, it could reflect the fact that the amount of competition for reduced relatives in two-referent contexts depends on lexically specific factors, with significant competition occurring for those items whose verbs were more biased toward the main clause. This predicts a correlation, in the two-referent context condition, between

Table 2
Mean Total Reading Times (in Milliseconds) Across All Four Regions in Experiment 1

| Ambiguity condition | The actress | selected (chosen) | by the director | believed that |
|---|---|---|---|---|
| One-referent context | | | | |
| Ambiguous relative | 292 | 331 | 496 | 439 |
| Unambiguous relative | 308 | 320 | 417 | 436 |
| Ambiguous - unambiguous | -16 | 11 | 79 | 3 |
| Two-referent context | | | | |
| Ambiguous relative | 317 | 312 | 385 | 431 |
| Unambiguous relative | 285 | 291 | 413 | 424 |
| Ambiguous - unambiguous | 32 | 21 | -28 | 7 |


Table 3
Analyses of Variance for Reading Times From Experiment 1

| Effect | Analysis | F | df | MSE | p |
|---|---|---|---|---|---|
| Total reading times at verb | | | | | |
| Context | Participants | 2.383 | 1, 16 | 6,573 | .142 |
| | Items | 2.020 | 1, 12 | 8,406 | .176 |
| Ambiguity | Participants | 11.031 | 1, 16 | 6,688 | .004** |
| | Items | 6.610 | 1, 12 | 6,213 | .021* |
| Context × Ambiguity | Participants | 0.220 | 1, 16 | 11,668 | .646 |
| | Items | 0.530 | 1, 12 | 6,308 | .478 |
| Total reading times at by phrase | | | | | |
| Context | Participants | 2.487 | 1, 16 | 23,715 | .134 |
| | Items | 1.575 | 1, 12 | 29,284 | .229 |
| Ambiguity | Participants | 0.027 | 1, 16 | 20,800 | .871 |
| | Items | 0.033 | 1, 12 | 14,961 | .858 |
| Context × Ambiguity | Participants | 6.201 | 1, 16 | 5,876 | .024* |
| | Items | 7.610 | 1, 12 | 5,911 | .015* |
| First-pass reading times at verb | | | | | |
| Context | Participants | 0.598 | 1, 16 | 1,745 | .451 |
| | Items | 0.219 | 1, 12 | 3,039 | .646 |
| Ambiguity | Participants | 0.805 | 1, 16 | 2,007 | .383 |
| | Items | 0.237 | 1, 12 | 2,347 | .634 |
| Context × Ambiguity | Participants | 4.608 | 1, 16 | 1,116 | .047* |
| | Items | 0.806 | 1, 12 | 3,049 | .383 |
| First-pass reading times at by phrase | | | | | |
| Context | Participants | 5.481 | 1, 16 | 7,691 | .032* |
| | Items | 2.461 | 1, 12 | 9,980 | .143 |
| Ambiguity | Participants | 2.263 | 1, 16 | 7,499 | .152 |
| | Items | 0.319 | 1, 12 | 3,626 | .583 |
| Context × Ambiguity | Participants | 5.195 | 1, 16 | 5,129 | .037* |
| | Items | 7.862 | 1, 12 | 2,997 | .016* |

*p < .05. **p < .01.

Table 4
Mean First-Pass Reading Times (in Milliseconds) Across All Four Regions in Experiment 1

| Ambiguity condition | The actress | selected (chosen) | by the director | believed that |
|---|---|---|---|---|
| One-referent context | | | | |
| Ambiguous relative | 253 | 255 | 400 | 349 |
| Unambiguous relative | 270 | 263 | 335 | 372 |
| Ambiguous - unambiguous | -17 | -8 | 65 | -23 |
| Two-referent context | | | | |
| Ambiguous relative | 261 | 263 | 318 | 370 |
| Unambiguous relative | 267 | 242 | 325 | 358 |
| Ambiguous - unambiguous | -6 | 21 | -7 | 12 |

[Figure 2: line plot with Relative Clause Condition (Ambiguous, Unambiguous) on the x-axis, a y-axis running from about 300 to 420, and separate lines for the One-Referent and Two-Referents contexts.]

Figure 2. Experiment 1. First-pass reading times at the by phrase for ambiguous and unambiguous reduced relative clauses in both referential contexts.

processing difficulty and verb form frequency, with the processing difficulty carried by those items with the highest simple past tense frequencies. This prediction was explored in Experiment 2.

Regressive eye movements. In addition to reading times, it may also be informative to analyze the frequency of regressive eye movements to various regions of the sentence. Increased frequency of rereading certain regions in certain conditions may be indicative of increased processing difficulty. As seen in Table 5, the ambiguous sentence in the one-referent context typically showed more regressive eye movements than the other conditions. However, no effects were statistically significant in this analysis.

Our model assumes that by is typically available parafoveally during fixation on the preceding verb. To see whether this was the case, we analyzed the probability of readers actually skipping (not fixating) the word by during the first pass through the sentence. If a reader fixates the verb and then the eyes saccade past by to land somewhere in the subsequent noun phrase, it may be inferred that by was processed parafoveally during viewing of the verb. The overall mean probability of skipping by on the first pass through a target sentence was .68. This is consistent with previous experiments with similar sentences (Trueswell et al., 1994).

Discussion

The results showed clear effects of discourse context on the resolution of the relative clause ambiguity. In the one-referent context, reading times for the by phrase were significantly longer for ambiguous relative clauses compared with unambiguous controls, reflecting the usual main clause preference. However, when the context provided referential support for the relative clause, reading times were similar for the ambiguous and unambiguous reduced relatives.

Although these results clearly provide support for the hypothesis that discourse context can have immediate effects on syntactic ambiguity resolution, as proposed by Crain and Steedman (1985) and Altmann and Steedman (1988), it is important to note that our experiments were not designed to provide a test of the presuppositional hypothesis proposed by these authors. For example, referential contexts like the ones that we, and others, have used may invoke a strong expectation in the reader that the future input will discriminate between the members of the set introduced in the context (Spivey-Knowlton, 1992; Spivey-Knowlton & Sedivy, 1995). Although it is possible to deconfound purely presuppositional manipulations from expectations (cf. Sedivy & Spivey-Knowlton, 1994; Spivey-Knowlton & Sedivy, 1995), we did not attempt to do so in the materials used in these experiments. In addition, for some of the target sentences, a main clause continuation of the ambiguous fragment would have been somewhat implausible in the one-referent context. For example, one of our context items described a dragon killing a knight, and the accompanying target sentence began The knight killed . . . . In such a context, it would be highly implausible for the knight to be the agent of the killing event. Thus, the effects of the referential context might have been augmented by effects of plausibility. Note, however, that these plausibility effects can only operate in conjunction with the context. In the absence of the context, a main clause continuation would have been

Table 5
Mean Number of Regressive Eye Movements to a Region: Experiment 1

| Ambiguity condition | The actress | selected (chosen) | by the director | believed that |
|---|---|---|---|---|
| One-referent context | | | | |
| Ambiguous relative | .21 | .38 | .43 | .50 |
| Unambiguous relative | .15 | .22 | .41 | .34 |
| Two-referent context | | | | |
| Ambiguous relative | .18 | .26 | .32 | .33 |
| Unambiguous relative | .10 | .24 | .49 | .35 |


completely plausible (e.g., knights often kill). Thus, for plausibility to come into play, the reader must have been using the context to interpret the referential noun phrase and the ambiguous verb.

Experiment 2

This experiment had two goals. The first goal was to replicate the results of Experiment 1. This is important because several previous studies have not found immediate referential effects with the reduced relative ambiguity (Britt et al., 1992; Ferreira & Clifton, 1986; Murray & Liversedge, 1994). The second goal was to examine the hypothesis that referential constraints interact with verb-specific constraints. The constraint that is most likely to have strong effects with our materials is the relative frequency with which the -ed form of the verb is used as a past tense and as a past participle. To correlate context effects with relative frequency, for individual verbs, it was necessary for us to compare the same verb in ambiguous and unambiguous relative clauses, unlike those used in Experiment 1. Thus, we compared reduced relative clauses (e.g., The actress selected by the director believed that her performance was perfect) with full, unreduced relatives (The actress who was selected by the director believed that her performance was perfect). Use of the same verbs also provides a data set that can be used for simulations with an implemented competition model.

Method

Participants. Twenty undergraduate students from the University of Rochester participated for course credit.

Materials. The materials and experimental design were identical to those of Experiment 1, except that the syntactically ambiguous sentence was compared with an unreduced relative clause, instead of a morphologically unambiguous reduced relative clause, and the contexts always used the verb that was in the target sentence (i.e., selected, not chose). Eye movements were recorded in the same manner as in Experiment 1.

Procedure. Data were collected in the same manner as in Experiment 1. Because of inaccurate tracks, or the participant's first fixation of the target sentence not being at the beginning of the sentence, 9% of the critical trials were excluded from analysis.

Results and Discussion

All participants answered the questions with 80% or better accuracy. For eye-movement analysis, the target sentences were segmented into the same four regions as before. In the unreduced relative target sentence, the relative pronoun region, who was, was excluded from analysis because it has no counterpart for comparison in the ambiguous reduced relative target sentence.

Reading times. Table 6 shows the mean total reading time that participants spent in each recording region, including regressive fixations. Participants spent more time reading the reduced relative sentence when it was preceded by the one-referent context than in any other experimental condition. In the two-referent context, participants spent about the same amount of time reading the reduced relative sentence as the unreduced relative sentence. An ANOVA for the initial verb region found a main effect of reduction, with the reduced relative clauses being read more slowly than unreduced relatives; see Table 7. At the by phrase, there were marginal main effects of both context and relative clause reduction; see Table 7. Most important, there was a reliable interaction between context and relative clause reduction; see Table 7. There were no significant results at the main verb region.

First-pass reading times are presented in Table 8. In the ANOVA for the initial verb region itself, there were no significant effects. At the by phrase, there was a reliable interaction between context and reduction but no main effects; see Table 7. The interaction at the by phrase is highlighted in Figure 3. Note the similarity between Figure 2 (from Experiment 1) and Figure 3 (from Experiment 2). Finally, there were no significant effects at the main verb region.

Regressive eye movements. Table 9 shows that the ambiguous reduced relative in the one-referent context exhibited the highest frequency of regressive eye movements. An ANOVA revealed only marginal results. There was a suggestive main effect of context at the by phrase, F1(1, 16) = 4.20, MSE = .1529, p = .057; F2(1, 12) = 4.38, MSE = .1257, p = .054. Additionally, there was an interaction between context and reduction at the main verb

Table 6
Mean Total Reading Times (in Milliseconds) Across All Four Regions in Experiment 2

| Ambiguity condition | The actress | selected | by the director | believed that |
|---|---|---|---|---|
| One-referent context | | | | |
| Reduced relative | 345 | 401 | 555 | 582 |
| Unreduced (full) relative | 342 | 317 | 469 | 513 |
| Reduced - unreduced | 3 | 84 | 86 | 69 |
| Two-referent context | | | | |
| Reduced relative | 330 | 343 | 457 | 509 |
| Unreduced (full) relative | 322 | 317 | 456 | 484 |
| Reduced - unreduced | 8 | 26 | 1 | 25 |


Table 7
Analyses of Variance for Reading Times From Experiment 2

| Effect | Analysis | F | df | MSE | p |
|---|---|---|---|---|---|
| Total reading times at verb | | | | | |
| Context | Participants | 0.867 | 1, 16 | 13,578 | .366 |
| | Items | 1.385 | 1, 12 | 4,212 | .258 |
| Reduction | Participants | 9.23 | 1, 16 | 15,662 | .008** |
| | Items | 16.477 | 1, 12 | 7,489 | .001** |
| Context × Reduction | Participants | 0.726 | 1, 16 | 8,645 | .407 |
| | Items | 0.764 | 1, 12 | 6,418 | .396 |
| Total reading times at by phrase | | | | | |
| Context | Participants | 9.408 | 1, 16 | 8,638 | .007** |
| | Items | 3.688 | 1, 12 | 18,003 | .074 |
| Reduction | Participants | 3.162 | 1, 16 | 16,932 | .060 |
| | Items | 3.688 | 1, 12 | 18,003 | .074 |
| Context × Reduction | Participants | 5.473 | 1, 16 | 4,225 | .033* |
| | Items | 4.347 | 1, 12 | 11,245 | .050* |
| First-pass reading times at verb | | | | | |
| Context | Participants | 0.064 | 1, 16 | 8,480 | .804 |
| | Items | 0.516 | 1, 12 | 5,715 | .484 |
| Reduction | Participants | 2.209 | 1, 16 | 13,449 | .157 |
| | Items | 1.561 | 1, 12 | 8,529 | .231 |
| Context × Reduction | Participants | 3.219 | 1, 16 | 7,560 | .092 |
| | Items | 0.220 | 1, 12 | 4,593 | .884 |
| First-pass reading times at by phrase | | | | | |
| Context | Participants | 1.330 | 1, 16 | 14,282 | .266 |
| | Items | 2.687 | 1, 12 | 6,296 | .127 |
| Reduction | Participants | 1.838 | 1, 16 | 8,459 | .194 |
| | Items | 1.333 | 1, 12 | 11,748 | .271 |
| Context × Reduction | Participants | 9.017 | 1, 16 | 4,023 | .008** |
| | Items | 5.214 | 1, 12 | 7,462 | .041* |

*p < .05. **p < .01.

Table 8
Mean First-Pass Reading Times (in Milliseconds) Across All Four Regions in Experiment 2

| Ambiguity condition | The actress | selected | by the director | believed that |
|---|---|---|---|---|
| One-referent context | | | | |
| Reduced relative | 293 | 329 | 477 | 451 |
| Unreduced (full) relative | 299 | 256 | 406 | 490 |
| Reduced - unreduced | -6 | 73 | 71 | -39 |
| Two-referent context | | | | |
| Reduced relative | 310 | 289 | 403 | 419 |
| Unreduced (full) relative | 307 | 286 | 418 | 419 |
| Reduced - unreduced | 3 | 3 | -15 | 0 |

[Figure 3: line plot with Relative Clause Condition (Reduced, Unreduced) on the x-axis, a y-axis running from about 380 to 500, and separate lines for the One-Referent and Two-Referents contexts.]

Figure 3. First-pass reading times at the by phrase for reduced and unreduced (full) relative clauses in both referential contexts. Note the similarity between this interaction and that in Experiment 1 (see Figure 2).

that was reliable only by items, F1(1, 16) = 1.99, MSE = .2862, p > .1; F2(1, 12) = 7.32, MSE = .1117, p < .02.

As in Experiment 1, we again analyzed the probability of readers skipping (not fixating) the word by during the first pass through the sentence. The overall mean probability of skipping by on the first pass through the sentence was .70. This is consistent with the overall mean observed in Experiment 1 and with previous work (Trueswell et al., 1994).

In summary, the critical interaction between context and relative clause reduction for first-pass reading times at the by phrase indicated an immediate influence of discourse context in the resolution of syntactic ambiguity, as we also saw in Experiment 1. When the discourse context supported the main clause interpretation, substantial processing difficulty was observed for the reduced relative clause as compared with the full (unreduced, unambiguous) relative clause. However, when the discourse context supported the relative clause interpretation by introducing two potential referents for the initial noun phrase, little or no processing difficulty was observed. The data pattern is similar to that obtained in Experiment 1 with the exception that reduction effects were observed earlier for the one-referent contexts.

Interactions Between Frequency and Context

To evaluate the hypothesis that referential context interacts with other relevant sources of constraint, we tested for a correlation between referential context effects and the relative frequency with which each verb is used as a past participle. As Figure 4 illustrates, the higher the past participle frequency, the more support for the relative clause at the verb.

For the two-referent contexts, in which readers are biased by the discourse toward a relative clause reading, reading times for verbs with higher past participle frequencies should be faster compared with reading times for verbs with lower past participle frequencies. The logic behind this prediction is based on a competition assumption. If both the local context (verb frequency information) and the global context (referential discourse information) support the relative clause alternative, there will be less competition between the relative clause and main clause alternatives because the relative clause starts out with the vast majority of the probabilistic activation. In contrast, if the local context supports the main clause alternative (because of low frequency of the past participle form of the verb) while the global context still supports the relative clause alternative, both syntactic alternatives will be quite active and the competition will lead to substantial processing difficulty (cf. Spivey-Knowlton, 1994).

To estimate verb form frequency, we used the metric employed by Trueswell (1996). Word frequency effects, in general, follow a logarithmic function (e.g., Solomon & Postman, 1952). Therefore, we used the log of the frequency of the verb appearing as a past participle in raw tokens per million (taken from Francis & Kučera, 1982). This number must then be normalized by the overall frequency of the verb appearing in any form. The resulting metric is logVBN/logBASE, where VBN is the frequency of the past participle

Table 9
Mean Number of Regressive Eye Movements to a Region: Experiment 2

| Ambiguity condition | The actress | selected | by the director | believed that |
|---|---|---|---|---|
| One-referent context | | | | |
| Reduced relative | .18 | .35 | .38 | .50 |
| Unreduced (full) relative | .15 | .27 | .35 | .14 |
| Two-referent context | | | | |
| Reduced relative | .13 | .22 | .22 | .39 |
| Unreduced (full) relative | .07 | .11 | .16 | .37 |


[Figure 4: scatterplot with Past Participle Availability on the x-axis (about .45 to .9) and a y-axis running from about -150 to 350.]

Figure 4. Past participle availability (logVBN/logBASE) predicts the magnitude of initial processing difficulty (Reduced - Unreduced, in milliseconds) at the verb, in the two-referent context. Each circle is a stimulus item. See text for results when the outlier is excluded.

for that verb and BASE is the overall frequency of that verb.² This value provides a rough estimate of the availability of the past participle form for a given verb. For example, the past participle availability of the verb selected is .89, and the past participle availability of watched is .48.
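A minimal sketch of the availability metric follows. The function name `past_participle_availability`, the input checks, and the example counts are hypothetical. The real values came from Francis and Kučera's corpus counts, which are not reproduced here.

```python
import math


def past_participle_availability(vbn, base, log=math.log10):
    """Return log(VBN) / log(BASE).

    vbn:  tokens per million of the verb as a past participle.
    base: tokens per million of the verb in any form.
    The input checks below are illustrative assumptions.
    """
    if vbn <= 0 or base <= 1 or vbn > base:
        raise ValueError("expected 0 < vbn <= base and base > 1")
    return log(vbn) / log(base)


# Made-up counts, used only to show the computation.
with_log10 = past_participle_availability(40, 100)
with_ln = past_participle_availability(40, 100, log=math.log)

# The ratio does not depend on the base of the logarithm.
same_value = math.isclose(with_log10, with_ln)
```

Because the metric is a ratio of two logarithms, switching between common and natural logs leaves the score unchanged, and a verb whose every occurrence is a past participle scores 1 regardless of how frequent it is.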

For each stimulus item in the two-referent context condition of Experiment 2, past participle availability was entered into a regression analysis to predict the magnitude of processing difficulty in first-pass reading times of the verb. Processing difficulty was quantified as the difference in first-pass reading time for the verb when it appeared in the reduced relative clause versus the full relative clause. Using the same verb in the full relative clause as a baseline, we were able to factor out item-specific differences that are unrelated to ambiguity. (However, it should be noted that the full relative clause is probably not a perfect zero baseline. In a strongly relative-clause-supporting context, it may be slightly infelicitous to use a full relative clause, and this may be responsible for some of the stimulus items having negative magnitudes of processing difficulty in the two-referent context.) As predicted by a competition-based account, there was a strong negative correlation between past participle availability and processing difficulty at the verb in the two-referent context; r² = .66, p < .001 (see Figure 4). The correlation is negative because there is less competition when both the context and the lexical frequency information provide converging evidence for the relative clause.
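The item-level analysis can be sketched as follows: compute a difficulty score per item (reduced minus unreduced first-pass time at the verb) and correlate it with availability. All numbers below are invented for illustration and are not the experimental data. The helper `pearson_r` is a plain implementation of the product-moment correlation, whose square equals r² for a single-predictor regression.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical items (NOT the data from Experiment 2).
availability = [0.48, 0.62, 0.70, 0.78, 0.85, 0.89]
reduced_ms = [520, 330, 310, 290, 285, 270]    # verb in reduced relative
unreduced_ms = [300, 290, 295, 290, 300, 290]  # same verb in full relative

# Difficulty score: reduced minus unreduced, item by item.
difficulty = [r - u for r, u in zip(reduced_ms, unreduced_ms)]

r = pearson_r(availability, difficulty)
r_squared = r ** 2
```

Subtracting the unreduced baseline removes item-specific reading time differences before the correlation is computed, which is the role the full relative clause plays in the analysis described above.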

However, as can be seen in Figure 4, the distribution of stimulus items contains one very conspicuous outlier: the verb watched. The past participle availability for watched is 2.9 standard deviations below the mean for the set of 16 verbs. Although this item performed as predicted (i.e., its processing difficulty score was 3.0 standard deviations above the mean), an outlier such as this violates the normality assumption in a regression analysis (see also the discussion by Murray and Liversedge, 1994). In addition, the measure of past participle availability used in the regression reported thus far assumes that frequency information is equally weighted for all verbs. This assumption is likely to be incorrect for two reasons. Most important, frequency estimates are likely to be less accurate when the corpus contains few exemplars. Second, the processing system might weight frequency information more strongly for more frequently occurring verbs. For example, a verb that appears twice per million words, and is used both times as a past participle, may not bias the reader toward a reduced relative clause as strongly as a verb that appears 100 times per million and always as a past participle. However, the logVBN/logBASE metric would assign both verbs the same score. When watched and the three verbs with a base frequency of less than 20 per million were removed from the regression, past participle availability still predicted processing difficulty at the verb; r² = .34, p < .05.³ Regression analyses predicting item-by-item processing difficulty were also computed for stimuli in the one-referent contexts. In contrast to the two-referent context condition, this analysis did not reveal any significant correlation with past participle availability, r² = .03, p > .5, even when the four outlier items were excluded, r² = .10, p > .3.

The results of these regression analyses, in the two-referent context, indicate that reading times for the individual verb that introduced the syntactic ambiguity were modulated by the degree to which its relevant lexical frequencies supported a relative clause reading. This result provides support for MacDonald et al.'s (1994) claim that context effects are modulated by lexical frequencies, and it complements Trueswell's (1996) finding that verb form frequency modulates within-sentence thematic effects. It also supports the more general claim that multiple sources of constraint contribute to the initial syntactic ambiguity resolution process.

A Competitive Integration Model

Modeling Reading Times for Individual Items

Thus far we have been assuming that there is a systematic relationship between processing difficulty, as measured by reading times, and the strength of evidence for competing syntactic alternatives. However, we have not formalized our assumptions. This is problematic because in the absence of explicit assumptions, constraint-based models are difficult to evaluate. Moreover, it is hard to compare predictions made by competing models. In this section, we describe a simple

2 The exact metric used by Trueswell (1996) was the natural log (ln) of the frequencies, but the metric is a ratio (normalized by the overall frequency); therefore, logVBN/logBASE = lnVBN/lnBASE.

3 With only watched removed from the analysis, the regression has a comparable slope to when it is included, but the correlation is only marginally reliable, r² = .24, p = .06.


implementation of the framework presented in Figure 1 and test it against the stimulus items from Experiment 2.

The computational model that we implemented is intended as a generic realization of a model in which (a) multiple constraints are combined to compute alternative interpretations in parallel and (b) the alternatives compete with one another during processing until one achieves criterion activation. Thus, our framework is broadly compatible with the general architectural assumptions made by a number of recently implemented models of syntactic ambiguity resolution (e.g., Burgess & Lund, 1994; Jurafsky, 1996; McClelland et al., 1989; Pearlmutter, Daugherty, MacDonald, & Seidenberg, 1994; Stevenson, 1994; Tabor et al., 1997; Trueswell, Kim, Lund, & Burgess, 1995). An important difference between this model and those of Burgess and Lund (1994), McClelland et al. (1989), Pearlmutter et al. (1994), Tabor et al. (1997), and Trueswell et al. (1995) is that instead of using distributed representations of the syntactic alternatives, it idealizes the competing syntactic representations into pairs of localist nodes. Although localist representations do not share some of the advantages of distributed representations (e.g., graceful degradation and a certain degree of neurophysiological plausibility), they allow a much simpler look into the model to observe and interpret its state at a particular time. On the other extreme, an important difference between this model and those of Jurafsky (1996) and Stevenson (1994) is that it produces a continuously graded degree of syntactic preference that varies from item to item (and therefore a graded degree of processing difficulty that varies from item to item), whereas the other models either produce a discrete syntactic preference and therefore a strict "garden-path" or "no garden-path," or group large classes of lexical items and treat them identically in determining syntactic preference. It is important to note that none of the models just mentioned have attempted to make explicit item-by-item predictions of the degree of processing difficulty caused by syntactic ambiguity (or alleviated by discourse context), though these models could, of course, be modified to make item-specific predictions.

It is important to keep in mind that we have implemented a model of constraint integration during ambiguity resolution and not a model of how the syntactic alternatives are generated. Thus, the model cannot account for sources of variance in processing time that are due to generation of syntactic alternatives. This is clearly an important limitation of the model. However, ambiguity resolution is in itself a central component of language comprehension. Moreover, it has served as the central testing ground for evaluating conflicting claims about parsing theories, typically under conditions where a temporarily ambiguous structure is compared with an unambiguous baseline, precisely the domain in which the model is most appropriate.

Implementation. In order to develop an implementation, we needed to adopt an explicit competition algorithm and estimate the parameters for the inputs and weights for the constraints pictured in Figure 1. The algorithm we used, normalized recurrence, implements competition between syntactic alternatives using recurrent feedback and normalization (Spivey-Knowlton, 1996). 4 This competition algorithm is similar in spirit to the approach developed by Stevenson (1994). First, each pair of constraint nodes was normalized to a sum of 1.0 for main clause (MC) and reduced relative (RR):

$$S_{c,a}(\mathrm{norm}) = \frac{S_{c,a}}{\sum_{a} S_{c,a}}. \tag{1}$$

$S_{c,a}$ represents the activation of each constraint node (i.e., the $c$th constraint that is connected to the $a$th interpretation node). $S_{c,a}(\mathrm{norm})$ is the same variable but normalized within each constraint. Constraints were then integrated at each of the interpretation nodes by means of a weighted sum based on Equation 2.

$$I_a = \sum_{c} \left[ w_c \times S_{c,a}(\mathrm{norm}) \right]. \tag{2}$$

The activation of the $a$th interpretation node is represented by $I_a$. The weight on the connection linking the $c$th constraint node to interpretation node $I_a$ is represented by $w_c$. Equation 2 was applied to each interpretation node and was summed across all constraint nodes that fed into it. Finally, Equation 3 determined how the interpretation nodes sent positive feedback to the constraints commensurate with how responsible the constraints were for that interpretation node's activation. Note that the weights were equal in both directions.

$$S_{c,a} = S_{c,a}(\mathrm{norm}) + I_a \times w_c \times S_{c,a}(\mathrm{norm}). \tag{3}$$

These three steps (Equations 1-3) were computed in sequence within each cycle of competition. Thus, as cycles of competition take place, the difference between the two interpretation nodes gradually increases.
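One cycle of Equations 1-3 can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names and the (RR, MC) tuple layout are assumptions, and the equi-biased tense value (.50/.50) in the usage lines is hypothetical. The other three input pairs are the by bias, main clause bias, and two-referent discourse bias reported in the text.

```python
def normalize(pair):
    """Equation 1: scale one constraint's (RR, MC) activations to sum to 1.0."""
    total = sum(pair)
    return tuple(value / total for value in pair)


def integrate(normed, weights):
    """Equation 2: weighted sum of normalized support for each interpretation."""
    n_interp = len(normed[0])
    return tuple(
        sum(w * c[a] for w, c in zip(weights, normed))
        for a in range(n_interp)
    )


def competition_cycle(constraints, weights):
    """Apply Equations 1-3 once; return (interpretations, updated constraints)."""
    normed = [normalize(c) for c in constraints]
    interp = integrate(normed, weights)
    # Equation 3: each constraint node receives feedback proportional to its
    # weight, its own normalized support, and the interpretation's activation.
    updated = [
        tuple(s + interp[a] * w * s for a, s in enumerate(c))
        for w, c in zip(weights, normed)
    ]
    return interp, updated


# Order within each pair is (RR, MC). The tense pair is hypothetical.
constraints = [(0.50, 0.50), (0.80, 0.20), (0.08, 0.92), (0.78, 0.22)]
weights = [0.25] * 4
for cycle in range(1, 4):
    interp, constraints = competition_cycle(constraints, weights)
    print(cycle, round(interp[0], 3), round(interp[1], 3))
```

Because the updated constraint pairs are renormalized at the start of the next call, the two interpretation activations continue to sum to 1.0 while the gap between them widens from cycle to cycle.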

This integration of constraints converts disparate formats of representation into the common medium of probabilistic support for mutually exclusive interpretations and allows biases from these information sources to simultaneously contribute to resolution of the RR-MC ambiguity. Moreover, the normalized recurrence competition algorithm allows these information sources to indirectly resolve each other's ambiguities (not unlike the interactive activation model, cf. McClelland & Rumelhart, 1981 and Rumelhart & McClelland, 1982). For example, in a two-referent context with an equi-biased verb, the RR will be more active than the MC, because the context and the parafoveal by strongly support the RR. Through normalized recurrence, these biases will drive the system toward fully supporting the RR and thus gradually resolve the verb's own tense ambiguity as well.

Four sources of constraint were used: verb tense (derived from Francis & Kučera, 1982), a probabilistic main clause bias (derived from corpus analyses, e.g., Tabossi et al.,

4 Recent work in computational neuroscience has emphasized the importance of recurrent feedback (e.g., Douglas, Koch, Mahowald, Martin, & Suarez, 1995) and normalization (e.g., Carandini & Heeger, 1994) for input recognition.

1534 SPIVEY AND TANENHAUS

1994), discourse information (derived from sentence completions in the different discourse contexts), and parafoveal information (derived from the presence of by after the verb, supporting the relative clause). Each constraint was condensed into its probabilistic support for either the RR or the MC interpretations. For example, the verb frequency information was condensed into RR-MC probabilities by dividing the individual log-transformed values by their sum according to the following equations:

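$$P(\mathrm{RR}) = \frac{\log \mathrm{VBN} / \log \mathrm{BASE}}{(\log \mathrm{VBN} / \log \mathrm{BASE}) + (\log \mathrm{VBD} / \log \mathrm{BASE})},$$

$$P(\mathrm{MC}) = \frac{\log \mathrm{VBD} / \log \mathrm{BASE}}{(\log \mathrm{VBN} / \log \mathrm{BASE}) + (\log \mathrm{VBD} / \log \mathrm{BASE})},$$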

where VBN is the verb's frequency as a past participle, VBD is the verb's frequency as a simple past, and BASE is the verb's overall frequency.
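As a rough illustration of the two equations above, the conversion from frequency counts to the tense constraint's (RR, MC) pair could be written as follows. The function name and the counts in the usage line are hypothetical and are not taken from the Francis and Kučera norms. The sketch also assumes that all counts exceed 1, so that every logarithm is positive, and uses the natural logarithm, although the choice of base does not affect the ratio.

```python
import math


def tense_bias(vbn, vbd, base):
    """Return (P(RR), P(MC)) from past participle, simple past, and overall counts.

    Assumes every count is greater than 1 so that each log is positive.
    """
    rr = math.log(vbn) / math.log(base)
    mc = math.log(vbd) / math.log(base)
    total = rr + mc
    return rr / total, mc / total


# Hypothetical counts, for illustration only.
p_rr, p_mc = tense_bias(vbn=30, vbd=60, base=120)
print(round(p_rr, 3), round(p_mc, 3))
```

Note that log BASE appears in every term and therefore cancels algebraically, so under this formulation the resulting probabilities depend only on the log-transformed VBN and VBD counts.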

Ideally, biases for each of the constraints should be independently established for each of the stimulus items and the weights on the constraints should be independently motivated. In subsequent work that builds on the present work, we have used ratings and corpus analyses to establish biases for each of the constraints. Weights on constraints were set to simulate data from offline fragment completions. These weights were then used for the simulations of the online reading-time data (Hanna, Spivey-Knowlton, & Tanenhaus, 1996; McRae et al., 1998). We did not, however, have appropriate norms or completions to follow all of these procedures here. To minimize a priori assumptions, the present model used equal weights (.25) for all four constraints. The rationale for each of the biases is described in detail below.

Estimating biases. Following Trueswell (1996), we used the measure of past participle availability from the regression analyses described earlier for the biases of the verb tense constraint. The biases for the parafoveal constraint were set using a corpus analysis conducted by McRae et al. (1998) in which the Wall Street Journal and Brown corpora (Marcus, Santorini, & Marcinkiewicz, 1993) were searched for a set of 40 verbs to calculate an independent estimate of the degree to which a by phrase subsequent to a verb-ed biases a reader toward a reduced relative. We found 124 sentences in which by directly followed the -ed form of the verb, of which 99 were by phrases introducing agents in passive constructions. Thus, the by bias was set to .2 (25/124) for the main clause and .8 (99/124) for the reduced relative clause. The main clause bias was derived from a corpus analysis reported in Tabossi et al. (1994). For the verbs used in that study, 92% of sentence-initial noun phrase (NP) verb-ed sequences continued as a main clause, whereas 8% continued as a reduced relative, giving us biases of .92 and .08. Thus, the main clause and the parafoveal by biases used in the current simulations were the same as those used in McRae et al. (1998), which also used sentence-initial reduced relative clauses with animate noun phrases and by phrases.

We did not have independent estimates of the individual strength of each discourse context. Accordingly, we assumed that each one-referent context had the same bias and each two-referent context had the same bias. The actual values of the discourse bias were set so that averaging the discourse bias with the main clause bias would approximate the mean proportion of relative clause completions obtained in the completion data when participants completed the NP verb fragment in a biased discourse context. As reported in Spivey-Knowlton et al. (1993), these values were 25% for the one-referent context and 43% for the two-referent context. This procedure assumes that the same biases and weights used in fragment completions are appropriate for online simulations, an assumption that is supported by recent work by McRae et al. (1998) and Hanna et al. (1996). We did not use the completions to set different biases for each context because these completions do not represent an independent estimate of the strength of the context, because they take into account the context in conjunction with the NP and the verb. For the two-referent context, the resulting biases were .22 for the main clause and .78 for the reduced relative clause. For the one-referent context, the biases were .58 for the main clause and .42 for the reduced relative clause. Note that it might seem odd that the one-referent context should be as evenly biased between the main clause and reduced relative as it is. However, in several fragment completion studies we have found that simply including contexts with multiple different referents increases the proportion of reduced relative completions compared with completions without contexts (cf. Trueswell & Tanenhaus, 1991). 5
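The discourse biases reported above can be recovered with a small calculation. The sketch below reads "averaging" as an unweighted mean of the discourse RR bias and the main clause RR bias (.08); that reading is an assumption, although it reproduces the reported values of .78 and .42. The function name is hypothetical.

```python
def discourse_rr_bias(target_rr_completions, main_clause_rr_bias=0.08):
    """Solve (discourse + main_clause) / 2 = target for the discourse RR bias."""
    return 2.0 * target_rr_completions - main_clause_rr_bias


# Relative clause completion rates reported from Spivey-Knowlton et al. (1993).
for label, target in [("two-referent", 0.43), ("one-referent", 0.25)]:
    rr = discourse_rr_bias(target)
    print(label, "RR =", round(rr, 2), "MC =", round(1.0 - rr, 2))
```

Under this reading, the two-referent context comes out at .78 for the RR and .22 for the MC, and the one-referent context at .42 and .58, matching the values given in the text.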

As shown in Panel A of Figure 5, the model takes the weighted sums of these inputs: one computed as the support for the RR, one computed as the support for the MC.

5 In previous versions of the model, conducted before the McRae et al. (1998) work was completed, we used slightly different weights and biases. The weights in these simulations were .22 (2/9) each for the discourse context, by, and the main clause bias, and .33 (3/9) for the tense bias. The main clause bias was .85/.15, the by bias was .15/.85, and the one-referent and two-referent contexts were .67/.33 and .33/.67, respectively. The context bias values were chosen to approximate the completions given a .85 main clause bias. Simulations with these weights are presented in Tanenhaus et al. (in press). We adopted the weights and biases used in the final version of this article because they involved making simpler a priori assumptions than the earlier weights. It is important to acknowledge that we did not use an algorithm that systematically searched the possible parameter space of the model as in McRae et al. (1998). Thus, we cannot prove that there is not another set of very different weights that would also fit the data at the verb. However, our experience with these simulations and others with similar data sets suggests that this is quite unlikely. In McRae et al., we used a procedure for systematically searching the parameter space to assign weights. The weights we adopted here are roughly consistent with those used by McRae et al. using this procedure. Note that we did not use the same weight-fitting procedure here because we did not have appropriate offline data (gated fragment completions) to use in setting weights.


[Figure 5 diagram, Panels A-C: in each panel, the RR and MC nodes (Provisional Interpretation of Syntactic Ambiguity) are linked to four constraint nodes: Verb Tense, Parafoveal Information, Main Clause Bias, and Discourse Information.]

Figure 5. Schematic diagram of normalized recurrence in the competitive integration model. In Panel A, activation of provisional interpretations is the weighted sum of the four constraints. In Panel B, those same input values are multiplied by the activation of the provisional interpretation for feedback to the input constraints. Those constraints accumulate activation and then renormalize to one before integrating again to compute the new provisional interpretation activations (Panel C). RR = reduced relative; MC = main clause.

Because the weights sum to 1.0 and the individual inputs are normalized to one, the respective mean values in the RR node and MC node will always sum to 1.0. For the recurrence that produces competition, the activations of these provisional interpretation nodes then act like weights themselves and multiply with the individual weighted input values to send positive feedback, thus rewarding each input node proportional to both its separate contribution and the resultant RR or MC activation. At each time step, the input nodes accumulate activation from the feedback connections, and the activations of each pair of input nodes are renormalized to 1.0. Instead of having the activations of the MC and RR nodes accumulate iteratively, we simply recompute them as the weighted sums of their inputs at the new time step. 6 For example, a stimulus item in a two-referent context, using the verb presented, would start off with the values in Figure 5A, where the RR and MC values are the weighted sums of their respective inputs. Those weighted input values (e.g., 1/3 × .60 = .20) are then multiplied by the MC and RR activations (e.g., .20 × .57) to send feedback that is added to the current values of the original inputs (Figure 5B). Those new input values are then renormalized to 1.0, and their means are computed again at the RR and MC nodes (Figure 5C). This process (compute-mean, compute-feedback, renormalize) repeats until one of the provisional interpretations reaches a criterion activation.

Dynamic criterion. The criterion used in the model is a function of how long the alternatives have been competing. Following McRae et al. (1998), this dynamic criterion was $1 - xt$, where $x$ is a constant and $t$ is the time step of competition. As competition lasts longer, the criterion for stopping competition gets more lenient, with the maximum competition duration being $(1 - 1/n)/x$ time steps, where $n$ is the number of competing alternative interpretations. ($1/n$ would be the probabilistic activations of the $n$ alternatives if all inputs were perfectly equi-biased [without a dynamic criterion, in such a situation, the model would compete for eternity]. $1 - 1/n$ is the amount of the probability space that the dynamic criterion must traverse in order to reach those equi-biased alternatives, and dividing that by $x$ gives the number of steps [of size $x$] that the dynamic criterion would take to get there.) A dynamic criterion is particularly necessary for modeling eyetracking data across multiple regions of the sentence because fixation durations are partially determined by a preset "timing program" (e.g., Rayner & Pollatsek, 1989; Vaughan, 1983). Essentially, the reader will spend only so long in a given region of a sentence before making the next saccade. Explorations of the parameter space of the model applied to several studies suggest that a constant of about .01 for the dynamic criterion

6 We do this because we do not see the provisional interpretation nodes as separate representations, but rather as abstractions over the respective patterns of activation across the lower level nodes. In this way, the abstraction over the pattern of activation produces a positive feedback effect (self-reinforcing as well as cross-reinforcing) that allows the global representation to, over time, become "more than the sum of its parts." Tabor and Tanenhaus (1998) show that a dynamical systems parser will develop attractors that correspond to competing syntactic analyses, generating similar competition patterns to those obtained with a competitive integration model with localist interpretation nodes. Tabor and Tanenhaus modeled the McRae et al. (1998) results applying the framework developed in Tabor et al. (1997).


provides good fits to first-pass eye-movement data (Tanenhaus, Spivey-Knowlton, & Hanna, in press).
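The dynamic criterion described above can be sketched as a stopping rule. The function names are hypothetical, and counting time steps from 1 is an assumption, since the text does not say whether the first cycle is step 0 or step 1. The activations in the usage lines are made-up values for illustration, not simulation output.

```python
def criterion(t, x=0.01):
    """Dynamic criterion 1 - x*t at time step t."""
    return 1.0 - x * t


def max_competition_steps(n_alternatives=2, x=0.01):
    """Upper bound (1 - 1/n) / x on the number of competition steps."""
    return (1.0 - 1.0 / n_alternatives) / x


def cycles_to_criterion(leading_activations, x=0.01):
    """Return the first step at which the leading node meets the criterion.

    leading_activations holds the activation of the stronger interpretation
    at steps 1, 2, 3, ... (step numbering from 1 is an assumption).
    Returns None if the criterion is never met within the sequence.
    """
    for t, activation in enumerate(leading_activations, start=1):
        if activation >= criterion(t, x):
            return t
    return None


print(max_competition_steps(2, 0.01))
print(cycles_to_criterion([0.60, 0.70, 0.80, 0.995]))
```

With two alternatives and a constant of .01, the bound works out to about 50 steps, which is the point at which even a perfectly equi-biased pair (.5/.5) would be stopped by the falling criterion.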

As the model iterates toward the dynamic criterion, the syntactically available interpretations compete for probability resources. In this way, it is similar to Stevenson's (1994) competition algorithm. The number of iterations necessary to achieve the criterion activation indicates how long the syntactic alternatives are actively competing and thus how long readers spend processing the ambiguous verb in the reduced relative clause, compared with the full relative clause.

Simulation. We tested the model by determining whether it could predict the item-specific reduction effects for the 16 stimuli in the two-referent context condition. The regression analyses reported earlier established that there was item-specific variance that was correlated with tense, the one constraint for which the model had different biases for different items. The durations of competition produced by this algorithm predicted the processing difficulty in first-pass reading times at the verb (reduced relative minus full relative) for each item in the two-referent contexts at least as well as the frequency metric used in Figure 4; $r^2 = .37$, $p < .02$ (see Figure 6). This result arises out of the fact that the normalized recurrence competition algorithm, which uses the verb frequency information (as well as the other constraints) in an inherently nonlinear fashion, produced near equal initial activation of the RR and MC nodes for stimulus items that showed considerable processing difficulty at the verb, especially for the verb watched, the clear outlier in Figure 6. As with the verb frequency regression analyses described earlier, the verb watched is an outlier that violates the normality assumption, and the stimulus items with low frequency verbs have rather unreliable estimates of the stimulus parameters. When watched and the verbs with

[Figure 6 scatterplot: x-axis, Cycles of Competition (17.5 to 40); y-axis ticks from -150 to 350.]

Figure 6. The competitive integration model's predictions of processing difficulty at the verb for the 16 stimulus items from Experiment 2. Each circle is a stimulus item. Processing difficulty is computed as first-pass reading time at the verb in the reduced relative minus that in the full relative.

fewer than 20 occurrences per million were excluded from the analysis, as was done previously, the prediction was still reliable; $r^2 = .38$, $p < .05$. Thus, without manipulating any free parameters from item to item, only changing the input values as indicated by the lexical frequencies, the model is able to approximate the item-by-item variation in the first-pass reading time differences at the verb in the two-referent context for Experiment 2. 7

Modeling Reading Times Across Multiple Regions

Thus far, we have restricted ourselves to modeling effects at a single region, the first verb, taking advantage of the fact that all of the relevant constraints in Experiment 2 can be treated as being simultaneously available when the ambiguity is introduced. We now extend the model to simulate results across successive regions in the sentence. This extension also allows us to apply the model to comparable experiments in the literature. For this purpose, we restrict ourselves to those experiments to which the model directly applies, namely experiments that manipulated referential contexts for reduced relative clauses compared with an unambiguous baseline. 8 This allows a preliminary test of the claim that the superficially conflicting results in the literature can be unified within a constraint-based framework.

We apply the model to three different data sets. First, we show that the model provides an approximate fit to the results of Experiment 2 across all three critical regions: verb, by phrase, and main verb + one word. The model exhibits the early interaction between context and relative clause reduction, followed by no processing difficulty at the main verb region. Second, the model approximates the results of a

7 Initially, it might seem as though the appropriate next step would be to simulate the item-by-item data from Experiment 1. However, in Experiment 1, the unambiguous baseline condition involved different verbs (e.g., chosen, slain) than those in the ambiguous condition (e.g., selected, killed). Although condition means that collapse across the individual items might smooth over the noise introduced by comparing different verbs (as is done in Figure 7), an item-by-item analysis of the sort done for Experiment 2 would be essentially uninformative in the case of Experiment 1 because of word length differences and base frequency differences. What might be done instead is an item-by-item analysis of the context effect in Experiment 1 regardless of the baseline condition. For example, a difference can be computed between the mean reading time for an ambiguous verb in its one-referent context and the mean reading time for that same verb in its two-referent context. The model can then predict those item-by-item differences by taking the differences between its competition durations for each ambiguous verb in its one-referent and two-referent contexts. Across the 16 stimulus items, this model's prediction is statistically significant, $r^2 = .25$, $p < .05$. When watched and the three low frequency verbs are excluded from the regression analysis (leaving only 12 data points), the effect is marginal, $r^2 = .31$, $p = .06$.

8 Other studies of reduced relatives in referential contexts (Britt et al., 1992; Ferreira & Clifton, 1986) can also, in principle, be accommodated by the constraint-based framework. However, these studies did not use an unambiguous baseline sentence to test their effects. Therefore, a clean measure of processing difficulty cannot be computed.


word-by-word self-paced reading study that used the same stimulus materials (Spivey-Knowlton et al., 1993, Experiment 3). In these results, the sentences with two-referent contexts showed processing difficulty at the ambiguous verb, whereas sentences in one-referent contexts did not. This pattern then reversed at the NP within the by phrase. Finally, the model provides an approximate fit to the results of Murray and Liversedge (1994, Experiment 2), in which the two-referent context exhibited more processing difficulty throughout the entire target sentence. Thus, by changing only the inputs to the model (not its internal parameters), the normalized recurrence competition algorithm can account for results that show immediate context effects, delayed context effects, and even reversed context effects.

For modeling the results of Experiment 2, we compared the mean competition value for the sentences in the one-referent context with that for the sentences in the two-referent context, already computed for the verb region in the previous section. For the results at the by phrase, where the noun provides strong support for the relative clause, a new input was added: a bias of .875 for the RR and .125 for the MC, consistent with McRae et al. (1998). For simplicity, this new input was given a weight of 1.0, and then all weights were normalized to one (giving it a weight of .5, and the previous constraints' weights were thus halved). The activation values of the inputs, after competition had reached the dynamic criterion from the previous region, were then used to resume competition at the new region. Thus, the model incrementally approximates the competition for each fixation in the sentence. As readers made an average of about 1.75 fixations in the by phrase region (see also Trueswell et al., 1994), competition durations in this region were multiplied by 1.75. Once the dynamic criterion was reached for competition in this region, mean competition times were computed for the one-referent context items and for the two-referent context items. For the main verb region, a new input was again added: a bias of 1.0 for the RR and 0 for the MC, as this region marks the point of complete syntactic disambiguation. This new input was also given a weight of 1.0, and all weights were renormalized to one.
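The weight bookkeeping for each new region can be sketched as follows. The function name is hypothetical, and applying the same rule again at the main verb to the already renormalized weights is an assumption about how "renormalized to one" was carried out.

```python
def add_constraint(weights, new_weight=1.0):
    """Append a new constraint's weight and rescale all weights to sum to 1.0."""
    extended = list(weights) + [new_weight]
    total = sum(extended)
    return [w / total for w in extended]


verb_weights = [0.25, 0.25, 0.25, 0.25]           # four constraints at the verb
by_phrase_weights = add_constraint(verb_weights)   # new input at the by phrase
main_verb_weights = add_constraint(by_phrase_weights)  # new input at main verb
print(by_phrase_weights)
print(main_verb_weights)
```

At the by phrase this gives the new input a weight of .5 and halves the four original weights to .125 each, as described in the text. Under the stated assumption, each later input again takes half of the total weight, so the most recent region always counts as much as all earlier constraints combined.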

Figure 7 shows that the model provides approximate fits for condition means for the data from Experiment 2. At the by phrase, competition had decreased for the two-referent context, whereas it had increased somewhat for the one-referent context. At the main verb, competition lasted only briefly for both contexts.

The model was also implemented to simulate self-paced reading results (Spivey-Knowlton et al., 1993) using the same stimulus materials as those in Experiment 1. In word-by-word self-paced reading, the word by is not visible during reading of the verb; thus only verb tense-voice, main clause bias, and discourse information were entered into the integration with their weights normalized to one. At this region, an interesting result was observed. The model actually exhibited more competition for items in the two-referent context than for those in the one-referent context. In

[Figure 7 plot: regions on the x-axis are "selected", "by the director", and "believed that"; left axis ticks from -60 to 120, right axis ticks from 0 to 50; separate lines for Human Data and Model Results, each in the One-Referent Context and the Two-Referent Context.]

Figure 7. The competitive integration model approximates processing difficulty across all three critical regions for the two contexts in Experiment 2. Processing difficulty is computed as first-pass reading time in the reduced relative minus that in the full relative, for each region. Model results are overlaid on the data to show general correspondence of the patterns.


the one-referent context, competition at the verb was quickly resolved in favor of the MC, whereas many of the items in the two-referent context were resolved in favor of the RR, but only after substantial competition. Indeed, this pattern is reflected in the human data (Figure 8).

When the input for by was included at the next region, with a weight of 1.0 (at which point all weights were renormalized to a sum of 1.0), the activations of the other inputs had become somewhat polarized because of competition at the verb. As a result, competition was about equal at the by region in the two contexts.

At the next region, the, a new input was added with a .875 bias for the RR and .125 for the MC. These biases were established from a corpus analysis reported in McRae et al. (1998). As in the previous simulation, this new input was given a weight of 1.0, and all weights were renormalized to one. Competition at this region increased for the one-referent context and decreased for the two-referent context.

Then, at the head noun of the by phrase, a new input was added with a bias of .99 for the RR and .01 for the MC, and its weight was set at 1.0, with all weights then being normalized to one. Again, these bias values were based on McRae et al.'s (1998) gated sentence completions.

Then, at the main verb region, a new input was added with a 1.0 bias for the RR and 0 bias for the MC, and its weight was set at 1.0, with the weights then being normalized to one. As Figure 8 shows, the model predicts a crossover data pattern that is often described as a delayed effect of context (e.g., Ferreira & Clifton, 1986). This crossover in relative processing difficulty, with greater competition for the two-referent context early in the sentence followed by greater competition for the one-referent context later in the sentence, is in fact what was found in Spivey-Knowlton et al. (1993, Experiment 3). Note, however, that the model predicts that ambiguity resolution should take place more rapidly than it does in human data. One reason why this might be the case is that one-word-at-a-time self-paced reading often shows delayed effects, presumably because processing of the word may lag behind the button press. To better simulate this inertia effect, we reduced the weight for each new input to .75. As Figure 9 shows, this lagged model does a better job of simulating the human reading time data.

Finally, the model also simulates the superficially paradoxical situation in which a two-referent context actually increases processing difficulty for the relative clause throughout the entire sentence (Murray & Liversedge, 1994, Experiment 2; see also Ferreira & Clifton, 1986, Experiment 3). The verb tense information for Murray and Liversedge's (1994) verbs was determined for the 36 stimulus items, using the appendixes presented in Liversedge (1994). The verbs in their sentences were typically followed by a noun phrase beginning with the (e.g., The salesman paid the

[Figure 8 plot: words on the x-axis are "selected", "by", "the", "director", and "believed"; left axis ticks from 0 to 40, right axis ticks from 10 to 50; separate lines for Human Data and Model Results, each in the One-Referent Context and the Two-Referent Context.]

Figure 8. The competitive integration model approximates the pattern of processing difficulty across the five critical regions for the two contexts in word-by-word self-paced reading (Spivey-Knowlton et al., 1993, Experiment 3). Processing difficulty is computed as self-paced reading time in the ambiguous reduced relative minus that in the morphologically unambiguous reduced relative, for each word. Model results are overlaid on the data to show general correspondence of the patterns. (As this measure of processing difficulty is qualitatively different from that in the other experiments, the raw scale factor implied by this overlay is not expected to coincide with the others.)


[Figure 9 plot: words on the x-axis are "selected", "by", "the", "director", and "believed"; left axis ticks from 0 to 40, right axis ticks from 10 to 50; separate lines for Human Data and Model Results, each in the One-Referent Context and the Two-Referent Context.]

Figure 9. To better approximate the inertia and processing lag of self-paced reading, the competitive integration model simulated the Spivey-Knowlton et al. (1993; Experiment 3) results with each new incoming constraint being given a weight of .75 (instead of 1.0) prior to weight normalization.

money put it . . . . The guest grilled the steak said it . . .), which is locally consistent with a main clause. Therefore, parafoveal information was coded as a .5 bias toward the RR and a .5 bias toward the MC. The main clause bias was the same as in the previous simulations. The discourse information was derived from their sentence completion results (in the same manner as the previous simulations), where they observed 4% reduced relative completions in the one-referent context and 10% in the two-referent context (Murray & Liversedge, 1994, Experiment 3). Using the same weights as before, mean competition values were computed for the verb. For the rest of the relative clause (typically an NP), the NP was typically a good theme for the verb, consistent with a main clause interpretation (e.g., The girl made the cake cut it . . .). Correspondingly, this new input was given a bias of .25 toward the RR and .75 toward the MC. Its weight, as in the previous simulations, was set at 1.0, with all weights then being normalized to 1.0. Competition in the rest of the relative clause was multiplied by 1.75 (as in the simulation of the by phrase from our Experiment 2). Thus, the only differences between this simulation and the simulation of our Experiment 2 were the input values for verb tense-voice, parafoveal information, discourse information, and subsequent regions of the sentence; all weights and the constraint integration regime were the same across both simulations. As Murray and Liversedge (1994) combined the verb and the rest of the relative clause into one region, we simply added the mean competition values for these two regions for comparison to their eye-movement data (Figure 10).

For the relative clause region, the model showed slightly more competition in the two-referent context than in the one-referent context. Essentially, all items in the one-referent context were quickly resolved in favor of the MC, whereas a few of the items in the two-referent context were near equal in their MC/RR activations, and therefore there was lengthy competition between the two syntactic alternatives. For the main verb region (region of syntactic disambiguation), a new input was entered into the integration, with a 1.0 bias toward the RR and 0 bias toward the MC. Its weight was set at 1.0, and all weights were then normalized to 1.0. Again, competition at this region was slightly greater in the two-referent context than in the one-referent context.

In summary, a competition model provides approximate fits for three different data patterns using a consistent set of parameters. In Experiment 2, two-referent contexts completely eliminated processing time differences between full and reduced relatives at the point of disambiguation. In Spivey-Knowlton et al. (1993, Experiment 3), two-referent contexts speeded ambiguity resolution but did not eliminate processing difficulty. In Murray and Liversedge (1994), supportive contexts actually increased processing difficulty. Our simulations suggest that this paradoxical result may arise because participants sometimes did not recover from the incorrect main clause analysis for the ambiguous sentences in the one-referent contexts.


[Figure 10 graphic: human data and model results, each plotted for the one-referent and two-referent contexts, across the regions "grilled the steak" and "said it tasted nice".]

Figure 10. The competitive integration model approximates the pattern of processing difficulty across the two critical regions for the two contexts in an eyetracking experiment with a different set of materials (Murray & Liversedge, 1994; Experiment 2). Processing difficulty is computed as first-pass per word reading time in the reduced relative minus that in the full relative, for each region. Model results are overlaid on the data to show general correspondence of the patterns. (As this measure of processing difficulty is qualitatively different from that in the other experiments, the raw scale factor implied by this overlay is not expected to coincide with the others.)

It is important to note that the data patterns associated with each of these experiments have often been given different interpretations in the literature. The first data pattern is typically taken as evidence that context can affect initial syntactic processing, whereas the other two data patterns are taken as evidence for an initial stage of processing in which syntactic processing is encapsulated from the effects of discourse, with discourse influencing an evaluation and revision stage.

Thus far we have demonstrated that a multiple constraints model using the same weights provides approximate fits to three of the different data patterns that have been obtained in the literature on discourse context effects. As our central claim is that these data patterns are best accounted for without appealing to a delay in the use of context, we also implemented two additional versions of the model using the same parameters and weights. In the first version, all of the constraints other than the main clause bias were delayed for 4 cycles of competition at the verb, the time it takes the model to resolve itself entirely in favor of a main clause. Thus this model can be seen as an implementation of a two-stage garden-path model (cf. McRae et al., 1998). To compare the goodness of fit for this model with the no-delay simulations, we calculated root mean squared error (RMS) values. The RMS values for the original, no-delay simulations were 33.98 for Experiment 2, 9.15 for the self-paced reading study, and 26.30 for the Murray and Liversedge (1994) data. The comparable simulations with the delay models had RMS values of 36.13 for Experiment 2, 12.1 for the self-paced reading study, and 26.82 for the Murray and Liversedge data. The RMS values were consistently smaller for the no-delay version, indicating that it provided better fits of the data.
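For readers unfamiliar with the measure, the RMS comparison can be sketched as follows. The sketch assumes that model output has already been mapped onto the same scale as the observed reading-time effects, a step not shown here. The numbers are hypothetical and are not the data behind the RMS values reported above.

```python
import math


def rms_error(model_values, observed_values):
    """Root mean squared error between model output and observed data."""
    if len(model_values) != len(observed_values) or not model_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - o) ** 2 for m, o in zip(model_values, observed_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical reading-time effects (ms) for four conditions.
observed = [40.0, 10.0, 55.0, 5.0]
no_delay_model = [35.0, 15.0, 50.0, 10.0]
delay_model = [30.0, 20.0, 45.0, 15.0]

fit_no_delay = rms_error(no_delay_model, observed)
fit_delay = rms_error(delay_model, observed)
```

A smaller value indicates a closer fit, so two model versions run on the same conditions can be compared directly on this measure.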

In the second simulation, we delayed only the discourse bias by four cycles in order to simulate a model in which all within-sentence constraints apply before discourse constraints. The RMS values were again higher than for the no-delay simulations. The RMS values were 39.70 for Experiment 2, 12.42 for the self-paced reading study, and 26.87 for the Murray and Liversedge (1994) data. Finally, for both classes of delay models, making the delay longer resulted in increasingly poorer fits to the data.
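One way to express a delayed constraint in a competition loop is sketched below. The details are assumptions rather than a description of our implementation. In particular, the sketch renormalizes the weights of whichever constraints are currently active and does not test the stopping criterion until every constraint has entered the competition. The criterion form, `delta_crit`, the constraint names, and the bias values are likewise hypothetical.

```python
def normalize(pair):
    """Scale an (rr, mc) pair so the two values sum to 1."""
    total = pair[0] + pair[1]
    return (pair[0] / total, pair[1] / total) if total > 0 else (0.5, 0.5)


def compete_with_delays(biases, weights, delays, delta_crit=0.01,
                        max_cycles=200):
    """Competition in which some constraints are withheld for early cycles.

    biases:  constraint name -> (rr_support, mc_support)
    weights: constraint name -> positive weight
    delays:  constraint name -> number of cycles the constraint is withheld
    Returns (cycles, rr_activation, mc_activation).
    """
    support = {name: normalize(pair) for name, pair in biases.items()}
    longest_delay = max((delays.get(name, 0) for name in support), default=0)
    rr = mc = 0.5
    for cycle in range(1, max_cycles + 1):
        active = [name for name in support if cycle > delays.get(name, 0)]
        if not active:
            raise ValueError("at least one constraint must be undelayed")
        # Assumption: weights of the active constraints are renormalized.
        total_w = sum(weights[name] for name in active)
        w = {name: weights[name] / total_w for name in active}
        rr = sum(w[name] * support[name][0] for name in active)
        mc = sum(w[name] * support[name][1] for name in active)
        # Assumption: the criterion is checked only after every
        # constraint has entered the competition.
        if cycle > longest_delay and max(rr, mc) >= 1.0 - delta_crit * cycle:
            return cycle, rr, mc
        # Feedback to the active constraints only.
        for name in active:
            s_rr, s_mc = support[name]
            support[name] = normalize((s_rr + rr * w[name] * s_rr,
                                       s_mc + mc * w[name] * s_mc))
    return max_cycles, rr, mc


# Hypothetical inputs: a main clause bias opposed by an RR-supporting context.
biases = {"main_clause_bias": (0.1, 0.9), "discourse": (0.9, 0.1)}
weights = {"main_clause_bias": 0.5, "discourse": 0.5}

no_delay = compete_with_delays(biases, weights, delays={})
discourse_delayed = compete_with_delays(biases, weights,
                                        delays={"discourse": 4})
```

Passing a delay for every constraint except the main clause bias gives the analogue of the first delay version, and passing a delay for the discourse constraint alone gives the analogue of the second.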

It is likely that by changing the values of the weights we could have generated better fits for all of these simulations, including the original constraint-based model simulations. However, these simulations do show that with the weights and parameters we motivated for these stimuli, the no-delay model does a better job of simulating the data than the delay model.

The success of these simulations provides important evidence for the claim that syntactic ambiguity resolution involves an early integration of multiple information sources and a process of competition between syntactic alternatives. It is important to emphasize, however, that the bias values and weights used in these simulations are clearly only approximations. In future work it will be important to include more precise, independently motivated, item-specific parameter estimates as well as systematic procedures for assigning weights. Work building on the results presented here that moves considerably in this direction is reported in Hanna et al. (1997), McRae et al. (1998), and Tanenhaus et al. (in press). In addition, it will be important to evaluate the particulars of the competition algorithm and the function that maps competition onto reading times. For example, it is an open question as to whether normalized recurrence is the optimal competition algorithm. It is possible that explicit inhibitory connections between the RR and MC nodes (Spivey-Knowlton, 1994), or perhaps even a simple decay function followed by normalization, might be sufficient to produce competition that will fit the data.⁹ Similarly, the particular function used for the dynamic criterion may be overly simplistic.

General Discussion

The research presented here makes two contributions. The first is primarily empirical. Experiments 1 and 2 showed that processing difficulty for reduced relative clauses was sharply reduced when the context contained a pair of referents for the initial noun phrase. This result strongly supports the claim that readers make immediate use of constraints established by the discourse context during ambiguity resolution. We also presented suggestive evidence that discourse context interacts with lexical frequency, a fact that accounts for discrepancies in the discourse context literature, as proposed by MacDonald et al. (1994). Consider, for example, two studies with reduced relatives that failed to find referential context effects. Britt et al. (1992, Experiment 3) had a somewhat effective contextual manipulation (17% RR completions in the two-referent context, cf. Spivey-Knowlton & Tanenhaus, 1994), but the verbs in their target sentences only weakly supported the past participle tense (mean past participle availability: .57, compared with .76 for the verbs in our study). In contrast, the verbs used by Murray and Liversedge (1994) had relatively high past participle availability (.68), but their contextual manipulation was weak (10% RR completions in the two-referent context).

The second contribution is primarily theoretical. We showed that reading times for individual items, as well as superficially conflicting data patterns in the ambiguity resolution literature, can be simulated within a multiple constraints framework using a simple competition algorithm. The specific computational model described in this article demonstrates how graded variation in context effects, across stimulus items as well as across experiments, can be due to informational biases inherent in the stimulus materials, not to architectural constraints on the processing system. In fact, with fixed model parameters, and varying only stimulus parameters, the competitive integration model accounts for a variety of effects in syntactic ambiguity resolution that were previously seen as mutually exclusive.

This fact highlights the importance of developing quantitative models of sentence processing, and the dangers of making simple inferences from reading times in the absence of explicit models. For example, the standard assumption that increased processing difficulty at points of ambiguity is evidence for a garden path is clearly problematic, as is the assumption that delayed effects of a constraint mean that the constraint is not used during a putative first stage of processing. Rather, the visibility of the effects of different constraints will vary depending on their strength and availability, as well as on the presence of other relevant constraints. Thus, constraint-based principles can provide a general mechanism for understanding properties of ambiguity resolution that have often led researchers to propose linguistically specific architectural constraints.

In summary, then, the results of our experiments and simulations support claims about the importance of discourse representations in online syntactic ambiguity resolution, made by proponents of referential theory (e.g., Altmann & Steedman, 1988; Crain & Steedman, 1985). They also support recent claims about the importance of lexical representations in online syntactic ambiguity resolution made by proponents of the constraint-based lexicalist approach to ambiguity resolution (e.g., MacDonald et al., 1994; Trueswell & Tanenhaus, 1994). Finally, they suggest that constraint-based models incorporating competition provide a general framework for understanding the time-course with which these, and other, constraints are integrated in real-time sentence processing.

⁹ In the present work, we avoided explicit inhibitory connections because we see the RR and MC nodes as abstractions over patterns of activation at the lower level. Thus, they act as emergent representations with no explicit nodes in the system (hence, their activation does not accumulate; it is simply recomputed at each iteration). In addition to trying inhibitory connections, we also tried several versions of a simple decay function followed by normalization, but the results of this competition algorithm did not match the data. In the end, the normalized recurrence algorithm provided, by far, the best account of the data.

References

Altmann, G., Garnham, A., & Dennis, Y. (1992). Avoiding the garden-path: Eye movements in context. Journal of Memory and Language, 31, 685-712.

Altmann, G., Garnham, A., & Henstra, J. (1994). Effects of syntax in human sentence parsing: Evidence against a structure-based proposal mechanism. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 209-216.

Altmann, G., & Steedman, M. (1988). Interaction with context during human sentence processing. Cognition, 30, 191-238.

Bates, E., & MacWhinney, B. (1989). Functionalism and the competition model. In B. MacWhinney & E. Bates (Eds.), The crosslinguistic study of sentence processing. New York: Cambridge University Press.

Britt, M. A. (1994). The interaction of referential ambiguity and argument structure in parsing of prepositional phrases. Journal of Memory and Language, 33, 251-283.

Britt, M. A., Perfetti, C. A., Garrod, S., & Rayner, K. (1992). Parsing and discourse: Context effects and their limits. Journal of Memory and Language, 31, 293-314.

Burgess, C., & Lund, K. (1994). Multiple constraints in syntactic ambiguity resolution: A connectionist account of psycholinguistic data. Proceedings of the 16th Annual Conference of the Cognitive Science Society (pp. 90-95). Hillsdale, NJ: Erlbaum.

Burgess, C., Tanenhaus, M., & Hoffman, M. (1994). Parafoveal and semantic effects on syntactic ambiguity resolution. Proceedings of the 16th Annual Conference of the Cognitive Science Society (pp. 96-99). Hillsdale, NJ: Erlbaum.

Carandini, M., & Heeger, D. (1994). Summation and division by neurons in primate visual cortex. Science, 264, 1333-1336.

Clifton, C., & Ferreira, F. (1989). Ambiguity in context. Language and Cognitive Processes, 4, 77-104.

Connine, C., Ferreira, F., Jones, C., Clifton, C., & Frazier, L. (1984). Verb frame preferences: Descriptive norms. Journal of Psycholinguistic Research, 13, 307-319.

Crain, S., & Steedman, M. (1985). On not being led up the garden path. In D. Dowty, L. Karttunen, & A. Zwicky (Eds.), Natural language parsing. Cambridge, England: Cambridge University Press.

Douglas, R., Koch, C., Mahowald, M., Martin, K., & Suarez, H. (1995). Recurrent excitation in neocortical circuits. Science, 269, 981-985.

Ferreira, F., & Clifton, C. (1986). The independence of syntactic processing. Journal of Memory and Language, 25, 348-368.

Ferreira, F., & Henderson, J. (1993). Reading processes during syntactic analysis and reanalysis. Canadian Journal of Experimental Psychology, 47, 247-275.

Francis, W., & Kučera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.

Frazier, L. (1987). Theories of syntactic processing. In J. Garfield (Ed.), Modularity in knowledge representation and natural language processing. Cambridge, MA: MIT Press.

Gibson, E., Schütze, C., & Salomon, A. (1996). The relationship between the frequency and the processing complexity of linguistic structure. Journal of Psycholinguistic Research, 25, 59-92.

Hanna, J., Barker, C., & Tanenhaus, M. (1995). Integrating local and discourse constraints in resolving lexical thematic ambiguities. Poster presented at the Eighth Annual City University of New York Conference on Human Sentence Processing, Tucson, AZ.

Hanna, J., Spivey-Knowlton, M., & Tanenhaus, M. (1996). Integrating discourse and lexical frequency information in resolving lexical thematic ambiguities. Proceedings of the 15th Annual Conference of the Cognitive Science Society (pp. 266-271). Mahwah, NJ: Erlbaum.

Hanna, J., Tanenhaus, M., & Spivey-Knowlton, M. (1997). Integrating contextual and sentential constraints in ambiguity resolution. Manuscript in preparation.

Juliano, C., & Tanenhaus, M. (1995). A constraint-based lexicalist account of subject/object attachment ambiguities. Journal of Psycholinguistic Research, 23, 459-471.

Jurafsky, D. (1996). A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science, 20, 137-194.

Liversedge, S. (1994). Referential context, relative clauses and syntactic parsing. Unpublished doctoral dissertation, University of Dundee, Scotland.

MacDonald, M. (1994). Probabilistic constraints and syntactic ambiguity resolution. Language and Cognitive Processes, 9, 692-715.

MacDonald, M., Pearlmutter, N., & Seidenberg, M. (1994). The lexical nature of syntactic ambiguity resolution. Psychological Review, 101, 676-703.

Marcus, M., Santorini, B., & Marcinkiewicz, M. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19, 313-330.

McClelland, J., & Rumelhart, D. (1981). An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychological Review, 88, 375-407.

McClelland, J., St. John, M., & Taraban, R. (1989). Sentence comprehension: A parallel distributed approach. Language & Cognitive Processes, 4, 287-335.

McRae, K., Spivey-Knowlton, M., & Tanenhaus, M. (1998). Modeling the effects of thematic fit (and other constraints) in on-line sentence comprehension. Journal of Memory and Language, 37, 283-312.

Mitchell, D., Corley, M., & Garnham, A. (1992). Effects of context in human sentence parsing: Evidence against a discourse-based proposal mechanism. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 69-88.

Mitchell, D., Cuetos, F., Corley, M., & Brysbaert, M. (1995). Exposure-based models of human parsing: Evidence for the use of coarse-grained (nonlexical) statistical records. Journal of Psycholinguistic Research, 24, 469-488.

Murray, W., & Liversedge, S. (1994). Referential context effects on syntactic processing. In C. Clifton, K. Rayner, & L. Frazier (Eds.), Perspectives on sentence processing (pp. 359-388). Hillsdale, NJ: Erlbaum.

Pearlmutter, N., Daugherty, K., MacDonald, M., & Seidenberg, M. (1994). Modeling the use of frequency and contextual biases in sentence processing. In Proceedings of the 16th Annual Conference of the Cognitive Science Society (pp. 699-704). Hillsdale, NJ: Erlbaum.

Pearlmutter, N., & MacDonald, M. (1995). Probabilistic constraints and working memory capacity in syntactic ambiguity resolution. Journal of Memory and Language, 43, 521-542.

Rayner, K., & Pollatsek, A. (1989). The psychology of reading. Englewood Cliffs, NJ: Prentice Hall.

Rayner, K., Sereno, S., Morris, R., Schmauder, R., & Clifton, C. (1989). Eye movements and on-line language comprehension processes. Language and Cognitive Processes, 4, 21-50.

Rumelhart, D., & McClelland, J. (1982). An interactive activation model of context effects in letter perception: II. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.

Sedivy, J., & Spivey-Knowlton, M. (1994). The use of structural, lexical, and pragmatic information in parsing attachment ambiguities. In C. Clifton, L. Frazier, & K. Rayner (Eds.), Perspectives on sentence processing. Hillsdale, NJ: Erlbaum.

Solomon, R., & Postman, L. (1952). Frequency of usage as a determinant of recognition thresholds for words. Journal of Experimental Psychology, 43, 195-201.

Spivey-Knowlton, M. (1992). Another context effect in sentence processing: Implications for the principle of referential support. Proceedings of the 14th Annual Conference of the Cognitive Science Society (pp. 486-491). Hillsdale, NJ: Erlbaum.

Spivey-Knowlton, M. (1994). Quantitative predictions from a constraint-based theory of syntactic ambiguity resolution. Proceedings of the 1993 Connectionist Models Summer School (pp. 130-137). Hillsdale, NJ: Erlbaum.

Spivey-Knowlton, M. (1996). Integration of linguistic and visual information: Human data and model simulations. Unpublished doctoral dissertation, University of Rochester.

Spivey-Knowlton, M., & Sedivy, J. (1995). Resolving attachment ambiguities with multiple constraints. Cognition, 55, 227-267.

Spivey-Knowlton, M., & Tanenhaus, M. (1994). Referential context and syntactic ambiguity resolution. In C. Clifton, K. Rayner, & L. Frazier (Eds.), Perspectives on sentence processing (pp. 415-439). Hillsdale, NJ: Erlbaum.


Spivey-Knowlton, M., Trueswell, J., & Tanenhaus, M. (1993). Context effects in syntactic ambiguity resolution: Discourse and semantic influences in parsing reduced relative clauses. Canadian Journal of Experimental Psychology, 47, 276-309.

Stevenson, S. (1994). A competitive attachment model for resolving syntactic ambiguities in natural language parsing. Unpublished doctoral dissertation, University of Maryland, College Park.

Tabor, W., Juliano, C., & Tanenhaus, M. K. (1997). Parsing in a dynamical system: An attractor-based account of the interaction of lexical and structural constraints in sentence processing. Language & Cognitive Processes, 12, 211-271.

Tabor, W., & Tanenhaus, M. K. (1998). Dynamical models of sentence processing. Manuscript submitted for publication.

Tabossi, P., Spivey-Knowlton, M., McRae, K., & Tanenhaus, M. (1994). Semantic effects on syntactic ambiguity resolution: Evidence for a constraint-based resolution process. In C. Umiltà & M. Moscovitch (Eds.), Attention and performance XV. Hillsdale, NJ: Erlbaum.

Tanenhaus, M., Spivey-Knowlton, M., Eberhard, K., & Sedivy, J. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632-1634.

Tanenhaus, M., Spivey-Knowlton, M., & Hanna, J. (in press). Modeling the effects of discourse and thematic fit in syntactic ambiguity resolution. In M. Crocker, M. Pickering, & C. Clifton (Eds.), Architectures and Mechanisms of Language Acquisition and Processing.

Tanenhaus, M., & Trueswell, J. (1995). Sentence comprehension. In J. Miller & P. Eimas (Eds.), Speech language and communication. Volume 11 of the handbook of perception and cognition (pp. 217-262). San Diego, CA: Academic Press.

Trueswell, J. (1996). The role of lexical frequency in syntactic ambiguity resolution. Journal of Memory and Language, 35, 566-585.

Trueswell, J., Kim, A., Lund, K., & Burgess, C. (1995). Thematic fit as discourse instantiation of lexically specific information: A distributed processing model of thematic integration. Poster presented at the 8th Annual CUNY Conference on Human Sentence Processing, University of Arizona, Tucson, AZ.

Trueswell, J., & Tanenhaus, M. (1991). Tense, temporal context and syntactic ambiguity resolution. Language & Cognitive Processes, 6, 303-338.

Trueswell, J., & Tanenhaus, M. (1994). Toward a lexicalist approach to syntactic ambiguity resolution. In C. Clifton, L. Frazier, & K. Rayner (Eds.), Perspectives on sentence processing. Hillsdale, NJ: Erlbaum.

Trueswell, J., Tanenhaus, M., & Garnsey, S. (1994). Semantic influences on parsing: Use of thematic role information in syntactic disambiguation. Journal of Memory and Language, 33, 285-318.

van Berkum, J., Hagoort, P., & Brown, C. (1998). Rapid discourse context effects in sentence processing: ERP evidence. Talk presented at the 11th Annual City University of New York Conference on Human Sentence Processing, Rutgers University, NJ.

Vaughan, J. (1983). Control of fixation duration in visual search and memory search: Another look. Journal of Experimental Psychology: Human Perception and Performance, 8, 709-723.

Received September 25, 1995
Revision received April 10, 1998
Accepted April 10, 1998