eye movement evidence that readers maintain and act on … · 2009. 11. 24. · 1 summary of...
Eye movement evidence that readers maintain and act
on uncertainty about past linguistic input
(Supporting Information)
Roger Levy, Klinton Bicknell, Tim Slattery, and Keith Rayner
1 Summary of uncertain-input sentence-comprehension model
Levy 2008 [1] introduced a model of noisy-channel sentence comprehension under uncertain
input in which a comprehender uses a probabilistic grammar which defines a joint probability
distribution over word sequences w and structural representations, together with perceptual
input I obtained from reading a sentence w∗ incrementally, to form posterior inferences about
what the sentence and its structure may be. As researchers who know the true sentence w∗
being read by an experimental participant but not the perceptual input I obtained at any
point during reading, we marginalize over perceptual input to obtain the comprehender’s
expected inferences about the sentence being read:
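$$P(w \mid w^*) = \int_I P_C(w \mid I, w^*)\, P_T(I \mid w^*)\, dI \tag{1}$$
where $P_C$ is the comprehender's probability distribution and $P_T$ is the true noise distribution.
We can apply Bayes' rule to obtain
$$P(w \mid w^*) = P_C(w) \int_I \frac{P_C(I \mid w)\, P_T(I \mid w^*)}{P_C(I)}\, dI \tag{2}$$
$$P(w \mid w^*) \propto P_C(w)\, Q(w, w^*) \tag{3}$$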
where Q(w,w∗) is proportional to the integral in Equation (2) and represents the average
effect of perceptual noise. For a given partial sentence w∗, we represent Q(w,w∗) as a
function over w by constructing a weighted finite-state automaton in the log (base-2) semiring
[2] that recognizes only w∗ and gives it zero cost, then adding edit, insertion, and deletion
arcs with costs equal to a noise parameter λ times the Levenshtein edit distance between
the original arc’s label and the new arc’s label (for full details see [1]).
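A minimal Python sketch of the word-level ingredient of this noise model follows. Under the scheme above, an arc that replaces the observed word with another word costs λ times their Levenshtein distance in bits, so it carries weight 2^(−λ·d); the sketch combines that weight with a prior as in Equation (3). It is a deliberate simplification, not the weighted finite-state implementation of [1]: it scores isolated candidate words rather than whole sentences, the prior probabilities are invented for illustration rather than derived from the grammar, and the function names (`levenshtein`, `noise_weight`, `toy_posterior`) are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance with unit-cost insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def noise_weight(observed: str, candidate: str, lam: float) -> float:
    """Cost is lam * edit distance in bits, so the weight is 2 ** -cost."""
    return 2.0 ** (-lam * levenshtein(observed, candidate))


def toy_posterior(prior: dict, observed: str, lam: float) -> dict:
    """Word-level analogue of P(w | w*) ∝ P_C(w) Q(w, w*), normalized over candidates."""
    scores = {w: p * noise_weight(observed, w, lam) for w, p in prior.items()}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}


# Hypothetical prior over words that could occupy the preposition slot.
prior = {"at": 0.4, "as": 0.3, "and": 0.3}
print(toy_posterior(prior, "at", lam=1.0))
```

With λ = 0 the noise term is constant and the output equals the prior; as λ grows, probability mass concentrates on the observed word.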
To model the behavioral consequences of reading a new word $w^*_i$ in a sentence, we assume
that if $w^*_i$ dramatically changes the comprehender's beliefs about the earlier content of a
sentence, then the comprehender will tend to respond behaviorally with longer fixation times and
possibly regressive saccades. We define $P_i(w_{[0,j)})$ to be the probability distribution
over the sequence of words starting at the beginning of the sentence and continuing up to but
not including the position occupied by $w^*_j$, conditioned on the perceptual input obtained
from words $w^*_{1 \ldots i}$, and use the Kullback-Leibler (K-L) divergence
$$D\big(P_i(w_{[0,i)}) \,\|\, P_{i-1}(w_{[0,i)})\big)$$
to quantify the change in this probability distribution induced by reading $w^*_i$. This
quantity is shown in main-submission Figure 2 as a function of λ. The probabilistic
context-free grammar [3] used for the main submission consisted of the non-terminal rewrite rules
given in Table 1 plus all terminal rewrite rules (of the form part-of-speech → word) found in
the parsed Brown corpus; rule probabilities are estimated from the parsed Brown corpus
[4, 5].
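The divergence above can be computed for any two discrete distributions over the same outcomes. The sketch below assumes each distribution is given as a Python dict from outcomes to probabilities; in the model those outcomes are whole sentence prefixes whose probabilities come from the grammar and noise model, whereas here they are collapsed to the preposition slot and the numbers are invented. The function name `kl_divergence_bits` is hypothetical; base-2 logarithms give the result in bits, matching the log (base-2) semiring used above.

```python
import math


def kl_divergence_bits(p: dict, q: dict) -> float:
    """D(p || q) = sum_x p(x) * log2(p(x) / q(x)), in bits."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # p puts mass where q puts none
        total += px * math.log2(px / qx)
    return total


# Hypothetical beliefs about the word in "The coach smiled ___ the player"
# before (P_{i-1}) and after (P_i) reading a new word w*_i.
before = {"at": 0.90, "as": 0.06, "and": 0.04}
after = {"at": 0.55, "as": 0.05, "and": 0.40}
print(f"{kl_divergence_bits(after, before):.3f} bits")  # D(P_i || P_{i-1})
```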
2 Orthographic neighbors and grammatical analysis
The syntactic analysis of the sentence differs dramatically between the true sentence and the
variants in which at has been replaced by an orthographically similar near-neighbor word.
Figure 1 illustrates the difference between the analyses for the at→and substitution, using
the categories of the grammar in Table 1. The mapping from these analyses to analyses
within mainstream syntactic frameworks involves straightforward tree transformations of the
type widely used in computational linguistics [6].
Levy, Bicknell, Slattery, Rayner – Supporting Information 2
ROOT → S                      0.00
S → S-base CC S-base          7.3
S → S-base                    0.01
S-base → NP-base VP           0
NP → NP-base RC               4.1
NP → NP-base                  0.5
NP → NP-base PP               2.0
NP-base → DT N N              4.7
NP-base → DT N                1.9
NP-base → DT JJ N             3.8
NP-base → PRP                 1.0
NP-base → NNP                 3.1
VP → V PP                     2.0
VP → V NP                     0.7
VP → V                        2.9
RC → WP S/NP                  0.5
RC → VP-pass/NP               2.0
RC → WP FinCop VP-pass/NP     4.9
PP → IN NP                    0
S/NP → VP                     0.7
S/NP → NP-base VP/NP          1.3
VP-pass/NP → VBN NP           2.2
VP-pass/NP → VBN              0.4
VP/NP → V                     0.1
VP/NP → V NP                  4.0
Table 1: The probabilistic grammar used to compute K-L divergences in the main submission.
Rule weights given as negative log-probabilities in bits.
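Because each weight in Table 1 is a negative log2 probability, the probabilities 2^(−w) of all rules sharing a left-hand side should sum to 1. The following Python sketch, whose names are illustrative, performs that check on the table as printed. Since the published weights are rounded, the per-category sums come out close to but not exactly 1 (roughly between 0.97 and 1.02).

```python
from collections import defaultdict

# Table 1 rules as (left-hand side, right-hand side, weight in bits = -log2 probability).
TABLE1_RULES = [
    ("ROOT", "S", 0.00),
    ("S", "S-base CC S-base", 7.3),
    ("S", "S-base", 0.01),
    ("S-base", "NP-base VP", 0.0),
    ("NP", "NP-base RC", 4.1),
    ("NP", "NP-base", 0.5),
    ("NP", "NP-base PP", 2.0),
    ("NP-base", "DT N N", 4.7),
    ("NP-base", "DT N", 1.9),
    ("NP-base", "DT JJ N", 3.8),
    ("NP-base", "PRP", 1.0),
    ("NP-base", "NNP", 3.1),
    ("VP", "V PP", 2.0),
    ("VP", "V NP", 0.7),
    ("VP", "V", 2.9),
    ("RC", "WP S/NP", 0.5),
    ("RC", "VP-pass/NP", 2.0),
    ("RC", "WP FinCop VP-pass/NP", 4.9),
    ("PP", "IN NP", 0.0),
    ("S/NP", "VP", 0.7),
    ("S/NP", "NP-base VP/NP", 1.3),
    ("VP-pass/NP", "VBN NP", 2.2),
    ("VP-pass/NP", "VBN", 0.4),
    ("VP/NP", "V", 0.1),
    ("VP/NP", "V NP", 4.0),
]


def probability_mass_by_lhs(rules):
    """Convert bit weights to probabilities (2 ** -w) and sum them per left-hand side."""
    mass = defaultdict(float)
    for lhs, _rhs, bits in rules:
        mass[lhs] += 2.0 ** (-bits)
    return dict(mass)


for lhs, total in probability_mass_by_lhs(TABLE1_RULES).items():
    print(f"{lhs:12s} {total:.3f}")
```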
3 Experiment
3.1 Materials & Design
Our experimental design crossed two factors: first, the use of at versus toward as the
post-verbal preposition early in the sentence; second, whether the critical participial verb
in the object-modifying reduced relative clause had the same orthographic form as the
simple-past member of the verb's paradigm (ambiguous) or a different form (unambiguous).
We used 24 experimental items in the study (given in Appendix A); a sample item is shown below.
Left (true words):
[S [S-base [NP [NP-base [DT The] [N coach]]] [VP [V smiled] [PP [IN at] [NP [NP-base [DT the] [N player]] [RC [VP-pass/NP [VBN tossed] [NP [NP-base [DT the] [N frisbee]]]]]]]]]]

Right (near-neighbor at→and substitution):
[S [S-base [NP [NP-base [DT The] [N coach]]] [VP [V smiled]]] [CC and] [S-base [NP [NP-base [DT the] [N player]]] [VP [V tossed] [NP [NP-base [DT the] [N frisbee]]]]]]

Figure 1: Syntactic analyses of the sentence with true words (left) and near-neighbor at→and
substitution (right) under the probabilistic grammar
(1) a. at, ambiguous:
The coach smiled at the player tossed a frisbee by the opposing team.
b. at, unambiguous:
The coach smiled at the player thrown a frisbee by the opposing team.
c. toward, ambiguous:
The coach smiled toward the player tossed a frisbee by the opposing team.
d. toward, unambiguous:
The coach smiled toward the player thrown a frisbee by the opposing team.
We constructed four stimulus lists, rotating items among these four conditions in a Latin
Square. These 24 experimental stimuli were interleaved with 36 fillers. Order of presenta-
tion was randomized differently for each participant, subject to the constraint that no two
experimental items appeared consecutively.
3.2 Procedure
Forty native English-speaking undergraduate students at UC San Diego participated in the
experiment. All had normal or corrected-to-normal vision and were naive as to the purpose of
the experiment. Participants read each sentence while their eye movements were monitored by
an SR Eyelink 2000 eye-tracker, which obtained one eye-position sample every 1/2 millisecond
with a spatial resolution of 0.01 degrees (binocular viewing, recording from the right eye
only). Each sentence was presented on a single line in 14-point Courier New font on a 19-inch
LCD monitor positioned 55 cm in front of the participant (1 degree of visual angle ≈ 3
characters). The eye-tracker was calibrated before the experiment began and was recalibrated
between trials as necessary.
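As a check on the stated viewing geometry, the linear extent subtending an angle θ at distance d is 2d·tan(θ/2). The short Python sketch below (function name hypothetical) applies this to the 55 cm viewing distance: 1 degree corresponds to about 0.96 cm on the screen, so the stated 3 characters per degree implies characters roughly 0.32 cm wide.

```python
import math


def size_subtending(angle_deg: float, distance_cm: float) -> float:
    """Linear extent (cm) that subtends `angle_deg` at `distance_cm`: 2 * d * tan(angle / 2)."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


cm_per_degree = size_subtending(1.0, 55.0)  # viewing distance from the Procedure
chars_per_degree = 3                        # "1 degree of visual angle ≈ 3 characters"
print(f"1 degree at 55 cm ≈ {cm_per_degree:.2f} cm; "
      f"implied character width ≈ {cm_per_degree / chars_per_degree:.2f} cm")
```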
3.3 Regions of analysis and data processing
We divided each experimental item into seven regions of analysis, as follows:
Subj MV Prep Obj Critical Spill Final
/The coach/ smiled/ {at,toward}/ the player/ {tossed,thrown}/ a frisbee/ by the opposing team./
Each trial was inspected by hand using the University of Massachusetts EyeDoctor
software suite (http://www.psych.umass.edu/eyelab/software/). We discarded any trial
in which there was track loss prior to some fixation in any region other than the Final region.
This resulted in loss of 15.3% of trials. Most of these track losses were due to the participant
blinking.
We examined a number of standard eye movement measures [7] including: (1) the fre-
quency with which a region was skipped on first reading, (2) first fixation duration (the
duration of the first fixation on a region when no material to the right of the region has yet
been fixated), (3) first pass reading time (the total fixation time on a region the first time
it is entered, when no material to the right of the region has yet been fixated; also called
gaze duration for regions consisting of only one word), (4) go-past time (the accumulated
time from when a reader first fixates on a region until their first fixation to the right of
the region; this measure includes any regressions the reader makes prior to moving forward
past the region), (5) total reading time (the summed duration of all fixations on a region), (6)
regressions out of a region immediately after first-pass reading, and (7) regressions
into a region. These measures were computed for the regions of the sentence described
above; we do not report results for the Subject and Final regions as there were no significant
results on measures meaningful for these regions.
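To make the measure definitions concrete, here is a Python sketch that derives them for one region from a single trial. It assumes each trial has already been reduced to an ordered list of (region index, fixation duration in ms) pairs, with regions numbered left to right; cleaning steps performed in EyeDoctor, such as the handling of short fixations or blinks, are not modeled, and all names and the example trial are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Fixation = Tuple[int, float]  # (region index, numbered left to right; duration in ms)


def first_pass_run(fixations: List[Fixation], region: int) -> Optional[Tuple[int, int]]:
    """Indices [start, end) of the first-pass run on `region`, or None if the region
    was skipped (material to its right was fixated before it was entered)."""
    for i, (r, _) in enumerate(fixations):
        if r > region:
            return None
        if r == region:
            end = i
            while end < len(fixations) and fixations[end][0] == region:
                end += 1
            return i, end
    return None


def region_measures(fixations: List[Fixation], region: int) -> Dict[str, object]:
    total = sum(d for r, d in fixations if r == region)
    # Any saccade that lands on the region coming from further right.
    reg_in = any(prev[0] > region and cur[0] == region
                 for prev, cur in zip(fixations, fixations[1:]))
    run = first_pass_run(fixations, region)
    if run is None:
        return {"skipped": True, "first_fix": None, "first_pass": None,
                "go_past": None, "reg_out": None, "total": total, "reg_in": reg_in}
    start, end = run
    go_past = 0.0
    for r, d in fixations[start:]:
        if r > region:  # stop at the first fixation to the right of the region
            break
        go_past += d
    return {
        "skipped": False,
        "first_fix": fixations[start][1],
        "first_pass": sum(d for _, d in fixations[start:end]),
        "go_past": go_past,
        # First pass ended with a leftward saccade out of the region.
        "reg_out": end < len(fixations) and fixations[end][0] < region,
        "total": total,
        "reg_in": reg_in,
    }


# Hypothetical trial: regions 0=Subj, 1=MV, 2=Prep, 3=Obj, 4=Crit, 5=Spill, 6=Final.
trial = [(0, 200), (1, 250), (2, 180), (4, 300), (2, 220), (3, 240), (4, 260), (5, 210)]
for region in (2, 3, 4):
    print(region, region_measures(trial, region))
```

In this example trial the object region (3) is skipped on first pass, while the critical region (4) has a first-pass time of 300 ms, a first-pass regression out, and a go-past time of 1020 ms that includes the fixations made after regressing.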
3.4 Statistical analysis method
We report most results using traditional by-participants (F1) and by-items (F2) ANOVAs.
Consistent with standard practice, reading times more than four standard deviations from
the mean for each condition in each region were discarded as outliers. In cases where the
assumptions of ANOVA are badly violated (heavily imbalanced data and/or binary responses
with by-subject or by-item means close to 0 or 1), we use mixed-effects models with crossed
random effects of subject and item [8] using the lme4 package in R [9]. For experimental
psycholinguistic data such as ours, precisely what random-effects structure to specify in a
multi-level model for inference on the fixed effects remains an open question. In
principle, for an n-condition experiment it could be appropriate to use a full n×n covariance
matrix (that is, arbitrary random interactions) for each of the by-subject and by-item random
effects. In practice, however, it is often difficult to obtain reliable convergence with such
complex random-effects structure for psycholinguistic datasets of our size. Therefore we
adopted the following principles, based on discussion in [8]. For each analysis, we began
by fitting a model with random intercepts by-subject and by-item. We then fit one model
with random intercepts by-subject and random interactions by-item, and another model
with random interactions by-subject and random intercepts by-item. We used likelihood-
ratio tests to compare each of these models with the random intercepts-only model. If
neither of these models yielded a significant improvement in log-likelihood, we report fixed-
effects results based on the random intercepts-only model. If at least one of these models
yielded a significant improvement in log-likelihood, we attempted to fit a final model with
random interactions by-subject and by-item. If this model converged and yielded a significant
improvement over the better of the two intermediate models by the likelihood-ratio test, we
report fixed-effects results based on this model; otherwise, we report fixed-effects results
based on the better of the two intermediate models. For linear models, random-effects
model comparisons were done using restricted maximum-likelihood estimation; fixed-effects
results are reported based on maximum-likelihood estimation. For logit models, Laplace
approximation of maximum likelihood was always used. Statistical significance for linear
models is reported as a t-statistic associated with the parameter estimate—for our datasets,
a t-statistic of 2 or greater corresponds approximately to p < 0.05 significance [8], and a
t-statistic of 1.65 or greater corresponds approximately to marginal p < 0.1 significance; for
logit models, as a p-value based on the Wald statistic [10]. Our factorial contrasts (which are
all two-way) were converted to a centered numeric representation to eliminate correlations
among main effects and lower-order interactions.
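The random-effects selection procedure can be summarized as decision logic. The Python sketch below assumes the four candidate models have already been fit (for example with lme4, using REML for the linear models as described above) and takes only their log-likelihoods and parameter counts as input; the likelihood-ratio statistic is referred to a chi-square distribution whose upper tail is computed in closed form for integer degrees of freedom. Model labels, parameter counts, and log-likelihood values are hypothetical, and choosing the better intermediate model by log-likelihood assumes, as here, that the two intermediate models have the same number of parameters.

```python
import math
from typing import Dict, Optional


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range((df - 1) // 2):
        total += term
        term *= x / (2 * j + 3)
    return total


def lrt_p(ll_small: float, ll_big: float, extra_params: int) -> float:
    """Likelihood-ratio test: 2 * (log-likelihood gain) ~ chi-square(extra_params)."""
    return chi2_sf(2.0 * (ll_big - ll_small), extra_params)


def choose_random_effects(ll: Dict[str, Optional[float]], n_params: Dict[str, int],
                          alpha: float = 0.05) -> str:
    """Hypothetical labels: 'ints_only' (random intercepts only), 'item_full' (subject
    intercepts, by-item random interactions), 'subj_full' (by-subject random
    interactions, item intercepts), 'both_full' (None if it failed to converge)."""
    def improves(small: str, big: str) -> bool:
        return lrt_p(ll[small], ll[big], n_params[big] - n_params[small]) < alpha

    middle = ["item_full", "subj_full"]
    if not any(improves("ints_only", m) for m in middle):
        return "ints_only"
    best_middle = max(middle, key=lambda m: ll[m])
    if ll.get("both_full") is not None and improves(best_middle, "both_full"):
        return "both_full"
    return best_middle


ll = {"ints_only": -1000.0, "item_full": -994.0, "subj_full": -999.0, "both_full": None}
n_params = {"ints_only": 3, "item_full": 5, "subj_full": 5, "both_full": 7}
print(choose_random_effects(ll, n_params))  # -> item_full (the full model did not converge)
```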
3.5 Results
For measures 1–7 described in Section 3.3 above, condition-specific means can be found in
Table 2; results of statistical analysis are shown in Table 3. Our results include a number
of main effects of preposition type at/toward on fixation times and regressive-saccade be-
havior that are presumably driven by the dramatic difference in length between these two
prepositions and its effect on first-pass reading behavior. These differences include main
effects on first-fixation (by items), first-pass, go-past, and total reading times, as well as
outward first-pass regression frequency, on the preposition region, and on first-fixation,
go-past time, total reading time, and outward first-pass regression frequency on the object
region. Readers skipped at far more often than they skipped toward, took more time to
read toward than at, regressed from toward more than at, had longer first fixations on the
region immediately following toward (the object region) than on the region following at,
and regressed more from the object region into at than into toward.
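For a fully within-participants 2 × 2 design such as this one, each single-degree-of-freedom effect in the by-participants (F1) ANOVA can be computed from one contrast score per participant: F(1, n − 1) equals the square of the one-sample t statistic on those scores. By-items (F2) statistics follow in the same way from item means, since every item also appears in all four conditions across the Latin square lists. The sketch below uses this identity; the cell labels and the three-participant data set are invented.

```python
import statistics
from typing import Dict, List


def within_2x2_F(cell_means: List[Dict[str, float]]) -> Dict[str, float]:
    """F(1, n-1) for each effect of a 2x2 within-subjects design, computed as the
    squared one-sample t statistic of each participant's contrast score."""
    contrasts = {
        "at": lambda m: (m["at_amb"] + m["at_unamb"]) - (m["tw_amb"] + m["tw_unamb"]),
        "ambig": lambda m: (m["at_amb"] + m["tw_amb"]) - (m["at_unamb"] + m["tw_unamb"]),
        "at:ambig": lambda m: (m["at_amb"] - m["at_unamb"]) - (m["tw_amb"] - m["tw_unamb"]),
    }
    n = len(cell_means)
    result = {}
    for name, contrast in contrasts.items():
        scores = [contrast(m) for m in cell_means]
        mean = statistics.mean(scores)
        var = statistics.variance(scores)  # sample variance (n - 1 denominator)
        result[name] = mean ** 2 / (var / n)  # t^2 = mean^2 / squared standard error
    return result


# Invented per-participant condition means ("tw" = toward).
data = [
    {"at_amb": 10, "at_unamb": 6, "tw_amb": 4, "tw_unamb": 4},
    {"at_amb": 12, "at_unamb": 7, "tw_amb": 5, "tw_unamb": 4},
    {"at_amb": 11, "at_unamb": 6, "tw_amb": 5, "tw_unamb": 6},
]
print(within_2x2_F(data))
```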
For skip probability, we found a highly significant effect of at/toward on the preposi-
tion region, with far more skipping of at than of toward. We also found a significant effect
of ambiguity on skip probability on the critical region, with more frequent skipping in the
unambiguous condition than in the ambiguous condition (p < 0.05 in a mixed-effects logit
model). This is almost certainly due to the fact that mean word length was shorter in the un-
ambiguous condition (5.63 characters) than in the ambiguous condition (6.42 characters).1
For spillover region skip probability, ANOVAs found a significant main effect (p < 0.05)
of ambiguity by participants, but these skip probabilities were close to 0, such that tradi-
tional ANOVA results are unreliable. Mixed-effects logit models found no reliable effects of
condition on spillover-region skip probability.
1 A mixed-effects logit model with fixed effects of preposition × ambiguity plus word length found a highly
significant effect of word length on critical-region skip probability (p < 0.001) and no effects of condition.

Table 2: Means and standard errors for eye-movement measures (Skip, RegOut, and RegIn in
percent of trials; FirstFix, FirstPass, GoPast, and Total in ms)
MV Prep Obj Crit Spill
Skip
  at ambig 0 (0) 59 (4) 2 (1) 3 (1) 1 (1)
  at unambig 1 (1) 58 (4) 2 (1) 8 (2) 1 (1)
  toward ambig 1 (1) 1 (1) 2 (1) 2 (1) 3 (1)
  toward unambig 0 (0) 3 (1) 1 (1) 5 (2) 1 (1)
FirstFix
  at ambig 245 (7) 235 (15) 215 (5) 283 (11) 251 (8)
  at unambig 243 (7) 232 (13) 219 (6) 279 (12) 273 (10)
  toward ambig 250 (9) 243 (7) 235 (8) 286 (10) 265 (10)
  toward unambig 264 (9) 257 (11) 233 (8) 285 (10) 261 (8)
FirstPass
  at ambig 332 (11) 251 (19) 363 (18) 355 (15) 426 (21)
  at unambig 329 (12) 241 (14) 374 (15) 322 (15) 442 (20)
  toward ambig 305 (12) 294 (10) 358 (14) 359 (15) 443 (24)
  toward unambig 333 (11) 288 (12) 358 (13) 343 (15) 453 (19)
GoPast
  at ambig 391 (19) 294 (24) 502 (24) 476 (24) 679 (40)
  at unambig 409 (17) 303 (22) 568 (23) 399 (20) 681 (33)
  toward ambig 393 (18) 330 (14) 399 (18) 399 (20) 660 (44)
  toward unambig 406 (20) 346 (19) 420 (16) 409 (18) 652 (34)
Total
  at ambig 573 (35) 362 (22) 768 (47) 596 (29) 813 (52)
  at unambig 566 (32) 364 (23) 758 (42) 626 (42) 818 (42)
  toward ambig 574 (38) 490 (23) 641 (38) 640 (39) 776 (45)
  toward unambig 605 (34) 505 (31) 659 (41) 616 (35) 829 (49)
RegOut
  at ambig 11 (3) 6 (2) 24 (4) 21 (3) 28 (4)
  at unambig 14 (3) 4 (1) 26 (4) 12 (2) 33 (4)
  toward ambig 15 (3) 10 (2) 8 (2) 10 (2) 25 (3)
  toward unambig 12 (3) 13 (3) 9 (2) 14 (2) 26 (3)
RegIn
  at ambig 34 (5) 36 (4) 53 (4) 35 (4) 32 (4)
  at unambig 38 (4) 31 (4) 44 (4) 44 (5) 32 (4)
  toward ambig 40 (4) 31 (4) 42 (4) 37 (4) 33 (4)
  toward unambig 40 (5) 31 (4) 36 (5) 42 (4) 34 (4)

Table 3: F-statistics for main effects and interactions for the eye-movement measures in
Table 2 (. p < 0.1, ∗ p < 0.05, † p < 0.01, ‡ p < 0.001)
MV Prep Obj Crit Spill
F1 F2 F1 F2 F1 F2 F1 F2 F1 F2
Skip
  at <1 <1 269.79‡ 359.69‡ <1 <1 3.71. 2.25 1.67 2.22
  ambig <1 <1 <1 <1 <1 <1 5.74∗ 4.37∗ 6.03∗ <1
  at:ambig 1.01 <1 <1 <1 <1 <1 <1 <1 <1 1.90
FirstFix
  at 3.72. 5.01∗ 1.03 4.46∗ 5.81∗ 6.93∗ <1 <1 <1 <1
  ambig 1.11 1.07 1.15 <1 <1 <1 <1 <1 1.65 1.30
  at:ambig 2.01 <1 <1 1.49 <1 <1 <1 1.17 2.59 1.14
FirstPass
  at 1.35 <1 8.99† 14.76‡ <1 2.15 <1 2.11 <1 <1
  ambig 1.82 1.41 <1 <1 <1 <1 5.15∗ 3.52. <1 <1
  at:ambig 3.30. 2.92 <1 <1 <1 <1 <1 1.08 <1 <1
GoPast
  at <1 <1 4.68∗ 4.58∗ 41.48‡ 47.50‡ 3.15. 1.86 <1 <1
  ambig <1 <1 <1 <1 4.87∗ 6.90∗ 3.16. 3.33. <1 <1
  at:ambig <1 <1 <1 <1 2.26 <1 4.77∗ 6.99∗ <1 <1
Total
  at 1.08 3.11. 34.22‡ 47.12‡ 16.61‡ 11.10† <1 <1 <1 <1
  ambig <1 <1 <1 <1 <1 <1 <1 <1 <1 1.18
  at:ambig <1 1.15 <1 <1 <1 <1 1.11 <1 <1 <1
RegOut
  at <1 <1 10.47† 8.42† 18.75‡ 46.21‡ 4.60∗ 6.82∗ 2.86. 1.50
  ambig <1 <1 <1 <1 <1 <1 <1 1.55 <1 <1
  at:ambig 1.21 1.80 1.65 1.62 <1 <1 11.42† 5.67∗ <1 <1
RegIn
  at 2.10 1.77 <1 1.28 9.34† 5.53∗ <1 <1 <1 1.09
  ambig <1 <1 <1 <1 4.85∗ 7.36∗ 4.58∗ 3.91. <1 <1
  at:ambig <1 <1 1.14 1.79 <1 <1 <1 <1 <1 <1

Table 4: Means and standard errors for eye-movement measures in trials where the preposition
was fixated (GoPast in ms; RegOut in percent of trials)
MV Prep Obj Crit Spill
GoPast
  at ambig 344 (24) 294 (24) 437 (37) 473 (34) 625 (52)
  at unambig 417 (29) 303 (22) 419 (32) 405 (33) 696 (56)
  toward ambig 389 (18) 330 (14) 398 (18) 397 (19) 660 (44)
  toward unambig 402 (19) 346 (19) 416 (16) 404 (18) 653 (34)
RegOut
  at ambig 8 (4) 13 (5) 9 (3) 21 (5) 23 (6)
  at unambig 15 (5) 10 (4) 9 (3) 11 (3) 30 (6)
  toward ambig 14 (3) 10 (2) 8 (2) 9 (2) 25 (3)
  toward unambig 12 (3) 14 (3) 8 (2) 13 (3) 26 (3)

Our key results involve fixation times and first-pass regressions out of the critical region,
and first-pass regressions into the preposition region. On the critical region we find a main
effect of ambiguity on first-pass reading times, with longer times in the ambiguous conditions,
plus a numerical interactive trend for the effect to be larger in the at conditions than in
the toward conditions. More crucially, we found significant interactions on go-past times and
first-pass regressions out (GoPast, RegOut), with the at+ambig condition having the longest
times and the most regressions out. On the preposition region, we found a main effect of
preposition type on the frequency of inward regressive saccades (RegIn), driven by a numerical
interactive trend: inward regressive saccades were most common in the at+ambig condition.
We obtained two significant main effects that we believe are unlikely to be relevant
to the present study. These include a significant main effect of preposition type on first-
fixation reading time at the main-clause verb (MV), with reading times higher in the toward
condition than in the at condition; and a significant main effect of ambiguity on go-past time
on the object region (the player), with go-past time longer in the unambiguous condition
than in the ambiguous condition. It is possible that these are preview effects related to
superficial properties of the subsequent region. Given that these effects are significant only
at the p < 0.05 level, are not seen in related eye-movement measures on the region in question
(e.g., we see no effect of ambiguity on outward first-pass regressions from the object region),
are not interactive, and that Table 3 involves 210 hypothesis tests, we do not
consider these two effects to be of immediate concern in interpreting the key results of our
study.
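The scale of the testing problem follows from the layout of Table 3: seven measures, five regions, three effects, and two analyses (F1 and F2) give 7 × 5 × 3 × 2 = 210 F-statistics, 140 of them for main effects. The arithmetic below also gives the number of main-effect results expected below p < 0.05 if every null hypothesis were true; because F1 and F2 tests on the same data, and tests on related measures, are correlated rather than independent, this is only a rough benchmark. Variable names are illustrative.

```python
measures = ["Skip", "FirstFix", "FirstPass", "GoPast", "Total", "RegOut", "RegIn"]
regions = ["MV", "Prep", "Obj", "Crit", "Spill"]
effects = ["at", "ambig", "at:ambig"]
analyses = ["F1", "F2"]

n_tests = len(measures) * len(regions) * len(effects) * len(analyses)
n_main_effect_tests = len(measures) * len(regions) * 2 * len(analyses)  # "at" and "ambig"
alpha = 0.05
expected_false_positives = alpha * n_main_effect_tests  # under all-null, independence assumed
print(n_tests, n_main_effect_tests, round(expected_false_positives, 2))
```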
Table 5: F-statistics for main effects and interactions for the eye-movement measures in
trials where the preposition was fixated (Table 4) (. p < 0.1, ∗ p < 0.05, † p < 0.01, ‡ p < 0.001)
MV Prep Obj Crit Spill
F1 F2 F1 F2 F1 F2 F1 F2 F1 F2
GoPast
at <1 <1 4.68∗ 4.58∗ <1 <1 1.46 2.17 <1 <1
ambig 3.20. <1 <1 <1 <1 <1 2.39 3.23. 1.19 <1
at:ambig 2.67 <1 <1 <1 <1 <1 3.24. 3.22. 2.71 <1
RegOut
at <1 <1 <1 <1 <1 <1 1.29 3.41. <1 <1
ambig <1 <1 <1 <1 <1 <1 1.28 1.20 2.14 <1
at:ambig 2.24 <1 <1 1.42 <1 <1 6.12∗ 3.62. <1 <1
3.5.1 Trials in which the preposition was fixated
Because of the high frequency of skipping the preposition in the at conditions, we also
analyzed go-past and regressions-out measurements in the subset of trials on which the
preposition region was not skipped. The means and standard errors for these trials are
shown in Table 4, and F-statistics are presented in Table 5. The qualitative patterns for these
measures are identical to those observed for all trials, although significance levels on all
effects decreased because over half of the at-condition trials (those in which at was skipped)
were lost. At the critical region, mixed-effects models found the interaction to be significant
for go-past time (t = 2.16) and marginal (p = 0.055) for regressions out.
3.5.2 Regressive saccades in detail
We also examined in greater detail the distribution of first-pass regressive saccades between
regions of the sentence. Table 6 shows the regression matrix of the relative frequency of
first-pass regressive saccades into each region of the sentence, as a function of the region
from which the saccade originated. It is quite clear that most first-pass regressive saccades
are short and do not skip over regions of analysis. There are no major differences across
conditions, with the exception that first-pass regressions from the spill-over region jump over
the critical region and reach the object region more frequently in the ambiguous conditions,
and most frequently in the at+ambig condition. In mixed-effects logit models, there was
a significant main effect of ambiguity on this pattern (p < 0.001), but the interaction was
marginal (p > 0.08).
Table 6: First-pass regression matrix. Rows denote region of departure, columns denote
entry region. Numbers are proportions.

(a) at/tossed
        Subj  MV    Prep  Obj   Crit
MV      1.00  0.00  0.00  0.00  0.00
Prep    0.12  0.88  0.00  0.00  0.00
Obj     0.00  0.33  0.67  0.00  0.00
Crit    0.00  0.02  0.08  0.90  0.00
Spill   0.00  0.00  0.02  0.37  0.61

(b) toward/tossed
        Subj  MV    Prep  Obj   Crit
MV      1.00  0.00  0.00  0.00  0.00
Prep    0.07  0.93  0.00  0.00  0.00
Obj     0.02  0.23  0.75  0.00  0.00
Crit    0.00  0.07  0.10  0.83  0.00
Spill   0.00  0.02  0.00  0.09  0.89

(c) at/thrown
        Subj  MV    Prep  Obj   Crit
MV      1.00  0.00  0.00  0.00  0.00
Prep    0.07  0.93  0.00  0.00  0.00
Obj     0.00  0.17  0.83  0.00  0.00
Crit    0.00  0.00  0.05  0.95  0.00
Spill   0.00  0.02  0.00  0.19  0.80

(d) toward/thrown
        Subj  MV    Prep  Obj   Crit
MV      1.00  0.00  0.00  0.00  0.00
Prep    0.04  0.96  0.00  0.00  0.00
Obj     0.05  0.19  0.76  0.00  0.00
Crit    0.03  0.00  0.09  0.88  0.00
Spill   0.00  0.00  0.00  0.10  0.90

In addition, we examined first-pass regressive-saccade behavior beyond the first regressive
saccade from a region. As Table 6 shows, it was rare for readers to regress from the critical
region or beyond directly back to the preposition region in a single saccade. However, in
many cases the first regressive saccade was followed not by a sequence of forward saccades
but by a series of saccades that moved backward overall. To quantify this, we computed what
we will call go-past regressions: a reader had a go-past regression from region Y to region X
if s/he made a first-pass regression from region Y and subsequently fixated region X before
making a saccade past region Y. Go-past regression counts are shown in Table 7; we analyzed
these using mixed logit models. Three salient patterns emerge from these data. First, there is
a main effect of at/toward in go-past regressions from the object to the preposition and to
the main-clause verb (both p < 0.01), presumably driven by the difference in length between
the two prepositions. Second, there is a main effect of critical-word ambiguity in go-past
regressions from the spillover region to the object and preposition regions (both p < 0.05 in
a mixed logit model); in these cases there are also numerical interactions, with
superadditively many go-past regressions in the at+ambig condition, but neither interaction
coefficient reached significance. Finally, there is an interactive pattern in go-past
regressions from the critical region to the object and preposition regions, with the most such
regressions in the at+ambig condition (object region: p < 0.05; preposition region: p = 0.087).

Table 7: Frequency of go-past regressions (counts)
                from Object:        from Critical:            from Spillover:
                Subj← MV← Prep←     Subj← MV← Prep← Obj←      Subj← MV← Prep← Obj← Crit←
at/tossed       0     15  47        1     2   11    43        0     4   10    33   50
at/thrown       4     19  48        2     2   4     26        1     1   1     17   65
toward/tossed   1     4   13        2     2   2     20        3     3   12    21   49
toward/thrown   2     8   18        2     2   5     29        4     4   7     13   54

Table 8: Question-answering accuracy
          Ambiguous   Unambiguous
At        62%         72%
Toward    70%         69%
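The go-past regression definition above can be made operational as follows. This Python sketch assumes that a trial is an ordered list of (region index, duration in ms) fixations with regions numbered left to right (0 = Subj through 5 = Spill), and returns every earlier region X for which a go-past regression from region Y occurred; summing these over trials would yield counts of the kind reported for go-past regressions. The function name and example trial are hypothetical.

```python
from typing import List, Set, Tuple

Fixation = Tuple[int, float]  # (region index, numbered left to right; duration in ms)


def go_past_regression_targets(fixations: List[Fixation], y: int) -> Set[int]:
    """Regions X < y with a go-past regression from y to X: the first pass on y ended
    in a regression, and X was fixated before any fixation to the right of y."""
    start = None
    for i, (r, _) in enumerate(fixations):
        if r > y:
            return set()  # y skipped on first pass: no first-pass regression from it
        if r == y:
            start = i
            break
    if start is None:
        return set()  # y never fixated
    end = start
    while end < len(fixations) and fixations[end][0] == y:
        end += 1
    if end == len(fixations) or fixations[end][0] > y:
        return set()  # first pass ended with a forward saccade (or the trial ended)
    targets = set()
    for r, _ in fixations[end:]:
        if r > y:  # the reader has moved past region y
            break
        if r < y:
            targets.add(r)
    return targets


trial = [(0, 200), (1, 250), (2, 180), (3, 210), (4, 300),
         (2, 220), (1, 190), (3, 240), (4, 260), (5, 210)]
print(go_past_regression_targets(trial, 4))  # from the critical region (4)
```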
3.5.3 Question-answering accuracy
Average question-answering accuracy on fillers, at 89.6%, was considerably higher than on
experimental items, and no participant answered filler questions with below 72% accuracy; even
this lowest accuracy level of 72% is significantly above chance (p < 0.01) by a two-tailed
binomial test. On experimental items, in contrast, participants' question-answering accuracy
was relatively low (68.5% overall). Condition-specific accuracies are given in Table 8; in no
experimental condition did accuracy exceed 72%. We interpret this pattern as indicating that
participants were reading attentively, but that ditransitive reduced relative clauses involving
passivization of the first object (e.g., tossed the player the frisbee) can make sentences
quite difficult to comprehend.
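Assuming two-alternative (yes/no) comprehension questions, so that chance accuracy is 50%, the exact two-tailed binomial test can be computed directly. The Python sketch below doubles the smaller tail, which for a symmetric null is the standard two-tailed exact test. The number of filler questions each participant answered is not stated in this section, so the question counts in the usage lines are hypothetical, as is the function name.

```python
from math import comb


def binomial_two_tailed_p(k: int, n: int) -> float:
    """Exact two-tailed binomial test against chance (p = 0.5), doubling the smaller tail."""
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n  # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n  # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


for n_questions in (24, 36, 48):  # hypothetical numbers of filler questions
    n_correct = round(0.72 * n_questions)
    print(n_questions, n_correct, binomial_two_tailed_p(n_correct, n_questions))
```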
As Table 8 indicates, accuracy was lowest in the at+ambig condition. In 2 × 2 ANOVA
analyses we found no significant main effects on accuracy and an interaction significant only
by items (F1(1, 39) = 1.52, p = 0.226; F2(1, 23) = 5.46, p = 0.029). Our mixed logit model
analysis revealed a marginal main effect of ambiguity (p = 0.08) and a significant interaction
(p < 0.05).
3.5.4 Question subtypes
Sixteen of our twenty-four questions queried some property of the reduced relative clause (RRC),
including whether the main-clause object was the agent or the goal of the RRC verb. There
were four such types of questions, illustrated in (2) below, each preceded by its code in
Appendix A and followed by its correct answer.
(2) [O Vred] Did the player toss/throw a frisbee? NO
[So Vred O] Did someone toss/throw the player a frisbee? YES
[O Vred PP] Did the player toss/throw the opposing team a frisbee? NO
[PP Vred O] Did the opposing team toss/throw the player a frisbee? YES
Each type of RRC-directed question was used in four items. (The question type and by-
condition accuracy for each item can be found in Appendix A.) Mean question-answering
accuracies for RRC-directed and non-RRC-directed question types in each condition are given
in Table 9. Because these data are unbalanced, we analyzed them only with a mixed-effects
logit model.2 This model found a significant main effect of question type (p < 0.01) and
significant interactions between preposition and ambiguity (p < 0.05) and between question
type and ambiguity (p < 0.01). That is, readers were systematically worse at answering
RRC-directed questions than at answering non-RRC-directed questions. Although there is a
numerical trend for RRC-directed questions in the at+ambig condition to be answered less
accurately than any other question type, this three-way interaction was not statistically
significant.
2 Specifying separate random effects of subject and item for each of the eight condition types proved
computationally prohibitive, so we collapsed the original four conditions to ±at+ambig (at+ambig versus
the other three conditions); applied to the four-condition analysis of the previous section, this collapsing
did not appreciably lower model likelihood and led to qualitatively similar conclusions about fixed effects.

Table 9: Question-answering accuracy by question type
                   at ambig   at unambig   toward ambig   toward unambig
Not RRC Question   0.812      0.800        0.875          0.762
RRC Question       0.531      0.688        0.612          0.656

3.6 Analyses contingent on participant question-answering accuracy
One possible concern raised by the relatively low overall question-answering accuracy
(68.5% for experimental items; see Section 3.5.3) is that the reading-time and
regressive-saccade measurements during sentence reading might reflect processes that
have little to do with normal language-comprehension, such as guessing.3 To address this
possibility, we conducted separate analyses of our crucial online measures (first-pass and go-
past durations, and first-pass and go-past regressions) for two separate participant sugbroups:
those whose question-answering accuracy was above median participant accuracy, and those
below median participant accuracy. The logic behind these analyses is that if processes
such as guessing underlie the crucial eye-movement patterns found in this experiment, these
patterns should be at least as strongly evident in low-accuracy participants than in high-
accuracy participants.
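The median split described above can be sketched as follows, assuming each participant's accuracy is a single proportion. The participant IDs and scores in the usage lines are hypothetical, not data from the study; participants lying exactly on the median are returned separately so that they can be excluded.

```python
from statistics import median


def median_split(accuracy):
    """Split participants into above-median, below-median, and on-median groups.

    accuracy: mapping from a participant id to that participant's
    question-answering accuracy (a proportion).
    """
    m = median(accuracy.values())
    high = sorted(p for p, a in accuracy.items() if a > m)
    low = sorted(p for p, a in accuracy.items() if a < m)
    on_median = sorted(p for p, a in accuracy.items() if a == m)  # excluded
    return high, low, on_median


# Hypothetical accuracies for five participants.
scores = {"s1": 0.90, "s2": 0.80, "s3": 0.80, "s4": 0.70, "s5": 0.60}
high, low, excluded = median_split(scores)
print(high, low, excluded)
```

With an even number of participants the median is the mean of the two middle scores, so participants can lie exactly on the median only when those two middle scores coincide.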
We used two different participant accuracy scores to determine our subgroups: accuracy
on filler-item questions and accuracy on experimental-item questions. In each case, it hap-
pened that seventeen participants lay above the median (the high-accuracy group), eighteen
lay below the median (the low-accuracy group), and five lay on the median and were thus
excluded from the analysis. We present results based on filler-item accuracy first; the two
accuracies are correlated at r = 0.479 (p < 0.01), but the filler-item accuracy has the advan-
tage of being logically independent of experimental-item online behavior. Table 10 presents
by-condition means for each of these cases among high- and low-accuracy comprehenders.
Table 10: Crucial measures as a function of comprehender accuracy on filler questions (standard errors in parentheses)

High-accuracy comprehenders (n = 17)
                 FirstPass   GoPast     RegOut   GPReg   QA
at ambig         349 (19)    455 (38)   18 (4)   8       70 (6)
at unambig       298 (19)    372 (29)   10 (3)   0       80 (3)
toward ambig     330 (22)    356 (26)   7 (3)    0       77 (3)
toward unambig   332 (21)    395 (24)   14 (4)   3       75 (4)

Low-accuracy comprehenders (n = 18)
                 FirstPass   GoPast     RegOut   GPReg   QA
at ambig         364 (27)    498 (38)   23 (4)   0       56 (5)
at unambig       340 (23)    433 (33)   14 (4)   5       64 (4)
toward ambig     375 (24)    411 (30)   9 (3)    2       67 (4)
toward unambig   358 (26)    418 (33)   12 (4)   0       60 (4)

Because these data are unbalanced, we analyzed them with mixed-effects models (see Section 3.4), conducting separate analyses for high-accuracy and low-accuracy participants. In first-pass reading times, high-accuracy participants showed a marginally significant main effect of ambiguity (t = 1.74) and a marginally significant interaction between preposition and ambiguity (t = 1.76), whereas low-accuracy participants showed no significant main effects or interactions (all t < 1.62). In go-past reading times, high-accuracy participants showed a significant interaction between preposition and ambiguity (t = 2.50), whereas low-accuracy participants showed no significant main effects or interactions (all t < 1.62). In first-pass regressions, high-accuracy participants showed a marginal interaction between preposition and ambiguity (pz = 0.06), whereas low-accuracy participants showed only a nonsignificant numerical trend toward an interaction (pz = 0.16). Go-past regressions were too rare in either participant subgroup to analyze reliably, but the numerical trend was toward the predicted interaction only in the high-accuracy group. On experimental-item question-answering accuracy, both groups showed marginal interactions between preposition and ambiguity (pz = 0.094 and pz = 0.089 for high-accuracy and low-accuracy participants, respectively).

³ We thank an anonymous reviewer for raising this point.
Table 11 presents by-condition means for high- and low-accuracy comprehenders as determined by experimental-item accuracy. In first-pass reading times, both groups showed a marginally significant main effect of ambiguity (t = 1.9 and t = 1.97, respectively). In go-past reading times, high-accuracy participants showed a significant main effect of preposition (t = 2.6) and a significant interaction between preposition and ambiguity (t = 2.50), whereas low-accuracy participants showed a numerical trend toward the predicted interaction but no significant main effects or interactions (all t < 0.8). In first-pass regressions, high-accuracy participants showed a marginal main effect of preposition (pz = 0.09) and a marginal interaction between preposition and ambiguity (pz = 0.08), whereas low-accuracy participants showed a marginal interaction between preposition and ambiguity (pz = 0.07). As with the filler-accuracy split, go-past regressions were too rare in either participant subgroup to analyze reliably, but the numerical trend was toward the predicted interaction only in the high-accuracy group. On experimental-item question-answering accuracy, high-accuracy comprehenders showed no significant effects of condition, whereas low-accuracy comprehenders showed a significant main effect of ambiguity (pz = 0.03) and a significant interaction between preposition and ambiguity (pz = 0.01). Because question-answering accuracy was the criterion by which the two participant groups were determined, it is not surprising that it shows different qualitative patterns across the two groups.
Table 11: Crucial measures as a function of comprehender accuracy on experimental questions (standard errors in parentheses)

High-accuracy comprehenders (n = 17)
                 FirstPass   GoPast     RegOut   GPReg   QA
at ambig         349 (22)    519 (48)   25 (5)   7       80 (4)
at unambig       313 (19)    396 (35)   13 (4)   2       83 (4)
toward ambig     314 (16)    355 (26)   11 (3)   0       79 (3)
toward unambig   307 (17)    391 (28)   16 (4)   4       81 (4)

Low-accuracy comprehenders (n = 18)
                 FirstPass   GoPast     RegOut   GPReg   QA
at ambig         350 (21)    452 (35)   18 (4)   0       44 (3)
at unambig       336 (27)    415 (29)   10 (3)   0       64 (3)
toward ambig     393 (26)    404 (26)   7 (3)    0       60 (4)
toward unambig   347 (25)    403 (27)   13 (3)   0       59 (4)
In sum, there is no clear evidence that the crucial interactions in online measurements found in our study are disproportionately strong among low-accuracy participants, as one would expect if an inability to understand the sentences were driving these online interactions. On the contrary, the numerical patterns suggest that these crucial interactions are at least as strong, if not stronger, when comprehension accuracy is high. The clearest of these results are that (1) in first-pass durations, high-accuracy participants (based on the filler-accuracy split) showed a marginally significant interaction between preposition and ambiguity, whereas this interaction was not significant when either all participants or only low-accuracy participants were considered; and (2) in go-past durations, high-accuracy participants (based on either split) showed a significant interaction between preposition and ambiguity, whereas low-accuracy participants did not.
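To make the "at least as strong, if not stronger" comparison concrete, the interaction contrast that Section 3.8 uses (the at+ambiguous plus toward+unambiguous means, minus the at+unambiguous and toward+ambiguous means) can be computed from the duration means in Tables 10 and 11. This sketch works on published cell means only and carries no significance test. For both splits and both duration measures it gives a larger contrast for the high-accuracy group (for example, 122 versus 72 for go-past time in the filler split).

```python
def interaction_contrast(at_ambig, at_unambig, toward_ambig, toward_unambig):
    """(at+ambig + toward+unambig) - (at+unambig + toward+ambig)."""
    return (at_ambig + toward_unambig) - (at_unambig + toward_ambig)


# Duration means from Tables 10 (filler split) and 11 (experimental split),
# ordered as (at ambig, at unambig, toward ambig, toward unambig).
MEANS = {
    ("filler split", "high"): {"FirstPass": (349, 298, 330, 332), "GoPast": (455, 372, 356, 395)},
    ("filler split", "low"): {"FirstPass": (364, 340, 375, 358), "GoPast": (498, 433, 411, 418)},
    ("experimental split", "high"): {"FirstPass": (349, 313, 314, 307), "GoPast": (519, 396, 355, 391)},
    ("experimental split", "low"): {"FirstPass": (350, 336, 393, 347), "GoPast": (452, 415, 404, 403)},
}

contrasts = {
    (split, group, measure): interaction_contrast(*cells)
    for (split, group), measures in MEANS.items()
    for measure, cells in measures.items()
}
for key, value in contrasts.items():
    print(key, value)
```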
3.7 Plausibility norming
We also conducted a plausibility norming study on the main-clause portions of our items (i.e., for example (1), The coach smiled at/toward the player) in order to address a possible confound.⁴ If the toward-condition main clauses are overall less plausible than the at-condition main clauses, the interactive pattern of greatest difficulty in the at+tossed condition could arise from initial misanalysis of tossed as a main verb with player as its subject, followed by rapid reanalysis into a reduced-relative or coordinate-verb analysis whose difficulty is greater the more plausible the main-clause structure is. We believe that this is an unlikely explanation even if there are systematic differences in main-clause plausibility, because the reanalysis would not involve any change to the structure of the main clause itself; nevertheless, we addressed the point empirically by reanalyzing our results on the basis of plausibility norms. Thirty native English-speaking UC San Diego undergraduates, none of whom had participated in the eye-tracking study, took part in the plausibility norming study. The main clauses of the 24 items were split equally into two blocks and interleaved among 36 fillers. Each sentence was rated for plausibility on a scale from 1 (implausible) to 7 (plausible), and presentation order was randomized separately for each participant.

⁴ We thank Lyn Frazier for pointing out this possible confound to us.
Analysis revealed that at-condition main clauses did indeed have higher mean plausibility (5.94) than toward-condition main clauses (5.49; by participants: t(29) = 3.7, p < 0.001; by items: t(23) = 4.4, p < 0.001). To address the potential confound that this difference presents, we ranked our items by the difference in plausibility rating between the at condition and the toward condition, and removed items in rank order, starting with the largest difference, until the mean plausibility of the remaining items in the toward condition was no lower than that in the at condition. This left us with 10 of our 24 items, with a mean at-condition plausibility rating of 5.75 and a mean toward-condition rating of 5.78. We then reran the analyses of critical-region go-past time and first-pass regressions out using only these 10 items. The results are shown in Tables 12 and 13. This plausibility-matched subset shows no qualitative differences from the full item set in go-past time or first-pass regressions out; in fact, interaction sizes are numerically larger in the subset. Although the regressions-out interaction fails to reach statistical significance in this reduced item set, the go-past time interaction is more highly significant here than in the full item set. Because the remaining set of 10 items was not fully counterbalanced, we also analyzed go-past times and regressions out using linear and logit mixed models; the go-past time interaction was confirmed as highly significant (t = 3.3), though the regressions-out interaction was not (p = 0.19). We conclude that the plausibility difference between the at and toward conditions does not explain the interactive difficulty pattern observed in the at+tossed condition.
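The item-removal procedure can be sketched as follows. This is an illustration under assumptions: each item contributes one mean rating per condition, items are removed one at a time starting from the largest at-minus-toward difference, and the ratings in the usage lines are hypothetical rather than the norming data.

```python
from statistics import fmean


def match_plausibility(ratings):
    """Drop items until mean 'toward' plausibility is no lower than mean 'at'.

    ratings maps an item id to (at_rating, toward_rating).
    Returns the ids of the retained items in sorted order.
    """
    # Rank items so that those where 'at' most exceeds 'toward' come first.
    kept = sorted(ratings, key=lambda i: ratings[i][0] - ratings[i][1], reverse=True)
    while kept:
        at_mean = fmean(ratings[i][0] for i in kept)
        toward_mean = fmean(ratings[i][1] for i in kept)
        if toward_mean >= at_mean:
            break
        kept.pop(0)  # remove the item with the largest remaining difference
    return sorted(kept)


# Hypothetical per-item mean ratings (at, toward).
toy = {1: (6.0, 5.0), 2: (6.0, 6.2), 3: (5.5, 5.4), 4: (6.5, 5.7)}
print(match_plausibility(toy))
```

Removing the most at-favoring items first is what makes the stopping point reachable with as few removals as possible, which keeps the retained subset as large as the criterion allows.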
Table 12: Mean and standard error at critical region for at–toward plausibility-matched item subset
                 GoPast     RegOut
at ambig         492 (30)   23 (5)
at unambig       356 (25)   9 (3)
toward ambig     405 (28)   11 (3)
toward unambig   453 (29)   11 (3)

Table 13: F-statistics at critical region for at–toward plausibility-matched item subset
             GoPast F1   GoPast F2   RegOut F1   RegOut F2
at           <1          <1          2.73        3.08
ambig        2.09        2.87        4.40∗       2.90
at:ambig     9.47†       11.18†      1.92        1.55

3.8 Analysis based on trial order
Another possible confound in the interpretation of our experimental results is that the crucial interactions found in our experiment (on first-pass and go-past durations, first-pass and go-past regressions, and question-answering accuracy) could be driven by a learning effect.⁵ For example, since toward is a less frequent word than at, and only the latter word appears in filler sentences, it is possible that participants noticed the contingency that the NP after toward was always followed by a reduced relative clause, whereas they did not learn such a contingency involving at. This could allow participants to become increasingly effective at processing the toward conditions, which could drive an interactive pattern of the sort we see here if knowledge of this contingency could only be usefully applied to the ambiguous conditions. To test for this possibility, we analyzed our crucial measures as a function of trial order. These analyses took three forms: (1) division of the experiment into four blocks based on trial order, and inspection of condition means and standard errors for each block; (2) for time measurements, non-parametric regression fits of duration against trial order in each condition; and (3) mixed-effects model analyses testing for interactions between trial order and condition.

⁵ We thank an anonymous reviewer for raising this point.
Analysis (1)—means and standard errors by block—is presented in Table 14. In all four
online measures, the numerical size of the interaction in question (as measured by the sum of
the at+ambiguous and toward+unambiguous condition means, minus the at+unambiguous
and toward+ambiguous condition means) is largest in the first of the four blocks. Question-
answering accuracy behaves differently over the course of the experiment: in the at+ambiguous
condition it seems to fluctuate throughout the course of the experiment, whereas in the other
three conditions it clearly rises through the course of the experiment.
Analysis (2), non-parametric regression of duration against trial order in each condition using R's non-parametric regression function lowess(), is presented in Figures 2 and 3 for first-pass and go-past durations respectively. In neither case do durations fall more quickly in the toward+ambig condition than in the at+ambig condition.
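For readers unfamiliar with lowess(), the core idea is a sequence of local, distance-weighted linear fits. The sketch below is a simplified version written from scratch, not R's implementation: it uses tricube weights over the nearest fraction of points (R's default fraction is also f = 2/3) but omits the robustness reweighting iterations that lowess() performs by default, so its output can differ from R's. The trial orders and durations in the usage lines are hypothetical.

```python
import math


def local_linear_smooth(x, y, frac=2 / 3):
    """Simplified LOWESS: tricube-weighted local linear fits, no robustness passes.

    frac is the share of points entering each local fit.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points per local neighbourhood
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        h = sorted(dists)[k - 1]  # bandwidth: distance to the k-th nearest point
        if h == 0:
            weights = [1.0 if d == 0 else 0.0 for d in dists]
        else:
            weights = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        sw = sum(weights)
        mx = sum(w * xi for w, xi in zip(weights, x)) / sw
        my = sum(w * yi for w, yi in zip(weights, y)) / sw
        sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # weighted least-squares slope
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical trial orders and first-pass durations for one condition.
order = [1, 3, 5, 8, 10, 13, 15, 18, 21, 24]
duration = [410, 380, 395, 360, 350, 370, 340, 345, 330, 335]
smoothed = local_linear_smooth(order, duration)
print([round(v) for v in smoothed])
```

Comparing such curves across conditions, as in the analysis above, amounts to asking whether the toward+ambig curve declines more steeply than the at+ambig curve.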
Table 14: Crucial measures in the first, second, third, and fourth quartiles of trial order (standard errors in parentheses)

FirstPass        Q1         Q2         Q3         Q4
at ambig         405 (28)   346 (27)   349 (30)   375 (34)
at unambig       330 (23)   311 (23)   305 (23)   331 (22)
toward ambig     321 (21)   416 (35)   352 (18)   343 (23)
toward unambig   304 (24)   366 (33)   327 (21)   401 (37)

GoPast           Q1         Q2         Q3         Q4
at ambig         504 (44)   550 (65)   483 (66)   494 (47)
at unambig       312 (43)   423 (42)   345 (38)   388 (37)
toward ambig     373 (41)   445 (46)   404 (42)   436 (80)
toward unambig   375 (43)   408 (47)   388 (35)   523 (91)

RegOut           Q1         Q2         Q3         Q4
at ambig         21 (7)     30 (7)     18 (6)     23 (6)
at unambig       11 (5)     18 (7)     12 (4)     7 (3)
toward ambig     9 (5)      6 (4)      8 (3)      12 (5)
toward unambig   18 (5)     12 (5)     17 (6)     6 (4)

GPReg            Q1         Q2         Q3         Q4
at ambig         4          4          2          1
at unambig       0          1          1          2
toward ambig     0          0          1          1
toward unambig   2          1          0          2

QA               Q1         Q2         Q3         Q4
at ambig         62 (7)     55 (7)     68 (7)     54 (7)
at unambig       51 (7)     72 (7)     73 (7)     84 (5)
toward ambig     52 (7)     75 (6)     75 (5)     83 (6)
toward unambig   64 (7)     64 (7)     70 (7)     83 (6)
filler           89 (2)     86 (3)     91 (2)     91 (2)

For analysis (3), we used mixed-effects models because of their ability to handle continuous covariates as well as imbalance (trial order was fully randomized and thus slightly imbalanced across conditions), with trial order—defined as one plus the number of experimental
items the participant had already seen—as a real-valued predictor variable, standardizing it
to eliminate correlation with other predictors (preposition and ambiguity) and to facilitate
interpretation. On first-pass times we found a significant main effect of ambiguity (t = 2.6),
a marginal main effect of preposition (t = 1.76), and a significant interaction between ambi-
guity and order (t = 2.6) such that durations in the ambiguous conditions became shorter
relative to the unambiguous conditions over the course of the experiment. No other effects
were significant, most crucially the three-way interaction between preposition, ambiguity,
and trial order (t = 0.36). On go-past times we found a significant main effect of ambiguity
(t = 2.17) and a significant interaction between preposition and ambiguity (t = 2.4); no
order effects were significant. On regressions out we found a marginal main effect of prepo-
sition (pz = 0.08), a significant interaction between preposition and ambiguity (pz = 0.02),
and no significant order effects. Counts are too small to ensure that an analysis of go-past regressions is completely reliable, but the analysis revealed a marginal interaction between preposition and ambiguity (p = 0.10), consistent with the other findings we obtained on this measure. Finally, on question-answering accuracy we did find a three-way interaction between preposition, ambiguity, and trial order (pz = 0.048). To clarify the precise nature of this three-way interaction, we conducted an equivalent mixed-effects analysis with the fixed effects recoded as interactions between condition and (scaled) trial order, with no intercept or main effect of order. This coding assigns a separate learning rate to each condition, allowing us to investigate the extent to which there is evidence for learning in each of the four conditions. In this analysis, there were significant learning effects, with question-answering accuracy improving over the course of the experiment, in the at+unambiguous condition (β = 0.48, pz < 0.01), the toward+ambiguous condition (β = 0.62, pz < 0.001), and the toward+unambiguous condition (β = 0.48, pz < 0.01). Only the at+ambiguous condition showed no significant learning effect in either direction (β = −0.01, pz = 0.93).⁶ For completeness, we also analyzed trial-order effects on filler-question accuracy (here, trial order is defined as the number of fillers already seen). A mixed-effects logit model also found a significant learning effect, with participants improving over the course of the experiment (β = 0.3, pz < 0.01).

Figure 2: First-pass times as a function of trial order. [Panels: at+ambig, toward+ambig, at+unambig, toward+unambig; x-axis: trial order (0–25); y-axis: first-pass time (100–400).]

Figure 3: Go-past times as a function of trial order. [Panels: at+ambig, toward+ambig, at+unambig, toward+unambig; x-axis: trial order (0–25); y-axis: go-past time (200–800).]

Two major points emerge from the analyses presented in this section. First, the analytic techniques employed here are sensitive enough to pick up on effects of trial order, including two- and three-way interactions between trial order and experimental manipulations. This can be seen from the significant interaction between trial order and ambiguity in first-pass time, and from the significant three-way interaction on question-answering accuracy. Second, despite the sensitivity of these techniques, no effects of trial order were obtained that could explain the crucial online interactions in our experiment. The only relevant online effect of trial order was its interaction with ambiguity in first-pass durations; furthermore, this effect ran in the opposite direction from the overall trend in the experiment (i.e., durations dropped over time in the ambiguous conditions), and both the non-parametric plots and the block-by-block means suggest that this learning effect was, if anything, driven more by the at+ambiguous condition than by the toward+ambiguous condition.

The relationship of trial order with question-answering accuracy differed from its relationship with the online measures: over the course of the experiment, participants got better at answering questions in all conditions (including fillers) except the at+ambiguous condition. Although this pattern bears some resemblance to the possible confound in which participants get differentially better at the toward+ambiguous condition, that hypothesis provides no account of why participants' accuracy improves across all other conditions, crucially including both unambiguous conditions, at approximately the same rate. We believe that the most likely account of the observed relationship between trial order, condition, and question-answering accuracy is that, as indicated by all our crucial online measurements, the at+ambiguous condition is indeed the most difficult of the four conditions, and that this difficulty prevents participants from making consistent improvements in sentence interpretation over the course of this short experiment.

⁶ Although the coefficient estimates suggest that the learning effect might be largest in the toward+ambiguous condition, this possibility was not supported by a likelihood-ratio test between a two-learning-rate model (one rate for at+ambiguous and one for the rest) and a model with one learning rate for each condition (p = 0.91), nor by a test between the two-rate model and a model with three learning rates: one for the unambiguous conditions, one for the at+ambiguous condition, and one for the toward+ambiguous condition (p = 0.73).
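Footnote 6 compares nested learning-rate codings with likelihood-ratio tests. The sketch below shows, under stated assumptions, how such codings can be expressed as condition-specific slope columns of z-scored trial order (conditions that share a rate share a column), and how a likelihood-ratio p-value follows from the difference in log-likelihood and the difference in the number of rate parameters. It does not fit the mixed-effects logit models themselves: the grouping labels, the log-likelihood values in the usage lines, and the helper names are hypothetical, and the chi-square tail probability is a closed-form implementation for integer degrees of freedom rather than a library call.

```python
import math
from statistics import fmean, pstdev

CONDITIONS = ("at+ambig", "at+unambig", "toward+ambig", "toward+unambig")


def standardize(values):
    """z-score a predictor such as trial order (population standard deviation)."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def slope_columns(conditions, order_z, grouping):
    """Order-slope columns with no shared main effect of order.

    grouping maps each condition to a rate label; conditions sharing a label
    share one learning rate, so there is one column per distinct label.
    """
    labels = sorted(set(grouping.values()))
    return [[z if grouping[c] == label else 0.0 for label in labels]
            for c, z in zip(conditions, order_z)]


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def lr_test(loglik_reduced, loglik_full, df):
    """Likelihood-ratio statistic and p-value for two nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


FOUR_RATES = {c: c for c in CONDITIONS}
TWO_RATES = {c: ("at+ambig" if c == "at+ambig" else "rest") for c in CONDITIONS}
THREE_RATES = {"at+ambig": "at+ambig", "toward+ambig": "toward+ambig",
               "at+unambig": "unambig", "toward+unambig": "unambig"}

# Usage: one trial per condition with hypothetical trial orders 1..4.
order_z = standardize([1, 2, 3, 4])
rows = slope_columns(CONDITIONS, order_z, TWO_RATES)
df = len(set(FOUR_RATES.values())) - len(set(TWO_RATES.values()))
stat, p = lr_test(-100.0, -98.0, df)  # hypothetical log-likelihoods
print(rows, df, stat, round(p, 4))
```

The degrees of freedom for each comparison are simply the difference in the number of distinct rate labels: 2 for the two-rate versus four-rate comparison and 1 for the two-rate versus three-rate comparison.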
3.9 Ruling out a categorical misidentification account
One point that must be emphasized is that these results are not compatible with an account
that simply allows for occasional categorical misidentification of the word at. The reason
for this can be seen when we consider the four experimental conditions plus at-condition
variants with misidentification as a near-neighbor word:
(3) a. The coach smiled toward the player. . . tossed
b. The coach smiled at the player. . . tossed
c. The coach smiled {as/and} the player. . . tossed
(4) a. The coach smiled toward the player. . . thrown
b. The coach smiled at the player. . . thrown
c. The coach smiled {as/and} the player. . . thrown
On such an account, critical-region reading in at+tossed trials should reflect some mixture of
the critical-region behavior that would be obtained in reading correctly-identified (3b) and
(3c) sentences. We would expect critical-region difficulty in (3b) to be similar to that of (3a),
since the only difference between the two is the preposition that was used. The critical-region
difficulty of (3c), on the other hand, should be substantially smaller than that of either (3a)
or (3b), since a finite-verb reading is now available for tossed. The difficulty in the at+tossed
condition should thus be less, if anything, than in the toward+tossed condition. (Note that
any overall increase in difficulty associated with the use of toward in comparison with at
should show up as a main effect, not as an interaction.) In the unambiguous conditions of
(4), in contrast, no corresponding facilitation should occur as a result of categorical misiden-
tification as in (4c), since thrown cannot be a finite main verb. Therefore, any interaction
in a categorical-misidentification model should be facilitatory in the at+tossed condition,
which is the opposite of what our results indicate.
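The argument can be made concrete with a toy calculation. Assuming, as the text does, that (3a) and (3b) are equally difficult, that (3c) is easier, and that misidentification brings no benefit in (4), the interaction contrast used in Section 3.8 reduces to q times (d_finite minus d_ambig), where q is the misidentification rate; this is negative whenever (3c) is easier. The function name, its parameters, and all numbers in the usage line are hypothetical.

```python
def misidentification_contrast(q, d_ambig, d_finite, d_unambig):
    """Interaction contrast predicted by a categorical-misidentification account.

    q: probability that 'at' is misread as a near neighbour (as/and).
    d_ambig: difficulty of (3a) and (3b), assumed equal.
    d_finite: difficulty of (3c), where 'tossed' can be a finite main verb.
    d_unambig: difficulty of (4a), (4b), and (4c), assumed equal.
    """
    at_ambig = (1 - q) * d_ambig + q * d_finite       # mixture of (3b) and (3c)
    toward_ambig = d_ambig                            # (3a)
    at_unambig = (1 - q) * d_unambig + q * d_unambig  # mixture of (4b) and (4c)
    toward_unambig = d_unambig                        # (4a)
    return (at_ambig + toward_unambig) - (at_unambig + toward_ambig)


# Hypothetical values: any d_finite < d_ambig yields a negative (facilitatory) contrast.
contrast = misidentification_contrast(q=0.1, d_ambig=500.0, d_finite=350.0, d_unambig=400.0)
print(contrast)
```

A negative contrast means the at+tossed cell is relatively easier, the opposite of the positive contrast that the online measures show.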
References
[1] Levy R (2008) A noisy-channel model of rational human sentence comprehension under
uncertain input. EMNLP 13 pp 234–243.
[2] Mohri M (1997) Finite-state transducers in language and speech processing. Comput
Linguist 23:269–311.
[3] Manning CD, Schutze H (1999) Foundations of Statistical Natural Language Processing
(MIT Press).
[4] Kucera H, Francis WN (1967) Computational Analysis of Present-day American English
(Providence, RI: Brown University Press).
[5] Marcus MP, Santorini B, Marcinkiewicz MA (1994) Building a large annotated corpus
of English: The Penn Treebank. Comput Linguist 19:313–330.
[6] Collins M (2003) Head-driven statistical models for natural language parsing. Comput
Linguist 29:589–637.
[7] Rayner K (1998) Eye movements in reading and information processing: 20 years of
research. Psychol Bull 124:372–422.
[8] Baayen RH, Davidson DJ, Bates DM (2008) Mixed-effects modeling with crossed ran-
dom effects for subjects and items. J Mem Lang 59:390–412.
[9] Bates D (2005) Fitting linear mixed models in R. R News 5:27–30.
[10] Jaeger TF (2008) Categorical data analysis: Away from ANOVAs (transformation or
not) and towards logit mixed models. J Mem Lang 59:434–446.
A Experimental items
After each experimental item, question type, main-clause plausibility ratings, and question-
answering accuracy by condition are given. Question type codings are as given in (2), plus
as follows:
[S Vm] Did the coach smile? YES
[Vm O] Did someone smile at the player? YES
[S was Vred] Was the coach {tossed/thrown} a frisbee? NO
[O Vm] Did the player toss a frisbee? NO
Main-clause plausibility ratings are given in mean±standard-error format in the order at/toward. Question-answering accuracy is given in the order at+ambig/at+unambig/toward+ambig/toward+unambig.
1. The students sighed at the professor {taught/given} a dancing lesson by the experienced instructor. [S Vm] (6.38±0.18/5.41±0.36; 1.0/1.0/1.0/0.9)
2. The kindergartner grinned at the little girl {brought/chosen} a toy by her parents on the first day of Chanukah. [S Vm] (6.18±0.40/5.85±0.27; 0.9/1.0/0.9/1.0)
3. The hostess shrugged at the customer {allowed/forbidden} the pleasure of eating sweets by his doctor. [Vm O] (6.23±0.32/5.35±0.34; 0.8/0.9/1.0/0.8)
4. The nurse grimaced at a student {grabbed/stolen} a muffin by her friends from the dining hall. [S was Vred] (5.82±0.31/4.69±0.44; 0.9/1.0/0.9/0.9)
5. The hotel owner scowled at the guest {brought/taken} a drink by the bellboy. [O Vm] (6.15±0.34/5.59±0.23; 0.9/0.8/0.9/0.9)
6. The benchwarmers cheered at the player {tossed/thrown} a frisbee by the opposing team. [O Vred] (5.35±0.45/4.85±0.46; 0.7/0.5/0.7/0.2)
7. The priest frowned at the woman {offered/given} a beer by the hostess. [O Vred] (6.31±0.31/5.47±0.40; 0.4/0.9/0.6/0.9)
8. The foreman cried out at a carpenter {cut/sawn} a board by his buddy. [O Vm] (5.18±0.37/5.15±0.45; 0.4/0.1/0.7/0.5)
9. The manager cursed at the waiter {served/given} pea soup by a trainee. [So Vred O] (6.38±0.21/5.65±0.27; 0.3/0.7/0.6/0.7)
10. The receptionist winked at the young man {rented/shown} an apartment by his uncle. [So Vred O] (6.76±0.14/5.85±0.27; 0.9/0.9/0.9/1.0)
11. The anthropologist looked on at the woman {knitted/woven} a shawl by her mother. [O Vred PP] (5.46±0.35/5.12±0.35; 0.7/0.8/0.5/0.9)
12. James stared at the children {dyed/hidden} Easter eggs by their teachers. [PP Vred O] (6.88±0.08/5.23±0.43; 0.6/0.8/0.6/0.8)
13. The soldiers fired at the sergeant {presented/shown} a list of charges by the judge the
previous day. [O Vred PP] (5.08±0.33/4.94±0.35; 0.7/0.7/0.6/0.6)
14. The town drunk snorted at the innkeeper {recited/sung} a verse by a traveling monk.
[PP Vred O] (5.82±0.32/4.23±0.50; 0.3/0.6/0.7/0.6)
15. The taxi driver signaled at the woman {tossed/thrown} a silver dollar by the passerby.
[O Vred PP] (5.92±0.33/5.94±0.20; 0.5/0.7/0.7/0.5)
16. The mime gestured at the artist {painted/drawn} a picture by her father while he was
on his deathbed. [S was Vred] (6.65±0.15/6.38±0.24; 0.8/0.6/0.7/0.3)
17. The trader sneered at the banker {clipped/given} a coupon by her boss. [So Vred
O] (5.77±0.28/5.71±0.29; 0.7/0.9/0.8/0.5)
18. The logger glared at the activist {planted/grown} a tree by his daughter. [O Vred]
(6.12±0.36/6.08±0.31; 0.7/0.8/0.7/0.9)
19. The little boy reached out at the girl {knitted/woven} a hat by her grandmother. [Vm
O] (6.00±0.32/6.47±0.17; 0.8/1.0/0.9/0.8)
20. The lobbyist smiled at the congressman {mailed/written} a letter by the CEO. [PP
Vred O] (6.24±0.32/5.69±0.35; 0.8/1.0/0.9/0.8)
21. The referee motioned at the athlete {hurled/thrown} a pass by the quarterback during
the third quarter. [O Vred] (6.15±0.25/6.59±0.15; 0.2/0.5/0.1/0.4)
22. The people in line rubbernecked at the man {removed/withdrawn} some money by his
wife from the uncooperative ATM. [So Vred O] (4.53±0.45/4.46±0.53; 0.4/0.5/0.3/0.5)
23. The landlord squinted at the tenant {carried/driven} a load of books by her boyfriend
from her office. [O Vred PP] (6.31±0.21/5.71±0.27; 0.4/0.5/0.7/0.7)
24. The actor coughed at the journalist {asked/chosen} a question by the editor for the
interview. [PP Vred O] (5.18±0.33/4.62±0.40; 0.3/0.4/0.8/0.5)
Filler sentences
Items 37–44 are practice sentences and were presented at the beginning of the experiment.
1. Two elementary school students were doing their homework in the adjacent room.
2. The leftovers in the fridge are starting to smell.
3. A tall glass full of apple juice spilled on the coffee table.
4. The architect didn’t recognize the old blueprints from college.
5. The stray dog sniffed at the garbage can in apparent search of food.
6. The limousine arrived at the party completely full of passengers.
7. A group of seagulls settled on the power lines lining the avenue.
8. Pierre just purchased a new cat from the pet store in the next town.
9. The monitor turned itself off after thirty minutes of inactivity.
10. The last woman in line tapped her foot and stared at her watch impatiently.
11. An aspiring young model from Nebraska moved to Los Angeles and immediately started
looking for work.
12. The accountant went to his boss and complained that the office was too stuffy.
13. Brad tripped on the telephone cord and banged his knee on the table.
14. The cyclist hit a patch of ice and lost control of his bike.
15. She sharpened the scissors and started cutting out her Valentine’s card.
16. Josephine grabbed the trunk of the car and pulled hard to get it open.
17. The news anchor stared off into space and sipped her coffee.
18. The bar was thick with smoke and plenty of men in their sixties.
19. The accountant watched the manager search the desk for the missing check.
20. The teller saw the teenagers enter the bank before the robbery.
21. The violin instructor observed her students work their way through the difficult music.
22. A street musician witnessed two hoodlums attempt to break into a station wagon last night.
23. The receptionist noticed her boss go home extra early on Wednesday.
24. The docent scrutinized the intern cleaning the Florentine vase in the museum hall.
25. The jeweler spotted what he thought were three young men casing his store.
26. The jailed prostitute overheard two police officers discussing her case.
27. A janitor discovered a stray dog scratching at the cafeteria door after school.
28. The mechanical engineer who formerly consulted for Daniel’s startup has now started his
own company.
29. A woman who was wearing a straw hat rummaged in her purse as the bus pulled to a halt.
30. Six protesters who were carrying signs proclaiming opposition to the death penalty marched
up the street.
31. A salesman who tried to sell Adam a magazine subscription yesterday showed up at his door
again today.
32. The worker who was experiencing mood swings quit his job last week.
33. Paula’s sister in London knows at least three people who are vegan.
34. A congressional page who worked for a freshman congressman from Ohio stopped by the
office with tea.
35. None of the farmers who lived in the area expected the season to be so favorable to squash.
36. The novel that most appealed to Simon was unfortunately sold out at his favorite bookstore.
37. The judge heard the bailiff chuckle under his breath.
38. The night watchman detected an intruder tugging at the glass door on the balcony.
39. A gardener who dabbled in hybridizing tomato strains planted some imported seeds in his
newest plot.
40. Lauren worked with an editor who strongly disagreed with her usage of semicolons.
41. She inspected the grassy knoll for remnants of bullets from a high-powered rifle.
42. The old shawl had been passed down to her from her great-grandmother from Ukraine.
43. The woman was severely overweight and had a history of medical problems because of it.
44. A cab driver arrived at the scene and picked up all four of the waiting businessmen.