

    Surprising parser actions and reading difficulty

    Marisa Ferrara Boston¹, John T. Hale¹, Reinhold Kliegl², and Shravan Vasishth²

    ¹Michigan State University  ²Universität Potsdam, Germany

    INTRODUCTION

    This study examines whether surprisal (Hale 2001), a parser-based complexity measure, can predict German readers' eye-fixation durations, an empirical measure of sentence processing difficulty.

    We use the Potsdam Sentence Corpus (PSC), a 144-sentence eye movement corpus read by 272 native German speakers.

    Figure 1: Fixations for a PSC sentence.

    Baseline predictors such as log frequency (lf), log bigram frequency (bi), word length (len), and human predictability given the left context (pr) model the total reading time (TRT) of eye fixations per word (Ehrlich & Rayner 1981). We hypothesize that adding syntactic effects, as measured by surprisal, will model these data better than the word-level factors alone. We also test the role of the beam size k in the surprisal calculation.

    $$\log(\mathrm{TRT}) = 5.4 - 0.02\,\mathrm{lf} - 0.01\,\mathrm{bi} - 0.59\,\mathrm{len}^{-1} - 0.02\,\mathrm{pr}$$
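    As a rough illustration, the baseline equation above can be evaluated for a single word. This is only a sketch: the function name and the input values are hypothetical, the logarithm is assumed to be natural, and the poster does not state how the predictors were scaled or centered.

```python
import math

def baseline_log_trt(lf, bi, length, pr):
    """Predicted log total reading time, using the coefficients printed on the poster.

    Assumption: predictors are supplied on the same scale the authors used,
    which the poster does not specify.
    """
    return 5.4 - 0.02 * lf - 0.01 * bi - 0.59 * (1.0 / length) - 0.02 * pr

# Hypothetical predictor values for one word (not taken from the PSC).
pred = baseline_log_trt(lf=2.0, bi=1.0, length=5, pr=-1.0)
trt = math.exp(pred)  # back-transform, assuming a natural logarithm
```

    Because word length enters as a reciprocal with a negative coefficient, longer words yield a larger predicted log TRT, while higher frequency and higher predictability lower it.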

    A DEPENDENCY PARSER CALCULATES SURPRISAL...

    Surprisal is the logarithm of the prefix probability α eliminated in the transition from one word to the next.

    $$\mathrm{surprisal}(n) = \log_2\!\left(\frac{\alpha_{n-1}}{\alpha_n}\right), \qquad \alpha_n = \sum_{d \in D(G,\,wv)} \mathrm{Prob}(d)$$

    When the transition from the previous word to the current word has low probability (as at "in", compared with "alte" in 1), the surprisal is high. The psycholinguistic claim is that behavioral measures should register increased cognitive difficulty: rare parser actions are cognitively costly.
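    The definition can be made concrete with a few lines of Python. This is a minimal sketch, not the authors' code: the prefix probabilities in `alphas` are invented for illustration, with 1.0 standing for the empty prefix.

```python
import math

def surprisal(alpha_prev, alpha_curr):
    """Surprisal in bits: log2 of the ratio of successive prefix probabilities."""
    return math.log2(alpha_prev / alpha_curr)

# Hypothetical prefix probabilities after 0, 1, 2 and 3 words.
alphas = [1.0, 0.5, 0.4, 0.05]
values = [surprisal(a, b) for a, b in zip(alphas, alphas[1:])]
```

    The third word removes most of the remaining probability mass (0.4 to 0.05), so its surprisal is about 3 bits, far higher than the second word's.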

    Figure 2: Surprisal for a PSC sentence.

    We calculate surprisal using an incremental dependency parser built to the specifications of Nivre (2004) with an added k-best search, as in (3) and (4). We calculated surprisal for 10 models in total: k = 1, ..., 9 and k = 100.
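    One way to picture the effect of the beam size k is the sketch below. It is a simplification under stated assumptions, not the authors' parser: each word triggers a single expansion step, the prefix probability is approximated by the summed probability of the k retained states, and `toy_successors` is a hypothetical stand-in for a trained transition model.

```python
import heapq
import math

def advance(beam, successors, k):
    """Expand every (prob, state) pair in the beam and keep the k most probable results."""
    candidates = [
        (prob * p, nxt)
        for prob, state in beam
        for p, nxt in successors(state)
    ]
    return heapq.nlargest(k, candidates, key=lambda c: c[0])

def beam_surprisals(words, successors_for, k):
    """Surprisal per word, with prefix probability summed over the k-best states."""
    beam = [(1.0, "start")]
    alpha_prev = 1.0
    out = []
    for w in words:
        beam = advance(beam, lambda s: successors_for(s, w), k)
        alpha = sum(prob for prob, _ in beam)
        out.append(math.log2(alpha_prev / alpha))
        alpha_prev = alpha
    return out

def toy_successors(state, word):
    # Hypothetical model: every state has one likely and one unlikely continuation.
    return [(0.6, f"{state}/{word}:a"), (0.2, f"{state}/{word}:b")]

narrow = beam_surprisals(["w1", "w2"], toy_successors, k=1)
wide = beam_surprisals(["w1", "w2"], toy_successors, k=100)
```

    With k = 1 only the single best analysis survives each step, so less probability mass is retained and the toy surprisal values come out higher than with k = 100.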

    Figure 3: State-based surprisal calculation.

    Figure 4: Dependency claims in parser states q.

    ...WHICH PREDICTS FIXATION DURATIONS INDEPENDENT OF BASELINE PREDICTORS

    Using TRT as the dependent measure, we compared the quality of fit of the baseline model above with that of 10 models, each of which had all the predictors of the baseline plus one of the 10 surprisals. We evaluated the change in relative quality of fit due to surprisal with the Deviance Information Criterion (DIC) (Spiegelhalter et al. 2002). A lower DIC indicates a better fit.
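    For readers unfamiliar with DIC, the sketch below shows the standard definition from Spiegelhalter et al. (2002): the posterior mean deviance plus the effective number of parameters pD, where pD is the mean deviance minus the deviance at the posterior mean of the parameters. The function name and all numbers are made up for illustration and are not results from this study.

```python
def dic(deviance_samples, deviance_at_posterior_mean):
    """Deviance Information Criterion: mean deviance plus effective parameter count pD."""
    mean_deviance = sum(deviance_samples) / len(deviance_samples)
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d

# Invented deviance values for two hypothetical models.
baseline = dic([1010.0, 1012.0, 1014.0], 1007.0)
with_surprisal = dic([1000.0, 1003.0, 1006.0], 997.0)
prefers_surprisal = with_surprisal < baseline
```

    In this toy comparison the second model is preferred because its DIC is lower, even though its pD penalty is slightly larger.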

    Figure 5: DICs from the multiple regressions using different versions of surprisal.

    Surprisal predicts fixation durations and improves model fit for TRT when used in addition to the baseline predictors. Surprisal from small beams (k = 1, 2, 3) improves fit more than surprisal from a wide beam (k = 100), challenging the assumption that cognitive functions are global optima.

    CONCLUSION

    This study demonstrates that surprisal calculated with a dependency parser is a significant predictor of reading times, an empirical measure of cognitive difficulty. Predictions derived even from very narrow-beamed parsers improve a baseline eye-fixation duration model, and support a boundedly rational view of the human parser.

    S. F. Ehrlich and K. Rayner. 1981. Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior, 20:641–655.

    J. Hale. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of 2nd NAACL, 1–8.

    J. Nivre. 2004. Incrementality in deterministic dependency parsing. In Incremental Parsing: Bringing Engineering and Cognition Together, Barcelona, Spain. Association for Computational Linguistics.

    D. J. Spiegelhalter, N. G. Best, B. P. Carlin, and A. van der Linde. 2002. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, 64(B):583–639.

    {mferrara,jthale}@msu.edu, kliegl@uni-potsdam.de, vasishth@acm.org
    http://www.msu.edu/~mferrara/