TRANSCRIPT
Noisy-channel theory of sentence comprehension
Emily Morgan
LSA 2019 Summer Institute
UC Davis
Rational analysis
• Background assumption: the cognitive agent is optimized via evolution and learning to solve everyday tasks effectively
1. Specify a formal model of the problem to be solved and the agent's goals
   A. Make as few assumptions about computational limitations as possible.
2. Derive optimal behavior given the problem and goals
3. Compare optimal behavior to agent behavior
4. If predictions are off, revise assumptions, and iterate
(Anderson, 1990, 1991)
Rational analysis: Sentence processing
1. Specify a formal model of the problem to be solved and the agent's goals
   Given a sentence, recover a probability distribution over trees
   A. Make as few assumptions about computational limitations as possible.
   Did not assume any memory limitations.
2. Derive optimal behavior given the problem and goals
   Derived surprisal theory
3. Compare optimal behavior to agent behavior
   Correctly predicted many reading time results
4. If predictions are off, revise assumptions, and iterate
   But let's look at a case where the predictions are off…
(Anderson, 1990, 1991)
An incremental inference puzzle for surprisal
• Try to understand this sentence:
  (a) The coach smiled at the player tossed the frisbee.
  …and contrast this with:
  (b) The coach smiled at the player thrown the frisbee.
  (c) The coach smiled at the player who was thrown the frisbee.
  (d) The coach smiled at the player who was tossed the frisbee.
• Readers boggle at "tossed" in (a), but not in (b-d)
(Tabor et al., 2004, JML)
[Figure: reading times by region, showing an RT spike at "tossed" in (a)]
Why is tossed/thrown interesting?
• In classic garden paths, you are led astray by an initially plausible but ultimately incorrect analysis
• In local coherence effects (LCEs), you have already seen the correct main verb smiled, so the main-verb interpretation of tossed "should not" be plausible
• But we get led astray anyway!
• It appears that the parser is failing to make rational/optimal use of its previous input
Rational analysis: Sentence processing
1. Specify a formal model of the problem to be solved and the agent's goals
   Given a sentence, recover a probability distribution over trees
   A. Make as few assumptions about computational limitations as possible.
   Did not assume any memory limitations.
2. Derive optimal behavior given the problem and goals
   Derived surprisal theory
3. Compare optimal behavior to agent behavior
   Correctly predicted many reading time results
4. If predictions are off, revise assumptions, and iterate
   But let's look at a case where the predictions are off…
(Anderson, 1990, 1991)
Uncertain input in language comprehension
• Previous models of sentence processing made a simplifying assumption:
  • Input is clean and perfectly formed
  • No uncertainty about the input is admitted
• Intuitively this seems patently wrong…
  • We sometimes misread things
  • We can also proofread
Uncertain input in language comprehension
• Uncertain input/noisy-channel hypothesis: comprehenders account for possible noise in the input
• This leads to questions:
1. What behavioral evidence do we have for an uncertain-input/noisy-channel theory of sentence comprehension?
2. What might a model of sentence comprehension under uncertain input look like?
3. What further predictions might such a model make?
Uncertain input in language comprehension
• How could uncertain input explain local coherence effects?
• Consider the sentences:
  1. The coach smiled at the player tossed a frisbee.
  2. The coach smiled as the player tossed a frisbee.
  3. The coach smiled and the player tossed a frisbee.
• The comprehender might think it's more likely that the word at is wrong than that the speaker really meant #1.
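This comparison can be sketched as a toy Bayesian computation. All probabilities below are invented for illustration; only the qualitative pattern matters, namely that a grammatical near-neighbor can outweigh the literal percept.

```python
# Toy noisy-channel comparison: which intended word best explains perceiving
# "at" in "The coach smiled at the player tossed a frisbee"?
# P(intended | perceived) ∝ P(intended) * P(perceived | intended)

# Hypothetical prior probabilities of each intended continuation, given that
# "tossed" follows as a main verb (invented values):
prior = {"at": 0.001,   # "...at the player tossed..." needs a rare reduced relative
         "as": 0.45,    # "...as the player tossed..." is fully grammatical
         "and": 0.40}   # "...and the player tossed..." is fully grammatical

# Hypothetical noise model: probability of *perceiving* "at" given each
# intended word (short function words are somewhat confusable):
likelihood = {"at": 0.95, "as": 0.02, "and": 0.01}

unnorm = {w: prior[w] * likelihood[w] for w in prior}
Z = sum(unnorm.values())
posterior = {w: unnorm[w] / Z for w in unnorm}
```

Under these invented numbers the posterior favors the speaker having meant "as", even though "at" was perceived: the grammatical prior overwhelms the small perceptual-error probability.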
Experimental design
• In a free-reading eye-tracking study, Levy et al. (2009) crossed at/toward with tossed/thrown:
  The coach smiled at the player tossed the frisbee
  The coach smiled at the player thrown the frisbee
  The coach smiled toward the player tossed the frisbee
  The coach smiled toward the player thrown the frisbee
• Prediction: an interaction between preposition & ambiguity in some subset of:
  • Early-measure (first-pass) RTs at the critical region tossed/thrown
  • First-pass regressions out of the critical region
  • Go-past time for the critical region
  • Regressions into at/toward
Experimental results
[Figure: first-pass RT, regressions out, go-past RT, go-past regressions, and comprehension accuracy by condition]
The coach smiled at the player tossed…??
Today's questions
1. What behavioral evidence do we have for an uncertain-input/noisy-channel theory of sentence comprehension?
   • Local coherence effects
2. How can we model sentence comprehension under uncertain input?
3. What further predictions might such a model make?
Standard probabilistic sentence processing
• Notation: T = tree, w = word sequence, I = noisy input
• Standard probabilistic sentence processing: given a sentence w, infer a distribution over trees, P(T|w)

A noisy-channel model
• If (as experimenters) we know the true sentence w* presented to the participant, but not the perceived input I, what does the participant believe the intended sequence w is?
  P(w|w*) ∝ P_C(w) × Q(w, w*),  where  Q(w, w*) = Σ_I P_T(I|w*) P_C(I|w)
• Here P_T is the true noise model, P_C(I|w) is the comprehender's noise model, P_C(w) is the comprehender's prior probability of w, and Q(w, w*) is a similarity function ("kernel")
(Levy, 2008, EMNLP)
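The posterior over intended sentences can be sketched by summing over possible noisy inputs. Everything here, the candidate strings, the two-element input space, and every probability, is invented for illustration; only the structure P(w|w*) ∝ P_C(w) · Σ_I P_T(I|w*) P_C(I|w) follows the model described above.

```python
# Sketch of the noisy-channel posterior over intended strings w, given the
# true sentence w* shown to the participant.

candidates = ["as the player", "at the player"]      # hypothesized intended strings w
prior = {"as the player": 0.6, "at the player": 0.4} # P_C(w), invented

# P_T(I | w*): true noise model over perceptual inputs I (invented; "a_" marks
# a degraded percept whose second letter was not clearly seen)
p_true = {"a_ the player": 0.3, "at the player": 0.7}

# P_C(I | w): comprehender's model of how intended w yields input I (invented)
p_comp = {
    ("a_ the player", "as the player"): 0.30,
    ("a_ the player", "at the player"): 0.30,
    ("at the player", "as the player"): 0.05,
    ("at the player", "at the player"): 0.65,
}

def posterior(candidates, prior, p_true, p_comp):
    unnorm = {}
    for w in candidates:
        kernel = sum(p_true[I] * p_comp[(I, w)] for I in p_true)  # Q(w, w*)
        unnorm[w] = prior[w] * kernel
    Z = sum(unnorm.values())
    return {w: v / Z for w, v in unnorm.items()}

post = posterior(candidates, prior, p_true, p_comp)
```

With these particular numbers the faithful reading "at the player" dominates; shifting the prior or the noise model shifts the posterior accordingly.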
Representing noisy input
• How can we represent the type of noisy input generated by a word sequence?
• Finite-state automata (FSAs)
  • A type of grammar that generates strings
  • Equivalently, it accepts/rejects strings
  • The FSA on the slide accepts a, ab, abb, abbb, abbbb, etc.
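The FSA described above can be written out directly. The state names and transition table are a minimal sketch of an automaton accepting a, ab, abb, …:

```python
# Minimal FSA matching the slide's example: it accepts "a" followed by any
# number of "b"s. State 1 is the single accepting state.
FSA = {
    "start": 0,
    "accept": {1},
    "delta": {(0, "a"): 1,   # the first symbol must be "a"
              (1, "b"): 1},  # then zero or more "b"s (self-loop)
}

def accepts(fsa, string):
    state = fsa["start"]
    for sym in string:
        if (state, sym) not in fsa["delta"]:
            return False          # no transition: the string is rejected
        state = fsa["delta"][(state, sym)]
    return state in fsa["accept"]
```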
Weighted/probabilistic FSAs (pFSAs)
• Every transition has a probability associated with it
  • Here represented as a log probability, i.e. surprisal (each arc on the slide is labeled with its input symbol and log-probability)
• The total surprisal of a string is the sum of the transition surprisals (plus the surprisal of the final state, if there are multiple final states)
• Equivalently, the total probability is the product of the transition probabilities
(Mohri, 1997)
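A minimal sketch of pFSA string scoring, under invented transition probabilities, showing the sum-of-surprisals / product-of-probabilities equivalence:

```python
import math

# Each transition maps (state, symbol) to (next state, probability).
# All states, symbols, and probabilities are invented for illustration.
delta = {
    (0, "a"): (1, 0.9),
    (0, "b"): (1, 0.1),
    (1, "a"): (2, 0.4),
    (1, "b"): (2, 0.6),
}
final = {2: 1.0}   # probability of stopping in each final state

def string_surprisal(string):
    state, total = 0, 0.0
    for sym in string:
        state, p = delta[(state, sym)]
        total += -math.log2(p)                  # sum of transition surprisals...
    return total + -math.log2(final[state])     # ...plus the final-state surprisal

def string_prob(string):
    return 2 ** -string_surprisal(string)       # equivalently, a product of probabilities
```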
Combining grammar & uncertain input
• Bayes' Rule says that the evidence and the prior should be combined (multiplied)
• For probabilistic grammars, this combination is the formal operation of weighted intersection:
  grammar + input = BELIEF
• Grammar affects beliefs about the future
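The weighted-intersection idea can be illustrated in miniature as pointwise multiplication of two distributions followed by renormalization. The vocabulary and all probabilities are invented:

```python
# Combining the grammar's expectation about word 1 (the prior) with the noisy
# percept's evidence, by pointwise multiplication and renormalization.
grammar_prior = {"b": 0.7, "c": 0.2, "f": 0.1}   # what the grammar expects (invented)
input_evidence = {"b": 0.3, "c": 0.6, "f": 0.1}  # what the percept suggests (invented)

belief = {w: grammar_prior[w] * input_evidence[w] for w in grammar_prior}
Z = sum(belief.values())
belief = {w: p / Z for p_w, p in belief.items() for w in [p_w]}
```

Full weighted intersection operates over automata rather than single symbols, but the arithmetic at each position is exactly this multiply-and-renormalize step.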
Revising beliefs about the past
• When we're uncertain about the future, grammar + partial input can affect beliefs about what will happen
• With uncertainty about the past, grammar + future input can affect beliefs about what has already happened
[Figure: distributions over the input ({b,c} {?} vs. {b,c} {f,e}) under the grammar alone, after word 1, and after words 1 + 2]
Flexibility of pFSAs
• Probabilistic FSAs can also let us represent inputs of variable length
  • ε-transitions allow for the possibility of generating fewer than two input symbols
  • Loops allow for the possibility of generating more than two input symbols
• The pFSA on the slide gives probability to infinitely many strings, but the most likely are {a,b}{a,b}
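A sketch of such a variable-length pFSA: two "slots" that can each be skipped via an ε-transition, followed by a loop that can emit extra symbols. The topology and all probabilities are invented, chosen so that two-symbol strings come out most probable while every length gets some probability:

```python
# Transitions: state -> list of (symbol, next state, probability);
# symbol None marks an ε-transition (emits nothing).
trans = {
    0: [("a", 1, 0.4), ("b", 1, 0.4), (None, 1, 0.2)],  # slot 1, or skip it
    1: [("a", 2, 0.4), ("b", 2, 0.4), (None, 2, 0.2)],  # slot 2, or skip it
    2: [("a", 2, 0.05), ("b", 2, 0.05)],                # loop: extra symbols
}
final = {2: 0.9}   # probability of stopping once in state 2

def string_prob(string, state=0):
    """Total probability of generating `string` from `state`, summed over all
    derivations (the recursion handles both ε-transitions and the loop)."""
    total = final.get(state, 0.0) if not string else 0.0
    for sym, nxt, p in trans.get(state, []):
        if sym is None:
            total += p * string_prob(string, nxt)
        elif string and string[0] == sym:
            total += p * string_prob(string[1:], nxt)
    return total
```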
The noisy-channel model (FINAL)
• For Q(w, w*): a weighted FSA based on Levenshtein distance between words (KLD)
• Result of the KLD applied to w* = a cat sat:
  Cost(a cat sat) = 0
  Cost(sat a sat cat) = 822
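The Levenshtein distance underlying the kernel can be computed with the standard dynamic program. This sketch counts unweighted word-level edits, whereas the actual KLD weights edits probabilistically inside a weighted FSA:

```python
# Word-level Levenshtein distance: the minimum number of insertions,
# deletions, and substitutions needed to turn one word sequence into another.
def levenshtein(seq1, seq2):
    m, n = len(seq1), len(seq2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                    # delete everything in seq1
    for j in range(n + 1):
        d[0][j] = j                    # insert everything in seq2
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if seq1[i - 1] == seq2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[m][n]
```

For example, `levenshtein("a cat sat".split(), "a cat sat".split())` is 0, mirroring the zero-cost identity case above.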
Incremental inference under uncertain input
• Near-neighbors make the "incorrect" analysis "correct":
  The coach smiled at the player tossed the frisbee
  (near-neighbor edits shown in the figure: at → as? or and?; who? or that? inserted before tossed)
• Any of these changes makes tossed a main verb!
• Hypothesis: the boggle at "tossed" arises because the comprehender wonders whether she might have seen one of these near-neighbors
The core of the intuition
• Grammar & input come together to determine two possible "paths" through the partial sentence:
  the coach smiled… → at (likely) → …the player… → tossed/thrown
  the coach smiled… → as/and (unlikely) → …the player… → tossed/thrown
  (line thickness ≈ probability)
• tossed is more likely to happen along the bottom path
  • This creates a large shift in belief in the tossed condition
• thrown is very unlikely to happen along the bottom path
  • As a result, there is no corresponding shift in belief
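The two-path intuition can be quantified with a toy calculation. All numbers are invented; the point is only the asymmetry between tossed and thrown:

```python
# Belief in the top ("at") vs. bottom ("as"/"and") path before the critical
# word, and the updated belief in the bottom path after seeing it.
p_top = 0.95      # prior belief in the "at" path (the percept said "at")
p_bottom = 0.05   # residual belief in the "as"/"and" paths

# Invented next-word probabilities along each path: "tossed" as a main verb is
# likely on the bottom path but needs a rare reduced relative on top; "thrown"
# is fine on top and nearly impossible on the bottom.
p_next = {"tossed": {"top": 0.001, "bottom": 0.2},
          "thrown": {"top": 0.01,  "bottom": 0.0001}}

def updated_belief_in_bottom(word):
    top = p_top * p_next[word]["top"]
    bottom = p_bottom * p_next[word]["bottom"]
    return bottom / (top + bottom)
```

Under these numbers, seeing tossed flips most of the belief onto the bottom path (a large belief shift), while thrown leaves the top path dominant (no shift).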
Ingredients for the model
• Q(w, w*) comes from the KLD (with minor changes)
• P_C(w) comes from a probabilistic grammar (this time a probabilistic finite-state grammar, i.e. a grammar which can be represented as a pFSA)
• We need one more ingredient:
  • a quantified signal of the alarm induced by word wi about changes in beliefs about the past
Quantifying alarm about the past
• Relative entropy (KL divergence) is a natural metric of change in a probability distribution (Levy, 2008; Itti & Baldi, 2005)
• Our distribution of interest is over the previous words in the sentence
  • Because we're allowing uncertain input, there is a probability distribution over what each previous word may have been
  • Call this distribution Pi(w[0,j)): the distribution over strings up to but excluding word j, conditioned on words 0 through i
• The change induced by word wi is the error identification signal EISi, the divergence of the new distribution from the old:
  EISi = D( Pi(w[0,j)) || Pi−1(w[0,j)) )
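The KL divergence in the definition above can be computed directly. The two toy distributions over word 1 are invented for illustration:

```python
import math

# Error identification signal as the KL divergence between the new and old
# beliefs about an earlier word.
def kl_divergence(p_new, p_old):
    """D(p_new || p_old) in bits, over a shared support."""
    return sum(p * math.log2(p / p_old[x])
               for x, p in p_new.items() if p > 0)

old = {"b": 0.5, "c": 0.5}   # belief about word 1 before word i (invented)
new = {"b": 0.8, "c": 0.2}   # belief about word 1 after word i (invented)
eis = kl_divergence(new, old)
```

If a later word leaves beliefs about the past unchanged, the EIS is exactly zero; any revision yields a positive signal.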
Error identification signal: example
• Measuring change in beliefs about the past:
[Figure: beliefs about word 1 ({b, c}) before vs. after word 2 ({f, e}); the change gives EIS2 = 0.14]
Results on local-coherence sentences
• Locally coherent: The coach smiled at the player tossed the frisbee
• Locally incoherent: The coach smiled at the player thrown the frisbee
• The EIS is greater for the variant humans boggle more on
  (all sentences of Tabor et al. 2004 with lexical coverage in the model)
Experimental data
• Does the model make the correct predictions for the experimental data of Levy et al. (2009)?
  The coach smiled at the player tossed the frisbee
  The coach smiled at the player thrown the frisbee
  The coach smiled toward the player tossed the frisbee
  The coach smiled toward the player thrown the frisbee
Model predictions
[Figure: model predictions for the four conditions: at…tossed, at…thrown, toward…tossed, toward…thrown]
(The coach smiled at/toward the player tossed/thrown the frisbee)
Rational analysis
1. Specify a formal model of the problem to be solved and the agent's goals
   A. Make as few assumptions about computational limitations as possible.
2. Derive optimal behavior given the problem and goals
3. Compare optimal behavior to agent behavior
4. If predictions are off, revise assumptions, and iterate
• Initially we assumed the input was noiseless
• But we made incorrect predictions about LCEs (we predicted they shouldn't cause difficulty)
• Revising our assumptions to include uncertain input, the theory now correctly predicts LCEs
• What novel predictions does our new theory make?
Today's questions
1. What behavioral evidence do we have for an uncertain-input/noisy-channel theory of sentence comprehension?
   • Local coherence effects (among others)
2. How can we model sentence comprehension under uncertain input?
   • One possibility is to use probabilistic finite-state automata
3. What further predictions might such a model make?
Prediction 2: hallucinated garden paths
• Try reading the sentence below:
  While the clouds crackled, above the glider soared a magnificent eagle.
• There's a garden-path clause in this sentence…
• …but it's interrupted by a comma.
• Readers are ordinarily very good at using commas to guide syntactic analysis:
  While the man hunted, the deer ran into the woods
  While Mary was mending the sock fell off her lap
• "With a comma after mending there would be no syntactic garden path left to be studied." (Fodor, 2002)
• We'll see that the story is slightly more complicated.
(Levy, 2010)
Prediction 2: hallucinated garden paths
While the clouds crackled, above the glider soared a magnificent eagle.
• This sentence consists of an initial intransitive subordinate clause…
• …and then a main clause with locative inversion (cf. a magnificent eagle soared above the glider)
• Crucially, the main clause's initial PP would make a great dependent of the subordinate verb…
• …but that analysis would require the comma to be ignored.
• Inferences through …glider should thus involve a tradeoff between perceptual input and prior expectations
• Inferences as probabilistic paths through the sentence:
  While the clouds crackled… → , (likely) → …above the glider… (unlikely) → soared
  While the clouds crackled… → ø (unlikely) → …above the glider… (likely) → soared
  • Perceptual cost of ignoring the comma
  • Unlikeliness of a main-clause continuation after the comma
  • Likeliness of a postverbal continuation without the comma
• These inferences together make soared very surprising!
• Two properties come together to create the "hallucinated garden path":
  1. A subordinate clause into which the main clause's inverted phrase would fit well
  2. A main clause with locative inversion
• Experimental design: cross (1) and (2)
  While the clouds crackled, above the glider soared a magnificent eagle.
  While the clouds crackled, the glider soared above a magnificent eagle.
  While the clouds crackled in the distance, above the glider soared a magnificent eagle.
  While the clouds crackled in the distance, the glider soared above a magnificent eagle.
• The phrase in the distance fulfills a thematic role for crackled similar to the one above the glider would fill
  • This should reduce the hallucinated garden-path effect
• We predict an interaction in reading times at soared
Prediction 2: Hallucinated garden paths
• Methodology: word-by-word self-paced reading
  • Readers aren't allowed to backtrack
  • So the comma is visually gone by the time the inverted main clause appears
  • A simple test of whether beliefs about previous input can be revised
[Moving-window display: ----- While ----- the ----- clouds ----- crackled, ----- above ----- the ----- glider ----- soared -----]
Model predictions
  While the clouds crackled, above the glider soared a magnificent eagle.
  While the clouds crackled in the distance, above the glider soared a magnificent eagle.
  While the clouds crackled, the glider soared above a magnificent eagle.
  While the clouds crackled in the distance, the glider soared above a magnificent eagle.
Results: whole-sentence reading times
• Processing boggle occurs exactly where predicted
Hallucinated garden-path summary
• The at/toward study showed that comprehenders note the possibility of alternative strings and act on it
• This study showed that comprehenders can actually devote resources to grammatical analyses inconsistent with the surface string
Hallucinated garden paths cont'd
• Sure, but punctuation's weird stuff. What about real words?
  I know that the desert trains could resupply the camp.
• There is a bias against the N N interpretation (at least sometimes)
(Frazier & Rayner, 1987; Macdonald, 1993)
Hallucinated GPs with words
• Bergen et al. (2012) used a bias against NN and toward NV to test for garden-path hallucinations involving wordform change:
  The intern chauffeur for the governor hoped for more interesting work. [NN, "dense" neighborhood: could be "intern chauffeured"]
  The intern chauffeured for the governor but hoped for more interesting work. [NV, "dense" neighborhood]
  The inexperienced chauffeur for the governor hoped for more interesting work. [NN, "sparse" neighborhood: could NOT be "inexperienced chauffeured"]
  Some interns chauffeured for the governor but hoped for more interesting work. [NV, "sparse" neighborhood]
(Bergen, Levy, & Gibson, 2012)
Results
• RT spike at the disambiguating region for the NN Dense condition
(Bergen, Levy, & Gibson, 2012)
Today's questions
1. What behavioral evidence do we have for an uncertain-input/noisy-channel theory of sentence comprehension?
   • Local coherence effects (among others)
2. How can we model sentence comprehension under uncertain input?
   • One possibility is to use probabilistic finite-state automata
3. What further predictions might such a model make?
   • Hallucinated garden paths
4. What is the structure of the noise model?
   • What types of noise operations (e.g. inserting words, deleting words, substituting words) do comprehenders think are more/less likely?
Structure of the noise model
• Gibson et al. (2013) hypotheses:
  • Short words, particularly function words, are more likely to be confusable (e.g. at vs. toward)
  • Prior probabilities should pull interpretations towards semantically plausible sentences
• Considering just insertions and deletions…
  • Fewer insertions/deletions are more likely than more insertions/deletions
  • Comprehenders should infer the original more easily if the change (to get from the intended message to the perceived message) involves a deletion rather than an insertion
    • It's easy for a speaker to accidentally delete a word
    • To accidentally insert a word, a speaker must not only decide to insert a word but also generate the specific word that gets inserted
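These hypotheses can be expressed as a toy scoring function over edit sequences. The parameter values and vocabulary size are invented, chosen only so that deletions outrank insertions and fewer edits outrank more:

```python
# Sketch of a Gibson-style noise model over edits. Each extra edit multiplies
# the probability down, and a deletion is cheaper than an insertion because an
# inserted word must also be *chosen* from the vocabulary.
P_DELETE = 0.01        # probability of accidentally deleting a word (invented)
P_INSERT_SLOT = 0.01   # probability of accidentally inserting *some* word (invented)
VOCAB_SIZE = 1000      # the inserted word is one specific item out of many (invented)

def noise_prob(n_insertions, n_deletions):
    return ((P_INSERT_SLOT / VOCAB_SIZE) ** n_insertions
            * P_DELETE ** n_deletions)
```

Under this scoring, a one-deletion corruption is far more probable than a one-insertion corruption, so an implausible sentence reachable by deletion from a plausible alternative invites noisy-channel "correction" more readily.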
Structure of the noise model
• Consider the following alternation:

  Sentence                          Plausibility   Insertions   Deletions
  The cook baked Lucy a cake.       Plausible      0            1
  The cook baked Lucy for a cake.   Implausible    1            0
  The cook baked a cake for Lucy.   Plausible      1            0
  The cook baked a cake Lucy.       Implausible    0            1
Structure of the noise model
• Consider the following implausible sentences:

  Sentence                                      Construction                            Edits
  The girl was kicked by the ball.              passive                                 2I
  The ball kicked the girl.                     active                                  2D
  The tax law benefited from the businessman.   intransitive                            1I
  The businessman benefited the tax law.        transitive                              1D
  The cook baked Lucy for a cake.               Prepositional Object (PO) benefactive   1I
  The cook baked a cake Lucy.                   Double Object (DO) benefactive          1D

• Sentences reachable from a plausible alternative by deletion (xD) are often "corrected" to the plausible interpretation, inconsistent with the literal meaning
• Sentences requiring an insertion (xI) are consistently given the literal interpretation
Noisy-channel inference results
• Confirmed predictions:
  • Fewer edits are more likely than more edits
  • Deletions are more likely than insertions
(Poppels & Levy 2015, replication of Gibson et al., 2013)
Exchanges in the noise model?
• Consider: This is a problem that I need to talk about Joe with.
• Anecdotally, people don't even notice the problem in this sentence
• Such an exchange is extraordinarily unlikely under the Gibson et al. noise model, because that model considers only insertions and deletions
• But it is reasonably likely if word exchanges are admitted
• What would you predict about comprehenders' noisy-channel inferences?
(Poppels & Levy 2015)
Noisy-channel inference results
• Confirmed predictions:
  • Comprehenders make noisy-channel inferences consistent with expecting exchanges in the noise model
(Poppels & Levy 2015)
Today's questions
1. What behavioral evidence do we have for an uncertain-input/noisy-channel theory of sentence comprehension?
   • Local coherence effects (among others)
2. How can we model sentence comprehension under uncertain input?
   • One possibility is to use probabilistic finite-state automata
3. What further predictions might such a model make?
   • Hallucinated garden paths
4. What is the structure of the noise model?
   • Deletions are more likely than insertions
   • Exchanges are also expected