CSE391 – 2005 NLP1
Natural Language Processing
• Machine Translation
• Predicate argument structures
• Syntactic parses
• Lexical Semantics
• Probabilistic Parsing
• Ambiguities in sentence interpretation
CSE391 – 2005 NLP2
Machine Translation
• One of the first applications for computers
  – bilingual dictionary → word-for-word translation
• Good translation requires understanding!
  – War and Peace, The Sound and the Fury?
• What can we do? Sublanguages.
  – technical domains, static vocabulary
  – Meteo in Canada, Caterpillar Tractor Manuals, Botanical descriptions, Military Messages
CSE391 – 2005 NLP3
Example translation
CSE391 – 2005 NLP4
Translation Issues: Korean to English
- Word order
- Dropped arguments
- Lexical ambiguities
- Structure vs. morphology
CSE391 – 2005 NLP5
Common Thread
• Predicate-argument structure
  – Basic constituents of the sentence and how they are related to each other
• Constituents
  – John, Mary, the dog, pleasure, the store
• Relations
  – loves, feeds, go, to, bring
CSE391 – 2005 NLP6
Abstracting away from surface structure
CSE391 – 2005 NLP7
Transfer lexicons
CSE391 – 2005 NLP8
Machine Translation Lexical Choice – Word Sense Disambiguation

Iraq lost the battle.
  Ilakuka centwey ciessta.
  [Iraq] [battle] [lost]

John lost his computer.
  John-i computer-lul ilepelyessta.
  [John] [computer] [misplaced]
CSE391 – 2005 NLP9
Natural Language Processing
• Syntax
  – Grammars, parsers, parse trees, dependency structures
• Semantics
  – Subcategorization frames, semantic classes, ontologies, formal semantics
• Pragmatics
  – Pronouns, reference resolution, discourse models
CSE391 – 2005 NLP10
Syntactic Categories
• Nouns, pronouns, proper nouns
• Verbs: intransitive, transitive, ditransitive (subcategorization frames)
• Modifiers: adjectives, adverbs
• Prepositions
• Conjunctions
CSE391 – 2005 NLP11
Syntactic Parsing
• The cat sat on the mat.
  Det Noun Verb Prep Det Noun
• Time flies like an arrow.
  Noun Verb Prep Det Noun
• Fruit flies like a banana.
  Noun Noun Verb Det Noun
CSE391 – 2005 NLP12
Parses

The cat sat on the mat.

(S (NP (Det The) (N cat))
   (VP (V sat)
       (PP (Prep on)
           (NP (Det the) (N mat)))))
CSE391 – 2005 NLP13
Parses

Time flies like an arrow.

(S (NP (N Time))
   (VP (V flies)
       (PP (Prep like)
           (NP (Det an) (N arrow)))))
CSE391 – 2005 NLP14
Parses

Time flies like an arrow.

(S (NP (N Time) (N flies))
   (VP (V like)
       (NP (Det an) (N arrow))))
CSE391 – 2005 NLP15
Recursive transition nets for CFGs

• s :- np, vp.
• np :- pronoun; noun; det, adj, noun; np, pp.

[RTN diagrams: an S net with states S1–S3 and arcs NP, VP; an NP net with states S4–S6 and arcs det, adj, noun, pronoun, and pp.]
CSE391 – 2005 NLP16
Lexicon
noun(cat).
noun(mat).
det(the).
det(a).
verb(sat).
prep(on).
noun(flies).
noun(time).
noun(arrow).
det(an).
verb(flies).
verb(time).
prep(like).
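Rendered in Python rather than the slides' Prolog facts, the same lexicon makes the POS ambiguities explicit (an illustrative sketch):

```python
from collections import defaultdict

# The slide's lexicon rendered as a word -> set-of-POS-tags table
# (illustrative Python; the slides use Prolog-style facts).
LEXICON = defaultdict(set)
for pos, word in [
    ("noun", "cat"), ("noun", "mat"), ("det", "the"), ("det", "a"),
    ("verb", "sat"), ("prep", "on"), ("noun", "flies"), ("noun", "time"),
    ("noun", "arrow"), ("det", "an"), ("verb", "flies"), ("verb", "time"),
    ("prep", "like"),
]:
    LEXICON[word].add(pos)

def tags(word):
    """All POS tags the lexicon allows for a word."""
    return sorted(LEXICON[word])

# "flies" and "time" are ambiguous, which is exactly what licenses the
# two competing parses of "Time flies like an arrow."
print(tags("flies"))  # ['noun', 'verb']
print(tags("time"))   # ['noun', 'verb']
```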
CSE391 – 2005 NLP17
Lexicon with Roots
noun(cat,cat).
noun(mat,mat).
det(the,the).
det(a,a).
verb(sat,sit).
prep(on,on).
noun(flies,fly).
noun(time,time).
noun(arrow,arrow).
det(an,an).
verb(flies,fly).
verb(time,time).
prep(like,like).
CSE391 – 2005 NLP18
Parses

The old can can hold the water.

(S (NP (Det The) (Adj old) (N can))
   (VP (V (Aux can) (Verb hold))
       (NP (Det the) (N water))))
CSE391 – 2005 NLP19
Structural ambiguities
• That factory can can tuna.
• That factory cans cans of tuna and salmon.
• Have the students in cse91 finish the exam in 212.
• Have the students in cse91 finished the exam in 212?
CSE391 – 2005 NLP20
Lexicon
The old can can hold the water.
Noun(can,can)
Noun(cans,can)
Noun(water,water)
Noun(hold,hold)
Noun(holds,hold)
Det(the,the)
Verb(hold,hold)
Verb(holds,hold)
Aux(can,can)
Adj(old,old)
CSE391 – 2005 NLP21
Simple Context Free Grammar in BNF notation

S  → NP VP
NP → Pronoun | Noun | Det Adj Noun | NP PP
PP → Prep NP
V  → Verb | Aux Verb
VP → V | V NP | V NP NP | V NP PP | VP PP
CSE391 – 2005 NLP22
Top-down parse in progress
[The, old, can, can, hold, the, water]

S → NP VP
NP?
  NP → Pronoun?  Pronoun? fail
  NP → Noun?     Noun? fail
  NP → Det Adj Noun?
    Det? the   Adj? old   Noun? can
    Succeed.
Succeed.
VP?
CSE391 – 2005 NLP23
Top-down parse in progress
[can, hold, the, water]

VP → V?
  V → Verb?  Verb? fail
  V → Aux Verb?
    Aux? can   Verb? hold
    succeed
  succeed
fail: [the, water] left over
CSE391 – 2005 NLP24
Top-down parse in progress
[can, hold, the, water]

VP → V NP
  V → Verb?  Verb? fail
  V → Aux Verb?
    Aux? can   Verb? hold
  NP → Pronoun?  Pronoun? fail
  NP → Noun?     Noun? fail
  NP → Det Adj Noun?
    Det? the   Adj? fail
CSE391 – 2005 NLP25
Lexicon
Noun(can,can)
Noun(cans,can)
Noun(water,water)
Noun(hold,hold)
Noun(holds,hold)
Det(the,the)
Verb(hold,hold)
Verb(holds,hold)
Aux(can,can)
Adj(old,old)
Adj( , )   (empty adjective entry, making the Adj optional)
CSE391 – 2005 NLP26
Top-down parse in progress
[can, hold, the, water]

VP → V NP?
  V → Verb?  Verb? fail
  V → Aux Verb?
    Aux? can   Verb? hold
  NP → Pronoun?  Pronoun? fail
  NP → Noun?     Noun? fail
  NP → Det Adj Noun?
    Det? the   Adj? (empty)   Noun? water
SUCCEED
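The trace above can be run as a tiny backtracking recognizer. The Python sketch below is illustrative, not from the slides: it uses the deck's grammar and the "old can" lexicon, with the left-recursive rules NP → NP PP and VP → VP PP dropped, since a naive top-down parser loops forever on direct left recursion.

```python
# Minimal backtracking top-down (recursive-descent) recognizer for the
# slides' grammar. Left-recursive rules (NP -> NP PP, VP -> VP PP) are
# omitted: naive top-down parsing loops on direct left recursion.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["pronoun"], ["noun"], ["det", "adj", "noun"]],
    "PP": [["prep", "NP"]],
    "V":  [["verb"], ["aux", "verb"]],
    "VP": [["V"], ["V", "NP"], ["V", "NP", "NP"], ["V", "NP", "PP"]],
}
LEXICON = {
    "the": {"det"}, "old": {"adj", "noun"}, "can": {"noun", "aux"},
    "hold": {"verb", "noun"}, "water": {"noun"},
}
EMPTY_OK = {"adj"}   # Adj( , ) in the slides: an empty adjective

def parse(cat, words, i):
    """Yield every position j such that words[i:j] can be a `cat`."""
    if cat in GRAMMAR:                      # nonterminal: try each rule
        for rhs in GRAMMAR[cat]:
            yield from parse_seq(rhs, words, i)
    else:                                   # preterminal: match one word
        if i < len(words) and cat in LEXICON.get(words[i], set()):
            yield i + 1
        if cat in EMPTY_OK:                 # or match the empty string
            yield i

def parse_seq(cats, words, i):
    if not cats:
        yield i
    else:
        for j in parse(cats[0], words, i):
            yield from parse_seq(cats[1:], words, j)

def recognize(sentence):
    words = sentence.lower().split()
    return any(j == len(words) for j in parse("S", words, 0))

print(recognize("the old can can hold the water"))  # True
```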
CSE391 – 2005 NLP27
Lexicon
Noun(can,can)
Noun(cans,can)
Noun(water,water)
Noun(hold,hold)
Noun(holds,hold)
Det(the,the)
Noun(old,old)
Verb(hold,hold)
Verb(holds,hold)
Aux(can,can)
Adj(old,old)
Adj( , )   (empty adjective entry, making the Adj optional)
CSE391 – 2005 NLP28
Top-down approach
• Start with the goal of a sentence:
  S → NP VP
  S → Wh-word Aux NP VP
• Will try to find an NP four different ways before trying a parse where the verb comes first.
• What does this remind you of? – search
• What would be better?
CSE391 – 2005 NLP29
Bottom-up approach
• Start with words in sentence.
• What structures do they correspond to?
• Once a structure is built, it is kept on a CHART.
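One standard bottom-up chart method is CKY. The sketch below is illustrative rather than from the slides; the grammar is a hand-binarized fragment for "The cat sat on the mat". Each constituent is built exactly once and stored in the chart, which is the key point of the bottom-up approach.

```python
from itertools import product

# Bottom-up CKY chart recognizer (illustrative sketch). Every
# constituent over a span is computed once and recorded in the chart.
UNARY = {  # preterminal -> words
    "Det": {"the"}, "Noun": {"cat", "mat"}, "V": {"sat"}, "Prep": {"on"},
}
BINARY = {  # (left child, right child) -> parent
    ("Det", "Noun"): "NP", ("NP", "VP"): "S",
    ("V", "PP"): "VP", ("Prep", "NP"): "PP",
}

def cky(words):
    n = len(words)
    chart = {}                       # (i, j) -> set of categories
    for i, w in enumerate(words):    # length-1 spans from the lexicon
        chart[i, i + 1] = {cat for cat, ws in UNARY.items() if w in ws}
    for width in range(2, n + 1):    # longer spans from shorter ones
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for l, r in product(chart[i, k], chart[k, j]):
                    if (l, r) in BINARY:
                        cell.add(BINARY[l, r])
            chart[i, j] = cell
    return chart

chart = cky("the cat sat on the mat".split())
print("S" in chart[0, 6])  # True: the whole string is a sentence
```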
CSE391 – 2005 NLP30
Bottom-up parse in progress

The old can can hold the water.

det adj noun aux verb det noun            (tag sequence that yields a parse)
det noun aux/verb noun/verb noun det noun (competing candidate tags)
CSE391 – 2005 NLP33
Top-down vs. Bottom-up

Top-down:
• Helps with POS ambiguities – only considers relevant POS
• Rebuilds the same structure repeatedly
• Spends a lot of time on impossible parses

Bottom-up:
• Has to consider every POS
• Builds each structure once
• Spends a lot of time on useless structures

What would be better?
CSE391 – 2005 NLP34
Hybrid approach
• Top-down with a chart
• Use look-ahead and heuristics to pick the most likely sentence type
• Use probabilities for POS tagging, PP attachment, etc.
CSE391 – 2005 NLP35
Features
• C for Case, Subjective/Objective– She visited her.
• P for Person agreement, (1st, 2nd, 3rd)– I like him, You like him, He likes him,
• N for Number agreement, Subject/Verb– He likes him, They like him.
• G for Gender agreement, Subject/Verb– English, reflexive pronouns He washed himself.– Romance languages, det/noun
• T for Tense, – auxiliaries, sentential complements, etc. – * will finished is bad
CSE391 – 2005 NLP36
Example Lexicon Entries Using Features:
Case, Number, Gender, Person

pronoun(subj, sing, fem, third, she, she).
pronoun(obj, sing, fem, third, her, her).
pronoun(obj, Num, Gender, second, you, you).
pronoun(subj, sing, Gender, first, I, I).
noun(Case, plural, Gender, third, flies, fly).
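These feature-marked entries can be checked mechanically. The Python sketch below is illustrative (the feature tuples and the subject requirement for "likes" are assumptions); an uppercase Prolog variable like Num or Gender is modeled as None, which matches anything.

```python
# Feature agreement in the style of the slides' lexicon entries.
# None plays the role of a Prolog variable: it unifies with anything.
PRONOUNS = {  # surface form -> (case, number, gender, person)
    "she": ("subj", "sing", "fem", "third"),
    "her": ("obj", "sing", "fem", "third"),
    "i":   ("subj", "sing", None, "first"),
    "you": (None, None, None, "second"),
}

def unify(feats, pattern):
    """True if every feature matches, treating None as a variable."""
    return all(p is None or f is None or f == p
               for f, p in zip(feats, pattern))

# Subject position requires subjective case; "likes" requires a
# third-person singular subject (assumed requirement, for illustration).
subj_of_likes = ("subj", "sing", None, "third")
print(unify(PRONOUNS["she"], subj_of_likes))  # True:  "She likes him."
print(unify(PRONOUNS["her"], subj_of_likes))  # False: *"Her likes him."
```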
CSE391 – 2005 NLP37
Language to Logic
How do we get there?

• John went to the book store.
  go(John, store1)
• John bought a book.
  buy(John, book1)
• John gave the book to Mary.
  give(John, book1, Mary)
• Mary put the book on the table.
  put(Mary, book1, table1)
CSE391 – 2005 NLP38
Lexical SemanticsSame event - different sentences
John broke the window with a hammer.
John broke the window with the crack.
The hammer broke the window.
The window broke.
CSE391 – 2005 NLP39
Same event - different syntactic frames
John broke the window with a hammer.
  SUBJ  VERB  OBJ  MODIFIER
John broke the window with the crack.
  SUBJ  VERB  OBJ  MODIFIER
The hammer broke the window.
  SUBJ  VERB  OBJ
The window broke.
  SUBJ  VERB
CSE391 – 2005 NLP40
Semantics – predicate arguments

break(AGENT, INSTRUMENT, PATIENT)

John (AGENT) broke the window (PATIENT) with a hammer (INSTRUMENT).
The hammer (INSTRUMENT) broke the window (PATIENT).
The window (PATIENT) broke.

Fillmore 68 – The Case for Case
CSE391 – 2005 NLP41
John (AGENT, SUBJ) broke the window (PATIENT, OBJ) with a hammer (INSTRUMENT, MODIFIER).
The hammer (INSTRUMENT, SUBJ) broke the window (PATIENT, OBJ).
The window (PATIENT, SUBJ) broke.
CSE391 – 2005 NLP42
Constraint Satisfaction
break(Agent: animate, Instrument: tool, Patient: physical-object)

Agent <=> subj
Instrument <=> subj, with-pp
Patient <=> obj, subj

ACL81, ACL85, ACL86, MT90, CUP90, AIJ93
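The role constraints above can be checked as a small constraint-satisfaction test. This is a hedged sketch: the semantic-class table and word list are invented for illustration, not taken from the slides.

```python
# Selectional-restriction check for "break", after the slide's
# constraint-satisfaction entry. The tiny ontology below is invented
# for illustration.
CLASS_OF = {"john": "animate", "hammer": "tool",
            "window": "physical-object", "crack": "physical-object",
            "sincerity": "abstract"}
BREAK_ROLES = {"agent": "animate", "instrument": "tool",
               "patient": "physical-object"}

def roles_ok(filled_roles):
    """Check each filled role against break's semantic constraints."""
    return all(CLASS_OF.get(word) == BREAK_ROLES[role]
               for role, word in filled_roles.items())

# "John broke the window with a hammer."
print(roles_ok({"agent": "john", "instrument": "hammer",
                "patient": "window"}))                        # True
# A semantically anomalous agent is ruled out.
print(roles_ok({"agent": "sincerity", "patient": "window"}))  # False
```

This is exactly how lexical semantics can rule out parses that are syntactically valid but semantically anomalous, as the next slide notes.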
CSE391 – 2005 NLP43
Syntax/semantics interaction
• Parsers will produce syntactically valid parses for semantically anomalous sentences
• Lexical semantics can be used to rule them out
CSE391 – 2005 NLP44
Constraint Satisfaction
give(Agent: animate, Patient: physical-object, Recipient: animate)

Agent <=> subj
Patient <=> object
Recipient <=> indirect-object, to-pp
CSE391 – 2005 NLP45
Subcategorization Frequencies
• The women kept the dogs on the beach.– Where keep? Keep on beach 95%
• NP XP 81%
– Which dogs? Dogs on beach 5%• NP 19%
• The women discussed the dogs on the beach.– Where discuss? Discuss on beach 10%
• NP PP 24%
– Which dogs? Dogs on beach 90%• NP 76%
Ford, Bresnan, Kaplan 82, Jurafsky 98, Roland,Jurafsky 99
CSE391 – 2005 NLP46
Reading times
• NP-bias (slower reading times at the disambiguating word, boldfaced on the original slide)
The waiter confirmed the reservation was made yesterday.
The defendant accepted the verdict would be decided soon.
CSE391 – 2005 NLP47
Reading times
• S-bias (no slowdown at the disambiguating word)
The waiter insisted the reservation was made yesterday.
The defendant wished the verdict would be decided soon.
Trueswell, Tanenhaus and Kello 93; Trueswell and Kim 98
CSE391 – 2005 NLP48
Probabilistic Context Free Grammars
• Adding probabilities
• Lexicalizing the probabilities
CSE391 – 2005 NLP49
Simple Context Free Grammar in BNF

S  → NP VP
NP → Pronoun | Noun | Det Adj Noun | NP PP
PP → Prep NP
V  → Verb | Aux Verb
VP → V | V NP | V NP NP | V NP PP | VP PP
CSE391 – 2005 NLP50
Simple Probabilistic CFG

S  → NP VP
NP → Pronoun        [0.10]
   | Noun           [0.20]
   | Det Adj Noun   [0.50]
   | NP PP          [0.20]
PP → Prep NP        [1.00]
V  → Verb           [0.20]
   | Aux Verb       [0.20]
VP → V              [0.10]
   | V NP           [0.40]
   | V NP NP        [0.10]
   | V NP PP        [0.20]
   | VP PP          [0.20]
CSE391 – 2005 NLP51
Simple Probabilistic Lexicalized CFG

S  → NP VP
NP → Pronoun        [0.10]
   | Noun           [0.20]
   | Det Adj Noun   [0.50]
   | NP PP          [0.20]
PP → Prep NP        [1.00]
V  → Verb           [0.20]
   | Aux Verb       [0.20]
VP → V              [0.87]  {sleep, cry, laugh}
   | V NP           [0.03]
   | V NP NP        [0.00]
   | V NP PP        [0.00]
   | VP PP          [0.10]
CSE391 – 2005 NLP52
Simple Probabilistic Lexicalized CFG

VP → V              [0.30]
   | V NP           [0.60]  {break, split, crack, ...}
   | V NP NP        [0.00]
   | V NP PP        [0.00]
   | VP PP          [0.10]

VP → V              [0.10]   What about leave?
   | V NP           [0.40]   leave1, leave2?
   | V NP NP        [0.10]
   | V NP PP        [0.20]
   | VP PP          [0.20]
CSE391 – 2005 NLP53
A TreeBanked Sentence

Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
CSE391 – 2005 NLP54
The same sentence, PropBanked

(S Arg0 (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1 (NP (NP a GM-Jaguar pact)
                        (SBAR (WHNP-1 that)
                              (S Arg0 (NP-SBJ *T*-1)
                                 (VP would
                                     (VP give
                                         Arg2 (NP the U.S. car maker)
                                         Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                                  (PP-LOC in (NP the British company))))))))))))

expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)
CSE391 – 2005 NLP55
Headlines
• Police Begin Campaign To Run Down Jaywalkers
• Iraqi Head Seeks Arms
• Teacher Strikes Idle Kids
• Miners Refuse To Work After Death
• Juvenile Court To Try Shooting Defendant
CSE391 – 2005 NLP56
Events
• From KRR lecture
CSE391 – 2005 NLP57
Context Sensitivity
• Programming languages are context free.
• Natural languages are context sensitive?
  – Movement
  – Features
  – respectively: John, Mary and Bill ate peaches, pears and apples, respectively.
CSE391 – 2005 NLP58
The Chomsky Grammar Hierarchy
• Regular grammars (aabbbb):  S → aS | nil | bS
• Context free grammars (aaabbb):  S → aSb | nil
• Context sensitive grammars (aaabbbccc):  xSy → xby
• Turing Machines
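The hierarchy's classic example languages can be told apart by recognizers of increasing power. An illustrative sketch (a regular expression suffices for the first, a counter for the second, and two matched counters for the third):

```python
import re

def regular(s):
    """a*b* is regular: a finite-state pattern recognizes it."""
    return re.fullmatch(r"a*b*", s) is not None

def context_free(s):
    """a^n b^n needs matched counting, beyond finite state."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def context_sensitive(s):
    """a^n b^n c^n needs cross-serial matching, beyond context free."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

print(regular("aabbbb"), context_free("aaabbb"),
      context_sensitive("aaabbbccc"))  # True True True
```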
CSE391 – 2005 NLP59
Recursive transition nets for CFGs

• s :- np, vp.
• np :- pronoun; noun; det, adj, noun; np, pp.

[RTN diagrams: an S net with states S1–S3 and arcs NP, VP; an NP net with states S4–S6 and arcs det, adj, noun, pronoun, and pp.]
CSE391 – 2005 NLP60
Most parsers are Turing Machines

• To give a more natural and comprehensible treatment of movement
• For a more efficient treatment of features
• Not because of "respectively" – most parsers can't handle it.
CSE391 – 2005 NLP61
Nested Dependencies and Crossing Dependencies

John, Mary and Bill ate peaches, pears and apples, respectively.   (crossing dependencies – CS)
The dog chased the cat that bit the mouse that ran.   (CF)
The mouse the cat the dog chased bit ran.   (nested dependencies – CF)
CSE391 – 2005 NLP62
Movement
What did John give to Mary?
*Where did John give to Mary?
John gave cookies to Mary.
John gave <what> to Mary.
CSE391 – 2005 NLP63
Handling Movement: Hold Registers / Slash Categories
• S :- Wh, S/NP
• S/NP :- VP
• S/NP :- NP VP/NP
• VP/NP :- Verb
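These slash rules can be run directly with the same backtracking machinery used for plain CFG rules. A minimal recognizer sketch; the lexicon and the extra base rules (S → NP VP, VP → Verb (NP)) are assumptions added so that declaratives also parse:

```python
# The slide's slash-category rules as a tiny backtracking recognizer.
# Illustrative sketch: lexicon and base S/VP rules are invented.
GRAMMAR = {
    "S":     [["Wh", "S/NP"], ["NP", "VP"]],
    "S/NP":  [["VP"], ["NP", "VP/NP"]],
    "VP":    [["Verb", "NP"], ["Verb"]],
    "VP/NP": [["Verb"]],           # the object NP is the gap
}
LEXICON = {"what": "Wh", "who": "Wh", "john": "NP",
           "cookies": "NP", "ate": "Verb"}

def matches(cat, words, i):
    """Yield every j such that words[i:j] can be a `cat`."""
    if cat in GRAMMAR:
        for rhs in GRAMMAR[cat]:
            yield from match_seq(rhs, words, i)
    elif i < len(words) and LEXICON.get(words[i]) == cat:
        yield i + 1

def match_seq(cats, words, i):
    if not cats:
        yield i
    else:
        for j in matches(cats[0], words, i):
            yield from match_seq(cats[1:], words, j)

def recognize(s):
    words = s.lower().rstrip("?.").split()
    return any(j == len(words) for j in matches("S", words, 0))

print(recognize("John ate cookies."))  # True: plain declarative
print(recognize("What John ate?"))     # True: gap after "ate" (VP/NP)
```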