CS 188: Artificial Intelligence, Spring 2006. Lecture 27: NLP. 4/27/2006. Dan Klein – UC Berkeley
CS 188: Artificial Intelligence, Spring 2006
Lecture 27: NLP
4/27/2006
Dan Klein – UC Berkeley
What is NLP?
Fundamental goal: deep understanding of broad language. Not just string processing or keyword matching!
End systems that we want to build:
Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering…
Modest: spelling correction, text categorization…
Why is Language Hard?
Ambiguity:
EYE DROPS OFF SHELF
MINERS REFUSE TO WORK AFTER DEATH
KILLER SENTENCED TO DIE FOR SECOND TIME IN 10 YEARS
LACK OF BRAINS HINDERS RESEARCH
The Big Open Problems
Machine translation
Information extraction
Solid speech recognition
Deep content understanding
Machine Translation
Translation systems encode:
Something about fluent language
Something about how two languages correspond
SOTA: for easy language pairs, better than nothing, but more an understanding aid than a replacement for human translators
Information Extraction (IE)
Unstructured text to database entries
SOTA: perhaps 70% accuracy for multi-sentence templates, 90%+ for single easy fields
New York Times Co. named Russell T. Lewis, 45, president and general manager of its flagship New York Times newspaper, responsible for all business-side activities. He was executive vice president and deputy general manager. He succeeds Lance R. Primis, who in September was named president and chief operating officer of the parent.
| Person | Company | Post | State |
|---|---|---|---|
| Russell T. Lewis | New York Times newspaper | president and general manager | start |
| Russell T. Lewis | New York Times newspaper | executive vice president | end |
| Lance R. Primis | New York Times Co. | president and CEO | start |
Question Answering:
More than search: ask general comprehension questions of a document collection
Can be really easy: “What’s the capital of Wyoming?”
Can be harder: “How many US states’ capitals are also their largest cities?”
Can be open ended: “What are the main issues in the global warming debate?”
SOTA: Can do factoids, even when text isn’t a perfect match
Models of Language
Two main ways of modeling language:
Language modeling: putting a distribution P(s) over sentences s. Useful for modeling fluency in a noisy channel setting, like machine translation or ASR. Typically simple models, trained on lots of data.
Language analysis: determining the structure and/or meaning behind a sentence. Useful for deeper processing like information extraction or question answering. Starting to be used for MT.
The Speech Recognition Problem
We want to predict a sentence given an acoustic sequence:

s* = argmax_s P(s | A)

The noisy channel approach: build a generative model of production (encoding):

P(A, s) = P(s) P(A | s)

To decode, we use Bayes’ rule to write:

s* = argmax_s P(s | A) = argmax_s P(s) P(A | s) / P(A) = argmax_s P(s) P(A | s)

Now, we have to find a sentence maximizing this product.
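Over a fixed candidate list, the argmax above is a one-liner in log space. A minimal sketch; the candidate sentences, probabilities, and the acoustic-signal label "A1" are all invented for illustration:

```python
import math

def decode(candidates, lm_logprob, acoustic_logprob, A):
    # Noisy-channel decoding: s* = argmax_s P(s) P(A|s), computed in log space.
    return max(candidates, key=lambda s: lm_logprob(s) + acoustic_logprob(A, s))

# Toy models: the language model prefers "recognize speech"; the acoustic
# model is slightly better matched to "wreck a nice beach".
lm = {"recognize speech": math.log(0.7), "wreck a nice beach": math.log(0.3)}
ac = {("A1", "recognize speech"): math.log(0.4),
      ("A1", "wreck a nice beach"): math.log(0.5)}

best = decode(lm, lm.get, lambda A, s: ac[(A, s)], "A1")
# Here the language model wins: 0.7 * 0.4 > 0.3 * 0.5
```

This is why the P(s) term matters: the acoustic score alone would pick the wrong transcription.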
N-Gram Language Models
No loss of generality to break sentence probability down with the chain rule:

P(w1 w2 … wn) = ∏_i P(wi | w1 w2 … w(i-1))

Too many histories!
N-gram solution: assume each word depends only on a short linear history:

P(w1 w2 … wn) ≈ ∏_i P(wi | w(i-k) … w(i-1))
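The truncated-history product can be sketched directly; the conditional model here is a stand-in uniform distribution, not a trained one:

```python
def sentence_prob(words, cond_prob, k):
    # P(w1..wn) ~= product over i of P(w_i | w_{i-k} .. w_{i-1}):
    # the n-gram (order-k history) approximation to the chain rule.
    p = 1.0
    for i, w in enumerate(words):
        history = tuple(words[max(0, i - k):i])  # at most k previous words
        p *= cond_prob(w, history)
    return p

# Toy conditional model: uniform over a 4-word vocabulary.
uniform = lambda w, h: 0.25
p = sentence_prob(["i", "like", "ice", "cream"], uniform, k=1)  # 0.25 ** 4
```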
Unigram Models
Simplest case: unigrams

P(w1 w2 … wn) = ∏_i P(wi)

Generative process: pick a word, pick another word, …
As a graphical model: w1, w2, …, wn-1, STOP (each word drawn independently)
To make this a proper distribution over sentences, we have to generate a special STOP symbol last. (Why?)
Examples:
[fifth, an, of, futures, the, an, incorporated, a, a, the, inflation, most, dollars, quarter, in, is, mass.]
[thrift, did, eighty, said, hard, 'm, july, bullish]
[that, or, limited, the]
[]
[after, any, on, consistently, hospital, lake, of, of, other, and, factors, raised, analyst, too, allowed, mexico, never, consider, fall, bungled, davison, that, obtain, price, lines, the, to, sass, the, the, further, board, a, details, machinists, the, companies, which, rivals, an, because, longer, oakes, percent, a, they, three, edward, it, currier, an, within, in, three, wrote, is, you, s., longer, institute, dentistry, pay, however, said, possible, to, rooms, hiding, eggs, approximate, financial, canada, the, so, workers, advancers, half, between, nasdaq]
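The generative process above is a few lines of code. A sketch with an invented three-symbol model; note that STOP is what makes sentence lengths (and hence the distribution over sentences) well defined:

```python
import random

def sample_unigram(probs, rng):
    # Draw i.i.d. words from the unigram distribution until STOP comes up.
    words, vocab = [], list(probs)
    weights = [probs[v] for v in vocab]
    while True:
        w = rng.choices(vocab, weights=weights)[0]
        if w == "STOP":
            return words
        words.append(w)

probs = {"the": 0.4, "of": 0.3, "STOP": 0.3}  # toy model, made-up numbers
sent = sample_unigram(probs, random.Random(0))
```

Without a STOP symbol, the model would assign probability mass to infinite word streams rather than to sentences.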
Bigram Models
Big problem with unigrams: P(the the the the) >> P(I like ice cream)
Condition on the last word:

P(w1 w2 … wn) = ∏_i P(wi | w(i-1))

As a graphical model: START → w1 → w2 → … → wn-1 → STOP
Any better?
[texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen]
[outside, new, car, parking, lot, of, the, agreement, reached]
[although, common, shares, rose, forty, six, point, four, hundred, dollars, from, thirty, seconds, at, the, greatest, play, disingenuous, to, be, reset, annually, the, buy, out, of, american, brands, vying, for, mr., womack, currently, sharedata, incorporated, believe, chemical, prices, undoubtedly, will, be, as, much, is, scheduled, to, conscientious, teaching]
[this, would, be, a, record, november]
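Bigram estimates can be read off a corpus by counting. A minimal MLE sketch over an invented two-sentence corpus:

```python
from collections import Counter

def bigram_mle(corpus):
    # MLE estimates P(w | prev) from sentences, padded with START/STOP.
    big, uni = Counter(), Counter()
    for sent in corpus:
        toks = ["START"] + sent + ["STOP"]
        for a, b in zip(toks, toks[1:]):
            big[(a, b)] += 1
            uni[a] += 1
    return lambda w, prev: big[(prev, w)] / uni[prev] if uni[prev] else 0.0

corpus = [["i", "like", "ice", "cream"], ["i", "like", "tea"]]
p = bigram_mle(corpus)
# P(like | i) = 2/2 = 1.0; P(the | the) = 0, so "the the the the" collapses.
```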
Sparsity

[Chart: fraction seen vs. number of training words (0 to 1,000,000), plotted for unigrams, bigrams, and rules.]
Problems with n-gram models:
New words appear all the time: Synaptitute, 132,701.03, fuzzificational
New bigrams: even more often
Trigrams or more: still worse!

Zipf’s Law:
Types (words) vs. tokens (word occurrences)
Broadly: most word types are rare
Specifically: rank word types by token frequency; frequency is inversely proportional to rank
Not special to language: randomly generated character strings have this property
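Zipf's claim (frequency ∝ 1/rank, so rank × frequency is roughly constant) is easy to tabulate; a tiny sketch on an invented sentence, far too small to show the real law:

```python
from collections import Counter

def zipf_table(tokens, top=5):
    # Rank word types by token frequency; under Zipf's law, rank * freq
    # is roughly constant across ranks.
    counts = Counter(tokens).most_common(top)
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(counts, start=1)]

text = "the cat sat on the mat and the dog sat on the log".split()
table = zipf_table(text, top=3)
```

On a real corpus (e.g. newswire), the rank × freq column stays surprisingly flat over thousands of ranks.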
Smoothing
We often want to make estimates from sparse statistics:
Smoothing flattens spiky distributions so they generalize better
Very important all over NLP, but easy to do badly!
P(w | denied the), unsmoothed (7 total): 3 allegations, 2 reports, 1 claims, 1 request

P(w | denied the), smoothed (7 total): 2.5 allegations, 1.5 reports, 0.5 claims, 0.5 request, 2 other; the reserved mass covers unseen words (attack, man, outcome, …)
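The smoothed counts above look like absolute discounting: subtract a constant from each seen count and spread the reclaimed mass over unseen words. The scheme, the 0.5 discount, and the vocabulary size below are my assumptions, chosen to reproduce the slide's numbers:

```python
def absolute_discount(counts, vocab_size, d=0.5):
    # Subtract d from each seen count; split the reclaimed mass evenly
    # over the vocab_size - len(counts) unseen words.
    total = sum(counts.values())
    reclaimed = d * len(counts)
    n_unseen = vocab_size - len(counts)
    def prob(w):
        if w in counts:
            return (counts[w] - d) / total
        return reclaimed / n_unseen / total
    return prob

counts = {"allegations": 3, "reports": 2, "claims": 1, "request": 1}  # slide data
p = absolute_discount(counts, vocab_size=8)
# p("allegations") = 2.5/7; each of the 4 unseen words gets 0.5/7.
```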
Phrase Structure Parsing
Phrase structure parsing organizes syntax into constituents or brackets
In general, this involves nested trees
Linguists can, and do, argue about details
Lots of ambiguity
Not the only kind of syntax…

[Parse tree for “new art critics write reviews with computers”, with constituent labels S, NP, N’, VP, PP.]
PP Attachment
Attachment is a Simplification
I cleaned the dishes from dinner
I cleaned the dishes with detergent
I cleaned the dishes in the sink
Syntactic Ambiguities I
Prepositional phrases: They cooked the beans in the pot on the stove with handles.
Particle vs. preposition: A good pharmacist dispenses with accuracy. The puppy tore up the staircase.
Complement structures: The tourists objected to the guide that they couldn’t hear. She knows you like the back of her hand.
Gerund vs. participial adjective: Visiting relatives can be boring. Changing schedules frequently confused passengers.
Syntactic Ambiguities II
Modifier scope within NPs: impractical design requirements; plastic cup holder
Multiple gap constructions: The chicken is ready to eat. The contractors are rich enough to sue.
Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.
Human Processing
Garden pathing:
Ambiguity maintenance
Context-Free Grammars
A context-free grammar is a tuple <N, T, S, R>
N: the set of non-terminals
  Phrasal categories: S, NP, VP, ADJP, etc.
  Parts-of-speech (pre-terminals): NN, JJ, DT, VB
T: the set of terminals (the words)
S: the start symbol
  Often written as ROOT or TOP
  Not usually the sentence non-terminal S
R: the set of rules
  Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
  Examples: S → NP VP, VP → VP CC VP
  Also called rewrites, productions, or local trees
Example CFG
Can just write the grammar (rules with non-terminal LHSs) and lexicon (rules with pre-terminal LHSs):

Grammar:
ROOT → S
S → NP VP
VP → VBP
VP → VBP NP
VP → VP PP
PP → IN NP
NP → NNS
NP → NN
NP → JJ NP
NP → NP NNS
NP → NP PP

Lexicon:
JJ → new
NN → art
NNS → critics
NNS → reviews
NNS → computers
VBP → write
IN → with
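The grammar/lexicon split maps directly onto plain data structures. A sketch encoding the slide's grammar (the variable names and representation are my choices):

```python
# Rules with non-terminal LHSs: each LHS maps to a list of RHS symbol lists.
GRAMMAR = {
    "ROOT": [["S"]],
    "S":    [["NP", "VP"]],
    "VP":   [["VBP"], ["VBP", "NP"], ["VP", "PP"]],
    "PP":   [["IN", "NP"]],
    "NP":   [["NNS"], ["NN"], ["JJ", "NP"], ["NP", "NNS"], ["NP", "PP"]],
}

# Rules with pre-terminal LHSs: the lexicon.
LEXICON = {
    "JJ": ["new"], "NN": ["art"],
    "NNS": ["critics", "reviews", "computers"],
    "VBP": ["write"], "IN": ["with"],
}

def is_preterminal(sym):
    # Pre-terminals are exactly the symbols that rewrite to words.
    return sym in LEXICON
```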
Top-Down Generation from CFGs
A CFG generates a language. Fix an order: apply rules to the leftmost non-terminal. This gives a derivation of a tree using rules of the grammar:

ROOT
S
NP VP
NNS VP
critics VP
critics VBP NP
critics write NP
critics write NNS
critics write reviews

[Resulting tree: (ROOT (S (NP (NNS critics)) (VP (VBP write) (NP (NNS reviews)))))]
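The leftmost-rewrite discipline above can be replayed mechanically. A sketch (the step list hard-codes the slide's derivation rather than searching the grammar):

```python
def derive(start, steps):
    # Repeatedly rewrite the LEFTMOST occurrence of a symbol; `steps` lists
    # (symbol, replacement) pairs in derivation order.
    sentential = [start]
    history = [list(sentential)]
    for sym, replacement in steps:
        i = sentential.index(sym)          # leftmost occurrence
        sentential[i:i + 1] = replacement  # splice in the rule's RHS
        history.append(list(sentential))
    return history

steps = [("ROOT", ["S"]), ("S", ["NP", "VP"]), ("NP", ["NNS"]),
         ("NNS", ["critics"]), ("VP", ["VBP", "NP"]), ("VBP", ["write"]),
         ("NP", ["NNS"]), ("NNS", ["reviews"])]
history = derive("ROOT", steps)
# history[-1] is the yield: ["critics", "write", "reviews"]
```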
Corpora
A corpus is a collection of text
Often annotated in some way; sometimes just lots of text
Balanced vs. uniform corpora
Examples:
Newswire collections: 500M+ words
Brown corpus: 1M words of tagged “balanced” text
Penn Treebank: 1M words of parsed WSJ
Canadian Hansards: 10M+ words of aligned French / English sentences
The Web: billions of words of who knows what
Treebank Sentences
Corpus-Based Methods
A corpus like a treebank gives us three important tools.
It gives us broad coverage:

ROOT → S
S → NP VP .
NP → PRP
VP → VBD ADJ
Why is Language Hard?
Scale

[Figure: many alternative parse structures built from DET, ADJ, NOUN, PLURAL NOUN, CONJ, NP, and PP.]
Parsing as Search: Top-Down
Top-down parsing starts with the root and tries to generate the input.

INPUT: critics write reviews

[Search states: ROOT; ROOT expanded to S; S expanded to NP VP; then alternative expansions of NP, e.g. NP → NNS or NP → NP PP, …]
Treebank Parsing in 20 sec
Need a PCFG for broad-coverage parsing. Can take a grammar right off the trees (doesn’t work well):

ROOT → S          1
S → NP VP .       1
NP → PRP          1
VP → VBD ADJP     1
…

Better results by enriching the grammar (e.g., lexicalization). Can also get reasonable parsers without lexicalization.
PCFGs and Independence
Symbols in a PCFG define independence assumptions:
At any node, the material inside that node is independent of the material outside that node, given the label of that node.
Any information that statistically connects behavior inside and outside a node must flow through that node.

[Figure: a tree with rules S → NP VP and NP → DT NN, illustrating that an NP subtree interacts with the rest of the sentence only through the label NP.]
Corpus-Based Methods
It gives us statistical information:

| Expansion | All NPs | NPs under S | NPs under VP |
|---|---|---|---|
| NP PP | 11% | 9% | 23% |
| DT NN | 9% | 9% | 7% |
| PRP | 6% | 21% | 4% |

This is a very different kind of subject/object asymmetry than what many linguists are interested in.
Corpus-Based Methods
It lets us check our answers!
Semantic Interpretation
Back to meaning!
A very basic approach to computational semantics: a truth-theoretic (Tarskian) notion of semantics. Assign a “meaning” to each word; word meanings combine according to the parse structure.
People can and do spend entire courses on this topic; we’ll spend about an hour!
What’s NLP and what isn’t? Designing meaning representations? Computing those representations? Reasoning with them?
Supplemental reading will be on the web page.
“Meaning”
What is meaning?
“The computer in the corner.” “Bob likes Alice.” “I think I am a gummi bear.”
Knowing whether a statement is true? Knowing the conditions under which it’s true? Being able to react appropriately to it?
“Who does Bob like?” “Close the door.”
A distinction: linguistic (semantic) meaning vs. speaker (pragmatic) meaning (e.g., “The door is open.”)
Today: assembling the semantic meaning of a sentence from its parts
Entailment and Presupposition
Some notions worth knowing:
Entailment: A entails B if A being true necessarily implies B is true
Does “Twitchy is a big mouse” entail “Twitchy is a mouse”?
Does “Twitchy is a big mouse” entail “Twitchy is big”?
Does “Twitchy is a big mouse” entail “Twitchy is furry”?
Presupposition: A presupposes B if A is only well-defined if B is true
“The computer in the corner is broken” presupposes that there is a (salient) computer in the corner
Truth-Conditional Semantics
Linguistic expressions: “Bob sings”
Logical translations: sings(bob); could be p_1218(e_397)
Denotation: [[bob]] = some specific person (in some context); [[sings(bob)]] = ???
Types on translations: bob : e (for entity); sings(bob) : t (for truth-value)

[Tree: S → NP VP; Bob : bob; sings : λy.sings(y); S : sings(bob)]
Truth-Conditional Semantics
Proper names: refer directly to some entity in the world. Bob : bob; [[bob]]W = ???
Sentences: are either true or false (given how the world actually is). Bob sings : sings(bob)
So what about verbs (and verb phrases)?
sings must combine with bob to produce sings(bob). The λ-calculus is a notation for functions whose arguments are not yet filled: sings : λx.sings(x)
This is a predicate: a function which takes an entity (type e) and produces a truth value (type t). We can write its type as e→t.
Adjectives?

[Tree: S → NP VP; Bob : bob; sings : λy.sings(y); S : sings(bob)]
Compositional Semantics
So now we have meanings for the words. How do we know how to combine words? Associate a combination rule with each grammar rule:

S : β(α) ← NP : α, VP : β (function application)
VP : λx.α(x) ∧ β(x) ← VP : α, and, VP : β (intersection)

Example, “Bob sings and dances”:

sings : λy.sings(y); dances : λz.dances(z)
sings and dances : λx.sings(x) ∧ dances(x)
S : [λx.sings(x) ∧ dances(x)](bob) = sings(bob) ∧ dances(bob)
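The two combination rules can be mimicked with ordinary functions; a sketch where logical formulas are represented as nested tuples (the representation is my choice, not a standard):

```python
# Word meanings: entities are strings, predicates are functions entity -> formula.
bob = "bob"
sings  = lambda x: ("sings", x)    # stands in for sings(x)
dances = lambda x: ("dances", x)   # stands in for dances(x)

def vp_and(f, g):
    # VP -> VP and VP: the intersection rule, lambda x. f(x) AND g(x).
    return lambda x: ("and", f(x), g(x))

def s(np, vp):
    # S -> NP VP: function application, vp(np).
    return vp(np)

meaning = s(bob, vp_and(sings, dances))
# -> ("and", ("sings", "bob"), ("dances", "bob")), i.e. sings(bob) AND dances(bob)
```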
Other Cases
Transitive verbs: likes : λx.λy.likes(y,x)
Two-place predicates of type e→(e→t).
likes Amy : λy.likes(y,amy) is just like a one-place predicate.

Quantifiers: what does “Everyone” mean here?
Everyone : λf.∀x.f(x)
Mostly works, but some problems:
Have to change our NP/VP rule; won’t work for “Amy likes everyone.”
“Everyone likes someone.” This gets tricky quickly!

[Tree for “Everyone likes Amy”: likes : λx.λy.likes(y,x); likes Amy : λy.likes(y,amy); Everyone : λf.∀x.f(x); S : [λf.∀x.f(x)](λy.likes(y,amy)) = ∀x.likes(x,amy)]
Denotation
What do we do with logical translations?
The translation language (logical form) has fewer ambiguities.
Can check truth value against a database: denotation (“evaluation”) calculated using the database.
More usefully: assert truth and modify a database.
Questions: check whether a statement in a corpus entails the (question, answer) pair: “Bob sings and dances” → “Who sings?” + “Bob”
Chain together facts and use them for comprehension.
Grounding
So why does the translation likes : λx.λy.likes(y,x) have anything to do with actual liking?
It doesn’t (unless the denotation model says so).
Sometimes that’s enough: wire up bought to the appropriate entry in a database.
Meaning postulates: insist, e.g., ∀x,y. likes(y,x) → knows(y,x). This gets into lexical semantics issues.
Statistical version?
Tense and Events
In general, you don’t get far with verbs as predicates. Better to have event variables e:
“Alice danced” : danced(alice) becomes ∃e : dance(e) ∧ agent(e,alice) ∧ (time(e) < now)
Event variables let you talk about non-trivial tense / aspect structures:
“Alice had been dancing when Bob sneezed” : ∃e,e′ : dance(e) ∧ agent(e,alice) ∧ sneeze(e′) ∧ agent(e′,bob) ∧ (start(e) < start(e′)) ∧ (end(e) = end(e′)) ∧ (time(e′) < now)
Propositional Attitudes
“Bob thinks that I am a gummi bear”
thinks(bob, gummi(me))? thinks(bob, “I am a gummi bear”)? thinks(bob, ^gummi(me))?
The usual solution involves intensions (^X), which are, roughly, the set of possible worlds (or conditions) in which X is true.
Hard to deal with computationally: modeling other agents’ models, etc.
Can come up in simple dialog scenarios, e.g., if you want to talk about what your bill claims you bought vs. what you actually bought.
Trickier Stuff
Non-intersective adjectives:
green ball : λx.[green(x) ∧ ball(x)]
fake diamond : λx.[fake(x) ∧ diamond(x)]? Or λx.[fake(diamond)(x)]?
Generalized quantifiers:
the : λf.[unique-member(f)]
all : λf.λg.[∀x. f(x) → g(x)]
most? Could do with more general second-order predicates, too (why worse?)
the(cat, meows), all(cat, meows)
Generics: “Cats like naps”; “The players scored a goal”
Pronouns (and bound anaphora): “If you have a dime, put it in the meter.”
… the list goes on and on!
Multiple Quantifiers
Quantifier scope: Groucho Marx celebrates quantifier order ambiguity:
“In this country a woman gives birth every 15 min. Our job is to find that woman and stop her.”
Deciding between readings:
“Bob bought a pumpkin every Halloween”
“Bob put a pumpkin in every window”
Indefinites
First try:
“Bob ate a waffle” : ate(bob, waffle)
“Amy ate a waffle” : ate(amy, waffle)
Can’t be right! Should be something like ∃x : waffle(x) ∧ ate(bob, x)
What does the translation of “a” have to be? What about “the”? What about “every”?

[Tree: S → NP (Bob) VP (VBD ate, NP a waffle)]
Adverbs
What about adverbs? “Bob sings terribly”
terribly(sings(bob))?
(terribly(sings))(bob)?
∃e. present(e) ∧ type(e, singing) ∧ agent(e, bob) ∧ manner(e, terrible)?
It’s really not this simple…

[Tree: S → NP (Bob) VP (VBP sings, ADVP terribly)]
Problem: Ambiguities
Headlines:
Iraqi Head Seeks Arms
Ban on Nude Dancing on Governor’s Desk
Juvenile Court to Try Shooting Defendant
Teacher Strikes Idle Kids
Stolen Painting Found by Tree
Kids Make Nutritious Snacks
Local HS Dropouts Cut in Half
Hospitals Are Sued by 7 Foot Doctors
Why are these funny?
Machine Translation
What is the anticipated cost of collecting fees under the new proposal?
En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?
[Figure: bipartite matching between the English words (What, is, the, anticipated, cost, of, collecting, fees, under, the, new, proposal, ?) and the French words (En, vertu, de, les, nouvelles, propositions, quel, est, le, coût, prévu, de, perception, de, les, droits, ?).]
Some Output
Madame la présidente, votre présidence de cette institution a été marquante.
Mrs Fontaine, your presidency of this institution has been outstanding.
Madam President, president of this house has been discoveries.
Madam President, your presidency of this institution has been impressive.

Je vais maintenant m'exprimer brièvement en irlandais.
I shall now speak briefly in Irish.
I will now speak briefly in Ireland.
I will now speak briefly in Irish.

Nous trouvons en vous un président tel que nous le souhaitions.
We think that you are the type of president that we want.
We are in you a president as the wanted.
We are in you a president as we the wanted.
Information Extraction
Just a Code?
“Also knowing nothing official about, but having guessed and inferred considerable about, the powerful new mechanized methods in cryptography—methods which I believe succeed even when one does not know what language has been coded—one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ ”
Warren Weaver (1955:18, quoting a letter he wrote in 1947)
Levels of Transfer
(Vauquois triangle)

[Figure: interlingua at the apex; below it, semantic structure, syntactic structure, and word structure on both source and target sides. Source-side steps: morphological analysis, syntactic analysis, semantic analysis, semantic composition. Target-side steps: semantic decomposition, semantic generation, syntactic generation, morphological generation. Transfer can happen at any level: direct (word structure), syntactic transfer, or semantic transfer.]
A Recursive Parser
Here’s a recursive (CNF) parser:

bestParse(X, i, j, s)
  if (j == i+1)
    return X -> s[i]
  (X->YZ, k) = argmax over rules X->YZ and split points k of
                 score(X->YZ) * bestScore(Y,i,k,s) * bestScore(Z,k,j,s)
  parse.parent = X
  parse.leftChild = bestParse(Y,i,k,s)
  parse.rightChild = bestParse(Z,k,j,s)
  return parse
A Recursive Parser
Will this parser work? Why or why not? Memory requirements?
bestScore(X, i, j, s)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max over rules X->YZ and split points k of
             score(X->YZ) * bestScore(Y,i,k,s) * bestScore(Z,k,j,s)
A Memoized Parser
One small change:
bestScore(X, i, j, s)
  if (scores[X][i][j] == null)
    if (j == i+1)
      score = tagScore(X, s[i])
    else
      score = max over rules X->YZ and split points k of
                score(X->YZ) * bestScore(Y,i,k,s) * bestScore(Z,k,j,s)
    scores[X][i][j] = score
  return scores[X][i][j]
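A runnable Python version of the memoized bestScore; the tiny CNF grammar, its scores, and the tag table are invented for illustration:

```python
# X -> (Y, Z) binary rules with scores (made-up probabilities).
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("VBP", "NP")): 1.0,
}
# tagScore(X, word) as a lookup table.
TAGS = {("NP", "critics"): 0.5, ("VBP", "write"): 1.0, ("NP", "reviews"): 0.5}

cache = {}  # the score cache: (X, i, j) -> best score

def best_score(X, i, j, s):
    if (X, i, j) in cache:
        return cache[(X, i, j)]
    if j == i + 1:                         # base case: a single word
        score = TAGS.get((X, s[i]), 0.0)
    else:                                  # max over rules and split points
        score = 0.0
        for (lhs, (Y, Z)), rule_score in RULES.items():
            if lhs != X:
                continue
            for k in range(i + 1, j):
                score = max(score, rule_score * best_score(Y, i, k, s)
                                              * best_score(Z, k, j, s))
    cache[(X, i, j)] = score
    return score

sent = ["critics", "write", "reviews"]
score = best_score("S", 0, len(sent), sent)  # 1.0 * 0.5 * (1.0 * 1.0 * 0.5)
```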
Memory: Theory
How much memory does this require? Have to store the score cache.
Cache size: |symbols| * n^2 doubles.
For the plain treebank grammar: |symbols| ~ 20K, n = 40, a double ~ 8 bytes, giving ~256 MB.
Big, but workable.
What about sparsity?
Time: Theory
How much time will it take to parse? Have to fill each cache element (at worst).
Each time the cache fails, we have to:
Iterate over each rule X → Y Z and split point k
Do constant work for the recursive calls
Total time: |rules| * n^3
Cubic time. Something like 5 sec for an unoptimized parse of a 20-word sentence.
Problem: Scale
People did know that language was ambiguous!
…but they hoped that all interpretations would be “good” ones (or ruled out pragmatically)
…they didn’t realize how bad it would be

[Figure: many alternative parse structures built from DET, ADJ, NOUN, PLURAL NOUN, CONJ, NP, and PP.]
Problem: Sparsity
However: sparsity is always a problem. New unigram (word), bigram (word pair), and rule rates in newswire:

[Chart: fraction seen vs. number of training words (0 to 1,000,000), plotted for unigrams, bigrams, and rules.]