
1

Natural Language Processing
Slides adapted from Pedro Domingos

• What’s the problem?

– Input?
• Natural Language Sentences
– Output?
• Parse Tree
• Semantic Interpretation (Logical Representation)

2

Example Applications

• Enables great user interfaces!
• Spelling and grammar checkers.
• http://www.askjeeves.com/
• Document understanding on the WWW.
• Spoken language control systems: banking, shopping.
• Classification systems for messages, articles.
• Machine translation tools.

3

NLP Problem Areas

• Morphology: structure of words.
• Syntactic interpretation (parsing): create a parse tree of a sentence.
• Semantic interpretation: translate a sentence into the representation language.
– Pragmatic interpretation: take the current situation into account.
– Disambiguation: there may be several interpretations; choose the most probable one.

4

Some Difficult Examples

• From the newspapers:
– Squad helps dog bite victim.
– Helicopter powered by human flies.
– Levy won’t hurt the poor.
– Once-sagging cloth diaper industry saved by full dumps.

• Ambiguities:
– Lexical: meanings of ‘hot’, ‘back’.
– Syntactic: I heard the music in my room.
– Referential: The cat ate the mouse. It was ugly.

5

Parsing

• Context-free grammars:
– EXPR -> NUMBER
– EXPR -> VARIABLE
– EXPR -> (EXPR + EXPR)
– EXPR -> (EXPR * EXPR)

• (2 + X) * (17 + Y) is in the grammar.
• (2 + (X)) is not.
• Why do we call them context-free?
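A minimal recursive-descent recognizer for this expression grammar, as a Python sketch (the function names are my own; tokens are assumed to be space-separated):

    # Recognizer for: EXPR -> NUMBER | VARIABLE | (EXPR + EXPR) | (EXPR * EXPR)
    # Illustrative sketch; tokenization is simplified to whitespace splitting.

    def is_expr(tokens):
        """Return True iff the token list derives EXPR exactly."""
        ok, rest = match_expr(tokens)
        return ok and rest == []

    def match_expr(toks):
        if not toks:
            return False, toks
        head, rest = toks[0], toks[1:]
        if head.isdigit() or head.isidentifier():   # EXPR -> NUMBER | VARIABLE
            return True, rest
        if head == "(":                             # EXPR -> (EXPR op EXPR)
            ok, rest = match_expr(rest)
            if ok and rest and rest[0] in "+*":
                ok, rest = match_expr(rest[1:])
                if ok and rest and rest[0] == ")":
                    return True, rest[1:]
        return False, toks

    print(is_expr("( 2 + X )".split()))       # True: matches (EXPR + EXPR)
    print(is_expr("( 2 + ( X ) )".split()))   # False: (X) alone matches no rule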

6

Using CFGs for Parsing

• Can natural language syntax be captured using a context-free grammar?
– Yes, no, sort of, for the most part, maybe.

• Words:
– nouns, adjectives, verbs, adverbs
– Determiners: the, a, this, that
– Quantifiers: all, some, none
– Prepositions: in, onto, by, through
– Connectives: and, or, but, while
– Words combine into phrases: NP (noun phrase), VP (verb phrase)

7

An Example Grammar

• S -> NP VP
• VP -> V NP
• NP -> NAME
• NP -> ART N
• ART -> a | the
• V -> ate | saw
• N -> cat | mouse
• NAME -> Sue | Tom

8

Example Parse

• The mouse saw Sue.
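To reproduce the parse, a sketch using NLTK (an assumption on my part; any CFG parser would do):

    # Sketch assuming NLTK is available (pip install nltk).
    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP
    NP -> NAME
    NP -> ART N
    ART -> 'a' | 'the'
    V -> 'ate' | 'saw'
    N -> 'cat' | 'mouse'
    NAME -> 'Sue' | 'Tom'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("the mouse saw Sue".split()):
        print(tree)
    # (S (NP (ART the) (N mouse)) (VP (V saw) (NP (NAME Sue))))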

9

Ambiguity

• S -> NP VP
• VP -> V NP
• VP -> V NP NP
• NP -> N
• NP -> N N
• NP -> Det NP
• Det -> the
• V -> ate | saw | bought
• N -> cat | mouse | biscuits | Sue | Tom

“Sue bought the cat biscuits.”
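Both VP rules can analyze this sentence, so a chart parser returns two trees. A sketch, again assuming NLTK:

    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP | V NP NP
    NP -> N | N N | Det NP
    Det -> 'the'
    V -> 'ate' | 'saw' | 'bought'
    N -> 'cat' | 'mouse' | 'biscuits' | 'Sue' | 'Tom'
    """)

    for tree in nltk.ChartParser(grammar).parse("Sue bought the cat biscuits".split()):
        print(tree)
    # One parse:   bought (the (cat biscuits))  -- biscuits for the cat
    # Other parse: bought (the cat) (biscuits)  -- ditransitive reading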

10

Bottom-Up Parsing (page 666)

• Bottom-up algorithm (page 666, Figure 22.7):

• Loop until the forest has size one and consists of the start symbol:
– Choose a subsequence of the forest.
– Choose a rule whose RHS matches the subsequence.
– Replace the subsequence in the forest with the LHS of the rule.
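A minimal Python sketch of this loop (the rule format and function names are my own; it searches blindly with backtracking, which is exactly why the chart parser on the following slides is preferable):

    # Naive bottom-up parsing: repeatedly replace a subsequence matching the
    # RHS of some rule with that rule's LHS, until only the start symbol is
    # left. Illustrative only; assumes no cyclic unary rules.

    def bottom_up_parse(words, rules, start="S"):
        """rules: list of (lhs, rhs) pairs, rhs a tuple of symbols."""
        def search(forest):
            if forest == [start]:
                return True
            for lhs, rhs in rules:
                n = len(rhs)
                for i in range(len(forest) - n + 1):
                    if tuple(forest[i:i + n]) == rhs:
                        if search(forest[:i] + [lhs] + forest[i + n:]):
                            return True
            return False
        return search(list(words))

    RULES = [("S", ("NP", "VP")), ("VP", ("V",)), ("NP", ("ART", "N")),
             ("ART", ("the",)), ("N", ("box",)), ("V", ("floats",))]
    print(bottom_up_parse("the box floats".split(), RULES))   # True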

11

Try it

• Sentence: the box floats

• S -> NP VP

• VP -> V

• NP -> ART N

• ART -> the

• N -> box

• V -> floats
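For reference, one reduction sequence for this exercise, in the style of the bottom-up loop from the previous slide (each step rewrites the matched subsequence to the rule’s LHS):

    the box floats
    ART box floats     (ART -> the)
    ART N floats       (N -> box)
    NP floats          (NP -> ART N)
    NP V               (V -> floats)
    NP VP              (VP -> V)
    S                  (S -> NP VP)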

12

Example: Chart Parsing

• Three main data structures: a chart, a key list, and a set of edges

• Chart: a table indexed by starting point and length; each cell records the name of the terminal or non-terminal spanning those words.

[Figure: empty chart with starting points 1-4 on one axis and lengths 1-4 on the other.]

13

Key List and Edges

• Key list: a pushdown stack of chart entries
– “the” “box” “floats”

• Edges: rules that can be applied to chart entries to build up larger entries

[Figure: the chart after the words are entered: “the”, “box”, and “floats” at starting points 1, 2, and 3, each of length 1, plus the edge ART -> the o, where the “o” marks how much of the rule has been matched.]

14

Chart Parsing Algorithm

• Loop while there are entries in the key list:
– 1. Remove an entry from the key list.
– 2. If the entry is already in the chart, add its edge list and break.
– 3. Otherwise, add the entry to the chart.
– 4. For all rules that begin with the entry’s type, add an edge for that rule.
– 5. For all edges that need the entry next, add an extended edge (see the procedure below).
– 6. If an edge is finished, add an entry to the key list with its type, start point, length, and edge list.

• To extend an edge e with chart entry c:
– Create a new edge e’.
– Set start(e’) to start(e).
– Set end(e’) to end(c).
– Set rule(e’) to rule(e) with the “o” moved past c.
– Set righthandside(e’) to righthandside(e) + c.
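A compact Python sketch of this algorithm (the data layout and names are my own; for simplicity it records only whether constituents exist, not their edge lists):

    # Chart-parsing sketch following the loop above. A chart entry is
    # (type, start, end); an active edge is (start, end, lhs, needed),
    # where 'needed' is the RHS portion still to the right of the "o".

    def chart_parse(words, rules, start="S"):
        chart = set()                                  # finished constituents
        edges = []                                     # active edges
        keys = [(w, i, i + 1) for i, w in enumerate(words)]   # the key list

        def extend(s, e, lhs, needed):
            if not needed:                             # edge finished (step 6)
                keys.append((lhs, s, e))
                return
            edges.append((s, e, lhs, needed))
            for (typ, cs, ce) in list(chart):          # needed entry charted?
                if cs == e and typ == needed[0]:
                    extend(s, ce, lhs, needed[1:])

        while keys:                                    # steps 1-3
            entry = keys.pop()
            if entry in chart:
                continue
            chart.add(entry)
            typ, s, e = entry
            for lhs, rhs in rules:                     # step 4: new edges
                if rhs[0] == typ:
                    extend(s, e, lhs, rhs[1:])
            for (es, ee, lhs, needed) in list(edges):  # step 5: extend edges
                if ee == s and needed[0] == typ:
                    extend(es, e, lhs, needed[1:])
        return (start, 0, len(words)) in chart

    RULES = [("S", ("NP", "VP")), ("VP", ("V",)), ("NP", ("ART", "N")),
             ("ART", ("the",)), ("N", ("box",)), ("V", ("floats",))]
    print(chart_parse("the box floats".split(), RULES))   # True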

15

Try it

• Sentence: the box floats

• S -> NP VP

• VP -> V

• NP -> ART N

• ART -> the

• N -> box

• V -> floats

16

Semantic Interpretation

• Our goal: to translate sentences into a logical form.

• But sentences convey more than true/false:
– It will rain in Seattle tomorrow.
– Will it rain in Seattle tomorrow?

• A sentence can be analyzed by:
– propositional content, and
– speech act: tell, ask, request, deny, suggest

17

Propositional Content

• We develop a logic-like language for representing propositional content, which must deal with:
– Word-sense ambiguity
– Scope ambiguity

• Proper names --> objects (John, Alon)
• Nouns --> unary predicates (woman, house)
• Verbs -->
– transitive: binary predicates (find, go)
– intransitive: unary predicates (laugh, cry)
• Quantifiers: most, some
• Example: “John loves Mary” --> Loves(John, Mary)
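A toy sketch of this word-to-predicate mapping in Python (the lexicon and the interpret function are my own illustration; quantifier scope is ignored):

    # Toy semantic interpretation: map words to logical-form fragments.
    # Illustrative only; real systems build this from the parse tree.

    LEXICON = {
        "John": ("object", "John"),     # proper name -> constant
        "Mary": ("object", "Mary"),
        "loves": ("binary", "Loves"),   # transitive verb -> binary predicate
        "laughs": ("unary", "Laughs"),  # intransitive verb -> unary predicate
    }

    def interpret(words):
        """Tiny subject-verb(-object) fragment -> logical form string."""
        subj = LEXICON[words[0]][1]
        kind, pred = LEXICON[words[1]]
        if kind == "binary":
            return f"{pred}({subj}, {LEXICON[words[2]][1]})"
        return f"{pred}({subj})"

    print(interpret("John loves Mary".split()))   # Loves(John, Mary)
    print(interpret("Mary laughs".split()))       # Laughs(Mary)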

18

Statistical NLP
(see the book by Eugene Charniak, Statistical Language Learning, MIT Press, 1993)

• Consider the problem of part-of-speech tagging:
– “The box floats”
– “The” --> Det; “box” --> N; “floats” --> V

• Given a sentence w(1,n), where w(i) is the i-th word, we want to find tags t(i) assigned to each word w(i)

19

The Equations

• Find the t(1,n) that maximizes
– P[t(1,n)|w(1,n)] = P[w(1,n)|t(1,n)] P[t(1,n)] / P[w(1,n)]
– P[w(1,n)] is the same for every tagging, so we only need to maximize P[w(1,n)|t(1,n)] P[t(1,n)]

• Assume that
– a word depends only on its own tag, and
– a tag depends only on the previous tag.

• We then have:
– P[w(j)|w(1,j-1), t(1,j)] = P[w(j)|t(j)], and
– P[t(j)|w(1,j-1), t(1,j-1)] = P[t(j)|t(j-1)]

• Thus, we want to maximize the product
– P[w(1)|t(1)] P[t(1)] * P[w(2)|t(2)] P[t(2)|t(1)] * … * P[w(n)|t(n)] P[t(n)|t(n-1)]
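This product is exactly what the Viterbi algorithm maximizes. A minimal sketch, assuming transition and emission tables estimated from a tagged corpus (the toy numbers below are mine):

    # Viterbi decoding for the tag sequence maximizing
    #   prod_i P(w_i | t_i) * P(t_i | t_{i-1}).
    # trans[(prev, t)] and emit[(t, w)] are assumed given.

    def viterbi(words, tags, trans, emit, start="<s>"):
        best = {start: (1.0, [])}                 # tag -> (prob, path so far)
        for w in words:
            new = {}
            for t in tags:
                p_emit = emit.get((t, w), 0.0)
                for prev, (p, path) in best.items():
                    p_new = p * trans.get((prev, t), 0.0) * p_emit
                    if p_new > new.get(t, (0.0, None))[0]:
                        new[t] = (p_new, path + [t])
            best = new
        return max(best.values())                 # (probability, tag sequence)

    # Toy, hand-set probabilities for "the box floats":
    trans = {("<s>", "det"): 0.8, ("det", "N"): 0.9, ("N", "V"): 0.8,
             ("det", "V"): 0.01, ("V", "V"): 0.05}
    emit = {("det", "the"): 1.0, ("N", "box"): 0.6, ("V", "box"): 0.1,
            ("V", "floats"): 0.7, ("N", "floats"): 0.1}
    print(viterbi("the box floats".split(), ["det", "N", "V"], trans, emit))
    # (0.24192, ['det', 'N', 'V'])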

20

Example

• “The box floats”: given a corpus (a training set)
– Assignment one:
• t(1) = det, t(2) = V, t(3) = V
• P(V|det) is rather low, and so is P(V|V), so this assignment is less likely than the one below.
– Assignment two:
• t(1) = det, t(2) = N, t(3) = V
• P(N|det) is high, and P(V|N) is high, so this assignment is more likely!
– In general, we can use Hidden Markov Models (HMMs) to find these probabilities.

[Figure: HMM with hidden states det, N, and V emitting the words “the”, “box”, and “floats”.]
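To make the comparison concrete, with illustrative transition probabilities (my numbers, echoing the toy tables in the Viterbi sketch above; the slides give none):

    Assignment one:  P(V|det) * P(V|V) = 0.01 * 0.05 = 0.0005   (unlikely)
    Assignment two:  P(N|det) * P(V|N) = 0.9  * 0.8  = 0.72     (likely)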

21

Experiments

• Charniak and colleagues ran experiments on a collection of documents called the “Brown Corpus”, where tags were assigned by hand.

• 90% of the corpus was used for training and the other 10% for testing.

• They show that they can get 95% correctness with HMMs.

• A really simple baseline: assign to each word w the tag t with the highest probability P(t|w). This alone gets 91% correctness!
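A sketch of that baseline, assuming the training data is available as (word, tag) pairs (the toy corpus below is mine, not the Brown Corpus):

    # Most-frequent-tag baseline: tag each word w with argmax_t P(t|w),
    # estimated by counting in a hand-tagged training corpus.
    from collections import Counter, defaultdict

    def train_baseline(tagged_words):
        """tagged_words: iterable of (word, tag). Returns word -> best tag."""
        counts = defaultdict(Counter)
        for word, tag in tagged_words:
            counts[word][tag] += 1
        return {w: c.most_common(1)[0][0] for w, c in counts.items()}

    # Toy corpus, purely illustrative:
    corpus = [("the", "det"), ("box", "N"), ("floats", "V"),
              ("the", "det"), ("box", "V"), ("box", "N")]
    best_tag = train_baseline(corpus)
    print([best_tag[w] for w in ["the", "box", "floats"]])  # ['det', 'N', 'V']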

22

Natural Language Summary

• Parsing:
– context-free grammars with features

• Semantic interpretation:
– Translate sentences into a logic-like language.
– Use additional domain knowledge for word-sense disambiguation.
– Use context to disambiguate references.