Natural Language Processing: Parsing

Artificial Intelligence Natural Language Processing: Parsing Rushdi Shams Computational Linguistics Lab Western University. [email protected]

Uploaded by: rushdi-shams

Posted on 19-Jan-2015


DESCRIPTION

This lecture covers parsing. It gives a brief overview of the lexicon, categorization, grammar rules, syntactic trees, word senses, and various challenges of natural language processing.

TRANSCRIPT

Page 1: Natural Language Processing: Parsing

Artificial Intelligence

Natural Language Processing: Parsing

Rushdi Shams
Computational Linguistics Lab

Western University
[email protected]

Page 2: Natural Language Processing: Parsing

Natural Language

• Natural language means any language that we humans speak

• We need to process natural language (in text, speech, etc.) so that machines can exploit it.

• Applications: numerous!
  – Watson (Jeopardy)
  – MS Word

Page 3: Natural Language Processing: Parsing

Parsing

• The first task for any NLP-based system is to read (or to parse) the text

• Parsing depends on three components of a language:

1. Lexicon
2. Categorization
3. Grammar Rules

Page 4: Natural Language Processing: Parsing

Rushdi Shams, Dept of CSE, KUET, Bangladesh 4

Lexicon

stench | breeze | glitter | nothing | wumpus | pit | pits | gold | east | …

is | see | smell | shoot | feel | stinks | go | grab | carry | kill | turn | …

right | left | east | south | back | smelly | …

here | there | nearby | ahead | right | left | east | south | back | …

me | you | I | it | S/HE | Y’ALL …

John | Mary | Boston | UCB | PAJC | …

the | a | an | …

to | in | on | near | …

and | or | but | …

0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Page 5: Natural Language Processing: Parsing

Categorization

Noun → stench | breeze | glitter | nothing | wumpus | pit | pits | gold | east | …

Verb → is | see | smell | shoot | feel | stinks | go | grab | carry | kill | turn | …

Adjective → right | left | east | south | back | smelly | …

Adverb → here | there | nearby | ahead | right | left | east | south | back | …

Pronoun → me | you | I | it | S/HE | Y’ALL …

Name → John | Mary | Boston | UCB | PAJC | …

Article → the | a | an | …

Preposition → to | in | on | near | …

Conjunction → and | or | but | …

Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
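The categorization above amounts to a lookup table from categories to words. The Python sketch below stores a subset of the slide's entries (the open-ended "…" items and a few names and pronouns are left out rather than guessed; the names CATEGORIES and categories_of are hypothetical) and returns every category a given word belongs to.

```python
# A subset of the wumpus-world categorization from the slide above.
# Open-ended "…" entries are deliberately left out, not guessed.
CATEGORIES: dict[str, set[str]] = {
    "Noun": {"stench", "breeze", "glitter", "nothing", "wumpus", "pit", "pits", "gold", "east"},
    "Verb": {"is", "see", "smell", "shoot", "feel", "stinks", "go", "grab", "carry", "kill", "turn"},
    "Adjective": {"right", "left", "east", "south", "back", "smelly"},
    "Adverb": {"here", "there", "nearby", "ahead", "right", "left", "east", "south", "back"},
    "Pronoun": {"me", "you", "i", "it"},
    "Name": {"john", "mary", "boston"},
    "Article": {"the", "a", "an"},
    "Preposition": {"to", "in", "on", "near"},
    "Conjunction": {"and", "or", "but"},
    "Digit": {str(d) for d in range(10)},
}


def categories_of(word: str) -> list[str]:
    """Return, sorted, every category whose word list contains `word` (case-insensitive)."""
    w = word.lower()
    return sorted(cat for cat, words in CATEGORIES.items() if w in words)


if __name__ == "__main__":
    for w in ["wumpus", "east", "the", "7", "dragon"]:
        print(w, "->", categories_of(w))
```

Calling categories_of("east") returns ['Adjective', 'Adverb', 'Noun']: one word, three categories, which is exactly the kind of choice a parser has to resolve. A word missing from the lexicon, such as dragon, gets an empty list.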

Page 6: Natural Language Processing: Parsing

Grammar Rules

• “The large cat”

• This phrase can be parsed by an NLP system if it has a grammar rule like
  Noun Phrase -> Determiner + Adjective + Noun

• If your system encounters a phrase or sentence whose pattern is not covered by its grammar rules, it won’t be able to parse it.
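A minimal sketch of the point above, assuming a hypothetical one-rule grammar and a tiny hand-made lexicon (LEXICON, NP_RULES and is_noun_phrase are illustrative names, not from the lecture): a phrase is accepted only if its sequence of categories exactly matches a known rule, so any pattern the rules do not cover is rejected.

```python
# Hypothetical mini-lexicon: word -> category.
LEXICON: dict[str, str] = {
    "the": "Determiner", "a": "Determiner",
    "large": "Adjective", "small": "Adjective",
    "cat": "Noun", "rat": "Noun",
}

# The only rule this system knows: Noun Phrase -> Determiner + Adjective + Noun
NP_RULES: set[tuple[str, ...]] = {("Determiner", "Adjective", "Noun")}


def is_noun_phrase(phrase: str) -> bool:
    """True if the phrase's category sequence exactly matches one of NP_RULES."""
    words = phrase.lower().split()
    if any(word not in LEXICON for word in words):
        return False  # an unknown word cannot even be categorized
    categories = tuple(LEXICON[word] for word in words)
    return categories in NP_RULES


if __name__ == "__main__":
    print(is_noun_phrase("The large cat"))  # True
    print(is_noun_phrase("The cat"))        # False: no rule for Determiner + Noun
```

"The cat" is perfectly good English, but it fails here simply because the grammar has no Determiner + Noun rule.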

Page 7: Natural Language Processing: Parsing

Therefore...

• Parsing is the process of using grammar rules to determine whether a sentence is legal and to obtain its syntactic tree
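The definition above can be illustrated with a small top-down (recursive-descent) parser. This is a sketch, assuming a hypothetical three-rule grammar that covers the running example: Sentence -> NounPhrase VerbPhrase, NounPhrase -> Article Adjective Noun, VerbPhrase -> Verb NounPhrase. It returns every syntactic tree that spans the whole sentence, so an empty result means the sentence is not legal under this grammar.

```python
from typing import Iterator

# A tree is a tuple (label, child1, child2, ...); a leaf is (category, word).
Tree = tuple

# Hypothetical grammar covering "The large cat eats the small rat".
GRAMMAR: dict[str, list[list[str]]] = {
    "Sentence": [["NounPhrase", "VerbPhrase"]],
    "NounPhrase": [["Article", "Adjective", "Noun"]],
    "VerbPhrase": [["Verb", "NounPhrase"]],
}
# Hypothetical lexicon: word -> category.
LEXICON: dict[str, str] = {
    "the": "Article", "large": "Adjective", "small": "Adjective",
    "cat": "Noun", "rat": "Noun", "eats": "Verb",
}


def parse_symbol(symbol: str, words: list[str], i: int) -> Iterator[tuple[Tree, int]]:
    """Yield (tree, j) for every way `symbol` can cover words[i:j]."""
    if symbol not in GRAMMAR:  # a category such as Noun must match exactly one word
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each right-hand side of the rule
        yield from parse_sequence(symbol, rhs, words, i, [])


def parse_sequence(label: str, rhs: list[str], words: list[str], i: int,
                   children: list[Tree]) -> Iterator[tuple[Tree, int]]:
    """Match the symbols in `rhs` left to right, backtracking over alternatives."""
    if not rhs:
        yield (label, *children), i
        return
    for subtree, j in parse_symbol(rhs[0], words, i):
        yield from parse_sequence(label, rhs[1:], words, j, children + [subtree])


def parse(sentence: str) -> list[Tree]:
    """Return all syntactic trees covering the whole sentence ([] if it is not legal)."""
    words = sentence.lower().split()
    return [tree for tree, end in parse_symbol("Sentence", words, 0) if end == len(words)]


if __name__ == "__main__":
    print(parse("The large cat eats the small rat"))
    print(parse("The cat large eats the small rat"))  # []
```

Words are lowercased before lookup, so the leaves read the, large, cat. This backtracking style is easy to follow, but it would recurse forever on a left-recursive rule such as NounPhrase -> NounPhrase PrepositionalPhrase; chart parsers avoid that problem.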

Page 8: Natural Language Processing: Parsing

Syntactic Tree

‘The large cat eats the small rat’

http://www.digitalenema.com/2012_07_01_archive.html

Page 9: Natural Language Processing: Parsing

The large cat eats the small rat

Syntactic Tree

Page 10: Natural Language Processing: Parsing

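Syntactic Tree

[Figure, step 1: each word of “The large cat eats the small rat” is labeled with its category: The/Article, large/Adjective, cat/Noun, eats/Verb, the/Article, small/Adjective, rat/Noun]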

Page 11: Natural Language Processing: Parsing

Syntactic Tree

[Figure, step 2: the same categorized words, with one Article + Adjective + Noun sequence grouped into a noun phrase]

Page 12: Natural Language Processing: Parsing

Syntactic Tree

[Figure, step 3: both Article + Adjective + Noun sequences (The large cat, the small rat) grouped into noun phrases]

Page 13: Natural Language Processing: Parsing

Syntactic Tree

[Figure, step 4: the Verb (eats) and the noun phrase (the small rat) combined into a verb phrase]

Page 14: Natural Language Processing: Parsing

Syntactic Tree

Sentence
  Noun phrase
    Article: The
    Adjective: large
    Noun: cat
  Verb phrase
    Verb: eats
    Noun phrase
      Article: the
      Adjective: small
      Noun: rat

Page 15: Natural Language Processing: Parsing

Label Bracketing

• It is another way of representing the syntactic tree: as a string of nested brackets, each labeled with its category.
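One way to produce a labeled bracketing from a tree, sketched in Python under the assumption that trees are nested tuples (label, child, ...) with (category, word) leaves. Square brackets are used here; some treebanks use parentheses instead. The function name to_brackets is hypothetical. To leave the exercise on the next slide intact, the example brackets only the noun phrase The large cat.

```python
# A tree is a tuple (label, child1, child2, ...); a leaf is (category, word).
Tree = tuple


def to_brackets(tree: Tree) -> str:
    """Render a tree as a labeled bracketing such as [NounPhrase [Noun cat]]."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):  # leaf: (category, word)
        return f"[{label} {children[0]}]"
    return f"[{label} " + " ".join(to_brackets(child) for child in children) + "]"


NOUN_PHRASE = ("NounPhrase", ("Article", "The"), ("Adjective", "large"), ("Noun", "cat"))

if __name__ == "__main__":
    print(to_brackets(NOUN_PHRASE))
    # [NounPhrase [Article The] [Adjective large] [Noun cat]]
```

Each opening bracket starts a constituent with its label, and the matching closing bracket ends it, so the nesting of brackets mirrors the branching of the tree.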

Page 16: Natural Language Processing: Parsing

Do it yourself: label-bracket the tree

Page 17: Natural Language Processing: Parsing

Evaluation of Parsing

• The two most frequent and basic measures to evaluate parsing:

Page 18: Natural Language Processing: Parsing

Precision, Recall, and F1-Score

• The notions are much clearer with a contingency table:
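The contingency table itself was an image, so the sketch below assumes one common convention for scoring parses (PARSEVAL-style): each tree is reduced to a set of labeled constituent spans (label, start, end), a span counts as a true positive only if it appears in both the gold-standard tree and the parser's tree, and precision = tp / (tp + fp), recall = tp / (tp + fn), F1 = 2PR / (P + R). The gold and predicted span sets are hypothetical, built for the sentence I saw the boy with the telescope, where the parser chose the wrong attachment for with the telescope.

```python
# (label, start, end): one constituent covering word positions start..end-1.
Span = tuple[str, int, int]


def evaluate(gold: set[Span], predicted: set[Span]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted constituents against gold ones."""
    tp = len(gold & predicted)   # in both trees
    fp = len(predicted - gold)   # proposed by the parser, absent from the gold tree
    fn = len(gold - predicted)   # in the gold tree, missed by the parser
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical spans for "I saw the boy with the telescope" (word positions 0..6).
GOLD = {("S", 0, 7), ("NP", 0, 1), ("VP", 1, 7), ("NP", 2, 7),  # the boy has the telescope
        ("NP", 2, 4), ("PP", 4, 7), ("NP", 5, 7)}
PREDICTED = {("S", 0, 7), ("NP", 0, 1), ("VP", 1, 7), ("VP", 1, 4),  # telescope used to see
             ("NP", 2, 4), ("PP", 4, 7), ("NP", 5, 7)}

if __name__ == "__main__":
    print(evaluate(GOLD, PREDICTED))  # all three scores are 6/7, about 0.857
```

Six of the seven constituents agree; the one wrong span (VP over saw the boy) is a false positive, and the missed span (NP over the boy with the telescope) is a false negative.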

Page 19: Natural Language Processing: Parsing
Page 20: Natural Language Processing: Parsing
Page 21: Natural Language Processing: Parsing

Evaluation of Parsing

Page 22: Natural Language Processing: Parsing

However…

http://www.cafepress.com/barrysworld/1486105

Page 23: Natural Language Processing: Parsing

And…

Page 24: Natural Language Processing: Parsing

Ambiguity

• There are two types of ambiguity:

1. Lexical ambiguity: the sentence contains an idiom, word, or term that has more than one meaning. For example, glasses means both drinking glasses and spectacles.

Page 25: Natural Language Processing: Parsing

Ambiguity

2. Structural ambiguity: the sentence has more than one syntactic tree. Example: I saw the boy with the telescope

Did you see the boy through the telescope? Or did you see the boy who had the telescope?

Page 26: Natural Language Processing: Parsing

Structural Ambiguity
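Structural ambiguity shows up directly in a chart parser, where the number of distinct trees for a sentence can be counted. The sketch below is a CYK parser that counts parses instead of only accepting or rejecting. It assumes a small hypothetical grammar in Chomsky normal form (every rule has the form A -> B C, plus word-to-category entries) in which a prepositional phrase may attach either to the verb phrase or to the noun phrase.

```python
# Hypothetical grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attached to the verb phrase: saw ... with the telescope
    ("NP", "NP", "PP"),  # PP attached to the noun phrase: the boy with the telescope
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
# Hypothetical lexicon: word -> possible categories.
LEXICON: dict[str, list[str]] = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "boy": ["N"], "with": ["P"], "telescope": ["N"],
}


def count_parses(sentence: str, start: str = "S") -> int:
    """Count the distinct trees rooted in `start` that span the whole sentence (CYK)."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j][A] = number of trees rooted in category A that cover words[i:j]
    chart: list[list[dict[str, int]]] = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[i][i + 1][category] = chart[i][i + 1].get(category, 0) + 1
    for length in range(2, n + 1):        # build longer spans from shorter ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):     # every split point: words[i:k] + words[k:j]
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][k].get(left, 0) * chart[k][j].get(right, 0)
                    if ways:
                        chart[i][j][parent] = chart[i][j].get(parent, 0) + ways
    return chart[0][n].get(start, 0)


if __name__ == "__main__":
    print(count_parses("I saw the boy with the telescope"))  # 2
```

With this grammar the sentence gets 2 trees, one per reading. Removing either of the two PP rules removes one reading, and the count drops to 1.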

Page 27: Natural Language Processing: Parsing

Ambiguity

• Which of the following examples contain lexical ambiguity and which contain structural ambiguity? Justify your answers.

1. The painter put on another coat
2. We like flying planes
3. Visiting relatives can be tiresome

Page 28: Natural Language Processing: Parsing

Ambiguity

• He wrote the note yesterday
• You mean you carried the information by a bus?
• Connecting wires are tiring in electronics lab
• Squad helps dog bite victim

Page 29: Natural Language Processing: Parsing

Word Sense

• Most lexical ambiguity arises from differences in word sense.

• Word senses vary due to several factors:
  – Synonymy
  – Antonymy
  – Homonymy
  – Polysemy
  – Heteronymy

Page 30: Natural Language Processing: Parsing

Synonymy

• Synonyms are different words (or sometimes phrases) with identical or very similar meanings.

• Words that are synonyms are said to be synonymous, and the state of being a synonym is called synonymy

Page 31: Natural Language Processing: Parsing

Synonymy

• student and pupil (noun)
• buy and purchase (verb)
• sick and ill (adjective)
• quickly and speedily (adverb)
• on and upon (preposition)

Page 32: Natural Language Processing: Parsing

Synonymy is a relation between senses rather than words

• Note that synonyms are defined with respect to certain senses of words

• pupil as the "aperture in the iris of the eye" is not synonymous with student.

• Similarly, he expired means the same as he died, yet my passport has expired cannot be replaced by my passport has died.

Page 33: Natural Language Processing: Parsing

Synonymy is a relation between senses rather than words

• Consider the words big and large. Are they synonyms?

  – How big is the plane?
  – Are we travelling with a large or small plane?

• What about:
  – Mrs Benjamin became a big sister to him
  – Mrs Benjamin became a large sister to him

Page 34: Natural Language Processing: Parsing

Heteronymy

• Heteronyms (also known as heterophones) are words with identical spellings (or characters) but different pronunciations and meanings, e.g., bass (a fish) vs bass (a low-pitched sound).

Page 35: Natural Language Processing: Parsing

Antonymy

• Antonyms are words with opposite or nearly opposite meanings.

• short and tall
• dead and alive
• increase and decrease

Page 36: Natural Language Processing: Parsing

Homonymy

• A homonym is one of a group of words that share the same spelling or pronunciation but have distinct meanings

• Bank (financial institution) vs Bank (sloping land)
• Bat (a club for hitting the ball) vs Bat (mammal)

• Homographs (Bank/Bank, Bat/Bat)
• Homophones (Right/Write, Piece/Peace)

Page 37: Natural Language Processing: Parsing

Polysemy

• Homonymous words whose senses are related to each other
  – The bank was constructed in 1971 (building belonging to a financial institution)
  – I draw money from the bank (financial institution)

Page 38: Natural Language Processing: Parsing

Hypernymy and Hyponymy

• Superclass-subclass structure
  – Car is a hypernym of Honda
  – Honda is a hyponym of Car
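A superclass-subclass structure can be stored as a mapping from each word to its immediate hypernym and then walked upwards. The sketch below uses a hypothetical hand-made chain: only Honda and car come from the slide, while motor vehicle and vehicle are illustrative additions. A real system would take such links from a resource like WordNet, which NLTK exposes through Synset.hypernyms().

```python
from typing import Optional

# Hypothetical hand-made hierarchy: word -> its immediate hypernym (superclass).
HYPERNYM_OF: dict[str, str] = {
    "honda": "car",
    "car": "motor vehicle",
    "motor vehicle": "vehicle",
}


def hypernym_chain(word: str) -> list[str]:
    """Return all hypernyms of `word`, from the most specific to the most general."""
    chain = []
    current: Optional[str] = HYPERNYM_OF.get(word.lower())
    while current is not None:
        chain.append(current)
        current = HYPERNYM_OF.get(current)
    return chain


def is_hyponym_of(word: str, other: str) -> bool:
    """True if `other` appears somewhere above `word` in the hierarchy."""
    return other.lower() in hypernym_chain(word)


if __name__ == "__main__":
    print(hypernym_chain("Honda"))        # ['car', 'motor vehicle', 'vehicle']
    print(is_hyponym_of("Honda", "Car"))  # True
    print(is_hyponym_of("Car", "Honda"))  # False
```

The relation is directional: Honda is a hyponym of car, but not the other way round.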

Page 39: Natural Language Processing: Parsing

Zeugma Test

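• A test to see whether two uses of the same word have the same sense
  – Which flights serve breakfast?
  – Does Lufthansa serve Philadelphia?

• Simply make a conjunction:
  – Does Lufthansa serve breakfast and Philadelphia?

• The conjunction sounds odd (a zeugma), which indicates that the two uses of serve have different senses.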

Page 40: Natural Language Processing: Parsing

WordNet 3.0

• A hierarchically organized lexical database
• On-line thesaurus + aspects of a dictionary

• Some other languages available or under development
  – (Arabic, Finnish, German, Portuguese…)

Category     Unique Strings
Noun         117,798
Verb         11,529
Adjective    22,479
Adverb       4,481

Page 41: Natural Language Processing: Parsing

Senses of “bass” in Wordnet

Page 42: Natural Language Processing: Parsing

WordNet Hypernym Hierarchy for “bass”

Page 43: Natural Language Processing: Parsing

WordNet Noun Relations

Page 44: Natural Language Processing: Parsing

WordNet 3.0

• Where it is:
  – http://wordnetweb.princeton.edu/perl/webwn

• Libraries:
  – Python: WordNet from NLTK (http://www.nltk.org/Home)
  – Java: JWNL, extJWNL on SourceForge

Page 45: Natural Language Processing: Parsing

Difficulties with Natural Language: Anaphora

• Using pronouns to refer back to entities already introduced in the text

– After Mary proposed to John, they found a preacher and got married. For the honeymoon, they went to Hawaii

– Mary saw a ring through the window and asked John for it

– Mary threw a rock at the window and broke it
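A deliberately naive heuristic shows why anaphora is hard: resolve it to the nearest preceding noun that denotes a thing. The sketch below assumes a hypothetical hand-tagged set of inanimate nouns (INANIMATE_NOUNS) covering the two it sentences above. It is an illustration of the difficulty, not a real coreference method.

```python
from typing import Optional

# Hypothetical hand-tagged inanimate nouns for the example sentences.
INANIMATE_NOUNS = {"ring", "window", "rock"}


def naive_antecedent(sentence: str, pronoun: str = "it") -> Optional[str]:
    """Resolve `pronoun` to the nearest preceding inanimate noun (recency heuristic)."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    if pronoun not in words:
        return None
    position = words.index(pronoun)
    for word in reversed(words[:position]):  # scan backwards from the pronoun
        if word in INANIMATE_NOUNS:
            return word
    return None


if __name__ == "__main__":
    print(naive_antecedent("Mary saw a ring through the window and asked John for it"))
    print(naive_antecedent("Mary threw a rock at the window and broke it"))
```

The heuristic answers window for both sentences. That is right for the rock sentence, where the window is what broke, but wrong for the ring sentence, where John was asked for the ring. Telling the two apart needs knowledge about the world, not just word order.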

Page 46: Natural Language Processing: Parsing

Difficulties with Natural Language: Indexicality

• Indexical sentences refer to the utterance situation (place, time, etc.)

  – I am over here
  – Why did you do that?

Page 47: Natural Language Processing: Parsing

Difficulties with Natural Language: Metonymy

• Using one noun phrase to stand for another

  – I've read Shakespeare
  – Chrysler announced record profits
  – The ham sandwich on Table 4 wants another beer

Page 48: Natural Language Processing: Parsing

Difficulties with Natural Language: Metaphor

• “Non-literal” usage of words and phrases, often systematic.

– I've tried killing the process but it won't die. Its parent keeps it alive.

Page 49: Natural Language Processing: Parsing

Summary

• The components of a language
  – Lexicon
  – Categorization
  – Grammar rules

• Syntactic Tree
• Label Bracketing
• Evaluation of Parsing
• Word sense
• Problem of Parsing