CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 23, 24– Parsing Algorithms; Parsing in case of Ambiguity; Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 8 th , 10 th March, 2011 (Lectures 21 and 22 were on Sentiment Analysis by Aditya Joshi)

Page 1: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 23, 24– Parsing Algorithms; Parsing in case of Ambiguity; Probabilistic Parsing)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

8th, 10th March, 2011

(Lectures 21 and 22 were on Sentiment Analysis by Aditya Joshi)

Page 2: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

A note on Language Modeling

Example sentence: “^ The tortoise beat the hare in the race.”

The slide compares five models, which it groups as guided by frequency, guided by language knowledge, and guided by world knowledge.

N-gram (n=3):
  ^ the tortoise       5 × 10^-3
  the tortoise beat    3 × 10^-2
  tortoise beat the    7 × 10^-5
  beat the hare        5 × 10^-6

CFG:
  S -> NP VP
  NP -> DT N
  VP -> V NP PP
  PP -> P NP

Probabilistic CFG:
  S -> NP VP       1.0
  NP -> DT N       0.5
  VP -> V NP PP    0.4
  PP -> P NP       1.0

Dependency Grammar: Semantic Roles agt, obj, scn, etc. Semantic Rules are always between “Heads”.

Prob. DG: Semantic Roles with probabilities.

Page 3: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Parse Tree

(S (NP (DT The) (N tortoise))
   (VP (V beat)
       (NP (DT the) (N hare))
       (PP (P in) (NP (DT the) (N race)))))

Page 4: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

UNL Expression

agt(beat@past, tortoise@def)
obj(beat@past, hare@def)
scn(beat@past, race@def)

(scn = scene)

Page 5: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Purpose of LM
• Prediction of next word (Speech Processing)
• Language Identification (for same script)
• Belongingness check (parsing)
• P(NP -> DT N) means: what is the probability that the ‘YIELD’ of the non-terminal NP is DT N

Page 6: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Need for Deep Parsing
• Sentences are linear structures
• But there is a hierarchy (a tree) hidden behind the linear structure
• There are constituents and branches

Page 7: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

PPs are at the same level: flat with respect to the head word “book”

[The big book of poems with the blue cover] is on the table.

[NP The [AP big] book [PP of poems] [PP with the blue cover]]

No distinction in terms of dominance or c-command

Page 8: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

“Constituency test of Replacement” runs into problems
• One-replacement: I bought the big [book of poems with the blue cover] not the small [one]
  One-replacement targets “book of poems with the blue cover”
• Another one-replacement: I bought the big [book of poems] with the blue cover not the small [one] with the red cover
  One-replacement targets “book of poems”

Page 9: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

More deeply embedded structure

[NP The [N’1 [AP big] [N’2 [N’3 [N book] [PP of poems]] [PP with the blue cover]]]]

Page 10: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Grammar and Parsing Algorithms

Page 11: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

A simplified grammar
S -> NP VP
NP -> DT N | N
VP -> V ADV | V

Page 12: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Example Sentence

1 People 2 laugh 3      (the numbers are positions)

Lexicon:
People - N, V
Laugh - N, V

This indicates that both Noun and Verb are possible for the word “People”.

Page 13: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Top-Down Parsing

     State              Backup State     Action
----------------------------------------------------------
1.   ((S) 1)            -                -
2.   ((NP VP) 1)        -                -
3a.  ((DT N VP) 1)      ((N VP) 1)       -
3b.  ((N VP) 1)         -                -
4.   ((VP) 2)           -                Consume “People”
5a.  ((V ADV) 2)        ((V) 2)          -
6.   ((ADV) 3)          ((V) 2)          Consume “laugh”
5b.  ((V) 2)            -                -
6.   ((.) 3)            -                Consume “laugh”

The number in each state is the position of the input pointer.
Termination Condition: All inputs over. No symbols remaining.
Note: Input symbols can be pushed back.
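The trace above can be reproduced with a small backtracking recognizer. This is only a sketch under assumptions: the function name `top_down_parse` and the explicit stack of backup states are illustrative choices, not something given on the slides. A state is a pair (remaining symbols, input position), exactly as in the table.

```python
# Toy grammar and lexicon from the slides; alternatives are tried in textual order.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"], ["N"]],
    "VP": [["V", "ADV"], ["V"]],
}
LEXICON = {"people": {"N", "V"}, "laugh": {"N", "V"}}


def top_down_parse(words):
    """Return True if `words` can be derived from S (hypothetical helper)."""
    # A state is (remaining symbols, input position); positions start at 1.
    agenda = [(("S",), 1)]
    while agenda:
        symbols, pos = agenda.pop()
        if not symbols:
            if pos == len(words) + 1:  # no symbols and no input left
                return True
            continue  # symbols exhausted but input remains: dead end
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:
            # Expand the leftmost non-terminal. Alternatives are pushed in
            # reverse so the first rule is tried first and the others wait
            # on the stack as backup states.
            for rhs in reversed(GRAMMAR[first]):
                agenda.append((tuple(rhs) + rest, pos))
        elif pos <= len(words) and first in LEXICON.get(words[pos - 1].lower(), set()):
            agenda.append((rest, pos + 1))  # consume one word
        # Otherwise the state fails and the next backup state is popped.
    return False


print(top_down_parse(["People", "laugh"]))  # True
```

For “People laugh” the states are visited in the same order as rows 1 to 6 of the table: the DT alternative fails on “People”, and the V ADV alternative fails at position 3 before the backup state ((V) 2) succeeds.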

Page 14: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Discussion for Top-Down Parsing
• This kind of searching is goal driven.
• Gives importance to textual precedence (rule precedence).
• No regard for data, a priori (useless expansions made).

Page 15: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Bottom-Up Parsing

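Some conventions:

N12: the subscripts represent positions (this N spans positions 1 to 2).

S1? -> NP12 ° VP2?: a “?” marks an end position that is still unknown. The work to the left of ° is done, while the work to the right of ° is remaining.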

Page 16: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Bottom-Up Parsing (pictorial representation)

1 People 2 Laugh 3

N12    N23
V12    V23

NP12 -> N12 °
NP23 -> N23 °
VP12 -> V12 °
VP23 -> V23 °
S1? -> NP12 ° VP2?
S13 -> NP12 VP23 °

Page 17: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Problem with Top-Down Parsing
• Left Recursion
• Suppose you have a rule A -> A B. Then the expansion proceeds as follows:
• ((A) K) -> ((A B) K) -> ((A B B) K) ...

Page 18: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Combining top-down and bottom-up strategies

Page 19: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Top-Down Bottom-Up Chart Parsing
• Combines advantages of top-down & bottom-up parsing.
• Does not work in case of left recursion.
• e.g. “People laugh”
  People - noun, verb
  Laugh - noun, verb
• Grammar:
  S -> NP VP
  NP -> DT N | N
  VP -> V ADV | V

Page 20: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Transitive Closure

1 People 2 laugh 3

Position 1:  S -> ° NP VP,  NP -> ° DT N,  NP -> ° N
Position 2:  NP -> N °,  S -> NP ° VP,  VP -> ° V ADV,  VP -> ° V
Position 3:  VP -> V °,  S -> NP VP °  (success)

Page 21: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Arcs in Parsing
• Each arc represents a chart which records
  • Completed work (left of °)
  • Expected work (right of °)

Page 22: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

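Example

1 People 2 laugh 3 loudly 4

Position 1:  S -> ° NP VP,  NP -> ° DT N,  NP -> ° N
Position 2:  NP -> N °,  S -> NP ° VP,  VP -> ° V ADV,  VP -> ° V
Position 3:  VP -> V °,  VP -> V ° ADV,  S -> NP VP °
Position 4:  VP -> V ADV °,  S -> NP VP °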
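A chart of dotted arcs like the one in this example can be built with an Earley-style recognizer, which mixes top-down prediction with bottom-up completion. The sketch below is an assumption-laden illustration, not the lecture's own algorithm: the name `chart_recognize`, the 0-based positions, and the lexicon entry loudly = ADV are all introduced here. Each arc is stored as (lhs, rhs, dot, origin), where `dot` counts the symbols of completed work.

```python
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("DT", "N"), ("N",)],
    "VP": [("V", "ADV"), ("V",)],
}
# "loudly" as ADV is an assumption; the slides only list People and laugh.
LEXICON = {"people": {"N", "V"}, "laugh": {"N", "V"}, "loudly": {"ADV"}}


def chart_recognize(words, start="S"):
    """Earley-style recognizer: True if `words` is derivable from `start`."""
    n = len(words)
    # chart[k] holds arcs (lhs, rhs, dot, origin) that end at position k.
    chart = [set() for _ in range(n + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, rhs, 0, 0))
    for k in range(n + 1):
        agenda = list(chart[k])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new = []
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:
                    # Predict (top-down): expected work starts here.
                    new = [(k, (nxt, r, 0, k)) for r in GRAMMAR[nxt]]
                elif k < n and nxt in LEXICON.get(words[k].lower(), set()):
                    # Scan: the next word has the expected category.
                    new = [(k + 1, (lhs, rhs, dot + 1, origin))]
            else:
                # Complete (bottom-up): advance arcs that were waiting for lhs.
                new = [(k, (l2, r2, d2 + 1, o2))
                       for (l2, r2, d2, o2) in list(chart[origin])
                       if d2 < len(r2) and r2[d2] == lhs]
            for pos, arc in new:
                if arc not in chart[pos]:
                    chart[pos].add(arc)
                    if pos == k:
                        agenda.append(arc)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])


print(chart_recognize("People laugh loudly".split()))  # True
```

Because every arc is stored once per position, work that two derivations share (for example NP -> N ° over “People”) is never redone, which is the main gain over plain backtracking.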

Page 23: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Dealing With Structural Ambiguity
• Multiple parses for a sentence:
  • The man saw the boy with a telescope.
  • The man saw the mountain with a telescope.
  • The man saw the boy with the ponytail.
• At the level of syntax, all these sentences are ambiguous. But semantics can disambiguate the 2nd & 3rd sentences.

Page 24: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Prepositional Phrase (PP) Attachment Problem

V - NP1 - P - NP2 (here P means preposition)

Does NP2 attach to NP1, or does NP2 attach to V?

Page 25: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Parse Trees for a Structurally Ambiguous Sentence

Let the grammar be:
S -> NP VP
NP -> DT N | DT N PP
PP -> P NP
VP -> V NP PP | V NP

For the sentence “I saw a boy with a telescope”

Page 26: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Parse Tree - 1

(S (NP (N I))
   (VP (V saw)
       (NP (Det a) (N boy)
           (PP (P with) (NP (Det a) (N telescope))))))

Page 27: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Parse Tree - 2

(S (NP (N I))
   (VP (V saw)
       (NP (Det a) (N boy))
       (PP (P with) (NP (Det a) (N telescope)))))

Page 28: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Parsing Structural Ambiguity

Page 29: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Parsing for Structurally Ambiguous Sentences

Sentence: “I saw a boy with a telescope”

Grammar:
S -> NP VP
NP -> ART N | ART N PP | PRON
VP -> V NP PP | V NP
ART -> a | an | the
N -> boy | telescope
PRON -> I
V -> saw

Page 30: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Ambiguous Parses

Two possible parses:

• PP attached with Verb (i.e. I used a telescope to see)
  ( S ( NP ( PRON “I” ) ) ( VP ( V “saw” ) ( NP ( ART “a” ) ( N “boy” ) ) ( PP ( P “with” ) ( NP ( ART “a” ) ( N “telescope” ) ) ) ) )
• PP attached with Noun (i.e. boy had a telescope)
  ( S ( NP ( PRON “I” ) ) ( VP ( V “saw” ) ( NP ( ART “a” ) ( N “boy” ) ( PP ( P “with” ) ( NP ( ART “a” ) ( N “telescope” ) ) ) ) ) )

Page 41: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Top Down Parse

     State                 Backup State                 Action            Comments
1    ( ( S ) 1 )           −                            −                 Use S -> NP VP
2    ( ( NP VP ) 1 )       −                            −                 Use NP -> ART N | ART N PP | PRON
3    ( ( ART N VP ) 1 )    (a) ( ( ART N PP VP ) 1 )    −                 ART does not match “I”, backup state (b) used
                           (b) ( ( PRON VP ) 1 )
3B   ( ( PRON VP ) 1 )     −                            −
4    ( ( VP ) 2 )          −                            Consumed “I”
5    ( ( V NP PP ) 2 )     ( ( V NP ) 2 )               −                 Verb Attachment Rule used
6    ( ( NP PP ) 3 )       −                            Consumed “saw”
7    ( ( ART N PP ) 3 )    (a) ( ( ART N PP PP ) 3 )
                           (b) ( ( PRON PP ) 3 )
8    ( ( N PP ) 4 )        −                            Consumed “a”
9    ( ( PP ) 5 )          −                            Consumed “boy”
10   ( ( P NP ) 5 )        −                            −

Page 45: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Top Down Parse

     State                 Backup State                 Action                  Comments
1    ( ( S ) 1 )           −                            −                       Use S -> NP VP
…    …                     …                            …                       …
7    ( ( ART N PP ) 3 )    (a) ( ( ART N PP PP ) 3 )
                           (b) ( ( PRON PP ) 3 )
8    ( ( N PP ) 4 )        −                            Consumed “a”
9    ( ( PP ) 5 )          −                            Consumed “boy”
10   ( ( P NP ) 5 )        −                            −
11   ( ( NP ) 6 )          −                            Consumed “with”
12   ( ( ART N ) 6 )       (a) ( ( ART N PP ) 6 )
                           (b) ( ( PRON ) 6 )
13   ( ( N ) 7 )           −                            Consumed “a”
14   ( ( − ) 8 )           −                            Consume “telescope”     Finish Parsing

Page 46: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Top Down Parsing - Observations
• Top down parsing gave us the Verb Attachment Parse Tree (i.e., I used a telescope)
• To obtain the alternate parse tree, the backup state in step 5 will have to be invoked
• Is there an efficient way to obtain all parses?

Page 47: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Bottom Up Parse

1 I 2 saw 3 a 4 boy 5 with 6 a 7 telescope 8

Colour Scheme:
• Blue for Normal Parse
• Green for Verb Attachment Parse
• Purple for Noun Attachment Parse
• Red for Invalid Parse

Page 59: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Bottom Up Parse

1 I 2 saw 3 a 4 boy 5 with 6 a 7 telescope 8

NP12 -> PRON12
S1? -> NP12 VP2?
VP2? -> V23 NP3? PP??
VP2? -> V23 NP3?
NP35 -> ART34 N45
NP3? -> ART34 N45 PP5?
VP25 -> V23 NP35
S15 -> NP12 VP25
VP2? -> V23 NP35 PP5?
PP5? -> P56 NP6?
NP68 -> ART67 N7?
NP6? -> ART67 N78 PP8?
NP68 -> ART67 N78
PP58 -> P56 NP68
NP38 -> ART34 N45 PP58
VP28 -> V23 NP38
VP28 -> V23 NP35 PP58
S18 -> NP12 VP28

Page 60: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Bottom Up Parsing - Observations
• Both Noun Attachment and Verb Attachment Parses are obtained by simply systematically applying the rules
• Numbers in subscript help in verifying the parse and getting chunks from the parse
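To see both attachments come out of a systematic application of the rules, the following sketch enumerates every tree for each (symbol, span) pair and memoizes the result. It is an illustration under assumptions: the helper names `all_parses`, `parse` and `expand` are hypothetical, the rule PP -> P NP is borrowed from the earlier grammar slide, and the lexical entry with = P is inferred from the bracketed parses. Spans are 0-based and half-open.

```python
from functools import lru_cache

GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("ART", "N"), ("ART", "N", "PP"), ("PRON",)],
    "VP": [("V", "NP", "PP"), ("V", "NP")],
    "PP": [("P", "NP")],  # assumed: taken from the earlier grammar slide
}
LEXICON = {"a": "ART", "an": "ART", "the": "ART", "boy": "N",
           "telescope": "N", "i": "PRON", "saw": "V",
           "with": "P"}  # "with" as P is an assumption


def all_parses(words):
    """Return every bracketed parse of `words` rooted in S."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def parse(symbol, i, j):
        """All trees for `symbol` spanning words[i:j]."""
        if symbol not in GRAMMAR:  # pre-terminal category
            if j == i + 1 and LEXICON.get(words[i]) == symbol:
                return (f"({symbol} {words[i]})",)
            return ()
        trees = []
        for rhs in GRAMMAR[symbol]:
            for kids in expand(rhs, i, j):
                trees.append(f"({symbol} {' '.join(kids)})")
        return tuple(trees)

    def expand(rhs, i, j):
        """All ways for the symbol sequence `rhs` to cover words[i:j]."""
        if not rhs:
            return [()] if i == j else []
        results = []
        # Each symbol covers at least one word, so leave room for the rest.
        for k in range(i + 1, j - len(rhs) + 2):
            for head in parse(rhs[0], i, k):
                for tail in expand(rhs[1:], k, j):
                    results.append((head,) + tail)
        return results

    return list(parse("S", 0, len(words)))


for tree in all_parses("I saw a boy with a telescope".split()):
    print(tree)
```

With the rule order above, the verb-attachment tree is listed first and the noun-attachment tree second. The (i, j) span indices play the same role as the subscripts on the slides.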

Page 61: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Exercise

For the sentence “The man saw the boy with a telescope” & the grammar given previously, compare the performance of top-down, bottom-up & top-down chart parsing.

Page 62: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Start of Probabilistic Parsing

Page 63: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Example of Sentence labeling: Parsing

[S1 [S [S [VP [VB Come] [NP [NNP July]]]] [, ,] [CC and] [S [NP [DT the] [JJ IIT] [NN campus]] [VP [AUX is] [ADJP [JJ abuzz] [PP [IN with] [NP [ADJP [JJ new] [CC and] [VBG returning]] [NNS students]]]]]] [. .]]]

Page 64: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Noisy Channel Modeling

Source sentence -> Noisy Channel -> Target parse

$$T^* = \arg\max_T P(T \mid S) = \arg\max_T P(T)\,P(S \mid T) = \arg\max_T P(T)$$

since, given the parse, the sentence is completely determined and $P(S \mid T) = 1$.

Page 65: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Corpus
• A collection of text, called a corpus, is used for collecting various language data
• With annotation: more information, but manual labor intensive
• Practice: label automatically; correct manually
• The famous Brown Corpus contains 1 million tagged words.
• Switchboard: a very famous corpus with 2400 conversations, 543 speakers, many US dialects, annotated with orthography and phonetics

Page 66: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Discriminative vs. Generative Model

$$W^* = \arg\max_W P(W \mid SS)$$

• Discriminative Model: compute directly from $P(W \mid SS)$
• Generative Model: compute from $P(W)\,P(SS \mid W)$

Page 67: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Language Models
• N-grams: sequence of n consecutive words/characters
• Probabilistic / Stochastic Context Free Grammars:
  • Simple probabilistic models capable of handling recursion
  • A CFG with probabilities attached to rules
  • Rule probabilities: how likely is it that a particular rewrite rule is used?

Page 68: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

PCFGs

Why PCFGs?
• Intuitive probabilistic models for tree-structured languages
• Algorithms are extensions of HMM algorithms
• Better than the n-gram model for language modeling.

Page 69: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Formal Definition of PCFG

A PCFG consists of
• A set of terminals $\{w^k\}$, $k = 1, \ldots, V$; e.g. $\{w^k\}$ = { child, teddy, bear, played, … }
• A set of non-terminals $\{N^i\}$, $i = 1, \ldots, n$; e.g. $\{N^i\}$ = { NP, VP, DT, … }
• A designated start symbol $N^1$
• A set of rules $\{N^i \to \zeta^j\}$, where $\zeta^j$ is a sequence of terminals & non-terminals; e.g. NP -> DT NN
• A corresponding set of rule probabilities

Page 70: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

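Rule Probabilities

Rule probabilities are such that

$$\forall i \quad \sum_j P(N^i \to \zeta^j) = 1$$

where the sum runs over all right-hand sides $\zeta^j$ of the rules for $N^i$.

E.g., P(NP -> DT NN) = 0.2, P(NP -> NN) = 0.5, P(NP -> NP PP) = 0.3

P(NP -> DT NN) = 0.2 means that 20% of the NP expansions in the training data parses use the rule NP -> DT NN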

Page 71: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Probabilistic Context Free Grammars

S -> NP VP       1.0
NP -> DT NN      0.5
NP -> NNS        0.3
NP -> NP PP      0.2
PP -> P NP       1.0
VP -> VP PP      0.6
VP -> VBD NP     0.4

DT -> the        1.0
NN -> gunman     0.5
NN -> building   0.5
VBD -> sprayed   1.0
NNS -> bullets   1.0
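A quick sanity check on a grammar like this one is that the probabilities of all rules sharing a left-hand side add up to 1. The sketch below does that check; the names `RULES` and `lhs_totals` are hypothetical and the rule list is simply copied from this slide.

```python
import math
from collections import defaultdict

# (left-hand side, right-hand side, probability), copied from the slide.
RULES = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("DT", "NN"), 0.5),
    ("NP", ("NNS",), 0.3),
    ("NP", ("NP", "PP"), 0.2),
    ("PP", ("P", "NP"), 1.0),
    ("VP", ("VP", "PP"), 0.6),
    ("VP", ("VBD", "NP"), 0.4),
    ("DT", ("the",), 1.0),
    ("NN", ("gunman",), 0.5),
    ("NN", ("building",), 0.5),
    ("VBD", ("sprayed",), 1.0),
    ("NNS", ("bullets",), 1.0),
]


def lhs_totals(rules):
    """Sum the rule probabilities for each left-hand-side symbol."""
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules:
        totals[lhs] += prob
    return dict(totals)


for lhs, total in lhs_totals(RULES).items():
    status = "ok" if math.isclose(total, 1.0) else "NOT normalized"
    print(f"{lhs}: {total:.2f} {status}")
```

The comparison uses `math.isclose` rather than `==` because sums of decimal fractions are not always exact in floating point.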

Page 72: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Example Parse t1

The gunman sprayed the building with bullets.

(S 1.0
   (NP 0.5 (DT 1.0 The) (NN 0.5 gunman))
   (VP 0.6
       (VP 0.4 (VBD 1.0 sprayed)
               (NP 0.5 (DT 1.0 the) (NN 0.5 building)))
       (PP 1.0 (P 1.0 with)
               (NP 0.3 (NNS 1.0 bullets)))))

P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

Page 73: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Another Parse t2

The gunman sprayed the building with bullets.

(S 1.0
   (NP 0.5 (DT 1.0 The) (NN 0.5 gunman))
   (VP 0.4 (VBD 1.0 sprayed)
       (NP 0.2
           (NP 0.5 (DT 1.0 the) (NN 0.5 building))
           (PP 1.0 (P 1.0 with)
                   (NP 0.3 (NNS 1.0 bullets))))))

P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
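The probability of a parse is the product of the probabilities of the rules used at its nodes, which is easy to check mechanically. In the sketch below, trees are nested tuples (label, child, ...), and `RULE_PROB` and `tree_prob` are hypothetical names. The entry P -> with (1.0) does not appear in the grammar listing and is taken from the P 1.0 node in the trees.

```python
import math

RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.5,
    ("NP", ("NNS",)): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6,
    ("VP", ("VBD", "NP")): 0.4,
    ("DT", ("the",)): 1.0,
    ("NN", ("gunman",)): 0.5,
    ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0,
    ("NNS", ("bullets",)): 1.0,
    ("P", ("with",)): 1.0,  # taken from the P 1.0 node in the trees
}


def tree_prob(tree):
    """Product of the rule probabilities used in `tree`."""
    if isinstance(tree, str):  # a word contributes no factor of its own
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        prob *= tree_prob(child)
    return prob


SUBJECT = ("NP", ("DT", "the"), ("NN", "gunman"))
BUILDING = ("NP", ("DT", "the"), ("NN", "building"))
WITH_BULLETS = ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))

# t1: the PP attaches to the verb phrase; t2: the PP attaches to "the building".
T1 = ("S", SUBJECT, ("VP", ("VP", ("VBD", "sprayed"), BUILDING), WITH_BULLETS))
T2 = ("S", SUBJECT, ("VP", ("VBD", "sprayed"), ("NP", BUILDING, WITH_BULLETS)))

print(math.isclose(tree_prob(T1), 0.0045), math.isclose(tree_prob(T2), 0.0015))  # True True
```

Under this grammar the verb-attachment parse t1 is three times as probable as the noun-attachment parse t2, because VP -> VP PP (0.6) combined with VP -> VBD NP (0.4) outweighs VP -> VBD NP (0.4) combined with NP -> NP PP (0.2).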

Page 74: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Probability of a sentence

Notation:
• $w_{ab}$: the subsequence $w_a \ldots w_b$
• $N^j$ dominates $w_a \ldots w_b$, or yield($N^j$) = $w_a \ldots w_b$; e.g. NP dominates “the sweet teddy bear”

Probability of a sentence = $P(w_{1m})$:

$$P(w_{1m}) = \sum_t P(w_{1m}, t) = \sum_t P(t)\,P(w_{1m} \mid t) = \sum_{t \,:\, \mathrm{yield}(t) = w_{1m}} P(t)$$

where t is a parse tree of the sentence. If t is a parse tree for the sentence $w_{1m}$, then $P(w_{1m} \mid t) = 1$.
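As a worked instance of this sum, take the sentence “The gunman sprayed the building with bullets” under the example PCFG. The grammar admits only the two parses t1 (PP attached to the verb phrase) and t2 (PP attached to “the building”), so the sentence probability is the sum of their probabilities. Factors equal to 1.0 are omitted below, and the symbol $w_{17}$ for the seven-word sentence is introduced here only for illustration.

```latex
\begin{aligned}
P(t_1) &= 0.5 \cdot 0.5 \cdot 0.6 \cdot 0.4 \cdot 0.5 \cdot 0.5 \cdot 0.3 = 0.0045 \\
P(t_2) &= 0.5 \cdot 0.5 \cdot 0.4 \cdot 0.2 \cdot 0.5 \cdot 0.5 \cdot 0.3 = 0.0015 \\
P(w_{17}) &= P(t_1) + P(t_2) = 0.0045 + 0.0015 = 0.006
\end{aligned}
```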

Page 75: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Assumptions of the PCFG model

(S (NP The child) (VP … (NP The toy)))
Location 1 is the NP “The child”; location 2 is the NP “The toy”.

• Place invariance: P(NP -> DT NN) is the same in locations 1 and 2
• Context-free: P(NP -> DT NN | anything outside “The child”) = P(NP -> DT NN)
• Ancestor free: at 2, P(NP -> DT NN | its ancestor is VP) = P(NP -> DT NN)

Page 76: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Probability of a parse tree
• Domination: we say $N^j$ dominates from k to l, symbolized as $N^j_{k,l}$, if $W_{k,l}$ is derived from $N^j$
• P(tree | sentence) = P(tree | $S_{1,l}$), where $S_{1,l}$ means that the start symbol S dominates the word sequence $W_{1,l}$
• P(t | s) approximately equals the joint probability of the constituent non-terminals dominating the sentence fragments (next slide)

Page 77: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

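Probability of a parse tree (cont.)

Tree: (S1,l (NP1,2 (DT1,1 w1) (N2,2 w2)) (VP3,l (V3,3 w3) (PP4,l (P4,4 w4) (NP5,l w5 … wl))))

$$\begin{aligned}
P(t \mid s) &= P(t \mid S_{1,l}) \\
&= P(NP_{1,2}, DT_{1,1}, w_1, N_{2,2}, w_2, VP_{3,l}, V_{3,3}, w_3, PP_{4,l}, P_{4,4}, w_4, NP_{5,l}, w_{5 \ldots l} \mid S_{1,l}) \\
&= P(NP_{1,2}, VP_{3,l} \mid S_{1,l}) \cdot P(DT_{1,1}, N_{2,2} \mid NP_{1,2}) \cdot P(w_1 \mid DT_{1,1}) \cdot P(w_2 \mid N_{2,2}) \\
&\quad \cdot P(V_{3,3}, PP_{4,l} \mid VP_{3,l}) \cdot P(w_3 \mid V_{3,3}) \cdot P(P_{4,4}, NP_{5,l} \mid PP_{4,l}) \cdot P(w_4 \mid P_{4,4}) \cdot P(w_{5 \ldots l} \mid NP_{5,l})
\end{aligned}$$

(Using Chain Rule, Context Freeness and Ancestor Freeness)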

Page 78: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

HMM ↔ PCFG
• O (observed sequence) ↔ $w_{1m}$ (sentence)
• X (state sequence) ↔ t (parse tree)
• μ (model) ↔ G (grammar)

Three fundamental questions

Page 79: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

HMM ↔ PCFG
• How likely is a certain observation given the model (μ)? ↔ How likely is a sentence given the grammar?
  $P(O \mid \mu)$ ↔ $P(w_{1m} \mid G)$
• How to choose a state sequence which best explains the observations? ↔ How to choose a parse which best supports the sentence?
  $\arg\max_X P(X \mid O, \mu)$ ↔ $\arg\max_t P(t \mid w_{1m}, G)$

Page 80: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

HMM ↔ PCFG
• How to choose the model parameters (μ) that best explain the observed data? ↔ How to choose rule probabilities which maximize the probabilities of the observed sentences?
  $\arg\max_\mu P(O \mid \mu)$ ↔ $\arg\max_G P(w_{1m} \mid G)$

Page 81: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Interesting Probabilities

The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

• Inside Probabilities: what is the probability of having an NP at this position such that it will derive “the building”? This is $\beta_{NP}(4,5)$.
• Outside Probabilities: what is the probability of starting from $N^1$ and deriving “The gunman sprayed”, an NP and “with bullets”? This is $\alpha_{NP}(4,5)$.

Page 82: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Interesting Probabilities
• Random variables to be considered
  • The non-terminal being expanded. E.g., NP
  • The word-span covered by the non-terminal. E.g., (4,5) refers to words “the building”
• While calculating probabilities, consider:
  • The rule to be used for expansion: e.g., NP -> DT NN
  • The probabilities associated with the RHS non-terminals: e.g., the DT subtree’s inside/outside probabilities & the NN subtree’s inside/outside probabilities

Page 83: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

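Outside Probability

$\alpha_j(p,q)$: the probability of beginning with $N^1$ & generating the non-terminal $N^j_{pq}$ and all words outside $w_p \ldots w_q$

$$\alpha_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} \mid G)$$

[Figure: $N^1$ spans $w_1 \ldots w_m$; $N^j$ spans $w_p \ldots w_q$, with $w_1 \ldots w_{p-1}$ to its left and $w_{q+1} \ldots w_m$ to its right]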

Page 84: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

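Inside Probabilities

$\beta_j(p,q)$: the probability of generating the words $w_p \ldots w_q$ starting with the non-terminal $N^j_{pq}$

$$\beta_j(p,q) = P(w_{pq} \mid N^j_{pq}, G)$$

[Figure: $N^1$ spans $w_1 \ldots w_m$; $N^j$ spans $w_p \ldots w_q$, with $w_1 \ldots w_{p-1}$ to its left and $w_{q+1} \ldots w_m$ to its right]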

Page 85: Pushpak Bhattacharyya CSE Dept.,  IIT  Bombay  8 th , 10 th  March , 2011

Outside & Inside Probabilities: example

The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

$$\alpha_{NP}(4,5) \text{ for “the building”} = P(\text{The gunman sprayed},\ NP_{4,5},\ \text{with bullets} \mid G)$$

$$\beta_{NP}(4,5) \text{ for “the building”} = P(\text{the building} \mid NP_{4,5}, G)$$
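The inside probability in this example can be computed bottom-up from the example PCFG. The following is a minimal sketch, not a method given on the slides: it is a memoized recursion over spans that handles binary rules, the single unary rule NP -> NNS, and lexical rules. The names `make_inside` and `beta` are hypothetical, and P -> with (1.0) is taken from the parse trees rather than the grammar listing. It does not compute outside probabilities.

```python
from functools import lru_cache

BINARY = {  # (parent, left child, right child): probability
    ("S", "NP", "VP"): 1.0,
    ("NP", "DT", "NN"): 0.5,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
    ("VP", "VP", "PP"): 0.6,
    ("VP", "VBD", "NP"): 0.4,
}
UNARY = {("NP", "NNS"): 0.3}  # (parent, child): probability
LEXICAL = {  # (category, word): probability
    ("DT", "the"): 1.0, ("NN", "gunman"): 0.5, ("NN", "building"): 0.5,
    ("VBD", "sprayed"): 1.0, ("NNS", "bullets"): 1.0,
    ("P", "with"): 1.0,  # taken from the P 1.0 node in the parse trees
}


def make_inside(words):
    """Return beta(symbol, p, q) for `words`; p and q are 1-based, inclusive."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def beta(symbol, p, q):
        total = 0.0
        if p == q:  # a single word: use the lexical rule, if any
            total += LEXICAL.get((symbol, words[p - 1]), 0.0)
        for (parent, child), prob in UNARY.items():
            if parent == symbol:
                total += prob * beta(child, p, q)
        for (parent, left, right), prob in BINARY.items():
            if parent == symbol:
                for d in range(p, q):  # left child covers p..d, right d+1..q
                    total += prob * beta(left, p, d) * beta(right, d + 1, q)
        return total

    return beta


beta = make_inside("The gunman sprayed the building with bullets".split())
print(round(beta("NP", 4, 5), 6))  # 0.25
print(round(beta("S", 1, 7), 6))   # 0.006
```

Here beta("NP", 4, 5) is the inside probability of “the building”, namely 0.5 * 1.0 * 0.5 for NP -> DT NN, DT -> the and NN -> building. The value beta("S", 1, 7) sums over both parses of the whole sentence, so it equals P(t1) + P(t2).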