
Parsing with PCFG

Ling 571

Fei Xia

Week 3: 10/11-10/13/05

Outline

• Misc

• CYK algorithm

• Converting CFG into CNF

• PCFG

• Lexicalized PCFG

Misc

• Quiz 1: 15 pts, due 10/13

• Hw2: 10 pts, due 10/13, ling580i_au05@u, ling580e_au05@u

• Treehouse weekly meeting:
  – Time: every Wed 2:30-3:30pm; tomorrow is the 1st meeting
  – Location: EE1 025 (Campus map 12-N, South of MGH)
  – Mailing list: cl-announce@u

• Others:
  – Pongo policies
  – Machines: LLC, Parrington, Treehouse
  – Linux commands: ssh, sftp, …
  – Catalyst tools: ESubmit, EPost, …

CYK algorithm

Parsing algorithms

• Top-down

• Bottom-up

• Top-down with bottom-up filtering

• Earley algorithm

• CYK algorithm

• ....

CYK algorithm

• Cocke-Younger-Kasami algorithm (a.k.a. CKY algorithm)

• Requires the CFG to be in Chomsky Normal Form (CNF).

• Bottom-up chart parsing algorithm using DP.

• Fills in a two-dimensional array: C[i][j] contains all the possible syntactic interpretations of the substring w_i ... w_j.

• Complexity: O(N^3)

Chomsky normal form (CNF)

• Definition of CNF:
  – A → B C
  – A → a
  – S → ε

A, B, C are non-terminals; a is a terminal.

S is the start symbol; B and C are not.

• For every CFG, there is a CFG in CNF that is weakly equivalent.

CYK algorithm

• For every rule A → w_i, Chart[i][i][A] = true
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span - 1
      for m = begin to end - 1
        for all non-terminals A, B, C:
          if Chart[begin][m][B] && Chart[m+1][end][C] && ((A → B C) ∈ P) then
            Chart[begin][end][A] = true
            B[begin][end][A] = (m, B, C)

CYK algorithm (another way)

• For every rule A → w_i, add it to Cell[i][i]
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span - 1
      for m = begin to end - 1
        for all non-terminals A, B, C:
          if Cell[begin][m] contains B, Cell[m+1][end] contains C,
          and A → B C is a rule in the grammar,
          then add A → B C to Cell[begin][end] and remember m
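The same procedure as a runnable Python sketch (the grammar encoding, the 1-based cell indexing, and the names cyk_recognize/lexical/binary are illustrative choices, not from the slides):

    from collections import defaultdict

    def cyk_recognize(words, lexical, binary):
        """CYK recognizer for a CNF grammar.
        lexical: set of (A, w) pairs for rules A -> w
        binary:  set of (A, B, C) triples for rules A -> B C
        Returns chart[(begin, end)] = set of non-terminals covering that span
        (1-based, inclusive positions, as on the slides) and backpointers."""
        n = len(words)
        chart = defaultdict(set)
        back = {}                          # back[(begin, end, A)] = (m, B, C)
        for i, w in enumerate(words, start=1):
            for (A, word) in lexical:
                if word == w:
                    chart[(i, i)].add(A)
        for span in range(2, n + 1):
            for begin in range(1, n - span + 2):
                end = begin + span - 1
                for m in range(begin, end):
                    for (A, B, C) in binary:
                        if B in chart[(begin, m)] and C in chart[(m + 1, end)]:
                            chart[(begin, end)].add(A)
                            back[(begin, end, A)] = (m, B, C)
        return chart, back

    # usage with a fragment of the example grammar below
    lexical = {("V", "book"), ("N", "book"), ("Det", "that"), ("N", "flight")}
    binary = {("NP", "Det", "N"), ("VP", "V", "NP")}
    chart, back = cyk_recognize("book that flight".split(), lexical, binary)
    print("VP" in chart[(1, 3)])   # True: a VP covers the whole string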

An example

Rules:

  VP → V NP     V → book
  VP → VP PP    N → book/flight/cards
  NP → Det N    Det → that/the
  NP → NP PP    P → with
  PP → P NP

Parse “book that flight”: C1[begin][end]

  C1[1][1]: N → book, V → book
  C1[2][2]: Det → that
  C1[3][3]: N → flight
  C1[1][2]: (empty)
  C1[2][3]: NP → Det N (m=2)
  C1[1][3]: VP → V NP (m=1)

Parse “book that flight”: C2[begin][span]

  C2[1][1]: N → book, V → book
  C2[2][1]: Det → that
  C2[3][1]: N → flight
  C2[1][2]: (empty)
  C2[2][2]: NP → Det N (m=2)
  C2[1][3]: VP → V NP (m=1)

Data structures for the chart

(1) Chart[begin][end][A]: boolean;  B[begin][end][A] = (m, B, C)

(2) Cell[begin][end] = {(A → B C, m)}

(3) Cell[begin][span] = {(A → B C, m)}

(4) Chart[begin][end][A] = (bool, m, B, C)
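Option (2), for instance, maps naturally onto a dictionary of cells; a minimal sketch with illustrative names:

    # Option (2): one dict per (begin, end) cell, holding the rules that
    # built each constituent together with the split point m.
    from collections import defaultdict

    cell = defaultdict(list)            # cell[(begin, end)] = [(A, B, C, m), ...]
    cell[(2, 3)].append(("NP", "Det", "N", 2))
    cell[(1, 3)].append(("VP", "V", "NP", 1))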

Summary of CYK algorithm

• Bottom-up using DP

• Requires the CFG to be in CNF

• A very efficient algorithm

• Easy to extend

Converting CFG into CNF

Chomsky normal form (CNF)

• Definition of CNF:
  – A → B C
  – A → a
  – S → ε

where A, B, C are non-terminals, a is a terminal,

S is the start symbol, and B and C are not start symbols.

• For every CFG, there is a CFG in CNF that is weakly equivalent.

Converting CFG to CNF

(1) Add a new symbol S0, and a rule S0 → S

(so the start symbol will not appear on the rhs of any rule)

(2) Eliminate ε-rules:

for each rule A → ε (where A is not the start symbol), remove it;

for each rule B → u A v, add B → u v (deleting one occurrence of A at a time); for a rule B → A, add B → ε

unless B → ε has been previously eliminated.

Conversion (cont)

(3) Remove unit rules A → B:

for each rule B → u, add A → u

unless the latter rule was previously removed.

(4) Replace a rule A → u1 u2 ... uk where k > 2

with A → u1 A1, A1 → u2 A2, ..., A_{k-2} → u_{k-1} u_k;

replace any terminal u_i (in a rule with more than one symbol on the rhs) with a new symbol U_i

and add a new rule U_i → u_i.
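Step (4) is mechanical; a minimal Python sketch of the binarization, with made-up names (binarize, S_1, ...) for the fresh symbols:

    import itertools

    _fresh = itertools.count(1)

    def binarize(lhs, rhs, rules):
        """Replace lhs -> rhs (len(rhs) > 2) with a chain of binary rules,
        as in step (4). New non-terminal names like S_1 are an arbitrary choice."""
        current, rest = lhs, list(rhs)
        while len(rest) > 2:
            new_sym = "%s_%d" % (lhs, next(_fresh))
            rules.append((current, (rest[0], new_sym)))   # current -> u1 NewSym
            current, rest = new_sym, rest[1:]
        rules.append((current, tuple(rest)))              # final binary rule

    rules = []
    binarize("S", ["A", "S", "A"], rules)
    print(rules)   # [('S', ('A', 'S_1')), ('S_1', ('S', 'A'))]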

An example

Original grammar:

  S → ASA | aB
  A → B | S
  B → b | ε

Adding S0:

  S0 → S
  S → ASA | aB
  A → B | S
  B → b | ε

Removing ε-rules

Remove B → ε:

  S0 → S
  S → ASA | aB | a
  A → B | S | ε
  B → b

Remove A → ε:

  S0 → S
  S → ASA | aB | a | SA | AS | S
  A → B | S
  B → b

Removing unit rules

• Remove S → S:

  S0 → S
  S → ASA | aB | a | SA | AS
  A → B | S
  B → b

• Remove S0 → S:

  S0 → ASA | aB | a | SA | AS
  S → ASA | aB | a | SA | AS
  A → B | S
  B → b

Removing unit rules (cont)

• Remove A → B:

  S0 → ASA | aB | a | SA | AS
  S → ASA | aB | a | SA | AS
  A → b | S
  B → b

• Remove A → S:

  S0 → ASA | aB | a | SA | AS
  S → ASA | aB | a | SA | AS
  A → b | ASA | aB | a | SA | AS
  B → b

Converting remaining rules

  S0 → AA1 | UB | a | SA | AS
  S → AA1 | UB | a | SA | AS
  A → b | AA1 | UB | a | SA | AS
  A1 → SA
  U → a
  B → b

Summary of CFG parsing

• Simple top-down and bottom-up parsing generate useless trees.

• Top-down with bottom-up filtering has three problems.

• Solution: use DP:
  – Earley algorithm
  – CYK algorithm

Probabilistic CFG (PCFG)

PCFG

• PCFG is an extension of CFG.

• A PCFG is a 5-tuple (N, T, P, S, Pr), where Pr is a function assigning a probability to each rule in P:

  Pr(A → α) or Pr(α | A)

• Given a non-terminal A,

  Σ_{i=1..n} Pr(A → α_i) = 1

A PCFG

  S → NP VP       0.8     N → Mary      0.01
  S → Aux NP VP   0.15    N → book      0.02
  S → VP          0.05    V → bought    0.02
  VP → V          0.35    Det → a       0.04
  VP → V NP       0.45
  VP → VP PP      0.20
  NP → N          0.8
  NP → Det N      0.2
  ...

Using probabilities

• To estimate the prob of a sentence and its parse trees.

• Useful in disambiguation.

• The prob of a tree T:

  P(T) = Π_{n ∈ T} p(r(n)),

  where n is a node in T and r(n) is the rule used to expand n in T.

Computing P(T)

  S → NP VP       0.8     N → Mary      0.01
  S → Aux NP VP   0.15    N → book      0.02
  S → VP          0.05    V → bought    0.02
  VP → V          0.35    Det → a       0.04
  VP → V NP       0.45
  VP → VP PP      0.20
  NP → N          0.8
  NP → Det N      0.2

The sentence is “Mary bought a book”.
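For example, assuming the usual tree (S → NP VP, NP → N, N → Mary, VP → V NP, V → bought, NP → Det N, Det → a, N → book; the slide itself does not draw the tree), the product of the rule probabilities is

  P(T) = 0.8 * 0.8 * 0.01 * 0.45 * 0.02 * 0.2 * 0.04 * 0.02 ≈ 9.2e-9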

The most likely tree

• P(T, S) = P(T) * P(S|T) = P(T)

  (T is a parse tree, S is a sentence; P(S|T) = 1 because S is the yield of T)

• The best parse tree for a sentence S:

  T̂(S) = argmax_T P(T | S)
        = argmax_T P(T, S) / P(S)
        = argmax_T P(T, S)
        = argmax_T P(T)

Find the most likely tree

Given a PCFG and a sentence S, how do we find the best parse tree for S?

One algorithm: CYK

CYK algorithm for CFG

• For every rule A → w_i, Chart[i][i][A] = true
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span - 1
      for m = begin to end - 1
        for all non-terminals A, B, C:
          if Chart[begin][m][B] && Chart[m+1][end][C] && ((A → B C) ∈ P) then
            Chart[begin][end][A] = true
            B[begin][end][A] = (m, B, C)

CYK algorithm for CFG (another implementation)

• For every rule A → w_i, Chart[i][i][A] = ((A → w_i) ∈ P)
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span - 1
      for m = begin to end - 1
        for all non-terminals A, B, C:
          val = Chart[begin][m][B] * Chart[m+1][end][C] * ((A → B C) ∈ P)
          if val > 0 then
            Chart[begin][end][A] = val
            B[begin][end][A] = (m, B, C)

Variables for CFG and PCFG

Chart[begin][end][A]:

• CFG: whether there is a parse tree whose root is A and which covers w_begin ... w_end

• PCFG: the prob of the most likely parse tree whose root is A and which covers w_begin ... w_end

CYK algorithm for PCFG

• For every rule A → w_i, Chart[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span - 1
      for m = begin to end - 1
        for all non-terminals A, B, C:
          val = Chart[begin][m][B] * Chart[m+1][end][C] * Pr(A → B C)
          if val > Chart[begin][end][A] then
            Chart[begin][end][A] = val
            B[begin][end][A] = (m, B, C)
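A minimal Python sketch of this probabilistic (Viterbi) CYK; the grammar encoding and the names viterbi_cyk/lexical/binary are illustrative:

    from collections import defaultdict

    def viterbi_cyk(words, lexical, binary):
        """lexical[(A, w)]   = Pr(A -> w)
           binary[(A, B, C)] = Pr(A -> B C)
           Returns chart[(begin, end)][A] = prob of the best subtree, and
           back[(begin, end, A)] = (m, B, C) for recovering that subtree."""
        n = len(words)
        chart = defaultdict(lambda: defaultdict(float))
        back = {}
        for i, w in enumerate(words, start=1):
            for (A, word), p in lexical.items():
                if word == w:
                    chart[(i, i)][A] = p
        for span in range(2, n + 1):
            for begin in range(1, n - span + 2):
                end = begin + span - 1
                for m in range(begin, end):
                    for (A, B, C), p in binary.items():
                        val = chart[(begin, m)][B] * chart[(m + 1, end)][C] * p
                        if val > chart[(begin, end)][A]:
                            chart[(begin, end)][A] = val
                            back[(begin, end, A)] = (m, B, C)
        return chart, back

    # usage with a fragment of the PCFG a few slides below
    lexical = {("V", "book"): 0.001, ("N", "book"): 0.01,
               ("Det", "that"): 0.1, ("N", "flight"): 0.02}
    binary = {("VP", "V", "NP"): 0.4, ("NP", "Det", "N"): 0.3}
    chart, back = viterbi_cyk("book that flight".split(), lexical, binary)
    print(chart[(1, 3)]["VP"])   # 2.4e-07, matching the worked chart below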

A CFG

Rules:

  VP → V NP     V → book
  VP → VP PP    N → book/flight/cards
  NP → Det N    Det → that/the
  NP → NP PP    P → with
  PP → P NP

Parse “book that flight”: Chart[begin][end]

  Chart[1][1]: N → book, V → book
  Chart[2][2]: Det → that
  Chart[3][3]: N → flight
  Chart[1][2]: (empty)
  Chart[2][3]: NP → Det N (m=2)
  Chart[1][3]: VP → V NP (m=1)

A PCFG

Rules:

  VP → V NP    0.4     V → book     0.001
  VP → VP PP   0.2     N → book     0.01
  NP → Det N   0.3     Det → that   0.1
  NP → NP PP   0.2     P → with     0.2
  PP → P NP    1.0     N → flight   0.02

Parse “book that flight”: Chart[begin][end]

  Chart[1][1]: N → book 0.01, V → book 0.001
  Chart[2][2]: Det → that 0.1
  Chart[3][3]: N → flight 0.02
  Chart[1][2]: (empty)
  Chart[2][3]: NP → Det N (m=2) 6e-4
  Chart[1][3]: VP → V NP (m=1) 2.4e-7
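The two composed entries follow directly from the recurrence:

  Chart[2][3][NP] = Pr(NP → Det N) * Chart[2][2][Det] * Chart[3][3][N] = 0.3 * 0.1 * 0.02 = 6e-4
  Chart[1][3][VP] = Pr(VP → V NP) * Chart[1][1][V] * Chart[2][3][NP] = 0.4 * 0.001 * 6e-4 = 2.4e-7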

N-best parse trees

• Best parse tree:

  Chart[begin][end][A] = prob
  B[begin][end][A] = (m, B, C)

• N-best parse trees:

  Chart[begin][end][A] = [prob_1, ..., prob_N]
  B[begin][end][A] = [(m_1, B_1, C_1, i_1, j_1), ..., (m_N, B_N, C_N, i_N, j_N)]

CYK algorithm for N-best

• Initialization: Chart[begin][end][A] = [0, ..., 0], B[begin][end][A] = [(-1, -, -, -, -), ..., (-1, -, -, -, -)]
• For every rule A → w_i, Chart[i][i][A] = [Pr(A → w_i), 0, ..., 0]
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span - 1
      for m = begin to end - 1
        for all non-terminals A, B, C:
          for each p_i in Chart[begin][m][B] and p_j in Chart[m+1][end][C]:
            val = p_i * p_j * Pr(A → B C)
            if val > one of the probs in Chart[begin][end][A] then
              remove the last element of Chart[begin][end][A] and insert val into the array;
              remove the last element of B[begin][end][A] and insert (m, B, C, i, j) into B[begin][end][A]
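A sketch of the “remove the last element and insert val” step in Python; unlike the slide, this version grows the list up to length N instead of padding it with zeros (the names update_nbest/probs/backs are illustrative):

    def update_nbest(probs, backs, val, bp, N):
        """Insert (val, bp) into the parallel lists probs/backs, kept sorted
        in descending order of probability and truncated to length N."""
        for i, p in enumerate(probs):
            if val > p:
                probs.insert(i, val)
                backs.insert(i, bp)
                del probs[N:]
                del backs[N:]
                return
        if len(probs) < N:          # still room at the end of the list
            probs.append(val)
            backs.append(bp)

    probs, backs = [0.5, 0.2], [("m1",), ("m2",)]
    update_nbest(probs, backs, 0.3, ("m3",), N=3)
    print(probs)   # [0.5, 0.3, 0.2]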

PCFG for Language Modeling (LM)

• N-gram LM:

  Pr(S) = Pr(w_1 ... w_n) = Pr(w_1) * Pr(w_2 | w_1) * ... * Pr(w_n | w_1 ... w_{n-1})

• Syntax-based LM:

  Pr(S) = Σ_T Pr(S, T) = Σ_T Pr(T) * P(S | T) = Σ_{T ∈ τ(S)} Pr(T)

Calculating Pr(S)

Σ_{i,j} a_i * b_j = (Σ_i a_i) * (Σ_j b_j)

Chart[begin][end][A]:

• Parsing: the prob of the most likely parse tree
• LM: the sum of the probs of all parse trees

CYK for finding the most likely parse tree

• For every rule A → w_i, Chart[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span - 1
      for m = begin to end - 1
        for all non-terminals A, B, C:
          val = Chart[begin][m][B] * Chart[m+1][end][C] * Pr(A → B C)
          if val > Chart[begin][end][A] then
            Chart[begin][end][A] = val
            B[begin][end][A] = (m, B, C)

CYK for calculating LM

• For every rule A → w_i, Chart[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span - 1
      for m = begin to end - 1
        for all non-terminals A, B, C:
          val = Chart[begin][m][B] * Chart[m+1][end][C] * Pr(A → B C)
          Chart[begin][end][A] += val
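Only the update changes relative to the Viterbi version: a sum replaces the max, and no backpointers are needed. A minimal sketch mirroring viterbi_cyk above (illustrative names):

    from collections import defaultdict

    def inside_cyk(words, lexical, binary, start="S"):
        """Sum over all parses (inside probabilities) instead of maximizing.
        lexical[(A, w)] = Pr(A -> w); binary[(A, B, C)] = Pr(A -> B C).
        Returns Pr(words) under the PCFG, i.e. the syntax-based LM score."""
        n = len(words)
        chart = defaultdict(lambda: defaultdict(float))
        for i, w in enumerate(words, start=1):
            for (A, word), p in lexical.items():
                if word == w:
                    chart[(i, i)][A] += p
        for span in range(2, n + 1):
            for begin in range(1, n - span + 2):
                end = begin + span - 1
                for m in range(begin, end):
                    for (A, B, C), p in binary.items():
                        chart[(begin, end)][A] += (chart[(begin, m)][B]
                                                   * chart[(m + 1, end)][C] * p)
        return chart[(1, n)][start]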

CYK algorithm

  Task                      Chart[begin][end][A]              B[begin][end][A]
  One parse tree            boolean                           tuple
  All parse trees           boolean                           list of tuples
  Most likely parse tree    real number (the max prob)        tuple
  N-best parse trees        list of real numbers              list of tuples
  LM for a sentence         real number (the sum of probs)    not needed

Learning PCFG Probabilities

• Given a treebank (i.e., a set of trees), use MLE:

  P(A → α) = P(α | A) = Count(A → α) / Count(A) = Count(A → α) / Σ_β Count(A → β)

• Without treebanks: the inside-outside algorithm
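A minimal sketch of the MLE estimate, assuming each tree is given as a nested (label, [children]) structure; the tree encoding and the names count_rules/mle_probs are illustrative:

    from collections import Counter

    def count_rules(tree, rule_counts, lhs_counts):
        """tree = (label, [child_tree, ...]) for internal nodes,
        or a plain string for a leaf (a word)."""
        if isinstance(tree, str):
            return
        label, children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for c in children:
            count_rules(c, rule_counts, lhs_counts)

    def mle_probs(treebank):
        rule_counts, lhs_counts = Counter(), Counter()
        for tree in treebank:
            count_rules(tree, rule_counts, lhs_counts)
        # P(A -> alpha) = Count(A -> alpha) / Count(A)
        return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

    tree = ("S", [("NP", [("N", ["Mary"])]),
                  ("VP", [("V", ["bought"]),
                          ("NP", [("Det", ["a"]), ("N", ["book"])])])])
    probs = mle_probs([tree])
    print(probs[("NP", ("N",))])   # 0.5: NP -> N once out of two NP nodes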

Q&A

• PCFG

• CYK algorithm

Problems of PCFG

• Lack of sensitivity to structural dependency:

• Lack of sensitivity to lexical dependency:

Structural Dependency

• Each PCFG rule is assumed to be independent of other rules.

• Observation: sometimes the choice of how a node expands depends on the location of the node in the parse tree.
  – NP → Pron depends on whether the NP is a subject or an object

Lexical Dependency

Given P(NP → NP PP) > P(VP → VP PP),

should a PP always be attached to an NP?

– No: it depends on the lexical items, e.g., verbs such as “send” and prepositions such as “of” and “into”.

Solution to the problems

• Structural dependency

• Lexical dependency: lexicalized PCFG (next)

• Other, more sophisticated models

Lexicalized PCFG

Head and head child

• Each syntactic constituent is associated with a lexical head.

• Each context-free rule has a head child:
  – VP → V NP
  – NP → Det N
  – VP → VP PP
  – NP → NP PP
  – VP → to VP
  – VP → aux VP

Head propagation

• Lexical head propagates from head child to its parent.

• An example: “Mary bought a book in the store.”

Lexicalized PCFG

• Lexicalized rules:
  – VP(bought) → V(bought) NP   0.01
    (VP → V NP | 0.01 | 0 | bought -)
  – VP(bought) → V(bought) NP(book)   1.5e-7
    (VP → V NP | 1.5e-7 | 0 | bought book)

Finding head in a parse tree

• Head propagation table: simple rules to find head child

• An example:
  – (VP  left   V/VP/Aux)
  – (PP  left   P)
  – (NP  right  N)
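A sketch of head finding with such a table; the three-entry table and the name head_child are simplified illustrations, not a full head-percolation table:

    # head_table[parent] = (direction, set of preferred head-child labels)
    head_table = {
        "VP": ("left",  {"V", "VP", "Aux"}),
        "PP": ("left",  {"P"}),
        "NP": ("right", {"N"}),
    }

    def head_child(parent, children):
        """Pick the head child of a local tree parent -> children."""
        direction, preferred = head_table.get(parent, ("left", set()))
        order = children if direction == "left" else list(reversed(children))
        for child in order:
            if child in preferred:
                return child
        return order[0]        # fallback: first child in the search direction

    print(head_child("VP", ["V", "NP"]))     # V
    print(head_child("NP", ["Det", "N"]))    # N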

Simplified Model using Lexicalized PCFG

• PCFG: P(r(n)|n)

• Lexicalized PCFG: P(r(n) | n, head(n))
  – P(VP → VBD NP PP | VP, dumped)
  – P(VP → VBD NP PP | VP, slept)

• Parsers that use lexicalized rules:
  – Collins’ parser