Parsing with PCFG (LING 571, Fei Xia, Week 3: 10/11-10/13/05)
Misc
• Quiz 1: 15 pts, due 10/13
• Hw2: 10 pts, due 10/13, ling580i_au05@u, ling580e_au05@u
• Treehouse weekly meeting:
  – Time: every Wed 2:30-3:30pm; tomorrow is the 1st meeting
  – Location: EE1 025 (Campus map 12-N, South of MGH)
  – Mailing list: cl-announce@u
• Others:
  – Pongo policies
  – Machines: LLC, Parrington, Treehouse
  – Linux commands: ssh, sftp, …
  – Catalyst tools: ESubmit, EPost, …
Parsing algorithms
• Top-down
• Bottom-up
• Top-down with bottom-up filtering
• Earley algorithm
• CYK algorithm
• ....
CYK algorithm
• Cocke-Younger-Kasami algorithm (a.k.a. CKY algorithm)
• Requires the CFG to be in Chomsky Normal Form (CNF).
• Bottom-up chart parsing algorithm using DP.
• Fills in a two-dimensional array: C[i][j] contains all the possible syntactic interpretations of the substring w_i w_{i+1} ... w_j.
• Complexity: O(N^3)
Chomsky normal form (CNF)
• Definition of CNF:
  – A → B C
  – A → a
  – S → ε
  where A, B, C are non-terminals; a is a terminal;
  S is the start symbol; B and C are not.
• For every CFG, there is a CFG in CNF that is weakly equivalent.
CYK algorithm
• For every rule A → w_i, chart[i][i][A] = true
• For span = 2 to N
    for begin = 1 to N-span+1
      end = begin + span – 1
      for m = begin to end-1
        for all non-terminals A, B, C:
          if chart[begin][m][B] && chart[m+1][end][C] && (A → B C) ∈ P then
            chart[begin][end][A] = true
            B[begin][end][A] = (m, B, C)
CYK algorithm (another way)
• For every rule A → w_i, add it to Cell[i][i]
• For span = 2 to N
    for begin = 1 to N-span+1
      end = begin + span – 1
      for m = begin to end-1
        for all non-terminals A, B, C:
          if Cell[begin][m] contains B, and Cell[m+1][end] contains C,
             and A → B C is a rule in the grammar,
          then add A → B C to Cell[begin][end] and remember m
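The second formulation above can be sketched directly in Python. This is a minimal recognizer, not the lecture's code; the dict-based grammar encoding (one dict for lexical rules, one for binary rules) is an illustrative choice.

```python
# A minimal CYK recognizer following the pseudocode above.
def cyk_recognize(words, lexical, binary):
    """lexical: {word: set of A with A -> word}
       binary:  {(B, C): set of A with A -> B C}"""
    n = len(words)
    # cell[(begin, end)] = set of non-terminals covering words[begin-1:end]
    cell = {}
    for i, w in enumerate(words, start=1):
        cell[(i, i)] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            cell[(begin, end)] = set()
            for m in range(begin, end):          # split point
                for B in cell[(begin, m)]:
                    for C in cell[(m + 1, end)]:
                        cell[(begin, end)] |= binary.get((B, C), set())
    return cell

# Grammar from the example slide that follows
lexical = {"book": {"N", "V"}, "that": {"Det"}, "flight": {"N"}}
binary = {("V", "NP"): {"VP"}, ("Det", "N"): {"NP"},
          ("VP", "PP"): {"VP"}, ("NP", "PP"): {"NP"}, ("P", "NP"): {"PP"}}
chart = cyk_recognize("book that flight".split(), lexical, binary)
print("VP" in chart[(1, 3)])   # True: "book that flight" parses as a VP
```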
An example
Rules:
  VP → V NP     V → book
  VP → VP PP    N → book/flight/cards
  NP → Det N    Det → that/the
  NP → NP PP    P → with
  PP → P NP
Parse “book that flight”: C1[begin][end]
        | begin=1           | begin=2          | begin=3
  end=1 | N→book, V→book    |                  |
  end=2 | ----              | Det→that         |
  end=3 | VP→V NP (m=1)     | NP→Det N (m=2)   | N→flight
Parse “book that flight”: C2[begin][span]
         | begin=1          | begin=2          | begin=3
  span=1 | N→book, V→book   | Det→that         | N→flight
  span=2 | ----             | NP→Det N (m=2)   |
  span=3 | VP→V NP (m=1)    |                  |
Data structures for the chart
(1) chart[begin][end][A]: boolean; B[begin][end][A] = (m, B, C)
(2) Cell[begin][end] = {(A → B C, m)}
(3) Cell[begin][span] = {(A → B C, m)}
(4) chart[begin][end][A] = (bool, m, B, C)
Summary of CYK algorithm
• Bottom-up using DP
• Requires the CFG to be in CNF
• A very efficient algorithm
• Easy to extend
Chomsky normal form (CNF)
• Definition of CNF:
  – A → B C,
  – A → a,
  – S → ε,
  where A, B, C are non-terminals, a is a terminal,
  S is the start symbol, and B, C are not start symbols.
• For every CFG, there is a CFG in CNF that is weakly equivalent.
Converting CFG to CNF
(1) Add a new symbol S0, and a rule S0 → S
(so the start symbol will not appear on the rhs of any rule)
(2) Eliminate ε-rules:
  for each rule B → ε,
  for each rule A → α B β, add A → α β,
  unless A → α β has been previously eliminated.
  Ex: given A → A B and B → ε, add A → A.
Conversion (cont)
(3) Remove unit rule A → B:
  for each rule B → u, add A → u,
  unless the latter rule was previously removed.
(4) Replace a rule A → u_1 u_2 ... u_k where k > 2
  with A → u_1 A_1, A_1 → u_2 A_2, ..., A_{k-2} → u_{k-1} u_k;
  replace any terminal u_i with a new symbol U_i
  and add a new rule U_i → u_i.
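Step (4), the binarization of long rules, can be sketched as follows. The fresh-symbol naming scheme (A_1, A_2, ...) and the tuple rule representation are illustrative choices, not from the lecture.

```python
# Sketch of step (4): replace A -> u1 u2 ... uk (k > 2) with a chain of
# binary rules A -> u1 A_1, A_1 -> u2 A_2, ..., A_{k-2} -> u_{k-1} u_k.
def binarize(lhs, rhs):
    rules = []
    base = lhs
    for i in range(len(rhs) - 2):
        new = f"{base}_{i + 1}"          # fresh non-terminal A_1, A_2, ...
        rules.append((lhs, (rhs[i], new)))
        lhs = new
    rules.append((lhs, (rhs[-2], rhs[-1])))  # final binary rule
    return rules

print(binarize("A", ["u1", "u2", "u3", "u4"]))
# [('A', ('u1', 'A_1')), ('A_1', ('u2', 'A_2')), ('A_2', ('u3', 'u4'))]
```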
Removing unit rules
• Remove S → S:
    S0 → S
    S → ASA | aB | a | SA | AS
    A → B | S
    B → b
• Remove S0 → S (add S0 → ASA | aB | a | SA | AS):
    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → B | S
    B → b
Removing unit rules (cont)
• Remove A → B (add A → b, since B → b):
    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → b | S
    B → b
• Remove A → S (add A → ASA | aB | a | SA | AS):
    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → b | ASA | aB | a | SA | AS
    B → b
Summary of CFG parsing
• Simple top-down and bottom-up parsing generates useless trees.
• Top-down with bottom-up filtering has three problems.
• Solution: use DP:– Earley algorithm– CYK algorithm
PCFG
• PCFG is an extension of CFG.
• A PCFG is a 5-tuple (N, T, P, S, Pr), where Pr is a function assigning a probability to each rule in P: Pr(A → α) or Pr(α | A)
• Given a non-terminal A, Σ_i Pr(A → α_i) = 1
A PCFG
  S → NP VP      0.8     N → Mary      0.01
  S → Aux NP VP  0.15    N → book      0.02
  S → VP         0.05    V → bought    0.02
  VP → V         0.35    Det → a       0.04
  VP → V NP      0.45
  VP → VP PP     0.20
  NP → N         0.8
  NP → Det N     0.2
  ...
Using probabilities
• To estimate prob of a sentence and its parse trees.
• Useful in disambiguation.
• The prob of a tree T: P(T) = Π_{n∈T} p(r(n)), where n ranges over the nodes of T and r(n) is the rule used to expand n in T.
Computing P(T)
  S → NP VP      0.8     N → Mary      0.01
  S → Aux NP VP  0.15    N → book      0.02
  S → VP         0.05    V → bought    0.02
  VP → V         0.35    Det → a       0.04
  VP → V NP      0.45
  VP → VP PP     0.20
  NP → N         0.8
  NP → Det N     0.2
The sentence is “Mary bought a book”.
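The product P(T) = Π_{n∈T} p(r(n)) can be computed for "Mary bought a book" under the grammar above. The nested-tuple tree encoding is an illustrative choice, not the lecture's notation.

```python
# Rule probabilities copied from the grammar slide above (only the rules
# used by this tree are included).
probs = {("S", ("NP", "VP")): 0.8, ("NP", ("N",)): 0.8,
         ("NP", ("Det", "N")): 0.2, ("VP", ("V", "NP")): 0.45,
         ("N", ("Mary",)): 0.01, ("N", ("book",)): 0.02,
         ("V", ("bought",)): 0.02, ("Det", ("a",)): 0.04}

def tree_prob(tree):
    """tree = (label, [children]); a leaf child is a bare word string."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = probs[(label, rhs)]          # p(r(n)) for this node
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c)        # multiply over all internal nodes of T
    return p

t = ("S", [("NP", [("N", ["Mary"])]),
           ("VP", [("V", ["bought"]),
                   ("NP", [("Det", ["a"]), ("N", ["book"])])])])
print(tree_prob(t))  # 0.8 * 0.8 * 0.01 * 0.45 * 0.02 * 0.2 * 0.04 * 0.02
```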
The most likely tree
• P(T, S) = P(T) * P(S|T) = P(T), where T is a parse tree and S is a sentence
• The best parse tree for a sentence S:
  T̂(S) = argmax_T P(T | S)
        = argmax_T P(T, S) / P(S)
        = argmax_T P(T, S)
        = argmax_T P(T)
Find the most likely tree
Given a PCFG and a sentence, how to find the best parse tree for S?
One algorithm: CYK
CYK algorithm for CFG
• For every rule A → w_i, chart[i][i][A] = true
• For span = 2 to N
    for begin = 1 to N-span+1
      end = begin + span – 1
      for m = begin to end-1
        for all non-terminals A, B, C:
          if chart[begin][m][B] && chart[m+1][end][C] && (A → B C) ∈ P then
            chart[begin][end][A] = true
            B[begin][end][A] = (m, B, C)
CYK algorithm for CFG (another implementation)
• For every rule A → w_i, chart[i][i][A] = ((A → w_i) ∈ P)
• For span = 2 to N
    for begin = 1 to N-span+1
      end = begin + span – 1
      for m = begin to end-1
        for all non-terminals A, B, C:
          val = chart[begin][m][B] * chart[m+1][end][C] * ((A → B C) ∈ P)
          if val > 0 then
            chart[begin][end][A] = val
            B[begin][end][A] = (m, B, C)
Variables for CFG and PCFG
chart[begin][end][A]:
• CFG: whether there is a parse tree whose root is A and which covers w_begin ... w_end
• PCFG: the prob of the most likely parse tree whose root is A and which covers w_begin ... w_end
CYK algorithm for PCFG
• For every rule A → w_i, chart[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N-span+1
      end = begin + span – 1
      for m = begin to end-1
        for all non-terminals A, B, C:
          val = chart[begin][m][B] * chart[m+1][end][C] * Pr(A → B C)
          if val > chart[begin][end][A] then
            chart[begin][end][A] = val
            B[begin][end][A] = (m, B, C)
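The probabilistic CYK above can be sketched in Python: `chart` stores the best probability per (begin, end, A) and `back` stores the (m, B, C) backpointer for tree recovery. The flat-dict grammar encoding is an illustrative choice.

```python
# Viterbi CYK for a PCFG in CNF, following the pseudocode above.
def pcfg_cyk(words, lexical, binary):
    """lexical: {(A, word): prob}; binary: {(A, B, C): prob}"""
    n = len(words)
    chart, back = {}, {}
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w:
                chart[(i, i, A)] = p           # Pr(A -> w_i)
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (A, B, C), p in binary.items():
                    val = (chart.get((begin, m, B), 0.0)
                           * chart.get((m + 1, end, C), 0.0) * p)
                    if val > chart.get((begin, end, A), 0.0):
                        chart[(begin, end, A)] = val
                        back[(begin, end, A)] = (m, B, C)
    return chart, back

# The toy PCFG from the example slides below
lexical = {("V", "book"): 0.001, ("N", "book"): 0.01,
           ("Det", "that"): 0.1, ("N", "flight"): 0.02}
binary = {("VP", "V", "NP"): 0.4, ("VP", "VP", "PP"): 0.2,
          ("NP", "Det", "N"): 0.3, ("NP", "NP", "PP"): 0.2,
          ("PP", "P", "NP"): 1.0}
chart, back = pcfg_cyk("book that flight".split(), lexical, binary)
print(chart[(1, 3, "VP")])  # 0.4 * 0.001 * 6e-4 = 2.4e-7, as in the chart slide
```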
A CFG
Rules:
  VP → V NP     V → book
  VP → VP PP    N → book/flight/cards
  NP → Det N    Det → that/the
  NP → NP PP    P → with
  PP → P NP
Parse “book that flight”
        | begin=1           | begin=2          | begin=3
  end=1 | N→book, V→book    |                  |
  end=2 | ----              | Det→that         |
  end=3 | VP→V NP (m=1)     | NP→Det N (m=2)   | N→flight
A PCFG
Rules:
  VP → V NP    0.4    V → book     0.001
  VP → VP PP   0.2    N → book     0.01
  NP → Det N   0.3    Det → that   0.1
  NP → NP PP   0.2    P → with     0.2
  PP → P NP    1.0    N → flight   0.02
Parse “book that flight”
        | begin=1                     | begin=2                | begin=3
  end=1 | N→book 0.01, V→book 0.001   |                        |
  end=2 | ----                        | Det→that 0.1           |
  end=3 | VP→V NP (m=1) 2.4e-7        | NP→Det N (m=2) 6e-4    | N→flight 0.02
N-best parse trees
• Best parse tree:
  chart[begin][end][A] = prob
  B[begin][end][A] = (m, B, C)
• N-best parse trees:
  chart[begin][end][A] = [prob_1, ..., prob_N]
  B[begin][end][A] = [(m_1, B_1, C_1, i_1, j_1), ..., (m_N, B_N, C_N, i_N, j_N)]
CYK algorithm for N-best
• For every rule A → w_i, chart[i][i][A] = [Pr(A → w_i), 0, ..., 0]
  (each chart[begin][end][A] is initialized to [0, ..., 0])
• For span = 2 to N
    for begin = 1 to N-span+1
      end = begin + span – 1
      for m = begin to end-1
        for all non-terminals A, B, C:
          for each p_i in chart[begin][m][B] and p_j in chart[m+1][end][C]:
            val = p_i * p_j * Pr(A → B C)
            if val > one of the probs in chart[begin][end][A] then
              remove the last element in chart[begin][end][A] and insert val into the array;
              remove the last element in B[begin][end][A] and insert (m, B, C, i, j) into B[begin][end][A].
PCFG for Language Modeling (LM)
• N-gram LM:
  Pr(S) = Pr(w_1 ... w_m) = Pr(w_1) * Pr(w_2 | w_1) * ... * Pr(w_m | w_1 ... w_{m-1})
• Syntax-based LM:
  Pr(S) = Σ_T Pr(S, T) = Σ_T Pr(T) * P(S | T) = Σ_{T ∈ τ(S)} Pr(T)
Calculating Pr(S)
• (Σ_i a_i) * (Σ_j b_j) = Σ_{i,j} a_i * b_j
• chart[begin][end][A]:
  – Parsing: the prob of the most likely parse tree
  – LM: the sum over all parse trees
CYK for finding the most likely parse tree
• For every rule A → w_i, chart[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N-span+1
      end = begin + span – 1
      for m = begin to end-1
        for all non-terminals A, B, C:
          val = chart[begin][m][B] * chart[m+1][end][C] * Pr(A → B C)
          if val > chart[begin][end][A] then
            chart[begin][end][A] = val
            B[begin][end][A] = (m, B, C)
CYK for calculating LM
• For every rule A → w_i, chart[i][i][A] += Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N-span+1
      end = begin + span – 1
      for m = begin to end-1
        for all non-terminals A, B, C:
          val = chart[begin][m][B] * chart[m+1][end][C] * Pr(A → B C)
          chart[begin][end][A] += val
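Replacing the max with a sum in the probabilistic CYK gives the quantity the syntax-based LM needs (the inside probability of each span). A sketch, reusing the toy grammar from the earlier example slides; the grammar encoding is an illustrative choice.

```python
# Sum-over-parses CYK: chart[(begin, end, A)] accumulates the total
# probability of all parse trees of the span rooted in A.
def pcfg_inside(words, lexical, binary):
    """lexical: {(A, word): prob}; binary: {(A, B, C): prob}"""
    n = len(words)
    chart = {}
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w:
                chart[(i, i, A)] = chart.get((i, i, A), 0.0) + p
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (A, B, C), p in binary.items():
                    val = (chart.get((begin, m, B), 0.0)
                           * chart.get((m + 1, end, C), 0.0) * p)
                    chart[(begin, end, A)] = chart.get((begin, end, A), 0.0) + val
    return chart

lexical = {("V", "book"): 0.001, ("N", "book"): 0.01,
           ("Det", "that"): 0.1, ("N", "flight"): 0.02}
binary = {("VP", "V", "NP"): 0.4, ("VP", "VP", "PP"): 0.2,
          ("NP", "Det", "N"): 0.3, ("NP", "NP", "PP"): 0.2,
          ("PP", "P", "NP"): 1.0}
chart = pcfg_inside("book that flight".split(), lexical, binary)
# "book that flight" has a single parse under this grammar,
# so the sum equals the Viterbi value 2.4e-7.
print(chart[(1, 3, "VP")])
```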
CYK algorithm
                         | chart[begin][end][A]           | B[begin][end][A]
  One parse tree         | boolean                        | tuple
  All parse trees        | boolean                        | list of tuples
  Most likely parse tree | real number (the max prob)     | tuple
  N-best parse trees     | list of real numbers           | list of tuples
  LM for sentence        | real number (the sum of probs) | not needed
Learning PCFG Probabilities
• Given a treebank (i.e., a set of trees), use MLE:
  P(A → α) = Count(A → α) / Count(A) = Count(A → α) / Σ_β Count(A → β)
• Without treebanks: the inside-outside algorithm
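The MLE estimate above is just relative-frequency counting over the treebank. A sketch on a made-up two-tree "treebank", using the same (label, children) tuple encoding as earlier examples; the data is illustrative.

```python
from collections import Counter

# Estimate P(A -> alpha) = Count(A -> alpha) / Count(A) from a set of trees.
def mle_probs(trees):
    rule_count, lhs_count = Counter(), Counter()
    def walk(tree):
        label, children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_count[(label, rhs)] += 1   # Count(A -> alpha)
        lhs_count[label] += 1           # Count(A)
        for c in children:
            if not isinstance(c, str):
                walk(c)
    for t in trees:
        walk(t)
    return {rule: n / lhs_count[rule[0]] for rule, n in rule_count.items()}

trees = [("S", [("NP", ["Mary"]), ("VP", ["left"])]),
         ("S", [("NP", ["John"]), ("VP", ["left"])])]
print(mle_probs(trees)[("NP", ("Mary",))])  # 1/2 = 0.5
```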
Problems of PCFG
• Lack of sensitivity to structural dependency:
• Lack of sensitivity to lexical dependency:
Structural Dependency
• Each PCFG rule is assumed to be independent of other rules.
• Observation: sometimes the choice of how a node expands depends on the location of the node in the parse tree.
  – NP → Pron depends on whether the NP is a subject or an object.
Lexical Dependency
• Given P(NP → NP PP) > P(VP → VP PP), should a PP always be attached to an NP?
  – Verbs such as "send"
  – Preps such as "of", "into"
Solution to the problems
• Structural dependency
• Lexical dependency
Other more sophisticated models.
Head and head child
• Each syntactic constituent is associated with a lexical head.
• Each context-free rule has a head child:
  – VP → V NP
  – NP → Det N
  – VP → VP PP
  – NP → NP PP
  – VP → to VP
  – VP → aux VP
Head propagation
• Lexical head propagates from head child to its parent.
• An example: “Mary bought a book in the store.”
Lexicalized PCFG
• Lexicalized rules:
  – VP(bought) → V(bought) NP            0.01
    VP → V NP | 0.01 | 0 | bought -
  – VP(bought) → V(bought) NP(book)      1.5e-7
    VP → V NP | 1.5e-7 | 0 | bought book
Finding head in a parse tree
• Head propagation table: simple rules to find head child
• An example:– (VP left V/VP/Aux)– (PP left P)– (NP right N)
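The head-propagation table above can be applied mechanically: scan a node's children from the stated direction and return the first one whose label is in the candidate list. The (direction, candidate-list) encoding and the fallback rule are illustrative choices, not from the lecture.

```python
# Head-child selection with a propagation table, mirroring the example:
# (VP left V/VP/Aux), (PP left P), (NP right N).
head_table = {"VP": ("left", ["V", "VP", "Aux"]),
              "PP": ("left", ["P"]),
              "NP": ("right", ["N"])}

def head_child(label, child_labels):
    direction, candidates = head_table[label]
    order = child_labels if direction == "left" else list(reversed(child_labels))
    for c in order:
        if c in candidates:
            return c         # first candidate found in scan order
    return order[0]          # fallback: first child in scan order

print(head_child("VP", ["V", "NP"]))   # V
print(head_child("NP", ["Det", "N"]))  # N
```

With head children chosen this way, the lexical head of each constituent (e.g. "bought" for the VP in "Mary bought a book in the store") propagates upward from the head child to its parent.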