Probabilistic Parsing
Ling 571
Fei Xia
Week 4: 10/18-10/20/05
Outline
• Misc: Hw3 and Hw4: lexicalized rules
• CYK recap
  – Converting CFG into CNF
  – N-best
• Quiz #2
• Common prob equations
• Independence assumption
• Lexicalized models
CYK Recap
Converting CFG into CNF
• CNF
• Extended CNF
• CFG in general vs. CFG for natural languages
• Converting CFG into CNF
• Converting PCFG into CNF
• Recovering parse trees
Definition of CNF
• A, B, C are non-terminals, a is a terminal, S is the start symbol
• Definition 1:
  – A → B C
  – A → a
  – S → ε
  where B and C are not the start symbol.
• Definition 2: ε-free grammar
  – A → B C
  – A → a
Extended CNF
• Definition 3:
  – A → B C
  – A → a or A → B
• We use Def 3:
  – Unit rules such as NP → N are allowed.
  – No need to remove unit rules during conversion.
  – CYK algorithm needs to be modified.
CYK algorithm with Def 2
• For every rule A → w_i: π[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N-span+1
      end = begin + span - 1
      for m = begin to end-1
        for all non-terminals A, B, C:
          val = π[begin][m][B] * π[m+1][end][C] * Pr(A → B C)
          if val > π[begin][end][A] then
            π[begin][end][A] = val
            B[begin][end][A] = (m, B, C)
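The loop above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the function name `cyk_best`, the dictionary-based grammar format, and the toy grammar are all assumptions made for the example. Positions are 1-based and inclusive, as on the slide.

```python
from collections import defaultdict

def cyk_best(words, lexical, binary):
    """Best-parse CYK for a PCFG in CNF (Def 2).

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    Returns (pi, back): pi[(begin, end, A)] is the best probability and
    back[(begin, end, A)] = (m, B, C) is the matching back-pointer.
    """
    n = len(words)
    pi = defaultdict(float)   # missing entries count as probability 0
    back = {}
    # Lexical step: pi[i][i][A] = Pr(A -> w_i)
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w:
                pi[(i, i, A)] = p
    # Binary step, from short spans to long ones
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (A, B, C), p in binary.items():
                    val = pi[(begin, m, B)] * pi[(m + 1, end, C)] * p
                    if val > pi[(begin, end, A)]:
                        pi[(begin, end, A)] = val
                        back[(begin, end, A)] = (m, B, C)
    return pi, back

# Hypothetical toy grammar, for illustration only.
lexical = {("NP", "Mary"): 0.5, ("NP", "books"): 0.5, ("V", "bought"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
pi, back = cyk_best("Mary bought books".split(), lexical, binary)
print(pi[(1, 3, "S")], back[(1, 3, "S")])
```

Storing the chart in a dictionary keyed by (begin, end, A) keeps the sketch short; a real parser would more likely use arrays indexed by non-terminal id.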
CYK algorithm with Def 3
• For every position i
    for all A, if A → w_i, π[i][i][A] = Pr(A → w_i)
    for all A and B, if A → B, update π[i][i][A]
• For span = 2 to N
    for begin = 1 to N-span+1
      end = begin + span - 1
      for m = begin to end-1
        for all non-terminals A, B, C: ….
      for all non-terminals A and B, if A → B, update π[begin][end][A]
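The slide says only "update" for the unit-rule step. One possible reading, sketched below, is to apply the unit rules to a finished cell repeatedly until no entry improves, so that chains such as X → NP → N are covered. The helper name `apply_unit_rules`, the per-cell dictionary, and the example rules are assumptions for illustration.

```python
def apply_unit_rules(cell, unit):
    """Update one chart cell with unit rules A -> B until nothing improves.

    cell: {non-terminal: best probability} for one fixed (begin, end)
    unit: {(A, B): prob} for unit rules A -> B
    Returns {A: B} back-pointers for entries set by a unit rule.
    """
    back = {}
    changed = True
    while changed:
        changed = False
        for (A, B), p in unit.items():
            val = cell.get(B, 0.0) * p
            if val > cell.get(A, 0.0):
                cell[A] = val
                back[A] = B
                changed = True
    return back

cell = {"N": 0.2}                               # e.g. after N -> books
unit = {("NP", "N"): 0.5, ("X", "NP"): 0.4}     # hypothetical unit rules
back = apply_unit_rules(cell, unit)
print(cell, back)
```

Because an entry is replaced only when it strictly improves and rule probabilities are at most 1, the loop stops even if the unit rules contain a cycle.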
CFG
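• CFG in general:
  – G = (N, T, P, S)
  – Rules: $A \to \alpha,\ \alpha \in (N \cup T)^{*}$
• CFG for natural languages:
  – G = (N, T, P, S)
  – Pre-terminal: $N_1 \subset N$
  – Rules:
    • Syntactic rules: $A \to \alpha,\ \alpha \in N^{+},\ A \in N - N_1$
    • Lexicon: $A \to a,\ A \in N_1$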
Conversion from CFG to CNF
• CFG (in general) to CNF (Def 1)
  – Add S0 → S
  – Remove ε-rules
  – Remove unit rules
  – Replace n-ary rules with binary rules
• CFG (for NL) to CNF (Def 3)
  – CFG (for NL) has no ε-rules
  – Unit rules are allowed in CNF (Def 3)
  – Only the last step is necessary
An example
• VP → V NP PP PP
• To recover the parse tree w.r.t. the original CFG, just remove the added non-terminals.
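Removing the added non-terminals can be sketched as a small tree rewrite. The nested-tuple tree format, the function name `unbinarize`, and the convention that added non-terminals start with "X" are assumptions for this example, not something the slides specify.

```python
def unbinarize(tree, is_added=lambda label: label.startswith("X")):
    """Splice out added non-terminals, attaching their children to the parent.

    A tree is either a word (str) or a tuple (label, child1, child2, ...).
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    new_children = []
    for child in children:
        child = unbinarize(child, is_added)
        if not isinstance(child, str) and is_added(child[0]):
            new_children.extend(child[1:])   # keep the grandchildren
        else:
            new_children.append(child)
    return (label, *new_children)

# Hypothetical binarized parse of a VP with two PPs.
binarized = ("VP", ("V", "bought"),
             ("X1", ("NP", "books"),
              ("X2", ("PP", "with cash"), ("PP", "in May"))))
print(unbinarize(binarized))
```

The result has V, NP, PP, PP as direct children of VP, matching the original 4-ary rule.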
Converting PCFG into CNF
• VP → V NP PP PP   0.1
=>
VP → V X1    0.1
X1 → NP X2   1.0
X2 → PP PP   1.0
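The conversion above can be sketched as a right-branching binarization in which the first new rule keeps the original probability and every rule headed by an added non-terminal gets probability 1.0. The function name `binarize` and the tuple format are assumptions for this sketch.

```python
def binarize(lhs, rhs, prob, prefix="X"):
    """Split one n-ary rule into binary rules, as in the slide's example.

    Returns a list of (lhs, rhs, prob) triples. The product of the returned
    probabilities equals the probability of the original rule.
    """
    rules = []
    counter = 0
    while len(rhs) > 2:
        counter += 1
        new_nt = f"{prefix}{counter}"            # added non-terminal
        rules.append((lhs, (rhs[0], new_nt), prob))
        lhs, rhs, prob = new_nt, rhs[1:], 1.0
    rules.append((lhs, tuple(rhs), prob))
    return rules

for rule in binarize("VP", ("V", "NP", "PP", "PP"), 0.1):
    print(rule)
# ('VP', ('V', 'X1'), 0.1)
# ('X1', ('NP', 'X2'), 1.0)
# ('X2', ('PP', 'PP'), 1.0)
```

The counter here restarts for every rule, so a converter for a whole grammar would need globally unique names for the added non-terminals.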
CYK with N-best output
N-best parse trees
• Best parse tree:
  π[begin][end][A] = prob
  B[begin][end][A] = (m, B, C)
• N-best parse trees:
  π[begin][end][A] = [prob_1, ...., prob_N]
  B[begin][end][A] = [(m_1, B_1, C_1, i_1, j_1), ...., (m_N, B_N, C_N, i_N, j_N)]
CYK algorithm for N-best
• For every rule A → w_i: π[i][i][A] = [Pr(A → w_i), 0, ..., 0]
• Every other cell starts as π[begin][end][A] = [0, ...., 0] and B[begin][end][A] = [(-1, _, _, _, _), ...., (-1, _, _, _, _)]
• For span = 2 to N
    for begin = 1 to N-span+1
      end = begin + span - 1
      for m = begin to end-1
        for all non-terminals A, B, C:
          for each p_i in π[begin][m][B] and p_j in π[m+1][end][C]:
            val = p_i * p_j * Pr(A → B C)
            if val > one of the probs in π[begin][end][A] then
              remove the last element in π[begin][end][A] and insert val into the array
              remove the last element in B[begin][end][A] and insert (m, B, C, i, j) into B[begin][end][A]
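The "remove the last element and insert" step can be sketched as below, assuming each cell keeps its N probabilities sorted from best to worst with the back-pointers in a parallel list. The function name `update_nbest` and the placeholder back-pointers are illustrative only.

```python
def update_nbest(probs, backs, val, entry):
    """Insert val into the descending list probs if it beats some element.

    probs: list of N probabilities, best first
    backs: list of N back-pointers aligned with probs
    entry: the back-pointer (m, B, C, i, j) that produced val
    Returns True if the lists were changed.
    """
    for k, p in enumerate(probs):
        if val > p:
            probs.pop()                 # remove the last (worst) element
            backs.pop()
            probs.insert(k, val)        # keep the list sorted
            backs.insert(k, entry)
            return True
    return False

probs = [0.3, 0.1, 0.0]                 # N = 3, hypothetical values
backs = ["a", "b", None]                # placeholders for (m, B, C, i, j)
update_nbest(probs, backs, 0.2, "c")
print(probs, backs)                     # [0.3, 0.2, 0.1] ['a', 'c', 'b']
```

Keeping the lists sorted means the worst entry is always the last one, which is what makes "remove the last element" correct.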
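Example: Mary bought books with cash
Chart cells are written [begin, end]. In each back-pointer (m, i, j), m is the split point and i, j are the ranks of the entries used in the left and right child cells.
• [1,5]: S → NP VP (1,1,1); S → NP VP (1,1,2)
• [2,5]: VP → V NP (2,1,1); VP → VP PP (3,1,1)
• [3,5]: NP → NP PP (3,1,1)
• [4,5]: PP → P NP (4,1,1)
• [5,5]: N → cash; NP → N
• [1,4], [2,4], [3,4]: -
• [4,4]: P → with
• [1,3]: S → NP VP (1,1,1)
• [2,3]: VP → V NP (2,1,1)
• [3,3]: N → books; NP → N
• [1,2]: -
• [2,2]: V → bought
• [1,1]: N → Mary; NP → N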
Common probability equations
Three types of probability
• Joint prob: P(x,y)= prob of x and y happening together
• Conditional prob: P(x|y) = prob of x given a specific value of y
• Marginal prob: P(x) = prob of x for all possible values of y
Common equations
$$
\begin{aligned}
P(A) &= \sum_{B} P(A, B) \\
P(A, B) &= P(A) * P(B \mid A) = P(B) * P(A \mid B) \\
P(B \mid A) &= \frac{P(A, B)}{P(A)}
\end{aligned}
$$
An example
• #(words) = 100, #(nouns) = 40, #(verbs) = 20
• "books" appears 10 times, 3 as verbs, 7 as nouns
• P(w=books) = 0.1
• P(w=books, t=noun) = 0.07
• P(t=noun | w=books) = 0.7
• P(t=noun) = 0.4
• P(w=books | t=noun) = 7/40
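The numbers in this example can be recomputed from the counts. The sketch below uses exact fractions and also checks the product rule in both directions; the variable names are made up for the illustration.

```python
from fractions import Fraction

# Counts from the example
total_words = 100
nouns = 40
books_total = 10
books_as_noun = 7

p_books = Fraction(books_total, total_words)              # marginal P(w=books)
p_noun = Fraction(nouns, total_words)                     # marginal P(t=noun)
p_books_noun = Fraction(books_as_noun, total_words)       # joint P(w=books, t=noun)
p_noun_given_books = Fraction(books_as_noun, books_total) # P(t=noun | w=books)
p_books_given_noun = Fraction(books_as_noun, nouns)       # P(w=books | t=noun)

# Product rule: P(w, t) = P(w) * P(t | w) = P(t) * P(w | t)
print(p_books * p_noun_given_books == p_books_noun)   # True
print(p_noun * p_books_given_noun == p_books_noun)    # True
print(p_noun_given_books, p_books_given_noun)         # 7/10 7/40
```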
More general cases
$$
\begin{aligned}
P(A_1) &= \sum_{A_2, \ldots, A_n} P(A_1, \ldots, A_n) \\
P(A_1, \ldots, A_n) &= \prod_{i=1}^{n} P(A_i \mid A_1, \ldots, A_{i-1})
\end{aligned}
$$
Independence assumption
Independence assumption
• Two variables A and B are independent if
  – P(A,B) = P(A) * P(B)
  – P(A) = P(A|B)
  – P(B) = P(B|A)
• Two variables A and B are conditionally independent given C if
  – P(A,B|C) = P(A|C) * P(B|C)
  – P(A|B,C) = P(A|C)
  – P(B|A,C) = P(B|C)
• The independence assumption is used to remove some conditioning factors, which reduces the number of parameters in a model.
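A worked illustration of the parameter reduction, under the assumption (not from the slides) that the model is a joint distribution over n binary variables:

$$
\begin{aligned}
\text{full joint } P(A_1, \ldots, A_n) &: \quad 2^{n} - 1 \text{ free parameters} \\
\text{fully independent } \prod_{i=1}^{n} P(A_i) &: \quad n \text{ free parameters}
\end{aligned}
$$

For n = 10 this is 1023 parameters versus 10.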
PCFG parsers
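$$
\begin{aligned}
P(T, S) &= P(r_1, \ldots, r_n) \\
&= \prod_{i=1}^{n} P(r_i \mid r_1, \ldots, r_{i-1}) \\
&= \prod_{i=1}^{n} P(r_i \mid lhs(r_i))
\end{aligned}
$$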
It assumes each rule is independent of other rules
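Under this assumption the probability of a tree is just the product of the probabilities of the rules used in it. The sketch below shows that computation on a hand-built tree. The nested-tuple tree format, the function names, and the rule probabilities are all hypothetical.

```python
from math import prod

def tree_rules(tree):
    """Yield (lhs, rhs) for every rule used in a nested-tuple tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from tree_rules(c)

def tree_prob(tree, rule_prob):
    """P(T, S) as the product of P(r_i | lhs(r_i)) over the rules in T."""
    return prod(rule_prob[r] for r in tree_rules(tree))

# Hypothetical rule probabilities, for illustration only.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Pron",)): 0.5,
    ("VP", ("V", "NP")): 0.5,
    ("V", ("likes",)): 1.0,
    ("Pron", ("he",)): 0.5,
    ("Pron", ("her",)): 0.5,
}
tree = ("S", ("NP", ("Pron", "he")),
        ("VP", ("V", "likes"), ("NP", ("Pron", "her"))))
print(tree_prob(tree, rule_prob))   # 0.03125
```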
Problems of independence assumptions
• Lexical independence:
  – P(VP → V, V → bought) = P(VP → V) * P(V → bought)
See Table 12.2 on M&S P418.

Rule          come    take    think   want
VP → V        9.5%    2.6%    4.6%    5.7%
VP → V NP     1.1%    32.1%   0.2%    13.9%
VP → V PP     34.5%   3.1%    7.1%    0.3%
VP → V SBAR   6.6%    0.3%    73.0%   0.2%
Problems of independence assumptions (cont)
• Structural independence:
  – P(S → NP VP, NP → Pron) = P(S → NP VP) * P(NP → Pron)
See Table 12.3 on M&S P420.

Rule           % as subj   % as obj
NP → Pron      13.7%       2.1%
NP → Det NN    5.6%        4.6%
NP → NP SBAR   0.5%        2.6%
NP → NP PP     5.6%        14.1%
Dealing with the problems
• Lexical rules:
  – P(VP → V | V=come)
  – P(VP → V | V=think)
• Adding context info:
  – Instead of $P(r_i \mid r_1, \ldots, r_{i-1}) = P(r_i)$,
    use $P(r_i \mid r_1, \ldots, r_{i-1}) = P(r_i \mid f(r_1, \ldots, r_{i-1}))$
  – $f$ is a function that groups $r_1, \ldots, r_{i-1}$ into equivalence classes.
PCFG
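$$
\begin{aligned}
P(T, S) &= P(r_1, \ldots, r_n) \\
&= \prod_{i=1}^{n} P(r_i \mid r_1, \ldots, r_{i-1}) \\
&= \prod_{i=1}^{n} P(r_i \mid lhs(r_i))
\end{aligned}
$$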
It assumes each rule is independent of other rules
A lexicalized model
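$$
\begin{aligned}
P(T, S) &= P(lr_1, \ldots, lr_n) \\
&= \prod_{i=1}^{n} P(lr_i \mid lr_1, \ldots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(r_i, h(r_i) \mid lr_1, \ldots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(h(r_i) \mid lr_1, \ldots, lr_{i-1}) * P(r_i \mid h(r_i), lr_1, \ldots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(h(r_i) \mid lhs(r_i), h(m(r_i))) * P(r_i \mid lhs(r_i), h(r_i))
\end{aligned}
$$
Here $lr_i$ is the lexicalized rule, i.e. $r_i$ together with its head word $h(r_i)$; $lhs(r_i)$ is the left-hand side of $r_i$; and $m(r_i)$ is the mother (parent) of $r_i$ in the tree. The last step is the independence assumption of this model.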
An example
• he likes her
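$$
\begin{aligned}
P(T, S) = {} & P(\text{likes} \mid \text{Top}, -) * P(\text{Top} \to \text{S} \mid \text{Top}, \text{likes}) \\
& * P(\text{likes} \mid \text{S}, \text{likes}) * P(\text{S} \to \text{NP VP} \mid \text{S}, \text{likes}) \\
& * P(\text{he} \mid \text{NP}, \text{likes}) * P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{he}) \\
& * P(\text{likes} \mid \text{VP}, \text{likes}) * P(\text{VP} \to \text{V NP} \mid \text{VP}, \text{likes}) \\
& * P(\text{her} \mid \text{NP}, \text{likes}) * P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{her}) \\
& * P(\text{he} \mid \text{Pron}, \text{he}) * P(\text{Pron} \to \text{he} \mid \text{Pron}, \text{he}) \\
& * P(\text{likes} \mid \text{V}, \text{likes}) * P(\text{V} \to \text{likes} \mid \text{V}, \text{likes}) \\
& * P(\text{her} \mid \text{Pron}, \text{her}) * P(\text{Pron} \to \text{her} \mid \text{Pron}, \text{her})
\end{aligned}
$$
(The root has no mother, so the mother-head slot of the first factor is empty, written "-".)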
Head-head probability
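$$
\begin{aligned}
P(w_2 \mid A, w_1) &= \frac{P(w_2, A, w_1)}{P(A, w_1)} = \frac{C(w_2, A, w_1)}{C(A, w_1)} \\
&= \frac{C(X(w_1) \to \ldots A(w_2) \ldots)}{\sum_{w} C(X(w_1) \to \ldots A(w) \ldots)}
\end{aligned}
$$
Example:
$$
P(\text{he} \mid \text{NP}, \text{likes}) = \frac{C(X(\text{likes}) \to \ldots \text{NP}(\text{he}) \ldots)}{\sum_{w} C(X(\text{likes}) \to \ldots \text{NP}(w) \ldots)}
$$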
Head-rule probability
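$$
\begin{aligned}
P(A \to \alpha \mid A, w) &= \frac{P(A \to \alpha, A, w)}{P(A, w)} = \frac{P(A(w) \to \alpha)}{P(A(w))} \\
&= \frac{C(A(w) \to \alpha)}{\sum_{\beta} C(A(w) \to \beta)} = \frac{C(A(w) \to \alpha)}{C(A(w))}
\end{aligned}
$$
Example:
$$
P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{he}) = \frac{C(\text{NP}(\text{he}) \to \text{Pron})}{C(\text{NP}(\text{he}))}
$$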
Collecting the counts
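$$
\begin{aligned}
P(w_2 \mid A, w_1) &= \frac{C(X(w_1) \to \ldots A(w_2) \ldots)}{\sum_{w} C(X(w_1) \to \ldots A(w) \ldots)} \\
P(A \to \alpha \mid A, w) &= \frac{C(A(w) \to \alpha)}{C(A(w))}
\end{aligned}
$$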
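Both estimates are relative frequencies over lexicalized rule occurrences. The sketch below collects the four counts from a tiny hand-built list of occurrences. The data format (left-hand side, head word, list of child label and child head pairs) and the function names are assumptions for illustration, and the "treebank" is a single tree for "he likes her".

```python
from collections import Counter
from fractions import Fraction

# Hypothetical lexicalized rule occurrences:
# (lhs, head word, ((child label, child head word), ...))
rules = [
    ("S", "likes", (("NP", "he"), ("VP", "likes"))),
    ("VP", "likes", (("V", "likes"), ("NP", "her"))),
    ("NP", "he", (("Pron", "he"),)),
    ("NP", "her", (("Pron", "her"),)),
]

rule_count = Counter()   # C(A(w) -> alpha)
lhs_count = Counter()    # C(A(w))
pair_count = Counter()   # C(X(w1) -> ... A(w2) ...)
ctx_count = Counter()    # sum over w of C(X(w1) -> ... A(w) ...)
for lhs, head, children in rules:
    alpha = tuple(label for label, _ in children)
    rule_count[(lhs, head, alpha)] += 1
    lhs_count[(lhs, head)] += 1
    for label, child_head in children:
        pair_count[(label, head, child_head)] += 1
        ctx_count[(label, head)] += 1

def head_rule_prob(lhs, alpha, head):
    """P(A -> alpha | A, w) = C(A(w) -> alpha) / C(A(w))."""
    return Fraction(rule_count[(lhs, head, alpha)], lhs_count[(lhs, head)])

def head_head_prob(w2, label, w1):
    """P(w2 | A, w1) from counts of A(w2) under a parent headed by w1."""
    return Fraction(pair_count[(label, w1, w2)], ctx_count[(label, w1)])

print(head_head_prob("he", "NP", "likes"))      # 1/2
print(head_rule_prob("NP", ("Pron",), "he"))    # 1
```

With real data many (A, w) contexts are never observed, in which case these functions would divide by zero; the sketch does no smoothing.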
Remaining problems
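• he likes her
$$
\begin{aligned}
P(T, S) = {} & P(\text{likes} \mid \text{S}, -) * P(\text{S} \to \text{NP VP} \mid \text{S}, \text{likes}) \\
& * P(\text{he} \mid \text{NP}, \text{likes}) * P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{he}) \\
& * P(\text{likes} \mid \text{VP}, \text{likes}) * P(\text{VP} \to \text{V NP} \mid \text{VP}, \text{likes}) \\
& * P(\text{her} \mid \text{NP}, \text{likes}) * P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{her})
\end{aligned}
$$
• P(T,S) is the same if the sentence is changed to "her likes he", because the subject NP and the object NP are both conditioned only on (NP, likes).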
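The claim can be checked mechanically by listing the factors of the model for both word orders and comparing them as multisets. The tuple encoding of factors and the function name `factors` are made up for this sketch.

```python
from collections import Counter

def factors(subj, obj):
    """Factors of the model for [S [NP subj] [VP [V likes] [NP obj]]].

    ("head", w, A, wm)   stands for P(w | A, wm)
    ("rule", A, rhs, w)  stands for P(A -> rhs | A, w)
    """
    return Counter([
        ("head", "likes", "S", None), ("rule", "S", "NP VP", "likes"),
        ("head", subj, "NP", "likes"), ("rule", "NP", "Pron", subj),
        ("head", "likes", "VP", "likes"), ("rule", "VP", "V NP", "likes"),
        ("head", obj, "NP", "likes"), ("rule", "NP", "Pron", obj),
    ])

# Same multiset of factors, hence the same product for any parameter values.
print(factors("he", "her") == factors("her", "he"))   # True
```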
Previous model
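$$
\begin{aligned}
P(T, S) &= P(lr_1, \ldots, lr_n) \\
&= \prod_{i=1}^{n} P(lr_i \mid lr_1, \ldots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(r_i, h(r_i) \mid lr_1, \ldots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(h(r_i) \mid lr_1, \ldots, lr_{i-1}) * P(r_i \mid h(r_i), lr_1, \ldots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(h(r_i) \mid lhs(r_i), h(m(r_i))) * P(r_i \mid lhs(r_i), h(r_i))
\end{aligned}
$$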
A new model
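$$
\begin{aligned}
P(T, S) &= P(lr_1, \ldots, lr_n) \\
&= \prod_{i=1}^{n} P(lr_i \mid lr_1, \ldots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(r_i, h(r_i) \mid lr_1, \ldots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(h(r_i) \mid lr_1, \ldots, lr_{i-1}) * P(r_i \mid h(r_i), lr_1, \ldots, lr_{i-1}) \\
&= \prod_{i=1}^{n} P(h(r_i) \mid lhs(r_i), h(m(r_i))) * P(r_i \mid lhs(r_i), h(r_i), lhs(m(r_i)))
\end{aligned}
$$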
New formula
• he likes her
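$$
\begin{aligned}
P(T, S) = {} & P(\text{likes} \mid \text{S}, -) * P(\text{S} \to \text{NP VP} \mid \text{S}, \text{likes}, \text{Top}) \\
& * P(\text{he} \mid \text{NP}, \text{likes}) * P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{he}, \text{S}) \\
& * P(\text{likes} \mid \text{VP}, \text{likes}) * P(\text{VP} \to \text{V NP} \mid \text{VP}, \text{likes}, \text{S}) \\
& * P(\text{her} \mid \text{NP}, \text{likes}) * P(\text{NP} \to \text{Pron} \mid \text{NP}, \text{her}, \text{VP})
\end{aligned}
$$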
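Listing the factors of the new formula for "he likes her" and for "her likes he" shows that the two sentences no longer share the same factors, so the model can give them different probabilities. The tuple encoding and the function name `factors_new` are made up for this sketch.

```python
from collections import Counter

def factors_new(subj, obj):
    """Factors of the new model for [S [NP subj] [VP [V likes] [NP obj]]].

    ("head", w, A, wm)           stands for P(w | A, wm)
    ("rule", A, rhs, w, mother)  stands for P(A -> rhs | A, w, mother label)
    """
    return Counter([
        ("head", "likes", "S", None), ("rule", "S", "NP VP", "likes", "Top"),
        ("head", subj, "NP", "likes"), ("rule", "NP", "Pron", subj, "S"),
        ("head", "likes", "VP", "likes"), ("rule", "VP", "V NP", "likes", "S"),
        ("head", obj, "NP", "likes"), ("rule", "NP", "Pron", obj, "VP"),
    ])

a = factors_new("he", "her")
b = factors_new("her", "he")
print(a == b)            # False
print(sorted(a - b))     # the factors that appear only in "he likes her"
```

The differing factors are the two NP → Pron rule probabilities, which now see whether the NP sits under S (subject) or under VP (object).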