Learning and Inference for Hierarchically Split Probabilistic Context-Free Grammars
Slav Petrov and Dan Klein
University of California, Berkeley
Evolution of the DT tag during hierarchical splitting and merging. Shown are the top three words for each subcategory and their respective probabilities.
DT  the (0.50), a (0.24), The (0.08)
  that (0.15), this (0.14), some (0.11)
    this (0.39), that (0.28), That (0.11)
      this (0.52), that (0.36), another (0.04)
      That (0.38), This (0.34), each (0.07)
    some (0.20), all (0.19), those (0.12)
      some (0.37), all (0.29), those (0.14)
      these (0.27), both (0.21), Some (0.15)
  the (0.54), a (0.25), The (0.09)
    the (0.80), The (0.15), a (0.01)
      the (0.96), a (0.01), The (0.01)
      The (0.93), A (0.02), No (0.01)
    a (0.61), the (0.19), an (0.10)
      a (0.75), an (0.12), the (0.03)
Observed categories are too coarse:
Adaptive Grammar Refinement: Split each category into k subcategories and fit the grammar with the EM algorithm.
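As an illustration of the splitting step, here is a minimal Python sketch (the rule encoding and the split_symbol helper are assumptions for exposition, not the authors' implementation): every rule that mentions the split category is expanded over the new subcategories, and the copies are randomly jittered so that EM can break their symmetry.

    import itertools
    import random

    def split_symbol(rules, symbol, k=2, noise=0.01, rng=random.Random(0)):
        """Split `symbol` into k subcategories and expand every rule that
        mentions it. `rules` maps (lhs, rhs) -> probability, with rhs a
        tuple of symbols. Each original rule's mass is spread evenly over
        the expanded right-hand sides and then jittered so that EM can
        break the symmetry between the otherwise identical copies.
        """
        subs = [f"{symbol}-{i}" for i in range(k)]
        expand = lambda s: subs if s == symbol else [s]
        new_rules = {}
        for (lhs, rhs), p in rules.items():
            for new_lhs in expand(lhs):
                variants = list(itertools.product(*[expand(s) for s in rhs]))
                for new_rhs in variants:
                    jitter = 1.0 + noise * rng.uniform(-1.0, 1.0)
                    new_rules[(new_lhs, new_rhs)] = p / len(variants) * jitter
        totals = {}
        for (lhs, _), p in new_rules.items():  # renormalize per left-hand side
            totals[lhs] = totals.get(lhs, 0.0) + p
        return {r: p / totals[r[0]] for r, p in new_rules.items()}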
Reference: Slav Petrov, Adam Pauls and Dan Klein, "Learning Structured Models for Phone Recognition", in EMNLP-CoNLL '07
Smoothing: Reduce overfitting by shrinking the productions of each subcategory towards their common base category.
Reference: Slav Petrov, Leon Barrett, Romain Thibaux and Dan Klein, "Learning Accurate, Compact and Interpretable Tree Annotation", in ACL-COLING '06
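The smoothing step can likewise be sketched (hypothetical Python; the "BASE-i" naming convention and the alpha value are assumptions): each refined rule probability is interpolated with the mean of the same rule across sibling subcategories.

    def smooth_toward_base(rules, alpha=0.01):
        """Shrink each subcategory's rule probabilities toward the average
        over all subcategories of the same base symbol (e.g., NP-3 toward
        the mean over NP-0 ... NP-7). A sketch of linear smoothing; it
        assumes every subcategory carries a (possibly tiny) weight for
        each rule it shares with its siblings.
        """
        base = lambda s: s.split("-")[0]
        groups = {}  # (base lhs, rhs) -> probabilities across subcategories
        for (lhs, rhs), p in rules.items():
            groups.setdefault((base(lhs), rhs), []).append(p)
        smoothed = {}
        for (lhs, rhs), p in rules.items():
            peers = groups[(base(lhs), rhs)]
            smoothed[(lhs, rhs)] = (1 - alpha) * p + alpha * sum(peers) / len(peers)
        return smoothed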
Reference: Slav Petrov and Dan Klein, "Improved Inference for Unlexicalized Parsing", in NAACL-HLT '07
The three most frequent words in the subcategories of several part-of-speech tags.
VBZ: VBZ-0 gives, sells, takes; VBZ-1 comes, goes, works; VBZ-2 includes, owns, is; VBZ-3 puts, provides, takes; VBZ-4 says, adds, Says; VBZ-5 believes, means, thinks; VBZ-6 expects, makes, calls; VBZ-7 plans, expects, wants; VBZ-8 is, ’s, gets; VBZ-9 ’s, is, remains; VBZ-10 has, ’s, is; VBZ-11 does, Is, Does

NNP: NNP-0 Jr., Goldman, INC.; NNP-1 Bush, Noriega, Peters; NNP-2 J., E., L.; NNP-3 York, Francisco, Street; NNP-4 Inc, Exchange, Co; NNP-5 Inc., Corp., Co.; NNP-6 Stock, Exchange, York; NNP-7 Corp., Inc., Group; NNP-8 Congress, Japan, IBM; NNP-9 Friday, September, August; NNP-10 Shearson, D., Ford; NNP-11 U.S., Treasury, Senate; NNP-12 John, Robert, James; NNP-13 Mr., Ms., President; NNP-14 Oct., Nov., Sept.; NNP-15 New, San, Wall

JJS: JJS-0 largest, latest, biggest; JJS-1 least, best, worst; JJS-2 most, Most, least

DT: DT-0 the, The, a; DT-1 A, An, Another; DT-2 The, No, This; DT-3 The, Some, These; DT-4 all, those, some; DT-5 some, these, both; DT-6 That, This, each; DT-7 this, that, each; DT-8 the, The, a; DT-9 no, any, some; DT-10 an, a, the; DT-11 a, this, the

CD: CD-0 1, 50, 100; CD-1 8.50, 15, 1.2; CD-2 8, 10, 20; CD-3 1, 30, 31; CD-4 1989, 1990, 1988; CD-5 1988, 1987, 1990; CD-6 two, three, five; CD-7 one, One, Three; CD-8 12, 34, 14; CD-9 78, 58, 34; CD-10 one, two, three; CD-11 million, billion, trillion

PRP: PRP-0 It, He, I; PRP-1 it, he, they; PRP-2 it, them, him

RBR: RBR-0 further, lower, higher; RBR-1 more, less, More; RBR-2 earlier, Earlier, later

IN: IN-0 In, With, After; IN-1 In, For, At; IN-2 in, for, on; IN-3 of, for, on; IN-4 from, on, with; IN-5 at, for, by; IN-6 by, in, with; IN-7 for, with, on; IN-8 If, While, As; IN-9 because, if, while; IN-10 whether, if, That; IN-11 that, like, whether; IN-12 about, over, between; IN-13 as, de, Up; IN-14 than, ago, until; IN-15 out, up, down

RB: RB-0 recently, previously, still; RB-1 here, back, now; RB-2 very, highly, relatively; RB-3 so, too, as; RB-4 also, now, still; RB-5 however, Now, However; RB-6 much, far, enough; RB-7 even, well, then; RB-8 as, about, nearly; RB-9 only, just, almost; RB-10 ago, earlier, later; RB-11 rather, instead, because; RB-12 back, close, ahead; RB-13 up, down, off; RB-14 not, Not, maybe; RB-15 n’t, not, also
A general technique for learning refined, structured models when only the trace of a complex underlying process is observed.
Learns compact and accurate grammars from a treebank without additional human input.
Gives the best known parsing accuracy on a variety of languages while being extremely efficient.
Interactive demo and download at http://nlp.cs.berkeley.edu
Extensions
Reference: Percy Liang, Slav Petrov, Michael Jordan and Dan Klein, "The Infinite PCFG using Hierarchical Dirichlet Processes", in EMNLP-CoNLL '07
Hierarchical Dirichlet Processes as a nonparametric Bayesian alternative to split and merge:
Merging: Roll back the least useful splits in order to allocate complexity only where needed.
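A hedged sketch of the merging criterion described in the ACL-COLING '06 reference above (the inside/outside bookkeeping is assumed and not shown): the likelihood lost by reversing a split is approximated from inside and outside scores at each occurrence, and the splits with the smallest loss are rolled back.

    import math

    def merge_loss(occurrences, p1, p2):
        """Estimate the log-likelihood lost by undoing one split (A-1, A-2).

        `occurrences` lists (in1, out1, in2, out2) inside/outside scores at
        every tree node where the two subcategories can occur, and p1, p2
        are their relative frequencies; both are assumed to come from an
        inside-outside pass over the treebank.
        """
        loss = 0.0
        for in1, out1, in2, out2 in occurrences:
            split_score = in1 * out1 + in2 * out2
            merged_score = (p1 * in1 + p2 * in2) * (out1 + out2)
            loss += math.log(split_score / merged_score)
        return loss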
Grammar Extraction and Parsing:

(S (NP (PRP She)) (VP (VBD heard) (NP (DT the) (NN noise))) (. .))

Grammar:
S → NP VP .   1.0
NP → PRP      0.5
NP → DT NN    0.5
. . .

Lexicon:
PRP → She     1.0
DT → the      1.0
. . .
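The extraction step above amounts to relative-frequency estimation, as in this small self-contained sketch (the tuple encoding of trees is chosen for exposition); on the example tree it reproduces the 0.5 / 0.5 split of the two NP rules.

    from collections import Counter

    def extract_pcfg(trees):
        """Read off grammar and lexicon probabilities from a treebank by
        relative frequency. Trees are nested tuples such as
        ("S", ("NP", ("PRP", "She")), ...); a leaf is a plain string.
        """
        counts = Counter()
        def visit(node):
            label, children = node[0], node[1:]
            rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
            counts[(label, rhs)] += 1
            for c in children:
                if not isinstance(c, str):
                    visit(c)
        for t in trees:
            visit(t)
        lhs_totals = Counter()
        for (lhs, _), n in counts.items():
            lhs_totals[lhs] += n
        return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

    # The tree from the figure yields S -> NP VP . with probability 1.0
    tree = ("S",
            ("NP", ("PRP", "She")),
            ("VP", ("VBD", "heard"), ("NP", ("DT", "the"), ("NN", "noise"))),
            (".", "."))
    print(extract_pcfg([tree])[("S", ("NP", "VP", "."))])  # 1.0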
Relative frequencies of the most common NP productions by context:

              NP PP   DT NN   PRP
All NPs        11%      9%     6%
NPs under S     9%      9%    21%
NPs under VP   23%      7%     4%
Hierarchical Splitting: Repeatedly split each annotation symbol in two and retrain the grammar, initializing with the previous grammar.
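Putting the pieces together, the training schedule can be outlined as follows (a sketch only: split_all, em_fit, and merge_useless are hypothetical callables standing in for the splitting, EM fitting, and merging steps described on this poster).

    def hierarchical_train(grammar, treebank, split_all, em_fit, merge_useless,
                           rounds=6):
        """Hierarchical split-merge training in outline: each round splits
        every symbol in two, refits with EM starting from the previous
        grammar, and rolls back the least useful splits (e.g., 50% of
        them, as in the accuracy plot below).
        """
        for _ in range(rounds):
            grammar = split_all(grammar, k=2)    # split each symbol in two
            grammar = em_fit(grammar, treebank)  # inside-outside EM refit
            grammar = merge_useless(grammar)     # roll back least useful splits
        return grammar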
(S-x (NP-x (PRP-x She)) (VP-x (VBD-x heard) (NP-x (DT-x the) (NN-x noise))) (.-x .))
Figure: NP is repeatedly split in two, yielding subcategories NP1 ... NP8.
Figure: the likelihood L(·) of a training tree is computed with the split at an NP node kept and with it reversed; the difference measures how useful that split is.
Figure: parsing accuracy (F1, from 74 to 90) versus the total number of grammar symbols (100 to 1100) for four training regimes: Flat Training, Hierarchical Training, 50% Merging, and 50% Merging and Smoothing.
Parse Efficiency: Rapidly pre-parse the sentence in a hierarchical coarse-to-fine fashion, pruning away unlikely chart items.
Parse Selection: Use a variational approximation to select the tree with the maximum number of expected correct rules (since computing the best parse tree is intractable and selecting the best derivation is a poor approximation).
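A sketch of the coarse-to-fine pruning loop (Python for exposition; chart_posteriors is a stand-in for an inside-outside routine, and the threshold is an illustrative value): each pass parses with a finer grammar, constrained to the chart items that survived the previous pass, and the final posteriors feed the parse selection step described above.

    def coarse_to_fine_parse(sentence, grammars, chart_posteriors,
                             threshold=1e-4):
        """Parse with each grammar in order of increasing refinement,
        keeping only chart items whose posterior exceeds `threshold`.
        `chart_posteriors(grammar, sentence, allowed)` is assumed to
        return {(symbol, span): posterior}, restricted to `allowed`
        items when `allowed` is not None.
        """
        allowed = None  # the coarsest pass is unconstrained
        posteriors = {}
        for grammar in grammars:  # e.g., pi0(G), pi1(G), ..., G
            posteriors = chart_posteriors(grammar, sentence, allowed)
            allowed = {item for item, p in posteriors.items() if p > threshold}
        # posteriors under the most refined grammar are used to pick the
        # tree with the maximum number of expected correct rules
        return posteriors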
Figure: graphical model of the HDP-PCFG. A top-level distribution β over an unbounded set of subcategories (z = 1 ... ∞) generates per-subcategory parameters φzB, φzT, φzE (binary-production, rule-type, and emission distributions), which in turn generate the trees: nonterminals z1, z2, z3 with observed yields x2, x3 (panels: Parameters, Trees).
Automatic refinement of acoustic models learns phone-internal structure as well as phone-external context:

Figure: a phone HMM with Start and End states whose internal (subphone) states and transitions are refined automatically (shown for the phone ae in the context d ae d).
Learned grammars are compact and interpretable:
(S (NP (DT The) (NNP Golden) (NNP Gate) (NN bridge)) (VP (VBZ is) (ADJP (JJ red))) (. .))
Compute grammars of intermediate complexity by projecting the most refined grammar.
Figure: the sentence "The Golden Gate bridge is red." is pre-parsed with a sequence of projected grammars G0, G1, G2, G3, ... before the final pass. Learning produces the grammar sequence X-Bar = G0, G1, ..., G6 = G; the projections πi recover grammars of intermediate complexity π0(G), ..., π5(G) from the most refined grammar G. Each level refines the symbols further, e.g. NP in G0 becomes NP1, NP2 in the next levels, then NP3, NP4, ... (similarly for VP, QP, ...).
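A sketch of the projection step under naming assumptions (subcategory indices are assumed to encode the binary split tree, and `freq` would hold expected subcategory frequencies from the treebank): each refined rule contributes its probability, weighted by its parent subcategory's frequency, to the corresponding coarse rule.

    def project_grammar(rules, freq, level, rounds=6):
        """Project the most refined grammar onto a coarser one. `coarsen`
        maps a subcategory such as "NP-6" to its ancestor after `level`
        binary splits; plain symbols (words, unsplit tags) pass through.
        Projected probabilities are renormalized per coarse left-hand side.
        """
        def coarsen(sym):
            base, _, idx = sym.partition("-")
            return sym if not idx else f"{base}-{int(idx) >> (rounds - level)}"
        projected = {}
        for (lhs, rhs), p in rules.items():
            key = (coarsen(lhs), tuple(coarsen(s) for s in rhs))
            projected[key] = projected.get(key, 0.0) + freq[lhs] * p
        totals = {}
        for (lhs, _), p in projected.items():
            totals[lhs] = totals.get(lhs, 0.0) + p
        return {r: p / totals[r[0]] for r, p in projected.items()}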
Parse Tree:
(S (NP (PRP He)) (VP (VBD was) (ADJP right)) (. .))

Parse Derivations:
(S-2 (NP-17 (PRP-2 He)) (VP-23 (VBD-12 was) (ADJP-11 right)) (. .))
(S-1 (NP-10 (PRP-2 He)) (VP-11 (VBD-16 was) (ADJP-14 right)) (. .))
Parser                    F1 (≤ 40 words)   F1 (all)
ENGLISH
Charniak & Johnson '05    90.1              89.6
This Work                 90.6              90.1
GERMAN
Dubey '05                 76.3              -
This Work                 80.8              80.1
CHINESE
Chiang & Bikel '02        80.0              76.6
This Work                 86.3              83.4
Figure: bracket posterior probabilities during coarse-to-fine decoding of the sentence "Influential members of the House Ways and Means Committee introduced legislation that would restrict how the new s&l bailout agency can raise capital ; creating another potential obstacle to the government 's sale of sick thrifts ."