Learning and Inference for Hierarchically Split Probabilistic Context-Free Grammars
Slav Petrov and Dan Klein
University of California, Berkeley
Evolution of the DT tag during hierarchical splitting and merging. Shown are the top three words for each subcategory and their respective probabilities.
DT  the (0.50), a (0.24), The (0.08)
  that (0.15), this (0.14), some (0.11)
    this (0.39), that (0.28), That (0.11)
      this (0.52), that (0.36), another (0.04)
      That (0.38), This (0.34), each (0.07)
    some (0.20), all (0.19), those (0.12)
      some (0.37), all (0.29), those (0.14)
      these (0.27), both (0.21), Some (0.15)
  the (0.54), a (0.25), The (0.09)
    the (0.80), The (0.15), a (0.01)
      the (0.96), a (0.01), The (0.01)
      The (0.93), A (0.02), No (0.01)
    a (0.61), the (0.19), an (0.10)
      a (0.75), an (0.12), the (0.03)
Observed categories are too coarse:
Adaptive Grammar Refinement: Split each category into k subcategories and fit the grammar with the EM algorithm.
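As an illustration of the splitting step, here is a minimal Python sketch (the rule encoding and the split_symbol helper are assumptions for exposition, not the authors' implementation): every rule that mentions the split category is expanded over the new subcategories, and the copies are randomly jittered so that EM can break their symmetry.

    import itertools
    import random

    def split_symbol(rules, symbol, k=2, noise=0.01, rng=random.Random(0)):
        """Split `symbol` into k subcategories and expand every rule that
        mentions it. `rules` maps (lhs, rhs) -> probability, with rhs a
        tuple of symbols. Each original rule's mass is spread evenly over
        the expanded right-hand sides and then jittered so that EM can
        break the symmetry between the otherwise identical copies.
        """
        subs = [f"{symbol}-{i}" for i in range(k)]
        expand = lambda s: subs if s == symbol else [s]
        new_rules = {}
        for (lhs, rhs), p in rules.items():
            for new_lhs in expand(lhs):
                variants = list(itertools.product(*[expand(s) for s in rhs]))
                for new_rhs in variants:
                    jitter = 1.0 + noise * rng.uniform(-1.0, 1.0)
                    new_rules[(new_lhs, new_rhs)] = p / len(variants) * jitter
        totals = {}
        for (lhs, _), p in new_rules.items():  # renormalize per left-hand side
            totals[lhs] = totals.get(lhs, 0.0) + p
        return {r: p / totals[r[0]] for r, p in new_rules.items()}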
Reference: Slav Petrov, Adam Pauls and Dan Klein, "Learning Structured Models for Phone Recognition", in EMNLP-CoNLL '07
Smoothing: Reduce overfitting by shrinking the productions of each subcategory towards their common base category.
Reference: Slav Petrov, Leon Barrett, Romain Thibaux and Dan Klein, "Learning Accurate, Compact and Interpretable Tree Annotation", in ACL-COLING '06
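The smoothing step can likewise be sketched (hypothetical Python; the "BASE-i" naming convention and the alpha value are assumptions): each refined rule probability is interpolated with the mean of the same rule across sibling subcategories.

    def smooth_toward_base(rules, alpha=0.01):
        """Shrink each subcategory's rule probabilities toward the average
        over all subcategories of the same base symbol (e.g., NP-3 toward
        the mean over NP-0 ... NP-7). A sketch of linear smoothing; it
        assumes every subcategory carries a (possibly tiny) weight for
        each rule it shares with its siblings.
        """
        base = lambda s: s.split("-")[0]
        groups = {}  # (base lhs, rhs) -> probabilities across subcategories
        for (lhs, rhs), p in rules.items():
            groups.setdefault((base(lhs), rhs), []).append(p)
        smoothed = {}
        for (lhs, rhs), p in rules.items():
            peers = groups[(base(lhs), rhs)]
            smoothed[(lhs, rhs)] = (1 - alpha) * p + alpha * sum(peers) / len(peers)
        return smoothed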
Reference: Slav Petrov and Dan Klein, "Improved Inference for Unlexicalized Parsing", in NAACL-HLT '07
The three most frequent words in the subcategories of several part-of-speech tags.
VBZ: VBZ-0 gives, sells, takes; VBZ-1 comes, goes, works; VBZ-2 includes, owns, is; VBZ-3 puts, provides, takes; VBZ-4 says, adds, Says; VBZ-5 believes, means, thinks; VBZ-6 expects, makes, calls; VBZ-7 plans, expects, wants; VBZ-8 is, ’s, gets; VBZ-9 ’s, is, remains; VBZ-10 has, ’s, is; VBZ-11 does, Is, Does

NNP: NNP-0 Jr., Goldman, INC.; NNP-1 Bush, Noriega, Peters; NNP-2 J., E., L.; NNP-3 York, Francisco, Street; NNP-4 Inc, Exchange, Co; NNP-5 Inc., Corp., Co.; NNP-6 Stock, Exchange, York; NNP-7 Corp., Inc., Group; NNP-8 Congress, Japan, IBM; NNP-9 Friday, September, August; NNP-10 Shearson, D., Ford; NNP-11 U.S., Treasury, Senate; NNP-12 John, Robert, James; NNP-13 Mr., Ms., President; NNP-14 Oct., Nov., Sept.; NNP-15 New, San, Wall

JJS: JJS-0 largest, latest, biggest; JJS-1 least, best, worst; JJS-2 most, Most, least

DT: DT-0 the, The, a; DT-1 A, An, Another; DT-2 The, No, This; DT-3 The, Some, These; DT-4 all, those, some; DT-5 some, these, both; DT-6 That, This, each; DT-7 this, that, each; DT-8 the, The, a; DT-9 no, any, some; DT-10 an, a, the; DT-11 a, this, the

CD: CD-0 1, 50, 100; CD-1 8.50, 15, 1.2; CD-2 8, 10, 20; CD-3 1, 30, 31; CD-4 1989, 1990, 1988; CD-5 1988, 1987, 1990; CD-6 two, three, five; CD-7 one, One, Three; CD-8 12, 34, 14; CD-9 78, 58, 34; CD-10 one, two, three; CD-11 million, billion, trillion

PRP: PRP-0 It, He, I; PRP-1 it, he, they; PRP-2 it, them, him

RBR: RBR-0 further, lower, higher; RBR-1 more, less, More; RBR-2 earlier, Earlier, later

IN: IN-0 In, With, After; IN-1 In, For, At; IN-2 in, for, on; IN-3 of, for, on; IN-4 from, on, with; IN-5 at, for, by; IN-6 by, in, with; IN-7 for, with, on; IN-8 If, While, As; IN-9 because, if, while; IN-10 whether, if, That; IN-11 that, like, whether; IN-12 about, over, between; IN-13 as, de, Up; IN-14 than, ago, until; IN-15 out, up, down

RB: RB-0 recently, previously, still; RB-1 here, back, now; RB-2 very, highly, relatively; RB-3 so, too, as; RB-4 also, now, still; RB-5 however, Now, However; RB-6 much, far, enough; RB-7 even, well, then; RB-8 as, about, nearly; RB-9 only, just, almost; RB-10 ago, earlier, later; RB-11 rather, instead, because; RB-12 back, close, ahead; RB-13 up, down, off; RB-14 not, Not, maybe; RB-15 n’t, not, also
A general technique for learning refined, structured models when only the trace of a complex underlying process is observed.
Learns compact and accurate grammars from a treebank without additional human input.
Gives the best known parsing accuracy on a variety of languages while being extremely efficient.
Interactive demo and download at http://nlp.cs.berkeley.edu
Extensions
Reference: Percy Liang, Slav Petrov, Michael Jordan and Dan Klein, "The Infinite PCFG using Hierarchical Dirichlet Processes", in EMNLP-CoNLL '07
Hierarchical Dirichlet Processes as a nonparametric Bayesian alternative to split and merge:
Merging: Roll back the least useful splits in order to allocate complexity only where needed.
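A hedged sketch of the merging criterion described in the ACL-COLING '06 reference above (the inside/outside bookkeeping is assumed and not shown): the likelihood lost by reversing a split is approximated from inside and outside scores at each occurrence, and the splits with the smallest loss are rolled back.

    import math

    def merge_loss(occurrences, p1, p2):
        """Estimate the log-likelihood lost by undoing one split (A-1, A-2).

        `occurrences` lists (in1, out1, in2, out2) inside/outside scores at
        every tree node where the two subcategories can occur, and p1, p2
        are their relative frequencies; both are assumed to come from an
        inside-outside pass over the treebank.
        """
        loss = 0.0
        for in1, out1, in2, out2 in occurrences:
            split_score = in1 * out1 + in2 * out2
            merged_score = (p1 * in1 + p2 * in2) * (out1 + out2)
            loss += math.log(split_score / merged_score)
        return loss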
Grammar Extraction and Parsing:

(S (NP (PRP She)) (VP (VBD heard) (NP (DT the) (NN noise))) (. .))

Grammar:
S → NP VP .   1.0
NP → PRP      0.5
NP → DT NN    0.5
. . .

Lexicon:
PRP → She     1.0
DT → the      1.0
. . .
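The extraction step above amounts to relative-frequency estimation, as in this small self-contained sketch (the tuple encoding of trees is chosen for exposition); on the example tree it reproduces the 0.5 / 0.5 split of the two NP rules.

    from collections import Counter

    def extract_pcfg(trees):
        """Read off grammar and lexicon probabilities from a treebank by
        relative frequency. Trees are nested tuples such as
        ("S", ("NP", ("PRP", "She")), ...); a leaf is a plain string.
        """
        counts = Counter()
        def visit(node):
            label, children = node[0], node[1:]
            rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
            counts[(label, rhs)] += 1
            for c in children:
                if not isinstance(c, str):
                    visit(c)
        for t in trees:
            visit(t)
        lhs_totals = Counter()
        for (lhs, _), n in counts.items():
            lhs_totals[lhs] += n
        return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

    # The tree from the figure yields S -> NP VP . with probability 1.0
    tree = ("S",
            ("NP", ("PRP", "She")),
            ("VP", ("VBD", "heard"), ("NP", ("DT", "the"), ("NN", "noise"))),
            (".", "."))
    print(extract_pcfg([tree])[("S", ("NP", "VP", "."))])  # 1.0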
Relative frequencies of the most common NP productions by context:

              NP PP   DT NN   PRP
All NPs        11%      9%     6%
NPs under S     9%      9%    21%
NPs under VP   23%      7%     4%
Hierarchical Splitting: Repeatedly split each annotation symbol in two and retrain the grammar, initializing with the previous grammar.
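Putting the pieces together, the training schedule can be outlined as follows (a sketch only: split_all, em_fit, and merge_useless are hypothetical callables standing in for the splitting, EM fitting, and merging steps described on this poster).

    def hierarchical_train(grammar, treebank, split_all, em_fit, merge_useless,
                           rounds=6):
        """Hierarchical split-merge training in outline: each round splits
        every symbol in two, refits with EM starting from the previous
        grammar, and rolls back the least useful splits (e.g., 50% of
        them, as in the accuracy plot below).
        """
        for _ in range(rounds):
            grammar = split_all(grammar, k=2)    # split each symbol in two
            grammar = em_fit(grammar, treebank)  # inside-outside EM refit
            grammar = merge_useless(grammar)     # roll back least useful splits
        return grammar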
(S-x (NP-x (PRP-x She)) (VP-x (VBD-x heard) (NP-x (DT-x the) (NN-x noise))) (.-x .))
Figure: NP is repeatedly split in two, yielding subcategories NP1 ... NP8.
Figure: the likelihood L(·) of a training tree is computed with the split at an NP node kept and with it reversed; the difference measures how useful that split is.
Figure: parsing accuracy (F1, from 74 to 90) versus the total number of grammar symbols (100 to 1100) for four training regimes: Flat Training, Hierarchical Training, 50% Merging, and 50% Merging and Smoothing.
Parse Efficiency: Rapidly pre-parse the sentence in a hierarchical coarse-to-fine fashion, pruning away unlikely chart items.
Parse Selection: Use a variational approximation to select the tree with the maximum number of expected correct rules (since computing the best parse tree is intractable and selecting the best derivation is a poor approximation).
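A sketch of the coarse-to-fine pruning loop (Python for exposition; chart_posteriors is a stand-in for an inside-outside routine, and the threshold is an illustrative value): each pass parses with a finer grammar, constrained to the chart items that survived the previous pass, and the final posteriors feed the parse selection step described above.

    def coarse_to_fine_parse(sentence, grammars, chart_posteriors,
                             threshold=1e-4):
        """Parse with each grammar in order of increasing refinement,
        keeping only chart items whose posterior exceeds `threshold`.
        `chart_posteriors(grammar, sentence, allowed)` is assumed to
        return {(symbol, span): posterior}, restricted to `allowed`
        items when `allowed` is not None.
        """
        allowed = None  # the coarsest pass is unconstrained
        posteriors = {}
        for grammar in grammars:  # e.g., pi0(G), pi1(G), ..., G
            posteriors = chart_posteriors(grammar, sentence, allowed)
            allowed = {item for item, p in posteriors.items() if p > threshold}
        # posteriors under the most refined grammar are used to pick the
        # tree with the maximum number of expected correct rules
        return posteriors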
Figure: graphical model of the HDP-PCFG. A top-level distribution β over an unbounded set of subcategories (z = 1 ... ∞) generates per-subcategory parameters φzB, φzT, φzE (binary-production, rule-type, and emission distributions), which in turn generate the trees: nonterminals z1, z2, z3 with observed yields x2, x3 (panels: Parameters, Trees).
Automatic refinement of acoustic models learns phone-internal structure as well as phone-external context:

Figure: a phone HMM with Start and End states whose internal (subphone) states and transitions are refined automatically (shown for the phone ae in the context d ae d).
Learned grammars are compact and interpretable:
(S (NP (DT The) (NNP Golden) (NNP Gate) (NN bridge)) (VP (VBZ is) (ADJP (JJ red))) (. .))
Compute grammars of intermediate complexity by projecting the most refined grammar.
Figure: the sentence "The Golden Gate bridge is red." is pre-parsed with a sequence of projected grammars G0, G1, G2, G3, ... before the final pass. Learning produces the grammar sequence X-Bar = G0, G1, ..., G6 = G; the projections πi recover grammars of intermediate complexity π0(G), ..., π5(G) from the most refined grammar G. Each level refines the symbols further, e.g. NP in G0 becomes NP1, NP2 in the next levels, then NP3, NP4, ... (similarly for VP, QP, ...).
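A sketch of the projection step under naming assumptions (subcategory indices are assumed to encode the binary split tree, and `freq` would hold expected subcategory frequencies from the treebank): each refined rule contributes its probability, weighted by its parent subcategory's frequency, to the corresponding coarse rule.

    def project_grammar(rules, freq, level, rounds=6):
        """Project the most refined grammar onto a coarser one. `coarsen`
        maps a subcategory such as "NP-6" to its ancestor after `level`
        binary splits; plain symbols (words, unsplit tags) pass through.
        Projected probabilities are renormalized per coarse left-hand side.
        """
        def coarsen(sym):
            base, _, idx = sym.partition("-")
            return sym if not idx else f"{base}-{int(idx) >> (rounds - level)}"
        projected = {}
        for (lhs, rhs), p in rules.items():
            key = (coarsen(lhs), tuple(coarsen(s) for s in rhs))
            projected[key] = projected.get(key, 0.0) + freq[lhs] * p
        totals = {}
        for (lhs, _), p in projected.items():
            totals[lhs] = totals.get(lhs, 0.0) + p
        return {r: p / totals[r[0]] for r, p in projected.items()}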
Parse Tree:
(S (NP (PRP He)) (VP (VBD was) (ADJP right)) (. .))

Parse Derivations:
(S-2 (NP-17 (PRP-2 He)) (VP-23 (VBD-12 was) (ADJP-11 right)) (. .))
(S-1 (NP-10 (PRP-2 He)) (VP-11 (VBD-16 was) (ADJP-14 right)) (. .))
Parser                    F1 (≤ 40 words)   F1 (all)
ENGLISH
Charniak & Johnson '05    90.1              89.6
This Work                 90.6              90.1
GERMAN
Dubey '05                 76.3              -
This Work                 80.8              80.1
CHINESE
Chiang & Bikel '02        80.0              76.6
This Work                 86.3              83.4
Figure: bracket posterior probabilities during coarse-to-fine decoding of the sentence "Influential members of the House Ways and Means Committee introduced legislation that would restrict how the new s&l bailout agency can raise capital ; creating another potential obstacle to the government 's sale of sick thrifts ."