Self-training with Products of Latent Variable Grammars Zhongqiang Huang, Mary Harper, and Slav Petrov

Post on 11-Dec-2015


Page 1:

Self-training with Products of Latent Variable Grammars

Zhongqiang Huang, Mary Harper, and Slav Petrov

Page 2:

Overview

Motivation and Prior Related Research

Experimental Setup

Results

Analysis

Conclusions

Page 3:

PCFG-LA Parser [Matsuzaki et al. ’05] [Petrov et al. ’06] [Petrov & Klein ’07]

[Figure labels: Parse Tree; Sentence; Parameters; Derivations]

Page 4:

PCFG-LA Parser

Hierarchical splitting (& merging)

Original node: NP

Split to 2: NP1 NP2

Split to 4: NP1 NP2 NP3 NP4

Split to 8: NP1 NP2 NP3 NP4 NP5 NP6 NP7 NP8

Increased model complexity

n-th grammar: grammar trained after n split-merge rounds

Typical learning curve

Grammar order selection: use development set
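Grammar order selection on a development set can be sketched as below. This is a minimal illustration, assuming one dev-set F score per split-merge round; the function names `select_grammar_order` and `num_subcategories` and all scores are hypothetical placeholders, not results from the talk.

```python
def select_grammar_order(dev_f_by_round):
    """Return the split-merge round whose grammar scores highest on the dev set.

    dev_f_by_round maps round n -> dev-set F score of the n-th grammar.
    Ties are broken in favor of the smaller grammar (lower round).
    """
    if not dev_f_by_round:
        raise ValueError("need at least one scored grammar")
    # max() returns the first maximal element, so iterating rounds in
    # ascending order prefers the lower round on ties.
    return max(sorted(dev_f_by_round), key=lambda n: dev_f_by_round[n])


def num_subcategories(n_rounds):
    """Upper bound on subcategories per nonterminal after n binary splits
    (merging can only reduce this number)."""
    return 2 ** n_rounds


# Hypothetical dev scores, for illustration only.
scores = {1: 0.70, 2: 0.80, 3: 0.85, 4: 0.85, 5: 0.83}
best = select_grammar_order(scores)
```

Breaking ties toward the lower round is a design choice made here for simplicity; it favors the less complex grammar when the dev set cannot distinguish two orders.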

Page 5:

Max-Rule Decoding (Single Grammar) [Goodman ’98, Matsuzaki et al. ’05, Petrov & Klein ’07]

[Figure: tree with nodes S, NP, VP]

Page 6:

Variability [Petrov, ’10]

Page 7:

Max-Rule Decoding (Multiple Grammars) [Petrov, ’10]

[Figure label: Treebank]
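A product model multiplies the posteriors that the individual grammars assign to the same candidate, which is a sum in log space. The sketch below illustrates only that combination step over a flat list of candidates; `combine_log_posteriors`, `best_candidate`, and the toy rules and probabilities are hypothetical, and a real max-rule decoder would apply the combination inside chart parsing.

```python
import math


def combine_log_posteriors(per_grammar):
    """Sum log posteriors across grammars (i.e. multiply the posteriors).

    per_grammar: list of dicts, one per grammar, mapping candidate -> posterior
    probability (assumed strictly positive). Only candidates scored by every
    grammar are kept.
    """
    if not per_grammar:
        raise ValueError("need at least one grammar")
    shared = set(per_grammar[0])
    for scores in per_grammar[1:]:
        shared &= set(scores)
    return {c: sum(math.log(scores[c]) for scores in per_grammar) for c in shared}


def best_candidate(per_grammar):
    """Pick the candidate with the highest product of posteriors."""
    combined = combine_log_posteriors(per_grammar)
    return max(combined, key=combined.get)


# Toy posteriors from three hypothetical grammars for two competing rules.
g1 = {"VP -> V NP PP": 0.6, "VP -> V NP": 0.4}
g2 = {"VP -> V NP PP": 0.3, "VP -> V NP": 0.7}
g3 = {"VP -> V NP PP": 0.4, "VP -> V NP": 0.6}
winner = best_candidate([g1, g2, g3])
```

In this toy case the first grammar alone prefers the rule with the PP, but the product over all three grammars prefers the other rule, which is the kind of disagreement a product model is meant to resolve.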

Page 8:

Product Model Results [Petrov, ’10]

Page 9:

Motivation for Self-Training

Page 10:

Self-training (ST)

[Diagram labels: Hand Labeled; Unlabeled Data; Train; Label; Automatically Labeled Data; Train; Select with dev]
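One round of self-training (train, label, retrain) can be sketched as below. This is a minimal sketch under stated assumptions: `self_train` takes the trainer and labeler as callables, and the toy tagger (`train_tagger`, `tag`) is a hypothetical stand-in for a parser so that the example runs on its own. The dev-set selection step is omitted.

```python
from collections import Counter, defaultdict


def self_train(hand_labeled, unlabeled, train, label):
    """One round of self-training.

    hand_labeled: list of (input, gold_label) pairs
    unlabeled:    list of inputs
    train:        callable, list of (input, label) pairs -> model
    label:        callable, (model, input) -> predicted label
    """
    base_model = train(hand_labeled)  # train on hand-labeled data
    auto_labeled = [(x, label(base_model, x)) for x in unlabeled]  # label
    return train(hand_labeled + auto_labeled)  # retrain on both sets


# Toy stand-ins (hypothetical): a "model" is a word -> most frequent tag table.
def train_tagger(pairs):
    counts = defaultdict(Counter)
    for word, tag_name in pairs:
        counts[word][tag_name] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def tag(model, word):
    return model.get(word, "NN")  # unseen words default to NN


gold = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")]
raw = ["the", "cat", "runs"]
model = self_train(gold, raw, train_tagger, tag)
```

Passing `train` and `label` as arguments keeps the loop independent of any particular parser; the retrained model here covers "cat", which appeared only in the unlabeled data.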

Page 11:

Self-Training Curve

Page 12:

WSJ Self-Training Results [Huang & Harper, ’09]

[Figure axis: F score]

Page 13:

Self-Trained Grammar Variability

[Figure panels: Self-trained Round 7; Self-trained Round 6]

Page 14:

Summary

Two issues: Variability & Over-fitting

Product model: makes use of variability; over-fitting remains in individual grammars

Self-training: alleviates over-fitting; variability remains in individual grammars

Next step: combine self-training with product models

Page 15:

Experimental Setup

Two genres:

WSJ: Sections 2-21 for training, 22 for dev, 23 for test; 176.9K sentences per self-trained grammar

Broadcast News: WSJ + 80% of BN for training, 10% for dev, 10% for test (see paper)

Training scenarios: train 10 models with different seeds and combine using Max-Rule Decoding

Regular: treebank training with up to 7 split-merge iterations

Self-Training: three methods with up to 7 split-merge iterations

Page 16:

ST-Reg

[Diagram labels: Label; Automatically Labeled Data; Unlabeled Data; Hand Labeled; Train; Train ⁞; Product; Train; Select with dev set]

Multiple Grammars?

Single automatically labeled set by round 6 product

Page 17:

ST-Prod

[Diagram labels: Label; Automatically Labeled Data; Unlabeled Data; Hand Labeled; Train ⁞; Product; Train ⁞; Product]

Use more data?

Single automatically labeled set by round 6 product

Page 18:

ST-Prod-Mult

[Diagram labels: Hand Labeled; Train ⁞; Label; Product; Label; Product; Product]

10 different automatically labeled sets by round 6 product
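The difference between training every grammar on a single automatically labeled set and training each grammar on a different set can be sketched as a mapping from model index (random seed) to training set. This is an illustrative sketch only: `assign_auto_sets` and its `multiple` flag are hypothetical names, and the strings stand in for sets of automatically parsed sentences.

```python
def assign_auto_sets(auto_sets, n_models, multiple):
    """Map each model index (seed) to the automatically labeled set it trains on.

    multiple=False: every model shares auto_sets[0] (a single labeled set).
    multiple=True:  model i uses auto_sets[i] (one different set per model).
    """
    if multiple:
        if len(auto_sets) < n_models:
            raise ValueError("need one automatically labeled set per model")
        return {i: auto_sets[i] for i in range(n_models)}
    if not auto_sets:
        raise ValueError("need at least one automatically labeled set")
    return {i: auto_sets[0] for i in range(n_models)}


# Placeholder names for 10 automatically labeled sets.
sets = [f"auto_set_{k}" for k in range(10)]
shared = assign_auto_sets(sets, 10, multiple=False)
distinct = assign_auto_sets(sets, 10, multiple=True)
```

With a shared set, the 10 grammars differ only through their random seeds; with distinct sets, they also differ in their training data.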

Page 20:

A Closer Look at Regular Results

Page 21:

A Closer Look at Regular Results

Page 22:

A Closer Look at Regular Results

Page 23:

A Closer Look at Self-Training Results

Page 24:

A Closer Look at Self-Training Results

Page 25:

A Closer Look at Self-Training Results

Page 26:

Analysis of Rule Variance

We measure the average empirical variance of the log posterior probabilities of the rules among the learned grammars over a held-out set S to get at the diversity among the grammars.
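One way to compute such a measure is sketched below. The slide's own formula is not reproduced in this transcript, so the details are assumptions: the sketch takes, for each rule occurrence in each held-out sentence, the posterior assigned by every grammar, computes the population variance of the log posteriors across grammars, and averages over all rule occurrences. The function name and the toy values are hypothetical.

```python
import math
from statistics import pvariance


def average_rule_log_posterior_variance(posteriors):
    """posteriors: dict mapping (sentence_id, rule) -> list of posterior
    probabilities, one per grammar (all assumed strictly positive).

    Returns the mean, over all (sentence, rule) entries, of the population
    variance of the log posteriors across grammars.
    """
    if not posteriors:
        raise ValueError("no rules to average over")
    variances = [
        pvariance([math.log(p) for p in probs]) for probs in posteriors.values()
    ]
    return sum(variances) / len(variances)


# Toy values for two hypothetical grammars on a one-sentence held-out set.
toy = {
    (0, "S -> NP VP"): [0.9, 0.9],  # grammars agree: variance 0
    (0, "NP -> DT NN"): [math.exp(-1.0), math.exp(-3.0)],  # logs -1 and -3
}
avg = average_rule_log_posterior_variance(toy)
```

A value near zero means the grammars assign nearly identical posteriors to the same rules; larger values indicate more diverse grammars.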

Page 27:

Analysis of Rule Variance

Page 28:

English Test Set Results (WSJ 23)

[Figure groups: Single Parser; Reranker; Product; Parser Combination]

[Figure systems: [Charniak ’00]; [Petrov et al. ’06]; [Carreras et al. ’08]; [Huang & Harper ’08]; This Work; [Petrov ’10]; This Work; [Charniak & Johnson ’05]; [Huang ’08]; [McClosky et al. ’06]; [Sagae & Lavie ’06]; [Fossum & Knight ’09]; [Zhang et al. ’09]]

Page 29:

Broadcast News

Page 30:

Conclusions

Very high parse accuracies can be achieved by combining self-training and product models on newswire and broadcast news parsing tasks.

Two important factors:

1. Accuracy of the model used to parse the unlabeled data

2. Diversity of the individual grammars