
Page 1: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars

Kewei Tu
Departments of Statistics and Computer Science
University of California, Los Angeles

Vasant Honavar
Department of Computer Science
Iowa State University

Page 2: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Overview

Unambiguity Regularization
A novel approach for unsupervised natural language grammar learning
Based on the observation that natural language is remarkably unambiguous
Includes standard EM, Viterbi EM, and a new algorithm called softmax-EM as special cases

Page 3: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Outline

Background
Motivation
Formulation and algorithms
Experimental results

Page 4: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Background

Unsupervised learning of probabilistic grammars
Learning a probabilistic grammar from unannotated sentences

Training corpus:
A square is above the triangle.
A triangle rolls.
The square rolls.
A triangle is above the square.
A circle touches a square.
……

Induced probabilistic grammar:
S → NP VP
NP → Det N
VP → Vt NP (0.3) | Vi PP (0.2) | rolls (0.2) | bounces (0.1)
……

[Diagram: Training Corpus → Grammar Induction → Probabilistic Grammar]
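To make the setup concrete, here is a minimal sketch (not part of the original slides) of one way such a grammar and corpus might be represented in code; the rule set and probabilities simply mirror the illustrative example above.

```python
# A toy PCFG as a dictionary: left-hand-side symbol -> list of
# (right-hand side, probability) pairs. Rules mirror the slide; the
# elided rules ("……") would carry the remaining probability mass.
toy_pcfg = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("Det", "N"), 1.0)],
    "VP": [(("Vt", "NP"), 0.3), (("Vi", "PP"), 0.2),
           (("rolls",), 0.2), (("bounces",), 0.1)],  # + elided rules
}

# Unannotated training corpus: plain sentences, no parse trees.
corpus = [
    "A square is above the triangle .",
    "A triangle rolls .",
    "The square rolls .",
    "A triangle is above the square .",
    "A circle touches a square .",
]

# Sanity check: probabilities for each left-hand side must not exceed 1;
# unsupervised learning fits these numbers from the corpus alone.
for lhs, rules in toy_pcfg.items():
    assert sum(p for _, p in rules) <= 1.0 + 1e-9, lhs
```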

Page 5: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Background

Unsupervised learning of probabilistic grammars
Typically done by assuming a fixed set of grammar rules and optimizing the rule probabilities
Various kinds of prior information can be incorporated into the objective function to improve learning, e.g., rule sparsity, symbol correlation, etc.

Our approach: unambiguity regularization
Utilizes a novel type of prior information: the unambiguity of natural languages

Page 6: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

The Ambiguity of Natural Language

Ambiguities are ubiquitous in natural languages
NL sentences can often be parsed in more than one way

Example [Manning and Schütze (1999)]:
The post office will hold out discounts and service concessions as incentives.
(Noun? Verb? Modifies "hold out" or "concessions"?)

Given a complete CNF grammar of 26 nonterminals, the total number of possible parses is [figure not preserved in the transcript].
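As a rough illustration of how quickly the space of parses grows (a sketch added here, not from the slides): even counting only the unlabeled binary bracketings of an n-word sentence, before assigning any nonterminal labels, already gives the Catalan number C(n−1).

```python
from math import comb

def binary_bracketings(n_words: int) -> int:
    """Number of distinct binary bracketings of a sentence with n_words words,
    i.e. the Catalan number C(n_words - 1). Labeling the brackets with the
    nonterminals of a complete CNF grammar multiplies this number further."""
    n = n_words - 1
    return comb(2 * n, n) // (n + 1)

# The example sentence has 12 words; its unlabeled bracketings alone:
print(binary_bracketings(12))  # 58786
```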

Page 7: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

The Unambiguity of Natural Language

Although each NL sentence has a large number of possible parses, the probability mass is concentrated on a very small number of parses

Page 8: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Comparison with non-NL grammars

[Figure: how the probability mass over parses is distributed under an NL grammar, a random grammar, and a max-likelihood grammar learned by EM]

Page 9: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Incorporate Unambiguity Bias into Learning

How to measure the ambiguity: the entropy of the parse given the sentence and the grammar

How to add it into the objective function: use a prior distribution that prefers low ambiguity
→ This makes learning intractable

Page 10: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Incorporate Unambiguity Bias into Learning

How to measure the ambiguity: the entropy of the parse given the sentence and the grammar

How to add it into the objective function: use posterior regularization [Ganchev et al. (2010)], which introduces an auxiliary distribution q over the parses Z

The regularized objective combines the log posterior of the grammar given the training sentences, the KL-divergence between q and the posterior distribution of the parses, and the entropy of the parses based on q, weighted by a constant σ that controls the strength of regularization:

F(θ, q) = log P(θ | X) − KL( q(Z) ‖ P(Z | X, θ) ) − σ H(q(Z))
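A minimal sketch (added for illustration, not from the slides) of the quantity being regularized: the entropy of a distribution q over the possible parses of one sentence, which is low when q concentrates on a single parse and high when it spreads across many.

```python
import math

def parse_entropy(q: dict) -> float:
    """Entropy H(q) of a distribution over the parses of one sentence,
    given as {parse_id: probability}. Low entropy = unambiguous."""
    return -sum(p * math.log(p) for p in q.values() if p > 0.0)

# An ambiguous sentence: mass spread over many parses.
ambiguous = {f"parse_{i}": 0.1 for i in range(10)}
# An unambiguous sentence: almost all mass on one parse.
unambiguous = {"parse_0": 0.97, "parse_1": 0.02, "parse_2": 0.01}

print(parse_entropy(ambiguous))    # ≈ 2.30
print(parse_entropy(unambiguous))  # ≈ 0.15

# Unambiguity regularization subtracts sigma * H(q), summed over sentences,
# from the learning objective, so low-entropy (unambiguous) parse
# distributions are preferred.
```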

Page 11: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Optimization: Coordinate Ascent

Fix q and optimize p: exactly the M-step of EM

Fix p and optimize q: depends on the value of σ
When σ = 0: exactly the E-step of EM

Page 12: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Optimization: Coordinate Ascent

Fix q and optimize p: exactly the M-step of EM

Fix p and optimize q: depends on the value of σ
When σ ≥ 1: exactly the E-step of Viterbi EM (q puts all of its probability on the single best parse)

Page 13: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Optimization: Coordinate Ascent

Fix q and optimize p: exactly the M-step of EM

Fix p and optimize q: depends on the value of σ
When 0 < σ < 1: q is a softmax (a sharpened version) of the posterior distribution of the parses → softmax-EM (a sketch of all three cases follows below)
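The three regimes admit a compact closed form for the optimal q. The following is a minimal sketch (function and variable names are illustrative, not from the original work); for 0 < σ < 1 it uses the exponent 1/(1 − σ), which is greater than 1, consistent with the "exponent > 1 in the E-step" noted on the related-work slide.

```python
def optimal_q(posterior: dict, sigma: float) -> dict:
    """E-step of unambiguity-regularized learning for one sentence.
    posterior: {parse_id: P(parse | sentence, grammar)}.
    Returns the auxiliary distribution q over the parses."""
    if sigma <= 0.0:
        # Standard EM: q is the posterior itself.
        return dict(posterior)
    if sigma >= 1.0:
        # Viterbi EM: all mass on the single most probable parse.
        best = max(posterior, key=posterior.get)
        return {z: (1.0 if z == best else 0.0) for z in posterior}
    # Softmax-EM: raise the posterior to the power 1/(1 - sigma), renormalize.
    exponent = 1.0 / (1.0 - sigma)
    powered = {z: p ** exponent for z, p in posterior.items()}
    total = sum(powered.values())
    return {z: p / total for z, p in powered.items()}

posterior = {"parse_a": 0.6, "parse_b": 0.3, "parse_c": 0.1}
print(optimal_q(posterior, sigma=0.5))  # sharper than the posterior
```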

Page 14: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Softmax-EM Implementation

Simply exponentiate all the grammar rule probabilities before the E-step of EM; since a parse's probability is the product of its rule probabilities, this exponentiates the parse probabilities as well (see the sketch below)

Does not increase the computational complexity of the E-step
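A minimal sketch of this trick, under the assumption that some chart-based E-step routine is available; the helper run_estep below stands in for whatever inside-outside computation produces expected rule counts and is not from the original work.

```python
def softmax_estep(rule_probs: dict, sentences, run_estep, sigma: float):
    """Run the E-step on temporarily exponentiated rule probabilities.

    rule_probs: {(lhs, rhs): probability}
    run_estep:  hypothetical routine (e.g. inside-outside) that takes rule
                weights and sentences and returns expected rule counts.
    Because a parse's weight is the product of its rule weights, raising
    every rule probability to the power 1/(1 - sigma) raises every parse
    probability to that power, which yields exactly the softmax-EM q.
    """
    assert 0.0 < sigma < 1.0
    exponent = 1.0 / (1.0 - sigma)
    powered = {rule: p ** exponent for rule, p in rule_probs.items()}
    # Same chart computation as ordinary EM, so no extra complexity;
    # the M-step then renormalizes the expected counts as usual.
    return run_estep(powered, sentences)
```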

Page 15: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

The Value of σ

Choosing a fixed value of σ:
Too small: not enough to induce unambiguity
Too large: the learned grammar might be excessively unambiguous

Annealing (a sketch of one possible schedule follows below):
Start with a large value of σ, to strongly push the learner away from the highly ambiguous initial grammar
Gradually reduce the value of σ, to avoid inducing excessive unambiguity
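The slides later mention annealing σ from 1 to 0 over 100 iterations; a simple linear schedule, sketched below, is just one illustrative way to realize that and is not claimed to be the exact schedule used.

```python
def annealed_sigma(iteration: int, total_iters: int = 100,
                   start: float = 1.0, end: float = 0.0) -> float:
    """Linearly decrease sigma from `start` to `end` over `total_iters`
    iterations, then keep it at `end`. With start=1 the first iterations
    behave like Viterbi EM; as sigma falls below 1 they become softmax-EM,
    and at sigma=0 the procedure is standard EM."""
    if iteration >= total_iters:
        return end
    frac = iteration / total_iters
    return start + frac * (end - start)

# Example: sigma at a few points of a 100-iteration run.
print([round(annealed_sigma(t), 2) for t in (0, 25, 50, 75, 100)])
# [1.0, 0.75, 0.5, 0.25, 0.0]
```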

Page 16: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Mean-field Variational Inference

So far: maximum a posteriori (MAP) estimation

Variational inference approximates the posterior of the grammar
Leads to more accurate predictions than MAP
Can accommodate prior distributions that MAP cannot

We have also derived a mean-field variational inference version of unambiguity regularization; the derivation is very similar to that of the MAP version

Page 17: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Experiments

Unsupervised learning of the dependency model with valence (DMV) [Klein and Manning, 2004]

Data: WSJ (sections 2–21 for training, section 23 for testing)

Trained on the gold-standard POS tags of the sentences of length ≤ 10, with punctuation stripped off

Page 18: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Experiments with Different Values of σ

Viterbi EM leads to high accuracy on short sentences

Softmax-EM (with an intermediate value of σ) leads to the best accuracy over all sentences

Page 19: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Experiments with Annealing and Priors

Annealing the value of σ from 1 to 0 in 100 iterations

Adding Dirichlet priors over the rule probabilities, using variational inference

Compared with the best results previously published for learning the DMV

Page 20: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Experiments on Extended Models

Applying unambiguity regularization to E-DMV, an extension of the DMV [Gillenwater et al., 2010]

Compared with the best results previously published for learning extended dependency models

Page 21: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Experiments on More Languages

Examining the effect of unambiguity regularization with the DMV model on the corpora of eight additional languages

Unambiguity regularization improves learning on eight out of the nine languages, but with different optimal values of σ

Annealing the value of σ leads to better average performance than using any fixed value of σ

Page 22: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Related Work

Some previous work also manipulates the entropy of hidden variables:
Deterministic annealing [Rose, 1998; Smith and Eisner, 2004]
Minimum entropy regularization [Grandvalet and Bengio, 2005; Smith and Eisner, 2007]

Unambiguity regularization differs from them in:
Motivation: the unambiguity of NL grammars
Algorithm: a simple extension of EM; an exponent > 1 in the E-step; the exponent decreases during annealing

Page 23: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Conclusion

Unambiguity regularization

Motivation: the unambiguity of natural languages

Formulation: regularize the entropy of the parses of the training sentences

Algorithms: standard EM, Viterbi EM, softmax-EM; annealing the value of σ

Experiments: unambiguity regularization is beneficial to learning; by incorporating annealing, it outperforms the current state of the art

Page 24: Unambiguity  Regularization for Unsupervised Learning of Probabilistic Grammars

Thank you!

Q&A