
Inversion Transduction Grammar with Linguistic Constraints

Colin Cherry
University of Alberta, Nov 30, 2006

Slide 2

Edmonton Weather (Tuesday)

Slide 3

Outline

• Bitext and Bitext Parsing

• Inversion Transduction Grammar (ITG)

• ITG with Linguistic Constraints

• Discriminative ITG with Linguistic Features

• Other Projects

Slide 4

Statistical Machine Translation

• Input:
– Source language sentence E
• Goal:
– Produce a well-formed target language sentence F with same meaning as E
• Process:
– Decoding: search for an operation sequence O that transforms E into F
– Weights on individual operations are determined empirically from examples of translation

Slide 5

Bitext

• Valuable resource for training and testing statistical machine translation systems
• Large-scale examples of translation
• Needs analysis to determine small-scale operations that generalize to unseen sentences

[Figure: a text in English beside the same text in French]

Slide 6

Word Alignment

• Given a sentence and its translation, find the word-to-word connections

the minister in charge of the Canadian Wheat Board

le ministre chargé de la Commission Canadienne du blé

Slide 7

Word Alignment

• Given a sentence and its translation, find the word-to-word connections

• Link: a single word-to-word connection

the minister in charge of the Canadian Wheat Board

le ministre chargé de la Commission Canadienne du blé

Slide 8

Given a Word Alignment

• Extract bilingual phrase pairs for phrasal SMT (Koehn et al. 2003)
• Add in a parse tree and:
– Extract treelet pairs for dependency translation (Quirk et al. 2005)
– Extract rules for a tree transducer (Galley et al. 2004)
• Other fun things:
– Train monolingual paraphrasers (Quirk et al. 2004, Callison-Burch et al. 2005)

Slide 9

Bitext Parsing

• Assume a context-free grammar generates two languages at once

• Like joint models, but position of words in both languages is controlled by grammar

Slide 10

Monolingual Parsing

[Figure: parse tree over "he always verbs the adjective noun", with the terminals at the leaves, non-terminals (Det, Adj, N, Adv, V, NP, VP, S) above them, and the production NP → Adj N highlighted as an example.]

Slide 11

Another view

[Figure: the same parse tree drawn over "he always verbs the adjective noun", highlighting the productions S → NP VP and VP → V NP.]

Slides 12-18

Bitext Parsing is in 2D

[Figure, built up over several slides: a 2D parse chart with the English sentence "he always verbs the adjective noun" along one axis and its French translation "il verbe toujours le nom adjectif" along the other; non-terminals (S, VP, NP, Adv, V, Det, Adj, N) cover rectangular blocks of the grid, so the bitext parse fixes word positions in both languages at once.]

Slide 19

Why Bitext Parsing?

• Established polynomial algorithms
• Flexible framework, easy to add info:
– Parse given an alignment
– Align given a parse (this work)
• Discoveries can be ported to parser-based decoders (Zens et al. 2004, Melamed 2004)
• Advances in parsing can be ported to word alignment

Slide 20

Outline

• Bitext and Bitext Parsing

• Inversion Transduction Grammar (ITG)

• ITG with Linguistic Constraints

• Discriminative ITG with Linguistic Features

• Other Projects

Slide 21

Inversion Transduction Grammar

• Introduced by Wu (1997)
– Transduction:
• N → noun / nom
– Inversion:
• NP → [Det NP]
• NP → <Adj N>

[Figure: the transduction rule pairs "noun" with "nom" under N; the straight rule NP → [Det NP] keeps child order in both languages, while the inverted rule NP → <Adj N> swaps the children on the French side.]

Slide 22

Binary Bracketing

A → [A A]
A → <A A>
A → e/f

• No linguistic meaning to “A”
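As a small illustration of these three production types (not from the talk; the helper names are invented), straight rules keep child order in both languages while inverted rules reverse the order on the French side:

```python
# Minimal sketch of bracketing-ITG derivation nodes (illustrative only).
# A node is a pair of strings: the English yield and the French yield.

def terminal(e, f):
    """A -> e/f : emit word e in English paired with word f in French."""
    return (e, f)

def straight(left, right):
    """A -> [A A] : children keep the same order in both languages."""
    return (left[0] + " " + right[0], left[1] + " " + right[1])

def inverted(left, right):
    """A -> <A A> : children keep English order, but swap on the French side."""
    return (left[0] + " " + right[0], right[1] + " " + left[1])

# "adjective noun" vs. "nom adjectif": one inverted rule captures the swap.
adj = terminal("adjective", "adjectif")
noun = terminal("noun", "nom")
print(inverted(adj, noun))   # ('adjective noun', 'nom adjectif')
```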

Slide 23

Tree visualization

Slide 24

Pros and Cons of Bracketing

• Pros:
– Language independent
– Straightforward and fast
– Symbols are minimally restrictive
• Cons:
– Grammar is meaningless
– ITG constraint

Slide 25

ITG Constraint

[Figure: a sentence built from the phrases "Mr Burton", "are acceptable to the commission", "fully or in part", shown with an "inside-out" reordering of those phrases that no binary ITG derivation can produce.]
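The constraint can be checked directly: a permutation of aligned positions is reachable by a binary ITG exactly when no four positions form the inside-out patterns (2,4,1,3) or (3,1,4,2). A brute-force sketch of that check (illustrative code, not part of the talk):

```python
from itertools import combinations

def itg_reachable(perm):
    """True iff no four positions of the permutation form one of the
    'inside-out' patterns (2,4,1,3) or (3,1,4,2); such permutations are
    exactly the ones a binary ITG can produce.  Brute force, O(n^4)."""
    for quad in combinations(range(len(perm)), 4):
        values = [perm[i] for i in quad]
        ranks = tuple(sorted(values).index(v) + 1 for v in values)
        if ranks in {(2, 4, 1, 3), (3, 1, 4, 2)}:
            return False
    return True

print(itg_reachable([2, 4, 1, 3]))   # False: the classic inside-out case
print(itg_reachable([3, 2, 1, 4]))   # True: plain reversals are fine
```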

Slide 26

Outline

• Bitext and Bitext Parsing

• Inversion Transduction Grammar (ITG)

• ITG with Linguistic Constraints

• Discriminative ITG with Linguistic Features

• Other Projects

Slide 27

Some questions

• Those ITG constraints are kind of scary. How bad are they? Do they ever help?

• Can we inject some linguistics into this otherwise purely syntactic process?

– Linguistic grammar would limit trees that can be built - and therefore limit alignments

Slide 28

Alignment Spaces

• Set of feasible alignments for a sentence pair
• Described by how links interact
– If links don't interact, problem loses its structure
• Should encourage competition between links (Guidance)
• Should not eliminate correct alignments (Expressiveness)

Slide 29

ITG Space

• Rules out “inside-out” alignments

• Limits how concepts can be re-ordered during translation

Slide 30

Permutation Space

• One-to-one: each word in at most one link
• Allows any permutation of concepts
• Reduces to weighted maximum matching if each link can be scored independently (sketch below)

[Figure: a one-to-one word alignment between "the tax causes unrest" and "l' impôt cause le malaise"]
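With independent link scores, the one-to-one search can be run as an assignment problem; a sketch using SciPy's linear_sum_assignment (the score matrix here is made up, and letting low-scoring words stay unlinked would need zero-score dummy rows or columns):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy link-score matrix: rows = English words, columns = French words.
# Any scoring function (bilingual correlation, position, etc.) could fill it.
english = ["the", "tax", "causes", "unrest"]
french = ["l'", "impot", "cause", "le", "malaise"]
scores = np.array([
    [2.0, 0.1, 0.0, 1.5, 0.0],
    [0.2, 3.1, 0.1, 0.1, 0.0],
    [0.0, 0.2, 2.8, 0.0, 0.3],
    [0.0, 0.0, 0.2, 0.1, 2.4],
])

# Weighted maximum matching: each English word linked to at most one French
# word and vice versa.
rows, cols = linear_sum_assignment(scores, maximize=True)
for r, c in zip(rows, cols):
    print(english[r], "->", french[c], scores[r, c])
```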

Slides 31-32

Linguistic source: Dependencies

• Tree structure defines dependencies between words
• Subtrees define contiguous phrases

[Figure: dependency tree over "the minister in charge of the Canadian Wheat Board"; a subtree such as "the Canadian Wheat Board" covers a contiguous phrase.]

Slide 33

Phrasal Cohesion

• Syntactic phrases in tree tend to stay together after translation (Fox 2002)

• We can use this idea to constrain an alignment given an English dependency tree

• Shown to improve alignment quality (Lin and Cherry 2003)
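One way to picture the cohesion check (an illustrative sketch, not the paper's implementation): project each English dependency subtree onto the French positions it links to, and reject the alignment if any outside word links into that projected span.

```python
def cohesive(subtrees, links):
    """Check phrasal cohesion of a word alignment against a dependency tree.

    subtrees: sets of English positions, one per dependency subtree
    links:    set of (english_pos, french_pos) pairs
    Returns False if a French word inside a subtree's projected span is
    linked to an English word outside that subtree."""
    for tree in subtrees:
        projected = [f for e, f in links if e in tree]
        if not projected:
            continue
        lo, hi = min(projected), max(projected)
        for e, f in links:
            if e not in tree and lo <= f <= hi:
                return False      # an outside word reaches into the span
    return True

# "the tax causes unrest" / "l' impot cause le malaise":
# the subtree {the, tax} projects onto French positions {0, 1}.
subtrees = [{0, 1}, {3}, {0, 1, 2, 3}]
good = {(0, 0), (1, 1), (2, 2), (3, 4)}
bad = good | {(2, 1)}             # "causes" also linked into the l'/impot span
print(cohesive(subtrees, good))   # True
print(cohesive(subtrees, bad))    # False
```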

Slides 34-35

Example

[Figure: dependency tree over "the tax causes unrest" aligned to "l' impôt cause le malaise"; a candidate link that would break the cohesion of the subtree "the tax" is crossed out.]

We can rule out the link, even with no one-to-one violation

Slide 36

ITG & Dependency

• Both limit movement with phrasal cohesion
– ITG: cohesive in some binary tree
– Dep: cohesive in the provided dependency tree
• Not subspaces of each other

[Figure: example alignments for "the big red dog" (permitted by the dependency constraint but not by ITG) and "the dog ate it" (permitted by ITG but not by the dependency constraint), showing that neither space contains the other.]

Slide 37

D-ITG Space

• Force ITG to maintain phrasal cohesion with a provided dependency tree

• Intersects ITG and Dependency spaces

• Adds linguistic dependency tree to ITG parsing

Slide 38

Chart Modification Solution

• Eliminate structures that allow "tax" to invert away from "the"

[Figure: candidate bracketings of "the tax causes unrest"; those that separate "the" from "tax" are eliminated.]

Slides 39-40

Effect on Parser

[Figure: the 2D parse chart for "the tax causes unrest" / "l' impôt cause le malaise"; the A span that would split "the" from "tax" is marked invalid (x) and removed, leaving only cohesive spans for the parser to combine.]

Slide 41

Continuum of constraints

Unconstrained → Permutation → ITG → D-ITG   (least to most constrained)

Slide 42

Experimental Setup

• English-French Parliamentary debates
• 500-sentence labeled test set (Och and Ney, 2003)
• Dependency parses from Minipar

Slide 43

Guidance Test

• Does the space stop incorrect alignments?
• Use a weighted link score built from:
– Bilingual correlations between words
– Relative position of tokens
• Maximize summed link scores in all spaces, check alignment error rate
– AER: combined precision and recall, lower is better (formula below)
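For reference, the AER used here is the standard one from Och and Ney (2003), computed for an alignment A against sure links S and possible links P:

    AER(A; S, P) = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)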

Slide 44

Guidance Results

[Figure: bar chart of Alignment Error Rate (0-20 scale) for the Permutation, ITG and D-ITG spaces.]

Slide 45

Expressiveness Test

• Given a strong model, does the space hold us back?
• Use a cooked link score from the gold standard:
– Only correct links are given positive scores
– Best space is the unconstrained space
• Maximize summed link scores in all spaces, check recall

Slide 46

Expressiveness Results

Slide 47

Contributions

• Algorithmic:
– Method to inject ITG with linguistic constraints
• Experimental:
– ITG constraints provide guidance, with virtually no loss in expressiveness (French-English)
– Dependency cohesion constraints provide greater guidance, at the cost of some expressiveness

Slide 48

Outline

• Bitext and Bitext Parsing

• Inversion Transduction Grammar (ITG)

• ITG with Linguistic Constraints

• Discriminative ITG with Linguistic Features

• Other Projects

Slide 49

Remaining Problems

• Dependency cohesion stops correct links:
– Parse errors, paraphrase, exceptions
– Would like a soft constraint
• I'm not doing much learning: φ² competitive linking with an ITG search

Slide 50

Soft Constraint

• Invalid spans need not be disallowed
– Instead the parser could incur a penalty
• Easy to incorporate the penalty into the DP (sketched below)

[Figure: the chart for "the tax causes unrest" / "l' impôt cause le malaise"; the span that breaks "the tax" is no longer removed, but pays a penalty (-5) when used.]
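Inside the dynamic program the soft constraint only changes a chart cell's score: instead of skipping a violating span, the parser pays a fixed price. A minimal sketch of that cell update (the names and penalty value are illustrative, not the talk's implementation):

```python
COHESION_PENALTY = 5.0   # tuned (or learned) cost for breaking a subtree span

def combine_cells(left_score, right_score, span, violates_cohesion):
    """Score a chart cell built from two child cells.  Under the hard
    constraint a violating span is simply never built; the soft version
    builds it anyway and subtracts a penalty."""
    score = left_score + right_score
    if violates_cohesion(span):
        score -= COHESION_PENALTY
    return score
```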

Slide 51

ITG Learning

• Zhang and Gildea 2004, 2005, 2006…
• Expectation Maximization to parameterize a stochastic grammar, unsupervised
– Driven by expensive 2D inside-outside
– Not doing much better than I am with φ²
• Meanwhile, EMNLP'05 is happening
– Moore 2005, Taskar et al. 2005
– Suddenly it's okay to use some training data

Slide 52

Discriminative matching (Taskar et al. 05)

[Figure: the candidate link "causes"/"cause" with feature values φ² = 0.767, DIST = 0.050, LCSR = 0.833, HMM = 0.0; the dot product of the features with the learned weights gives a link score of 47.2.]

• Max matching finds the alignment that maximizes the sum of link scores
• An entire alignment y can be given a feature vector Ψ(y) according to the features of the links in y
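The link score in the figure is just a dot product of link features with learned weights; roughly (the feature names match the figure, but the weight values here are invented):

```python
# Illustrative feature vector and weights for the link "causes"/"cause".
features = {"phi2": 0.767, "DIST": 0.050, "LCSR": 0.833, "HMM": 0.0}
weights  = {"phi2": 30.0,  "DIST": 80.0,  "LCSR": 24.0,  "HMM": 10.0}

link_score = sum(weights[name] * value for name, value in features.items())
# An alignment's score is the sum of its link scores; maximum weighted
# matching then finds the best-scoring alignment.
print(link_score)
```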

Slide 53

Learning objective

• Find weights w such that, for each example i and every wrong answer y:

  w · Ψ_i(y_i) ≥ w · Ψ_i(y) + Δ(y_i, y)

  (Ψ_i: feature representation, w: learned weights, Δ: structured distance)

• Can formulate as a constrained optimization problem, do max-margin training
• Problem: exponential number of wrong answers

Slide 54

SVM Struct (Tsochantaridis et al. 2004)

[Figure: a loop; starting from an empty constraint set, solve the constrained optimization for w, use w to search for each example's most violated constraint, add it to the accumulated constraints, and repeat.]

• The theory of constraint generation in constrained optimization guarantees convergence
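The loop in the figure can be sketched as follows; solve_qp, find_most_violated and violation stand in for routines the talk does not spell out (the decoder, here an ITG parser or matcher, supplies the loss-augmented search):

```python
def svm_struct_train(examples, initial_w, solve_qp, find_most_violated,
                     violation, epsilon=1e-3, max_iters=100):
    """Sketch of SVMstruct-style constraint generation.

    solve_qp:            solves the max-margin QP (with slacks) over the
                         current working set of constraints, returns weights
    find_most_violated:  loss-augmented search with the current weights
    violation:           how badly a proposed output violates the margin
    """
    constraints = []                 # start with an empty constraint set
    w = initial_w
    for _ in range(max_iters):
        added = 0
        for x, y_gold in examples:
            y_bad = find_most_violated(w, x, y_gold)
            if violation(w, x, y_gold, y_bad) > epsilon:
                constraints.append((x, y_gold, y_bad))
                added += 1
        if added == 0:
            break                    # nothing violated by more than epsilon
        w = solve_qp(constraints)    # re-solve over all accumulated constraints
    return w
```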

Slide 55

Similarities to Averaged Perceptron

• Online method driven by comparisons of current output to correct answer
• But:
– Allows a notion of structural distance
– Returns a max-margin solution (with slacks) at each step
– Remembers all of its past mistakes

Slide 56

SVM-ITG

• Can learn ITG parameters discriminatively
• Link productions A → e/f are scored as in discriminative matching
• Non-terminal productions A → [A A] | <A A> are scored with two features:
– Is it inverted?
– Does it cover a span that would usually be illegal?

[Figure: the production A → causes/cause scored by its feature values (φ² = 0.767, DIST = 0.050, LCSR = 0.833, HMM = 0.0) and the learned weights, giving a link score of 47.2.]

Slide 57

Experimental Setup

• Identical to Taskar et al.:
– 100 training
– 37 development
– 347 test
• Same unsupervised text as before to derive features
– 50k Hansards data

Slide 58

Results: bipartite matching SVM (Permutation) vs. the same SVM weights with the hard constraint (D-ITG)

[Figure: bar chart of AER, 1-Precision and 1-Recall (0-16 scale) for the two systems.]

Slide 59

Results: bipartite matching SVM vs. SVM weights with the hard constraint vs. ITG SVM with the soft cohesion feature

[Figure: bar chart of AER, 1-Precision and 1-Recall (0-16 scale) for the three systems.]

Slide 60

Contributions

• Algorithmic:
– Discriminative learning method for ITGs
• Experimental:
– Value of hard constraints is reduced in the presence of a strong link score
– Integrating the constraint as a feature during training can recover the value of the constraints, improving AER & recall

Slide 61

Other Projects

• Applying techniques from SMT to new domains:
– Unsupervised pronoun resolution
• Discriminative structured learning:
– Discriminative parsing

Slide 62

Unsupervised Pronoun Resolution (Cherry and Bergsma, CoNLL'05)

• The president entered the arena with his family.
• Input:
– A pronoun in context, and a list of candidates
– "his family", {arena, president}
• Output: the correct candidate - president
• Big Idea:
– Formulate a generative model where a candidate generates the pronoun and context, run EM (toy sketch below)
– Similar to IBM-1: align pronouns to candidates
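A toy version of that idea (an illustrative simplification, not Cherry and Bergsma's actual model): each candidate generates the pronoun and its context words independently, and EM re-estimates p(word | candidate) from the resulting posteriors, IBM-1 style.

```python
from collections import defaultdict

def em_resolve(instances, iterations=20):
    """Toy EM where a candidate 'generates' the pronoun and its context.
    Each instance is (candidate_set, context_words); we learn
    p(word | candidate) and read candidates off the posteriors."""
    vocab = {w for _, words in instances for w in words}
    p = defaultdict(lambda: 1.0 / len(vocab))      # uniform start
    for _ in range(iterations):
        counts, totals = defaultdict(float), defaultdict(float)
        for candidates, words in instances:
            # E-step: posterior over candidates for this pronoun occurrence
            score = {c: 1.0 for c in candidates}
            for c in candidates:
                for w in words:
                    score[c] *= p[(w, c)]
            z = sum(score.values())
            for c in candidates:
                post = score[c] / z
                for w in words:                     # M-step fractional counts
                    counts[(w, c)] += post
                    totals[c] += post
        p = defaultdict(lambda: 1e-9,
                        {wc: counts[wc] / totals[wc[1]] for wc in counts})
    return p

# Unambiguous cases ("the president ... his", "the arena ... its roof") pull
# the ambiguous instance toward the right antecedent.
data = [({"president", "arena"}, ["his", "family"]),
        ({"president"}, ["his", "family"]),
        ({"arena"}, ["its", "roof"])]
p = em_resolve(data)
print(p[("his", "president")] > p[("his", "arena")])   # True
```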

Slide 63

Pronoun Resolution: Innovations

• Used linguistics to limit the candidate list:
– Binding theory, known noun genders
• Used unambiguous cases to initialize EM
• Re-weighted component models discriminatively with maximum entropy
• End result:
– Within 5% of a supervised system, with the re-weighted model matching supervised performance

Slide 64

Discriminative Parsing (Wang, Cherry, Lizotte and Schuurmans, CoNLL'06)

• Input: segmented Chinese string
• Output: dependency parse tree
• Big Idea:
– Score each link independently, with an SVM weighting features on links (McDonald 2005), but generalize without part-of-speech tags
– Learn a weight for every word pair seen in training

[Figure: dependency tree over "the tax causes unrest"]

Slide 65

Parsing Innovations

• To promote generalization:
– Altered the "large margin" portion of the SVM objective so semantically similar word pairs have similar weights
• Tried two constraint types:
– Local: link scores constrained so links present in the gold standard score higher than those absent
– Global: SVM Struct-style constraint generation

Slide 66

Others in brief

• Dependency treelet decoder (here)
• Sequence tagging:
– Biomedical term recognition
• Highlight gene names, proteins in medical texts
– Character-based syllabification
• Find syllable breaks in written words

Slide 67

Outline

• Bitext and Bitext Parsing

• Inversion Transduction Grammar (ITG)

• ITG with Linguistic Constraints

• Discriminative ITG with Linguistic Features

• Other Projects

Slide 68

Slide 69

Connecting E and F

• One language generates the other
– IBM models (Brown et al. 1993), HMM (Vogel et al. 1996), tree-to-string model (Yamada and Knight 2001)
• Both languages generated simultaneously
– Joint model (Melamed 2000), phrasal joint model (Marcu and Wong 2002)
• E and F generate an alignment
– Conditional model (Cherry and Lin 2003), discriminative models (Taskar et al. 2005, Moore 2005)

Slide 70

Phrases agree, not trees

[Figure: dependency tree over "he ran here quickly"]

• Dependencies state that "ran" is modified by "here" and by "quickly" separately
• We allow ITG to state that "ran" is modified by "here quickly"
• Also tested these additional head constraints

Slide 71

Effect on Parser

[Figure: the 2D chart for "the tax causes unrest" / "l' impôt cause le malaise" with an invalid A span marked (x).]

Slide 72

Custom Grammar Solution

• What trees force "the" and "tax" to stay together?
– Custom recursive grammar
– Same alignment space, canonical tree

[Figure: a custom grammar over "the tax causes unrest" that keeps "the tax" together as a unit, with ordinary ITG rules used within the resulting spans.]

Slide 73

Guidance Results

[Figure: bar chart of Alignment Error Rate (0-20 scale) for Permutation, ITG, Dep Beam, D-ITG and HD-ITG.]

Slide 74

Expressiveness Results

[Figure: bar chart of recall (90-100 scale) for Unconstrained, Permutation, ITG, Dep Beam, D-ITG and HD-ITG.]

Slide 75

Expressiveness Analysis

• HD-ITG has systematic violations
– Discontinuous constituents (Melamed, 2003)
– Maintains distance to head, which is not always maintained in translation

[Figure: "Canadian Wheat Board" aligned to "Commission Canadienne du blé"; the modifiers' distance to the head changes in translation, which the head constraint cannot model.]

Slide 76

Discriminative Alignment

• Alignment can be viewed as multi-class classification

[Figure: the input is the sentence pair "the tax causes unrest" / "l' impôt cause le malaise"; the correct answer is one complete alignment of the pair, and every other possible alignment is a wrong answer.]

Slide 77

Problem

• Exponential number of incorrect alignments
• One solution:
– Take advantage of properties of the matching algorithm
– Factor constraints
• Doing the same factorization on ITG could be a lot of work - need something more modular
– Averaged perceptron?
– Structured SVM

Slide 78

Final Challenge

• Need gold standard trees to train on, only have gold standard alignments
• Versatility of ITG makes this easy:
– Search for the best parse given an alignment
– Select the parse with the fewest cohesion violations and fewest inversions

Slide 79

Redundancy

• Using A → [A A] | <A A> | e/f:
– Several parses produce the same alignment
– Wu provides a canonical-form grammar
– Creates only one parse per alignment
• Useful for:
– Counting methods like EM
– Detecting arbitrary bracketing decisions

Slide 80

Results Table

Method     Prec   Rec    AER
Match      79.3   82.7   19.24
ITG        81.8   83.7   17.36
Cohesion   88.8   84.0   13.40
D-ITG      88.8   84.2   13.32
HD-ITG     89.2   84.0   13.15

Slide 81

Guidance Results

[Figure: bar chart of precision error, recall error and AER (0-25 scale) for Permutation, ITG, Dep Beam, D-ITG and HD-ITG.]

Slide 82

Expressiveness Results

[Figure: bar chart of recall error and AER (0-6 scale) for Permutation, ITG, Dep Beam, D-ITG and HD-ITG.]

Slide 83

SVM Objective

\min_{\mathbf{w},\,\boldsymbol{\xi}} \;\; \frac{1}{2}\|\mathbf{w}\|^{2} + \frac{C}{n}\sum_{i=1}^{n}\xi_{i}
\quad \text{s.t.} \quad \forall i:\ \xi_{i} \ge 0,
\qquad \forall i,\ \forall y:\ \xi_{i} \ge \Delta(y_{i}, y) + \mathbf{w}\cdot\Psi_{i}(y) - \mathbf{w}\cdot\Psi_{i}(y_{i})

(ξ_i: slack, Δ: structured loss, Ψ_i: feature representation)