
Daniel Gildea (2003): Loosely Tree-Based Alignment for Machine Translation

Linguistics 580 (Machine Translation)

Scott Drellishak, 2/21/2006

Overview

Gildea presents an alignment model he describes as "loosely tree-based"

Builds on Yamada & Knight (2001), a tree-to-string model

Gildea extends it with a clone operation, and also into a tree-to-tree model

Wants to keep performance reasonable (polynomial in sentence length)

Background | Tree-to-String Model | Tree-to-Tree Model | Experiment

Background

Historically, two approaches to MT: transfer-based and statistical

More recently, though, hybrids: probabilistic models of structured representations:
Wu (1997): Stochastic Inversion Transduction Grammars
Alshawi et al. (2000): Head Transducers
Yamada & Knight (2001) (see below)

Gildea's Proposal

Need to handle drastic changes to trees (real bitexts aren't isomorphic)

To do this, Gildea adds a new operation to Y&K's model: subtree clone

This operation clones a subtree from the source tree to anywhere in the target tree.

Gildea also proposes a tree-to-tree model that uses parallel tree corpora.

Background | Tree-to-String Model | Tree-to-Tree Model | Experiment

Yamada and Knight (2001)

Y&K's model is tree-to-string: the input is a tree and the output is a string of words.

(Gildea compares it to an "Alexander Calder mobile". Calder is the guy who invented that kind of sculpture, which is like Y&K's model because each node of the tree can turn either backwards or forwards. Visualize!)

Y&K Tree-to-String Model

Three steps to turn input into output (sketched in code below):
1. Reorder the children of each node (for a node with m children, m! orderings; conditioned only on the category of the node and its children)
2. Optionally insert words at each node, either before or after all the children (conditioned only on the foreign word)
3. Translate the words at the leaves (conditioned on P(f|e); words can translate to NULL)
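To make the generative story concrete, here is a minimal Python sketch of the three steps; the tree encoding and the toy p_reorder / p_insert / p_translate tables are my stand-ins, not Y&K's actual parameterization:

    import random

    # Toy tree nodes: ("LEAF", english_word) or (category, [children]).
    # The three functions below stand in for the model's learned tables.

    def p_reorder(cat, children):             # step 1: one of the m! orderings
        order = list(children)
        random.shuffle(order)                 # toy: uniform over orderings
        return order

    def p_insert(cat):                        # step 2: maybe insert a foreign word
        if random.random() < 0.3:
            return random.choice(["before", "after"]), "ga"   # toy particle
        return None, None

    def p_translate(word):                    # step 3: translate leaf (or NULL)
        lexicon = {"she": "kanojo", "saw": "mita", "it": None}
        return lexicon.get(word)

    def generate(node):
        if node[0] == "LEAF":
            f = p_translate(node[1])
            return [] if f is None else [f]
        cat, children = node
        body = [w for c in p_reorder(cat, children) for w in generate(c)]
        pos, word = p_insert(cat)
        if word is not None:
            body = [word] + body if pos == "before" else body + [word]
        return body

    tree = ("S", [("LEAF", "she"), ("VP", [("LEAF", "saw"), ("LEAF", "it")])])
    print(" ".join(generate(tree)))           # e.g. "kanojo ga mita"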

Aside: Y&K Suitability

Recall that this model was used for translating English to Japanese.

Their model is well-suited to this language pair:
Japanese is SOV, while English is SVO. Japanese is also generally head-last where English is head-first. Reordering handles both of these.
Japanese marks subjects/topics and objects with postpositions. Insertion handles this.

Y&K EM Algorithm

The EM algorithm estimates inside probabilities β bottom-up:

for all nodes ε_i in input tree T do
  for all k, l such that 1 ≤ k < l ≤ N do
    for all orderings ρ of the children ε_1 … ε_m of ε_i do
      for all partitions of span k, l into k_1, l_1 … k_m, l_m do
        (accumulate β for ε_i over span k, l, as in the paper)
      end for
    end for
  end for
end for
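To make the loop structure concrete: a minimal runnable sketch of just the inside pass, assuming a toy tree encoding of {node_id: (label, child_ids)} with English words at the leaves. Y&K's insertion step and the EM count collection are omitted, and p_order / p_trans are stand-ins for the model's tables:

    import itertools
    from functools import lru_cache

    def inside_prob(tree, root, f, p_order, p_trans):
        """Inside probability that `root` yields foreign string f (no insertion)."""
        @lru_cache(maxsize=None)
        def beta(node, k, l):                 # prob. that node yields f[k:l]
            label, children = tree[node]
            if not children:                  # leaf: translate the English word
                return p_trans(tuple(f[k:l]), label)
            total = 0.0
            for rho in itertools.permutations(children):       # m! orderings
                # partition [k, l) into contiguous, non-empty child spans
                for cuts in itertools.combinations(range(k + 1, l), len(rho) - 1):
                    bounds = (k,) + cuts + (l,)
                    p = p_order(rho, label)
                    for child, a, b in zip(rho, bounds, bounds[1:]):
                        p *= beta(child, a, b)
                    total += p
            return total
        return beta(root, 0, len(f))

    # Toy check: S over two leaves, uniform reordering, 1-to-1 lexicon prob 0.5.
    tree = {0: ("S", [1, 2]), 1: ("saw", []), 2: ("her", [])}
    p_order = lambda rho, label: 0.5              # 2 children -> 2 orderings
    p_trans = lambda fw, e: 0.5 if len(fw) == 1 else 0.0
    print(inside_prob(tree, 0, ["kanojo", "mita"], p_order, p_trans))   # 0.25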

Y&K Performance

Computational complexity is O(|T| N^(m+2)), where T = tree, N = input length, m = fan-out of the grammar

"By storing partially complete arcs in the chart and interleaving the inner two loops", this improves to O(|T| n^3 m! 2^m)

Gildea says "exponential in m" (looks factorial to me), but polynomial in the sentence length N/n

If |T| is O(n), then the whole thing is O(n^4)
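Spelling out that last step as a worked bound (my restatement; m is fixed by the grammar, so the m! 2^m factor is a constant):

\[
O\!\left(|T|\, n^{3}\, m!\, 2^{m}\right)
= O\!\left(n \cdot n^{3}\right) \cdot O\!\left(m!\, 2^{m}\right)
= O\!\left(n^{4}\right)
\]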

Y&K Drawbacks

No alignments with crossing brackets:

      A
     / \
    B   Z
   / \
  X   Y

XZY and YZX are impossible.

Recall that Y&K flatten trees to avoid some of this, but don't catch all cases.

Adding Clone

Gildea adds a clone operation to Y&K's model: for each node, allow the insertion of a clone of another node as its child.

The probability of cloning ε_i under ε_j is computed in two steps: first, the choice to insert a clone; second, the choice of which node to clone.

P_clone is one estimated number; P_makeclone is constant (all nodes equally probable, reusable)
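The slide's two formulas were lost in extraction; from the description above, the two-step decomposition presumably has this shape (a reconstruction: one estimated insertion probability, then a uniform choice among the n source nodes):

\[
P(\text{insert clone} \mid \varepsilon_j) = P_{\text{clone}},
\qquad
P(\text{clone} = \varepsilon_i \mid \text{insert}) =
\frac{P_{\text{makeclone}}(\varepsilon_i)}{\sum_k P_{\text{makeclone}}(\varepsilon_k)} = \frac{1}{n}
\]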

Background | Tree-to-String Model | Tree-to-Tree Model | Experiment

Tree-to-Tree Model

The output is a tree, not a string, and it must match the tree in the target corpus

Add two new transformation operations:
one source node → two target nodes
two source nodes → one target node

"a synchronous tree substitution grammar, with probabilities parameterized to generate the target tree conditioned on the structure of the source tree."

Calculating Probability

From the root down. At each level:

At most one of the node's children is grouped with it, forming an elementary tree (conditioned on the current node and its CFG-rule children)

The alignment of the e-tree is chosen (conditioned as above). Like Y&K reordering, except that (1) the alignment can include insertions and deletions, and (2) two nodes grouped together are reordered together.

Lexical leaves are translated as before.

Elementary Trees?

Elementary trees allow the alignment of trees with different depths. Treat A,B as an e-tree, and reorder their children together:

      A             A
     / \          / | \
    B   Z   →    X  Z  Y
   / \
  X   Y
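A minimal sketch of the "at most one child grouped with its parent" restriction, assuming a (label, children) tuple encoding of my own (not the paper's): each node roots only m+1 candidate e-trees, which is what keeps the later e-tree choice cheap.

    # Hypothetical node encoding: (label, [child, ...]); leaves have no children.
    def elementary_trees(node):
        """Yield the e-trees rooted at `node`: the node alone, or the node
        grouped with exactly one of its m children (m+1 candidates total)."""
        label, children = node
        yield (node, None)                       # node by itself
        for i in range(len(children)):
            yield (node, i)                      # node grouped with child i

    tree = ("A", [("B", [("X", []), ("Y", [])]), ("Z", [])])
    print(list(elementary_trees(tree)))          # 3 candidates: alone, +B, +Z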

EM Algorithm

Estimates inside probabilities β bottom-up:

for all nodes ε_a in source tree T_a in bottom-up order do
  for all elementary trees t_a rooted in ε_a do
    for all nodes ε_b in target tree T_b in bottom-up order do
      for all elementary trees t_b rooted in ε_b do
        for all alignments α of the children of t_a and t_b do
          (accumulate β for the pair ε_a, ε_b, as in the paper)
        end for
      end for
    end for
  end for
end for

Performance

The outer two loops are O(|T|^2)

Elementary trees include at most one child, so choosing e-trees is O(m^2)

Alignment is O(2^(2m))

Which nodes to insert or clone is O(2^(2m))

How to reorder is O((2m)!)

Overall: O(|T|^2 m^2 4^(2m) (2m)!), quadratic (!) in the size of the input sentence
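As a back-of-the-envelope check (my arithmetic, not the paper's), the grammar-dependent constant m^2 · 4^(2m) · (2m)! explodes quickly even though the bound is only quadratic in sentence length:

    from math import factorial

    # Grammar-dependent constant in O(|T|^2 m^2 4^(2m) (2m)!) for small fan-outs m.
    for m in range(1, 5):
        print(m, m**2 * 4**(2 * m) * factorial(2 * m))
    # m=1: 32, m=2: 24576, m=3: 26542080, ...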

Tree-to-Tree Clone

Allowing m-to-n matching of up to two nodes (e-trees) allows only "limited non-isomorphism"

So, as before, add a clone operation

The algorithm is unchanged, except that alignments may now include cloned subtrees, with the same probability as in tree-to-string (uniform)

Background | Tree-to-String Model | Tree-to-Tree Model | Experiment

The Data

Parallel Korean-English corpus

Trees annotated by hand on both sides

"in this paper we will be using only the Korean trees, modeling their transformation into the English text."

(That can't be right; it's only true for TTS?)

5083 sentences: 4982 training, 101 eval

Aside: Suitability

Recall that Y&K's model was suited to the English-to-Japanese task.

Gildea is going to compare their model to his, but using a Korean-English corpus. Is that fair?

In a word, yes. Korean and Japanese are syntactically very similar: agglutinative, head-last (so similar that syntax is the main argument that they're related).

Results

Alignment Error Rate, from Och & Ney (2000):

Model                     AER
IBM Model 1               .37
IBM Model 2               .35
IBM Model 3               .43
Tree-to-String            .42
TTS + clone               .36
TTS + clone, P_ins = .5   .32
Tree-to-Tree              .49
TTT + clone               .36
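For reference, the AER of Och & Ney (2000) scores a hypothesis alignment A against sure links S and possible links P (with S ⊆ P); a minimal sketch, treating alignments as sets of (i, j) word-index pairs:

    def aer(A, S, P):
        """Alignment error rate (Och & Ney 2000): lower is better.
        A = hypothesis links, S = sure gold links, P = possible gold links."""
        return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))

    # Toy check: matching the sure links exactly gives AER = 0.
    S = {(0, 1), (1, 0)}
    print(aer(S, S, S | {(1, 1)}))   # 0.0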

Results Detailed

The lexical probabilities come from Model 1, and the node reordering probabilities are initialized to uniform

Best results when P_ins is set to 0.5 rather than estimated (!)

"While the model learned by EM tends to overestimate the total number of aligned word pairs, fixing a higher probability for insertions results in fewer total aligned pairs and therefore a better trade-off between precision and recall"

How'd TTS and TTT Do?

The best results were with tree-to-string, surprisingly

Y&K + clone was ≈ to IBM; fixing P_ins was best overall

Tree-to-tree + clone was ≈ to IBM, but it was much more efficient to train (since it's quadratic instead of quartic)

Still, disappointing results for TTT

Conclusions

The model allows syntactic info to be used for training without ordering constraints

Clone operations improve alignment results

Tree-to-tree + clone is better only in (computational) performance (but he's hopeful)

Future directions: bigger corpora, conditioning on lexicalized trees