
Machine Translation 13: Syntax

Rico Sennrich

University of Edinburgh

R. Sennrich MT – 2018 – 13 1 / 30

Overview

today
some syntactic phenomena that make (machine) translation difficult

some solutions from MT research

R. Sennrich MT – 2018 – 13 1 / 30

R. Sennrich MT – 2018 – 13 2 / 30

Word Order

languages differ in word order:
SVO: English, French, Mandarin, Russian, ...
SOV: Hindi, Latin, Japanese, Korean, ...
VSO, VOS, OVS, OSV exist, but are less common
German is V2 in main clauses, SOV in subordinate clauses
word order is more flexible when grammatical function is morphologically marked

example: German–English

der Mann , der den letzten Marathon gewonnen hat

the man who won the last marathon

R. Sennrich MT – 2018 – 13 3 / 30

Page 2: Machine Translation - 13: Syntaxhomepages.inf.ed.ac.uk/rsennric/mt18/13_4up.pdfsubjpred rel rel most syntax-based SMT systems are context-free either they can't produce non-projective

German English

[figure: word alignment example with long-distance reordering (e.g. "paid")]

R. Sennrich MT – 2018 – 13 4 / 30
R. Sennrich MT – 2018 – 13 5 / 30

Arabic English

[figure: word alignment example with phrases such as "press conference", "the day", "current strategy", "Iraqi government", "enough progress", "American troops"]

R. Sennrich MT – 2018 – 13 6 / 30

Word Order

translation units can be discontinuous

example: German separable verb prefixes are clause-final
he proposes a trade
er schlägt einen Handel vor

R. Sennrich MT – 2018 – 13 7 / 30

Page 3: Machine Translation - 13: Syntaxhomepages.inf.ed.ac.uk/rsennric/mt18/13_4up.pdfsubjpred rel rel most syntax-based SMT systems are context-free either they can't produce non-projective

Word Order

"The Awful German Language" by Mark TwainThe Germans have another kind of parenthesis, which they make bysplitting a verb in two and putting half of it at the beginning of an excitingchapter and the other half at the end of it. Can any one conceive ofanything more confusing than that? These things are called "separableverbs." The German grammar is blistered all over with separable verbs;and the wider the two portions of one of them are spread apart, the betterthe author of the crime is pleased with his performance.

R. Sennrich MT – 2018 – 13 8 / 30

Word Order

source side pre-reordering
preprocess the source text to better match target language word order

standard for some language pairs in phrase-based SMT
various approaches based on syntactic analysis of the source sentence:

hand-written rules [Nießen and Ney, 2000, Collins et al., 2005]
automatically learned rules [Xia and McCord, 2004, Genzel, 2010]
neural pre-reordering [Miceli Barone and Attardi, 2015]

negative results for neural MT [Du and Way, 2017]

R. Sennrich MT – 2018 – 13 9 / 30
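To make the rule-based approaches concrete, here is a minimal sketch of dependency-based source-side pre-reordering with a single hand-written German→English rule, in the spirit of [Collins et al., 2005]; the Token encoding and the rule itself are illustrative assumptions, not the published rule set.

```python
# A toy pre-reordering rule: in a German SOV clause, move the verb in front
# of its object phrase to approximate English SVO order. Token encoding and
# the rule itself are illustrative assumptions, not Collins et al.'s rules.

from dataclasses import dataclass

@dataclass
class Token:
    form: str    # surface form
    head: int    # index of syntactic head, -1 for the root
    deprel: str  # dependency label, e.g. "subj", "obja"

def phrase_start(tokens: list[Token], i: int) -> int:
    """Leftmost index of token i's phrase (direct dependents only,
    which is enough for this flat example)."""
    return min([j for j, t in enumerate(tokens) if t.head == i] + [i])

def reorder(tokens: list[Token]) -> list[str]:
    """Move a verb that follows its accusative object (SOV) to the
    position just before the object phrase (SVO-like)."""
    order = list(range(len(tokens)))
    for i, tok in enumerate(tokens):
        if tok.deprel == "obja" and tok.head > i:  # object precedes its verb
            verb = tok.head
            order.remove(verb)
            order.insert(order.index(phrase_start(tokens, i)), verb)
    return [tokens[j].form for j in order]

# "..., der den letzten Marathon gewonnen hat" (lit.: who the last marathon won has)
clause = [Token("der", 4, "subj"), Token("den", 3, "det"),
          Token("letzten", 3, "attr"), Token("Marathon", 4, "obja"),
          Token("gewonnen", 5, "aux"), Token("hat", -1, "root")]
print(reorder(clause))  # ['der', 'gewonnen', 'den', 'letzten', 'Marathon', 'hat']
```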

Word Order

target side pre-reordering?
in principle, we can reorder the target side before training

need a second step to restore the original target language word order
→ this is hard

some research, but never became standard

R. Sennrich MT – 2018 – 13 10 / 30

Syntactic N-grams

n-grams may not be meaningful if word order is flexible

syntactic n-grams for evaluation: head-word-chain metric (HWCM) [Liu and Gildea, 2005]

syntactic n-grams

die Passagiere auf dem Zug begrüßen den mutigen Entschluss
the passengers on the train welcome the bold decision

[dependency tree of the German sentence: begrüßen (root) with subj Passagiere (det die, pp auf, pn Zug, det dem) and obja Entschluss (det den, attr mutigen)]

head-word chains:
<s> begrüßen, begrüßen Passagiere, Passagiere die, Passagiere auf, auf Zug, Zug dem
begrüßen Entschluss, Entschluss den, Entschluss mutigen

R. Sennrich MT – 2018 – 13 11 / 30
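A minimal sketch of head-word chain extraction in the spirit of HWCM [Liu and Gildea, 2005]; the head-index encoding of the parse above is an assumption.

```python
# Minimal head-word chain extraction, following the idea behind HWCM
# [Liu and Gildea, 2005]. The head-index encoding of the parse is an
# assumption; the chains reproduce the example above.

def head_word_chains(words, heads, n=2):
    """All head-word chains of length n, following head -> dependent arcs;
    the root counts as a dependent of the start symbol <s>."""
    children = {}
    for child, head in enumerate(heads):
        children.setdefault(head, []).append(child)  # head -1 stands for <s>
    label = lambda i: "<s>" if i == -1 else words[i]
    chains = []
    def walk(chain):
        if len(chain) == n:
            chains.append(" ".join(label(i) for i in chain))
            return
        for c in children.get(chain[-1], []):
            walk(chain + [c])
    for start in [-1] + list(range(len(words))):
        walk([start])
    return chains

words = "die Passagiere auf dem Zug begrüßen den mutigen Entschluss".split()
heads = [1, 5, 1, 4, 2, -1, 8, 8, 5]  # index of each word's head
print(head_word_chains(words, heads))
# ['<s> begrüßen', 'Passagiere die', 'Passagiere auf', 'auf Zug', 'Zug dem',
#  'begrüßen Passagiere', 'begrüßen Entschluss', 'Entschluss den', 'Entschluss mutigen']
```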

Page 4: Machine Translation - 13: Syntaxhomepages.inf.ed.ac.uk/rsennric/mt18/13_4up.pdfsubjpred rel rel most syntax-based SMT systems are context-free either they can't produce non-projective

Non-projective Structures

classical example of context-sensitive structures
in Swiss German (Zurich dialect)

dass mer em Hans s’Huus hälfed aastriiche

that we help Hans paint the house

R. Sennrich MT – 2018 – 13 12 / 30

Non-projective Structures

non-projective German dependency tree

Wir müssen Systeme aufbauen , die sicher sind .

We have to build systems that are safe .

[dependency tree with labels root, subj, obja, aux, comma, subj, pred; the rel arc from "Systeme" to the relative clause ("sind") is non-projective, with a pseudo-projective variant shown as a dotted line]

most syntax-based SMT systems are context-free

either they can't produce non-projective structures,
or we use pseudo-projective arcs (dotted line)

R. Sennrich MT – 2018 – 13 13 / 30
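A minimal sketch of the underlying projectivity test: an arc is non-projective if a token inside its span is attached outside it. The head indices are an assumption matching one analysis of the example sentence.

```python
# A toy projectivity test: an arc (head, dep) is non-projective if some token
# between head and dep is attached outside the arc's span. The head indices
# below are an assumption matching one analysis of the example sentence.

def non_projective_arcs(heads):
    """Return arcs whose span contains a token whose head lies outside it."""
    arcs = [(h, d) for d, h in enumerate(heads) if h >= 0]
    nonproj = []
    for h, d in arcs:
        lo, hi = min(h, d), max(h, d)
        if any(lo < d2 < hi and not lo <= h2 <= hi for h2, d2 in arcs):
            nonproj.append((h, d))
    return nonproj

# Wir müssen Systeme aufbauen , die sicher sind .
#  0    1       2        3    4  5    6     7  8
heads = [1, -1, 3, 1, 3, 7, 7, 2, 1]
print(non_projective_arcs(heads))
# [(2, 7)] -- the "rel" arc Systeme -> sind spans "aufbauen", whose head is outside
```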

Subcategorization

words occur with specific syntactic arguments

complex mapping between syntactic arguments and meaning

example
remember can take a direct object or a clausal object
semantic role: content of the memory

he remembers his medical appointment.
he remembers that he has a medical appointment.

remind can take a direct object, plus a prepositional or clausal object
direct object: recipient of the information
prepositional or clausal object: the information
I remind him of his medical appointment.
I remind him that he has a medical appointment.

ungrammatical (or semantically nonsensical):
*he remembers of his medical appointment.
*he reminds his medical appointment.

R. Sennrich MT – 2018 – 13 14 / 30
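One way to picture subcategorization is as a lexicon mapping each verb to its licensed argument frames; a minimal sketch with the examples above (the frame encoding is an assumption):

```python
# A toy subcategorization lexicon: each verb lists its licensed argument
# frames, so mismatched frames can be flagged. The frame encoding is an
# illustrative assumption.

SUBCAT = {
    "remember": {("dobj",), ("clause",)},                   # remembers X / that-clause
    "remind":   {("dobj", "prep_of"), ("dobj", "clause")},  # remind X of/that Y
}

def licensed(verb: str, frame: tuple) -> bool:
    return frame in SUBCAT.get(verb, set())

print(licensed("remember", ("dobj",)))         # True:  he remembers his appointment
print(licensed("remember", ("prep_of",)))      # False: *he remembers of his appointment
print(licensed("remind", ("dobj",)))           # False: *he reminds his appointment
print(licensed("remind", ("dobj", "clause")))  # True:  I remind him that ...
```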

Subcategorization

subcategorization rules often differ between languages

he remembers the medical appointment.

*er erinnert den Arzttermin.

er erinnert sich an den Arzttermin.

*he remembers himself to the medical appointment.

for some translations, syntactic arguments swap semantic roles

he misses the cat

die Katze fehlt ihm
(literally: the cat is missing to him)

R. Sennrich MT – 2018 – 13 15 / 30

Page 5: Machine Translation - 13: Syntaxhomepages.inf.ed.ac.uk/rsennric/mt18/13_4up.pdfsubjpred rel rel most syntax-based SMT systems are context-free either they can't produce non-projective

Subcategorization

different meanings of words occur with different subcategories:

she applies for a job.
prepositional object for: submit oneself as a candidate (German: "bewerben")

this rule applies to everyone.
intransitive: be relevant (German: "gelten")

he applies the wrong test.
transitive: use (German: "anwenden")

R. Sennrich MT – 2018 – 13 16 / 30

Syntax and Neural MT

attentional encoder–decoder is less limited than previous approaches

reordering can be learned by attention model

consequence of subcategorization constraints:
you should not translate syntactic arguments independently

recurrent models can handle discontiguous and non-projective structures

recent research
does neural MT benefit from syntactic structure/information?

R. Sennrich MT – 2018 – 13 17 / 30

Linguistic Input Features

guide reordering with syntactic information

source: Gefährlich_pred ist die Route_subj aber dennoch.
reference: However the route is dangerous.
baseline NMT: *Dangerous is the route, however.

R. Sennrich MT – 2018 – 13 18 / 30

Linguistic Input Features

disambiguate words by POS

English          German
close (verb)     schließen
close (adj)      nah
close (noun)     Ende

source: We thought a win like this might be close_adj.
reference: Wir dachten, dass ein solcher Sieg nah sein könnte.
baseline NMT: *Wir dachten, ein Sieg wie dieser könnte schließen.

R. Sennrich MT – 2018 – 13 19 / 30

Page 6: Machine Translation - 13: Syntaxhomepages.inf.ed.ac.uk/rsennric/mt18/13_4up.pdfsubjpred rel rel most syntax-based SMT systems are context-free either they can't produce non-projective

Neural Machine Translation: Multiple Input Features

Use separate embeddings for each feature, then concatenate
(same method as for inclusion of the lemma)

baseline: only word feature

E(close) = [0.5, 0.2, 0.3, 0.1]

|F| input features:

E1(close) = [0.4, 0.1, 0.2]

E2(adj) = [0.1]

E1(close) ‖ E2(adj) = [0.4, 0.1, 0.2, 0.1]

R. Sennrich MT – 2018 – 13 20 / 30
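A minimal PyTorch sketch of this concatenation, assuming one embedding table per input feature; vocabulary sizes and dimensions are illustrative.

```python
# A minimal PyTorch sketch of per-feature embeddings concatenated per token.
# Vocabulary sizes and dimensions are illustrative assumptions.

import torch
import torch.nn as nn

class FeatureEmbedding(nn.Module):
    def __init__(self, vocab_sizes, dims):
        super().__init__()
        # one embedding table per input feature (word, POS, lemma, ...)
        self.tables = nn.ModuleList(
            [nn.Embedding(v, d) for v, d in zip(vocab_sizes, dims)])

    def forward(self, feature_ids):
        # feature_ids: one LongTensor per feature, each of shape (batch, seq)
        return torch.cat(
            [table(ids) for table, ids in zip(self.tables, feature_ids)], dim=-1)

# word feature: 3 dimensions, POS feature: 1 dimension -> 4 in total, as above
emb = FeatureEmbedding(vocab_sizes=[30000, 50], dims=[3, 1])
words = torch.tensor([[17]])  # e.g. the id of "close"
pos = torch.tensor([[4]])     # e.g. the id of "adj"
print(emb([words, pos]).shape)  # torch.Size([1, 1, 4])
```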

Experiments

Features
lemmas

morphological features

POS tags

dependency labels

BPE tags

Data
WMT16 training/test data

English↔German and English→Romanian

R. Sennrich MT – 2018 – 13 21 / 30

Results: BLEU ↑

system                           English→German  German→English  English→Romanian
baseline                                   27.8            31.4              23.8
all features                               28.4            32.9              24.8
baseline (+synthetic data)                 33.1            37.5              28.2
all features (+synthetic data)             33.2            38.5              29.2

R. Sennrich MT – 2018 – 13 22 / 30

Tree-Structured Encoders [Eriguchi et al., 2016]

represent source sentence as binary tree

leaf nodes: states of sequential RNN

tree-based encoder computes the state of the k-th parent node (h^p_k) as a function of its left and right child nodes (h^l_k and h^r_k):

h^p_k = f(h^l_k, h^r_k)

allow attention on original encoder states (leaves) and tree nodes

[figure from Eriguchi et al. (2016): "Figure 5: Translation example of a long sentence and the attentional relations by our proposed model", with surrounding paper text]

[Eriguchi et al., 2016]
R. Sennrich MT – 2018 – 13 23 / 30
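A minimal sketch of the bottom-up composition h^p_k = f(h^l_k, h^r_k); Eriguchi et al. use a Tree-LSTM for f, which is simplified here to a single feed-forward layer, and the nested-tuple tree encoding is an assumption.

```python
# A minimal sketch of the tree-based encoder's bottom-up composition.
# Eriguchi et al. (2016) use a Tree-LSTM for f; here f is a single
# feed-forward layer for brevity, and the nested-tuple binary tree
# encoding is an illustrative assumption.

import torch
import torch.nn as nn

class TreeEncoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())

    def forward(self, node):
        """node: a leaf tensor of shape (dim,), or a pair (left, right)."""
        if isinstance(node, torch.Tensor):
            return [node]                      # leaf = sequential RNN state
        left, right = map(self.forward, node)  # recurse into both children
        parent = self.f(torch.cat([left[-1], right[-1]], dim=-1))
        return left + right + [parent]         # expose all nodes to attention

dim = 8
leaves = [torch.randn(dim) for _ in range(4)]  # stand-ins for RNN states
tree = ((leaves[0], leaves[1]), (leaves[2], leaves[3]))
states = TreeEncoder(dim)(tree)
print(len(states))  # 7 attention candidates: 4 leaves + 3 internal nodes
```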

Page 7: Machine Translation - 13: Syntaxhomepages.inf.ed.ac.uk/rsennric/mt18/13_4up.pdfsubjpred rel rel most syntax-based SMT systems are context-free either they can't produce non-projective

Recurrent Neural Network Grammars

hypothesis:

recurrent neural networks have recency bias

instead, we want to induce syntactic bias

R. Sennrich MT – 2018 – 13 24 / 30

Recurrent Neural Network Grammars

One theory of hierarchy

• Generate symbols sequentially using an RNN

• Add some “control symbols” to rewrite the history periodically

• Periodically “compress” a sequence into a single “constituent”

• Augment RNN with an operation to compress recent history into a single vector (→ "reduce")

• RNN predicts next symbol based on the history of compressed elements and non-compressed terminals (“shift” or “generate”)

• RNN must also predict “control symbols” that decide how big constituents are

• We call such models recurrent neural network grammars.

R. Sennrich MT – 2018 – 13 25 / 30

Recurrent Neural Network Grammars

Trees as sequences

The hungry cat meows .

[parse tree: S spans NP ("The hungry cat") and VP ("meows"), plus "."]

S( NP( The hungry cat ) VP( meows ) . )

Tree traversals correspond to stack operations

R. Sennrich MT – 2018 – 13 26 / 30
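A minimal sketch of this linearization: a depth-first traversal emits NT(X) when a constituent opens, GEN(w) for each word, and REDUCE when it closes. The nested-tuple tree encoding is an assumption.

```python
# Linearize a phrase-structure tree into the RNNG action sequence
# NT(X), GEN(w), REDUCE via a depth-first traversal. The nested-tuple
# tree encoding is an illustrative assumption.

def linearize(tree):
    """tree: (label, children...) for nonterminals, a plain str for words."""
    if isinstance(tree, str):
        return [f"GEN({tree})"]
    label, *children = tree
    actions = [f"NT({label})"]
    for child in children:
        actions += linearize(child)
    return actions + ["REDUCE"]

tree = ("S", ("NP", "The", "hungry", "cat"), ("VP", "meows"), ".")
print(linearize(tree))
# ['NT(S)', 'NT(NP)', 'GEN(The)', 'GEN(hungry)', 'GEN(cat)', 'REDUCE',
#  'NT(VP)', 'GEN(meows)', 'REDUCE', 'GEN(.)', 'REDUCE']
```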


Recurrent Neural Network Grammars

Stack                                  Terminals               Action
                                                               NT(S)
(S                                                             NT(NP)
(S (NP                                                         GEN(The)
(S (NP The                             The                     GEN(hungry)
(S (NP The hungry                      The hungry              GEN(cat)
(S (NP The hungry cat                  The hungry cat          REDUCE
(S (NP The hungry cat)                 The hungry cat          NT(VP)
(S (NP The hungry cat) (VP             The hungry cat          GEN(meows)
(S (NP The hungry cat) (VP meows       The hungry cat meows    REDUCE
(S (NP The hungry cat) (VP meows)      The hungry cat meows    GEN(.)
(S (NP The hungry cat) (VP meows) .    The hungry cat meows .  REDUCE
(S (NP The hungry cat) (VP meows) .)   The hungry cat meows .

REDUCE compresses a finished constituent such as "The hungry cat" into a single composite symbol: (NP The hungry cat)

R. Sennrich MT – 2018 – 13 27 / 30
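A minimal sketch that replays these actions with an explicit stack, reproducing the Stack/Terminals columns above; the string-based stack symbols are an assumption (the real model keeps vectors).

```python
# Replay RNNG actions with an explicit stack, mirroring the table above.
# REDUCE pops up to the matching open nonterminal and pushes one composite
# symbol back. String-based symbols are a simplification of the real model.

def execute(actions):
    is_open = lambda s: s.startswith("(") and not s.endswith(")")
    stack, terminals = [], []
    for action in actions:
        if action.startswith("NT("):
            stack.append("(" + action[3:-1])   # open constituent, e.g. "(NP"
        elif action.startswith("GEN("):
            word = action[4:-1]
            stack.append(word)
            terminals.append(word)             # generated terminal
        elif action == "REDUCE":
            children = []
            while not is_open(stack[-1]):
                children.append(stack.pop())
            # compress the children into a single composite (closed) symbol
            stack.append(stack.pop() + " " + " ".join(reversed(children)) + ")")
        print(f"{' '.join(stack):<40}| {' '.join(terminals)}")
    return stack

execute(['NT(S)', 'NT(NP)', 'GEN(The)', 'GEN(hungry)', 'GEN(cat)', 'REDUCE',
         'NT(VP)', 'GEN(meows)', 'REDUCE', 'GEN(.)', 'REDUCE'])
# final stack: ['(S (NP The hungry cat) (VP meows) .)']
```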

Recurrent Neural Network Grammars

Syntactic Composition

Need representation for: (NP The hungry cat)

What head type? NP

[animation: an RNN reads the sequence NP The hungry cat ) forward, a second RNN reads it backward, and the constituent is composed into a single vector]

Recursion

Need representation for: (NP The (ADJP very hungry) cat)

the composed vector v of (ADJP very hungry) simply takes the place of a word in the sequence: NP The v cat )

R. Sennrich MT – 2018 – 13 28 / 30
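A minimal sketch of such a composition function: a bidirectional LSTM reads the nonterminal embedding followed by the child vectors and returns a single constituent vector, which can itself appear as a child later (recursion). The dimensions and the use of nn.LSTM are illustrative assumptions.

```python
# A toy RNNG composition function: a bidirectional LSTM reads the
# nonterminal embedding plus the child representations and returns one
# vector for the constituent. Dimensions are illustrative assumptions.

import torch
import torch.nn as nn

class Composer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.birnn = nn.LSTM(dim, dim, bidirectional=True)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, nt_emb, children):
        # read the sequence NT, child_1, ..., child_n in both directions
        seq = torch.stack([nt_emb] + children).unsqueeze(1)  # (len, 1, dim)
        states, _ = self.birnn(seq)                          # (len, 1, 2*dim)
        h_fwd = states[-1, 0, :self.dim]  # forward RNN after the last child
        h_bwd = states[0, 0, self.dim:]   # backward RNN after the first symbol
        return torch.tanh(self.out(torch.cat([h_fwd, h_bwd])))

dim = 8
compose = Composer(dim)
emb = {w: torch.randn(dim) for w in ["NP", "ADJP", "The", "very", "hungry", "cat"]}
v = compose(emb["ADJP"], [emb["very"], emb["hungry"]])    # (ADJP very hungry) -> v
np_vec = compose(emb["NP"], [emb["The"], v, emb["cat"]])  # recursion: v acts as a word
print(np_vec.shape)  # torch.Size([8])
```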

Recurrent Neural Network Grammars

RNNGs in MT

• Basic idea: learn an encoder-decoder MT model and an RNNG on parallel data with a parsed target side, sharing the target word embedding parameters (multi-task learning). To translate, just use the MT model.

Eriguchi et al., Feb 2017

R. Sennrich MT – 2018 – 13 29 / 30
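A minimal sketch of the parameter sharing, with single linear layers standing in for the full NMT decoder and RNNG so the example runs; only the shared embedding table and the summed loss reflect the actual setup of Eriguchi et al. (2017).

```python
# Skeleton of the shared-embedding multi-task objective. The two output
# layers are placeholders for a full attentional NMT decoder and a full
# RNNG; only the parameter sharing and the summed loss are the point here.

import torch
import torch.nn as nn

class JointModel(nn.Module):
    def __init__(self, vocab=100, dim=16, n_actions=30):
        super().__init__()
        self.tgt_embed = nn.Embedding(vocab, dim)   # shared target embeddings
        self.mt_head = nn.Linear(dim, vocab)        # placeholder for the NMT model
        self.rnng_head = nn.Linear(dim, n_actions)  # placeholder for the RNNG
        self.ce = nn.CrossEntropyLoss()

    def loss(self, tgt_words, next_words, actions):
        h = self.tgt_embed(tgt_words)
        # both tasks read (and backpropagate into) the same embedding table;
        # at translation time only the MT part is used
        return self.ce(self.mt_head(h), next_words) + self.ce(self.rnng_head(h), actions)

model = JointModel()
tgt = torch.randint(0, 100, (5,))  # target-side word ids
print(model.loss(tgt, torch.randint(0, 100, (5,)), torch.randint(0, 30, (5,))))
```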

Slide Credit

slide credit for slides 2, 4–6, 24–27: Adam Lopez

R. Sennrich MT – 2018 – 13 30 / 30

Page 19: Machine Translation - 13: Syntaxhomepages.inf.ed.ac.uk/rsennric/mt18/13_4up.pdfsubjpred rel rel most syntax-based SMT systems are context-free either they can't produce non-projective

Bibliography I

Collins, M., Koehn, P., and Kučerová, I. (2005). Clause Restructuring for Statistical Machine Translation. In ACL '05: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 531–540, Ann Arbor, Michigan. Association for Computational Linguistics.

Du, J. and Way, A. (2017). Pre-Reordering for Neural Machine Translation: Helpful or Harmful? The Prague Bulletin of Mathematical Linguistics, (108):107–182.

Eriguchi, A., Hashimoto, K., and Tsuruoka, Y. (2016). Tree-to-Sequence Attentional Neural Machine Translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 823–833, Berlin, Germany.

Genzel, D. (2010). Automatically Learning Source-side Reordering Rules for Large Scale Machine Translation. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 376–384, Beijing, China. Coling 2010 Organizing Committee.

Liu, D. and Gildea, D. (2005). Syntactic Features for Evaluation of Machine Translation. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 25–32, Ann Arbor, Michigan.

Miceli Barone, A. V. and Attardi, G. (2015). Non-projective Dependency-based Pre-Reordering with Recurrent Neural Network for Machine Translation. In Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 10–20, Denver, Colorado, USA. Association for Computational Linguistics.

R. Sennrich MT – 2018 – 13 31 / 30

Bibliography II

Nießen, S. and Ney, H. (2000). Improving SMT quality with morpho-syntactic analysis. In 18th Int. Conf. on Computational Linguistics, pages 1081–1085.

Xia, F. and McCord, M. (2004). Improving a Statistical MT System with Automatically Learned Rewrite Patterns. In Proceedings of Coling 2004, pages 508–514, Geneva, Switzerland. COLING.

R. Sennrich MT – 2018 – 13 32 / 30