Chris Dyer, Kevin Gimpel, Waleed Ammar, Noah Smith

Knowledge-Rich MT November 4, 2011


Page 1: Chris Dyer - Kevin Gimpel  Waleed Ammar - Noah Smith

Knowledge-Rich MT

November 4, 2011

Page 2

Outline

•Where are we starting with end-to-end MT?

•Adapting SMT for low-resource scenarios

•What progress have we been making?

•What does Year 2 hold?

Page 3

Cross-site system comparison

Page 4

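[Figure: The SMT baseline. A TM learner trains on English–French (français) parallel text and an LM learner trains on English text; the decoder combines both models to translate the input "S'il vous plaît traduire..." as "Please translate..."]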
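The baseline pipeline pairs a translation model (TM), learned from parallel text, with a language model (LM), learned from target-language text alone. A minimal sketch of how a decoder can combine them, assuming a log-linear sum of the two log-probabilities; the candidate list and every probability below are hypothetical, and real decoders search over segmentations and reorderings rather than scoring a fixed list:

```python
import math

# Hypothetical toy models with made-up probabilities.
# TM: how well each candidate translates the source.
TM = {
    ("s'il vous plaît traduire", "please translate"): 0.6,
    ("s'il vous plaît traduire", "translate please"): 0.6,
    ("s'il vous plaît traduire", "if you please translate"): 0.3,
}
# LM: how fluent each candidate is as English, ignoring the source.
LM = {
    "please translate": 0.02,
    "translate please": 0.001,
    "if you please translate": 0.005,
}


def score(source, candidate, w_tm=1.0, w_lm=1.0):
    """Log-linear score: weighted sum of log TM and log LM probabilities."""
    return w_tm * math.log(TM[(source, candidate)]) + w_lm * math.log(LM[candidate])


def decode(source, candidates):
    """Pick the best-scoring candidate by exhaustive search over a tiny list."""
    return max(candidates, key=lambda c: score(source, c))


src = "s'il vous plaît traduire"
print(decode(src, ["please translate", "translate please", "if you please translate"]))
# please translate
```

The two word orders tie under the toy TM; the LM breaks the tie in favor of the fluent one, which is the division of labor the diagram depicts.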

Page 5

SMT Baselines

System | BLEU
English – Kinyarwanda (Hiero) | 4.7
Kinyarwanda – English (Hiero) | 6.8

Page 6

SMT Baselines

System | BLEU
English – Kinyarwanda (Hiero) | 4.7
English – Malagasy (Hiero) | 25.0
English – Malagasy (Moses) | 30.5
Kinyarwanda – English (Hiero) | 6.8
Malagasy – English (Hiero) | 24.3
Malagasy – English (Moses) | 24.2
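All scores in these tables are BLEU (0 to 100 scale). A minimal sketch of standard corpus-level BLEU, assuming whitespace tokenization, a single reference per sentence, and no smoothing; toolkits differ in tokenization and these details, so this sketch will not reproduce the reported numbers exactly:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU in [0, 1] for aligned lists of token lists (one reference each)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clipped counts: an n-gram is credited at most as often as
            # it occurs in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


ref = "the serpent said to the woman".split()
short = "the serpent said to the".split()
print(round(100 * corpus_bleu([short], [ref]), 1))  # 81.9
```

Every n-gram in the short hypothesis matches, so its whole loss comes from the brevity penalty.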

Page 7

Let’s make things better.

Page 8

[Figure: the SMT baseline pipeline again (TM learner on English–French parallel text, LM learner on English text), captioned "The problem?"]

Page 9

[Figure: the same pipeline with English–Malagasy parallel text for the TM learner and English text for the LM learner, captioned "Low-resource!"]

Page 10

[Figure: the low-resource English–Malagasy pipeline (TM learner, LM learner on English), with the parallel data annotated "Small, out of domain"]

Page 11

[Figure: the low-resource English–Malagasy pipeline, annotated with additional knowledge sources: Malagasy verbal morphology and "partial" language models]

Page 12

[Figure: the low-resource English–Malagasy pipeline, annotated with additional knowledge sources: Malagasy verbal morphology, unsupervised model outputs, and dependency parses]

Page 13

[Figure: the low-resource English–Malagasy pipeline, annotated with additional knowledge sources: Malagasy verbal morphology, unsupervised model outputs, dependency parses, and word clusters, for example:]

36: dieny, fara, fiompiny, hamoaka, handehanany

37: adinina, aforeto, ahevahevao, akaiky, alao, ...

Word clusters
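The word-cluster examples list members of clusters 36 and 37 (the second list is cut off on the slide). A minimal sketch, assuming clusters are stored as a mapping from cluster ID to member words, of how cluster IDs can stand in for individual words so that sparse statistics are shared across a class; the `C36`-style labels and the `UNK` fallback are hypothetical choices, not taken from the slides:

```python
# Cluster memberships copied from the slide (cluster 37 is truncated there).
CLUSTERS = {
    36: ["dieny", "fara", "fiompiny", "hamoaka", "handehanany"],
    37: ["adinina", "aforeto", "ahevahevao", "akaiky", "alao"],
}

# Invert into a word -> cluster-ID lookup table.
WORD_TO_CLUSTER = {w: cid for cid, words in CLUSTERS.items() for w in words}


def cluster_sequence(tokens, unknown="UNK"):
    """Map each token to a cluster label, or to `unknown` if it is unclustered."""
    return [f"C{WORD_TO_CLUSTER[t]}" if t in WORD_TO_CLUSTER else unknown for t in tokens]


print(cluster_sequence(["fara", "alao", "xyz"]))  # ['C36', 'C37', 'UNK']
```

Features or a class-based model estimated over such label sequences see far fewer distinct types than the raw vocabulary, which is the point when data is scarce.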

Page 14

Year 1 MT Challenge

Page 15

Year 1 MT Challenge

[Figure: English–Malagasy, with knowledge sources: Malagasy verbal morphology, dependency parses, and word clusters]

Page 16

Year 1 MT Challenge

[Figure: English–Malagasy, with knowledge sources (Malagasy verbal morphology, dependency parses, word clusters) and a Translation Model]

Page 17

Year 1 MT Challenge

[Figure: English–Malagasy, with knowledge sources (Malagasy verbal morphology, dependency parses, word clusters) and a Translation Model that turns the input "henemana no hana ..." into "... something intelligible ..."]

Page 18

Accomplishments

Page 19

[Figure: Model 4 and CMU panels]

Page 20

[Figure: Model 4 and CMU panels]

Page 21

[Figure: Model 4 and CMU panels]

Page 22

[Figure: Model 4 and CMU panels]

Similar pattern of improvements, no language-specific features (yet).

Page 23

Malagasy - English

Alignment | BLEU
Model 4 - GDA | 24.2
Model 4 - GDFA | 26.7
CMU - GDFA | 26.3
Model 4 + CMU | 27.6

Malagasy - English version 1.0
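The table varies both the aligner and how its two directional alignments are combined. Assuming GDFA names the grow-diag-final-and symmetrization heuristic commonly used with Moses (the slide does not expand the abbreviation), a minimal sketch following the textbook pseudocode; alignments are sets of (English index, foreign index) pairs, and toolkit implementations may differ in iteration order:

```python
# Horizontal, vertical, and diagonal neighbours of an alignment point.
NEIGHBORS = [(-1, 0), (0, -1), (1, 0), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_diag_final_and(e2f, f2e):
    """Symmetrize two directional word alignments given as sets of (e, f) pairs."""
    union = e2f | f2e
    alignment = set(e2f & f2e)  # start from the high-precision intersection

    def e_aligned(e):
        return any(a == e for a, _ in alignment)

    def f_aligned(f):
        return any(b == f for _, b in alignment)

    # GROW-DIAG: add union points next to existing points if they cover
    # a word that is still unaligned; repeat until nothing changes.
    added = True
    while added:
        added = False
        for e, f in sorted(alignment):
            for de, df in NEIGHBORS:
                cand = (e + de, f + df)
                if cand in union and cand not in alignment and (
                    not e_aligned(cand[0]) or not f_aligned(cand[1])
                ):
                    alignment.add(cand)
                    added = True

    # FINAL-AND: add leftover directional points only when BOTH words are unaligned.
    for direction in (e2f, f2e):
        for e, f in sorted(direction):
            if not e_aligned(e) and not f_aligned(f):
                alignment.add((e, f))
    return alignment


print(sorted(grow_diag_final_and({(0, 0), (2, 2)}, {(0, 0)})))  # [(0, 0), (2, 2)]
```

Compared with taking the plain union, this keeps only union points that are adjacent to trusted points or that cover otherwise unaligned words on both sides. How the "Model 4 + CMU" row combines the two aligners is not described on the slide and is not shown here.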

Page 24

the sons of simeon were jemoela , jamin , jakin , and ohada zohara saul , the son of a canaanite woman .

the sons of simeon were jemuel , jamin , ohada , jakin , zohar , and shaul , the son of a canaanite woman .

the sons of simeon : jemuel , jamin , ohad , jakin , zohar , and shaul ( the son of a canaanite woman ) .

What improvements?

Page 25

the sons of simeon were jemoela , jamin , jakin , and ohada zohara saul , the son of a canaanite woman .

the sons of simeon were jemuel , jamin , ohada , jakin , zohar , and shaul , the son of a canaanite woman .

the sons of simeon : jemuel , jamin , ohad , jakin , zohar , and shaul ( the son of a canaanite woman ) .

What improvements?

Page 26

then the woman said to the serpent , “ no ! you will not die .

now the serpent said to the woman , “ you will not die .

the serpent said to the woman , “ surely you will not die ,

What improvements?

Page 27

then the woman said to the serpent , “ no ! you will not die .

now the serpent said to the woman , “ you will not die .

the serpent said to the woman , “ surely you will not die ,

What improvements?

Page 28

Feature-rich translation

•Discriminative learning on training data

•Learn far more (and sparser) features than a development set alone can support

•Update weights to improve translation probability

•Final tuning pass on development set to optimize translation metrics (BLEU, METEOR, etc.)
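A minimal sketch of the discriminative step, assuming a log-linear model over a list of candidate translations (for example a decoder's k-best list) with one candidate treated as the reference; feature names are hypothetical, and the final BLEU/METEOR tuning pass on the development set is not shown. Sparse feature vectors are plain dicts, so only features that actually fire cost anything:

```python
import math
from collections import defaultdict


def dot(w, feats):
    """Score of a sparse feature dict under weights w."""
    return sum(w[k] * v for k, v in feats.items())


def sgd_step(w, candidates, gold_index, lr=0.1):
    """One gradient-ascent step on log p(gold | source) for a log-linear model.

    candidates: list of sparse feature dicts, one per candidate translation.
    gold_index: index of the candidate treated as the reference.
    Returns the gold candidate's probability before the update.
    """
    scores = [dot(w, f) for f in candidates]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # shift by max for numerical stability
    z = sum(exps)
    probs = [e / z for e in exps]
    # Gradient = gold features minus model-expected features.
    for k, v in candidates[gold_index].items():
        w[k] += lr * v
    for p, f in zip(probs, candidates):
        for k, v in f.items():
            w[k] -= lr * p * v
    return probs[gold_index]


w = defaultdict(float)
cands = [{"rule:ne->not": 1.0, "lm": -2.0}, {"rule:ne->no": 1.0, "lm": -1.5}]
for _ in range(20):
    p_gold = sgd_step(w, cands, gold_index=0)
```

Each step raises the weight of features on the reference candidate and lowers those the model currently expects, so the reference's probability climbs over the iterations.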

Page 29

What features?

Page 30

Page 31

Contexts give clues to constituents

Page 32

Contexts give clues to constituents
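The slides do not spell out the context feature templates. A minimal sketch, assuming context features are indicators that conjoin a rule identifier with the source words just outside the span the rule covers; the template strings, the rule ID `r42`, and the `<s>`/`</s>` padding are hypothetical:

```python
def context_features(source, start, end, rule_id):
    """Indicator features pairing a rule with the words around span [start, end)."""
    left = source[start - 1] if start > 0 else "<s>"
    right = source[end] if end < len(source) else "</s>"
    return {
        f"{rule_id}|L={left}": 1.0,
        f"{rule_id}|R={right}": 1.0,
        f"{rule_id}|L={left}|R={right}": 1.0,
    }


source = "der Mann sieht den Hund".split()
print(context_features(source, 1, 3, "r42"))
# {'r42|L=der': 1.0, 'r42|R=den': 1.0, 'r42|L=der|R=den': 1.0}
```

Because every rule paired with every neighbouring word is a distinct feature, templates like these produce very large, very sparse feature spaces, which is why they need training data rather than a development set alone.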

Page 33

German - English

System | BLEU | Features
baseline | 25.0 | 11 / 11
+7-gram | 25.0 | 13 / 13
+Context | 25.2 | 11,194 / 80,006,646
+Context +7-gram | 25.4 | 11,196 / 80,006,648

Page 34

Phrasal dependency translation model

Page 35

Source: 非国大 反对 制裁 津巴布韦

Gloss: ANC opposition sanction Zimbabwe

Phrase-based output: zimbabwe african national congress sanctions against opposition to

Reference: african national congress opposes sanctions against zimbabwe

Page 36

Source: 非国大 反对 制裁 津巴布韦

Gloss: ANC opposition sanction Zimbabwe

Phrase-based output: zimbabwe african national congress sanctions against opposition to

Our system: zimbabwe african national congress sanctions against is opposed to

Reference: african national congress opposes sanctions against zimbabwe

Page 37

Source: 非国大 反对 制裁 津巴布韦

Gloss: ANC opposition sanction Zimbabwe

Phrase-based output: zimbabwe african national congress sanctions against opposition to

Our system: zimbabwe african national congress sanctions against is opposed to

Reference: african national congress opposes sanctions against zimbabwe

Use features from source-side parse

Page 38

[Chart: % BLEU for Target Syntax Only]

Page 39

[Chart: % BLEU for Target Syntax Only and Target Syntax + String-to-Tree Rules]

Page 40

[Chart: % BLEU for Target Syntax Only, Target Syntax + String-to-Tree Rules, and Target Syntax + String-to-Tree Rules + Tree-to-Tree Features]

Page 41

•Our best results use supervised parsers for both source and target languages

•What about unsupervised parsing?

Page 42

•Our best results use supervised parsers for both source and target languages

•What about unsupervised parsing?

•We use the dependency model with valence (Klein & Manning, 2004)

•With careful initialization, it gives state-of-the-art results (Gimpel & Smith, 2011):

•53.1% attachment accuracy on Penn Treebank

•44.4% on Chinese Treebank
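A minimal sketch of how the dependency model with valence scores one given tree, assuming its parameters are already estimated (EM training and the careful initialization cited above are not shown). Each head generates dependents outward on each side; the stop probability depends on the head tag, the direction, and whether any dependent has been generated on that side yet. Parameter values used below are hypothetical:

```python
import math


def dmv_log_prob(tags, heads, p_root, p_stop, p_choose):
    """Log-probability of a dependency tree under the DMV.

    tags: POS tags of the sentence.
    heads: heads[i] is the index of token i's parent, or -1 for the root.
    p_root[t]: probability that tag t heads the sentence.
    p_stop[(t, direction, adjacent)]: probability of stopping; adjacent is True
        while no dependent has been generated on that side yet.
    p_choose[(t, direction, a)]: probability that head tag t takes dependent tag a.
    """
    logp = 0.0
    for i, h in enumerate(heads):
        if h == -1:
            logp += math.log(p_root[tags[i]])
    for i, t in enumerate(tags):
        for direction in ("left", "right"):
            # Dependents on this side, ordered outward from the head.
            if direction == "left":
                deps = sorted((j for j, h in enumerate(heads) if h == i and j < i), reverse=True)
            else:
                deps = sorted(j for j, h in enumerate(heads) if h == i and j > i)
            adjacent = True
            for j in deps:
                logp += math.log(1 - p_stop[(t, direction, adjacent)])  # continue
                logp += math.log(p_choose[(t, direction, tags[j])])    # pick dependent
                adjacent = False
            logp += math.log(p_stop[(t, direction, adjacent)])         # stop
    return logp


tags = ["DT", "NN", "VB"]  # e.g. "the dog barks"
heads = [1, 2, -1]
p_stop = {(t, d, a): 0.5 for t in tags for d in ("left", "right") for a in (True, False)}
p_choose = {("NN", "left", "DT"): 0.5, ("VB", "left", "NN"): 0.5}
lp = dmv_log_prob(tags, heads, {"VB": 1.0}, p_stop, p_choose)
```

Stop probabilities must stay strictly between 0 and 1 for the logarithms to be defined.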

Page 43

[Chart: % BLEU]

Page 44

Year 2

•Target morphological complexity

•Generate novel word forms

•Leverage morphological resources and machine learning

•Need better language models, not just translation models

“Into other languages”

Page 45

Year 2 Challenges

•Generating new word forms means a much larger search space than is usual in MT

•Inference is expensive

•Use “high-recall” linguistic tools to constrain search

•Statistics do the rest
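A minimal sketch of this division of labor, assuming a hypothetical high-recall generator (here a hard-coded English paradigm, purely for illustration) and a toy bigram table standing in for the statistical model. Search runs only over the generated forms rather than over all possible strings:

```python
# Hypothetical high-recall generator: proposes every inflection the
# morphology allows, without trying to pick the right one.
PARADIGMS = {"walk": ["walk", "walks", "walked", "walking"]}

# Hypothetical bigram log-probabilities standing in for "statistics".
BIGRAM_LOGP = {("she", "walks"): -1.0, ("she", "walked"): -1.5, ("she", "walk"): -6.0}
UNSEEN = -10.0  # back-off score for unseen bigrams


def choose_form(prev_word, lemma):
    """Restrict search to generated candidates and let the model choose."""
    candidates = PARADIGMS.get(lemma, [lemma])
    return max(candidates, key=lambda w: BIGRAM_LOGP.get((prev_word, w), UNSEEN))


print(choose_form("she", "walk"))  # walks
```

High recall matters more than precision here: a form the generator never proposes can never be output, while a wrong extra candidate only costs search time.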

Page 46

Year 2

•Data requirements

•Large non-English monolingual corpora

•Test sets for focus languages