avenue architecture


Upload: hanae-rivas

Post on 02-Jan-2016

Avenue Architecture

[Architecture diagram. Stage labels: Elicitation, Morphology, Rule Learning, Run-Time System, Rule Refinement. Component labels: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Morphology Analyzer, Learning Module, Learned Transfer Rules, Handcrafted Rules, Lexical Resources, Run Time Transfer System, Decoder, Translation Correction Tool, Rule Refinement Module. Input and output labels: INPUT TEXT, OUTPUT TEXT.]

Interactive and Automatic Refinement of Translation Rules

• Problem: Improve Machine Translation Quality.

• Proposed Solution: Put bilingual speakers back into the loop; use their corrections to detect the source of the error and automatically improve the lexicon and the grammar.

• Approach: Automate post-editing efforts by feeding them back into the MT system. Automatic refinement of the translation rules that caused an error goes beyond post-editing.

• Goal: Improve MT coverage and overall quality.

Technical Challenges

• Elicit minimal MT information from non-expert users
• Automatically refine and expand translation rules minimally (both manually written and automatically learned rules)
• Automatic evaluation of the refinement process

Interactive elicitation of error information

Error Typology for Automatic Rule Refinement (simplified)

• Missing word
• Extra word
• Wrong word order
  - Local vs. long distance
  - Word vs. phrase
  - + Word change
• Incorrect word
  - Sense
  - Form
  - Selectional restrictions
  - Idiom
• Wrong agreement
  - Missing constraint
  - Extra constraint

TCTool (Demo)

Actions:
• Add a word
• Delete a word
• Modify a word
• Change word order

                       precision   recall
error detection        90%         89%
error classification   72%         71%

Types of Refinement Operations

Automatic Rule Adaptation

1. Refine a translation rule: R0 -> R1 (change R0 to make it more specific or more general)

R0: NP::NP : [DET ADJ N] -> [DET N ADJ]
    a nice house -> una casa bonito

R1: NP::NP : [DET ADJ N] -> [DET N ADJ], with N gender = ADJ gender
    a nice house -> una casa bonita
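The step from R0 to R1 can be sketched in Python as adding one agreement constraint to a rule. This is only an illustration under stated assumptions, not the Avenue rule formalism: the `Rule` class, the `refine` helper and the toy `LEXICON` with its gender values are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field

# Toy target-side lexicon (hypothetical): word -> (part of speech, gender).
LEXICON = {
    "una": ("DET", "fem"),
    "casa": ("N", "fem"),
    "bonito": ("ADJ", "masc"),
    "bonita": ("ADJ", "fem"),
}

@dataclass
class Rule:
    name: str
    target_order: tuple  # target-side constituent order, e.g. ("DET", "N", "ADJ")
    constraints: list = field(default_factory=list)  # POS pairs that must agree in gender

    def accepts(self, words):
        """True if the words match the target order and satisfy every constraint."""
        tags = tuple(LEXICON[w][0] for w in words)
        if tags != self.target_order:
            return False
        gender = {LEXICON[w][0]: LEXICON[w][1] for w in words}
        return all(gender[a] == gender[b] for a, b in self.constraints)

def refine(rule, constraint, name):
    """Return a more specific copy of the rule with one extra agreement constraint."""
    return Rule(name, rule.target_order, rule.constraints + [constraint])

r0 = Rule("R0", ("DET", "N", "ADJ"))
r1 = refine(r0, ("N", "ADJ"), "R1")  # N gender = ADJ gender

print(r0.accepts(["una", "casa", "bonito"]))  # True: R0 lets the agreement error through
print(r1.accepts(["una", "casa", "bonito"]))  # False: R1 blocks it
print(r1.accepts(["una", "casa", "bonita"]))  # True
```

Returning a new `Rule` rather than mutating R0 keeps the original rule available for comparison; whether the real system replaces R0 in place is not stated in the slides.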

2. Bifurcate a translation rule: R0 -> R0 (same, general rule) + R1 (add a new more specific rule)

R0: NP::NP : [DET ADJ N] -> [DET N ADJ]
    a nice house -> una casa bonita

R1: NP::NP : [DET ADJ N] -> [DET ADJ N], with ADJ type: pre-nominal
    a great artist -> un gran artista
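Bifurcation can be sketched as keeping the general rule and adding a specific one that fires only when its extra condition holds. The toy dictionaries, the function names and the "try the specific rule first" policy below are assumptions made for illustration; the slides do not say how the transfer engine orders competing rules.

```python
# Toy bilingual lexicon (hypothetical): English adjective -> (Spanish form, is pre-nominal?)
ADJECTIVES = {"nice": ("bonita", False), "great": ("gran", True)}
NOUNS = {"house": "casa", "artist": "artista"}
DETS = {("a", "house"): "una", ("a", "artist"): "un"}

def r0(det, adj, noun):
    """General rule: [DET ADJ N] -> [DET N ADJ]."""
    return [DETS[(det, noun)], NOUNS[noun], ADJECTIVES[adj][0]]

def r1(det, adj, noun):
    """Specific rule: [DET ADJ N] -> [DET ADJ N], only for pre-nominal adjectives."""
    if not ADJECTIVES[adj][1]:
        return None  # the condition "ADJ type: pre-nominal" fails
    return [DETS[(det, noun)], ADJECTIVES[adj][0], NOUNS[noun]]

def translate_np(det, adj, noun):
    """Try the more specific rule first, then fall back to the general one."""
    return " ".join(r1(det, adj, noun) or r0(det, adj, noun))

print(translate_np("a", "nice", "house"))    # una casa bonita
print(translate_np("a", "great", "artist"))  # un gran artista
```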

Error Information Elicitation

Refinement Operation Typology


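A concrete example

Change word order

SL: Gaudí was a great artist
MT system output (TL): Gaudí era un artista grande
User correction: *Gaudí era un artista grande -> Gaudí era un gran artista

[Slide annotations on the sentences: clue word, error, correction]

Finding Triggering Feature(s): comparing the features of (error word, corrected word) = (grande, gran) yields no distinguishing feature, since both lexical entries carry the same values, so we need to postulate a new binary feature: feat1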
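The search for a triggering feature can be sketched as a comparison of two feature dictionaries. The feature names and values are taken from the lexical entries for [grande] and [gran] in these slides; the dictionary encoding and the function names `triggering_features` and `postulate_feature` are hypothetical.

```python
def triggering_features(error_entry, corrected_entry):
    """Features whose values differ between the two lexical entries."""
    keys = set(error_entry) | set(corrected_entry)
    return {k for k in keys if error_entry.get(k) != corrected_entry.get(k)}

def postulate_feature(error_entry, corrected_entry, name="feat1"):
    """Return copies of both entries; add a new binary feature if nothing tells them apart."""
    err, cor = dict(error_entry), dict(corrected_entry)
    if not triggering_features(err, cor):
        err[name], cor[name] = "-", "+"
    return err, cor

# Both entries translate "great" and share every feature value.
grande = {"form": "great", "agr num": "sg", "agr gen": "masc"}
gran = {"form": "great", "agr num": "sg", "agr gen": "masc"}

print(triggering_features(grande, gran))  # set(): nothing distinguishes the two words
new_grande, new_gran = postulate_feature(grande, gran)
print(new_grande["feat1"], new_gran["feat1"])  # - +
```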

Blame assignment (from MT system output)

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") )
            (VP,3 (VB,2 (AUX,17:2 "ERA") )
                  (NP,8 (DET,0:3 "UN")
                        (N,4:5 "ARTISTA")
                        (ADJ,5:4 "GRANDE") ) ) ) )>

Grammar rules listed on the slide: S,1, NP,1, NP,8, …
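One way to read blame assignment off such a trace is to find the lowest rule that dominates both the error word and the clue word. The sketch below assumes the bracketed format shown above and treats "lowest common rule" as the blame criterion; that criterion, and the helper names `rule_path` and `blamed_rule`, are assumptions rather than the documented behaviour of the Xfer engine.

```python
import re

TRACE = '''<((S,1 (NP,2 (N,5:1 "GAUDI") )
(VP,3 (VB,2 (AUX,17:2 "ERA") )
(NP,8 (DET,0:3 "UN")
(N,4:5 "ARTISTA")
(ADJ,5:4 "GRANDE") ) ) ) )>'''

def rule_path(trace, word):
    """Return the rule labels from the root down to the given word."""
    stack = []
    for tok in re.findall(r'\(|\)|"[^"]*"|[^\s()<>"]+', trace):
        if tok == "(":
            stack.append(None)  # label is filled in by the next token, if any
        elif tok == ")":
            stack.pop()
        elif tok.startswith('"'):
            if tok.strip('"') == word:
                return [label for label in stack if label is not None]
        else:
            stack[-1] = tok  # constituent label such as "NP,8"
    return []

def blamed_rule(trace, error_word, clue_word):
    """Lowest rule label shared by the paths to the error word and the clue word."""
    common = None
    for a, b in zip(rule_path(trace, error_word), rule_path(trace, clue_word)):
        if a != b:
            break
        common = a
    return common

print(rule_path(TRACE, "GRANDE"))               # ['S,1', 'VP,3', 'NP,8', 'ADJ,5:4']
print(blamed_rule(TRACE, "GRANDE", "ARTISTA"))  # NP,8
```

Comparing labels rather than node identities is a simplification: it would misfire if the same rule label occurred twice on different branches.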

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc))

ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc))

Refining Rules
• Bifurcate NP,8 -> NP,8 (R0) + NP,8’ (R1) (flip order of ADJ-N)

{NP,8’}
NP::NP : [DET ADJ N] -> [DET ADJ N]
( (X1::Y1) (X2::Y2) (X3::Y3)
  ((x0 def) = (x1 def))
  (x0 = x3)
  ((y1 agr) = (y3 agr)) ; det-noun agreement
  ((y2 agr) = (y3 agr)) ; adj-noun agreement
  (y2 = x3)
  ((y2 feat1) =c + ))

Refining Lexical Entries

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
((y0 feat1) = -))

ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
((y0 feat1) = +))
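How the refined entries interact with the `((y2 feat1) =c +)` line of the bifurcated rule can be sketched as follows. The sketch assumes that `=c` acts as a constraining equation, meaning the feature must already be present with the given value; that reading, the dictionary encoding and the function names are assumptions, not something the slides spell out.

```python
def constrain(features, name, value):
    """Constraining check: the feature must already be present with this value."""
    return features.get(name) == value

# Refined target-side entries for "great", with the postulated feature feat1.
GRANDE = {"agr num": "sg", "agr gen": "masc", "feat1": "-"}
GRAN = {"agr num": "sg", "agr gen": "masc", "feat1": "+"}
UNMARKED = {"agr num": "sg", "agr gen": "masc"}  # hypothetical entry without feat1

def np8_prime_applies(adj_features):
    """The bifurcated rule fires only for adjectives marked feat1 = +."""
    return constrain(adj_features, "feat1", "+")

print(np8_prime_applies(GRAN))      # True
print(np8_prime_applies(GRANDE))    # False
print(np8_prime_applies(UNMARKED))  # False: an absent feature does not satisfy the check
```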

Evaluating Improvement

- Given the initial and final Translation Lattices, the Rule Refinement module needs to take into account whether the following are present:
  - Corrected Translation Sentence
  - Original Translation Sentence (labelled as incorrect by the user)

Candidate translations on the first slide:
un artista gran
un gran artista
un grande artista
*un artista grande

Candidate translations on the second slide, with the incorrect ones starred:
*un artista gran
un gran artista
*un grande artista
*un artista grande
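The presence check described above can be sketched by treating each lattice as a flat collection of candidate strings. The lattice contents below are hypothetical, and the field names in the returned report, including the "improved" criterion, are assumptions made for illustration.

```python
def evaluate_refinement(initial_lattice, final_lattice, corrected, original):
    """Report whether the corrected and original sentences occur in each lattice."""
    initial, final = set(initial_lattice), set(final_lattice)
    return {
        "corrected_in_initial": corrected in initial,
        "corrected_in_final": corrected in final,
        "original_in_final": original in final,
        # Assumed criterion: the correction became reachable only after refinement.
        "improved": corrected in final and corrected not in initial,
    }

# Hypothetical lattices, flattened to lists of candidate strings.
initial = ["un artista grande"]
final = ["un gran artista", "un artista grande"]

report = evaluate_refinement(
    initial, final, corrected="un gran artista", original="un artista grande"
)
print(report["improved"])           # True
print(report["original_in_final"])  # True: the general rule R0 still produces it
```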

Challenges and future work

• Credit and Blame assignment from TCTool Log Files and Xfer engine’s trace

• Order of corrections matters: explore rule interactions

• Explore the space between batch mode and fully interactive system

• Online TCTool, always running, to collect corrections from bilingual speakers; make it into a game with rewards for the best users

Publications
• Font Llitjós, A., J.G. Carbonell and A. Lavie. "A Framework for Interactive and Automatic Refinement of Transfer-based Machine Translation". EAMT 10th Annual Conference, 30-31 May 2005, Budapest, Hungary.
• Font Llitjós, A., R. Aranovich and L. Levin. "Building Machine translation systems for indigenous languages". Second Conference on the Indigenous Languages of Latin America (CILLA II), 27-29 October 2005, Texas, USA.
• Font Llitjós, A., K. Probst and J.G. Carbonell. "Error Analysis of Two Types of Grammar for the Purpose of Automatic Rule Refinement". AMTA, 2004, Washington, USA.
• Font Llitjós, A. and J.G. Carbonell. "The Translation Correction Tool: English-Spanish user studies". LREC, 2004, Lisbon, Portugal.

Quechua -> Spanish MT
• V-Unit: funded summer project in Cusco (Peru), June-August 2005 [preparations and data collection started earlier]
• Intensive Quechua course at the Centro Bartolome de las Casas (CBC)
• Worked together with two native Quechua speakers and one non-native speaker on developing infrastructure (correcting elicited translations, segmenting and translating a list of the most frequent words)

Quechua -> Spanish prototype MT system
• Stem lexicon (semi-automatically generated): 753 lexical entries
• Suffix lexicon: 21 suffixes (150 Cusihuaman)
• Quechua morphology analyzer
• 25 translation rules
• Spanish morphology generation module
• User studies: 10 sentences, 3 users (2 native, 1 non-native)