Evaluation of Machine Translation Errors in English and Iraqi Arabic

© 2010 The MITRE Corporation. All rights reserved. Sherri Condon, Dan Parvaz, John Aberdeen, Christy Doran, Andrew Freeman and Marwan Awad, The MITRE Corporation. Evaluation of Machine Translation Errors in English and Iraqi Arabic. LREC 2010. Approved for Public Release: 10-101174. Distribution Unlimited.



TRANSCRIPT

Page 1: Evaluation of Machine Translation Errors in English and Iraqi Arabic

© 2010 The MITRE Corporation. All rights reserved

Sherri Condon, Dan Parvaz, John Aberdeen, Christy Doran, Andrew Freeman and Marwan Awad

The MITRE Corporation

Evaluation of Machine Translation Errors in English and Iraqi Arabic

Approved for Public Release:10-101174. Distribution Unlimited

LREC 2010

Page 2: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Preview

Methods
– DARPA Speech Translation
– HTER and annotation process
– Annotation categories

Iraqi Arabic to English (I→E) Errors
– Polarity errors
– Pronoun errors
– Copula errors

English to Iraqi Arabic (E→I) Errors
– Subject pronoun inflection errors
– Word order errors
– “Other” errors

Summary and Conclusions

Page 3: Evaluation of Machine Translation Errors in English and Iraqi Arabic


DARPA Speech Translation Systems

2-way communication for English and Iraqi Arabic
– Military domains and use cases
– Checkpoint, facility inspection, civil affairs, training, medical
– Funded 4 speech translation systems (labeled A-D)

Evaluations conducted by NIST and MITRE
– Live evaluations with military users and Iraqi speakers
– Offline evaluations using recordings of military users and Iraqi speakers

Error analyses use translations of text transcriptions from offline recordings
– Exclude errors from speech recognition

Page 4: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Evaluation Data

Samples from 2 evaluations
– June 2008
– November 2008

Translations from 4 systems
Subset of offline inputs

Number of Translations Annotated

Translation Direction      June 2008   Nov. 2008
English to Iraqi Arabic    436         372
Iraqi Arabic to English    388         432

Page 5: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Error Analysis

THIS IS REALLY HARD!
– Errors depend on what’s correct
– But no single correct translation

Automated measures of translation quality like Translation Error Rate (TER) are not diagnostic
– Scores based on changes needed to turn system output into reference translation (insertion, deletion, substitution, shift)
– Human TER (HTER) requires humans to create reference translations as close as possible to system output

We used HTER for error annotation
– Provides a maximally close correct translation
– TER alignment and annotation facilitates our annotation
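The edit operations named on this slide can be illustrated with a word-level edit distance. The following is a minimal sketch, not the TER tool used in the evaluation: it leaves out TER's shift operation, the function name `ter_like_counts` is hypothetical, and it assumes the deck's own usage in which an insertion is an extra word in the MT output and a deletion is a reference word missing from the output.

```python
def ter_like_counts(output: str, reference: str) -> dict:
    """Count word-level substitutions, deletions and insertions (no shifts)."""
    out, ref = output.split(), reference.split()
    n, m = len(out), len(ref)

    # dist[i][j] = minimum number of edits aligning out[:i] with ref[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = dist[i - 1][j - 1] + (out[i - 1] != ref[j - 1])
            dist[i][j] = min(diagonal, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Walk back through the table to label each edit.
    counts = {"substitution": 0, "deletion": 0, "insertion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dist[i][j] == dist[i - 1][j - 1] + (out[i - 1] != ref[j - 1])):
            if out[i - 1] != ref[j - 1]:
                counts["substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["insertion"] += 1   # extra word in the MT output
            i -= 1
        else:
            counts["deletion"] += 1    # reference word missing from the output
            j -= 1

    # Edits per reference word, in the spirit of TER (shifts not modelled).
    counts["score"] = sum(counts.values()) / max(m, 1)
    return counts


# Example pair taken from the pronoun slides of this deck.
print(ter_like_counts("was bitten by a scorpion", "he was bitten by a scorpion"))
```

Because shifts are not modelled, a reordered phrase would be counted here as a deletion plus an insertion rather than as a single word order error.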

Page 6: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Annotation Process

Customize reference translations
– NIST post-editing tool for HTER reference translations
– 4 reference translations for post-editors

Align and annotate translations with TER
– Annotators may change alignments
– Keep word classes aligned where possible

Annotate TER errors
– Identify major word classes of errors
– Quantify polarity and speech act errors
– Exclude minor errors

Page 7: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Sample Annotation

ID   Reference   Output   TER   Realign   Annotate
70   and         and
70               is       I     @         ssa*
70   this        this
70   stuff       stuff
70   was                  D     S         null
70   stolen      stolen
70   from        from
70   the         the
70   market      market

*substituted speech act (takes priority over “word order” annotation)

Page 8: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Annotations

Null [synonyms, articles, some prepositions/inflections]
Word Order [= TER ‘shift’]
Polarity (negative to positive or positive to negative)
Substituted Speech Act (e.g., question to statement)
Untranslated (transliterated, “???”)
Verb (deleted, inserted, and substituted)
Noun (same)
Pronoun (same)
Pronoun-Verb Complex [for English contractions and Arabic verbs with subject inflection only] (same)
Verb Person Inflection [substitute Arabic subject inflection]
Other [adjectives, prepositions, conjunctions] (same)

Page 9: Evaluation of Machine Translation Errors in English and Iraqi Arabic


June I→E: Proportions of TER Error Types

[Bar chart: proportions of Deletions, Insertions, Substitutions and Word Order errors for Systems A, B, C and D; vertical axis from 0.000 to 0.450]

Page 10: Evaluation of Machine Translation Errors in English and Iraqi Arabic


June I→E: Proportions of Word Class Errors

[Bar chart: proportions of Pronouns, Pro-V Complex, Verbs, Nouns and Other errors for Systems A, B, C and D; vertical axis from 0.000 to 0.350]

Page 11: Evaluation of Machine Translation Errors in English and Iraqi Arabic


I→E: Polarity Errors

Transcript: عندي حالياً ثلاثين جندي متدرب و عندهم أسلحة خفيفة
MT: I don’t have at the moment thirty soldier trained and they have light weapons
Ref: I have at the moment thirty trained soldiers and they have light weapons

Transcript: لا و الله ما كان عندنا وقت كنا مستعجلين و محتاجين ناس يشتغلون بالعجل فما سوينا ما تشيكنا عليهم قبل
MT: no and god we do not have time we were in a hurry and we need people to work hurry up so we did nothing we checked them before
Ref: no we did not have time we were in a hurry and we need people to work immediately so we did not check them before

System      A   B   C   D
Frequency   2   2   2   1

Page 12: Evaluation of Machine Translation Errors in English and Iraqi Arabic


June I→E Pronoun Issues: Subjects

Frequencies of pronouns (19%) and nouns (17%) are nearly equal, yet pronoun errors are 2 times as frequent as noun errors

In Iraqi Arabic pronominal subjects are expressed only as verb inflection
– MT: was bitten by a scorpion
– Ref: he was bitten by a scorpion

But some contrasts are neutralized
/iftahamit/ إفتهمت understand+past+1st or 2nd person singular subject: “I/you understood”
– MT: you see his symptoms
– Ref: I saw his symptoms

Page 13: Evaluation of Machine Translation Errors in English and Iraqi Arabic


I→E Pronoun Issues: Insertions

Subject pronouns (few)
– MT: those people they store them in this complex
– Ref: those people store them in this complex

Resumptive pronouns (frequent)
– MT: it is about three kilometers from the point the checkpoint that he ran away from it
– Ref: it is about three kilometers from the point the checkpoint that he ran away from
– MT: the area is four streets that will probably restrict it
– Ref: the area is four streets that we will probably surround

These are non-null only if they might cause confusion, e.g., garden paths

Page 14: Evaluation of Machine Translation Errors in English and Iraqi Arabic


I→E Pronoun Issues: Gender

Iraqi Arabic does not have a neutral gender

Many examples with it instead of he or she
– MT: are taking care of it god willing and hopefully it will get better a little bit more
– Ref: we are taking care of him god willing and hopefully he will get better soon
– MT: of course I mean it is in good condition
– Ref: of course I mean she is in good condition

Only one example of he instead of it
– MT: he civilian house consists of three rooms
– Ref: it is a civilian house consisting of three rooms

Page 15: Evaluation of Machine Translation Errors in English and Iraqi Arabic


I→E Verbs: English be vs. Arabic “be”

English be serves several functions
– They are eating at the restaurant (progressive)
– The car was driven by a teenager (passive)
– Sam is my brother (copula: identity)
– Julia is brilliant (copula: attribution)

Arabic copula is not used in present tense
– MT: no sir all the family in the house
– Ref: no sir all the family is in the house
– MT: but the problem those lazy and sleep on the at night
– Ref: but the problem is they are lazy and sleep at night

Many errors with be are more complex errors

Page 16: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Proportion of be in June I→E Verb Errors

[Bar chart: proportion of be among verb errors for Systems A, B, C and D; vertical axis from 0.000 to 0.600]

Page 17: Evaluation of Machine Translation Errors in English and Iraqi Arabic


June E→I: Proportions of TER Error Types

[Bar chart: proportions of Deletions, Insertions, Substitutions and Word Order errors for Systems A, B, C and D; vertical axis from 0.000 to 0.550]

Page 18: Evaluation of Machine Translation Errors in English and Iraqi Arabic


June E→I: Proportions of Word Class Errors

[Bar chart: proportions of Pronouns, Verb Person Inflection, Verbs, Nouns and Other errors for Systems A, B, C and D; vertical axis from 0.000 to 0.350]

Page 19: Evaluation of Machine Translation Errors in English and Iraqi Arabic


E→I: Subject Verb Agreement Inflection

With an expressed subject, subject inflection on the verb that does not agree may cause confusion

Source: my marines are going to search the house

Ref: المارينز مالتي رح يفتشون البيت
Ref: AlmArynz mAlty rH yft$wn Albyt
Ref: the-Marines my will 3m-search-pl the-house

MT: المارينز مالتي رح أفتش البيت
MT: AlmArynz mAlty rH >ft$ Albyt
MT: the-Marines my will 1s-search the-house

Special annotation for these errors: Verb Person Inflection
Relatively high frequency, except in rule-based system
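The romanized lines on these slides appear to follow the Buckwalter transliteration scheme (an assumption; the deck does not name its romanization). Under that assumption, a short sketch such as the following can regenerate the Arabic script from a romanized line, which is useful for checking an example whose Arabic text is hard to read. The names `BW_TO_AR` and `buckwalter_to_arabic` are hypothetical.

```python
# Buckwalter symbols listed in Unicode order for two contiguous ranges:
# U+0621..U+063A (hamza through ghain) and U+0640..U+0652 (tatweel through sukun).
_LETTERS_0621 = "'|>&<}AbptvjHxd*rzs$SDTZEg"
_LETTERS_0640 = "_fqklmnhwYyFNKaui~o"

BW_TO_AR = {sym: chr(0x0621 + k) for k, sym in enumerate(_LETTERS_0621)}
BW_TO_AR.update({sym: chr(0x0640 + k) for k, sym in enumerate(_LETTERS_0640)})


def buckwalter_to_arabic(text: str) -> str:
    """Map each Buckwalter symbol to its Arabic character; pass others through."""
    return "".join(BW_TO_AR.get(ch, ch) for ch in text)


# Reference and MT lines from this slide; they differ only in the verb.
print(buckwalter_to_arabic("AlmArynz mAlty rH yft$wn Albyt"))
print(buckwalter_to_arabic("AlmArynz mAlty rH >ft$ Albyt"))
```

Characters outside the mapping, such as spaces, are passed through unchanged, so whole glossed lines can be converted at once.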

Page 20: Evaluation of Machine Translation Errors in English and Iraqi Arabic


E→I: Pronominal Subject Inflection on Verbs

Pronoun errors occur when subject inflection does not match the source subject pronoun

Source: I might need to tell my commander I am stopping you

Ref: يمكن لازم أقول للمسؤول مالي أوقفك
Ref: ymkn lAzm >qwl llms&wl mAly >wqfk
Ref: maybe must 1st-sg-say to-the-official my 1st-sg-stop-2nd-sg

MT: يمكن لازم تقول للمسؤول مالي نوقفك
MT: ymkn lAzm tqwl llms&wl mAly nwqfk
MT: maybe must 2m/3f-say to-the-official my 1st-pl-stop-2nd-sg

Number errors usually annotated as ‘null’ (green font)
Person errors dramatically change meaning (red font)

Page 21: Evaluation of Machine Translation Errors in English and Iraqi Arabic


E→I: Both Subject and Verb are Incorrect

With pronominal subject unexpressed, a single verb may incorporate more than one significant error

Source: we will record inside who it belongs to

Ref: إحنا رح نسجل جوة هو مال منو
Ref: <HnA rH nsjl jwp hw mAl mnw
Ref: we will we-record inside he possession whom

MT: إحنا رح يقعد جوة منو مال
MT: <HnA rH yqEd jwp mnw mAl
MT: we will he-sits inside whom possession

Special annotation for Pronoun-Verb Complex
Should count as both pronoun and verb error
Low frequency

Page 22: Evaluation of Machine Translation Errors in English and Iraqi Arabic


E→I: Word Order Errors: Noun-Adjective

Slightly more word order errors in E→I vs. I→E
In both directions, a significant proportion of these reverse noun head and modifier order

Source: they have additional supplies

Ref: عندهم التجهيزات إضافية
Ref: Endhm AltjhyzAt <DAfyp
Ref: at+them det+supplies additional+fem

MT: عند إضافي التجهيزات
MT: End <DAfy AltjhyzAt
MT: with additional det+supplies

Page 23: Evaluation of Machine Translation Errors in English and Iraqi Arabic


E→I: Word Order Errors: Noun-Noun

This is the Arabic noun-noun modification known as the construct or idafa

Source: How does your source know this?

Ref: شلون المصدر مالتك عرف بهذا الشيء
Ref: $lwn AlmSdr mAltk Erf bh*A Al$y'
Ref: how det-source poss-2sm 3s-know in-this det-thing

MT: شلون مالتك مصدر أعرف هذا
MT: $lwn mAltk mSdr >Erf h*A
MT: how poss-2sm source 1s-know this

Page 24: Evaluation of Machine Translation Errors in English and Iraqi Arabic


E→I: Word Order Errors in Idafa

40% of November 2008 E→I word order errors are wrong idafa order

Source: How does your source know this?

Ref: إجيت علمود أشوف محطة الكهرباء مالتكم
Ref: <jyt Elmwd >$wf mHTp AlkhrbA' mAltkm
Ref: came+1s in-order-to see+1s station-of det+electricity poss-2p

MT: إجيت علمود أشوف الكهرباء المحطة مالتكم
MT: <jyt Elmwd >$wf AlkhrbA' AlmHTp mAltkm
MT: came+1s in-order-to see+1s det+electricity det+station poss-2p

Page 25: Evaluation of Machine Translation Errors in English and Iraqi Arabic


E→I: “Other” Errors from Phrasal Verbs

Phrasal verbs are frequently treated as verbs plus prepositions

Source: we have to go through the detaining process

Ref: لازم نسوي عملية الحجز
Ref: lAzm nswy Emlyp AlHjz
Ref: must 1pl-do +def-process the-detention

MT: لازم نروح عن طريق الحجز العملية
MT: lAzm nrwH En Tryq AlHjz AlEmlyp
MT: must 1pl-go from road the-detention the-process

English source "to go through" roughly means "to do from start to finish"

MT translated it as “motion through” or "to take a certain route"

This is a type of word sense error

Page 26: Evaluation of Machine Translation Errors in English and Iraqi Arabic


E→I: “Other” Multiword Expression Errors

23% of “Other” errors involve multiword expressions in the November 2008 corpus

Source: we can give you funds to where you can go out and buy the materials

Ref: نقدر ننطيك الفلوس علمود تطلع وتشتري المواد
Ref: nqdr nnTyk Alflws Elmwd tTlE wt$try AlmwAd
Ref: can+1p 1p+give+2ms det+money in-order-to 2ms+go-up and+2ms+buy det+material

MT: أقدر الفلوس وين تطلع وتشتري المواد
MT: >qdr Alflws wyn tTlE wt$try AlmwAd
MT: can+1s det+money where 2ms+go-up and+2ms+buy det+material

Page 27: Evaluation of Machine Translation Errors in English and Iraqi Arabic


June I→E: Error Type Proportions by Word Class

                Pronouns                Verbs                   Nouns                   Other                   Total
                To English  To Arabic   To English  To Arabic   To English  To Arabic   To English  To Arabic   To English  To Arabic
Deletion        0.109       0.065       0.121       0.026       0.038       0.036       0.056       0.086       0.323       0.214
Insertion       0.057       0.034       0.043       0.024       0.017       0.026       0.043       0.047       0.160       0.132
Substitution    0.095       0.059       0.090       0.077       0.048       0.119       0.115       0.132       0.348       0.387
Total           0.261       0.158       0.253       0.127       0.103       0.181       0.214       0.266       0.831       0.732
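As a quick consistency check on the June table, the four word-class columns in each row should add up to that row's Total column, apart from rounding to three decimals. The sketch below copies the figures from the slide; the names `JUNE`, `TOLERANCE` and `row_gaps` are hypothetical, and the 0.002 tolerance is an assumed allowance for rounding, not a value from the deck.

```python
# June 2008 proportions from the slide: (to English, to Arabic) per word class.
JUNE = {
    "Deletion": {
        "Pronouns": (0.109, 0.065), "Verbs": (0.121, 0.026),
        "Nouns": (0.038, 0.036), "Other": (0.056, 0.086),
        "Total": (0.323, 0.214),
    },
    "Insertion": {
        "Pronouns": (0.057, 0.034), "Verbs": (0.043, 0.024),
        "Nouns": (0.017, 0.026), "Other": (0.043, 0.047),
        "Total": (0.160, 0.132),
    },
    "Substitution": {
        "Pronouns": (0.095, 0.059), "Verbs": (0.090, 0.077),
        "Nouns": (0.048, 0.119), "Other": (0.115, 0.132),
        "Total": (0.348, 0.387),
    },
    "Total": {
        "Pronouns": (0.261, 0.158), "Verbs": (0.253, 0.127),
        "Nouns": (0.103, 0.181), "Other": (0.214, 0.266),
        "Total": (0.831, 0.732),
    },
}

TOLERANCE = 0.002  # assumed allowance for three-decimal rounding


def row_gaps(table: dict) -> dict:
    """Absolute gap between the summed word classes and the reported total."""
    gaps = {}
    for error_type, cells in table.items():
        for k, direction in enumerate(("To English", "To Arabic")):
            summed = sum(v[k] for name, v in cells.items() if name != "Total")
            gaps[(error_type, direction)] = abs(summed - cells["Total"][k])
    return gaps


for key, gap in row_gaps(JUNE).items():
    print(key, "consistent" if gap <= TOLERANCE else "check this row")
```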

Page 28: Evaluation of Machine Translation Errors in English and Iraqi Arabic


November I→E: Error Type Proportions by Word Class

                Pronouns                Verbs                   Nouns                   Other                   Total
                To English  To Arabic   To English  To Arabic   To English  To Arabic   To English  To Arabic   To English  To Arabic
Deletion        0.112       0.039       0.102       0.022       0.046       0.031       0.075       0.053       0.334       0.144
Insertion       0.047       0.028       0.039       0.035       0.007       0.039       0.039       0.079       0.131       0.182
Substitution    0.102       0.024       0.087       0.072       0.052       0.103       0.081       0.153       0.323       0.352
Total           0.261       0.092       0.228       0.129       0.105       0.173       0.195       0.284       0.789       0.678
Total June      0.261       0.158       0.253       0.127       0.103       0.181       0.214       0.266       0.831       0.732

Page 29: Evaluation of Machine Translation Errors in English and Iraqi Arabic


I→E: Other Error Proportions

                 June 2008                November 2008
Error Type       To English  To Arabic    To English  To Arabic
Word Order       0.139       0.170        0.171       0.166
Pro-V Complex    0.013       0.007        0.003       0.000
Verb Person      n/a         0.090        n/a         0.155
Polarity         0.009       0.002        0.017       0
Speech Act       0.006       0            0.019       0
Untranslated     0.001       0            0.001       0
Total            0.169       0.269        0.211       0.321

Page 30: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Error Frequencies and BLEU Scores

Direction  System  June TER Errors  November TER Errors  June Non-null Errors*  November Non-null Errors*  June BLEU Scores  November BLEU Scores
I to E     A       292              269                  176 /1.81              180 /1.67                  .469              .516
I to E     B       355              354                  240 /2.58              223 /2.06                  .446              .471
I to E     C       287              279                  166 /1.71              175 /1.64                  .484              .502
I to E     D       291              229                  189 /1.94              146 /1.35                  .475              .500
E to I     A       353              225                  179 /1.64              134 /1.44                  .341              .363
E to I     B       408              222                  203 /1.86              132 /1.42                  .305              .327
E to I     C       246              144                  116 /1.06              87 /0.94                   .339              .378
E to I     D       233              221                  115 /1.05              104 /1.12                  .325              .369

*raw frequency/normalized per input
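The "normalized per input" figures can be approximated by dividing each raw non-null error count by the number of inputs a system translated. The sketch below assumes that the annotated translations in the "Number of Translations Annotated" table are split evenly across the four systems; the slides do not state per-system counts, so this is an assumption, and the names `ANNOTATED`, `NON_NULL` and `errors_per_input` are hypothetical. Under that assumption the sketch reproduces several of the reported ratios, but it is not guaranteed to match every cell.

```python
SYSTEMS = ("A", "B", "C", "D")

# Annotated translations per direction and evaluation (Evaluation Data slide).
ANNOTATED = {
    ("I to E", "June"): 388, ("I to E", "November"): 432,
    ("E to I", "June"): 436, ("E to I", "November"): 372,
}

# Raw non-null error counts from the table above, in system order A-D.
NON_NULL = {
    ("I to E", "June"): (176, 240, 166, 189),
    ("I to E", "November"): (180, 223, 175, 146),
    ("E to I", "June"): (179, 203, 116, 115),
    ("E to I", "November"): (134, 132, 87, 104),
}


def errors_per_input(direction: str, evaluation: str, system: str) -> float:
    """Raw non-null errors divided by the assumed number of inputs per system."""
    # Assumption: every system translated the same number of inputs.
    inputs_per_system = ANNOTATED[(direction, evaluation)] / len(SYSTEMS)
    raw = NON_NULL[(direction, evaluation)][SYSTEMS.index(system)]
    return round(raw / inputs_per_system, 2)


# System A, June, Iraqi Arabic to English: 176 errors over 388 / 4 = 97 inputs.
print(errors_per_input("I to E", "June", "A"))
```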

Page 31: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Conclusions

Linguistic differences will always challenge translation systems

Some differences are difficult even for high frequency expressions like the copula
– The need to insert lexemes not present in the source
– Or to remove lexemes that are present in the source
– These are characteristics of multiword expressions

Discourse context is needed for deictic elements like pronouns
– Iraqi Arabic speakers know whether the speaker is referring to “I” or “you” from the context
– Knowing whether to translate Arabic “he” or “she” as “it” requires knowledge of the referent of the pronoun

Page 32: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Future Work

Compute relative weight of error types
– Compare to human judgments collected by NIST
– Compute regression tests

Compare July 2007 with November 2008 translations

Additional subcategories of errors

Page 33: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Word Sense Ambiguities

June 2008
– I -> E averaged .021
– E -> I averaged .032

These are low compared to Vilar et al. (2006)

After analysis of November E -> I “Other” errors, annotators were more sensitive to a broader class of word sense errors
– November E -> I is about 10%
– Comparable to Vilar et al. (2006)

November I -> E word sense analysis is incomplete

Page 34: Evaluation of Machine Translation Errors in English and Iraqi Arabic


Inter-Annotator Reliability

English annotation performed by 3 native speakers
– June 2008 annotated independently
– November 2008 each annotated twice and differences resolved

3 Arabic annotators
– 2 non-native speakers and 1 native speaker
– Half annotated by each non-native speaker
– All annotations reviewed by native speaker
– Differences resolved