Evaluation of Free Online Machine Translations

for Croatian-English and English-Croatian Language Pairs

Sanja Seljan, sseljan@ffzg.hr, University of Zagreb - Faculty of Humanities and Social Sciences,

Department of Information Sciences, Croatia

Marija Brkić, University of Rijeka, Department of Informatics, Croatia

Vlasta Kučiš, University of Maribor, Department of Translation Studies, Slovenia

FF Zagreb – Informacijske znanosti

Aim

Evaluation of texts from four domains (city description, law, football, monitors)

- Cro-En: by four free online translation services (Google Translate, Stars21, InterTran and Translation Guide)

- En-Cro: by Google Translate

Measurement of inter-rater agreement (Fleiss' kappa)

Influence of error types on the criteria of fluency and adequacy (Pearson's correlation)

I. Introduction

II. MT evaluation

III. Experimental study

- Translation tools

- Test set description

- Evaluation

- Error analysis

- Correlations

IV. Conclusion

I INTRODUCTION

Increased use of online translation tools in recent years, even among less widely spoken languages

Desirable: moderate to good quality translations

evaluation from the user's perspective

Tools and evaluation mainly for widely spoken languages

Possible use: gist translation, information retrieval, e.g. question-answering systems

1976 Systran - first MT for the Commission of the European Communities + online tool + different versions

1997 - first online translation tool - Babel Fish using Systran technology

Important: realistic expectations

Studies for popular languages

Considerable difference in the quality of translation dependent on the language pair

2010 - German-French (GT, ProMT, WorldLingo)

2011 - three popular online tools

2006 - Spanish-English (introductory textbook)

2008 - 13 languages into English (6 tools: Babel Fish, Google Translate, ProMT, SDL free translator, Systran, WorldLingo)

MT evaluation: important in research and product design

- measure system performance

- identify weak points and adjust parameter settings

- language-independent algorithms (BLEU, NIST)

- a better metric is one that is closer to human evaluation

Need for qualitative evaluation of different linguistic phenomena
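To illustrate why metrics such as BLEU are language independent, here is a minimal Python sketch of clipped (modified) n-gram precision, the building block of BLEU. It is not a full BLEU implementation: there is no brevity penalty, no geometric mean over n-gram orders and only one reference. The function names and the whitespace tokenisation are assumptions made for this illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against a single reference.

    Each candidate n-gram is credited at most as many times as it occurs
    in the reference, so repeating a correct word does not inflate the score.
    Tokenisation is plain whitespace splitting (an assumption of this sketch).
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


# Invented toy sentences, not taken from the study's test set.
print(modified_precision("the the the the", "the cat is on the mat", 1))
print(modified_precision("the cat sat", "the cat is on the mat", 2))
```

Because the computation only compares token sequences, it needs no linguistic knowledge of Croatian or English, which is also why it cannot replace the qualitative error analysis.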

III EXPERIMENTAL STUDY

evaluation of free online translation services (FTS) – from user’s perspective

Evaluators: undergraduate and graduate students of languages, linguistics and information sciences attending courses on language technologies at the University of Zagreb, Faculty of Humanities and Social Sciences

Test set description

- texts from 4 domains (city description, law, football, monitors)

- approx. 7-9 sentences per domain (17.8 words per sentence)

- Cro-En, En-Cro

Evaluators

- Cro-En: 48 students, final year of undergraduate and graduate levels

- En-Cro: 50 students, native speakers

- 75% of students attended language technology course(s)

Evaluation before the pilot study

[Figure: average grades (scale 0-5) for free language resources on the Internet: Croatian 3; in general 3.57]

[Figure: average grades (scale 0-5), panels "Croatian tools/resources" and "Tools/resources in general". Series and values in the order extracted: Systran, Google Translate, InterTran, Translation Guide (3.68, 3.9, 3.02, 2.7); online dictionaries, terminology databases, translation memories, Google Translate (3.14, 2.49, 2.45, 3.54)]

Desirable tools/resources of appropriate quality

[Figure: bar chart (0-100%) for online dictionaries, glossaries, term bases, translation memories, MT systems and speech-to-text systems; values not recoverable]

Evaluation

Manual evaluation

- fluency (how fluent the translation is in the target language)

- adequacy (how much of the information is adequately transmitted)

Evaluation enriched by an analysis of translation errors:

- morphological errors

- untranslated words

- lexical errors and word omissions

- syntactic errors

Tools

Cro-En translations

- Google Translate (GT) - http://translate.google.com

- Stars21 (S21) - http://stars21.com/translator

- InterTran (IT) - http://transdict.com/translators/intertran.html

- Translation Guide (TG) - http://www.translation-guide.com

En-Cro translations obtained from Google Translate

Google Translate

- translation service provided by Google Inc.

- statistical MT based on a huge amount of corpora

- supports 57 languages, Croatian since 2008

S21

- service powered by GT

- translations not always the same

InterTran

- powered by NeuroTran and WordTran

- sentence-by-sentence and word-by-word

Translation Guide

- powered by IT

- different translations

Results - Cro-En

Grades are either low (TG and IT) or high (S21 and GT) in comparison with the average value (3.04)

- S21 (4.66) : GT (4.62)

- S21 better: city description, law

- GT better: football, monitors

- best average result: legal domain, then monitors and football

- lowest: city description (the freest in style)

[Figure: average grades (scale 1-5) per domain (City, Law, Football, Monitors) for Stars21, Google Translate, InterTran and Translation Guide]

Results - En-Cro

- En-Cro - lower average results than the reverse direction: football (3.75 : 4.84), law, monitors

- Higher average grade in city description (shorter sentences, mostly nominative constructions, frequent terms)

- Football domain - specific terms, non-nominative constructions

Error analysis

Cro-En

- translations offered by GT and S21 are very similar, although not identical

- TG and IT differ in the number of untranslated words

- TG does not recognize words with diacritics

En-Cro

- highest number of lexical errors, including errors in style (average 2.44)

- untranslated words (1.83), morphological errors (1.75), syntactic errors (1.38)

- lowest score and highest number of errors: football domain (mostly lexical errors and untranslated words)

- best score: city description domain (lexical errors)

- lowest number of errors: legal domain (evenly distributed)

Morphological errors: mostly in the monitors domain, fewest in city description (dominant value 1)

Untranslated words: by far the most in the football domain

Translation grades: mostly influenced by untranslated words

Dominant values

- morphological errors: 1 in city description and monitors, 3 in the legal and football domains

- lexical errors: 1 in city description, higher in the other domains

- untranslated words: 1 in all domains

- syntactic errors: 1 in all domains except football (2-3)
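The "dominant value" reported per domain is the most frequent error count given by the raters (the mode). A minimal Python sketch, assuming the per-sentence error counts are available as a plain list; the function name and the sample counts are invented for illustration and are not data from the study. Returning all tied values is one possible way a range such as "2-3" could arise.

```python
from statistics import multimode


def dominant_values(error_counts):
    """Return the most frequent error count(s), sorted; ties yield several values."""
    if not error_counts:
        raise ValueError("at least one error count is required")
    return sorted(multimode(error_counts))


# Hypothetical counts of syntactic errors noted by raters for one domain.
print(dominant_values([1, 1, 2, 3, 1]))
print(dominant_values([2, 3, 2, 3, 1]))
```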

Pearson’s correlation

A smaller number of errors raises the average grade

Correlation between error types and the criteria of fluency and adequacy:

- fluency is more affected by an increase in lexical and syntactic errors

- adequacy is more affected by untranslated words
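A minimal Python sketch of Pearson's correlation coefficient between an error count and a grade, written out from the standard definition (covariance divided by the product of the standard deviations). The function name and the toy numbers are assumptions for illustration only; they are not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Invented example: more untranslated words per sentence, lower adequacy grade.
untranslated_words = [0, 1, 2, 3]
adequacy_grades = [5, 4, 3, 2]
print(pearson_r(untranslated_words, adequacy_grades))  # close to -1
```

A value near -1 means the grade falls as the error count rises; a value near 0 means the error type has little linear relationship with the grade.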

Fleiss' kappa

Assesses the reliability of agreement among raters when they assign ratings to the sentences

Indicates the extent to which the observed agreement among raters exceeds what would be expected if all raters made their ratings completely at random

Score: usually between 0 and 1 (perfect agreement); it drops below 0 when raters agree less often than chance would predict

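Interpretation of the score:

0.00-0.20 slight agreement
0.21-0.40 fair agreement
0.41-0.60 moderate agreement
0.61-0.80 substantial agreement
0.81-1.00 almost perfect agreement

Symbols:

N: total number of subjects
n: number of raters per subject
k: number of categories, indexed by j
n_ij: number of raters who assigned subject i to category j
P_i: extent to which raters agree on the i-th subject

Mean observed agreement:

$$\bar{P} = \frac{1}{N n (n-1)} \left( \sum_{i=1}^{N} \sum_{j=1}^{k} n_{ij}^{2} - N n \right)$$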
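A minimal Python sketch of Fleiss' kappa, following the standard definition: the mean observed agreement is compared with the agreement expected by chance, kappa = (P_obs - P_exp) / (1 - P_exp), where P_exp is the sum of the squared category proportions. The function name, the input layout (one row per subject, one column per category) and the toy tables are assumptions for illustration, not the study's ratings.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned subject i to category j. Every subject must have the same
    number of ratings."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per subject are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")
    n_categories = len(counts[0])

    # Mean observed agreement over all subjects.
    sum_squares = sum(c * c for row in counts for c in row)
    p_obs = (sum_squares - n_subjects * n_raters) / (
        n_subjects * n_raters * (n_raters - 1)
    )

    # Agreement expected by chance, from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_exp = sum(p * p for p in proportions)
    if p_exp == 1:
        raise ValueError("kappa is undefined when all ratings fall into one category")
    return (p_obs - p_exp) / (1 - p_exp)


# Invented toy tables: 2 sentences, 3 raters, 2 grade categories.
print(fleiss_kappa([[3, 0], [0, 3]]))  # raters agree on every sentence
print(fleiss_kappa([[2, 1], [1, 2]]))  # raters split on every sentence
```

The second table yields a negative value, which shows that the score is not strictly confined to the 0-1 range.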

Cro-En translations: relatively high level of agreement among raters per domain and per system

- moderate agreement, 0.41-0.60 (IT translation service)

- substantial agreement, 0.61-0.80 (S21 and GT)

- almost perfect agreement, 0.81-1.00 (TG, the worst tool)

En-Cro translations: inter-rater agreement per domain

- lowest level of agreement in the football and law domains (0.40-0.49, fair to moderate): longer and more complex sentences

- substantial agreement (0.61-0.80) in city description

- level of inter-rater agreement is lower for En-Cro translations in all domains
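The agreement bands used in these results can be applied mechanically. A minimal Python sketch, assuming the band boundaries listed on the Fleiss' kappa slide; the function name and the label for negative scores are assumptions added for this illustration, and values that fall between two listed bands (for example 0.205) are assigned to the higher band.

```python
# Upper bound of each band and its label, in ascending order.
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def agreement_label(kappa):
    """Map a kappa score to the agreement band it falls into."""
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0:
        # Not one of the listed bands: agreement below chance level.
        return "less than chance"
    for upper, label in BANDS:
        if kappa <= upper:
            return label


print(agreement_label(0.45))
print(agreement_label(0.70))
print(agreement_label(0.85))
```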

Conclusion

Evaluation study of MT in 4 domains

- Cro-En: 4 free online translation services

- En-Cro: Google Translate

Evaluators' profile

- high interest in the use of translation resources and tools

- critical evaluation

System evaluation

- almost perfect agreement in the ranking of TG as the worst translation service

- substantial agreement for the S21 and GT services

- moderate agreement for IT, which performed slightly better than TG

Cro-En translations

- S21 and GT (4.63 to 4.84): football, law and monitors

- city description: Cro-En grades lower than En-Cro

En-Cro direction (by GT)

- lower grades than in the opposite direction (specific terms, non-nominative constructions, multi-word units)

- except in the city description domain, which contains mostly nominative constructions, frequent words and no specific terms

Error analysis

- translation grades are mostly influenced by untranslated words (especially the criterion of adequacy)

- morphological and syntactic errors affect grades to a smaller extent (fluency)

Google Translate service

- used in both translation directions

- harvests data from the Web; seems to be well trained and suitable for translating frequent expressions

- does not perform well where linguistic information is needed, e.g. gender agreement, multi-word expressions

Further research

- better quantitative analysis per domain

- more detailed analysis of specific language phenomena
