esr7 carolina scarton - expert summer school - malaga 2015

Post on 16-Apr-2017

Finding Ways to Assess Machine Translated Documents for Document-level Quality Prediction

Carolina Scarton c.scarton@sheffield.ac.uk

Supervisor: Dr Lucia Specia

EXPERT – Scientific and Technological Workshop – Malaga, Spain – 27/06/2015

Agenda

Introduction

Quality Estimation Framework

Related Work

Document-level Quality Estimation

Quality Label problem

Two-stage post-edition experiment

Large-scale experiments

Conclusion

Introduction

Quality estimation (QE) of machine translations

– quality predictions for new, unseen machine translated texts

– use of machine learning techniques – only few labelled data points

– different from BLEU-style metrics – QE does not rely on reference translations

Introduction

Open problems:

– Granularity level?
  • Word-level
  • Sentence-level
  • Document-level

– Which are the best features?
  • Linguistic features have been explored, but not much on discourse features!

– Which are the best quality labels?
  • Likert
  • HTER
  • BLEU-style

Introduction

[Figure: QE pipeline. Source documents and target documents → Feature extractor → Features for QE; Features for QE and Quality labels (Likert, HTER, BLEU, ...) → QE model training → QE model → Predictions]

Defining the ideal quality label for document-level prediction is a challenge
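
The QE pipeline on this slide (feature extraction, training on labelled examples, prediction for unseen documents) can be illustrated with a minimal Python sketch. This is not QuEst code: the three length-based features, the function names and the nearest-neighbour learner are all assumptions chosen to keep the example self-contained.

```python
import math

def extract_features(source, target):
    """Toy document features (hypothetical, not the QuEst feature set)."""
    src_len = len(source.split())
    tgt_len = len(target.split())
    # Source length, target length, and target/source length ratio.
    return [float(src_len), float(tgt_len), tgt_len / max(src_len, 1)]

def train(feature_rows, labels):
    """'Training' a k-nearest-neighbour regressor just stores the examples."""
    return list(zip(feature_rows, labels))

def predict(model, features, k=1):
    """Predict a quality score as the mean label of the k closest examples."""
    nearest = sorted(model, key=lambda example: math.dist(example[0], features))[:k]
    return sum(label for _, label in nearest) / len(nearest)
```

The labels passed to `train` stand for whichever quality label is chosen (Likert, HTER, BLEU, ...); the sketch works the same way for any of them, which is why the choice of label is the open question here.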

Quality Estimation Framework

QuEst (www.quest.dcs.shef.ac.uk)

– Framework for sentence-level QE

– QuEst++ → recent extension for word and document levels
  • https://github.com/ghpaetzold/questplusplus

– Feature Extraction module (Java)

– Machine Learning module (Python)

Related Work

Soricut and Echihabi (2010) → TrustRank
– Ranking documents according to BLEU scores

Scarton and Specia (2014)
– Document-level QE prediction using discourse features – also predicted BLEU scores

Carpuat and Simard (2012)
– Lexical consistency study of MT outputs → MT is overall consistent!

Meyer and Webber (2013)
– Implicit discourse connectives in MT

Li et al. (2014)
– Discourse connectives → improve MT → correlations between discourse connectives and HTER

Guzmán et al. (2014)
– Document-level evaluation metric using RST

Quality Label problem

Quality labels are a challenge:

– Which is the ideal quality label for document-level QE?

– How can we assess documents?

• Sentence-level scores aggregation?

• New assessment score of the document as a whole?

Document-level QE
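
The first option above, aggregating sentence-level scores into one document score, can be sketched as follows. The function name and the optional length weighting are assumptions for illustration; the slides do not say which aggregation was considered.

```python
def aggregate_sentence_scores(scores, lengths=None):
    """Combine sentence-level quality scores into one document-level score.

    With no lengths, this is a plain average. With lengths (for example,
    token counts per sentence), longer sentences weigh more.
    """
    if not scores:
        raise ValueError("need at least one sentence score")
    if lengths is None:
        return sum(scores) / len(scores)
    if len(lengths) != len(scores):
        raise ValueError("scores and lengths must have the same size")
    total = sum(lengths)
    return sum(score * n for score, n in zip(scores, lengths)) / total
```

Either variant treats sentences as independent, so problems that only show up across sentence boundaries are invisible to it. That limitation is what motivates the second option, a score for the document as a whole.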

Quality Label problem

Quality labels are a challenge:

– BLEU-style metrics as quality labels
  • LIG corpus (FR-EN) → 119 documents

Document-level QE

Quality Label problem

Quality labels are a challenge:

– BLEU-style metrics as quality labels
  • WMT corpus (EN-DE) → 52 documents (1215 paragraphs)

– Low STDEV → documents have similar quality
  • Is it really true?

Document-level QE
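
The low-STDEV observation can be checked with a few lines of Python. This is a sketch only: the function name is hypothetical and no scores from the LIG or WMT corpora are reproduced here.

```python
from statistics import mean, stdev

def label_spread(scores):
    """Return (mean, sample standard deviation) of document-level labels."""
    if len(scores) < 2:
        raise ValueError("need at least two documents to measure spread")
    return mean(scores), stdev(scores)
```

If the standard deviation is small relative to the mean, nearly all documents receive the same label, and a model that always predicts the mean already looks accurate. The label then says little about whether the documents really are of similar quality.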

Two-stage post-edition method

PE1:

– Post-edition of sentences without context
  • Wir brauchen das kulturelle Fundament, aber wir haben jetzt mehr Schriftsteller als Leser.

PE2:

– Post-edition of sentences with context
  • - St. Petersburg bietet nicht viel kulturelles Angebot, Moskau hat viel mehr Kultur, es hat eine Grundlage. Es ist schwer für die Kunst, sich in unserem Umfeld durchzusetzen. Wir brauchen das kulturelle Fundament, aber wir haben jetzt mehr Schriftsteller als Leser. Das ist falsch. In Europa gibt es viele neugierige Menschen, die auf Kunstausstellungen, Konzerte gehen. Hier ist diese Schicht ist dünn.

Document-level QE

Two-stage post-edition method

Hypothesis:

– There are problems in MT outputs that can only be solved in context

– Measuring the difference from PE1 to PE2

• Isolating document-level problems

• Using the difference to create a better quality label

Document-level QE
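
One way to measure the difference from PE1 to PE2 is a word-level edit distance between the two post-edited versions, normalised by the length of PE2. The sketch below is an assumption, not the method used in the experiment: it counts insertions, deletions and substitutions only, so unlike TER/HTER it has no shift operation.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, hyp_tok in enumerate(hyp, 1):
        curr = [i]
        for j, ref_tok in enumerate(ref, 1):
            cost = 0 if hyp_tok == ref_tok else 1
            curr.append(min(prev[j] + 1,          # delete hyp_tok
                            curr[j - 1] + 1,      # insert ref_tok
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def pe_difference(pe1, pe2):
    """Edits needed to turn PE1 into PE2, per PE2 token (whitespace tokens)."""
    ref = pe2.split()
    if not ref:
        raise ValueError("PE2 must not be empty")
    return word_edit_distance(pe1.split(), ref) / len(ref)
```

A value of 0 means the annotator changed nothing once the context was shown; larger values mean more of the paragraph was revised in PE2.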

Two-stage post-edition method

Experiments:

– Data: 1215 paragraphs → WMT EN-DE corpus
  • Filter 1: only paragraphs with more than 3 sentences (fewer than 8)
  • Filter 2: paragraphs ordered by number of discourse phenomena (discourse connectives and pronouns)
  • Final data: 200 paragraphs

– Annotators → students of “translation studies” at Saarland University, Saarbrücken, Germany

– 16 sets → evaluate agreement

Document-level QE
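
The two filters can be sketched as below. Several details are assumptions: the word lists are tiny illustrative English samples, each paragraph is given as a list of sentences, and the final set is taken to be the highest-ranked paragraphs. The slides do not state which side of the corpus was counted or how the 200 paragraphs were cut off.

```python
# Hypothetical, deliberately tiny word lists for illustration only.
CONNECTIVES = {"however", "because", "although", "therefore", "but"}
PRONOUNS = {"he", "she", "it", "they", "this", "that"}

def count_discourse_phenomena(sentences):
    """Count connective and pronoun tokens in a paragraph (list of sentences)."""
    tokens = [tok.strip(".,;:!?\"'()").lower()
              for sentence in sentences
              for tok in sentence.split()]
    return sum(tok in CONNECTIVES or tok in PRONOUNS for tok in tokens)

def select_paragraphs(paragraphs, limit=200):
    """Filter 1: keep paragraphs with more than 3 and fewer than 8 sentences.
    Filter 2: order by discourse phenomena, most first, and keep `limit`."""
    kept = [p for p in paragraphs if 3 < len(p) < 8]
    kept.sort(key=count_discourse_phenomena, reverse=True)
    return kept[:limit]
```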

Two-stage post-edition method

Annotators' agreement: [result figures not preserved in this transcript]

Document-level QE

Two-stage post-edition method

Changes from PE1 to PE2 – paragraphs perspective:

– All paragraphs were changed

Document-level QE

Two-stage post-edition method

Paragraph changes example:

– Better word choices

Document-level QE

Two-stage post-edition method

Paragraph changes → manual analysis

– Discourse/context changes

– Stylistic changes

– Other changes

– Low agreement
  • Annotators should not have made lots of stylistic changes!

Document-level QE

Two-stage post-edition method

Final results:

– 116 paragraphs analysed

– Some changes → only with paragraph context

– However:
  • How to combine the results into a quality label?

Document-level QE

Large-scale experiments

Extending the research:

– Data: ~1000 data points
  • Different language pairs
  • Entire documents (?)

– Annotators: expert annotators (familiar with post-editing)
  • Improving guidelines and training

– Evaluation: combining PE2 – PE1 with other metrics (HTER, BLEU, …)

– Alternative approach:
  • Post-editions in context → available
  • Apply PE1 (post-editing the sentences again → without context)
  • PE2 – PE1 as usual

Conclusion

Two-stage post-edition method → promising!

– Problems that can only be solved in context

How to compute a quality label?

– Combine PE2-PE1 with other metrics?

– Use PE2-PE1?

Acknowledgement

Saarland University: Marcos Zampieri, Mihaela Vela, Heike Przybyl and Josef van Genabith

Reviewers from EXPERT Workshop

Thank you!

Carolina Scarton c.scarton@sheffield.ac.uk

Supervisor: Dr Lucia Specia
