![Page 1: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/1.jpg)
Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics
Thesis Project
Elisabet Comelles
Supervisors: Irene Castellon and Victoria Arranz
![Page 2: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/2.jpg)
Outline
• Introduction
• State of the Art
• Discussion of MT Evaluation Metrics
• Hypothesis & Objective
• Methodology & Schedule
![Page 3: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/3.jpg)
Introduction
• Quick access to multilingual information
• Need for quick translation
• Sharp increase in the number of MT systems
• Need for evaluation of those MT systems
• Evaluation needs to be quick and reliable
![Page 4: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/4.jpg)
Introduction
• The most widely used evaluation metrics show problems
• New approaches to evaluation using linguistic information:
  – Syntactic info
  – Semantic info
• Our scenario:
  – Comparison between already existing systems
  – Direction of translation to test: English-Spanish
![Page 5: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/5.jpg)
State of the Art
• MT is closely linked to MT Evaluation
• Purpose of the evaluation methods:
  – Error analysis
  – System comparison
• Chronologically:
  1. Human MT Evaluation
  2. Automatic MT Evaluation
![Page 6: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/6.jpg)
State of the Art: Types of MT Evaluation
• Focused on Context:
  – Context-based Evaluation (FEMTI)
    • Evaluates suitability of the MT technology & the MT system for the user’s purpose
    • Parameters of analysis: functionality, reliability, usability, efficiency, maintainability, portability, cost, etc.
• Focused on Quantity & Quality:
  – Human Evaluation and Automatic Evaluation
![Page 7: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/7.jpg)
State of the Art: Types of MT Evaluation
• Human Evaluation:
  – Several approaches:
    • Fidelity (ALPAC report)
    • Intelligibility (ALPAC report)
    • Comprehensive evaluation of informativeness (ARPA)
    • Quality panel evaluation
    • Adequacy and Fluency (Semantics and Syntax)
    • Preferred Translation
    • Required Post-Editing
![Page 8: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/8.jpg)
State of the Art: Types of MT Evaluation
• Human Evaluation:
  – Advantage: human evaluators can evaluate the overall quality of the system
  – Disadvantages:
    • Time-consuming
    • Expensive
    • Subjective
![Page 9: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/9.jpg)
State of the Art: Types of MT Evaluation
• Automatic Evaluation:
  – Approaches:
    • Based on Lexical Matching
    • Based on Syntax
    • Based on Semantics
![Page 10: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/10.jpg)
State of the Art: Types of MT Evaluation
• Based on Lexical Matching:
  – Dominant approach to Automatic MT Evaluation
  – Looks for lexical similarities between MT output and reference translations
  – Types:
    • Edit Distance Measures (WER)
    • Precision-oriented Measures (BLEU)
    • Recall-oriented Measures (ROUGE)
    • Measures balancing Precision & Recall (GTM)
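A minimal sketch of two of the lexical-matching families listed above, assuming whitespace tokenization and a single reference translation. The function names `wer` and `unigram_prf` are illustrative, not taken from any toolkit, and the unigram F-measure is only a rough stand-in for the precision/recall balancing idea behind GTM.

```python
from collections import Counter


def wer(hypothesis: str, reference: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the hypothesis words seen so far and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,             # drop the extra hypothesis word
                curr[j - 1] + 1,         # add the missing reference word
                prev[j - 1] + (h != r),  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1] / len(ref)


def unigram_prf(hypothesis: str, reference: str):
    """Clipped unigram precision, recall and F-measure against one reference."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    matched = sum((hyp & ref).values())  # each reference word can be matched once
    precision = matched / sum(hyp.values()) if hyp else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


reference = "the cat sat on the mat"
hypothesis = "the cat sat on mat"
print(wer(hypothesis, reference))
print(unigram_prf(hypothesis, reference))
```

In this toy pair the hypothesis drops one word, so the edit distance is 1 over a six-word reference, precision is perfect and recall is 5/6. A precision-only measure would not penalize the omission, which is why BLEU adds a brevity penalty.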
![Page 11: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/11.jpg)
State of the Art: Types of MT Evaluation
• Based on Syntax:
  – Recently developed
  – Focused on the syntax of the output sentence
  – Types:
    • Constituency Parsing
    • Dependency Parsing
    • Combination of both analyses (Liu & Gildea 2005)
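To make the dependency-based idea concrete, here is a small sketch that scores a hypothesis by the overlap of its dependency triples with those of the reference. It is not the method of Liu & Gildea (2005). The parses are written by hand, the relation labels (`subj`, `obj`, `det`) are illustrative, and the function name `triple_overlap_f1` is hypothetical. A real setup would obtain the triples from a parser.

```python
from collections import Counter


def triple_overlap_f1(hyp_triples, ref_triples):
    """F-measure over (head, relation, dependent) triples shared by both parses."""
    hyp, ref = Counter(hyp_triples), Counter(ref_triples)
    matched = sum((hyp & ref).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(hyp.values())
    recall = matched / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hand-written parses (an assumption; no parser is run here).
# Reference: "the dog chased the cat"
ref_parse = [("chased", "subj", "dog"), ("chased", "obj", "cat"),
             ("dog", "det", "the"), ("cat", "det", "the")]
# Hypothesis: "the cat chased the dog" (same words, roles swapped)
hyp_parse = [("chased", "subj", "cat"), ("chased", "obj", "dog"),
             ("cat", "det", "the"), ("dog", "det", "the")]

print(triple_overlap_f1(hyp_parse, ref_parse))
```

Both sentences contain exactly the same words, yet only the two determiner triples match, so the score is 0.5. The swapped subject and object are visible to a syntax-aware comparison but not to a bag-of-words one.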
![Page 12: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/12.jpg)
State of the Art: Types of MT Evaluation
• Based on Semantics:
  – Recently developed
  – Focused on the semantics of the output
  – Types:
    • Named Entities (NEs): quality over NEs (NEE)
    • Semantic Roles: similarities over Semantic Roles (SR)
![Page 13: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/13.jpg)
Discussion of MT Evaluation Metrics
• Human Evaluation:
  – Advantages:
    • Allows overall quality to be evaluated
  – Disadvantages:
    • Time-consuming
    • Expensive
    • Subjective
![Page 14: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/14.jpg)
Discussion of MT Evaluation Metrics
• Automatic Evaluation:
  – Advantages:
    • Fast
    • Inexpensive
    • Objective
    • Updatable
  – Disadvantages?
![Page 15: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/15.jpg)
Discussion of MT Evaluation Metrics
• Automatic Metrics based on Lexical Matching:
  – Great advance in MT research in the last decade
  – Widely accepted & used by the SMT research community
  – BLEU is the most widely used automatic metric
  – Criticized by those not developing SMT systems
  – Usually depend on reference translations
  – Only take into account lexical similarities & disregard syntax
  – Biased
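The limitation noted above can be illustrated with a toy example. The sketch below computes a simplified, single-reference form of the clipped n-gram precision that BLEU builds on, without the brevity penalty or the geometric mean over n-gram orders. The helper names and the example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis: str, reference: str, n: int) -> float:
    """Share of hypothesis n-grams also found in the reference (counts clipped)."""
    hyp = ngrams(hypothesis.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    return sum((hyp & ref).values()) / total


reference = "the dog chased the cat"
scrambled = "cat the chased dog the"   # same words, ungrammatical order
swapped = "the cat chased the dog"     # fluent, but who chased whom is reversed

for name, hyp in [("scrambled", scrambled), ("swapped", swapped)]:
    print(name, clipped_precision(hyp, reference, 1), clipped_precision(hyp, reference, 2))
```

Both hypotheses reach a unigram precision of 1.0. Bigrams catch the scrambled order (0.0) but still give the meaning-reversing sentence 0.75, since three of its four bigrams occur in the reference.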
![Page 16: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/16.jpg)
Discussion of MT Evaluation Metrics
• Automatic Metrics based on Syntax:
  – Good improvement
  – Work at sentence level
  – Only focused on syntax
  – What about meaning?
• Automatic Metrics based on Semantics:
  – Good improvement
  – Only NEs & Semantic Roles
  – NEs not too relevant
  – Need further development
  – Only focused on meaning; what about syntax?
![Page 17: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/17.jpg)
Discussion of MT Evaluation Metrics
• Discussion of Automatic Metrics:
  – Each metric focuses on a partial aspect of quality
    • Strongly biased evaluations
    • Unfair comparison between systems
    • Overtuning of the system
  – Need for integration of metrics
    • Parametric vs. non-parametric
    • Evaluation of the quality of a metric combination: human likeness, human acceptability
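One possible reading of these points, sketched under stated assumptions: a non-parametric combination averages the individual metric scores uniformly, a parametric one applies weights that would have to be tuned, and human acceptability is estimated as the correlation between combined scores and human judgements. All function names and numbers below are invented for illustration and are not results.

```python
from math import sqrt


def uniform_combination(scores):
    """Non-parametric combination: plain average of the metric scores."""
    return sum(scores) / len(scores)


def weighted_combination(scores, weights):
    """Parametric combination: weighted average; the weights must be tuned."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(vx * vy)


# Invented scores for three systems: (lexical, syntactic, semantic) metric values.
metric_scores = [(0.2, 0.4, 0.6), (0.5, 0.5, 0.7), (0.6, 0.8, 0.9)]
human_scores = [2.0, 3.0, 4.5]  # invented human judgements for the same systems

combined = [uniform_combination(s) for s in metric_scores]
print(combined)
print(pearson(combined, human_scores))
print(weighted_combination((0.2, 0.4, 0.6), (1, 1, 2)))
```

Human likeness would need a different test: checking whether the combined score can tell human reference translations apart from system output. That is not sketched here.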
![Page 18: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/18.jpg)
Hypothesis & Objective
• Hypothesis: Adding new linguistic information will improve the performance of Automatic Metrics
• Main Objective: Proposing a new Automatic Evaluation Metric based on linguistic information.
![Page 19: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/19.jpg)
Hypothesis & Objective
• Secondary Objectives:
  – Explore linguistic information:
    • Syntactic info: POS, shallow parsing, chunking, full parsing, dependency parsing, constituency parsing, etc.
    • Semantic info: Semantic Roles, semantic features, WordNet, FrameNet, Lexical Semantics, etc.
  – Look for linguistic resources suitable for computational processing
  – Look for publicly available linguistic resources
  – Explore the appropriate way to combine this information
![Page 20: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/20.jpg)
Methodology & Schedule
• 4 stages:
  – Stage 1 (years 1 & 2):
    • Bibliographic research and analysis:
      – Detailed exploration and analysis of Automatic Evaluation Metrics
      – Detailed exploration, analysis and selection of the adequate linguistic information
      – Exploration of the feasibility and availability of the linguistic resources needed
  – Stage 2 (years 1 & 2):
    • Selection of the evaluation corpus
![Page 21: Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56814745550346895db48226/html5/thumbnails/21.jpg)
Methodology & Schedule
  – Stage 3 (year 3):
    • Experiments on how to combine this linguistic information and the automatic evaluation metrics
    • Evaluation of our metric combination based on either human likeness or human acceptability
  – Stage 4 (year 4):
    • Analysis & discussion of the results obtained
    • Summary of the findings and reflection on the results obtained
    • Proposal of a new evaluation metric