Rules-Based Machine Translation
Fred Hollowood, Consultant
RBMT and CL
Sample Agenda
1. Introduction
2. Rules-Based Machine Translation
3. Post-Editing
4. Quality Measurement
5. Controlled Language
Introduction
Technology Initiative: The Aim
Bring rapid, cost-effective translation to Symantec’s product and service divisions
Connect Symantec’s CMS to translation technologies
Provide metrics on the reduction of translation costs and time to market
The Approach
Structure source content so that it accommodates MT
Use a language checker to monitor source grammar
Promote terminology as a key process and deliverable, proactively rather than reactively
Define measures to monitor and drive productivity (GTM, Meteor, BLEU)
Work with post-editors to ensure a win-win outcome
Rules-Based Machine Translation
Figure: Flowchart of Rule-Based Machine Translation (RBMT)
SL text -> Analysis (SL lexicon & grammars) -> Transfer (SL->TL lexical & structural rules) -> Synthesis (TL lexicon & grammars) -> TL text
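The three stages in the flowchart can be illustrated with a deliberately tiny English-to-French sketch in Python. The lexicons, tags, and single reordering rule are hypothetical stand-ins (a production engine such as Systran relies on far larger dictionaries and grammars), and the sketch assumes the input is a single noun phrase.

```python
# Toy English -> French RBMT pipeline: analysis, transfer, synthesis.
# All lexicons and rules here are hypothetical, for illustration only.

SL_LEXICON = {"the": "DET", "red": "ADJ", "blue": "ADJ",
              "car": "NOUN", "book": "NOUN", "house": "NOUN"}
BILINGUAL = {"the": "le", "red": "rouge", "blue": "bleu",
             "car": "voiture", "book": "livre", "house": "maison"}
TL_GENDER = {"voiture": "f", "livre": "m", "maison": "f"}


def analyse(text):
    """Analysis: look up each source word's category in the SL lexicon."""
    return [(word, SL_LEXICON[word]) for word in text.lower().split()]


def transfer(tagged):
    """Transfer: lexical substitution plus one structural rule (ADJ NOUN -> NOUN ADJ)."""
    out = [(BILINGUAL[word], tag) for word, tag in tagged]
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def synthesise(tagged):
    """Synthesis: inflect the article and adjective for the noun's gender."""
    noun = next(word for word, tag in tagged if tag == "NOUN")
    feminine = TL_GENDER[noun] == "f"
    words = []
    for word, tag in tagged:
        if tag == "DET" and feminine:
            word = "la"
        elif tag == "ADJ" and feminine and not word.endswith("e"):
            word += "e"
        words.append(word)
    return " ".join(words)


def translate(text):
    return synthesise(transfer(analyse(text)))


for phrase in ["the red car", "the blue book", "the blue house"]:
    print(phrase, "->", translate(phrase))
# the red car -> la voiture rouge
# the blue book -> le livre bleu
# the blue house -> la maison bleue
```

Each stage only consults its own resource, which mirrors the flowchart: the SL lexicon during analysis, the bilingual rules during transfer, and TL morphology (gender agreement) during synthesis.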
MT Process Overview
Figure: MT process overview. Components shown: Controlled Language Authoring, Automated Pre-processing, User Dictionary, Translation System, Normalisation Dictionary, Systran Engine, Automated Post-processing, Human Post-Editing. Legend: remote human activity, system control phases, text processing.
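The automated phases around the engine can be sketched in Python, assuming a normalisation dictionary is a table of source variants rewritten to one canonical form before translation, and that non-translatable strings are shielded with placeholders and restored afterwards. The dictionary entries, the path pattern, and the placeholder format are hypothetical; this is not Systran's API, and `translate` is any callable standing in for the engine.

```python
import re

# Hypothetical normalisation entries: source variants -> canonical form.
NORMALISATION = {"e-mail": "email", "anti-virus": "antivirus"}
# Hypothetical rule: Windows paths are non-translatable and must survive MT intact.
PROTECTED = re.compile(r"[A-Z]:\\\S+")


def pre_process(text):
    """Swap protected spans for placeholders, then normalise source variants."""
    placeholders = {}

    def stash(match):
        key = f"__PH{len(placeholders)}__"
        placeholders[key] = match.group(0)
        return key

    text = PROTECTED.sub(stash, text)
    for variant, canonical in NORMALISATION.items():
        text = re.sub(rf"\b{re.escape(variant)}\b", canonical, text, flags=re.IGNORECASE)
    return text, placeholders


def post_process(text, placeholders):
    """Restore the protected spans after translation."""
    for key, value in placeholders.items():
        text = text.replace(key, value)
    return text


def run_pipeline(text, translate):
    """translate is any callable standing in for the MT engine."""
    source, placeholders = pre_process(text)
    return post_process(translate(source), placeholders)


# str.upper stands in for the engine: everything changes except the protected path.
print(run_pipeline(r"Open the e-mail log in C:\Temp\log.txt today", str.upper))
```

Keeping these steps outside the engine means rule changes (a new variant, a new protected pattern) do not require retraining or re-coding the translation system itself.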
Post-Editing
The relationship with post-editors is fundamentally the same as with a traditional translation vendor.
Higher daily throughput is expected for post-edited content (6-8k vs. 2.5k words per day).
Style requirements have been critically reviewed in the light of post-editing (PE); for example, stylistic inconsistencies are acceptable in post-edited content.
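To make the throughput figures concrete, the arithmetic below compares translator-days for a project, assuming the slide's figures are words per translator per day. The 100,000-word project size is hypothetical.

```python
import math

TRADITIONAL_RATE = 2_500          # words per day, translation (from the slide)
POST_EDIT_RATES = (6_000, 8_000)  # words per day, post-editing (from the slide)


def days_needed(words, rate):
    """Whole translator-days needed to process `words` at `rate` words per day."""
    return math.ceil(words / rate)


project_words = 100_000  # hypothetical project size
print("traditional:", days_needed(project_words, TRADITIONAL_RATE), "days")  # 40 days
for rate in POST_EDIT_RATES:
    speedup = rate / TRADITIONAL_RATE
    print(f"post-editing at {rate}/day: {days_needed(project_words, rate)} days ({speedup:.1f}x)")
# post-editing at 6000/day: 17 days (2.4x)
# post-editing at 8000/day: 13 days (3.2x)
```

In other words, the expected post-editing rates amount to roughly 2.4 to 3.2 times the throughput of translation from scratch.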
Measurement
Metrics based on Comprehensibility
Score Criteria
Excellent MT output (E) (4)
Read the MT output first, then the source text (ST). Reading the ST does not improve your understanding of the MT output, because the output is satisfactory and would not need to be modified. An end-user without access to the ST would be able to understand the MT output.
Good MT output (G) (3)
Read the MT output first, then the ST. Reading the ST does not improve your understanding of the MT output, even though the output contains minor grammatical mistakes. An end-user without access to the ST could possibly understand the MT output.
Medium MT output (M) (2)
Read the MT output first, then the ST. Reading the ST improves your understanding of the MT output, because the output contains significant errors. An end-user without access to the ST could only get the gist of the MT output.
Poor MT output (P) (1)
Read the MT output first, then the ST. Your understanding comes only from reading the ST, because you could not understand the MT output. An end-user without access to the ST would not be able to understand the MT output at all.
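Once each segment has a rating, the results can be summarised per language. The Python sketch below maps the four categories to their scores (E=4, G=3, M=2, P=1, as in the criteria) and reports the counts, the mean score, and the share rated Good or Excellent; the sample ratings are hypothetical.

```python
from collections import Counter

# Score values from the criteria above: E=4, G=3, M=2, P=1.
SCORES = {"E": 4, "G": 3, "M": 2, "P": 1}


def summarise(ratings):
    """Return per-category counts, the mean score, and the share rated G or E."""
    counts = Counter(ratings)
    mean = sum(SCORES[r] for r in ratings) / len(ratings)
    good_or_better = (counts["E"] + counts["G"]) / len(ratings)
    return {label: counts[label] for label in "PMGE"}, mean, good_or_better


ratings = ["E", "G", "G", "M", "P", "G", "E", "M"]  # hypothetical ratings
counts, mean, share = summarise(ratings)
print(counts, mean, share)
```

The share rated Good or Excellent is a useful single figure because, by definition, those are the segments a reader could follow without consulting the source.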
Quality by Human Inspection
Figure: Hamlet Language Analysis, TK 1 - 6 (rating counts in the Poor, Medium, Good, and Excellent categories for Spanish, Italian, German, and French)
GTM (General Text Matcher) Scoring
From the machine
From the post-editor
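GTM scores the machine's output against a reference, here the post-editor's version. It finds a maximum matching between the two texts and can reward longer contiguous runs through an exponent e. The Python sketch below covers only the simplified e = 1 case, where run length no longer matters, plus a helper that assigns a score to the bands used in the charts. It is not the GTM tool used in the project; the function names and sample sentences are hypothetical.

```python
from collections import Counter


def gtm_f1(candidate, reference):
    """Simplified GTM F-measure with exponent e = 1.

    With e = 1 every aligned run contributes only its length, so the maximum
    matching size equals the number of tokens the two texts share
    (multiset intersection, each token used at most once).
    """
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    mms = sum((Counter(cand) & Counter(ref)).values())
    if mms == 0:
        return 0.0
    precision = mms / len(cand)
    recall = mms / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_band(score):
    """Map a score to the chart bands (0.0-0.1 ... 0.9-0.99, then 1.0)."""
    if score >= 1.0:
        return "1.0"
    lower = int(score * 10 + 1e-9) / 10
    upper = "0.99" if lower == 0.9 else f"{lower + 0.1:.1f}"
    return f"{lower:.1f}-{upper}"


machine = "Open the the scan settings"   # hypothetical raw MT output
post_edited = "Open the scan settings"   # hypothetical post-edited text
score = gtm_f1(machine, post_edited)
print(round(score, 3), score_band(score))  # 0.889 0.8-0.9
```

Using the post-edited text as the reference means the score approximates how much the post-editor had to change; an unedited segment lands in the 1.0 band.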
Quality Metrics by Language
Hamlet GTM Results: share of GTM scores in each band, by language
GTM band | French | Spanish | Italian | German
0.0-0.1 | 0.64% | 0.60% | 0.97% | 2.97%
0.1-0.2 | 0.29% | 0.28% | 0.86% | 2.24%
0.2-0.3 | 1.11% | 1.20% | 3.33% | 5.67%
0.3-0.4 | 2.62% | 2.97% | 9.42% | 10.85%
0.4-0.5 | 6.39% | 8.87% | 17.49% | 16.29%
0.5-0.6 | 13.96% | 18.37% | 22.31% | 18.53%
0.6-0.7 | 18.92% | 24.19% | 18.82% | 16.14%
0.7-0.8 | 20.07% | 19.54% | 12.55% | 11.80%
0.8-0.9 | 14.69% | 12.06% | 6.88% | 6.30%
0.9-0.99 | 2.93% | 2.35% | 1.25% | 0.99%
1.0 | 18.37% | 9.57% | 6.12% | 8.21%
Project Scores by Language
French: 73%
Spanish: 68%
Italian: 59%
German: 57%
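The slides do not say how the project scores are computed. One hypothesis that can be checked against the data is that they are mean GTM scores. The Python sketch below reads the Hamlet GTM chart percentages in band order, assumes band midpoints (0.945 for 0.9-0.99 and 1.0 for exact matches), and compares the weighted means with the published scores. The midpoints and the hypothesis itself are assumptions.

```python
# GTM score distributions from the Hamlet GTM Results chart (percent per band).
BAND_MIDPOINTS = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.945, 1.0]
DISTRIBUTIONS = {
    "French":  [0.64, 0.29, 1.11, 2.62, 6.39, 13.96, 18.92, 20.07, 14.69, 2.93, 18.37],
    "Spanish": [0.60, 0.28, 1.20, 2.97, 8.87, 18.37, 24.19, 19.54, 12.06, 2.35, 9.57],
    "Italian": [0.97, 0.86, 3.33, 9.42, 17.49, 22.31, 18.82, 12.55, 6.88, 1.25, 6.12],
    "German":  [2.97, 2.24, 5.67, 10.85, 16.29, 18.53, 16.14, 11.80, 6.30, 0.99, 8.21],
}
PROJECT_SCORES = {"French": 73, "Spanish": 68, "Italian": 59, "German": 57}


def estimated_mean(percentages):
    """Weighted mean of band midpoints (as a percentage), normalised by the total."""
    total = sum(percentages)
    return 100 * sum(m * p for m, p in zip(BAND_MIDPOINTS, percentages)) / total


for language, dist in DISTRIBUTIONS.items():
    print(language, round(estimated_mean(dist), 1), PROJECT_SCORES[language])
```

Under these assumptions each estimate falls within one point of the published project score. That is consistent with, but does not prove, the scores being mean GTM values.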
Example Style Rules
Avoid using a colon after a drive letter
Avoid “he”, “she”, “he/she”, and “s/he”
Use numerals for all measurements over 10
Use the serial comma
Do not use more than two adverbs or adjectives in a series
Keep the subject and verb close to each other, early in the sentence
Avoid meaningless openers
Avoid the progressive tense when describing product use
Do not use the future tense when describing product use
Make positive statements that tell users what to do or what they need to know
Use sentence-style capitalization for bulleted lists
Use a colon at the end of a sentence to introduce a bulleted list
Punctuate imperative sentences in bulleted lists
Use number × number
Use a hyphen in a unit
Repeat the unit of measure
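A language checker applies rules like these automatically. As a rough sketch (not the checker used in the project), three of the rules can be approximated with Python regular expressions. The patterns are hypothetical heuristics: they will miss some violations and flag some false positives, for example "will" used as a noun.

```python
import re

# Hypothetical regex heuristics for three of the style rules above.
STYLE_CHECKS = {
    "Avoid he, she, he/she, and s/he": re.compile(r"\b(?:he/she|s/he|he|she)\b", re.IGNORECASE),
    "Do not use future when describing product use": re.compile(r"\bwill\b", re.IGNORECASE),
    # "X, Y and Z" with no comma before the conjunction.
    "Use the serial comma": re.compile(r"\b\w+, \w+ (?:and|or) \w+"),
}


def check_style(sentence):
    """Return the names of the rules a sentence appears to break."""
    return [rule for rule, pattern in STYLE_CHECKS.items() if pattern.search(sentence)]


print(check_style("The scan will quarantine viruses, worms and trojans."))
# ['Do not use future when describing product use', 'Use the serial comma']
```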
CL Rules Based on CDG
Avoid using the passive voice
Do not use more than 25 words in a sentence (original recommendation was 20)
Use relative pronouns
Use complementizers (“that”)
Avoid unnecessary words (such as “basic” or “just”)
Do not use 'this' or 'that' when they are not followed by a noun
Place all non-translatable text on its own line (programming code snippets)
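Some of these rules are easy to check mechanically. The Python sketch below covers three: the 25-word limit, the listed unnecessary words, and a crude passive-voice test (a form of "be" followed by a word ending in -ed). The passive heuristic is a hypothetical simplification that misses irregular participles such as "written"; a production checker would rely on a parser.

```python
import re

MAX_WORDS = 25                   # from the rule above
UNNECESSARY = {"basic", "just"}  # examples given in the rule
# Hypothetical passive-voice heuristic: a form of "be" followed by a word ending in -ed.
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE)


def check_cdg(sentence):
    """Return a list of problems found in one sentence."""
    problems = []
    words = re.findall(r"[\w'-]+", sentence)
    if len(words) > MAX_WORDS:
        problems.append(f"too long ({len(words)} words)")
    found = sorted({w.lower() for w in words} & UNNECESSARY)
    if found:
        problems.append("unnecessary words: " + ", ".join(found))
    if PASSIVE.search(sentence):
        problems.append("possible passive voice")
    return problems


print(check_cdg("Basic files are deleted by the scanner."))
# ['unnecessary words: basic', 'possible passive voice']
```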
CL Rules for MT
Do not use slashes to list lexical items
Do not write the full name of each operating system
Avoid -ing words
Use a noun at the start of a subordinate clause
Repeat the head noun in ambiguous coordinated structures
Use a hyphen to indicate the first part of a compound
Use articles in specific contexts (for disambiguation)
Keep both parts of a two-part verb together
Use “could” with “if”
Avoid parenthetical expressions in the middle of a sentence
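Three of the MT-specific rules can be approximated the same way. In the Python sketch below the patterns are hypothetical: the -ing check is purely orthographic, so it also flags words such as "string" or "during", and the parenthetical check only fires when a word follows the closing parenthesis.

```python
import re

# Hypothetical regex heuristics for three of the CL rules for MT.
MT_CHECKS = {
    "Do not use slashes to list lexical items": re.compile(r"\b[A-Za-z]+/[A-Za-z]+\b"),
    "Avoid -ing words": re.compile(r"\b\w{3,}ing\b", re.IGNORECASE),
    # A parenthesis followed by more words means it sits mid-sentence.
    "Avoid parenthetical expressions in the middle of a sentence": re.compile(r"\([^)]*\)\s+\w"),
}


def check_mt(sentence):
    """Return the names of the rules a sentence appears to break."""
    return [rule for rule, pattern in MT_CHECKS.items() if pattern.search(sentence)]


print(check_mt("Select the file/folder (if present) before scanning."))
```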
Examples of CL Violation
Rule: Keep both parts of a two-part verb together
Source (rule violated): This document gives directions to turn email scanning on or off.
German MT: Dieses Dokument gibt Richtungen zum Umdrehung E-Mail-Prüfung an oder weg.
French MT: Ce document donne des directions à l'analyse du courrier électronique de tour en fonction ou hors fonction.
Source (rule followed): This document gives directions to turn on or turn off email scanning.
German MT: Dieses Dokument gibt Richtungen, E-Mail-Prüfung zu aktivieren oder zu deaktivieren.
French MT: Ce document donne des directions pour activer ou désactiver l'analyse du courrier électronique.
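A checker can catch the violation in this example before the text reaches the engine. The Python sketch below flags a verb separated from its particle by one to three words; the list of two-part verbs is hypothetical (a real checker would draw it from a lexicon), and intervening words may not themselves be particles, so the corrected sentence "turn on or turn off" is not flagged.

```python
import re

# Hypothetical two-part verbs; a real checker would draw these from a lexicon.
TWO_PART_VERBS = {"turn": ["on", "off"], "log": ["in", "out"], "set": ["up"]}


def find_split_verbs(sentence):
    """Flag a verb separated from its particle by one to three other words."""
    hits = []
    for verb, particles in TWO_PART_VERBS.items():
        alts = "|".join(particles)
        # Intervening words may not themselves be particles of this verb.
        pattern = rf"\b{verb}\s+(?:(?!(?:{alts})\b)[\w-]+\s+){{1,3}}(?:{alts})\b"
        hits += [m.group(0) for m in re.finditer(pattern, sentence, re.IGNORECASE)]
    return hits


print(find_split_verbs("This document gives directions to turn email scanning on or off."))
# ['turn email scanning on']
print(find_split_verbs("This document gives directions to turn on or turn off email scanning."))
# []
```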
Lessons Learned
Implement the rules strictly when there is:
New content
Little leverage
Time
Rules can be context-sensitive:
Results differ depending on the client application
Tag problems may not always be flagged
Language-specific rules are probably best implemented as:
A pre-processing step
Normalization dictionaries
CL + MT is not sufficient:
Terminology work is needed to update the dictionaries
PE is needed when a specific quality standard is required
Thank you!
Copyright © 2010 FRED Hollowood CONSULTING. All rights reserved.
This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice.
Fred [email protected]