the assessment of translation accuracy of the lexica machine translation system
TRANSCRIPT
This article was downloaded by: [University of York]On: 16 October 2014, At: 01:51Publisher: RoutledgeInforma Ltd Registered in England and Wales Registered Number: 1072954 Registered office: MortimerHouse, 37-41 Mortimer Street, London W1T 3JH, UK
Southern African Linguistics and Applied LanguageStudiesPublication details, including instructions for authors and subscription information:http://www.tandfonline.com/loi/rall20
The assessment of translation accuracy of the LexicaMachine Translation SystemFPJ Snyman & JA NaudéPublished online: 12 Nov 2009.
To cite this article: FPJ Snyman & JA Naudé (2003) The assessment of translation accuracy of the LexicaMachine Translation System, Southern African Linguistics and Applied Language Studies, 21:4, 295-306, DOI:10.2989/16073610309486350
To link to this article: http://dx.doi.org/10.2989/16073610309486350
PLEASE SCROLL DOWN FOR ARTICLE
Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) containedin the publications on our platform. However, Taylor & Francis, our agents, and our licensors make norepresentations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose ofthe Content. Any opinions and views expressed in this publication are the opinions and views of the authors,and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be reliedupon and should be independently verified with primary sources of information. Taylor and Francis shallnot be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and otherliabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to orarising out of the use of the Content.
This article may be used for research, teaching, and private study purposes. Any substantial or systematicreproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in anyform to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions
Copyright © 2003 NISC Pty Ltd
SOUTHERN AAFRICAN LLINGUISTICS
AND AAPPLIED LLANGUAGE SSTUDIES
EISSN 1727–9461
Southern African Linguistics and Applied Language Studies 2003, 21(4): 295–306
Printed in South Africa — All rights reserved
The assessment of translation accuracy of the Lexica MachineTranslation System
FPJ Snyman1* and JA Naudé2
1 Unit for Language Facilitation and Empowerment, University of the Free State, PO Box 339,
Bloemfontein 9300, South Africa2 Department of Near Eastern Studies, University of the Free State, PO Box 339, Bloemfontein
9300, South Africa
* Corresponding author, e-mail: [email protected]
Abstract: This article focuses on the research of the English–Afrikaans language pair develop-ment team and aims to establish an assessment method and procedure that will assess the trans-lation accuracy of the Lexica Machine Translation (MT) System in an easily repeatable, scientifical-ly acceptable way. Lexica is a transfer system that is used to carry out morphological, syntactic,semantic and contextual analysis and can be used for the following language pairs: Afrikaans,Tswana, Swahili and Portuguese to English; and English to Xhosa, Zulu and Afrikaans. The researchhas shown that there are no universally accepted and reliable methods and measures, and thatassessment methodology has been the subject of much discussion in recent years. To assess theaccuracy of translation of the Lexica MT System, diagnostic assessment was determined as themost suitable mode of assessment, as the focus of such assessment is on the identification of lim-itations, errors and deficiencies, which may then be corrected or improved by the development team.A method and procedure were developed according to which marks were awarded in terms of thefollowing two aspects: (a) preservation of meaning, (b) grammatical correctness.
Introduction
All over South Africa thousands of documents
on various themes are produced each day,
mostly in English. The Constitution of the
Republic of South Africa, 1996, Section 6(1),
provides for 11 official languages in terms of the
language dispensation of South Africa, namely
Pedi, Sotho, Tswana, Swati, Venda, Tsonga,
Afrikaans, English, Ndebele, Xhosa and Zulu
(Constitutional Assembly, 1997: 3). As the
Constitution provides for 11 official languages,
the practical implication is that all documents
produced, particularly in government depart-
ments, must be translated into languages
understandable to the recipients. In the
2000/2001 Annual Report of the National
Language Service Subdirectorate, it is stated
that the Subdirectorate’s African Languages
Section translated 665 documents in-house,
whereas 477 documents were outsourced. The
English/Afrikaans Section translated 2 740.41
pages in-house and outsourced 878.91 pages
(Department of Arts, Culture, Science and
Technology, 2000/2001: 82–84). It is thus evi-
dent that of the 1 142 documents that were
submitted to their African Languages Section
for translation, a total of 477 documents had to
be outsourced. In the English/Afrikaans
Section, out of a total of 10 684.33 pages, 7
703.92 pages were translated and edited in-
house, whereas 2 980.41 pages were out-
sourced (Department of Arts, Culture, Science
and Technology, 2000/2001: 84).
From these statistics the conclusion can be
drawn that the translation capacity available to
deal with the need for translations on an in-
house basis is insufficient. A practical method
of dealing with this problematic situation would
be the use of a machine translation system to
produce documents in the various official lan-
guages. From the very outset it should be noted
that such a machine translation system is not,
in any way, aimed at replacing the human
translator. However, such a system could assist
the human translator in his/her translation task,
thereby increasing and enhancing the transla-
tion speed and ultimately the editing process.
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4
Snyman and Naudé296
As a first step towards the implementation
of such a machine translation system, the Unit
for Language Facilitation and Empowerment
(ULFE) at the University of the Free State
entered into a co-operation agreement with
EPI-USE Systems to take over the rights of the
latter’s uniquely South African machine transla-
tion device, Lexica 3.0. Lexica is a transfer sys-
tem that is used to carry out morphological,
syntactic, semantic and contextual analysis
and can be used for the following language
pairs: Afrikaans, Tswana, Swahili and
Portuguese to English; and English to Xhosa,
Zulu and Afrikaans. The Lexica development
team was established in 1990 by Prof. Deon
Oosthuizen of the Department of Computer
Science at the University of Pretoria. From
1992, the Lexica system was commercially
developed and implemented in collaboration
with EPI-USE Systems (Pty) Ltd. EPI-USE
Systems discontinued the further development
of the Lexica software in 1997. Since no further
development of any of the existing language
pairs, nor of any new language pairs, was
undertaken, this machine translation system
has been lying dormant since then. The ulti-
mate goal of the further development of the
Lexica system is to make it possible to utilise it
within the Language Unit of the Free State
Provincial Government, in order to enhance the
translation process of reports, minutes and
agendas within this department.
This article focuses on the work of the
English–Afrikaans language pair development
team, and aims to establish an assessment
method and procedure that will make it possible
to assess the current translation accuracy of the
Lexica Machine Translation (MT) System in an
easily repeatable, scientifically acceptable way.
The article covers the general software
characteristics and level of processing of the
Lexica MT System; the international standard-
ised categories for MT systems and the Lexica
MT System; the problem of accuracy assess-
ment regarding MT in general, and specifically
the Lexica MT System; and the determination
of the method of assessment of the Lexica MT
System, namely that of diagnostic assessment.
General software characteristics and
level of processing of the Lexica MT
System
The Lexica MT System is based on a unique
transfer rule language that captures syntactic,
semantic, morphological and contextual infor-
mation in a single uniform notation by means
of an unrestricted set of features, which are
altogether definable by the grammar develop-
er. This generic notation makes it possible to
create a language-independent translation
device which relies on various rule bases,
each with its own set of language-dependent
information. The advantages of this approach
are twofold. Firstly, it allows for languages from
different language families, e.g. the Germanic
and African language families, to be accom-
modated by the system. Secondly, the ability to
utilise the different types of information (syn-
tactic, semantic etc.) simultaneously, helps the
developer to resolve, more effectively, some of
the most familiar difficulties encountered in
machine translation, such as ambiguity
(Oosthuizen & Coetzer, s.a.: 1).
Lexica consists of three separate components
(Figure 1):
• A totally configurable graphical user inter-
face (which operates in conjunction with MS
Word) (GUI).
• A platform-independent MT kernel (MTK).
The kernel is written in C and operates in
Windows, NT and UNIX, and over an LAN.
• A transfer grammar definition language
(TDL).
The transfer grammar definition component
consists of six parts:
• A language information section
• A morphological section
• A syntactic analysis section
• A transfer section
• A lexicon
• A phrase dictionary
Each of the sections, except the analysis
and transfer sections, resides in a separate
physical file (Oosthuizen & Coetzer, s.a.: 33).
International standardised cate-
gories for MT systems and Lexica
Since Lexica is a unique transfer system, it is
necessary to determine where it fits into inter-
national standardised categories for MT sys-
tems. Every MT system embodies some theory
of language and of translation in accordance
with international standardised categories for
MT systems in the following manner:
• The simplest MT system performs direct
replacement of terms and phrases from the
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4
Southern African Linguistics and Applied Language Studies 2003, 21: 295–306 297
source language to the target language.
Often rudimentary word order changes are
performed. Example-based systems
(EBMT) are one type belonging to this
class; they replace phrases or even whole
paragraphs at a time.
• More sophisticated MT systems try to
improve syntactic (grammatical) quality by
analysing the source sentence into a syntax
tree and then converting the tree into the
form required by the target syntax (for
example, by moving the verb complex).
Although such systems require the building
of grammars and parsers, they produce
higher quality. They are called syntactic
transfer systems.
• At the next consecutive level of complexity,
semantic transfer systems analyse the
source text into some kind of formalism that
tries to capture meaning, not just grammat-
ical form. The formalisms used by shallow
semantic systems are not fully language-
independent and hence require some trans-
formations into the target form. The most
complex systems analyse the input into a
language-neutral interlingual formalism,
from which many target languages can be
directly generated. No wide-coverage inter-
lingua has yet been developed.
In general, the more sophisticated the inter-
nal processing, the higher the output quality,
but at the same time, the more domain-specific
and brittle the system will be. Most modern
working systems include a blend of syntactic
and semantic transfer (The ISLE Classification
of Machine Translation Evaluations, 2000). In
the above categorisation of MT systems, the
Lexica system can be defined as a blend
between the second and third group, since it is
used to carry out morphological, syntactic,
semantic and contextual analysis.
The problem of accuracy assessment
regarding MT
It is hardly an overstatement to assert that the
demand for machine translation systems is
high and that such systems constitute a contin-
uously growing industry, as is evident from
Hutchins’ Compendium of Translation Software
which contains a list of commercial machine
translation systems and computer-aided trans-
lation support tools (Hutchins & Hartmann,
2002). Sales of commercial PC translation soft-
ware have shown a dramatic rise. It is estimat-
ed that there are now some 1 000 different MT
packages on sale (if each language pair is
counted separately). The products of one ven-
dor (Globalink) are present in at least 6 000
stores in North America alone; while in Japan
one particular system (Korya Eiwa from
Catena, for English-Japanese translation) is
said to have sold over 100 000 copies in its first
year on the market (Hutchins, 1999).
However, there are many people who, hav-
ing heard of the completion of a machine trans-
lation system, mistakenly interpret this to mean
that, regardless of the input, 100 percent accu-
rate translations are possible. What is possibly
even worse, is the fact that some people who
make enquiries as to how accurately an auto-
���
���
���
��� ���� �� ��� ������� ������ ��
������
����� ����� �!���" �!���"�������� �� �!���"���#��#��"�
Figure 1: A graphical representation of the Lexica Machine Translation system
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4
Snyman and Naudé298
matic system can translate, appear to be satis-
fied with a numerical answer such as “70 per-
cent” (Nagao, 1989: 57). One can therefore
agree with Nagao (1989: 57) that a query such
as: “How well does that machine translate?” is
actually quite a profound one.
House (1997: 1) states that different views
concerning translation lead to different con-
cepts of translation quality, and hence different
ways of assessing such quality. The approach
to translation will determine the method of
assessment; and this will inevitably differ from
approach to approach. In order to arrive at a
coherent method for the assessment of the
Lexica MT System, different methods and pro-
cedures were investigated, in order to develop
a method which fulfils the specific needs in
respect of the assessment of the Lexica MT
System. While there is general agreement
about the basic features of machine translation
(MT) assessment, as reflected in several gen-
eral introductory texts (for example Lehrberger
& Bourbeau, 1988; Hutchins & Somers, 1992;
Arnold et al., 1994), the research has shown
that there are no universally accepted and reli-
able methods and measures, and that assess-
ment methodology has been the subject of
much discussion in recent years (see, for
example, Arnold et al., 1993; Falkedal, 1994;
Hutchins, 1997: 418). A great number of stud-
ies on MT assessment have been carried out
over the past four decades. Hovy et al. (2002:
8) refer to a variety of MT assessments, rang-
ing from the influential ALPAC Report (Pierce et
al., 1966) to the largest-ever competitive MT
assessments, funded by the US Defence
Advanced Research Projects Agency (DARPA)
(White et al., 1994) and beyond. Some influen-
tial contributions include those of Kay (1980)
and Nagao (1989). Van Slype (1979) produced
a thorough study reviewing MT assessment at
the end of the 1970s, while reviews for the
1980s can be found in Lehrberger and
Bourbeau (1988) and King and Falkedal
(1990). The Association for Machine
Translation in the Americas (AMTA) also held a
workshop in San Diego in 1992, conducting dis-
cussions on the topic of MT Evaluation: Basis
for Future Directions (AMTA, 1992). In terms of
MT assessments in general, a distinction is
often drawn between so-called glass-box
assessment and black-box assessment. This
classification sometimes appears to differenti-
ate between component-based assessment
and whole-system assessment, and some-
times to presuppose a less clear-cut difference
between a qualitative/descriptive approach
(How does it do what it does?) and a quantita-
tive/analytic approach (How well does it do
what it does?) (Hirschman & Thompson, 1997:
410). According to Nyberg et al. (1994), the
assessment methodologies for MT systems
have heretofore centred on black-box
approaches, in terms of which global properties
of the system are assessed, such as the
semantic fidelity of the translation or the com-
prehensibility of the target language output.
While these assessments are extremely impor-
tant, they should be augmented by detailed
error analyses, as well as component assess-
ments, in order to produce causal analyses
pinpointing errors and therefore leading to sys-
tem improvement.
Against this background, the Lexica
research and development team, like many
other developers of MT systems, were con-
fronted with the lack of a standard method for
the assessment of translation accuracy. The
development of such a standard method and
procedure for the Lexica MT System per se is
deemed necessary, since the current percent-
age of translation accuracy will serve as a
benchmark against which any future develop-
ments and enhancements of the English–
Afrikaans language pair can be measured. The
first objective of the research and development
team was thus the development of a method
and procedure to assess translations carried
out by means of the Lexica MT System; and the
second objective was the assessment of a
selection of texts to determine the current accu-
racy of translation.
Determining the assessment method
for the Lexica MT System
To develop the proper method of assessment
specifically for the Lexica MT System, the
developers had to keep the reasons for assess-
ment in mind. Nyberg et al. (1994) list five pos-
sible reasons for the assessment of MT, i.e.:
• Comparison with human translations.
• The decision to use or buy a particular MT
system.
• Comparison of multiple MT systems.
• Tracking of technological progress.
• Improvement of a particular system.
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4
Southern African Linguistics and Applied Language Studies 2003, 21: 295–306 299
The fifth factor listed above, i.e. the
improvement of a particular system, is the area
in which component analysis and error attribu-
tion are most valuable. System engineers and
linguistic knowledge source maintainers (such
as lexicographers) perform best when given a
causal analysis of each error. As the prime
focus of the Lexica MT developers is to improve
the system, it was decided to develop an
assessment method with the ultimate aim of
improving the system.
The focus of the assessment method devel-
oped in this paper deviated from that of other
assessment methods which make provision for
categories dealing with textual and content
aspects, structuring and formulation, as well as
for a broad category focusing on punctuation
and on lexical, syntactic and stylistic aspects.
The UNISA System, for example, is used at the
University of South Africa and several other uni-
versities for the training of students, both
undergraduate and postgraduate. This system
is utilised to assess the translations of students.
This marking system is outcome-based, which
implies that the outcome which the trainer
envisages for the student is assessed (Botha,
2001: 48). The categories of this assessment
model include accuracy of translation; vocabu-
lary, idiom, and register; cohesion, coherence
and organisation; and technical points (presen-
tation, grammar, spelling, punctuation etc.).
Another model developed by Botha (2001:
111–112) serves as a checklist for in-house
training to achieve translation quality. This
model adds categories dealing with the func-
tionalism of translation. The categories of this
assessment instrument include textual aspects,
content aspects, as well as aspects dealing
with the building up, formulation and presenta-
tion of the translation. Another assessment
method is that of Juliane House (1997). House
(1997: 39–40) analyses both the source and
target texts in terms of “dimensions of language
user” and “dimensions of language use”. In
terms of the language user dimension, she con-
siders aspects such as geographical origin and
social class. In terms of the dimension of lan-
guage use aspects, she considers, inter alia,
aspects such as the medium used (i.e. whether
the language use is simple or complex); partic-
ipation between speaker and hearer; and the
social role relationship.
The above systems, together with several
other methods, are most valuable in translation
quality assessment; but the envisaged objec-
tive of the development of the translation
assessment for the Lexica MT System was of a
totally different kind. Whereas these assess-
ment systems provide for categories concern-
ing textual and content aspects, structuring and
formulation, as well as for a broad category
concerning punctuation, lexical, syntactic and
stylistic aspects, the focus of the assessment
method discussed in this paper falls on cate-
gories within the grammatical level. In the
development of this particular assessment
method, the three kinds of assessment distin-
guished by Hirschman and Thompson (1997:
409) were considered.
• Adequacy assessment
Adequacy assessment is the determination
of the fitness of a system for a purpose —
will it do what is required? How well? At
what cost? — etc. Typically, for a prospec-
tive user, such assessment may or may not
be comparative, and a considerable amount
of work may be required to identify a user’s
needs. One model of this type of assess-
ment is typified by consumer organisations
which publish the results of tests on, for
example, cars or appliances, and identify
the best buys for certain price-performance
targets. Such assessment can also be
referred to as evaluation or evaluation proper.
• Diagnostic assessment
Diagnostic assessment is the production of
a system performance profile in respect of
some kind of taxonomy of the space of pos-
sible inputs. It is typically used by system
developers, but is sometimes offered to
end-users as well. It usually requires the
construction of a large and, hopefully, rep-
resentative test suite. It is sometimes
referred to as diagnosis, or by the software
engineering term regression testing when it
is used to compare two generations of the
same system (Hirschman & Thompson,
1997: 410).
• Performance assessment
Performance assessment is the measure-
ment of system performance in one or more
specific areas. It is typically used to com-
pare like with like, whether in respect of two
alternative implementations of a technology,
or successive generations of the same
implementation. It is typically created for
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4
Snyman and Naudé300
system developers and/or R&D (Research
and Development) programme managers.
Hutchins (1997: 418) summarises these
three types of assessment as follows: adequa-
cy assessment, to determine the fitness of MT
systems within a specified operational context;
diagnostic assessment, to identify limitations,
errors and deficiencies, which may then be cor-
rected or improved (by the research team or by
the developers); and performance assessment,
to assess stages of system development or dif-
ferent technical implementation.
In determining the method of assessment
best suited to the Lexica MT System, certain
underlying objectives had to be considered.
The objective of the development of an assess-
ment method for the Lexica MT System was to
determine the current translation accuracy of
the system. As the system had been lying dor-
mant since 1997, the development of an
assessment method would assist the develop-
ment team of the English-to-Afrikaans lan-
guage pair to determine the current quality of
translation. Translations that had been carried
out with the aid of the system rendered transla-
tions containing major morphological and syn-
tactic errors. In order to raise the quality of
translation, these grammatical errors needed to
be rectified through the further development of
the rule database. The primary objective of this
assessment was not adequacy assessment,
i.e. determination of the fitness of the system;
whether it will do what is required; how well;
and at what cost. Neither was the objective to
measure the performance of the system in one
or more specific areas, i.e. performance
assessment. Rather, the primary objective of
this assessment was to identify limitations,
errors and deficiencies of the system on a
grammatical level, i.e. diagnostic assessment.
In the assessment of each individual sentence,
morphology, along with sentence and phrase
structure, is thus of importance. The errors
which are indicated in terms thereof are punc-
tuation, lexical, syntactic and stylistic errors.
After the assessment, the different grammatical
errors that were made by the system must be
analysed. The final output of the assessment
must comprise a list of systematic morphologi-
cal, syntactic and lexical errors which can be
referred to for enhancement of the dictionary
file, as well as for the development of the rule
database.
Diagnostic assessment as a method
of assessment
In the development of an assessment method
for the Lexica MT System, a method and pro-
cedure taken from Bohan et al. (2000) were
adapted and developed. Bohan et al. (2000)
distinguish between two aspects in their
method of assessment, namely preservation of
meaning and grammatical correctness. In the
assessment method discussed in this paper,
these two aspects were assessed in terms of
an N-point scale, according to which marks
were given to both the aspect of preservation of
meaning and that of grammatical correctness.
The translation quality of each sentence is
assessed separately.
The objective of the assessment of the
preservation of meaning was to determine
whether meaning was preserved in the TL text
or whether it was lost, as well as the number of
cases in which meaning was lost. The objective
was thus to determine the accuracy of the
transfer of information.
In terms of grammatical correctness, the TL
texts were evaluated to identify grammatical
errors. The focus was thus on morphology and
syntax. Morphology refers to the degree to
which words are correctly inflected (for exam-
ple, to indicate tense, number, gender, case,
aspect etc.) (The ISLE Classification of Machine
Translation Evaluations, 2000). Syntax refers to
the degree of correctness of the phrase and
sentence structure. Errors which were identified
included punctuation errors, as well as lexical,
syntactic and stylistic errors. Punctuation errors
obviously refer to punctuation. Lexical errors
refer to words or phrases that are inappropriate,
either because they have inappropriate conno-
tations or an inappropriate register, or constitute
errors in terms of collocations or idiomatic
expressions, or because they are too general or
too specific (although in cases where meaning
or nuance is lost, the error is considered to be a
translation error) (The ISLE Classification of
Machine Translation Evaluations, 2000).
Syntactic errors refer to errors of word order
occurring in a sentence or a phrase. Stylistic
errors denote errors such as unnecessary repe-
tition of a word or idea, or an excessively literal
translation of the source text, resulting in a
translation that is “unidiomatic or difficult to
understand” (The ISLE Classification of
Machine Translation Evaluations, 2000).
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4
Southern African Linguistics and Applied Language Studies 2003, 21: 295–306 301
It is clear that grammatical correctness and
the preservation of meaning are intertwined to
a great extent. However, grammatical errors do
not necessarily lead to a loss of meaning. For
example, if the MT System makes a punctua-
tion error, for instance by leaving out a comma
in the TL text, meaning will not necessarily be
lost as a result. Likewise, if a spelling error
occurs in the TL text, meaning will not neces-
sarily be lost.
According to this method each sentence is
rated on a 10-point scale in terms of: (a) preser-
vation of meaning; (b) grammatical correct-
ness.
(a) Preservation of meaning
Is the meaning of the TL sentence the same
as that of the SL sentence?
7–10 points (Good): the meanings of the
SL and TL sentences are about the same.
Almost no post-editing with regard to mean-
ing is necessary.
4–6 points (Understandable): the mean-
ings of the SL and TL sentences are not
exactly the same, but the sentence can be
understood. The sentence may have to be
retranslated during post-editing.
0–3 points (Bad): the sentence cannot be
understood at all, or has a completely dif-
ferent meaning from that of the SL text.
Retranslation is definitely necessary.
Points for preservation of meaning were
given in terms of Table 1. For every single
instance of meaning that is lost, 1 point is
deducted.
(b) Grammatical correctness
Is the TL sentence syntactically well-formed,
and does it include correct morphology?
7–10 points (Good): the sentence is gram-
matical. Post-editing would only entail sim-
ple stylistic corrections.
4–6 points (Understandable): despite
grammatical errors, the sentence can be
understood. Post-editing of the sentence
would include grammatical corrections.
0–3 points (Bad): the sentence contains
massive grammatical errors and can hardly
be understood. Post-editing would entail
complete rewriting/ retranslation of the sen-
tence.
For the assessment of grammatical correct-
ness the following schema was followed. The
schema is based on the number of corrections,
replacements, movements and/or deletions
that need to be made in order to render the
translation grammatically. For every grammati-
cal error encountered, 1 point is deducted.
Points for grammatical correctness were
given in terms of Table 2. For every single com-
ponent of meaning that is lost, 1 point is
deducted.
Procedure of MT assessment
Genre and text selection
The type of document translated can greatly
affect the output of an MT system. For exam-
ple, input to the METEO System is specific and
very restricted, mainly comprising weather fore-
cast texts, entailing the use of a limited lexicon
and particular syntactic constructions. As a
result the system produces accurate output
which is comparable to human translation. In
contrast, MT of arbitrary texts invariably pro-
duces output of much lower quality. Both the
genre and the application domain determine
the quality (The ISLE Classification for
Language Engineering, 2000).
Table 1: Points given for preservation of meaning
7–10 points (Good) 10 points Meaning is the same
9 points Meaning is lost in 1 case
8 points Meaning is lost in 2 cases
7 points Meaning is lost in 3 cases
4–6 points (Understandable) 6 points Meaning is lost in 4 cases
5 points Meaning is lost in 5 cases
4 points Meaning is lost in 6 cases
0–3 points (Bad) 3 points Meaning is lost in 7 cases
2 points Meaning is lost in 8 cases
1 point Meaning is lost in 9 cases
0 points Meaning is lost in 10 or more cases
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4
Snyman and Naudé302
As the type of input document can greatly
affect the output of the assessment, the genre
selected for this assessment was that of infor-
mative texts. Informative texts, also called fac-
tual texts, are defined in terms of their exclusive
purpose of rendering information, based on fac-
tual knowledge. Opinions, emotions and feel-
ings are usually absent from such texts, result-
ing in a relatively neutral body of text. The style
of a factual or an informative text is usually neu-
tral; this is the type of text that is not read for
pleasure, but rather to gain knowledge. The
newspaper article is probably the most typical
type of text in this category. The function there-
of is the transfer of meaning, factuality and the
progressive unlocking of information.
As examples of informative texts, Botha
(2001: 96) discusses, amongst others, newspa-
per articles, reports, newsletters, brochures,
minutes, curricula vitae, weekly weather
reports, memorandums, textbooks, etc. Botha
(2001: 97) describes the systemic characteris-
tics of informative texts as follows:
The style variations, in terms of language
use, are few, and usually centre on the attitude
of the speaker. The style is not prescribed and
the choice thereof is left to the autonomous
judgement of the writer. Throughout, prefer-
ence is given to a neutral style. Indicative trans-
mission of facts occurs and the ornamental
dimensions of language are under-utilised.
The test material was chosen in accor-
dance with the ultimate goal of the develop-
ment of the Lexica MT System, i.e. to use the
system within the Language Unit of the Free
State Provincial Government for the translation
of informative texts such as newspaper articles,
reports, minutes, agendas, memorandums, etc.
With this objective in mind, the following three
genres of informative texts were utilised in the
assessment:
• Newspaper articles
• Minutes of meetings
• Reports
As already indicated, each sentence was
assessed separately. The number of sentences
which were used in the assessment is as fol-
lows:
• Newspaper articles: 336 sentences
• Minutes of meetings: 259 sentences
• Reports: 272 sentences
The corpus of sentences might seem to be
rather limited, but for this particular application
of this method of assessment, the corpus of
sentences was large enough to detect system-
atic grammatical errors. The objective envis-
aged for this method of assessment is to apply
it to a larger number of texts in an easily repeat-
able, scientific way. As this only comprised the
first round of the assessment, with the objective
of detecting grammatical errors which are
repeated to a great extent, this corpus of sen-
tences provided a large list of systematic
errors, which can be used specifically for the
further development of the rule database and
dictionary file of the English-to-Afrikaans lan-
guage pair.
Preparation of SL texts
In their evaluation, Bohan et al. (2000) mea-
sured the performance of the MT system with
minimal user involvement (i.e. neither prior
adaptation of bad texts nor lexical coding of
unknown words). As the objective of the
assessment of the Lexica MT System was to
determine the current translation accuracy,
Table 2: Points given for grammatical correctness
7–10 points (Good) 10 points No grammatical errors
9 points 1 grammatical error
8 points 2 grammatical errors
7 points 3 grammatical errors
4–6 points (Understandable) 6 points 4 grammatical errors
5 points 5 grammatical errors
4 points 6 grammatical errors
0–3 points (Bad) 3 points 7 grammatical errors
2 points 8 grammatical errors
1 point 9 grammatical errors
0 points 10 or more grammatical errors
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4
Southern African Linguistics and Applied Language Studies 2003, 21: 295–306 303
which could serve as a benchmark for future
developments, only minimal user involvement
occurred for the purposes of this assessment.
The user interface (GUI) of the Lexica MT
System only accepts documents in text file for-
mat (.txt). As the assessment documents were
either in hypertext mark-up language (.html) or
Microsoft Word Document (.doc) format, the
documents were transferred to text format (.txt)
before the translation process was initiated. For
easier assessment the translated texts were
allocated into two columns.
Choice of assessors in the assess-
ment of TL texts
After the texts had been translated with the aid
of the Lexica MT System, the different texts
(source and target texts) were sent to the dif-
ferent assessors. In the choice of assessors to
be used in the assessment, the aspect of sub-
jectivity had to be taken into consideration. King
(1993: 267) indeed states that the greatest
weakness of such assessment tests in which
different test subjects are used, is their subjec-
tivity. Different subjects can vary widely in their
ratings. In order to limit subjectivity as far as
possible, a decision was made to use five dif-
ferent assessors in the assessment of the
Lexica MT System. Of these five assessors,
four are accredited translators of the South
African Translators’ Institute (SATI).
Assessed output
As the assessment was rated on a 10-point
value scale, a mark out of 10 was awarded to
each sentence. These marks were reworked to
a percentage out of 100. This percentage pro-
vided a benchmark of the current accuracy of
translation, in terms of preservation of meaning
and grammatical correctness, for the specified
texts which were translated. The output
assessments carried out by the five assessors
determining the accuracy of translation in terms
of preservation of meaning and grammatical
correctness, are summarised in Table 3.
The average assessment given by the four
assessors was as follows:
Preservation of meaning: 60%
Grammatical correctness: 45%
A typical example of a text translat-
ed by Lexica
Table 4 provides a limited selection from a
report as an example from the list of texts
(which included newspaper articles, minutes
and reports) which were utilised in the assess-
ment of the current translation accuracy of the
Lexica MT System.
Analysis of systematic errors
If Table 4 is considered, some of the systemat-
ic errors that occur can clearly be seen. This
section will merely list and briefly discuss some
errors that occur in Table 4. The systematising
of errors illustrated below, is the procedure
which will be followed in the systematising of all
errors for future research.
No new matters arose from this item. /
Geen het opgerys nuwe sake daarvan item.
In this sentence, on the syntactic level, a
word-order error occurs in the TL. The subject
nuwe sake follows the verb het opgerys. A lex-
ical error also occurs, since the SL verb arose
is translated with an incorrect lexical item in the
TL, namely het opgerys. A stylistic error is
observed in the too-literal translation of from in
the SL, which is rendered, together with this
(SL), as daarvan in the TL.
The agenda for the meeting was approved with the
addition of the following: /
Die agenda vir die vergadering is met die aan-
vulling van die volgende goedgekeur:
In this sentence the SL word addition is
translated in the TL as aanvulling, which is a
Table 3: Output assessment carried out by the different assessors
Panel total Preservation of meaning Grammatical correctness
Assessor 1 49 33
Assessor 2 66 55
Assessor 3 79 48
Assessor 4 48 43
Assessor 5 60 48
Translation Accuracy: General Domain 60 45
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4
Snyman and Naudé304
lexical error, since an inappropriate translation
equivalent has been chosen. Despite the fact
that information is still conveyed, an inappropri-
ate word has been chosen in the TL.
Progress report /
Vooruitgang vermeld
Firstly, in this example, a lexical error
occurs, since Progress is translated as
Vooruitgang. An inappropriate register has
been chosen. Secondly, the SL word report is a
common noun, but is translated in the TL as
vermeld, which is a verb. Thus, in this case,
incorrect information is conveyed and meaning
is therefore not correctly preserved.
Future research
The immediate future research of the Lexica
research and development team will be focused
on systematising all the errors that were made
during the translation of the different texts. The
systematised list of errors will then be used to
start working on the improvement of the dictio-
nary file and the rule database file of the Lexica
system. After the development and enhance-
ment of the different files — which will not only
include an update of the dictionary file, but also
substantial work on the rule database file, for
example work dealing with word-order errors,
passive voice errors etc. — new texts from the
same domain will be translated and assessed
Table 4: The original English text (source text) with the subsequent Afrikaans translation (target text)
English text
A quorum was not present, but as the previous meet-
ing was adjourned due to the absence of a quorum,
the committee decided to proceed with the meeting.
Approval of agenda
The agenda for the meeting was approved with the
addition of the following:
Dates and Formats for Meetings was removed from
the agenda.
Videoconference was changed to 6.5.3. 6.5.4
Progress report
Introduction
The achievements of the MIDP during this quarter
could surely be measured against the Budget
Speech of the Premier of the Free State who pledged
all activities of the MIDP as targeted for 2000 as part
of what she expected to be implemented to achieve
her vision for the year 2000.
This acceptance was also reflected in the Annual
Report of the Department of the Premier of which
three full pages were occupied by the achievements
of the MIDP in 1999.
No new matters arose from this item.
The following achievements of the quarter under dis-
cussion can be highlighted:
Afrikaans translation
‘n Kworum was nie aanwesig nie, maardie vorige
vergadering weens uitstel is die afwesigheid van ‘n
kworum, die komitee wat besluit word om met die
vergadering te voortgaan.
Goedkeuring van agenda
Die agenda vir die vergadering is met die aanvulling
van die volgende goedgekeur:
Datums en Formate vir Vergaderings is van die
agenda verwyder.
Videoconference is verander na 6.5.3. 6.5.4
Vooruitgang vermeld
Inleiding
Die prestasies van die MIDP tydens hierdiekon sek-
erlik wees gemeet teen die Begroting Toespraak van
die Premier van die Vry Staat wat soos alle aktiwiteite
van die MIDP pledged targeted vir 2000 soos deel
van wat sy verwag om te wees geïmplementeer om
haar visie vir die jaar uit te voer 2000.
Hierdie aanvaarding is ook reflekteer in die
Jaarverslag van die Departement van die Premier
waaraan drie vol bladsye is deur die prestasies van
die MIDP beset in 1999.
Geen het opgerys nuwe sake daarvan item.
Die volgende prestasies van die kwart onder
bespreking kan uitgelig word:
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4
Southern African Linguistics and Applied Language Studies 2003, 21: 295–306 305
AMTA. 1992. MT evaluation: basis for futuredirections. In: Proceedings of a Workshopheld in San Diego. November 1992. SanDiego, California. Washington, DC:Association for Machine Translation in theAmericas (AMTA).
Arnold D, Humphreys RL & Sadler L (eds).1993. Special issue on evaluation of MTSystems. Machine Translation 8(1–2):1–126.
Arnold D, Balkan L, Meier S, Humphreys R L& Sadler L. 1994. Machine Translation: AnIntroductory Guide. Manchester, Oxford:NCC/Blackwell.
Bohan N, Breidt E & Volk M. 2000. Evaluatingtranslation quality as input to product devel-opment. In: Proceedings of 2nd InternationalConference on Language Resources andEvaluation, Athens, Greece. Available at:h t t p : / / w w w . i f i . u n i z h . c h / c l / v o l k /papers/LREC2000.pdf
Botha S. 2001. Die Ontwikkeling van ‘nKwaliteitsassesseringsinstrument vir Plaas-
like Regeringstekste. MA-minithesis.Universiteit van die Vrystaat, Bloemfontein.
Constitutional Assembly. 1997. AnnotatedVersion. Wynberg, Cape Town:Constitutional Assembly
Department of Arts, Culture, Science andTechnology. 2000/2001. Annual Report.Available at: http://www.dac.gov.za/reports/annual_report/annual_report2000_2001.pdf
Falkedal K. (ed). 1994. Proceedings of theEvaluators’ Forum 1991, Les Rases, Vaud,Switzerland. Geneva: ISSCO. Available at:http://www.issco.unige.ch/publications/workshop.html
Hirschman L & Thompson HS. 1997.Overview of evaluation in speech and nat-ural language processing. In: Cole R,Mariani J, Uszkoreit H, Varile GB, ZaenenA, Zampolli A & Zue V (eds) Survey of theState of the Art in Human LanguageTechnology. Web Edition: CambridgeUniversity Press & Giardini. pp. 409–414.Available at: http://cslu.cse.ogi.edu/HLTsurvey/
according to the same procedure, in order to
determine the degree of improvement of the
system in terms of the preservation of meaning
and grammatical correctness.
Conclusion
The objective of this article was to establish an
assessment method and procedure to assess
the translation accuracy of the Lexica MT
System. To assess the system’s accuracy, a
diagnostic mode of assessment was chosen,
since the focus of this type of assessment is on
the identification of limitations, errors and defi-
ciencies, which may then be corrected or
improved by the development team.
With regard to the type of document used in
the assessment, it was decided that newspaper
articles, minutes of meetings and reports would
be used, as the objective of the development of
the Lexica MT System is to provide the Free
State Provincial Government with a tool specif-
ically aimed at enhancing and facilitating the
translation of minutes and reports.
The method used in the assessment of the
English–Afrikaans language pair was divided
into two categories, namely preservation of
meaning and grammatical correctness. In
terms of the preservation of meaning an aver-
age translation accuracy of 60% was achieved,
whereas the average for grammatical correct-
ness was a mere 45%.
The results of the assessment clearly indi-
cated that much work needs to be done in order
to raise the level of accuracy of translation. The
ultimate objective of the Lexica research and
development team is to deliver a system that
effectuates translation from any official South
African language into any other official South
African language with at least a ninety percent
translation accuracy. A fully functional transla-
tion system for all the languages of South Africa
would vastly increase the productivity and effi-
ciency of translators. It would also provide gen-
erators of reports, statements, study guides
and other documents with easily obtainable
multilingual sources. This mission will not, how-
ever, be achieved within a short timeframe and
will require several years of research and
development. The ULFE is currently focusing
on the development of the English–Afrikaans
language pair, as well as the redevelopment of
user interfaces. It is envisaged that a complete-
ly new system, using some of the functionalities
of the Lexica MT System, will be the end result
of this work and research.
References
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4
Snyman and Naudé306
House J. 1997. Translation QualityAssessment. A Model Revisited. Tübingen:Gunter Narr Verlag.
Hovy E, King M & Popescu-Belis A. 2002.Computer-aided specification of qualitymodels for machine translation evaluation.LREC 2002 Workshop on MachineTranslation Evaluation: Human EvaluatorsMeet Automated Metrics. Las Palmas deGran Canaria, Spain. pp. 1–7. Available at:http://andreipb.free.fr/textes/eh-mk-apb-lrec-02.pdf
Hutchins J. 1997. Evaluation of machinetranslation and translation tools. In: Cole R,Mariani J, Uszkoreit H, Varile GB, ZaenenA, Zampolli A & Zue V (eds) Survey of theState of the Art in Human LanguageTechnology. Web Edition: CambridgeUniversity Press & Giardini. pp. 418–420.Available at: http://cslu.cse.ogi.edu/HLTsurvey/
Hutchins J. 1999. The development and use ofmachine translation systems and computer-based translation tools. InternationalSymposium on Machine Translation andComputer Language InformationProcessing. 26–28 June 1999, Beijing,China. Available at: http://www.foreign-word.com/Technology/art/Hutchins/hutchins99.htm
Hutchins WJ & Hartmann W. 2002.Compendium of Translation Software.Commercial Machine Translation Systemsand Computer-Aided Translation SupportTools. (5th edn). Available at: http://our-world.compuserve.com/homepages/WJHutchins/Compendium-5.pdf
Hutchins WJ & Somers HL. 1992. AnIntroduction to Machine Translation.London: Academic Press.
Kay M. 1980. The proper place of men andmachines in language translation. XEROXPARC Research Report CSL-80-11. PaloAlto, CA: Xerox Parc.
King M. 1993. Forum: Evaluation of MTSystems. In: Nirenburg S (ed) Progress inMachine Translation. Amsterdam: IOSPress. pp. 267–282.
King M & Falkedal K. 1990. Using test suitesin evaluation of MT systems. In: Karlgren H(ed) Coling-90: Papers Presented to the13th International Conference onComputational Linguistics. Volume 2.Helsinki, Finland. pp. 211–216.
Lehrberger J & Bourbeau L. 1988. Machinetranslation: linguistic characteristics of MTsystems and general methodology of evalu-ation. Lingvisticæ Investigationes Suppl.15. Amsterdam, Philadelphia: JohnBenjamins.
Nagao M. 1989. Machine Translation. How FarCan It Go? Oxford: Oxford University Press.
Nyberg EH, Mitamura T & Carbonell JG.1994. Evaluation metrics for knowledge-based machine translation. Proceedings ofColing 94. Available at: http://www.lti.cs.cmu.edu/Research/Kant/PDF/evaluate.pdf
Oosthuizen GD & Coetzer T. s.a. A feature-based approach to translation of African andEuropean languages. Unpublished report.
Pierce J R, Carroll J B, Hamp E P, Hays D G,Hockett C F, Dettinger A G & Perlis A.1966. Computers in translation and linguis-tics. ALPAC Report 1416, National Academyof Sciences/National Research Council.
The Constitution of the Republic of SouthAfrica. 1996. Annotated Version. Wynberg,Cape Town: Constitutional Assembly.
The ISLE Classification of MachineTranslation Evaluations. InternationalStandards for Language Engineering(ISLE). Draft 1, October 2000.http://www.isi.edu/natural-language/mteval
Van Slype G. 1979. Critical methods for evalu-ating the quality of machine translation.Report BR 19142. European Commission/Directorate for General Scientific andTechnical Information Management (DGXIII). Available at: http://issco-www.unige.ch/projects/isle/van-slype.pdf
White JS, O’Connell T & O’Mara FE. 1994.The ARPA MT Evaluation Methodologies:Evolution, Lessons and FurtherApproaches. In: Technology Partnershipsfor Crossing the Language Barrier:Proceedings of the First Conference of theAssociation for Machine Translation in theAmericas. Columbia, Md. 1994,pp.193–205.
Dow
nloa
ded
by [
Uni
vers
ity o
f Y
ork]
at 0
1:51
16
Oct
ober
201
4