The assessment of translation accuracy of the Lexica Machine Translation System

Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Southern African Linguistics and Applied Language Studies
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/rall20

The assessment of translation accuracy of the Lexica Machine Translation System
FPJ Snyman & JA Naudé
Published online: 12 Nov 2009.

To cite this article: FPJ Snyman & JA Naudé (2003) The assessment of translation accuracy of the Lexica Machine Translation System, Southern African Linguistics and Applied Language Studies, 21:4, 295-306, DOI: 10.2989/16073610309486350

To link to this article: http://dx.doi.org/10.2989/16073610309486350


Copyright © 2003 NISC Pty Ltd

SOUTHERN AFRICAN LINGUISTICS AND APPLIED LANGUAGE STUDIES

EISSN 1727–9461

Southern African Linguistics and Applied Language Studies 2003, 21(4): 295–306

Printed in South Africa — All rights reserved

The assessment of translation accuracy of the Lexica Machine Translation System

FPJ Snyman1* and JA Naudé2

1 Unit for Language Facilitation and Empowerment, University of the Free State, PO Box 339, Bloemfontein 9300, South Africa

2 Department of Near Eastern Studies, University of the Free State, PO Box 339, Bloemfontein 9300, South Africa

* Corresponding author, e-mail: [email protected]

Abstract: This article focuses on the research of the English–Afrikaans language pair development team and aims to establish an assessment method and procedure that will assess the translation accuracy of the Lexica Machine Translation (MT) System in an easily repeatable, scientifically acceptable way. Lexica is a transfer system that is used to carry out morphological, syntactic, semantic and contextual analysis and can be used for the following language pairs: Afrikaans, Tswana, Swahili and Portuguese to English; and English to Xhosa, Zulu and Afrikaans. The research has shown that there are no universally accepted and reliable methods and measures, and that assessment methodology has been the subject of much discussion in recent years. To assess the accuracy of translation of the Lexica MT System, diagnostic assessment was determined as the most suitable mode of assessment, as the focus of such assessment is on the identification of limitations, errors and deficiencies, which may then be corrected or improved by the development team. A method and procedure were developed according to which marks were awarded in terms of the following two aspects: (a) preservation of meaning, (b) grammatical correctness.

Introduction

All over South Africa thousands of documents on various themes are produced each day, mostly in English. The Constitution of the Republic of South Africa, 1996, Section 6(1), provides for 11 official languages in terms of the language dispensation of South Africa, namely Pedi, Sotho, Tswana, Swati, Venda, Tsonga, Afrikaans, English, Ndebele, Xhosa and Zulu (Constitutional Assembly, 1997: 3). As the Constitution provides for 11 official languages, the practical implication is that all documents produced, particularly in government departments, must be translated into languages understandable to the recipients. In the 2000/2001 Annual Report of the National Language Service Subdirectorate, it is stated that the Subdirectorate’s African Languages Section translated 665 documents in-house, whereas 477 documents were outsourced. The English/Afrikaans Section translated 2 740.41 pages in-house and outsourced 878.91 pages (Department of Arts, Culture, Science and Technology, 2000/2001: 82–84). It is thus evident that of the 1 142 documents that were submitted to their African Languages Section for translation, a total of 477 documents had to be outsourced. In the English/Afrikaans Section, out of a total of 10 684.33 pages, 7 703.92 pages were translated and edited in-house, whereas 2 980.41 pages were outsourced (Department of Arts, Culture, Science and Technology, 2000/2001: 84).
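The totals quoted above can be checked, and the outsourced share made explicit, with a few lines of arithmetic. The following is a minimal sketch in Python. The variable and function names are illustrative only, and the percentages are derived here from the quoted figures rather than taken from the Annual Report.

```python
# Figures as quoted above from the 2000/2001 Annual Report.
african_in_house_docs = 665
african_outsourced_docs = 477
eng_afr_in_house_pages = 7703.92
eng_afr_outsourced_pages = 2980.41


def outsourced_share(in_house, outsourced):
    """Return the outsourced fraction of the total workload."""
    return outsourced / (in_house + outsourced)


# Totals, to be compared with the 1 142 documents and 10 684.33 pages quoted.
african_total_docs = african_in_house_docs + african_outsourced_docs
eng_afr_total_pages = round(eng_afr_in_house_pages + eng_afr_outsourced_pages, 2)

african_share = outsourced_share(african_in_house_docs, african_outsourced_docs)
eng_afr_share = outsourced_share(eng_afr_in_house_pages, eng_afr_outsourced_pages)

print(f"African Languages Section: {african_total_docs} documents, "
      f"{african_share:.1%} outsourced")
print(f"English/Afrikaans Section: {eng_afr_total_pages} pages, "
      f"{eng_afr_share:.1%} outsourced")
```

On these figures, roughly two in five documents in the African Languages Section and more than a quarter of the pages in the English/Afrikaans Section were outsourced.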

From these statistics the conclusion can be drawn that the translation capacity available to deal with the need for translations on an in-house basis is insufficient. A practical method of dealing with this problematic situation would be the use of a machine translation system to produce documents in the various official languages. From the very outset it should be noted that such a machine translation system is not, in any way, aimed at replacing the human translator. However, such a system could assist the human translator in his/her translation task, thereby increasing and enhancing the translation speed and ultimately the editing process.

As a first step towards the implementation of such a machine translation system, the Unit for Language Facilitation and Empowerment (ULFE) at the University of the Free State entered into a co-operation agreement with EPI-USE Systems to take over the rights of the latter’s uniquely South African machine translation device, Lexica 3.0. Lexica is a transfer system that is used to carry out morphological, syntactic, semantic and contextual analysis and can be used for the following language pairs: Afrikaans, Tswana, Swahili and Portuguese to English; and English to Xhosa, Zulu and Afrikaans. The Lexica development team was established in 1990 by Prof. Deon Oosthuizen of the Department of Computer Science at the University of Pretoria. From 1992, the Lexica system was commercially developed and implemented in collaboration with EPI-USE Systems (Pty) Ltd. EPI-USE Systems discontinued the further development of the Lexica software in 1997. Since no further development of any of the existing language pairs, nor of any new language pairs, was undertaken, this machine translation system has been lying dormant since then. The ultimate goal of the further development of the Lexica system is to make it possible to utilise it within the Language Unit of the Free State Provincial Government, in order to enhance the translation process of reports, minutes and agendas within this department.

This article focuses on the work of the English–Afrikaans language pair development team, and aims to establish an assessment method and procedure that will make it possible to assess the current translation accuracy of the Lexica Machine Translation (MT) System in an easily repeatable, scientifically acceptable way.

The article covers the general software characteristics and level of processing of the Lexica MT System; the international standardised categories for MT systems and the Lexica MT System; the problem of accuracy assessment regarding MT in general, and specifically the Lexica MT System; and the determination of the method of assessment of the Lexica MT System, namely that of diagnostic assessment.

General software characteristics and level of processing of the Lexica MT System

The Lexica MT System is based on a unique transfer rule language that captures syntactic, semantic, morphological and contextual information in a single uniform notation by means of an unrestricted set of features, which are altogether definable by the grammar developer. This generic notation makes it possible to create a language-independent translation device which relies on various rule bases, each with its own set of language-dependent information. The advantages of this approach are twofold. Firstly, it allows for languages from different language families, e.g. the Germanic and African language families, to be accommodated by the system. Secondly, the ability to utilise the different types of information (syntactic, semantic etc.) simultaneously, helps the developer to resolve, more effectively, some of the most familiar difficulties encountered in machine translation, such as ambiguity (Oosthuizen & Coetzer, s.a.: 1).

Lexica consists of three separate components (Figure 1):

• A totally configurable graphical user interface (GUI), which operates in conjunction with MS Word.

• A platform-independent MT kernel (MTK). The kernel is written in C and operates in Windows, NT and UNIX, and over an LAN.

• A transfer grammar definition language (TDL).

The transfer grammar definition component consists of six parts:

• A language information section

• A morphological section

• A syntactic analysis section

• A transfer section

• A lexicon

• A phrase dictionary

Each of the sections, except the analysis and transfer sections, resides in a separate physical file (Oosthuizen & Coetzer, s.a.: 33).

International standardised categories for MT systems and Lexica

Since Lexica is a unique transfer system, it is necessary to determine where it fits into international standardised categories for MT systems. Every MT system embodies some theory of language and of translation in accordance with international standardised categories for MT systems in the following manner:

• The simplest MT system performs direct replacement of terms and phrases from the source language to the target language. Often rudimentary word order changes are performed. Example-based systems (EBMT) are one type belonging to this class; they replace phrases or even whole paragraphs at a time.

• More sophisticated MT systems try to improve syntactic (grammatical) quality by analysing the source sentence into a syntax tree and then converting the tree into the form required by the target syntax (for example, by moving the verb complex). Although such systems require the building of grammars and parsers, they produce higher quality. They are called syntactic transfer systems.

• At the next consecutive level of complexity, semantic transfer systems analyse the source text into some kind of formalism that tries to capture meaning, not just grammatical form. The formalisms used by shallow semantic systems are not fully language-independent and hence require some transformations into the target form. The most complex systems analyse the input into a language-neutral interlingual formalism, from which many target languages can be directly generated. No wide-coverage interlingua has yet been developed.
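The difference between the first two categories can be illustrated with a toy sketch in Python. Everything in it is an assumption made for illustration: the five-word English–Afrikaans lexicon, the fixed clause pattern and the single reordering rule are hypothetical and are not taken from Lexica or its rule bases.

```python
# Hypothetical mini-lexicon (English to Afrikaans); not taken from Lexica.
LEXICON = {
    "because": "omdat",
    "the": "die",
    "cat": "kat",
    "sees": "sien",
    "dog": "hond",
}


def direct_translate(words):
    """Direct system: replace each word and keep the source word order."""
    return [LEXICON.get(word, word) for word in words]


def parse_subordinate_clause(words):
    """Toy parser for one fixed pattern: CONJ DET NOUN VERB DET NOUN."""
    if len(words) != 6:
        raise ValueError("this sketch only handles six-word clauses")
    return {
        "conj": words[0],
        "subject": words[1:3],
        "verb": words[3],
        "object": words[4:6],
    }


def syntactic_transfer(words):
    """Syntactic transfer: parse, reorder the tree, then replace the words."""
    tree = parse_subordinate_clause(words)
    # Assumed transfer rule: move the verb to the end of the subordinate clause.
    reordered = [tree["conj"], *tree["subject"], *tree["object"], tree["verb"]]
    return direct_translate(reordered)


source = "because the cat sees the dog".split()
print(" ".join(direct_translate(source)))    # omdat die kat sien die hond
print(" ".join(syntactic_transfer(source)))  # omdat die kat die hond sien
```

The direct output keeps the English word order, whereas the transfer output places the verb at the end of the clause, which is the kind of verb movement referred to in the second category above.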

In general, the more sophisticated the internal processing, the higher the output quality, but at the same time, the more domain-specific and brittle the system will be. Most modern working systems include a blend of syntactic and semantic transfer (The ISLE Classification of Machine Translation Evaluations, 2000). In the above categorisation of MT systems, the Lexica system can be defined as a blend between the second and third group, since it is used to carry out morphological, syntactic, semantic and contextual analysis.

The problem of accuracy assessment regarding MT

It is hardly an overstatement to assert that the demand for machine translation systems is high and that such systems constitute a continuously growing industry, as is evident from Hutchins’ Compendium of Translation Software which contains a list of commercial machine translation systems and computer-aided translation support tools (Hutchins & Hartmann, 2002). Sales of commercial PC translation software have shown a dramatic rise. It is estimated that there are now some 1 000 different MT packages on sale (if each language pair is counted separately). The products of one vendor (Globalink) are present in at least 6 000 stores in North America alone; while in Japan one particular system (Korya Eiwa from Catena, for English-Japanese translation) is said to have sold over 100 000 copies in its first year on the market (Hutchins, 1999).

However, there are many people who, having heard of the completion of a machine translation system, mistakenly interpret this to mean that, regardless of the input, 100 percent accurate translations are possible. What is possibly even worse, is the fact that some people who make enquiries as to how accurately an automatic system can translate, appear to be satisfied with a numerical answer such as “70 percent” (Nagao, 1989: 57). One can therefore agree with Nagao (1989: 57) that a query such as: “How well does that machine translate?” is actually quite a profound one.

Figure 1: A graphical representation of the Lexica Machine Translation system

House (1997: 1) states that different views concerning translation lead to different concepts of translation quality, and hence different ways of assessing such quality. The approach to translation will determine the method of assessment; and this will inevitably differ from approach to approach. To arrive at a coherent method for the assessment of the Lexica MT System, different methods and procedures were investigated, in order to develop a method which fulfils the specific needs of that assessment. While there is general agreement about the basic features of machine translation (MT) assessment, as reflected in several general introductory texts (for example Lehrberger & Bourbeau, 1988; Hutchins & Somers, 1992; Arnold et al., 1994), the research has shown that there are no universally accepted and reliable methods and measures, and that assessment methodology has been the subject of much discussion in recent years (see, for example, Arnold et al., 1993; Falkedal, 1994; Hutchins, 1997: 418).

A great number of studies on MT assessment have been carried out over the past four decades. Hovy et al. (2002: 8) refer to a variety of MT assessments, ranging from the influential ALPAC Report (Pierce et al., 1966) to the largest-ever competitive MT assessments, funded by the US Defence Advanced Research Projects Agency (DARPA) (White et al., 1994) and beyond. Some influential contributions include those of Kay (1980) and Nagao (1989). Van Slype (1979) produced a thorough study reviewing MT assessment at the end of the 1970s, while reviews for the 1980s can be found in Lehrberger and Bourbeau (1988) and King and Falkedal (1990). The Association for Machine Translation in the Americas (AMTA) also held a workshop in San Diego in 1992, conducting discussions on the topic of MT Evaluation: Basis for Future Directions (AMTA, 1992).

In terms of MT assessments in general, a distinction is often drawn between so-called glass-box assessment and black-box assessment. This classification sometimes appears to differentiate between component-based assessment and whole-system assessment, and sometimes to presuppose a less clear-cut difference between a qualitative/descriptive approach (How does it do what it does?) and a quantitative/analytic approach (How well does it do what it does?) (Hirschman & Thompson, 1997: 410). According to Nyberg et al. (1994), the assessment methodologies for MT systems have heretofore centred on black-box approaches, in terms of which global properties of the system are assessed, such as the semantic fidelity of the translation or the comprehensibility of the target language output. While these assessments are extremely important, they should be augmented by detailed error analyses, as well as component assessments, in order to produce causal analyses pinpointing errors and therefore leading to system improvement.

Against this background, the Lexica research and development team, like many other developers of MT systems, were confronted with the lack of a standard method for the assessment of translation accuracy. The development of such a standard method and procedure for the Lexica MT System per se is deemed necessary, since the current percentage of translation accuracy will serve as a benchmark against which any future developments and enhancements of the English–Afrikaans language pair can be measured. The first objective of the research and development team was thus the development of a method and procedure to assess translations carried out by means of the Lexica MT System; and the second objective was the assessment of a selection of texts to determine the current accuracy of translation.

Determining the assessment method for the Lexica MT System

To develop the proper method of assessment specifically for the Lexica MT System, the developers had to keep the reasons for assessment in mind. Nyberg et al. (1994) list five possible reasons for the assessment of MT, i.e.:

• Comparison with human translations.

• The decision to use or buy a particular MT system.

• Comparison of multiple MT systems.

• Tracking of technological progress.

• Improvement of a particular system.

The fifth factor listed above, i.e. the improvement of a particular system, is the area in which component analysis and error attribution are most valuable. System engineers and linguistic knowledge source maintainers (such as lexicographers) perform best when given a causal analysis of each error. As the prime focus of the Lexica MT developers is to improve the system, it was decided to develop an assessment method with the ultimate aim of improving the system.

The focus of the assessment method developed in this paper deviated from that of other assessment methods which make provision for categories dealing with textual and content aspects, structuring and formulation, as well as for a broad category focusing on punctuation and on lexical, syntactic and stylistic aspects. The UNISA System, for example, is used at the University of South Africa and several other universities for the training of students, both undergraduate and postgraduate. This system is utilised to assess the translations of students. This marking system is outcome-based, which implies that the outcome which the trainer envisages for the student is assessed (Botha, 2001: 48). The categories of this assessment model include accuracy of translation; vocabulary, idiom, and register; cohesion, coherence and organisation; and technical points (presentation, grammar, spelling, punctuation etc.). Another model developed by Botha (2001: 111–112) serves as a checklist for in-house training to achieve translation quality. This model adds categories dealing with the functionalism of translation. The categories of this assessment instrument include textual aspects, content aspects, as well as aspects dealing with the building up, formulation and presentation of the translation. Another assessment method is that of Juliane House (1997). House (1997: 39–40) analyses both the source and target texts in terms of “dimensions of language user” and “dimensions of language use”. In terms of the language user dimension, she considers aspects such as geographical origin and social class. In terms of the language use dimension, she considers, inter alia, the medium used (i.e. whether the language use is simple or complex); participation between speaker and hearer; and the social role relationship.

The above systems, together with several other methods, are most valuable in translation quality assessment; but the envisaged objective of the development of the translation assessment for the Lexica MT System was of a totally different kind. Whereas these assessment systems provide for categories concerning textual and content aspects, structuring and formulation, as well as for a broad category concerning punctuation, lexical, syntactic and stylistic aspects, the focus of the assessment method discussed in this paper falls on categories within the grammatical level. In the development of this particular assessment method, the three kinds of assessment distinguished by Hirschman and Thompson (1997: 409) were considered.

• Adequacy assessment

Adequacy assessment is the determination of the fitness of a system for a purpose: will it do what is required, how well, at what cost, etc. Typically, for a prospective user, such assessment may or may not be comparative, and a considerable amount of work may be required to identify a user’s needs. One model of this type of assessment is typified by consumer organisations which publish the results of tests on, for example, cars or appliances, and identify the best buys for certain price-performance targets. Such assessment can also be referred to as evaluation or evaluation proper.

• Diagnostic assessment

Diagnostic assessment is the production of a system performance profile in respect of some kind of taxonomy of the space of possible inputs. It is typically used by system developers, but is sometimes offered to end-users as well. It usually requires the construction of a large and, hopefully, representative test suite. It is sometimes referred to as diagnosis, or by the software engineering term regression testing when it is used to compare two generations of the same system (Hirschman & Thompson, 1997: 410).

• Performance assessment

Performance assessment is the measurement of system performance in one or more specific areas. It is typically used to compare like with like, whether in respect of two alternative implementations of a technology, or successive generations of the same implementation. It is typically created for system developers and/or R&D (Research and Development) programme managers.
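The notion of a performance profile over a taxonomy of inputs, and its use as a regression test between two generations of a system, can be sketched as follows. This is a minimal illustration in Python, not a description of any existing tool: the input categories, the scores and the function names are all hypothetical.

```python
from collections import defaultdict


def performance_profile(results):
    """Return the mean score per input category.

    `results` is an iterable of (input_category, score) pairs, one pair
    per test-suite sentence.
    """
    scores = defaultdict(list)
    for category, score in results:
        scores[category].append(score)
    return {category: sum(vals) / len(vals) for category, vals in scores.items()}


def regressions(old_profile, new_profile):
    """List the categories whose mean score dropped between generations."""
    return sorted(
        category
        for category in old_profile
        if category in new_profile and new_profile[category] < old_profile[category]
    )


# Hypothetical scores for the same test suite on two system generations.
old_run = [("passive", 6), ("passive", 8), ("negation", 4)]
new_run = [("passive", 7), ("passive", 9), ("negation", 3)]

old_profile = performance_profile(old_run)
new_profile = performance_profile(new_run)
print(old_profile)  # {'passive': 7.0, 'negation': 4.0}
print(new_profile)  # {'passive': 8.0, 'negation': 3.0}
print(regressions(old_profile, new_profile))  # ['negation']
```

Keeping the test suite fixed between runs is what allows a drop in one category to be attributed to a change in the system rather than to a change in the input.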

Hutchins (1997: 418) summarises these three types of assessment as follows: adequacy assessment, to determine the fitness of MT systems within a specified operational context; diagnostic assessment, to identify limitations, errors and deficiencies, which may then be corrected or improved (by the research team or by the developers); and performance assessment, to assess stages of system development or different technical implementations.

In determining the method of assessment best suited to the Lexica MT System, certain underlying objectives had to be considered. The objective of the development of an assessment method for the Lexica MT System was to determine the current translation accuracy of the system. As the system had been lying dormant since 1997, the development of an assessment method would assist the development team of the English-to-Afrikaans language pair to determine the current quality of translation. Translations carried out with the aid of the system contained major morphological and syntactic errors. In order to raise the quality of translation, these grammatical errors needed to be rectified through the further development of the rule database. The primary objective of this assessment was not adequacy assessment, i.e. determination of the fitness of the system; whether it will do what is required; how well; and at what cost. Neither was the objective to measure the performance of the system in one or more specific areas, i.e. performance assessment. Rather, the primary objective of this assessment was to identify limitations, errors and deficiencies of the system on a grammatical level, i.e. diagnostic assessment. In the assessment of each individual sentence, morphology, along with sentence and phrase structure, is thus of importance. The errors which are indicated in terms thereof are punctuation, lexical, syntactic and stylistic errors. After the assessment, the different grammatical errors that were made by the system must be analysed. The final output of the assessment must comprise a list of systematic morphological, syntactic and lexical errors which can be referred to for enhancement of the dictionary file, as well as for the development of the rule database.
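One way of deriving such a list of systematic errors from sentence-level annotations is to count how often each error recurs. The sketch below, in Python, assumes a hypothetical annotation format of (sentence number, error level, description) and a hypothetical threshold of two occurrences for an error to count as systematic; neither is prescribed by the method described in this paper.

```python
from collections import Counter

# Hypothetical annotations: (sentence number, error level, description).
annotations = [
    (1, "syntactic", "verb not moved to clause end"),
    (2, "syntactic", "verb not moved to clause end"),
    (2, "lexical", "missing dictionary entry"),
    (3, "morphological", "wrong plural suffix"),
    (4, "syntactic", "verb not moved to clause end"),
    (5, "morphological", "wrong plural suffix"),
]


def systematic_errors(annotations, min_count=2):
    """Return (level, description, count) for errors seen at least `min_count` times.

    The threshold is an assumption of this sketch: an error that occurs
    only once is treated as incidental rather than systematic.
    """
    counts = Counter((level, description) for _, level, description in annotations)
    return [
        (level, description, count)
        for (level, description), count in counts.most_common()
        if count >= min_count
    ]


for level, description, count in systematic_errors(annotations):
    print(f"{count} x {level}: {description}")
```

With the sample data, the recurring syntactic and morphological errors are listed, most frequent first, while the single lexical error is left out.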

Diagnostic assessment as a method of assessment

In the development of an assessment method for the Lexica MT System, a method and procedure taken from Bohan et al. (2000) were adapted and developed. Bohan et al. (2000) distinguish between two aspects in their method of assessment, namely preservation of meaning and grammatical correctness. In the assessment method discussed in this paper, these two aspects were assessed in terms of an N-point scale, according to which marks were given to both the aspect of preservation of meaning and that of grammatical correctness. The translation quality of each sentence is assessed separately.

The objective of the assessment of the preservation of meaning was to determine whether meaning was preserved in the TL text or whether it was lost, as well as the number of cases in which meaning was lost. The objective was thus to determine the accuracy of the transfer of information.

In terms of grammatical correctness, the TL texts were evaluated to identify grammatical errors. The focus was thus on morphology and syntax. Morphology refers to the degree to which words are correctly inflected (for example, to indicate tense, number, gender, case, aspect etc.) (The ISLE Classification of Machine Translation Evaluations, 2000). Syntax refers to the degree of correctness of the phrase and sentence structure. Errors which were identified included punctuation errors, as well as lexical, syntactic and stylistic errors. Punctuation errors obviously refer to punctuation. Lexical errors refer to words or phrases that are inappropriate, either because they have inappropriate connotations or an inappropriate register, or constitute errors in terms of collocations or idiomatic expressions, or because they are too general or too specific (although in cases where meaning or nuance is lost, the error is considered to be a translation error) (The ISLE Classification of Machine Translation Evaluations, 2000). Syntactic errors refer to errors of word order occurring in a sentence or a phrase. Stylistic errors denote errors such as unnecessary repetition of a word or idea, or an excessively literal translation of the source text, resulting in a translation that is “unidiomatic or difficult to understand” (The ISLE Classification of Machine Translation Evaluations, 2000).

It is clear that grammatical correctness and the preservation of meaning are intertwined to a great extent. However, grammatical errors do not necessarily lead to a loss of meaning. For example, if the MT System makes a punctuation error, for instance by leaving out a comma in the TL text, meaning will not necessarily be lost as a result. Likewise, if a spelling error occurs in the TL text, meaning will not necessarily be lost.

According to this method each sentence is rated on a 10-point scale in terms of: (a) preservation of meaning; (b) grammatical correctness.

(a) Preservation of meaning

Is the meaning of the TL sentence the same as that of the SL sentence?

7–10 points (Good): the meanings of the SL and TL sentences are about the same. Almost no post-editing with regard to meaning is necessary.

4–6 points (Understandable): the meanings of the SL and TL sentences are not exactly the same, but the sentence can be understood. The sentence may have to be retranslated during post-editing.

0–3 points (Bad): the sentence cannot be understood at all, or has a completely different meaning from that of the SL text. Retranslation is definitely necessary.

Points for preservation of meaning were given in terms of Table 1. For every single instance of meaning that is lost, 1 point is deducted.

(b) Grammatical correctness

Is the TL sentence syntactically well-formed, and does it include correct morphology?

7–10 points (Good): the sentence is grammatical. Post-editing would only entail simple stylistic corrections.

4–6 points (Understandable): despite grammatical errors, the sentence can be understood. Post-editing of the sentence would include grammatical corrections.

0–3 points (Bad): the sentence contains massive grammatical errors and can hardly be understood. Post-editing would entail complete rewriting/retranslation of the sentence.

For the assessment of grammatical correctness the following schema was followed. The schema is based on the number of corrections, replacements, movements and/or deletions that need to be made in order to render the translation grammatical. For every grammatical error encountered, 1 point is deducted. Points for grammatical correctness were given in terms of Table 2.
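The two rating scales described above can be summarised in a short sketch. The Python code below assumes only what the text states: a 10-point maximum, 1 point deducted per lost instance of meaning or per grammatical error, a floor of 0 points, and the three bands 7–10, 4–6 and 0–3. The function names are illustrative.

```python
def score(deductions, scale_max=10):
    """Deduct 1 point per counted error, never going below 0."""
    return max(0, scale_max - deductions)


def band(points):
    """Map a point total to the bands used in the rating scales."""
    if points >= 7:
        return "Good"
    if points >= 4:
        return "Understandable"
    return "Bad"


def assess_sentence(meaning_losses, grammatical_errors):
    """Rate one sentence on (a) preservation of meaning and (b) grammatical correctness."""
    meaning = score(meaning_losses)
    grammar = score(grammatical_errors)
    return {
        "meaning": (meaning, band(meaning)),
        "grammar": (grammar, band(grammar)),
    }


# Example: 2 instances of lost meaning and 5 grammatical errors.
print(assess_sentence(2, 5))
# {'meaning': (8, 'Good'), 'grammar': (5, 'Understandable')}
```

Because the two aspects are scored independently, a sentence can be rated Good for meaning while being only Understandable for grammar, in line with the observation that grammatical errors do not necessarily lead to a loss of meaning.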

Procedure of MT assessment

Genre and text selection

The type of document translated can greatly affect the output of an MT system. For example, input to the METEO System is specific and very restricted, mainly comprising weather forecast texts, entailing the use of a limited lexicon and particular syntactic constructions. As a result the system produces accurate output which is comparable to human translation. In contrast, MT of arbitrary texts invariably produces output of much lower quality. Both the genre and the application domain determine the quality (The ISLE Classification for Language Engineering, 2000).

Table 1: Points given for preservation of meaning

7–10 points (Good)             10 points   Meaning is the same
                               9 points    Meaning is lost in 1 case
                               8 points    Meaning is lost in 2 cases
                               7 points    Meaning is lost in 3 cases
4–6 points (Understandable)    6 points    Meaning is lost in 4 cases
                               5 points    Meaning is lost in 5 cases
                               4 points    Meaning is lost in 6 cases
0–3 points (Bad)               3 points    Meaning is lost in 7 cases
                               2 points    Meaning is lost in 8 cases
                               1 point     Meaning is lost in 9 cases
                               0 points    Meaning is lost in 10 or more cases



As the type of input document can greatly affect the output of the assessment, the genre selected for this assessment was that of informative texts. Informative texts, also called factual texts, are defined in terms of their exclusive purpose of rendering information, based on factual knowledge. Opinions, emotions and feelings are usually absent from such texts, resulting in a relatively neutral body of text. The style of a factual or an informative text is usually neutral; this is the type of text that is not read for pleasure, but rather to gain knowledge. The newspaper article is probably the most typical type of text in this category. The function thereof is the transfer of meaning, factuality and the progressive unlocking of information.

As examples of informative texts, Botha (2001: 96) discusses, amongst others, newspaper articles, reports, newsletters, brochures, minutes, curricula vitae, weekly weather reports, memorandums, textbooks, etc. Botha (2001: 97) describes the systemic characteristics of informative texts as follows:

The style variations, in terms of language use, are few, and usually centre on the attitude of the speaker. The style is not prescribed and the choice thereof is left to the autonomous judgement of the writer. Throughout, preference is given to a neutral style. Indicative transmission of facts occurs and the ornamental dimensions of language are under-utilised.

The test material was chosen in accordance with the ultimate goal of the development of the Lexica MT System, i.e. to use the system within the Language Unit of the Free State Provincial Government for the translation of informative texts such as newspaper articles, reports, minutes, agendas, memorandums, etc. With this objective in mind, the following three genres of informative texts were utilised in the assessment:

• Newspaper articles

• Minutes of meetings

• Reports

As already indicated, each sentence was assessed separately. The number of sentences which were used in the assessment is as follows:

• Newspaper articles: 336 sentences

• Minutes of meetings: 259 sentences

• Reports: 272 sentences

The corpus of sentences might seem to be rather limited, but for this particular application of this method of assessment, the corpus of sentences was large enough to detect systematic grammatical errors. The objective envisaged for this method of assessment is to apply it to a larger number of texts in an easily repeatable, scientific way. As this only comprised the first round of the assessment, with the objective of detecting grammatical errors which are repeated to a great extent, this corpus of sentences provided a large list of systematic errors, which can be used specifically for the further development of the rule database and dictionary file of the English-to-Afrikaans language pair.

Preparation of SL texts

In their evaluation, Bohan et al. (2000) measured the performance of the MT system with minimal user involvement (i.e. neither prior adaptation of bad texts nor lexical coding of unknown words). As the objective of the assessment of the Lexica MT System was to determine the current translation accuracy, which could serve as a benchmark for future developments, only minimal user involvement occurred for the purposes of this assessment.

Table 2: Points given for grammatical correctness

Band                          Points      Grammatical errors
7–10 points (Good)            10 points   No grammatical errors
                              9 points    1 grammatical error
                              8 points    2 grammatical errors
4–6 points (Understandable)   6 points    4 grammatical errors
0–3 points (Bad)              3 points    7 grammatical errors
                              1 point     9 grammatical errors
                              0 points    10 or more grammatical errors

The graphical user interface (GUI) of the Lexica MT System only accepts documents in text file format (.txt). As the assessment documents were either in hypertext mark-up language (.html) or Microsoft Word Document (.doc) format, the documents were converted to text format (.txt) before the translation process was initiated. For easier assessment the translated texts were arranged in two columns.
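The article does not say how the conversion to text format was carried out. As a rough illustration only, the .html case could be handled with Python's standard html.parser module as sketched below; the names TextExtractor and html_to_text are hypothetical, and .doc files would need a separate tool that is not shown.

```python
from html.parser import HTMLParser

# Tags treated as line boundaries in the plain-text output (an assumption).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the text of an HTML document, one block element per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


sample = "<h1>Progress report</h1><p>No new <b>matters</b> arose from this item.</p>"
print(html_to_text(sample))
```

Keeping one sentence or heading per line in the output would also suit the sentence-by-sentence assessment described earlier.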

Choice of assessors in the assessment of TL texts

After the texts had been translated with the aid of the Lexica MT System, the source and target texts were sent to the different assessors. In the choice of assessors to be used in the assessment, the aspect of subjectivity had to be taken into consideration. King (1993: 267) indeed states that the greatest weakness of assessment tests in which different test subjects are used is their subjectivity: different subjects can vary widely in their ratings. In order to limit subjectivity as far as possible, a decision was made to use five different assessors in the assessment of the Lexica MT System. Of these five assessors, four are accredited translators of the South African Translators’ Institute (SATI).

Assessed output

As the assessment was rated on a 10-point value scale, a mark out of 10 was awarded to each sentence. These marks were converted to a percentage. This percentage provided a benchmark of the current accuracy of translation, in terms of preservation of meaning and grammatical correctness, for the specified texts which were translated. The output assessments carried out by the five assessors, determining the accuracy of translation in terms of preservation of meaning and grammatical correctness, are summarised in Table 3.

The average assessment given by the five assessors was as follows:

Preservation of meaning: 60%

Grammatical correctness: 45%
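The panel averages can be checked against the per-assessor percentages in Table 3. The short Python sketch below does this and also reports the spread between the highest and lowest assessor, which gives a rough sense of the subjectivity discussed above. The helper summarise is hypothetical, and rounding to whole percentages is an assumption about how the published figures were obtained.

```python
from statistics import mean

# Percentages per assessor as listed in Table 3:
# (preservation of meaning, grammatical correctness).
TABLE_3 = {
    "Assessor 1": (49, 33),
    "Assessor 2": (66, 55),
    "Assessor 3": (79, 48),
    "Assessor 4": (48, 43),
    "Assessor 5": (60, 48),
}


def summarise(scores):
    """Return (mean rounded to a whole percentage, highest minus lowest score)."""
    return round(mean(scores)), max(scores) - min(scores)


meaning = [m for m, _ in TABLE_3.values()]
grammar = [g for _, g in TABLE_3.values()]

print("Preservation of meaning:", summarise(meaning))
print("Grammatical correctness:", summarise(grammar))
```

The unrounded means are 60.4 and 45.4, so the reported 60% and 45% follow from all five assessors' figures. The ratings differ by 31 percentage points for preservation of meaning and 22 for grammatical correctness.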

A typical example of a text translated by Lexica

Table 4 provides a limited selection from a report as an example from the list of texts (which included newspaper articles, minutes and reports) which were utilised in the assessment of the current translation accuracy of the Lexica MT System.

Analysis of systematic errors

If Table 4 is considered, some of the systematic errors that occur can clearly be seen. This section will merely list and briefly discuss some errors that occur in Table 4. The systematising of errors illustrated below is the procedure which will be followed in the systematising of all errors for future research.

No new matters arose from this item. /

Geen het opgerys nuwe sake daarvan item.

In this sentence, on the syntactic level, a word-order error occurs in the TL. The subject nuwe sake follows the verb het opgerys. A lexical error also occurs, since the SL verb arose is translated with an incorrect lexical item in the TL, namely het opgerys. A stylistic error is observed in the too-literal translation of from in the SL, which is rendered, together with this (SL), as daarvan in the TL.

The agenda for the meeting was approved with the addition of the following: /

Die agenda vir die vergadering is met die aanvulling van die volgende goedgekeur:

In this sentence the SL word addition is translated in the TL as aanvulling, which is a lexical error, since an inappropriate translation equivalent has been chosen. Despite the fact that information is still conveyed, an inappropriate word has been chosen in the TL.

Table 3: Output assessment carried out by the different assessors

Panel total                            Preservation of meaning (%)   Grammatical correctness (%)
Assessor 1                             49                            33
Assessor 2                             66                            55
Assessor 3                             79                            48
Assessor 4                             48                            43
Assessor 5                             60                            48
Translation Accuracy: General Domain   60                            45

Progress report /

Vooruitgang vermeld

Firstly, in this example, a lexical error occurs, since Progress is translated as Vooruitgang. An inappropriate register has been chosen. Secondly, the SL word report is a common noun, but is translated in the TL as vermeld, which is a verb. Thus, in this case, incorrect information is conveyed and meaning is therefore not correctly preserved.
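One way to picture the systematising of errors is as a list of records that can be tallied by category. The Python sketch below is purely illustrative and does not describe the Lexica team's actual procedure. The ErrorRecord type and its field names are hypothetical, and the label "part of speech" for the noun rendered as a verb is an assumption, since the discussion above does not name a category for it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRecord:
    source: str    # SL sentence
    target: str    # TL output of the MT system
    category: str  # e.g. "syntactic", "lexical", "stylistic"
    note: str      # short description of the error


S1 = ("No new matters arose from this item.",
      "Geen het opgerys nuwe sake daarvan item.")
S2 = ("The agenda for the meeting was approved with the addition of the following:",
      "Die agenda vir die vergadering is met die aanvulling van die volgende goedgekeur:")
S3 = ("Progress report", "Vooruitgang vermeld")

# The errors discussed for the three example sentences.
records = [
    ErrorRecord(*S1, "syntactic", "word order: subject 'nuwe sake' follows verb 'het opgerys'"),
    ErrorRecord(*S1, "lexical", "'arose' rendered as 'het opgerys'"),
    ErrorRecord(*S1, "stylistic", "too-literal 'from this' rendered as 'daarvan'"),
    ErrorRecord(*S2, "lexical", "'addition' rendered as 'aanvulling'"),
    ErrorRecord(*S3, "lexical", "'Progress' rendered as 'Vooruitgang' (register)"),
    ErrorRecord(*S3, "part of speech", "noun 'report' rendered as verb 'vermeld'"),
]

# Count how often each category occurs, most frequent first.
tally = Counter(record.category for record in records)
for category, count in tally.most_common():
    print(category, count)
```

A tally of this kind, built over the whole corpus, would show which error categories recur most often and so where work on the dictionary file or the rule database might start.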

Table 4: The original English text (source text) with the subsequent Afrikaans translation (target text)

English text /
Afrikaans translation

A quorum was not present, but as the previous meeting was adjourned due to the absence of a quorum, the committee decided to proceed with the meeting. /
‘n Kworum was nie aanwesig nie, maardie vorige vergadering weens uitstel is die afwesigheid van ‘n kworum, die komitee wat besluit word om met die vergadering te voortgaan.

Approval of agenda /
Goedkeuring van agenda

The agenda for the meeting was approved with the addition of the following: /
Die agenda vir die vergadering is met die aanvulling van die volgende goedgekeur:

Dates and Formats for Meetings was removed from the agenda. /
Datums en Formate vir Vergaderings is van die agenda verwyder.

Videoconference was changed to 6.5.3. 6.5.4 /
Videoconference is verander na 6.5.3. 6.5.4

Progress report /
Vooruitgang vermeld

Introduction /
Inleiding

The achievements of the MIDP during this quarter could surely be measured against the Budget Speech of the Premier of the Free State who pledged all activities of the MIDP as targeted for 2000 as part of what she expected to be implemented to achieve her vision for the year 2000. /
Die prestasies van die MIDP tydens hierdiekon sekerlik wees gemeet teen die Begroting Toespraak van die Premier van die Vry Staat wat soos alle aktiwiteite van die MIDP pledged targeted vir 2000 soos deel van wat sy verwag om te wees geïmplementeer om haar visie vir die jaar uit te voer 2000.

This acceptance was also reflected in the Annual Report of the Department of the Premier of which three full pages were occupied by the achievements of the MIDP in 1999. /
Hierdie aanvaarding is ook reflekteer in die Jaarverslag van die Departement van die Premier waaraan drie vol bladsye is deur die prestasies van die MIDP beset in 1999.

No new matters arose from this item. /
Geen het opgerys nuwe sake daarvan item.

The following achievements of the quarter under discussion can be highlighted: /
Die volgende prestasies van die kwart onder bespreking kan uitgelig word:

Future research

The immediate future research of the Lexica research and development team will be focused on systematising all the errors that were made during the translation of the different texts. The systematised list of errors will then be used to start working on the improvement of the dictionary file and the rule database file of the Lexica system. After the development and enhancement of the different files (which will not only include an update of the dictionary file, but also substantial work on the rule database file, for example work dealing with word-order errors, passive voice errors etc.), new texts from the same domain will be translated and assessed according to the same procedure, in order to determine the degree of improvement of the system in terms of the preservation of meaning and grammatical correctness.

Conclusion

The objective of this article was to establish an assessment method and procedure to assess the translation accuracy of the Lexica MT System. To assess the system’s accuracy, a diagnostic mode of assessment was chosen, since the focus of this type of assessment is on the identification of limitations, errors and deficiencies, which may then be corrected or improved by the development team.

With regard to the type of document used in the assessment, it was decided that newspaper articles, minutes of meetings and reports would be used, as the objective of the development of the Lexica MT System is to provide the Free State Provincial Government with a tool specifically aimed at enhancing and facilitating the translation of minutes and reports.

The method used in the assessment of the English–Afrikaans language pair was divided into two categories, namely preservation of meaning and grammatical correctness. In terms of the preservation of meaning an average translation accuracy of 60% was achieved, whereas the average for grammatical correctness was a mere 45%.

The results of the assessment clearly indicated that much work needs to be done in order to raise the level of accuracy of translation. The ultimate objective of the Lexica research and development team is to deliver a system that effectuates translation from any official South African language into any other official South African language with at least a ninety percent translation accuracy. A fully functional translation system for all the languages of South Africa would vastly increase the productivity and efficiency of translators. It would also provide generators of reports, statements, study guides and other documents with easily obtainable multilingual sources. This mission will not, however, be achieved within a short timeframe and will require several years of research and development. The ULFE is currently focusing on the development of the English–Afrikaans language pair, as well as the redevelopment of user interfaces. It is envisaged that a completely new system, using some of the functionalities of the Lexica MT System, will be the end result of this work and research.

References

AMTA. 1992. MT evaluation: basis for future directions. In: Proceedings of a Workshop held in San Diego. November 1992. San Diego, California. Washington, DC: Association for Machine Translation in the Americas (AMTA).

Arnold D, Humphreys RL & Sadler L (eds). 1993. Special issue on evaluation of MT Systems. Machine Translation 8(1–2): 1–126.

Arnold D, Balkan L, Meier S, Humphreys RL & Sadler L. 1994. Machine Translation: An Introductory Guide. Manchester, Oxford: NCC/Blackwell.

Bohan N, Breidt E & Volk M. 2000. Evaluating translation quality as input to product development. In: Proceedings of 2nd International Conference on Language Resources and Evaluation, Athens, Greece. Available at: http://www.ifi.unizh.ch/cl/volk/papers/LREC2000.pdf

Botha S. 2001. Die Ontwikkeling van ‘n Kwaliteitsassesseringsinstrument vir Plaaslike Regeringstekste. MA-minithesis. Universiteit van die Vrystaat, Bloemfontein.

Constitutional Assembly. 1997. Annotated Version. Wynberg, Cape Town: Constitutional Assembly.

Department of Arts, Culture, Science and Technology. 2000/2001. Annual Report. Available at: http://www.dac.gov.za/reports/annual_report/annual_report2000_2001.pdf

Falkedal K (ed). 1994. Proceedings of the Evaluators’ Forum 1991, Les Rases, Vaud, Switzerland. Geneva: ISSCO. Available at: http://www.issco.unige.ch/publications/workshop.html

Hirschman L & Thompson HS. 1997. Overview of evaluation in speech and natural language processing. In: Cole R, Mariani J, Uszkoreit H, Varile GB, Zaenen A, Zampolli A & Zue V (eds) Survey of the State of the Art in Human Language Technology. Web Edition: Cambridge University Press & Giardini. pp. 409–414. Available at: http://cslu.cse.ogi.edu/HLTsurvey/

House J. 1997. Translation Quality Assessment. A Model Revisited. Tübingen: Gunter Narr Verlag.

Hovy E, King M & Popescu-Belis A. 2002. Computer-aided specification of quality models for machine translation evaluation. LREC 2002 Workshop on Machine Translation Evaluation: Human Evaluators Meet Automated Metrics. Las Palmas de Gran Canaria, Spain. pp. 1–7. Available at: http://andreipb.free.fr/textes/eh-mk-apb-lrec-02.pdf

Hutchins J. 1997. Evaluation of machine translation and translation tools. In: Cole R, Mariani J, Uszkoreit H, Varile GB, Zaenen A, Zampolli A & Zue V (eds) Survey of the State of the Art in Human Language Technology. Web Edition: Cambridge University Press & Giardini. pp. 418–420. Available at: http://cslu.cse.ogi.edu/HLTsurvey/

Hutchins J. 1999. The development and use of machine translation systems and computer-based translation tools. International Symposium on Machine Translation and Computer Language Information Processing. 26–28 June 1999, Beijing, China. Available at: http://www.foreignword.com/Technology/art/Hutchins/hutchins99.htm

Hutchins WJ & Hartmann W. 2002. Compendium of Translation Software. Commercial Machine Translation Systems and Computer-Aided Translation Support Tools. (5th edn). Available at: http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-5.pdf

Hutchins WJ & Somers HL. 1992. An Introduction to Machine Translation. London: Academic Press.

Kay M. 1980. The proper place of men and machines in language translation. XEROX PARC Research Report CSL-80-11. Palo Alto, CA: Xerox Parc.

King M. 1993. Forum: Evaluation of MT Systems. In: Nirenburg S (ed) Progress in Machine Translation. Amsterdam: IOS Press. pp. 267–282.

King M & Falkedal K. 1990. Using test suites in evaluation of MT systems. In: Karlgren H (ed) Coling-90: Papers Presented to the 13th International Conference on Computational Linguistics. Volume 2. Helsinki, Finland. pp. 211–216.

Lehrberger J & Bourbeau L. 1988. Machine translation: linguistic characteristics of MT systems and general methodology of evaluation. Lingvisticæ Investigationes Suppl. 15. Amsterdam, Philadelphia: John Benjamins.

Nagao M. 1989. Machine Translation. How Far Can It Go? Oxford: Oxford University Press.

Nyberg EH, Mitamura T & Carbonell JG. 1994. Evaluation metrics for knowledge-based machine translation. Proceedings of Coling 94. Available at: http://www.lti.cs.cmu.edu/Research/Kant/PDF/evaluate.pdf

Oosthuizen GD & Coetzer T. s.a. A feature-based approach to translation of African and European languages. Unpublished report.

Pierce JR, Carroll JB, Hamp EP, Hays DG, Hockett CF, Dettinger AG & Perlis A. 1966. Computers in translation and linguistics. ALPAC Report 1416, National Academy of Sciences/National Research Council.

The Constitution of the Republic of South Africa. 1996. Annotated Version. Wynberg, Cape Town: Constitutional Assembly.

The ISLE Classification of Machine Translation Evaluations. International Standards for Language Engineering (ISLE). Draft 1, October 2000. http://www.isi.edu/natural-language/mteval

Van Slype G. 1979. Critical methods for evaluating the quality of machine translation. Report BR 19142. European Commission/Directorate for General Scientific and Technical Information Management (DG XIII). Available at: http://issco-www.unige.ch/projects/isle/van-slype.pdf

White JS, O’Connell T & O’Mara FE. 1994. The ARPA MT Evaluation Methodologies: Evolution, Lessons and Further Approaches. In: Technology Partnerships for Crossing the Language Barrier: Proceedings of the First Conference of the Association for Machine Translation in the Americas. Columbia, Md. 1994, pp. 193–205.
