Towards a Methodology for a Corpus-Based Approach to Translation Evaluation


Post on 23-Feb-2016


Page 1: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

The main advantage of concordancing tools is that they allow translators to see terms in a variety of contexts simultaneously to detect various kinds of linguistic and conceptual patterns.

The majority of corpus analysis tools also offer a number of other features, which often combine the data produced by the concordancer and word frequency counts.

Page 2: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

By Mojgan Heydarali

Professor: Dr. Behbahani

Course: Translation Assessment

Azad University of Literature & Foreign Languages, Tehran South Branch

Page 3: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Content

1. Translator trainers' responsibility
2. Limitations of evaluation tools
3. Importance of the corpus-based approach
4. Characteristics of the corpus-based approach
5. Challenges facing evaluators in an academic context
6. Corpora and corpus analysis tools
7. Designing an evaluation corpus
   a. Comparable Source Corpus
   b. Quality Corpus
   c. Quantity Corpus
   d. Inappropriate Corpus

Page 4: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Translator trainers are responsible for:

Grading students’ work and, importantly, providing useful feedback.

Page 5: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

In the past, translators and trainers worked with resources such as:

* Dictionaries
* Printed parallel texts
* Unverified intuition
* Subject field experts

But these were not always conducive to providing the conceptual and linguistic knowledge necessary for an objective translation evaluation.

Page 6: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

What is the importance of the corpus-based approach?

1. It removes a great deal of subjectivity.

2. It provides improved access to appropriate conceptual and linguistic information about a specialized subject field, as documented by experts in that field.

Page 7: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

In other words, a specially designed evaluation corpus can act as a benchmark for comparing students’ translations on a number of different levels.

Page 8: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

So, by having access to a wide range of authentic and suitable texts, translator trainers can:

* Verify or correct both conceptual and linguistic information, and
* Provide more constructive feedback based on evidence.

Page 9: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

What are the characteristics of a corpus-based approach?

Firstly, it is based on the analysis of a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form (i.e., a corpus).

Page 10: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Secondly, because it analyzes actual patterns of language use in the corpus, it is empirical and therefore objective.

Page 11: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Thirdly, this approach takes advantage of computational tools and methods for manipulating the corpus and arranging the data in ways that make it possible to spot items and patterns that would be difficult to identify in other types of resources.

Page 12: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Additionally, computers provide consistent and reliable analysis (i.e., they do not change their minds or get distracted).

Page 13: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Finally, the corpus-based approach combines both quantitative and qualitative techniques.

A computer is capable of churning out counts of linguistic features, but the translator trainer is responsible for exploring and interpreting the data in order to learn about patterns of language use.

Page 14: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

1. Challenges Facing Evaluators in an Academic Context

A. The main difficulty surrounding translation evaluation is its subjective nature; the notion of quality has very fuzzy and shifting boundaries.

B. Clients who commission translations are not interested in educating the translator, whereas the trainer has an obligation to help students improve their performance.

Page 15: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

C. To be properly prepared for entering the translation profession, students need to be exposed to a wide range of translation material and text types, but naturally trainers are not experts in all subjects. A specially designed evaluation corpus can help to meet this need.

Page 16: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Corpora and corpus analysis tools

Similarity between a corpus and conventional parallel texts:

In a translation context, a suitable corpus might be one containing texts that correspond to the intended skopos of the target text. In this way a corpus is similar to the conventional parallel texts used by many translators.

Page 17: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

However, an electronic corpus is generally much larger and can be processed with the help of computerized tools known as corpus analysis tools.

Page 18: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Most corpus analysis tools contain at least two main features: word frequency lists and concordancers.

A. Word frequency lists allow users to discover how many different words are in the corpus and how often each appears.

DVD   765    video  126    not   89    player  80
is    341    we     121    said  85    all     79
will  208    have   116    PC    82    MPEG    81
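As a rough illustration of how such a list can be produced, here is a minimal Python sketch. It is not the method of any particular corpus analysis tool: the function name `frequency_list`, the tokenizing pattern, the lower-casing step and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter

def frequency_list(text):
    """Return (word, count) pairs, most frequent first."""
    # Assumption: lower-case everything so 'DVD' and 'dvd' are counted together.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens).most_common()

# Invented sample text, used only to exercise the function.
sample = "The DVD player plays DVD video. The player is new."
for word, count in frequency_list(sample)[:3]:
    print(word, count)
```

A real tool would read the texts from files and offer more careful tokenization, but the principle of counting each distinct word form is the same.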

Page 19: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

“I really like translation because I think that translation is really, really interesting”.

Tokens (total words) = 13

Types (different words) = 9

Word lists can be sorted in:

* Alphabetical order
* Ascending order of frequency
* Descending order of frequency
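The token and type counts for the example sentence can be checked with a few lines of Python. This is only a sketch: the variable names are made up for the illustration, and words are matched case-sensitively, which makes no difference for this sentence.

```python
import re
from collections import Counter

sentence = ("I really like translation because I think that "
            "translation is really, really interesting")

tokens = re.findall(r"\w+", sentence)   # every running word
counts = Counter(tokens)                # one entry per distinct word (type)

print(len(tokens))   # number of tokens
print(len(counts))   # number of types

# The three sort orders mentioned on the slide.
alphabetical = sorted(counts.items(), key=lambda item: item[0].lower())
ascending = sorted(counts.items(), key=lambda item: item[1])
descending = sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

The sentence yields 13 tokens and 9 types because "I" and "translation" each occur twice and "really" occurs three times.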

Page 20: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Words belonging to the same lemma can be counted together or separately, as can words beginning with upper or lower case.

A lemma refers to a set of words that have the same stem and belong to the same major word class, differing only in spelling or inflection.

Stop lists are lists of words to be ignored; they can be used to eliminate common function words such as prepositions or conjunctions.

Frequency information can help translators decide which term to use when faced with a number of potential synonyms or translation equivalents.
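The counting options described above (grouping by lemma, folding case, applying a stop list) can be sketched as follows. The stop list and the lemma table here are tiny hand-made examples invented for this illustration; real tools ship far larger resources, and the function name `count_words` is likewise hypothetical.

```python
import re
from collections import Counter

# Hypothetical, hand-made resources for illustration only.
STOP_LIST = {"the", "a", "of", "and", "in", "to", "with"}
LEMMAS = {"plays": "play", "played": "play", "playing": "play",
          "players": "player"}

def count_words(text, fold_case=True, use_lemmas=True, stop_list=STOP_LIST):
    """Count words, optionally merging case variants and inflected forms."""
    counts = Counter()
    for token in re.findall(r"[A-Za-z]+", text):
        key = token.lower() if fold_case else token
        if key.lower() in stop_list:
            continue                      # ignore function words
        if use_lemmas:
            key = LEMMAS.get(key.lower(), key)
        counts[key] += 1
    return counts

print(count_words("The player played and plays with the players"))
```

With lemma grouping switched on, "played" and "plays" are counted under "play"; with `use_lemmas=False` each form keeps its own count.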

Page 21: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

B. Concordancer

A concordancer retrieves all the occurrences of a particular search pattern in its immediate contexts and displays them in an easy-to-read format.

The most commonly used format is KWIC (key word in context), which shows one occurrence of the search pattern per line, with the search pattern itself highlighted in the center of the screen.

  Atsushita slick, portable DVD player with a color LCD and
Ndows explorer, but their movie player software refused to pla
Ers with a “record” button. The player will not even have the o
  three years,” he says. Such a player would have a display
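A KWIC display of this kind can be approximated in a few lines of Python. The sketch below is an assumption-laden illustration, not the implementation of any existing concordancer: the function name `kwic`, the bracket highlighting and the sample corpus (loosely modelled on the lines above) are invented here.

```python
import re

def kwic(text, pattern, width=30):
    """Return one line per match, with the match aligned in a fixed column."""
    lines = []
    for match in re.finditer(pattern, text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-justify the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

# Invented sample corpus for demonstration.
corpus = ("A slick, portable DVD player with a color LCD. "
          "The player will not even have the option. "
          "Such a player would have a display.")
for line in kwic(corpus, r"\bplayer\b", width=25):
    print(line)
```

Because the context is cut at a fixed number of characters, words at the edges of each line are truncated, just as in the example above.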

Page 22: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

* The extent of the context on either side of the search pattern is variable.

* These contexts can be sorted in a variety of ways, such as:

a. Order of appearance in the corpus
b. Alphabetically
c. By the words preceding or following the search pattern
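Sorting by the words preceding or following the search pattern might look like the sketch below. The representation of a concordance hit as a `(left words, keyword, right words)` tuple, the three-word context window and all names are assumptions for this example.

```python
import re

def concordance(text, word, span=3):
    """Collect (left_words, keyword, right_words) for each occurrence of word."""
    tokens = re.findall(r"\w+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == word.lower():
            hits.append((tokens[max(0, i - span):i], token,
                         tokens[i + 1:i + 1 + span]))
    return hits  # already in order of appearance

def sort_by_right(hits):
    """Sort on the words following the keyword, nearest word first."""
    return sorted(hits, key=lambda h: [w.lower() for w in h[2]])

def sort_by_left(hits):
    """Sort on the words preceding the keyword, nearest word first."""
    return sorted(hits, key=lambda h: [w.lower() for w in reversed(h[0])])

# Invented sample text.
text = ("The player will stop. A DVD player can record. "
        "Such a player would display.")
hits = concordance(text, "player")
for left, key, right in sort_by_right(hits):
    print(" ".join(left), "|", key, "|", " ".join(right))
```

Sorting on the right-hand context groups together lines such as "player can ..." and "player will ...", which makes recurring patterns easier to spot.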

Page 23: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Concordancers are flexible and allow functions such as:

* Case-sensitive vs. non-case-sensitive searches (e.g. ‘Bill’, the former US president, vs. ‘bill’; ‘Polish’, the people of Poland, vs. ‘polish’).

* Wildcard searches (e.g. ‘play*’ to retrieve ‘play’, ‘player’, ‘played’, etc.).

* Searches in which another term must appear within a user-specified distance of the search term (e.g. contexts where ‘play’ appears within five words of ‘DVD’).
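The three search options can be imitated with regular expressions, as in the following sketch. The use of `*` as the wildcard character, the function names `search` and `near`, and the sample sentence are assumptions for illustration; actual tools may use different syntax.

```python
import re

def search(tokens, pattern, case_sensitive=False):
    """Return positions of tokens matching pattern; '*' matches any run of letters."""
    regex = re.escape(pattern).replace(r"\*", r"\w*")
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(regex, flags)
    return [i for i, tok in enumerate(tokens) if compiled.fullmatch(tok)]

def near(tokens, term, other, distance=5):
    """Positions of term that have a match for other within distance tokens."""
    other_hits = search(tokens, other)
    return [i for i in search(tokens, term)
            if any(0 < abs(i - j) <= distance for j in other_hits)]

# Invented sample sentence.
tokens = "Bill paid the bill for the DVD player that can play Polish films".split()

print(search(tokens, "Bill", case_sensitive=True))  # only the capitalized form
print(search(tokens, "bill"))                       # both forms
print(search(tokens, "play*"))                      # 'player' and 'play'
print(near(tokens, "play*", "DVD", distance=5))
```

Lowering `distance` narrows the proximity search: with `distance=2` only "player", which stands right next to "DVD", is returned.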

Page 24: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

The majority of corpus analysis tools also offer a number of other features, which often combine the data produced by the concordancer and the frequency counts.

It must be kept in mind that:

* The value of what comes out of a corpus is largely dependent on what texts are included in it.

* Criteria for designing general language corpora have been well documented in the literature; however, these criteria cannot be adopted wholesale for the design of a special-purpose corpus such as an Evaluation Corpus.

Page 25: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Designing an Evaluation Corpus

The Evaluation Corpus is the collective name for a collection of texts that is divided into four main sub-corpora:

1. The Comparable Source Corpus
2. The Quality Corpus
3. The Quantity Corpus
4. The Inappropriate Corpus

These sub-corpora differ in content and intended function.

Page 26: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

1. Comparable Source Corpus (CSC)

It is optional and depends on factors such as time, text type, and the skopos of the target text.

The CSC contains a selection of SL texts that are similar to the source text in terms of text type, publication date, and subject matter.

Page 27: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

The purpose of the CSC

Its purpose is to allow the evaluator to gauge the “normality” of the source text with regard to other source language texts of that type.

Normalization is a feature of translated texts; normalized texts display exaggerated features of the target language and conform to its typical patterns (Baker, 1997).

Page 28: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Sanitization: the suspected adaptation of a source text reality to make it more palatable for target audiences (Kenny).

Both normalization and sanitization result in deliberately chosen unconventional lexical or syntactic ST features being changed in translation so that the TT fits in with the conventions of the target language.

Page 29: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Determining inappropriate normalization or sanitization

First, evaluators can use the CSC as a reference corpus to establish the relative normality of the ST.

Second, they can use the Quantity Corpus as a reference corpus to establish the relative normality of the TT.

Page 30: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

If the ST is deemed to be normal (in vocabulary, register, style, etc.) with reference to texts in the Comparable Source Corpus, then the TT should be normal when compared with texts in the Quantity Corpus (and vice versa).
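The slides do not say how "normality" is to be measured. Purely as an illustration of the two-step comparison, the sketch below uses a very crude proxy: the share of a text's word types that are attested in a reference corpus. The proxy, the 0.8 threshold and the function names are all assumptions made for this example, not part of the methodology itself.

```python
import re

def coverage(text, reference_texts):
    """Share of the word types in text that also occur in the reference corpus."""
    def types(s):
        return set(re.findall(r"[a-z]+", s.lower()))
    reference = set()
    for ref in reference_texts:
        reference |= types(ref)
    text_types = types(text)
    if not text_types:
        return 0.0
    return len(text_types & reference) / len(text_types)

def flag_mismatch(st_score, tt_score, threshold=0.8):
    """True when only one of ST and TT counts as 'normal' under the threshold."""
    # Assumption: 0.8 is an arbitrary cut-off chosen for the demonstration.
    return (st_score >= threshold) != (tt_score >= threshold)

# Step 1 would score the ST against the Comparable Source Corpus,
# step 2 the TT against the Quantity Corpus; the inputs here are invented.
st_score = coverage("the quantum player", ["The DVD is new", "a player"])
tt_score = coverage("the dvd player", ["The DVD is new", "a player"])
print(st_score, tt_score, flag_mismatch(st_score, tt_score))
```

A mismatch only signals that the translation deserves a closer look; judging whether a change was inappropriate remains the evaluator's task.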

Page 31: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

2. Quality Corpus

The Quality Corpus is a high-quality sub-corpus consisting of texts hand-picked primarily for their conceptual content.

It is very small by corpus linguistics standards, containing four or five texts with a total of 5,000 words.

Page 32: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

The Quality Corpus is used primarily as a source of conceptual rather than linguistic information, so it is not necessary for all the texts to be of the same text type.

It is important, however, that they be complete texts (not samples or extracts).

At least some of the texts should be current.

Using the Quality Corpus helps the translator trainer become familiar with the basic concepts in the field and identify some of the key terms.

If the texts are well chosen, they can serve as a benchmark for evaluating students’ translations.

Page 33: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

3. Quantity Corpus

Why is it not appropriate to rely exclusively on the Quality Corpus?

Firstly, because it is a relatively small collection, there is no real way of knowing whether the selected texts are truly representative of the text type at large.

Secondly, the texts contained in the Quality Corpus may be “older” texts, and a term which was appropriate in the past may no longer be so.

Page 34: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

The Quantity Corpus is designed to provide a larger and more representative sample of the specialized language in question.

External factors such as time and the availability of data influence how large and how representative it can be.

In practice, Quantity Corpora of 20,000 to 200,000 words have proved useful:

* 20,000 words for highly specialized subject fields
* 200,000 words for subject fields that are not extremely narrow

Page 35: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

It is useful to divide the Quantity Corpus into further sub-corpora, one for each year; this enables translators or evaluators to track terminological changes over time.

A corpus analysis tool such as WordSmith allows users to consult multiple corpora at once.
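Tracking a term across yearly sub-corpora can be sketched as follows. The dictionary layout (year mapped to a list of texts), the function name and the sample texts are invented for this example and do not reflect how any particular tool stores its corpora.

```python
import re

def term_counts_by_year(subcorpora, term):
    """Map each year to the number of occurrences of term in that year's texts."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return {year: sum(len(pattern.findall(text)) for text in texts)
            for year, texts in sorted(subcorpora.items())}

# Invented yearly sub-corpora.
quantity_corpus = {
    1998: ["The digital video disc is new."],
    2000: ["The DVD is common. A DVD player is cheap."],
}

print(term_counts_by_year(quantity_corpus, "DVD"))
print(term_counts_by_year(quantity_corpus, "digital video disc"))
```

Comparing the counts for competing terms year by year shows which one is gaining ground and which may have become dated.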

Page 36: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

The Quantity Corpus: pros and cons

Pros:

The Quantity Corpus is compiled in a semi-automated fashion and can be used by the translator trainer to verify the appropriateness of the terminological, phraseological, and stylistic choices made by students.

Most corpus analysis software gives users the option of expanding the context to several lines or to the complete text.

The volume of the data makes it possible to spot patterns more easily, to make generalizations, and to provide concrete evidence to support decisions.

Page 37: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

Cons:

Interacting solely with a large electronic corpus can cause one to lose sight of the fact that translation is a text-based activity.

In corpus analysis the focus is on micro-contexts, and the primary power of corpus analysis remains at a sub-text level.

The texts are not readily available in electronic form.

Page 38: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

4. Inappropriate Corpus

It is a corpus containing “inappropriate” parallel texts. Its size varies with the subject.

For well-established subjects or those of wider interest it would be larger; for very recent subjects it is smaller.

Its purpose is to help the translator trainer uncover the source of unsuitable equivalents in students’ translations.

If a student has used a term which does not appear in the Quality or Quantity Corpus, it can be checked in this corpus.
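The lookup described above, checking a student's term first against the Quality and Quantity Corpora and then against the Inappropriate Corpus, could be sketched like this. The sub-corpus names used as dictionary keys, the function names, the returned messages and the sample texts are all hypothetical.

```python
import re

def attested_in(term, subcorpora):
    """Return the names of the sub-corpora in which term occurs."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [name for name, texts in subcorpora.items()
            if any(pattern.search(text) for text in texts)]

def check_term(term, subcorpora):
    """Classify a student's term according to where it is attested."""
    found = attested_in(term, subcorpora)
    if "quality" in found or "quantity" in found:
        return "attested in reference texts"
    if "inappropriate" in found:
        return "found only in the Inappropriate Corpus"
    return "not found in any sub-corpus"

# Invented miniature evaluation corpus.
evaluation_corpus = {
    "quality": ["A DVD player reads optical discs."],
    "quantity": ["The DVD player market grew quickly."],
    "inappropriate": ["Our disc machine is the best disc machine."],
}

for term in ("DVD player", "disc machine", "video spinner"):
    print(term, "->", check_term(term, evaluation_corpus))
```

A term found only in the Inappropriate Corpus suggests where the student may have picked it up, which gives the trainer something concrete to mention in feedback.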

Page 39: Towards a Methodology for a Corpus-Based Approach to Translation Evaluation

THE END
