using the internet for specialised translation


Post on 16-Oct-2021


Page 1: USING THE INTERNET FOR SPECIALISED TRANSLATION

USING THE INTERNET

FOR SPECIALISED TRANSLATION


Translation Technology

“much translation work is carried out in a computer-assisted translation (CAT) environment, which may vary from a standard desktop equipped with word processing software and a browser to a full-blown translator workstation consisting of a multiplicity of tools specifically created for translators of technical texts and localizers.”

“Translation agencies organize their workflow around project management systems that distribute translation tasks, memories and terminologies to and around individual translators.”

(F. Zanettin 2014, “Corpora in Translation”)


Translation technologies

• electronic dictionaries and terminological databases, the arrival of the Internet with its numerous possibilities for research, documentation and communication, and the emergence of computer-assisted translation tools.

Alcina A. (2008) «Translation technologies - Scope, tools and resources». Target 20:1, 79–102

Degrees of Translation automation

• The term traditional human translation is understood to refer to translation without any kind of automation
• Fully automatic high quality translation (FAHQT) means translation that is performed wholly by the computer, without any kind of human involvement, and is of “high quality”
• Human-aided machine translation (HAMT) refers to systems in which the translation is essentially carried out by the program itself, but some aid is required from humans
• Machine-aided human translation (MAHT) comprises any process or degree of automation in the translation process, provided that the mechanical intervention provides some kind of linguistic support.

Tools vs. Resources

• The word tool refers to computer programs that enable translators to carry out a series of functions or tasks with a set of data that they have prepared and, at the same time, allow a particular kind of result to be obtained.
  ◦ Internet search engines
  ◦ Word processors
  ◦ Trados, Wordfast, Déjà Vu, Across, OmegaT, …
  ◦ AntConc, WordSmith, …

• By resources we refer to all sets of previously gathered linguistic data which are organized in a particular manner and made available in some electronic format so that they can be looked up or used by translators in the course of some phase of processing.
  ◦ Terminological databases (e.g. IATE), glossaries, …
  ◦ (online) dictionaries
  ◦ British National Corpus, …

Why and how can we mine the web?

Extended units of meaning

Words must be studied in context rather than in isolation

• the study of words “by presenting them in the company they usually keep - that is to say, an element of their meaning is indicated when their habitual word accompaniments are shown”
• “Extended units of meaning” at work in language (Sinclair, 1996)
  ◦ collocation
  ◦ colligation
  ◦ semantic preference
  ◦ semantic prosody
• Differences in Italian between (from Taylor, 1998: 61):
  ◦ “pressione alta” = “high (blood) pressure” [medical]
  ◦ “alta pressione” = “(banks of) high pressure” [meteorological]

Collocation

• “Tendency of certain words to co-occur regularly in a given language” (Mona Baker, 1992: 47)
  ◦ As observed in actual texts (vs. intuition)
• Key features of collocations
  ◦ language-specific (collocations vary from language to language)
• Collocations are not stable or fixed
  ◦ they may change diachronically (over time) in general language
  ◦ they may change in LSP vs. general language
  ◦ they may change across LSP domains
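Collocates are observed in texts rather than guessed, so even a very small script can make the idea concrete. The sketch below is only an illustration: the function name `collocates`, the window size and the three sample sentences are assumptions made up for this example, not part of any corpus tool mentioned in these slides.

```python
import re
from collections import Counter

def collocates(text, node, span=2):
    """Count the words found within `span` tokens left or right of `node`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Left and right context windows around this occurrence of the node
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Invented sample text (not taken from a real corpus)
sample = (
    "The patient has high blood pressure. "
    "High blood pressure can cause damage. "
    "A bank of high pressure brings dry weather."
)

for word, freq in collocates(sample, "pressure").most_common(3):
    print(word, freq)
```

A real study would use a proper corpus and a statistical association measure rather than raw counts; the point here is only that collocation is something that can be counted.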

Semantic prosody

• “A consistent aura of meaning with which a form is imbued by its collocates” (Louw 1993)
• “Feeling” or “aura” that is evoked by using certain words (reinforced by collocates, due to co-selectional implications and restrictions)
• Usually this feeling is “positive or negative”
  ◦ “Provide” tends to occur with words denoting things which are desirable, necessary or good, such as “information”, “service(s)”, “support”, “help”, “money”, “protection”, “food”, “care” (cf. Italian “fornire” and “elargire”)
  ◦ “Cause” tends to occur with words denoting negative repercussions/consequences, such as “pain”, “damage”, “harm” (cf. Italian “causare”)
• Not necessarily accessible to intuition.

Semantic preference

• Relation between a lemma and a set of semantically related words (Stubbs, 2001: 65)
  ◦ Lemma: base form (lexeme) or dictionary entry of a word
• “Commit” is used with a group of semantically similar words, e.g. “murder”, “crime”, “suicide” (cf. Italian “commettere”)
• “Revoke” is used with e.g. “licence”, “permit”, “authorization”
• Semantic prosody → positive/negative evaluation
• Semantic preference → relation to words belonging to a particular, definable semantic field

Colligation

• Relation between a pair of grammatical categories or a pairing of lexis and grammar (Stubbs, 2001: 65)
• hear, notice, see, watch enter into colligation with the sequence of object + either the bare infinitive or the -ing form, e.g.
  ◦ We heard the visitors leave/leaving.
  ◦ We noticed him walk away/walking away.
  ◦ We heard Pavarotti sing/singing.
  ◦ We saw it fall/falling.
• Corresponding collocations and colligations in Italian for “break the law”?

Conclusion on using the Web for specialised translation – Main advantages

• massive amount of texts and multi-source information can be searched
• content is constantly “refreshed” (i.e. updated and extended)
• a lot of sources, text types and domains/topics are represented
• many languages (English is dominant, good presence of Italian)
• replicable search techniques across (your working/target) languages
• it is available at any time, at virtually no cost!

Main disadvantages and problems

• need to differentiate good/reliable sources from questionable information
  ◦ for facts (limited control over user-generated content like Wikipedia)
  ◦ for linguistic usage (badly translated, non-native texts, poor authors)
  ◦ it may be difficult to identify differences between expert/non-expert use
• data/results still need to be interpreted

Main disadvantages and problems

• Google focuses on content/information, rather than linguistic forms
  ◦ the ranking and sorting of results are performed according to criteria like the “popularity” of the websites, or geographic relevance
  ◦ the same search can yield different numbers of hits, depending on unpredictable and uncontrollable factors such as the time of day or the location from which the query is made
  ◦ word counts are therefore not reliable, and it is difficult to compare frequencies to verify translation hypotheses
  ◦ the data on which searches are performed is unstable/changes

Main disadvantages and problems

Particularly relevant to linguists/translators:
• no possible/meaningful sorting of hits/results (esp. L/R-hand collocates)
  - e.g. alphabetical sorting of collocates, from least to most frequent, etc.
  - think of e.g. the “a * range/array of”, “on the verge of” exercises
• punctuation and upper case (capitals) are ignored, e.g. “aids” vs. “AIDS”
• impossible to search parts of words, e.g. start with “geo…”, end in “-itis”
• no lemmatised searches
  - hard to calculate frequencies of specific word combinations
  - e.g. to calculate how frequent the combination “tirare l’acqua al proprio mulino” is, all inflected forms must be searched for
• no POS-sensitive searches
  - e.g. to search for ‘spot’ as a noun vs. as a verb
• no possibility to specify the span occurring between two search terms
  - i.e. the * wildcard can include zero to n words

«Googleology is bad science» (Kilgarriff 2007)
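By way of contrast, several of the searches that a web search engine does not support are straightforward once the texts are stored locally. The following is a minimal sketch using Python's standard `re` module on an invented sample string; the sentences and variable names are assumptions made for illustration only.

```python
import re

# Invented sample text standing in for a locally stored corpus
text = (
    "AIDS research needs funding. Hearing aids are expensive. "
    "Geology and geography are on the verge of merging; "
    "arthritis and bronchitis are on the very verge of a cure."
)

# Case-sensitive search: "AIDS" and "aids" are kept apart
upper = re.findall(r"\bAIDS\b", text)
lower = re.findall(r"\baids\b", text)

# Parts of words: anything starting with "geo" or ending in "itis"
geo_words = re.findall(r"\bgeo\w*", text, flags=re.IGNORECASE)
itis_words = re.findall(r"\b\w+itis\b", text)

# Controlled span: "on" and "verge" separated by exactly one or two words
span = re.findall(r"\bon\b(?:\s+\w+){1,2}\s+verge\b", text)

print(upper, lower)
print(geo_words, itis_words)
print(span)
```

Lemmatised and POS-sensitive searches need linguistic annotation as well, which plain regular expressions do not provide.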


MACHINE TRANSLATION

(MT)


Machine translation (MT): definition and key terms

• Definition of machine translation:
  “computerised systems responsible for the production of translations from one natural language into another, with or without human assistance” (Hutchins & Somers, 1992: 3)
  ◦ Human intervention is not necessarily excluded, but if it does occur it is subordinated to the prevailing action of the computer

• Some key terms:
  ◦ MT system / engine / service = the software that produces the translation
  ◦ input = the source text (i.e. the original that we are trying to translate)
  ◦ [raw] output = [unedited] target text (i.e. the translation that we obtain)

MT – popular conceptions

Probably the translation technology that attracts the most public attention, esp. among non-translators. Two extreme positions about MT:

1. MT is totally useless and a waste of time and money, as the quality of the output is generally very low (funny anecdotes)
   Underestimates possibilities
2. MT will bring down language barriers; in a few years’ time MT will be as good as human translation, no more need for translators
   Underestimates limitations

Quality varies according to language pairs, integrated tools (MT that learns) and pre-editing.
There will be more pre-editing and post-editing jobs, for which human expertise is required → new spheres of activity for translators/language professionals

“L'inglese di Expo non sembra Google Translate, è Google Translate” (“Expo's English doesn't look like Google Translate, it is Google Translate”)

From http://www.linkiesta.it/it/blog-post/2015/02/12/linglese-di-expo-non-sembra-google-translate-e-google-translate/22476/


Machine translation (MT): main architectures of MT systems

Parallel corpora: a collection of original texts in language L1 and their translations into a given L2

[Figure: Texts in SL / Texts in TL]

Machine translation (MT): why is MT so difficult? Or why is translation difficult for computers?

• So why is translation difficult for computers?
  ◦ Some blame the computer’s lack of “real-world knowledge”
  ◦ Focus on potential translation problems for EN-IT (with a computer!!)
  ◦ A simple example: lexical gaps and lexical asymmetries (concrete nouns)
    - legno / bosco / foresta in IT (+ EN, FR, DE and your other languages…)

IT   legno   bosco   foresta
EN   wood    wood    forest
FR   bois    bois    forêt

Machine translation (MT): why is MT so difficult? Or why is translation difficult for computers?

• Partly because the translation often depends on the context / situation, which the computer is not able to take into account

“The ball is in your court”
  ◦ “Il pallone è nella vostra metà campo” (the manager to the players)
  ◦ “Il ballo è nella vostra corte” (the chamberlain to the king)

Machine translation (MT): why is MT so difficult? Or why is translation difficult for computers?

• Lexical ambiguities (gramm. category <-> meaning <-> translation)

for example, in EN: round
j) My team was eliminated in the first round (Noun: girone)
k) The cowboy started to round up the cattle (Verb: radunare)
l) We can use the round table for dinner (Adjective: rotondo)
m) Maggie is going on a cruise round the world (Preposition: intorno al)

• These sentences are ambiguous and very complex (for MT!):
  Time flies like an arrow
  Gas pump prices rose last time oil stocks fell
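The link between grammatical category and translation in the “round” example can be pictured as a lookup keyed on both the word and its part of speech. This is a toy sketch under stated assumptions: `LEXICON` and `translate_word` are hypothetical names, the POS labels are supplied by hand, and no real MT system is claimed to work this way.

```python
# Hypothetical mini-lexicon: the Italian equivalent of "round" depends on
# the part of speech, as in the four example sentences.
LEXICON = {
    ("round", "NOUN"): "girone",
    ("round", "VERB"): "radunare",
    ("round", "ADJ"): "rotondo",
    ("round", "PREP"): "intorno al",
}

def translate_word(word, pos):
    """Look up a word together with its POS tag; the tag must come from elsewhere."""
    return LEXICON.get((word.lower(), pos), f"<no entry for {word}/{pos}>")

for pos in ("NOUN", "VERB", "ADJ", "PREP"):
    print(pos, "->", translate_word("round", pos))
```

The hard part for a machine is deciding which tag applies in a given sentence, which is exactly what the ambiguous examples illustrate.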

Machine translation (MT): some linguistic phenomena that are particularly difficult for MT

• The case / example of pronominal anaphora (resolution), difficult for MT

1) The chimp eats the banana because it is greedy. (it = the chimp)
2) The chimp eats the banana because it is ripe. (it = the banana)
3) The chimp eats the banana because it is lunchtime. (it = ?)


MT post-editing

MT post-editing

• The aim of post-editing is to make the revised output usable or understandable, with the least possible effort (quickly)
• The priority is to save time and money
• The extent and the accuracy of post-editing are negotiated/specified on a case by case basis, depending on the needs and requirements
• Different “types” and levels of post-editing (in companies, organisations):
  ◦ no post-editing: internal circulation, almost never external publication
  ◦ minimum post-editing: internal circulation, rarely external publication
  ◦ full/complete post-editing (but… is it worth it?): very rarely internal circulation, mostly external publication

MT post-editing: introduction

• new skill that is acquired with experience, different from translation
• in this scenario one has to balance and optimise quality-speed-cost, in relation to the intended use/duration of the translation
  ◦ length of use of the document
  ◦ needs and expectations of the end user(s)
  ◦ ability of the readers/addressees to make use of the doc.
  ◦ type, length and “visibility” of the document
  ◦ available and viable options

Aims and level of PE (vs. translation/proofreading!)

• The aims and level of PE (minimum/full-complete) are decided specifically, case by case
• Factors to be considered (prioritised)
  ◦ save time and money (quality is less relevant)
  ◦ understandability and correctness of general meaning are key
• Factors to be ignored (irrelevant in PE)
  ◦ any detail or nuance
  ◦ elegance, fluency, naturalness of expression, etc.

On average, PE is paid at roughly 50% of the rate for “real/proper” translation


MT pre-editing

Limit input domain / topic

• There are two possibilities to limit the texts / language in / for MT:
  ◦ adopt a controlled language (restricted input)
  ◦ use the sublanguage approach
• Common aims with both options (to the advantage of MT):
  ◦ limited vocabulary
  ◦ more certainty on interpretation
  ◦ reduce syntactic variation

Controlled language

• Prescriptive rules aimed at normalising the style of the input (ST), e.g.
  ◦ do not write sentences with more than 20 words (general, language-neutral)
  ◦ avoid passive constructions, use only active verb forms
  ◦ avoid anaphoras, make all subjects and pronominal references explicit
  ◦ in EN: do not omit “that” in relative clauses (language-specific)
  ◦ in IT: do not use “solo” as an adverb, but use “soltanto/solamente”
  ◦ in IT: use the word “minuto” only as a noun (i.e. to mean 60 seconds); for the adjectival meaning, use only “piccolo”
  ◦ etc.

The result of controlled language is restricted input
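Because controlled-language rules are prescriptive, some of them can be checked mechanically before a text is sent to MT. The sketch below covers only two of the rules listed (the 20-word limit and a very rough passive detector for English); the function name `check_sentence` and the passive heuristic are assumptions made for illustration and will miss or over-flag many real cases.

```python
import re

MAX_WORDS = 20  # sentence-length limit taken from the rule on this slide

def check_sentence(sentence):
    """Return a list of controlled-language warnings for one English sentence."""
    warnings = []
    words = re.findall(r"[A-Za-z']+", sentence)
    if len(words) > MAX_WORDS:
        warnings.append(f"too long: {len(words)} words (max {MAX_WORDS})")
    # Rough heuristic only: a form of "to be" followed by a word ending in -ed
    if re.search(r"\b(is|are|was|were|be|been|being)\s+\w+ed\b", sentence, re.IGNORECASE):
        warnings.append("possible passive construction")
    return warnings

print(check_sentence("The cover was removed by the technician."))
print(check_sentence("Remove the cover."))
```

Real controlled-language checkers rely on full linguistic analysis; a regular expression is enough only to show the principle of flagging input before translation.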

Sublanguage (1/2)

• Natural/normal behaviour of language within a well-defined domain (~ LSP, specialised language, jargon, etc.)
  ◦ “sub-” in the mathematical sense as in “subset”, not derogatory!
  ◦ referred to very well-defined, enclosed, limited domains and texts
• A sublanguage exists and is used regardless of MT, but one can design an MT system that takes advantage of this sublanguage
• vocabulary
  ◦ limited (relatively few concepts to be covered/expressed)
  ◦ finite/closed (innovation/deviation tend to be avoided)
  ◦ a few homographs, in general limited use of synonyms and coreferences
• syntax
  ◦ limited range of structures and constructions (regularity + repetitiveness)
• usually sublanguages are very similar cross-linguistically between SL/TL(s)

Machine translation (MT): restrictions to the use of MT

• Input must be in (or converted into) electronic format
• Correct formatting and layout of the input are very important
  ◦ the word “e r r o r” (spaced letters) would not be recognised / translated
  ◦ spelling and typos are crucial: THEY BOOKS A ROOM … (anybody would understand banal mistakes, but not an MT system!)
• Limited availability of language combinations (improving with SMT)
  ◦ coverage mostly limited to “usual” big languages with commercial interest


COMPUTER-ASSISTED TRANSLATION

(CAT) TOOLS


Computer-assisted translation (CAT) tools

• Computer-assisted translation or computer(machine)-aided translation (CAT) refers to a variety of tools, a family of software products designed to support professional translators in their work.
• CAT is a “recent” development, derived from MT over the last 20 years
• The actual development of commercial CAT tools started in the 1990s: the so-called “translator’s workstation / workbench”, which includes
  ◦ terminology management packages
  ◦ translation memory (TM) software (+ text alignment software, etc.)
• CAT tools are pieces of software designed to enhance the work of translators:
  ◦ maximise speed → higher productivity
  ◦ improve coherence and precision → higher quality

CAT tools, example 1: terminology management packages

• Used to create, store, retrieve and manipulate bi-/multilingual termbases/glossaries
• As searching for terminology can be highly time-consuming (even up to 75% of translators’ time), setting up a database which gathers the terminology you come across is vital.
• Lists in word processors / spreadsheets (e.g. Excel) → limited options for presenting and sorting data
• The terminology covered is usually that of a given (sub-)discipline or the terms needed for a specific translation project.
• Terminology records consist of a number of flexible fields

CAT tools, example 2: translation memory (TM) software

• Translation memory (TM):
  “multilingual text archive containing […] multilingual texts, allowing storage and retrieval of aligned text segments against various search conditions” (EAGLES* 1995)
  * Evaluation of Natural Language Processing Systems
• This roughly means: a “filing cabinet” (i.e. a database) of old translations whose bits can be retrieved and used when / as needed by the translator
  ◦ essentially a textual database that can be searched
  ◦ pairs of source-text and target-text segments

Note: Translation memory indicates both the software tool and the contents of the database, i.e. the whole set of aligned text segments that it includes

Translation memory (TM) software

• Key idea: recycle similar past translations, never translate the same (or a similar) text twice
• How it works:
  ◦ TM tools divide the source text, which must be in (or turned into, e.g. with OCR) electronic/digital format, into segments, which translators can translate one-by-one in the traditional way.
  ◦ These segments (usually sentences, or even phrases) are then sent to a built-in database. When there is a new source segment equal or similar to one already translated, the memory retrieves the previous translation from the database.
• When is this most useful:
  ◦ for the translation of any text that has a high degree of repeated terms and phrases which must be translated consistently, as is the case with e.g. user manuals, computer products and subsequent versions of the same document (e.g. website updates).
  ◦ mostly relevant to technical/specialised translation (not literature)
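The segment, store and retrieve cycle of a TM can be sketched in a few lines. This is a deliberately naive illustration and not the way any commercial tool is implemented: the `segment` rule, the `TranslationMemory` class and the sample sentence pair are all assumptions invented for the example, and only exact matches are handled.

```python
import re

def segment(text):
    """Split a text into sentence-like segments (naive rule: break after . ! or ?)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

class TranslationMemory:
    """Toy TM: a dictionary mapping source segments to their stored translations."""

    def __init__(self):
        self.units = {}

    def store(self, source, target):
        self.units[source] = target

    def lookup(self, source):
        # Exact match only; returns None when the segment is new
        return self.units.get(source)

tm = TranslationMemory()
tm.store("Turn off the printer.", "Spegnere la stampante.")  # invented example pair

new_text = "Turn off the printer. Remove the paper tray."
for seg in segment(new_text):
    print(seg, "->", tm.lookup(seg) or "[no match: translate manually]")
```

Once the translator has dealt with the second segment, storing it with `tm.store(...)` makes it available for the next version of the document, which is the recycling idea described in the slide.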

Using translation memory (TM) software

• Scenario
  ◦ you have to translate the user manual of a printer (new model) from English into Italian
  ◦ a lot of repetition within the document itself
  ◦ overlap and repetitions across updated (old-new) versions of the documentation
  ◦ you have a relevant TM (similar topic / domain / texts / clients)
  ◦ you translated the previous manual(s)
  ◦ TM provided by client / translation agency / colleague

Using translation memory (TM) software

• Translation of a printer manual English (A) → Italian (B)

Source text (in language A)
ST: There are 4 ways to change print settings for this printer

Exact/Perfect match (everything in the segment is exactly the same)
A: There are 4 ways to change print settings for this printer
B: Ci sono 4 modi per cambiare le impostazioni di stampa di questa stampante

Full match (only figures, dates and similar small details are different)
A: There are 2 ways to change print settings for this printer
B: Ci sono 2 modi per cambiare le impostazioni di stampa di questa stampante

Using translation memory (TM) software

Source text (in language A)
ST: “There are 4 ways to change print settings for this printer”

Fuzzy match 85% similar (a few words in translation unit are different)
A: “There are several ways to change print settings for the printer”
B: “Ci sono vari modi per cambiare le impostazioni di stampa alla stampante”

Fuzzy match 60% similar (some words in translation unit are different)
A: “There are several ways to modify the default setting of your printer”
B: “Ci sono vari modi per modificare l’impostazione standard della tua stampante”

• With the acceptability threshold of the TM tool set at 75%, no candidate translation unit under that level of similarity is retrieved and shown to the translator!!
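The threshold mechanism can be sketched with Python's standard `difflib.SequenceMatcher`, which scores two strings by the characters they share. This is an assumption chosen for illustration: commercial TM tools use their own similarity measures, so the scores printed here should not be expected to reproduce the 85% and 60% figures of the slide. The names `similarity` and `retrieve` are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-based similarity between two segments, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def retrieve(query, memory, threshold=0.75):
    """Return (score, source, target) for stored units at or above the threshold, best first."""
    scored = [(similarity(query, src), src, tgt) for src, tgt in memory.items()]
    return sorted((hit for hit in scored if hit[0] >= threshold), reverse=True)

# Translation units taken from the printer-manual example
memory = {
    "There are several ways to change print settings for the printer":
        "Ci sono vari modi per cambiare le impostazioni di stampa alla stampante",
    "There are several ways to modify the default setting of your printer":
        "Ci sono vari modi per modificare l’impostazione standard della tua stampante",
}

query = "There are 4 ways to change print settings for this printer"
for score, src, tgt in retrieve(query, memory):
    print(f"{score:.0%}  {src}  ->  {tgt}")
```

Raising or lowering `threshold` changes how many candidates the translator sees, which mirrors the trade-off between missing useful suggestions and being shown unhelpful ones.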

• CAT tools - Advantages
  ◦ can speed up the translation process and increase productivity
  ◦ can improve translation quality (by enhancing terminological and phraseological coherence)
  ◦ can help translators provide quotations
  ◦ allow for collaboration over large projects
    - TMs/termbases can be shared by several translators and updated in real time
• Useless for some text types (e.g. literature)
• Essential for many specialized/technical domains
  ◦ Translation agencies require translators to use (specific types of) CAT tools

Some issues about TMs

• Technical/practical issues
  ◦ different approaches: some CAT tools have a proprietary, stand-alone text editor, others are «integrated» (e.g. into a word processor), some recent ones are fully online
  ◦ proprietary vs. interchange formats
  ◦ no matches calculated below sentence level (e.g. at phrase level)
    - but the Concordance function is becoming standard
  ◦ criteria used to define similarity / matches
    - matching is calculated not on the basis of sentence or word meaning, but on the basis of character-string similarity

TP: I bambini giocano in gruppo con il pallone [“The children play in a group with the ball”]
FM1: I pampini giovano il grullo con il tallone (94% match) [nonsense sentence with nearly the same characters]
FM2: I bimbi si divertono giocando a calcio insieme (42% match) [“The kids have fun playing football together”]

Some issues about TMs

• Language/translation issues
  ◦ segmentation implies that overall perception of the ST/TT is lost → ST structure tends to be reproduced in TT
  ◦ cross-linguistic differences in e.g. cohesive patterns might be overlooked
  ◦ using TMs limits the translator’s creativity, as s/he is usually expected to use the terminology and phraseology included in the TM
  ◦ TMs can sometimes be reversed, as if translation direction did not matter…
  ◦ need to control the reliability of translations within the TM


CORPORA AND TRANSLATION


What is a corpus? Some (authoritative) definitions

• “a collection of naturally-occurring language text, chosen to characterize a state or variety of a language” (Sinclair, 1991: 171)
• “a collection of texts assumed to be representative of a given language, dialect, or other subset of a language, to be used for linguistic analysis” (Francis, 1992: 7)
• “a closed set of texts in machine-readable form established for general or specific purposes by previously defined criteria” (Engwall, 1992: 167)
• “a finite-sized body of machine-readable text, sampled in order to be maximally representative of the language variety under consideration” (McEnery & Wilson, 1996: 23)
• “a collection of (1) machine-readable (2) authentic texts […] which is (3) sampled to be (4) representative of a particular language or language variety” (McEnery et al., 2006: 5)

What is / is not a corpus…?

• A newspaper archive on CD-ROM?
• An online glossary?
• A digital library (e.g. Project Gutenberg)?
• All RAI 1 programmes (e.g. for spoken TV language)?

The answer is always “NO” (see definition)

Corpora vs. web

• Corpora:
  ◦ Usually stable
    - searches can be replicated
  ◦ Control over contents
    - we can select the texts to be included, or have control over selection strategies
  ◦ Ad-hoc linguistically-aware software to investigate them
    - concordancers can sort/organise concordance lines

• Web (as accessed via Google or other search engines):
  ◦ Very unstable
    - results can change at any time for reasons beyond our control
  ◦ No control over contents
    - what/how many texts are indexed by Google’s robots?
  ◦ Limited control over search results
    - cannot sort or organise hits meaningfully; they are presented randomly

What types of corpora exist? A brief overview

• A corpus is a principled collection of naturally occurring electronic texts designed to be a representative sample of language in actual use
• Some of the main features and criteria used to describe and classify corpora:
  ◦ general / specialised
  ◦ written / spoken (transcribed) / multimodal (audio/video)
  ◦ balanced (sample) / opportunistic
  ◦ synchronic / diachronic
  ◦ static / dynamic
  ◦ closed (finite) / open-ended (monitor)
  ◦ raw (pre-corpus) / marked-up, POS-tagged, annotated (augmented)
  ◦ monolingual / bi- or multilingual
  ◦ parallel / comparable

An example of planned balance: the British National Corpus

• 100 million words of contemporary spoken and written British English
• Representative of British English “as a whole”
• Designed to be appropriate for a variety of uses: lexicography, education, research, commercial applications (computational tools)
• Balanced with regard to genre, subject matter and style
• Sampling and representativeness very difficult to ensure

Dynamic (Monitor) vs static (Finite)

• A static corpus will give a snapshot of language use at a given time
  ◦ Easier to control balance of content
  ◦ May limit usefulness, esp. as time passes
• A dynamic corpus is ever-changing
  ◦ Called “monitor” corpus because it allows us to monitor language change over time


Concordance for nodeword “eyes” (sorted 1L) generated from the BNC
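What “sorted 1L” means can be shown with a small key-word-in-context (KWIC) script: each hit is displayed with its left and right context, and the lines are ordered by the first word to the left of the node. The sketch below uses an invented sample text, not BNC data, and the function names `kwic` and `sort_1l` are assumptions for this illustration.

```python
import re

def kwic(text, node, width=3):
    """Return (left context, node, right context) token lists for each hit of `node`."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines

def sort_1l(lines):
    """Sort concordance lines by the first word to the left of the node (1L)."""
    return sorted(lines, key=lambda line: line[0][-1] if line[0] else "")

# Invented sample text (not taken from the BNC)
sample = (
    "She closed her eyes. His blue eyes widened. "
    "He rubbed tired eyes and opened his eyes again."
)

for left, node, right in sort_1l(kwic(sample, "eyes")):
    print(f"{' '.join(left):>25}  {node}  {' '.join(right)}")
```

Sorting on the left or right neighbour groups recurrent patterns together, which is what makes collocates visible at a glance in a concordancer.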

Parallel vs. comparable multilingual corpora

Parallel (translational) corpora
• contain translationally “equivalent” texts: STs and their corresponding TTs
• need to be aligned, usually at the sentence level, i.e. SL sentence X matched to TL sentence X’
• context is provided to account for “equivalence” and “translation shifts” between ST and TT
• translation direction needs to be clear, i.e. which are the SL and TL components of the corpus

Comparable corpora
• texts originally produced (not translated) in the respective languages
• consist of independent texts which are “similar” according to some pre-determined criteria
• the various language components share a set of common features, e.g. text type, genre, publication span, domain, topic
• parameters defining this similarity vary widely
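Sentence-level alignment is what makes a parallel corpus searchable for equivalents in context: a query on the source side returns each hit together with its aligned translation. The following is a minimal sketch with three invented English-Italian pairs; the data and the function name `search` are assumptions for illustration and do not come from any of the corpora named in these slides.

```python
# Hypothetical sentence-aligned EN-IT pairs (invented for this example)
aligned = [
    ("The committee adopted the proposal.", "La commissione ha adottato la proposta."),
    ("Parliament rejected the proposal.", "Il Parlamento ha respinto la proposta."),
    ("The debate is closed.", "La discussione è chiusa."),
]

def search(pairs, term):
    """Return every (source, target) pair whose source sentence contains `term`."""
    return [(src, tgt) for src, tgt in pairs if term.lower() in src.lower()]

for src, tgt in search(aligned, "proposal"):
    print(src)
    print("   ", tgt)
```

With comparable corpora no such pairing exists, so each language component has to be searched separately and the results compared by the translator.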

Bilingual parallel corpora on the web

• OPUS corpus, opus.lingfil.uu.se
• A variety of multilingual parallel corpora
  ◦ European Parliament debates (EuroParl corpus)
  ◦ European Central Bank corpus
  ◦ UN documents
  ◦ Subtitles (opensubtitle project)
  ◦ Software manuals (PHP, OO)
  ◦ …

http://opus.lingfil.uu.se/ → EuroParl v7 search interface

[Screenshot callouts: Query; Choose SL; Choose TL(s); Sort + Launch the query; Other useful functions; help]

Comparable Eng/Ita corpus on botany


Summing up: corpus use in translation

Main uses:
• Test/generate hypotheses as to interpretation of the source text, and as to appropriate translations
  ◦ helpful when you’re dealing with little-known text types / domains
  ◦ helpful when you’re dealing with a little-known language
• Improve quality: capture subtleties of source text, produce translations which read like native speaker texts

More precisely:
• Reference corpora provide insights on phraseological regularities in discourse
• Comparable corpora (automatic and manual) can be used for (contrastive) specialised/genre-controlled text analysis
• Parallel corpora provide equivalents in context/evidence of translation strategies (and are more versatile than TMs)