
Using a parallel corpus in translation practice and research

Ana Frankenberg-Garcia

[email protected]

Machine Translation

Using machines to analyse Human Translation

The study of human translation

Traditionally not a hard science

Difficult to be systematic

But with the technology of corpus linguistics, things can change …

What is a corpus?

large

specific criteria

text-retrieval software

machine-readable

Advantages of using corpora to study human translation

An enormous amount of translated texts

Systematic analyses

Quantifiable results

A bi-directional parallel corpus of Portuguese and English

COMPARA

Project leaders: Ana Frankenberg-Garcia & Diana Santos

Research assistants: Rosário Silva & Susana Inácio

Initial support (1999-2000): FCT (Portugal), ISLA (Lisboa), Oxford University (Language Centre)

Present funding (2001-2006): Linguateca: FCT/POSI (POSI/PLP/43931/2001)

COMPARA structure

[Diagram: PT source texts paired with EN translations; EN source texts paired with PT translations]

Source texts: original Portuguese, original English

Translations: translated English, translated Portuguese

COMPARA 8.0 varieties

PORTUGUESE: Portugal, Brazil, Angola, Mozambique

ENGLISH: UK, US, South Africa

Unbalanced distribution!

COMPARA 8.0 Publication dates

[Timeline figure; dates shown: 1837, 2002, 1880, 1997, 1988, 1914]

COMPARA 8.0 genre

Published fiction

Other genres: EXTENSIBLE

COMPARA 8.0 authors

Portuguese writers

Camilo Castelo Branco

Eça de Queirós

José Cardoso Pires

José Saramago

Jorge de Sena

Lídia Jorge

Mário de Carvalho

Sá Carneiro

COMPARA 8.0 authors

Brazilian writers

Aluísio Azevedo

Autran Dourado

Chico Buarque

Jô Soares

José de Alencar

Machado de Assis

Manuel Antônio de Almeida

Marcos Rey

Patrícia Melo

Paulo Coelho

Rubem Fonseca

COMPARA 8.0 authors

Angolan writers

José Eduardo Agualusa

Mozambican writers

Mia Couto

COMPARA 8.0 authors

British writers

David Lodge

Ian McEwan

Julian Barnes

Joseph Conrad

Joanna Trollope

Kazuo Ishiguro

Lewis Carroll

Mary Shelley

Oscar Wilde

COMPARA 8.0 authors

American writers

Henry James

Edgar Allan Poe

Richard Zimler

South African writers

Nadine Gordimer

Can any text be included in the corpus?

Only published source texts and translations

Only English translated directly from Portuguese and Portuguese translated directly from English

Only human translations!

COMPARA 8.0 texts

71 source texts (extracts)

74 translations

COMPARA 8.0 size

1,536,269 words in English

1,423,937 words in Portuguese

Largest edited parallel corpus containing Portuguese

COMPARA users and uses

Language learners - bilingual dictionary with examples

Language teachers - exercises and tests

Translators - language equivalents

Translation lecturers - exercises & problems

Translation theorists - test translation hypotheses

Lexicographers - bilingual dictionaries

Computational linguists - machine translation

Latest statistics: more than 6,000 queries per month

COMPARA availability

Free, online

For research and education

www.linguateca.pt/COMPARA/

COMPARA access

[Screenshot: COMPARA search for “nodded”]
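A search such as the one for “nodded” amounts to looking up a form in one language and reading each hit side by side with its aligned counterpart in the other. The sketch below illustrates that idea only; it is not COMPARA's own interface or query engine. The `concordance` function, the `pairs` data, and the example sentences are all invented for illustration.

```python
import re
from typing import List, Tuple

# One aligned unit: (English text, Portuguese text).
AlignedPair = Tuple[str, str]


def concordance(pairs: List[AlignedPair], pattern: str) -> List[AlignedPair]:
    """Return every aligned pair whose English side matches the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [(en, pt) for en, pt in pairs if rx.search(en)]


# Hypothetical aligned units (invented examples, not taken from COMPARA).
pairs = [
    ("She nodded.", "Ela acenou com a cabeça."),
    ("He nodded in agreement.", "Ele concordou."),
    ("They left early.", "Eles saíram cedo."),
]

hits = concordance(pairs, r"\bnodded\b")
for en, pt in hits:
    # Print the English hit next to its Portuguese counterpart.
    print(f"{en}\t{pt}")
```

Reading the two columns together is what lets a translator or lexicographer see how many different renderings a single word receives.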

Studies using COMPARA

1. Observing source texts and translations

2. Contrasting Portuguese and English

3. Comparing translated and untranslated language

4. Examining the characteristics of translated texts

1. Observing source texts & translations

Improving bilingual dictionaries and machine-translation programs

Frankenberg-Garcia (2002): nod

Ribeiro & Dias (2005): grande

Specia et al. (2005): word-sense disambiguation

2. Contrasting English and Portuguese

Contrasting original fiction in English and Portuguese

Frankenberg-Garcia (2005)

[Charts: PT loan words, EN loan words, PT loan languages, EN loan languages]

3. Comparing translated and untranslated language

                    Translations   Source texts *   Difference
diferente(s)        30.7           15.4             2 x
simplesmente        15.6           5.1              3 x
end.* up            13.5           2.8              4 x
lemma “rezar”       5.6            12.4             2 x

* frequency/100 K words in COMPARA 7.0.4
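Frequencies per 100 K words make counts from subcorpora of different sizes comparable. A minimal sketch of that normalisation follows, assuming plain whitespace tokenisation and a Python regular expression standing in for the “end.* up” query; the function names and the two toy texts are hypothetical and are not COMPARA data.

```python
import re


def per_100k(hits: int, corpus_words: int) -> float:
    """Normalised frequency: occurrences per 100,000 words."""
    return hits * 100_000 / corpus_words


def count_matches(pattern: str, text: str) -> int:
    """Count non-overlapping, case-insensitive matches of pattern in text."""
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Hypothetical toy texts standing in for the two subcorpora.
translated = "He ended up leaving. They end up late. She ends up tired."
original = "We end up here. Nothing else happened today at all."

# Assumed approximation of the query: any word starting with "end", then "up".
pattern = r"\bend\w*\s+up\b"

hits_tt = count_matches(pattern, translated)
hits_st = count_matches(pattern, original)

freq_tt = per_100k(hits_tt, len(translated.split()))
freq_st = per_100k(hits_st, len(original.split()))

# How many times more frequent the pattern is in the translated text.
ratio = freq_tt / freq_st
print(f"translated: {freq_tt:.1f}  original: {freq_st:.1f}  ratio: {ratio:.1f}")
```

With real data the same two steps apply: count the hits in each subcorpus, then divide by that subcorpus's size before comparing.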

4. Examining the characteristics of translated texts

Are translations longer than source texts?

Frankenberg-Garcia (2004)

Explicitation Hypothesis

[Figure: study design. Source texts: 8 Portuguese extracts (8 PT authors) and 8 English extracts (8 EN authors), 1,500 words each. Translations: 8 PT translators, 8 EN translators; number of words: ?]

[Figure: length of source texts (ST) compared with translations (TT)]

TT: + 5%

Matched t-test: 95% probability that TT are longer than ST
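A matched (paired) t-test compares each source text with its own translation, so the test is run on the per-pair differences in length. The sketch below shows the calculation using only the standard library. The word counts are invented for illustration and are not the figures from the study; `paired_t` is a hypothetical helper name.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(st_lengths: Sequence[float], tt_lengths: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for a matched-pairs t-test."""
    if len(st_lengths) != len(tt_lengths) or len(st_lengths) < 2:
        raise ValueError("need at least two matched pairs of equal count")
    # Difference for each pair: translation length minus source length.
    diffs = [tt - st for st, tt in zip(st_lengths, tt_lengths)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical word counts (NOT the study's data): 8 source extracts of
# 1,500 words each and the lengths of their translations.
st = [1500] * 8
tt = [1560, 1610, 1495, 1580, 1540, 1620, 1570, 1530]

t, df = paired_t(st, tt)
print(f"t = {t:.2f}, df = {df}")
```

The resulting t value is then compared with the critical value for the chosen significance level and degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel` performs the same test and also returns a p-value.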

To conclude...

Studies such as these were unthinkable before corpora

Many other studies are possible!

COMPARA is free and available online

Contact us: [email protected] [email protected]