

ROM 2017 abstract

PANTERA: a parallel corpus to study translation between Portuguese and Norwegian

Diana Santos

Linguateca & University of Oslo

This paper presents an ongoing project, PANTERA, which aims to identify all translations ever published between Portuguese and Norwegian, and to obtain a sample of each for the study of translation between the two languages.

PANTERA stands for Portuguese And Norwegian Texts for Education, Research and Acquisition of relevant knowledge... and started in the autumn of 2013 as a cooperation between Linguateca and the University of Oslo. Although the corpus is relatively small, it is already one of the most diversified parallel corpora in the world, if one considers source diversity – in number of authors, time stamps and genres...

PANTERA’s development was inspired by the English-Norwegian Parallel Corpus, ENPC [4] and COMPARA [2], and, just like other similar projects – such as CorTrad and PoNTE – uses DISPARA [5] as the underlying software system. PANTERA allows the study of translation and of the similarities and differences between the two languages and is publicly available for search at http://www.linguateca.pt/PANTERA/.

Building the corpus involves the following steps: we revise the digitized material in the two languages (or digitize it if it is not electronically available), then apply automatic alignment using the Open CWB workbench, and perform automatic (syntactic) annotation in the two languages. PALAVRAS [1] is used for Portuguese and the Oslo-Bergen-tagger [3] for Norwegian. We also apply some semantic annotation, so far only to the Portuguese part.
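To make the alignment step concrete, the following is a minimal sketch of length-based sentence alignment by dynamic programming. It is not the algorithm used by the Open CWB workbench or by PANTERA; it is a simplified stand-in, and the bead penalties, the cost function and the names `BEADS`, `length_cost` and `align` are assumptions chosen for illustration.

```python
from math import inf

# Allowed bead shapes (source sentences, target sentences) and a fixed
# penalty for each shape. The values are illustrative assumptions.
BEADS = {(1, 1): 0.0, (1, 2): 1.0, (2, 1): 1.0, (1, 0): 3.0, (0, 1): 3.0}


def length_cost(src, tgt):
    """Relative difference in character length between two sentence groups."""
    a, b = sum(map(len, src)), sum(map(len, tgt))
    return abs(a - b) / max(a, b, 1)


def align(src_sents, tgt_sents):
    """Return a list of (source group, target group) beads of minimal cost."""
    n, m = len(src_sents), len(tgt_sents)
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            # Try to extend the partial alignment with every bead shape.
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = (best[i][j] + penalty
                        + length_cost(src_sents[i:ni], tgt_sents[j:nj]))
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)
    # Follow the back-pointers from the end to recover the beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src_sents[pi:i], tgt_sents[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling `align(portuguese_sentences, norwegian_sentences)` on two lists of sentence strings returns groups of sentences that plausibly translate each other. In a real corpus the output of any such automatic step would still need the kind of manual revision described above.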

Currently 371 texts have been identified for inclusion in PANTERA: 192 translations of Norwegian texts into Portuguese, and 179 translations of texts in Portuguese into Norwegian (see PANTERA’s site for a detailed description of the texts and the sizes of the excerpts). Further information about each translation instance (163 different authors and more than 167 translators) will be stored in the STIG system, in progress (https://stig.hf.uio.no/).


STIG aims to be a platform for all interested in translation studies in general, and especially for the Portuguese-Norwegian pair.

Figure 1 provides a bird’s eye view in terms of the (source) Portuguese texts included, chronologically. (Gil Vicente, born in the 15th century, as well as navigation chronicles from 1500 have been omitted so as not to bias the picture.)

Figure 1: Publication dates of originals in Portuguese after 1800

Figure 2 provides a bird’s eye view in terms of the (source) Norwegian texts included, chronologically.

Figure 2: Publication dates of Norwegian originals


Above the axis are the translations in Portugal, below in Brazil. In the paper, we will discuss the problems involved in labelling and identifying the texts, as well as possible reasons for the choice of the actual translations.

Some linguistic studies already performed using this corpus will be presented:

• the contrast between respeito (PT) and respekt (NO) as an illustration of culture differences reflected by language

• a comparison between the use of the body parts dedo (PT) and tå and finger (NO) as an illustration of different conceptualizations of the body in the two languages

• the identification of possessive datives and null objects in Portuguese using the translations into Norwegian, as a technique to elicit complex syntactic phenomena based on contrastive patterns
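As an illustration of the last point, the sketch below shows one way a contrastive pattern could be used to collect candidate possessive datives from aligned sentence pairs. It is a hypothetical heuristic, not the procedure used in the studies mentioned: the word lists are deliberately incomplete, the sentence pairs are invented for the example and do not come from PANTERA, and the function names are assumptions.

```python
import re

# Deliberately small, incomplete word lists (assumptions for illustration).
PT_DATIVE_CLITICS = {"me", "te", "lhe", "lhes", "nos", "vos"}
PT_POSSESSIVES = {
    "meu", "minha", "meus", "minhas", "teu", "tua", "teus", "tuas",
    "seu", "sua", "seus", "suas", "nosso", "nossa", "nossos", "nossas",
}
NO_POSSESSIVES = {
    "min", "mitt", "mine", "din", "ditt", "dine", "sin", "sitt", "sine",
    "hans", "hennes", "vår", "vårt", "våre", "deres",
}


def tokens(sentence):
    """Lower-cased word forms; hyphenated clitics are split off."""
    return set(re.findall(r"\w+", sentence.lower()))


def possessive_dative_candidates(pairs):
    """Keep pairs where Norwegian marks possession with a possessive and
    Portuguese has a dative clitic but no possessive."""
    hits = []
    for pt, no in pairs:
        pt_tok, no_tok = tokens(pt), tokens(no)
        if (no_tok & NO_POSSESSIVES
                and pt_tok & PT_DATIVE_CLITICS
                and not pt_tok & PT_POSSESSIVES):
            hits.append((pt, no))
    return hits


# Invented example pairs (Portuguese, Norwegian), not taken from the corpus.
TOY_PAIRS = [
    ("Ela lavou-lhe as mãos.", "Hun vasket hendene hans."),
    ("Ela lavou as suas mãos.", "Hun vasket hendene sine."),
    ("O João chegou.", "João kom."),
]

candidates = possessive_dative_candidates(TOY_PAIRS)
```

A surface filter of this kind only proposes candidates. Forms such as nos are ambiguous in Portuguese, so the hits would still have to be checked by hand or against the syntactic annotation.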

Finally, the use of the corpus for the semi-automatic creation of exercises and other teaching materials with the Ensinador paralelo tool [6] will be discussed in the context of the daily work of a university teacher of Portuguese for Norwegians.

The paper will end with some considerations about the possible uses of the resource, while also warning against some dangers of uninformed uses of the corpus.

References

[1] Eckhard Bick. The Parsing System “Palavras”: Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. PhD thesis, Aarhus University, Aarhus University Press, November 2000.

[2] Ana Frankenberg-Garcia and Diana Santos. Introducing COMPARA, the Portuguese-English parallel translation corpus. In Federico Zanettin, Silvia Bernardini and Dominic Stewart, editors, Corpora in Translation Education. St. Jerome Publishing, Manchester, 2003, p. 71–87.

[3] Janne Bondi Johannessen, Kristin Hagen, André Lynum and Anders Nøklestad. OBT+Stat. A combined rule-based and statistical tagger. In Gisle Andersen, editor, Exploring Newspaper Language. Corpus compilation and research based on the Norwegian Newspaper Corpus, 2012, p. 51–65.


[4] Signe Oksefjell. A Description of the English-Norwegian Parallel Corpus: Compilation and Further Developments. International Journal of Corpus Linguistics, 4(2):197–216, 1999.

[5] Diana Santos. DISPARA, a system for distributing parallel corpora on the Web. In Nuno Mamede and Elisabete Ranchhod, editors, Advances in Natural Language Processing (PorTAL 2002), Faro, Portugal, 23-26 June 2002, p. 209–218.

[6] Diana Santos and Alberto Simões. Ensinador paralelo: Alicerces para uma pedagogia nova. In Alberto Simões, Anabela Barreiro, Diana Santos, Rui Sousa-Silva and Stella E. O. Tagnin, editors, Linguística, Informática e Tradução: Mundos que se Cruzam. Homenagem a Belinda Maia, 29 March 2015, p. 235–252.
