European Patent Office
European Machine Translation Programme
Wolfgang Täger
December 2006
Programme Partners and Goals
• Trigger: Success of JP-EN patent translation
• Agreement between the EPO and its Member States:
1. MT of patents, abstracts and communications to and from English
2. Three language pairs per year
3. First three languages: French, German, Spanish (FR, DE, ES)
• Candidates for next year: Swedish, Dutch, Italian, Romanian, Greek
MT engine
Trial with a statistical MT (SMT) system (Language Weaver)
Call for tenders: winner Worldlingo (Systran)
Public launch on esp@cenet: December 2006
Needed: better translations through specialised dictionaries
Dictionary format
Desiderata:
• open standard
• XML, Unicode
• support for the features of the MT engines
• support for conditional translations (e.g. depending on the IPC class)
The format is not intended for terminology: no definitions, and a lexical rather than a semantic focus.
The OLIF format was chosen.
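To illustrate what a conditional translation depending on the IPC class could look like, here is a minimal Python sketch. The XML element and attribute names (`entry`, `target`, `ipc`), the German example "Lager" and the IPC codes are assumptions chosen for illustration; this is not the OLIF schema. The most specific (longest) matching IPC prefix wins, and the entry without a condition serves as the default.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entry format for illustration only; NOT the OLIF schema.
ENTRY_XML = """
<entry lang="DE" pos="noun">
  <source>Lager</source>
  <target lang="EN">storage</target>
  <target lang="EN" ipc="F16C">bearing</target>
</entry>
"""

def load_targets(xml_text):
    """Return (ipc_prefix or None, translation) pairs for one entry."""
    root = ET.fromstring(xml_text.strip())
    return [(t.get("ipc"), t.text.strip()) for t in root.findall("target")]

def translate(targets, ipc_code):
    """Choose the translation whose IPC condition is the longest prefix of
    ipc_code; an entry without a condition acts as the default."""
    matches = [
        (len(cond) if cond else 0, text)
        for cond, text in targets
        if cond is None or ipc_code.startswith(cond)
    ]
    if not matches:
        return None
    # max() keeps the first of several equally long matches
    return max(matches, key=lambda m: m[0])[1]

targets = load_targets(ENTRY_XML)
print(translate(targets, "F16C 33/04"))  # bearing
print(translate(targets, "B65G 1/00"))   # storage
```

Prefix matching mirrors the hierarchical structure of IPC symbols, so a condition on a whole subclass also covers all of its groups.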
How do we get the dictionaries? By bilingual term extraction!
Available corpora
560,000 EP-B publications (claims in English, German and French)
300,000 DE-T2 publications
37,000 ES-B3/T3 publications
=> Align the corpora for term extraction, concordancing and translation memory (and SMT)
Corpus structure (diagram):
EP-B1: claims (CL) in EN, FR and DE; description (DESC) in EN, FR or DE
DE-T2: description in DE (claims in DE)
ES-B3/T3 (LaTeX): claims in ES; description in ES
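Because EP-B publications carry the claims in English, German and French, one simple way to obtain parallel segments is to pair claims by their claim number. The following Python sketch assumes, hypothetically, that each claims section is available as plain text in which every claim starts on a new line with its number; real publications would first need their markup parsed. The function names are illustrative.

```python
import re

# Assumed input format: every claim starts on a new line with "<number>. ".
CLAIM_START = re.compile(r"^\s*(\d+)\.\s+", re.MULTILINE)

def split_claims(text):
    """Return {claim_number: whitespace-normalised claim text} for one language."""
    matches = list(CLAIM_START.finditer(text))
    claims = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        claims[int(m.group(1))] = " ".join(text[m.end():end].split())
    return claims

def align_claims(*versions):
    """Pair the claims that carry the same number in every language version."""
    split = [split_claims(v) for v in versions]
    common = set.intersection(*(set(s) for s in split))
    return [tuple(s[n] for s in split) for n in sorted(common)]

en = "1. A bearing comprising a ring.\n2. The bearing of claim 1, wherein the ring is made of steel."
de = "1. Lager mit einem Ring.\n2. Lager nach Anspruch 1, wobei der Ring aus Stahl besteht."
for pair in align_claims(en, de):
    print(pair)
```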
Alignment & Extraction
Alignment: trialled at the EPO with internally developed software
External companies did not improve on this result during the call for tenders.
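The EPO's internally developed alignment software is not described here. As a generic illustration of how sentence alignment is commonly done, the Python sketch below implements a simplified length-based aligner in the spirit of Gale and Church (1993): dynamic programming over 1-1, 1-0, 0-1, 2-1 and 1-2 groupings ("beads"), scored by how well the character lengths agree. The bead penalties and the variance parameter `s2` are illustrative assumptions, not values tuned for patent text.

```python
import math

# Bead types (source sentences, target sentences) with illustrative penalties
# (negative log priors); these values are assumptions, not tuned for patents.
BEADS = {(1, 1): 0.0, (1, 0): 4.5, (0, 1): 4.5, (2, 1): 2.3, (1, 2): 2.3}

def length_cost(ls, lt, c=1.0, s2=6.8):
    """Cost of pairing ls source characters with lt target characters."""
    if ls == 0 and lt == 0:
        return 0.0
    mean = (ls + lt / c) / 2
    delta = (lt - ls * c) / math.sqrt(mean * s2)
    # two-tailed probability of a deviation at least this large under N(0, 1)
    p = math.erfc(abs(delta) / math.sqrt(2))
    return -math.log(max(p, 1e-12))

def align(src, tgt):
    """Align two sentence lists; return a list of (src_group, tgt_group)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # forward DP: every bead moves to a cell visited later in row-major order
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + length_cost(ls, lt)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # follow back-pointers from the end to recover the beads
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return beads[::-1]

src = ["A short one.", "A second, somewhat longer sentence here."]
tgt = ["Ein kurzer.", "Ein zweiter, etwas längerer Satz hier."]
for s, t in align(src, tgt):
    print(s, t)
```

Length-based alignment needs no dictionary, which makes it a common first pass; lexical clues can be added to the cost function later.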
Call for tenders for bilingual term extraction
Winner: DFKI
1. Alignment of the corpora, POS tagging, identification of terms
2. Pairing of terms using clues such as co-occurrence score, string similarity, grammatical clues, position, available dictionaries, ...
3. Providing further information such as gender, inflection, transitivity, countability, ...
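Two of the pairing clues listed above, co-occurrence and string similarity, can be combined into a single score. The Python sketch below uses the Dice coefficient over aligned segment pairs for co-occurrence and `difflib.SequenceMatcher` for string similarity. The weights, the naive substring matching and the toy sentences are assumptions for illustration and do not describe DFKI's actual method.

```python
from difflib import SequenceMatcher

# Toy aligned segment pairs (German, English), invented for illustration.
PAIRS = [
    ("das Lager ist geschmiert", "the bearing is lubricated"),
    ("die Welle dreht im Lager", "the shaft rotates in the bearing"),
    ("die Welle ist aus Stahl", "the shaft is made of steel"),
]

def dice(pairs, s_term, t_term):
    """Dice co-occurrence of two terms over aligned segment pairs."""
    both = n_s = n_t = 0
    for src, tgt in pairs:
        in_s, in_t = s_term in src, t_term in tgt  # naive substring matching
        both += in_s and in_t
        n_s += in_s
        n_t += in_t
    return 2 * both / (n_s + n_t) if n_s + n_t else 0.0

def score(pairs, s_term, t_term, w_cooc=0.8, w_str=0.2):
    """Weighted sum of co-occurrence and string similarity (weights assumed)."""
    sim = SequenceMatcher(None, s_term.lower(), t_term.lower()).ratio()
    return w_cooc * dice(pairs, s_term, t_term) + w_str * sim

def best_translations(pairs, s_terms, t_terms):
    """Map each source term to its highest-scoring target candidate."""
    return {s: max(t_terms, key=lambda t: score(pairs, s, t)) for s in s_terms}

print(best_translations(PAIRS, ["Lager", "Welle"], ["bearing", "shaft", "steel"]))
```

String similarity mainly helps with cognates and shared technical names; for unrelated words such as "Lager" and "bearing" the co-occurrence clue has to carry the decision.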
Validation & Concordancing
Development of an OLIF editor at the EPO:
• remove noise
• correct entries
• use the concordancer (provides statistics based on the parallel corpora)
=> DEMO
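At its simplest, a concordancer of the kind mentioned above could list every aligned segment pair that contains a source term (keyword-in-context style) and count how often each candidate translation occurs on the target side. The function name `concordance`, the toy segment pairs and the substring matching are hypothetical; this is not the EPO's OLIF editor.

```python
from collections import Counter

# Toy aligned segment pairs (German, English), invented for illustration.
PAIRS = [
    ("das Lager ist geschmiert", "the bearing is lubricated"),
    ("die Welle dreht im Lager", "the shaft rotates in the bearing"),
    ("die Welle ist aus Stahl", "the shaft is made of steel"),
]

def concordance(pairs, s_term, t_candidates, width=20):
    """Return KWIC lines for s_term and, for each candidate translation,
    the number of matching target segments that contain it."""
    hits = [(s, t) for s, t in pairs if s_term in s]
    lines = []
    for s, t in hits:
        k = s.index(s_term)
        left = s[max(0, k - width):k]
        right = s[k + len(s_term):k + len(s_term) + width]
        lines.append(f"{left:>{width}}[{s_term}]{right:<{width}} | {t}")
    stats = Counter({c: sum(c in t for _, t in hits) for c in t_candidates})
    return lines, stats

lines, stats = concordance(PAIRS, "Lager", ["bearing", "storage"])
print("\n".join(lines))
print(stats.most_common())
```

Counts like these let a reviewer confirm or reject an extracted term pair quickly before accepting it into the dictionary.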
OLIF format
• Support for more languages
• Clarification of the inflection scheme
• Clarification of the term vs. lexical approach
• Tools
Relational database?
Entities of the model, with example values for the German term „grüner Tee“ (green tea):
Concept: „hot drink ...“
Term: grüner Tee
SurfForm: grüner
Lemma: grün
InflForm: Nom. Sg. str. f. pos.
LexType: DE, Adj
RegEx: -er
Infl: iLike „klein“
SemRelTransl
Naming
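The slides leave open whether a relational database should hold the lexicon. The following is a minimal sketch of what such a schema could look like, using Python's built-in `sqlite3` module. The table names follow the entities listed above, but all columns, the foreign-key relationships and the placement of the "grüner Tee" example values are assumptions: RegEx is folded into InflForm as a column, the Infl paradigm is stored as the lemma it inflects like, and SemRelTransl and Naming are omitted.

```python
import sqlite3

# Hypothetical schema: table names follow the slide; columns and
# relationships are assumptions made for illustration.
SCHEMA = """
CREATE TABLE Concept  (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE Term     (id INTEGER PRIMARY KEY, concept_id INTEGER REFERENCES Concept(id), text TEXT);
CREATE TABLE LexType  (id INTEGER PRIMARY KEY, lang TEXT, pos TEXT);
CREATE TABLE Infl     (id INTEGER PRIMARY KEY, like_lemma TEXT);
CREATE TABLE Lemma    (id INTEGER PRIMARY KEY, text TEXT,
                       lextype_id INTEGER REFERENCES LexType(id),
                       infl_id INTEGER REFERENCES Infl(id));
CREATE TABLE InflForm (id INTEGER PRIMARY KEY, features TEXT, regex TEXT);
CREATE TABLE SurfForm (id INTEGER PRIMARY KEY, text TEXT,
                       term_id INTEGER REFERENCES Term(id),
                       lemma_id INTEGER REFERENCES Lemma(id),
                       inflform_id INTEGER REFERENCES InflForm(id));
"""

def build_example():
    """Create an in-memory database holding the 'grüner Tee' example."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executescript("""
    INSERT INTO Concept  VALUES (1, 'hot drink ...');
    INSERT INTO Term     VALUES (1, 1, 'grüner Tee');
    INSERT INTO LexType  VALUES (1, 'DE', 'Adj');
    INSERT INTO Infl     VALUES (1, 'klein');
    INSERT INTO Lemma    VALUES (1, 'grün', 1, 1);
    INSERT INTO InflForm VALUES (1, 'Nom. Sg. str. f. pos.', '-er');
    INSERT INTO SurfForm VALUES (1, 'grüner', 1, 1, 1);
    """)
    return db

# Look up everything known about one surface form.
QUERY = """
SELECT s.text, l.text, f.features, f.regex, x.lang, x.pos, i.like_lemma, t.text, c.description
FROM SurfForm s
JOIN Lemma l    ON s.lemma_id = l.id
JOIN InflForm f ON s.inflform_id = f.id
JOIN LexType x  ON l.lextype_id = x.id
JOIN Infl i     ON l.infl_id = i.id
JOIN Term t     ON s.term_id = t.id
JOIN Concept c  ON t.concept_id = c.id
WHERE s.text = ?
"""

db = build_example()
print(db.execute(QUERY, ("grüner",)).fetchone())
```

Normalising the entities into separate tables lets many surface forms share one lemma and one inflection paradigm, which is the main argument for a relational model over flat dictionary entries.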
End
Thank you!