Finding Translation Correspondences from Parallel Parsed Corpus for Example-based Translation

Eiji Aramaki (Kyoto-U), Sadao Kurohashi (U-Tokyo), Satoshi Sato (Kyoto-U), Hideo Watanabe (IBM Japan)



Introduction

Parallel corpus → Translation examples

Statistical approach (co-occurrence information): 1-2%

Our method (syntactic information, translation dictionary): 50%

Goal

This paper shows great contributions of TFP ・・・

shows ↔ 示されている (show)

great contributions ↔ 大きく 寄与して いること が (great) (contribution) case-marker

of TFP ↔ ・・・全要素生産性 が (TFP) case-marker

Problems

• For finding many correspondences, a translation dictionary is consulted, which raises 2 problems:

1: Some words cannot be found in the dictionary.

2: Dictionary lookups are ambiguous and must be resolved.

Overview

• Introduction

• Method

• Experiments

• Conclusion

Method

Step 1: Detection of Phrasal Dependency Structure

Step 2: Detection of Basic Phrasal Correspondences by Consulting Dictionary

Step 3: Discovery of New Correspondences by Handling Remaining Phrases

Step1: Phrasal Dependency Structures

I bought this car by monthly installments.

→ ESG (English Parser) → Rules →

I / bought / this car / by monthly installments

Step1: Phrasal Dependency Structures

Rules

Function words are grouped together with a following content-word.

A compound noun is considered as one phrase.

Auxiliary verbs are grouped together with a following verb. (is playing, was tired, …)

A parallel-relation word is considered as one phrase. (and, or, …)
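The grouping rules above can be illustrated with a minimal sketch. It assumes the input is already tokenized and tagged with a small hypothetical tag set (`FUNC`, `AUX`, `NOUN`, `VERB`, `CONJ`); the slides rely on the ESG parser for this, so the tags and the `group_phrases` function are illustrative assumptions, not the authors' implementation.

```python
def group_phrases(tagged_words):
    """Group (word, tag) pairs into phrases using the four rules on the slide.

    Tags are a hypothetical simplification of parser output:
    FUNC = function word, AUX = auxiliary verb, NOUN, VERB, CONJ = and/or.
    """
    phrases = []      # each phrase is a list of words
    pending = []      # function words / auxiliaries waiting for a content word
    prev_tag = None
    for word, tag in tagged_words:
        if tag in ("FUNC", "AUX"):
            # Rules 1 and 3: attach to the following content word or verb.
            pending.append(word)
        elif tag == "NOUN" and prev_tag == "NOUN":
            # Rule 2: consecutive nouns form one compound-noun phrase.
            phrases[-1].append(word)
        else:
            # Content word (or a parallel-relation word, rule 4) starts a phrase.
            phrases.append(pending + [word])
            pending = []
        prev_tag = tag
    if pending:
        # Trailing function words with nothing to attach to.
        phrases.append(pending)
    return [" ".join(p) for p in phrases]


example = [("information", "NOUN"), ("technology", "NOUN"), ("in", "FUNC"),
           ("science", "NOUN"), ("technology", "NOUN")]
print(group_phrases(example))
```

On the Step 2 example this yields the two phrases "information technology" and "in science technology".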

Step2: Detection of Phrasal Correspondences

… information technology in science technology …

… 科学 技術 に (Science Technology) / おける 情報 技術 (Information Technology) …


Step2: Detection of Phrasal Correspondences

in science technology ↔ 科学 技術 に (Science Technology)

information technology ↔ おける 情報 技術 (Information Technology)

Step2: Detection of Phrasal Correspondences

• Criteria to choose phrasal correspondences

– Correspondences of content words

– Correspondences of neighboring phrases

(# of word-links × 2) / (# of J content-words + # of E content-words)
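As a rough illustration of the content-word criterion, the ratio above can be computed for competing candidate pairs. The toy dictionary, the greedy link counting, and the function names are assumptions made for this sketch; the slides do not say how word links are counted or how the neighboring-phrase criterion is combined with this ratio.

```python
# Toy translation dictionary (hypothetical entries for illustration only).
TOY_DICT = {
    "科学": {"science"},
    "情報": {"information"},
    "技術": {"technology"},
}


def count_word_links(j_words, e_words, dictionary):
    """Count one-to-one dictionary links between J and E content words."""
    remaining = list(e_words)
    links = 0
    for j in j_words:
        for e in remaining:
            if e in dictionary.get(j, set()):
                links += 1
                remaining.remove(e)  # each English word is linked at most once
                break
    return links


def link_ratio(j_words, e_words, dictionary):
    """(# of word-links x 2) / (# of J content-words + # of E content-words)."""
    total = len(j_words) + len(e_words)
    if total == 0:
        return 0.0
    return 2 * count_word_links(j_words, e_words, dictionary) / total


english = ["information", "technology"]
print(link_ratio(["情報", "技術"], english, TOY_DICT))  # both words linked
print(link_ratio(["科学", "技術"], english, TOY_DICT))  # only 技術 linked
```

Because 技術 occurs in both Japanese phrases, the ratio is what separates the fully linked candidate from the partially linked one in this toy case.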


Step3: Discovery of New Correspondences by Handling Remaining Phrases

(New) in post Cold war years ↔ 冷戦 終結 後 に (cold-war) (end) (after) case-marker

(merge) goods / and / services ↔ 物 や (object) / サービス の (service)

Step3: Discovery of New Correspondences by Handling Remaining Phrases

• Criteria to discover new correspondences

– Local and Global supports

• Local support: other phrasal correspondences within two-phrase distance in the dependency structure.

• Global support: phrase correspondences in the parallel sentences.

– POS Consistency

– Inner Sufficiency

Step3: Discovery of New Correspondences by Handling Remaining Phrases

Japan ↔ 日本 は (Japan) case-marker

the role ↔ 役割 を (Role) case-marker

play ↔ 果たす (Achieve)
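The local-support criterion can be sketched as a count of already-found correspondences near a candidate pair. This is one possible reading, assuming that a correspondence supports the candidate only when it lies within two dependency edges of the candidate on both the English and the Japanese side; the tree encoding and all function names are hypothetical.

```python
from collections import deque


def build_tree(edges):
    """Build an undirected adjacency map from (head, dependent) pairs."""
    tree = {}
    for head, dep in edges:
        tree.setdefault(head, set()).add(dep)
        tree.setdefault(dep, set()).add(head)
    return tree


def phrases_within(tree, start, max_dist=2):
    """Phrases reachable from `start` in at most `max_dist` dependency edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue
        for nxt in tree.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]
    return set(dist)


def local_support(e_tree, j_tree, candidate, correspondences):
    """Count known correspondences close to the candidate in both trees."""
    e0, j0 = candidate
    near_e = phrases_within(e_tree, e0)
    near_j = phrases_within(j_tree, j0)
    return sum(1 for e, j in correspondences if e in near_e and j in near_j)


e_tree = build_tree([("play", "Japan"), ("play", "the role")])
j_tree = build_tree([("果たす", "日本 は"), ("果たす", "役割 を")])
known = [("Japan", "日本 は"), ("the role", "役割 を")]
print(local_support(e_tree, j_tree, ("play", "果たす"), known))
```

In this toy pair, both known correspondences sit one edge away from the remaining phrases, so the candidate play ↔ 果たす has two local supports.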

Step3: Discovery of New Correspondences by Handling Remaining Phrases

technology ↔ 技術 が (technology) case-marker

become important ↔ 重要 と (important) / なっている (become)

has ・・・

Experiments

Evaluation data:

200 sentence-pairs from a White Paper & example sentences in a Japanese-English dictionary

Gold standard data:

We manually tagged the correct correspondences in these sentences.

Correct: exactly matches a pre-aligned correspondence

Near-correct: partly matches a pre-aligned correspondence

Wrong: matches neither as Correct nor as Near-correct

Output Examples

English | Japanese | Score

is being pursued | 行われている (is doing by) | 2.75

of G7 nations | 先進 7 カ国の (advanced 7 countries) | 2.6

geographical proximity | 地理的に近い (near in geography) | 2.0

tree (become) | その木は (That tree is) | 1.2

went [to bed] | 寝る (Go to bed) | 1.0

She (held) | 彼女は (She is) | 0.5

Precision – Recall

[Figure: Precision – Recall graph. Axes: Recall, Precision. Curves: Correct; Correct + Near-correct × 0.5]
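The two curves differ only in how near-correct outputs are counted. A minimal sketch of that weighting follows, assuming precision is the weighted count over the number of system outputs and recall is the same count over the number of gold correspondences; the slides do not spell out these definitions, and the counts below are made up for illustration, not taken from the experiments.

```python
def precision_recall(correct, near_correct, n_output, n_gold, near_weight=0.5):
    """Precision and recall with near-correct outputs given partial credit.

    near_weight=0.0 corresponds to the "Correct" curve and near_weight=0.5
    to the "Correct + Near-correct x 0.5" curve (assumed definitions).
    """
    credit = correct + near_weight * near_correct
    precision = credit / n_output if n_output else 0.0
    recall = credit / n_gold if n_gold else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
strict = precision_recall(60, 20, n_output=100, n_gold=80, near_weight=0.0)
lenient = precision_recall(60, 20, n_output=100, n_gold=80, near_weight=0.5)
print(strict, lenient)
```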

Conclusion

• We can find more correspondences than a statistical approach.

• On a comparable corpus, a statistical approach seems to be effective; on a parallel corpus, however, our approach is more effective for obtaining a large number of translation examples.

Statistical approach: 1-2% of the input corpus

Our system: 51-68% of the input corpus