kanlaya-waterloo 04/23/02


Upload: kelii

Post on 21-Jan-2016


TRANSCRIPT

Page 1: Kanlaya-Waterloo   04/23/02

Kanlaya-Waterloo 04/23/02

Page 2: Kanlaya-Waterloo   04/23/02

Some MT Approaches

Category: Three Classic Strategies

Direct MT
  Advantages: Simple approach.
  Disadvantages: Limited accuracy.

Interlingual MT
  Advantages: Treats SL and TL separately.
  Disadvantages: Difficult to develop an interlingual system.

Transfer MT
  Advantages: Less complexity in analysis and generation.
  Disadvantages: Information is lost during the transfer process. Language dependent.

Category: Nonlinguistic Information Strategies

Example-Based MT
  Advantages: Translations come with scores, which might be useful to a post-editor.

Statistics-Based MT
  Advantages: Applies statistical techniques in the translation process.

  Disadvantages (both approaches): Rely heavily on the availability of a large, diverse, high-quality corpus, which is not available for most languages.

Category: Hybrid Strategies

Knowledge-Based MT
  Advantages: Interprets the SL by reference to world knowledge.
  Disadvantages: Reasoning is complex. More suitable for paraphrase than for translation.

Shake-and-Bake MT
  Advantages: SL and TL are associated via a bilingual lexicon, without a transfer module or interlingua.
  Disadvantages: Lacks efficiency because the generation process tries all possibilities.

Page 3: Kanlaya-Waterloo   04/23/02

Direct MT
[Diagram: Source Language -> Word Treatment (SL-TL Dict) -> Target Language]
Simple process BUT limited accuracy.

Transfer MT
[Diagram: SL -> Analyzer (SL Gram, SL Dict) -> SL Representation -> Transfer -> TL Representation -> Generator (TL Gram, TL Dict) -> TL]
More accurate BUT inadequate and not attractive for multilingual systems.

Interlingual MT
[Diagram: SL -> Analyzer (SL Gram, SL-IL Dict) -> IL -> Generator (TL Gram, IL-TL Dict) -> TL]
The best approach for multilingual MT BUT still an ideal.

Page 4: Kanlaya-Waterloo   04/23/02

Translation Examples

SL: The spirit is willing but the flesh is weak.

Russian -> English:
The whisky is all right but the meat has gone bad.
The vodka is good but the meat is rotten.
The liquor is holding out all right, but the meat has spoiled.

Paraphrased: The mind is ready but the body is weak.
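The failures above come from picking the wrong sense of an ambiguous word. A minimal sketch of that effect, assuming a toy sense inventory written as English glosses (the `SENSES` table and `direct_translate` are hypothetical names for illustration, not part of the slides):

```python
# Toy sense inventory (assumed for illustration): each ambiguous SL word
# maps to a list of candidate senses, written here as English glosses.
SENSES = {
    "spirit": ["liquor", "mind"],
    "flesh": ["meat", "body"],
}


def direct_translate(sentence, pick=0):
    """Word-for-word substitution: take the sense at index `pick` for every
    ambiguous word and leave all other words untouched."""
    out = []
    for word in sentence.lower().rstrip(".").split():
        candidates = SENSES.get(word, [word])
        out.append(candidates[min(pick, len(candidates) - 1)])
    return " ".join(out)


print(direct_translate("The spirit is willing but the flesh is weak."))
print(direct_translate("The spirit is willing but the flesh is weak.", pick=1))
```

With no context to choose between senses, the first call keeps the "liquor" and "meat" readings; only a different sense choice recovers the intended meaning.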

Page 5: Kanlaya-Waterloo   04/23/02

Some MT Approaches

(Same table as on Page 2, with one entry added under Advantages: + accurate translation)

Page 6: Kanlaya-Waterloo   04/23/02

Generate and Repair MT: A hybrid MT system

[Diagram: SL -> ALMT -> Translation Candidate (TC) -> TCE -> accept -> TL. An inappropriate TC goes to RI, and the repaired TC returns to TCE.
 Knowledge sources shown: SL-TL Dict, SL and TL Const., SL Grammar & Lexicon, TL Grammar & Lexicon.]

Key features: Accuracy, Simplicity, Modularity, Extendibility, Multilinguality
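The control flow of the diagram can be sketched as a short loop. This is only an illustration under stated assumptions: the three components are passed in as plain functions, the toy stand-ins work on English glosses, and the `max_rounds` cut-off is an assumption that does not appear on the slides.

```python
def generate_and_repair(sl, almt, tce, ri, max_rounds=5):
    """Return a translation for `sl` using a generate-evaluate-repair cycle.

    almt(sl) -> tc               generate a translation candidate
    tce(sl, tc) -> differences   an empty list means the TC is accepted
    ri(tc, differences) -> tc    produce a repaired candidate
    """
    tc = almt(sl)
    for _ in range(max_rounds):
        differences = tce(sl, tc)
        if not differences:
            return tc            # accepted
        tc = ri(tc, differences)
    return tc                    # stop after max_rounds (assumed cut-off)


# Toy stand-ins for the three components (hypothetical, gloss-level only).
def toy_almt(sl):
    return sl.replace("spring", "season")


def toy_tce(sl, tc):
    return ["season"] if "season" in tc else []


def toy_ri(tc, differences):
    for wrong in differences:
        tc = tc.replace(wrong, "device")
    return tc


result = generate_and_repair("broken spring", toy_almt, toy_tce, toy_ri)
print(result)
```

Passing the components as functions mirrors the modularity claim: each one can be swapped without touching the loop.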


Page 10: Kanlaya-Waterloo   04/23/02

1. Analysis Lite MT: Generate a Translation Candidate (TC)

[Diagram: SL -> Word Treatment -> Word Selection -> Word Ordering -> TC.
 Steps shown: Dictionary look-up, Inflectional Analysis, Constraint Application, WordAsso Application, Word Ordering, Word Addition.
 Knowledge sources shown: SL-TL Dict, SL Dict, SL Const., WordAsso, TL Const., W-order KB.]

Page 11: Kanlaya-Waterloo   04/23/02

SL: A mother duck hatches seven eggs in a nest.

Dictionary Look-up & Inflectional Analysis, Constraint Application:
  mother duck hatch seven egg plural in nest
  mother duck hatches seven egg plural in nest

Word treatment candidates:
  แม่ เป็ด กก เจ็ด ไข่ ใน รัง
  Alternatives shown: เลี้ยงดู, ก้ม, คิด, กระตุ้น, เข้ามา, ทำรัง, จับกด, ลง, เข้าไป, ผ้าลินินขาว, อย่าง

Word Selection:
  แม่ เป็ด กก เจ็ด ไข่ ใน รัง

Word Addition:
  แม่ เป็ด กก เจ็ด ไข่ ฟอง ใน รัง
  The word "ฟอง" is added to indicate the quantity of eggs.

Word Ordering (TC):
  แม่ เป็ด กก ไข่ เจ็ด ฟอง ใน รัง
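The steps of this example can be sketched in a few functions. This is a toy under stated assumptions: the dictionary holds only the words of this one sentence, each word has a single sense (so Word Selection is skipped), and the helper names, the article-dropping rule, and the classifier table are illustrative rather than the system's actual knowledge bases.

```python
# Toy knowledge bases (assumed; only the entries needed for this sentence).
SL_TL_DICT = {"mother": "แม่", "duck": "เป็ด", "hatch": "กก", "seven": "เจ็ด",
              "egg": "ไข่", "in": "ใน", "nest": "รัง"}
ARTICLES = {"a", "an", "the"}           # dropped on the way to Thai (assumption)
NUMERALS = {SL_TL_DICT["seven"]}
CLASSIFIER = {SL_TL_DICT["egg"]: "ฟอง"}  # noun -> classifier


def inflectional_analysis(word):
    """Very rough stemmer: 'hatches' -> 'hatch', 'eggs' -> 'egg'."""
    for suffix in ("es", "s"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in SL_TL_DICT:
            return stem
    return word


def word_treatment(sentence):
    """Dictionary look-up on the stems of all non-article words."""
    words = sentence.lower().rstrip(".").split()
    stems = [inflectional_analysis(w) for w in words if w not in ARTICLES]
    return [SL_TL_DICT[s] for s in stems]


def word_addition(tl_words):
    """Add the classifier after a counted noun (numeral directly before it)."""
    out = []
    for i, w in enumerate(tl_words):
        out.append(w)
        if w in CLASSIFIER and i > 0 and tl_words[i - 1] in NUMERALS:
            out.append(CLASSIFIER[w])
    return out


def word_ordering(tl_words):
    """Reorder numeral + noun + classifier into noun + numeral + classifier."""
    out = list(tl_words)
    for i in range(len(out) - 2):
        if out[i] in NUMERALS and out[i + 2] == CLASSIFIER.get(out[i + 1]):
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


sl = "A mother duck hatches seven eggs in a nest."
tc = " ".join(word_ordering(word_addition(word_treatment(sl))))
print(tc)
```

Each stage only needs word lists and small lookup tables, which is the sense in which the analysis is "lite": no full parse of the source sentence is built at this point.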

Page 12: Kanlaya-Waterloo   04/23/02

SL: แม่ เป็ด กก ไข่ เจ็ด ฟอง ใน รัง

Dictionary Look-up & Inflectional Analysis, Constraint Application:
  แม่ เป็ด กก ไข่ เจ็ด plural ใน รัง
  แม่ เป็ด กก ไข่ เจ็ด plural ใน รัง

Word treatment candidates:
  mother duck hatch egg seven in nest
  Alternatives shown: inner, reed, inside

Word Selection:
  mother duck hatch egg seven in nest

Word Addition:
  a mother duck hatches eggs seven in a nest

Word Ordering (TC):
  A mother duck hatches seven eggs in a nest.

Page 13: Kanlaya-Waterloo   04/23/02

2. Translation Candidate Evaluation: Verify a TC

[Diagram: the SL is parsed with the SL Grammar & Lexicon (SL Parsing) and the TC with the TL Grammar & Lexicon (TL Parsing). Semantic Extraction and Semantic Comparison follow. An accepted TC becomes the translation result; otherwise the output is the TC with semantic differences info.]

Page 14: Kanlaya-Waterloo   04/23/02

HPSG: What and Why?

HPSG
  An integrated theory of natural language syntax and semantics.
  A feature-based grammatical framework.

HPSG Key Features
  Small number of rules and an information-rich lexicon.
  Unification-based constraints.
  Language-independent principles.
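To make "unification-based constraints" concrete, here is a minimal sketch of unifying two feature structures represented as nested Python dicts. It is a simplification: real HPSG feature structures are typed and allow structure sharing, neither of which is modeled here, and the feature names in the usage lines are only illustrative.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts of atomic values.

    Returns the merged structure, or None when two atomic values clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None          # clash somewhere below this feature
                result[key] = merged
            else:
                result[key] = b_val      # feature only present in b
        return result
    return a if a == b else None         # atoms must be identical


noun = {"INDEX": {"NUM": "sg"}}
det = {"INDEX": {"PER": "3rd"}}
print(unify(noun, det))
print(unify({"INDEX": {"NUM": "sg"}}, {"INDEX": {"NUM": "pl"}}))
```

Compatible information is merged and conflicting information makes the whole unification fail, which is how a grammar with few rules can still reject ill-formed combinations.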

Page 15: Kanlaya-Waterloo   04/23/02

[Figure: SL and TL; no other text recoverable]

Page 16: Kanlaya-Waterloo   04/23/02

[Figure: SL and TL; no other text recoverable]

Page 17: Kanlaya-Waterloo   04/23/02

[Figure: SL and TL; no other text recoverable]

Page 18: Kanlaya-Waterloo   04/23/02

SL: John talks about the broken spring.
TC: จอห์น พูด เกี่ยวกับ ฤดูใบไม้ผลิ ที่ หัก นั้น

TC analysis:
STRING: 0 จอห์น 1 พูด 2 เกี่ยวกับ 3 ฤดูใบไม้ผลิ 4 ที่ 5 หัก 6 นั้น
CATEGORY: phrase
QSTORE ne_set_quant ELT quant DET the RESTIND nom_obj INDEX [0] ref IGENN neut INUM sg IPER per MODE reff
RESTR ne_set_psoa ELT psoa NUCLEUS broken1 WORDASSO 13-8-5 INSTANCE [0] SURFACE หัก QUANTS e_list
ELTS ne_set_psoa ELT psoa NUCLEUS season WORDASSO 2-4-3-2 INSTANCE [0] SURFACE ฤดูใบไม้ผลิ QUANTS e_list
ELTS e_set ELTS e_set . . .

SL analysis:
STRING: 0 john 1 talks 2 about 3 the 4 broken 5 spring 6
CATEGORY: phrase
QSTORE ne_set_quant ELT quant DET the RESTIND nom_obj INDEX [0] ref IGENN neut INUM sg IPER per MODE reff
RESTR ne_set_psoa ELT psoa NUCLEUS broken1 WORDASSO 13-8-5 INSTANCE [0] SURFACE broken QUANTS e_list
ELTS ne_set_psoa ELT psoa NUCLEUS device WORDASSO 1-1-2-1-1-3-1-4 INSTANCE [0] SURFACE spring QUANTS e_list
ELTS e_set ELTS e_set . . .

Word Treatment Output for the English word "spring" (Thai word, WordAsso code, description):
  ฤดูใบไม้ผลิ (ryduubajmáajphlì), 2-4-3-2: the season between winter and summer
  สปริง (sapring), 1-1-2-1-1-3-1-4: an elastic device
  กระโดด (kradòod), 2-1-5-5-1: to move suddenly upward or forward
  น้ำพุ (námphú), 1-1-2-1-4-1: place where water comes up naturally from the ground
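The comparison on this slide can be sketched as a check on WordAsso codes. The triples below are read off the two analyses shown above; pairing the entries by their order in the list is an assumption made for this sketch (the slides do not say how entries are aligned), and `semantic_differences` is a hypothetical helper name.

```python
# (surface, NUCLEUS, WORDASSO) triples taken from the two analyses.
SL_SEM = [("broken", "broken1", "13-8-5"),
          ("spring", "device", "1-1-2-1-1-3-1-4")]
TC_SEM = [("หัก", "broken1", "13-8-5"),
          ("ฤดูใบไม้ผลิ", "season", "2-4-3-2")]


def semantic_differences(sl_sem, tc_sem):
    """Pair SL and TC entries in order and report WordAsso mismatches."""
    diffs = []
    for (sl_word, _, sl_code), (tc_word, _, tc_code) in zip(sl_sem, tc_sem):
        if sl_code != tc_code:
            diffs.append({"sl": sl_word, "tc": tc_word,
                          "expected": sl_code, "found": tc_code})
    return diffs


diffs = semantic_differences(SL_SEM, TC_SEM)
for d in diffs:
    print(d["sl"], "expected", d["expected"], "found", d["found"])
```

Here "broken" matches on both sides, while "spring" is flagged: the SL analysis has the device sense and the TC has the season sense.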

Page 19: Kanlaya-Waterloo   04/23/02

3. Repair and Iterate: Repair a TC

[Diagram: TC with the semantic differences info -> Repair (using the Word Treatment Output) -> Word Ordering -> Repaired TC -> TCE. When TCE accepts, the TC becomes the translation result.]

Page 20: Kanlaya-Waterloo   04/23/02

SL: John talks about the broken spring.
TC: จอห์น พูด เกี่ยวกับ สปริง ที่ หัก นั้น

SL analysis:
STRING: 0 john 1 talks 2 about 3 the 4 broken 5 spring 6
CATEGORY: phrase
QSTORE ne_set_quant ELT quant DET the RESTIND nom_obj INDEX [0] ref IGENN neut INUM sg IPER per MODE reff
RESTR ne_set_psoa ELT psoa NUCLEUS broken1 WORDASSO 13-8-5 INSTANCE [0] SURFACE broken QUANTS e_list
ELTS ne_set_psoa ELT psoa NUCLEUS device WORDASSO 1-1-2-1-1-3-1-4 INSTANCE [0] SURFACE spring QUANTS e_list
ELTS e_set ELTS e_set . . .
CONT psoa NUCLEUS talk2 WORDASSO 2-1-3-1-1

TC analysis:
STRING: 0 จอห์น 1 พูด 2 เกี่ยวกับ 3 สปริง 4 ที่ 5 หัก 6 นั้น
CATEGORY: phrase
QSTORE ne_set_quant ELT quant DET the RESTIND nom_obj INDEX [0] ref IGENN neut INUM sg IPER per MODE reff
RESTR ne_set_psoa ELT psoa NUCLEUS broken1 WORDASSO 13-8-5 INSTANCE [0] SURFACE หัก QUANTS e_list
ELTS ne_set_psoa ELT psoa NUCLEUS device WORDASSO 1-1-2-1-1-3-1-4 INSTANCE [0] SURFACE สปริง QUANTS e_list
ELTS e_set ELTS e_set . . .
CONT psoa
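The repair shown here amounts to swapping in the word treatment whose WordAsso code matches the SL analysis. A minimal sketch, assuming the four treatments of "spring" listed on Page 18 are available as a code-to-word table (the table layout and the `repair` function are illustrative assumptions):

```python
# Word treatments of "spring", keyed by WordAsso code (from Page 18).
SPRING_TREATMENTS = {
    "2-4-3-2": "ฤดูใบไม้ผลิ",          # the season
    "1-1-2-1-1-3-1-4": "สปริง",       # an elastic device
    "2-1-5-5-1": "กระโดด",            # to move suddenly upward or forward
    "1-1-2-1-4-1": "น้ำพุ",            # place where water comes up
}


def repair(tc_words, wrong_surface, sl_code, treatments):
    """Replace the flagged TC word with the treatment matching the SL code.

    Returns an unchanged copy when no treatment carries that code.
    """
    replacement = treatments.get(sl_code)
    if replacement is None:
        return list(tc_words)
    return [replacement if w == wrong_surface else w for w in tc_words]


wrong = SPRING_TREATMENTS["2-4-3-2"]
tc = ["จอห์น", "พูด", "เกี่ยวกับ", wrong, "ที่", "หัก", "นั้น"]
repaired = repair(tc, wrong, "1-1-2-1-1-3-1-4", SPRING_TREATMENTS)
print(" ".join(repaired))
```

Only the flagged word changes; the rest of the candidate, including its word order, is kept and can be handed back to evaluation.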

Page 21: Kanlaya-Waterloo   04/23/02

ที่พงหญ้า แม่เป็ดกกไข่เจ็ดฟองในรัง ลูกเป็ดกะเทาะเปลือกไข่ เปลือกไข่หกฟองถูกกะเทาะ ไข่ฟองที่เจ็ดมีขนาดใหญ่ที่สุด แม่เป็ดพยายามกกไข่ฟองที่เจ็ด ในที่สุดไข่ฟองสุดท้ายก็กะเทาะ แม่เป็ดตะลึงเมื่อมันเห็นลูกเป็ดตัวนั้น อย่างไรก็ตามแม่เป็ดก็ยอมรับลูกของมัน ลูกเป็ดขี้เหร่มีขนสีเทา ลูกเป็ดตัวอื่นๆมีขนสีเหลืองนวล แม่เป็ดพาลูกเป็ดเจ็ดตัวว่ายน้ำ ลูกเป็ดเจ็ดตัวเดินตามแม่เป็ด ลูกเป็ดขี้เหร่เดินตัวสุดท้าย ฝูงวัวและฝูงแกะกำลังกินหญ้า พวกมันหัวเราะเยาะลูกเป็ดขี้เหร่ ลูกเป็ดขี้เหร่เสียใจและน้อยใจ

ลูกเป็ดขี้เหร่ร้องไห้ ลูกเป็ดเจ็ดตัวว่ายน้ำในแอ่งน้ำใหญ่ ลูกเป็ดขี้เหร่ว่ายน้ำไกลกว่าตัวอื่นๆ ลูกเป็ดหกตัวไม่เล่นกับลูกเป็ดขี้เหร่ หมูและไก่ล้อเลียนลูกเป็ดขี้เหร่ ลูกเป็ดขี้เหร่ร้องไห้อีกครั้ง ลูกเป็ดขี้เหร่ซ่อนตัวในพงหญ้า ลูกเป็ดขี้เหร่หนีไปจากแอ่งน้ำ มันหวังว่ามันจะมีเพื่อนใหม่ ลูกเป็ดขี้เหร่พเนจรอย่างโดดเดี่ยว ลูกเป็ดขี้เหร่มาถึงบึง เป็ดป่าจ้องดูลูกเป็ดขี้เหร่ ลูกเป็ดขี้เหร่อาศัยอยู่กับเป็ดป่า เช้าวันหนึ่งนายพรานมาถึงบึง นายพรานยิงปืน เป็ดป่าหนีไปด้วยความหวาดกลัว ลูกเป็ดขี้เหร่หลงจากเป็ดป่า ลูกเป็ดขี้เหร่เร่ร่อนอีกครั้ง ลูกเป็ดขี้เหร่มาถึงกระท่อม หญิงชราอาศัยอยู่ในกระท่อม หญิงชราเลี้ยงแมวและแม่ไก่ ลูกเป็ดขี้เหร่ทักทายแมวและไก่ แมวสีดำเห็นลูกเป็ดขี้เหร่ แมวสีดำขู่ หญิงชราได้ยินเสียง หญิงชราหยิบไม้กวาด หญิงชราตีลูกเป็ดขี้เหร่ ลูกเป็ดขี้เหร่เร่ร่อนอีกครั้ง บัดนี้ลูกเป็ดขี้เหร่เติบโต ลูกเป็ดขี้เหร่เติบโต ลูกเป็ดขี้เหร่บินสูงและไกล ลูกเป็ดขี้เหร่มาถึงลำธาร ลูกเป็ดขี้เหร่เห็นหงส์กำลังว่ายน้ำ ลูกเป็ดตะลึงในความสวยงามของหงส์ ลูกเป็ดขี้เหร่ทักทายฝูงหงส์ ลูกเป็ดขี้เหร่เห็นเงาของมันในน้ำ มันเป็นเงาของหงส์ ในที่สุดลูกเป็ดขี้เหร่ก็พบพ่อแม่ที่แท้จริง

Ugly Duckling Thai Version

Page 22: Kanlaya-Waterloo   04/23/02

At a thicket. A mother duck hatches seven eggs in a nest. A duckling cracks an eggshell. Six eggshells are cracked. The seventh egg has the biggest size. A mother duck tries to hatch the seventh egg. Finally the last egg cracks. A mother duck amazes when it sees the duckling. However a mother duck accepts its child. An ugly duckling has gray feather. Other ducklings have yellow feather. A mother duck leads seven ducklings swim. Seven ducklings walk after a mother duck. An ugly duckling walks the last. Cows and sheeps are eatting grass. They make_fun_of an ugly duckling. An ugly duckling is sorry and peevish. An ugly duckling cries. Seven ducklings swim in a big pond. An ugly duckling swims further than others. Six ducklings do not play with an ugly duckling. A pig and a hen banter an ugly duckling. An ugly duckling cries again. An ugly duckling hides in a thicket. An ugly duckling escapes from a pond. It hopes that it will have a new friend. An ugly duckling wanders lonely. An ugly duckling arrives at a swamp. A wild duck stares an ugly duckling. An ugly duckling lives with a wild duck. In_the_morning a hunter arrives at a swamp. A hunter shoots. A wild duck escapes with fear. An ugly duckling loses from a wild duck. An ugly duckling wanders again. An ugly duckling arrives at a cottage. An old woman lives in a cottage. An old woman feeds a cat and a hen. An ugly duckling greets a cat and a hen. A black cat sees an ugly duckling. A black cat screams. An old woman hears a sound. An old woman picks a bloom. An old woman hits an ugly duckling. An ugly duckling wanders again. Now an ugly duckling grows. An ugly duckling grows. An ugly duckling flies high and far. An ugly duckling arrives at a stream. An ugly duckling sees a swan is swimming. A duckling amazes in a beauty of a swan. An ugly duckling greets swans. An ugly duckling sees its shadow in water. It is a shadow of a swan. Finally an ugly duckling meets a real parent.

English Translated Version

Page 23: Kanlaya-Waterloo   04/23/02

Conclusions

GRMT
  A novel approach to MT that generates accurate translations.
  Consists of three major components: ALMT, TCE and RI.
  ALMT generates the TC by taking into account the differences between languages (constraints) and the semantic relationships between words.
  TCE and RI ensure the accuracy of the translation by verifying the syntax and semantics of the TC and improving the quality of the TC if necessary.
  Key features: Simplicity, Modularity, Extendibility and Multilinguality.

English-Thai MT system prototype
  Knowledge bases: English Dictionary, Thai Dictionary, English-Thai Dictionary, English (HPSG) grammar, Thai (HPSG) grammar, English Lexicon, Thai Lexicon.

Page 24: Kanlaya-Waterloo   04/23/02

Suggestions/Comments

Thanks
