The Construction of a Bilingual Knowledge Bank Based on a Bitext Synchronous Parsing Technique


Upload: nyssa-atkins

Post on 03-Jan-2016



Page 1: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

The Construction of a Bilingual Knowledge Bank Based on a Bitext Synchronous Parsing Technique

Example-Based Machine Translation Based on the Synchronous SSTC Annotation Schema

Computer Aided Translation Unit, School of Computer Sciences, University Science Malaysia

Page 2: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Presentation Outline

Introduction

Structured String-Tree Correspondence (SSTC)

Synchronous Structured String-Tree Correspondence (SSTC)

EBMT based on synchronous SSTC

The Construction of a BKB Based on the Synchronous SSTC

Bitext Word-level Mapping (Word Alignment)

Bitext Synchronous Parsing Technique

Page 3: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

The Structured String-Tree Correspondence (SSTC)

SSTC = string + arbitrary tree structure + correspondence

Correspondence = node(X/Y)

X:SNODE = interval of the substring that corresponds to the node.

Y:STREE = interval of the substring that corresponds to the subtree having the node as root.

String: 0 all 1 cats 2 eat 3 mice 4

Tree:

eat(2-3/0-4)
    cats(1-2/0-2)
        all(0-1/0-1)
    mice(3-4/3-4)

For the node "eat", X:SNODE = 2-3 (the substring "eat") and Y:STREE = 0-4 (the whole string "all cats eat mice").

Page 4: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

String: 0 all 1 cats 2 eat 3 mice 4

Tree:

eat(2-3/0-4)
    cats(1-2/0-2)
        all(0-1/0-1)
    mice(3-4/3-4)

For the node "cats", X:SNODE = 1-2 (the substring "cats") and Y:STREE = 0-2 (the substring "all cats", covered by the subtree rooted at "cats").
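The SNODE and STREE intervals above can be sketched in a few lines of Python. This is only an illustration: the `Node` class and the `substring` helper are hypothetical names, not part of the presentation, and the sketch assumes contiguous intervals (it does not handle discontiguous ones such as 1-2+4-5).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One SSTC tree node; snode is the (start, end) interval of its own word."""
    label: str
    snode: tuple
    children: list = field(default_factory=list)

    def stree(self):
        """Interval covered by the subtree rooted at this node."""
        start, end = self.snode
        for child in self.children:
            c_start, c_end = child.stree()
            start, end = min(start, c_start), max(end, c_end)
        return (start, end)


# The string "all cats eat mice" with positions 0..4 between the words.
words = ["all", "cats", "eat", "mice"]

# The tree from the slide: eat -> (cats -> all), mice.
all_node = Node("all", (0, 1))
cats = Node("cats", (1, 2), [all_node])
mice = Node("mice", (3, 4))
eat = Node("eat", (2, 3), [cats, mice])


def substring(interval):
    """Return the words covered by a (start, end) interval."""
    start, end = interval
    return " ".join(words[start:end])


print(cats.snode, cats.stree(), substring(cats.stree()))
```

Here STREE is derived from the tree rather than stored, which keeps the two intervals consistent by construction.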

Page 5: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Translation units

English source sentence: "he picks the ball up"

Malay target sentence: "dia kutip bola itu"

ENGLISH (E)

String: 0 he 1 pick 2 the 3 ball 4 up 5

pick[v] up[p](1-2+4-5/0-5)
    he[n](0-1/0-1)
    ball[n](3-4/2-4)
        the[det](2-3/2-3)

MALAY (M)

String: 0 dia 1 kutip 2 bola 3 itu 4

kutip[v](1-2/0-4)
    dia[n](0-1/0-1)
    bola[n](2-3/2-4)
        itu[det](3-4/3-4)

IndexStree: (0-5,0-4) (0-1,0-1) (2-4,2-4) (2-3,3-4)

IndexSnode: (2-3,3-4) (3-4,2-3) (0-1,0-1) (1-2+4-5,1-2)
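To make the two index lists concrete, the sketch below reads the translation units for this example off IndexSnode and IndexStree. The representation of an interval as a list of (start, end) pieces, and the helper names `text` and `translation_units`, are assumptions made for this illustration only.

```python
english = ["he", "pick", "the", "ball", "up"]
malay = ["dia", "kutip", "bola", "itu"]

# Each interval is a list of (start, end) pieces so that the
# discontiguous "1-2+4-5" can be written as [(1, 2), (4, 5)].
index_snode = [
    ([(2, 3)], [(3, 4)]),
    ([(3, 4)], [(2, 3)]),
    ([(0, 1)], [(0, 1)]),
    ([(1, 2), (4, 5)], [(1, 2)]),
]
index_stree = [
    ([(0, 5)], [(0, 4)]),
    ([(0, 1)], [(0, 1)]),
    ([(2, 4)], [(2, 4)]),
    ([(2, 3)], [(3, 4)]),
]


def text(words, pieces):
    """Join the words covered by a list of (start, end) pieces."""
    return " ".join(w for start, end in pieces for w in words[start:end])


def translation_units(index):
    """Turn (English interval, Malay interval) pairs into word pairs."""
    return [(text(english, e), text(malay, m)) for e, m in index]


for pair in translation_units(index_snode) + translation_units(index_stree):
    print(pair)
```

IndexSnode yields word-level units such as ("pick up", "kutip"), while IndexStree yields phrase-level units such as ("the ball", "bola itu").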

Page 6: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Translation units

English source sentence: "I did not give it to him"

French target sentence: "Je ne le lui ai pas donné"

ENGLISH (E)

String: 0 I 1 did 2 not 3 give 4 it 5 to 6 him 7

Tree nodes:

not[neg](2-3/0-7)
did[v] give[v](1-2+3-4/3-7)
I[n](0-1/0-1)
it[n](4-5/4-5)
to[p](5-6/5-7)
him[n](6-7/6-7)

FRENCH (F)

String: 0 Je 1 ne 2 le 3 lui 4 ai 5 pas 6 donné 7

Tree nodes:

ne[neg] pas[neg](1-2+5-6/0-7)
ai[v] donné[v](4-5+6-7/0-1+2-5+6-7)
Je[n](0-1/0-1)
le[n](2-3/2-3)
lui[n](3-4/3-4)

IndexStree: (0-7,0-7) (0-1,0-1) (0-2+3-7,0-1+2-5+6-7) (6-7,3-4)

IndexSnode: (5-6,-) (4-5,2-3) (0-1,0-1) (2-3,1-2+5-6) (1-2+3-4,4-5+6-7)

Page 7: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Translation units

English source sentence: "hopefully Kim misses Dale"

French target sentence: "on espère que Dale manque à Kim"

ENGLISH (E)

String: 0 hopefully 1 Kim 2 miss 3 Dale 4

miss[v](2-3/0-4)
    hopefully[adv](0-1/0-1)
    Kim[n](1-2/1-2)
    Dale[n](3-4/3-4)

FRENCH (F)

String: 0 on 1 espère 2 que 3 Dale 4 manque 5 à 6 Kim 7

manque[v] à[p](4-5+5-6/0-7)
    on[n] espère[v] que[c](0-1+1-2+2-3/0-3)
    Dale[n](3-4/3-4)
    Kim[n](6-7/6-7)

IndexStree: (0-1,0-3) (1-2,6-7) (0-4,0-7) (3-4,3-4)

IndexSnode: (3-4,3-4) (2-3,4-5+5-6) (1-2,6-7) (0-1,0-1+1-2+2-3)

Page 8: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Example-Based Machine Translation (EBMT)

EBMT is the case-based reasoning approach to MT.

EBMT uses translated examples of similar sentences to translate a given source sentence into the target sentence.

Page 9: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

The general architecture for EBMT

- Source sentence
- Find closest related SL examples (BKB, for source language)
- Retrieve corresponding TL examples through the correspondence (BKB, for target language)
- Combination
- Target sentence

Page 10: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Different senses for the word "bank":

bank 1: land beside a river.

bank 2: a place to keep money.

E.g.: The1 man2 keep1 his1 money1 in1 the1 bank2.

EBMT based on synchronous SSTC:

- source sentence -> tagger -> tagged source sentence
- tagged source sentence -> list of sub-synchronous SSTCs generated based on the source sentence
- BKB -> a chosen closest synchronous SSTC example -> list of sub-synchronous SSTCs constructed from the chosen example
- Replacement & Combination -> the resultant synchronous SSTC -> target sentence

Page 11: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Examples in the example-base:

(1) English sentence: He picks the ball up. Malay translation: Dia kutip bola itu.

(2) English sentence: The lamp is off. Malay translation: Lampu itu padam.

(3) English sentence: The green signal turns on. Malay translation: Isyarat hijau itu bertukar.

(4) English sentence: The old man drinks tea. Malay translation: Lelaki tua itu minum teh.

Source sentence: The old man picks the green lamp up

Page 12: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

The set of synchronous SSTCs represents the example-base.

Example 2. English sentence: The lamp is off. Malay translation: Lampu itu padam.

English (2E): 0 the 1 lamp 2 is 3 off 4

is(2)[v] off(1)[adv](2-3+3-4/0-4)
    lamp(1)[n](1-2/0-2)
        the(1)[det](0-1/0-1)

Malay (2M): 0 lampu 1 itu 2 padam 3

padam(1)[v](2-3/0-3)
    lampu(1)[n](0-1/0-2)
        itu(1)[det](1-2/1-2)

IndexStree: (0-4,0-3) (0-2,0-2) (0-1,1-2)

IndexSnode: (0-1,1-2) (1-2,0-1) (2-3+3-4,2-3)

Example 1. English sentence: He picks the ball up. Malay translation: Dia kutip bola itu.

English (1E): 0 he 1 pick 2 the 3 ball 4 up 5

pick(1)[v] up(1)[p](1-2+4-5/0-5)
    he(1)[n](0-1/0-1)
    ball(1)[n](3-4/2-4)
        the(1)[det](2-3/2-3)

Malay (1M): 0 dia 1 kutip 2 bola 3 itu 4

kutip(1)[v](1-2/0-4)
    dia(1)[n](0-1/0-1)
    bola(1)[n](2-3/2-4)
        itu(1)[det](3-4/3-4)

IndexStree: (0-5,0-4) (0-1,0-1) (2-4,2-4) (2-3,3-4)

IndexSnode: (2-3,3-4) (3-4,2-3) (0-1,0-1) (1-2+4-5,1-2)

Page 13: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Example 3. English sentence: The green signal turns on. Malay translation: Isyarat hijau itu bertukar.

English (3E): 0 the 1 green 2 signal 3 turn 4 on 5

turn(1)[v] on(1)[adv](3-4+4-5/0-5)
    signal(2)[n](2-3/0-3)
        the(1)[det](0-1/0-1)
        green(1)[adj](1-2/1-2)

Malay (3M): 0 isyarat 1 hijau 2 itu 3 bertukar 4

bertukar(2)[v](3-4/0-4)
    isyarat(1)[n](0-1/0-3)
        hijau(1)[adj](1-2/1-2)
        itu(1)[det](2-3/2-3)

IndexStree: (0-5,0-4) (0-3,0-3) (0-1,2-3) (1-2,1-2)

IndexSnode: (1-2,1-2) (0-1,2-3) (2-3,0-1) (3-4+4-5,3-4)

Example 4. English sentence: The old man drinks tea. Malay translation: Lelaki tua itu minum teh.

English (4E): 0 the 1 old 2 man 3 drink 4 tea 5

drink(1)[v](3-4/0-5)
    man(1)[n](2-3/0-3)
        the(1)[det](0-1/0-1)
        old(1)[adj](1-2/1-2)
    tea(1)[n](4-5/4-5)

Malay (4M): 0 lelaki 1 tua 2 itu 3 minum 4 teh 5

minum(1)[v](3-4/0-5)
    lelaki(1)[n](0-1/0-3)
        tua(1)[adj](1-2/1-2)
        itu(1)[det](2-3/2-3)
    teh(1)[n](4-5/4-5)

IndexStree: (0-5,0-5) (0-3,0-3) (0-1,2-3) (1-2,1-2) (4-5,4-5)

IndexSnode: (1-2,1-2) (0-1,2-3) (2-3,0-1) (3-4,3-4) (4-5,4-5)

Page 14: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Source: the old man picks the green lamp up

English parts of the examples:

(1) 0 the 1 boy 2 pick 3 the 4 ball 5 up 6

pick[v] up[p](2-3+5-6/0-6)
    boy[n](1-2/0-2)
        the[det](0-1/0-1)
    ball[n](4-5/3-5)
        the[det](3-4/3-4)

(2) 0 the 1 lamp 2 is 3 off 4

is[v] off[adv](2-3+3-4/0-4)
    lamp[n](1-2/0-2)
        the[det](0-1/0-1)

(3) 0 the 1 green 2 signal 3 turn 4 on 5

turn[v] on[adv](3-4+4-5/0-5)
    signal[n](2-3/0-3)
        the[det](0-1/0-1)
        green[adj](1-2/1-2)

(4) 0 the 1 old 2 man 3 drink 4 tea 5

drink[v](3-4/0-5)
    man[n](2-3/0-3)
        the[det](0-1/0-1)
        old[adj](1-2/1-2)
    tea[n](4-5/4-5)

Highlighted fragments: pick[v] up[p](2-3+5-6/0-6) in (1); lamp[n](1-2/0-2) in (2); green[adj](1-2/1-2) in (3); man[n](2-3/0-3), the[det](0-1/0-1) and old[adj](1-2/1-2) in (4).

SSTC for the source sentence:

pick[v](3-4/0-8)
    man[n](2-3/0-3)
        the[det](0-1/0-1)
        old[adj](1-2/1-2)
    lamp[n](6-7/4-7)
        the[det](4-5/4-5)
        green[adj](5-6/5-6)
    up[p](7-8/-)

Page 15: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Sub-synchronous SSTCs for the source sentence

SSTC for the source sentence:

pick[v](3-4/0-8)
    man[n](2-3/0-3)
        the[det](0-1/0-1)
        old[adj](1-2/1-2)
    lamp[n](6-7/4-7)
        the[det](4-5/4-5)
        green[adj](5-6/5-6)
    up[p](7-8/-)

(1) English: 0 the 1 old 2 man 3. Malay: 0 lelaki 1 tua 2 itu 3.

man(1)[n](2-3/0-3)
    the(1)[det](0-1/0-1)
    old(1)[adj](1-2/1-2)

lelaki(1)[n](0-1/0-3)
    tua(1)[adj](1-2/1-2)
    itu(1)[det](2-3/2-3)

IndexStree: (0-3,0-3) (0-1,2-3) (1-2,1-2)

IndexSnode: (2-3,0-1) (0-1,2-3) (1-2,1-2)

(2) English: 3 pick 4. Malay: 3 kutip 4.

pick(1)[v](3-4/3-4)

kutip(1)[v](3-4/3-4)

IndexStree: (3-4,3-4)

IndexSnode: (3-4,3-4)

(3) English: 4 the 5 green 6 lamp 7. Malay: 4 lampu 5 hijau 6 itu 7.

lamp(1)[n](6-7/4-7)
    the(1)[det](4-5/4-5)
    green(1)[adj](5-6/5-6)

lampu(1)[n](4-5/4-7)
    hijau(1)[adj](5-6/5-6)
    itu(1)[det](6-7/6-7)

IndexStree: (4-7,4-7) (4-5,6-7) (5-6,5-6)

IndexSnode: (6-7,4-5) (4-5,6-7) (5-6,5-6)

(4) English: 7 up 8. No Malay counterpart.

up(1)[p](7-8/7-8)

IndexStree: (7-8,-)

IndexSnode: (7-8,-)

(4)

Page 16: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Selected closest example

English sentence: He picks the ball up. Malay translation: Dia kutip bola itu.

English (1E): 0 he 1 pick 2 the 3 ball 4 up 5

pick(1)[v] up(1)[p](1-2+4-5/0-5)
    he(1)[n](0-1/0-1)
    ball(1)[n](3-4/2-4)
        the(1)[det](2-3/2-3)

Malay (1M): 0 dia 1 kutip 2 bola 3 itu 4

kutip(1)[v](1-2/0-4)
    dia(1)[n](0-1/0-1)
    bola(1)[n](2-3/2-4)
        itu(1)[det](3-4/3-4)

IndexStree: (0-5,0-4) (0-1,0-1) (2-4,2-4) (2-3,3-4)

IndexSnode: (2-3,3-4) (3-4,2-3) (0-1,0-1) (1-2+4-5,1-2)

Sub-synchronous SSTCs derived from the example

(1) English: 0 he 1. Malay: 0 dia 1.

he(1)[n](0-1/0-1)

dia(1)[n](0-1/0-1)

IndexStree: (0-1,0-1)

IndexSnode: (0-1,0-1)

(2) English: 1 pick 2. Malay: 1 kutip 2.

pick(1)[v](1-2/0-5)

kutip(1)[v](1-2/0-4)

IndexStree: (0-5,0-4)

IndexSnode: (1-2,1-2)

(3) English: 2 the 3 ball 4. Malay: 2 bola 3 itu 4.

ball(1)[n](3-4/2-4)
    the(1)[det](2-3/2-3)

bola(1)[n](2-3/2-4)
    itu(1)[det](3-4/3-4)

IndexStree: (2-4,2-4) (2-3,3-4)

IndexSnode: (2-3,3-4) (3-4,2-3)

(4) English: 4 up 5. No Malay counterpart.

up(1)[p](4-5/-)

IndexStree: (-,-)

IndexSnode: (4-5,-)

Page 17: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Sub-synchronous SSTCs.

Source sentence:

(1) the old man (0-3) / lelaki tua itu (0-3)

(2) pick (3-4) / kutip (3-4)

(3) the green lamp (4-7) / lampu hijau itu (4-7)

(4) up (7-8) / -

Example sentence:

(1) he (0-1) / dia (0-1)

(2) pick (1-2) / kutip (1-2)

(3) the ball (2-4) / bola itu (2-4)

(4) up (4-5) / -

Page 18: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Replacement (English / Malay)

Example 1 (1E / 1M): he pick the ball up (0-1 1-2 2-3 3-4 4-5) / dia kutip bola itu (0-1 1-2 2-3 3-4)

IndexStree: (0-5,0-4) (0-1,0-1) (2-4,2-4) (2-3,3-4)

IndexSnode: (2-3,3-4) (3-4,2-3) (0-1,0-1) (1-2+4-5,1-2)

Example part (2):

pick(1)[v](1-2/0-5)

kutip(1)[v](1-2/0-4)

IndexStree: (0-5,0-4)

IndexSnode: (1-2,1-2)

Source part (2):

pick(1)[v](3-4/3-4)

kutip(1)[v](3-4/3-4)

IndexStree: (3-4,3-4)

IndexSnode: (3-4,3-4)

After the replacement, the root nodes of example 1 read pick(1)[v] up(1)[p](3-4+4-5/3-4) and kutip(1)[v](3-4/3-4); the remaining nodes are unchanged:

he(1)[n](0-1/0-1), ball(1)[n](3-4/2-4), the(1)[det](2-3/2-3)

dia(1)[n](0-1/0-1), bola(1)[n](2-3/2-4), itu(1)[det](3-4/3-4)

Page 19: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Replacement (English / Malay)

Example part (1):

he(1)[n](0-1/0-1)

dia(1)[n](0-1/0-1)

IndexStree: (0-1,0-1)

IndexSnode: (0-1,0-1)

Source part (1):

man(1)[n](2-3/0-3)
    the(1)[det](0-1/0-1)
    old(1)[adj](1-2/1-2)

lelaki(1)[n](0-1/0-3)
    tua(1)[adj](1-2/1-2)
    itu(1)[det](2-3/2-3)

IndexStree: (0-3,0-3) (0-1,2-3) (1-2,1-2)

IndexSnode: (2-3,0-1) (0-1,2-3) (1-2,1-2)

After the replacement:

English (1E): 0 the 1 old 2 man 3 pick 4 the 5 ball 6 up 7

pick(1)[v] up(1)[p](3-4+7-8/3-4)
    man(1)[n](2-3/0-3)
        the(1)[det](0-1/0-1)
        old(1)[adj](1-2/1-2)
    ball(1)[n](3-4/2-4)
        the(1)[det](2-3/2-3)

Malay (1M): 0 lelaki 1 tua 2 itu 3 kutip 4 bola 5 itu 6

kutip(1)[v](3-4/3-4)
    lelaki(1)[n](0-1/0-3)
        tua(1)[adj](1-2/1-2)
        itu(1)[det](2-3/2-3)
    bola(1)[n](2-3/2-4)
        itu(1)[det](3-4/3-4)

Page 20: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

After the remaining replacement (the ball / bola itu replaced by the green lamp / lampu hijau itu):

English (1E): the old man pick the green lamp up (0-1 1-2 2-3 3-4 4-5 5-6 6-7 7-8)

pick(1)[v] up(1)[p](3-4+7-8/0-8)
    man(1)[n](2-3/0-3)
        the(1)[det](0-1/0-1)
        old(1)[adj](1-2/1-2)
    lamp(1)[n](6-7/4-7)
        the(1)[det](4-5/4-5)
        green(1)[adj](5-6/5-6)

Malay (1M): lelaki tua itu kutip lampu hijau itu (0-1 1-2 2-3 3-4 4-5 5-6 6-7)

kutip(1)[v](3-4/0-7)
    lelaki(1)[n](0-1/0-3)
        tua(1)[adj](1-2/1-2)
        itu(1)[det](2-3/2-3)
    lampu(1)[n](4-5/4-7)
        hijau(1)[adj](5-6/5-6)
        itu(1)[det](6-7/6-7)

Generation

The translation: lelaki tua itu kutip lampu hijau itu

The translation of the source sentence is generated from the Malay part of the resultant synchronous SSTC, namely the string of the Malay SSTC.
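The replacement and generation steps can be illustrated with a deliberately simplified sketch. It assumes that each synchronous SSTC has already been flattened into a list of (English, Malay) translation units in Malay word order; the real procedure operates on trees and interval indexes, which this sketch does not model. The names `replace_units` and `generate` are hypothetical.

```python
# Chosen example 1, flattened into translation units in Malay word order.
example_units = [
    ("he", "dia"),
    ("pick up", "kutip"),
    ("the ball", "bola itu"),
]

# Sub-units of the source sentence, keyed by the position of the
# example unit they replace (an assumption of this sketch).
replacements = {
    0: ("the old man", "lelaki tua itu"),
    2: ("the green lamp", "lampu hijau itu"),
}


def replace_units(example, replacements):
    """Return a copy of the example with the given units swapped in."""
    result = list(example)
    for position, unit in replacements.items():
        result[position] = unit
    return result


def generate(units):
    """Read the target string off the Malay side of the units."""
    return " ".join(malay for _english, malay in units)


resultant = replace_units(example_units, replacements)
print(generate(resultant))
```

The unit ("pick up", "kutip") is left in place because the source and the example share it, which mirrors the replacement of part (2) by an identical part on the slides.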

Page 21: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

EBMT General Problems

- How to utilize more than one example to translate one source sentence.
- Lack of flexibility in representing translation relations between source and target substrings.
- The construction of well-formed target language sentences from extracted fragments of a BKB.
- The treatment of "wild" (non-standard) linguistic phenomena, e.g. crossed dependencies.

Our approach overcomes these problems.

Page 22: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Transfer Approach to MT

Source -> Analysis -> transfer -> Synthesis -> Target


Page 24: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

How to construct the Bilingual Knowledge Bank (BKB), or example-base?

Substantial reservation!

Page 25: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

The Construction of a BKB Based on the Synchronous SSTC

Based on Bitext Synchronous Parsing Technique

BiText: text that is available in two languages.

S: English

The basic idea of example-based parsing is very simple: it is to find the corresponding representation for an input sentence based on the representations of similar sentences in the example-base.

T: Malay

Idea asas bagi penghuraian berasaskan-contoh adalah mudah: iaitu untuk mencari perwakilan yang sepadan bagi suatu ayat input berdasarkan perwakilan ayat yang serupa dalam pengkalan-contoh.

Page 26: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Schema

- Bi-text: English source and Malay target.
- Alignment process, using a bilingual dictionary: sentence level, phrase level and word level.
- Apple Pie Parser (APP): parsing and POS tagging for the English source text, e.g. ( S ( NP . ( ..(..))) and ( S ( VP …( ..(..))).
- Compile the APP output into SSTC for the English source text.
- Build the SSTC for the Malay target text based on the SSTC for the English source text, using the word alignment.
- SSTC Editor: English source and Malay target.
- Synchronous SSTC (English source, Malay target), stored in the BKB.


Page 28: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Bitext Word-level Mapping (Word Alignment)

Real texts are noisy:

- Fertility: a single word in the source sentence may correspond to zero, one, two or more words in the target sentence, and vice versa.
- Crossed dependencies (distortion): human translators change and rearrange material, so the target text does not follow the order of the source text.

Page 29: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

[Figure: bitext mapping plot. Horizontal axis: source (0 to 520). Vertical axis: target (0 to 450).]

Page 30: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

[Figure: mapping. Horizontal axis: source (0 to 500). Vertical axis: 0 to 450.]

Page 31: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

±n Context Window Word Alignment

S: English

0 The 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11 : 12 It 13 is 14 to 15 find 16 the 17 corresponding 18 representation 19 for 20 an 21 input 22 sentence 23 based 24 on 25 the 26 representations 27 of 28 similar 29 sentences 30 in 31 the 32 example 33 - 34 base 35 . 36

T: Malay

0 Idea 1 asas 2 bagi 3 penghuraian 4 berasaskan 5 - 6 contoh 7 adalah 8 mudah 9 : 10 Iaitu 11 untuk 12 mencari 13 perwakilan 14 yang 15 sepadan 16 bagi 17 suatu 18 ayat 19 input 20 berdasarkan 21 perwakilan 22 ayat 23 yang 24 serupa 25 dalam 26 pengkalan 27 - 28 contoh 29 . 30

The correspondence between the source and the target is denoted by an interval attached to each subtext according to its offset in the text.
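As a small illustration of the offset intervals, the sketch below attaches a (start, end) interval to every token. It assumes whitespace-separated tokens with hyphens and punctuation already split off, as in the numbered text above; `with_offsets` is a hypothetical helper name.

```python
def with_offsets(tokens):
    """Pair each token with its (start, end) offset interval."""
    return [(token, (i, i + 1)) for i, token in enumerate(tokens)]


source = "The basic idea of example - based parsing is very simple :".split()
target = "Idea asas bagi penghuraian berasaskan - contoh adalah mudah :".split()

source_offsets = with_offsets(source)
target_offsets = with_offsets(target)

print(source_offsets[4])  # the token "example" with its interval
print(target_offsets[6])  # the token "contoh" with its interval
```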

Page 32: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

±n Context Window Word Alignment

Find the TPCs between the source and the target, using:

- Bilingual dictionary
- Cognate words, e.g. Computer / Komputer
- Dice coefficient: Dice = 2 prob(S,T) / [prob(S) + prob(T)]

Here prob(S) and prob(T) are the probabilities of S and T occurring in the text, and prob(S,T) is the probability of both co-occurring in the same bitext segment.
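A minimal sketch of the Dice coefficient follows. The slides do not say how the probabilities are estimated; this sketch assumes they are relative frequencies over aligned bitext segments, and the toy segments are the four example sentences used earlier in the presentation. The function name `dice` is hypothetical.

```python
def dice(source_word, target_word, segments):
    """Dice = 2 * prob(S,T) / (prob(S) + prob(T)) over bitext segments."""
    total = len(segments)
    s_count = sum(1 for src, _tgt in segments if source_word in src)
    t_count = sum(1 for _src, tgt in segments if target_word in tgt)
    both = sum(
        1 for src, tgt in segments
        if source_word in src and target_word in tgt
    )
    if s_count + t_count == 0:
        return 0.0  # neither word occurs, so there is no evidence
    p_s, p_t, p_st = s_count / total, t_count / total, both / total
    return 2 * p_st / (p_s + p_t)


# Toy bitext: (English tokens, Malay tokens) per aligned segment.
segments = [
    ("the lamp is off".split(), "lampu itu padam".split()),
    ("he pick the ball up".split(), "dia kutip bola itu".split()),
    ("the green signal turn on".split(), "isyarat hijau itu bertukar".split()),
    ("the old man drink tea".split(), "lelaki tua itu minum teh".split()),
]

print(dice("lamp", "lampu", segments))
print(dice("lamp", "itu", segments))
```

On this toy data "lamp" and "lampu" always co-occur and score the maximum, while "lamp" and "itu" score lower because "itu" also appears in segments without "lamp".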

Page 33: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

±n Context Window Word Alignment

Find the chains for all possible TPCs of a source word.

Example(4-5) has two candidate targets: contoh(6-7) and contoh(28-29).

Source context: basic(1-2) idea(2-3) of(3-4) example(4-5) -(5-6) based(6-7) parsing(7-8)

Target context of contoh(6-7): bagi(2-3) penghuraian(3-4) berasaskan(4-5) -(5-6) contoh(6-7)

Target context of contoh(28-29): -(27-28) contoh(28-29)

Page 34: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

±n Context Window Word Alignment

For every chain, calculate the weight W.

[Formula for W: combines log, len(seq), len(gap), the constant 1 and len(chain).]

len(seq): length of the continuous sequence of words.

len(gap): length of the gaps between the words in the chain.

len(chain): length of the chain.

Example(4-5): contoh(6-7), W = 1.39; contoh(28-29), W = 0.60

Page 35: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Bitext Synchronous Parsing Technique

The basic idea of example-based parsing is very simple

Idea asas bagi penghuraian berasaskan – contoh adalah mudah



Page 37: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Apple Pie Parser (APP)

It is a bottom-up probabilistic chart parser that finds the parse tree for an input text (English).

It was developed at New York University.

It is free and available for download with its source code.

The parser generates a syntactic tree in Penn Treebank bracketing.

http://cs.nyu.edu/cs/projects/proteus/sekine

Page 38: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Apple Pie Parser (APP)

Input: The basic idea of example-based parsing is very simple

APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))

The representation structure and the POS for the English source are obtained.


Page 40: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Compile the APP output to SSTC structure

APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))

String: 0 the 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11

Tree:

S(Ø/0-11)
    NP(Ø/0-8)
        NPL(1)(Ø/0-3)
            The basic idea(0-3/0-3)
        PP(1)(Ø/3-8)
            of(3-4/3-4)
            NPL(1)(Ø/4-8)
                Example-based parsing(4-8/4-8)
    VP(Ø/8-11)
        is(8-9/8-9)
        ADJP(1)(Ø/9-11)
            Very simple(9-11/9-11)
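The compilation step can be sketched as a small bracket reader that computes the STREE interval of every phrase label. This is an illustrative reconstruction, not the authors' compiler: the function names are hypothetical, and splitting "example-based" into three tokens is an assumption taken from the numbered string on the slide.

```python
import re


def tokenize(bracketed):
    """Split a bracketed parse into '(', ')' and symbol tokens."""
    return re.findall(r"\(|\)|[^\s()]+", bracketed)


def split_word(word):
    """Split hyphenated words: 'example-based' -> example, -, based."""
    return [part for part in re.split(r"(-)", word) if part]


def compile_to_stree(bracketed):
    """Return the word list and (label, start, end) for every phrase."""
    tokens = tokenize(bracketed)
    words, spans, stack = [], [], []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "(":
            # The symbol after '(' is the phrase label.
            spans.append(None)  # reserve a slot to keep opening order
            stack.append((tokens[i + 1], len(words), len(spans) - 1))
            i += 2
        elif token == ")":
            label, start, slot = stack.pop()
            spans[slot] = (label, start, len(words))
            i += 1
        else:
            words.extend(split_word(token))
            i += 1
    return words, spans


app_output = ("(S (NP (NPL The basic idea) (PP of (NPL example-based parsing)))"
              " (VP is (ADJP very simple)))")
words, spans = compile_to_stree(app_output)
for label, start, end in spans:
    print(f"{label}(Ø/{start}-{end})")
```

The phrase labels get an empty SNODE (Ø) because they correspond to no word of their own; only their STREE interval is computed.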

Page 41: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Lexical Transfer

English: The basic idea of example-based parsing is very simple

Malay: Idea asas bagi penghuraian berasaskan-contoh adalah mudah

English string: 0 the 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11

English tree:

S(Ø/0-11)
    NP(Ø/0-8)
        NPL(1)(Ø/0-3)
            The basic idea(0-3/0-3)
        PP(1)(Ø/3-8)
            of(3-4/3-4)
            NPL(1)(Ø/4-8)
                Example-based parsing(4-8/4-8)
    VP(Ø/8-11)
        is(8-9/8-9)
        ADJP(1)(Ø/9-11)
            Very simple(9-11/9-11)

Malay string: 0 idea 1 asas 2 bagi 3 penghuraian 4 berasaskan 5 - 6 contoh 7 adalah 8 mudah 9

Malay tree:

S(Ø/0-9)
    NP(Ø/0-7)
        NPL(1)(Ø/0-2)
            Idea asas(0-2/0-2)
        PP(1)(Ø/2-7)
            bagi(2-3/2-3)
            NPL(1)(Ø/3-7)
                Penghuraian berasaskan-contoh(3-7/3-7)
    VP(Ø/7-9)
        adalah(7-8/7-8)
        ADJP(1)(Ø/8-9)
            mudah(8-9/8-9)
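One way to picture how the Malay intervals follow from the English ones is span projection through the word alignment. The sketch below is an assumption-laden illustration: the slides only say that the Malay SSTC is built from the English SSTC using the word alignment, so the min/max projection rule and the hand-written `alignment` table (English position to Malay position) are not taken from the presentation.

```python
# Hand-written word alignment for the example sentence pair
# (English token position -> Malay token position). "The" and
# "very" are left unaligned in this illustration.
alignment = {
    1: 1,  # basic   -> asas
    2: 0,  # idea    -> Idea
    3: 2,  # of      -> bagi
    4: 6,  # example -> contoh
    5: 5,  # -       -> -
    6: 4,  # based   -> berasaskan
    7: 3,  # parsing -> penghuraian
    8: 7,  # is      -> adalah
    10: 8,  # simple -> mudah
}


def project_span(span, alignment):
    """Map an English (start, end) interval to the smallest Malay
    interval covering all aligned words; None if nothing is aligned."""
    start, end = span
    targets = [t for s, t in alignment.items() if start <= s < end]
    if not targets:
        return None
    return (min(targets), max(targets) + 1)


english_spans = {
    "S": (0, 11), "NP": (0, 8), "NPL-1": (0, 3), "PP": (3, 8),
    "NPL-2": (4, 8), "VP": (8, 11), "ADJP": (9, 11),
}
for label, span in english_spans.items():
    print(label, span, "->", project_span(span, alignment))
```

With this alignment the projected intervals agree with the Malay STREE values on the slide, for example PP 3-8 maps to 2-7.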


Page 43: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

The synchronous SSTC editor.

Menu bar: File, Edit, Correspondences, Windows

English string: 0 the 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11

Malay string: 0 Idea 1 asas 2 bagi 3 penghuraian 4 berasaskan 5 - 6 contoh 7 adalah 8 mudah 9

Highlighted correspondence: "example - based parsing" (4-8) and "penghuraian berasaskan - contoh" (3-7).

[Screenshot: the editor displays the two SSTC trees side by side, rooted at S(Ø/0-11) for English and S(Ø/0-9) for Malay.]

Page 44: The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Discussion

Thank you.