From the UNL hypergraph to GETA's multilevel tree Etienne BLANC GETA, CLIPS-IMAG BP 53, F-38041 Grenoble cedex 09 Etienne.Blanc@imag.fr



From the UNL hypergraph to GETA's multilevel tree

Etienne BLANC, GETA, CLIPS-IMAG

BP 53, F-38041 Grenoble cedex 09, Etienne.Blanc@imag.fr


- UNL: Universal Networking Language, the pivot language used in the UNL programme.
- Aim of the UNL programme: to make interlingual communication over the Net easier by using this interlingua.
- The programme is coordinated and supported by the United Nations University (Pr Uchida, Pr Della Senta, Tokyo); a UNL foundation is soon to be set up in Geneva.
- Several « Language Centers » are involved, among them GETA.

- GETA: Groupe d'Etude pour la Traduction Automatique (Study Group for Machine Translation), in charge of the French part of the programme.

- Hypergraph, multilevel tree: the ways of encoding the meaning of a text in UNL and in GETA's methodology, respectively.

From the UNL hypergraph to GETA's multilevel tree


I. Overview of the UNL programme

II. The UNL to French deconverter

III. From the UNL hypergraph to GETA’s multilevel tree

IV. Conclusion


I. OVERVIEW OF THE UNL PROGRAMME: THE PROGRAMME IS BASED ON THE PIVOT PARADIGM AND TACKLES 16 LANGUAGES

Figure: the Universal Networking Language at the centre, connected to the 16 languages of the programme: Arabic, Chinese, English, French, German, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Mongolian, Portuguese, Russian, Spanish and Thai.
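As a rough illustration of why a pivot helps (standard arithmetic, not a figure from the slides): direct transfer between n languages needs one system per ordered language pair, n(n-1), whereas a pivot needs one enconverter and one deconverter per language, 2n. The function names below are only illustrative.

```python
def direct_transfer_systems(n: int) -> int:
    """One transfer system per ordered (source, target) pair of distinct languages."""
    return n * (n - 1)


def pivot_modules(n: int) -> int:
    """One enconverter (language -> UNL) plus one deconverter (UNL -> language) per language."""
    return 2 * n


if __name__ == "__main__":
    n = 16  # languages tackled by the UNL programme
    print(direct_transfer_systems(n), pivot_modules(n))  # 240 32
```

For the 16 languages of the programme, that is 240 direct systems against 32 pivot modules, and adding a 17th language costs two new modules rather than 32.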


I. OVERVIEW OF THE UNL PROJECT: A DOCUMENT IS DISPATCHED IN THE UNL FORMAT OVER THE NET

Figure: a source document (Chinese) is enconverted into a UNL document, which is then deconverted into Mongolian, Latvian, Japanese and French documents (one enconversion, several deconversions).


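I. OVERVIEW OF THE UNL PROJECT: A SENTENCE IN A UNL FILE

[S:4]
{org:cn}貓在閣樓抓了一隻灰色老鼠{/org}
{unl}
agt(catch.@entry.@pres,cat(icl>animal).@def)
obj(catch.@entry.@pres,mouse(icl>animal).@indef)
plc(catch.@entry.@pres,attic.@def)
mod(mouse(icl>animal).@indef,grey(icl>color))
{/unl}
{ab}{/ab}{cn}{/cn} ...{fr}{/fr}...{sh}{/sh}{th}{/th}
[/S]

The {org:cn} field holds the original Chinese sentence (« The cat caught a grey mouse in the attic »). ENCONVERSION produces the {unl} field from it; DECONVERSIONS fill the per-language fields such as {fr}.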
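A minimal sketch of reading such a record, assuming the layout shown above: a {unl} block holding relations written rel(head,tail), optionally rel:NN(...) for a scope. The function and class names are hypothetical. The one subtlety is that commas and parentheses inside UW constraints must not be taken for separators, so the parser tracks nesting depth.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    name: str   # relation label, e.g. "agt"
    scope: str  # scope id, e.g. "01" in "agt:01"; "" when absent
    head: str   # first argument: the node the arc starts from
    tail: str   # second argument: the node the arc points to


def unl_block(record: str) -> str:
    """Return the text between {unl} and {/unl} in one [S]...[/S] record."""
    m = re.search(r"\{unl\}(.*?)\{/unl\}", record, re.S)
    if m is None:
        raise ValueError("no {unl} block in record")
    return m.group(1)


def split_relations(block: str) -> list[str]:
    """Cut a run such as 'agt(...)obj(...)' into one string per relation."""
    rels, depth, start = [], 0, 0
    for i, ch in enumerate(block):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:  # this parenthesis closes a whole relation
                rels.append(block[start:i + 1].strip())
                start = i + 1
    return rels


def split_args(body: str) -> list[str]:
    """Split on commas that are not nested inside a UW constraint."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    parts.append(body[start:])
    return parts


def parse_relation(text: str) -> Relation:
    """'agt:01(come.@entry,you)' -> Relation('agt', '01', 'come.@entry', 'you')."""
    label, _, rest = text.partition("(")
    if not rest.endswith(")"):
        raise ValueError(f"not a relation: {text!r}")
    args = split_args(rest[:-1])  # drop the relation's closing parenthesis
    if len(args) != 2:
        raise ValueError(f"expected two arguments in {text!r}")
    name, _, scope = label.partition(":")
    return Relation(name, scope, args[0], args[1])


RECORD = (
    "[S:4]{org:cn}貓在閣樓抓了一隻灰色老鼠{/org}{unl}"
    "agt(catch.@entry.@pres,cat(icl>animal).@def)"
    "obj(catch.@entry.@pres,mouse(icl>animal).@indef)"
    "plc(catch.@entry.@pres,attic.@def)"
    "mod(mouse(icl>animal).@indef,grey(icl>color))"
    "{/unl}{fr}{/fr}[/S]"
)

if __name__ == "__main__":
    for rel in map(parse_relation, split_relations(unl_block(RECORD))):
        print(rel)
```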


I. OVERVIEW OF THE UNL PROJECT: UNIVERSAL WORDS AND BINARY RELATIONS

agt(catch(icl>do).@entry.@pres,cat.@def)

« The cat catches »

- agt: a binary relation, « defining a thing which initiates an action ». The relations form a closed list.
- catch(icl>do): a « Universal Word » made up of the English « headword » catch and the disambiguating « constraint » (icl>do).
- @entry.@pres: « attributes » giving information about how the concept is used in a particular sentence (here @entry marks the entry node of the graph and @pres the present tense).
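The anatomy of a node label can be sketched as follows, assuming a node is written headword(constraint) followed by zero or more .@attribute suffixes, as in the examples of these slides. parse_uw is a hypothetical helper; because constraints may themselves contain parentheses (e.g. icl>do(obj>thing) later in this talk), the closing parenthesis is found by depth counting rather than by searching for the first ")".

```python
from typing import NamedTuple


class UW(NamedTuple):
    headword: str                # English headword, e.g. "catch"
    constraint: str              # disambiguating constraint, e.g. "icl>do"; "" if none
    attributes: tuple[str, ...]  # e.g. ("entry", "pres")


def parse_uw(node: str) -> UW:
    """Split 'catch(icl>do).@entry.@pres' into headword, constraint and attributes."""
    at = node.find(".@")
    paren = node.find("(")
    if paren != -1 and (at == -1 or paren < at):
        depth = 0
        for i in range(paren, len(node)):
            if node[i] == "(":
                depth += 1
            elif node[i] == ")":
                depth -= 1
                if depth == 0:  # matching parenthesis of the constraint
                    end = i
                    break
        else:
            raise ValueError(f"unbalanced parentheses in {node!r}")
        headword, constraint, rest = node[:paren], node[paren + 1:end], node[end + 1:]
    else:
        cut = len(node) if at == -1 else at
        headword, constraint, rest = node[:cut], "", node[cut:]
    attributes = tuple(a for a in rest.split(".@") if a)
    return UW(headword.strip(), constraint, attributes)


if __name__ == "__main__":
    for node in ("catch(icl>do).@entry.@pres", "cat.@def", "grey (icl>color)"):
        print(parse_uw(node))
```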


I. OVERVIEW OF THE UNL PROJECT: A SIMPLE GRAPH

agt(catch(icl>do).@entry.@pres,cat.@def)
obj(catch(icl>do).@entry.@pres,mouse(icl>animal).@indef)
plc(catch(icl>do).@entry.@pres,attic.@def)
mod(mouse(icl>animal).@indef,grey(icl>color))

Figure: the corresponding graph. The entry node catch(icl>do).@entry.@pres has an agt arc to cat.@def, an obj arc to mouse(icl>animal).@indef and a plc arc to attic.@def; mouse(icl>animal).@indef has a mod arc to grey(icl>color).

« The cat catches a grey mouse in the attic »
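Seen as data, the expression is simply a set of labelled arcs. A minimal sketch, assuming the relations are already split into (relation, head, tail) triples and that the node carrying @entry is the root: it rebuilds the adjacency lists and lists the graph from its entry node. The helper names are hypothetical, and the listing assumes an acyclic graph (a node shared by two parents would just be listed twice).

```python
from collections import defaultdict

# (relation, head, tail) triples of the expression above.
RELATIONS = [
    ("agt", "catch(icl>do).@entry.@pres", "cat.@def"),
    ("obj", "catch(icl>do).@entry.@pres", "mouse(icl>animal).@indef"),
    ("plc", "catch(icl>do).@entry.@pres", "attic.@def"),
    ("mod", "mouse(icl>animal).@indef", "grey(icl>color)"),
]


def build_graph(relations):
    """Map each node to its outgoing (relation, target) arcs, in input order."""
    graph = defaultdict(list)
    for rel, head, tail in relations:
        graph[head].append((rel, tail))
    return graph


def entry_node(relations) -> str:
    """Return the node marked with the @entry attribute."""
    for _, head, tail in relations:
        for node in (head, tail):
            if ".@entry" in node:
                return node
    raise ValueError("no @entry node")


def show(graph, node, rel="", depth=0, out=None):
    """Indented listing of the graph from `node`; assumes no cycles."""
    out = [] if out is None else out
    out.append("  " * depth + (f"{rel}: " if rel else "") + node)
    for child_rel, child in graph.get(node, []):
        show(graph, child, child_rel, depth + 1, out)
    return out


if __name__ == "__main__":
    graph = build_graph(RELATIONS)
    print("\n".join(show(graph, entry_node(RELATIONS))))
```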


II. THE FRENCH DECONVERTER: THE ARIANE-G5 MT ENVIRONMENT

Figure: the Ariane-G5 chain from source text to target text in three steps, with the specialized language (SLLP) used by each phase; the legend distinguishes mandatory from optional phases.
- ANALYSIS: AM, morphological analysis (ATEF); AX and AY, expansive lexical analysis (EXPANS); AS, structural analysis (ROBRA).
- TRANSFER: TX and TY, expansive lexical transfer (EXPANS); TL, lexical transfer (EXPANS); TS, structural transfer (ROBRA).
- GENERATION: GX and GY, expansive lexical generation (EXPANS); GS, structural generation (ROBRA); GM, morphological generation (SYGMOR).
A small example tree (catch, with cat as agt and mouse as obj) is also drawn.


II. THE FRENCH DECONVERTER: SIMPLIFIED DEEP-LEVEL ARIANE TREE (output of the transfer step)

« The cat catches a grey mouse in the attic »

ULTXT
  ULFRA
    PHVB:TPS(PRES)...
      attraper:CAT(V)...
      GN:RL(ARG0)...
        chat:CAT(N),GNR(MAS)...
      GN:RL(ARG1)...
        souris
        GA:RS(QUAL)...
          gris
      GN:RS(LOC)...
        grenier

RL: logical relation; RS: semantic relation. (attraper = catch, chat = cat, souris = mouse, gris = grey, grenier = attic.)
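A sketch of how such a multilevel tree could be held in memory, assuming each node carries a label and a dictionary of decorations (variables such as CAT, RL, RS, TPS), so that several levels of interpretation sit on the same node. The class and helper names are hypothetical, the arrangement of the nodes follows the simplified tree above, and the decorations elided with « ... » are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                          # lemma or phrase category
    deco: dict[str, str] = field(default_factory=dict)  # decorations: CAT, RL, RS, TPS, ...
    children: list["Node"] = field(default_factory=list)


def find(node, var, value):
    """Yield, in preorder, every node whose decoration `var` equals `value`."""
    if node.deco.get(var) == value:
        yield node
    for child in node.children:
        yield from find(child, var, value)


def leaves(node):
    """Labels of the leaves, left to right (here: the French lemmas)."""
    if not node.children:
        return [node.label]
    return [lemma for child in node.children for lemma in leaves(child)]


TREE = Node("ULTXT", children=[Node("ULFRA", children=[
    Node("PHVB", {"TPS": "PRES"}, [
        Node("attraper", {"CAT": "V"}),
        Node("GN", {"RL": "ARG0"}, [Node("chat", {"CAT": "N", "GNR": "MAS"})]),
        Node("GN", {"RL": "ARG1"}, [
            Node("souris"),
            Node("GA", {"RS": "QUAL"}, [Node("gris")]),
        ]),
        Node("GN", {"RS": "LOC"}, [Node("grenier")]),
    ]),
])])

if __name__ == "__main__":
    subject = next(find(TREE, "RL", "ARG0"))
    print(leaves(subject))  # ['chat']
    print(leaves(TREE))
```

Querying by decoration (for instance all RL(ARG0) nodes) is what lets later phases work on the logical level while ignoring the categories, or the reverse.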


II. THE FRENCH DECONVERTER: OVERALL STRUCTURE

Figure: source UNL graph → TRANSFER → Ariane tree → GENERATION → target French text.
- TRANSFER: graph-to-tree structural and lexical transfer, together with the Ariane phases TS, structural transfer (ROBRA), and TX, expansive lexical transfer (EXPANS).
- GENERATION: GX and GY, expansive lexical generation (EXPANS); GS, structural generation (ROBRA); GM, morphological generation (SYGMOR).


II. THE FRENCH DECONVERTER: SOME FIGURES

Universal Words/French dictionary: 40 000 UWs

Transfer grammar: 1 200 lines

Generation grammar: 2 600 lines


II. THE FRENCH DECONVERTER: FROM A UNL DOCUMENT ...

[D:dn=,on=UNL Center,…] [P]
[S:1]
{org:el} UNU and UNU/IAS {/org}
{unl}
and(unu/ias(icl>facilities).@entry,unu(icl>organization))
{/unl}
{ab}{/ab}{cn}{/cn}{de}{/de}{el}{/el}{es}{/es}{fr}{/fr}{id}{/id}{hd}{/hd}{it}{/it}{jo}{/jo}{jp}{/jp}{lv}{/lv}{mg}{/mg}{pg}{/pg}{ru}{/ru}{sh}{/sh}{th}{/th}
[/S]
[S:2]
{org:el} United Nations University is an international academic organization of the Unites Nations, and reports to General Assembly of the UN and to the General Conference of UNESCO. {/org}
{unl}
and(report(icl>do).@present.@entry,organization(icl>abstract thing).@indef)
aoj(organization(icl>abstract thing).@indef,united nations university(icl>organization))
mod(organization(icl>abstract thing).@indef,international(mod<thing))
mod(organization(icl>abstract thing).@indef,academic(mod<thing))
mod(organization(icl>abstract thing).@indef,united nations(icl>organization))
plt(report(icl>do).@present.@entry,:01)
and:01(general conference of unesco(icl>meeting).@entry,general assembly of the un(icl>meeting))
{/unl}
{ab}{/ab}{cn}{/cn}{de}{/de}{el}{/el}{es}{/es}{fr}{/fr}{id}{/id}{hd}{/hd}{it}{/it}{jo}{/jo}{jp}{/jp}{lv}{/lv}{mg}{/mg}{pg}{/pg}{ru}{/ru}{sh}{/sh}{th}{/th}
[/S]
[S:3]
{org:el} It's mission is to address pressing global problems that are of the concern of the United Nations and its member states. {/org}
{unl}
aoj(:01.@present.@entry,mission(icl>abstract thing))
mod(mission(icl>abstract thing),it)
obj:01(address(icl>do(obj>thing)).@entry,problem(icl>abstract thing).@pl)
….


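II. THE FRENCH DECONVERTER: … TO THE FRENCH DECONVERTED DOCUMENT

L'Université des Nations Unies est une organisation internationale académique des Nations Unies et rend compte à l'assemblée générale des Nations Unies et à la conférence générale de l'UNESCO. Sa mission est que des problèmes globaux urgents qui sont les responsabilités des Nations Unies et de ses états membres soient abordés. L'UNU apporte ensemble pour remplir sa mission la contribution d'universitaires de pays développés et en voie de développement et d'institutions. Il fonctionne par un réseau de centres de recherche et de formation (<RTC>) ou de programmes de recherche et de formation (<RTP>). La structure <organisationnelle> de l'UNU comprend (un tableau est vu) des corps constituants suivants. Le conseil d'université fixe des principes et des politiques pour l'université. Le président présent est <dr> Jairam_Reddy. Le recteur académique et administratif général de l'université est le responsable avec la responsabilité de la direction de son programme global et d'une organisation et d'une administration.

(Literal English gloss of the deconverted text: « The United Nations University is an international academic organization of the United Nations and reports to the general assembly of the United Nations and to the general conference of UNESCO. Its mission is that urgent global problems which are the responsibilities of the United Nations and of its member states be addressed. To fulfil its mission, the UNU brings together the contribution of academics from developed and developing countries and of institutions. It operates through a network of research and training centres (<RTC>) or of research and training programmes (<RTP>). The <organisational> structure of the UNU comprises (a table is seen) following constituent bodies. The university council sets principles and policies for the university. The present president is <dr> Jairam_Reddy. The general academic and administrative rector of the university is the person in charge, with the responsibility for the direction of its overall programme and of an organization and an administration. »)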


II. THE FRENCH DECONVERTER: EVALUATION OF THE RESULT

CONSEQUENCE OF AN UNDERSPECIFIED GRAPH

Obtained French deconversion result: La mission de l’UNU est que des problèmes soient abordés... (UNU’s mission is that problems should be addressed...)

Desired French deconversion result: La mission de l’UNU est d’aborder des problèmes... (UNU’s mission is to address problems...)

The UNL input graph was underspecified: the agent relation between « address » and UNU, implicit in English or French, should have been made explicit in the graph, which was not the case.
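One way to spot this kind of underspecification automatically, sketched under an assumption: flag every node that heads an obj relation but no agt relation. This is a hypothetical heuristic, not part of the deconverter described here, and it only raises candidates, since many predicates legitimately have no agent. The relations are those of sentence 3 as far as the slide shows them; scopes are ignored.

```python
# (relation, scope, head, tail) for the part of sentence 3 visible on the slide.
SENTENCE_3 = [
    ("aoj", "", ":01.@present.@entry", "mission(icl>abstract thing)"),
    ("mod", "", "mission(icl>abstract thing)", "it"),
    ("obj", "01", "address(icl>do(obj>thing)).@entry", "problem(icl>abstract thing).@pl"),
]


def missing_agents(relations):
    """Heads of an obj arc that head no agt arc (hypothetical check; scopes ignored)."""
    with_obj = {head for rel, _, head, _ in relations if rel == "obj"}
    with_agt = {head for rel, _, head, _ in relations if rel == "agt"}
    return sorted(with_obj - with_agt)


if __name__ == "__main__":
    print(missing_agents(SENTENCE_3))  # ['address(icl>do(obj>thing)).@entry']
```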


III. FROM THE UNL GRAPH TO GETA’S TREE: A UNL GRAPH COMPRISING A COMPOUND UW

« He knows that you will not come and regrets it. »

Figure: the entry node regret(icl>do) has an agt arc to he, an obj arc to the compound UW S01 and an and arc to know; know has its own agt arc to he and obj arc to S01. Inside S01, come has an agt:01 arc to you.

agt(regret(equ>be sorry).@entry,he(icl>human))
obj(regret(equ>be sorry).@entry,S01)
agt:01(come(agt>human,goal>place).@entry.@future.@not,you)
and(regret(equ>be sorry).@entry,know(agt>human,icl>#event))
agt(know(agt>human,icl>#event),he(icl>human))
obj(know(agt>human,icl>#event),S01)
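The hypergraph aspect comes from the scope identifiers: the relations tagged :01 form a subgraph that is itself referred to as a single node (S01 here). A minimal sketch, assuming relations already parsed into (relation, scope, head, tail) tuples: it groups relations by scope and finds the entry node of the compound UW. The helper names are hypothetical.

```python
from collections import defaultdict

# (relation, scope, head, tail) for « He knows that you will not come and regrets it. »
RELATIONS = [
    ("agt", "", "regret(equ>be sorry).@entry", "he(icl>human)"),
    ("obj", "", "regret(equ>be sorry).@entry", "S01"),
    ("agt", "01", "come(agt>human,goal>place).@entry.@future.@not", "you"),
    ("and", "", "regret(equ>be sorry).@entry", "know(agt>human,icl>#event)"),
    ("agt", "", "know(agt>human,icl>#event)", "he(icl>human)"),
    ("obj", "", "know(agt>human,icl>#event)", "S01"),
]


def group_by_scope(relations):
    """Map each scope id to its relations; "" is the top-level graph."""
    scopes = defaultdict(list)
    for rel in relations:
        scopes[rel[1]].append(rel)
    return dict(scopes)


def compound_entry(scopes, scope_id):
    """Entry node of a compound UW: the node marked @entry inside its scope."""
    for _, _, head, tail in scopes[scope_id]:
        for node in (head, tail):
            if ".@entry" in node:
                return node
    raise ValueError(f"scope {scope_id!r} has no @entry node")


if __name__ == "__main__":
    scopes = group_by_scope(RELATIONS)
    print(sorted(scopes))                # ['', '01']
    print(compound_entry(scopes, "01"))  # entry node of S01 (come)
```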


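III. FROM THE UNL GRAPH TO GETA’S TREE: GRAPH-TO-TREE STRUCTURAL TRANSFER

Figure: the UNL graph (left) and the tree obtained from it (right). In the tree, the node he, which both regret(icl>do) and know point to, is duplicated into two coindexed occurrences he#1; the compound UW S01, also reached from both verbs, likewise appears twice (S01 and S01grp in the figure).

« He knows that you will not come and regrets it. »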
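The duplication can be sketched as the unfolding of a rooted acyclic graph into a tree: every node reached by more than one arc is copied at each occurrence, and all its copies share an index. This illustrates the idea under that assumption and is not GETA's actual transfer grammar; UW constraints are shortened for readability. Indices are assigned in visiting order, which here gives he#1 and S01#2.

```python
from collections import Counter

# Top-level arcs (relation, head, tail) of the example; UW constraints omitted.
ARCS = [
    ("agt", "regret", "he"),
    ("obj", "regret", "S01"),
    ("and", "regret", "know"),
    ("agt", "know", "he"),
    ("obj", "know", "S01"),
]


def unfold(arcs, root):
    """Unfold a rooted acyclic graph into a (relation, label, children) tree.

    A node reached by more than one arc is copied at each occurrence, and all
    its copies get the same index (#1, #2, ...) so that they stay coreferent.
    """
    indegree = Counter(tail for _, _, tail in arcs)
    children = {}
    for rel, head, tail in arcs:
        children.setdefault(head, []).append((rel, tail))
    index = {}

    def label(node):
        if indegree[node] > 1:
            index.setdefault(node, len(index) + 1)  # first visit fixes the index
            return f"{node}#{index[node]}"
        return node

    def build(node, rel):
        return (rel, label(node), [build(child, r) for r, child in children.get(node, [])])

    return build(root, "")


if __name__ == "__main__":
    print(unfold(ARCS, "regret"))
```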


III. FROM THE UNL GRAPH TO GETA’S TREE: LEXICAL TRANSFER

Universal Word → French lemma (or subtree)

Figure: the tree before and after lexical transfer. The structure and the coindexed nodes (he#1, S01, S01grp) are kept, while each Universal Word is replaced by a French lemma: he#1 → il#1, know → savoir, you → tu, come → venir, regret(icl>do) → regretter.
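Lexical transfer on such a tree can be sketched as a dictionary lookup applied node by node, keeping the coindex suffix so that coreferent copies stay linked. The dictionary below is a hypothetical excerpt holding only the pairs shown in the figure, and the « or subtree » case (one UW producing several French nodes) is not handled.

```python
import re

# Hypothetical excerpt of the UW -> French lemma dictionary (pairs from the figure).
UW_TO_FR = {
    "he": "il",
    "know": "savoir",
    "you": "tu",
    "come": "venir",
    "regret(icl>do)": "regretter",
}


def transfer_label(label: str) -> str:
    """Replace a UW by its French lemma, keeping a coindex suffix such as '#1'."""
    m = re.fullmatch(r"(.*?)(#\d+)?", label)
    uw, suffix = m.group(1), m.group(2) or ""
    return UW_TO_FR.get(uw, uw) + suffix  # unknown labels (e.g. S01) pass through


def transfer_tree(tree):
    """Apply lexical transfer to every node of a (relation, label, children) tree."""
    rel, label, children = tree
    return (rel, transfer_label(label), [transfer_tree(child) for child in children])


if __name__ == "__main__":
    print(transfer_tree(("", "regret(icl>do)", [("agt", "he#1", []), ("and", "know", [])])))
```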


III. FROM THE UNL GRAPH TO GETA’S TREE: FINAL RESULT OF THE TRANSFER

SEMANTIC TRANSFER + FINAL STRUCTURAL TRANSFER

Figure: the resulting tree, rooted at PHVB:TPS(PRES)... It contains the verbs regretter, savoir and venir; NP:RL(ARG0)... subjects il#1 (twice, coindexed) and tu; two coindexed PHSUB#2:RL(ARG1)... subordinate clauses; and a PHVB:RS(COORD)... node for the coordinated clause.


III. FROM THE UNL GRAPH TO GETA’S TREE: THE SEMANTIC TRANSFER

The first five UNL semantic relations and their equivalents in the French generator used:

- agt, a thing which initiates an action: EMIT, agent of a process or experiencer
- and, a conjunctive relation between concepts: ID, the same relation as the father node
- aoj, a thing which is in a state or has an attribute: « - » (no equivalent given)
- bas, a thing used as the basis for expressing degree: ANALOG, reference for comparison
- ben, a not directly related beneficiary or victim of an event or state: RECEPT, receiver or beneficiary
...
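The correspondence can be sketched as a lookup table that also reports which relations need other rules. Only the pairs listed above are included, and aoj is deliberately left out because no GETA label is given for it; the names are hypothetical.

```python
# UNL -> GETA pairs listed on the slide; aoj is omitted (no GETA label given).
UNL_TO_GETA = {
    "agt": "EMIT",    # agent of a process or experiencer
    "and": "ID",      # the same relation as the father node
    "bas": "ANALOG",  # reference for comparison
    "ben": "RECEPT",  # receiver or beneficiary
}


def semantic_transfer(relations):
    """Split UNL relation labels into (unl, geta) pairs and labels left for other rules."""
    mapped, unmapped = [], []
    for rel in relations:
        if rel in UNL_TO_GETA:
            mapped.append((rel, UNL_TO_GETA[rel]))
        else:
            unmapped.append(rel)
    return mapped, unmapped


if __name__ == "__main__":
    print(semantic_transfer(["agt", "aoj", "ben"]))
```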


III. FROM THE UNL GRAPH TO GETA’S TREE: SEMANTIC TRANSFER

UNL: 41 semantic relations

GETA: 26 semantic relations + 3 argumentary relations (predicates)


III. FROM THE UNL GRAPH TO GETA’S TREE: LACK OF ARGUMENTARY RELATIONS

THE DIFFICULTY OF DETERMINING THE EXACT SEMANTIC RELATIONS BETWEEN A PREDICATE AND ITS DEPENDENTS...
- src « defines the initial state of object or the thing initially associated with object of an event »
- pof « defines a concept of which a focused thing is a part »
...COULD BE AVOIDED BY USING ARGUMENTARY RELATIONS, NOT (YET?) USED IN UNL.

« The IAS Faculty is constituted by researchers and professors »

obj(constitute(icl>compose).@present.@entry,IAS Faculty.@def.@topic)
src(constitute(icl>compose).@present.@entry,:01)
and:01(professor(icl>educator).@pl.@entry,researcher(icl>person).@pl)

With argumentary relations (argument numbers .@0 and .@1):

obj.@0(constitute(icl>compose).@present.@entry,IAS Faculty.@def.@topic)
src.@1(constitute(icl>compose).@present.@entry,:01)
and:01(professor(icl>educator).@pl.@entry,researcher(icl>person).@pl)


III. FROM THE UNL GRAPH TO GETA’S TREE: USEFULNESS OF ARGUMENTARY RELATIONS

Using the argumentary relations of the predicates is helpful:

- In the enconversion process (natural language into UNL): it removes the need for the (difficult) determination of the exact semantic relation between a predicate and its dependents.

- In the deconversion process (UNL into natural language): it allows more direct NL generation by distinguishing between a circumstantial phrase and a predicate argument.

obj(reach(icl>do).@entry,tower.@def) plt(reach(icl>do).@entry,heaven(icl>region)) (The tower reaches until the heaven.)

obj.@0(reach(icl>do).@entry,tower.@def) plt.@1(reach(icl>do).@entry,heaven(icl>region)) (The tower reaches the heaven.)
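The deconversion benefit can be sketched as one decision per arc, assuming argument numbers are written as a .@n suffix on the relation label, as in the examples above. A numbered relation becomes a logical relation RL(ARGn), as in the Ariane trees earlier, and anything else is treated as a circumstantial dependent. Reusing the UNL name inside RS(...) is a simplification (the real chain goes through the semantic transfer), labels with a scope suffix such as and:01 are not handled, and the names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Label(NamedTuple):
    relation: str            # UNL semantic relation, e.g. "plt"
    argument: Optional[int]  # argument number from a ".@n" suffix, None if absent


def parse_label(label: str) -> Label:
    """'plt.@1' -> Label('plt', 1); 'plt' -> Label('plt', None)."""
    m = re.fullmatch(r"([a-z]+)(?:\.@(\d+))?", label)
    if m is None:
        raise ValueError(f"unsupported relation label {label!r}")
    number = m.group(2)
    return Label(m.group(1), int(number) if number is not None else None)


def tree_relation(label: str) -> str:
    """Numbered relation -> predicate argument RL(ARGn); else a circumstantial RS(...)."""
    parsed = parse_label(label)
    if parsed.argument is not None:
        return f"RL(ARG{parsed.argument})"
    return f"RS({parsed.relation.upper()})"  # simplification: reuses the UNL name


if __name__ == "__main__":
    for lab in ("obj.@0", "plt.@1", "plt"):
        print(lab, "->", tree_relation(lab))
```

With plt.@1, heaven is generated as a direct argument of reach; with plain plt it falls back to a circumstantial phrase, which is what produces « until the heaven ».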


IV. CONCLUSION

The UNL programme is a challenging project: devising and using an interlingua between all the languages of the world, despite linguistic and cultural differences.

It is therefore a difficult project, mainly because of:
1 - the difficulty of devising and using a complete, universally accepted set of semantic relations;
2 - the difficulty of devising and using a huge set of Universal Words.

Here we have essentially discussed point 1. We are convinced that, however complete a set of semantic relations may be, the use of argumentary relations associated with the predicates remains unavoidable in devising such an interlingua, independently of the processing methodology used. But this implies a general agreement on the choice of the arguments for all predicates, which is not so easy...


Do you understand UNL?

agt(wish(agt>human).@entry,I)
ben(wish(agt>human).@entry,you)
obj(wish(agt>human).@entry,conference(icl>meeting))
mod(conference(icl>meeting),fruitful)

Or better:

agt(wish(agt>human).@entry,I)
obj(wish(agt>human).@entry,:01)
aoj:01(fruitful.@entry,conference(icl>meeting))
ben:01(fruitful.@entry,you)

NB: aoj is the ‘attribute’ semantic relation.


V. THE DEVELOPMENT ENVIRONMENT

Figure: two machines, the developer's Mac and an IBM mainframe, connected by e-mail links. The components shown are ARIANE (generator of MT systems), CASH (hypertext control of ARIANE), PARAX-UNL (multilingual lexical database) and the UNL graph to dependency tree transfer module, which takes UNL graphs as input.


V. THE DEVELOPMENT ENVIRONMENT: THE « PARAX » MULTILINGUAL DATABASE

- Universal Words dictionary, from the Japanese UNL team (Tokyo)
- Japanese dictionary, from the Japanese UNL team (Tokyo): 170 800 word senses (in 1997)
- Russian dictionary, from the Russian UNL team (Moscow): 27 500 word senses (in 1998)
- French dictionary, from the French UNL team (Grenoble): 37 800 word senses (in 1999)


V. THE DEVELOPMENT ENVIRONMENT: MULTILINGUAL DATABASE

A PAGE OF THE FRENCH DICTIONARY (THE WORD « FONCTION »)

French word: fonction
UW: function(fld>mathematics)
Sense number: 4
Grammatical description: CAT(CATN),GNR(FEM),N(NC)

Columns UW, FR, RU, JP give access to the equivalents of the UW in other languages.
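A dictionary page like this maps naturally onto a record. A sketch, assuming the grammatical description is a comma-separated list of VARIABLE(value) items, possibly with several comma-separated values inside the parentheses; Entry and parse_decoration are hypothetical names, not the PARAX schema.

```python
import re
from dataclasses import dataclass


def parse_decoration(descr: str) -> dict[str, list[str]]:
    """'CAT(CATN),GNR(FEM),N(NC)' -> {'CAT': ['CATN'], 'GNR': ['FEM'], 'N': ['NC']}."""
    return {
        var: [v for v in values.split(",") if v]
        for var, values in re.findall(r"([A-Z0-9]+)\(([^)]*)\)", descr)
    }


@dataclass
class Entry:
    french_word: str             # e.g. "fonction"
    uw: str                      # Universal Word, e.g. "function(fld>mathematics)"
    sense: int                   # sense number within the French word's entry
    gramm: dict[str, list[str]]  # parsed grammatical description


FONCTION = Entry(
    french_word="fonction",
    uw="function(fld>mathematics)",
    sense=4,
    gramm=parse_decoration("CAT(CATN),GNR(FEM),N(NC)"),
)

if __name__ == "__main__":
    print(FONCTION)
```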


V. THE DEVELOPMENT ENVIRONMENT: INTERACTIVE LEXICAL DISAMBIGUATION

Figure: the UNL graph goes through lexical and structural transfer (1st step) into a French dependency tree, with a disambiguation window shown alongside.