M. Lafourcade (LIRMM) & Ch. Boitet (GETA, CLIPS), LREC-02, Las Palmas, 31/5/2002

LREC-2002, Las Palmas, May 2002

Mathieu Lafourcade, LIRMM, Montpellier
Mathieu.Lafourcade@lirmm.fr http://www.lirmm.fr/~lafourca

Christian Boitet, GETA, CLIPS, IMAG, Grenoble
Christian.Boitet@imag.fr http://www-clips.imag.fr/geta

UNL Lexical Selection with Conceptual Vectors

Outline

The problem: disambiguation in UNL-French deconversion
  Finding the known UW nearest to an unknown UW
  Finding the best French lemma for a given UW

Conceptual vectors
  Nature & example on French (873 dimensions)
  Building (Dec. 2001: 64,000 terms, 210,000 CVs)

CVD (CV Disambiguation) running for French
  Recooking the vectors attached to a document tree
  Placing each recooked vector in the word sense tree

Using CVD in UNL-French deconversion: ongoing

The UNL-FR deconversion process

[Figure: pipeline diagram of the UNL-French deconversion process]

Structures shown: UNL-L1 Graph; UNL-FRA Graph (UW); UNL-FRA Graph (French LU); “UNL Tree”; GMA structure; UMA structure; UMC structure; French utterance

Processing steps shown: Validation & Localization; Lexical Transfer; Graph to tree conversion; Structural transfer; Paraphrase choice; Syntactic generation; Morphological generation

Side box: Conceptual vectors computations

The problem: disambiguation in UNL-French deconversion

Find the known UW nearest to an unknown UW
  known UWs (in KB context):
    obj(open(icl>occur),door)    a door opens
    obj(open(icl>do),door)       one opens a door
  input graph:
    obj(open(icl>occur,ins>concrete thing),door)
    ins(open(icl>occur,ins>concrete thing),key…)
    a key opens a door / a door opens with a key
  ==> choose nearest open(icl>occur) for correct result

Find best French lemma for a UW in a given context
  meeting(icl>event) ==> réunion [ACTION, DURATION…]
                         rencontre [EVENT, MOMENT…]

How to solve them?

1. unknown UW → best known UW
   1. Accessing the KB in real time is impractical (web server)
   2. The KB is not enough: still many possible candidates

2. known UW → best LU
   1. Often no clear symbolic conditions for selection
   2. Possibility of transforming the UNL→LUfr dictionary into a kind of neural net (cf. MSR MindNet)

3. A possible unifying solution: lexical selection through DCV (Disambiguation using Conceptual Vectors), which works quite well for French in large-scale experiments

Conceptual vectors

CV = vector in concept space (4th level in Larousse)
  V(to tidy up) = CHANGE [0.84], VARIATION [0.83], EVOLUTION [0.82], ORDER [0.77], SITUATION [0.76], STRUCTURE [0.76], RANK [0.76] …
  V(to cut) = GAME [0.8], LIQUID [0.8], CROSS [0.79], PART [0.78], MIXTURE [0.78], FRACTION [0.75], TORTURE [0.75], WOUND [0.75], DRINK [0.74] …

Global vector of a term = normalized sum of the CVs of its meanings/senses
  V(head) = HEAD [0.83], BEGINNING [0.75], ANTERIORITY [0.74], PERSON [0.74], INTELLIGENCE [0.68], HIERARCHY [0.65], …
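The "normalized sum" rule for a term's global vector can be illustrated with a minimal sketch. It assumes Euclidean normalization and uses a toy 3-dimensional concept space; the two sense vectors and the names `normalize` and `global_vector` are invented for illustration and do not come from the slides.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def global_vector(sense_vectors):
    """Normalized component-wise sum of the CVs of a term's senses."""
    dims = len(sense_vectors[0])
    summed = [sum(v[i] for v in sense_vectors) for i in range(dims)]
    return normalize(summed)

# Toy 3-dimensional concept space (the real space has 873 dimensions).
# Both sense vectors are hypothetical values, not data from the lexicon.
head_body = [0.9, 0.1, 0.0]
head_chief = [0.2, 0.0, 0.8]
v_head = global_vector([head_body, head_chief])
```

The global vector mixes the themes of all senses, which is why a later disambiguation step is needed to recover one sense in context.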

Conceptual vectors and sense space

Conceptual vector model
  Reminiscent of Vector Models (Salton et al.) & Sowa
  Applied on preselected concepts (not terms)
  Concepts are not independent

Set of k basic concepts
  Thesaurus Larousse = 873 concepts (translation of Roget’s)
  A vector = an 873-tuple of reals in [0..1]
  Encoding for each dimension C = 2^15: [0..32767]

Sense space = vector space + vector set
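The integer encoding of each dimension can be sketched as follows. The slide only gives the range [0..32767]; the linear scaling used here, and the names `encode` and `decode`, are assumptions made for illustration.

```python
SCALE = 2 ** 15 - 1  # 32767, the top of the integer range given on the slide

def encode(x):
    """Map a real in [0, 1] to an integer in [0, 32767] (linear scaling assumed)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("component must lie in [0, 1]")
    return round(x * SCALE)

def decode(n):
    """Map an encoded integer back to an approximate real in [0, 1]."""
    return n / SCALE
```

Under this assumption the round trip loses less than 1/32767 per component, so each component fits in 15 bits with negligible loss of precision.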

Thematic relatedness

Conceptual vector distance
  Angular Distance DA(x, y) = angle(x, y)
  0 <= DA(x, y) <= π

Interpretation
  if DA(x, y) = 0, x // y (colinear): same idea
  if DA(x, y) = π/2, x ⊥ y (orthogonal): nothing in common
  if DA(x, y) = π, DA(x, y) = DA(x, -x): -x is the anti-idea of x

[Figure: vectors x, x’ and y]
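A minimal sketch of the angular distance, computed as the arc cosine of the normalized dot product. The function name `angular_distance` is ours; the clamp guards against rounding errors that would push the cosine slightly outside [-1, 1].

```python
import math

def angular_distance(x, y):
    """Angle in radians between vectors x and y, in [0, pi]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("angle is undefined for a zero vector")
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)
```

The three cases of the interpretation correspond to `angular_distance([1, 0], [2, 0])` (colinear), `angular_distance([1, 0], [0, 1])` (orthogonal) and `angular_distance([1, 0], [-1, 0])` (opposite).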

Collection process

Start from a few handcrafted term/meanings/vectors
<do forever> // running constantly on Lafourcade’s Mac
  <choose a word at random (with or without a CV)>
    find NL definitions of its senses (mainly on the Web)
    for each sense definition SD
      analyze SD into linguistic tree TreeDef
      attach existing or null CVs to lexical nodes of TreeDef
      iterate propagation of CVs in TreeDef (ling. rules used here)
        until CV(root) converges or limit of cycle numbers is reached
      CV(sense) ← CV(root(TreeDef))
    use vector distance to arrange the CVs of senses into a binary « discrimination tree »
  </choose>
</do>

An example discrimination tree

Status on French CVs

By Dec. 2001
  64,000 terms
  210,000 CVs
  Average of 3.3 senses/term

Method
  robot to access web lexicon servers
  large-coverage French analyzer by J. Chauché in Sigmart

See more details on http://www.lirmm.fr/~lafourca

Disambiguation in French

Recook the vectors attached to a document tree
  – Take a document
  – Analyze it with the Sigmart analyzer into ONE possibly big tree (30 pages OK as a unit)
  – Use the same process as for processing definitions
  – Final CV(root) usable as thematic classifier of the document
  – Final CV(lexemes) used as « sense in context »

Place each recooked vector in the discrimination tree
  – Walk down the discrimination tree, using vector distance
  – Stop at nearest node:
      If leaf node, full disambiguation (relative to available sense set)
      If internal node, partial disambiguation (subset of senses)
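The walk down the discrimination tree might look like the sketch below. The slides do not specify the node layout or the stopping test, so everything here is an assumption: each node is taken to carry one vector and the senses it covers, and the walk descends to the closest child only while that child is strictly closer than the current node. The names `Node`, `place` and the toy vectors are hypothetical.

```python
import math
from dataclasses import dataclass, field

def angular_distance(x, y):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

@dataclass
class Node:
    vector: list                     # vector representing this node (assumed layout)
    senses: list                     # senses covered by this node
    children: list = field(default_factory=list)

def place(vector, root):
    """Walk down from root and return the nearest node (assumed stopping rule)."""
    node = root
    while node.children:
        best = min(node.children, key=lambda c: angular_distance(vector, c.vector))
        if angular_distance(vector, best.vector) >= angular_distance(vector, node.vector):
            break                    # no child is closer: stop at an internal node
        node = best
    return node

# Toy 2-dimensional tree with two invented senses.
leaf_a = Node([1.0, 0.0], ["sense A"])
leaf_b = Node([0.0, 1.0], ["sense B"])
root = Node([1.0, 1.0], ["sense A", "sense B"], [leaf_a, leaf_b])
```

With these toy values, `place([0.9, 0.1], root)` ends on the leaf for sense A (full disambiguation), while `place([1.0, 1.0], root)` stays at the root, leaving both senses open (partial disambiguation).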

Example with some ambiguities

• The white ants strike rapidly the trusses of the roof

Initialize: attach CVs to lexemes

• The white ants strike rapidly the trusses of the roof

Up / Down propagation of the CVs

Result: sense selection

• The white ants strike rapidly the trusses of the roof

Disambiguation in UNL-French deconversion

Our set-up
  Example input UNL-graph
  Outline of the process

Two usages of DCV (disambiguation with CV)
  Finding the known UW nearest to an unknown UW
  Finding the best French lemma for a given UW

A UNL input graph

[Figure: UNL graph]
  Entry node: score(icl>event,agt>human,fld>sport).@entry.@past.@complete
  Other nodes: Ronaldo, head(pof>body), corner, left, goal(icl>thing)
  Relation labels: agt, ins, plt, obj (twice), mod, pos

• “Ronaldo has headed the ball into the left corner of the goal”

Corresponding UNL-tree with CVs attached: localization DCV

[Figure: UNL tree; nodes and the vectors attached to them]

score(icl>event,agt>human,fld>sport).@entry.@past.@complete    V = Vevent(score) + Vhuman(score) + Vsport(score)
1- Ronaldo: agt             V(human)
head(pof>body): ins         Vbody(head)
2- Ronaldo: pos             V(human)
corner: plt                 Vplace(corner)
left: mod                   V(left)
1- goal(icl>thing): obj     Vthing(goal)
1- goal(icl>thing): obj     Vthing(goal)

Result of first step: the « best » UWs

The vector contextualization generalizes both kinds of localization (lexical and cultural).

On each node, the selected UW is the one in the UNL-French database whose vector is the closest to the contextualized vector.

Formulas used for up and down propagation:

↑ V'(N) = V(N) ⊕ ∑_{i=0}^{n} V(n_i)

↓ V'(n_i) = V(n_i) ⊕ V(N) ⊗ V(n_i)
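The two propagation formulas can be sketched on a small tree. The slides do not define the operators, so this sketch makes three labeled assumptions: ⊕ is a normalized vector sum, ⊗ is a component-wise geometric mean, and ⊗ binds tighter than ⊕ in the downward formula. The `Node` class and the toy vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v))
    return list(v) if norm == 0 else [a / norm for a in v]

def oplus(x, y):
    """Assumed meaning of the circled plus: normalized vector sum."""
    return normalize([a + b for a, b in zip(x, y)])

def otimes(x, y):
    """Assumed meaning of the circled times: component-wise geometric mean."""
    return [math.sqrt(a * b) for a, b in zip(x, y)]

class Node:
    def __init__(self, vector, children=()):
        self.vector = list(vector)
        self.children = list(children)

def propagate_up(node):
    """Upward pass: V'(N) = V(N) (+) sum of the children's vectors."""
    for child in node.children:
        propagate_up(child)
    if node.children:
        child_sum = [sum(col) for col in zip(*(c.vector for c in node.children))]
        node.vector = oplus(node.vector, child_sum)

def propagate_down(node):
    """Downward pass: V'(n_i) = V(n_i) (+) (V(N) (x) V(n_i))."""
    for child in node.children:
        child.vector = oplus(child.vector, otimes(node.vector, child.vector))
        propagate_down(child)

# Toy 2-dimensional example: a null CV on the root, one ambiguous child, one clear child.
ambiguous = Node([0.8, 0.6])
clear = Node([1.0, 0.0])
root = Node([0.0, 0.0], [ambiguous, clear])
propagate_up(root)
propagate_down(root)
```

After one up/down cycle the ambiguous child's first component has grown above 0.8: the context supplied by its sibling pulls it toward the first dimension, which is the contextualization effect the slide describes.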

Second step: select the « best » LUs

Depending on the strategy of the generator, a lexical unit (LU) may be
  a lemma
  a whole derivational family (pay, payment, payable…)

Dictionary: <UW, CVdict> → {<LUi, CVi>}

Input: <UW, CVcontext>

Output: LUi with nearest CVi
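The LU selection step amounts to a nearest-neighbour lookup, sketched below. The dictionary layout, the function `select_lu`, the 4-dimensional toy space (ACTION, DURATION, EVENT, MOMENT) and all vector values are assumptions for illustration; the CVdict attached to the UW itself is left out of this sketch.

```python
import math

def angular_distance(x, y):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def select_lu(dictionary, uw, cv_context):
    """Return the LU of the given UW whose CV is nearest to the context CV."""
    candidates = dictionary[uw]  # list of (LU, CV) pairs
    best_lu, _ = min(candidates, key=lambda pair: angular_distance(cv_context, pair[1]))
    return best_lu

# Toy dictionary; dimensions are ACTION, DURATION, EVENT, MOMENT (invented values).
toy_dictionary = {
    "meeting(icl>event)": [
        ("réunion", [0.8, 0.6, 0.1, 0.1]),
        ("rencontre", [0.1, 0.1, 0.8, 0.6]),
    ]
}

# A context vector leaning toward EVENT and MOMENT.
chosen = select_lu(toy_dictionary, "meeting(icl>event)", [0.2, 0.1, 0.7, 0.5])
```

With this context vector the lookup picks rencontre; a context dominated by ACTION and DURATION would pick réunion instead.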

Conclusion

Another case of fruitful integration of symbolic & numerical methods

Further work planned
  integration into running UNL-FR server
  work on feed-back (Pr SU’s line of thought)
    if user corrects the choice of LU for chosen UW
    or worse, if user chooses a LU corresponding to another UW!
    ==> then recompute vectors by giving more weight to chosen CVs
