M. Lafourcade (LIRMM) & Ch. Boitet (GETA, CLIPS), LREC-02, Las Palmas, 31/5/2002
LREC-2002, Las Palmas, May 2002
Mathieu Lafourcade & Christian Boitet
LIRMM, Montpellier: [email protected] http://www.lirmm.fr/~lafourca
GETA, CLIPS, IMAG: [email protected] http://www-clips.imag.fr/geta
UNL Lexical Selection with Conceptual Vectors
Outline
The problem: disambiguation in UNL-French deconversion
    Finding the known UW nearest to an unknown UW
    Finding the best French lemma for a given UW
Conceptual vectors
    Nature & example on French (873 dimensions)
    Building (Dec. 2001: 64,000 terms, 210,000 CVs)
CVD (CV Disambiguation) running for French
    Recooking the vectors attached to a document tree
    Placing each recooked vector in the word sense tree
Using CVD in UNL-French deconversion: ongoing
The UNL-FR deconversion process
[Figure: pipeline diagram of the deconversion process]
Data structures: UNL-L1 Graph, UNL-FRA Graph (UW), UNL-FRA Graph (French LU), “UNL Tree”, GMA structure, UMA structure, UMC structure, French utterance
Processing steps: Validation & Localization, Lexical Transfer, Graph to tree conversion, Structural transfer, Paraphrase choice, Syntactic generation, Morphological generation
Supporting component: Conceptual vector computations
The problem: disambiguation in UNL-French deconversion
Find the known UW nearest to an unknown UW
    known UWs (in KB context):
        obj(open(icl>occur),door)    a door opens
        obj(open(icl>do),door)       one opens a door
    input graph:
        obj(open(icl>occur,ins>concrete thing),door)
        ins(open(icl>occur,ins>concrete thing),key…)
        a key opens a door / a door opens with a key
    ==> choose nearest open(icl>occur) for correct result
Find best French lemma for a UW in a given context
    meeting(icl>event) ==> réunion [ACTION, DURATION…]
                           rencontre [EVENT, MOMENT…]
How to solve them?
1. unknown UW → best known UW
    1. Accessing KB in real time impractical (web server)
    2. KB not enough: still many possible candidates
2. known UW → best LU
    1. Often no clear symbolic conditions for selection
    2. Possibility to transform UNL→LUfr dictionary into a kind of neural net (cf. MSR MindNet)
3. a possible unifying solution: lexical selection through DCV (Disambiguation using Conceptual Vectors), which works quite well for French in large-scale experiments
Conceptual vectors
CV = vector in concept space (4th level in Larousse)
    V(to tidy up) = CHANGE [0.84], VARIATION [0.83], EVOLUTION [0.82], ORDER [0.77], SITUATION [0.76], STRUCTURE [0.76], RANK [0.76] …
    V(to cut) = GAME [0.8], LIQUID [0.8], CROSS [0.79], PART [0.78], MIXTURE [0.78], FRACTION [0.75], TORTURE [0.75], WOUND [0.75], DRINK [0.74] …
Global vector of a term = normalized sum of the CVs of its meanings/senses
    V(head) = HEAD [0.83], BEGINNING [0.75], ANTERIORITY [0.74], PERSON [0.74], INTELLIGENCE [0.68], HIERARCHY [0.65], …
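The "normalized sum" of sense vectors can be illustrated with a minimal sketch. It assumes Euclidean normalization, and the 3-dimensional toy vectors and the names `normalize` and `global_vector` are hypothetical (the real vectors have 873 dimensions).

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def global_vector(sense_vectors):
    """Normalized component-wise sum of the CVs of a term's senses."""
    summed = [sum(components) for components in zip(*sense_vectors)]
    return normalize(summed)

# Toy sense vectors for one term (hypothetical values, not from the slides).
head_body = [0.9, 0.1, 0.0]
head_chief = [0.1, 0.2, 0.9]
v_head = global_vector([head_body, head_chief])
```

The global vector mixes the themes of all senses, which is why it needs to be contextualized before it can select one of them.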
Conceptual vectors and sense space
Conceptual vector model
    Reminiscent of Vector Models (Salton et al.) & Sowa
    Applied on preselected concepts (not terms)
    Concepts are not independent
Set of k basic concepts
    Thesaurus Larousse = 873 concepts (translation of Roget’s)
    A vector = an 873-tuple of reals in [0..1]
    Encoding for each dimension: C = 2^15, i.e. integers in [0..32767]
Sense space = vector space + vector set
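The slide gives the integer range of the encoding but not the mapping itself. The sketch below assumes a simple linear quantization of a real in [0..1] onto [0..32767]; `encode` and `decode` are hypothetical names.

```python
SCALE = 2**15 - 1  # 32767, the largest code in [0..32767]

def encode(component):
    """Map a real component in [0, 1] to an integer code (assumed linear)."""
    if not 0.0 <= component <= 1.0:
        raise ValueError("component must lie in [0, 1]")
    return round(component * SCALE)

def decode(code):
    """Map an integer code in [0, 32767] back to a real in [0, 1]."""
    return code / SCALE
```

Under this assumption the quantization error per component is at most half a step, that is 1 / (2 × 32767).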
Thematic relatedness
Conceptual vector distance
    Angular distance DA(x, y) = angle(x, y)
    0 <= DA(x, y) <= π
Interpretation
    if DA(x, y) = 0: x // y (colinear): same idea
    if DA(x, y) = π/2: x ⊥ y (orthogonal): nothing in common
    if DA(x, y) = π: DA(x, y) = DA(x, -x): -x is the anti-idea of x
[Figure: vectors x, x’ and y]
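As an illustration, the angle between two vectors can be computed as the arccosine of their cosine similarity. The function name `angular_distance` is an assumption; the clamping step only guards against floating-point rounding.

```python
import math

def angular_distance(x, y):
    """Angle between x and y in radians, in [0, pi]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)
```

Since CV components lie in [0..1], two actual conceptual vectors can never be more than π/2 apart; the value π is only reached with the constructed "anti-idea" -x.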
Collection process
Start from a few handcrafted term/meanings/vectors
<do forever> // running constantly on Lafourcade’s Mac
    <choose a word at random (with or without a CV)>
        find NL definitions of its senses (mainly on the Web)
        for each sense definition SD
            analyze SD into linguistic tree TreeDef
            attach existing or null CVs to lexical nodes of TreeDef
            iterate propagation of CVs in TreeDef (ling. rules used here)
                until CV(root) converges or limit of cycle numbers is reached
            CV(sense) ← CV(root(TreeDef))
        use vector distance to arrange the CVs of senses into a binary « discrimination tree »
    </choose>
</do>
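The slides do not say how the sense CVs are arranged into a binary tree. One plausible reading, shown here purely as an assumption, is bottom-up clustering: repeatedly merge the two nearest nodes under a parent whose vector is their normalized sum. The names `angle`, `normalized_sum`, `build_tree` and the toy sense vectors are all hypothetical.

```python
import math
from itertools import combinations

def angle(x, y):
    """Angular distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def normalized_sum(x, y):
    """Unit-length sum of two vectors."""
    summed = [a + b for a, b in zip(x, y)]
    norm = math.sqrt(sum(c * c for c in summed))
    return [c / norm for c in summed]

def build_tree(senses):
    """senses: dict mapping a sense name to its CV. Returns the root node."""
    nodes = [{"vector": v, "sense": name, "children": []}
             for name, v in senses.items()]
    while len(nodes) > 1:
        # Pick the pair of current nodes with the smallest angular distance.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: angle(p[0]["vector"], p[1]["vector"]))
        nodes = [n for n in nodes if n is not a and n is not b]
        nodes.append({"vector": normalized_sum(a["vector"], b["vector"]),
                      "sense": None, "children": [a, b]})
    return nodes[0]

# Toy senses of one verb (hypothetical 3-dimensional vectors).
tree = build_tree({
    "strike/hit": [1.0, 0.1, 0.0],
    "strike/attack": [0.9, 0.2, 0.0],
    "strike/stop-work": [0.0, 0.1, 1.0],
})
```

With these toy values the two thematically close senses end up as siblings, and the remote sense hangs directly under the root.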
An example discrimination tree
Status on French CVs
By Dec. 2001
    64,000 terms
    210,000 CVs
    Average of 3.3 senses/term
Method
    robot to access web lexicon servers
    large coverage French analyzer by J. Chauché in Sigmart
See more details on http://www.lirmm.fr/~lafourca
Disambiguation in French
Recook the vectors attached to a document tree
– Take a document
– Analyze it with Sigmart analyzer into ONE possibly big tree (30 pages OK as a unit)
– Use the same process as for processing definitions
– Final CV(root) usable as thematic classifier of document
– Final CV(lexemes) used as « sense in context »
Place each recooked vector in the discrimination tree
– Walk down the discrimination tree, using vector distance
– Stop at nearest node:
    If leaf node, full disambiguation (relative to available sense set)
    If internal node, partial disambiguation (subset of senses)
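The walk down the discrimination tree can be sketched as follows. The stopping rule (descend only while some child is strictly nearer to the contextualized vector than the current node) is an assumption, as are the `Node` class, the function names and the toy tree.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    vector: list                 # CV attached to this tree node
    senses: list                 # senses covered by this subtree
    children: list = field(default_factory=list)

def angle(x, y):
    """Angular distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def walk_down(root, context_vector):
    """Return the senses of the node nearest to context_vector on the way down."""
    node = root
    while node.children:
        best = min(node.children, key=lambda c: angle(c.vector, context_vector))
        if angle(best.vector, context_vector) >= angle(node.vector, context_vector):
            break                # no child is nearer: partial disambiguation
        node = best
    return node.senses

# Toy tree with two senses (hypothetical vectors).
leaf_a = Node([1.0, 0.0], ["sense A"])
leaf_b = Node([0.0, 1.0], ["sense B"])
root = Node([1.0, 1.0], ["sense A", "sense B"], [leaf_a, leaf_b])

full = walk_down(root, [0.9, 0.1])      # reaches a leaf
partial = walk_down(root, [1.0, 1.0])   # stays at the internal node
```

A context vector that leans clearly toward one sense reaches a leaf, while a vector equally close to both senses stops at the internal node and returns the whole subset.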
Example with some ambiguities
• The white ants strike rapidly the trusses of the roof
Initialize: attach CVs to lexemes
• The white ants strike rapidly the trusses of the roof
Up / Down propagation of the CVs
Result: sense selection
• The white ants strike rapidly the trusses of the roof
Disambiguation in UNL-French deconversion
Our set-up
    Example input UNL-graph
    Outline of the process
Two usages of DCV (disambiguation with CV)
    Finding the known UW nearest to an unknown UW
    Finding the best French lemma for a given UW
A UNL input graph
[Figure: UNL graph]
Nodes: score(icl>event,agt>human,fld>sport).@entry.@past.@complete, Ronaldo, head(pof>body), corner, left, goal(icl>thing)
Relation labels: agt, ins, plt, obj, mod, obj, pos
• “Ronaldo has headed the ball into the left corner of the goal”
Corresponding UNL-tree with CVs attached: localization DCV
[Figure: UNL tree, each node annotated with its CV]
score(icl>event,agt>human,fld>sport).@entry.@past.@complete    V = Vevent(score) + Vhuman(score) + Vsport(score)
1- Ronaldo: agt               V(human)
head(pof>body): ins           Vbody(head)
2- Ronaldo: pos               V(human)
corner: plt                   Vplace(corner)
left: mod                     V(left)
1- goal(icl>thing): obj       Vthing(goal)
1- goal(icl>thing): obj       Vthing(goal)
Result of first step: the « best » UWs
The vector contextualization generalizes both kinds of localization (lexical and cultural).
On each node, the selected UW is the one in the UNL-French database whose vector is the closest to the contextualized vector.
Formulas used for up and down propagation:
↑ (up):   $V'(N) = V(N) \oplus \sum_{i=0}^{n} V(n_i)$
↓ (down): $V'(n_i) = V(n_i) \oplus V(N) \otimes V(n_i)$
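The slide does not define the two operators. The sketch below assumes that ⊕ is a normalized vector sum, that ⊗ is a normalized term-to-term product, and that ⊗ binds more tightly than ⊕ in the downward formula. All function names and toy vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def vsum(*vectors):
    """Assumed meaning of the circled plus: normalized component-wise sum."""
    return normalize([sum(components) for components in zip(*vectors)])

def vprod(x, y):
    """Assumed meaning of the circled times: normalized term-to-term product."""
    return normalize([a * b for a, b in zip(x, y)])

def propagate_up(parent, children):
    """V'(N): the parent vector combined with the sum of its children."""
    return vsum(parent, *children)

def propagate_down(parent, child):
    """V'(n_i): the child reinforced on the dimensions it shares with its parent."""
    return vsum(child, vprod(parent, child))

# Toy example: an ambiguous child under a parent that activates dimension 0.
parent = [1.0, 0.2, 0.0]
child = [0.7, 0.0, 0.7]
contextualized = propagate_down(parent, child)
```

Under these assumptions the downward step amplifies the components of the child that its parent also activates, which is what shifts an ambiguous lexeme toward the sense that fits its context.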
Second step: select the « best » LUs
Depending on the strategy of the generator, a lexical unit (LU) may be
    a lemma
    a whole derivational family (pay, payment, payable…)
Dictionary: <UW, CVdict> → {<LUi, CVi>}
Input: <UW, CVcontext>
Output: LUi with CVi nearest to CVcontext
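The selection step amounts to a nearest-neighbour search over the candidate LUs of a UW. In this minimal sketch the dictionary layout, the function names and the numeric values are hypothetical; the four dimensions stand for ACTION, DURATION, EVENT and MOMENT from the meeting(icl>event) example.

```python
import math

def angle(x, y):
    """Angular distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def select_lu(candidates, cv_context):
    """Return the LU whose CV is nearest to the contextualized vector."""
    lu, _ = min(candidates, key=lambda pair: angle(pair[1], cv_context))
    return lu

# Toy dictionary: UW -> list of (LU, CV) pairs.
# Dimensions: ACTION, DURATION, EVENT, MOMENT (values are illustrative only).
dictionary = {
    "meeting(icl>event)": [
        ("réunion", [0.8, 0.7, 0.2, 0.1]),
        ("rencontre", [0.2, 0.1, 0.8, 0.7]),
    ],
}

cv_context = [0.1, 0.2, 0.9, 0.6]   # a context leaning toward EVENT / MOMENT
chosen = select_lu(dictionary["meeting(icl>event)"], cv_context)
```

The same search, run over the UWs of the UNL-French database instead of over LUs, covers the first step (choosing the known UW nearest to an unknown one).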
Conclusion
Another case of fruitful integration of symbolic & numerical methods
Further work planned
    integration into running UNL-FR server
    work on feed-back (Pr SU’s line of thought)
        if user corrects the choice of LU for chosen UW
        or worse, if user chooses a LU corresponding to another UW!
        ==> then recompute vectors by giving more weight to chosen CVs