Oana Adriana Şoica Building and Ordering a SenDiS Lexicon Network

Posted on 24-Feb-2016


Page 1: Oana Adriana Şoica

Oana Adriana Şoica

Building and Ordering a SenDiS Lexicon Network

Page 2

SenDiS

o SenDiS operates on a specific lexicon network (LexNet): “sense tagged glosses” relations

o lexicon networks obtained from other semantic/lexical relations

o obtaining a SenDiS LexNet:
   build a “sense tagged glosses” LexNet (manually annotate the lexicon with a specific tool)
   import a “sense tagged glosses” LexNet (WordNet tagged glosses, as of 2008)

o preprocessing (ordering) the SenDiS LexNet before word sense disambiguation (WSD):
   truncation of the LexNet
   leveling the LexNet

Outline
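As a minimal sketch (an assumption about the data layout, not the SenDiS storage format), a “sense tagged glosses” LexNet can be held as a directed graph: each meaning is a vertex, and every gloss word tagged with a sense adds an arc from the meaning being defined to that sense. The sense identifiers and the `build_lexnet` helper are hypothetical.

```python
# Hypothetical toy input: meaning -> gloss as (gloss word, sense tag) pairs.
# A tag of None marks a gloss word that was left untagged.
TAGGED_GLOSSES = {
    "bank#1": [("sloping", "sloping#1"), ("land", "land#1"),
               ("beside", None), ("water", "water#1")],
    "land#1": [("solid", "solid#1"), ("ground", "ground#1")],
    "water#1": [("clear", "clear#1"), ("liquid", "liquid#1")],
}


def build_lexnet(tagged_glosses):
    """Return the LexNet as an adjacency map: meaning -> senses its gloss points to."""
    arcs = {}
    for meaning, gloss in tagged_glosses.items():
        arcs.setdefault(meaning, set())
        for _word, sense in gloss:
            if sense is not None:              # untagged words add no arc
                arcs[meaning].add(sense)
                arcs.setdefault(sense, set())  # arc targets are vertices too
    return arcs


lexnet = build_lexnet(TAGGED_GLOSSES)
print(sorted(lexnet["bank#1"]))  # ['land#1', 'sloping#1', 'water#1']
```

Untagged gloss words contribute no arc, so a partially annotated lexicon still yields a (sparser) network.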

Page 3

o hypernyms

o hyponyms

o similar to

o has part

o synonyms

o antonyms

o holonyms

o meronyms

o coordinate terms

o troponyms

o entailment

Semantic/Lexical Relations

Page 4

An excerpt of the WordNet semantic network*

* Navigli, R. 2009. Word sense disambiguation: A survey. ACM Comput. Surv. 41, 2, Article 10.

Semantic/Lexical relations: WordNet

Page 5

Tail of relation         Head of relation         Relation type
{synonym}                {synonym}                Bidirectional, symmetric
{antonym}                {antonym}                Bidirectional, symmetric
{paronym}                {paronym}                Bidirectional, symmetric
{hypernym}               {hyponym}                Bidirectional, asymmetric
{connotation}            -                        Unidirectional
{holonym}                {meronym}                Bidirectional, asymmetric
{homonym}                {homonym}                Bidirectional, symmetric
{heteronym}              {heteronym}              Bidirectional, symmetric
{homophone}              {homophone}              Bidirectional, symmetric
{diminutive of}          {diminutive by}          Bidirectional, asymmetric
{augmentative of}        {augmentative by}        Bidirectional, asymmetric
{extension from}         {extension into}         Bidirectional, asymmetric
{reduction from}         {reduction into}         Bidirectional, asymmetric
{generalization from}    {generalization into}    Bidirectional, asymmetric
{specialization from}    {specialization into}    Bidirectional, asymmetric
{figurative of}          {literal for}            Bidirectional, asymmetric
{reference to}           -                        Unidirectional
{derived from}           {derived into}           Bidirectional, asymmetric
{back formatted form}    {back formats}           Bidirectional, asymmetric
{abstract for}           {concretized from}       Bidirectional, asymmetric
{with variant}           {variant for}            Bidirectional, asymmetric

Semantic/Lexical relations: GRAALAN
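The GRAALAN table pairs each relation at the tail with its counterpart at the head. One way to use such a table, sketched here under the assumption that relations are stored as (tail, relation, head) triples, is to add the inverse arc automatically for every bidirectional relation and none for unidirectional ones. Only a subset of the relations is included, and `INVERSE`, `relation_type` and `add_relation` are illustrative names, not part of GRAALAN or SenDiS.

```python
# Subset of the GRAALAN table: relation at the tail -> relation at the head.
# None marks a unidirectional relation, for which no inverse arc exists.
INVERSE = {
    "synonym": "synonym",              # bidirectional, symmetric
    "antonym": "antonym",              # bidirectional, symmetric
    "hypernym": "hyponym",             # bidirectional, asymmetric
    "holonym": "meronym",              # bidirectional, asymmetric
    "diminutive of": "diminutive by",  # bidirectional, asymmetric
    "connotation": None,               # unidirectional
    "reference to": None,              # unidirectional
}
# Let asymmetric pairs be looked up from either end (hyponym -> hypernym, ...).
INVERSE.update({head: tail for tail, head in list(INVERSE.items())
                if head is not None and head != tail})


def relation_type(rel):
    """Classify a relation the way the table's last column does."""
    inverse = INVERSE[rel]
    if inverse is None:
        return "Unidirectional"
    return "Bidirectional, symmetric" if inverse == rel else "Bidirectional, asymmetric"


def add_relation(graph, tail, rel, head):
    """Store tail -rel-> head and, for bidirectional relations, the inverse arc."""
    graph.add((tail, rel, head))
    inverse = INVERSE[rel]
    if inverse is not None:
        graph.add((head, inverse, tail))


graph = set()
add_relation(graph, "oak", "hypernym", "tree")
print(sorted(graph))  # [('oak', 'hypernym', 'tree'), ('tree', 'hyponym', 'oak')]
```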

Page 6

manually annotating the glosses of a lexicon (using a specific tool that can ease the process)

importing an existing “gloss tagged” lexicon net (also obtained manually or semi-automatically); this usually translates into a dependency on a specific list of meanings/glosses

Obtaining a SenDiS LexNet

Page 7

o required a significant effort, usually measured in months, involving several trained linguists

o using a specialized collaborative tool (BuildLNTool – Build Lexicon Network Tool)

o enriching the “gloss tagged” relation with three relative degrees of importance in the gloss context (weak, medium, strong), or ignoring the gloss word

o SenDiS objective: two LexNets
   a “gloss tagged” LexNet for the Romanian language
   a “gloss tagged” LexNet for the English language

Creating the SenDiS LexNet

Page 8

o BuildLNTool (Build Lexicon Network Tool) provides:

a visual and effective mechanism to manually annotate the lexicon glosses

a synchronized overview of the already created relations

a browsing mechanism for inspecting the already tagged glosses and relations

BuildLNTool

Page 9

[Figure: BuildLNTool window with the sections “Lemmas & MWEs”, “Lemma/MWE Info”, “Competence & Definition Trees”, “Root & Leaf Meanings”, and messages and progress]

BuildLNTool - Sections

Page 10

o “Lemmas & MWEs”: list of lexicon entries

o “Root & Leaf Meanings”: list of roots and leaves for the lexicon network

o “Lemma/MWE Info”: current lexicon entry being analyzed

o “Competence & Definition Trees”: spanning trees for a given meaning over the current lexicon net

o section for messages and progress

BuildLNTool – Sections II
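The “Root & Leaf Meanings” and “Competence & Definition Trees” sections rest on simple graph computations. A minimal sketch, assuming that arcs run from a meaning to the senses tagged in its gloss, that a root is a vertex with no incoming arcs and a leaf one with no outgoing arcs, and that a definition tree is a breadth-first spanning tree over the meanings reachable from the selected one. The slides do not define these terms precisely; the helper names and toy graph are hypothetical.

```python
from collections import deque


def roots_and_leaves(lexnet):
    """Roots: no incoming arcs; leaves: no outgoing arcs (assumed definitions)."""
    targets = {t for succs in lexnet.values() for t in succs}
    roots = sorted(v for v in lexnet if v not in targets)
    leaves = sorted(v for v, succs in lexnet.items() if not succs)
    return roots, leaves


def bfs_spanning_tree(lexnet, start):
    """Breadth-first spanning tree of everything reachable from `start`: child -> parent."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in sorted(lexnet.get(v, ())):  # sorted for a deterministic tree
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return parent


# Hypothetical toy LexNet: meaning -> senses used in its gloss.
LEXNET = {
    "bank#1": {"land#1", "water#1"},
    "land#1": {"ground#1"},
    "water#1": {"liquid#1"},
    "ground#1": set(),
    "liquid#1": {"water#1"},  # a cycle: water#1 <-> liquid#1
}
print(roots_and_leaves(LEXNET))  # (['bank#1'], ['ground#1'])
print(bfs_spanning_tree(LEXNET, "bank#1"))
```

Because the tree records only the first parent found for each vertex, the cycle between `water#1` and `liquid#1` does not make the traversal loop.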

Page 11

selection of lexicon entry type

selection of unfinished lexicon entries filter

selection of viewing interval

text filter

lexicon entry text

lexicon entry status

BuildLNTool – Lemmas & MWEs

Page 12

double click

BuildLNTool – Selection of a current lexicon entry

Page 13

lexicon entry text

morphologic interpretation

list of meanings filters

meaning/gloss fully tagged

meaning/gloss partially tagged

meaning/gloss not tagged

BuildLNTool – Browsing the meanings of the current lexicon entry
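The three filters correspond to a per-gloss tagging status. A sketch, assuming each gloss constituent carries either a sense identifier, an ignore marker, or nothing yet, and that ignored constituents count as handled; the `IGNORED` marker and the `gloss_status` function are hypothetical, not BuildLNTool internals.

```python
from enum import Enum


class GlossStatus(Enum):
    FULLY_TAGGED = "fully tagged"
    PARTIALLY_TAGGED = "partially tagged"
    NOT_TAGGED = "not tagged"


IGNORED = "X"  # hypothetical marker for a constituent the annotator chose to ignore


def gloss_status(tags):
    """Classify a gloss from its constituents' tags.

    Each tag is a sense id, IGNORED, or None (not handled yet); ignored
    constituents are assumed to count as handled.
    """
    handled = sum(1 for tag in tags if tag is not None)
    if handled == 0:
        return GlossStatus.NOT_TAGGED
    if handled == len(tags):
        return GlossStatus.FULLY_TAGGED
    return GlossStatus.PARTIALLY_TAGGED


print(gloss_status(["land#1", IGNORED, "water#1"]).value)  # fully tagged
print(gloss_status(["land#1", None]).value)                 # partially tagged
```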

Page 14

double click

BuildLNTool – Selection of a current meaning for tagging

Page 15

unrecognized gloss constituent

‘Enter’

BuildLNTool – Gloss constituent without interpretations

Page 16

Default setting: Medium

BuildLNTool – Degrees of relevance (in gloss context)

Page 17

‘Strong’ tokens

‘Medium’ tokens

‘Weak’ tokens

Ignored (X) tokens

BuildLNTool – Degrees of relevance II
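The slides fix only the ordering weak < medium < strong, the option to ignore a token, and Medium as the default. A sketch of how such degrees might be attached to LexNet arcs; the numeric weights (1, 2, 3, with 0 for ignored) and the `weighted_arcs` helper are assumptions for illustration.

```python
from enum import IntEnum


class Relevance(IntEnum):
    # Hypothetical numeric weights; only the ordering comes from the slides.
    IGNORED = 0
    WEAK = 1
    MEDIUM = 2
    STRONG = 3


DEFAULT_RELEVANCE = Relevance.MEDIUM  # slide: "Default setting: Medium"


def weighted_arcs(meaning, gloss):
    """Turn (sense, relevance or None) pairs into weighted arcs; ignored tokens yield no arc."""
    arcs = []
    for sense, relevance in gloss:
        if relevance is None:  # annotator kept the default
            relevance = DEFAULT_RELEVANCE
        if relevance != Relevance.IGNORED:
            arcs.append((meaning, sense, int(relevance)))
    return arcs


print(weighted_arcs("bank#1", [("land#1", Relevance.STRONG),
                               ("beside#1", Relevance.IGNORED),
                               ("water#1", None)]))
# [('bank#1', 'land#1', 3), ('bank#1', 'water#1', 2)]
```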

Page 18

Unsaved annotations

Saved annotations

BuildLNTool – Gloss tagging

Page 19

view of meaning tagging tree

selection of constituent / group of gloss constituents

set / modify relevance degree

edit text of gloss constituent

select / modify the sense for the gloss constituent

further annotate meaning / save annotations

choose the next meaning

further on

save annotations

current gloss constituent without sense interpretations

BuildLNTool – Gloss tagging protocol

Page 20

LexNets              AllTokens   OperatedTokens   OpTokensValid   OpTokensRelated   OpTokens V & R
LL_Romanian - 99%    1,528,819   1,191,942        691,010         720,420           686,210
LL_English - 2%      36,828      30,350           18,523          17,641            17,505

LexNets              Glosses   Tagged Glosses   Targeted Glosses   Tags Density
LL_Romanian - 99%    130,087   118,536          58,976             0.5757
LL_English - 2%      259,651   3,496            7,551              0.5767

Built LexNets for Romanian and English
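The Tags Density column is not defined on the slide, but each reported value matches OpTokens V & R divided by OperatedTokens, truncated to four decimals (for LL_Romanian - 99%, 686,210 / 1,191,942 = 0.57570...). This reading is an inference from the figures; the check below also includes the two imported WordNet LexNets and reproduces all four reported densities.

```python
# Figures copied from the slides: (OperatedTokens, OpTokens V & R, reported Tags Density).
LEXNETS = {
    "LL_Romanian - 99%": (1_191_942, 686_210, "0.5757"),
    "LL_English - 2%": (30_350, 17_505, "0.5767"),
    "WordNet": (2_394_190, 834_803, "0.3486"),
    "WordNet_extendedGlosses": (3_114_968, 936_397, "0.3006"),
}


def tags_density(operated, valid_and_related):
    """Share of operated tokens that are both valid and related, truncated to 4 decimals."""
    # Integer floor division truncates exactly, avoiding float rounding surprises.
    whole, frac = divmod(valid_and_related * 10_000 // operated, 10_000)
    return f"{whole}.{frac:04d}"


for name, (operated, v_and_r, reported) in LEXNETS.items():
    print(f"{name:<25} computed {tags_density(operated, v_and_r)}  reported {reported}")
```

Rounding instead of truncating would give 0.5768 and 0.3487 for the English and WordNet rows, so truncation is the better fit for the published numbers.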

Page 21

o WordNet 3.0 is organized in synsets:
   117,659 synsets
   155,287 words (lexicon entries)
   206,941 word-sense pairs (gloss + usage examples)

o the synsets were split and transformed into a classical lexicon format

o the lexicon network imported:

LexNets                   Glosses   Tagged Glosses   Targeted Glosses   Tags Density
WordNet                   206,941   206,938          59,251             0.3486
WordNet_extendedGlosses   206,941   206,941          83,174             0.3006

LexNets                   AllTokens   OperatedTokens   OpTokensValid   OpTokensRelated   OpTokens V & R
WordNet                   2,394,190   2,394,190        2,394,189       834,803           834,803
WordNet_extendedGlosses   3,114,968   3,114,968        3,114,967       936,397           936,397

Imported WordNet tagged glosses
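Splitting synsets into a classical lexicon turns each (lemma, synset) membership into one word-sense pair listed under its lemma. This is why WordNet 3.0 yields more word-sense pairs (206,941) than synsets (117,659). A sketch on hypothetical, simplified synsets; `split_synsets` is an illustrative name, not the tool used for the import.

```python
from collections import defaultdict

# Hypothetical, simplified synsets: id -> (member lemmas, gloss).
SYNSETS = {
    "bank.n.01": (["bank"], "sloping land beside a body of water"),
    "bank.n.02": (["bank", "depository financial institution"],
                  "a financial institution that accepts deposits"),
    "slope.n.01": (["slope", "incline", "side"], "an elevated geological formation"),
}


def split_synsets(synsets):
    """Classical lexicon: lemma -> list of (synset id, gloss), one per word-sense pair."""
    lexicon = defaultdict(list)
    for synset_id, (lemmas, gloss) in synsets.items():
        for lemma in lemmas:
            lexicon[lemma].append((synset_id, gloss))
    return dict(lexicon)


lexicon = split_synsets(SYNSETS)
word_sense_pairs = sum(len(senses) for senses in lexicon.values())
print(len(SYNSETS), len(lexicon), word_sense_pairs)  # 3 5 6
```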

Page 22

o “gloss tagged” lexicon nets are large, dense graphs: between 100,000 and 200,000 vertices and over 1,000,000 edges/arcs

o to ease working with such graphs, “gloss tagged” lexicon nets can be preprocessed and optimized by:
   truncation of a lexicon net
   leveling of a lexicon net

o aims when optimizing a lexicon net:
   elimination of loops or strongly connected components
   a minimum number of removed edges
   leveling on a minimum number of levels
   minimization/maximization of root/leaf vertices

Ordering a SenDiS LexNet
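Loops in a LexNet show up as strongly connected components with more than one vertex (or a single vertex with an arc to itself). A minimal sketch that finds them with Kosaraju's algorithm, written with explicit stacks because recursive depth-first search can exceed Python's default recursion limit on graphs of this size. The slides do not say which method SenDiS uses; the toy graph is hypothetical.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm with explicit stacks; graph maps vertex -> successors."""
    # Pass 1: iterative DFS, recording vertices in order of completion.
    order, seen = [], set()
    for root in graph:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            vertex, successors = stack[-1]
            for succ in successors:
                if succ not in seen:
                    seen.add(succ)
                    stack.append((succ, iter(graph.get(succ, ()))))
                    break
            else:  # all successors explored: vertex is finished
                order.append(vertex)
                stack.pop()
    # Pass 2: DFS on the reversed graph, in decreasing completion order.
    reverse = {v: [] for v in seen}
    for vertex, successors in graph.items():
        for succ in successors:
            reverse[succ].append(vertex)
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = [], [root]
        while stack:
            vertex = stack.pop()
            component.append(vertex)
            for pred in reverse[vertex]:
                if pred not in assigned:
                    assigned.add(pred)
                    stack.append(pred)
        components.append(component)
    return components


# Hypothetical toy LexNet with two loops: a -> b -> c -> a and d <-> e.
LEXNET = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"], "f": ["a"]}
loops = [sorted(c) for c in strongly_connected_components(LEXNET)
         if len(c) > 1 or c[0] in LEXNET.get(c[0], ())]
print(sorted(loops))  # [['a', 'b', 'c'], ['d', 'e']]
```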

Page 23

[Figure: A minimal lexicon net in the original form]

Unordered LexNet

Page 24

[Figure: The same minimal lexicon net leveled]

Ordered (leveled) LexNet
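Once the graph is acyclic, one standard way to level it is longest-path layering: process vertices in topological order (Kahn's algorithm) and give each vertex the level just after its deepest predecessor, so every arc goes from a lower level number to a higher one. This is a generic sketch, not necessarily the leveling used by SenDiS; the toy DAG is hypothetical.

```python
from collections import deque


def longest_path_levels(dag):
    """Assign each vertex of an acyclic graph the length of the longest path reaching it.

    Every arc u -> v then goes from a lower level to a strictly higher one.
    Raises ValueError if the graph still contains a cycle.
    """
    vertices = set(dag) | {w for succs in dag.values() for w in succs}
    indegree = {v: 0 for v in vertices}
    for succs in dag.values():
        for w in succs:
            indegree[w] += 1
    level = {v: 0 for v in vertices}
    queue = deque(v for v in vertices if indegree[v] == 0)
    processed = 0
    while queue:
        v = queue.popleft()
        processed += 1
        for w in dag.get(v, ()):
            level[w] = max(level[w], level[v] + 1)
            indegree[w] -= 1
            if indegree[w] == 0:  # all predecessors placed
                queue.append(w)
    if processed != len(vertices):
        raise ValueError("graph has a cycle; remove feedback arcs before leveling")
    return level


DAG = {"1": ["2", "3"], "2": ["4"], "3": ["4"], "4": ["5"], "6": ["5"]}
levels = longest_path_levels(DAG)
print(sorted(levels.items()))  # [('1', 0), ('2', 1), ('3', 1), ('4', 2), ('5', 3), ('6', 0)]
print(max(levels.values()) + 1)  # 4
```

Longest-path layering uses the fewest possible levels for a given acyclic graph (the length of its longest path plus one), which matches the slide's aim of leveling on a minimum number of levels once the loops are gone.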

Page 25

LNs      Vertices   Edges In   OLN Algorithm   Edges Out   Edges Removed   Levels   Time (s)
wn       202,361    834,803    Patentv1        821,048     13,755          192      4.5
wn_ex    205,188    936,397    Patentv1        936,397     74,526          382      5.7
ro_48%   72,067     318,741    Patentv1        308,592     10,149          195      1.6
ro_78%   100,175    523,192    Patentv1        504,210     18,982          244      2.3
ro_99%   120,472    686,784    Patentv1        659,030     27,754          291      2.8
ro_48%   130,407    318,741    NT_eades        308,334     10,407          58       60
ro_99%   130,099    686,784    NT_eades        654,025     32,759          70       330
wn_ex    206,941    936,397    NT_eades        904,992     31,405          46       1,315

Results on leveling experimental LexNets
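The slides do not describe either leveling algorithm. NT_eades presumably refers to the greedy feedback arc set heuristic of Eades, Lin and Smyth (1993), which orders the vertices so that few arcs point backwards; removing those backward arcs leaves an acyclic graph that can then be leveled. A minimal sketch of that heuristic, not the author's implementation; `eades_order` and `feedback_arcs` are illustrative names. It rescans the remaining vertices at every step, so it is quadratic, unlike the linear-time bucket implementation described in the paper.

```python
def eades_order(graph):
    """Greedy vertex ordering of Eades, Lin and Smyth (1993); graph maps vertex -> successors."""
    succ, pred = {}, {}
    for u, vs in graph.items():
        succ.setdefault(u, set())
        pred.setdefault(u, set())
        for v in vs:
            succ.setdefault(v, set())
            pred.setdefault(v, set())
            if u != v:  # self-loops are feedback arcs under any order
                succ[u].add(v)
                pred[v].add(u)
    remaining = set(succ)
    left, right = [], []  # right collects sinks and is reversed at the end

    def remove(v):
        remaining.discard(v)
        for w in succ[v]:
            pred[w].discard(v)
        for w in pred[v]:
            succ[w].discard(v)

    while remaining:
        candidates = sorted(remaining)  # sorted only to make ties deterministic
        sink = next((v for v in candidates if not succ[v]), None)
        source = next((v for v in candidates if not pred[v]), None)
        if sink is not None:
            remove(sink)
            right.append(sink)
        elif source is not None:
            remove(source)
            left.append(source)
        else:  # no sink or source: take the vertex maximizing outdegree - indegree
            best = max(candidates, key=lambda v: len(succ[v]) - len(pred[v]))
            remove(best)
            left.append(best)
    return left + right[::-1]


def feedback_arcs(graph, order):
    """Arcs that point backwards (or to themselves) in the given vertex order."""
    position = {v: i for i, v in enumerate(order)}
    return sorted((u, v) for u, vs in graph.items() for v in vs if position[u] >= position[v])


# Hypothetical toy LexNet with two disjoint loops: a -> b -> c -> a and d <-> e.
LEXNET = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"], "f": ["a"]}
order = eades_order(LEXNET)
removed = feedback_arcs(LEXNET, order)
print(order)    # ['f', 'c', 'd', 'e', 'a', 'b']
print(removed)  # [('b', 'c'), ('e', 'd')]
```

On this toy graph the heuristic removes one arc per loop, which is the minimum possible; on real LexNets it is only an approximation, which is consistent with the two algorithms in the table removing different numbers of edges.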