The Topology of WordNet: some metrics
Ann Devitt and Carl Vogel
Computational Linguistics Group
Trinity College Dublin, Ireland
Ann Devitt, TCD
Introduction
Measures
  WordNet “sub-hierarchies”
  Multiple inheritance
  Branching factor
  Depth versus height
  Cluster coefficients
Specificity pilot study
Terminology
WordNet as directed acyclic graph
Node and synset interchangeable
Dimensional distribution
Overlap between hierarchies
2072 synsets: more than 1 top hierarchy
35 synsets: more than 2 top hierarchies
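One way such overlap counts could be obtained is to collect, for every synset, the set of root nodes reachable through its hypernym links. The sketch below does this on a small made-up graph. The `PARENTS` mapping, the node names and the function names are illustrative assumptions, not WordNet data and not the authors' code.

```python
from collections import defaultdict

# Hypothetical toy hypernym graph: node -> list of parents (NOT real WordNet data).
PARENTS = {
    "entity": [],
    "group": [],
    "abstraction": [],
    "artifact": ["entity"],
    "collection": ["group"],
    "weaponry": ["artifact", "collection"],
    "attribute": ["abstraction"],
}

def top_hierarchies(node, parents, cache=None):
    """Return the set of roots (nodes without parents) reachable from `node`."""
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    ups = parents[node]
    if not ups:
        roots = {node}  # a root belongs to its own hierarchy
    else:
        roots = set()
        for parent in ups:
            roots |= top_hierarchies(parent, parents, cache)
    cache[node] = roots
    return roots

def overlap_counts(parents):
    """Map 'number of top hierarchies' -> number of nodes with that many."""
    cache = {}
    counts = defaultdict(int)
    for node in parents:
        counts[len(top_hierarchies(node, parents, cache))] += 1
    return dict(counts)

counts = overlap_counts(PARENTS)
more_than_one = sum(n for k, n in counts.items() if k > 1)
print(counts, more_than_one)
```

The memo `cache` matters on a graph the size of WordNet, since many synsets share the same ancestors.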
Some overlap examples
Abstraction and Event
  948 synsets
  group action
Entity and Group
  250 nodes
  weaponry
Multiple inheritance
2.6% of nodes
Normal distribution throughout depth
Significantly different in different taxonomies: χ²(8, N = 75180) = 324.27, p ≤ 0.001
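A Pearson chi-square statistic of this kind can be computed from a table of counts with one row per taxonomy and one column each for single-parent and multi-parent nodes. The 8 degrees of freedom would be consistent with a 9 × 2 table, although the slide does not spell out the table. The sketch below is an illustration only: the counts are invented, and the function name `chi_square` is an assumption.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts, NOT the figures from the study.
# Rows = taxonomies; columns = (nodes with one parent, nodes with several parents).
table = [
    [900, 100],
    [950, 50],
    [980, 20],
]
stat, dof = chi_square(table)
multi_share = sum(row[1] for row in table) / sum(map(sum, table))
print(stat, dof, multi_share)
```

The statistic is then compared against a chi-square distribution with `dof` degrees of freedom to obtain a p-value.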
Specificity examples
Parents = 1, depth < 3
  damnation
  office
Parents = 1, depth > 8
  beagle
  palomino
Parents > 1, depth < 3
  person
  artefact
Parents > 1, depth > 8
  sea bass
  self-condemnation
  bombardon
Branching Factor
Number of children + 1
Including leaf nodes:
  Range: 1 – 573
  Average: 2.023
Excluding leaf nodes:
  Average: 5.793
  97% less than 20
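The definition above (number of children + 1) and the two averages can be illustrated on a toy hierarchy. The `CHILDREN` mapping and its node names are invented for the example and are not taken from WordNet.

```python
from statistics import mean

# Hypothetical toy hierarchy: node -> list of children (NOT real WordNet data).
CHILDREN = {
    "animal": ["dog", "horse"],
    "dog": ["beagle"],
    "horse": ["palomino"],
    "beagle": [],
    "palomino": [],
}

def branching_factor(node, children):
    """Branching factor as defined on the slide: number of children + 1."""
    return len(children[node]) + 1

# Including leaf nodes: every leaf contributes a branching factor of 1.
all_bf = [branching_factor(n, CHILDREN) for n in CHILDREN]
# Excluding leaf nodes: keep only nodes that have at least one child.
internal_bf = [branching_factor(n, CHILDREN) for n in CHILDREN if CHILDREN[n]]

print(min(all_bf), max(all_bf))
print(mean(all_bf), mean(internal_bf))
```

Because leaves always score 1, a hierarchy dominated by leaf nodes has an average close to the bottom of the range when leaves are included, which is why the two averages differ so much.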
Branching factor
Overall low branching factor
Same distribution in all sub-hierarchies
Large number of nodes in total
Greater overall depth in paths
Not a shallow structure despite 55,000 leaf nodes
Depth vs Height
Depth:
  Maximum = 18
  Normal distribution
Height:
  Maximum = 5
  93.6% 1 or 2 nodes from a leaf node
  Zipfian distribution
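The slides do not state exactly how depth and height are counted. The sketch below assumes depth is the number of edges from the nearest root and height is the number of edges to the nearest leaf (so a leaf has height 0). Both conventions, the toy `CHILDREN` graph and the helper `bfs_distances` are assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy hierarchy: node -> list of children (NOT real WordNet data).
CHILDREN = {
    "entity": ["animal", "artifact"],
    "animal": ["dog"],
    "artifact": [],
    "dog": ["beagle"],
    "beagle": [],
}

def bfs_distances(sources, neighbours):
    """Shortest edge distance from any source node, following `neighbours` links."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

# Invert the child links to obtain parent links.
PARENTS = {n: [] for n in CHILDREN}
for parent, kids in CHILDREN.items():
    for kid in kids:
        PARENTS[kid].append(parent)

roots = [n for n in CHILDREN if not PARENTS[n]]
leaves = [n for n in CHILDREN if not CHILDREN[n]]

depth = bfs_distances(roots, CHILDREN)   # edges down from the nearest root
height = bfs_distances(leaves, PARENTS)  # edges up from the nearest leaf
print(depth)
print(height)
```

Under these assumptions the root of the toy graph has height 1, because one of its children is a leaf, even though its deepest descendant is three edges away. A nearest-leaf measure therefore stays small across most of a hierarchy, whereas depth separates nodes along the whole path from the root.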
Depth vs Height
Reported distributions
  The same across the different sub-hierarchies
Depth is a more informative measure
Clustering coefficient
Measure of graph connectivity
Ratio:
  (number of connections between nodes) / (possible number of connections)
  $\frac{2\,\Sigma_i}{k_i\,(k_i - 1)}$
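In the Watts and Strogatz formulation listed in the bibliography, the coefficient of a node with k neighbours is twice the number of links among those neighbours, divided by k(k − 1). Reading the slide's numerator as that link count is an assumption, as are the undirected toy `GRAPH` and the function name below.

```python
from itertools import combinations

# Hypothetical undirected toy graph as adjacency sets (NOT real WordNet data).
GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

def clustering_coefficient(node, graph):
    """Links among the neighbours of `node` divided by the possible number of such links."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no possible connection between them
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))

coefficients = {n: clustering_coefficient(n, GRAPH) for n in GRAPH}
print(coefficients)
```

In a strict tree no two neighbours of a node are linked to each other, so every coefficient is 0. A non-zero value requires a triangle, for example a node linked both to a parent and directly to that parent's parent.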
Cluster coefficients
First-order measure
  Not useful for WordNet
  Only 62 nodes have a coefficient > 0
  Does not form clusters readily
Cluster coefficients
Second-order measure
  Average 0.337
  Normal distribution
  May form clusters of wider diameter
Pilot Study Aims
1. Do people have a notion of generality/specificity for concepts?
2. Do people agree on what is more/less general/specific?
3. What features of WordNet do these judgments correlate with?
Sample ranking task I
Axis, axis of rotation – (the center around which something rotates)
River boat – (a boat used on rivers or to ply a river)
Remains – (any object that is left unused or still extant; “I threw out the remains of my dinner”)
Sample ranking task II
rational motive - (a motive that can be defended by reasoning or logical argument)
disapproval - (the act of disapproving or condemning)
harmony, concord, concordance - (agreement of opinions)
Do people agree on what is more/less general/specific?
YES
Cochran Q statistic (Cochran 1950)
  H0: that any agreement between respondents is due to chance
Overall, for 11 respondents:
  Cochran's Q: 165.859
  Degrees of freedom: 44
  Asymp. Sig.: .000
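For reference, Cochran's Q is computed from a binary matrix with one row per block and one column per treatment, and has k − 1 degrees of freedom for k treatments. The sketch below implements the textbook formula. How the ranking judgments were coded into such a matrix is not stated on the slide, so the `data` values are invented and the function name is an assumption.

```python
def cochran_q(rows):
    """Cochran's Q for a binary matrix: one row per block, one column per treatment."""
    k = len(rows[0])                               # number of treatments
    col_totals = [sum(col) for col in zip(*rows)]  # successes per treatment
    row_totals = [sum(row) for row in rows]        # successes per block
    n = sum(row_totals)                            # grand total of successes
    denominator = k * n - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every row is all 0s or all 1s")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denominator
    return q, k - 1

# Hypothetical binary judgments, NOT data from the study.
data = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
q, dof = cochran_q(data)
print(q, dof)
```

The significance value is obtained by comparing Q against a chi-square distribution with `dof` degrees of freedom.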
What WN features correlate?
Depth
  Less deep = more general
Children
  Inconclusive
Sisters
  Fewer sisters = more general
Sub-hierarchy
  Did not seem to affect judgments
  Did increase the difficulty of the task
Conclusion
WordNet metrics
  Inheritance: sub-hierarchy and parentage
  Branching factor
  Distance: depth and height
  Clustering
Pilot study
  Suggests where to go with a larger study
Bibliography
W. G. Cochran: The comparison of percentages in matched samples. Biometrika, 37:256-266, 1950
David S. Touretzky: The Mathematics of Inheritance Systems. Los Altos, CA: Morgan Kaufmann, 1986
D. J. Watts and S. H. Strogatz: Collective dynamics of 'small-world' networks. Nature, 393:440-442, 1998
Multiple Inheritance vs Depth