An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Gong Cheng, Saisai Gong, Yuzhong Qu. State Key Laboratory for Novel Software Technology, Nanjing University, China. Presented at ISWC2011.


Page 1: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

An Empirical Study of Vocabulary Relatedness

and Its Application to Recommender Systems

Gong Cheng, Saisai Gong, Yuzhong Qu

State Key Laboratory for Novel Software Technology, Nanjing University, China

[email protected]

Presented at ISWC2011

Page 2: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Gong Cheng (程龚) [email protected] 2 of 36

ws.nju.edu.cn

Vocabulary matching

Measuring term similarity

[Figure: terms of two vocabularies (FullProfessor, FacultyMember, AssistantProfessor; Professor, Faculty, AssistantProfessor) linked by term similarity scores 0.9, 0.8 and 1.0]

Page 3: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Vocabulary matching

Vocabulary distance

Measuring vocabulary similarity

[Figure: vocabularies Semantic Web for Research Communities (SWRC), eBiquity Person, Foundational Model of Anatomy (FMA), GALEN and NCBI organismal classification (NCBITaxon), linked by vocabulary similarity scores 0.8, 0.5, 0.5, 0.6 and 0.02]

Page 4: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Vocabulary matching

Vocabulary distance

Vocabulary relatedness

Measuring vocabulary relatedness

[Figure: two groups of terms: FullProfessor, FacultyMember, AssistantProfessor; and PhD, Postgraduate-Research-Degree, EngD]

Not that similar, but somewhat related

Page 5: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Contributions

How to measure vocabulary relatedness?

6 measures, from 4 aspects

How about vocabulary relatedness in real-life cases?

Empirical analysis of 2,996 vocabularies and 4 billion other RDF triples

Where to apply vocabulary relatedness?

Post-selection vocabulary recommendation in vocabulary search

Page 6: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Outline

Data set

Vocabulary relatedness

Post-selection vocabulary recommendation

Conclusions

Page 7: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Data set statistics

Crawled from February 2010 to May 2011 by

Page 8: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Data set distributions

RDF documents over pay-level domains

Page 9: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Data set distributions

Vocabularies over top-level domains

Page 10: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Outline

Data set

Vocabulary relatedness

Post-selection vocabulary recommendation

Conclusions

Page 11: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Vocabulary relatedness

6 numerical measures, from 4 aspects

Semantic relatedness

Explicit

Implicit

Hybrid

Content similarity

Expressivity closeness

Distributional relatedness

Comparison

Page 12: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Measure 1: explicit semantic relatedness

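[Figure: explicit relation graph $G_E$ over vocabularies v1, v2, v3, with edges from owl:imports, rdfs:seeAlso and owl:priorVersion and edge weights 1 and 2]

$$R_E(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } SG_E}$$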
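
The reciprocal-of-shortest-path idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is treated as undirected, the toy edges and their weights (1 and 2) are made up for the example, and the names `shortest_path_weight`, `explicit_relatedness` and `toy_graph` are hypothetical. Returning 0.0 for unconnected vocabularies and 1.0 for a vocabulary compared with itself are also assumptions, not something the slide states.

```python
import heapq


def shortest_path_weight(graph, source, target):
    """Dijkstra's algorithm on {node: {neighbour: weight}}; None if unreachable."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None


def explicit_relatedness(graph, vi, vj):
    """1 / (weight of a shortest path between vi and vj)."""
    if vi == vj:
        return 1.0  # assumption: a vocabulary is maximally related to itself
    weight = shortest_path_weight(graph, vi, vj)
    return 0.0 if weight is None else 1.0 / weight  # assumption: 0.0 if unconnected


# Hypothetical toy graph: v1 -(1)- v2 -(2)- v3, with v4 isolated.
toy_graph = {
    "v1": {"v2": 1.0},
    "v2": {"v1": 1.0, "v3": 2.0},
    "v3": {"v2": 2.0},
    "v4": {},
}

print(explicit_relatedness(toy_graph, "v1", "v2"))
print(explicit_relatedness(toy_graph, "v1", "v3"))
print(explicit_relatedness(toy_graph, "v1", "v4"))
```

The same function would apply unchanged to the implicit and hybrid graphs, since only the edge set differs.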

Page 13: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Measure 2: implicit semantic relatedness

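[Figure: implicit relation graph $G_I$ over vocabularies v2, v3, v4, with edges derived from owl:inverseOf and rdfs:subClassOf statements between their terms t2, t3, t4 and edge weights 1 and 2]

$$R_I(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } SG_I}$$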

Page 14: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Measure 3: hybrid semantic relatedness

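[Figure: hybrid relation graph $G_{E+I}$ over vocabularies v1, v2, v3, v4, with edge weights 1, 2 and 1]

$$R_{E+I}(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } SG_{E+I}}$$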

Page 15: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Statistical properties of GE, GI and GE+I

Empirical analysis (1)

Page 16: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Empirical analysis (2)

Explicit relations between vocabularies

Page 17: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Measure 4: content similarity

Harmonic mean

Maximum similarity between their labels

Page 18: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Empirical analysis (3)

86 label-like properties

rdfs:label, dc:title, and their subproperties (e.g. skos:prefLabel)

and local name

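[Pie chart "Terms and their labels": segments w/ and w/o, values 63.67% and 36.33%; the extracted text does not show which value belongs to which segment]

[Pie chart "Vocabulary distribution": segments w/ and w/o, values 36.21% and 63.79%; the extracted text does not show which value belongs to which segment]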

Page 19: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Measure 5: expressivity closeness

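[Figure: terms tp, tq, tr of a vocabulary and the meta-level terms used to describe them (rdfs:domain, owl:inverseOf, owl:TransitiveProperty, rdf:type), collected as MetaTerms]

MetaTerms compared with the Jaccard coefficient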
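
As a rough illustration of the Jaccard comparison named on this slide, the sketch below computes the overlap between two sets of meta-level terms. The function name `expressivity_closeness`, the two example sets and the choice to return 0.0 when both sets are empty are assumptions made for the example.

```python
def expressivity_closeness(meta_terms_i, meta_terms_j):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two meta-level term sets."""
    a, b = set(meta_terms_i), set(meta_terms_j)
    union = a | b
    if not union:
        return 0.0  # assumption: two vocabularies without meta-level terms score 0
    return len(a & b) / len(union)


# Hypothetical meta-level term sets for two vocabularies.
meta_a = {"rdf:type", "rdfs:domain", "owl:inverseOf"}
meta_b = {"rdf:type", "rdfs:domain", "owl:TransitiveProperty"}

print(expressivity_closeness(meta_a, meta_b))  # 2 shared terms out of 4 distinct ones
```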

Page 20: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Empirical analysis (4)

4,978 meta-level terms, 469 (9.42%) in >1 vocabulary

Most popular meta-level terms

1. rdf:type

2. rdfs:domain

3. rdfs:range

4. …

and after excluding language constructs

10.13 meta-level terms per vocabulary

≤20 meta-level terms in 92.96% of vocabularies

but hundreds in Cyc

Page 21: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Measure 6: distributional relatedness

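Distributional profile

$$\mathrm{DP}(v) = \langle p(v_1 \mid v),\; p(v_2 \mid v),\; \ldots,\; p(v_n \mid v) \rangle$$

$$R_D(v_i, v_j) = \cos\big(\mathrm{DP}(v_i), \mathrm{DP}(v_j)\big)$$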
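
A minimal sketch of the cosine comparison between two distributional profiles follows. The slide does not say how the conditional probabilities are estimated, so the helper `profile_from_counts`, which simply normalises co-instantiation counts, is an assumption, as are the function names and the example counts.

```python
import math


def profile_from_counts(counts):
    """Turn {vocabulary: co-instantiation count} into a sparse probability vector.

    Assumption: probabilities are estimated by normalising raw counts.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {voc: n / total for voc, n in counts.items()}


def distributional_relatedness(profile_i, profile_j):
    """Cosine of the angle between two sparse profile vectors."""
    shared = profile_i.keys() & profile_j.keys()
    dot = sum(profile_i[k] * profile_j[k] for k in shared)
    norm_i = math.sqrt(sum(x * x for x in profile_i.values()))
    norm_j = math.sqrt(sum(x * x for x in profile_j.values()))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # assumption: an uninstantiated vocabulary is related to nothing
    return dot / (norm_i * norm_j)


# Hypothetical co-instantiation counts for two vocabularies.
dp_a = profile_from_counts({"foaf": 5, "dc": 5})
dp_b = profile_from_counts({"foaf": 3})

print(distributional_relatedness(dp_a, dp_b))
```

Sparse dictionaries are used here because most vocabularies are co-instantiated with only a few others, so a dense vector over all vocabularies would be mostly zeros.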

Page 22: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Empirical analysis (5)

Instantiation found for 1,874 (62.55%) vocabularies

Most popular vocabularies (excluding languages)

Page 23: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Empirical analysis (6)

Co-instantiation found for 9,763 pairs of vocabularies

Most popular vocabulary co-instantiation (excluding languages)

Page 24: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Vocabulary relatedness

6 numerical measures, from 4 aspects

Semantic relatedness

Explicit

Implicit

Hybrid

Content similarity

Expressivity closeness

Distributional relatedness

Comparison

Page 25: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Agreement between measures

Spearman’s rank correlation coefficient (ρ∈[-1,1])

Single-link hierarchical clustering
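
To make the agreement measure concrete, here is a small pure-Python sketch of Spearman's ρ between the scores two measures assign to the same vocabulary pairs. It computes the Pearson correlation of average ranks, which is one standard way to handle ties; the function names and example scores are hypothetical, and returning 0.0 when one measure is constant is an assumption.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation of the rank vectors of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0  # assumption: a constant measure carries no ranking information
    return cov / (sd_x * sd_y)


# Hypothetical scores from two measures over the same four vocabulary pairs.
measure_a = [0.1, 0.4, 0.9, 0.3]
measure_b = [1.0, 2.0, 5.0, 1.5]

print(spearman_rho(measure_a, measure_b))
```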

Page 26: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Outline

Data set

Vocabulary relatedness

Post-selection vocabulary recommendation

Conclusions

Page 27: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Relatedness-based ranking

Ranking by single measure:

Ranking by multiple measures:

Page 28: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Popularity-based re-ranking

Number of pay-level domains instantiating vi

Degree of influence of popularity

Page 29: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Evaluation settings

20 “selections” randomly selected from 1,302 moderate-sized vocabularies

Depth-10 pooling with

2 experts

Ratings

Closely related: 2

Somewhat related: 1

Unrelated: 0

Metric: NDCG
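
For readers unfamiliar with the metric, the sketch below shows one common way to compute NDCG from the graded ratings listed above (2, 1, 0). The slide does not say which NDCG variant was used; this version assumes linear gain with a log2(position + 1) discount, and the function names and example ratings are hypothetical.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain over the top-k ratings (linear gain, log2 discount)."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains[:k], start=1))


def ndcg(ranked_gains, pooled_gains, k):
    """DCG of a ranking divided by the DCG of the best ordering of the pooled ratings."""
    ideal = dcg(sorted(pooled_gains, reverse=True), k)
    if ideal == 0.0:
        return 0.0  # assumption: no related vocabulary in the pool scores 0
    return dcg(ranked_gains, k) / ideal


# Hypothetical ratings: 2 = closely related, 1 = somewhat related, 0 = unrelated.
pool = [2, 1, 0]           # all assessed candidates for one selection
recommended = [2, 0, 1]    # ratings in the order a ranker returned them

print(ndcg(recommended, pool, k=3))
print(ndcg(recommended, pool, k=1))
```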

Page 30: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Gold standard

739 assessments

Agreement between experts

80%

or 91% when “closely related = somewhat related = related”

[Pie chart "Assessments": segments Closely related, Somewhat related and Unrelated; values 7.85%, 10.55% and 81.60%]

Page 31: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Evaluation results --- individual measures

56.88% isolated vocabularies in $G_E$

37.45% uninstantiated vocabularies

Page 32: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Evaluation results --- combinations of measures

Page 33: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Relatedness vs. popularity

NDCG@1 vs. number of pay-level domains instantiating it

Page 34: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Outline

Data set

Vocabulary relatedness

Post-selection vocabulary recommendation

Conclusions

Page 35: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Conclusions

Vocabulary-level relatedness

4 aspects, 6 measures

Empirical analysis

Statistical findings

Comparison

Post-selection vocabulary recommendation

Relatedness-based ranking

Popularity-based re-ranking

Evaluation

Falcons Ontology Search

http://ws.nju.edu.cn/falcons/ontologysearch/

Page 36: An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

Take away

Vocabulary meta-descriptions are incomplete.

Terms lack labels.

Co-instantiated ∝ explicitly related

http://ws.nju.edu.cn/falcons/ontologysearch/