From Dynamic to Unbalanced Ontology Matching

Jie Tang

Knowledge Engineering Group, Dept. of Computer Science and Technology

Tsinghua University

May 22nd, 2009


Source: keg.cs.tsinghua.edu.cn/jietang/publications/Jie-Tang-Ontology-Match…



What is Ontology Matching?

[Figure: two course-catalog ontologies to be matched. Ontology O1 (root Object, Washington_course) and ontology O2 (root Thing, Cornell_course) both contain College_of_Arts_and_Sciences and Linguistics. Other concepts shown: Asian_Studies, Asian_Languages_and_Literature, French_Linguistics_FRLING, Linguistics_LING, Romance_Linguistics_ROLING, Spanish_Linguistics_SPLING.]


Ontology Matching

[Figure: ontology elements to be matched, each with attributes (attr1 ... attrn) and instances (inst1, ...).]


Problem Definition

Matching function: $\mathrm{Map}(\{e_{i1}\}, O_1, O_2) = \{e_{i2}\}$

Cardinality  O1                                              O2                     Mapping Expression
1:1          Faculty                                         Academic staff         O1.Faculty = O2.Academic staff
1:n          Name                                            First name, Last name  O1.Name = O2.First name + O2.Last name
n:1          Cost, Tax ratio                                 Price                  O1.Cost * (1 + O1.Tax ratio) = O2.Price
1:null       AI                                              --                     --
null:1       --                                              AI                     --
n:m          BookTitle, BookaNo, PublisherNo, PublisherName  Book, Publisher        O1.BookTitle + O1.BookaNo + O1.PublisherNo + O1.PublisherName = O2.Book + O2.Publisher
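The mapping expressions on this slide can be read as small functions from an O1 record to an O2 record. The following is a minimal sketch, not part of the original talk: the dictionary record layout, the function names, and the reading of "+" on names as a space-separated join are all assumptions made for illustration.

```python
# Hypothetical illustration of the 1:1, 1:n, and n:1 mapping expressions.
# The dict-based records and the space-joined name are assumptions.

def map_one_to_one(o1: dict) -> dict:
    # 1:1  O1.Faculty = O2.Academic staff
    return {"Academic staff": o1["Faculty"]}


def map_one_to_n(o1: dict) -> dict:
    # 1:n  O1.Name = O2.First name + O2.Last name
    first, last = o1["Name"].split(" ", 1)
    return {"First name": first, "Last name": last}


def map_n_to_one(o1: dict) -> dict:
    # n:1  O1.Cost * (1 + O1.Tax ratio) = O2.Price
    return {"Price": o1["Cost"] * (1 + o1["Tax ratio"])}
```

For example, `map_n_to_one({"Cost": 100.0, "Tax ratio": 0.5})` yields a record whose `Price` is 150.0.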


Ontology Matching

• Our work

RiMOM: Risk minimization based approach – Jie Tang, et al. Journal of Web Semantics. 2006, Dec. (JoWS, IF:3.41)

Dynamic ontology matching framework – Juanzi Li, Jie Tang, et al. TKDE, 2009

Unbalanced ontology matching – Qian Zhong, Hanyu Li, Juanzi Li, Guotong Xie, Jie Tang. SIGMOD’2009.


RiMOM—A tool for ontology matching

• OAEI (2006-2008): an international contest on ontology alignment

[Figure: RiMOM results in OAEI. Panels: Benchmark Results (Precision, Recall, F-measure); Anatomy Results (Precision, Recall, Recall+, F-measure); agrafsa Subtrack Results (Precision).]

Message from the chair: “I’m really surprised by the good results of these years RiMOM, you can compete with the top systems that make use of such background knowledge.”


RiMOM—A tool for ontology matching

http://keg.cs.tsinghua.edu.cn/project/RiMOM/


Outline

• Dynamic Multi-strategy Ontology Matching

• Unbalanced Ontology Matching

• Discussion


A Dynamic Multi-strategy Ontology Alignment Framework

• Matching = Multi-strategies + Strategy selection

• Concept/Attribute name

• Concept/Attribute path

• Concept/Attribute’s description

• Instance

• Structure

• Associate a loss for each candidate matching

• Strategy selection: determine whether each strategy should be used

- Linguistic similarity factor: $F\_LS = \dfrac{\#\,\text{same\_label}}{\max(\#c_1, \#c_2)}$

- Structural similarity factor: $F\_SS = \dfrac{\#\,\text{common\_concept}}{\max(\#\text{nonleaf\_}c_1, \#\text{nonleaf\_}c_2)}$


A General Processing Flow

[Figure: general processing flow, showing the strategy pool and the similarity factors.]


Multiple Strategies

[Figure: the O1 schema (root Object, Washington_course) and the O2 schema (root Thing, Cornell_course), with concepts such as College_of_Arts_and_Sciences, Linguistics, Asian_Studies, Asian_Languages_and_Literature, French_Linguistics_FRLING, Linguistics_LING, Romance_Linguistics_ROLING, and Spanish_Linguistics_SPLING.]

• Concept name: similarity(washington_course, cornell_course)

• Concept path: similarity(/object/washington_course, /thing/cornell_course)

• Concept description: classifier = train(O2), then classify(O1, classifier)

• Instance: classifier = train(O2), then classify(O1, classifier)

• Structure: taxonomy information, e.g. hypernyms and hyponyms

[Figure: the concepts Asian languages and Asian studies in O1 and O2, with the instances CHIN, THAI, HINDI, KOREAN and Thai, Korean, Hindi, Thaixyz.]


Multiple Linguistic Strategies

[Figure: vector-space illustration with binary term vectors (Query vector, Doc1 vector, Doc3 vector, Doc4 vector).]

• Edit distance on entity’s label

• WordNet:

• Vector-based similarity

Example entity:

label: Conferece, Conference

description: "The location of an event", "An event presenting work"

instances: Spg04 (label: SemPGrid 04 Workshop; name: SemPGrid 04 Workshop; location: New-York NY US; date: --05 2004)
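The edit-distance and vector-based strategies named on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions, not the RiMOM implementation: normalizing the edit distance by the longer label, lowercasing, whitespace tokenization, and raw term-frequency vectors are all choices made here for brevity.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_similarity(a: str, b: str) -> float:
    """1 - ED, with ED normalized by the longer label (an assumption)."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a.lower(), b.lower()) / max(len(a), len(b))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of term-frequency vectors built from whitespace tokens."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

With the labels from the example, `edit_distance("Conferece", "Conference")` is 1, so the normalized label similarity is 0.9. The two descriptions share the tokens "an" and "event", which gives them a cosine similarity strictly between 0 and 1.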


Similarity Propagation

[Figure: the construction of an intermediate graph from the original ontologies. Ontology 1 and Ontology 2 contain classes such as Thing, Object, Reference, Address, Directions, and Entry and the properties location and place, connected by subClassOf, hasProperty, and range edges. Nodes of the intermediate graph pair entities from the two ontologies, such as Thing/Object and location/place.]


Similarity Propagation (cont.)

• Propagate similarities along edges

• Three types of edges:

– Class to Class (CCP)

– Class to Property (CPP)

– Property to Property (PPP)

[Figure: the intermediate graph of entity pairs (such as Thing/Object and location/place), with subClassOf, hasProperty, and range edges and initial similarities 0.7, 0.3, 0.6, 0.5, 0.2, 0.9 on its nodes.]

Example with weight = 0.5: 0.6 + 0.7*0.5 + 0.9*0.5 = 1.4
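The arithmetic in this example corresponds to a single propagation step: a pair keeps its own similarity and adds each neighboring pair's similarity scaled by the edge weight. A minimal sketch of that step, where the function name and list-based interface are assumptions and no rescaling is applied because the slide does not say how values above 1 are handled:

```python
from typing import List


def propagate(own_similarity: float, neighbor_similarities: List[float],
              weight: float) -> float:
    """One propagation step for a single node of the intermediate graph."""
    return own_similarity + sum(weight * s for s in neighbor_similarities)


# Values from the slide: own similarity 0.6, neighbors 0.7 and 0.9, weight 0.5.
updated = propagate(0.6, [0.7, 0.9], 0.5)
```

Here `updated` is 1.4 up to floating-point rounding, matching the slide.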


Strategy Pool

Edit-distance

• Sim = 1 - ED(label1, label2)

Vector-similarity

• weighted vector generation

• content feature

• structure feature

• cosine similarity

Path-similarity

• entity path

• path similarity definition

Background-knowledge

• external knowledge

• similarity definition

Similarity-combination

$\mathrm{Map}(e_1, e_2) = \dfrac{\sum_{k=1}^{n} w_k \, \mathrm{Map}_k(e_1, e_2)}{\sum_{k=1}^{n} w_k}$

Similarity-propagation

• three propagation strategies

• CCP, PPP, CPP
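The similarity-combination step is a weighted average of the per-strategy similarities for one entity pair. A small sketch, assuming strategies are identified by name in two dictionaries (the names and the weights below are hypothetical, chosen only to show the calculation):

```python
from typing import Dict


def combine(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-strategy similarities for one entity pair."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0
    return sum(weights[name] * s for name, s in scores.items()) / total_weight


# Hypothetical scores and weights for one candidate pair (e1, e2).
combined = combine({"edit_distance": 0.8, "vector": 0.4},
                   {"edit_distance": 3.0, "vector": 1.0})
```

In this example the result is (3.0 * 0.8 + 1.0 * 0.4) / 4.0 = 0.7.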


Strategy Selection—Similarity factor

• Label similarity factor: $F\_LS = \dfrac{\#\,\text{same\_label}}{\max(\#c_1, \#c_2)}$

• Structure similarity factor: $F\_SS = \dfrac{\#\,\text{common\_concept}}{\max(\#\text{nonleaf\_}c_1, \#\text{nonleaf\_}c_2)}$

Example:

Ontology 1: Part, Chapter, InBook, InCollection, InProceedings, JournalPart, Article, Review, Editorial, Letter

Ontology 2: Part, Chapter, InBook, InCollection, InProceedings, Article

max(#c1, #c2) = 10, so F_LS = 6/10

max(#nonleaf_c1, #nonleaf_c2) = 2, so F_SS = 1/2
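The two factors can be reproduced for this example with a few lines of code. The sketch below is illustrative only: the parent/child layout of the two ontologies is an assumption (the transcript lists the concepts but not the tree edges), chosen so that Ontology 1 has two non-leaf concepts and Ontology 2 has one, and "common concept" is read here as a non-leaf concept whose label appears in both ontologies.

```python
from typing import Dict, List, Set

Ontology = Dict[str, List[str]]  # concept label -> labels of its children

# Assumed hierarchy; only the concept names come from the slide.
ONTO_1: Ontology = {
    "Part": ["Chapter", "InBook", "InCollection", "InProceedings", "JournalPart"],
    "Chapter": [], "InBook": [], "InCollection": [], "InProceedings": [],
    "JournalPart": ["Article", "Review", "Editorial", "Letter"],
    "Article": [], "Review": [], "Editorial": [], "Letter": [],
}
ONTO_2: Ontology = {
    "Part": ["Chapter", "InBook", "InCollection", "InProceedings", "Article"],
    "Chapter": [], "InBook": [], "InCollection": [], "InProceedings": [],
    "Article": [],
}


def nonleaf(onto: Ontology) -> Set[str]:
    """Concepts that have at least one child."""
    return {concept for concept, children in onto.items() if children}


def label_similarity_factor(o1: Ontology, o2: Ontology) -> float:
    """F_LS = #same_label / max(#c1, #c2)."""
    return len(set(o1) & set(o2)) / max(len(o1), len(o2))


def structure_similarity_factor(o1: Ontology, o2: Ontology) -> float:
    """F_SS = #common_concept / max(#nonleaf_c1, #nonleaf_c2)."""
    n1, n2 = nonleaf(o1), nonleaf(o2)
    largest = max(len(n1), len(n2))
    return len(n1 & n2) / largest if largest else 0.0
```

Under these assumptions the functions return 0.6 and 0.5 for `ONTO_1` and `ONTO_2`, which agrees with F_LS = 6/10 and F_SS = 1/2 on the slide.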


Strategy Selection

• Strategy Selection

– Selection with the two similarity factors

– Determining whether a strategy is to be used in the alignment process

– E.g. if F_SS > 0.25, we use CCP, CPP, and PPP for propagation. …

• Linguistic Strategy

– Adding structural features in vector-based similarity
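The one selection rule given explicitly on this slide (propagation is used when F_SS exceeds 0.25) can be written as a simple threshold check. The function below is a hypothetical sketch covering only that rule; the slide elides the remaining rules, so none are invented here.

```python
from typing import List


def select_propagation_strategies(f_ss: float,
                                  threshold: float = 0.25) -> List[str]:
    """Return the propagation strategies to run for a given F_SS.

    Only the rule stated on the slide is encoded: use CCP, CPP, and PPP
    when F_SS is above the threshold, otherwise skip propagation.
    """
    if f_ss > threshold:
        return ["CCP", "CPP", "PPP"]
    return []
```

For the bibliography example with F_SS = 1/2, all three propagation strategies would be selected.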


Outline

• Dynamic Multi-strategy Ontology Matching

– Experimental Results

• Unbalanced Ontology Matching

• Discussion


Data Sets

• OAEI 2006

– Benchmark (15-69), 53 alignment tasks

– Directory: (4,500), Yahoo and ODP

– Food: (16,000 vs. 41,000), two SKOS thesauri

• OAEI 2007

• Comparison methods


Statistics on the Data Set

Data set   Ontology            #concept  #attribute  #alignment (ground truth)  #instance
Benchmark  Reference Ontology  33        59          --                         76
           101                 33        61          91                         111
           103                 33        61          91                         111
           104                 33        61          91                         111
           201                 34        62          91                         111
           202                 34        62          91                         111
           204                 33        61          91                         111
           205                 34        61          91                         111
           221                 34        61          91                         111
           222                 29        61          91                         111
           223                 68        61          91                         111
           224                 33        59          91                         0
           225                 33        61          91                         111
           228                 33        0           33                         55
           230                 25        54          75                         83
           301                 15        40          61                         0
           302                 15        31          48                         0
           303                 54        72          49                         0
           304                 39        49          76                         0


Similarity between Ontologies


Results on OAEI 2006


RiMOM vs. RiMOM-SP


RiMOM vs. RiMOM-SS


Relationship with Several Classical Methods


Results on OAEI 2006


Results on OAEI 2006

• Directory

• Food


Results on OAEI 2007


Results on OAEI 2008

[Figure: RiMOM results in OAEI 2008. Panels: Benchmark Results (Precision, Recall, F-measure); Anatomy Results (Precision, Recall, Recall+, F-measure); agrafsa Subtrack Results (Precision).]


Experiences

• Structure information is very important in many alignment tasks for achieving high performance

• An effective method for combining the multiple strategies can enhance alignment performance

– Investigate more factors to describe the characteristics of the ontologies

– Exploit new strategies for ontology alignment


Outline

• Dynamic Multi-strategy Ontology Matching

• Unbalanced Ontology Matching

• Discussion


Unbalanced Ontology

Several challenges:

• Single domain vs. multiple domains

• Small-size vs. large-size ontology


Key Problems

• Linguistic-based strategy

– |O1| x |O2| pairwise comparisons

• Structure-based strategy

– In-memory graphs

– Iterative propagation

[Figure: Onto1 and Onto2 with the intermediate graph built from them, as in the Similarity Propagation slides.]


Our Approach

[Figure: a lightweight ontology and a heavyweight ontology. Step 1: select candidates. Step 2: construct a sub-ontology.]


Step 1: Select Candidates

Similarity between c_i and O_l

• Edit-distance, e.g. site vs. cite

• WordNet

Complexity: |O1| x |O2|
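Candidate selection as outlined on this slide compares each concept c_i of the heavyweight ontology with the lightweight ontology O_l, which is where the |O1| x |O2| cost comes from. The sketch below is a simplified stand-in, not the published method: it uses only normalized edit distance (no WordNet), scores c_i against O_l by its best-matching label, and keeps concepts above a hypothetical threshold.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1 - edit distance normalized by the longer label (an assumption)."""
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0


def select_candidates(lightweight: List[str], heavyweight: List[str],
                      threshold: float) -> List[str]:
    """Keep heavyweight concepts whose best match in O_l reaches the threshold.

    The nested loop makes the |O1| x |O2| comparison cost explicit.
    """
    candidates = []
    for c_i in heavyweight:
        best = max((similarity(c_i, c_l) for c_l in lightweight), default=0.0)
        if best >= threshold:
            candidates.append(c_i)
    return candidates
```

With a lightweight ontology containing only "site" and a threshold of 0.7, the heavyweight concept "cite" (edit distance 1, similarity 0.75) is kept, while "website" and "location" are dropped.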


Step 2: Construct Sub-ontology

Influence similarity

[Formula residue: terms |V| and |E|]


Step 3: Finding Matching Results

[Figure: Onto1 and Onto2 with the intermediate graph built from them, as in the Similarity Propagation slides.]


Outline

• Dynamic Multi-strategy Ontology Matching

• Unbalanced Ontology Matching

– Experimental Results

• Discussion


Data Set

• OAEI 2007

– GEMET: (5,280) The European Environment Agency GEMET ontology.

– AGROVOC: (28,439) AGROVOC thesaurus provided by the Food and Agriculture Organization of the United Nations.

– NAL: (42,326) The Agricultural Thesaurus released by the National Agricultural Library.

• Evaluation Measures

– Precision

– Recall

– F1-Measure

– CPU Time
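The first three measures compare the alignment a system produces with the reference alignment. A minimal sketch using the standard definitions, where alignments are modeled as sets of (entity in O1, entity in O2) pairs and the example pairs are made up for illustration:

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (entity in O1, entity in O2)


def evaluate(found: Set[Pair], reference: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of a found alignment."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical alignments: 4 pairs found, 2 of them in the reference.
found = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
reference = {("a", "x"), ("b", "y")}
precision, recall, f1 = evaluate(found, reference)
```

Here precision is 0.5, recall is 1.0, and F1 is 2/3. CPU time could be measured separately, for instance with `time.process_time()` from the Python standard library around the matching call.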


Data Statistics


Precision


Recall


F1-Measure


CPU Time


Outline

• Dynamic Multi-strategy Ontology Matching

• Unbalanced Ontology Matching

• Discussion


Discussion

• Large-scale ontology matching

– Both ontologies are very large

• Group ontology matching

– A large number of sub ontologies

• Social ontology integration

– Folksonomies

• Active learning for ontology matching

– User interactions

• Beyond one-one alignment

• Beyond alignment


Related Publications

• Jie Tang, Juanzi Li, Bangyong Liang, Xiaotong Huang, Yi Li, and Kehong Wang. Using Bayesian Decision for Ontology Mapping. Journal of Web Semantics, 4(4):243-262, December 2006. (Top 10 cited papers in JWS's history)

• Juanzi Li, Jie Tang, Yi Li, and Qiong Luo. RiMOM: A Dynamic Multi-Strategy Ontology Alignment Framework. IEEE Transactions on Knowledge and Data Engineering (TKDE), 21(8):1218-1232, August 2009. (One of the top cited papers among TKDE 2009's 100+ papers)

• Qian Zhong, Hanyu Li, Juanzi Li, Guotong Xie, Jie Tang, and Lizhu Zhou. A Gauss Function Based Approach for Unbalanced Ontology Matching. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD 2009), pp. 669-680.

• Feng Shi, Juanzi Li, and Jie Tang. Actively Learning Ontology Matching via User Interaction. In Proceedings of the 8th International Semantic Web Conference (ISWC 2009), pp. 585-600.


Thanks!

Q&A

HP: http://keg.cs.tsinghua.edu.cn/persons/tj/