leveraging data and structure in ontology integration · pdf fileleveraging data and structure...

52
Leveraging Data and Structure in Ontology Integration O. Udrea L. Getoor R.J. Miller Group 15 Enrico Savioli Andrea Reale Andrea Sorbini DEIS University of Bologna Searching Information in Large Spaces Conference March 11, 2009 Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 1 / 52

Upload: lamnguyet

Post on 06-Feb-2018

226 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Leveraging Data and Structure inOntology Integration

O. Udrea L. Getoor R.J. Miller

Group 15Enrico Savioli Andrea Reale Andrea Sorbini

DEISUniversity of Bologna

Searching Information in Large Spaces ConferenceMarch 11, 2009

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 1 / 52

Page 2: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Outline

1 Brief Introduction to OntologiesWhat is an Ontology

2 Motivations and State of the ArtBringing data togetherExisting Solutions

3 ILIADSOverviewThe AlgorithmExperimental ResultsConclusions and Future Developments

4 Demo

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 2 / 52

Page 3: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Brief Introduction to Ontologies What is an Ontology

Ontologiesa quest for meaning

A very common problem in IT is data modelingI Critical in the field of Information SystemsI A good data model enables good data usage

Tools used to describe concepts cannot express the implicitinterpretation of dataSemantic knowledge is lost after the modeling process

Example<musician name="F. Mercury" sings-for="Queen" /><musician name="B. May" plays-for="Queen" />

How can we find all of Queen’s members?

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 3 / 52

Page 4: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Brief Introduction to Ontologies What is an Ontology

Ontologiesa formal model

OntologyAn ontology is a formal representation of

concepts within a domain (Song, Performer, Composer...)

relations between those concepts (plays-for, member-of, written-by...)

individual instances (queen, f.mercury, b.may...)

Well-defined constructs to enrich descriptions with semanticsFormal models to enable automatic reasoning (DL, FOL)(member(Y,Z) :- sings-for(Y,Z) ; plays-for(Y,Z))

OWL is the W3C standard for an ontology language

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 4 / 52

Page 5: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Brief Introduction to Ontologies What is an Ontology

OWLA simple example

Whole lotta love

subClassOf subClassOfsubClassOf

Musician Writer

Iron MaidenVienna Philharmonic S. Harris

subClassOfsubClassOf

type typetypetype

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

typetype

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-ofplays

Performer

performs

composer-of

Smoke onthe water

type

Music

type

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 5 / 52

Page 6: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Motivations and State of the Art Bringing data together

Motivating Scenario

The AAA principleAnyone can say Anything about Any topic

The same thing might be described in different waysThere is a strong need to integrate heterogeneous schemas

I Not only in a web scenario

ExampleDB schemasXML schemasOntology schemas

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 6 / 52

Page 7: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Motivations and State of the Art Bringing data together

An Integration Problem

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

The Phantom of the Opera

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

composed-by

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 7 / 52

Page 8: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Motivations and State of the Art Bringing data together

Integration Techniques

Structure Based MatchingUses schema meta-data (e.g.informations about tables orconcepts models) to discovermapping elements amongthem, both on a structural andelement level

Instance Based MatchingUses informations about datainstances (e.g. contents of atable or individuals of anontology) to discover mappingsamong entities representingthem

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 8 / 52

Page 9: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Motivations and State of the Art Bringing data together

Ontology Integration

Ontologies offer further ways to uncover aligning relationsbetween entitiesExplicit theoretic model semantics can be leveraged to improveintegration quality

ExampleAssume that composed-by is a functional property“Requiem K626” and “Requiem in D minor” are the sameIf we have:

I composed-by(‘Requiem K626’, ‘Mozart’).I composed-by(‘Requiem in D minor’, ‘W.A. Mozart’).

Then we might say that aligning “W.A. Mozart” and “Mozart” is agood choice

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 9 / 52

Page 10: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Motivations and State of the Art Existing Solutions

FCA-Merge and COMA++

FCA-Merge (human-aided tool to merge ontologies)Collects domain related natural language documentsSearches those documents for ontologies’ concepts occurrencesDerives an alignment that has to be validated by an operator

COMA++ (framework with multiple match strategies)Fragment based matchingReuse of previous matching resultsComprehensive GUI for results evaluation

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 10 / 52

Page 11: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Motivations and State of the Art Existing Solutions

What’s missing

Both COMA++ and FCA-Merge use only structure level matchingMoreover none of them makes use of the semantic potentialoffered by ontologiesThere exist other solutions using reasoning support, however ...

I it is used only for an a-posteriori consistency check of the result

An interesting improvement might be to leverage it for the actualmatching processThat’s where ILIADS kicks in!

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 11 / 52

Page 12: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS Overview

ILIADSIntegrated Learning In Alignment of Data and Schema

Takes two OWL Lite ontologies as inputCombines "traditional" schema matching approaches with alogical inference algorithm

I Inference results are used to influence confidence in a presumedmapping

Makes use of both schema (structure) and data (individuals)Outputs a set of axioms (the alignment) that tights the inputontologies together

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 12 / 52

Page 13: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

Algorithm Overview

INPUT: Consistent Ontologies O1 and O2OUTPUT: Alignment A∗

01: Initialize algorithm’s structures (O is O1 ∪O2)02: repeat:03: Compute similarity scores between clusters04: Heuristically select a type of clusters05: for each couple (c, c′) of that type do06: Determine a candidate relationship a(c,c′)

07: Perform incremental inference08: Update similarity score09: Select the best couple, update O and A∗

10: until there are clusters with similarity > λt11: return A∗

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 13 / 52

Page 14: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmStep by step

01: Initialize algorithm’s structures (O is O1 ∪O2)02: repeat:03: Compute similarity scores between clusters04: Heuristically select a type of clusters05: for each couple (c, c′) of that type do06: Determine a candidate relationship a(c,c′)

07: Perform incremental inference08: Update similarity score09: Select the best couple, update O and A∗

10: until there are clusters with similarity > λt11: return A∗

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 14 / 52

Page 15: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmStructures initialization

The algorithms groups equivalent entities in clustersClusters are classified by the type of their entities

I Clusters of ClassesI Clusters of PropertiesI Clusters of Individuals

A new alignment can result inI Merging of clustersI A new subsumption relationship between clusters

At the beginning a cluster is created for each entity

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 15 / 52

Page 16: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmStep by step

01: Initialize algorithm’s structures (O is O1 ∪O2)02: repeat:03: Compute similarity scores between clusters04: Heuristically select a type of clusters05: for each couple (c, c′) of that type do06: Determine a candidate relationship a(c,c′)

07: Perform incremental inference08: Update similarity score09: Select the best couple, update O and A∗

10: until there are clusters with similarity > λt11: return A∗

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 16 / 52

Page 17: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmSimilarity Computation

Entities similarity scoresim(e,e′) = λx · simlex(e,e′) + λs · simstruct(e,e′) + λe · simext(e,e′)

lexical: Jaro-Winkler (similar to edit distance) and thesauristructural: Jaccard for neighborhoods (Jacd (S1, S2) = |S1∩S2|

|S1∪S2|)

extensional: Jaccard on extensions

The set of parameters {λx , λs, λe} is different for each entity type

Similarity between clusters is computed combining the similaritiesof their entities

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 17 / 52

Page 18: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmStep by step

01: Initialize algorithm’s structures (O is O1 ∪O2)02: repeat:03: Compute similarity scores between clusters04: Heuristically select a type of clusters05: for each couple (c, c′) of that type do06: Determine a candidate relationship a(c,c′)

07: Perform incremental inference08: Update similarity score09: Select the best couple, update O and A∗

10: until there are clusters with similarity > λt11: return A∗

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 18 / 52

Page 19: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmStep by step

01: Initialize algorithm’s structures (O is O1 ∪O2)02: repeat:03: Compute similarity scores between clusters04: Heuristically select a type of clusters05: for each couple (c, c′) of that type do06: Determine a candidate relationship a(c,c′)

07: Perform incremental inference08: Update similarity score09: Select the best couple, update O and A∗

10: until there are clusters with similarity > λt11: return A∗

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 19 / 52

Page 20: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmSelecting a relationship

Iterates over each couple (c, c′) such that sim(c, c′) is above athreshold λt

For each of those a candidate relationship is chosen between:I Equivalence

(Two concepts are said to be equivalent if they denote the same concept)I Subsumption

(A concept subsumes another concept if it always denotes a superset of the second)

The selection is done by looking at the intersection of entities’extensions:

I The set of its instance individuals, for a classI The couples of individuals involved, for a property

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 20 / 52

Page 21: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmSelecting a relationship - Example (1)

MusicArtist

The Black Sabbath

W.A. Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

composer-of

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 21 / 52

Page 22: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmSelecting a relationship - Example (2)

MusicArtist

The Black Sabbath

W.A. Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

composer-of

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

sameAs

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 22 / 52

Page 23: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmSelecting a relationship - Example (3)

MusicArtist

The Black Sabbath

W.A. Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

composer-of

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

sameAs

Number of instances belonging only to "Rock Music" over the

total number of instances considered:

2/3

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 23 / 52

Page 24: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmSelecting a relationship - Example (4)

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 24 / 52

Page 25: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmSelecting a relationship - Example (5)

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 25 / 52

Page 26: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmStep by step

01: Initialize algorithm’s structures (O is O1 ∪O2)02: repeat:03: Compute similarity scores between clusters04: Heuristically select a type of clusters05: for each couple (c, c′) of that type do06: Determine a candidate relationship a(c,c′)

07: Perform incremental inference08: Update similarity score09: Select the best couple, update O and A∗

10: until there are clusters with similarity > λt11: return A∗

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 26 / 52

Page 27: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmIncremental Logical Inference

The inference step is used to:I Look for inconsistencies of the candidate relationshipI Infer logical consequences of the new axiomI Possibly enforce the confidence in it

The inference is not complete (it would be EXPTIME in OWL Lite)I Only a small number of steps is actually performedI However this may cause inconsistencies not to be found

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 27 / 52

Page 28: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmIncremental Logical Inference - Example (1)

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

composed-by

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 28 / 52

Page 29: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmIncremental Logical Inference - Example (2)

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

sameAscomposed-by

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 29 / 52

Page 30: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmIncremental Logical Inference - Example (3)

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

sameAscomposed-by

composed-byinverseOf

composer-of

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 30 / 52

Page 31: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmIncremental Logical Inference - Example (4)

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

sameAscomposed-by

composed-by

composer-of

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 31 / 52

Page 32: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmIncremental Logical Inference - Example (5)

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

sameAscomposed-by

composed-by

composer-of

composed-byis a

FunctionalProperty

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 32 / 52

Page 33: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmIncremental Logical Inference - Example (6)

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

sameAscomposed-by

composed-by

composer-of

composed-byis a

FunctionalProperty

sameAs

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 33 / 52

Page 34: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmStep by step

01: Initialize algorithm’s structures (O is O1 ∪O2)02: repeat:03: Compute similarity scores between clusters04: Heuristically select a type of clusters05: for each couple (c, c′) of that type do06: Determine a candidate relationship a(c,c′)

07: Perform incremental inference08: Update similarity score09: Select the best couple, update O and A∗

10: until there are clusters with similarity > λt11: return A∗

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 34 / 52

Page 35: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmUpdate similarity score

This is the key point of the algorithmI A value f - the influence factor of the inference - is computed

The “f ” factor

f =∏

(e1,e2)∈Q

sim(e1,e2)

1− sim(e1,e2)

Q: the set of entity pairs that became equivalent as a consequence of inference

f is used to update the similarity score for the current coupleI siminf (c, c′) = min(f · sim(c, c′),1)

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 35 / 52

Page 36: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmUpdate similarity score - Example (1)

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

composed-by

composed-by

composer-of

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 36 / 52

Page 37: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmUpdate similarity score - Example (2)

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

composed-by

composed-by

composer-of

Similarity :

0.5

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 37 / 52

Page 38: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmUpdate similarity score - Example (3)

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

composed-by

composed-by

composer-of

Similarity :

0.5

Similarity :

0.6

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 38 / 52

Page 39: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmUpdate similarity score - Example (4)

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

composed-by

composed-by

composer-of

Similarity :

0.5

Similarity :

0.6

f = 0.6 / (1 - 0.6) = 1.5

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 39 / 52

Page 40: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmUpdate similarity score - Example (5)

MusicArtist

The Black Sabbath

Mozart

Music Piece

Requiem K626

Smoke onthe water

subClassOf

type

type

Classic Rock

type

subClassOfMetal

subClassOf

Fusion

Quadrant Four

Whole lotta love

Paranoid

Orchestral

subClassOf

type

plays

type

type

type

type

subClassOf subClassOf

subClassOf

MusicianWriter

Iron Maiden

Vienna Philharmonic

S. Harris

subClassOf

subClassOf

type

type

type

type

Classical Music

Ride of the Valkyries

Requiem inD minor

Take the "A" Train

The Phantom of the Opera

type

type

type

Jazz Music

type

Rock Music

W.A. Mozart

composer-of

plays

Performer

performs

composer-of

Smoke onthe water

typeMusic

Iron Maiden

type

Billy Cobham

typecomposer-of

plays

plays

The Phantom of the Opera

composed-by

composed-by

composer-of

Similarity :

0.75

f = 0.6 / (1 - 0.6) = 1.5

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 40 / 52

Page 41: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmStep by step

01: Initialize algorithm’s structures (O is O1 ∪O2)02: repeat:03: Compute similarity scores between clusters04: Heuristically select a type of clusters05: for each couple (c, c′) of that type do06: Determine a candidate relationship a(c,c′)

07: Perform incremental inference08: Update similarity score09: Select the best couple, update O and A∗

10: until there are clusters with similarity > λt11: return A∗

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 41 / 52

Page 42: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmBuilding the alignment

By now for each candidate pair of clusters of a given typeI A possible relationship has been exploredI A set of consequences has been inferredI An “inference-weighted“ similarity has been computed

Before restarting the loop:I The axiom a∗(c,c′) with the highest similarity score is chosenI It is added to the output alignment A∗I If a∗(c,c′) is an equivalence c and c′ are merged

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 42 / 52

Page 43: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmStep by step

01: Initialize algorithm’s structures (O is O1 ∪O2)02: repeat:03: Compute similarity scores between clusters04: Heuristically select a type of clusters05: for each couple (c, c′) of that type do06: Determine a candidate relationship a(c,c′)

07: Perform incremental inference08: Update similarity score09: Select the best couple, update O and A∗

10: until there are clusters with similarity > λt11: return A∗

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 43 / 52

Page 44: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS The Algorithm

The AlgorithmThe End(ing)

The algorithm halts when there are no more candidate clustersI When there are no clusters having similarity greater than the

threshold λtI Remaining clusters are not likely to share any relationship

Intuitively ILIADS is guaranteed to terminate becauseI Previously used cluster pairs are not re-used unless their score

changesI The merging process decreases the number of clusters

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 44 / 52

Page 45: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS Experimental Results

Comparative resultsPrecision, Recall and F-1

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 45 / 52

Page 46: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS Experimental Results

Tests Analysis

The interplay between structure and instance integration delivershigher quality

I Significant improvement of recallI Tests on ontologies without instance data showed comparable

results with the other systems

Lambda-tuning allows ILIADS to adapt itself better to particularpairs of ontologies

Inconsistent alignments due to limited inference steps were foundonly in the .5% of tests

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 46 / 52

Page 47: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

ILIADS Conclusions and Future Developments

Summary

Explicit semantics provided by ontologies can improve thequality of data integrationInstance data exploitation could significantly enhance traditionalmatching techniquesILIADS leverages both these opportunities and shows promisingresults

Future developmentsIntra-ontology differentiated λ parametersAlignment of ”distant” sections of the ontologies in parallel

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 47 / 52

Page 48: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Demo

Demo

And now an ultra-fancy demo

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 48 / 52

Page 49: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Demo

Thank you.

Group 15

A. Sorbini E. Savioli A. Reale

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 49 / 52

Page 50: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Appendix For Further Reading

For Further Reading I

W3C OWL resources.http://www.w3.org/2004/OWL/.

D. Aumueller, H.H. Do, S. Massmann, and E. Rahm.Schema and ontology matching with COMA++.In Proceedings of the 2005 ACM SIGMOD internationalconference on Management of data, pages 906–908. ACM NewYork, NY, USA, 2005.

I. Horrocks, P.F. Patel-Schneider, and F. van Harmelen.From SHIQ and RDF to OWL: the making of a Web OntologyLanguage.Web Semantics: Science, Services and Agents on the World WideWeb, 1(1):7–26, 2003.

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 50 / 52

Page 51: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Appendix For Further Reading

For Further Reading II

Y. Kalfoglou and M. Schorlemmer.Ontology mapping: the state of the art.The knowledge engineering review, 18(01):1–31, 2003.

E. Rahm and P.A. Bernstein.A survey of approaches to automatic schema matching.The VLDB Journal The International Journal on Very Large DataBases, 10(4):334–350, 2001.

P. Shvaiko and J. Euzenat.A survey of schema-based matching approaches.Lecture Notes in Computer Science, 3730:146–171, 2005.

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 51 / 52

Page 52: Leveraging Data and Structure in Ontology Integration · PDF fileLeveraging Data and Structure in Ontology Integration ... Iron Maiden Vienna Philharmonic ... Compute similarity scores

Appendix For Further Reading

For Further Reading III

G. Stumme and A. Maedche.FCA-MERGE: Bottom-Up Merging of Ontologies.In INTERNATIONAL JOINT CONFERENCE ON ARTIFICIALINTELLIGENCE, volume 17, pages 225–234. LAWRENCEERLBAUM ASSOCIATES LTD, 2001.

O. Udrea, L. Getoor, and R.J. Miller.HOMER: Ontology alignment visualization and analysis.

F. Baader.The description logic handbook: theory, implementation, andapplications.Cambridge University Press, 2003.

Savioli, Reale, Sorbini (DEIS) UGM07 SI-LS 2009 52 / 52