Background Paris Performance Joins Theory Literals Application to IE Conclusion

MPRI Internship Defense

Advances in Holistic Ontology Alignment

Antoine Amarilli

Supervised by Pierre Senellart

Télécom ParisTech

The Semantic Web

Facts on the Web:

<p><b>Paris</b> is the <a href="Capital_city"> capital</a> of <a href="France">France</a></p>

Facts on the semantic Web:

Paris --capitalOf--> France

The Web. Lots of information in semi-structured HTML documents.

The semantic Web. An effort to represent information in a structured and semantic way.

Uses. Interoperability, integration of sources, constraints, complex queries, inference.

Ontologies

[Figure: example graph around dbp:Paris, with a dbp:capital edge between dbp:Paris and dbp:France, a foaf:homepage edge to http://www.paris.fr/, and a foaf:name edge to the literal 'Paris'.]

Ontologies are the information sources of the Semantic Web.

Vertices are entities or literals.

Edges are facts labeled with a relation.

Sources: manual creation, existing databases, information extraction.
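The graph view above can be made concrete with a small data structure. This is only an illustrative sketch: the `Literal` and `Fact` types and the helper names are assumptions for this example, not part of Paris.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A literal vertex (string, date, ...), as opposed to an entity."""
    value: str


# An entity is identified by its (prefixed) URI, kept as a plain string.
Node = Union[str, Literal]


class Fact(NamedTuple):
    """A labeled edge: relation(subject, obj)."""
    subject: str
    relation: str
    obj: Node


# Two facts taken from the example graph on this slide.
facts = [
    Fact("dbp:Paris", "foaf:name", Literal("Paris")),
    Fact("dbp:Paris", "foaf:homepage", "http://www.paris.fr/"),
]


def outgoing(facts, entity):
    """Return the (relation, object) pairs of all edges leaving an entity."""
    return {(f.relation, f.obj) for f in facts if f.subject == entity}


def literals(facts):
    """Return every literal vertex appearing in the ontology."""
    return {f.obj for f in facts if isinstance(f.obj, Literal)}
```

Keeping literals in a separate type matters later: alignment is bootstrapped from literals, so the code needs to tell them apart from entities.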

Linked Data Cloud

[Figure: Linking Open Data cloud diagram, showing interlinked datasets such as DBpedia, UniProt, DrugBank, PubMed, DBLP, Gene Ontology and Project Gutenberg.]

Many ontologies are created independently: different entities and relations express the same things.

Linked Data: integrate existing ontologies in a network structured by equality links between equivalent concepts.

To automatically derive those links, we need to perform ontology alignment.

Ontology Alignment

Sometimes URIs do not help us and literals are ambiguous or have minor differences...

dbp:Charles_Brackett --foaf:name--> 'Charles William Brackett'
dbp:Titanic_(1953_film) --dbp:producer--> dbp:Charles_Brackett
dbp:Titanic_(1953_film) --foaf:name--> 'Titanic'

imdb:p138992 --imdb:label--> 'Charles Brackett'
imdb:p138992 --imdb:producerOf--> imdb:tt0046435
imdb:tt0046435 --imdb:label--> 'Titanic'

Sometimes the structures of the two ontologies do not match...

dbp:Douglas_Adams --dbp:birthDate--> '1952-03-11'

bnb:AdamsDouglas1952-2001 --bio:event--> bnb:AdamsDouglas1952-2001/birth
bnb:AdamsDouglas1952-2001/birth --rdf:type--> bio:Birth
bnb:AdamsDouglas1952-2001/birth --bio:date--> '1952'

Table of Contents

1 Background: the Semantic Web

2 The Paris System

3 Performance Improvements

4 Join Relations

5 Theoretical Analysis

6 Approximate Literal Matching

7 Application to Information Extraction

8 Conclusion

Paris

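Paris: Probabilistic Alignment of Relations, Instances, and Schema.

To bootstrap a matching, Paris uses an equality function on literals and applies propagation rules.

[Figure: two diagrams illustrating the propagation rules. Each shows a fact r(x, y) of one ontology and a fact r'(x', y') of the other, linked by x = x', r ⊆ r' and y = y'.]

The rules are represented as a system of equations which we iterate until a fixpoint is reached:

$$\Pr_{n+1}(x \equiv x') = 1 - \prod_{\substack{r(x,y) \\ r'(x',y')}} \Bigl(1 - \Pr_n(r' \subseteq r) \times \mathrm{fun}^{-1}(r) \times \Pr_n(y \equiv y')\Bigr) \times \Bigl(1 - \Pr_n(r \subseteq r') \times \mathrm{fun}^{-1}(r') \times \Pr_n(y \equiv y')\Bigr)$$

$$\Pr_{n+1}(r \subseteq r') = \frac{\sum_{r(x,y)} \Bigl(1 - \prod_{r'(x',y')} \bigl(1 - \Pr_n(x \equiv x') \times \Pr_n(y \equiv y')\bigr)\Bigr)}{\sum_{r(x,y)} \Bigl(1 - \prod_{x',y'} \bigl(1 - \Pr_n(x \equiv x') \times \Pr_n(y \equiv y')\bigr)\Bigr)}$$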

Paris by Example

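b:Elvis --b:name--> 'Elvis Presley'
b:Elvis --b:birthdate--> '1935-01-08'
b:Elvis --b:spouse--> b:Priscilla
b:Priscilla --b:name--> 'Priscilla Presley'

a:Elvis --a:name--> 'Elvis Presley'
a:Elvis --a:birthdate--> '1935-01-08'
a:Elvis --a:spouse--> a:Priscilla
a:Priscilla --a:name--> 'Priscilla Presley'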
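To see the propagation at work on this example, here is a minimal sketch of the entity alignment equation applied to the two toy ontologies. Several things are assumptions made for illustration only: the constant initial value `THETA` for every relation alignment, the way inverse functionality is computed from the data, and all function names. It is not the Paris implementation.

```python
from itertools import product

# The two toy ontologies of the example, as (subject, relation, object) triples.
A = [
    ("a:Elvis", "a:name", "Elvis Presley"),
    ("a:Elvis", "a:birthdate", "1935-01-08"),
    ("a:Elvis", "a:spouse", "a:Priscilla"),
    ("a:Priscilla", "a:name", "Priscilla Presley"),
]
B = [
    ("b:Elvis", "b:name", "Elvis Presley"),
    ("b:Elvis", "b:birthdate", "1935-01-08"),
    ("b:Elvis", "b:spouse", "b:Priscilla"),
    ("b:Priscilla", "b:name", "Priscilla Presley"),
]

THETA = 0.1  # assumed initial Pr(r ⊆ r') for every pair of relations


def inverse_functionality(facts, relation):
    """Assumed estimate of fun^-1(r): distinct objects / distinct facts."""
    pairs = {(s, o) for s, r, o in facts if r == relation}
    return len({o for _, o in pairs}) / len(pairs)


def make_eq(entity_probs):
    """Build Pr_n(y ≡ y'): exact equality for literals, else the last scores."""
    def eq(y, y2):
        if y == y2:
            return 1.0
        return entity_probs.get((y, y2), 0.0)
    return eq


def entity_step(onto_a, onto_b, eq, sub):
    """Apply the entity alignment equation once to every pair of subjects."""
    subjects_a = {s for s, _, _ in onto_a}
    subjects_b = {s for s, _, _ in onto_b}
    probs = {}
    for x, x2 in product(subjects_a, subjects_b):
        keep = 1.0  # running product of the (1 - ...) factors
        for (s, r, y), (s2, r2, y2) in product(onto_a, onto_b):
            if s != x or s2 != x2:
                continue
            p = eq(y, y2)
            keep *= 1 - sub(r2, r) * inverse_functionality(onto_a, r) * p
            keep *= 1 - sub(r, r2) * inverse_functionality(onto_b, r2) * p
        probs[(x, x2)] = 1 - keep
    return probs


def constant_sub(r, r2):
    """Relation alignment frozen at THETA (the relation update is omitted)."""
    return THETA


step1 = entity_step(A, B, make_eq({}), constant_sub)
step2 = entity_step(A, B, make_eq(step1), constant_sub)
```

In the first step only the shared literals contribute. In the second step the score of the two Priscillas flows back through the spouse facts and raises the score of the two Elvises. The relation alignment update is left out here to keep the sketch short.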

Relation Functionalities

A --name--> 'Eiffel tower'
A --position--> '48.8583°N 2.2945°E'

B --name--> 'Tour Eiffel'
B --position--> '48.8583°N 2.2945°E'

Two instances should be aligned when they share the same values for aligned functional relations.

In theory, the ontology schema should indicate which relations are functional.

In practice, no schema, and no “strict” functionality: compute a fuzzy functionality in [0, 1] from the data.
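One way to estimate such a fuzzy functionality from the data is sketched below. The slide does not give the formula, so the definition used here (number of distinct subjects divided by number of distinct facts) is an assumption, and the sample facts are made up.

```python
def functionality(facts, relation):
    """Fuzzy functionality in [0, 1]: distinct subjects / distinct facts.

    Equals 1.0 when every subject has exactly one object for this relation.
    """
    pairs = {(s, o) for s, r, o in facts if r == relation}
    if not pairs:
        raise ValueError(f"no fact for relation {relation!r}")
    return len({s for s, _ in pairs}) / len(pairs)


def inverse_functionality(facts, relation):
    """Functionality of the inverse relation: distinct objects / distinct facts."""
    pairs = {(s, o) for s, r, o in facts if r == relation}
    if not pairs:
        raise ValueError(f"no fact for relation {relation!r}")
    return len({o for _, o in pairs}) / len(pairs)


# Hypothetical data: entity A has two names, so "name" is not strictly functional.
sample = [
    ("A", "position", "48.8583°N 2.2945°E"),
    ("A", "name", "Eiffel tower"),
    ("A", "name", "Tour Eiffel"),
    ("C", "name", "Louvre"),
]
```

Here `functionality(sample, "position")` is 1.0, while `functionality(sample, "name")` is 2/3 because one subject has two names.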

Existing Implementation and Previous Results

Paris is implemented in Java.

Paris was evaluated on:

- toy datasets from the OAEI,
- DBpedia and Yago (two ontologies extracted from Wikipedia),
- Yago and IMDb.

The evaluation is done in terms of precision, recall and F-measure.

                  Instances           Classes             Relations
                  Prec   Rec    F     Prec   Rec    F     Prec   Rec    F
OAEI person       100%   100%   100%  100%   100%   100%  100%   100%   100%
OAEI restaurant   95%    88%    91%   100%   100%   100%  100%   66%    88%
DBpedia–Yago      90%    73%    81%   94%    -      -     93%    -      -
IMDb–Yago         94%    90%    92%   28%    -      -     100%   80%    89%
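For reference, the F-measure column combines the other two as the harmonic mean of precision and recall. The helper below is a small sketch of that standard formula; the function name is an assumption.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, the OAEI restaurant instance scores (95% precision, 88% recall) give `f_measure(0.95, 0.88)`, which rounds to 0.91.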

Performance Improvements

The original Paris takes a few hours per iteration.

Ways to improve this:

- Replace BerkeleyDB by an in-memory representation of the ontologies.
- Parallelize the propagation of entity alignment scores over all entities. Aggregate results at the end to avoid races.
- Change the hardware (now that the computation is CPU-bound).
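The "aggregate at the end" idea can be sketched as follows. Paris itself is written in Java; this Python version only shows the structure, and `score_chunk` is a placeholder for the real per-entity computation. Note that Python threads would not actually speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(entities):
    """Placeholder per-entity computation; fills a dict private to this chunk."""
    return {entity: len(entity) / 10 for entity in entities}


def parallel_scores(entities, workers=4):
    """Score entities in parallel, merging the partial results only at the end."""
    # Split the entities into one interleaved chunk per worker.
    chunks = [entities[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(score_chunk, chunks))
    # Aggregation happens in the calling thread, so no shared state is
    # written concurrently and no lock is needed.
    merged = {}
    for part in partials:
        merged.update(part)
    return merged
```

Each worker writes only to its own dictionary, which is what avoids races without locking.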

Performance Improvement Results

Iteration   Original PARIS   New PARIS (1 thread)   New PARIS (4 threads)
Startup     0h00             0h27                   0h10
1           4h04             0h40                   0h27
2           5h06             3h00                   1h02
3           5h00             0h34                   0h24
4           5h30             0h29                   0h16
Total       20h              5h                     2h

Table: Running times for the DBpedia–Yago alignment task. The original Paris was run on an Intel Xeon E5620 CPU clocked at 2.40 GHz on a machine with 12 GB of RAM. The new Paris was run on an Intel Core i7-3820 CPU clocked at 3.60 GHz with 48 GB of RAM.

Join Relations

a:Douglas_Adams --a:countryOfBirth--> a:UK

b:Douglas_Adams --b:birthPlace--> b:Cambridge --b:country--> b:UK
Join relation (b:birthPlace, b:country): b:Douglas_Adams --> b:UK

The simplest possible difference in structure between ontologies: relations of one ontology correspond to join relations in the other ontology.

The terminology is motivated by the “join” operator of relational algebra.

We see the join as a binary predicate: the intermediate nodes are existentially quantified but projected away.
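A join relation seen as a binary predicate can be sketched in a few lines. The function name is an assumption, and the `b:X` / `b:livedIn` facts are hypothetical data added to show what happens with several intermediate nodes.

```python
def join(facts, r1, r2):
    """Pairs (x, z) such that r1(x, y) and r2(y, z) hold for some y.

    The intermediate node y is projected away. Returning a set means a pair
    reachable through several intermediate nodes is reported only once.
    """
    targets = {}
    for s, r, o in facts:
        if r == r2:
            targets.setdefault(s, set()).add(o)
    return {(x, z) for x, r, y in facts if r == r1 for z in targets.get(y, ())}


facts = [
    ("b:Douglas_Adams", "b:birthPlace", "b:Cambridge"),
    ("b:Cambridge", "b:country", "b:UK"),
    # Hypothetical facts: two intermediate nodes leading to the same country.
    ("b:X", "b:livedIn", "b:Cambridge"),
    ("b:X", "b:livedIn", "b:London"),
    ("b:London", "b:country", "b:UK"),
]
```

`join(facts, "b:birthPlace", "b:country")` contains the single pair (b:Douglas_Adams, b:UK), which is what a:countryOfBirth would be aligned with. This version materializes the join, which is fine for a toy example but is exactly what a memory-constrained implementation would want to avoid.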

Support in Paris

We must keep the representation of joins implicit in Paris (memory constraints).

We must recursively enumerate all possible join facts instead of enumerating all possible facts.

We must avoid duplicate facts caused by multiple possible choices for the intermediate nodes.

We cannot afford to enumerate all possible relations anymore (many possible joins).

⇒ New algorithm to compute the entity and relation alignments simultaneously.

Practical Issues

How to determine the functionality of join relations?

How to select interesting joins to perform without exploring all joins?

How to achieve acceptable running time on large ontologies?

⇒ We only perform the join alignment on small ontologies.

Log-transformation and Product Graph

$$\Pr_{n+1}(x \equiv x') = 1 - \prod_{\substack{r(x,y) \\ r'(x',y')}} \Bigl(1 - \Pr_n(r' \subseteq r) \times \mathrm{fun}^{-1}(r) \times \Pr_n(y \equiv y')\Bigr) \times \Bigl(1 - \Pr_n(r \subseteq r') \times \mathrm{fun}^{-1}(r') \times \Pr_n(y \equiv y')\Bigr)$$

The entity alignment equation is justified by a probabilistic model (independent choices).

If the relation functionalities and alignments are in {0, 1}, we can apply a log-transformation:

$$\mathrm{LPr}_n(x \equiv x') := -\log\bigl(1 - \Pr_n(x \equiv x')\bigr)$$

By looking at propagation in the product graph, we get a nicer equation, for some matrix $M$ and a constant literal alignment vector $L$:

$$\mathrm{LPr}_{n+1} = M \, \mathrm{LPr}_n + L$$
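The step from the product to the linear equation can be spelled out as below. This is a sketch of one possible reading, assuming the relation alignments and functionalities are fixed values in {0, 1}; the shorthand $a_{r,r'}$, $b_{r,r'}$ and $p$ is introduced here and is not from the slides.

```latex
% Shorthand (assumed): a_{r,r'} = \Pr(r' \subseteq r)\,\mathrm{fun}^{-1}(r) and
% b_{r,r'} = \Pr(r \subseteq r')\,\mathrm{fun}^{-1}(r'), both in \{0,1\},
% and p = \Pr_n(y \equiv y').
\begin{align*}
1 - \Pr_{n+1}(x \equiv x')
  &= \prod_{\substack{r(x,y) \\ r'(x',y')}} (1 - a_{r,r'}\, p)\,(1 - b_{r,r'}\, p) \\
-\log(1 - a\, p)
  &= a \cdot \bigl(-\log(1 - p)\bigr) \qquad \text{for } a \in \{0, 1\} \\
\mathrm{LPr}_{n+1}(x \equiv x')
  &= \sum_{\substack{r(x,y) \\ r'(x',y')}} (a_{r,r'} + b_{r,r'})\, \mathrm{LPr}_n(y \equiv y')
\end{align*}
```

Under this reading, the terms where $(y, y')$ is a pair of entities give the entries of $M$ (indexed by pairs of entities, that is, by vertices of the product graph). The terms where $y$ and $y'$ are literals do not change across iterations and are collected in $L$.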

Green Measures

$$\mathrm{LPr}_{n+1} = M \, \mathrm{LPr}_n + L$$

This equation is similar to PageRank ($\mathrm{LPr}_{n+1} = M \, \mathrm{LPr}_n$ where $M$ is a stochastic matrix) except:

1 The matrix is not stochastic.
2 Diverging to +∞ means convergence of the probability to 1 (because of the log-transformation).
3 L is pouring alignment weight to the aligned couples of literals.

This last point can be linked to the use of Green measures to focus the PageRank computation.

This interpretation suggests possible changes to the entity alignment equation (but we lose the probabilistic interpretation).
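The behaviour of the linear iteration can be observed on a tiny made-up instance. The matrix and vector below are hypothetical (two entity pairs that support each other, one of them fed by an aligned literal). The code is a sketch of the recurrence, not of Paris.

```python
import math


def iterate(M, L, steps):
    """Apply LPr <- M · LPr + L for the given number of steps, starting from 0."""
    size = len(L)
    lpr = [0.0] * size
    for _ in range(steps):
        lpr = [sum(M[i][j] * lpr[j] for j in range(size)) + L[i]
               for i in range(size)]
    return lpr


def to_probability(lpr):
    """Invert the log-transformation: Pr = 1 - exp(-LPr)."""
    return [1 - math.exp(-value) for value in lpr]


# Hypothetical product graph with two vertices (entity pairs) pointing at
# each other; only the first one receives weight from a literal alignment.
M = [[0, 1],
     [1, 0]]
L = [0.5, 0.0]
```

With these values the log-scores keep growing (`iterate(M, L, 4)` already gives 1.0 for both pairs), so the corresponding probabilities tend to 1. This illustrates the second difference with PageRank listed above.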

Literal Similarity Functions

'Edgar R. Burroughs' vs 'Edgar Rice Burroughs'
'Douglas Adams' vs 'Adams, Douglas'
'and Constance Garnett' vs 'Constance Garnett'

The original Paris uses an exact literal equality function.

Possible refinements: adjust for case, strip special characters, etc.

Yet, we would need a better equality function giving a positive weight to the alignment of similar literals.

Approximate dictionary searching problem: given a literal, quickly find all similar literals in the other ontology.
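The simple refinements mentioned above (case, special characters) could look like the sketch below. The exact normalization used in the experiments is not described in the slides, so these rules and function names are assumptions.

```python
import re
import unicodedata


def normalize(literal):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", literal)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return " ".join(text.split())


def token_key(literal):
    """Order-insensitive key: the sorted tokens of the normalized literal."""
    return " ".join(sorted(normalize(literal).split()))
```

`token_key` maps 'Douglas Adams' and 'Adams, Douglas' to the same key, but such rules remain all-or-nothing: 'Edgar R. Burroughs' and 'Edgar Rice Burroughs' still do not match, which is why a graded similarity is needed.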

Results

We use a shingling technique which was implemented by Mayur Garg (who interned in the team from IIT Delhi).

I interfaced his code with Paris.

The performance of the shingling technique matches ad-hoc normalization on the OAEI restaurants dataset.

                            Precision   Recall   F-measure
Paris with exact equality   95%         88%      91%
Paris with shingling        96%         95%      96%
Paris with normalization    98%         96%      97%
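The general idea of shingling for approximate dictionary search can be sketched as follows. The slides do not describe the actual implementation used, so this is a generic version (character trigrams, Jaccard similarity, inverted index), with hypothetical names and thresholds.

```python
from collections import defaultdict


def shingles(text, k=3):
    """Set of character k-grams of the lower-cased, space-padded text."""
    padded = " " * (k - 1) + text.lower() + " " * (k - 1)
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    return len(a & b) / len(a | b) if a or b else 1.0


class ShingleIndex:
    """Inverted index from shingles to the literals that contain them."""

    def __init__(self, literals, k=3):
        self.k = k
        self.sets = {lit: shingles(lit, k) for lit in literals}
        self.postings = defaultdict(set)
        for lit, shingle_set in self.sets.items():
            for shingle in shingle_set:
                self.postings[shingle].add(lit)

    def search(self, query, threshold=0.5):
        """Return (literal, similarity) pairs above the threshold, best first."""
        query_set = shingles(query, self.k)
        # Only literals sharing at least one shingle with the query are scored.
        candidates = set()
        for shingle in query_set:
            candidates |= self.postings.get(shingle, set())
        scored = [(lit, jaccard(query_set, self.sets[lit])) for lit in candidates]
        return sorted((hit for hit in scored if hit[1] >= threshold),
                      key=lambda hit: -hit[1])
```

A usage example: index the literals of one ontology, then call `search` with each literal of the other one. The similarity returned can serve as the positive weight given to the alignment of similar literals.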

The Deep Web

Many structured databases can only be queried through interfaces designed for humans (Web forms and HTML result pages).

To access this structured information, an automated agent must probe the form and perform wrapper induction on the result pages.

To understand the meaning of the extracted records and attributes, we can use Paris (with a reference ontology).

Application to Form Understanding

[Figure: form understanding pipeline. Form probing on a Web form (fields Author, Title, Publisher) returns a result page. Wrapper induction turns it into a list of records, e.g. ('Great Expectations', 'Charles Dickens', 'Dover Thrift Editions') and ('David Copperfield', 'by Charles Dickens', 'Penguin Classics'). RDF triples generation produces a labeled graph with unknown entities ?e1, ?e2 of type ?class. Ontology alignment with Yago (relations y:hasName, y:created, rdfs:type; entities such as Great Expectations, David Copperfield (novel), Charles Dickens, Othello, Shakespeare; class Book) yields the input and output schema mapping, ontology enrichment, and new probing terms.]

Results

We tested the approach on the Amazon book search form.

The entity alignments with the best confidence were indeed books aligned through their title and author.

The system identified relations: y:hasPreferredName and (y:created, y:hasPreferredName).

It linked them to the result page DOM paths and form fields.

The support for join relations and approximate string matching is required in this setting.

The approach was presented as a vision paper in the VLDS workshop of VLDB.

Summary of Contributions

Performance improvements resulting in a 10-fold speedup over the original implementation.

Support of join relation alignments on small ontologies.

Insights on the relation between Paris and PageRank-inspired techniques.

Integration of approximate string matching to improve the literal alignment.

Application of Paris for deep Web analysis.

Further Work

Performance. Further gains to be made, perform more complete benchmarks.

Join relations. Performance improvements, especially ways to only select interesting joins. Arbitrary patterns?

Theory. Study the possible alternative choices and benchmark them. Understand the full model (we still have no proof of overall convergence!) and the effects of implementation tweaks. Find links with Max-SAT or Markov Logic Networks?

Literal matching. Support of various datatypes such as numbers and dates (engineering work). Fix performance issues to perform larger experiments.

Information Extraction. Try with more sources. Find links with named entity disambiguation techniques such as AIDA? Intensional use for large-scale integration.

Thanks!

Thanks for your attention! Questions?

The research has been funded by the European Union’s seventh framework programme, in the setting of the European Research Council grant Webdam, agreement 226513, and the FP7 grant ARCOMEM, agreement 270239.

Frame 4: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
