Towards Transfer Learning of Link Specifications


ICSC 2013 (presentation on transfer learning)



Axel-Cyrille Ngonga Ngomo, Jens Lehmann, Mofeed Hassan


1 Motivation

2 Transfer Learning Framework

3 Experimental Setup

4 Results

5 Conclusions and Future Work

Why Link Discovery?

1 Fourth Linked Data principle

2 Links are central for

Cross-ontology QA
Data Integration
Reasoning
Federated Queries
...

3 2011 topology of the LOD Cloud:

31+ billion triples
≈ 0.5 billion links
owl:sameAs in most cases

Why is it difficult?

Definition (Link Discovery)

Given sets S and T of resources and relation R
Task: Find M = {(s, t) ∈ S × T : R(s, t)}
Common approaches:

Find M′ = {(s, t) ∈ S × T : σ(s, t) ≥ θ}
Find M′ = {(s, t) ∈ S × T : δ(s, t) ≤ θ}

1 Time complexity

Large number of triples
Quadratic a-priori runtime
69 days for mapping cities from DBpedia to Geonames (1 ms per comparison)
Decades for linking DBpedia and LGD ...
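A minimal sketch of the threshold-based approach and of why its a-priori runtime is quadratic. The function names, the toy similarity and the sample labels are assumptions made for illustration; this is not how LIMES itself computes links.

```python
from itertools import product


def naive_link_discovery(source, target, sigma, theta):
    """Return M' = {(s, t) in S x T : sigma(s, t) >= theta} by brute force."""
    return {(s, t) for s, t in product(source, target) if sigma(s, t) >= theta}


def estimated_days(num_comparisons, ms_per_comparison=1.0):
    """Runtime in days when every comparison costs ms_per_comparison."""
    return num_comparisons * ms_per_comparison / (1000 * 60 * 60 * 24)


def toy_sigma(a, b):
    """Hypothetical similarity: 1.0 for a case-insensitive match, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


links = naive_link_discovery(["Leipzig", "Berlin"], ["berlin", "Paris"], toy_sigma, 1.0)
print(links)  # {('Berlin', 'berlin')}

# 69 days at 1 ms per comparison correspond to about 5.96 billion comparisons.
print(round(69 * 24 * 60 * 60 * 1000 / 1e9, 2))  # 5.96
```

Every pair in S × T is compared once, so the number of comparisons grows with |S| · |T|.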

2 Complexity of specifications

Combination of several attributes required for high precision
Tedious discovery of the most adequate mapping
Dataset-dependent similarity functions

LIMES Framework

Link Specification

Detection of an accurate link specification is key
A link specification has three components:

Two sets of restrictions R^S_1, ..., R^S_m resp. R^T_1, ..., R^T_k that specify the sets S resp. T,
A specification of a complex similarity metric σ via the combination of several atomic similarity measures σ1, ..., σn, and
A set of thresholds τ1, ..., τn such that τi is the threshold for σi.
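The three components can be pictured as a small data structure. This is a sketch under assumptions: the class name `LinkSpec`, the dictionary-based resources, and in particular the rule that every atomic measure must reach its threshold are illustrative choices, since the slides do not fix how the atomic measures are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Resource = Dict[str, str]  # hypothetical resource: property name -> value


@dataclass
class LinkSpec:
    source_restrictions: List[str]  # restrictions that specify S
    target_restrictions: List[str]  # restrictions that specify T
    # One (atomic measure sigma_i, threshold tau_i) pair per measure.
    measures: List[Tuple[Callable[[Resource, Resource], float], float]]

    def accepts(self, s: Resource, t: Resource) -> bool:
        """Assumed combination: every sigma_i must reach its threshold tau_i."""
        return all(sigma(s, t) >= tau for sigma, tau in self.measures)


def label_equality(s: Resource, t: Resource) -> float:
    """Toy atomic measure: 1.0 if the labels match ignoring case, else 0.0."""
    return 1.0 if s["rdfs:label"].lower() == t["rdfs:label"].lower() else 0.0


spec = LinkSpec(
    source_restrictions=["rdf:type dbpedia-owl:Drug"],
    target_restrictions=["rdf:type drug:drugs"],
    measures=[(label_equality, 1.0)],
)
print(spec.accepts({"rdfs:label": "Aspirin"}, {"rdfs:label": "aspirin"}))  # True
```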

Transfer Learning

[Figure: classical learning of link specs vs. transfer learning of link specs. Labels: Different Linking Tasks, Learning System, Current Linking Task, Transfer Learning System, Task Repository, spec accuracy α, class similarity ζ, property similarity π]

In our approach we use transductive transfer learning

Class and property matching is assumed to be known already (numerous approaches from ontology matching can be employed); the goal is to find the complex similarity metric

Transfer Learning Framework I

Transfer learning of link specifications is reduced to three subproblems:

Restriction/class similarity: ζ : 2^C × 2^C → [0, 1]
e.g. ζ({City, Village}, {Town}) = 0.6

Property similarity: ξ : 2^P × 2^P → [0, 1]
e.g. ξ({rdfs:label}, {rdfs:label}) = 1.0

Accuracy of link specifications: α : Q → [0, 1]

Transfer Learning Framework II

Overall similarity measure for transfer learning:
ω(t, t′) = α(q′) · ζ(ψ(q′), C) · ζ(ψ′(q′), C′) · ξ(sp(q′), P_L) · ξ(tp(q′), P′_L)
(details in paper)

Each similarity measure can be implemented in many ways

Implementations of the class similarity function ζ in the framework:

label-based similarity
name-based similarity (URI similarity)
data-centric similarity

Property similarities ξ are defined analogously

Similarities between single classes/properties can be extended to sets (e.g. using the arithmetic / geometric mean of the max. similarity)

A spec can be transferred by replacing its properties with the most similar properties in P_L and P′_L
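The extension from single classes/properties to sets, and the replacement of a property by its most similar counterpart, can be sketched as follows. The pairwise scores in `TOY_SCORES` are made up, and taking the mean over the elements of the first set only (a directional measure) is an assumption; the paper may define the aggregation differently.

```python
from math import prod

# Hypothetical pairwise property similarities, for illustration only.
TOY_SCORES = {
    ("foaf:name", "drug:genericName"): 0.7,
    ("foaf:name", "rdfs:label"): 0.4,
}


def toy_sim(x, y):
    """Toy pairwise similarity: 1.0 for identical names, else a table lookup."""
    if x == y:
        return 1.0
    return TOY_SCORES.get((x, y), TOY_SCORES.get((y, x), 0.0))


def set_similarity(a, b, sim, aggregate="arithmetic"):
    """Mean, over the elements of a, of each element's best match in b."""
    a, b = list(a), list(b)
    if not a or not b:
        return 0.0
    best = [max(sim(x, y) for y in b) for x in a]
    if aggregate == "arithmetic":
        return sum(best) / len(best)
    if aggregate == "geometric":
        return prod(best) ** (1.0 / len(best))
    raise ValueError(f"unknown aggregate: {aggregate}")


def most_similar(prop, candidates, sim):
    """Pick the candidate property that should replace prop in a transferred spec."""
    return max(candidates, key=lambda c: sim(prop, c))


old_props = ["rdfs:label", "foaf:name"]
new_props = ["rdfs:label", "drug:genericName"]
print(round(set_similarity(old_props, new_props, toy_sim), 2))  # 0.85
print(most_similar("foaf:name", new_props, toy_sim))  # drug:genericName
```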


Example (New Link Task)

Example link specification for mapping drugs in two datasets, DBpedia and Drugbank (DBpedia-Drugbank.xml):

Example (Restriction Part, Properties Part, Similarity Measures Part)

Three parts of link specs:

Restrictions part
Properties part
Similarity measures part: similarity metric and thresholds

Example (Link Repository)

Transfer learning is applied using a repository → restrictions and relevant properties are assumed to be known → find the similarity measure by comparing with all specs in the repository, e.g. DBpedia-SiderDrugs.xml

Example (Restriction Similarities)

Restrictions in both specification files

Type     DBpedia-Drugbank.xml         DBpedia-SiderDrugs.xml
Source   rdf:type dbpedia-owl:Drug    rdf:type dbpedia-owl:Drug
Target   rdf:type drug:drugs          rdf:type sider:drugs

Straightforward label/URI similarity
For instance, trigram metric in URI similarity without prefixes:

ζ({dbpedia-owl:Drug}, {dbpedia-owl:Drug}) = 1.0
ζ({sider:drugs}, {drug:drugs}) = 1.0

Data-centric: ζd(s, s′) = 1/... · ... sim(x, y), where P(s) = {x : s p x ∧ p rdf:type owl:DatatypeProperty} (extends similarity to instances)
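The prefix-free trigram comparison of restriction URIs can be sketched as below. The Dice coefficient over padded character trigrams is one common definition and is an assumption here; the slides do not state which trigram formula is used, and the helper names are hypothetical.

```python
def local_name(uri):
    """Strip the prefix/namespace: keep the text after the last '#', '/' or ':'."""
    for sep in ("#", "/", ":"):
        uri = uri.rsplit(sep, 1)[-1]
    return uri


def trigrams(text):
    """Set of character trigrams of the lower-cased, space-padded text."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Dice coefficient of the trigram sets of the two local names."""
    ta, tb = trigrams(local_name(a)), trigrams(local_name(b))
    return 2 * len(ta & tb) / (len(ta) + len(tb))


print(trigram_similarity("dbpedia-owl:Drug", "dbpedia-owl:Drug"))  # 1.0
print(trigram_similarity("sider:drugs", "drug:drugs"))  # 1.0
print(trigram_similarity("City", "Town"))  # 0.0
```

Once the prefixes are dropped, sider:drugs and drug:drugs share the local name "drugs", which is why their similarity is 1.0.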

Example (Property Similarities)

Type     DBpedia-Drugbank.xml            DBpedia-SiderDrugs.xml
Source   rdfs:label                      rdfs:label, foaf:name
Target   rdfs:label, drug:genericName    rdfs:label

Applying the similarity function to all properties:
For instance, trigram based on URIs and arithmetic mean as aggregation:

ξ({rdfs:label}, {rdfs:label, foaf:name}) = 0.9
ξ({rdfs:label, drug:genericName}, {rdfs:label}) = 0.8

Example (Overall Similarity)

Based on, e.g., the F-score, assign a quality value to q′ = DBpedia-SiderDrugs.xml; in our case α(q′) = 0.89

The final step is calculating the overall similarity measure:
ω(DBpedia-Drugbank.xml, DBpedia-SiderDrugs.xml) = 0.89 · 1.0 · 1.0 · 0.9 · 0.8 ≈ 0.64

The steps are repeated for all link specifications in the repository

The most similar link spec can be transferred by replacing its properties with the most similar ones in the computed property matching
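The computation above, repeated over a repository, amounts to taking the specification with the highest ω. In this sketch the class `Candidate` and the second repository entry are hypothetical; only the DBpedia-SiderDrugs.xml values come from the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float          # alpha(q')
    source_class_sim: float  # zeta on the source restrictions
    target_class_sim: float  # zeta on the target restrictions
    source_prop_sim: float   # xi on the source properties
    target_prop_sim: float   # xi on the target properties

    def omega(self):
        """Overall similarity: product of accuracy and the four similarities."""
        return (self.accuracy * self.source_class_sim * self.target_class_sim
                * self.source_prop_sim * self.target_prop_sim)


repository = [
    Candidate("DBpedia-SiderDrugs.xml", 0.89, 1.0, 1.0, 0.9, 0.8),
    # Made-up entry, included only so that there is something to rank against.
    Candidate("Hypothetical-Other.xml", 0.95, 0.3, 0.4, 0.5, 0.5),
]

best = max(repository, key=Candidate.omega)
print(best.name, round(best.omega(), 2))  # DBpedia-SiderDrugs.xml 0.64
```

Because ω is a product, a single low factor (for example a poor class match) pulls the whole score down even when the specification itself is very accurate.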

Experimental Setup I

The goal of evaluation is two-fold:

Evaluating whether transfer learning can be used to build templates for link specs

Discovering whether the transferred templates can be used directly

113 specifications were retrieved from LATC, each with a manual evaluation of its links

[Figure: chart; surviving value labels: 10%, 2%, 1%, 3%]
Experimental Setup II

Leave-one-out evaluation

1.) Take the top-scored (most similar) specification and check whether it uses the same combination of similarity functions: assign 1 for a match and 0 for no match

2.) Compute the F-measure of the learned link specs directly: works only on specs with both endpoints alive (only 12 out of 113)

URI similarity was used
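The first evaluation step can be sketched as a leave-one-out loop. The dictionary layout of a spec, the `domain` field and the toy similarity are assumptions made for illustration; in the experiments the ranking would come from the overall similarity ω with URI similarity.

```python
def leave_one_out(specs, similarity):
    """Fraction of specs whose most similar other spec uses the same measures."""
    hits = 0
    for i, held_out in enumerate(specs):
        rest = specs[:i] + specs[i + 1:]
        best = max(rest, key=lambda other: similarity(held_out, other))
        # Score 1 for a match of the similarity-function combination, else 0.
        hits += int(best["measures"] == held_out["measures"])
    return hits / len(specs)


def same_domain(a, b):
    """Toy stand-in for the overall similarity between two specs."""
    return 1.0 if a["domain"] == b["domain"] else 0.0


# Made-up repository of four specs.
specs = [
    {"name": "a", "domain": "geo", "measures": frozenset({"trigrams"})},
    {"name": "b", "domain": "geo", "measures": frozenset({"trigrams"})},
    {"name": "c", "domain": "person", "measures": frozenset({"levenshtein"})},
    {"name": "d", "domain": "person", "measures": frozenset({"trigrams", "levenshtein"})},
]

print(leave_one_out(specs, same_domain))  # 0.5
```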

Results of the First Set of Experiments

The right specification is detected in 81% of all cases

In the geo-spatial domain: 91%

In the persons domain: 58%






















Results of the Second Set of Experiments

In the second series of experiments, the source and target endpoints need to be alive so that the transferred link spec can be executed (12 out of 113)

F-measures are low in general
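For reference, the F-measure of the links produced by an executed specification against a set of reference links can be computed as in this sketch. The function name and the sample link pairs are made up for illustration.

```python
def f_measure(found, reference):
    """Harmonic mean of precision and recall of found links w.r.t. reference links."""
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(found)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical links: 2 of 3 found links are correct, 2 of 4 reference links are found.
found = {("a", "x"), ("b", "y"), ("c", "z")}
reference = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
print(round(f_measure(found, reference), 3))  # 0.571
```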




















Conclusions

The right template is detected in 81% of all cases

Transfer learning cannot replace the learning of thresholds in specifications

Future Work:

Combination with machine-learning approaches for link specifications (e.g., EAGLE, COALA), in particular for learning thresholds

More sophisticated class and property similarity approaches

The End

Jens Lehmann, Uni Leipzig

Questions GeoKnow

