rdf data clustering
Towards a unified framework for distributed data management across the Semantic Web
Silvia Giannini (Supervisor: Prof. Eugenio Di Sciascio)
Dipartimento di Ingegneria Elettrica e dell'Informazione (DEI), Politecnico di Bari, Bari, Italy
8th ICCL Summer School Workshop (ICCL 2013)
Semantic Web - Ontology Languages and Their Use
Dresden, Germany | 26 August, 2013
The scenario RDF clustering Proposal Preliminary Results Conclusions
Outline
1 The scenario
2 RDF clustering
  Motivations
  State of Art
3 Proposal
4 Preliminary Results
5 Conclusions
Silvia Giannini RDF data clustering
The Linking Open Data (LOD) project
A global Uniform Resource Identifier for each entity on the web (URIs)
A standardized access mechanism (HTTP URIs)
A machine-readable, open and standardized data format (RDF)
A mechanism for linking different data sources (RDF links):
  Relationship Links
  Identity Links
  Vocabulary Links
The Linking Open Data (LOD) project
[Figure: Linking Open Data cloud diagram as of September 2011. Datasets such as DBpedia, Freebase, GeoNames, MusicBrainz, UniProt and the data.gov.uk collections are drawn as interlinked nodes, grouped by domain: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
RDF: the big picture
DBpedia¹ extract
[Figure: dbpedia:Dresden linked to dbpedia:Germany by dbpedia-owl:country and to the literal 328.8 by dbpedia-owl:areaTotal]
Graph-structured knowledge representation (data model)
Resource: concrete or abstract entity of the real world, identified by a dereferenceable URI
Description: representation of properties or relationships among resources
Framework: combination of web-based protocols and formal semantics
Facts in triple form: subject - predicate - object
<http://dbpedia.org/resource/Dresden> <http://dbpedia.org/property/country> <http://dbpedia.org/resource/Germany>.
¹ http://dbpedia.org
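As a minimal sketch of the triple form above, the Python snippet below splits one N-Triples statement into subject, predicate and object. It assumes every term is an IRI in angle brackets; the regular expression and `parse_iri_triple` are hypothetical helpers for illustration, and real data (literals, blank nodes, escapes) calls for a proper RDF parser.

```python
import re

# Assumption of this sketch: only IRI terms are handled. Literals, blank nodes,
# language tags and escape sequences need a full N-Triples parser.
IRI_TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def parse_iri_triple(line: str) -> tuple:
    """Split an IRI-only N-Triples statement into (subject, predicate, object)."""
    match = IRI_TRIPLE.match(line)
    if match is None:
        raise ValueError(f"not an IRI-only triple: {line!r}")
    return match.groups()


fact = (
    "<http://dbpedia.org/resource/Dresden> "
    "<http://dbpedia.org/property/country> "
    "<http://dbpedia.org/resource/Germany>."
)
subject, predicate, obj = parse_iri_triple(fact)
print(f"subject:   {subject}\npredicate: {predicate}\nobject:    {obj}")
```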
[Figure: the same DBpedia extract split into an RDF data model layer and an RDF Schema layer; the schema layer adds dbpedia-owl:PopulatedPlace, dbpedia-owl:Country and owl:ObjectProperty, connected through rdf:type, rdfs:domain and rdfs:range links]
RDF Schema: Explicit semantics of content and links
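To make "explicit semantics" concrete, here is a sketch, assuming triples are plain Python string tuples, of the RDFS entailment rules rdfs2 (domain) and rdfs3 (range), which derive `rdf:type` statements from `rdfs:domain`/`rdfs:range` declarations. The schema axioms for `dbpedia-owl:country` in the example are assumptions for illustration, not quoted from the DBpedia ontology, and writing literals with surrounding quotes is a convention of this sketch.

```python
from collections import defaultdict

RDF_TYPE, RDFS_DOMAIN, RDFS_RANGE = "rdf:type", "rdfs:domain", "rdfs:range"


def is_literal(term):
    # Convention of this sketch: literals are written with surrounding double quotes.
    return term.startswith('"')


def rdfs_domain_range_closure(triples):
    """Apply rules rdfs2 (domain) and rdfs3 (range) until no new triple is derived."""
    closure = set(triples)
    while True:
        domains, ranges = defaultdict(set), defaultdict(set)
        for s, p, o in closure:
            if p == RDFS_DOMAIN:
                domains[s].add(o)
            elif p == RDFS_RANGE:
                ranges[s].add(o)
        derived = set()
        for s, p, o in closure:
            for cls in domains.get(p, ()):  # rdfs2: (s p o), (p domain C) => (s type C)
                derived.add((s, RDF_TYPE, cls))
            if not is_literal(o):
                for cls in ranges.get(p, ()):  # rdfs3: (s p o), (p range C) => (o type C)
                    derived.add((o, RDF_TYPE, cls))
        if derived <= closure:
            return closure
        closure |= derived


# Assumed schema axioms (illustrative) plus the DBpedia extract from the slide.
data = {
    ("dbpedia-owl:country", RDF_TYPE, "owl:ObjectProperty"),
    ("dbpedia-owl:country", RDFS_DOMAIN, "dbpedia-owl:PopulatedPlace"),
    ("dbpedia-owl:country", RDFS_RANGE, "dbpedia-owl:Country"),
    ("dbpedia:Dresden", "dbpedia-owl:country", "dbpedia:Germany"),
    ("dbpedia:Dresden", "dbpedia-owl:areaTotal", '"328.8"'),
}
closed = rdfs_domain_range_closure(data)
for triple in sorted(closed - data):
    print(triple)
```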
Motivations
RDF Data Management Challenges
LOD cloud statistics (October 2011): more than 31 billion facts and more than 500 million links
How to efficiently:
Develop services on top of the RDF data model for:
  browsing data;
  query answering;
  supporting expressive search (approximate matching);
Speed up data access and query response times over distributed machines
CLUSTERING
Motivations
Contributions
Clustering semantic web resources (RDF graphs)
Discovering homogeneous groups of resources
Summarizing the original graph content in a meaningful way
Revealing possible hierarchies of clusters
Identifying a concept description or discriminating features for each cluster
State of Art
What is a cluster: data-based approach
A set of resources with large intra-cluster similarity and large inter-cluster dissimilarity
Data clustering methods
  pairwise distance metric
  agglomerative
  partitional (K-Means)
- Number or size of clusters must be set in advance
The RDF data model is not suited to applying traditional data-clustering techniques to real-life RDF datasets!
State of Art
What is a cluster: graph-based approach
A set of resources with large intra-cluster similarity and large inter-cluster dissimilarity
Graph clustering methods
vertex connectivity
neighborhood similarity
spectral analysis of the adjacency matrix
- Number or size of clusters must be set in advance
[Figure source: http://sydney.edu.au/engineering/it/~shhong/img/cluster1.png]
State of Art
RDF clustering: literature
Instance extraction: the subgraph relevant for representing a resource (SPARQL² DESCRIBE query)
1 Immediate Properties
  + simple, quick
  - loss of information
2 Concise Bounded Description (CBD)
  + better body of knowledge
  - domain dependent (use of blank nodes)
3 Depth Limited Crawling
  + stable over input data, with a well-limited subgraph
  - need to find a trade-off between size and information content (data dependent)
G.A. Grimnes, P. Edwards, and A. Preece. "Instance based clustering of semantic web resources." The Semantic Web: Research and Applications. Springer Berlin Heidelberg, 2008. 303-317.
² http://www.w3.org/TR/rdf-sparql-query/
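The three instance-extraction strategies can be sketched over an in-memory set of triples as follows. This is an illustration, not the implementation of Grimnes et al.: the function names, the "_:" prefix test for blank nodes, and the small SP2Bench-like dataset (including `ex:Journal1 rdf:type bench:Journal`) are assumptions, and the CBD variant ignores reification.

```python
from collections import deque

# Hypothetical SP2Bench-like triples: literals are quoted, blank nodes start with "_:".
TRIPLES = {
    ("ex:Article1", "dc:creator", "_:x1"),
    ("ex:Article1", "dc:title", '"richer dwelling scrapped"'),
    ("ex:Article1", "swrc:pages", '"140"'),
    ("ex:Article1", "swrc:journal", "ex:Journal1"),
    ("_:x1", "rdf:type", "foaf:Person"),
    ("_:x1", "foaf:name", '"Adamanta Schlitt"'),
    ("ex:Journal1", "rdf:type", "bench:Journal"),  # illustrative assumption
}


def immediate_properties(triples, resource):
    """1. Triples whose subject is the resource itself."""
    return {t for t in triples if t[0] == resource}


def concise_bounded_description(triples, resource):
    """2. Simplified CBD: subject triples, expanded recursively through blank-node
    objects (reification statements of the full CBD definition are ignored)."""
    result, stack, seen = set(), [resource], {resource}
    while stack:
        node = stack.pop()
        for s, p, o in triples:
            if s == node:
                result.add((s, p, o))
                if o.startswith("_:") and o not in seen:
                    seen.add(o)
                    stack.append(o)
    return result


def depth_limited_crawl(triples, resource, depth):
    """3. Breadth-first crawl of outgoing links at most `depth` hops away."""
    result, seen = set(), {resource}
    frontier = deque([(resource, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == depth:
            continue
        for s, p, o in triples:
            if s == node:
                result.add((s, p, o))
                if o not in seen:
                    seen.add(o)
                    frontier.append((o, hops + 1))
    return result


for name, subgraph in [
    ("immediate", immediate_properties(TRIPLES, "ex:Article1")),
    ("CBD", concise_bounded_description(TRIPLES, "ex:Article1")),
    ("depth 2", depth_limited_crawl(TRIPLES, "ex:Article1", 2)),
]:
    print(name, len(subgraph))
```

With depth 1 the crawl returns exactly the immediate properties; CBD additionally pulls in the description of the blank-node author `_:x1`, while depth 2 also reaches `ex:Journal1`, which illustrates the size versus information trade-off.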
State of Art
RDF clustering: literature
Instances distance computation
Comparing two RDF graphs with the resources as root nodes
1 Feature-vector based
  mappings: feature → shortest path; value → set of reachable nodes
  similarity measure: e.g., Dice coefficient
2 Graph based
  conceptual similarity: overlap of nodes
  relational similarity: overlap of edges
3 Ontology based³ (well-defined ontology and conforming instance data)
  taxonomy similarity: semantic distance between metadata in a concept hierarchy
  relation similarity: similarity of the instances related to the two resources being compared
  attribute similarity: similarity of attribute values (numeric, literal, etc.)
- Determining the appropriate number of clusters remains an issue
³ A. Maedche and V. Zacharias. "Clustering ontology-based metadata in the semantic web." Principles of Data Mining and Knowledge Discovery. Springer Berlin Heidelberg, 2002. 348-360.
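For the feature-vector approach, here is a minimal sketch under stated assumptions: features are predicate paths of bounded length from the root resource (a simplification of the shortest-path mapping), each mapped to the set of nodes it reaches, and two instances are compared with the Dice coefficient over their (path, node) pairs. Flattening the mapping into one set, the function names and the example triples are hypothetical choices.

```python
from collections import defaultdict


def path_features(triples, root, max_depth=2):
    """Map each predicate path of length <= max_depth from `root` to the nodes it reaches."""
    outgoing = defaultdict(list)
    for s, p, o in triples:
        outgoing[s].append((p, o))
    features = defaultdict(set)
    frontier = [((), root)]
    for _ in range(max_depth):
        next_frontier = []
        for path, node in frontier:
            for p, o in outgoing.get(node, []):
                features[path + (p,)].add(o)
                next_frontier.append((path + (p,), o))
        frontier = next_frontier
    return features


def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|) of two sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def instance_similarity(triples, r1, r2, max_depth=2):
    """Dice coefficient over the (path, reachable node) pairs of two root resources."""
    def pairs(root):
        return {(path, node)
                for path, nodes in path_features(triples, root, max_depth).items()
                for node in nodes}
    return dice(pairs(r1), pairs(r2))


# Hypothetical data: two articles by the same author in the same journal.
TRIPLES = [
    ("ex:Article1", "dc:creator", "ex:Paul_Erdoes"),
    ("ex:Article1", "swrc:journal", "ex:Journal1"),
    ("ex:Article1", "dc:title", '"title one"'),
    ("ex:Article2", "dc:creator", "ex:Paul_Erdoes"),
    ("ex:Article2", "swrc:journal", "ex:Journal1"),
    ("ex:Article2", "dc:title", '"title two"'),
    ("ex:Paul_Erdoes", "rdf:type", "foaf:Person"),
]
print(instance_similarity(TRIPLES, "ex:Article1", "ex:Article2"))  # 0.75
```

Each article has four (path, node) pairs, three of them shared (creator, journal, and the creator's type two hops away), so the Dice coefficient is 2·3/8.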
Requirements
Ideal clustering of graph-structured data:
cohesive intra-cluster structure
homogeneous intra-cluster properties
Parameter-free algorithm:
number and size of partitions derived from the data
How do community detection algorithms behave on RDF(S) graphs?
Community Discovery Algorithms
Graph mining techniques for extracting knowledge from large graphs
Exploit native graph features (topology) of the RDF model
Why: if two sets of entities are strongly related, they exhibit more connections than other sets of entities
Benefits:
+ Automatically discover the number and size of modules
+ Can handle uncertainty in clustering (overlapping communities)
+ Faster than data-clustering inspired techniques (no instances extraction)
What is a community
A subgraph of a network whose nodes are more tightly connected with each other than with nodes outside the subgraph.
Similarity: cohesion degree of subsets of vertices
- No overlapping capabilities: $\mathcal{C} = \{C_1, \dots, C_n\}$, $C_i \cap C_j = \emptyset \;\; \forall i, j \in \{1, \dots, n\},\ i \neq j$
In labeled graphs (like RDF graphs), each link models only one specific relation
Overlapping Communities Analysis
From Node to Link Perspective
Community: a set of nodes with more internal than external connections, i.e., a set of closely interrelated links.
Benefits:
+ Captures multiple memberships between nodes
+ Unifies hierarchical and overlapping clustering
It is always possible to move from a link partition $P = \{P_1, \dots, P_m\}$, $P_i \cap P_j = \emptyset \;\; \forall i, j \in \{1, \dots, m\},\ i \neq j$, to $m$ node clusters, possibly overlapping.
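The move from a link partition to overlapping node clusters can be sketched in a few lines: each link community becomes the set of its endpoints, so a node touched by links in different parts appears in several clusters. Links are treated as unordered pairs, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def link_partition_to_node_clusters(link_partition):
    """Map each link community P_i to the set of its endpoint nodes."""
    parts = [{frozenset(link) for link in part} for part in link_partition]
    for part_i, part_j in combinations(parts, 2):
        if part_i & part_j:  # P_i and P_j must be disjoint for i != j
            raise ValueError("link communities must be pairwise disjoint")
    return [set().union(*part) for part in parts]


def overlapping_nodes(node_clusters):
    """Nodes that belong to more than one node cluster."""
    counts = Counter(node for cluster in node_clusters for node in cluster)
    return {node for node, count in counts.items() if count > 1}


# Hypothetical link partition: two triangles sharing node "c".
partition = [
    [("a", "b"), ("b", "c"), ("a", "c")],
    [("c", "d"), ("d", "e"), ("c", "e")],
]
clusters = link_partition_to_node_clusters(partition)
print(clusters, overlapping_nodes(clusters))
```

The shared node "c" ends up in both clusters, which is exactly the multiple membership a node partition cannot express.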
Datasets
SP2Bench⁴: A SPARQL Performance Benchmark
data generator for creating arbitrarily large DBLP-like RDF documents
mirrors key characteristics and social-world distributions of the original DBLP dataset
publicly available
⁴ M. Schmidt, et al. "SP2Bench: SPARQL performance benchmark." Semantic Web Information Management. Springer Berlin Heidelberg, 2010. 371-393.
Node communities
SP2Bench: 720 triples
[Figure: node communities detected in the 720-triple SP2Bench graph; labelled nodes include Paul_Erdoes, Article and Person]
V.D. Blondel, et al. "Fast unfolding of communities in large networks." Journal of Statistical Mechanics: Theory and Experiment 2008.10 (2008): P10008.
Tool: Gephi (https://gephi.org)
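The Blondel et al. ("Louvain") method greedily maximizes modularity. The sketch below only computes that objective for a given node partition of an undirected graph, rather than reproducing the full Louvain procedure; the function name and the toy graph (two triangles joined by one edge) are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q = sum_c [L_c / m - (d_c / (2m))**2] of a node partition.

    m: number of edges; L_c: edges with both ends in c; d_c: total degree of c.
    Assumes an undirected, unweighted graph without self-loops.
    """
    m = len(edges)
    internal, degree = defaultdict(int), defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles joined by the edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}
print(round(modularity(edges, two_groups), 3), modularity(edges, one_group))
```

Splitting the graph into its two triangles scores about 0.357, while lumping everything into one community scores 0, which is why a modularity-maximizing method separates them.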
Link Communities
Given an undirected graph $G = (V, E)$, the set of neighbours of node $i$ is $N_i = \{ j \in V \mid e_{ij} \in E \}$.
Similarity⁵: $S(e_{ik}, e_{jk}) = \dfrac{|N_i \cap N_j|}{|N_i \cup N_j|}$
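A sketch of this link similarity in Python, assuming an undirected graph given as an edge list. It follows the formula as written above (exclusive neighbourhoods $N_i$); Ahn et al. use inclusive neighbourhoods ($N_i \cup \{i\}$), exposed here through a hypothetical `inclusive` flag. Function names and the toy graph are illustrative.

```python
from collections import defaultdict


def neighbours(edges):
    """Adjacency sets N_i of an undirected graph given as an edge list."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def link_similarity(nbrs, e1, e2, inclusive=False):
    """Jaccard similarity S(e_ik, e_jk) = |N_i & N_j| / |N_i | N_j| of two links sharing node k."""
    shared = set(e1) & set(e2)
    if len(shared) != 1:
        raise ValueError("the two links must share exactly one node k")
    (i,) = set(e1) - shared
    (j,) = set(e2) - shared
    n_i, n_j = set(nbrs[i]), set(nbrs[j])
    if inclusive:  # inclusive neighbourhoods, as used by Ahn et al.
        n_i.add(i)
        n_j.add(j)
    return len(n_i & n_j) / len(n_i | n_j)


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
nbrs = neighbours(edges)
print(link_similarity(nbrs, ("a", "c"), ("b", "c")),  # N_a = {b, c}, N_b = {a, c}
      link_similarity(nbrs, ("a", "c"), ("c", "d")))  # N_a = {b, c}, N_d = {c}
```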
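Link dendrogram: hierarchical agglomerative algorithm
Optimization of partition density: the cut level is chosen to maximize link density inside communities
$D_P = \dfrac{2}{M} \sum_c m_c \dfrac{m_c - (n_c - 1)}{(n_c - 2)(n_c - 1)}$,
where $M$ is the total number of links and $m_c$, $n_c$ are the numbers of links and nodes in community $c$.
⁵ Y.Y. Ahn, J.P. Bagrow, and S. Lehmann. "Link communities reveal multiscale complexity in networks." Nature 466.7307 (2010): 761-764.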
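Partition density can be checked numerically with the sketch below, which evaluates $D_P$ for a given list of link communities. Following Ahn et al., communities with only two nodes contribute zero; the function name and the candidate partitions are hypothetical, and the dendrogram construction itself is not reproduced.

```python
def partition_density(link_communities):
    """D = (2/M) * sum_c m_c (m_c - (n_c - 1)) / ((n_c - 2)(n_c - 1)).

    M: total number of links; m_c, n_c: links and nodes of community c.
    Communities with n_c = 2 contribute 0, as in Ahn et al.
    """
    total_links = sum(len(links) for links in link_communities)
    acc = 0.0
    for links in link_communities:
        m_c = len(links)
        n_c = len({node for link in links for node in link})
        if n_c > 2:
            acc += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2 * acc / total_links


# Two candidate link partitions of two triangles joined by the link c-d.
tri1 = [("a", "b"), ("b", "c"), ("a", "c")]
tri2 = [("d", "e"), ("e", "f"), ("d", "f")]
bridge = [("c", "d")]
print(partition_density([tri1, tri2, bridge]), partition_density([tri1 + tri2 + bridge]))
```

The three-community cut (about 0.857) beats the single community (0.2), so the dendrogram would be cut at the level that separates the two triangles.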
RDF clustering⁶
[Figure: the three types of clusters found in an SP2Bench graph.
TYPE 1. CLUSTER (a), SIGNATURE: <subject>
TYPE 2. CLUSTER (b), SIGNATURE: (<predicate>, <object>)
TYPE 3. CLUSTER (c), SIGNATURE: {(<predicate_1>, <object_1>), ..., (<predicate_n>, <object_n>)}
Nodes shown include Article1, Article2, Article3, Article13, Article20, Journal1, Paul_Erdoes, bench:Article, foaf:Person, the blank nodes _:x1, _:x2 and _:x3, and the literals "Adamanta Schlitt", "richer dwelling scrapped" and 140; links are labelled dc:creator, dc:title, foaf:name, swrc:pages, swrc:journal and rdf:type.
Legend: different background colours reveal the hierarchy of clusters; replicated nodes reveal overlapping clusters; some links belong to other clusters.]
⁶ S. Giannini. "RDF Data Clustering." BIS 2013 Workshops, LNBIP 160. Springer Berlin Heidelberg, 2013. 220-231.
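The three signatures can be illustrated by grouping triples directly, as in the sketch below. This is not the community-detection procedure that produces the clusters: it only shows which key each cluster type fixes (a subject; a predicate-object pair; a shared set of predicate-object pairs). Function names and the abridged example triples, including the titles, are hypothetical.

```python
from collections import defaultdict

ERDOES = "http://localhost/persons/Paul_Erdoes"
ARTICLE = "http://localhost/vocabulary/bench/Article"
JOURNAL = "http://localhost/publications/journals/Journal1/1942"


def type1_clusters(triples):
    """Type 1: triples grouped by subject (signature <subject>)."""
    clusters = defaultdict(set)
    for s, p, o in triples:
        clusters[s].add((s, p, o))
    return dict(clusters)


def type2_clusters(triples):
    """Type 2: triples grouped by (predicate, object) (signature (<predicate>, <object>))."""
    clusters = defaultdict(set)
    for s, p, o in triples:
        clusters[(p, o)].add((s, p, o))
    return dict(clusters)


def mixed_signature(triples, subjects):
    """(predicate, object) pairs shared by all given subjects: a candidate mixed-cluster signature."""
    pair_sets = [{(p, o) for s, p, o in triples if s == subject} for subject in subjects]
    return set.intersection(*pair_sets) if pair_sets else set()


# Abridged example; the dc:title literals are hypothetical.
triples = set()
for n in (1, 2, 3):
    subject = f"ex:Article{n}"
    triples |= {(subject, "dc:creator", ERDOES), (subject, "rdf:type", ARTICLE),
                (subject, "swrc:journal", JOURNAL), (subject, "dc:title", f'"title {n}"')}

print(len(type1_clusters(triples)), len(type2_clusters(triples)))
print(mixed_signature(triples, ["ex:Article1", "ex:Article2", "ex:Article3"]))
```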
RDF clustering
Cluster of type 1.
Instance extraction (fixed subject)
ex:Article15 swrc:pages 139
ex:Article15 dc:title "equalled bewitchment cheaters"
ex:Article15 dc:creator ex:node17r3ptqpmx16
ex:Article15 rdfs:seeAlso http://www.skeins.tld/sandwiching/bewitchment.html
ex:Article15 foaf:homepage http://www.sandwiching.tld/cheaters/ri�ed.html
Cluster of type 2.
Aggregation of resources (fixed predicate - fixed object)
ex:Article9 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article8 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article7 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article3 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article2 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article1 swrc:journal http://localhost/publications/journals/Journal1/1945
ex:Article10 swrc:journal http://localhost/publications/journals/Journal1/1945
Mixed-type clusters
Set of clusters of type 1. (or equivalently, of type 2.)
ex:Article8 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article8 rdf:type http://localhost/vocabulary/bench/Article
ex:Article8 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article5 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article5 rdf:type http://localhost/vocabulary/bench/Article
ex:Article5 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article4 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article4 rdf:type http://localhost/vocabulary/bench/Article
ex:Article4 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article3 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article3 rdf:type http://localhost/vocabulary/bench/Article
ex:Article3 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article2 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article2 rdf:type http://localhost/vocabulary/bench/Article
ex:Article2 swrc:journal http://localhost/publications/journals/Journal1/1942
ex:Article1 dc:creator http://localhost/persons/Paul_Erdoes
ex:Article1 rdf:type http://localhost/vocabulary/bench/Article
ex:Article1 swrc:journal http://localhost/publications/journals/Journal1/1942
Advantages and Emerging issues
Tests over datasets of 266, 720, and 5362 triples
Number of clusters obtained: 53, 277, and 3437, respectively
+ Good behaviour in the presence of blank nodes
http://localhost/vocabulary/bench/PhDThesis rdfs:subClassOf foaf:Document
http://localhost/vocabulary/bench/Www rdfs:subClassOf foaf:Document
http://localhost/vocabulary/bench/Book rdfs:subClassOf foaf:Document
_:node17rocfnblx296 rdf:_3 misc:UnknownDocument_c
_:node17rocfnblx296 rdf:_2 misc:UnknownDocument_b
_:node17rocfnblx296 rdf:_1 misc:UnknownDocument_a
misc:UnknownDocument_c rdf:type foaf:Document
misc:UnknownDocument_b rdf:type foaf:Document
misc:UnknownDocument_a rdf:type foaf:Document
http://localhost/vocabulary/bench/MastersThesis rdfs:subClassOf foaf:Document
- A post-processing phase is needed (link replication)
If Paul_Erdoes is a Person included in a type 2. cluster with signature (rdf:type, prefix:Person), this property will not appear in the type 1. cluster describing the resource Paul_Erdoes
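One possible shape for the link-replication post-processing, sketched under assumptions: clusters are dictionaries of triple sets keyed by their signature, and every triple of a type 2. cluster is copied into the type 1. cluster of its subject (creating one if missing). The function name, the `foaf:name` literal and the rest of the data are hypothetical, not the author's procedure.

```python
def replicate_links(type1, type2):
    """Copy each triple of a type-2 cluster into the type-1 cluster of its subject.

    type1: {subject: set of triples}; type2: {(predicate, object): set of triples}.
    Returns a new dictionary; the input clusters are left unchanged.
    """
    completed = {subject: set(ts) for subject, ts in type1.items()}
    for members in type2.values():
        for s, p, o in members:
            # A subject without a type-1 cluster gets a new one (a choice of this sketch).
            completed.setdefault(s, set()).add((s, p, o))
    return completed


# Hypothetical clusters mirroring the Paul_Erdoes example.
type1 = {"ex:Paul_Erdoes": {("ex:Paul_Erdoes", "foaf:name", '"Paul Erdoes"')}}
type2 = {("rdf:type", "foaf:Person"): {("ex:Paul_Erdoes", "rdf:type", "foaf:Person"),
                                       ("_:x1", "rdf:type", "foaf:Person")}}
completed = replicate_links(type1, type2)
print(sorted(completed["ex:Paul_Erdoes"]))
```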
Conclusions and Future Works
Community detection algorithms are a promising candidate for:
semantic web resources clustering
instances extraction from RDF graphs
Ongoing and future works:
A more comprehensive experimental evaluation on different datasets
Analysis of the cut threshold
Better definition of the post-processing phase
Comparison with existing approaches
Combination of (1) graph clustering techniques and (2) reasoning services, to:
  1 Identify communities of closely related resources
  2 Extract a semantic description of them
Experimentation of "property-driven" clustering
Dynamics and evolution of clusters